Efficient Human Pose Estimation with Image-dependent Interactions
figs
.gitignore
CPS.tex
PennDiss.sty
README
abstract.tex
ack.tex
cascade_results_table.tex
commands.tex
conclusion.tex
contributions-technical.tex
dedication.tex
discussion.tex
ensembles.tex
experiments.tex
features.tex
figure-template.tex
future.tex
inference-alg.tex
intro.tex
llps.tex
make.vim
ml.tex
mm-inference-alg.tex
mm-inference.tex
preface.tex
ps.tex
qual-res-stub.tex
qual-res.tex
refs.bib
rel-work-hps-table.tex
rel.tex
res-table.tex
results.tex
thesis.tex

README

Efficient Human Pose Estimation with Image-dependent Interactions

Ben Sapp, Ph.D. Candidate
Advisor: Ben Taskar
University of Pennsylvania
bensapp@cis.upenn.edu
http://www.cis.upenn.edu/~bensapp/

Thesis committee:
Kostas Daniilidis (chair)
C.J. Taylor
Jianbo Shi
David Forsyth (external, UIUC)

Abstract

Human pose estimation from monocular images is one of the most challenging and computationally demanding problems in computer vision. Standard models such as Pictorial Structures consider interactions between kinematically connected joints or limbs, leading to inference cost that is quadratic in the number of pixels. As a result, researchers and practitioners have restricted themselves to simple models that measure the quality of limb-pair possibilities only by their 2D geometric plausibility.

In this talk, we propose novel methods which allow for efficient inference in richer models with data-dependent interactions. First, we introduce structured prediction cascades, a structured analog of binary cascaded classifiers, which learn to focus computational effort where it is needed, filtering out many states cheaply while ensuring the correct output is unfiltered. Second, we propose a way to decompose models of human pose with cyclic dependencies into a collection of tree models, and provide novel methods to impose model agreement.

These techniques allow for sparse and efficient inference on the order of minutes per image or video clip. As a result, we can afford to model pairwise interaction potentials much more richly with data-dependent features such as contour continuity, segmentation alignment, color consistency, optical flow and more. We show empirically that these richer models are worthwhile, obtaining significantly more accurate pose estimation on popular datasets.
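
The filtering step behind structured prediction cascades, as described in the abstract, can be illustrated with a small sketch. It is not taken from the thesis sources. It assumes a chain-structured model in which every part has the same number of states, and it assumes a pruning threshold formed as a convex combination of the best score and the mean max-marginal, controlled by a hypothetical parameter alpha. The function names and the toy scores are illustrative only.

```python
def max_marginals(unary, pairwise):
    """Max-marginals on a chain model (assumed structure, for illustration).

    unary[i][s]       : score of placing part i in state s
    pairwise[i][s][t] : score of part i in state s and part i+1 in state t
    Returns mm[i][s]  : best total score over all full assignments
                        that put part i in state s.
    """
    n, k = len(unary), len(unary[0])

    # fwd[i][t]: best score of parts 0..i-1 (and their links) given part i = t.
    fwd = [[0.0] * k for _ in range(n)]
    for i in range(1, n):
        for t in range(k):
            fwd[i][t] = max(
                fwd[i - 1][s] + unary[i - 1][s] + pairwise[i - 1][s][t]
                for s in range(k)
            )

    # bwd[i][s]: best score of parts i+1..n-1 (and their links) given part i = s.
    bwd = [[0.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for s in range(k):
            bwd[i][s] = max(
                bwd[i + 1][t] + unary[i + 1][t] + pairwise[i][s][t]
                for t in range(k)
            )

    return [[fwd[i][s] + unary[i][s] + bwd[i][s] for s in range(k)]
            for i in range(n)]


def cascade_filter(unary, pairwise, alpha=0.5):
    """Keep only the states whose max-marginal reaches the threshold.

    The threshold (an assumption of this sketch) interpolates between the
    mean max-marginal (alpha = 0) and the best score (alpha = 1), so the
    highest-scoring assignment is never pruned for alpha in [0, 1].
    """
    mm = max_marginals(unary, pairwise)
    best = max(mm[0])  # every part's best max-marginal equals the best score
    flat = [v for row in mm for v in row]
    mean = sum(flat) / len(flat)
    threshold = alpha * best + (1.0 - alpha) * mean
    return [[s for s, v in enumerate(row) if v >= threshold] for row in mm]


# Toy example: 3 parts, 2 states each, no pairwise preference.
unary = [[1.0, 0.0], [0.0, 2.0], [1.0, 0.0]]
pairwise = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
kept = cascade_filter(unary, pairwise, alpha=0.5)
```

Each loop over state pairs is where the quadratic cost mentioned in the abstract arises; pruning states at a coarse level shrinks the set a finer, richer model has to score.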

Bio

Ben Sapp is a Ph.D. candidate in Computer and Information Science at the University of Pennsylvania, advised by Ben Taskar. His work uses machine learning to tackle computer vision problems, with a focus on graphical models for human pose estimation in 2D images or video; specifically, he studies how to overcome the computational bottlenecks that handicap most models applied to this problem. Previously, Ben obtained an MS in Computer Science from Stanford University, and a B.Eng. in Computer Engineering with a Minor in Mathematics from the University of Illinois at Urbana-Champaign, where he also spent most of his childhood.