Commit

bensapp committed Apr 12, 2012
1 parent e7eb1e0 commit 0f898dd
Showing 17 changed files with 759 additions and 79 deletions.
8 changes: 4 additions & 4 deletions CPS.tex
@@ -1,4 +1,4 @@
-\chapter{Cascaded Pictorial Structures}
+\chapter{Cascaded Pictorial Structures}\label{sec:CPS}


Pictorial structure models~\cite{fischler1973ps} are a popular method for human body pose estimation~\cite{felz05,fergus2005sparse,devacrf,ferrari08,andriluka09}.
The model is a Conditional Random Field over pose variables that characterizes
@@ -105,9 +105,9 @@ \subsection*{Structured Prediction Cascades} \label{cascades}
\begin{figure}[t]
\begin{center}
\includegraphics[width=0.75\textwidth]{figs/empty.jpg}
-\caption{Upper right: Detector-based pruning by thresholding (for the lower
-right arm) yields many hypotheses far way from the true one. Lower row: The
-CPS, however, exploits global information to perform better pruning.}
+\caption[SHORT TITLE]{Upper right: Detector-based pruning by thresholding (for
+the lower right arm) yields many hypotheses far away from the true one. Lower
+row: The CPS, however, exploits global information to perform better pruning.}
\label{fig:cascade_pruning}
\end{center}
\end{figure}
7 changes: 3 additions & 4 deletions PennDiss.sty
@@ -214,10 +214,9 @@ Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy\\
\thispagestyle{empty}%
\null\vfill
\begin{center}
-\Large \MYTITLE \\
-\Large COPYRIGHT\\
-\@copyrightyear\\
-\@author\\
+\Large \mytitle \\
+\Large \copyright~Copyright by \@author\\
+\@copyrightyear
\end{center}
\vfill\newpage}


25 changes: 24 additions & 1 deletion abstract.tex
@@ -1 +1,24 @@
-Heyyyy abstract...
Human pose estimation from monocular images is one of the most challenging and
computationally demanding problems in computer vision. Standard models such as
Pictorial Structures consider interactions between kinematically connected
joints or limbs, leading to inference cost that is quadratic in the number of
pixels. As a result, researchers and practitioners have restricted themselves
to simple models which only measure the quality of limb-pair possibilities by
their 2D geometric plausibility.

In this work, we propose novel methods which allow for efficient inference in
richer models with data-dependent interactions. First, we introduce structured
prediction cascades, a structured analog of binary cascaded classifiers, which
learn to focus computational effort where it is needed, filtering out many
states cheaply while ensuring the correct output is unfiltered. Second, we
propose a way to decompose models of human pose with cyclic dependencies into a
collection of tree models, and provide novel methods to impose model agreement.

These techniques allow for sparse and efficient inference on the order of
minutes per image or video clip. As a result, we can afford to model pairwise
interaction potentials much more richly with data-dependent features such as
contour continuity, segmentation alignment, color consistency, optical flow and
more. We show empirically that these richer models are worthwhile, obtaining
significantly more accurate pose estimation on popular datasets.


28 changes: 24 additions & 4 deletions commands.tex
@@ -8,7 +8,7 @@
\newcommand{\LossA}{\mathcal{L}_{\psi}}
\newcommand{\LossMAX}{\mathcal{L}^{max}_{\psi}}
\newcommand{\X}{\mathcal{X}}
-\newcommand{\E}{\mathbf{E}}
+\newcommand{\E}{\mathbb{E}}
\newcommand{\bw}{\mathbf{w}}
\newcommand{\bft}{\mathbf{f}}
\newcommand{\bx}{\mathbf{x}}
@@ -20,6 +20,7 @@


\newcommand{\Ind}{\mathbf{1}}
\newcommand{\argmax}{\mathop{\arg\max}}
\newcommand{\argmin}{\mathop{\arg\min}}


\newcommand{\Vones}[1]{\ensuremath{\mathbf{1}_{#1}}}
\newcommand{\eqdef}{\stackrel{\rm def}{=}}
@@ -35,20 +36,37 @@
\newcommand{\w}{\mathbf{w}}
\newcommand{\f}{\mathbf{f}}


\newcommand{\naive}{naive\xspace}
\newcommand{\CPS}{CPS\xspace}
\newcommand{\LLPS}{LLPS\xspace}
\newcommand{\LLPSlong}{Local Linear Pictorial Structures\xspace}


% some common mathcals
\newcommand{\cH}{\mathcal{H}}
\newcommand{\cC}{\mathcal{C}}
\newcommand{\cD}{\mathcal{D}}
\newcommand{\cL}{\mathcal{L}}
\newcommand{\cX}{\mathcal{X}}
\newcommand{\cY}{\mathcal{Y}}
\newcommand{\cR}{\mathcal{R}}
-\newcommand{\cE}{\mathcal{R}}
-\newcommand{\cV}{\mathcal{R}}
+\newcommand{\cE}{\mathcal{E}}
+\newcommand{\cV}{\mathcal{V}}


\newcommand{\reals}{\mathbb{R}}
\newcommand{\defn}{\triangleq}


\newcommand{\tree}{\Upsilon}
\newcommand{\attrib}[1]{ \nopagebreak{\raggedleft\footnotesize #1\par}}
\newcommand{\myquotation}[2]{{\em #1}\\\attrib{#2}}

\newcommand{\secref}[1]{\hyperref[sec:#1]{\textsection\ref{sec:#1}}}
\newcommand{\equref}[1]{\hyperref[eq:#1]{Equation~\ref{eq:#1}}}
\newcommand{\algref}[1]{\hyperref[alg:#1]{Algorithm~\ref{alg:#1}}}
\newcommand{\thmref}[1]{\hyperref[thm:#1]{Theorem~\ref{thm:#1}}}
\newcommand{\lemref}[1]{\hyperref[lem:#1]{Lemma~\ref{lem:#1}}}
\newcommand{\tabref}[1]{\hyperref[tab:#1]{Table~\ref{tab:#1}}}
\newcommand{\figref}[1]{\hyperref[fig:#1]{Figure~\ref{fig:#1}}}



\newcommand{\score}[1]{\theta(x,#1)} % score function
\newcommand{\scoremax}[0]{\theta^\star(x)} % argmax score
@@ -70,7 +88,7 @@
%\renewcommand{\includegraphics}[2]{}


%% usual commands
-\newcommand{\todo}[1]{\textcolor{red}{TODO: #1}}
+\newcommand{\todo}[1]{\textcolor{red}{\\{\bf TODO:} #1 \\}}
%\newcommand{\todo}[1]{{\bf{TODO: #1}}}
%\newcommand{\todo}[1]{}


@@ -135,6 +153,8 @@
%\makeatother




\renewcommand{\algorithmicrequire}{\textbf{Input:}}
\renewcommand{\algorithmicensure}{\textbf{Output:}}


%% specific commands
\newcommand{\trans}[1]{{#1}^{\ensuremath{\mathsf{T}}}} % transpose
2 changes: 1 addition & 1 deletion ensembles.tex
@@ -1,4 +1,4 @@
-\chapter{Ensembles}
+\chapter{Ensembles} \label{sec:stretchable}




\begin{figure}[t!]
6 changes: 6 additions & 0 deletions features.tex
@@ -1,3 +1,9 @@
\chapter{Features}\label{features}

\myquotation{Do not call me a computer vision engineer \ldots I am a perceptual
scientist!}{Yiannis Alimonous}


The introduced \CPS model allows us to capture appearance, geometry and shape information of parts and pairs of parts in the final level of the cascade---much richer than the standard geometric deformation costs and texture filters of previous PS models~\cite{felz05,devacrf,ferrari08,andriluka09}.
%Table~\ref{feat_table} lists all features that we use and will describe in this section.
Each part is modeled as a rectangle anchored at the part joint with the major axis defined as the line segment between the joints (see Figure~\ref{fig:ps}). For training and evaluation, our datasets have been annotated only with this part axis.
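The part geometry described above (a rectangle anchored at a joint, with its major axis along the joint-to-joint segment) can be made concrete with a small sketch. This is my own illustration, not thesis code; the function name and the `width` parameter are assumptions.

```python
import numpy as np

def part_rectangle(joint_a, joint_b, width):
    """Corners of a part's support rectangle: the major axis is the
    segment joint_a -> joint_b, and the rectangle extends width/2 to
    either side of that axis."""
    a, b = np.asarray(joint_a, float), np.asarray(joint_b, float)
    axis = b - a
    u = axis / np.linalg.norm(axis)   # unit direction of the limb (y_u, y_v)
    n = np.array([-u[1], u[0]])       # unit normal to the axis
    h = 0.5 * width * n
    # corners listed counter-clockwise starting next to joint_a
    return np.stack([a + h, b + h, b - h, a - h])
```

A horizontal part from (0, 0) to (4, 0) with width 2 yields corners at y = +1 and y = -1 along the axis, matching the axis-plus-width annotation scheme described in the text.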
Binary file modified figs/empty.jpg
Binary file added figs/empty.jpg0
Binary file not shown.
1 change: 1 addition & 0 deletions future.tex
@@ -0,0 +1 @@
\section{Future work}
33 changes: 33 additions & 0 deletions inference-alg.tex
@@ -0,0 +1,33 @@
\begin{algorithm}
\caption[max-sum inference]{Max-sum message passing to solve
$$\argmax_{y \in \cY} h(x,y) = \argmax_{y \in \cY} \sum_{i \in \cV} \phi_i + \sum_{ij \in \cE} \phi_{ij}$$}
\label{alg:max-inference}
\begin{algorithmic}

\REQUIRE $ $ \\
Factors $\{\phi_i\}, \{\phi_{ij}\}$\\
Tree graph $G$ with (arbitrary) root node index $r$ and topological ordering $\pi$, where $\pi_n = r$.

\ENSURE $y^\star = \argmax_{y} h(x,y)$
\FOR{$i = \pi_1, \pi_2, \ldots, \pi_n $ }
\STATE
$m_i = \phi_i + \sum_{j \in \text{kids}(i)} m_{j \rightarrow i}$
\IF{$i == r$} \STATE \textbf{break} \ENDIF
\STATE
$p = \text{parent}(i) $
\STATE
$m_{i \rightarrow p} = \max_{y_i} \phi_{ip} + m_i$
\STATE
$a_i = \argmax_{y_i} \phi_{ip} + m_i$
\ENDFOR

\STATE
$y^\star_r = \argmax_{y_r \in [1 \ldots k]} m_r$
\FOR{$i = \pi_{n-1},\pi_{n-2},\ldots,\pi_1$}
\STATE
$y^\star_i = a_i\left[y^\star_{\text{parent}(i)}\right]$
\ENDFOR

\end{algorithmic}
\end{algorithm}
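The max-sum recursion in this algorithm can be cross-checked with a small runnable sketch. The dictionary-based factor layout, the function name, and the three-node chain in the usage below are my own illustrative assumptions, not code from the thesis; pairwise tables are indexed `[y_child, y_parent]`, matching the child-to-parent messages $m_{i \rightarrow p}$ above.

```python
import numpy as np

def max_sum(phi, phi_pair, parent, order):
    """Max-sum message passing on a tree.
    phi: dict node -> (k,) array of unary scores.
    phi_pair: dict (child, parent) -> (k_child, k_parent) pairwise scores.
    parent: dict node -> parent node (root maps to None).
    order: topological ordering of nodes, root last."""
    msgs = {i: np.zeros_like(phi[i], dtype=float) for i in phi}
    back = {}                               # backpointers a_i
    root = order[-1]
    for i in order:                         # upward (leaves-to-root) pass
        m_i = phi[i] + msgs[i]              # unary plus messages from children
        p = parent[i]
        if p is None:                       # reached the root: stop
            root_belief = m_i
            break
        scores = phi_pair[(i, p)] + m_i[:, None]   # indexed (y_i, y_p)
        msgs[p] += scores.max(axis=0)              # message m_{i -> p}
        back[i] = scores.argmax(axis=0)            # best y_i per parent state
    y = {root: int(root_belief.argmax())}
    for i in reversed(order[:-1]):          # downward pass: decode y* top-down
        y[i] = int(back[i][y[parent[i]]])
    return y
```

On a chain 0-1-2 (root 2) with agreement-rewarding pairwise tables, the decoded configuration matches brute-force enumeration of all joint states.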

73 changes: 28 additions & 45 deletions intro.tex
@@ -1,45 +1,28 @@
-Please refer to~\citet{sapp2010cascades}.
-
-\chapter{Human pose estimation}
-
-\chapter{Structured prediction}
-
-\chapter{Pictorial structures: Pose estimation meets structured prediction}
-We first summarize the basic pictorial structure model and then
-describe the inference and learning in the cascaded pictorial structures.
-%\subsection{Basic PS Model}
-Classical pictorial structures are a class of graphical models where the nodes of the graph represent object parts, and edges between parts encode pairwise geometric relationships. For modeling human pose, the standard PS model decomposes as a tree structure into unary potentials (also referred to as appearance terms) and pairwise terms between pairs of physically connected parts. Figure~\ref{fig:ps} shows a PS model for 6 upper body parts, with lower arms connected to upper arms, and upper arms and head connected to torso. In previous work~\cite{devacrf,felz05,ferrari08,posesearch,andriluka09}, the pairwise terms do not depend on data and are hence referred to as a spatial or structural prior.
-%\begin{figure}[]
-%\begin{center}
-%\centerline{\includegraphics[width=0.75\columnwidth]{data/model_parameters2.pdf}}
-%\caption{Basic upper-body model with part state $l$ and part support rectangle of size $(w,h)$.}
-%\label{fig:ps}
-%\end{center}
-%% \vskip -0.5in
-%\end{figure}
-The state of part $i$, denoted as $y_i \in \mathcal{Y}_i$, encodes the joint
-location of the part in image coordinates and the direction of the limb as a
-unit vector: $y_i = [y_{ix} \; y_{iy} \; y_{iu} \; y_{iv}]^T$. The state of the
-model is the collection of states of the $M$ parts: $p(Y = y) = p(Y_1 = y_1,
-\ldots, Y_M = y_M)$. The size of the state space for each part,
-$|\mathcal{Y}_i|$, is the number of possible locations in the image times the
-number of pre-defined discretized angles. For example, standard PS
-implementations typically model the state space of each part in a roughly $100
-\times 100$ grid for $y_{ix} \times y_{iy}$, with 24 different possible values
-of angles, yielding $|\mathcal{Y}_i| = 100 \times 100 \times 24 = 240{,}000$. The
-standard PS formulation (see~\cite{felz05}) is usually written in a
-log-quadratic form:
-\begin{align}
-p(y \mid x) &\propto \prod_{ij} \exp(-\frac{1}{2}\|\Sigma_{ij}^{-1/2}(T_{ij}(y_i) - y_j - \mu_{ij})\|_2^2) \times \prod_{i=1}^M \exp(\mu_i^T\phi_i(y_i,x))
-\label{eqn:standard_ps}
-\end{align}
-The parameters of the model are $\mu_i,\mu_{ij}$ and $\Sigma_{ij}$, and $\phi_i(y_i,x)$ are features of the (image) data $x$ at location/angle $y_i$. The affine mapping $T_{ij}$ transforms the part coordinates into a relative reference frame. The PS model can be interpreted as a set of springs at rest in default positions $\mu_{ij}$, and stretched according to tightness $\Sigma^{-1}_{ij}$ and displacement $\phi_{ij}(y) = T_{ij}(y_i) - y_j$. The unary terms pull the springs toward locations $y_i$ with higher scores $\mu_i^T\phi_i(y_i,x)$, which are more likely to be a location for part $i$.
-
-This form of the pairwise potentials allows inference to be performed faster than $O(|\mathcal{Y}_i|^2)$: MAP estimates $\argmax_{y} p(y \mid x)$ can be computed efficiently using a generalized distance transform for max-product message passing in $O(|\mathcal{Y}_i|)$ time. Marginals of the distribution, $p(y_i \mid x)$, can be computed efficiently using FFT convolution for sum-product message passing in $O(|\mathcal{Y}_i| \log |\mathcal{Y}_i|)$~\cite{felz05}.
-
-While fast to compute and intuitive from a spring-model perspective, this model has two significant limitations. One, the pairwise costs are unimodal Gaussians, which cannot capture the true multimodal interactions between pairs of body parts. Two, the pairwise terms are only a function of the geometry of the state configuration, and are oblivious to image cues such as the appearance similarity or contour continuity of a pair of parts.
-
-\section{Inference tricks (DT, conv)}
-\section{Issues}
-
-\chapter{Thesis contributions}
+\chapter{Introduction}
+
+``Geman quote''
+
+why i love what i do:
+
+One of the most compelling problems of computer vision is general object
+recognition. The ability for computers or robots to do this is blah
+
+why pose?
+it's super hard: Human pose estimation inherits all the difficulties of object
+recognition
+it shows off computation
+
+philosophical question - humans can do it, babies, dogs can do it - why can't a
+computer
+
+\section{Problem Statement}
+
+defn: 2D means two dimensional
+\subsection{Related problems}
+
+\section{Inherent difficulties}
+\subsection{perceptual}
+\subsection{computational}
+
+\section{PREVIEW OF MY WORK}
+
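The intro.tex text above notes that MAP message passing with quadratic pairwise costs runs in $O(|\mathcal{Y}_i|)$ rather than $O(|\mathcal{Y}_i|^2)$ via a generalized distance transform. A minimal 1D sketch of that lower-envelope computation, under the standard two-pass formulation (my own illustration, not the thesis implementation):

```python
import numpy as np

def distance_transform_1d(f):
    """Return d[p] = min_q ((p - q)^2 + f[q]) for all p, in O(n).
    Computes the lower envelope of the n parabolas (p - q)^2 + f[q]."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    d = np.empty(n)
    v = np.zeros(n, dtype=int)   # parabola vertices in the lower envelope
    z = np.empty(n + 1)          # breakpoints between envelope parabolas
    k = 0
    z[0], z[1] = -np.inf, np.inf
    for q in range(1, n):
        while True:
            # intersection of parabola from q with rightmost envelope parabola
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
            if s <= z[k]:
                k -= 1               # parabola q hides the rightmost one
            else:
                break
        k += 1
        v[k] = q
        z[k], z[k + 1] = s, np.inf
    k = 0
    for p in range(n):               # read off the envelope left to right
        while z[k + 1] < p:
            k += 1
        d[p] = (p - v[k]) ** 2 + f[v[k]]
    return d
```

Running a full 2D location-plus-angle message (as in the PS pairwise terms) amounts to applying this 1D pass along each state dimension, which is what keeps per-part inference linear in the state-space size.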
3 changes: 1 addition & 2 deletions make.vim
@@ -1,3 +1,2 @@
wa
-!pdflatex thesis.tex
-" # && bibtex thesis && bibtex thesis && pdflatex thesis
+!pdflatex thesis.tex && bibtex thesis && bibtex thesis && pdflatex thesis
