Commit 0f898dd2b40350ba52ff06db1db3209f3fb59727 (1 parent: e7eb1e0), committed by bensapp on Apr 12, 2012
Showing with 759 additions and 79 deletions.
  1. +4 −4 CPS.tex
  2. +3 −4 PennDiss.sty
  3. +24 −1 abstract.tex
  4. +24 −4 commands.tex
  5. +1 −1 ensembles.tex
  6. +6 −0 features.tex
  7. BIN figs/empty.jpg
  8. BIN figs/empty.jpg0
  9. +1 −0 future.tex
  10. +33 −0 inference-alg.tex
  11. +28 −45 intro.tex
  12. +1 −2 make.vim
  13. +194 −0 ml.tex
  14. +118 −1 preface.tex
  15. +251 −0 ps.tex
  16. +55 −10 refs.bib
  17. +16 −7 thesis.tex
CPS.tex
@@ -1,4 +1,4 @@
-\chapter{Cascaded Pictorial Structures}
+\chapter{Cascaded Pictorial Structures}\label{sec:CPS}
Pictorial structure models~\cite{fischler1973ps} are a popular method for human body pose estimation~\cite{felz05,fergus2005sparse,devacrf,ferrari08,andriluka09}.
The model is a Conditional Random Field over pose variables that characterizes
@@ -105,9 +105,9 @@ \subsection*{Structured Prediction Cascades} \label{cascades}
\begin{figure}[t]
\begin{center}
\includegraphics[width=0.75\textwidth]{figs/empty.jpg}
-\caption{Upper right: Detector-based pruning by thresholding (for the lower
-right arm) yields many hypotheses far way from the true one. Lower row: The
-CPS, however, exploits global information to perform better pruning.}
+\caption[SHORT TITLE]{Upper right: Detector-based pruning by thresholding (for
+the lower right arm) yields many hypotheses far away from the true one. Lower
+row: The CPS, however, exploits global information to perform better pruning.}
\label{fig:cascade_pruning}
\end{center}
\end{figure}
PennDiss.sty
@@ -214,10 +214,9 @@ Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy\\
\thispagestyle{empty}%
\null\vfill
\begin{center}
-\Large \MYTITLE \\
-\Large COPYRIGHT\\
-\@copyrightyear\\
-\@author\\
+\Large \mytitle \\
+\Large \copyright~Copyright by \@author\\
+\@copyrightyear
\end{center}
\vfill\newpage}
abstract.tex
@@ -1 +1,24 @@
-Heyyyy abstract...
+Human pose estimation from monocular images is one of the most challenging and
+computationally demanding problems in computer vision. Standard models such as
+Pictorial Structures consider interactions between kinematically connected
+joints or limbs, leading to an inference cost that is quadratic in the number of
+pixels. As a result, researchers and practitioners have restricted themselves
+to simple models which only measure the quality of limb-pair possibilities by
+their 2D geometric plausibility.
+
+In this thesis, we propose novel methods that allow for efficient inference in
+richer models with data-dependent interactions. First, we introduce structured
+prediction cascades, a structured analog of binary cascaded classifiers, which
+learn to focus computational effort where it is needed, filtering out many
+states cheaply while ensuring the correct output is unfiltered. Second, we
+propose a way to decompose models of human pose with cyclic dependencies into a
+collection of tree models, and provide novel methods to impose model agreement.
+
+These techniques allow for sparse and efficient inference on the order of
+minutes per image or video clip. As a result, we can afford to model pairwise
+interaction potentials much more richly with data-dependent features such as
+contour continuity, segmentation alignment, color consistency, optical flow and
+more. We show empirically that these richer models are worthwhile, obtaining
+significantly more accurate pose estimation on popular datasets.
+
+
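The cascade filtering step described in the abstract ("filtering out many states cheaply") can be sketched in Python. This is a minimal sketch, assuming the max-marginal threshold rule from the structured prediction cascades work of Weiss and Taskar: given per-part max-marginals and the best overall score (the `\scoremax` quantity defined in commands.tex), a part state survives if its max-marginal reaches t = alpha * best + (1 - alpha) * (mean max-marginal). The function name `prune_states` and the list-of-arrays layout are hypothetical, not the thesis code.

```python
import numpy as np

def prune_states(max_marginals, alpha):
    """One cascade filtering step over per-part max-marginals (illustrative sketch).

    max_marginals: list of 1-D arrays; max_marginals[i][k] is the best total
        score of any full configuration in which part i takes state k.
    alpha: value in [0, 1]; larger values keep fewer states.
    Returns one boolean mask per part marking the states that survive.
    """
    # Best overall score theta*(x): the largest max-marginal of any part.
    best = max(float(m.max()) for m in max_marginals)
    # Mean max-marginal: average over parts of each part's mean over its states.
    mean_mm = float(np.mean([m.mean() for m in max_marginals]))
    # Convex combination of the two, clamped so it never exceeds best
    # (guards against floating-point rounding when mean_mm == best).
    threshold = min(alpha * best + (1.0 - alpha) * mean_mm, best)
    # Keep states whose max-marginal reaches the threshold, so states in the
    # current argmax configuration are never pruned.
    return [m >= threshold for m in max_marginals]
```

Raising `alpha` prunes more aggressively; with `alpha = 0` only states below the mean max-marginal are removed. The guarantee that the correct output survives is a property of the learned scores, not of this filtering function.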
commands.tex
@@ -8,7 +8,7 @@
\newcommand{\LossA}{\mathcal{L}_{\psi}}
\newcommand{\LossMAX}{\mathcal{L}^{max}_{\psi}}
\newcommand{\X}{\mathcal{X}}
-\newcommand{\E}{\mathbf{E}}
+\newcommand{\E}{\mathbb{E}}
\newcommand{\bw}{\mathbf{w}}
\newcommand{\bft}{\mathbf{f}}
\newcommand{\bx}{\mathbf{x}}
@@ -20,6 +20,7 @@
\newcommand{\Ind}{\mathbf{1}}
\newcommand{\argmax}{\mathop{\arg\max}}
+\newcommand{\argmin}{\mathop{\arg\min}}
\newcommand{\Vones}[1]{\ensuremath{\mathbf{1}_{#1}}}
\newcommand{\eqdef}{\stackrel{\rm def}{=}}
@@ -35,20 +36,37 @@
\newcommand{\w}{\mathbf{w}}
\newcommand{\f}{\mathbf{f}}
+\newcommand{\naive}{naive\xspace}
\newcommand{\CPS}{CPS\xspace}
\newcommand{\LLPS}{LLPS\xspace}
\newcommand{\LLPSlong}{Local Linear Pictorial Structures\xspace}
% some common mathcals
+\newcommand{\cH}{\mathcal{H}}
+\newcommand{\cC}{\mathcal{C}}
+\newcommand{\cD}{\mathcal{D}}
\newcommand{\cL}{\mathcal{L}}
+\newcommand{\cX}{\mathcal{X}}
\newcommand{\cY}{\mathcal{Y}}
\newcommand{\cR}{\mathcal{R}}
-\newcommand{\cE}{\mathcal{R}}
-\newcommand{\cV}{\mathcal{R}}
+\newcommand{\cE}{\mathcal{E}}
+\newcommand{\cV}{\mathcal{V}}
\newcommand{\reals}{\mathbb{R}}
+\newcommand{\defn}{\triangleq}
\newcommand{\tree}{\Upsilon}
+\newcommand{\attrib}[1]{ \nopagebreak{\raggedleft\footnotesize #1\par}}
+\newcommand{\myquotation}[2]{{\em #1}\\\attrib{#2}}
+
+\newcommand{\secref}[1]{\hyperref[sec:#1]{\textsection\ref{sec:#1}}}
+\newcommand{\equref}[1]{\hyperref[eq:#1]{Equation~\ref{eq:#1}}}
+\newcommand{\algref}[1]{\hyperref[alg:#1]{Algorithm~\ref{alg:#1}}}
+\newcommand{\thmref}[1]{\hyperref[thm:#1]{Theorem~\ref{thm:#1}}}
+\newcommand{\lemref}[1]{\hyperref[lem:#1]{Lemma~\ref{lem:#1}}}
+\newcommand{\tabref}[1]{\hyperref[tab:#1]{Table~\ref{tab:#1}}}
+\newcommand{\figref}[1]{\hyperref[fig:#1]{Figure~\ref{fig:#1}}}
+
\newcommand{\score}[1]{\theta(x,#1)} % score function
\newcommand{\scoremax}[0]{\theta^\star(x)} % argmax score
@@ -70,7 +88,7 @@
%\renewcommand{\includegraphics}[2]{}
%% usual commands
-\newcommand{\todo}[1]{\textcolor{red}{TODO: #1}}
+\newcommand{\todo}[1]{\textcolor{red}{\\{\bf TODO:} #1 \\}}
%\newcommand{\todo}[1]{{\bf{TODO: #1}}}
%\newcommand{\todo}[1]{}
@@ -135,6 +153,8 @@
%\makeatother
+\renewcommand{\algorithmicrequire}{\textbf{Input:}}
+\renewcommand{\algorithmicensure}{\textbf{Output:}}
%% specific commands
\newcommand{\trans}[1]{{#1}^{\ensuremath{\mathsf{T}}}} % transpose
ensembles.tex
@@ -1,4 +1,4 @@
-\chapter{Ensembles}
+\chapter{Ensembles} \label{sec:stretchable}
\begin{figure}[t!]
features.tex
@@ -1,3 +1,9 @@
+\chapter{Features}\label{features}
+
+\myquotation{Do not call me a computer vision engineer \ldots I am a perceptual
+scientist!}{Yiannis Aloimonos}
+
+
The introduced \CPS model allows us to capture appearance, geometry and shape information of parts and pairs of parts in the final level of the cascade---much richer than the standard geometric deformation costs and texture filters of previous PS models~\cite{felz05,devacrf,ferrari08,andriluka09}.
%Table~\ref{feat_table} lists all features that we use and will describe in this section.
Each part is modeled as a rectangle anchored at the part joint with the major axis defined as the line segment between the joints (see Figure~\ref{fig:ps}). For training and evaluation, our datasets have been annotated only with this part axis.
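The part geometry described above (a rectangle anchored at the part joint, with its major axis along the segment between the joints) can be sketched as follows. This is a minimal illustration, assuming the rectangle spans exactly the axis from the anchor joint to the next joint and has a hypothetical fixed `width` centered on that axis; the text does not say how the width is chosen, and `part_rectangle` is an illustrative name.

```python
import numpy as np

def part_rectangle(joint_a, joint_b, width):
    """Corners of a part rectangle whose major axis runs from joint_a to joint_b.

    Assumption for illustration: the rectangle covers the full axis length and
    extends width / 2 on either side of the axis.
    Returns a (4, 2) array of corners, ordered around the rectangle.
    """
    a = np.asarray(joint_a, dtype=float)
    b = np.asarray(joint_b, dtype=float)
    axis = b - a
    length = np.linalg.norm(axis)
    if length == 0:
        raise ValueError("joints coincide; the part axis is undefined")
    u = axis / length                 # unit vector along the limb
    n = np.array([-u[1], u[0]])       # unit normal to the limb
    half = 0.5 * width * n            # offset from the axis to a long side
    return np.array([a + half, b + half, b - half, a - half])
```

Because only the joint axis is annotated, any such support region is derived from the two joints plus an assumed width.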
figs/empty.jpg
Binary file not shown.
figs/empty.jpg0
Binary file not shown.
future.tex
@@ -0,0 +1 @@
+\section{Future work}
inference-alg.tex
@@ -0,0 +1,33 @@
+\begin{algorithm}
+\caption[Max-sum inference]{Max-sum message passing on a tree to solve
+$\argmax_{y \in \cY} h(x,y) = \argmax_{y \in \cY} \sum_{i \in \cV} \phi_i(y_i) + \sum_{ij \in \cE} \phi_{ij}(y_i,y_j)$.}
+\label{alg:max-inference}
+\begin{algorithmic}
+
+\REQUIRE $ $ \\
+Factors $\{\phi_i\}, \{\phi_{ij}\}$\\
+Tree graph $G$ with (arbitrary) root node index $r$ and topological ordering $\pi$, where $\pi_n = r$.
+
+\ENSURE $y^\star = \argmax_{y} h(x,y)$
+\FOR{$i = \pi_1, \pi_2, \ldots, \pi_n $ }
+\STATE
+$m_i = \phi_i + \sum_{j \in \text{kids}(i)} m_{j \rightarrow i}$
+\IF{$i = r$} \STATE \textbf{break} \ENDIF
+\STATE
+$p = \text{parent}(i) $
+\STATE
+$m_{i \rightarrow p} = \max_{y_i} \phi_{ip} + m_i$
+\STATE
+$a_i = \argmax_{y_i} \phi_{ip} + m_i$
+\ENDFOR
+
+\STATE
+$y^\star_r = \argmax_{y_r} m_r$
+\FOR{$i = \pi_{n-1},\pi_{n-2},\ldots,\pi_1$}
+\STATE
+$y^\star_i = a_i\left[y^\star_{\text{parent}(i)}\right]$
+\ENDFOR
+
+\end{algorithmic}
+\end{algorithm}
+
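A minimal Python sketch of the max-sum algorithm in inference-alg.tex, useful for checking the pseudocode against brute force on small trees. Assumptions not in the source: factors are NumPy arrays, `pairwise[i]` stores phi_{ip} indexed `[y_i, y_p]`, the tree is given as a hypothetical `parent` dictionary (root maps to `None`), and `order` lists children before parents with the root last (the ordering pi of the algorithm). The name `max_sum_tree` is illustrative.

```python
import numpy as np

def max_sum_tree(unary, pairwise, parent, order):
    """Exact MAP on a tree by max-sum message passing.

    unary:    dict i -> 1-D array phi_i over the states of node i
    pairwise: dict i -> 2-D array phi_{i,parent(i)} indexed [y_i, y_parent]
    parent:   dict i -> parent index, None for the root
    order:    topological ordering, children before parents, root last
    Returns (dict node -> argmax state, maximum total score).
    """
    root = order[-1]
    assert parent[root] is None, "order must end at the root"
    msgs = {i: np.zeros(len(unary[i])) for i in order}  # summed messages into i
    backptr = {}
    for i in order[:-1]:
        m_i = unary[i] + msgs[i]              # m_i = phi_i + sum_{j in kids(i)} m_{j->i}
        p = parent[i]
        scores = pairwise[i] + m_i[:, None]   # phi_{ip}(y_i, y_p) + m_i(y_i)
        msgs[p] += scores.max(axis=0)         # m_{i->p}(y_p)
        backptr[i] = scores.argmax(axis=0)    # a_i[y_p]
    m_root = unary[root] + msgs[root]
    y = {root: int(np.argmax(m_root))}
    for i in reversed(order[:-1]):            # parents are decoded before children
        y[i] = int(backptr[i][y[parent[i]]])
    return y, float(m_root.max())
```

Each message is a dense max over a k_i by k_p table, so its cost is quadratic in the number of states per part: the cost the abstract identifies as the bottleneck for standard models.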
intro.tex
@@ -1,45 +1,28 @@
-Please refer to~\citet{sapp2010cascades}.
-
-\chapter{Human pose estimation}
-
-\chapter{Structured prediction}
-
-\chapter{Pictorial structures: Pose estimation meets structured prediction}
-We first summarize the basic pictorial structure model and then
-describe the inference and learning in the cascaded pictorial structures.
-%\subsection{Basic PS Model}
-Classical pictorial structures are a class of graphical models where the nodes of the graph represents object parts, and edges between parts encode pairwise geometric relationships. For modeling human pose, the standard PS model decomposes as a tree structure into unary potentials (also referred to as appearance terms) and pairwise terms between pairs of physically connected parts. Figure~\ref{fig:ps} shows a PS model for 6 upper body parts, with lower arms connected to upper arms, and upper arms and head connected to torso. In previous work~\cite{devacrf,felz05,ferrari08,posesearch,andriluka09}, the pairwise terms do not depend on data and are hence referred to as a spatial or structural prior.
-%\begin{figure}[]
-%\begin{center}
-%\centerline{\includegraphics[width=0.75\columnwidth]{data/model_parameters2.pdf}}
-%\caption{Basic upper-body model with part state $l$ and part support rectangle of size $(w,h)$.}
-%\label{fig:ps}
-%\end{center}
-%% \vskip -0.5in
-%\end{figure}
-The state of part $i$, denoted as $y_i \in \mathcal{Y}_i$, encodes the joint
-location of the part in image coordinates and the direction of the limb as a
-unit vector: $y_i = [y_{ix} \; y_{iy} \; y_{iu} \; y_{iv}]^T$. The state of the
-model is the collection of states of $M$ parts: $p(ys = ys) = p(y_1 = y_1,
-\ldots, y_M = y_M)$. The size of the state space for each part,
-$|\mathcal{Y}_i|$, the number of possible locations in the image times the
-number of pre-defined discretized angles. For example, standard PS
-implementations typically model the state space of each part in a roughly $100
-\times 100$ grid for $y_{ix} \times y_{iy}$, with 24 different possible values
-of angles, yielding $|\mathcal{Y}_i| = 100 \times 100 \times 24 = 240,000$. The
-standard PS formulation (see~\cite{felz05}) is usually written in a
-log-quadratic form:
-\begin{align}
-p( ys | x) &\propto \prod_{ij} \exp(-\frac{1}{2}||\Sigma_{ij}^{-1/2}(T_{ij}(y_i) - y_j - \mu_{ij})||_2^2) \times \prod_{i=1}^M \exp(\mu_i^T\phi_i(y_i,x))
-\label{eqn:standard_ps}
-\end{align}
-The parameters of the model are $\mu_i,\mu_{ij}$ and $\Sigma_{ij}$, and $\phi_i(y_i,x)$ are features of the (image) data $x$ at location/angle $y_i$. The affine mapping $T_{ij}$ transforms the part coordinates into a relative reference frame. The PS model can be interpreted as a set of springs at rest in default positions $\mu_{ij}$, and stretched according to tightness $\Sigma^{-1}_{ij}$ and displacement $\phi_{ij}(ys) = T_{ij}(y_i) - y_j$. The unary terms pull the springs toward locations $y_i$ with higher scores $\mu_i^T\phi_i(y_i,x)$ which are more likely to be a location for part $i$.
-
-This form of the pairwise potentials allows inference to be performed faster than $O(|\mathcal{Y}_i|^2)$: MAP estimates $\argmax_{ys} p(ys | x)$ can be computed efficiently using a generalized distance transform for max-product message passing in $O(|\mathcal{Y}_i|)$ time. Marginals of the distribution, $p(y_i | x)$, can be computed efficiently using FFT convolution for sum-product message passing in $O(|\mathcal{Y}_i| \log |\mathcal{Y}_i|)$~\cite{felz05}.
-
-While fast to compute and intuitive from a spring-model perspective, this model has two significant limitations. One, the pairwise costs are unimodal Gaussians, which cannot capture the true multimodal interactions between pairs of body parts. Two, the pairwise terms are only a function of the geometry of the state configuration, and are oblivious to the image cues, for example, appearance similarity or contour continuity of the a pair of parts.
-
-\section{Inference tricks (DT, conv)}
-\section{Issues}
-
-\chapter{Thesis contributions}
+\chapter{Introduction}
+
+``Geman quote''
+
+Why I love what I do:
+One of the most compelling problems of computer vision is general object
+recognition. The ability for computers or robots to do this is blah
+
+Why pose?
+It's super hard: human pose estimation inherits all the difficulties of object
+recognition.
+It shows off computation.
+
+Philosophical question: humans can do it, even babies and dogs can do it; why
+can't a computer?
+
+\section{Problem Statement}
+
+Definition: 2D means two-dimensional.
+\subsection{Related problems}
+
+\section{Inherent difficulties}
+\subsection{Perceptual}
+\subsection{Computational}
+
+\section{Preview of my work}
+
+
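The Pictorial Structures text removed from intro.tex states that quadratic (spring) pairwise terms let max-product messages be computed with a generalized distance transform in O(|Y_i|) time instead of O(|Y_i|^2). With the quoted state space of 100 x 100 x 24 = 240,000 states per part, a naive message touches 240,000^2, roughly 5.8 x 10^10, state pairs. Below is a minimal 1-D sketch of the lower-envelope-of-parabolas distance transform of Felzenszwalb and Huttenlocher, assuming unit grid spacing and a single hypothetical weight `w` standing in for the inverse covariance; `distance_transform_1d` is an illustrative name, not the thesis code.

```python
import numpy as np

def distance_transform_1d(f, w=1.0):
    """d[q] = min_p f[p] + w * (q - p)**2 over grid points p, q, in O(n) time.

    Lower envelope of parabolas; assumes unit grid spacing, finite costs, w > 0.
    """
    f = np.asarray(f, dtype=float)
    n = len(f)
    v = np.zeros(n, dtype=int)   # roots of the parabolas forming the envelope
    z = np.empty(n + 1)          # boundaries between envelope segments
    k = 0
    z[0], z[1] = -np.inf, np.inf
    for q in range(1, n):
        # Intersection of the parabola rooted at q with the rightmost envelope parabola.
        s = ((f[q] + w * q * q) - (f[v[k]] + w * v[k] * v[k])) / (2 * w * (q - v[k]))
        while s <= z[k]:         # parabola v[k] is hidden: drop it and retry
            k -= 1
            s = ((f[q] + w * q * q) - (f[v[k]] + w * v[k] * v[k])) / (2 * w * (q - v[k]))
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = np.inf
    d = np.empty(n)
    k = 0
    for q in range(n):
        while z[k + 1] < q:      # advance to the envelope segment containing q
            k += 1
        d[q] = w * (q - v[k]) ** 2 + f[v[k]]
    return d
```

For a max-sum message with penalty -w (p - q)^2, negate both input and output: `-distance_transform_1d(-m, w)`. With a diagonal covariance the transform over (y_ix, y_iy) is separable, so the 1-D routine can be applied along each axis in turn.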
make.vim
@@ -1,3 +1,2 @@
wa
-!pdflatex thesis.tex
-" # && bibtex thesis && bibtex thesis && pdflatex thesis
+!pdflatex thesis.tex && bibtex thesis && pdflatex thesis && pdflatex thesis