# "WxBS: Wide Multiple Baseline Stereo as a task"
> "Problem definition"
- toc: false
- image: images/wxbs_problems400.png
- branch: master
- badges: true
- comments: true
- hide: false
- search_exclude: true

### Definition of WxBS

Let us denote observations $O_{i}, i=1..n$, each of which belongs to one of the views $V_{j}, j=1..m$, $m \leq n$. 
For example, observations can be pixels and views can be images. Observations and views can be of different nature and dimensionality, e.g. $V_1$, $V_2$ are RGB images, $V_3$ is a point cloud from a laser scanner, $V_4$ is an image from a thermal camera, and so on. 

We will call a pair of observations $(O_{i},O_{k})$ a correspondence $c_{ik}$ if they belong to different views. A group of observations is called a correspondence set $C_o$ when it contains exactly one observation $O_i$ per view $V_j$; some of these observations can be empty ($\varnothing$), i.e. not observed in the specific view $V_j$. Multiple correspondences are called consistent if they form a correspondence set.

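The definitions above can be sketched as a small data structure. This is only an illustration under one simplified reading of "consistent" (the observations touched by the correspondences fit into a single correspondence set, with at most one observation per view); the names `Observation`, `build_correspondence_set` and `are_consistent` are hypothetical and do not come from the WxBS paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    obs_id: int   # index i of the observation O_i
    view_id: int  # index j of the view V_j that contains O_i


def is_correspondence(a: Observation, b: Observation) -> bool:
    """Two observations form a correspondence only if their views differ."""
    return a.view_id != b.view_id


def build_correspondence_set(
    observations: Iterable[Observation], num_views: int
) -> Optional[List[Optional[Observation]]]:
    """Return one slot per view (None = empty observation).

    Returns None if some view would receive more than one observation,
    i.e. the group is not a valid correspondence set.
    """
    slots: List[Optional[Observation]] = [None] * num_views
    for obs in observations:
        if slots[obs.view_id] is not None:
            return None
        slots[obs.view_id] = obs
    return slots


def are_consistent(
    correspondences: Iterable[Tuple[Observation, Observation]], num_views: int
) -> bool:
    """Assumed reading: correspondences are consistent if all observations
    they involve fit into one correspondence set."""
    pairs = list(correspondences)
    if not all(is_correspondence(a, b) for a, b in pairs):
        return False
    unique = {obs for pair in pairs for obs in pair}
    return build_correspondence_set(unique, num_views) is not None
```
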
We can now define a wide baseline stereo. 

By **wide baseline stereo** we understand the process of establishing correspondence sets $C_o$ from observations $O_i$ and images $V_{j}$ under the following constraint:
- the images $V_j$ are formed on the image planes of cameras $K_j$ that observe the same static rigid scene $S$. 

In addition to recovering the correspondence sets, the wide baseline stereo process also recovers the (unknown) poses of the cameras $K_j$. 

By "rigid" we mean that the only "motion" possible is the pose difference between the cameras, which is called the "baseline" when there are only two cameras. 
We could also assume some scene structure or model consisting of latent objects $X_i$, which we cannot access directly and can only observe. 

For example, in the image below, the observations $O_i$ are blue circles and the correspondences $c_{jk}$ are shown as lines. The assumed object $X_i$ is a red circle.

![](imgs/WxBS_house.jpeg "Example of multimodal wide baseline stereo")

We will call the task "**wide multiple baseline stereo**", or **WxBS** \cite{Mishkin2015WXBS}, if the observations are of a different nature or were made under different conditions. 

The difference between **wide baseline stereo** and **short baseline stereo**, or simply **stereo**, is the following. In **stereo** the baseline is small (less than 1 meter) and typically known and fixed. The task is to establish correspondences, which can be done by a 1D search along the known epipolar lines. 

In contrast, in **wide baseline stereo** the baseline is unknown and mostly unconstrained, and the viewpoints of the cameras can vary drastically.
 
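To make the short-baseline case concrete, here is a minimal sketch of the 1D search. It assumes a rectified image pair, so that epipolar lines coincide with image rows, and it uses a sum-of-absolute-differences patch cost. Both are assumptions made for illustration. The function name and its parameters are hypothetical. In wide baseline stereo this search is not available up front, because the epipolar geometry is not known.

```python
import numpy as np


def match_along_scanline(left, right, row, col, half_window=2, max_disparity=16):
    """Find the disparity d that best matches left[row, col] to right[row, col - d].

    Assumes a rectified pair: the epipolar line of a pixel is its own row,
    so the search is one-dimensional. The cost is the sum of absolute
    differences (SAD) over a square patch.
    """
    h = half_window
    patch = left[row - h:row + h + 1, col - h:col + h + 1].astype(np.float64)
    best_d, best_cost = 0, np.inf
    for d in range(max_disparity + 1):
        c = col - d
        if c - h < 0:  # candidate patch would leave the image
            break
        candidate = right[row - h:row + h + 1, c - h:c + h + 1].astype(np.float64)
        cost = np.abs(patch - candidate).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```
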
Wide baseline stereo that also outputs an estimate of the latent objects, e.g. in the form of 3D point world coordinates, we will call **rigid structure-from-motion** (rigid SfM) or **3D reconstruction**. We do not consider object shape approximation with voxels, meshes, etc. in the current thesis, nor do we consider the recovery of scene albedo, illumination, and other appearance properties. 

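As an illustration of estimating a latent object as a 3D point, the sketch below triangulates one point from a two-view correspondence using the standard linear (DLT) method. It assumes the 3x4 projection matrices of both cameras are already known. The names `project`, `triangulate_point`, `P1` and `P2` are hypothetical, and this is not presented as the method used in the WxBS paper.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P to 2D coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: assumed-known 3x4 projection matrices of the two cameras.
    x1, x2: corresponding 2D observations (u, v) in each view.
    """
    # Each observation contributes two linear equations in the homogeneous point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]
```
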
While the difference between **SfM** and **WBS** is often blurred and the terms are used interchangeably, we will consider WBS to be the part of the SfM pipeline that precedes the recovery of the 3D point cloud.  

![](imgs/wxbs_problems.png "Example of  WxBS problems")

# References

(<a id="cit-Mishkin2015WXBS" href="#call-Mishkin2015WXBS">Mishkin, Matas <em>et al.</em>, 2015</a>) D. Mishkin, J. Matas, M. Perdoch <em>et al.</em>, "_WxBS: Wide Baseline Stereo Generalizations_", BMVC, 2015.

