# "WxBS: Wide Multiple Baseline Stereo as a task"
> "Problem definition"
- toc: false
- image: images/wxbs_problems400.png
- branch: master
- badges: true
- comments: true
- hide: false
- search_exclude: true

### Definition of WxBS

Let $\{X_i\}$ denote a set of objects in 3D space (or 4D for space-time).

Let $O_{j}^{i}$ be the $j$-th observation of object $X_i$.
Then the set $C_{i}$ of all observations $O_{j}^{i}$ of the same object $X_i$ is called a **true correspondence set**. The set can be empty if no observation of that object is made. A pair of elements of $C_i$, denoted $c_{i}^{jk}$, is called a **true correspondence** between observations $O_{j}$ and $O_{k}$ if and only if $O_{j}$ and $O_{k}$ are observations of the same object $X_i$.
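
The definitions above can be illustrated with a minimal Python sketch. The identifiers (`"X1"`, `"O1"`, the function names) and the dictionary encoding are hypothetical and chosen only for illustration; the sketch assumes the ground-truth object of every observation is known.

```python
from itertools import combinations


def correspondence_sets(object_ids, observed_object):
    """Build the true correspondence set C_i for every object X_i.

    object_ids      -- all object identifiers (hypothetical labels)
    observed_object -- dict mapping an observation id to the id of the
                       object it actually depicts (assumed ground truth)
    """
    sets = {i: [] for i in object_ids}  # C_i may stay empty
    for obs, i in observed_object.items():
        sets[i].append(obs)
    return sets


def true_correspondences(sets):
    """List every true correspondence c_i^{jk} as a tuple (i, j, k)."""
    return [
        (i, j, k)
        for i, members in sets.items()
        for j, k in combinations(sorted(members), 2)
    ]


# X3 is never observed, so its correspondence set is empty.
observed_object = {"O1": "X1", "O2": "X1", "O3": "X1", "O4": "X2"}
sets = correspondence_sets(["X1", "X2", "X3"], observed_object)
pairs = true_correspondences(sets)
```

Here `X1` has three observations and therefore three pairwise correspondences, `X2` has a single observation and none, and `X3` has an empty set.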

We also call a correspondence $c_{jk}$ "consistent, but incorrect" if $O_{j}$ and $O_{k}$ are observations of the same object $X_i$, but are wrongly attributed to another object $X_z$.

Let us give a couple of examples. One of the most commonly used cases in the current thesis is the following. The objects $X_i$ are 3D coordinates of points on an object surface in the real world, and the observations $O_{j}^{i}$ are their 2D projections onto the camera plane, i.e. pixels. The correspondence might be multimodal, as the observations $O_{j}^{i}$ might be of different nature: RGB image, painting, IR imaging or LIDAR scan.
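
As a rough sketch of this case, the snippet below projects the same 3D points into two cameras with a simple pinhole model. The pinhole model, the intrinsic matrix values, and the camera poses are assumptions made for illustration; they are not specified in the text.

```python
import numpy as np


def project(points_3d, intrinsics, R, t):
    """Project Nx3 world points to Nx2 pixels with a pinhole camera.

    intrinsics -- 3x3 calibration matrix (assumed values below)
    R, t       -- world-to-camera rotation (3x3) and translation (3,)
    """
    points_cam = points_3d @ R.T + t          # world -> camera coordinates
    homogeneous = points_cam @ intrinsics.T   # camera -> homogeneous pixels
    return homogeneous[:, :2] / homogeneous[:, 2:3]


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
intrinsics = np.array([[500.0, 0.0, 320.0],
                       [0.0, 500.0, 240.0],
                       [0.0, 0.0, 1.0]])

# Two objects X_i given as 3D points in front of both cameras.
X = np.array([[0.0, 0.0, 5.0],
              [1.0, -0.5, 4.0]])

# Camera 1 sits at the origin; camera 2 is shifted by one unit along x.
obs_view1 = project(X, intrinsics, np.eye(3), np.zeros(3))
obs_view2 = project(X, intrinsics, np.eye(3), np.array([-1.0, 0.0, 0.0]))
```

Row `i` of `obs_view1` and row `i` of `obs_view2` are two observations of the same object, so together they form a true correspondence.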

For example, in the image below the object $X_i$ is the red circle, the observations are the blue circles, and the correspondences $c_{jk}$ are shown as lines.

![](imgs/WxBS_house.jpeg "Example of multimodal wide baseline stereo")

Observations do not have to be projections to a lower dimension: one could think of the observations $O_{j}$ as points in a 3D point cloud. The correspondences would then connect two point cloud models of the same object.


By "**wide baseline stereo**" we understand the process of establishing correspondences $c_{jk}$ and correspondence sets $C_i$ from observations $O_j$ under the following constraints: the observations $O_j$ belong to different views (camera planes) $V_i$, $V_j$, $V_k$ taken by cameras $K_i$, $K_j$, $K_k$ of the same static rigid scene $S$. In addition to recovering the correspondence sets, the wide baseline stereo process also recovers the (unknown) poses of the cameras $K_i$, $K_j$, $K_k$.

By "rigid" we mean that the parts (objects) $X_i$ do not change their 3D world coordinates and the only "motion" possible is the pose difference between the cameras, which is called the "baseline" when there are only two cameras. Any moving object is considered an occlusion and not a part of the scene.

We call the problem "**wide multiple baseline stereo**" or **WxBS** (<a id="call-Mishkin2015WXBS" href="#cit-Mishkin2015WXBS">Mishkin, Matas <em>et al.</em>, 2015</a>) if the observations are of different nature or were made under different conditions.


Wide baseline stereo that also outputs an estimate of $X_i$ in the form of 3D world coordinates we call "**rigid structure-from-motion**" or "**3D reconstruction**". We do not consider object shape approximation with voxels, meshes, etc. in the current thesis. Nor do we consider the recovery of scene albedo, illumination, and other appearance properties.
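
To make the reconstruction step concrete, here is a minimal sketch that estimates the 3D coordinates of one object $X_i$ from a single true correspondence. It uses linear (DLT) triangulation, which is one common choice and not necessarily the method used in the thesis. It also assumes the camera projection matrices are already known, whereas wide baseline stereo has to recover the poses itself. All numeric values are made up for illustration.

```python
import numpy as np


def triangulate(P1, P2, o1, o2):
    """Linear (DLT) triangulation of one 3D point from two observations.

    P1, P2 -- 3x4 projection matrices of the two cameras (assumed known)
    o1, o2 -- pixel coordinates (x, y) of the same object in each view
    """
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point.
    A = np.stack([
        o1[0] * P1[2] - P1[0],
        o1[1] * P1[2] - P1[1],
        o2[0] * P2[2] - P2[0],
        o2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest
    # singular value.
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]


def project(P, X):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical setup: shared intrinsics, second camera shifted along x.
intrinsics = np.array([[500.0, 0.0, 320.0],
                       [0.0, 500.0, 240.0],
                       [0.0, 0.0, 1.0]])
P1 = intrinsics @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = intrinsics @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([1.0, -0.5, 4.0])
o1, o2 = project(P1, X_true), project(P2, X_true)  # noise-free observations
X_est = triangulate(P1, P2, o1, o2)
```

With noise-free observations the estimate matches the original point up to numerical precision; with real detections the result depends on the accuracy of both the correspondences and the recovered poses.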


![](imgs/wxbs_problems.png "Example of  WxBS problems")

# References

(<a id="cit-Mishkin2015WXBS" href="#call-Mishkin2015WXBS">Mishkin, Matas <em>et al.</em>, 2015</a>) D. Mishkin, J. Matas, M. Perdoch <em>et al.</em>, "_WxBS: Wide Baseline Stereo Generalizations_", BMVC, 2015.

