# <font style = "color:rgb(50,120,229)">Object-Contextual Representations for Semantic Segmentation</font>

## <font style = "color:rgb(50,120,229)">Paper Details</font>

1. **Authors**: Yuhui Yuan, Xilin Chen, Jingdong Wang
2. **Paper Link**: https://arxiv.org/pdf/1909.11065v2.pdf
3. **Category**: Semantic Segmentation

## <font style = "color:rgb(50,120,229)">Introduction</font>

**Semantic Segmentation**: Assigning a label to each pixel in an image.

**Approach**: Contextual Aggregation.

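**Motivation**: The class label assigned to a pixel is the category of the object that the pixel belongs to, so a pixel's representation can be strengthened with the representation of the object region it lies in.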

### <font style = "color:rgb(8,133,37)">What does Context mean?</font>

<div class="alert alert-block alert-info">
The context of one position typically refers to a set of positions, e.g., the surrounding pixels.
</div>

Another paper (**Context Based Object Categorization: A Critical Survey**) describes contextual features as representing the interaction of an object with its surroundings. They can be divided into the following 3 categories:
1. **Semantic Context** - This focuses on object co-occurrence and makes it possible to correct the label of one object without affecting the labels of other objects. For example, a tree is more likely to co-occur with a plant than with a whale.
2. **Spatial Context** - This focuses on the position of objects. For example, a dog is more likely to be present above grass and below sky rather than above sky and below grass.
3. **Scale Context** - This focuses on the relative size of objects. For example, a car is smaller than a truck and not the other way round.

### <font style = "color:rgb(8,133,37)">Approach</font>
The approach discussed in the paper consists of the following 3 steps:

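1. **Coarse Soft Segmentation** - This involves dividing the contextual pixels (surrounding pixels) into soft object regions. The segmentation is "coarse" because the focus is NOT on carrying out accurate segmentation, and "soft" because each pixel belongs to each region to some degree rather than to exactly one region.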
2. **Object Region Representation** - We use the soft segmentation obtained from the above step and the pixel representation to represent each object region.
3. **Object-Contextual Representation** (OCR) - We use the outputs of the above 2 steps, along with the pixel-region relation, to obtain an augmented representation for each pixel.

<img src="images/paper1/image01.png" alt="Pipeline of the approach" title="The Pipeline of the approach" />
<center><b>Figure 1</b>: The pipeline of the approach discussed in the paper. <a href="https://arxiv.org/pdf/1909.11065v2.pdf">Source</a></center>
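
The three steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `ocr_sketch`, the flattened `(N, C)` pixel layout, and the use of a plain dot product as the pixel-region similarity are all choices made here for brevity, and the learned transformation layers of a real network are left out.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ocr_sketch(pixel_feats, coarse_logits):
    """Hypothetical helper illustrating the 3 steps.

    pixel_feats:   (N, C) array, one C-dim representation per pixel.
    coarse_logits: (N, K) array, coarse per-pixel scores for K classes.
    Returns an (N, 2C) array of augmented pixel representations.
    """
    # Step 1: soft object regions. Assumption: normalise each class's
    # scores over all pixels, so each column sums to 1.
    region_weights = softmax(coarse_logits, axis=0)        # (N, K)

    # Step 2: object region representations as weighted sums of pixels.
    region_feats = region_weights.T @ pixel_feats          # (K, C)

    # Step 3: pixel-region relation (assumed: dot-product similarity,
    # normalised over the K regions), then aggregate the regions.
    relation = softmax(pixel_feats @ region_feats.T, axis=1)  # (N, K)
    context = relation @ region_feats                          # (N, C)

    # Augmented representation: original pixel features plus context.
    return np.concatenate([pixel_feats, context], axis=1)     # (N, 2C)

# Toy usage: 6 pixels, 4 channels, 3 classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
logits = rng.normal(size=(6, 3))
augmented = ocr_sketch(feats, logits)
print(augmented.shape)
```

The context is concatenated rather than added so that the original pixel representation is preserved alongside it; a real model would typically follow this with further learned layers.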

### <font style = "color:rgb(8,133,37)">Differences</font>

**OCR vs Multi-Scale Context**

1. OCR differentiates the contextual pixels that belong to the same object class from the contextual pixels that belong to a different class.
2. The multi-scale context approach only differentiates pixels present at different positions.

**OCR vs other Relational Context schemes**

The approach discussed in the paper considers not only the object region representations but also the pixel and pixel-region relations, unlike other approaches.

Note that OCR is itself a relational context approach.

**OCR vs Coarse-to-fine Segmentation**

While "Coarse-to-fine Segmentation" is also followed in the current approach, the difference is the way the coarse segmentation is used. The OCR approach uses the coarse segmentation to generate a contextual representation, whereas the other approaches use it directly as an extra representation.

**OCR vs Region-wise Segmentation**

Region-wise segmentation first groups the pixels into **superpixels**, which are then assigned a label. OCR, on the other hand, uses the grouped regions to learn a better labelling for the pixels instead of using them directly for segmentation.

## <font style = "color:rgb(50,120,229)">Approach</font>

It's now time to go into the mathematical details of the approach.

<div class="alert alert-block alert-info">
<b>Semantic Segmentation:</b> Given $K$ classes, assign each pixel $p_i$ of image $I$ a label $l_i$ (which is one of the $K$ <b>unique</b> classes).
</div>
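
As a small illustration of this definition, the sketch below turns per-pixel class scores into a label map by picking the highest-scoring class for each pixel. The score values and the `(H, W, K)` layout are made up for this example; in practice the scores would come from a segmentation network.

```python
import numpy as np

# Hypothetical scores for a 2x2 image and K = 3 classes, shape (H, W, K).
scores = np.array([
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.3, 0.3, 0.4], [0.2, 0.5, 0.3]],
])

# Each pixel p_i receives the label l_i with the highest score.
labels = scores.argmax(axis=-1)
print(labels)
# [[1 0]
#  [2 1]]
```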



## <font style = "color:rgb(50,120,229)">References</font>
1. Object-Contextual Representations for Semantic Segmentation - https://arxiv.org/pdf/1909.11065v2.pdf
2. Context Based Object Categorization: A Critical Survey - https://vision.cornell.edu/se3/wp-content/uploads/2014/09/context_review08_0.pdf
3. Jupyter Markdown - https://www.ibm.com/support/knowledgecenter/en/SSGNPV_1.1.3/dsx/markd-jupyter.html