# Report
---

- The work will be evaluated regarding the ability to explore a solution in detail, assess its quality, and
produce critical thinking about it.
- We will pay attention to the quality of the code, as well as the clarity of the report and your ability to present
results.
- Please provide clear instructions for easy installation and testing of your code.

Please communicate the number of hours spent on this test in the report.

### Time spent on code and report
About 35h

### Hardware used

Everything was done on my Linux laptop (Pop!_OS 20.04) with a Quadro T2000 GPU.

# Traditional Computer Vision Approach
---

### Build and Run instructions:

Please check out the README in the `AutoGrabCut_Cpp` folder.

### Approach :

#### Idea & Assumptions :

- Sky regions are low-entropy areas (few edges).
- The sky is located in the top 33% of the image.
- There is a strong separation between the sky and the rest of the image: the boundary shows up as an edge.

The approach is based on the GrabCut algorithm from OpenCV :
["GrabCut": interactive foreground extraction using iterated graph cuts](https://dl.acm.org/doi/10.1145/1015706.1015720)

GrabCut is usually used interactively (the user selects the ROI). Here the region of interest is statically defined as the top 33% of the image.

GrabCut in OpenCV uses four labels (background, foreground, probable background, probable foreground). Here the top 33% of the image is labelled as "probable foreground" and the rest of the image as "probable background".

A graph min-cut then optimizes the split between foreground and background, which hopefully corresponds to the sky and the rest of the image.

#### Edges are not Sky :
* To help the GrabCut algorithm, edges are detected and removed from the initialization. The initialization mask becomes "the top 33% of the image, except where there is an edge".

* This assumption is also exploited in the final stage, to remove every strong edge of the image from the final binary mask.

* This assumption seems decent enough to also be applied to a deep learning mask, as a post-processing step.
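
The initialization described above can be sketched in a few lines. This is only an illustration, not the code from `AutoGrabCut_Cpp`: the function name `build_init_mask`, the `sky_fraction` parameter, and the choice to relabel edge pixels as "probable background" are assumptions. The label values match OpenCV's `cv2.GC_*` constants, and plain Python lists are used so the sketch runs without any dependency.

```python
# GrabCut label values, mirroring OpenCV's cv2.GC_BGD, cv2.GC_FGD,
# cv2.GC_PR_BGD and cv2.GC_PR_FGD constants.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def build_init_mask(height, width, edges, sky_fraction=0.33):
    """Return a GrabCut label mask as a list of rows (hypothetical helper).

    edges is a height x width grid of booleans, True where an edge was detected.
    sky_fraction is the assumed share of the image height covered by sky.
    """
    sky_rows = int(height * sky_fraction)
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            if y < sky_rows and not edges[y][x]:
                row.append(GC_PR_FGD)  # top band, no edge: probably sky
            else:
                # Assumption: edge pixels fall back to "probable background".
                row.append(GC_PR_BGD)
        mask.append(row)
    return mask


# Tiny 10x4 example with a single edge pixel inside the sky band.
edges = [[False] * 4 for _ in range(10)]
edges[1][2] = True
mask = build_init_mask(10, 4, edges)
```

With OpenCV, such a mask would typically be converted to a `uint8` array and passed to `cv2.grabCut` with the `cv2.GC_INIT_WITH_MASK` mode.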

#### Make it faster :
* GrabCut is slow. To speed up the process, the image is resized into a two-stage pyramid. GrabCut is first run on the smaller resolution to get an approximate mask. This first result then feeds a second GrabCut stage to obtain the final mask.

* Between the two stages, the intermediate mask is processed with the same "edge trick", eroded, and then used as the "annotation" for the second GrabCut pass.
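
As a rough illustration of the hand-off between the two stages, the sketch below upscales a coarse binary mask, erodes it, applies the edge trick, and converts the result into GrabCut labels. The helper names, the 2x scale factor, the 3x3 erosion, and the choice of "probable" labels for the second pass are all assumptions made for this example; the actual C++ implementation may differ.

```python
GC_PR_BGD, GC_PR_FGD = 2, 3  # same values as cv2.GC_PR_BGD / cv2.GC_PR_FGD


def upscale_nearest(mask, factor):
    """Nearest-neighbour upscaling of a binary mask (list of rows of 0/1)."""
    return [[v for v in row for _ in range(factor)]
            for row in mask for _ in range(factor)]


def erode(mask):
    """3x3 binary erosion; neighbours outside the image are ignored."""
    h, w = len(mask), len(mask[0])
    return [[int(all(mask[ny][nx]
                     for ny in range(max(0, y - 1), min(h, y + 2))
                     for nx in range(max(0, x - 1), min(w, x + 2))))
             for x in range(w)]
            for y in range(h)]


def seed_second_pass(coarse_mask, edges, factor=2):
    """Turn the first-stage mask into labels for the second GrabCut pass."""
    eroded = erode(upscale_nearest(coarse_mask, factor))
    # Assumption: eroded sky pixels that are not on an edge become
    # "probable foreground", everything else "probable background".
    return [[GC_PR_FGD if sky and not edge else GC_PR_BGD
             for sky, edge in zip(mask_row, edge_row)]
            for mask_row, edge_row in zip(eroded, edges)]


coarse = [[1, 1],
          [0, 0]]                       # first-stage result: sky in the top half
edges = [[False] * 4 for _ in range(4)]
edges[0][3] = True                      # one edge pixel at full resolution
labels = seed_second_pass(coarse, edges)
```

The erosion shrinks the sky region so that the uncertain band around the coarse boundary is left for the second pass to decide.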

#### Final stage :

* The output mask from the last GrabCut pass is resized to the full image resolution. The final mask is then processed with morphological operations and a blur for de-noising and smoothing.

* Once again, the image edges are used to exclude those areas from the sky mask.

### Subjective Performance and Obvious Limitations :

* It is fast enough to process video of moderate resolution (720p) in near real time.

* It works great with an un-occluded sky and a strong sky / land delimitation.

* However, it struggles when the conditions deviate too much from the assumptions.

* The GrabCut algorithm has some internal random initialization and may give unstable results from run to run, even on a static image.

* The assumption that the sky covers the top 33% of the image is too strong and must be manually adjusted in various cases.

* Vegetation is hard (the sky is visible through the branches). Thin structures, such as poles and wires, are hard too.

* Are clouds part of the sky class?

### Objective Performance :

This remains to be done on the COCO dataset.
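
One possible way to measure it, sketched here under assumptions: compare each predicted binary mask with its ground-truth mask through a confusion matrix, then derive the intersection over union (IoU) of the sky class. The function names and the toy masks are hypothetical, and no results are implied.

```python
def confusion_counts(gt_mask, pred_mask):
    """Count (tp, fp, fn, tn) for two binary masks given as lists of rows."""
    tp = fp = fn = tn = 0
    for gt_row, pred_row in zip(gt_mask, pred_mask):
        for gt, pred in zip(gt_row, pred_row):
            if gt and pred:
                tp += 1      # sky predicted as sky
            elif pred:
                fp += 1      # non-sky predicted as sky
            elif gt:
                fn += 1      # sky that was missed
            else:
                tn += 1      # non-sky predicted as non-sky
    return tp, fp, fn, tn


def sky_iou(tp, fp, fn):
    """Intersection over union of the sky class; 1.0 when both masks are empty."""
    union = tp + fp + fn
    return tp / union if union else 1.0


# Toy 3x4 masks (1 = sky, 0 = not sky).
gt = [[1, 1, 1, 1],
      [1, 1, 0, 0],
      [0, 0, 0, 0]]
pred = [[1, 1, 1, 1],
        [1, 0, 0, 1],
        [0, 0, 0, 0]]
tp, fp, fn, tn = confusion_counts(gt, pred)
iou = sky_iou(tp, fp, fn)
```

Summing the four counts over all test images before computing the IoU would give a dataset-level score that is not dominated by images containing very little sky.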

### Things I tried that did not work :

* Histogram back-projection :
The idea was to use the color statistics of the segmented sky to further refine the mask. On video, it is unstable, and the mask tends to flicker.

When it works, it performs great around contrasted objects, such as tree branches. It basically performs a color-based segmentation. The idea was to trust it only near the mask edges, in order to refine them.

### Things I did not try, but wish I had :

* Machine learning color segmentation (KNN, regression, ...) to categorize pixels based on their color value.
* Machine learning pixel segmentation based on color and texture features.
* Block-based feature classification.
* Connected-component post-processing / filtering (remove regions based on size, ...).
* Some kind of region growing (flood fill, watershed) from color priors (like the Photoshop magic wand).
* Exploring other color spaces.

# Deep Learning Approach
---

## Dataset Preparation

**00_Dataset** contains scripts and a notebook to generate binary mask images from the COCO and ADE20K datasets.

The masks of both datasets are exported to a single folder (one per dataset), along with links to the RGB images.

### COCO
A pure Python script reads the annotations, gets the proper labels for a given image, and saves the mask accordingly.

### ADE20K
A bit simpler than COCO, as the RGB color values in the annotation images encode the class.
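
A minimal sketch of that decoding, assuming the encoding used by the full ADE20K release, where the class index of a pixel is `(R // 10) * 256 + G` and the blue channel carries instance information. The function names are hypothetical, and `EXAMPLE_SKY_INDEX` is a placeholder for illustration only: the real sky index has to be looked up in the dataset's class list.

```python
def ade20k_class_index(r, g):
    """Class index of one annotation pixel (assumed ADE20K full-release encoding)."""
    return (r // 10) * 256 + g


def sky_mask_from_seg(seg_rgb, sky_class_index):
    """Build a binary mask (255 = sky, 0 = other) from rows of (R, G, B) tuples."""
    return [[255 if ade20k_class_index(r, g) == sky_class_index else 0
             for r, g, _b in row]
            for row in seg_rgb]


# Placeholder value, NOT the real ADE20K sky index.
EXAMPLE_SKY_INDEX = 266

# Toy 2x2 annotation: the top row decodes to class 266, the bottom row to class 5.
seg = [[(10, 10, 0), (10, 10, 7)],
       [(0, 5, 0), (0, 5, 3)]]
sky_mask = sky_mask_from_seg(seg, EXAMPLE_SKY_INDEX)
```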

## Build and Run instructions
Set up the conda environment :
```
conda env create -f environment.yml
```

If it doesn't work, check out the full `environmentWithVersion.yml`.



## Project description

U-Net

# Going Further

## TODO 

- save the test image list (save the path)
- move confusion matrix code to lib
- build an evaluator from image + gtMask + prediction folder
- update the C++ code to save a binary mask
- evaluate the C++ solution on the same test images

- train and evaluate on ADE20K