# <center>Computer Vision Team 3: project 2 </center>

# Introduction

While the previous project focused on edge and line detection, we observed that curved lines were not recognized by algorithms such as the Hough transform and RANSAC. This is an important restriction, as most lines present in images are curved rather than straight. Since the ellipse is the basic object containing curves, the goal of this project is to develop the modules needed to detect and match ellipses in images, and to assess their performance.

Detecting ellipses in an image is part of one of the major fields of computer vision: object classification. Object classification includes detecting given objects, localizing them and matching them. In the following sections, these tasks are analysed one by one.


# Task 2.1: Performance assessment of line segment detection

## Preprocessing

CSV files containing the annotations of the different groups were provided, so it was necessary to check 
that the information they contained was relevant. First, we took all the annotations from 
"CV2019_Annots.csv" and isolated those corresponding to line images. As soon as an image had a 
wrong annotation, we deleted all the annotations of that image and stopped considering it, 
so as not to introduce a bias in our performance measures in 2.1. The erroneous line annotations 
were of two kinds: 1) ellipses drawn on line images and 2) multi-lines. 


## Performance assessment

To assess the performance of our line detection module, we used a pixel-wise metric that translates into a confusion matrix. The principle is the following:  

 1) Using our line detection algorithm, draw every line detected in image 'x' on an empty black image with the same size as image 'x'. Every pixel drawn this way has an RGB value of (155,155,155). The thickness of the drawn line can be varied; this is our primary parameter for changing the outcome of our results.  
 2) Using the ground truth pixel values, draw every line on another empty black image with the same size as image 'x'. Every pixel drawn this way has an RGB value of (100,100,100). The thickness is kept identical to the one used for our line detection algorithm to keep everything consistent.  
 3) Add both images together. This creates a third image with the same size as image 'x'. Every time a pixel is present in both the ground truth and the output of our line detection algorithm, it has a value of (255,255,255), since 155 + 100 = 255.  
 4) Parse every pixel of the resulting image and classify it depending on its RGB value.  
 
<img src="ImagesReport/algoResult.png" alt="Drawing" style="width: 650px;"/> 
 
The 4 classifications of the confusion matrix are:  
 
 a) **True positive** : If the parsed pixel has a value of (255,255,255) it means it is detected in both the ground truth and the line detection image. It means we guessed there was a line there correctly.  
 b) **True negative** : If the parsed pixel has a value of (0,0,0) it means it is empty in both the ground truth and the line detection image. It means we guessed there was no line there correctly.  
 c) **False positive** : If the parsed pixel has a value of (155,155,155) it means it was detected as a line in the line detection algorithm but not in the ground truth image. It means we guessed there was a line but in reality there was no line.  
 d) **False negative** : If the parsed pixel has a value of (100,100,100) it means that our algorithm guessed there was no line there but in the ground truth there was a line.  
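
The four steps and the four classes above can be sketched in a few lines. This is only a minimal illustration, not the code used for the report: it assumes the detected lines and the ground truth lines have already been rasterized into two boolean masks at the chosen thickness, and the names `confusion_counts`, `detected_mask` and `truth_mask` are hypothetical.

```python
import numpy as np

# Grey levels taken from the steps above: 155 + 100 = 255.
DETECTED, TRUTH = 155, 100

def confusion_counts(detected_mask, truth_mask):
    """Count TP, TN, FP and FN pixels from two boolean masks of equal shape."""
    detected = np.where(detected_mask, DETECTED, 0).astype(np.uint16)
    truth = np.where(truth_mask, TRUTH, 0).astype(np.uint16)
    combined = detected + truth  # step 3: add both images together
    return {
        "TP": int(np.sum(combined == DETECTED + TRUTH)),  # drawn in both images
        "TN": int(np.sum(combined == 0)),                 # empty in both images
        "FP": int(np.sum(combined == DETECTED)),          # detection only
        "FN": int(np.sum(combined == TRUTH)),             # ground truth only
    }

# Toy 4x4 example (assumed data): the detection covers row 1 entirely,
# the ground truth covers 3 pixels of row 1 plus one pixel the detector missed.
detected_mask = np.zeros((4, 4), dtype=bool)
truth_mask = np.zeros((4, 4), dtype=bool)
detected_mask[1, :] = True
truth_mask[1, 1:] = True
truth_mask[2, 0] = True

counts = confusion_counts(detected_mask, truth_mask)
print(counts)
```

Working on a single channel is enough here because the three RGB channels always carry the same value.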
 
As mentioned previously, we are able to modify the thickness of the drawn lines, and this can significantly impact the results in the confusion matrix. The impact also varies greatly between image types.  
 
There is one consistent result across all image types: if we increase the thickness, the overall 'Good hits' (True positive + True negative) diminish while the overall 'Bad hits' (False positive + False negative) increase.  
 
Here are the 3 confusion matrices and the reading key: 

<img src="ImagesReport/key.png" alt="Drawing" style="width: 600px;"/> 

<img src="ImagesReport/sudoku.png" alt="Drawing" style="width: 600px;"/> 

<img src="ImagesReport/road.png" alt="Drawing" style="width: 600px;"/> 

<img src="ImagesReport/soccer.png" alt="Drawing" style="width: 600px;"/> 

We can see that for the sudoku images, going beyond a thickness of 3 does not really change the ratio of Good hits to Bad hits, but it exchanges more true negatives for true positives. This makes sense because we are filling the image with more colored pixels.  
 
For the road and soccer images, we used the same line detection algorithm as for the sudoku images instead of the deep learning algorithm used during the first part of the project. We made this decision because deep learning added non-negligible computation time, and the objective of this project was not to have the best line detection algorithm but simply to classify the results. We noticed that increasing the line thickness barely increased the true positive ratio but significantly increased the false positive ratio: the thinner the line, the more accurate the results. Also, computation time did not differ significantly with line thickness.  


# Task 2.2: Ellipse matching

Ellipse matching consists in first detecting the presence of ellipses and their number, then performing a regression on either the parameters of the ellipses or the parameters of bounding boxes fitting the ellipses of the image.

## Preprocessing tasks

The first thing we did was to check whether the annotations of the images were correct. We isolated the ellipse images from "CV2019_Annots.csv" and placed them, together with their annotations, in a dedicated csv file called "Ellipses.csv". At first sight, these annotations did not seem erroneous, so we started creating a dataset for ellipse counting, the simplest step in this part of the project. To do this, we gathered all the "elps" images in a single Ellipses folder. Then, we created a script to match our input images with their corresponding annotations. We realized that the images ‘elps_soccer03_1295’ and ‘elps_soccer03_1354’ did not have any annotations, so we deleted them from our dataset. Since images without ellipses are also relevant for ellipse counting, we created a folder containing all images of this type. 

In order to improve the results of ellipse matching, some preprocessing and edge detection methods (Canny and HED), coupled with a Gaussian filter, were tried on the images. These methods were chosen because they appeared to give the best feature highlighting. However, this kind of preprocessing did not bring any improvement and actually made the results far worse.

In the end, the only preprocessing steps kept were the following: for the RandomForests and the boosting algorithms, the images were converted to black and white (greyscale) and resized.

For the ResNet algorithm, the images were resized but kept in color.

## Ellipse detection

This part of the project focused on detecting the presence and the number of ellipses on a given image.

For this, several machine learning algorithms were tested. First, RandomForests were tested on the images converted to black and white, which gave pretty good results. Then, boosting algorithms such as Adaboost, XGBoost and gradient boosting were implemented. Finally, a ResNet-50 was adapted from code available online, as ResNets are among the most powerful image classification models on ImageNet and are widely used in computer vision applications.

Concerning the number of ellipses in the soccer images, another method was developed using YOLO and is detailed further below. However, this method only works on the soccer images.

Nonetheless, a flaw was quickly detected in this approach. Given the very small number of images labelled with 3 ellipses (only 5 in a dataset of over 5000 images) and with 2 ellipses (around 150), it was difficult to train the models correctly on these classes and to be sure that the results are reliable, assuming they are even good. Different possibilities were therefore considered:

- The first one was to put all 5 images in the training set to get a model as well trained as possible on this kind of image, but with no way to evaluate it.
- The second was to put all 5 images in the testing set, but this meant that the model would not even be aware that there was a 4th class.
- The third possibility was to put a share of the 5 images in each set, 2 or 3 in each one. But even if good results were achieved, nothing would guarantee that they are not the result of luck, or that they would generalize to other similar images.
- Finally, a fourth possibility was to perform data augmentation (such as oversampling the minority classes using SMOTE, undersampling the majority class with NearMiss, or performing data manipulation). Unfortunately, these kinds of data augmentation are either difficult to apply to images, or have the drawback of introducing a bias in the model by changing the proportions of each class in the dataset. It is however possible to limit the latter problem by increasing the number of images with 3 ellipses while keeping a limit of around 50. This would result in a proportion of 1% and, since there is currently one image out of a thousand in that category, this would not drastically change the distribution of the data.
    
But since the 3-ellipse images represent less than one per thousand of the dataset and could therefore be considered a "detail", it was decided to more or less ignore the problem for now. The following results are therefore reported without addressing this particular issue of the certainty of the results. This is why the main metric used to represent the quality of the predictions of the following algorithms is the accuracy. Confusion matrices and classification report metrics were also computed for the boosting algorithms and RandomForests (in the Python file modelLauncher.py).


### 1. ResNet

According to the universal approximation theorem, a neural network with a single hidden layer is able to approximate any continuous function to an arbitrary precision, provided that the layer contains enough neurons. However, models with very large hidden layers have been observed to overfit quickly. For this reason, researchers tried to make neural networks deeper instead of wider.

However, a major problem when building a deep neural network is a phenomenon called the vanishing gradient. When the network grows deeper, the accuracy degrades; this is not caused by overfitting but by the repeated application of the chain rule during backpropagation, which can shrink the gradient until the first layers hardly learn. This problem limited the depth of neural networks until the ImageNet competition of 2015, where an architecture addressing this problem was presented: ResNet.

<img src="ImagesReport/Resnet.png" width=400 >
<center> Structure of a residual layer of a ResNet </center>

The image above shows the structure of a residual layer of the ResNet. A shortcut composed of an identity mapping feeds the input directly to the output of the block. This architecture makes it possible to build deeper networks, because the shortcut gives the gradient a direct path around the block, which helps to preserve it during backpropagation. This structure also eases optimization, even for small networks.
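
To make the shortcut concrete, here is a minimal NumPy sketch of a residual block computing `relu(F(x) + x)`. It is a simplification under stated assumptions: the transform `F` is built from two dense layers, whereas an actual ResNet uses convolutions and batch normalization, and the names `residual_block`, `w1` and `w2` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F is a small two-layer transform."""
    f = relu(x @ w1) @ w2   # the residual branch F(x)
    return relu(f + x)      # the identity shortcut adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))

# With all-zero weights the residual branch contributes nothing,
# so the block reduces to relu(x): the input still flows through the shortcut.
w_zero = np.zeros((8, 8))
y = residual_block(x, w_zero, w_zero)
print(np.allclose(y, relu(x)))
```

The zero-weight case illustrates why such blocks are easy to optimize: a block that has learned nothing still passes its input forward instead of destroying it.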

The ResNet used here is adapted from https://github.com/priya-dwivedi/Deep-Learning/tree/master/resnet_keras. It uses the ResNet-50 provided by Keras, which is quicker than building the network from scratch. The weights are not initialized from a pretrained model. Also, the final pooling and fully connected layers of the original model are not included. Instead, a Global Average Pooling layer was added as output to the model.

Results of the ResNet algorithm were highly dependent on the training run. Indeed, due to the very high flexibility of the model (about 23 000 000 tunable parameters), overfitting on the training set could occur very quickly, leading to a high variance between the different runs. Nonetheless, the algorithm was able to reach an accuracy of 96%, and often crossed the 90% accuracy threshold. Some models and their datasets were saved and are available on the GitHub.

### 2. RandomForest

In machine learning, one of the simplest algorithms to understand is the decision tree. A decision tree classifies data through a sequence of simple questions, much like a human would: at each node, it selects, among a set of input features, the feature and the threshold that separate the data (an operation called *splitting*) while reducing an impurity measure as much as possible (generally, the Gini impurity). By performing this operation repeatedly, it is able to classify the training data, even perfectly if its depth is not limited.
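
As an illustration of the splitting step, the sketch below computes the Gini impurity and searches for the threshold on a single feature that minimizes the weighted impurity of the two children. It is a toy version written for this report, not the Scikit-Learn implementation; the function names and the data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    n = len(values)
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Toy feature (assumed data): small values belong to class 0, large ones to class 1.
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(values, labels)
print(threshold, impurity)
```

A full tree repeats this search over every feature at every node, then recurses on the two resulting subsets.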

While the performance of decision trees on training sets is excellent, they have an important drawback: a strong tendency to overfit. For this reason, an algorithm was developed that reduces the overfitting of decision trees by combining the results obtained by multiple trees. This algorithm is called RandomForest.

RandomForest does not simply combine multiple decision trees. By randomly sampling the training data points when building each tree and by limiting the features available at each node split, it adds randomness to the training of the trees and thus reduces overfitting.

Concerning the implementation of the RandomForest, we chose to use the random forest classifier of Scikit-Learn, a free Python library for machine learning. Reading the documentation of this function, we realized that it could only take 2D arrays as input, that is, an array with one row per sample and one column per feature. As black and white images often give good results in computer vision, we started by converting our ellipse dataset from color to greyscale images, thus removing a dimension. Since the 2D arrays taken as input by Scikit-Learn have a fixed number of columns, we resized each image to the 320x240 format, which is the format of the eye images, so that only a small number of images had to be resized. Indeed, since the computation time grows with the number of features used, it was wise to use the smallest possible format while keeping enough information in our images, which is a trade-off. We then reshaped each resulting 2D image into a one-dimensional vector.

In the csv file, when an image contained multiple ellipses, its name appeared as the first element of several lines, one line per ellipse. So we simply used the Counter class of the collections library, which counts the number of occurrences of each first-line element. This gave us our outputs. For images without ellipses, we simply associated the value 0 with the corresponding images. In the end, our dataset had the following form: 

- Input: 2D table in which each line contains the pixels of a reshaped black and white image as a one-dimensional vector. 

- Output: column vector containing the number of ellipses of each image.

A first model was trained on this dataset with the random forest function. Afterwards, we tried a series of other preprocessing steps (thresholding, edge detection,...) with different methods (Adaboost, Gradient booster,...); these and their results are detailed below. We also looked at the ResNet network and created a dataset with identical outputs but with inputs replaced by a 4D table, since ResNet takes color images as input. In this case, the 3D images are stacked along the first dimension in order to respect the input format of ResNet. 
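
The construction of this dataset can be sketched as follows. This is a minimal illustration under assumptions, not the script used for the project: the images are supposed to be already loaded as greyscale arrays of identical size, and the function name, image names and annotation rows are hypothetical.

```python
from collections import Counter
import numpy as np

def build_dataset(images, annotation_rows):
    """Build (names, X, y) for ellipse counting.

    images: dict mapping an image name to a 2D greyscale array (same shape for all).
    annotation_rows: rows whose first element is the image name, one row per ellipse.
    """
    counts = Counter(row[0] for row in annotation_rows)
    names = sorted(images)
    # One row per image, one column per pixel.
    X = np.stack([images[name].reshape(-1) for name in names])
    # Images that never appear in the annotations get the label 0.
    y = np.array([counts.get(name, 0) for name in names])
    return names, X, y

# Hypothetical data: two annotated images and one image without ellipses.
images = {
    "elps_a": np.zeros((240, 320), dtype=np.uint8),
    "elps_b": np.ones((240, 320), dtype=np.uint8),
    "no_elps": np.full((240, 320), 7, dtype=np.uint8),
}
rows = [("elps_a", "ellipse 1"), ("elps_b", "ellipse 1"), ("elps_b", "ellipse 2")]

names, X, y = build_dataset(images, rows)
print(X.shape, y)
```

Arrays of this shape can then be passed to the `fit` method of Scikit-Learn's `RandomForestClassifier`.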

The RandomForest parameters that were tuned are n_estimators, max_depth, max_features and warm_start, but overall the default parameters already achieved excellent performance.

Results with RandomForests were progressively improved, ending with a model that reached up to 94% accuracy on the testing set.

### 3. Boosters

In order to nonetheless take the minority classes into account, it was decided to investigate boosting algorithms. These algorithms have the advantage of focusing on the "difficult" cases that previous sub-models failed to classify. The results were not perfect, but they helped with the minority class problem while remaining robust for the rest of the dataset and avoiding any data augmentation method for now. 
In this context, the following algorithms were implemented: Adaboost, XGBoost and the classic Gradient Booster, using implementations available either in sklearn or described in the official Python documentation of the corresponding package. 

#### a. Adaboost

Adaboost is a boosting algorithm that takes a group of weak models (here, decision trees) and combines their predictions into a global prediction by giving each individual learner a separate weight that represents how well it predicts the results. The final prediction is a weighted vote: when a model predicts a given class, it gives its weight to that class. At the end, all these weights are summed, and the class with the highest total weight is chosen as the global prediction. 
It is also important to note that each model is built by taking into account the errors of the previous model. For this, each training sample is given a weight, set to 1/N at the beginning, where N is the number of samples. Then, each time a model is built, the weights of the samples are modified: an incorrectly classified sample has its weight increased, and a correctly classified sample has its weight decreased. The next model is then built by paying more attention to the samples with higher weights. This way, the next models focus on the hard-to-classify cases, which helps to increase the accuracy of the overall model. 
This method can be adapted for regression.
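
The reweighting described above can be illustrated with one round of the classic binary Adaboost update. This is a sketch under assumptions: it uses the two-class formulation with labels in {-1, +1}, whereas the task here has more than two classes, and it requires the weighted error of the weak model to lie strictly between 0 and 0.5. The function name is hypothetical.

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One Adaboost round: return the model weight and the new sample weights."""
    miss = y_true != y_pred
    err = np.sum(weights[miss])                # weighted error of the weak model
    alpha = 0.5 * np.log((1.0 - err) / err)    # weight of the model in the final vote
    # Misclassified samples are scaled up, correctly classified ones scaled down.
    new_weights = weights * np.exp(np.where(miss, alpha, -alpha))
    return alpha, new_weights / new_weights.sum()

# Toy example (assumed data): 5 samples, the weak model gets the last one wrong.
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, 1, -1, -1, -1])
w0 = np.full(5, 1 / 5)                         # initial weights 1/N

alpha, w1 = adaboost_round(w0, y_true, y_pred)
print(alpha, w1)
```

After this round the misclassified sample carries half of the total weight, so the next weak model is strongly pushed to classify it correctly.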

Adaboost produced acceptable results, with an accuracy between 70 and 80% for around 10-30 estimators and a learning rate of 1. However, when we tried to raise the number of estimators, the program never managed to finish, and the method was thus abandoned. It was also noticed that, while the classification of the most populated classes was not bad, the classification of the less populated classes failed completely most of the time, as expected. 

#### b. Gradient Booster

Here is a quotation from Jake Hoare on the DisplayR blog explaining the principles of gradient boosting algorithms: "... Gradient boosting is a type of machine learning boosting. It relies on the intuition that the best possible next model, when combined with previous models, minimizes the overall prediction error. The key idea is to set the target outcomes for this next model in order to minimize the error. How are the targets calculated? The target outcome for each case in the data depends on how much changing that case's prediction impacts the overall prediction error:

If a small change in the prediction for a case causes a large drop in error, then next target outcome of the case is a high value. Predictions from the new model that are close to its targets will reduce the error.
If a small change in the prediction for a case causes no change in error, then next target outcome of the case is zero. Changing this prediction does not decrease the error.
The name gradient boosting arises because target outcomes for each case are set based on the gradient of the error with respect to the prediction. Each new model takes a step in the direction that minimizes prediction error, in the space of possible predictions for each training case..."
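
The idea in this quotation can be sketched on a small regression problem. This is only an illustration under assumptions, not the classifier used in the project: it uses the squared error, for which the negative gradient with respect to the prediction is simply the residual, and each new model is a one-threshold "stump" on a single feature. All names and data are hypothetical.

```python
import numpy as np

def fit_stump(x, target):
    """Least-squares stump on a 1D feature: one threshold, one value per side."""
    best = None
    for t in np.unique(x)[:-1]:  # excluding the largest value keeps the right side non-empty
        left, right = target[x <= t], target[x > t]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return lambda q: np.where(q <= t, left_value, right_value)

def gradient_boost(x, y, n_rounds=20, learning_rate=0.5):
    """Fit each new stump to the residuals left by the previous models."""
    prediction = np.full_like(y, y.mean(), dtype=float)
    losses = []
    for _ in range(n_rounds):
        residual = y - prediction              # target outcomes of the next model
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * stump(x)
        losses.append(float(np.mean((y - prediction) ** 2)))
    return prediction, losses

# Toy data (assumed): a step function with a small slope.
x = np.arange(10, dtype=float)
y = np.where(x < 5, 1.0, 3.0) + 0.1 * x
prediction, losses = gradient_boost(x, y)
print(losses[0], losses[-1])
```

Each round takes a step that cannot increase the training error, which matches the "step in the direction that minimizes prediction error" described in the quotation.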

The gradient booster algorithm, with around 600 estimators and a learning rate of 0.9, gave the best results among the boosting algorithms. It reached a match rate of around 94% on the testing set and 100% on the training set, for a combined score of 98%. The less populated classes were also fairly well classified.

#### c. XGBoost

XGBoost is a variant of the gradient booster algorithm.

XGBoost gave results relatively similar to Adaboost, but slightly better as it allowed a larger number of estimators to be run. Results were still below 90% accuracy though, which was deemed unsatisfactory. 

### 4. Yolo

The Yolo algorithm used in the next section is able to detect bounding boxes on ellipses of soccer images. This also means that, starting from the bounding boxes detected by Yolo, it is extremely easy to determine the number of ellipses on an image as it just corresponds to the number of bounding boxes.

For more details on YOLO, see the description in the Ellipse matching section.

### 5. Potential ideas

One potential idea is stacking, i.e. combining the predictions of the weaker ("bad") models into a single model.


## Ellipse matching

### 1. Regression task on bounding boxes

This task was intended to fit bounding boxes around the ellipses present on a soccer field image. The parameters of a bounding box were its x_min, y_min, x_max and y_max, i.e. the minimal and maximal boundaries of the box along the two axes of the image.

#### a. Yolo

Concerning the regression task on bounding boxes, different algorithms were tested (e.g. R-CNN) but the only one which gave satisfying results was the YOLO algorithm.

One of the first deep-learning-based object detectors developed in computer vision was the R-CNN. The original R-CNN is a two-stage object detector which works as follows:

- It first locates candidates of a bounding box in the image using a search algorithm such as Selective Search
- Then, it passes those regions into a CNN for classification, which classifies those regions as bounding boxes or not

While this kind of procedure is very accurate, it also has a major problem: it is extremely slow in terms of computation time. Even with the later variants of the R-CNN (Fast R-CNN, or Faster R-CNN, which replaces Selective Search with a Region Proposal Network (RPN)), speed remained the limiting factor of these algorithms due to the two stages. It is for this reason that YOLO was developed.

YOLO works by applying a single neural network to the complete image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. This way, the predictions are influenced by the context of the image, as the network analyses the complete image.

Three versions of YOLO were developed to this day. The one used in the code is the third version. Here is the architecture of the algorithm.

<img src="ImagesReport/archi.png" width=400 >
<center> Architecture of Yolov3 </center>

As can be seen in the image above, the architecture is an alternation of convolutional and residual layers. There are 53 convolutional layers, which is why it is called Darknet-53. The total number of layers is 252.

The code is adapted from https://blog.insightdatascience.com/how-to-train-your-own-yolov3-detector-from-scratch-224d10e55de2. As recommended on the GitHub page, the pretrained weights of the YOLO model available on https://pjreddie.com/darknet/yolo/ were used as a base.

By changing the axes of the coordinates of the bounding boxes drawn by hand with Cytomine, it was possible to directly re-use the code to train YOLO on detecting ellipses. The steps followed are:

- Separation of the soccer images into training and test set
- Conversion of the bounding boxes coordinates to the ones used in the code, then conversion of the file to the format used by YOLO
- Starting from pretrained weights downloaded from Darknet, YOLO training on the training soccer images and labels
- Predictions on the test sets
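
The coordinate conversion mentioned in these steps can be sketched as below. This is a hedged illustration, not the project's conversion script: it assumes that the annotation points use an origin at the bottom left of the image while the training code expects an origin at the top left, and it ignores any half-pixel convention. The function name and the sample points are hypothetical.

```python
def cytomine_to_image_box(points, image_height):
    """Convert annotation points (origin bottom-left) to a box (origin top-left).

    points: iterable of (x, y) pairs outlining the annotated region.
    Returns (x_min, y_min, x_max, y_max) in image coordinates.
    """
    xs = [x for x, _ in points]
    ys = [image_height - y for _, y in points]  # flip the vertical axis
    return min(xs), min(ys), max(xs), max(ys)

# A rectangle drawn near the top of a 100-pixel-high image (assumed data).
box = cytomine_to_image_box([(10, 70), (50, 70), (50, 90), (10, 90)], image_height=100)
print(box)
```

Only the vertical axis changes: a region close to the top of the image has large y values in the annotation system and small ones in image coordinates.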

Concerning the implementation of the code, the input of the network is (416x416), and the program automatically converts the images and bounding boxes to fit this dimension. The network is then trained in two steps. First, the first 249 layers are frozen, leaving only the last 3 layers trainable, and the training goes through 50 epochs with a batch size of 32. Then, all the layers are unfrozen and another 50 epochs are performed using a batch size of only 4, as this phase requires more memory. Over the epochs of the second phase, the learning rate is adjusted. The optimizer used is Adam (https://arxiv.org/abs/1412.6980). The only parameter tuned is the confidence threshold (i.e. the threshold probability above which a region is considered as a bounding box), fixed to 0.25. The two following graphs show the evolution of the loss over time.

<img src="ImagesReport/loss1.png" width=400 >
<center> Loss of the yolo algorithm during the first training phase </center>

For the first phase, the loss starts at high values but quickly diminishes, then decreases more slowly over the epochs. In the last epochs of the first phase, the loss changes at a rate of about 0.5 per epoch. The loss on the validation set is consistently lower than the one on the training set.

<img src="ImagesReport/loss2.png" width=400 >
<center> Loss of the yolo algorithm during the second training phase </center>

Over the second phase, the loss still follows a shape similar to a decaying exponential. The total improvement is small, and early stopping kicks in at a loss of approximately 12.

Using the weights obtained for the model, good results are achieved. The following images come from the testing set and have not been processed by the network before. 

<img src="ImagesReport/elps_soccer03_2207_catface.png" width=400 >
<center> Image of a correct prediction by the yolo algorithm </center>

In a large majority of cases the network finds all ellipses with pretty accurate bounding boxes. Usually, if there is a problem in a bounding box, it is that the box is too small or too large, which could be a result of approximate human annotations. For example if we take the human annotations of the image above we can see the blue boxes on the image below. The left box is clearly too large.   

<img src="ImagesReport/annot.png" width=400 >
<center> Image of a human annotation using Cytomine </center>

In a very small proportion of images, the network can miss an ellipse. The other problem is that it sometimes draws several boxes on the same ellipse; this could be mitigated by increasing the confidence required to draw a box or by refusing to draw a box if it is too close to another one. Both problems are illustrated in the following image.
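
The two mitigations suggested above can be sketched as a small post-processing filter. This was not implemented in the project; it is only a possible approach under assumptions. "Too close" is interpreted here as a distance between box centers below a hypothetical `min_distance` in pixels, and the boxes and scores are made-up values.

```python
import math

def center(box):
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def filter_boxes(boxes, scores, min_score=0.25, min_distance=20.0):
    """Return the indices of the boxes to keep.

    Boxes are visited by decreasing confidence. A box is dropped if its
    confidence is below min_score or if its center lies closer than
    min_distance to the center of a box that was already kept.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if scores[i] < min_score:
            continue
        if all(math.dist(center(boxes[i]), center(boxes[j])) >= min_distance for j in kept):
            kept.append(i)
    return kept

# Hypothetical predictions: boxes 0 and 1 cover the same ellipse, box 3 is a weak detection.
boxes = [(100, 100, 200, 160), (104, 98, 206, 164), (300, 120, 380, 170), (10, 10, 30, 30)]
scores = [0.90, 0.60, 0.80, 0.10]
kept = filter_boxes(boxes, scores)
print(kept)
```

Raising `min_score` trades missed ellipses for fewer spurious boxes, so both thresholds would have to be chosen on a validation set.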

<img src="ImagesReport/elps_soccer03_2189_catface.png" width=400 >
<center> Image of a wrong annotation of the yolo algorithm </center>

Additional image results are available in the provided code.


### 2. Regression task on ellipse parameters

This task was intended to fit the 5 geometric parameters of the ellipses on the pupils of eyes. It was performed on the eye images in which a pupil was present. Indeed, given the previous work on ellipse detection, it was assumed that the algorithms were able to separate images with ellipses from images without. The training set used for the regression task was therefore restricted to the images containing an ellipse.

The parameters chosen are the two coordinates of the center, the orientation of the ellipse and the half-length of the major and minor axes.

#### Preprocessing

The preprocessing for this part is very similar to what was done for the classification of the number of ellipses. The only difference is that the final purpose of this preprocessing was a regression on the parameters of the ellipse: the coordinates of its center, the length of its major axis, that of its minor axis and its orientation. These were obtained with the fitEllipse function of OpenCV applied to the coordinates of the eye points available in the file Annots.csv. As the coordinates provided by Cytomine were in x-y format (origin at the bottom left), we converted them into the reference system of numpy arrays (origin at the top left of the image rather than the bottom left). 

For this regression, we focused on the random forest regressor function, the data being in the following format: 
- Input: each line contains a reshaped greyscale image in a one-dimensional vector 
- Output: each line contains the 5 ellipse parameters 

#### a. RandomForest

The first attempt at the regression task was to apply random forests to the images. Surprisingly, this led to really good results from the get-go, with predicted ellipses that already fitted the eye images pretty well.

It quickly became obvious that the most problematic parameter was the angle, which was wrong on average by around 45°. 
We tried other kinds of models, including gradient boosting, SVR and neural networks, in order to get better predictions on all 5 parameters, but these methods proved ineffective. Some parameters would get slightly better and others slightly worse, and none really improved the angle accuracy, which was our main goal. 
To counter this problem, we decided to build a separate model dedicated to the prediction of this specific parameter. We again used various models such as Adaboost, gradient descent, LGBM, XGBoost, neural networks and SVR. LGBM and XGBoost never managed to finish running, while neural networks and Adaboost managed to reduce the error on the angle to 21°, which is a good improvement compared to our previous 46°.
But the real improvement came from gradient descent, which we tuned to TODO, and which reached an average error of 16°. 
A bit more tuning allowed us to reach an average error of 9.9°, which was considerably better and, when applied to the actual images, delivered pretty good and consistent results. We recomputed the average error on the images we applied it to and found a difference of 2° in error. We seeded the split of the dataset into training and testing sets with the same seed.

Here are some results of the ellipse regressions:

<table><tr><td><img src="ImagesReport/no_correc_image1.png" width=250 height=250></td><td><img src="ImagesReport/no_correc_image2.png" width=250 height=250></td><td><img src="ImagesReport/no_correc_image3.png" width=250 height=250></td></tr></table>

<table><tr><td><img src="ImagesReport/no_correc_image4.png" width=250 height=250></td><td><img src="ImagesReport/no_correc_image5.png" width=250 height=250></td><td><img src="ImagesReport/no_correc_image6.png" width=250 height=250></td></tr></table>

Important: we realized the angle correction code was in fact not completely reliable as it required an enormous training set to work correctly and thus, the predictions are tested on too few images. However, this was found too late to be corrected. Hence, additional discussion had to be deleted from the report as it was not reliable anymore. Here are however some images obtained using the angle correction code:

<table><tr><td><img src="ImagesReport/correc_image1.png" width=250 height=250></td><td><img src="ImagesReport/correc_image2.png" width=250 height=250></td><td><img src="ImagesReport/correc_image3.png" width=250 height=250></td></tr></table>


# Task 2.3: Image annotation

This task was performed on Thursday, 21 November. It consisted in adding manual annotations to a set of images using the tools provided by Cytomine. These images and their annotations were then used in the other tasks to train the algorithms and assess their performance.


# Task 2.4: Performance assessment of the ellipse matching module

Concerning the performance assessment of the different algorithms used to match ellipses, it is important to note that their classification ability is compared to that of a human. Indeed, the training images at our disposal were labelled by humans (see task 2.3), so an algorithm able to classify those images perfectly would have performance comparable to that of humans (it may even be better if the algorithm correctly classifies images that were incorrectly labelled during task 2.3; however, this aspect will not be analysed).

Here are the performances obtained for each module of the project.


## 1. Classification task performances

The main metric used for classification is accuracy. The maximum accuracy obtained for each algorithm is:

- ResNet: 96% accuracy
- RandomForests: 93.9% accuracy
- GradientBooster: 93.9% accuracy
- XGBoost and Adaboost: < 90% accuracy

The trained models and their datasets are available in the project's GitHub repository.

Furthermore, confusion matrices and classification report metrics were also computed for the boosting algorithms and RandomForests.
These metrics were computed in modelLauncher.py, which loads a model and computes the different metrics from its predictions on a given test set.
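As an illustration of what these metrics measure, here is a minimal sketch in plain Python. It is not the code of modelLauncher.py: the function names and the toy labels (`"ellipse"`, `"no_ellipse"`) are assumptions made for the example. In practice, scikit-learn's `sklearn.metrics` module offers `accuracy_score`, `confusion_matrix` and `classification_report` for the same purpose.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return m where m[i][j] counts the samples whose true class is
    labels[i] and whose predicted class is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def precision_recall(matrix, i):
    """Precision and recall of the class at index i."""
    true_pos = matrix[i][i]
    predicted_as_i = sum(row[i] for row in matrix)  # column sum
    actually_i = sum(matrix[i])                     # row sum
    precision = true_pos / predicted_as_i if predicted_as_i else 0.0
    recall = true_pos / actually_i if actually_i else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    labels = ["ellipse", "no_ellipse"]
    y_true = ["ellipse", "ellipse", "ellipse", "no_ellipse", "no_ellipse"]
    y_pred = ["ellipse", "ellipse", "no_ellipse", "no_ellipse", "ellipse"]
    m = confusion_matrix(y_true, y_pred, labels)
    print(m, accuracy(m), precision_recall(m, 0))
```

Reading the confusion matrix alongside the accuracy is useful when classes are imbalanced, since a high accuracy can hide poor recall on the rarer class.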


## 2. Regression task performances on bounding boxes

### Yolo

To evaluate the performance of the bounding box detection, a simple metric called 'Intersection over Union' (IoU) was used. This metric is well suited to the task because a predicted bounding box almost never matches the ground truth exactly, so a graded measure of overlap is needed. Since it is a ratio between 0 and 1, it makes it easy to rank images by how well they were detected. The results obtained were excellent: the mean IoU was 0.8097 and the median reached 0.8689, meaning that half of the predictions scored above that value. Finally, here is a histogram showing the distribution of YOLO's performance:

<img src="ImagesReport/histogram.png" width=400 >
<center> Histogram of the distribution of predictions by performances </center>
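
For reference, the IoU of two axis-aligned boxes is the area of their intersection divided by the area of their union. The sketch below is illustrative and is not the project's code. It assumes boxes are given as `(x_min, y_min, x_max, y_max)` tuples, which may differ from the format actually produced by the detector, and the helper names `iou` and `summarize` are hypothetical.

```python
from statistics import mean, median


def iou(box_a, box_b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to 0 when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def summarize(scores):
    """Mean and median of a list of IoU scores."""
    return mean(scores), median(scores)


if __name__ == "__main__":
    # Toy boxes, for illustration only.
    predictions = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)]
    ground_truth = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
    scores = [iou(p, g) for p, g in zip(predictions, ground_truth)]
    print(scores, summarize(scores))
```

Reporting both the mean and the median is informative here: a median above the mean suggests that a few poorly detected images pull the average down.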


## 3. Regression task performances on ellipse parameters

To measure the accuracy of the model, we computed the average error between the predicted and expected values for each of the 5 parameters. The square root of the squared difference, i.e. the absolute value of the difference, was used to ensure that all error terms are positive. Finally, the errors were averaged over the number of images, so that they characterize the regression on the test set as a whole rather than the error on each image. This yields metrics that are representative of the overall quality of the regression model's predictions.
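
The computation described above amounts to a mean absolute error per parameter, since the square root of a squared difference is its absolute value. Here is a minimal sketch, not the code of the project: the parameter names in `PARAMS`, the tuple layout and the function name are assumptions made for the example.

```python
# Assumed order of the 5 ellipse parameters in each tuple (hypothetical).
PARAMS = ("x", "y", "major_axis", "minor_axis", "angle")


def mean_absolute_errors(predicted, expected):
    """Average |predicted - expected| over all images, per parameter.

    Both arguments are lists of 5-tuples, one tuple per image.
    """
    if not predicted or len(predicted) != len(expected):
        raise ValueError("need two non-empty lists of the same length")

    totals = [0.0] * len(PARAMS)
    for pred, exp in zip(predicted, expected):
        for k in range(len(PARAMS)):
            # abs(d) is the same as sqrt(d ** 2).
            totals[k] += abs(pred[k] - exp[k])

    n_images = len(predicted)
    return {name: total / n_images for name, total in zip(PARAMS, totals)}


if __name__ == "__main__":
    # Toy values, for illustration only.
    predicted = [(10, 20, 30, 15, 40), (12, 18, 28, 17, 50)]
    expected = [(12, 20, 33, 15, 45), (10, 21, 28, 13, 40)]
    print(mean_absolute_errors(predicted, expected))
```
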

The average errors obtained are:

- x diff: 4.57
- y diff: 5.28
- major axis diff: 5.4
- minor axis diff: 7.26
- angle: 45 (without angle correction) / 10 (with angle correction, possibly biased)

NB: the angle correction code turned out not to be completely reliable. It requires a very large training set to work correctly and, as a result, its predictions were tested on too few images.

These metrics were computed in eyes_regression.py, the file that combines the two regression models to compute the final ellipse parameters.


## Work share
Damien: YOLO, IoU, performances, report

Julien: Classification, regression, performances, report

Alexandre: YOLO, ResNet, performances, report

Pierre: Classification, regression, performances, report

Jérémie: Line performance assessment, report

# References

## Task 2.1

Computer vision team 3: project 1

## Task 2.2

### ResNet

- https://towardsdatascience.com/understanding-and-coding-a-resnet-in-keras-446d7ff84d33
- http://pabloruizruiz10.com/resources/CNNs/ResNets.pdf
- http://arxiv.org/abs/1512.03385
- https://stackoverflow.com/questions/54537674/modify-resnet50-output-layer-for-regression
- https://github.com/priya-dwivedi/Deep-Learning/tree/master/resnet_keras

### RandomForests

- https://towardsdatascience.com/understanding-random-forest-58381e0602d2
- https://towardsdatascience.com/random-forest-and-its-implementation-71824ced454f
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
- http://www.montefiore.ulg.ac.be/~lwh/AIA/model-evaluation-29_10_2012.pdf
- http://www.montefiore.ulg.ac.be/~lwh/AIA/ensembles-21-11-2017.pdf
- https://www.quora.com/What-are-the-advantages-and-disadvantages-for-a-random-forest-algorithm
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

### Boostings

- https://scikit-learn.org/stable/modules/ensemble.html#adaboost
- https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html
- https://xgboost.readthedocs.io/en/latest/python/python_api.html
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
- https://www.displayr.com/gradient-boosting-the-coolest-kid-on-the-machine-learning-block/
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostRegressor.html
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html
- https://www.youtube.com/watch?v=LsK-xG1cLYA
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
- https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor


### Yolo

- https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/
- https://pjreddie.com/darknet/yolo/
- https://pjreddie.com/media/files/papers/YOLOv3.pdf
- https://www.learnopencv.com/training-yolov3-deep-learning-based-custom-object-detector/
- https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html
- https://github.com/wizyoung/YOLOv3_TensorFlow
- https://blog.insightdatascience.com/how-to-train-your-own-yolov3-detector-from-scratch-224d10e55de2
- https://datascience.stackexchange.com/questions/49546/keras-load-pre-trained-weights-shape-mismatch
- https://github.com/qqwweee/keras-yolo3/issues/417

### Other

- https://www.geeksforgeeks.org/ml-handling-imbalanced-data-with-smote-and-near-miss-algorithm-in-python/
- http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
- https://github.com/syagev/deep-ellipse
- https://www.geeksforgeeks.org/python-opencv-cv2-ellipse-method/
- https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
- https://shapely.readthedocs.io/en/stable/manual.html

## Task 2.4

### IoU

- https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/