# Capstone proposal - Deep Learning for LEPUS

## The problem to be solved

### (a) What are the main project idea and goals?

While thinking about the capstone project, an idea emerged from a discussion with Prof. Yves Hausser of the nature management department at hepia (HES-SO Geneva) about a deep learning project on natural images. The introduction of this project is presented below:

The Lepus software [1] was designed to help scientists analyse wildlife images acquired using photographic traps. At present, species recognition and individual identification are carried out manually, which is time-consuming.

The objective of this project is to test Deep Learning technology to automate tasks such as:

1. detecting whether or not an animal is present in the image
2. locating animals with bounding boxes
3. identifying certain species or, more generally, their type or family
4. ideally, identifying each individual animal of a specific species from its physical characteristics (e.g. to help the Wildlife Conservation Society)

Each of these problems can be treated independently, so, together with another EPFL Extension School learner (*Julien Smets*), we decided to share this project. Here are the configurations chosen for our respective capstone projects:

**Project 1** *Blerim Arslani: Detection of the presence of an animal in the image (binary classification)*

**Project 2** *Julien Smets : Identification of the type of an animal in the image (multinomial classification)*

Let us briefly motivate our choice. Solving these two problems could save nature management scientists a lot of time (days of work), especially the presence detection problem, because only a small fraction of the images contain animals (this is detailed below).
Moreover, the labelled data do not include bounding boxes, which excludes the second task (we only consider supervised learning here, to ensure a validation metric), and the fourth task is limited by the large variance of manual identification and the very small amount of data available.

Note that the two chosen projects are individual (only the dataset is shared, not the work) and can be combined by running the second one right after the first in order to classify the detected animals (see the diagram below).

![General view and separation between the two projects](drawioDiagram.png)


More details are available on the project cloud:

- [1] http://ec2-35-157-78-9.eu-central-1.compute.amazonaws.com/

The first project focuses only on detecting animals in the pictures. Looking at the data, a first question arises: does the final result differ if RGB or grayscale images are used? Even more importantly, what is gained or lost by using the difference between two consecutive images of a sequence, and should grayscale or RGB image differences be used? These are the main ideas and goals of the project. Indeed, if two images are taken shortly one after the other, subtracting them removes the static background, and the resulting image focuses on the differences between them, for example the movement of an animal.
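A minimal sketch of the image-difference idea on two synthetic frames, assuming NumPy. The frame size, the square "animal" and the helper name `image_difference` are illustrative assumptions, and the absolute difference is only one possible choice (the proposal does not fix whether signed or absolute differences will be used). The subtraction is done in a signed type because unsigned 8-bit pixels would wrap around.

```python
import numpy as np

def image_difference(previous: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Absolute pixel-wise difference between two uint8 images of identical shape."""
    # Subtract in a signed type: with uint8, 10 - 20 would wrap around to 246
    diff = current.astype(np.int16) - previous.astype(np.int16)
    return np.abs(diff).astype(np.uint8)

# Two synthetic 8x8 grayscale frames sharing the same static background
rng = np.random.default_rng(0)
background = rng.integers(0, 200, size=(8, 8), dtype=np.uint8)
frame_1 = background.copy()
frame_2 = background.copy()
frame_2[2:4, 5:7] = 255  # a bright "animal" appears in the second frame

diff = image_difference(frame_1, frame_2)
changed = np.argwhere(diff > 0)  # (row, col) of every pixel that changed
```

The background cancels out and only the region where the "animal" appeared remains non-zero. The same function works unchanged on (H, W, 3) colour arrays.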

To perform these analyses, our machine learning models will be trained on:
- Grayscale images
- Colour images
- Grayscale image differences
- Colour image differences

A second question is which model to use. Is there a model that performs better on image differences than on raw images? What about grayscale versus RGB? To answer these questions, several models will be compared:
- k-nearest neighbours
- Decision trees and random forests
- Support vector machines
- Fully connected neural networks
- Convolutional neural networks

As we saw during the fourth project, training on well-defined features increases the overall accuracy of machine learning models. However, to keep the project focused on its main goals, the raw pixels will be used as input for every model.

### (b) What story you would like to tell with the data and what would you like to achieve at the end?

The first project consists of detecting the presence or absence of an animal in the pictures. Several techniques can be used to achieve this goal. One of them is to work with the pixel intensity difference between two images of a sequence. In a typical case, an animal is present in the first image and not in the second. In a second case, shown in the following images, an animal is present in the first picture and the same animal appears at a different position in the next one.

![Example of image sequence](images/image_proposal_substraction.png)

As the background does not change between the first and the second picture, the difference removes it and accentuates what changed between them, i.e. the animal features. Using image differences, four cases emerge:

No animal in either picture:

![First case: no animal in either picture](images/image_proposal_case_1.png)

No animal in the first picture and at least one in the second:

![Second case: no animal in the first picture and one in the second](images/image_proposal_case_2.png)

At least one animal in the first picture and none in the second:

![Third case: one animal in the first picture and none in the second](images/image_proposal_case_3.png)

At least one animal in both the first and the second picture:

![Fourth case: one animal in the first and the second picture](images/image_proposal_case_4.png)

Whereas raw images have two possible labels, image differences can have four. By choosing appropriate image sequences, it could be possible to infer the presence of an animal in an individual image from its corresponding image difference.



### (c) What is the main motivation behind your project?

The main motivation behind the Lepus project is to reduce the time-consuming manual detection, localisation and identification of animal species in photographic-trap images (ideally with better accuracy than humans, although this will not be tested here).

## The data set

### (a) What is the size and format of the data that you plan to use?

#### *Data Information*

Animals can appear very small, depending on their size and distance from the camera, or very large, taking up a large part of the image. They can be **occluded** by background objects (trees) or even be only **partially visible** (especially large animals such as giraffes or elephants). The number of images per species is irregular, depending on the rarity of each species. Moreover, most images contain no animal, because trees moving in the wind, dust devils, butterflies, etc. trigger false positive captures. The proportion of empty pictures is **~60-85%** depending on the camera environment.

In addition, the cameras have many different fields of view (FOV) and resolutions, and produce both colour and grayscale images; the latter result from the difference between day and night acquisition. Note that this results in a **highly non-uniform representation of species** with respect to different situations (day/night, background, etc.). For example, some species are only nocturnal. This will require robust data preprocessing to avoid biases.

Some of the images are time-correlated because running animals are captured several times, i.e. **short time-lapse sequences**. These sets of images are already grouped by the Lepus software into specific folders, ready to process. These groups are called **events of independent capture (EIC)** and contain from 3 up to ~300 images if an animal stays in front of the camera the whole day/night. This is important information and will be discussed below.

#### *Images*

The first data set received, "M1 2015" (M1 denotes a grid of 36 photographic traps), contains 4056 colour and grayscale images. Only ~12% of these images contain animals or humans. The rest of the data will be received soon; it covers an area of 10'000 km$^2$ with hundreds of cameras placed at different natural sites in Tanzania, spanning the period from 2013 to 2016 with 176'000 images.



```python
# Import the image metadata
# Location of all documents: the CSV export is expected in the current working directory
import os
import pandas as pd

absFilePath = os.path.abspath(os.getcwd())
data = pd.read_csv(os.path.join(absFilePath, 'DeepLearningExport.csv'), sep=',', header='infer')
print(data.shape)
data.head()
```

```
(4057, 12)
```


|   | file_id | file_path | session_dir | file_datetime | file_period | event_id | prev_file_id | session_id | place_id | taxon_id | taxon_tsn | taxon_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2015/M1/M1_01/CDY_0001.JPG | M1 2015 | 09.03.15 17:08 | day | 1 |  | 5 | 1 |  |  |  |
| 1 | 2 | 2015/M1/M1_01/CDY_0002.JPG | M1 2015 | 10.03.15 12:13 | day | 2 |  | 5 | 1 |  |  |  |
| 2 | 3 | 2015/M1/M1_01/CDY_0003.JPG | M1 2015 | 28.03.15 17:38 | day | 3 |  | 5 | 1 | Team | Team | [TEAM] |
| 3 | 4 | 2015/M1/M1_02/03080001.JPG | M1 2015 | 08.03.15 16:57 | day | 4 |  | 5 | 2 | Team | Team | [TEAM] |
| 4 | 5 | 2015/M1/M1_02/03080002.JPG | M1 2015 | 08.03.15 16:57 | day | 4 | 4.0 | 5 | 2 | Team | Team | [TEAM] |


All image information is given in a CSV file, whose columns are described below (*taxon* refers to the species):

- **file_id**: index of the image
- **file_path**: path to the image: {year}/{grid}/{camera}/{picture}
- **session_dir**: grid + year
- **file_datetime**: date and time of the capture
- **file_period**: day, night or twilight
- **event_id**: identifier of the independent event
- **prev_file_id**: identifier of the chronologically previous image within the same event_id
- **session_id**: session identifier
- **place_id**: place identifier
- **taxon_id**: species identifier
- **taxon_tsn**: unique identifier of the species in an official database of the living world
- **taxon_name**: literal name of the species. Takes the value "[TEAM]" when the captured subject is the human team.

Each event contains from one to hundreds of pictures. Based on the file date and time, each file_id is linked to the previous file identifier (prev_file_id). If this value is NaN, the picture is the first of its event. To work with image differences, at least 2 images per independent event are needed.
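A short pandas sketch of how the number of images per event (the quantity shown in the histogram) can be counted and how events with a single image can be discarded. The toy rows mirror the first lines of `DeepLearningExport.csv` shown above; only the columns `file_id`, `event_id` and `prev_file_id` are assumed.

```python
import pandas as pd

# Toy rows mirroring the first lines of DeepLearningExport.csv
data = pd.DataFrame({
    'file_id': [1, 2, 3, 4, 5],
    'event_id': [1, 2, 3, 4, 4],
    'prev_file_id': [None, None, None, None, 4.0],
})

# Number of images in each independent event
images_per_event = data.groupby('event_id').size()

# Keep only events with at least two images, as required by the difference method
usable = images_per_event[images_per_event >= 2].index
data_for_differences = data[data['event_id'].isin(usable)]
```

On the full file, `images_per_event.value_counts().sort_index()` gives the counts behind the histogram.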

![Histogram of the number of images by independent events](images/histogram_events.png)

Looking at the histogram of the number of images per independent event, a large majority of events clearly contain fewer than 10 images. Zooming in on the histogram, almost half of the events contain 2 or 3 images. Fifty of them contain only 1 image and are not usable for image differences.

It is also important to look at the species present in the data set.


![Cumulative plot of the percentage of the data containing species](images/cumulative_plot_species.png)

Thirty-one different species appear in the data and together represent 12.5% of the images (the limit marked by the red dashed line on the graph). The three most frequent ones account for almost half of the images containing species.
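The cumulative plot can be sketched with pandas `value_counts`, assuming (as described above) that empty images have `NaN` in `taxon_name`. The species names in the toy series are hypothetical placeholders, and the helper name `species_share` is illustrative.

```python
import pandas as pd

def species_share(taxon_name: pd.Series) -> pd.DataFrame:
    """Share of all images taken by each taxon, from most to least frequent."""
    total = len(taxon_name)  # all images, including empty ones
    counts = taxon_name.value_counts()  # NaN (empty images) are dropped by default
    share = counts / total
    return pd.DataFrame({'count': counts, 'share': share, 'cumulative_share': share.cumsum()})

# Hypothetical toy labels: 4 empty images out of 10
taxon_name = pd.Series(
    ['species_A', None, '[TEAM]', 'species_A', None, None,
     'species_B', '[TEAM]', 'species_A', None],
    name='taxon_name',
)
table = species_share(taxon_name)
```

The last value of `cumulative_share` is the fraction of labelled images, which on the real data should correspond to the 12.5% limit of the plot.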

![3 Top species present in the data](images/representation_top_3_species.png)

Interestingly, the most frequent species is seen during the day, as opposed to the third one, which is mostly present at night, when it hunts. Also very interestingly, the second most frequent "species" is the human team in charge of the traps.

#### *Other*

Note that the data may need to be covered by an NDA (standard Non-Disclosure Agreement) because some very rare species are often poached for money. If this is the case, an NDA will be requested from EPFL soon. In any case, the precise locations of the pictures will not be transmitted.

### (b) How do you expect to get, manage and process the data?

#### *Receiving the data*
The first grid of data is shared via a SWITCHdrive transfer. The rest of the data is shared by physical transfer to preserve privacy.

#### *Managing the data*
In summary, the data represent approximately:
- 99.5 GB
- 179'000 images

As the models will be trained on colour, grayscale, colour-difference and grayscale-difference data, the final data set will take between 300 and 400 GB. This is too much for my own storage capacity, so the resolution of the images will be reduced.

#### *Data pre-processing*
**Colour images**: The column 'taxon_name' from the CSV file gives the name of the species present in the image, and 'NaN' for an empty image. The first step is to create a column 'presence' with two labels, empty or animal.
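A minimal sketch of the 'presence' column, assuming pandas and NumPy and the NaN convention described above; `add_presence` is a hypothetical helper name. Note that under this rule any non-empty `taxon_name`, including "[TEAM]" (the human team), is labelled 'animal'.

```python
import numpy as np
import pandas as pd

def add_presence(data: pd.DataFrame) -> pd.DataFrame:
    """Add a binary 'presence' column: 'empty' if taxon_name is NaN, 'animal' otherwise."""
    data = data.copy()
    data['presence'] = np.where(data['taxon_name'].isna(), 'empty', 'animal')
    return data

labelled = add_presence(pd.DataFrame({'taxon_name': [None, None, '[TEAM]']}))
```

Whether human captures should really count as non-empty is a design choice worth making explicit before training.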

**Grayscale images**: The colour images first have to be transformed into grayscale using the following equation (the ITU-R 601-2 luma transform used by Pillow) [2]:

$$L = R \cdot \frac{299}{1000} + G \cdot \frac{587}{1000} + B \cdot \frac{114}{1000}$$

where $R$, $G$ and $B$ represent the red, green and blue components. As for the colour images, a column 'presence' is added with two labels, empty or animal.

- [2] https://pillow.readthedocs.io/en/3.2.x/reference/Image.html#PIL.Image.Image.convert
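The same luma equation can be sketched directly in NumPy, which is convenient when the images are already loaded as arrays. `to_grayscale` is an illustrative helper; Pillow's `Image.convert('L')` computes the same transform with integer arithmetic, so results may differ slightly in rounding.

```python
import numpy as np

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Apply L = R*299/1000 + G*587/1000 + B*114/1000 to an (H, W, 3) uint8 image."""
    weights = np.array([299, 587, 114], dtype=np.float64) / 1000
    luma = rgb.astype(np.float64) @ weights  # (H, W, 3) @ (3,) -> (H, W)
    return np.clip(np.round(luma), 0, 255).astype(np.uint8)

# One row of three pixels: white, black and pure red
pixels = np.array([[[255, 255, 255], [0, 0, 0], [255, 0, 0]]], dtype=np.uint8)
gray = to_grayscale(pixels)
```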

**Colour differences**: Based on the colour images and the columns 'file_id' and 'prev_file_id', each image can be linked to its previous image. This is possible only if both images are of the same type and belong to the same event. How do we know which images belong to an event? The column 'event_id' indexes each independent event from 1 to 552. There are fewer events than pictures because some events contain up to several hundred pictures. The first image of an event has 'NaN' as 'prev_file_id' and is not used as a reference for image differences. As two images are needed, only events with 2 or more pictures are taken into account. As mentioned before, an image difference takes two images as input and reduces them to one by subtracting the pixel intensities of one picture from those of the previous one. This process aggregates the information of two pictures into one and focuses only on the differences between them. As a label is needed for each resulting picture, a 'presence' column is added, but its labels are different from those used for colour and grayscale images.
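The pairing step described above can be sketched as a pandas self-merge of `prev_file_id` onto `file_id`. This is one possible implementation under stated assumptions: the helper name `build_pairs`, the `_prev` suffix and the toy rows are illustrative, not part of the Lepus export.

```python
import pandas as pd

def build_pairs(data: pd.DataFrame) -> pd.DataFrame:
    """Pair each image with its chronological predecessor using prev_file_id."""
    # The first image of an event has prev_file_id = NaN and cannot be a 'current' image
    current = data.dropna(subset=['prev_file_id']).astype({'prev_file_id': 'int64'})
    pairs = current.merge(
        data,
        left_on='prev_file_id',
        right_on='file_id',
        suffixes=('', '_prev'),  # columns of the previous image get a '_prev' suffix
    )
    # Safety check: both images of a pair must belong to the same independent event
    return pairs[pairs['event_id'] == pairs['event_id_prev']]

# Toy metadata: events 1-3 have a single image, event 4 has two, event 5 has three
data = pd.DataFrame({
    'file_id': [1, 2, 3, 4, 5, 6, 7, 8],
    'event_id': [1, 2, 3, 4, 4, 5, 5, 5],
    'prev_file_id': [None, None, None, None, 4, None, 6, 7],
})
pairs = build_pairs(data)
```

Single-image events drop out automatically, since none of their images has a predecessor.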

For each picture pair, the label will be:

- *NoAndNo*: no animal in either picture
- *NoAndYes*: no animal in the first picture and at least one in the second
- *YesAndNo*: at least one animal in the first picture and none in the second
- *YesAndYes*: at least one animal in both the first and the second picture
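A small pure-Python sketch of the four pair labels and of their inverse, which is how the presence in each individual image could be recovered from the labels of consecutive pairs. The function names are hypothetical; the label strings are those listed above.

```python
LABELS = {
    (False, False): 'NoAndNo',
    (False, True): 'NoAndYes',
    (True, False): 'YesAndNo',
    (True, True): 'YesAndYes',
}

def pair_label(first_has_animal: bool, second_has_animal: bool) -> str:
    """Label of an image pair from the presence in each of its two images."""
    return LABELS[(first_has_animal, second_has_animal)]

def presence_from_label(label: str) -> tuple[bool, bool]:
    """Invert pair_label: presence in the first and in the second image."""
    return next(key for key, value in LABELS.items() if value == label)

def sequence_presence(pair_labels: list[str]) -> list[bool]:
    """Presence in every image of an event from the labels of its consecutive pairs."""
    first, _ = presence_from_label(pair_labels[0])
    return [first] + [presence_from_label(label)[1] for label in pair_labels]
```

With predicted rather than true labels, two consecutive pairs may disagree about the image they share; this sketch simply trusts the second image of each pair.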

**Grayscale differences**: Grayscale differences are very similar to colour differences; the only difference is that they are computed from the grayscale images.

## The analysis and methods

### (a) What are the main challenges that you envision for completing the project and how do you plan to get around each one?

**Environment modification**: Due to weather conditions and other uncontrolled factors, the background seen by the same trap can change, leading to a significant modification of the environment. To counter this issue, only images from the same independent event will be used. In addition, only pairs of images taken closer together in time than it takes for such changes to occur will be used.
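A sketch of the time-gap filter, assuming pandas, image pairs whose columns carry a hypothetical `_prev` suffix for the previous image, and the `file_datetime` format seen in the CSV (`dd.mm.yy HH:MM`). The 10-minute threshold is a placeholder: the proposal does not fix the maximum gap.

```python
import pandas as pd

# Hypothetical threshold: the maximum gap is not fixed in the proposal
MAX_GAP = pd.Timedelta(minutes=10)

def filter_by_time_gap(pairs: pd.DataFrame, max_gap: pd.Timedelta = MAX_GAP) -> pd.DataFrame:
    """Keep image pairs taken less than `max_gap` apart."""
    fmt = '%d.%m.%y %H:%M'  # e.g. '08.03.15 16:57'
    current = pd.to_datetime(pairs['file_datetime'], format=fmt)
    previous = pd.to_datetime(pairs['file_datetime_prev'], format=fmt)
    return pairs[(current - previous) < max_gap]

pairs = pd.DataFrame({
    'file_datetime': ['08.03.15 16:57', '08.03.15 18:00'],
    'file_datetime_prev': ['08.03.15 16:57', '08.03.15 16:57'],
})
kept = filter_by_time_gap(pairs)
```

Since timestamps only have minute resolution, images taken within the same minute have a gap of zero.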

**Independent events with only one image**: As we saw previously, some events contain only one image. These events cannot be used with the difference method and will not be included in the data set.

**Resolution of the images**: The images are too large for this application: a (1536, 2048, 3) image has 3'145'728 pixels, i.e. 9'437'184 features once the three colour channels are counted. To handle this issue, the resolution of each image will be reduced to (192, 256, 3), i.e. 49'152 pixels or 147'456 features (49'152 for grayscale images).
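Since 1536 / 192 = 2048 / 256 = 8, the reduction can be sketched as averaging non-overlapping 8x8 blocks with NumPy. This block-mean approach is an assumption for illustration; Pillow's `Image.resize((256, 192))` (note the (width, height) order) is an alternative.

```python
import numpy as np

def downsample(image: np.ndarray, factor: int = 8) -> np.ndarray:
    """Reduce an (H, W, C) image by averaging non-overlapping factor x factor blocks."""
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError('image height and width must be multiples of factor')
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3)).round().astype(image.dtype)

original = np.zeros((1536, 2048, 3), dtype=np.uint8)  # stand-in for a full-size image
small = downsample(original)
print(small.shape, small.size)  # pixels x channels = number of input features
```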

### (b) What are the steps that you plan to take to achieve the end goals?

The steps of the work are given as follows:

1. Preprocess the dataset

2. Split the data into train and test sets.

3. Test different machine learning models by tuning their hyperparameters.

4. Compare the results.

5. Define future improvements.

### (c) Show us that you have a pipeline in place and that you understand the feasibility of your project goals.

#### *1. Preprocessing:*
- Resizing the original data
- Creating the grayscale data set
- Creating the colour difference data set
- Creating the grayscale difference data set
- Removing all images that cannot be labelled with the difference method (e.g. events with a single image)
- Applying some preprocessing methods to the data sets (e.g. histogram equalisation, denoising, resizing, and data augmentation such as noise, horizontal flips, rotations or shear).
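Two of these preprocessing steps, histogram equalisation and horizontal flipping, can be sketched in a few lines of NumPy. The helper names are illustrative, and the equalisation shown is the classic cumulative-histogram mapping for 8-bit grayscale images; a library implementation could be used instead.

```python
import numpy as np

def equalise_histogram(gray: np.ndarray) -> np.ndarray:
    """Histogram equalisation of a 2-D uint8 grayscale image via its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # number of pixels at the darkest grey level present
    if cdf[-1] == cdf_min:  # constant image: nothing to equalise
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[gray]  # map every pixel through the table

def horizontal_flip(image: np.ndarray) -> np.ndarray:
    """Mirror an (H, W) or (H, W, C) image left-right (a simple augmentation)."""
    return image[:, ::-1]

low_contrast = np.array([[100, 100], [150, 150]], dtype=np.uint8)
equalised = equalise_histogram(low_contrast)
```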

#### *2. Splitting the data:*
- Split the data into stratified train and test sets.
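A stratified split can be sketched with scikit-learn's `train_test_split` and its `stratify` argument, which keeps the proportion of images with animals identical in both sets. The random data and the 20% positive rate are placeholders, not the real class balance.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 flattened images, 20% containing an animal
rng = np.random.default_rng(0)
X = rng.random((100, 48))
y = np.array([1] * 20 + [0] * 80)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(y_train.mean(), y_test.mean())  # same positive rate in both sets
```

Images from the same event are strongly correlated, so it may also be worth keeping each `event_id` entirely on one side of the split (e.g. with scikit-learn's `StratifiedGroupKFold`).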

#### *3. Fit and evaluate models*
- k-nearest neighbours
- Decision trees and random forest
- Support vector machines
- Fully connected neural networks
- Convolutional neural networks

#### *4. Compare the results*
- Evaluate the accuracy of each model with each data set.
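Steps 3 and 4 can be sketched with scikit-learn for the classical models (the neural networks would need a separate framework and are left out). The synthetic data, in which the label depends on a single "pixel", and the hyperparameters are placeholders; with real images, `X` would be the flattened pixels, e.g. `images.reshape(len(images), -1)`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for flattened pixel features and presence labels
rng = np.random.default_rng(0)
X = rng.random((200, 64))
y = (X[:, 0] > 0.5).astype(int)  # synthetic rule so the models have something to learn
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    'k-NN': KNeighborsClassifier(n_neighbors=5),
    'Decision tree': DecisionTreeClassifier(random_state=0),
    'Random forest': RandomForestClassifier(n_estimators=100, random_state=0),
    'SVM': SVC(kernel='rbf', C=1.0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # mean accuracy on the test set

for name, acc in sorted(accuracies.items(), key=lambda item: -item[1]):
    print(f'{name}: {acc:.3f}')
```

Because 60-85% of the images are empty, plain accuracy can look good for a model that always predicts "empty"; reporting `sklearn.metrics.balanced_accuracy_score` alongside it may be useful.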

#### *5. Define future improvements based on the results*
- Based on the results, identify the kinds of images the models struggle with and find a preprocessing improvement.
- Select a baseline model from the literature whose extracted features could be fed to our models instead of the raw image pixels.


## The communication

The code will be implemented in several Python scripts (.py) for convenience and simplicity (e.g. separating pre-processing, training and testing). An additional notebook (.ipynb) will be included with analysis details and visualisation figures as a short report (similar to this document).