# Deep Learning - Project 1 - Convolutional Neural Networks

## Basic information

### Rules for completing the course

- The grade for this project is $50\%$ of the final grade for the Deep Learning course
- In order to pass the course, it is necessary to pass ($\geq 50\%$) each of the 2 projects

### Purpose of the project

- Getting acquainted with one of the most popular frameworks used in Deep Learning - TensorFlow
- Practical exercises related to Deep Learning (in particular Convolutional Neural Networks)
- Empirical confirmation of the influence of individual elements of the algorithm and data on the results (based on experimental comparative analyses)
- Improving programming skills

### Project prerequisites

- The student can independently implement code in Python and is able to use Jupyter Notebook
- The student is able to find information in the documentation of Python packages and apply it in practice
- The student knows the basics of Deep Learning, in particular Convolutional Neural Networks (knowledge obtained during laboratories; materials available at https://www.cs.put.poznan.pl/gmiebs/students/dl/)

### Implementation of the project

- Individually or in groups of two

### Parts of the project

Part 1 - Implementation of an image classifier ($35\%$)

Part 2 - Experimental comparative analyses ($25\%$ for each subtask):
 - Task's implementation ($15\%$)
 - Task's description ($10\%$)
 
Maximum rating for the project - $100\%$.
 
**Note:** If one of the task parts (implementation or description) is missing, the other part does not count either. The percentages only show how much each part contributes to the assessment.
 
**Note:** Without correctly completing Part 1, it will likely be difficult (or even impossible) to complete the tasks in Part 2.

### Technical requirements

- Implementation of the solution in Python, using the TensorFlow package with the Keras API. You can use TensorFlow with GPU support, but it's not mandatory.
- Preferred form - Notebook(s). It is also possible to implement in regular Python files (.py).
- Suggested support packages (e.g. for displaying plots, tables, dataset preparation etc.) - NumPy, matplotlib, pandas, scikit-learn (others can be used)

### Deadline

- Midnight between 2021-11-14 / 2021-11-15
- In case of delay, each started week of delay lowers the grade by 10%

### Solution content

Please:

- Provide the source code performing the tasks from Parts 1 and 2. Unless the Notebook is too big (over a few MB), please **don't** clear the Notebook(s) output. If I have problems testing the code, I will be able to check your solutions at least by looking at the output
- Also attach a list of packages with the versions you used when implementing the project (*pip freeze > requirements.txt*)
- If you are not using Markdowns in Notebook(s), or comments in the case of regular Python files, please describe in the email where I should look for the code for specific parts of the tasks.
- If you decide to prepare a separate report (preferably a PDF file), attach it as well.
- If the file structure is different from the one proposed in the content of the tasks, provide information about the file structure - i.e. where I should put the dataset in order to run your code
- Provide any graphs, tables, files, resources etc. on which your conclusions about the tasks are based and which are not visible in the Markdowns or output of the Notebook. Please describe in the report or in the e-mail what the attached materials relate to.
- Add files with model and parameters from Part 1, task 6) (as long as they do not occupy more than a few MB)

Please don't:

- Send the dataset - I'd rather avoid downloading mega- or gigabytes of data from each of you :)

### Delivery method

- Send the above-mentioned content (ZIP file preferred) to michal.wojcik@doctorate.put.poznan.pl
- Title: **[DL] Project 1 - [Student1_First_Name] [Student1_Surname] [Student1_ID], [Student2_First_Name] [Student2_Surname] [Student2_ID]** (Erasmus students - omit the ID)
- Title example: **[DL] Project 1 - Anna Nowak 123456, John Doe 789012**

### Project modifications

This document is a project proposal and describes the preferred way to complete and pass the first part of the Deep Learning Labs.

If you believe that minor modifications to this project (e.g. changing the dataset, a new task definition), major ones, or a different approach altogether (e.g. implementing a different project that solves another practical problem - mainly for those who are more experienced in Deep Learning) would be more interesting and would help you better understand previously unknown issues related to the basics of Deep Learning (in particular Convolutional Neural Networks), please do not hesitate to present your proposal to modify the project definition.

In order for the proposed changes to be accepted, the concept must be presented individually (by e-mail or during the class), then described in detail and approved via e-mail by the teacher.

## Part 1 - Implementation of an image classifier

The goal of Part 1 is to implement an image classifier from scratch. The task consists of loading the data (images & labels), preprocessing, preparing the datasets for the learning process, preparing the model, training it, and evaluating it on the test set. A more detailed scheme is presented below, broken down into subtasks.

### Data
Proposed datasets:

- Caltech-101 (http://www.vision.caltech.edu/Image_Datasets/Caltech101/) (131 MB)
- Caltech-256 (http://www.vision.caltech.edu/Image_Datasets/Caltech256/) (1.2 GB)

Both of the above collections consist of multiple JPG images containing an object from one of the many classes. For each class of objects, there are from several dozen to several hundred examples of images.

If you decide you want to use a different dataset, here are some requirements:
- The size of the images should not be smaller than 100x100 pixels
- Images should be in color (RGB/RGBA), not grayscale
- Images should be in a format that can be easily loaded into a numpy.array of pixels (png, jpg)
- The collection should contain at least 20 classes with at least 80 examples for each class
- Images must be labeled - that is, assigned to exactly 1 of N decision classes
- The classification problem cannot be trivial (a problem is trivial if, e.g., photos can be assigned to classes just by calculating the average pixel color)

As mentioned above, the proposed change of the dataset must be accepted.

### Before you start

- Start by reading all the steps and also look at Part 2 - this will make it easier for you to plan the entire flow and organize your code
- Implement tasks as functions - it's always a good idea to keep your code reusable and parameterizable, especially here, because some of the source code will be useful in Part 2
- Use Markdowns - they help organize the code and allow you to describe additional information. Whenever a task includes a question or a request to describe your conclusions, comments, observations, etc., your answer should be put in Markdowns (preferred form) or in the report file that you send after completing all the tasks
- **Step verification** should also be implemented - it should be visible as the output of the cell(s) executed after implementing and running the code that carries out the tasks specified in a given step
- The hints, order of subtasks, and functions mentioned in the description below are only suggestions - you can present your own approach and improvements, but in such a way that it is possible to observe the changes taking place between subsequent steps. Please don't use *magic* one-line functions that do all the steps at once - the aim of the project is also to familiarize you with the features offered by the packages (mainly TensorFlow with Keras API). If you have departed from the presented approach in any way (e.g. you loaded a dataset with the tensorflow_datasets package), please mention and describe this fact in step 7)

### Steps to follow:

#### 1) Download the data and load it in the Notebook (5%)

**Note:** The *traditional* dataset loading sequence is described below. You can also use tensorflow_datasets package for this purpose.

- Download the archive with the dataset
- Create directory "data" and extract downloaded files to it
- Implement loading images and labels
- Each image should be represented in the form of numpy.array (shape: (height, width, channels))
- Load all images of interest and their labels into two lists

**Note:** Watch out for file extensions!

**Step verification:**

Display one of the loaded images, print out the shape of the image, check if the label is correct.
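
The *traditional* loading sequence above could be sketched as follows. This is only an illustration under stated assumptions: each class is assumed to live in its own subdirectory (so the directory name serves as the label), the set of accepted extensions is a guess you should adapt, and Pillow is assumed to be available for decoding. The function names are hypothetical.

```python
from pathlib import Path

import numpy as np

# Assumption: only these extensions count as images; adjust for your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_files(root):
    """Return sorted (path, label) pairs; the label is the parent directory name."""
    pairs = []
    for path in sorted(Path(root).rglob("*")):
        # Comparing the lowercased suffix skips non-image files and handles ".JPG".
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            pairs.append((path, path.parent.name))
    return pairs


def load_image(path):
    """Decode one file into a (height, width, channels) uint8 array."""
    from PIL import Image  # Pillow, imported lazily so listing works without it

    with Image.open(path) as img:
        # convert("RGB") also turns grayscale or RGBA files into 3 channels.
        return np.asarray(img.convert("RGB"))


def load_dataset(root):
    """Load every image under root into two parallel lists: images and labels."""
    images, labels = [], []
    for path, label in list_image_files(root):
        images.append(load_image(path))
        labels.append(label)
    return images, labels
```

A call such as `images, labels = load_dataset("data")` would then give the two lists; `images[0].shape` and `labels[0]` are enough for the step verification.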

#### 2) Standardize the images (5%)

**Note:** Some of these operations can be performed while files are being loaded

Unify the images:

- Number and sequence of channels (RGB) if needed
- Image shape (e.g. $32 \times 32 \times 3$) - including channel convention (channels_last suggested)
- Standardization of pixel values ($\frac{x - \mu}{\sigma}$) - calculate $\mu$ and $\sigma$ for the whole dataset, separately for each channel

**Step verification:**

Check what the image selected in step 1) now looks like.
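
A minimal NumPy sketch of the per-channel standardization $\frac{x - \mu}{\sigma}$ is shown here. It assumes the images have already been resized to a common shape and stacked into one `(N, height, width, channels)` array; the random array only stands in for real data, and the small `eps` term is an added safeguard against division by zero, not part of the formula.

```python
import numpy as np


def channel_stats(images):
    """Per-channel mean and standard deviation over a (N, H, W, C) array."""
    mean = images.mean(axis=(0, 1, 2))
    std = images.std(axis=(0, 1, 2))
    return mean, std


def standardize(images, mean, std, eps=1e-7):
    """Apply (x - mu) / sigma; mean and std broadcast over the channel axis."""
    return (images - mean) / (std + eps)


# Stand-in data: 10 random RGB images of size 32x32.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 32, 32, 3)).astype(np.float32)

mean, std = channel_stats(images)
standardized = standardize(images, mean, std)
print("per-channel mean:", standardized.mean(axis=(0, 1, 2)))
print("per-channel std: ", standardized.std(axis=(0, 1, 2)))
```

Note that a standardized image contains negative values, so it usually has to be rescaled to a displayable range before plotting.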

#### 3) Divide the collection into Train and Test set (5%)


- Reduce the number of classes - filter the collection and leave images from ~15-25 classes, select those classes that have the largest number of examples. Make sure your collection is balanced (roughly the same number of samples for each class). You can do this by discarding classes that cause imbalance, or you can reduce the number of samples in larger classes.
- Randomly split the set into train (70%) and test (30%) sets - X_train, X_test (images) and y_train, y_test (labels). (hint: check *sklearn.model_selection.train_test_split*)
- Make sure that the proportions of each class in both sets are more or less the same as in the whole set (hint: *stratify* parameter in *train_test_split*)
- Change labels in y vectors to one-hot encoding
- Ensure image and label collections are in the form of numpy.array

**Step verification:**

- Check the shape of the X_train, X_test (images) and the y_train, y_test (labels)
- Check on the example image if the label is in the correct form
- Check how many samples from each class are in particular subsets (train and test) and if the proportions are kept
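
The class filtering and the one-hot encoding from this step could look roughly like the sketch below. It is one possible approach, not the required one: it keeps the largest classes and cuts each of them down to the size of the smallest selected class. The function names and the toy labels are hypothetical, and the split itself is left to *train_test_split* as hinted above.

```python
from collections import Counter

import numpy as np


def select_balanced_subset(labels, n_classes, seed=0):
    """Pick the n_classes largest classes and sample each down to the smallest of them."""
    labels = np.asarray(labels)
    top = [name for name, _ in Counter(labels.tolist()).most_common(n_classes)]
    per_class = min(int(np.sum(labels == name)) for name in top)
    rng = np.random.default_rng(seed)
    chosen = []
    for name in top:
        idx = np.flatnonzero(labels == name)
        chosen.extend(rng.choice(idx, size=per_class, replace=False))
    return np.sort(np.array(chosen)), sorted(top)


def one_hot(labels, class_names):
    """Map string labels to one-hot rows; column order follows class_names."""
    index = {name: i for i, name in enumerate(class_names)}
    ids = np.array([index[label] for label in labels])
    return np.eye(len(class_names), dtype=np.float32)[ids]


# Toy labels standing in for the real dataset.
labels = ["cat"] * 5 + ["dog"] * 3 + ["owl"] * 4 + ["ant"] * 1
indices, class_names = select_balanced_subset(labels, n_classes=3)
y = one_hot([labels[i] for i in indices], class_names)
print(class_names, y.shape, y.sum(axis=0))
```

Cutting every class to the size of the smallest one discards a lot of data when one class is much larger than the rest, so discarding such a class instead may be the better choice.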

#### 4) Define the model (5%)

Suggestions:

- Activation functions - *ReLU*
- At least 3 *Convolutional blocks* - (Conv2D, Activation, BatchNormalization, Dropout, MaxPooling2D); Conv2D - *kernel=(3,3)*, *padding='same'*; MaxPooling2D - *pool_size=(2,2)*
- Flatten layer
- At least 2 Fully-Connected (Dense) layers
- Output layer - Dense with number_of_classes outputs (remember to use softmax)

**Note:** You can add *Activation* as a separate layer or as *activation='relu'* parameter in Conv2D

**Step verification:**

Compile the model with the *'adam'* optimizer, Categorical Crossentropy as the loss function, and accuracy as the metric.
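
One way to turn the suggestions above into a Keras model is sketched below. The number of filters per block, the sizes of the Dense layers, the dropout rate and the $32 \times 32 \times 3$ input are illustrative assumptions, not values prescribed by this document.

```python
import tensorflow as tf
from tensorflow.keras import layers


def conv_block(filters, dropout_rate):
    """One Convolutional block in the order suggested above."""
    return [
        layers.Conv2D(filters, kernel_size=(3, 3), padding="same"),
        layers.Activation("relu"),
        layers.BatchNormalization(),
        layers.Dropout(dropout_rate),
        layers.MaxPooling2D(pool_size=(2, 2)),
    ]


def build_model(input_shape, number_of_classes, filters=(32, 64, 128), dropout_rate=0.2):
    """Stack the blocks, flatten, add two Dense layers and a softmax output."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for n_filters in filters:
        for layer in conv_block(n_filters, dropout_rate):
            model.add(layer)
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(number_of_classes, activation="softmax"))
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model((32, 32, 3), number_of_classes=20)
model.summary()
```

Keeping the block in its own function makes it easier to change the number of blocks, the dropout rate or the layer order later on.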

#### 5) Train the model (5%)

Suggested hyperparameters:

- Batch_size = 32
- Epochs = 250 (first, just check if everything works on several epochs to save time)
- Monitor the value of measures for the test set
- Add [EarlyStopping](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping) - stop training when there is no improvement in accuracy for the test set within 5 consecutive epochs (you can first test the behavior for patience = 1)
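
The sketch below shows how the EarlyStopping callback might be wired into training. The tiny model and random data are stand-ins so the snippet runs on its own, `epochs` is kept small on purpose, and `restore_best_weights=True` is an optional extra rather than something this task requires.

```python
import numpy as np
import tensorflow as tf

# Stop when val_accuracy has not improved for 5 consecutive epochs.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",
    patience=5,
    restore_best_weights=True,  # assumption: optional, keeps the best epoch's weights
)

# Stand-in data (hypothetical shapes): 4 classes, 8x8 RGB "images".
rng = np.random.default_rng(0)
X_train = rng.normal(size=(64, 8, 8, 3)).astype("float32")
y_train = tf.keras.utils.to_categorical(rng.integers(0, 4, size=64), num_classes=4)
X_test = rng.normal(size=(32, 8, 8, 3)).astype("float32")
y_test = tf.keras.utils.to_categorical(rng.integers(0, 4, size=32), num_classes=4)

# Stand-in model; in the project this would be the model from step 4).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),  # makes val_loss / val_accuracy available
    batch_size=32,
    epochs=10,  # kept small here; the suggestion above is 250
    callbacks=[early_stopping],
    verbose=0,
)
print("epochs actually run:", len(history.history["val_accuracy"]))
```

The returned `history.history` dictionary holds the per-epoch values needed for the learning curves.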

**Step verification:**

After completing the learning process, show:

- Learning curves for the loss function and accuracy over the epochs, on the train and test sets
- The confusion matrix, together with precision and recall for each class, for the test set
- A few images from the test set with the probabilities of assigning them to each class (at least two examples: one image that was classified correctly and one that was classified incorrectly)

**Note:** Functions to display learning curves and confusion matrices will be useful in Part 2.
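
For reference, the confusion matrix and the per-class precision and recall can be computed with plain NumPy as in the sketch below (scikit-learn's `sklearn.metrics` module offers ready-made equivalents). The one-hot labels and the probabilities are made-up stand-ins for `y_test` and the model's predictions.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def precision_recall(cm):
    """Per-class precision (TP / predicted) and recall (TP / actual)."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)
    actual = cm.sum(axis=1)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall


# Stand-ins for one-hot test labels and predicted class probabilities.
y_test_one_hot = np.eye(3)[[0, 0, 1, 1, 2, 2]]
probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
])

y_true = np.argmax(y_test_one_hot, axis=1)
y_pred = np.argmax(probabilities, axis=1)
cm = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall = precision_recall(cm)
print(cm)
print("precision:", precision)
print("recall:   ", recall)
```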

#### 6) Save the model to disk (5%)

- Prepare 2 functions - for saving the model and for loading the model
- Model structure should be saved as JSON file, model parameters in HDF5 file

**Step verification:**

Check that both functions work:

- Save the model
- Load the model
- Make predictions on the loaded model for the test set
- Display the confusion matrix for the test set and compare it with the matrix obtained in the previous step
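
The two functions could be sketched as follows. The file names `model.json` and `model.weights.h5` are assumptions chosen for this example; the `.weights.h5` suffix is used because newer Keras versions expect it for weight files.

```python
from pathlib import Path

import tensorflow as tf


def save_model(model, directory):
    """Write the architecture as JSON and the parameters as an HDF5 file."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "model.json").write_text(model.to_json())
    model.save_weights(str(directory / "model.weights.h5"))


def load_model(directory):
    """Rebuild the architecture from JSON, then load the saved parameters."""
    directory = Path(directory)
    model = tf.keras.models.model_from_json((directory / "model.json").read_text())
    model.load_weights(str(directory / "model.weights.h5"))
    return model
```

The JSON file stores only the architecture, so a model restored this way has to be compiled again before calling `evaluate` or continuing training; `predict` works without compiling.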

#### 7) Summary of the Part 1 - describe your observations (5%)

Describe your observations on the tasks performed. Supporting questions:

- What kind of modifications have you made? Why? (*Describe, if you made any*)
- What results have you achieved?
- Is underfitting or overfitting of the model visible? If so, what are your suggestions for solving this problem?
- Which class(es) did the model have a problem with? Which did it do best with? Which pair of classes was most often confused with each other? Can you guess why?
- What improvement opportunities do you see?

**Step verification:**

Use Markdowns to describe your conclusions or put them into the report.

## Part 2 - Experimental comparative analysis

The purpose of Part 2 is to examine how the quality of the resulting model depends on factors such as hyperparameters, model structure, amount of training data, number of decision classes, etc. Below is a suggestion of simple tasks - most of them consist of a simple experiment comparing several models that differ in some detail.

### Before you start

- You don't have to complete all of the tasks; you can choose the issues that seem interesting to you. You can also suggest your own ideas for the experiment definition (in that case, let me know). As mentioned above, each task is worth $25\%$ of the project grade (the value of the last task is doubled). Assuming you've completed Part 1, as you may have already calculated:
  - You should complete at least 1 task *almost* correctly to pass ($35\% + 25\% = 60\% \geq 50\%$)
  - You should complete at least 3 tasks *almost* correctly to get the maximum score for a Project ($35\% + 3 \cdot 25\% = 110\% \rightarrow 100\%$)
  
  You can complete more than 3 tasks, because it may save you from a possible loss of points and will allow you to obtain a satisfactory result.
- It is possible to combine tasks into one larger experiment - most of the tasks consist of comparing models that differ in one aspect. Instead of conducting individual experiments, you can perform [grid search](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search) and modify a set of several parameters. However, it is important to remember that the conclusions about a given experiment should take into account observations about each of the subtasks.
- If you perform several experiments one after another, take into account the conclusions from the previous ones. For example, if in the first experiment you are examining the differences between the quality of individual models, and in the next experiment you are examining the effect of the size of the train set, then use the better model from the first experiment.
- If you have not already done so, modify the functions from Part 1 so that they can be used for the following subtasks. You can also rewrite the code and adapt it to the requirements of the tasks.
- Pay attention to the last task - it is more demanding and requires you to find additional information, therefore it is worth 50%.
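
If you combine tasks into a grid search, the parameter combinations can be enumerated with the standard library as in this small sketch. The parameter names and values in `search_space` are purely hypothetical.

```python
from itertools import product

# Hypothetical search space combining two of the tasks below.
search_space = {
    "dropout_rate": [0.0, 0.2, 0.5],
    "batch_size": [16, 32],
}


def grid(space):
    """Return one dict per combination of the listed parameter values."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


configs = grid(search_space)
for config in configs:
    print(config)  # each config would drive one training run
```

The number of runs grows multiplicatively with every added parameter, so it is worth keeping each list short.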

### Collecting experimental data

In addition to implementing the code and running experiments to carry out a given task, you should formulate conclusions about the results obtained. For that purpose, it may be helpful to collect the following information about the models compared:

- The number of network parameters (as a measure of the memory complexity of the model)
- Training time (as a measure of the temporal complexity of the model)
- Values of the obtained quality measures (loss function, accuracy - maybe some other measures?) for the train and test sets in the resulting model
- Values of loss_function and measures in successive epochs (learning curves - can be useful in the context of discussing overfitting, optimization speed and model stability while learning)
- Confusion matrix (mainly in the test set, but maybe the train set matrix will help to better understand the learning process)
- Precision and recall (maybe some other measures?) for decision classes (assessment of classification difficulty for individual classes; comparison of the precision-recall curve between classes and between models)
- Display some examples that were correctly and incorrectly classified by one or more models (example-based explanation)

In general, there are many possibilities to visualize data, to compare models with each other and draw conclusions from it.
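
One simple way to keep such information together is to record one row per trained model, as in the sketch below. `run_experiment` and `fake_training` are hypothetical names; in a real run the training function would build and fit a Keras model and could report, for example, `model.count_params()` as the number of parameters.

```python
import time


def run_experiment(name, train_fn, results):
    """Time train_fn() and append its returned metrics dict to results."""
    start = time.perf_counter()
    metrics = train_fn()
    row = {"name": name, "training_time_s": time.perf_counter() - start}
    row.update(metrics)
    results.append(row)
    return row


def fake_training():
    """Stand-in for real training; returns made-up placeholder values."""
    return {"n_parameters": 1234, "test_accuracy": 0.5}


results = []
run_experiment("baseline", fake_training, results)
print(results)
```

A list of such dictionaries can be passed directly to `pandas.DataFrame` to get a comparison table.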

### Task description

Each of the following tasks assumes the implementation of an appropriate code that will carry out a given experiment - and therefore it will probably modify the dataset or model (its structure or hyperparameters). After implementing the experiment, write:
- Which task did you choose?
- What are the differences between the models or data sets you tested?

After obtaining the results for the compared models or datasets, describe your observations and conclusions about the experiment. Observations may in particular concern:
- The overall predictive ability of the model - which model is better, which model is worse, maybe one is better with class X and one with class Y, etc.
- The impact of the changes made on overfitting and underfitting
- Comparison of the duration of learning, the complexity of the model
- Assessment of whether it was profitable to make such a modification
- Learning curve shape - in which model the optimization was faster, whether the loss function decreased and the accuracy increased steadily on the test set, whether there were any deviations, etc.
- Sample images where one model is better than the other - try to answer *why?*
- If something is not working well, write about your assumptions, why it failed and what could be changed. If you want, try the changes you suggest and also describe whether it has brought the desired effect.

In general, it is worth describing everything that you find interesting in the context of the specific task. The description of the conclusions should be included in Markdown(s) (or in the report if you want to prepare one).

### Tasks

#### 1) The impact of the size of the training set on the results - choose one option:
- Compare the results achieved by the same model, e.g. for 15-20 classes that have ~80 samples and 15-20 classes that have ~30 samples
- You can also use the same classes and reduce the number of samples in the training set (e.g. 50, 40, 30, 20 samples in the training set)
- Another idea - regulate the proportion of the division of the set into Train and Test set (e.g. 80-20, 60-40, 40-60, 20-80)

#### 2) The impact of the number of decision classes on the results (e.g. 10, 25, 50, all of the available classes)

#### 3) Compare the models without and with Dropout with different rates (e.g. 0.1, 0.2, 0.5)

#### 4) Compare the models without and with regularization (L1, L2, L1 + L2)

#### 5) Compare the models without and with batch normalization (before or after the activation function)

#### 6) Compare the models for different preprocessing approaches (perform operations separately per channel):
- Raw data - $X$
- Subtracting the mean ($X - \mu$)
- Normalization ($\frac{X - min}{max - min}$)
- Standardization ($\frac{X - \mu}{\sigma}$)
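
The four variants can be put behind one function so that the rest of the experiment stays identical, as in this sketch. The function name, the method names and the optional `reference` argument (the array from which the statistics are computed) are assumptions made for illustration; a channel with constant values would cause a division by zero here.

```python
import numpy as np


def preprocess(X, method, reference=None):
    """Apply one preprocessing variant per channel to a (N, H, W, C) array."""
    X = X.astype(np.float64)  # avoid integer wrap-around for uint8 input
    ref = X if reference is None else reference.astype(np.float64)
    axes = (0, 1, 2)  # reduce over everything except the channel axis
    if method == "raw":
        out = X
    elif method == "mean":
        out = X - ref.mean(axis=axes)
    elif method == "minmax":
        low, high = ref.min(axis=axes), ref.max(axis=axes)
        out = (X - low) / (high - low)
    elif method == "standardize":
        out = (X - ref.mean(axis=axes)) / ref.std(axis=axes)
    else:
        raise ValueError(f"unknown method: {method}")
    return out.astype(np.float32)


# Stand-in data: 5 random RGB images of size 16x16.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(5, 16, 16, 3), dtype=np.uint8)
for method in ["raw", "mean", "minmax", "standardize"]:
    Xp = preprocess(X, method)
    print(method, Xp.min(), Xp.max())
```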

#### 7) Compare the models with different activation functions (ReLU, tanh, sigmoid - you can use others as well)

#### 8) Compare the models with different Pooling layers (MaxPooling, AveragePooling; you can also check GlobalMaxPooling and GlobalAveragePooling after the last convolution layer)

#### 9) Compare the models with different number of Convolutional blocks

#### 10) Compare the models with different numbers of filters (neurons) in each Convolutional block

#### 11) Training with different batch sizes (e.g. 1, 16, 32, number_of_samples)

#### 12) Training on unbalanced datasets (task scored double, worth 50%)
- Take one class with a lot of samples (e.g. in Caltech101 - airplanes with 800 samples) and 1 or several classes with significantly fewer samples (e.g. 1 class with ~80 samples or 5 classes with ~30 samples per class)
- Split the dataset proportionally in the Train and Test set
- Train the default model - what was your accuracy, precision, recall?
- Check what values you would get from a simple decision-rule model that always assigns samples to the most numerous class. Is your model clearly better than it, or does it achieve quite similar results?
- Find information about training on an imbalanced dataset and how to deal with that problem, e.g. change the loss function, change the quality measures, check how you can modify the dataset
- Apply the changes and check if you managed to get a better model
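
The majority-class baseline mentioned above, together with one possible remedy, is sketched below. The class-weight formula `n_samples / (n_classes * class_count)` is a common heuristic and only one of several options; the label arrays are made-up stand-ins with integer class ids.

```python
import numpy as np


def majority_baseline(y_train, y_test):
    """Accuracy of a rule that always predicts the most frequent training class."""
    classes, counts = np.unique(y_train, return_counts=True)
    majority = classes[np.argmax(counts)]
    accuracy = float(np.mean(y_test == majority))
    return int(majority), accuracy


def balanced_class_weights(y):
    """Weight each class inversely to its frequency (a common heuristic)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}


# Stand-in labels: class 0 is ten times larger than class 1.
y_train = np.array([0] * 80 + [1] * 8)
y_test = np.array([0] * 20 + [1] * 2)

majority, baseline_accuracy = majority_baseline(y_train, y_test)
class_weights = balanced_class_weights(y_train)
print("majority class:", majority, "baseline accuracy:", baseline_accuracy)
print("class weights:", class_weights)
```

The baseline reaches high accuracy here while its recall for the small class is zero, which is why accuracy alone is a weak measure in this task. A dictionary like `class_weights` can be passed to Keras via the `class_weight` argument of `fit`.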