# Data Science for Manufacturing - Workshop 8-1: Introduction to Deep Learning




## Objectives
- Review of TensorFlow and Keras
- Keras for a simple Convolutional Neural Network (CNN)
  - Load the dataset and show example images
  - Prepare the dataset for DL models
    - One-hot encoding
    - Normalisation
- Build the model: three building blocks
  - Model definition: instantiate deep learning layers
  - Model compiling: define the loss function, metric and optimiser
  - Model training: use the training data to train the model
- Hyperparameter tuning
  - Batch size
  - Learning rate

## Other
- Highly recommended resource: [Book: Dive into Deep Learning](https://d2l.ai/)
- You are not expected to apply DL models in Assignment 2

## 1. Review of TensorFlow and Keras

## 2. Introduction to the dataset


The training images have shape (60000, 28, 28), which means:
- There are 60000 images in the training dataset
- Each image is 28 × 28 pixels

Problem to solve:
- Train a CNN to recognise and classify the hand-written digits.

## 3. Prepare the data for DL models
Feature data:
- The image data, x, should be of float type instead of uint8.
- Pixel values should be normalised. Pixel values range from 0 to 255, so dividing by 255 scales them to [0, 1].
- Convolutional layers in Keras (e.g. `Conv2D`) expect each image to have an explicit channel dimension, in the format (height, width, channels). A colour RGB image has 3 channels, e.g. (224, 224, 3), while a greyscale image such as these digits has 1. Therefore each (28, 28) image needs to be expanded by one extra dimension, to (28, 28, 1).
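The three feature-preparation steps can be sketched with NumPy as follows. The loading cell is not shown here, so a small random `uint8` array with the same per-image shape stands in for the training images (an assumption made only so the snippet runs on its own).

```python
import numpy as np

# Stand-in for the raw training images (assumption): 100 greyscale 28x28 images,
# stored as uint8 in [0, 255] like the real data
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# 1. Convert from uint8 to float32
x_train = x_train.astype("float32")

# 2. Normalise pixel values from [0, 255] to [0, 1]
x_train /= 255.0

# 3. Add a channel axis for Conv2D: (N, 28, 28) -> (N, 28, 28, 1)
x_train = np.expand_dims(x_train, axis=-1)

print(x_train.dtype, x_train.shape)  # float32 (100, 28, 28, 1)
```

`x_train[..., np.newaxis]` or `x_train.reshape(-1, 28, 28, 1)` would perform the same reshaping.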



Label data:
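- The label data, y, holds the digit classes (0 to 9) as uint8 integers. For multi-class classification with a categorical cross-entropy loss, these labels are converted to one-hot encoded vectors.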

![One-hot encoding](https://media.licdn.com/dms/image/D4E12AQHFBow7MerIqw/article-inline_image-shrink_1000_1488/0/1704601209165?e=1714608000&v=beta&t=gQPFX7NihUfEQYW4CN6XIvUCW7gt-egdU_PUFT_tPz0)

The one-hot labels have shape (60000, 10), which means:
- As in the image feature data, there are 60000 samples
- Each uint8 label is converted to a one-hot vector of length 10 (one position per digit class)
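One-hot encoding can be sketched in NumPy by indexing rows of an identity matrix; in Keras, `keras.utils.to_categorical(y_train, num_classes=10)` produces the same result. The five labels below are illustrative only.

```python
import numpy as np

# Illustrative integer labels (digits 0-9) stored as uint8, like the raw label data
y_train = np.array([5, 0, 4, 1, 9], dtype=np.uint8)

num_classes = 10
# Row k of the 10x10 identity matrix is the one-hot vector for class k
y_train_onehot = np.eye(num_classes, dtype="float32")[y_train]

print(y_train_onehot.shape)  # (5, 10)
print(y_train_onehot[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```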

## 4. Build the model
Types of DL models:
- CNNs:
  - tasks: computer vision related tasks, such as image classification, object detection, image segmentation, and facial recognition
  - data: structured spatial (grid-like) data, such as images
  - successful models: ResNet, VGGNet, InceptionNet
- RNNs (recurrent neural network):
  - tasks: natural language processing (NLP), time series prediction, speech recognition, language translation
  - data: sequential/time-series data, text data, financial data, etc.
  - successful models: LSTM (long short-term memory), GRU (gated recurrent unit)
- Transformers:
  - tasks: natural language understanding, language generation, machine translation
  - data: tokenised text sequences
  - successful models: GPT, BERT

<br>

Building up and training a model in Keras and Tensorflow:
- model definition
- model compiling
- model training

### 4.1 Model definition
Basic elements of a CNN:
- convolutional layers
- pooling layers
- dense (fully connected) layers, as in a multilayer perceptron (MLP)

Optional elements of a CNN:
- dropout layers
- batch normalisation layers
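A minimal Keras sketch combining the basic and optional elements listed above. The numbers of filters (32, 64), the 3×3 kernels, the 2×2 pooling and the dropout rate of 0.5 are illustrative assumptions, not values taken from the workshop; the final `softmax` layer has 10 units, one per digit class.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                      # 28x28 greyscale images, 1 channel
    keras.layers.Conv2D(32, (3, 3), activation="relu"),  # convolutional layer
    keras.layers.BatchNormalization(),                   # optional: batch normalisation
    keras.layers.MaxPooling2D((2, 2)),                   # pooling layer
    keras.layers.Conv2D(64, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),                              # feature maps -> vector for dense layers
    keras.layers.Dropout(0.5),                           # optional: dropout
    keras.layers.Dense(10, activation="softmax"),        # one probability per digit class
])

model.summary()
```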

### 4.2 Model compiling
Basic elements of model compiling:
- loss function: the quantity the optimiser minimises; its gradients are used to update the weights
  - categorical cross-entropy for classification tasks
  - MSE (mean squared error), MAE (mean absolute error) for regression tasks
- evaluation metric: not used to update weights, but provides additional insight into the model's behaviour and is easier for humans to interpret than the loss
  - accuracy, the most commonly used metric for classification tasks
- optimiser: the optimisation algorithm used to update the weights
  - 'adam' and 'RMSprop', the most commonly used optimisers for common DL models
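The compiling settings above map onto `model.compile` roughly as sketched here. A tiny stand-in model is defined only so the snippet runs on its own (in the workshop it would be the CNN from Section 4.1). With integer labels instead of one-hot vectors, `"sparse_categorical_crossentropy"` would be the matching loss.

```python
from tensorflow import keras

# Tiny stand-in model (assumption) so this snippet is self-contained
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    loss="categorical_crossentropy",                      # minimised by the optimiser; expects one-hot labels
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # the string "adam" uses the same default rate
    metrics=["accuracy"],                                 # reported only, not used for weight updates
)
```

Passing an optimiser object rather than a string makes the learning rate explicit, which is convenient for the tuning in Section 6.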

### 4.3 Model training
Basic elements of model training:
- batch size: the number of training samples the model processes in one iteration (one weight update)
  - common batch size ranges:
    - small batch size (2-32)
    - medium batch size (32-128)
    - large batch size (128-512+)
    - full batch (batch size equal to dataset size)
  - selecting a batch size:
    - in general, a batch size of 32, 64, 128 should work well
    - a large batch size gives smoother convergence (less noisy gradient estimates), but is highly demanding on computational resources, especially memory
    - a small batch size makes each iteration cheaper and can speed up training, but introduces noise; sometimes the noise is too large for the model to converge

- number of epochs: the number of complete passes over the training data during optimisation
  - depends on the size and complexity of the dataset and the model

- training dataset: the data the model uses to update its weights

- validation dataset: data used to monitor the model's performance during training; it is not used for weight updates
  - common training/validation ratios: 9/1, 8/2
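A minimal training sketch follows. The random arrays are placeholders for the prepared MNIST-style data (an assumption so the snippet runs on its own), and the batch size, epoch count and small dense model are illustrative choices. `validation_split=0.1` gives the 9/1 split mentioned above; Keras takes the validation samples from the end of the arrays, before any shuffling.

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in data with the prepared shapes (assumption: replace with the real arrays)
rng = np.random.default_rng(0)
x_train = rng.random((512, 28, 28, 1), dtype=np.float32)
y_train = keras.utils.to_categorical(rng.integers(0, 10, size=512), num_classes=10)

# Tiny stand-in model so the snippet runs on its own
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

history = model.fit(
    x_train, y_train,
    batch_size=64,          # samples per weight update
    epochs=3,               # complete passes over the training data
    validation_split=0.1,   # last 10% of the arrays held out for validation (9/1 split)
    verbose=0,
)

print(sorted(history.history))  # per-epoch loss/accuracy for training and validation
```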

## 5. Evaluation and predictions with the trained model
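Evaluation and prediction can be sketched with `model.evaluate` and `model.predict`. Everything here is a stand-in (assumption): random arrays replace the prepared training and test data, and a tiny model trained for one epoch replaces the trained CNN, so the printed numbers are meaningless; only the workflow matters.

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-ins (assumption): in the workshop, `model` is the trained CNN
# and x_test / y_test are the prepared test arrays
rng = np.random.default_rng(1)
x_train = rng.random((256, 28, 28, 1), dtype=np.float32)
y_train = keras.utils.to_categorical(rng.integers(0, 10, size=256), num_classes=10)
x_test = rng.random((100, 28, 28, 1), dtype=np.float32)
y_test = keras.utils.to_categorical(rng.integers(0, 10, size=100), num_classes=10)

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, verbose=0)  # placeholder training only

# Evaluation: loss and accuracy on data the model was not trained on
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)

# Predictions: class probabilities, then the most likely digit per image
probs = model.predict(x_test[:5], verbose=0)  # shape (5, 10)
predicted_digits = np.argmax(probs, axis=1)
true_digits = np.argmax(y_test[:5], axis=1)

print(f"test loss {test_loss:.3f}, test accuracy {test_acc:.3f}")
print("predicted:", predicted_digits, "true:", true_digits)
```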

## 6. Hyperparameter tuning
Common hyperparameters:
- Batch size
- Learning rate
- Number of epochs
- Optimiser
- Number of layers and neuron units on each layer
- ...

Because learning curves need to be plotted in every experiment, it is good practice to write a function for it.
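A minimal sketch of such a function, assuming it receives the `history.history` dictionary returned by `model.fit` when the model was compiled with `metrics=["accuracy"]` and trained with validation data. The name `plot_learning_curves` is hypothetical.

```python
import matplotlib.pyplot as plt

def plot_learning_curves(history_dict, metric="accuracy"):
    """Plot training vs validation loss and metric per epoch.

    history_dict: the `history.history` dict returned by `model.fit`; it must
    contain 'loss', 'val_loss', metric and 'val_' + metric.
    """
    epochs = range(1, len(history_dict["loss"]) + 1)
    fig, (ax_loss, ax_metric) = plt.subplots(1, 2, figsize=(10, 4))

    # Left panel: loss curves
    ax_loss.plot(epochs, history_dict["loss"], label="training")
    ax_loss.plot(epochs, history_dict["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    # Right panel: metric curves
    ax_metric.plot(epochs, history_dict[metric], label="training")
    ax_metric.plot(epochs, history_dict["val_" + metric], label="validation")
    ax_metric.set_xlabel("epoch")
    ax_metric.set_ylabel(metric)
    ax_metric.legend()

    fig.tight_layout()
    return fig
```

After training, it would be called as `plot_learning_curves(history.history)`. A widening gap between the training and validation curves is the usual sign of overfitting.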

### 6.1 Batch size

Note: whenever you modify your model, whether a hyperparameter or any other configuration, rerun all three building blocks (model definition, compiling, training) so that the change takes effect. Otherwise, calling `fit` again continues training from the weights of the previous run.
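One way to follow this advice is to wrap the three building blocks in a function, so that every experiment starts from a freshly defined and freshly compiled model. `build_model` and `run_experiment` are hypothetical helper names, the tiny dense model stands in for the CNN from Section 4.1, and the random arrays stand in for the prepared data.

```python
import numpy as np
from tensorflow import keras

def build_model():
    """Model definition: returns a new, untrained model on every call."""
    return keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation="softmax"),
    ])

def run_experiment(x, y, batch_size=64, learning_rate=1e-3, epochs=3):
    """Rerun all three building blocks for one hyperparameter configuration."""
    model = build_model()                                              # 1. definition
    model.compile(loss="categorical_crossentropy",
                  optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  metrics=["accuracy"])                                # 2. compiling
    return model.fit(x, y, batch_size=batch_size, epochs=epochs,
                     validation_split=0.1, verbose=0)                  # 3. training

# Usage with synthetic stand-in data: compare two batch sizes
rng = np.random.default_rng(0)
x = rng.random((256, 28, 28, 1), dtype=np.float32)
y = keras.utils.to_categorical(rng.integers(0, 10, size=256), num_classes=10)

histories = {bs: run_experiment(x, y, batch_size=bs, epochs=2) for bs in (16, 128)}
```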

The effects of batch size:
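- A batch size that is too small makes the loss fluctuate more during training; in the worst case, the model does not converge properly at all.
- A batch size that is too large gives fewer weight updates per epoch and needs more memory per iteration, so training can progress slowly; sometimes the batch is too large for the program to run at all.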
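The trade-off can be made concrete by counting weight updates per epoch. The sketch below assumes 54000 training images (60000 minus a 9/1 validation split); the helper name `steps_per_epoch` is hypothetical. Smaller batches give many more, noisier updates per epoch; larger batches give fewer, smoother updates that each need more memory.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of weight updates in one epoch; the last, partial batch also counts."""
    return math.ceil(num_samples / batch_size)

num_train = 54000  # 60000 training images minus a 10% validation split
for batch_size in (8, 32, 128, 512):
    print(f"batch size {batch_size:>3}: {steps_per_epoch(num_train, batch_size)} updates per epoch")
```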

### 6.2 Learning rate

The effects of learning rates:
- A learning rate that is too small makes learning very slow, so within a given number of epochs the model may not learn enough.
- A learning rate that is too large makes fast progress at the beginning, but near convergence each step is too coarse: the weights overshoot and oscillate around the optimum, so the model may not reach the optimal level (or may even diverge).
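Both effects can be reproduced on a toy problem. The sketch below runs plain gradient descent (not Adam, which adapts its step sizes) on the one-dimensional function f(w) = w², whose minimum is at w = 0, starting from w = 5; the function and the learning rates are illustrative choices. Each step multiplies w by (1 - 2 × learning rate): 0.01 barely moves in 20 steps, 0.1 gets close to 0, 0.95 oscillates around the minimum and approaches it slowly, and 1.1 overshoots further each step and diverges.

```python
def gradient_descent(learning_rate, w0=5.0, steps=20):
    """Minimise f(w) = w**2 (minimum at w = 0) with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2 * w                   # derivative of w**2
        w = w - learning_rate * grad   # step size is set by the learning rate
    return w

for lr in (0.01, 0.1, 0.95, 1.1):
    print(f"learning rate {lr:<4}: w after 20 steps = {gradient_descent(lr):.4f}")
```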

### 6.3 More hyperparameters

Homework: modify other hyperparameters and see how they affect performance.