---
**Francois Chollet (2E, 2021)**
# **Deep Learning with Python**
---



#### **1. What is deep learning?** 
   
1. Using Keras to train, predict, and evaluate on the MNIST dataset

#### **2. The mathematical building blocks of neural networks**
1. Initialize weight and bias
2. Find prediction
3. Find loss
4. Find gradient
5. Update weight and bias
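
The five steps above can be sketched as one training loop. This is a minimal pure-Python sketch, not the book's code: it assumes a one-feature linear model `y = w * x + b`, mean squared error as the loss, and hand-derived gradients. The names `train`, `learning_rate`, `steps`, and the toy data are illustrative assumptions.

```python
def train(xs, ys, learning_rate=0.05, steps=2000):
    # 1. Initialize weight and bias (zeros are fine for a single linear unit).
    w, b = 0.0, 0.0
    n = len(xs)
    loss = None
    for _ in range(steps):
        # 2. Find the prediction for every sample.
        preds = [w * x + b for x in xs]
        # 3. Find the loss (mean squared error, an assumed choice).
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        # 4. Find the gradient of the loss with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # 5. Update weight and bias by stepping against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, loss = train(xs, ys)
```

On this toy data the loop should recover a weight close to 2 and a bias close to 1. Real networks replace the hand-written gradients with automatic differentiation.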

#### **3. Introduction to Keras and TensorFlow**
1. TensorFlow variable and constant
2. First order gradient
3. Second order gradient
4. Linear classifier — TensorFlow
5. Linear classifier — Keras Loss and Optimizer
6. Linear classifier — Keras Model

#### **4. Getting started with neural networks**
1. Common Terminologies
2. Binary Classification
3. Multiclass Classification
4. Non-Linear Scalar Regression
5. Feature-wise Normalization
6. K-Fold Cross-Validation
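
Feature-wise normalization (item 5 above) can be illustrated with a small pure-Python sketch. It assumes tabular data stored as a list of rows and that no feature is constant (a zero standard deviation would divide by zero). The helper names `fit_normalizer` and `normalize` are hypothetical.

```python
from statistics import mean, pstdev


def fit_normalizer(train_rows):
    # Compute one mean and one standard deviation per feature (column),
    # using the training data only.
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds


def normalize(rows, means, stds):
    # Center each feature on 0 and scale it to unit standard deviation.
    # Assumes every std is non-zero.
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


train_rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
test_rows = [[2.0, 400.0]]

means, stds = fit_normalizer(train_rows)
train_norm = normalize(train_rows, means, stds)
# The test data is normalized with the training statistics, never its own.
test_norm = normalize(test_rows, means, stds)
```

Reusing the training mean and standard deviation on the test rows keeps information from the test set out of the preprocessing step.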

#### **5. Fundamentals of machine learning**
1. White Noise and Zero Channels
2. Manifold Hypothesis
3. Dataset Split
4. Gradient Descent Parameters
5. Architecture Priors
6. Model Capacity
7. Feature Engineering
8. Early Stopping
9. L1/L2 Regularization
10. Dropout
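
As an illustration of the last item, dropout can be sketched in a few lines of pure Python. This is a simplified sketch, not the Keras layer: it assumes a flat list of activations and a `rate` strictly below 1, and it rescales the surviving values at training time so that inference needs no adjustment. The function name `dropout` and its `rng` parameter are hypothetical.

```python
import random


def dropout(values, rate, training, rng=random):
    # At inference time dropout is a no-op.
    if not training:
        return list(values)
    keep_prob = 1.0 - rate
    # Zero each activation with probability `rate`; scale the survivors by
    # 1 / keep_prob so the expected output stays the same as without dropout.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # seeded generator so the run is repeatable
activations = [1.0] * 10
train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False)
```

With `rate=0.5`, every training-time output is either dropped to 0.0 or doubled to 2.0, while the inference-time output equals the input.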

#### **6. The universal workflow of machine learning**
1. Task Definition
2. Model Development
3. Model Deployment

#### **7. Working with Keras: A deep dive**
1. Module Patterns:
   1. keras.Sequential
   2. keras.Model (Functional API)
   3. Subclassing keras.Model

2. Mixing Module Patterns

3. Builtin Training and Evaluation Loops
   1. MNIST using keras.Model
   2. Custom Metrics
   3. Builtin Callbacks
   4. Custom Callbacks
   5. Save and Load Model
   6. TensorBoard
   7. Training vs Inference
   8. Low-level Metrics

4. Builtin fit with custom train_step method
   1. Custom train_step
   2. Builtin fit

5. Custom Training and Evaluation Loops
   1. TensorFlow function decorator (tf.function)
   2. Custom train step
   3. Custom fit
   4. Custom test step
   5. Custom evaluate

#### **8. Introduction to deep learning for computer vision**

1. Convolution theory
2. TensorFlow Dataset API
3. Exploring max pooling and padding
4. MNIST convolutional network
5. Kaggle cats-vs-dogs dataset
6. Training from scratch:
   1. Without augmentation
   2. With augmentation
7. Training using feature extraction:
   1. Without augmentation
   2. With augmentation
8. Training using fine tuning:
   1. With augmentation
  
#### **9. Advanced deep learning for computer vision**

#### **10. Deep learning for timeseries**

#### **11. Deep learning for text**

#### **12. Generative deep learning**

#### **13. Best practices for the real world**

#### **14. Conclusions**

---
## **Most Important Research Papers (Ilya Sutskever)**
---

Ilya Sutskever to John Carmack: "If you really learn all of these, you’ll know 90% of what matters today."

1. The Annotated Transformer (https://lnkd.in/evrqygtu)
2. The First Law of Complexodynamics (https://lnkd.in/eu5aucVm)
3. The Unreasonable Effectiveness of RNNs (https://lnkd.in/e9wht6Js)
4. Understanding LSTM Networks (https://lnkd.in/eY4WnawT)
5. Recurrent Neural Network Regularization (https://lnkd.in/ebrwzuwY)
6. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (https://lnkd.in/e4f4s9h6)
7. Pointer Networks (https://lnkd.in/e6qcSXYT)
8. ImageNet Classification with Deep CNNs (https://lnkd.in/etrjwGmY)
9. Order Matters: Sequence to sequence for sets (https://lnkd.in/eYrjEHRP)
10. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (https://lnkd.in/ezFVyhyk)
11. Deep Residual Learning for Image Recognition (https://lnkd.in/ejJT79DE)
12. Multi-Scale Context Aggregation by Dilated Convolutions (https://lnkd.in/eN-p4Hi9)
13. Neural Quantum Chemistry (https://lnkd.in/eChquKQi)
14. Attention Is All You Need: Transformer (https://lnkd.in/eakhSPXf)
15. Neural Machine Translation by Jointly Learning to Align and Translate (https://lnkd.in/eZfrwxDG)
16. Identity Mappings in Deep Residual Networks (https://lnkd.in/eVuuYTTy)
17. A Simple NN Module for Relational Reasoning (https://lnkd.in/e9xYieKc)
18. Variational Lossy Autoencoder (https://lnkd.in/e8XZrzcn)
19. Relational RNNs (https://lnkd.in/eEs3e_MJ)
20. Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton (https://lnkd.in/e7V-jw8S)
21. Neural Turing Machines (https://lnkd.in/e3qidTvP)
22. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (https://lnkd.in/eYDgB9cA)
23. Scaling Laws for Neural LMs (https://lnkd.in/ev9s6Pz2)
24. A Tutorial Introduction to the Minimum Description Length Principle (https://lnkd.in/eUJtMXDU)
25. Machine Super Intelligence Dissertation (https://lnkd.in/ebCNq64x)
26. PAGE 434 onwards: Kolmogorov Complexity (https://lnkd.in/ecV-qdfV)
27. CS231n Convolutional Neural Networks for Visual Recognition (https://cs231n.github.io/)

---
## **Common Concepts**
---

#### **Terminologies:**
1. Sample / Input
2. Prediction / Output
3. Ground truth / Annotation
4. Target
5. Loss
6. Classes
7. Label
8. Binary Classification
9. Multiclass Classification
10. Multilabel Classification
11. Scalar regression
12. Vector regression
13. Batch

#### **Datasets:**
1. MNIST: multiclass, single-label classification
2. IMDB: binary classification
3. Reuters: multiclass, single-label classification
4. Boston Housing: scalar regression
5. ImageNet: image classification
6. Kaggle Cats-vs-Dogs: image classification

#### **Models:**
1. VGG16
2. Xception
3. ResNet
4. MobileNet
5. EfficientNet
6. DenseNet

 
#### **Layers (tf.keras.layers):**
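1. Dense: rank-2 input of shape (samples, features)
2. LSTM: rank-3 input of shape (samples, timesteps, features)
3. Conv1D: rank-3 input of shape (samples, timesteps, features)
4. Conv2D: rank-4 input of shape (samples, height, width, channels)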

#### **Optimizers (tf.keras.optimizers):**
1. SGD (with / without momentum)
2. RMSprop
3. Adam
4. Adagrad
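
The difference between SGD with and without momentum can be sketched on a one-parameter problem. This is a minimal pure-Python illustration, not a Keras optimizer: it assumes the classic (non-Nesterov) update `velocity = momentum * velocity - learning_rate * gradient`, and a toy loss `(w - 3) ** 2`. The names `sgd_minimize` and `grad` are hypothetical.

```python
def sgd_minimize(grad_fn, w, learning_rate=0.1, momentum=0.0, steps=500):
    velocity = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        # With momentum == 0 this reduces to plain SGD: w -= learning_rate * g.
        # With momentum > 0 the update also carries over part of the
        # previous step, which smooths the trajectory.
        velocity = momentum * velocity - learning_rate * g
        w = w + velocity
    return w


def grad(w):
    # Gradient of the toy loss (w - 3) ** 2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)


w_plain = sgd_minimize(grad, w=0.0)
w_momentum = sgd_minimize(grad, w=0.0, momentum=0.9)
```

Both runs should end close to the minimum at 3. On this convex toy problem momentum is not needed; it matters more on losses with ravines or plateaus.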

#### **Loss Functions (tf.keras.losses):**
1. CategoricalCrossentropy
2. SparseCategoricalCrossentropy
3. BinaryCrossentropy
4. MeanSquaredError (MSE)
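
The four losses above differ mainly in the target format they expect. The sketch below is a simplified pure-Python illustration for a single sample, assuming the predictions are already probabilities strictly between 0 and 1. Unlike the Keras classes, it does no clipping, batching, or reduction, and the function names are hypothetical.

```python
import math


def categorical_crossentropy(one_hot_target, probs):
    # Target is a one-hot vector; only the true-class term is non-zero.
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))


def sparse_categorical_crossentropy(class_index, probs):
    # Target is an integer class index; same value as the one-hot version.
    return -math.log(probs[class_index])


def binary_crossentropy(target, prob):
    # Target is 0 or 1; prob is the predicted probability of class 1.
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def mean_squared_error(targets, preds):
    # Average squared difference, used for regression targets.
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)


probs = [0.1, 0.7, 0.2]  # predicted class probabilities for one sample
cce = categorical_crossentropy([0, 1, 0], probs)
scce = sparse_categorical_crossentropy(1, probs)
bce = binary_crossentropy(1, 0.7)
mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])
```

Here `cce` and `scce` give the same value because they encode the same target in two formats. Which one to use depends only on whether the labels are one-hot vectors or integers.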

#### **Metrics (tf.keras.metrics):**
1. CategoricalAccuracy
2. SparseCategoricalAccuracy
3. BinaryAccuracy
4. MeanAbsoluteError (MAE)
5. Precision
6. Recall
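
Accuracy, precision, recall, and MAE can be computed by hand to see what each metric measures. This is a minimal pure-Python sketch, not the stateful Keras metric classes: it assumes hard 0/1 predictions and at least one predicted and one actual positive (otherwise a division by zero occurs). The function names are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Of everything predicted positive, how much was right?
        "precision": tp / (tp + fp),
        # Of everything actually positive, how much was found?
        "recall": tp / (tp + fn),
    }


def mean_absolute_error(targets, preds):
    # Average absolute difference, in the same units as the target.
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
mae = mean_absolute_error([3.0, 5.0], [2.0, 7.0])
```

In this example two of the three predicted positives are correct and two of the three actual positives are found, so precision and recall are both 2/3.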

#### **K-Fold Cross-Validation**
- When the dataset is small, K-fold cross-validation lets every sample be used for both training and validation
- The training dataset is split into K partitions (folds) of roughly equal size
- Each iteration holds out a different fold as the validation set
- In each iteration:
  - K-1 partitions are used for training
  - 1 partition is used for validation
- The final validation score is the average of the K per-fold validation scores
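
The procedure above can be sketched in pure Python. This is a minimal sketch under stated assumptions: folds are contiguous slices (shuffle the data first if it is ordered), the last fold absorbs any remainder, and `train_and_score` is a hypothetical callable standing in for "build a fresh model, fit it, and return its validation score". The `mean_baseline` scorer is only a placeholder for a real model.

```python
def k_fold_indices(num_samples, k):
    # Split range(num_samples) into k contiguous folds of near-equal size.
    fold_size = num_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder.
        stop = num_samples if fold == k - 1 else start + fold_size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, num_samples))
        yield train_idx, val_idx


def cross_validate(samples, targets, k, train_and_score):
    scores = []
    for train_idx, val_idx in k_fold_indices(len(samples), k):
        x_train = [samples[i] for i in train_idx]
        y_train = [targets[i] for i in train_idx]
        x_val = [samples[i] for i in val_idx]
        y_val = [targets[i] for i in val_idx]
        scores.append(train_and_score(x_train, y_train, x_val, y_val))
    # The validation score is the average over all folds.
    return sum(scores) / len(scores)


def mean_baseline(x_train, y_train, x_val, y_val):
    # Placeholder "model": predict the training mean, score with MAE.
    prediction = sum(y_train) / len(y_train)
    return sum(abs(y - prediction) for y in y_val) / len(y_val)


samples = list(range(6))
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
folds = list(k_fold_indices(len(samples), 3))
score = cross_validate(samples, targets, 3, mean_baseline)
```

Each sample lands in exactly one validation fold and in K-1 training sets, which is what makes the averaged score less sensitive to any single split.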

#### **Training Parameters**
- **Network Parameters (learned during training):**
  - Weight
  - Bias

- **Hyperparameters (set before training):**
  - Number of layers
  - Learning rate
  - Batch size
  - Optimizer
  - Number of epochs

#### **Training Priors**
  - Dataset
  - Model

#### **Overfit and underfit**
  - Overfit
    - From features point-of-view:
      - Many features
      - Small dataset
      - High information density
    - From model point-of-view:
      - Many parameters / layers (large model)
      - Small dataset
      - High learning capacity
    - Given infinite data, a model would never overfit
  - Underfit
    - From features point-of-view:
      - Few features
      - Large dataset
      - Low information density
    - From model point-of-view:
      - Few parameters / layers (small model)
      - Large dataset
      - Low learning capacity
  - Recommendation:
    - Medium information density (number of features relative to dataset size)
    - Medium learning capacity (number of layers relative to dataset size)

---