# 2.0. Introduction to Deep Learning

**Learning Objectives:** By the end of this lesson, you should be able to:

* Understand what deep learning is and how it differs from traditional machine learning.
* Understand the basic structure of a neural network.
* Gain familiarity with key concepts like neurons, layers, activation functions, and loss functions.
* Build and train a simple deep learning model using TensorFlow or PyTorch.
* Evaluate a deep learning model on a sample dataset.
  
Deep Learning is a subset of machine learning, which itself is a subset of artificial intelligence. It uses neural networks with many layers (hence the term "deep") to learn from large amounts of data.

**Neural Networks:** Neural networks are inspired by the human brain and consist of interconnected "neurons" (nodes). These networks learn patterns and relationships in data to make predictions or decisions.

**Key difference from traditional ML:** In traditional machine learning, you typically engineer features by hand or derive them with techniques like PCA before training a model such as a decision tree. In deep learning, the model learns useful features automatically, directly from raw data (e.g., images, text).

# 2.1. Key Concepts in Deep Learning

* **Loss Function:** A function that measures how far the model's predictions are from the true labels. The goal of training is to minimize it. Common choices are cross-entropy loss, which is typically used for classification tasks, and mean squared error (MSE), which is typically used for regression tasks.
* **Optimization:** Gradient Descent is the most commonly used optimization algorithm. It iteratively adjusts the weights to minimize the loss function. Some variants of Gradient Descent:
  - Batch Gradient Descent: Uses the entire dataset to compute the gradient.
  - Stochastic Gradient Descent (SGD): Uses one data point at a time.
  - Mini-Batch Gradient Descent: Uses a small subset of data.
* **Overfitting and Regularization:** Overfitting occurs when the model learns the training data too well and loses the ability to generalize to new data. Regularization techniques like dropout, L2 regularization, and early stopping help prevent overfitting.
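
The three Gradient Descent variants listed above differ only in how many samples feed each weight update. The sketch below illustrates this with plain NumPy on a linear model and the MSE loss. The synthetic data, the `train_linear` helper, and the learning rates are assumptions chosen for the demonstration, not part of the lesson's code.

```python
import numpy as np

def train_linear(X, y, batch_size, lr=0.1, epochs=200, seed=0):
    """Fit y ~ X @ w + b by gradient descent on the MSE loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)               # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]        # prediction error on this batch
            w -= lr * 2 * X[idx].T @ err / len(idx)  # gradient of MSE w.r.t. w
            b -= lr * 2 * err.mean()                 # gradient of MSE w.r.t. b
    return w, b

# Synthetic, noise-free data (an assumption for the demo): y = 2*x0 - 3*x1 + 1
rng = np.random.default_rng(42)
X = rng.normal(size=(256, 2))
y = X @ np.array([2.0, -3.0]) + 1.0

results = {
    "batch (all 256 samples)": train_linear(X, y, batch_size=256),
    "stochastic (1 sample)": train_linear(X, y, batch_size=1, lr=0.01, epochs=20),
    "mini-batch (32 samples)": train_linear(X, y, batch_size=32, epochs=50),
}
for name, (w, b) in results.items():
    print(f"{name}: w={np.round(w, 3)}, b={b:.3f}")
```

All three runs should recover weights close to `[2, -3]` and a bias close to `1`. Batch Gradient Descent makes one update per epoch, the stochastic variant makes 256, and the mini-batch variant sits in between.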
                     
# 2.2. Neural Networks
## 2.2.1. Basic Structure of a Neural Network

* **Neurons:** Basic units that take inputs, multiply them by weights, add a bias, pass the result through an activation function, and output a value.
* **Layers:** Groups of neurons arranged in sequence:
  - **Input Layer:** The initial layer that receives the raw input data (e.g., pixels of an image, or features of a dataset).
  - **Hidden Layers:** Layers between the input and output layers where the model learns to extract features.
  - **Output Layer:** The final layer that produces the prediction or output of the model.
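
As a minimal illustration of what a single neuron computes, the sketch below uses only the Python standard library. The input, weight, and bias values are made up, and the helper names (`neuron`, `relu`, `sigmoid`) are not from any library.

```python
import math

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values, chosen by hand
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]
bias = 0.5

# Weighted sum: 0.5*1 - 1.0*2 + 0.25*3 + 0.5 = -0.25
print(neuron(inputs, weights, bias, relu))       # 0.0 (negative sums are clipped)
print(neuron(inputs, weights, bias, sigmoid))    # about 0.438
print(neuron(inputs, weights, bias, math.tanh))  # about -0.245
```

The same weighted sum produces three different outputs, which is why the choice of activation function matters.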

## 2.2.2. Key Components of Neural Networks

* **Weights:** These determine the strength of the connection between neurons.
* **Biases:** Constant offsets added to the weighted sum of a neuron's inputs. They let the neuron shift its output independently of the inputs.
* **Activation Function:** After the weighted sum of inputs is calculated, the activation function applies a non-linear transformation that determines how strongly the neuron "fires". Common activation functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh, among many others.
* **Forward Propagation:** The process of passing the input data through the network layer by layer to make a prediction.
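* **Backpropagation:** A method for computing the gradient of the loss with respect to every weight by applying the chain rule backward through the network. An optimization algorithm such as Gradient Descent then uses these gradients to update the weights and reduce the error between the predicted and actual output.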
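
To make forward propagation and backpropagation concrete, here is a small NumPy sketch of a network with one hidden layer. The layer sizes, random data, and learning rate are arbitrary assumptions. In practice, frameworks like TensorFlow and PyTorch compute these gradients automatically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 3 inputs -> 4 hidden units (tanh) -> 1 output (linear)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
X = rng.normal(size=(8, 3))   # 8 made-up samples
y = rng.normal(size=(8, 1))   # made-up targets

def forward(X, W1, b1, W2, b2):
    """Forward propagation: pass the inputs through each layer in turn."""
    h = np.tanh(X @ W1 + b1)   # hidden activations
    y_hat = h @ W2 + b2        # network output
    return h, y_hat

def mse(y_hat, y):
    return np.mean((y_hat - y) ** 2)

def backward(X, y, h, y_hat, W2):
    """Backpropagation: apply the chain rule from the loss back to each weight."""
    n = X.shape[0]
    d_out = 2 * (y_hat - y) / n                # dLoss/dy_hat
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0)
    return dW1, db1, dW2, db2

# One gradient descent step using the backpropagated gradients
h, y_hat = forward(X, W1, b1, W2, b2)
loss_before = mse(y_hat, y)
dW1, db1, dW2, db2 = backward(X, y, h, y_hat, W2)

lr = 0.01
W1_new, b1_new = W1 - lr * dW1, b1 - lr * db1
W2_new, b2_new = W2 - lr * dW2, b2 - lr * db2
loss_after = mse(forward(X, W1_new, b1_new, W2_new, b2_new)[1], y)
print(f"loss before: {loss_before:.4f}, after one step: {loss_after:.4f}")
```

A common sanity check for hand-written gradients is to compare them against finite differences: nudge one weight by a tiny amount and confirm that the change in loss matches the corresponding entry of `dW1` or `dW2`.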

## 2.2.3. Types of Neural Networks
Deep learning uses several types of network architecture:

* **Feedforward Neural Networks (FNN):**
  - The most basic type of neural network, where information flows in one direction, from input to output.
  - Used for simple tasks like classification or regression.
* **Convolutional Neural Networks (CNNs):**
  - Primarily used for image data (e.g., object detection, image classification).
  - Uses convolutional layers that apply filters to input images to extract spatial hierarchies of features.
  - Contains pooling layers to reduce dimensionality.
* **Recurrent Neural Networks (RNNs):**
  - Designed for sequential data like time series or text.
  - RNNs have "memory" that allows them to retain information from previous time steps in the sequence.
* **Generative Adversarial Networks (GANs):**
  - A class of models that consist of two networks (a generator and a discriminator) that are trained together in a competitive setting.
  - Used for tasks like image generation, style transfer, and data augmentation.
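
The convolution and pooling operations mentioned for CNNs can be sketched in a few lines of NumPy. This is a simplified illustration (single channel, no padding, stride 1), and the vertical-edge filter is written by hand here, whereas a real CNN learns its filters during training.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like most deep learning libraries, this computes a cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size   # drop rows/columns that do not fit
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))

# A 6x6 image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

features = conv2d(image, kernel)   # shape (4, 4); every row is [0, 3, 3, 0]
pooled = max_pool2d(features)      # shape (2, 2); every value is 3
print(features)
print(pooled)
```

The filter responds only where the window straddles the dark-to-bright boundary. Pooling then halves each spatial dimension while keeping the strongest response.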

**Applications of deep learning:**
* Image recognition (e.g., identifying objects in photos)
* Game playing (e.g., AlphaGo)
* Speech recognition (e.g., Siri, Alexa)
* Natural Language Processing (e.g., BERT, GPT for text generation and translation)
* Autonomous vehicles (e.g., self-driving cars using vision and sensor data)

# 2.3. Steps in Building a Deep Learning Model
1. **Define the Problem**
* Objective: Identify the problem you are trying to solve (e.g., classification, regression, image recognition, etc.).
* Input/Output: Understand what the input data is (e.g., images, text, time-series) and what the desired output is (e.g., class labels, continuous values, sequences).
* Evaluation Metric: Choose an appropriate metric to evaluate model performance (e.g., accuracy, F1-score, mean squared error, etc.).
* Example: For image classification, the goal is to assign each image to a class label, and the metric could be accuracy.

2. **Prepare the Data**
* Collect Data: Gather the relevant dataset for your task. For supervised learning, you need labeled data.
* Clean Data: Handle missing values, remove duplicates, and address any data quality issues.
* Preprocess Data:
  - Scaling/Normalization: Scale features to a similar range (e.g., normalizing pixel values between 0 and 1 for image data).
  - Encoding: Convert categorical variables to numerical format (e.g., one-hot encoding for categorical labels).
  - Reshaping: For image data, ensure that the images are reshaped into a consistent size (e.g., 28x28 pixels for MNIST).
* Split Data: Split the dataset into training, validation, and test sets (typically 70%-80% for training, 10%-15% for validation, and 10%-15% for testing).
* Example: For image data, resize the images to a fixed size (e.g., 28x28 pixels) and normalize the pixel values to a range of 0-1.
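
The preprocessing steps above can be sketched with NumPy alone. The random stand-in images and labels below are assumptions that mimic the shape of MNIST. With real data you would load your own arrays instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (an assumption): 1,000 grayscale 28x28 images, 10 classes
images = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=1000)

# Scaling: map pixel values from [0, 255] to [0, 1]
X = images.astype("float32") / 255.0

# Reshaping: flatten each image into a 784-element vector
X = X.reshape(len(X), -1)

# Encoding: one-hot encode the integer labels
y = np.eye(10, dtype="float32")[labels]

# Splitting: shuffle once, then cut into 70% / 15% / 15%
order = rng.permutation(len(X))
n_train = len(X) * 70 // 100
n_val = len(X) * 15 // 100
train_idx = order[:n_train]
val_idx = order[n_train:n_train + n_val]
test_idx = order[n_train + n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(X_train.shape, X_val.shape, X_test.shape)  # (700, 784) (150, 784) (150, 784)
```

Shuffling before splitting matters when the dataset is ordered (for example, sorted by class), since a plain slice would otherwise give the three sets different class distributions.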

3. **Build the Neural Network Architecture**
* Define Layers: Specify the number of layers and the type of layers to use in the model:
  - Input Layer: The first layer that takes in the input data.
  - Hidden Layers: Layers in between the input and output layers where the learning happens. You can experiment with different architectures (e.g., shallow vs. deep networks).
  - Output Layer: The last layer that produces the output (e.g., a single neuron for regression, or multiple neurons with softmax for classification).
  - Activation Functions: Choose appropriate activation functions for each layer (e.g., ReLU for hidden layers, softmax for the output layer in classification).
* Choice of Model Type:
  - Feedforward Neural Networks (FNN) for basic tasks.
  - Convolutional Neural Networks (CNNs) for image data.
  - Recurrent Neural Networks (RNNs) for sequential data (e.g., text, time-series).
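
One quick way to compare candidate architectures is to count their trainable parameters. The sketch below does this for fully connected layers. The `dense_params` helper is illustrative, and the layer sizes match the 784-128-10 MNIST network used in this lesson's examples.

```python
def dense_params(n_inputs, n_units):
    """A dense layer has one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer sizes of a 784 -> 128 -> 10 network for flattened 28x28 images
layer_sizes = [28 * 28, 128, 10]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    count = dense_params(n_in, n_out)
    total += count
    print(f"{n_in} -> {n_out}: {count} parameters")
print(f"total: {total}")  # 100480 + 1290 = 101770
```

Almost all of the parameters sit in the first layer, because its input is the widest. Adding hidden layers or units increases capacity but also the risk of overfitting.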

4. **Choose a Loss Function**
* The loss function measures how well the model's predictions match the actual results. The goal during training is to minimize the loss.

* Common Loss Functions:
  - Binary Cross-Entropy: Used for binary classification.
  - Categorical Cross-Entropy: Used for multi-class classification.
  - Mean Squared Error (MSE): Used for regression tasks.
* Example: If you're doing a binary classification task, you might use binary cross-entropy as the loss function.
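
The three loss functions listed above are short enough to write out directly. The NumPy sketch below is for illustration only: the `EPS` clipping constant and the sample values are assumptions, and framework implementations handle more edge cases.

```python
import numpy as np

EPS = 1e-7  # keeps log() away from zero

def binary_cross_entropy(y_true, p):
    """y_true holds 0/1 labels; p holds predicted probabilities of class 1."""
    p = np.clip(p, EPS, 1 - EPS)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_cross_entropy(y_true_onehot, probs):
    """Each row of probs is a predicted distribution over the classes."""
    probs = np.clip(probs, EPS, 1.0)
    return -np.mean(np.sum(y_true_onehot * np.log(probs), axis=1))

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Binary: two samples, predicted probability of the positive class
print(binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))
# Multi-class: the true class (index 1) gets probability 0.7
print(categorical_cross_entropy(np.array([[0.0, 1.0, 0.0]]),
                                np.array([[0.2, 0.7, 0.1]])))
# Regression: errors of 0.5 and -1.0
print(mean_squared_error(np.array([3.0, 5.0]), np.array([2.5, 6.0])))  # 0.625
```

Both cross-entropy losses penalize confident wrong predictions heavily, because the logarithm grows without bound as the probability assigned to the true class approaches zero.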

5. **Choose an Optimizer**
* The optimizer adjusts the weights of the network based on the gradients of the loss function, which are computed by backpropagation.
* Popular Optimizers:
  - Stochastic Gradient Descent (SGD): Simple and widely used.
  - Adam: Adaptive moment estimation (a variant of SGD) that often works well out of the box for deep learning models.
  - RMSprop: Another variant of SGD that adapts the learning rate during training.
* Example: Adam is often a good choice for training neural networks.
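
To show how the update rules differ, the sketch below applies plain gradient descent and Adam to a one-dimensional toy loss, `f(w) = (w - 3)^2`. The toy loss and step counts are assumptions for illustration. The Adam hyperparameters (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the commonly used defaults.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def sgd(w, steps, lr=0.1):
    """Plain gradient descent: step against the gradient with a fixed rate."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: scale each step using running averages of g and g^2."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (mean of squares)
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

print(f"SGD : {sgd(0.0, steps=500):.4f}")
print(f"Adam: {adam(0.0, steps=500):.4f}")
```

Both should end close to `w = 3`. On this simple problem plain gradient descent is perfectly adequate. Adam's per-parameter step scaling tends to help more on real networks, where gradient magnitudes vary widely between parameters.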

6. **Compile the Model**
* Compile the model by specifying the loss function, optimizer, and evaluation metrics. This step prepares the model for training.
```python
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```

7. **Train the Model**
* Feed the training data: Train the model on the training set using the fit() method.
* Validation Data: Use validation data to monitor the performance of the model on unseen data during training.
* Epochs: The number of times the entire training dataset is passed through the network.
* Batch Size: The number of samples processed before the model's internal parameters are updated.
```python
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val))
```

During training, the model adjusts its weights to minimize the loss function, and validation accuracy helps track its generalization performance.
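
Epochs and batch size together determine how many weight updates happen during training. The short calculation below uses the numbers from this lesson's MNIST example (60,000 training images, `validation_split=0.2`, `batch_size=32`, 5 epochs). The `steps_per_epoch` helper is illustrative, not a Keras function.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Weight updates in one pass over the data (the last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)

n_total = 60_000               # MNIST training images
n_val = n_total * 20 // 100    # held out by validation_split=0.2
n_train = n_total - n_val      # 48,000 samples used for weight updates

batch_size, epochs = 32, 5
per_epoch = steps_per_epoch(n_train, batch_size)
print(per_epoch)                            # 1500
print(per_epoch * epochs)                   # 7500 weight updates in total
print(steps_per_epoch(10_000, batch_size))  # 313 batches for the 10,000 test images
```

These figures match the `1500/1500` and `313/313` counters that Keras prints in the MNIST training output in this lesson.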

8. **Evaluate the Model**
* Once training is complete, evaluate the model on the test data to see how well it generalizes to unseen data.
* Metrics: Use appropriate metrics (e.g., accuracy, precision, recall, F1-score) to assess the performance.
```python
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_acc:.4f}')
```
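
Accuracy alone can be misleading when classes are imbalanced, which is why precision, recall, and F1-score are listed above. The sketch below computes all four for binary labels in plain Python. The labels are made up, and `classification_metrics` is an illustrative helper rather than a library function.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 8 negatives and 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Here the model is right 80% of the time, yet it finds only one of the two positive samples and one of its two positive predictions is wrong. The lower precision, recall, and F1 values expose this.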

9. **Tuning Hyperparameters**
* Hyperparameter Tuning: Experiment with different configurations such as the number of layers, number of neurons in each layer, learning rate, batch size, and optimizer choice.
* Grid Search or Random Search can be used to systematically search over a range of hyperparameters.
* Use cross-validation (e.g., k-fold cross-validation) to ensure that the model performs well on different subsets of the data.
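
Grid Search and Random Search differ only in how candidate configurations are generated. The sketch below shows both using the standard library. The search space and the `validation_score` function are hypothetical placeholders: a real version would train a model for each configuration and return its validation metric.

```python
import itertools
import random

# Hypothetical search space; the names and values are illustrative only
search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [32, 64],
    "hidden_units": [64, 128],
}

def grid_search(space):
    """Yield every combination of the listed values."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def random_search(space, n_trials, seed=0):
    """Yield n_trials configurations sampled uniformly from the listed values."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {k: rng.choice(v) for k, v in space.items()}

def validation_score(config):
    """Placeholder score: a real version would train and validate a model.
    This stand-in simply prefers learning_rate = 0.01 and more hidden units."""
    return -abs(config["learning_rate"] - 0.01) + config["hidden_units"] / 1e4

grid = list(grid_search(search_space))
print(len(grid))                        # 3 * 2 * 2 = 12 combinations
best = max(grid, key=validation_score)
print(best)

sampled = list(random_search(search_space, n_trials=5))
print(len(sampled))                     # 5
```

The grid grows multiplicatively with every added hyperparameter, so Random Search with a fixed trial budget is often the more practical choice when each trial means training a network.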

10. **Save and Deploy the Model**
* Save the trained model for future use, so you don’t have to retrain it every time.
* Export the model (e.g., in the legacy .h5 format in Keras; recent Keras versions recommend the native .keras format instead) and deploy it in production or integrate it into an application.
```python
model.save('model.h5')  # Save the model
```
* Deployment: The model can be deployed via cloud services (e.g., AWS, Google Cloud) or integrated into a web application, mobile app, or IoT device.

**Example Workflow in TensorFlow/Keras (Classification task):**                               

In [1]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 28 * 28) / 255
X_test = X_test.reshape(10000, 28 * 28) / 255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build model
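model = Sequential([
    tf.keras.Input(shape=(28 * 28,)),  # 784 flattened pixel values per image
    Dense(128, activation='relu'),     # Hidden layer
    Dense(10, activation='softmax')    # One output probability per digit class
])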

# Compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model
model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

# Evaluate model
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_acc}')

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
Epoch 1/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 577us/step - accuracy: 0.8666 - loss: 0.4789 - val_accuracy: 0.9559 - val_loss: 0.1549
Epoch 2/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 534us/step - accuracy: 0.9591 - loss: 0.1394 - val_accuracy: 0.9628 - val_loss: 0.1257
Epoch 3/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 544us/step - accuracy: 0.9747 - loss: 0.0888 - val_accuracy: 0.9695 - val_loss: 0.0982
Epoch 4/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 537us/step - accuracy: 0.9815 - loss: 0.0621 - val_accuracy: 0.9718 - val_loss: 0.0956
Epoch 5/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 561us/step - accuracy: 0.9857 - loss: 0.0481 - val_accuracy: 0.9730 - val_loss: 0.0863
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 220us/step - accuracy: 0.9702 - loss: 0.0911
Test accuracy: 0.9747999906539917


**Example Workflow in TensorFlow/Keras (Regression task):** 

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Step 1: Load the data
# Assuming you have a CSV file where 'target' is the column to predict
data = pd.read_csv("data.csv")  # Replace with your actual dataset

# Separate features and target
X = data.drop(columns=['target'])  # Features (all columns except 'target')
y = data['target']  # Target variable

# Step 2: Data Preprocessing
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling (important for neural networks)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 3: Build the Model
# Create a simple feedforward neural network for regression
model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),  # Input: one value per feature
    keras.layers.Dense(64, activation='relu'),  # First hidden layer with 64 neurons and ReLU activation
    keras.layers.Dense(32, activation='relu'),  # Second hidden layer with 32 neurons and ReLU activation
    keras.layers.Dense(1)  # Output layer for regression (1 output for continuous prediction)
])

# Step 4: Compile the Model
model.compile(optimizer='adam',  # Optimizer (Adam is a good default choice)
              loss='mean_squared_error',  # Loss function (MSE for regression)
              metrics=['mae'])  # We can also track Mean Absolute Error (MAE)

# Step 5: Train the Model
history = model.fit(X_train, y_train, 
                    epochs=50,  # Number of epochs
                    batch_size=32,  # Batch size
                    validation_split=0.2,  # Hold out 20% of the training data for validation; the test set stays unseen
                    verbose=1)  # Verbosity level (1: shows progress bar)

# Step 6: Evaluate the Model
# Evaluate the model on the test data
test_loss, test_mae = model.evaluate(X_test, y_test, verbose=0)

# Print the evaluation metrics
print(f"Test Loss (MSE): {test_loss}")
print(f"Test MAE (Mean Absolute Error): {test_mae}")

# Step 7: Make Predictions
y_pred = model.predict(X_test)

# Step 8: Calculate the Mean Squared Error (MSE) manually
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on Test Set: {mse}")

# Optional: Save the trained model
model.save("regression_model.h5")  # Save the model in HDF5 format

# Optional: Load the saved model and make predictions
loaded_model = keras.models.load_model("regression_model.h5")
y_loaded_pred = loaded_model.predict(X_test)

**Model Performance Visualizations:**
You can also plot the training and validation loss/MAE curves to visualize model performance:

In [None]:
import matplotlib.pyplot as plt

# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

# Plot training & validation MAE values
plt.plot(history.history['mae'])
plt.plot(history.history['val_mae'])
plt.title('Model MAE')
plt.xlabel('Epochs')
plt.ylabel('Mean Absolute Error')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
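
The validation-loss curve is also what early stopping, one of the regularization techniques from Section 2.1, monitors. The sketch below shows the underlying logic in plain Python. The loss values are made up, and `early_stopping_epoch` is an illustrative helper. In Keras, the `tf.keras.callbacks.EarlyStopping` callback provides this behavior during `fit()`.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Made-up validation losses: improving until epoch 4, then creeping upward
val_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.43, 0.47, 0.55]
stop, best = early_stopping_epoch(val_losses)
print(stop, best)  # 7 4
```

Training would stop at epoch 7, and the weights from epoch 4 (the lowest validation loss) are the ones worth keeping. A validation loss that rises while the training loss keeps falling is the typical signature of overfitting.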

# 2.4. Example: Building a Simple Deep Learning Model
**Step-by-Step Code Walkthrough:**

**1. Import necessary libraries:**

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

**2. Load the Dataset (Using MNIST, a dataset of handwritten digits)**

In [None]:
# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Preprocess the data
X_train = X_train.reshape(60000, 28 * 28)  # Flatten the images
X_test = X_test.reshape(10000, 28 * 28)
X_train = X_train.astype('float32') / 255  # Normalize the data
X_test = X_test.astype('float32') / 255

# One-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

**3. Build the Neural Network Model**

In [None]:
# Build a simple feedforward neural network
model = Sequential()
model.add(tf.keras.Input(shape=(28 * 28,)))  # 784 flattened pixel values per image
model.add(Dense(128, activation='relu'))  # First hidden layer
model.add(Dense(10, activation='softmax'))  # Output layer for 10 classes

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

**4. Train the Model**

In [None]:
# Train the model on the training data
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=1)

**5. Evaluate the Model**

In [None]:
# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_acc:.4f}')

**Homework:** Experiment with different neural network architectures (e.g., adding more hidden layers, changing activation functions) and see how the performance changes.

**Resources:**
* TensorFlow Documentation
* PyTorch Documentation
* Deep Learning with Python by François Chollet (Book)