# Deep Learning Basics

## Objectives

- Demonstrate the basics of setting up and training deep learning models using TensorFlow and Keras.
- Understand and visualize how different data distributions affect model training and performance.
- Compare the impact of different neural network architectures and training parameters on model accuracy.

## Background

This notebook explores deep learning techniques, focusing on creating and training neural networks to classify structured data and understanding the influence of different architectures.

## Datasets Used

- Synthetic Dataset 1: Two adjacent classes generated with `make_blobs` (1000 samples), used as a simple binary classification problem.
- Synthetic Dataset 2: Two circular classes generated with `make_circles`, whose non-linear class boundary tests how well a model handles data that is not linearly separable.
- MNIST Dataset: Handwritten digits that serve as a benchmark for evaluating image classification models.

## Introduction to Deep Learning

In [1]:
import numpy as np
import pandas as pd

from PIL import Image

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook_connected"

In [2]:
from sklearn.datasets import make_blobs, make_circles
from sklearn.model_selection import train_test_split

In [3]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam

Deep learning is a potent subset of machine learning that uses neural networks for modeling intricate data relationships and patterns.

![](\AI_ML_DL.png)

The "deep" in deep learning stands for this idea of successive layers of representations in neural networks. The number of layers that contribute to a model is called the depth of the model. 

## Example 1: Two Adjacent Classes

In [4]:
# Generating the data
X, cl = make_blobs(n_samples=1000, centers=2, n_features=2, random_state=54, cluster_std=2)

dfb = pd.DataFrame()
dfb['X1'] = X[:,0]
dfb['X2'] = X[:,1]
dfb['Class'] = cl
dfb.head()

|   | X1 | X2 | Class |
|---|---|---|---|
| 0 | -2.151579 | -3.291621 | 0 |
| 1 | -4.176291 | -2.153766 | 0 |
| 2 | 0.602819 | 0.401883 | 0 |
| 3 | -10.203529 | 1.738114 | 1 |
| 4 | -5.573777 | -1.308363 | 1 |


In [5]:
# Plotting the data
px.scatter(x=dfb.X1, y=dfb.X2, color=dfb.Class.astype('str'), 
           color_discrete_map={'0':'cornflowerblue', '1':'indianred'},  
           labels={'color': 'Class'}, title="Make Blobs Data", 
           width=800, height=500)            

In [6]:
X = dfb[['X1','X2']]    # Feature Matrix
y = dfb.Class           # Target variable

In [7]:
# Creating training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=23)
print('Train = %i cases \t Test = %i cases' %(len(X_train), len(X_test)))

Train = 800 cases 	 Test = 200 cases


In [8]:
# Creating a DataFrame for plotting
mapping_train = {0: '0-train', 1: '1-train'}
mapping_test  = {0: '0-test', 1: '1-test'}
train = pd.concat([X_train, y_train.map(mapping_train)], axis=1)
test = pd.concat([X_test, y_test.map(mapping_test)], axis=1)
t = pd.concat([train, test], axis=0)
t.sample(5)

|   | X1 | X2 | Class |
|---|---|---|---|
| 246 | -1.651582 | -7.097987 | 0-train |
| 363 | 1.280587 | -2.500779 | 0-test |
| 951 | -6.082064 | -1.639193 | 1-test |
| 757 | -4.007715 | -1.20701 | 1-train |
| 9 | 0.480565 | -1.826865 | 0-train |


In [9]:
# Plotting training and testing sets
px.scatter(t, x='X1', y='X2', color=t.Class.astype(str), 
            color_discrete_map={'0-train':'cornflowerblue', '1-train':'indianred', 
                                '0-test':'darkblue', '1-test':'darkred'},  
            labels={'color': 'Class'}, title="Make Blobs Data: Training and Testing Sets", 
            width=800, height=500)  

Any neural network has the following:
- `layers` combined into a network (or model)
- `input data` and `target`
- `loss function`, which defines the feedback signal used for learning
- `optimizer`, which determines how learning proceeds
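
To make these four ingredients concrete, here is a minimal NumPy sketch of a single sigmoid unit trained on made-up data. The toy dataset, the names `X_toy`, `y_toy`, `predict`, and `bce`, and the learning rate are all assumptions for illustration, and plain gradient descent stands in for the `Adam` optimizer used later in this notebook.

```python
import numpy as np

# Input data and target (hypothetical): 200 points in 2D, label 1 when x1 + x2 > 0
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)

# "Layer": a single sigmoid unit with weights w and bias b
w = np.zeros(2)
b = 0.0

def predict(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Loss function: binary cross-entropy averaged over the samples
def bce(y_true, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

loss_before = bce(y_toy, predict(X_toy, w, b))

# Optimizer: plain gradient descent (a simpler stand-in for Adam)
learning_rate = 0.5
for _ in range(100):
    p = predict(X_toy, w, b)
    grad_w = X_toy.T @ (p - y_toy) / len(y_toy)   # gradient of the loss w.r.t. w
    grad_b = np.mean(p - y_toy)                   # gradient of the loss w.r.t. b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

loss_after = bce(y_toy, predict(X_toy, w, b))
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

The loss is the feedback signal, and the update rule inside the loop is what the optimizer contributes.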


Let's define a sequential model with the following:
- one hidden layer with 10 nodes and `relu` activation function,
- an output layer with one node and a `sigmoid` activation function.


The argument passed to the Dense layer (`10`) is its number of hidden units or nodes. Having more hidden units (a higher-dimensional representation space) allows the network to learn more complex representations, but it makes the network more computationally expensive and may lead to overfitting.

The `Input(shape=(2,))` layer declares that each sample has `2` input variables: `X1` and `X2`.

In [10]:
# Define model
model = Sequential([
    Input(shape=(2,)),  # Specify the input shape explicitly
    Dense(10, activation='relu'),
    Dense(1, activation='sigmoid')
])

Without an activation function like `relu` (a non-linearity), the `Dense` layer would consist of two linear operations: a dot product and an addition. 

The layer could only learn linear transformations of the input data: the hypothesis space of the layer would be the set of all possible linear transformations of the input data into a 10-dimensional space. Such a hypothesis space is too restricted and would not benefit from multiple layers of representations: adding more layers would not extend the hypothesis space.

In conclusion, to obtain a non-linear representation, we use an `activation function`.
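
The argument above can be checked numerically. The following sketch uses random weights (not the weights of the model in this notebook) to show that two stacked linear layers compute the same mapping as one linear layer, and that inserting `relu` breaks this equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))          # 5 samples, 2 features

# Two "Dense" layers with no activation: random weights and biases
W1, b1 = rng.normal(size=(2, 10)), rng.normal(size=10)
W2, b2 = rng.normal(size=(10, 1)), rng.normal(size=1)

two_linear_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping written as a single linear layer
W, b = W1 @ W2, b1 @ W2 + b2
one_linear_layer = x @ W + b
print(np.allclose(two_linear_layers, one_linear_layer))   # True

# With relu in between, the collapse no longer holds in general
with_relu = np.maximum(x @ W1 + b1, 0) @ W2 + b2
print(np.allclose(with_relu, one_linear_layer))
```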

In [11]:
model.summary()
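
The parameter counts that `model.summary()` reports can be reproduced by hand: a `Dense` layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name introduced only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(2, 10)    # 2 inputs -> 10 hidden units
output = dense_params(10, 1)    # 10 hidden units -> 1 output
print(hidden, output, hidden + output)   # 30 11 41
```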

We compile the model using:
- binary cross-entropy loss (`binary_crossentropy`) because we are solving a binary classification problem, 
- the `Adam` optimizer with a learning rate of 0.01, 
- and `accuracy` as the evaluation metric.
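
As a rough illustration of what the loss measures, the sketch below computes binary cross-entropy with NumPy for a few hand-picked predictions. The function name and the clipping constant `eps` are assumptions made for this example, not the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps (an assumption here) avoids log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

print(binary_crossentropy([1, 0], [0.9, 0.1]))    # confident and correct: small loss
print(binary_crossentropy([1, 0], [0.5, 0.5]))    # undecided: log(2), about 0.693
print(binary_crossentropy([1, 0], [0.01, 0.99]))  # confident and wrong: above 1
```

The loss is never negative, approaches 0 for confident correct predictions, and grows without bound for confident wrong ones.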

In [12]:
# Compile model
model.compile(loss='binary_crossentropy', 
              optimizer=Adam(learning_rate=0.01), 
              metrics=['accuracy'])

In [13]:
# Number of epochs: full passes of the entire training dataset through the neural network
epochs = 50

Training a neural network involves using a set of input data (`X_train`) along with their corresponding output labels (`y_train`) to optimize the model's parameters such that it can accurately predict the output labels for new input data.

`model.fit()` is a function in Keras that trains the model using the training data. During training, the model is fed a batch of input data (of size `batch_size`) and their corresponding output labels. The model makes a prediction on this batch of data, computes the loss (or error) between the predicted labels and the true labels, and updates its parameters to reduce the loss. This process is repeated for a number of `epochs`, where each epoch corresponds to one full pass over the entire training dataset.

By specifying a batch size of 10, we are telling the model to update its parameters after every 10 samples. This is a technique called mini-batch gradient descent, which is more computationally efficient than updating the parameters after every single sample, but still allows the model to update its parameters frequently enough to converge to a good solution.
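
A quick calculation shows how many parameter updates this implies, assuming the 800 training samples from the split above. The result matches the `80/80` step counter that Keras prints for each epoch.

```python
import math

n_train = 800      # training samples in this example
batch_size = 10
epochs = 50

steps_per_epoch = math.ceil(n_train / batch_size)   # parameter updates per epoch
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)               # 80 4000
```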

In [14]:
# Train model
history1 = model.fit(X_train, y_train,                  # training data
                    epochs=epochs,                      # number of epochs 
                    validation_data=(X_test, y_test),   # to evaluate the model's performance on unseen data
                    batch_size=10);                     # number of observations per batch

Epoch 1/50
80/80 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7958 - loss: 0.4412 - val_accuracy: 0.8950 - val_loss: 0.2421
Epoch 2/50
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8961 - loss: 0.2608 - val_accuracy: 0.9000 - val_loss: 0.2251
Epoch 3/50
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9103 - loss: 0.2208 - val_accuracy: 0.9050 - val_loss: 0.2116
Epoch 4/50
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9084 - loss: 0.2324 - val_accuracy: 0.9050 - val_loss: 0.2078
Epoch 5/50
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9013 - loss: 0.2468 - val_accuracy: 0.9000 - val_loss: 0.2176
Epoch 6/50
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9084 - loss: 0.2048 - val_accuracy: 0.9150 - val_loss: 0.2047
Epoch 7/50
...

`model.fit()` returns a `History` object. Its `history` attribute is a dictionary containing the loss and metric values recorded for each epoch of training.

In [15]:
history_dict1 = history1.history
print('Keys:', history_dict1.keys())

Keys: dict_keys(['accuracy', 'loss', 'val_accuracy', 'val_loss'])


In [16]:
# For plotting the results
def plot_history(history):
    '''
    Plotting the results of the neural network training process
    '''
    hist = history.history
    d = pd.DataFrame({'epochs': [epoch + 1 for epoch in history.epoch],
                      'accuracy': hist['accuracy'],
                      'val_accuracy': hist['val_accuracy'],
                      'loss': hist['loss'],
                      'val_loss': hist['val_loss']})
    
    fig = px.line(d, x='epochs', y=['loss', 'val_loss', 'accuracy', 'val_accuracy'],
                  color_discrete_sequence=['orange', 'peru', 'yellowgreen', 'darkolivegreen'],
                  labels={'epochs': 'Epochs', 'value': 'Loss/Accuracy', 'variable': 'Legend'},
                  title='Neural Network Training History', width=800, height=500)
    
    fig.update_traces(mode='lines+markers')
    
    return fig.show()

In [17]:
plot_history(history1)

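Notice that the `binary_crossentropy` loss is never negative but has no upper bound. In this example its per-epoch values stay between 0 and 1, which is why we can plot loss and accuracy in the same graph.


How does the trained model perform on the testing set, which we also used as validation data?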

In [18]:
# Evaluate model on testing set
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print('Testing Loss = %.4f' % test_loss)
print('Testing Accuracy = %.4f' % test_accuracy)

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8990 - loss: 0.2392
Testing Loss = 0.2143
Testing Accuracy = 0.9150
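
The accuracy reported here comes from turning each sigmoid output into a class label and comparing it with the true label. The sketch below reproduces that calculation on six made-up predictions, assuming the usual decision threshold of 0.5. The arrays `probs` and `labels` are hypothetical and unrelated to the model above.

```python
import numpy as np

# Hypothetical sigmoid outputs and true labels for six samples
probs = np.array([0.92, 0.15, 0.61, 0.48, 0.05, 0.77])
labels = np.array([1, 0, 0, 1, 0, 1])

predicted = (probs > 0.5).astype(int)     # class 1 when the probability exceeds 0.5
accuracy = np.mean(predicted == labels)   # fraction of correct predictions
print(predicted, accuracy)
```

Four of the six predictions match their labels, so the accuracy is 4/6.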


## Example 2: Two Circular Classes

Let's analyse a very different example.

In [19]:
# Generating new data
X2, cl2 = make_circles(1000, factor=.4, noise=.3, random_state=2)

df2 = pd.DataFrame()
df2['X1'] = X2[:,0]
df2['X2'] = X2[:,1]
df2['Class'] = cl2

In [20]:
px.scatter(x=df2.X1, y=df2.X2, color=df2.Class.astype('str'), 
           color_discrete_map={'0':'cornflowerblue', '1':'indianred'},  
           labels={'color': 'Class'}, title="Make Circles Data", 
           width=800, height=500) 

In [21]:
X2 = df2[['X1','X2']]    # Feature Matrix
y2 = df2.Class           # Target variable

In [22]:
# Splitting in training and testing sets
X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y2, test_size=0.2, random_state=23)
print('Train = %i cases \t Test = %i cases' %(len(X_train2), len(X_test2)))

Train = 800 cases 	 Test = 200 cases


In [23]:
# Plotting training and testing sets
train2 = pd.concat([X_train2, y_train2.map(mapping_train)], axis=1)
test2 = pd.concat([X_test2, y_test2.map(mapping_test)], axis=1)
t2 = pd.concat([train2, test2], axis=0)
px.scatter(t2, x='X1', y='X2', color=t2.Class.astype(str), 
            color_discrete_map={'0-train':'cornflowerblue', '1-train':'indianred',                                 
                                '0-test':'darkblue', '1-test':'darkred'},  
            labels={'color': 'Class'}, title="Make Circles Data: Training and Testing Sets", 
            width=800, height=500)  

Let's reuse the previous model and train it with the new data. Because the model is not re-created, training continues from the weights learned in Example 1.

In [24]:
# Train model
history2 = model.fit(X_train2, y_train2,                    # training data
                    epochs=epochs,                          # number of epochs
                    batch_size=10,                          # number of observations per batch
                    validation_data=(X_test2, y_test2),     # validation data
                    verbose=0                               # no output
);

In [25]:
plot_history(history2)

In [26]:
# Evaluate model on testing set
test_loss, test_accuracy = model.evaluate(X_test2, y_test2)
print('Testing Loss = %.4f' % test_loss)
print('Testing Accuracy = %.4f' % test_accuracy)

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6453 - loss: 0.6383
Testing Loss = 0.6315
Testing Accuracy = 0.6550


Let's define a more complex model with:
- two hidden layers (20 and 10 nodes) with the `relu` activation function,
- an output layer with one node and a `sigmoid` activation function.

In [27]:
# Define model
model2 = Sequential([
    Input(shape=(2,)),              # Specify the input shape explicitly for 2 features
    Dense(20, activation='relu'),   # First hidden layer with 20 neurons
    Dense(10, activation='relu'),   # Second hidden layer with 10 neurons
    Dense(1, activation='sigmoid')  # Output layer with 1 neuron (binary output)
])

model2.summary()
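
To see what this deeper architecture computes, here is a NumPy sketch of a forward pass through layers of sizes 2, 20, 10, and 1. The weights are randomly initialised for illustration (they are not the values Keras would learn), and `forward` is a hypothetical helper, not part of Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialised weights (illustrative only, not trained values)
layer_sizes = [2, 20, 10, 1]
params = [(rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, params):
    """relu on the hidden layers, sigmoid on the output layer."""
    *hidden, (W_out, b_out) = params
    for W, b in hidden:
        x = relu(x @ W + b)
    return sigmoid(x @ W_out + b_out)

batch = rng.normal(size=(4, 2))           # 4 samples, 2 features
probs = forward(batch, params)
n_params = sum(W.size + b.size for W, b in params)
print(probs.shape, n_params)              # (4, 1) 281
```

Each sample yields one probability, and the 281 trainable parameters are 60 + 210 + 11 across the three `Dense` layers.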

In [28]:
# Compile model
model2.compile(loss='binary_crossentropy', 
              optimizer=Adam(learning_rate=0.01), 
              metrics=['accuracy'])

In [29]:
# Train model
history2b = model2.fit(X_train2, y_train2,                  # training data
                    epochs=epochs,                          # number of epochs
                    batch_size=10,                          # number of observations per batch
                    validation_data=(X_test2, y_test2),     # validation data
                    verbose=0);                             # no output

In [30]:
plot_history(history2b)

In [31]:
# Evaluate model on testing set
test_loss, test_accuracy = model2.evaluate(X_test2, y_test2)
print('Testing Loss = %.4f' % test_loss)
print('Testing Accuracy = %.4f' % test_accuracy)

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7985 - loss: 0.4261
Testing Loss = 0.4639
Testing Accuracy = 0.7800


## Data representations for Neural Networks. Examples

`tensors`: multidimensional NumPy arrays

- `rank`: the number of axes of a tensor
- `ndim`: the NumPy attribute that returns the number of axes of an array
- `shape`: a tuple of integers giving the size of the tensor along each axis

### Tensors

Nearly all current machine-learning systems use `tensors` as their basic data structure. A `tensor` is a multi-dimensional array of numerical data: a generalization of vectors and matrices to an arbitrary number of dimensions.

#### Scalars (0D tensors)

`scalar`: a tensor that contains only one number

In [32]:
# A scalar tensor has 0 axes.
t0 = np.array(3.14)
print(' Tensor: t0\n', t0)
print(' Number of dimensions: ', t0.ndim)
print(' Dimensions (or shape):', t0.shape)

 Tensor: t0
 3.14
 Number of dimensions:  0
 Dimensions (or shape): ()


#### Vectors (1D tensors)

An array of numbers is called a vector, or 1D tensor.

In [33]:
t1 = np.array([1, 2, 34, 54, 3])
print(' Tensor: t1\n', t1)
print(' Number of dimensions: ', t1.ndim)
print(' Dimensions (or shape):', t1.shape)

 Tensor: t1
 [ 1  2 34 54  3]
 Number of dimensions:  1
 Dimensions (or shape): (5,)


#### Matrices (2D tensors)

An array of vectors is a matrix, or 2D tensor. A matrix has two axes (often referred to as rows and columns).

In [34]:
t2 = np.array([[5, 7, 2, 1], [6, 0, 7, 3], [7, 4, 0, 4]])
print(' Tensor: t2\n', t2)
print(' Number of dimensions: ', t2.ndim)
print(' Dimensions (or shape):', t2.shape)

 Tensor: t2
 [[5 7 2 1]
 [6 0 7 3]
 [7 4 0 4]]
 Number of dimensions:  2
 Dimensions (or shape): (3, 4)


#### 3D tensors and higher-dimensional tensors

If you pack such matrices in a new array, you get a 3D tensor.

In [35]:
t3 =  np.array([[[5, 8, 2, 0],
                [6, 9, 3, 1],
                [7, 0, 4, 2]],
               [[5, 8, 2, 0],
                [6, 9, 3, 1],
                [7, 8, 4, 2]],
               [[5, 8, 2, 0],
                [6, 9, 3, 1],
                [7, 0, 4, 2]]])
print(' Tensor: t3\n', t3)
print(' Number of dimensions: ', t3.ndim)
print(' Dimensions (or shape):', t3.shape)

 Tensor: t3
 [[[5 8 2 0]
  [6 9 3 1]
  [7 0 4 2]]

 [[5 8 2 0]
  [6 9 3 1]
  [7 8 4 2]]

 [[5 8 2 0]
  [6 9 3 1]
  [7 0 4 2]]]
 Number of dimensions:  3
 Dimensions (or shape): (3, 3, 4)


By packing 3D tensors in an array, you can create a 4D tensor, and so on.
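
A minimal sketch of this packing step, using `np.stack` on an array of zeros with the same shape as `t3`:

```python
import numpy as np

t3 = np.zeros((3, 3, 4))              # a 3D tensor with the same shape as above
t4 = np.stack([t3, t3])               # pack two 3D tensors along a new first axis
print(t4.ndim, t4.shape)              # 4 (2, 3, 3, 4)
```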

### Some real-world examples

- **Vector data**: `2D` tensors of shape (samples, features)

- **Timeseries data or sequence data**: `3D` tensors of shape (samples, timesteps, features)

- **Images**: `4D` tensors of shape (samples, height, width, channels) or (samples, channels, height, width)

- **Video**: `5D` tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)
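
The shapes listed above can be illustrated with placeholder arrays. All sizes in this sketch are made-up values chosen only to show the axis order.

```python
import numpy as np

# All sizes below are made-up values for illustration
vector_data = np.zeros((1000, 2))             # (samples, features)
timeseries  = np.zeros((250, 390, 3))         # (samples, timesteps, features)
images      = np.zeros((128, 28, 28, 1))      # (samples, height, width, channels)
video       = np.zeros((4, 16, 64, 64, 3))    # (samples, frames, height, width, channels)

for name, arr in [("vector", vector_data), ("timeseries", timeseries),
                  ("images", images), ("video", video)]:
    print(f"{name:<10} ndim={arr.ndim} shape={arr.shape}")
```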

#### Grayscale Digits

In [36]:
from tensorflow.keras.datasets import mnist

# Load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# You may select any index from 0 to 59999
image_index = 59999        
selected_image = X_train[image_index]
print(' Number of dimensions: ', X_train[image_index].ndim)
print(' Dimensions (or shape):', X_train[image_index].shape)

fig = px.imshow(selected_image, color_continuous_scale='gray', 
                width=400, height=400)
fig.update_layout(title=f"Label: {y_train[image_index]}", title_font_size=16)
# Remove the x-axis and y-axis
fig.update_xaxes(visible=False)
fig.update_yaxes(visible=False)
fig.show()

 Number of dimensions:  2
 Dimensions (or shape): (28, 28)
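
The `Dense` layers used earlier in this notebook expect one row of features per sample, so a common preparation step for images like this one is to flatten each 28x28 grid into a vector of 784 values and rescale the pixels to the range [0, 1]. The sketch below does this on a small random stand-in batch rather than the real MNIST arrays.

```python
import numpy as np

# Stand-in for a small batch of MNIST images: 5 images of 28x28 uint8 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

flat = images.reshape(len(images), 28 * 28)     # (5, 784): one row per image
scaled = flat.astype("float32") / 255           # pixel values in [0, 1]
print(flat.shape, scaled.dtype, scaled.min() >= 0, scaled.max() <= 1)
```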


#### Color Image

In [37]:
cat = Image.open("HCat.jpg")
print(' Number of dimensions: ', np.array(cat).ndim)
print(' Dimensions (or shape):', np.array(cat).shape)
fig = px.imshow(cat, width=400, height=400)
fig.update_layout(title="Label: Cat", title_font_size=16)
# Remove the x-axis and y-axis
fig.update_xaxes(visible=False)
fig.update_yaxes(visible=False)
fig.show()

 Number of dimensions:  3
 Dimensions (or shape): (631, 503, 3)
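
A single color image is a 3D tensor, while a batch of images is 4D. The sketch below adds the samples axis and converts between the two channel layouts, using an array of zeros with the shape printed above as a stand-in for the cat image.

```python
import numpy as np

img = np.zeros((631, 503, 3), dtype=np.uint8)   # stand-in with the shape printed above

batch = np.expand_dims(img, axis=0)                   # (1, 631, 503, 3): samples axis added
channels_first = np.transpose(batch, (0, 3, 1, 2))    # (1, 3, 631, 503)
print(batch.shape, channels_first.shape)
```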


## Conclusions

Key Takeaways:
- The simple neural network with one hidden layer performed well on the two adjacent classes (test accuracy 0.9150) but struggled with the circular data (test accuracy 0.6550), indicating the need for deeper or more complex architectures for such cases.
- Increasing the complexity of the model (two hidden layers with 20 and 10 nodes) raised the test accuracy on the circular data from 0.6550 to 0.7800, at the cost of more parameters to train and a higher risk of overfitting.
- The training process and visualization of the results helped us understand the model's behavior over epochs, highlighting the importance of parameters like batch size and number of epochs in achieving high accuracy and low loss.

## References

- Chollet, F. (2021). *Deep Learning with Python*, Second Edition. Manning Publications. Chapter 2.