# Week 7: Deep Learning Fundamentals
## Day 1: Introduction to Deep Learning
### Overview

The session introduces:
- the fundamental concepts of deep learning, including neural networks and their components, and
- an overview of popular libraries such as TensorFlow and Keras.

The objective is to establish foundational knowledge and a practical understanding of how these technologies work.



![image-2.png](attachment:image-2.png)

### Topics Covered
- Basics of Neural Networks
    - What are neural networks?
    - Neurons and layers: Input, hidden, and output layers.
    - Activation functions: ReLU, Sigmoid, Tanh.
    - Forward propagation and backpropagation.

- Introduction to TensorFlow
    - Overview of TensorFlow as a deep learning framework.
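    - Building computational graphs (in TensorFlow 2.x, via `tf.function`).
    - Tensors and operations; sessions (TensorFlow 1.x only, since TensorFlow 2.x executes eagerly by default).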

- Keras API
    - Introduction to Keras for easier neural network implementation.
    - Keras vs TensorFlow: Understanding the difference.
    - Building simple neural networks using Keras.

### 1. What are Neural Networks?
- A neural network is a computational model inspired by the human brain's structure, designed to recognize patterns and learn from data. At its core, it consists of layers of interconnected nodes or neurons that process input data and generate predictions or classifications.

![neural%200.png](attachment:neural%200.png)
### Key Concepts:
- Artificial Neurons: These are mathematical functions that mimic biological neurons, receiving inputs, applying weights, adding biases, and passing the result through an activation function.
- Learning: Neural networks learn by adjusting weights through a process called training, minimizing error or loss in predictions.
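The artificial neuron described above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes made-up inputs, weights, and bias. It also defines the three activation functions named in the topic list (ReLU, sigmoid, tanh), so you can see how each one transforms the same weighted sum.

```python
import numpy as np

# Three common activation functions
def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def neuron(x, w, b, activation):
    """One artificial neuron: weighted sum of the inputs plus a bias, then an activation."""
    z = np.dot(w, x) + b  # pre-activation value
    return activation(z)

# Hypothetical inputs, weights, and bias chosen only for illustration
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, 0.1])
b = 0.1

for f in (relu, sigmoid, tanh):
    print(f.__name__, neuron(x, w, b, f))
```

Here the weighted sum plus bias is 0.2, so ReLU passes it through unchanged, sigmoid squashes it into (0, 1), and tanh squashes it into (-1, 1).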


### Example:
- Imagine a simple neural network designed to classify whether an image is of a cat or a dog. The input to the network would be pixel values from the image, and after a series of transformations (forward propagation), the output would be a probability of whether the image is a cat or a dog.

![with%20bias.png](attachment:with%20bias.png)

![neural%201.png](attachment:neural%201.png)
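Forward propagation in the cat-vs-dog example can be sketched as follows. This is a hypothetical setup: a tiny 8x8 "image" flattened into 64 pixel values, passed through one ReLU hidden layer and a sigmoid output. The weights are random and untrained, so the resulting probability is meaningless; the point is only the sequence of transformations from pixels to a probability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 grayscale "image" with pixel values already scaled to [0, 1]
image = rng.random((8, 8))
x = image.reshape(-1)  # flatten to a vector of 64 inputs

# Randomly initialised (untrained) weights: 64 inputs -> 16 hidden units -> 1 output
W1, b1 = rng.normal(0, 0.1, (16, 64)), np.zeros(16)
W2, b2 = rng.normal(0, 0.1, (1, 16)), np.zeros(1)

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)          # hidden layer with ReLU
    p = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # output layer with sigmoid
    return p[0]                                # probability that the image is a "cat"

p_cat = forward(x)
print(f"P(cat) = {p_cat:.3f}, P(dog) = {1 - p_cat:.3f}")
```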

## This particular problem is called Image Classification

The input is an image (which may arrive in any size, format, or color), and the network produces three outputs: numbers between 0 and 1, each giving the probability that the input image is
- a “Cat”,
- a “Dog”, or
- another category, which we simply call “Other.”

<b>Note:</b> The input is an image, and the output is a numeric value for each of the three possible classes.

![image.png](attachment:image.png)

- The network produces 0.97 for the first output, 0.01 for the second, and 0.02 for the third.
- The three outputs sum to one because they represent probabilities.
- Because the first output has the highest probability, we say the network predicted the input image to be a Cat.
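A common way to turn a network's raw output scores into probabilities that sum to one is the softmax function. The Keras models later in this notebook also use it as their final activation. The sketch below assumes hypothetical raw scores, chosen so that the probabilities come out close to the 0.97 / 0.01 / 0.02 example. It then picks the predicted class as the one with the highest probability.

```python
import numpy as np

CLASSES = ["Cat", "Dog", "Other"]

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to one."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores from the network's last layer
logits = np.array([4.0, -0.5, 0.2])
probs = softmax(logits)
print(probs.round(2), probs.sum())
print("Predicted:", CLASSES[int(np.argmax(probs))])
```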
    
<b>Note:</b> A perfect neural network would output
- (1, 0, 0) if the input image was a cat,
- (0, 1, 0) if the input image was a dog, and
- (0, 0, 1) if the image was something other than a cat or a dog.

In reality, even well-trained networks do not give such perfect results.
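These "perfect" outputs are one-hot vectors: a single 1 at the position of the correct class and 0 everywhere else. They are what a labeled training example is compared against. Below is a minimal sketch, assuming the class order Cat, Dog, Other used above; the helper name `one_hot` is our own.

```python
import numpy as np

CLASSES = ["Cat", "Dog", "Other"]

def one_hot(label, classes=CLASSES):
    """Return the 'perfect' output vector for a label, e.g. Cat -> [1, 0, 0]."""
    target = np.zeros(len(classes))
    target[classes.index(label)] = 1.0
    return target

for label in CLASSES:
    print(label, one_hot(label))
```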

![image.png](attachment:image.png)

#### Grayscale images are represented as an array of pixel values where each pixel value represents an intensity from pure black to pure white.

![image.png](attachment:image.png)
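As a small illustration of that representation, the sketch below assumes the common 8-bit encoding, where each pixel is an integer from 0 (pure black) to 255 (pure white); other bit depths exist. It also shows the usual rescaling to [0, 1] before training, which the MNIST code later in this notebook performs with `/ 255`.

```python
import numpy as np

# Hypothetical 4x4 8-bit grayscale image: 0 = pure black, 255 = pure white
gray = np.array([
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 16,  80, 144, 208],
    [  0,   0, 255, 255],
], dtype=np.uint8)

print(gray.shape)       # one intensity value per pixel: (4, 4)
scaled = gray / 255.0   # rescale intensities to the range [0, 1]
print(scaled.min(), scaled.max())
```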

#### Color images are very similar, except they have three components for each pixel representing the color intensity for red, green, and blue, respectively. So, in this case, a 256 x 256 color image is represented by 196,608 numbers.

![image.png](attachment:image.png)
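The 196,608 figure is simply height × width × channels (256 × 256 × 3). A NumPy array with a trailing channel axis, which is the height × width × channels layout Keras uses by default, makes the count explicit:

```python
import numpy as np

height, width, channels = 256, 256, 3  # red, green, blue
color = np.zeros((height, width, channels), dtype=np.uint8)  # an all-black RGB image
print(color.size)  # 196608 values in total
```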

## What happens if our input image is some other size?

![image.png](attachment:image.png)

- Neural networks are designed to accept inputs of a specific size and shape.
- It’s not uncommon for different image classification networks to require different input sizes, depending on the application they are designed to support.

- For example, networks designed for mobile devices typically require small input images because of the limited compute and memory available on those devices.
- That’s fine, because all we need to do is preprocess our images to conform to the size and shape required by a particular network.



![image.png](attachment:image.png)
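The preprocessing step can be sketched with a simple nearest-neighbour resize in NumPy. This swaps in the simplest possible method for illustration. Real pipelines usually call a library resizer such as `tf.image.resize`, which interpolates bilinearly by default. The 300x400 input and the 224x224 target size are hypothetical.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to (out_h, out_w, ...)."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return image[rows[:, None], cols[None, :]]

# Hypothetical 300x400 RGB image resized to a 224x224 network input
img = np.random.default_rng(0).integers(0, 256, (300, 400, 3)).astype(np.uint8)
small = resize_nearest(img, 224, 224)
print(img.shape, "->", small.shape)
```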

## What does it mean to train a Neural Network?


- The main thing to understand about neural networks is that they contain many tunable parameters, which you can think of as knob settings on the black box (in technical jargon, these settings are referred to as weights).
![image.png](attachment:image.png)
- Finding good settings for the weights typically requires a significant amount of data and many training iterations.
- When you train a neural network, you show it many (often thousands of) labeled examples of each class you want it to learn, for example, images of cats, images of dogs, and images of other types of objects. Training on examples paired with their correct labels in this way is called supervised learning.
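To get a sense of how many "knobs" a network has, the parameters of a fully connected network can be counted directly. Each layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The layer sizes below are hypothetical (a flattened 28x28 image, one hidden layer of 128 units, 10 output classes).

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, starting with the input size.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases for this layer
    return total

# Hypothetical network: 784 inputs -> 128 hidden units -> 10 classes
print(dense_param_count([784, 128, 10]))  # 101770
```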

![image.png](attachment:image.png)

### During training, we compute an error between each prediction and the correct label, and that error is used to adjust the weights in the network so that the accuracy of subsequent predictions improves.

![image.png](attachment:image.png)
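This predict → measure error → adjust loop can be sketched on the smallest possible "network": a single weight `w` that should learn the hypothetical relationship `y = 2x` from labeled examples. The update uses the gradient of the mean squared error, which is covered in the next section. The learning rate and step count are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical labeled data: the "true" relationship is y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0              # a single tunable weight, starting from a poor guess
learning_rate = 0.01

for step in range(200):
    pred = w * x                   # forward pass: make predictions
    error = pred - y               # how far each prediction is from its label
    grad = 2 * np.mean(error * x)  # derivative of the mean squared error w.r.t. w
    w -= learning_rate * grad      # adjust the weight to reduce the error

print(round(w, 3))
```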

# Section - 2: Training Neural Networks for Beginners


![image.png](attachment:image.png)

In this section, we will delve deeper into how neural networks are trained without getting into the details of a particular network architecture. This will allow us to discuss the training process at a conceptual level, covering the following topics:

- How labeled training data is modeled.
- How a loss function is used to quantify the error between the expected output and the predicted output.
- How gradient descent is used to update the weights in the network.

 ![Screenshot%202024-09-24%20at%204.42.03%20AM.png](attachment:Screenshot%202024-09-24%20at%204.42.03%20AM.png)

![Screenshot%202024-09-24%20at%204.41.56%20AM.png](attachment:Screenshot%202024-09-24%20at%204.41.56%20AM.png)

![image.png](attachment:image.png)

## The Loss Function

One way to quantify the error between the network output and the expected result is to compute the Sum of Squared Errors (SSE), as shown below. This is also referred to as a loss. 

![image.png](attachment:image.png)
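As a worked example, the SSE can be computed for the cat image from earlier: the expected output is (1, 0, 0) and the prediction is (0.97, 0.01, 0.02). This sketch assumes SSE is the plain sum of squared differences with no 1/2 factor; the formula in the figure may use a slightly different normalisation.

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum of squared errors between the expected and predicted outputs."""
    return np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

# 0.03**2 + 0.01**2 + 0.02**2 = 0.0009 + 0.0001 + 0.0004
print(sse([1, 0, 0], [0.97, 0.01, 0.02]))  # about 0.0014
```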

When neural networks are trained in practice, the loss is computed over many images (a batch) before the network weights are updated. The Mean Squared Error (MSE), which averages the squared errors over those images, is therefore often used instead:

![image.png](attachment:image.png)
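Extending the SSE example to a hypothetical batch of three images gives the MSE. Here we assume MSE means the mean, over the images in the batch, of each image's SSE. Conventions differ: Keras's `mean_squared_error`, for instance, also divides by the number of outputs, which would make this value three times smaller.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean over the batch of each example's sum of squared errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    per_example_sse = np.sum((y_true - y_pred) ** 2, axis=1)
    return per_example_sse.mean()

# Hypothetical batch of three images (Cat, Dog, Other) with one-hot targets
targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
outputs = [[0.97, 0.01, 0.02], [0.10, 0.80, 0.10], [0.20, 0.20, 0.60]]
print(mse(targets, outputs))  # (0.0014 + 0.06 + 0.24) / 3
```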

## Gradient Descent (Optimization): a principled way to tune the weights of a neural network

![image.png](attachment:image.png)

### The slope of a line is defined as the rise over the run. When the weight is to the left of the optimum value, the slope of the loss function is negative, and when the weight is to the right of the optimum value, the slope is positive. So the sign of the gradient tells us which direction to move the weight: we step in the direction opposite to the gradient.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)
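The role of the gradient's sign can be sketched with a hypothetical one-weight loss `L(w) = (w - 3)^2`, whose optimum is at `w = 3`. Starting on either side of the optimum, repeatedly stepping against the gradient (`w -= learning_rate * gradient(w)`) moves the weight toward 3. The learning rate of 0.1 is an arbitrary choice.

```python
# Hypothetical one-weight loss L(w) = (w - 3)^2 with its optimum at w = 3
def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    return 2.0 * (w - 3.0)  # dL/dw: negative left of the optimum, positive right of it

print(gradient(1.0), gradient(5.0))  # -4.0 4.0

learning_rate = 0.1
for start in (1.0, 5.0):
    w = start
    for _ in range(50):
        w -= learning_rate * gradient(w)  # step against the sign of the gradient
    print(start, "->", round(w, 4))
```

Note that the step size also scales with the gradient's magnitude, so the steps shrink as the weight approaches the optimum.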


```python
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
```

```python
# Load MNIST; the first call downloads mnist.npz from
# https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz and caches it locally
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
```


```python
# Inspect the raw training data: 60,000 grayscale digits of 28 x 28 pixels, stored as uint8 (0-255)
print(train_images.shape, train_images.dtype)
```

```python
# Add a channel axis (Conv2D expects height x width x channels) and scale pixels from 0-255 to 0-1
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
print(train_images.shape, test_images.shape)
```

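```python
# A small convolutional network for 28x28 grayscale digits (10 classes)
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation='relu'),  # 32 learned 3x3 filters
    layers.MaxPooling2D((2, 2)),                   # halve the spatial resolution
    layers.Flatten(),                              # feature maps -> one vector
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')         # one probability per digit
])
```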
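To see where the layer sizes in this model come from, the shapes and parameter counts can be worked out by hand. This sketch assumes Keras's defaults: `padding='valid'` for `Conv2D`, and a pooling stride equal to the pool size for `MaxPooling2D`. Under those assumptions the total should match what `model.summary()` reports.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after a 'valid' (no padding) convolution or pooling window."""
    return (size - kernel) // stride + 1

side = conv_output_size(28, 3)              # Conv2D 3x3: 28 -> 26
side = conv_output_size(side, 2, stride=2)  # MaxPooling2D 2x2: 26 -> 13
flat = side * side * 32                     # Flatten with 32 filters
params = (
    (3 * 3 * 1 * 32 + 32)  # Conv2D weights + biases
    + (flat * 64 + 64)     # Dense(64)
    + (64 * 10 + 10)       # Dense(10)
)
print(side, flat, params)  # 13 5408 347146
```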

```python
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # labels are integer digits 0-9, not one-hot
              metrics=['accuracy'])
```


```python
# Train for 5 passes over the data, updating the weights after every batch of 64 images
history = model.fit(train_images, train_labels, epochs=5, batch_size=64)
```

```python
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_acc}')
```

```
Test accuracy: 0.9850999712944031
```


```python
# Each row holds 10 softmax probabilities, one per digit 0-9
predictions = model.predict(test_images)
predictions
```

```
array([[4.6353925e-08, 4.3363055e-08, 1.6883812e-06, ..., 9.9997437e-01,
        1.8006978e-07, 7.5965895e-06],
       [7.7317374e-08, 2.2966742e-05, 9.9997699e-01, ..., 4.7292781e-10,
        2.7498247e-08, 1.5069139e-09],
       [3.0990866e-06, 9.9902701e-01, 2.5119785e-05, ..., 2.3373132e-04,
        9.5030729e-05, 8.7201840e-07],
       ...,
       [1.4301852e-11, 8.1500033e-09, 1.6569345e-11, ..., 2.5200052e-07,
        2.9314579e-06, 7.7607374e-06],
       [1.2226355e-09, 6.4458849e-10, 1.4762652e-10, ..., 1.5198955e-08,
        9.4275674e-05, 7.8016861e-08],
       [4.2201131e-09, 7.7723245e-11, 2.1050791e-07, ..., 4.6058766e-09,
        1.5473811e-06, 2.4745392e-09]], dtype=float32)
```
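To turn these probability rows into predicted digits, take the index of the largest value in each row. Comparing those indices with the true labels then gives the accuracy. The sketch below uses hypothetical probabilities and labels so that it runs on its own. With the notebook's arrays, the equivalent call would be `predicted_classes(predictions)` compared against `test_labels`.

```python
import numpy as np

def predicted_classes(probabilities):
    """Pick the most probable class (digit) in each row of softmax outputs."""
    return np.argmax(probabilities, axis=1)

# Hypothetical softmax outputs for three images over 10 digit classes
probs = np.full((3, 10), 0.01)
probs[0, 7] = probs[1, 2] = probs[2, 1] = 0.91  # each row now sums to 1.0
labels = np.array([7, 2, 0])                    # hypothetical true labels

preds = predicted_classes(probs)
print(preds, "accuracy:", np.mean(preds == labels))
```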

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Create synthetic data
np.random.seed(42)

# Generate synthetic features: tenure, monthly_charges, contract_type (0 for month-to-month, 1 for one year, 2 for two year),
# internet_service (0 for DSL, 1 for Fiber optic, 2 for None), and a few others
num_samples = 1000

tenure = np.random.randint(1, 73, num_samples)  # Tenure in months (1 to 72)
monthly_charges = np.random.uniform(20, 100, num_samples)  # Monthly charges between $20 and $100
contract_type = np.random.randint(0, 3, num_samples)  # 0: Month-to-month, 1: One year, 2: Two year contract
internet_service = np.random.randint(0, 3, num_samples)  # 0: DSL, 1: Fiber optic, 2: None

# Creating some other dummy features
phone_service = np.random.randint(0, 2, num_samples)  # 0: No, 1: Yes
multiple_lines = np.random.randint(0, 2, num_samples)  # 0: No, 1: Yes
online_security = np.random.randint(0, 2, num_samples)  # 0: No, 1: Yes

# Generate synthetic labels (1: churn, 0: no churn).
# NOTE: the labels are drawn independently of the features, so there is no real pattern to learn.
churn = np.random.randint(0, 2, num_samples)

# Combine features into a DataFrame
df = pd.DataFrame({
    'tenure': tenure,
    'monthly_charges': monthly_charges,
    'contract_type': contract_type,
    'internet_service': internet_service,
    'phone_service': phone_service,
    'multiple_lines': multiple_lines,
    'online_security': online_security,
    'churn': churn
})

# Split data into features and labels
X = df.drop('churn', axis=1).values
y = df['churn'].values

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (mean 0, std 1), fitting the scaler on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

df.head()  # Return the first few rows of the synthetic data for review
```

|   | tenure | monthly_charges | contract_type | internet_service | phone_service | multiple_lines | online_security | churn |
|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 89.359501 | 0 | 2 | 0 | 0 | 0 | 1 |
| 1 | 15 | 52.63876 | 2 | 1 | 1 | 1 | 0 | 0 |
| 2 | 72 | 64.137808 | 2 | 0 | 1 | 0 | 1 | 1 |
| 3 | 61 | 40.311085 | 1 | 0 | 1 | 1 | 0 | 1 |
| 4 | 21 | 35.689048 | 2 | 1 | 0 | 1 | 1 | 0 |
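The standardization step above can be written out by hand to show what it does: subtract each feature's mean and divide by its standard deviation, using statistics from the training set only. This is a sketch of the idea, not scikit-learn's implementation. It assumes the population standard deviation (ddof=0), as `StandardScaler` uses, and unlike `StandardScaler` it does not guard against constant features. The data is hypothetical.

```python
import numpy as np

def standardize(train, test):
    """Scale each feature to mean 0, std 1 using statistics from the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)  # population std (ddof=0)
    return (train - mean) / std, (test - mean) / std

# Hypothetical two-feature data: tenure in months, monthly charge in dollars
train = np.array([[1.0, 20.0], [36.0, 60.0], [72.0, 100.0]])
test = np.array([[12.0, 50.0]])
train_s, test_s = standardize(train, test)
print(train_s.mean(axis=0).round(6), train_s.std(axis=0).round(6))
```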


```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Synthetic data generation
np.random.seed(42)
num_samples = 1000

tenure = np.random.randint(1, 73, num_samples)
monthly_charges = np.random.uniform(20, 100, num_samples)
contract_type = np.random.randint(0, 3, num_samples)
internet_service = np.random.randint(0, 3, num_samples)
phone_service = np.random.randint(0, 2, num_samples)
multiple_lines = np.random.randint(0, 2, num_samples)
online_security = np.random.randint(0, 2, num_samples)
churn = np.random.randint(0, 2, num_samples)  # random labels, unrelated to the features

df = pd.DataFrame({
    'tenure': tenure,
    'monthly_charges': monthly_charges,
    'contract_type': contract_type,
    'internet_service': internet_service,
    'phone_service': phone_service,
    'multiple_lines': multiple_lines,
    'online_security': online_security,
    'churn': churn
})

# Split into features and labels
X = df.drop('churn', axis=1).values
y = df['churn'].values

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the input features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build the neural network: 7 inputs -> 16 -> 8 -> 1 (probability of churn)
model = Sequential()
model.add(Input(shape=(X_train.shape[1],)))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=8, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model, holding out 20% of the training data for validation
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)

# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test)

print(f"Test Accuracy: {test_acc}")
```

```
Test Accuracy: 0.46000000834465027
```
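An accuracy near 0.5 is what we should expect here. The `churn` labels were drawn with `np.random.randint` independently of every feature, so there is no signal for the network to learn. A useful sanity check for any classifier is to compare it with a majority-class baseline that always predicts the most common training label. The sketch below demonstrates this on hypothetical random 0/1 labels. With the notebook's arrays, the call would be `majority_baseline_accuracy(y_train, y_test)`.

```python
import numpy as np

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    majority = np.bincount(y_train).argmax()
    return np.mean(y_test == majority)

# Hypothetical random binary labels, similar in spirit to `churn` above
rng = np.random.default_rng(0)
y_train_demo = rng.integers(0, 2, 800)
y_test_demo = rng.integers(0, 2, 200)
baseline = majority_baseline_accuracy(y_train_demo, y_test_demo)
print(baseline)
```

If a trained model does not clearly beat this baseline, it has most likely not learned anything useful from the features.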
