https://colab.research.google.com/github/bonchae/bucharest

https://colab.research.google.com/github/bonchae/bucharest/blob/main/Woskshop-Colab.ipynb

# How to Run This Notebook in Google Colab with GPU

1. Sign in to your Google account.

2. In Colab, go to **Runtime** → **Change runtime type**. Under **Hardware accelerator**, select **GPU (T4)**.

3. Click on **Run anyway**.

4. Click **Save**.

You're now running your notebook with GPU support in Colab!

# Developing Deep Learning Models from Scratch

> Before we begin — who here has heard of the **churn prediction** problem?

## A Quick Demo: Neural Networks (Feed Forward Neural Network - FFNN)

<center><img src="https://miro.medium.com/v2/format:webp/1*Ne7jPeR6Vrl1f9d7pLLG8Q.jpeg"></center>

[Source](https://medium.com/@b.terryjack/introduction-to-deep-learning-feed-forward-neural-networks-ffnns-a-k-a-c688d83a309d)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

# Traditional Machine Learning
from sklearn.tree import DecisionTreeClassifier

# Deep Learning Setup
import tensorflow as tf
from tensorflow.keras.models import Sequential           # Sequential model: stack layers linearly
from tensorflow.keras.layers import Dense, Input         # Dense: fully connected layer, Input: define input shape
from tensorflow.keras.optimizers import Adam             # Adam: an efficient optimizer for training

import warnings
warnings.filterwarnings("ignore")

# Set seeds for reproducibility
import random
seed_value = 42  # Choose any seed value you want
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)
tf.config.experimental.enable_op_determinism()  # TensorFlow 2.9+

In [None]:
# Sample churn dataset
url = 'https://raw.githubusercontent.com/bonchae/data/refs/heads/master/WA_Fn-UseC_-Telco-Customer-Churn.csv'
df = pd.read_csv(url)
df.head()

In [None]:
# number of rows and columns
df.shape

Telco Customer Churn Dataset: Feature Definitions

| Feature Name        | Description                                                                 |
|---------------------|-----------------------------------------------------------------------------|
| `customerID`        | Unique ID assigned to each customer (dropped in modeling)                  |
| `gender`            | Customer’s gender: `Male`, `Female`                                        |
| `SeniorCitizen`     | Indicates if the customer is a senior (1) or not (0)                       |
| `Partner`           | Whether the customer has a partner (`Yes`/`No`)                            |
| `Dependents`        | Whether the customer has dependents (`Yes`/`No`)                           |
| `tenure`            | Number of months the customer has been with the company                    |
| `PhoneService`      | Whether the customer has phone service (`Yes`/`No`)                        |
| `MultipleLines`     | If customer has multiple phone lines                                       |
| `InternetService`   | Type of internet: `DSL`, `Fiber optic`, or `No`                            |
| `OnlineSecurity`    | Whether the customer has online security add-on                            |
| `OnlineBackup`      | Whether the customer has online backup service                             |
| `DeviceProtection`  | Whether the customer has device protection plan                            |
| `TechSupport`       | Whether the customer has technical support access                          |
| `StreamingTV`       | Whether the customer has streaming TV service                              |
| `StreamingMovies`   | Whether the customer has streaming movies access                           |
| `Contract`          | Type of contract: `Month-to-month`, `One year`, `Two year`                |
| `PaperlessBilling`  | Whether billing is paperless (`Yes`/`No`)                                  |
| `PaymentMethod`     | Method of payment: `Electronic check`, `Mailed check`, etc.                |
| `MonthlyCharges`    | Amount charged to the customer monthly (in dollars)                        |
| `TotalCharges`      | Total amount charged over the customer’s lifetime                          |
| `Churn`             | Target variable: whether the customer left in the last month (`Yes`/`No`)  |

**Note:** After preprocessing, categorical variables are converted into dummy variables, and `Churn_Yes` becomes the binary target (1 = churned, 0 = stayed).
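
To make the note above concrete, here is a minimal sketch of dummy encoding on a toy table. The three rows are hypothetical and are not taken from the Telco dataset, and `dtype=int` is an optional choice made here so the output shows 0/1 rather than True/False.

```python
import pandas as pd

# Toy data for illustration only (hypothetical rows, not from the Telco file)
toy = pd.DataFrame({
    "tenure": [1, 34, 2],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "Churn": ["Yes", "No", "Yes"],
})

# drop_first=True drops the first category of each text column
# (categories are sorted, so "Month-to-month" and "No" are dropped here).
# dtype=int returns 0/1 columns instead of booleans.
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(encoded["Churn_Yes"].tolist())
```

Numeric columns such as `tenure` pass through unchanged, while `Churn` collapses to the single column `Churn_Yes`, which is why that name is used as the target.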


In [None]:
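# Drop the identifier column; it carries no predictive signal
df.drop('customerID', axis=1, inplace=True)
# TotalCharges may be read as text; entries that are not numbers become NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
# Remove rows with missing values (including any NaN created above)
df.dropna(inplace=True)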

# Convert categorical columns to dummy variables
df = pd.get_dummies(df, drop_first=True)

In [None]:
# Split features and labels
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes']

In [None]:
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [None]:
print("Shape of X_train_scaled: ", X_train_scaled.shape)
print("Number of features: ", X_train_scaled.shape[1])

Below is a **feedforward neural network (FFNN)**, also called a dense or fully connected network:

- Data flows **one direction**: input → hidden layers → output

- There is **no memory** or feedback loop, unlike in **RNNs**

- The layers are typically fully connected using ```Dense``` layers

- Used for classification or regression (**tabular data**)
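
The one-directional flow described above can be sketched in plain NumPy. This is an illustration under stated assumptions, not how Keras is implemented: the weights are random (untrained), `n_features = 30` is a hypothetical input width, and the layer sizes 32, 16, 1 simply mirror the model built in this section.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    # ReLU keeps positive values and zeroes out negatives
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

n_features = 30                      # hypothetical number of input columns
layer_sizes = [n_features, 32, 16, 1]

# One weight matrix and one bias vector per layer (random, untrained)
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    # Data flows input -> hidden layers -> output, with no loops
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return sigmoid(a @ weights[-1] + biases[-1])

batch = rng.normal(size=(4, n_features))   # 4 made-up samples
probs = forward(batch)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))

print(probs.shape)
print(n_params)
```

Each sample yields one number between 0 and 1, which plays the role of a churn probability once the weights have been trained.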

In [None]:
model = Sequential()
model.add(Input(shape=(X_train_scaled.shape[1],)))          # 🔷 Input layer: each sample is a vector of N features
model.add(Dense(32, activation='relu'))                     # 🧠 First hidden layer: learns non-linear patterns
model.add(Dense(16, activation='relu'))                     # 🧠 Second hidden layer: deeper feature learning
model.add(Dense(1, activation='sigmoid'))                   # 🎯 Output layer: sigmoid gives probability for binary classification

In [None]:
# compile: choose which optimizer, loss function, and metrics to use
model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
# This is the training process
history = model.fit(X_train_scaled, y_train, epochs=5)

Training Summary – Churn Model (5 Epochs)


| Epoch | Accuracy | Loss   | What Happened                                                     |
|-------|----------|--------|--------------------------------------------------------------------|
| 1️⃣    | 0.7414   | 0.5323 | Model begins learning; picks up basic churn vs. no-churn patterns |
| 2️⃣    | 0.7945   | 0.4258 | Big improvement — model captures strong early signals             |
| 3️⃣    | 0.7990   | 0.4139 | Learns better feature interactions (e.g., tenure + contract type) |
| 4️⃣    | 0.8038   | 0.4079 | Training stabilizes — confidence improves                         |
| 5️⃣    | 0.8063   | 0.4039 | Gradual gain — close to saturation, ready for evaluation          |
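
The Loss column above is binary cross-entropy, which can keep falling even when accuracy barely moves. The sketch below is a minimal NumPy version with made-up labels and probabilities (not output from the churn model); the helper name `binary_cross_entropy` and the clipping constant are choices made for this illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample losses
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def accuracy_at_half(y_true, y_prob):
    # Same 0.5 threshold used for the predictions in this notebook
    return float(np.mean((y_prob > 0.5).astype(int) == y_true))

y_true = np.array([1, 0, 1, 0])                 # made-up labels
hesitant = np.array([0.6, 0.4, 0.6, 0.4])       # correct but unsure
confident = np.array([0.9, 0.1, 0.8, 0.2])      # correct and more certain

loss_hesitant = binary_cross_entropy(y_true, hesitant)
loss_confident = binary_cross_entropy(y_true, confident)

print(accuracy_at_half(y_true, hesitant), round(loss_hesitant, 4))
print(accuracy_at_half(y_true, confident), round(loss_confident, 4))
```

Both sets of predictions are fully correct at the 0.5 threshold, yet the more confident set has the lower loss. This is one way to read later epochs where accuracy gains are small but loss still decreases.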

In [None]:
# Plot training history
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Training History')
plt.show()

In [None]:
loss, accuracy = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"Test Accuracy: {accuracy:.2f}")

In [None]:
# Predict probabilities with the neural network
y_pred_probs = model.predict(X_test_scaled)

# Convert probabilities to binary predictions
y_pred_nn = (y_pred_probs > 0.5).astype(int)

# Create confusion matrix
cm = confusion_matrix(y_test, y_pred_nn)
cm

> Discussion Points:

1. What features in the dataset might be most predictive of customer churn?

2. How could this model be used by a marketing or customer success team?

3. What are the potential risks of acting on model predictions (e.g., false positives)?
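
For the third question, it can help to turn a confusion matrix into precision and recall. The counts below are hypothetical and are not results from the model above; the layout assumes the scikit-learn convention of rows as actual classes and columns as predicted classes.

```python
import numpy as np

# Hypothetical counts for illustration only
# Rows = actual (0 = stayed, 1 = churned), columns = predicted
cm_example = np.array([[900, 100],
                       [150, 250]])

tn, fp, fn, tp = cm_example.ravel()

precision = tp / (tp + fp)          # of customers flagged as churners, how many really churn
recall = tp / (tp + fn)             # of real churners, how many the model catches
accuracy = (tp + tn) / cm_example.sum()

print(f"False positives: {fp}, false negatives: {fn}")
print(f"Precision: {precision:.3f}, Recall: {recall:.3f}, Accuracy: {accuracy:.3f}")
```

In this made-up case, accuracy looks reasonable, but 100 loyal customers would receive a retention offer they did not need and 150 churners would be missed. Which error costs more is a business decision.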

## CNN from Scratch

<img src="https://i0.wp.com/developersbreach.com/wp-content/uploads/2020/08/cnn_banner.png?fit=1400%2C658&ssl=1">

In [None]:
# Load data
(images, labels), _ = tf.keras.datasets.fashion_mnist.load_data()

The images in the **Fashion MNIST dataset** are **grayscale** (black and white only, with a single brightness channel and no color).

In [None]:
# view the first image
plt.imshow(images[0], cmap='gray')
plt.title("Label: {}".format(labels[0]))
plt.show()
# Label 9 is Ankle boot

In [None]:
# View the raw pixel values of the image above
images[0]

- 28 rows  → height of the image  
- 28 cols  → width of the image  
- Each number = brightness of a pixel
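
To see how such a pixel grid behaves as an array, here is a small sketch on a toy image. The array `toy` is made up for illustration (a white square on a black background) and is not part of Fashion MNIST; it only assumes the same 28 × 28 size and 0–255 brightness range.

```python
import numpy as np

# Toy 28x28 image: all black (0) with a white (255) square in the middle
toy = np.zeros((28, 28), dtype=np.uint8)
toy[10:18, 10:18] = 255

# Indexing is [row, column]: row counts down the image, column counts across
center_pixel = int(toy[12, 12])     # inside the square
corner_pixel = int(toy[0, 0])       # background
white_count = int((toy == 255).sum())

print(toy.shape, toy.dtype)
print(center_pixel, corner_pixel, white_count)
```

Passing `toy` to `plt.imshow(toy, cmap='gray')` would draw the square, which is the same mechanism used to display the dataset images.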

In [None]:
# Neural networks usually train more reliably when inputs are on a small, consistent scale
# Normalize pixel values from 0–255 ==> 0.0–1.0
images = images / 255.0

In [None]:
# Expected shape by a CNN: (height, width, channels)
# → Must be 3D per image: (28, 28, 1)
# The 1 is the channel → 1 for grayscale, 3 for RGB.
# Add channel dimension (needed for CNN)
images = images.reshape(-1, 28, 28, 1)

In [None]:
# Build a simple CNN model with softmax output

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                    # Input: grayscale image
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu'), # Conv layer to extract patterns
    tf.keras.layers.MaxPooling2D(2, 2),                    # Downsample by 2
    tf.keras.layers.Flatten(),                             # Flatten 2D → 1D
    tf.keras.layers.Dense(64, activation='relu'),          # Hidden layer
    tf.keras.layers.Dense(10, activation='softmax')        # Output: 10 class probabilities
])

In [None]:
# Choose the optimizer, loss function, and metric:
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

In [None]:
# Train the model (using all data, no split)
history = model.fit(images, labels, epochs=3)

Learning Progress Over Epochs

| Epoch | What Happens                            | Accuracy    |
|-------|-----------------------------------------|-------------|
| 1️⃣    | Model starts with random weights        | Low         |
| 2️⃣    | Learns basic patterns                   | Higher      |
| 3️⃣    | Learns finer patterns, reduces mistakes | Even higher |
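
What the model is adjusting during these epochs can be made concrete by counting its weights. The sketch below works through the shape and parameter arithmetic for the CNN defined in this section, assuming the Keras defaults of no padding and stride 1 for `Conv2D` and a stride equal to the pool size for `MaxPooling2D`. The helper names are made up for this illustration.

```python
def conv_output(size, kernel, stride=1):
    # Output length along one axis with no padding
    return (size - kernel) // stride + 1

def pool_output(size, pool):
    # Pooling with stride equal to the pool size, no padding
    return size // pool

filters = 16
side_after_conv = conv_output(28, 3)                # 28 -> 26
side_after_pool = pool_output(side_after_conv, 2)   # 26 -> 13
flat_size = side_after_pool * side_after_pool * filters

# Parameters = weights + biases for each trainable layer
conv_params = 3 * 3 * 1 * filters + filters         # 3x3 kernels, 1 input channel
dense_params = flat_size * 64 + 64
output_params = 64 * 10 + 10
total_params = conv_params + dense_params + output_params

print(side_after_conv, side_after_pool, flat_size)
print(conv_params, dense_params, output_params, total_params)
```

Calling `model.summary()` on the CNN is a quick way to compare these numbers with what Keras reports. Most of the weights sit in the first `Dense` layer rather than in the convolution.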


In [None]:
# Plot training history
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training Accuracy')
plt.legend()
plt.grid(True)
plt.show()


In [None]:
# Accuracy after each epoch
print(history.history['accuracy'])

# Final accuracy
final_acc = history.history['accuracy'][-1]
print(f"Final Training Accuracy: {final_acc:.2f}")

In [None]:
probabilities = model.predict(images)

# Convert to class labels
y_pred = np.argmax(probabilities, axis=1)

# True labels
y_true = labels  # still integers 0–9
cm = confusion_matrix(y_true, y_pred)
cm

#### Predict New Images

In [None]:
# This is the second image in the dataset. It's a T-shirt/top and its label is 0

sample = images[1]  # already normalized, shape: (28, 28, 1)

plt.imshow(sample, cmap='gray')
plt.title("A sample image. Predict me!")
plt.show()

In [None]:
sample = sample.reshape(1, 28, 28, 1)

In [None]:
probabilities = model.predict(sample)
predicted_class = tf.argmax(probabilities, axis=1).numpy()[0]
print(f"Predicted class: {predicted_class}")

# Label 0 is a T-shirt/top

Conclusion: Our image recognition model works well :)