# Deep Learning

## Why

What is deep learning?

→ “It’s a way to make predictions and decisions using layered learning, inspired by how humans learn.”

Why does it matter for business?

→ Chatbots, Netflix/Amazon recommendations, fraud detection, image recognition, voice assistants, etc.

## Neural Networks

<center><img src="https://towardsdatascience.com/wp-content/uploads/2021/12/1hkYlTODpjJgo32DoCOWN5w.png" width=700></center>

<center><img src="https://miro.medium.com/v2/format:webp/1*Ne7jPeR6Vrl1f9d7pLLG8Q.jpeg" width=500></center>

[Source](https://medium.com/@b.terryjack/introduction-to-deep-learning-feed-forward-neural-networks-ffnns-a-k-a-c688d83a309d)


```python
model = Sequential()
model.add(Input(shape=(X_train_scaled.shape[1],)))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
```


🧱 Input Layer
- This is where data enters the model
- Each feature (column) in your data (e.g., tenure, contract type, monthly charges) is one neuron
- It passes the raw information to the next layer — no learning happens here

🧠 Hidden Layers
- These are the “thinking” layers of the model
- They learn patterns in the data by transforming inputs with weights and activations
- The more hidden layers or neurons, the more complex patterns the model can learn

🎯 Output Layer
- This layer gives the final prediction
- Example: 1 neuron with sigmoid activation → churn probability
- In multi-class problems: multiple neurons with softmax

For structured business data (like churn), 1–2 hidden layers are usually enough.
More layers don’t always help and can even hurt if the data is small or noisy.
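To get a feel for how layer sizes translate into model complexity, the sketch below counts the trainable parameters of the 32-16-1 network shown earlier. The feature count of 30 is a hypothetical stand-in for `X_train_scaled.shape[1]`, and `dense_param_count` is an illustrative helper, not a Keras function.

```python
def dense_param_count(n_inputs, n_units):
    """A Dense layer has one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 30            # hypothetical; the notebook uses X_train_scaled.shape[1]
layer_sizes = [32, 16, 1]  # two hidden layers and the output layer

total = 0
n_in = n_features
for units in layer_sizes:
    params = dense_param_count(n_in, units)
    print(f"Dense({units}): {params} parameters")
    total += params
    n_in = units           # this layer's outputs feed the next layer

print(f"Total trainable parameters: {total}")
```

Doubling the width of a hidden layer roughly doubles the weights on both sides of it, which is one reason larger networks need more data to train well.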

# Deep Learning for Business Applications: Predicting Customer Churn with Neural Networks

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

# Traditional Machine Learning
from sklearn.tree import DecisionTreeClassifier

# Deep Learning Setup
import tensorflow as tf
from tensorflow.keras.models import Sequential           # Sequential model: stack layers linearly
from tensorflow.keras.layers import Dense, Input         # Dense: fully connected layer, Input: define input shape
from tensorflow.keras.optimizers import Adam             # Adam: an efficient optimizer for training

import warnings
warnings.filterwarnings("ignore")

# Set seeds for reproducibility
import random
seed_value = 42  # Choose any seed value you want
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

## Load and Explore the Dataset

In [None]:
# Sample churn dataset
url = 'https://raw.githubusercontent.com/bonchae/data/refs/heads/master/WA_Fn-UseC_-Telco-Customer-Churn.csv'
df = pd.read_csv(url)
df.head()

In [None]:
df.shape

### Telco Customer Churn Dataset: Feature Definitions

| Feature Name        | Description                                                                 |
|---------------------|-----------------------------------------------------------------------------|
| `customerID`        | Unique ID assigned to each customer (dropped in modeling)                  |
| `gender`            | Customer’s gender: `Male`, `Female`                                        |
| `SeniorCitizen`     | Indicates if the customer is a senior (1) or not (0)                       |
| `Partner`           | Whether the customer has a partner (`Yes`/`No`)                            |
| `Dependents`        | Whether the customer has dependents (`Yes`/`No`)                           |
| `tenure`            | Number of months the customer has been with the company                    |
| `PhoneService`      | Whether the customer has phone service (`Yes`/`No`)                        |
| `MultipleLines`     | If customer has multiple phone lines                                       |
| `InternetService`   | Type of internet: `DSL`, `Fiber optic`, or `No`                            |
| `OnlineSecurity`    | Whether the customer has online security add-on                            |
| `OnlineBackup`      | Whether the customer has online backup service                             |
| `DeviceProtection`  | Whether the customer has device protection plan                            |
| `TechSupport`       | Whether the customer has technical support access                          |
| `StreamingTV`       | Whether the customer has streaming TV service                              |
| `StreamingMovies`   | Whether the customer has streaming movies access                           |
| `Contract`          | Type of contract: `Month-to-month`, `One year`, `Two year`                |
| `PaperlessBilling`  | Whether billing is paperless (`Yes`/`No`)                                  |
| `PaymentMethod`     | Method of payment: `Electronic check`, `Mailed check`, etc.                |
| `MonthlyCharges`    | Amount charged to the customer monthly (in dollars)                        |
| `TotalCharges`      | Total amount charged over the customer’s lifetime                          |
| `Churn`             | Target variable: whether the customer left in the last month (`Yes`/`No`)  |

**Note:** After preprocessing, categorical variables are converted into dummy variables, and `Churn_Yes` becomes the binary target (1 = churned, 0 = stayed).


## Preprocessing

In [None]:
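# Drop customerID (an identifier, not a predictor)
df.drop('customerID', axis=1, inplace=True)
# TotalCharges loads as text; values that cannot be parsed become NaN and those rows are dropped
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df.dropna(inplace=True)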

# Convert categorical columns to dummy variables
# dtype=int yields 0/1 columns (recent pandas versions default to True/False)
df = pd.get_dummies(df, drop_first=True, dtype=int)
df.head()
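As a rough illustration of what dummy encoding with `drop_first=True` does, here is a hand-rolled version on a few made-up `Contract` values. The helper `dummies_drop_first` is hypothetical and only mimics the idea; it is not how pandas implements `get_dummies`.

```python
def dummies_drop_first(values, prefix):
    """One 0/1 column per category, skipping the first (sorted) category as the baseline."""
    categories = sorted(set(values))
    kept = categories[1:]
    return [{f"{prefix}_{c}": int(v == c) for c in kept} for v in values]

# Made-up sample values for illustration
contracts = ["Month-to-month", "Two year", "One year", "Month-to-month"]
rows = dummies_drop_first(contracts, "Contract")

for original, encoded in zip(contracts, rows):
    print(f"{original:15s} -> {encoded}")
```

The baseline category (`Month-to-month` here) is represented by all zeros. The same logic explains why `Churn` with values `No`/`Yes` ends up as a single `Churn_Yes` column.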

In [None]:
# Split features and labels
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes']

In [None]:
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
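The scaler is fitted on the training data only and then reused on the test data, so no information from the test set leaks into training. A minimal sketch of the same arithmetic on one made-up feature, using only the standard library (`StandardScaler` uses the population standard deviation, which `statistics.pstdev` matches):

```python
import statistics

# Made-up values for a single feature, e.g. monthly charges
train = [10.0, 20.0, 30.0, 40.0]
test = [25.0, 50.0]

# "fit": learn mean and standard deviation from the training data only
mean = statistics.fmean(train)
std = statistics.pstdev(train)

def standardize(values, mean, std):
    """z = (x - mean) / std"""
    return [(v - mean) / std for v in values]

# "transform": apply the training statistics to both sets
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)

print("train:", [round(z, 3) for z in train_scaled])
print("test: ", [round(z, 3) for z in test_scaled])
```

The scaled training values have mean 0 and standard deviation 1; the test values generally do not, and that is expected.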

## Build and Train the Decision Tree Model (ML)

In [None]:
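# Default hyperparameters; random_state keeps the result reproducible
tree_model = DecisionTreeClassifier(random_state=seed_value)
# Trees split on thresholds, so the unscaled features work fine here
tree_model.fit(X_train, y_train)

# Predict and evaluate
y_pred = tree_model.predict(X_test)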
print(f"Decision Tree Accuracy: {accuracy_score(y_test, y_pred):.2f}")

In [None]:
# Create the confusion matrix
cm = confusion_matrix(y_test, y_pred)
cm
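For a binary target, scikit-learn lays the confusion matrix out as `[[TN, FP], [FN, TP]]` (rows are actual classes, columns are predicted classes). The sketch below reads accuracy, precision, and recall off a matrix with made-up counts; the numbers are purely illustrative and are not results from this dataset.

```python
# Hypothetical counts, laid out as [[TN, FP], [FN, TP]]
cm_example = [[900, 130],
              [180, 197]]

(tn, fp), (fn, tp) = cm_example

accuracy = (tp + tn) / (tn + fp + fn + tp)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # of predicted churners, how many really churned
recall = tp / (tp + fn)                     # of actual churners, how many the model caught

print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
```

For churn, recall is often the number to watch: a model can look accurate overall while still missing a large share of the customers who actually leave.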

## Build and Train the Neural Network (DL)

In [None]:
print("Shape of the X: ", X_train_scaled.shape)
print("Number of X variables: ", X_train_scaled.shape[1])

Below is a **feedforward neural network (FFNN)**, also called a dense or fully connected network:

- Data flows **one direction**: input → hidden layers → output

- The layers are typically fully connected using ```Dense``` layers

- Used for classification or regression (tabular data)

- There’s **no memory** or feedback loop, unlike in **RNNs (recurrent neural networks)**
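To make the one-directional flow concrete, here is a minimal forward pass through a tiny network written in plain Python. All weights, biases, and inputs are made-up numbers, and `dense` is an illustrative helper rather than the Keras layer; a trained model would have learned its weights from data.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """Fully connected layer: every unit takes a weighted sum of all inputs, then activates."""
    return [activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
            for unit_weights, b in zip(weights, biases)]

x = [0.5, -1.2]                          # one customer, two scaled features (hypothetical)

hidden_w = [[0.8, -0.4], [-0.3, 0.9]]    # 2 hidden units, 2 weights each
hidden_b = [0.1, 0.0]
out_w = [[1.5, -2.0]]                    # 1 output unit, 2 weights
out_b = [-0.2]

hidden = dense(x, hidden_w, hidden_b, relu)           # input -> hidden
prob = dense(hidden, out_w, out_b, sigmoid)[0]        # hidden -> output

print("Hidden activations:", hidden)
print(f"Predicted churn probability: {prob:.3f}")
```

Nothing flows backward during prediction: each layer only sees the output of the layer before it.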

<center><img src="https://miro.medium.com/v2/format:webp/1*Ne7jPeR6Vrl1f9d7pLLG8Q.jpeg" width=500></center>

[Source](https://medium.com/@b.terryjack/introduction-to-deep-learning-feed-forward-neural-networks-ffnns-a-k-a-c688d83a309d)

In [None]:
model = Sequential()
model.add(Input(shape=(X_train_scaled.shape[1],)))          #  Input layer: each sample is a vector of N features
model.add(Dense(32, activation='relu'))                     #  First hidden layer: learns non-linear patterns
model.add(Dense(16, activation='relu'))                     #  Second hidden layer: deeper feature learning
model.add(Dense(1, activation='sigmoid'))                   #  Output layer: sigmoid gives probability for binary classification

What do **activation** functions like ```relu``` do? They introduce **non-linearity** into the model so that it can discover complex, nonlinear patterns in the data. Without them, stacked layers would collapse into a single linear transformation.

| Name      | Formula                             | Use                                                        |
|-----------|-------------------------------------|-------------------------------------------------------------|
| **ReLU** (Rectified Linear Unit) | `max(0, z)`                        | Default for hidden layers in most networks (fast + simple)  |
| **Sigmoid** | `1 / (1 + e^(-z))`                   | Output layer for binary classification (value between 0 and 1) |
| **Softmax** | `e^(z_i) / Σ_j e^(z_j)`              | Output layer for multi-class classification (values sum to 1) |
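The three formulas in the table can be written out directly. This is a plain-Python sketch for intuition only; Keras applies its own vectorized implementations. Subtracting the maximum inside `softmax` is a common numerical-stability trick and does not change the result.

```python
import math

def relu(z):
    """max(0, z): negative inputs are zeroed, positive inputs pass through."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Turns a list of scores into probabilities that sum to 1."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print("relu:   ", [relu(z) for z in (-2.0, 0.0, 3.0)])
print("sigmoid:", [round(sigmoid(z), 3) for z in (-2.0, 0.0, 3.0)])
print("softmax:", [round(p, 3) for p in softmax([1.0, 2.0, 3.0])])
```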


In [None]:
model.compile(optimizer=Adam(),
              loss='binary_crossentropy',
              metrics=['accuracy'])

```compile``` is like **prepping your model** for **battle**: it defines the rules of the game before learning starts. In short, it configures the training process.

- Which **optimizer** to use (how to adjust weights)? ```Adam()``` **updates the weights** during training, step by step, to **minimize the loss function**. It is an efficient and widely used default.

- Which **loss** function to minimize (what the model is trying to get better at)? ```binary_crossentropy``` is the standard choice for binary classification: it penalizes confident wrong predictions heavily.

- Which **metrics** to track? ```accuracy```, the share of predictions that match the true label. Metrics are reported during training but are not what the optimizer minimizes.
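To see what the loss rewards and punishes, here is a simplified, hand-written version of binary cross-entropy. The function below is an illustrative sketch, not the Keras implementation; the clipping constant `eps` is an assumption used to avoid `log(0)`.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [1, 0]   # one churner, one loyal customer (made-up labels)

confident_right = binary_crossentropy(y_true, [0.9, 0.1])
unsure = binary_crossentropy(y_true, [0.5, 0.5])
confident_wrong = binary_crossentropy(y_true, [0.1, 0.9])

print(f"Confident and right: {confident_right:.3f}")
print(f"Unsure:              {unsure:.3f}")
print(f"Confident and wrong: {confident_wrong:.3f}")
```

The loss grows quickly as the model becomes confidently wrong, which pushes training toward well-calibrated probabilities rather than just correct labels.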



In [None]:
# This is the training process
history = model.fit(X_train_scaled, y_train,
                    epochs=20, batch_size=32)

```epochs=20```: train for 20 epochs. Each epoch is one full pass through all training examples, so the model refines its weights with each pass (like reviewing material multiple times)

```batch_size=32```: update the model after every 32 examples
- Your data is split into mini-batches of 32 rows
- After processing each batch, the model updates its weights based on the error
- This is usually faster and lighter on memory than computing a single update from the whole dataset at once
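The relationship between epochs, batch size, and the number of weight updates is simple arithmetic. The sketch below uses a hypothetical training set of 5,000 rows; in the notebook the real value would be `X_train_scaled.shape[0]`.

```python
import math

n_train = 5000     # hypothetical number of training rows
batch_size = 32
epochs = 20

# The last batch of an epoch may be smaller than batch_size, hence the ceiling
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(f"Weight updates per epoch: {steps_per_epoch}")
print(f"Rows in the final batch:  {last_batch_size}")
print(f"Total weight updates:     {total_updates}")
```

A smaller batch size means more updates per epoch, each based on fewer rows and therefore noisier.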



In [None]:
# Plot training history
plt.plot(history.history['accuracy'], label='Train Accuracy')
#plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Training History')
plt.show()

In [None]:
loss, accuracy = model.evaluate(X_test_scaled, y_test)
print(f"Test Accuracy: {accuracy:.2f}")

In [None]:
# Predict probabilities with the neural network
y_pred_probs = model.predict(X_test_scaled)

# Convert probabilities to binary predictions
y_pred_nn = (y_pred_probs > 0.5).astype(int)

# Create confusion matrix
cm = confusion_matrix(y_test, y_pred_nn)
cm
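The 0.5 cutoff is a choice, not a rule. The sketch below uses made-up labels and probabilities to show how moving the threshold trades false positives against false negatives; `count_errors` is an illustrative helper, not part of scikit-learn or Keras.

```python
# Made-up true labels (1 = churned) and predicted churn probabilities
y_true_demo = [1, 0, 1, 0, 0, 1, 0, 1]
y_prob_demo = [0.82, 0.35, 0.45, 0.10, 0.62, 0.91, 0.28, 0.38]

def count_errors(y_true, y_prob, threshold):
    """Return (false positives, false negatives) when predicting churn if p > threshold."""
    fp = sum(1 for y, p in zip(y_true, y_prob) if p > threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p <= threshold and y == 1)
    return fp, fn

for threshold in (0.3, 0.5, 0.7):
    fp, fn = count_errors(y_true_demo, y_prob_demo, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

A lower threshold flags more customers, catching more churners at the cost of more false alarms. Which trade-off is right depends on what a retention offer costs compared with losing a customer.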

### We've just built a neural network to predict customer churn!
Let's reflect on how this model could help your business take action on high-risk customers.

---

# Discussion Prompts

1. What features in the dataset might be most predictive of customer churn?
2. How could this model be used by a marketing or customer success team?
3. What are the potential risks of acting on model predictions (e.g., false positives)?

Feel free to discuss with a partner or jot down your thoughts.
