# Deep Learning Model for Telecom Churn Prediction

This notebook builds the final preprocessing pipeline using engineered features
and trains an Artificial Neural Network (ANN) to predict customer churn.


In [1]:
import pandas as pd

data_path = "../data/processed/telecom_churn_feature_engineered.csv"
df = pd.read_csv(data_path)

df.shape


(7043, 26)

In [2]:
y = df['Churn'].map({'Yes': 1, 'No': 0})
X = df.drop(columns=['Churn', 'customerID'])


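The target variable is separated from the feature set and encoded as 1 (churn) or 0 (no churn).
The `customerID` column is dropped because it is a unique identifier with no predictive value;
keeping it would only let the model memorize individual rows.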
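
One thing worth checking after this step is that every label was actually mapped, because `Series.map` returns NaN for any value missing from the dictionary. The sketch below uses a toy Series with made-up values (not the notebook's data) to show the check:

```python
import pandas as pd

# Toy labels for illustration only; "yes " is deliberately not in the mapping.
labels = pd.Series(["Yes", "No", "No", "yes "])
mapped = labels.map({"Yes": 1, "No": 0})

# Labels that are not keys of the dictionary come back as NaN.
n_unmapped = int(mapped.isna().sum())
print(n_unmapped)
```

On the real data the same check would be `y.isna().sum()`, which should be zero before training.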


In [3]:
X.shape

(7043, 24)

In [4]:
categorical_features = X.select_dtypes(include='object').columns
numerical_features = X.select_dtypes(exclude='object').columns

len(categorical_features), len(numerical_features)


(16, 8)

In [5]:
categorical_features, numerical_features


(Index(['gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines',
        'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
        'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
        'PaperlessBilling', 'PaymentMethod', 'tenure_group'],
       dtype='object'),
 Index(['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges',
        'high_monthly_charge', 'long_term_contract', 'num_services',
        'electronic_check'],
       dtype='object'))

Categorical and numerical features are identified to apply appropriate
encoding and scaling techniques before training the neural network.


In [7]:
# Numerical features: median imputation
X[numerical_features] = X[numerical_features].fillna(
    X[numerical_features].median()
)

# Categorical features: mode imputation
for col in categorical_features:
    X[col] = X[col].fillna(X[col].mode()[0])


In [8]:
X.isna().sum().sum()


np.int64(0)

Missing values are filled with the column median for numerical features and
the column mode for categorical features, so no rows are discarded. The check
above confirms that no missing values remain.
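
To make the two imputation rules concrete, here is a small worked example on a toy frame. The values are invented for illustration and only the column names echo the dataset:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column (illustrative values only).
toy = pd.DataFrame({
    "MonthlyCharges": [20.0, np.nan, 80.0, 50.0],
    "Contract": ["Monthly", "Monthly", None, "Yearly"],
})

# Numerical column: fill with the median of the observed values (20, 50, 80).
toy["MonthlyCharges"] = toy["MonthlyCharges"].fillna(toy["MonthlyCharges"].median())

# Categorical column: fill with the most frequent observed value.
toy["Contract"] = toy["Contract"].fillna(toy["Contract"].mode()[0])

print(toy)
```

The missing charge becomes 50.0 (the median) and the missing contract becomes "Monthly" (the mode).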


In [9]:
X_encoded = pd.get_dummies(
    X,
    columns=categorical_features,
    drop_first=True
)

X_encoded.shape


(7043, 37)

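Categorical variables are converted into numerical form using one-hot encoding.
With `drop_first=True`, one level per categorical feature is dropped, which avoids
perfectly collinear dummy columns. The 16 categorical features become 29 dummy
columns, which together with the 8 numerical features gives 37 columns.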
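
The effect of `drop_first=True` is easiest to see on a single column. This is a minimal sketch on a toy frame with illustrative values; a feature with three levels yields two dummy columns, and the dropped level is represented by both being false:

```python
import pandas as pd

# Toy column with three levels (illustrative values only).
toy = pd.DataFrame(
    {"Contract": ["Month-to-month", "One year", "Two year", "One year"]}
)

# drop_first=True removes the first level in sorted order ("Month-to-month").
encoded = pd.get_dummies(toy, columns=["Contract"], drop_first=True)

print(list(encoded.columns))  # ['Contract_One year', 'Contract_Two year']
```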


In [10]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_encoded)

X_scaled.shape


(7043, 37)

All 37 columns are standardized to zero mean and unit variance, which helps
gradient-based training of the neural network converge stably.
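
In this notebook the scaler is fit on the full feature matrix before the train/test split, so test-set rows contribute to the mean and standard deviation. A common alternative is to fit the scaler on the training split only and reuse its statistics on the test split. The sketch below illustrates that ordering on synthetic data; `X_demo`, `y_demo`, and `scaler_demo` are hypothetical names, not part of the notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the encoded features and the binary target.
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_demo = rng.integers(0, 2, size=200)

# Split first, so the test rows stay unseen during fitting.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)  # statistics come from training rows only
X_te_scaled = scaler_demo.transform(X_te)      # same statistics reused on test rows
```

With this ordering the training columns have exactly zero mean and unit variance, while the test columns are only approximately standardized, which is the expected behavior.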


In [11]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y
)


In [12]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape


((5634, 37), (1409, 37), (5634,), (1409,))

In [23]:
import numpy as np

np.save("../data/processed/X_test_final.npy", X_test)
np.save("../data/processed/y_test_final.npy", y_test)


The dataset is split into training and test sets using stratified sampling
to preserve the churn distribution across both sets.
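
A quick way to confirm that stratification worked is to compare the positive-class rate in both splits. The sketch below uses a synthetic target with a 25% positive rate, a figure chosen for illustration and not the dataset's actual churn rate:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic target: 250 positives and 750 negatives (illustrative only).
y_demo = np.array([1] * 250 + [0] * 750)
X_demo = np.arange(1000).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratify=y_demo the two rates should match closely.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(train_rate, test_rate)
```

For the notebook's data the equivalent check would be comparing `y_train.mean()` with `y_test.mean()`.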


In [15]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

In [16]:
model = Sequential()

# Input layer: one value per feature
model.add(Input(shape=(X_train.shape[1],)))

# First hidden layer
model.add(Dense(32, activation='relu'))

# Regularization
model.add(Dropout(0.3))

# Second hidden layer
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.3))

# Output layer
model.add(Dense(1, activation='sigmoid'))



In [17]:
model.summary()


A feed-forward Artificial Neural Network is designed with two hidden layers.
Dropout is used to reduce overfitting, and a sigmoid activation function is
applied in the output layer for binary classification.
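
The size of this network can be worked out by hand. Each dense layer has one weight per input-output pair plus one bias per output, and dropout layers add no parameters. A short sketch of the arithmetic, using the 37 input features from the split above:

```python
# Layer widths: 37 inputs -> 32 -> 16 -> 1 output.
layer_sizes = [37, 32, 16, 1]

# Dense layer parameters = inputs * outputs (weights) + outputs (biases).
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer, total_params)  # [1216, 528, 17] 1761
```

These figures should match the "Param #" column reported by `model.summary()`.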


In [18]:
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

The model is compiled using the Adam optimizer and binary cross-entropy loss,
which are well-suited for binary classification problems such as churn prediction.
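
For intuition about the loss, binary cross-entropy averages `-[y*log(p) + (1-y)*log(1-p)]` over the samples, so confident correct predictions are cheap and uncertain ones are not. The NumPy sketch below is an illustration only; the function name and the `eps` clipping value are assumptions for this example, and Keras's internal implementation may differ in its details:

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps clipping avoids log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(losses))

# One churner and one non-churner, predicted confidently vs. at 50/50.
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
unsure = binary_cross_entropy([1, 0], [0.5, 0.5])
print(round(confident, 4), round(unsure, 4))  # 0.1054 0.6931
```

A loss of about 0.693 (ln 2) corresponds to a model that always predicts 0.5, which gives a useful baseline when reading the training log.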


In [19]:
history = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)


Epoch 1/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7173 - loss: 0.5684 - val_accuracy: 0.7613 - val_loss: 0.4801
Epoch 2/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7653 - loss: 0.4778 - val_accuracy: 0.7764 - val_loss: 0.4598
Epoch 3/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 0s 929us/step - accuracy: 0.7846 - loss: 0.4485 - val_accuracy: 0.7817 - val_loss: 0.4534
Epoch 4/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 0s 910us/step - accuracy: 0.7826 - loss: 0.4408 - val_accuracy: 0.7870 - val_loss: 0.4489
Epoch 5/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 0s 951us/step - accuracy: 0.7834 - loss: 0.4416 - val_accuracy: 0.7959 - val_loss: 0.4447
Epoch 6/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 0s 878us/step - accuracy: 0.7905 - loss: 0.4345 - val_accuracy: 0.8012 - val_loss: 0.4431
Epoch 7/20
...

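In the six epochs visible in the log, validation loss falls at every epoch. After training, `history.history['val_loss']` holds one such value per epoch, and the best epoch can be read off as sketched below. The same snippet also checks where the 141 steps per epoch come from, assuming Keras holds out the last 20% of the 5634 training rows for validation:

```python
import math

# Validation losses for epochs 1-6, copied from the training log.
val_loss = [0.4801, 0.4598, 0.4534, 0.4489, 0.4447, 0.4431]

# Epoch (1-based) with the lowest validation loss among those shown.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Rows left for fitting after validation_split=0.2, and batches of 32 per epoch.
n_fit = int(5634 * (1 - 0.2))
steps_per_epoch = math.ceil(n_fit / 32)

print(best_epoch, n_fit, steps_per_epoch)  # 6 4507 141
```
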
In [22]:
import os
os.makedirs("../models", exist_ok=True)

model.save("../models/telecom_churn_ann.keras")


## Conclusion

In this notebook, a complete deep learning pipeline was built using engineered
telecom customer features. The data was preprocessed, an Artificial Neural Network
(ANN) architecture was designed, and the model was trained with validation monitoring.

Model performance evaluation and business interpretation are carried out
in the next notebook to ensure clear separation between model training
and decision-making insights.
