In [None]:
!git clone https://github.com/SamFisher8/Telco-Customer-Churn-Analysis.git
%cd Telco-Customer-Churn-Analysis

Cloning into 'Telco-Customer-Churn-Analysis'...
remote: Enumerating objects: 87, done.
remote: Counting objects: 100% (87/87), done.
remote: Compressing objects: 100% (74/74), done.
remote: Total 87 (delta 28), reused 20 (delta 5), pack-reused 0 (from 0)
Receiving objects: 100% (87/87), 173.79 KiB | 7.56 MiB/s, done.
Resolving deltas: 100% (28/28), done.
/content/Telco-Customer-Churn-Analysis/Telco-Customer-Churn-Analysis/Telco-Customer-Churn-Analysis


In [None]:
import pandas as pd

df = pd.read_csv(
    "Data_Preparation/data/processed/Dataset_Encoded_v1.csv"
)



In [None]:
print(df.columns.tolist())


['gender', 'SeniorCitizen', 'Dependents', 'tenure', 'PhoneService', 'MultipleLines', 'InternetService', 'Contract', 'MonthlyCharges', 'Churn']


The column **TotalCharges** does not exist in the dataset, so it must be created and added first.

In [None]:
df['TotalCharges'] = df['MonthlyCharges'] * df['tenure']


**TotalCharges** was not available in the source dataset, so it was derived as MonthlyCharges × tenure. This approximates the total billed amount and is exact only if a customer's monthly rate never changed.
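
The derivation can be sketched on a toy frame. The helper name `add_total_charges` and the sample values are illustrative assumptions, not part of the original notebook; the guard only avoids overwriting a column that already exists.

```python
import pandas as pd


def add_total_charges(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with TotalCharges = MonthlyCharges * tenure, if it is missing."""
    out = frame.copy()
    if "TotalCharges" not in out.columns:
        out["TotalCharges"] = out["MonthlyCharges"] * out["tenure"]
    return out


# Illustrative rows only (hypothetical values, not read from the dataset file).
toy = pd.DataFrame({"MonthlyCharges": [25, 19, 76], "tenure": [1, 52, 1]})
toy_with_total = add_total_charges(toy)
print(toy_with_total)
```

Working on a copy keeps the input frame unchanged, which makes the step safe to re-run in a notebook.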

In [None]:
df[['MonthlyCharges', 'tenure', 'TotalCharges']].head()


|   | MonthlyCharges | tenure | TotalCharges |
|---|---|---|---|
| 0 | 25 | 1 | 25 |
| 1 | 25 | 41 | 1025 |
| 2 | 19 | 52 | 988 |
| 3 | 76 | 1 | 76 |
| 4 | 51 | 67 | 3417 |


We use **StandardScaler as the primary scaler**.

Why StandardScaler is the right choice:

*   ANNs benefit from zero-centered features
*   Gradient-based optimization converges faster when features share a similar scale
*   K-Means uses Euclidean distance → sensitive to scale
*   StandardScaler is less affected than MinMaxScaler by a few extreme values, because its scale comes from the mean and standard deviation of all rows rather than from the minimum and maximum alone
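
The outlier point in the list above can be checked with a small experiment. This is a sketch on synthetic data (1,000 evenly spaced values plus one extreme value, all hypothetical), not a result from the telecom dataset. It measures how much a single outlier compresses the spread of the typical values under each scaler.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustrative data only: 1,000 evenly spaced "typical" charges, then one extreme value.
typical = np.linspace(20.0, 120.0, 1000).reshape(-1, 1)
with_outlier = np.vstack([typical, [[2000.0]]])


def typical_spread(scaler_cls, data):
    """Range covered by the first 1,000 (typical) rows after fitting the scaler on `data`."""
    scaled = scaler_cls().fit_transform(data)[:1000]
    return float(scaled.max() - scaled.min())


shrink = {}
for name, cls in [("StandardScaler", StandardScaler), ("MinMaxScaler", MinMaxScaler)]:
    before = typical_spread(cls, typical)
    after = typical_spread(cls, with_outlier)
    # A larger factor means the outlier squeezed the typical values closer together.
    shrink[name] = before / after
    print(f"{name}: typical spread shrinks by a factor of {shrink[name]:.1f}")
```

Both scalers are linear, so neither removes the outlier; the difference is only in how strongly one extreme row changes the scale applied to every other row.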

In [None]:
scale_cols = ['MonthlyCharges', 'TotalCharges', 'tenure']
X_scale = df[scale_cols]


In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

df_scaled = df.copy()
df_scaled[scale_cols] = scaler.fit_transform(X_scale)


In [None]:
df_scaled[scale_cols].describe()



|   | MonthlyCharges | TotalCharges | tenure |
|---|---|---|---|
| count | 6741.0 | 6741.0 | 6741.0 |
| mean | -7.958163e-17 | 2.740559e-17 | 1.233252e-16 |
| std | 1.000074 | 1.000074 | 1.000074 |
| min | -1.612094 | -1.034039 | -1.353972 |
| 25% | -0.8371054 | -0.8347336 | -0.9429941 |
| 50% | 0.1737492 | -0.3922763 | -0.1210377 |
| 75% | 0.8139572 | 0.6804281 | 0.9475057 |
| max | 1.791117 | 2.760731 | 1.605071 |


After applying StandardScaler, all selected numerical features have a mean close to zero and a standard deviation close to one, confirming successful standardization.

This puts the features on a comparable scale, so that none of them dominates neural network training or distance-based clustering merely because of its units.
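
The reported standard deviation is 1.000074 rather than exactly 1 because `describe()` uses the sample standard deviation (`ddof=1`), while StandardScaler divides by the population standard deviation (`ddof=0`). With 6741 rows the two differ by a factor of sqrt(6741/6740). The sketch below reproduces this on a synthetic column; the random values and the 1 to 72 range are assumptions used only as a stand-in for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

n_rows = 6741  # row count reported by describe() in this notebook
rng = np.random.default_rng(0)

# Synthetic stand-in for one numeric column; the real values are not needed for this check.
toy = pd.DataFrame({"tenure": rng.integers(1, 73, size=n_rows).astype(float)})

scaled = pd.Series(StandardScaler().fit_transform(toy).ravel())

population_std = scaled.std(ddof=0)  # the quantity StandardScaler normalizes to 1
sample_std = scaled.std(ddof=1)      # the quantity describe() reports
expected = np.sqrt(n_rows / (n_rows - 1))

print(f"population std: {population_std:.6f}")
print(f"sample std:     {sample_std:.6f}")
print(f"sqrt(n/(n-1)):  {expected:.6f}")
```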

In [None]:
df_scaled.to_csv(
    "Data_Preparation/data/processed/Dataset_Scaled_v1.csv",
    index=False
)


In [None]:
!git config --global user.name "SamFisher8"
!git config --global user.email "sameensadman8@gmail.com"

!git config --global --list

user.name=SamFisher8
user.email=sameensadman8@gmail.com
credential.helper=store


In [None]:
!ls Data_Preparation/data/processed

Dataset_ATS_v2_cleaned.csv  Dataset_Scaled_v1.csv
Dataset_Encoded_v1.csv	    US_2_5_Class_Imbalance_SMOTE.ipynb


In [None]:
!git add Data_Preparation/data/processed/Dataset_Scaled_v1.csv


In [None]:
!git status


On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   Data_Preparation/data/processed/Dataset_Scaled_v1.csv



In [None]:
!git commit -m "Data preparation: apply feature scaling to numerical variables (US-2.4)"
!git config --global credential.helper store


On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
Everything up-to-date


In [None]:
# Assumes the personal access token is stored in the GITHUB_TOKEN environment variable; never hard-code it in a notebook
!git push https://$GITHUB_TOKEN@github.com/SamFisher8/Telco-Customer-Churn-Analysis.git

Everything up-to-date
