# Customer Churn Prediction using KNN
This notebook walks through a complete machine learning project using the **K-Nearest Neighbors (KNN)** algorithm.
We will:
- Load and explore a churn dataset
- Preprocess and scale features
- Train a KNN model
- Evaluate it using accuracy, a confusion matrix, and a classification report
- Tune hyperparameters

Dataset Source:
**Kaggle – Telco Customer Churn**
https://www.kaggle.com/datasets/blastchar/telco-customer-churn

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

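from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report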





## Load Dataset
Replace the filename below with the path to the CSV file you downloaded from Kaggle.

In [None]:
df = pd.read_csv("Telco-Customer-Churn.csv")
df.head()

## Explore Data

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
sns.countplot(x='Churn', data=df)
plt.title("Churn Distribution")
plt.show()

## Select Features and Target
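We use three numerical columns (`tenure`, `MonthlyCharges`, `TotalCharges`) as features and convert `Churn` into 0/1 labels. `TotalCharges` is read as text and contains blank entries, so it is converted to numbers and the resulting gaps are filled with the column mean.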

In [None]:
df['Churn'] = df['Churn'].map({'Yes':1, 'No':0})

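# Copy the feature columns so the original DataFrame is left untouched
X = df[['tenure','MonthlyCharges','TotalCharges']].copy()
# TotalCharges is stored as text; errors='coerce' turns blank entries into NaN
X['TotalCharges'] = pd.to_numeric(X['TotalCharges'], errors='coerce')
X = X.fillna(X.mean())  # impute missing values with the column mean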

y = df['Churn'].values
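
To make the cleaning step concrete, here is a minimal sketch on a made-up three-row frame (the name `toy` and its values are illustrative, not taken from the Telco data). It shows how a blank string in `TotalCharges` becomes `NaN` and is then replaced by the column mean.

```python
import pandas as pd

# Hypothetical toy frame mimicking the three feature columns used above.
toy = pd.DataFrame({
    "tenure": [1, 0, 24],
    "MonthlyCharges": [29.85, 52.55, 70.0],
    "TotalCharges": ["29.85", " ", "1680.0"],  # the blank string stands for a missing value
})

# errors="coerce" converts anything that cannot be parsed (including " ") to NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")
n_missing = int(toy["TotalCharges"].isna().sum())

# Fill the gap with the column mean, as the cell above does.
toy_filled = toy.fillna(toy.mean())

print("Missing before fill:", n_missing)
print(toy_filled)
```

Note that computing the mean on the full dataset before splitting lets a little test-set information into training; with only a handful of blanks the effect is probably small, but imputing after the split is the stricter option.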

## Train/Test Split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
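
Churn datasets are often imbalanced, so it can help to keep the class ratio the same in both splits. The sketch below is an optional variation on the split above, not part of the original notebook: it passes `stratify` to `train_test_split` on synthetic labels (`X_demo` and `y_demo` are made-up stand-ins).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 25% positives (not the Telco data).
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 25 + [0] * 75)

# stratify=y_demo asks for the same class proportions in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Positive share in train:", y_tr.mean())
print("Positive share in test: ", y_te.mean())
```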

## Feature Scaling
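KNN classifies a point by its distance to its neighbors, so a feature with a large numeric range (such as `TotalCharges`) would dominate the distance unless all features are put on a common scale. The scaler is fitted on the training set only and then applied to the test set, so no test-set statistics leak into training.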

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
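
The effect of scaling on distances can be illustrated with a few made-up rows (the numbers below are invented for the example; the columns stand for tenure in months and total charges). Before scaling, the Euclidean distance between two rows is almost entirely the charges gap; after `StandardScaler`, both columns have mean 0 and unit variance and contribute on comparable terms.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows: (tenure in months, total charges).
train_demo = np.array([
    [1.0, 30.0],
    [12.0, 900.0],
    [60.0, 6000.0],
    [30.0, 2500.0],
])
a, b = train_demo[0], train_demo[1]

# Unscaled distance and the share of it that comes from tenure.
raw_dist = np.linalg.norm(a - b)
tenure_share = (a[0] - b[0]) ** 2 / raw_dist ** 2

# Fit on the (training) rows, then transform them.
scaler_demo = StandardScaler().fit(train_demo)
scaled = scaler_demo.transform(train_demo)
scaled_dist = np.linalg.norm(scaled[0] - scaled[1])

print("Raw distance:", raw_dist, "tenure share:", tenure_share)
print("Scaled distance:", scaled_dist)
print("Column means:", scaled.mean(axis=0), "column stds:", scaled.std(axis=0))
```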

## Train KNN Model

In [None]:
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
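
For intuition about what `KNeighborsClassifier` does with its default settings (Euclidean distance, unweighted vote), here is a simplified from-scratch sketch. The helper `knn_predict_one` and the six sample points are hypothetical; this is not how scikit-learn implements the search internally, only the idea behind it.

```python
from collections import Counter

import numpy as np


def knn_predict_one(X_ref, y_ref, query, k=5):
    """Hypothetical helper: majority vote among the k closest reference rows."""
    dists = np.linalg.norm(X_ref - query, axis=1)  # Euclidean distance to every row
    nearest = np.argsort(dists)[:k]                # indices of the k smallest distances
    votes = Counter(y_ref[nearest].tolist())
    return votes.most_common(1)[0][0]


# Two well-separated clusters with labels 0 and 1 (made-up points).
X_ref = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]])
y_ref = np.array([0, 0, 0, 1, 1, 1])

pred_near_origin = knn_predict_one(X_ref, y_ref, np.array([0.1, 0.1]), k=3)
pred_far = knn_predict_one(X_ref, y_ref, np.array([3.0, 3.1]), k=3)
print(pred_near_origin, pred_far)
```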

## Make Predictions

In [None]:
y_pred = knn.predict(X_test)

## Model Evaluation

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

In [None]:
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, cmap="Blues", fmt="d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

In [None]:
print(classification_report(y_test, y_pred))
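
The numbers in the classification report follow directly from the four cells of the confusion matrix. The sketch below works through them on ten hand-written labels (`y_true_demo` and `y_pred_demo` are invented for the example), which can help when reading the heatmap above: accuracy alone may look acceptable while recall on the churn class is low.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Ten hand-written labels: 1 = churn, 0 = no churn.
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy_demo = (tp + tn) / (tp + tn + fp + fn)
precision_demo = tp / (tp + fp)  # of the predicted churners, how many really churned
recall_demo = tp / (tp + fn)     # of the real churners, how many were caught

print("tn, fp, fn, tp:", tn, fp, fn, tp)
print("accuracy:", accuracy_demo)
print("precision:", precision_demo, "recall:", recall_demo)
```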

## Hyperparameter Tuning (Choosing Best K)

In [None]:
error = []
for k in range(1, 21):
    knn_test = KNeighborsClassifier(n_neighbors=k)
    knn_test.fit(X_train, y_train)
    pred_k = knn_test.predict(X_test)
    error.append(np.mean(pred_k != y_test))

plt.plot(range(1, 21), error, marker='o')
plt.title("Error Rate vs K Value")
plt.xlabel("K")
plt.ylabel("Error Rate")
plt.show()
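
The loop above scores every K on the test set, so the best-looking K is chosen with knowledge of the data that is later used to report performance, which tends to give an optimistic estimate. One common alternative is to pick K by cross-validation on the training set only. The sketch below shows that approach with `GridSearchCV` on synthetic data from `make_classification`; the data, the 5-fold setting, and the variable names are assumptions for illustration, not results from the Telco dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three numeric features (not the Telco data).
X_syn, y_syn = make_classification(
    n_samples=400, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

# Putting the scaler in the pipeline refits it inside each CV fold,
# so validation folds never influence the scaling statistics.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

search = GridSearchCV(
    pipe,
    param_grid={"knn__n_neighbors": list(range(1, 21))},
    cv=5,
    scoring="accuracy",
)
search.fit(X_tr, y_tr)

best_k = search.best_params_["knn__n_neighbors"]
test_acc = search.score(X_te, y_te)  # the test set is touched once, after K is chosen
print("Best K:", best_k)
print("CV accuracy:", search.best_score_, "test accuracy:", test_acc)
```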