# **Task1 - Customer Churn Prediction**

## 1. Problem Definition:
  + **Objective:**
     - Predict whether a telecom customer will churn (close their account) in the next month.
  + **Importance:**
     - Reduce customer loss.
     - Offer personalized promotions to customers at risk of churn.

## 2. Data Collection:
+  The dataset used for this project is the **Telco Customer Churn Dataset**, which contains information about telecom customers and whether they churned (left the service) or not.

In [1]:
# Import necessary libraries
import pandas as pd

# Load the dataset
data = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')

# Display the first 5 rows to understand the structure of the data
data.head()

|  | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


+ Handling missing values:
  - Check for any missing values in the dataset.

In [2]:
data.isnull().sum()

| Column | Missing values |
|---|---|
| customerID | 0 |
| gender | 0 |
| SeniorCitizen | 0 |
| Partner | 0 |
| Dependents | 0 |
| tenure | 0 |
| PhoneService | 0 |
| MultipleLines | 0 |
| InternetService | 0 |
| OnlineSecurity | 0 |


  - Drop rows with missing values.

In [3]:
data = data.dropna()

+ Dropping unnecessary columns and converting data types:

In [4]:
# Drop customerID as it is not useful for prediction
data = data.drop('customerID', axis=1)

# Convert TotalCharges to numeric
# (errors='coerce' turns any value that cannot be parsed, such as a blank string, into NaN)
data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')
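
Because `errors='coerce'` replaces unparseable values with `NaN`, new missing values can appear after the earlier `dropna()` call. The sketch below shows the effect on a toy Series; the values are made up for illustration and `raw`, `converted`, and `n_missing` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for a TotalCharges column stored as strings (illustrative values)
raw = pd.Series(["29.85", " ", "1889.5"])

# Entries that cannot be parsed as numbers, such as the blank string, become NaN
converted = pd.to_numeric(raw, errors="coerce")

# Count the missing values introduced by the conversion
n_missing = int(converted.isna().sum())
print(n_missing)
```

If the same check on the real column returns a non-zero count, one option is to call `dropna()` again after the conversion (or to impute the values). Doing so would change the row count, so the metrics reported in Section 4 would likely shift slightly.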

In [5]:
# Convert target variable to binary
data['Churn'] = data['Churn'].map({'Yes': 1, 'No': 0})

+ Separating Features and Target:

In [6]:
# Separate features and target
X = data.drop('Churn', axis=1)
y = data['Churn']

+ Convert categorical variables into a numeric format that a machine learning algorithm can use:

In [7]:
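# One-hot encode categorical variables (drop_first=True omits one level per feature)
X = pd.get_dummies(X, drop_first=True)
# Safeguard: keep only columns that are not object/category typed
X = X.select_dtypes(exclude=['object', 'category'])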
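
To make the effect of `drop_first=True` concrete, here is a small sketch on a toy frame. The data and the names `toy` and `encoded` are assumptions for illustration only.

```python
import pandas as pd

# Toy frame with one numeric and one categorical column (illustrative values)
toy = pd.DataFrame({
    "tenure": [1, 34, 45],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# With drop_first=True the first category level (alphabetically) is dropped,
# so three contract types are represented by two indicator columns
encoded = pd.get_dummies(toy, drop_first=True)
print(list(encoded.columns))
```

A row with both indicator columns equal to 0 therefore stands for the dropped level, here `Month-to-month`.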

+ Split the data into training and testing sets:

In [8]:
from sklearn.model_selection import train_test_split
# Split data into train/test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

+ Scaling Numerical Features:

In [9]:
from sklearn.preprocessing import StandardScaler
# Scale numerical features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

## 3. Model Selection:
+ Build the Random Forest Model:

In [10]:
from sklearn.ensemble import RandomForestClassifier

# Train the Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
rf_model.fit(X_train, y_train)

# Predict on the training and test data
y_train_pred = rf_model.predict(X_train)
y_test_pred = rf_model.predict(X_test)
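
The `class_weight='balanced'` setting weights each class by `n_samples / (n_classes * class_count)`, so the rarer churn class counts for more during training. The sketch below reproduces that formula in plain Python. The training-set class counts are not shown in the notebook, so the counts used here are borrowed from the test-set supports in the classification report, purely as an illustration.

```python
# Illustrative class counts (assumption: taken from the test-set supports,
# since the training-set counts are not printed in the notebook)
counts = {0: 1036, 1: 373}

n_samples = sum(counts.values())
n_classes = len(counts)

# 'balanced' heuristic: weight is inversely proportional to class frequency
weights = {label: n_samples / (n_classes * count) for label, count in counts.items()}

for label, weight in weights.items():
    print(label, round(weight, 3))
```

With counts like these, the churn class receives a weight almost three times that of the non-churn class.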

## 4. Evaluation:

In [11]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Calculate accuracy
train_accuracy = accuracy_score(y_train, y_train_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)

print("=====================================")
print(f"Training Accuracy: {train_accuracy:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
print("=====================================")
print("\nClassification Report (Test Data):")
print(classification_report(y_test, y_test_pred))
print("=====================================")
print("\nConfusion Matrix (Test Data):")
print(confusion_matrix(y_test, y_test_pred))
print("=====================================")

Training Accuracy: 0.9984
Test Accuracy: 0.7999

Classification Report (Test Data):
              precision    recall  f1-score   support

           0       0.83      0.92      0.87      1036
           1       0.67      0.47      0.56       373

    accuracy                           0.80      1409
   macro avg       0.75      0.70      0.71      1409
weighted avg       0.79      0.80      0.79      1409


Confusion Matrix (Test Data):
[[950  86]
 [196 177]]
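
The churn-class figures in the report can be recomputed from the confusion matrix, which helps when reading them. A minimal sketch in plain Python, using only the four counts printed above (rows are true classes, columns are predicted classes):

```python
# Counts copied from the confusion matrix above
tn, fp = 950, 86    # true class 0: predicted 0, predicted 1
fn, tp = 196, 177   # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted churners who really churned
recall = tp / (tp + fn)      # share of actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The recall of 0.47 means the model misses 196 of the 373 churners in the test set. Together with the gap between training accuracy (0.9984) and test accuracy (0.7999), this suggests the forest is overfitting the training data.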


## 5. Deploy:

+ Saving the Model and Scaler:

In [12]:
import joblib
joblib.dump(rf_model, 'rf_model.pkl')
joblib.dump(scaler, 'scaler.pkl')
joblib.dump(X.columns, 'train_columns.pkl')

['train_columns.pkl']

+ Try an example:

In [13]:
# Load the model, scaler, and training columns
model = joblib.load('rf_model.pkl')
scaler = joblib.load('scaler.pkl')
train_columns = joblib.load('train_columns.pkl')

# New data for prediction
new_data = {
    'gender': ['Female'],
    'SeniorCitizen': [0],
    'Partner': ['Yes'],
    'Dependents': ['No'],
    'tenure': [12],
    'PhoneService': ['Yes'],
    'MultipleLines': ['No'],
    'InternetService': ['DSL'],
    'OnlineSecurity': ['No'],
    'OnlineBackup': ['Yes'],
    'DeviceProtection': ['No'],
    'TechSupport': ['No'],
    'StreamingTV': ['No'],
    'StreamingMovies': ['No'],
    'Contract': ['Month-to-month'],
    'PaperlessBilling': ['Yes'],
    'PaymentMethod': ['Electronic check'],
    'MonthlyCharges': [90],
    'TotalCharges': [850.0]
}

# Convert the data to a DataFrame
df = pd.DataFrame(new_data)

# Apply One-Hot Encoding
df_encoded = pd.get_dummies(df)

# Align columns with training data
df_encoded = df_encoded.reindex(columns=train_columns, fill_value=0)

# Apply scaling
scaled_data = scaler.transform(df_encoded)

# Make a prediction
prediction = model.predict(scaled_data)

# Display the result
print(f"Prediction: {'Churned Customer' if prediction[0] == 1 else 'Non-Churned Customer'}")

Prediction: Non-Churned Customer
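
The alignment step matters because one-hot encoding a single row only creates columns for the categories present in that row. The sketch below shows this on a toy example. The column list and the names `new_row`, `encoded`, and `aligned` are hypothetical; they are not the notebook's real training columns.

```python
import pandas as pd

# Hypothetical training columns, as produced by get_dummies(drop_first=True)
train_columns = ["tenure", "Contract_One year", "Contract_Two year"]

# A single new customer: only the category actually present gets a dummy column
new_row = pd.DataFrame({"tenure": [12], "Contract": ["Two year"]})
encoded = pd.get_dummies(new_row)

# reindex drops columns unknown to the model and adds the missing ones as 0,
# restoring the exact column order used during training
aligned = encoded.reindex(columns=train_columns, fill_value=0)
print(aligned.astype(int).to_dict("records"))
```

Without this step, the scaler and model would receive a frame with a different set of columns than the one they were fitted on.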


+ Creating a Gradio Interface:

In [14]:
!pip install gradio



In [15]:
import gradio as gr
import pandas as pd
import joblib

# Load required components
model = joblib.load('rf_model.pkl')
scaler = joblib.load('scaler.pkl')
train_columns = joblib.load('train_columns.pkl')

# Define categorical and numerical features
categorical_features = {
    'gender': ["Female", "Male"],
    'Partner': ["Yes", "No"],
    'Dependents': ["Yes", "No"],
    'PhoneService': ["Yes", "No"],
    'MultipleLines': ["Yes", "No", "No phone service"],
    'InternetService': ["DSL", "Fiber optic", "No"],
    'OnlineSecurity': ["Yes", "No", "No internet service"],
    'OnlineBackup': ["Yes", "No", "No internet service"],
    'DeviceProtection': ["Yes", "No", "No internet service"],
    'TechSupport': ["Yes", "No", "No internet service"],
    'StreamingTV': ["Yes", "No", "No internet service"],
    'StreamingMovies': ["Yes", "No", "No internet service"],
    'Contract': ["Month-to-month", "One year", "Two year"],
    'PaperlessBilling': ["Yes", "No"],
    'PaymentMethod': ["Electronic check", "Mailed check", "Bank transfer (automatic)", "Credit card (automatic)"]
}

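# SeniorCitizen is a 0/1 flag in the dataset; listing it here gives it an input field
# instead of leaving it to be filled with 0 during column alignment
numerical_features = ['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges']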

def predict_churn(*args):
    # Prepare input data
    input_data = {}
    index = 0

    # Process categorical features
    for feature in categorical_features:
        input_data[feature] = args[index]
        index += 1

    # Process numerical features
    for feature in numerical_features:
        input_data[feature] = float(args[index])
        index += 1

    # Create DataFrame
    df = pd.DataFrame([input_data])

    # One-hot encoding
    df_encoded = pd.get_dummies(df)

    # Align columns with training data (missing dummy columns are filled with 0)
    df_encoded = df_encoded.reindex(columns=train_columns, fill_value=0)

    # Apply scaling
    scaled_data = scaler.transform(df_encoded)

    # Make prediction
    prediction = model.predict(scaled_data)

    return "Churned Customer" if prediction[0] else "Retained Customer"

# Create interface components
inputs = []
for feature in categorical_features:
    inputs.append(gr.Dropdown(choices=categorical_features[feature], label=feature))

for feature in numerical_features:
    inputs.append(gr.Number(label=feature, value=0.0))

interface = gr.Interface(
    fn=predict_churn,
    inputs=inputs,
    flagging_mode="never",
    outputs="text",
    title="Customer Churn Prediction",
    description="Enter customer details to predict churn status"
)

# Launch the application
interface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://038133b57503608d0a.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


