# Customer Churn Prediction Model

This notebook implements a machine learning model to predict customer churn for a subscription service.

## Methodology

We use a Random Forest classifier with the following features:
- Customer tenure (months)
- Monthly charges
- Total charges
- Contract type
- Payment method

## Expected Accuracy

The model is expected to reach approximately 85% accuracy on the held-out test set (20% of the data).
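
An accuracy figure is easier to judge against a trivial baseline. The sketch below is a minimal illustration, assuming a hypothetical churn rate of 15% (this number is made up and does not come from `customer_data.csv`). It computes the accuracy of always predicting the most frequent label; `majority_baseline_accuracy` is a helper introduced here for illustration only.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)

# Hypothetical labels: 15 churners (1) out of 100 customers (assumed, not real data)
y_example = [1] * 15 + [0] * 85

baseline = majority_baseline_accuracy(y_example)
print(f"Majority-class baseline accuracy: {baseline:.2%}")
```

If the real churn rate is in this range, the per-class precision and recall in the classification report are likely more informative than overall accuracy.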

In [None]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

def load_customer_data(filepath):
    """Load customer data from CSV file"""
    return pd.read_csv(filepath)

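def preprocess_features(df):
    """Bin tenure into groups and one-hot encode the categorical features"""
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    # include_lowest=True keeps customers with tenure == 0 in the first bin;
    # with pd.cut's default right-closed intervals they would become NaN.
    df['tenure_group'] = pd.cut(
        df['tenure'],
        bins=[0, 12, 24, 60, 100],
        labels=['0-1 year', '1-2 years', '2-5 years', '5+ years'],
        include_lowest=True,
    )
    df = pd.get_dummies(df, columns=['contract_type', 'payment_method', 'tenure_group'])
    return df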

def train_churn_model(X_train, y_train):
    """Train Random Forest model for churn prediction"""
    model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
    model.fit(X_train, y_train)
    return model

def evaluate_model(model, X_test, y_test):
    """Evaluate model performance on test set"""
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    report = classification_report(y_test, predictions)
    return accuracy, report

## Data Loading and Preprocessing

We load the customer data from `customer_data.csv`, bin tenure into groups, one-hot encode the categorical features, and hold out 20% of the rows as a test set.
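
The tenure binning can be checked in isolation on a handful of values. This is a minimal sketch on made-up tenure values (not rows from the real dataset), using the same bin edges and labels as `preprocess_features`. It shows that `pd.cut` uses right-closed intervals by default, so a tenure of exactly 0 falls outside the first bin unless `include_lowest=True` is passed.

```python
import pandas as pd

# Made-up tenure values (in months) chosen to hit the bin edges.
tenure = pd.Series([0, 6, 12, 13, 48, 72], name="tenure")

bins = [0, 12, 24, 60, 100]
labels = ["0-1 year", "1-2 years", "2-5 years", "5+ years"]

# Default behaviour: intervals are (0, 12], (12, 24], ... so 0 is left out.
default_groups = pd.cut(tenure, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so 0 is kept.
inclusive_groups = pd.cut(tenure, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({
    "tenure": tenure,
    "default": default_groups,
    "include_lowest": inclusive_groups,
}))

# One-hot encode the grouped column, as the preprocessing step does.
toy = pd.DataFrame({"tenure": tenure, "tenure_group": inclusive_groups})
encoded = pd.get_dummies(toy, columns=["tenure_group"])
print(encoded.columns.tolist())
```

Values above the last bin edge (100 months here) are also left unbinned, so it is worth checking the range of `tenure` before relying on the groups.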

In [None]:
# Load data
df = load_customer_data('customer_data.csv')

# Preprocess
df_processed = preprocess_features(df)

# Split features and target
X = df_processed.drop('churn', axis=1)
y = df_processed['churn']

# Train-test split (stratify on the target so both sets keep the same churn rate)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

## Model Training

Train the Random Forest classifier (100 trees, maximum depth 10) on the training data, then evaluate it on the held-out test set.
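
The train-and-evaluate flow can also be tried without `customer_data.csv`. The following is a rough sketch on synthetic data from scikit-learn's `make_classification`; the sample count, feature count, and class weights are assumptions picked for illustration and say nothing about the real customer data. The hyperparameters mirror those in `train_churn_model`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer table: 500 rows, 5 numeric features,
# roughly 15% positive ("churn") labels. All values are assumed for illustration.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=5, weights=[0.85, 0.15], random_state=0
)

# Hold out 20% of the rows, keeping the class ratio the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Same hyperparameters as train_churn_model.
demo_model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
demo_model.fit(X_tr, y_tr)

demo_predictions = demo_model.predict(X_te)
demo_accuracy = accuracy_score(y_te, demo_predictions)
print(f"Accuracy on synthetic data: {demo_accuracy:.2%}")
```

The accuracy printed here reflects only the synthetic data and should not be compared with results on the real customers.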

In [None]:
# Train model
model = train_churn_model(X_train, y_train)

# Evaluate
accuracy, report = evaluate_model(model, X_test, y_test)

print(f'Model Accuracy: {accuracy:.2%}')
print('\nClassification Report:')
print(report)