<a href="https://colab.research.google.com/github/Stella-Achar-Oiro/Predicting-Customer-Churn/blob/main/Predicting_Customer_Churn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stella Achar Oiro
### Follow Me on Social Media:

- [Twitter](https://twitter.com/Stella_Oiro)
- [GitHub](https://github.com/Stella-Achar-Oiro)
- [LinkedIn](https://www.linkedin.com/in/stella-achar-oiro/)


Predicting customer churn is a crucial task for telecom companies like Sprint, as retaining customers is often more cost-effective than acquiring new ones. To create a machine learning model that predicts customer churn, you can follow these steps:

## Step 1
**Data Collection and Preprocessing:**

**Gather historical data:** Collect relevant data on customer behavior, demographics, purchase history, customer service interactions, and past churn instances.


**Data cleaning:** Clean the dataset by handling missing values, outliers, and inconsistencies.


**Feature engineering:** Create new features or transform existing ones that might be informative for predicting churn, such as customer tenure, usage patterns, and customer feedback scores.
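
As a rough illustration of the feature engineering step, the sketch below derives a tenure band and an average monthly spend. The column names `tenure` and `MonthlyCharges`, the sample values, and the band edges are assumptions made for the example; adapt them to the actual dataset.

```python
import pandas as pd

# Hypothetical sample rows; 'tenure' and 'MonthlyCharges' are assumed column names
sample = pd.DataFrame({
    "tenure": [1, 14, 40, 70],
    "MonthlyCharges": [29.85, 56.95, 89.10, 105.50],
    "TotalCharges": [29.85, 790.40, 3560.00, 7380.00],
})

# Bucket tenure (in months) into coarse bands; the edges are illustrative
sample["tenure_band"] = pd.cut(
    sample["tenure"],
    bins=[0, 12, 24, 48, 72],
    labels=["0-12", "13-24", "25-48", "49-72"],
)

# Average spend per month of tenure; tenure is clipped at 1 to avoid dividing by zero
sample["avg_charge_per_month"] = sample["TotalCharges"] / sample["tenure"].clip(lower=1)

print(sample)
```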



In [None]:
# Import necessary libraries
import pandas as pd

# Load the dataset
data = pd.read_csv('customer_churn_data.csv')

# Data cleaning
# Convert TotalCharges to numeric; blank or malformed values become NaN
data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')
data = data.dropna().reset_index(drop=True)  # Drop rows with missing values, including blank TotalCharges


## Step 2
**Exploratory Data Analysis (EDA):**

**Analyze the dataset:** Conduct exploratory data analysis to gain insights into customer behavior, identify correlations, and understand patterns related to churn.


**Visualization:** Visualize data using charts and graphs to better understand relationships between variables.

In [None]:
# EDA - Get summary statistics
import matplotlib.pyplot as plt
import seaborn as sns

summary_stats = data.describe()
print(summary_stats)

# Visualize churn distribution
sns.countplot(x='Churn', data=data)
plt.show()
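
Beyond the overall distribution, it often helps to compare churn rates across customer segments. The sketch below assumes a categorical `Contract` column and `Yes`/`No` churn labels; both are assumptions for illustration and may not match the real dataset.

```python
import pandas as pd

# Hypothetical sample; 'Contract' and the Yes/No labels are assumed for illustration
sample = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "One year", "Month-to-month"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

# Share of churned customers within each contract type, highest first
churn_rate = (
    sample.assign(churned=sample["Churn"].eq("Yes"))
    .groupby("Contract")["churned"]
    .mean()
    .sort_values(ascending=False)
)

print(churn_rate)
```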


## Step 3
**Data Splitting:**

Split the data into training and testing sets to assess the model's performance.

In [None]:
# Split the data into features and target variable
X = data.drop('Churn', axis=1)
y = data['Churn']

# Split the data into training and testing sets, stratifying to keep the churn rate similar in both
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)


## Step 4
**Feature Selection:**

Identify and select the most relevant features that have the most significant impact on churn prediction.
Use feature importance from tree-based models, correlation analysis, or dimensionality reduction methods.

In [None]:
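# Feature selection (example: using correlation)
# Correlate numeric features with the target on the training set only, to avoid leaking test data
numeric_train = X_train.select_dtypes(include='number')
churn_codes = pd.Series(pd.factorize(y_train)[0], index=y_train.index)  # Encode the target as integers
correlations = numeric_train.corrwith(churn_codes).abs().sort_values(ascending=False)
relevant_features = correlations.index[:10]  # 'Churn' is not a column of X_train, so it cannot be selected
X_train = X_train[relevant_features]
X_test = X_test[relevant_features]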


## Step 5
**Model Selection:**

Choose appropriate machine learning algorithms for classification tasks. Common choices include Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machines, and Neural Networks.
Experiment with multiple algorithms and evaluate their performance.

In [None]:
# Choose and initialize a machine learning algorithm
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(random_state=42)  # Fixed seed for reproducible results
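
One possible way to experiment with several algorithms is to compare them with cross-validation. This is only a sketch: it uses synthetic data from `make_classification` as a stand-in for the churn features, and the candidate list and hyperparameters are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the churn features and labels
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Mean 5-fold cross-validated ROC AUC for each candidate
cv_scores = {
    name: cross_val_score(estimator, X_demo, y_demo, cv=5, scoring="roc_auc").mean()
    for name, estimator in candidates.items()
}

for name, score in sorted(cv_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```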


## Step 6
**Model Training:**

Train the selected models using the training dataset. Tune hyperparameters for better model performance through techniques like grid search or random search.

In [None]:
# Train the model
model.fit(X_train, y_train)
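
Hyperparameter tuning with grid search could look roughly like the following. The data is synthetic, and the parameter grid, fold count, and scoring metric are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

# Small illustrative grid; real searches usually cover more values
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X_demo, y_demo)

print(search.best_params_)
tuned_model = search.best_estimator_  # Refit on all of X_demo with the best parameters
```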


## Step 7
**Model Evaluation:**

Evaluate the models' performance on the testing dataset using relevant metrics such as accuracy, precision, recall, F1-score, and ROC AUC.
Consider the business context and the cost of false positives and false negatives when selecting evaluation metrics.

In [None]:
# Evaluate the model
from sklearn.metrics import accuracy_score, classification_report
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy}\nClassification Report:\n{report}')
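
To make the cost of false positives and false negatives concrete, the sketch below picks a decision threshold by total error cost. The labels, probabilities, and cost figures are invented for illustration; with a fitted scikit-learn classifier, the probabilities would typically come from `model.predict_proba(X_test)[:, 1]`.

```python
def total_error_cost(y_true, churn_probs, threshold, fp_cost, fn_cost):
    """Total cost of errors when flagging customers at or above the threshold."""
    cost = 0.0
    for actual, prob in zip(y_true, churn_probs):
        flagged = prob >= threshold
        if flagged and actual == 0:
            cost += fp_cost  # Retention offer sent to a customer who would have stayed
        elif not flagged and actual == 1:
            cost += fn_cost  # Churner missed entirely
    return cost


# Hypothetical labels (1 = churned) and predicted churn probabilities
y_true_demo = [0, 0, 1, 1, 0, 1]
probs_demo = [0.10, 0.40, 0.35, 0.80, 0.60, 0.55]

# Assumed costs: a missed churner is ten times as costly as a wasted offer
costs = {
    t: total_error_cost(y_true_demo, probs_demo, t, fp_cost=10, fn_cost=100)
    for t in [0.3, 0.5, 0.7]
}
best_threshold = min(costs, key=costs.get)

print(costs, best_threshold)
```

When missed churners are much more expensive than unnecessary offers, a lower threshold tends to win, which is why accuracy alone can be a misleading metric here.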


## Step 8
Interpret the model's predictions to understand which features drive churn and which factors influence customer decisions.

In [None]:
# Feature importances
feature_importances = pd.DataFrame({'Feature': X_train.columns, 'Importance': model.feature_importances_})
feature_importances = feature_importances.sort_values(by='Importance', ascending=False)
print(feature_importances)


## Step 9
Deploy the chosen model in a production environment where it can generate predictions for new customer data.
Implement automation for regular model retraining to keep it up to date.
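
The API in this step loads a serialized model, so the trained model has to be saved first. A minimal sketch using `joblib` follows; it trains a small stand-in model on synthetic data and writes to a temporary directory, whereas in the notebook you would pass the fitted `model` and a path such as `model.pkl`.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model trained on synthetic data
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=42)
demo_model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_demo, y_demo)

# Written to a temporary directory here; the FastAPI app expects a file named model.pkl
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(demo_model, model_path)

# Reload to confirm the round trip works
reloaded = joblib.load(model_path)
print((reloaded.predict(X_demo) == demo_model.predict(X_demo)).all())
```

A scheduled retraining job could rerun the training steps on fresh data and overwrite this file, after checking that the new model performs at least as well as the current one.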

In [None]:
### Deploy the chosen machine learning model using Docker and FastAPI
### Create FastAPI App
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

# Load the trained model (replace 'model.pkl' with your actual model file)
model = joblib.load('model.pkl')

app = FastAPI()

class Item(BaseModel):
    feature1: float
    feature2: float

@app.post("/predict/")
async def predict(item: Item):
    # Prepare the input features for prediction
    input_features = [item.feature1, item.feature2]

    # Make predictions using the loaded model
    prediction = model.predict([input_features])

    return {"prediction": prediction[0].item()}  # Convert the NumPy scalar to a plain Python value for JSON


In [None]:
### Create a Dockerfile in the same directory as FastAPI app
# Use the official Python image
FROM python:3.9

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file into the container and install dependencies
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy your FastAPI app script and model file into the container
COPY app.py app.py
COPY model.pkl model.pkl

# Expose port 8000 (or the port your FastAPI app is running on)
EXPOSE 8000

# Start the FastAPI app when the container is run
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]


In [None]:
### Create a Requirements File
### Create a requirements.txt file listing the Python dependencies required for the FastAPI app.
fastapi==0.68.1
uvicorn==0.15.0
joblib==1.0.1
scikit-learn  # needed to unpickle the model; pin to the version used for training


In [None]:
### Build the Docker Image
### Open a terminal and navigate to the directory containing your Dockerfile, app.py, model.pkl, and requirements.txt files.
docker build -t my-fastapi-model .


In [None]:
### Run the Docker Container
docker run -d -p 8000:8000 my-fastapi-model


In [None]:
### Access the FastAPI app by sending a POST request to http://localhost:8000/predict/ (interactive docs are served at http://localhost:8000/docs)
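
Because `/predict/` is a POST endpoint, opening it in a browser is not enough. The sketch below builds a matching request with the standard library; the helper name `build_predict_request` is hypothetical, and the payload fields mirror the placeholder `Item` schema (`feature1`, `feature2`) from the app above.

```python
import json
import urllib.request


def build_predict_request(feature1, feature2, url="http://localhost:8000/predict/"):
    """Build a POST request whose JSON body matches the Item schema."""
    body = json.dumps({"feature1": feature1, "feature2": feature2}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(0.5, 1.2)
print(request.get_method(), request.full_url)

# With the container running, send the request like this:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```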


## Step 10
Continuously monitor the model's performance in real-world scenarios.
Collect feedback on false positives and false negatives from customer service teams or other relevant sources to improve the model over time.
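
As one way to turn that feedback into a monitoring signal, the sketch below computes precision and recall from logged outcomes. The feedback records and the 0.7 alert threshold are hypothetical values chosen for the example.

```python
def precision_recall(feedback):
    """Compute precision and recall from (predicted_churn, actually_churned) pairs."""
    tp = sum(1 for pred, actual in feedback if pred and actual)
    fp = sum(1 for pred, actual in feedback if pred and not actual)
    fn = sum(1 for pred, actual in feedback if not pred and actual)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical feedback log: (model flagged churn, customer actually churned)
feedback_log = [(True, True), (True, False), (False, True), (True, True), (False, False)]

precision, recall = precision_recall(feedback_log)
needs_review = recall < 0.7  # Hypothetical alert threshold

print(f"precision={precision:.2f} recall={recall:.2f} needs_review={needs_review}")
```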

## Step 11
Utilize the model's predictions to proactively identify high-risk customers.
Develop targeted retention strategies, such as personalized offers, discounts, or improved customer service, to reduce churn.
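
Identifying high-risk customers can be as simple as ranking them by predicted churn probability. In this sketch the customer IDs, scores, threshold, and campaign size are all made up; the helper `select_high_risk` is a hypothetical name.

```python
def select_high_risk(churn_probs, threshold=0.7, max_customers=None):
    """Return (customer_id, probability) pairs at or above the threshold, highest risk first."""
    ranked = sorted(
        ((cid, p) for cid, p in churn_probs.items() if p >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked if max_customers is None else ranked[:max_customers]


# Hypothetical predicted churn probabilities per customer
scores = {"C001": 0.91, "C002": 0.35, "C003": 0.74, "C004": 0.88}

# Limit the campaign to the two riskiest customers
high_risk = select_high_risk(scores, threshold=0.7, max_customers=2)
print(high_risk)
```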

## Step 12
**A/B Testing:**

Implement A/B tests to evaluate the effectiveness of retention strategies and refine them based on real-world results.
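
One common way to analyze such a test is a two-proportion z-test on the churn rates of the control and treatment groups. The sketch below uses only the standard library; the group sizes and churn counts are invented, and a real analysis may call for a different test or a dedicated statistics package.

```python
import math


def two_proportion_z_test(churned_a, total_a, churned_b, total_b):
    """Return the z statistic and two-sided p-value for a difference in churn rates."""
    rate_a = churned_a / total_a
    rate_b = churned_b / total_b
    pooled = (churned_a + churned_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # Two-sided p-value from the normal distribution
    return z, p_value


# Hypothetical results: control group (A) versus customers who received an offer (B)
z, p_value = two_proportion_z_test(churned_a=120, total_a=1000, churned_b=90, total_b=1000)
print(f"z={z:.2f}, p={p_value:.4f}")
```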

## Step 13
**Documentation and Reporting:**

Maintain clear documentation of the entire model development process, including data preprocessing, feature engineering, and model training.
Provide regular reports to stakeholders with insights on churn predictions and the impact of retention strategies.

By following these steps and iterating on your model and strategies, you can build a robust machine learning model for predicting customer churn and implement effective retention efforts to minimize customer loss for Sprint.