ChurnSage: Advanced Customer Churn Prediction

ChurnSage is a machine-learning tool for predicting customer churn. Built on a real dataset, the project covers data cleaning, preprocessing, and model development using Python and PrismaML. The best-performing model is deployed through an interactive Gradio interface so that predictions can be requested without writing code.

Project Structure

ChurnSage
│
├── churn_predictor
│   ├── __init__.py
│   ├── churn_predictor.py
│   └── data_preparer.py
│
├── data
│   ├── actiavation_Aug_Oct2023.csv
│   ├── clean_churn_data.csv
│   └── processed_data.csv
│
├── flagged
│
├── models
│   └── knn_model.pkl
│
├── notebooks
│   ├── data_cleaning.ipynb
│   ├── model_preprocessing.ipynb
│   └── modeling.ipynb
│
├── variables
│   ├── le_distributer.pkl
│   ├── le_plan.pkl
│   ├── le_pos.pkl
│   ├── le_reason.pkl
│   └── scaler.pkl
│
├── app.py
├── LICENSE
├── poetry.lock
├── pyproject.toml
└── README.md

Notebooks Summary

Data Cleaning (`notebooks/data_cleaning.ipynb`)

The data_cleaning.ipynb notebook involves the following steps:

EDA (Exploratory Data Analysis)

Used the PrismaML DatasetInformation class to generate a summary of the dataset, including basic statistics and data types.
Analyzed categorical variables with the PrismaML DatasetInformation class.
Plotted the distributions and relationships of categorical data with the PrismaML Plotting class.
Analyzed numerical variables with the PrismaML DatasetInformation class.
Plotted numerical distributions with the PrismaML Plotting class to identify potential outliers.
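
For readers without PrismaML installed, the same kind of summary can be approximated with plain pandas. This is a minimal sketch, not the notebook's code: the sample values, the `plan` column, and the 1.5 × IQR outlier rule are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real dataset has different columns and values.
df = pd.DataFrame({
    "plan": ["basic", "basic", "premium", "basic", "standard", "premium"],
    "tenure": [3, 14, 25, 7, 30, 18],
    "loyalty_points": [10, 12, 11, 13, 12, 500],
})

# Data types and basic statistics for the numerical columns.
dtypes = df.dtypes
numeric_summary = df.describe()

# Frequency table for a categorical column.
categorical_counts = df["plan"].value_counts()

# Flag potential outliers with the common 1.5 * IQR rule (an assumed rule).
q1, q3 = df["loyalty_points"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["loyalty_points"] < q1 - 1.5 * iqr) | (df["loyalty_points"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
```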

Data Cleaning

Data Suitability: Checked that each column's data type matched the values it holds, and identified any mismatches that needed addressing.
Dropping Duplicates: Removed duplicate rows to ensure the dataset's integrity.
Changing Column Names: Standardized column names for consistency and clarity.
Column Value Cleaning: Cleaned and standardized column values, for example, changing tenure values to a consistent format:

tenure = {"Short": "Short-term", "Medium": "Medium-term", "Long": "Long-term"}
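
A mapping like this can be applied with pandas. The sketch below assumes the values live in a column named `tenure_category`; the actual column name in the notebook may differ.

```python
import pandas as pd

# Hypothetical sample; one value is already in the target format.
df = pd.DataFrame({"tenure_category": ["Short", "Medium", "Long", "Short-term"]})

tenure = {"Short": "Short-term", "Medium": "Medium-term", "Long": "Long-term"}

# replace() matches whole values only, so entries that are already
# standardized (or not listed in the mapping) are left unchanged.
df["tenure_category"] = df["tenure_category"].replace(tenure)
```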

Filling Missing Data: Addressed missing data using multiple strategies:

Group By Mean/Median/Mode: Imputed missing values based on grouped statistics.
Using Data from Other Columns: Leveraged information from other columns for imputation.
Using Machine Learning: Applied machine learning techniques to predict and fill missing values.

Removing Unneeded Columns for Modeling: Dropped columns that were deemed unnecessary for the modeling process to streamline the dataset.

Model Preprocessing (`notebooks/model_preprocessing.ipynb`)

The model_preprocessing.ipynb notebook involves the following steps:

Encoding

Sklearn Label Encoder:
- Utilized sklearn's LabelEncoder to convert categorical variables into numerical format.
- Applied label encoding to columns with categorical data to facilitate model training.
Manual Label Encoding:
- Performed manual label encoding for specific columns that required custom encoding logic.
- Mapped categorical values to numerical codes for consistency.
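
The two encoding approaches can be sketched side by side. The column names, sample values, and the ordering used in the manual mapping are assumptions, not taken from the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data.
df = pd.DataFrame({
    "plan": ["basic", "premium", "basic", "standard"],
    "tenure_category": ["Short-term", "Long-term", "Medium-term", "Short-term"],
})

# LabelEncoder assigns integer codes in sorted order of the class labels.
le_plan = LabelEncoder()
df["plan_encoded"] = le_plan.fit_transform(df["plan"])

# Manual encoding keeps a meaningful order that alphabetical sorting would lose.
tenure_order = {"Short-term": 0, "Medium-term": 1, "Long-term": 2}
df["tenure_category_encoded"] = df["tenure_category"].map(tenure_order)
```

A fitted encoder can be saved with pickle so that the same codes are reused at prediction time, which is presumably what the le_*.pkl files in the variables directory are for.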

Scaling

MinMaxScaler:
- Applied MinMaxScaler from sklearn to scale numerical features.
- Transformed data to a range of [0, 1] to normalize the feature values and improve model performance.
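
MinMaxScaler rescales each feature as x' = (x - min) / (max - min). A minimal sketch, using made-up values for two of the numerical inputs:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical numerical features.
df = pd.DataFrame({
    "tenure": [1, 12, 24, 36],
    "loyalty_points": [0, 50, 100, 200],
})

# Fit on the training data, then reuse the fitted scaler for new inputs.
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
```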

Modeling (`notebooks/modeling.ipynb`)

Select KBest

This section uses scikit-learn's SelectKBest feature selection with KNN, Random Forest, and SVM to build the models.

Selecting the Features

PrismaML.MachineLearning.select_best_features(): This method selects the best features from the dataset based on their importance. It helps in reducing the dimensionality of the dataset by keeping only the most relevant features for the model.
PrismaML.MachineLearning.plot_accuracy_vs_features(): This method plots the model's accuracy against the number of selected features. It helps in visualizing the impact of different numbers of features on the model's performance.
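
The PrismaML signatures are not shown here, so the sketch below uses scikit-learn directly to illustrate the same two ideas: selecting the k best features, and measuring accuracy as k varies. The synthetic dataset, the `f_classif` score function, and the 5-fold cross-validation are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the processed churn data.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Keep the 3 features with the highest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)
selected_idx = selector.get_support(indices=True)

# Accuracy as a function of the number of selected features. Putting the
# selector inside the pipeline means it is refit on each training fold.
accuracy_by_k = {}
for k in range(1, X.shape[1] + 1):
    pipe = make_pipeline(SelectKBest(f_classif, k=k), KNeighborsClassifier())
    accuracy_by_k[k] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
```

Plotting `accuracy_by_k` gives the accuracy-versus-features curve; the last entry (k equal to the total number of features) corresponds to using no feature selection at all.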

Building the Model

PrismaML.MachineLearning.evaluate_model(): This method evaluates the model's performance using the selected features. It involves training the model, making predictions, and calculating performance metrics such as accuracy, precision, recall, and F1-score.
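
As an illustration of the evaluation steps described above (train, predict, compute metrics), here is a sketch written with scikit-learn rather than PrismaML. The synthetic data, the 75/25 split, and `n_neighbors=5` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the selected features and churn labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
```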

Comparing the Models

PrismaML.Plotting.plot_algorithm_comparison(): This method plots a comparison of the different models (KNN, Random Forest, SVM) based on their performance metrics. It helps in visualizing which model performs the best and under what conditions.
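
A comparison of this kind can be approximated without PrismaML by cross-validating each model on the same data. This is a sketch under assumed settings (synthetic data, default hyperparameters, 5 folds), not the notebook's procedure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the processed churn data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(),
}

# Mean 5-fold accuracy per model, ready to be shown as a bar chart.
comparison = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
best_model = max(comparison, key=comparison.get)
```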

Without Feature Selection

In this section, we evaluate the models without performing feature selection. This helps in comparing the performance of models with and without feature selection.

Building the Model

PrismaML.MachineLearning.evaluate_model(): This method evaluates the KNN model's performance using all available features without any feature selection.

Comparing the Models

PrismaML.Plotting.plot_algorithm_comparison(): This method plots a comparison of the different models (KNN, Random Forest, SVM) based on their performance metrics. It helps in visualizing which model performs the best and under what conditions.

Gradio Interface Deployment

Overview

This section describes the setup and deployment of a Gradio interface for predicting customer churn using a trained KNN model. The interface collects user inputs and provides a prediction on whether a customer is likely to churn.

Files

app.py
churn_predictor.py
DataPreparer.py

app.py

This script sets up a Gradio interface to interact with the churn prediction model.

Model Loading: The script loads a pre-trained KNN model from a pickle file.
Prediction Function: The model_prediction function takes inputs such as tenure, tenure_category, segment1, segment2, status, loyalty_points, and data_usage_tier, and assembles them into a single-row DataFrame for the model.
Gradio Interface: The interface collects user inputs and maps them to the prediction function, then displays the prediction result.
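
The structure described above might look roughly like the sketch below. It is not the project's app.py: the stub model, the output strings, the assumption that class 1 means churn, and the choice of Gradio input components are all hypothetical. The real app would load models/knn_model.pkl and pass the row through the data preparer before predicting.

```python
import pandas as pd


class StubModel:
    """Hypothetical stand-in for the pickled KNN model, so the sketch runs without the file."""

    def predict(self, X):
        return [0] * len(X)


MODEL = StubModel()


def model_prediction(tenure, tenure_category, segment1, segment2,
                     status, loyalty_points, data_usage_tier):
    # Collect the form inputs into a single-row DataFrame.
    row = pd.DataFrame([{
        "tenure": tenure,
        "tenure_category": tenure_category,
        "segment1": segment1,
        "segment2": segment2,
        "status": status,
        "loyalty_points": loyalty_points,
        "data_usage_tier": data_usage_tier,
    }])
    # Assumption: label 1 means "churn". Encoding and scaling are skipped here.
    label = MODEL.predict(row)[0]
    return "Likely to churn" if label == 1 else "Not likely to churn"


def build_interface():
    # Imported lazily so the prediction function can be tested without Gradio.
    import gradio as gr

    return gr.Interface(
        fn=model_prediction,
        inputs=[
            gr.Number(label="tenure"),
            gr.Textbox(label="tenure_category"),
            gr.Textbox(label="segment1"),
            gr.Textbox(label="segment2"),
            gr.Textbox(label="status"),
            gr.Number(label="loyalty_points"),
            gr.Textbox(label="data_usage_tier"),
        ],
        outputs=gr.Textbox(label="prediction"),
    )
```

Calling `build_interface().launch()` would start the local web UI.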

DataPreparer.py

This script prepares the input data for the model by encoding categorical variables and scaling numerical variables.

Initialization: The class is initialized with a pandas DataFrame containing the input data.
Data Preparation: The prepare_input_data method encodes categorical variables using map_encode_columns and scales numerical data using scale_data.
Label Encoding: Encodes categorical columns by mapping string values to numerical values.
Scaling: Scales the input data using a MinMaxScaler loaded from a pickle file.
Utility Function: The load_from_pickle method is used to load pickled objects.
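
The flow above can be sketched as a small class. The method names follow the description, but the constructor arguments are an assumption: here the mappings and the scaler are passed in directly to keep the example self-contained, whereas the project loads them from the pickle files in the variables directory.

```python
import pickle

import pandas as pd
from sklearn.preprocessing import MinMaxScaler


class DataPreparer:
    def __init__(self, data, mappings, scaler):
        self.data = data.copy()   # avoid mutating the caller's DataFrame
        self.mappings = mappings  # {column: {string value: numeric code}}
        self.scaler = scaler      # an already fitted MinMaxScaler

    @staticmethod
    def load_from_pickle(path):
        # Load any pickled object, such as a fitted scaler or encoder.
        with open(path, "rb") as f:
            return pickle.load(f)

    def map_encode_columns(self):
        # Replace string categories with their numeric codes.
        for column, mapping in self.mappings.items():
            self.data[column] = self.data[column].map(mapping)
        return self.data

    def scale_data(self):
        scaled = self.scaler.transform(self.data)
        return pd.DataFrame(scaled, columns=self.data.columns, index=self.data.index)

    def prepare_input_data(self):
        self.map_encode_columns()
        return self.scale_data()


# Usage with hypothetical training data and a single new input row.
training = pd.DataFrame({"tenure": [0, 10, 20], "tenure_category": [0, 1, 2]})
scaler = MinMaxScaler().fit(training)
mappings = {"tenure_category": {"Short-term": 0, "Medium-term": 1, "Long-term": 2}}

new_row = pd.DataFrame({"tenure": [10], "tenure_category": ["Medium-term"]})
prepared = DataPreparer(new_row, mappings, scaler).prepare_input_data()
```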

Usage

To run the project, follow these steps:

Clone the repository:

git clone https://github.com/yourusername/ChurnSage.git
cd ChurnSage

Install dependencies:

poetry install

Run the Jupyter notebooks in the notebooks directory to reproduce the data cleaning, preprocessing, and modeling steps.
Launch the Gradio interface:

python app.py

Contributing

Contributions are welcome! Please submit a pull request or open an issue for any bugs or feature requests.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Yousinator/ChurnSage
