Predict customer behavior to retain customers effectively using machine learning techniques and MLOps practices. This project analyzes relevant customer data and helps develop focused customer retention programs.
Welcome to our Customer Churn Prediction project! In this end-to-end MLOps-driven endeavor, we've built a robust customer churn prediction model that enables businesses to anticipate and mitigate customer churn effectively. Our project covers everything from data collection to model deployment.
Key features:
- Comprehensive data analysis and preprocessing
- Machine learning model training for customer churn prediction
- MLOps integration for seamless model deployment
- Insights and recommendations for customer retention strategies
We used the IBM Sample Data Sets for this project, which include information about customers who left within the last month, the services they signed up for, customer account information, and demographic details.
Here are some highlights:
- Churn: Indicates whether a customer left within the last month.
- Services: Details about the services each customer has signed up for.
- Customer Account: Information about contract, payment method, billing preferences, and more.
- Demographics: Gender, age range, and family status of customers.
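For orientation, here is a minimal sketch of loading and inspecting these fields with pandas. The file name and column names below follow the commonly distributed IBM Telco sample CSV and may differ in your copy:

```python
import pandas as pd

# Assumed file name/location for the IBM Telco sample data; adjust to your copy.
df = pd.read_csv("data/WA_Fn-UseC_-Telco-Customer-Churn.csv")

# The fields described above, as they appear in the standard Telco CSV.
print(df[["gender", "SeniorCitizen", "Contract", "PaymentMethod", "Churn"]].head())

# Share of customers who left within the last month.
print(df["Churn"].value_counts(normalize=True))
```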
To set up this project on your local machine, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/your-project.git
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Let's jump into the Python packages you need. Within the Python environment of your choice, run:

```bash
git clone https://github.com/zenml-io/zenml-projects.git
cd zenml-projects/customer-satisfaction
pip install -r requirements.txt
```
Starting with ZenML 0.20.0, ZenML comes bundled with a React-based dashboard that lets you observe your stacks, stack components, and pipeline DAGs. To access it, you need to launch the ZenML Server and Dashboard locally, but first you must install the optional dependencies for the ZenML server:

```bash
pip install "zenml[server]"
zenml up
```
If you are running the `run_deployment.py` script, you will also need to install some integrations using ZenML:

```bash
zenml integration install mlflow -y
```
The project can only be executed with a ZenML stack that has an MLflow experiment tracker and model deployer as components. Configuring a new stack with these two components is as follows:

```bash
zenml integration install mlflow -y
zenml experiment-tracker register mlflow_tracker --flavor=mlflow
zenml model-deployer register mlflow --flavor=mlflow
zenml stack register mlflow_stack -a default -o default -d mlflow -e mlflow_tracker --set
```
Our standard training pipeline consists of several steps; a rough wiring sketch follows the list:

- `ingest_data`: This step will ingest the data and create a `DataFrame`.
- `clean_data`: This step will clean the data and remove the unwanted columns.
- `train_model`: This step will train the model and save it using MLflow autologging.
- `evaluation`: This step will evaluate the model and save the metrics -- using MLflow autologging -- into the artifact store.
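Put together, the pipeline definition looks roughly like this. The module paths and step signatures below are illustrative assumptions, not the exact project code:

```python
from zenml import pipeline

# Illustrative imports -- in the project, each step lives in its own module.
from steps.ingest_data import ingest_data
from steps.clean_data import clean_data
from steps.train_model import train_model
from steps.evaluation import evaluation


@pipeline
def train_pipeline():
    # Wire the steps described above into a DAG.
    df = ingest_data()
    X_train, X_test, y_train, y_test = clean_data(df)
    model = train_model(X_train, X_test, y_train, y_test)
    evaluation(model, X_test, y_test)
```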
We have another pipeline, `deployment_pipeline.py`, that extends the training pipeline and implements a continuous deployment workflow. It ingests and processes input data, trains a model, and then (re)deploys the prediction server that serves the model if it meets our evaluation criteria. It adds two extra steps; a sketch of the trigger step follows the list:
- `deployment_trigger`: This step checks whether the newly trained model meets the criteria set for deployment.
- `model_deployer`: This step deploys the model as a service using MLflow (if the deployment criteria are met).
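A minimal sketch of what the trigger step could look like, assuming a simple accuracy threshold (the 0.9 default below is an illustrative assumption, not the project's configured value):

```python
from zenml import step


@step
def deployment_trigger(accuracy: float, min_accuracy: float = 0.9) -> bool:
    """Deploy only if the newly trained model clears the accuracy threshold."""
    return accuracy > min_accuracy
```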
In the deployment pipeline, ZenML's MLflow tracking integration is used to log the hyperparameter values, the trained model, and the model evaluation metrics -- as MLflow experiment tracking artifacts -- into the local MLflow backend. This pipeline also launches a local MLflow deployment server to serve the latest model if its accuracy is above a configured threshold.
The MLflow deployment server runs locally as a daemon process that continues to run in the background after the example execution is complete. When a new pipeline run produces a model that passes the accuracy threshold validation, the pipeline automatically updates the currently running MLflow deployment server to serve the new model instead of the old one.
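Sketched end to end, the continuous deployment pipeline could be wired as below, reusing the training steps from the earlier sketch. The step signatures (including `deploy_decision` on ZenML's built-in `mlflow_model_deployer_step`) are assumptions for illustration:

```python
from zenml import pipeline
from zenml.integrations.mlflow.steps import mlflow_model_deployer_step

# Illustrative imports, as in the training pipeline sketch above.
from steps.ingest_data import ingest_data
from steps.clean_data import clean_data
from steps.train_model import train_model
from steps.evaluation import evaluation
from steps.deployment_trigger import deployment_trigger  # the trigger step sketched above


@pipeline
def continuous_deployment_pipeline(min_accuracy: float = 0.9):
    # Same training steps as the training pipeline.
    df = ingest_data()
    X_train, X_test, y_train, y_test = clean_data(df)
    model = train_model(X_train, X_test, y_train, y_test)
    accuracy = evaluation(model, X_test, y_test)

    # Gate the MLflow deployer on the trigger's decision.
    deploy_decision = deployment_trigger(accuracy=accuracy, min_accuracy=min_accuracy)
    mlflow_model_deployer_step(model=model, deploy_decision=deploy_decision)
```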
To round it off, we deploy a Streamlit application that consumes the latest model service asynchronously from the pipeline logic. This can be done easily with ZenML within the Streamlit code:
```python
service = prediction_service_loader(
    pipeline_name="continuous_deployment_pipeline",
    pipeline_step_name="mlflow_model_deployer_step",
    running=False,
)
...
service.predict(...)  # Predict on incoming data from the application
```
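Once loaded, calling the service could look like the following sketch. The feature values are illustrative placeholders; the real app assembles them from Streamlit inputs, and the columns must match what the model was trained on:

```python
import numpy as np

# Hypothetical feature vector, e.g. tenure, MonthlyCharges, TotalCharges.
data = np.array([[12, 70.35, 845.5]])

service.start(timeout=10)  # ensure the daemon is up (no-op if already running)
prediction = service.predict(data)
```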
While this ZenML Project trains and deploys a model locally, other ZenML integrations, such as the Seldon deployer, can also be used in a similar manner to deploy the model in a more production-grade setting (such as on a Kubernetes cluster). We use MLflow here for the convenience of its local deployment.
You can run two pipelines as follows:
- Training pipeline:

  ```bash
  python run_pipeline.py
  ```

- The continuous deployment pipeline:

  ```bash
  python run_deployment.py
  ```
There is a live demo of this project using Streamlit, which you can find here. It takes some input features for a customer and predicts whether that customer is likely to churn, using the latest trained models. If you want to run this Streamlit app on your local system, run the following command:

```bash
streamlit run streamlit_app.py
```
- When running the continuous deployment pipeline, I get an error stating `No Step found for the name mlflow_deployer`.

  Solution: This happens because your artifact store is overridden after running the continuous deployment pipeline, so you need to delete the artifact store and rerun the pipeline. You can get the location of the artifact store by running the following command:

  ```bash
  zenml artifact-store describe
  ```

  You can then delete the artifact store with the command below. Note: this is a dangerous, destructive command! Please enter your path carefully, otherwise it may delete other folders from your computer.

  ```bash
  rm -rf PATH
  ```
- When running the continuous deployment pipeline, I get the following error: `No Environment component with name mlflow is currently registered.`

  Solution: You forgot to install the MLflow integration in your ZenML environment, so install it by running the following command:

  ```bash
  zenml integration install mlflow -y
  ```
- When running the prediction model with the command `python run_deployment.py --config predict`, you will encounter the error `Server daemon is not working.`

  Solution: Run the following command:

  ```bash
  zenml disconnect
  ```