Skip to content

Developed an end-to-end machine learning solution for predicting employee churn using Azure Databricks, leveraging Spark for data processing, MLflow for managing the ML workflow, and deploying the model using Databricks model serving.

License

Notifications You must be signed in to change notification settings

Abhi0323/End-to-End-Employee-Churn-Prediction-with-Azure-Databricks

Repository files navigation

End-to-End-Employee-Churn-Prediction-with-Azure-Databricks

Introduction

This project addresses the critical issue of employee churn, which can lead to substantial costs and disruptions for organizations. By leveraging Azure Databricks, Spark, Scikit-Learn, MLflow, and Hugging Face Spaces, I developed a robust and scalable machine learning solution to predict employee churn and help organizations retain top talent.

ezgif com-animated-gif-maker (6)

Project Overview

The project consists of the following key stages:

1. Data Ingestion and Preparation:

  • Loaded and processed employee churn data using Azure Databricks
  • Leveraged Spark's powerful data processing capabilities to handle large datasets efficiently.

2. Initial Data Exploration with Spark SQL:

  • Created a temporary view of the data and ran SQL queries.
  • Gained insights into key metrics like average satisfaction level and total churn count.

3. Data Preprocessing with PySpark:

  • Handled missing values, encoded categorical variables using StringIndexer and OneHotEncoder.
  • Assembled feature vectors with VectorAssembler to prepare the data for machine learning.
Screenshot 2024-05-19 at 6 38 06 PM

4. Machine Learning Model Building with Scikit-Learn:

  • Converted the processed Spark DataFrame to a Pandas DataFrame for model building.
  • Experimented with various models including Logistic Regression, Decision Trees, Random Forests, and Gradient Boosting.
  • Performed hyperparameter tuning with GridSearchCV for the best model performance.
  • Evaluated models using accuracy, ROC-AUC score, and confusion matrix, with the Random Forest model showing the best results.

5. Model Management with MLflow:

  • Integrated MLflow to track experiments, log metrics, and manage model versions.
  • Registered the best model in the MLflow Model Registry for version control and easy deployment.
Screenshot 2024-05-19 at 5 50 00 PM Screenshot 2024-05-19 at 6 44 14 PM

6. Model Serving with Azure Databricks:

  • Seamlessly deployed the model using Azure Databricks' model serving capabilities.
  • Provided a real-time inference endpoint to integrate with HR systems for real-time churn predictions.
Screenshot 2024-05-19 at 5 52 34 PM

7. Application Deployment on Hugging Face Spaces:

  • Developed an interactive application using Streamlit, deployed on Hugging Face Spaces.
  • Allowed users to input employee data and receive churn predictions in real-time.
  • Demonstrated the model's scalability and low latency.
Screenshot 2024-05-19 at 6 32 31 PM

8. Performance Monitoring and Scalability Testing:

  • Monitored the model's performance using Azure Databricks' comprehensive monitoring tools.
  • Ensured the model could handle high traffic and deliver efficient predictions.
Screenshot 2024-05-19 at 6 34 50 PM

9. MLOps for Seamless Model Management and Deployment:

  • Employed robust MLOps practices with Azure Databricks and MLflow.
  • Streamlined the deployment process, eliminating the need for extensive operational infrastructure.
  • Enabled continuous model management, monitoring, and updating to maintain high performance.

Conclusion

This project demonstrates a comprehensive and efficient approach to predicting employee churn, highlighting the integration of advanced data analytics and machine learning tools. The final application provides valuable insights for organizations to retain their best-performing employees, showcasing the potential of MLOps in streamlining model deployment and management.

About

Developed an end-to-end machine learning solution for predicting employee churn using Azure Databricks, leveraging Spark for data processing, MLflow for managing the ML workflow, and deploying the model using Databricks model serving.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published