# Problem Statement

As someone aspiring to become a data scientist, I am working on a case study to address a critical challenge in the ride-sharing industry: recruiting and retaining drivers. Driver churn is a persistent issue, with many drivers leaving suddenly or switching to competitors based on changing rates.

In a growing organization, high churn can quickly become a serious problem. Widening the hiring criteria to include individuals without vehicles brings in new drivers, but at a high acquisition cost. Frequent turnover disrupts operations and hurts morale, and acquiring a new driver costs far more than retaining an existing one.

For this case study, I am working with monthly driver data from 2019 and 2020 to predict whether a driver is likely to leave the organization. The dataset includes key attributes such as:

- **Demographics**: City, age, gender, and education level.
- **Tenure Information**: Joining date and last working date.
- **Historical Performance Data**: Quarterly rating, monthly total business value, grade, and income.

The objective is to analyze patterns, build a predictive model, and gain insights into how driver attrition can be reduced. This exercise not only enhances my technical skills but also provides valuable experience in solving real-world business problems.
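One way to turn the monthly rows into a prediction target is to flag a driver as churned when any of their rows has a last working date. The sketch below is a minimal illustration on a toy frame, not the labelling used later in the analysis: the `toy` data and the `churned` name are made up, and only the `Driver_ID` and `LastWorkingDate` column names come from the dataset.

```python
import pandas as pd

# Toy rows in the same monthly layout as the dataset (illustrative values only).
toy = pd.DataFrame({
    "Driver_ID": [1, 1, 1, 2, 2],
    "LastWorkingDate": [None, None, "03/11/19", None, None],
})

# Assumption: a driver has churned if at least one monthly row
# records a last working date; otherwise they are still active.
churned = (
    toy["LastWorkingDate"].notna()
    .groupby(toy["Driver_ID"])
    .any()
    .astype(int)
    .rename("churned")
)
print(churned)
```

The result has one row per driver, which is the granularity a churn model would be trained on, while the raw data has one row per driver per month.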

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import zscore

# Observations on Data

In [3]:
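# Skip the first column of the file; read only the header first, since df does not exist yet.
first_col = pd.read_csv('data.csv', nrows=0).columns[0]
df = pd.read_csv('data.csv', usecols=lambda column: column != first_col)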
df

Unnamed: 0,MMM-YY,Driver_ID,Age,Gender,City,Education_Level,Income,Dateofjoining,LastWorkingDate,Joining Designation,Grade,Total Business Value,Quarterly Rating
0,01/01/19,1,28.0,0.0,C23,2,57387,24/12/18,,1,1,2381060,2
1,02/01/19,1,28.0,0.0,C23,2,57387,24/12/18,,1,1,-665480,2
2,03/01/19,1,28.0,0.0,C23,2,57387,24/12/18,03/11/19,1,1,0,2
3,11/01/20,2,31.0,0.0,C7,2,67016,11/06/20,,2,2,0,1
4,12/01/20,2,31.0,0.0,C7,2,67016,11/06/20,,2,2,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19099,08/01/20,2788,30.0,0.0,C27,2,70254,06/08/20,,2,2,740280,3
19100,09/01/20,2788,30.0,0.0,C27,2,70254,06/08/20,,2,2,448370,3
19101,10/01/20,2788,30.0,0.0,C27,2,70254,06/08/20,,2,2,0,2
19102,11/01/20,2788,30.0,0.0,C27,2,70254,06/08/20,,2,2,200420,2
