Classification model (RandomForestClassifier)

Target

Churn: Whether the customer churned or not (Yes or No)

Description of the features/columns

Two numerical columns:

MonthlyCharges: The amount charged to the customer monthly
TotalCharges: The total amount charged to the customer

Eighteen categorical columns:

Feature name : Description

CustomerID: Customer ID unique for each customer
gender: Whether the customer is male or female
SeniorCitizen: Whether the customer is a senior citizen or not (1, 0)
Partner: Whether the customer has a partner or not (Yes, No)
Dependents: Whether the customer has dependents or not (Yes, No)
Tenure: Number of months the customer has stayed with the company
PhoneService: Whether the customer has a phone service or not (Yes, No)
MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
InternetService: Customer’s internet service provider (DSL, Fiber optic, No)
OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service)
OnlineBackup: Whether the customer has an online backup or not (Yes, No, No internet service)
DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service)
TechSupport: Whether the customer has tech support or not (Yes, No, No internet service)
StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service)
StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet service)
Contract: The contract term of the customer (Month-to-month, One year, Two years)
PaperlessBilling: Whether the customer has paperless billing or not (Yes, No)
PaymentMethod: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))

Phase 1 (Exploratory Data Analysis)

Getting to know the dataset better

Shape
Null Values summary
Column datatypes, as well as a small amount of cleaning and preprocessing
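
The checks listed above could look roughly like the following pandas sketch. The helper name `summarize_dataset` and the tiny inline frame are illustrative assumptions, not code from the EDA files; in the project the frame would be loaded from `Telco_Customer_Churn.csv`. The cleaning step assumes `TotalCharges` may be read as text with blank entries, which should be verified against the real file.

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Collect the basic facts checked at the start of the EDA."""
    return {
        "shape": df.shape,                            # (rows, columns)
        "null_counts": df.isnull().sum().to_dict(),   # missing values per column
        "dtypes": df.dtypes.astype(str).to_dict(),    # datatype of each column
    }


# Tiny stand-in for the real file (hypothetical values).
sample = pd.DataFrame(
    {
        "MonthlyCharges": [29.85, 56.95, 53.85],
        "TotalCharges": ["29.85", " ", "108.15"],  # assumed: text with a blank entry
        "Churn": ["No", "No", "Yes"],
    }
)

# Light cleaning: turn TotalCharges into numbers; unparseable entries become NaN.
sample["TotalCharges"] = pd.to_numeric(sample["TotalCharges"], errors="coerce")

summary = summarize_dataset(sample)
```

Running the summary after the conversion makes the blank entry show up as a null value, which is easy to miss while the column is still stored as text.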

Why did customers churn (main question)?

FILE_TAG: (EDA_1) This is where we visualize the features and how they relate to our target. We also try to draw patterns/trends from these features, form some basic hypotheses about our main question, and suggest possible solutions to what we conclude is part of the problem.

FILE_TAG: (EDA_2) Here we look at several groups of customers:

People who left and people who stayed in the first 6 months
Loyal customers

We mainly apply the same metrics we applied in the EDA_1 file, and we also compare different features with our target.
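
A minimal sketch of how such a group comparison might be set up with pandas. The helper `churn_rate_by`, the sample rows, and the `LOYAL_MIN_TENURE` cutoff are hypothetical; how "loyal" is actually defined in EDA_2 is not stated here.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, feature: str) -> pd.Series:
    """Share of churned customers within each value of `feature`."""
    churned = df["Churn"].eq("Yes")
    return churned.groupby(df[feature]).mean()


# Hypothetical rows; the real data has many more columns.
customers = pd.DataFrame(
    {
        "Tenure": [1, 3, 5, 6, 30, 48, 60, 72],
        "Contract": [
            "Month-to-month", "Month-to-month", "Month-to-month", "One year",
            "One year", "Two years", "Two years", "Month-to-month",
        ],
        "Churn": ["Yes", "Yes", "No", "No", "No", "No", "No", "Yes"],
    }
)

LOYAL_MIN_TENURE = 24  # assumed cutoff in months, not taken from the project

first_six_months = customers[customers["Tenure"] <= 6]
loyal = customers[customers["Tenure"] >= LOYAL_MIN_TENURE]

# Churn rate per contract type inside each group.
early_rates = churn_rate_by(first_six_months, "Contract")
loyal_rates = churn_rate_by(loyal, "Contract")
```

The same helper can be pointed at any categorical column, which keeps the comparison between groups consistent from one feature to the next.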

Phase 2 (Training the model)

FILE_TAG: (Final_Train_Model)

Feature selection (Recursive Feature Elimination (RFE) and SelectKBest)
Sampling Techniques (SMOTE/RandomOverSampler)
Data Splitting and training
Model Evaluation (confusion_matrix, ROC_AUC_Curve)
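
The steps above can be sketched end to end with scikit-learn. This is an illustration under stated assumptions, not the code in Final_Train_Model: the data is synthetic, `k=6` and the other parameter values are arbitrary, and the oversampling is done by hand with `sklearn.utils.resample` as a stand-in for RandomOverSampler (SMOTE would generate synthetic minority rows instead of duplicating them). `sklearn.feature_selection.RFE` could be used in place of `SelectKBest`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, imbalanced stand-in for the encoded churn data (1 = churn).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    weights=[0.75, 0.25], random_state=0,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Feature selection: keep the k features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=6).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Random oversampling of the minority class, on the training split only.
majority = X_train_sel[y_train == 0]
minority = X_train_sel[y_train == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
X_bal = np.vstack([majority, minority_up])
y_bal = np.concatenate(
    [np.zeros(len(majority), dtype=int), np.ones(len(minority_up), dtype=int)]
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)

# Evaluation: confusion matrix on hard predictions, ROC AUC on churn probabilities.
y_pred = model.predict(X_test_sel)
cm = confusion_matrix(y_test, y_pred)  # rows = actual class, columns = predicted
auc = roc_auc_score(y_test, model.predict_proba(X_test_sel)[:, 1])
```

In this sketch the split happens before resampling so that duplicated rows cannot leak into the test set; the project may order these steps differently.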

Phase 2 Confusion matrix

Our evaluation was recall-based (fewer false negatives)

[What is a confusion matrix?] : https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5

Recall = out of all the customers who actually churned, how many the model predicted correctly

Precision = out of all the customers the model predicted would churn, how many actually churned

Recall and precision usually trade off: pushing one up tends to pull the other down

Why focus on lowering false negatives?

In our situation, we decided that lowering false negatives is more important than lowering false positives.

Positive class (1) = churn

Negative class (0) = no churn

False Negative = predicted negative (no churn), while the actual value was positive (churn).

In simpler terms, the model predicted that a certain customer would not churn, but they actually did.

This is bad in a business situation, because a customer who is about to leave goes unnoticed.

False Positive = predicted positive (churn), while the actual value was negative (no churn).

In simpler terms, the model predicted that a certain customer would churn, but they actually did not.

This is tolerable in a business situation and not as bad as the situation above.

A higher recall means fewer false negatives. Recall was our main evaluation metric, while we also aimed for a reasonable f1 score (the harmonic mean of precision and recall).
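
As a worked example of these metrics, the sketch below computes recall, precision and f1 from confusion-matrix counts, taking churn (1) as the class whose recall is measured. The counts are made up for illustration and are not results from this project.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual churners that the model caught."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted churners that really churned."""
    return tp / (tp + fp)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts: true positives, false negatives, false positives, true negatives.
tp, fn, fp, tn = 80, 20, 40, 260

r = recall(tp, fn)      # 80 / 100 = 0.8
p = precision(tp, fp)   # 80 / 120, about 0.667
f1 = f1_score(p, r)     # 160 / 220, about 0.727
```

Because recall only has false negatives in its denominator alongside the true positives, every missed churner lowers it directly, which is why it was picked as the main metric.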

Threshold optimization (OPTIONAL)

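You can increase recall even further by moving the decision threshold applied to the predicted probabilities so that more customers are flagged as churn (that is, lowering the threshold on the churn probability). This in turn increases recall significantly, but decreases precision as well. The result is an overall lower f1-score, but a higher recall.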
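
The effect of the threshold can be seen on a small made-up example. The sketch applies the threshold to the predicted probability of churn, so a high threshold on the no-churn probability corresponds to a low threshold here. The labels, probabilities and the two threshold values are hypothetical; with scikit-learn the probabilities would typically come from the model's `predict_proba`.

```python
def predict_churn(churn_probs, threshold):
    """Label a customer as churn (1) when the churn probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in churn_probs]


def recall_precision(y_true, y_pred):
    """Return (recall, precision) for the churn class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tp / (tp + fp)


# Hypothetical labels (1 = churn) and predicted churn probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
churn_probs = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.32, 0.2, 0.1, 0.05]

default_recall, default_precision = recall_precision(
    y_true, predict_churn(churn_probs, 0.5)
)
lowered_recall, lowered_precision = recall_precision(
    y_true, predict_churn(churn_probs, 0.3)
)
```

On this toy data, moving the threshold from 0.5 to 0.3 catches every churner but also flags more customers who would have stayed, which is the recall/precision trade described above.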

Phase 3 (Model Deployment)

** Under Construction **
