# Customer Churn Prediction Report

## Project Overview

This project aims to predict customer churn for a telecommunications company. Churn prediction means identifying customers who are likely to stop using the company's services. Machine learning models are used to predict churn from customer tenure, gender, monthly charges, total charges, and service usage patterns.

## Models Used

Several machine learning models were applied to predict customer churn:

1. **Logistic Regression**: A linear model appropriate for binary classification tasks.
2. **Support Vector Classifier (SVC)**: A classifier that finds the maximum-margin hyperplane separating the classes, optionally in a higher-dimensional space induced by a kernel.
3. **Decision Tree Classifier**: A non-linear model that splits data into branches based on feature values.
4. **K-Nearest Neighbors (KNN) Classifier**: A classifier that assigns each data point the majority class among its k nearest neighbors.
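The report does not name the library used; the sketch below assumes scikit-learn and compares the four models with stratified 5-fold cross-validation on F1 score. It runs on synthetic, imbalanced data from `make_classification` (a stand-in, not the project's dataset), and the hyperparameters shown are illustrative defaults, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data: about 25% positives (churners).
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.75, 0.25], random_state=42)

# Scale-sensitive models (logistic regression, SVC, KNN) get a StandardScaler in front.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Stratified folds keep the churn ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
```

Scoring on F1 rather than accuracy is a deliberate choice here, since accuracy can look high on imbalanced data even when few churners are caught.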

## Evaluation Metrics Used

The following metrics were used to evaluate the models:

1. **Precision**: The proportion of correctly predicted churns out of all predicted churns.
2. **Accuracy**: The proportion of correctly predicted instances out of all instances.
3. **Recall**: The proportion of actual churns that were correctly predicted.
4. **F1 Score**: The harmonic mean of precision and recall, combining both into a single performance score.
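To make the definitions concrete, this sketch counts the confusion-matrix cells for a hypothetical set of ten labels (1 = churned, 0 = stayed; illustrative values, not project results), computes each metric by hand, and checks the results against scikit-learn's metric functions.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 4 churners, 6 non-churners.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Confusion-matrix cells counted by hand.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 2
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 2
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 5

precision = tp / (tp + fp)                          # 2/3
recall = tp / (tp + fn)                             # 2/4 = 0.5
accuracy = (tp + tn) / len(y_true)                  # 7/10 = 0.7
f1 = 2 * precision * recall / (precision + recall)  # 4/7

# The hand-computed values match scikit-learn's implementations.
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
assert abs(accuracy - accuracy_score(y_true, y_pred)) < 1e-12
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f} f1={f1:.3f}")
```

Note that accuracy (0.7) is higher than recall (0.5) here: half of the churners were missed, which accuracy alone does not reveal.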

## Exploratory Data Analysis (EDA)


### Initial Insights

- The dataset includes customer age, tenure, gender, partner status, monthly charges, total charges, and service usage features.
- The target variable is `Churn`, indicating whether a customer has churned (1) or not (0).


### Data Cleaning

- The `TotalCharges` feature, initially an object data type, was converted to a float.
- Outliers were detected using the Interquartile Range (IQR) method.
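Both cleaning steps can be sketched with pandas as follows. The toy values are hypothetical; the assumption (not stated in the report) is that blank strings in `TotalCharges` are what kept the column as object dtype, so `errors="coerce"` converts them to `NaN`. The helper `iqr_outlier_mask` is a hypothetical name that flags values outside the usual 1.5 × IQR fences.

```python
import pandas as pd

# Hypothetical rows; the blank string prevents a plain float conversion.
df = pd.DataFrame(
    {"TotalCharges": ["29.85", "1889.5", " ", "108.15", "1840.75", "151.65", "820.5", "9999.0"]}
)

# Unparseable strings become NaN instead of raising an error.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN is never flagged."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


outliers = iqr_outlier_mask(df["TotalCharges"])
print(df[outliers])  # only the 9999.0 row lies above the upper fence
```

Whether flagged rows are then removed, capped, or kept is a separate decision; the mask only identifies them.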


### Feature Distribution

- **Numerical Features**: Analysis indicated that higher `TotalCharges` and longer `tenure` are linked to lower churn rates.
- **Categorical Features**: Churn rates were compared across various service-related features.
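One simple way to check the numerical pattern is to compare per-class medians. The sketch below uses a handful of hypothetical rows with `Churn` coded 1/0; lower medians of `tenure` and `TotalCharges` among churners would be consistent with the finding above.

```python
import pandas as pd

# Hypothetical rows; Churn is 1 for churned, 0 for retained.
df = pd.DataFrame({
    "tenure":       [1, 2, 5, 40, 60, 72, 3, 50],
    "TotalCharges": [70.0, 150.0, 400.0, 2800.0, 4500.0, 6000.0, 220.0, 3500.0],
    "Churn":        [1, 1, 1, 0, 0, 0, 1, 0],
})

# Median of each numerical feature within each churn class.
summary = df.groupby("Churn")[["tenure", "TotalCharges"]].median()
print(summary)
```

Medians are used instead of means because charge distributions tend to be skewed, and the median is less sensitive to the outliers discussed above.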


### Categorical Feature Insights

- **Gender**: Churn rates do not differ significantly between male and female customers, which suggests that gender is not a strong predictor of churn in this dataset.

- **Age**: Older customers are less likely to churn than younger customers. Age appears to be inversely related to churn, with younger customers having a higher probability of churning.

- **Partner and Dependents**: Customers without partners are more likely to churn than those with partners. Similarly, customers without dependents are more likely to churn than those with dependents.

- **Phone and Internet Services**: Phone-service customers with a single line are more likely to churn than those with multiple lines. Among internet-service customers, those on fibre optic are more likely to churn than those on DSL, and customers with no internet service are the least likely to churn of the three groups.

- **Billing and Payment Methods**: A simple analysis shows that customers using paperless billing are more likely to churn. By payment method, customers paying by electronic check are the most likely to churn, followed by those paying by mailed check; customers paying by bank transfer (automatic) and credit card (automatic) show no significant difference in churn rate between them.
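Comparisons like these reduce to a churn rate per category. A minimal pandas sketch, assuming a 0/1 `Churn` column and a hypothetical `InternetService` column with made-up rows: with the target coded 0/1, the group mean is exactly the churn rate.

```python
import pandas as pd

# Hypothetical rows; column name and category labels are assumptions.
df = pd.DataFrame({
    "InternetService": ["Fibre optic", "Fibre optic", "Fibre optic", "DSL", "DSL", "DSL", "No", "No"],
    "Churn":           [1, 1, 0, 1, 0, 0, 0, 0],
})

# Mean of a 0/1 target per group = fraction of customers who churned.
churn_rate = df.groupby("InternetService")["Churn"].mean().sort_values(ascending=False)
print(churn_rate)
```

The same one-liner applies to gender, partner, dependents, and payment method by swapping the grouping column. Judging whether a difference is *significant* would need a statistical test on top of these rates.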
                                  

### Target Variable Distribution

- The dataset showed a significant imbalance, with many more non-churned customers than churned ones.
- To address this, techniques such as resampling, gathering more data, and using precision and recall as evaluation metrics were considered.
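Of the resampling options, random oversampling of the minority class is one of the simplest. The sketch below (synthetic data, hypothetical column values) uses scikit-learn's `resample` utility and splits *before* resampling, so duplicated churners never appear in the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 customers, roughly 20% churners.
df = pd.DataFrame({
    "tenure": rng.integers(0, 72, 200),
    "Churn": (rng.random(200) < 0.2).astype(int),
})

# Split first so the test set keeps the real class ratio and no duplicated rows.
train, test = train_test_split(df, test_size=0.25, stratify=df["Churn"], random_state=0)

majority = train[train["Churn"] == 0]
minority = train[train["Churn"] == 1]

# Draw minority rows with replacement until both classes have the same size.
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)
train_balanced = pd.concat([majority, minority_up])
print(train_balanced["Churn"].value_counts())
```

A lighter-weight alternative is `class_weight="balanced"`, which scikit-learn's `LogisticRegression`, `SVC`, and `DecisionTreeClassifier` accept; `KNeighborsClassifier` has no such option.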


## Feature Engineering


### Data Transformation

- **Dropped customerID**: This unique identifier carries no predictive information, so it was removed before modeling.
- **One Hot Encoding**: Applied to categorical features to convert them into a numerical format suitable for machine learning.
- **Feature Scaling**: Standardized numerical features so that features on larger scales (such as `TotalCharges`) do not dominate distance- and margin-based models such as KNN and SVC.
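These three transformations can be combined in a single scikit-learn pipeline. A minimal sketch, assuming hypothetical rows and the column names shown (`MonthlyCharges` and `PaymentMethod` are assumed names): `customerID` is dropped, numerical columns are standardized, and categorical columns are one-hot encoded before a classifier.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical rows with assumed column names.
df = pd.DataFrame({
    "customerID": ["a1", "b2", "c3", "d4", "e5", "f6"],
    "gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "PaymentMethod": ["Electronic check", "Mailed check", "Credit card (automatic)",
                      "Electronic check", "Mailed check", "Electronic check"],
    "tenure": [1, 60, 24, 3, 70, 5],
    "MonthlyCharges": [70.7, 20.0, 55.3, 89.1, 25.4, 99.6],
    "Churn": [1, 0, 0, 1, 0, 1],
})

X = df.drop(columns=["customerID", "Churn"])  # the ID has no predictive value
y = df["Churn"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "PaymentMethod"]),
])

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])
model.fit(X, y)
print(model.predict(X))
```

In practice the pipeline should be fit on the training split only, so that the scaler's mean and standard deviation are not computed from test data.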


### Feature Selection

- Relevant features were selected based on their importance and correlation with the target variable.
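The report does not specify the selection procedure; one plausible version of the correlation part is to rank encoded features by absolute Pearson correlation with the 0/1 target. The sketch uses hypothetical, already-encoded rows (`PaperlessBilling_Yes` and `gender_Male` are assumed one-hot column names).

```python
import pandas as pd

# Hypothetical encoded features; one-hot columns are 0/1.
df = pd.DataFrame({
    "tenure":               [1, 60, 24, 3, 70, 5, 40, 2],
    "MonthlyCharges":       [70.7, 20.0, 55.3, 89.1, 25.4, 99.6, 60.0, 80.0],
    "PaperlessBilling_Yes": [1, 0, 0, 1, 0, 1, 1, 1],
    "gender_Male":          [1, 0, 0, 1, 0, 1, 1, 0],
    "Churn":                [1, 0, 0, 1, 0, 1, 0, 1],
})

# Absolute correlation of each feature with the target, strongest first.
ranking = df.drop(columns="Churn").corrwith(df["Churn"]).abs().sort_values(ascending=False)
print(ranking)
```

Correlation captures only linear, one-feature-at-a-time association; model-based importances (for example a fitted tree's `feature_importances_`) can complement it.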


## Model Evaluation


### Logistic Regression

- **Metrics**: Accuracy, Precision, Recall, and F1 Score.
- **Performance**: Served as a baseline for comparison with other models.


### Support Vector Classifier (SVC)

- **Metrics**: The same metrics as logistic regression.
- **Performance**: Generally effective with high-dimensional data, though training is computationally expensive on large datasets.


### Decision Tree Classifier

- **Metrics**: Accuracy, Precision, Recall, and F1 Score.
- **Performance**: Easily interpretable but can overfit.
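Overfitting shows up as a gap between training and test accuracy. This sketch, on synthetic data with 10% label noise (`flip_y=0.1`), compares an unrestricted tree with one limited to `max_depth=4`; the depth value is illustrative, not a tuned project setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with noisy labels, so an unrestricted tree can memorise noise.
X, y = make_classification(n_samples=1000, n_features=10, flip_y=0.1,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

results = {}
for depth in (None, 4):  # None grows the tree until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train acc={results[depth][0]:.3f}, test acc={results[depth][1]:.3f}")
```

The unrestricted tree reaches perfect training accuracy by memorising noisy labels; limiting depth (or `min_samples_leaf`) usually narrows the train/test gap.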


### K-Nearest Neighbors (KNN) Classifier

- **Metrics**: Same metrics as above.
- **Performance**: Depends on the choice of `k` and on feature scaling; the model performs well with appropriate tuning.
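Tuning `k` can be sketched with scikit-learn's `GridSearchCV`. The scaler sits inside the pipeline so it is refit on each training fold. The data is synthetic, and the candidate grid is an assumption, not the project's search space.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=8, weights=[0.75, 0.25], random_state=1)

# Scaling inside the pipeline avoids leaking validation-fold statistics.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": [3, 5, 7, 9, 11, 15, 21]}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Only odd values of `k` are tried, which avoids tied votes in a binary problem like churn.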


## Challenges Faced


1. **Imbalanced Dataset**: The notable imbalance between churned and non-churned customers posed a challenge. It was addressed by evaluating models with precision, recall, and F1 score in addition to accuracy, and through resampling techniques.
2. **Feature Scaling and Transformation**: Ensuring proper scaling and transformation of features for model compatibility was essential.
3. **Model Selection and Tuning**: Selecting the best model and fine-tuning hyperparameters required extensive experimentation and validation.

## Conclusion

This project successfully implemented several machine learning models to predict telecommunication customer churn. Despite challenges such as data imbalance and feature transformation, the models provided valuable insights and a solid foundation for further optimization. 