### Customer Churn Prediction
#### Project Overview
This project aims to predict customer churn using machine learning models. By identifying patterns in customer behavior, businesses can act early to retain at-risk customers and improve loyalty.

In [33]:
# Import the libraries used throughout the notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#### Data Loading

In [34]:
data=pd.read_csv("A:/10x/data-science-Project-vault/Data/Customertravel.csv")

In [35]:
data.head()

,Age,FrequentFlyer,AnnualIncomeClass,ServicesOpted,AccountSyncedToSocialMedia,BookedHotelOrNot,Target
0,34,No,Middle Income,6,No,Yes,0
1,34,Yes,Low Income,5,Yes,No,1
2,37,No,Middle Income,3,Yes,No,0
3,30,No,Middle Income,2,No,No,0
4,30,No,Low Income,1,No,No,0


In [39]:
data.shape

(954, 7)

#### Data Preprocessing
   - **Description**: Clean and prepare the dataset. Handle missing values, encode categorical variables, and normalize numerical features.
   - **Tools**: Pandas, Scikit-learn.

In [40]:
data.columns

Index(['Age', 'FrequentFlyer', 'AnnualIncomeClass', 'ServicesOpted',
       'AccountSyncedToSocialMedia', 'BookedHotelOrNot', 'Target'],
      dtype='object')

In [41]:
data.describe()

,Age,ServicesOpted,Target
count,954.0,954.0,954.0
mean,32.109015,2.437107,0.234801
std,3.337388,1.606233,0.424097
min,27.0,1.0,0.0
25%,30.0,1.0,0.0
50%,31.0,2.0,0.0
75%,35.0,4.0,0.0
max,38.0,6.0,1.0


In [42]:
# Count the missing values in each column
data.isnull().sum()

Age                           0
FrequentFlyer                 0
AnnualIncomeClass             0
ServicesOpted                 0
AccountSyncedToSocialMedia    0
BookedHotelOrNot              0
Target                        0
dtype: int64

In [43]:
# Show each column's data type and non-null count
data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 954 entries, 0 to 953
Data columns (total 7 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   Age                         954 non-null    int64 
 1   FrequentFlyer               954 non-null    object
 2   AnnualIncomeClass           954 non-null    object
 3   ServicesOpted               954 non-null    int64 
 4   AccountSyncedToSocialMedia  954 non-null    object
 5   BookedHotelOrNot            954 non-null    object
 6   Target                      954 non-null    int64 
dtypes: int64(3), object(4)
memory usage: 52.3+ KB


In [47]:
# Count the unique values in each column
unique_value_counts = {col: data[col].nunique() for col in data.columns}

# Print only the low-cardinality columns (2 or 3 distinct values)
for col, count in unique_value_counts.items():
    if 1 < count <= 3:
        print(f"Column '{col}' : {count} ")

Column 'FrequentFlyer' : 3 
Column 'AnnualIncomeClass' : 3 
Column 'AccountSyncedToSocialMedia' : 2 
Column 'BookedHotelOrNot' : 2 
Column 'Target' : 2 


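Excluding `Target`, which is the label, four low-cardinality columns can be encoded as categorical features: `FrequentFlyer`, `AnnualIncomeClass`, `AccountSyncedToSocialMedia`, and `BookedHotelOrNot`.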

In [31]:
# Label-encode the four categorical columns.
# LabelEncoder assigns integer codes in sorted (alphabetical) label order.
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
data["Encode_freq_flyer"] = encoder.fit_transform(data["FrequentFlyer"])
data["Encode_anual_income_class"] = encoder.fit_transform(data["AnnualIncomeClass"])
data["Encode_BookedHotelOrNot"] = encoder.fit_transform(data["BookedHotelOrNot"])
data["Encode_AccountSyncedToSocialMedia"] = encoder.fit_transform(data["AccountSyncedToSocialMedia"])
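Because `LabelEncoder` sorts labels alphabetically, the codes it assigns to `AnnualIncomeClass` do not follow the natural income order. If that order matters to the model, an explicit mapping is one option. The sketch below is illustrative only: the `sample` frame, the `IncomeOrdinal` column name, and the "High Income" label are assumptions (only "Low Income" and "Middle Income" appear in the rows printed in this notebook).

```python
import pandas as pd

# Hypothetical sample rows, not the real dataset.
sample = pd.DataFrame(
    {"AnnualIncomeClass": ["Middle Income", "Low Income", "High Income", "Low Income"]}
)

# Explicit low-to-high ordering; "High Income" is an assumed third label.
income_order = {"Low Income": 0, "Middle Income": 1, "High Income": 2}

# Map each label to its rank; labels missing from the dict would become NaN.
sample["IncomeOrdinal"] = sample["AnnualIncomeClass"].map(income_order)
print(sample)
```

A mapping like this keeps the encoded values monotonic in income, which can help linear models; tree-based models are usually less sensitive to the choice.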


In [32]:
data.head()

,Age,FrequentFlyer,AnnualIncomeClass,ServicesOpted,AccountSyncedToSocialMedia,BookedHotelOrNot,Target,Encode_freq_flyer,Encode_anual_income_class,Encode_BookedHotelOrNot,Encode_AccountSyncedToSocialMedia
0,34,No,Middle Income,6,No,Yes,0,0,2,1,0
1,34,Yes,Low Income,5,Yes,No,1,2,1,0,1
2,37,No,Middle Income,3,Yes,No,0,0,2,0,1
3,30,No,Middle Income,2,No,No,0,0,2,0,0
4,30,No,Low Income,1,No,No,0,0,1,0,0


#### Exploratory Data Analysis (EDA)
   - **Description**: Analyze the dataset to understand correlations and patterns. Use visualizations to gain insights into key factors affecting churn.
   - **Tools**: Matplotlib, Seaborn.
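As a starting point for this step, the sketch below computes a churn rate per category and the correlation of the numeric columns with the label. It assumes `Target == 1` marks a churned customer, and it uses a small hypothetical `sample` frame that mirrors the dataset's column names rather than the real data.

```python
import pandas as pd

# Hypothetical sample rows mirroring the dataset's columns (not the real data).
sample = pd.DataFrame({
    "Age": [34, 34, 37, 30, 30, 28],
    "FrequentFlyer": ["No", "Yes", "No", "No", "No", "Yes"],
    "ServicesOpted": [6, 5, 3, 2, 1, 4],
    "Target": [0, 1, 0, 0, 0, 1],  # assumed: 1 = churned
})

# The mean of a 0/1 label within a group is that group's churn rate.
churn_by_flyer = sample.groupby("FrequentFlyer")["Target"].mean()

# Pearson correlation of each numeric column with the label.
numeric_corr = sample[["Age", "ServicesOpted", "Target"]].corr()["Target"]

print(churn_by_flyer)
print(numeric_corr)
```

On the real data, `churn_by_flyer.plot.bar()` would turn the grouped rates into a bar chart with Matplotlib.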

#### Feature Engineering
   - **Description**: Create new features that can enhance model performance. Consider interaction terms or derived metrics that capture customer behavior more effectively.
   - **Tools**: Pandas.
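A minimal sketch of what an interaction term and a derived metric could look like with Pandas. The feature names `FlyerAndHotel` and `HighServiceUse`, the threshold of 4 services, and the `sample` rows are all hypothetical choices for illustration, not features the notebook defines.

```python
import pandas as pd

# Hypothetical sample rows mirroring the dataset's columns (not the real data).
sample = pd.DataFrame({
    "FrequentFlyer": ["No", "Yes", "No", "Yes"],
    "ServicesOpted": [6, 5, 3, 2],
    "BookedHotelOrNot": ["Yes", "No", "No", "Yes"],
})

is_flyer = sample["FrequentFlyer"].eq("Yes")
booked_hotel = sample["BookedHotelOrNot"].eq("Yes")

# Interaction term: 1 only when the customer is a frequent flyer AND booked a hotel.
sample["FlyerAndHotel"] = (is_flyer & booked_hotel).astype(int)

# Derived metric: flag heavy users of services (the cutoff of 4 is arbitrary).
sample["HighServiceUse"] = (sample["ServicesOpted"] >= 4).astype(int)

print(sample)
```

Whether such features help is an empirical question, so each candidate is worth checking against a validation score before keeping it.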

#### Model Training
   - **Description**: Split the data into training and testing sets. Train the chosen models on the training data.
   - **Tools**: Scikit-learn.
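The split-and-train step might look like the sketch below. The notebook does not name a model, so `LogisticRegression` is only a stand-in, and the features and labels are randomly generated placeholders that reuse the notebook's column names. The 80/20 split and `random_state` values are likewise assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Randomly generated placeholder data (not the real dataset).
rng = np.random.default_rng(0)
n_rows = 200
X = pd.DataFrame({
    "Age": rng.integers(27, 39, size=n_rows),
    "ServicesOpted": rng.integers(1, 7, size=n_rows),
    "Encode_freq_flyer": rng.integers(0, 3, size=n_rows),
})
y = pd.Series(rng.integers(0, 2, size=n_rows), name="Target")

# Hold out 20% for testing; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Stand-in model; max_iter is raised so the solver converges on unscaled features.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions[:10])
```

Stratifying matters for this dataset because the label is imbalanced (the `describe()` output shows a mean `Target` of about 0.23).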

#### Model Evaluation
   - **Description**: Evaluate model performance using metrics such as accuracy, precision, recall, and F1-score. Use cross-validation for robust assessment.
   - **Tools**: Scikit-learn.
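To make the metrics concrete, the sketch below scores a handful of hand-written predictions and then runs 5-fold cross-validation. The labels, the synthetic data from `make_classification`, and the `LogisticRegression` stand-in are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score

# Hand-written labels: 2 true positives, 1 false positive, 1 false negative, 4 true negatives.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
print(accuracy, precision, recall, f1)

# Cross-validation on synthetic data: one F1 score per fold.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="f1")
print(cv_scores.mean())
```

With roughly a quarter of customers in the positive class, accuracy alone can look good for a model that rarely predicts churn, which is why precision, recall, and F1 are reported alongside it.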

#### Hyperparameter Tuning
   - **Description**: Optimize model parameters to improve accuracy and generalization.
   - **Tools**: Scikit-learn's GridSearchCV or RandomizedSearchCV.
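A grid search over a single regularization parameter is sketched below. The model (`LogisticRegression`), the candidate values for `C`, the F1 scoring choice, and the synthetic data are assumptions; the notebook does not specify which model or parameters it tunes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic placeholder data (not the real dataset).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Candidate values for the inverse regularization strength C (arbitrary grid).
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Each candidate is scored with 3-fold cross-validation on F1.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

`RandomizedSearchCV` takes the same estimator and scoring arguments but samples a fixed number of candidates, which tends to scale better when the grid has many parameters.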

#### Conclusion and Insights
   - **Description**: Summarize the findings and suggest actionable insights for reducing churn based on model predictions.