# **Project Name**    - EDA Capstone Project



##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual
##### **NAME**            - PRODDUTURI SAI KUMAR


# **Project Summary -**

The goal of this project is to predict customer churn for a telecommunications company. Customer churn refers to the rate at which customers leave a service. In this project, I will use a dataset containing customer demographics and service usage data to predict the likelihood of customer churn. The dataset includes information such as senior-citizen status, contract type, monthly charges, and tenure, among other variables. The objective is to understand the key factors influencing customer churn and to provide actionable insights that help the business reduce it.

To achieve this, I will perform Exploratory Data Analysis (EDA) to uncover patterns and relationships between features. Afterward, I will build machine learning models to classify whether a customer will churn or not. Different classification algorithms will be tested, including Logistic Regression, Decision Trees, and Random Forests. The performance of these models will be evaluated based on accuracy, precision, recall, and F1-score.

By the end of the project, actionable recommendations will be provided to the business, helping it understand what influences churn and how it can improve its customer retention strategies.
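The modeling plan described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's final pipeline: it uses a small synthetic dataset (hypothetical `tenure` and `monthly_charges` features with a made-up churn rule) in place of the real data, and compares the three classifiers on accuracy, precision, recall, and F1-score.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (assumption, for illustration only).
rng = np.random.default_rng(42)
n = 500
demo = pd.DataFrame({
    "tenure": rng.integers(0, 72, size=n),
    "monthly_charges": rng.uniform(20, 120, size=n),
})
# Made-up rule: short tenure and high charges raise the churn probability.
logit = -0.05 * demo["tenure"] + 0.03 * demo["monthly_charges"] - 1.0
demo["churn"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    demo[["tenure", "monthly_charges"]], demo["churn"],
    test_size=0.2, random_state=42, stratify=demo["churn"],
)

# Fit the scaler on the training split only to avoid leaking test statistics.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Collect the four metrics named in the summary for each model.
rows = []
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    })

results = pd.DataFrame(rows).set_index("model")
print(results.round(3))
```

On imbalanced churn data, accuracy alone can look good even when few churners are caught, which is why recall and F1-score are reported alongside it.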


GitHub Link: https://github.com/Saikumar778542/Labmentix

# **Problem Statement**


Customer churn is a critical problem for businesses, especially in the telecommunications industry. A high churn rate leads to revenue loss, increased marketing costs, and a negative impact on the brand. By predicting which customers are likely to churn, businesses can take proactive measures to retain valuable customers and improve overall profitability. In this project, the business objective is to build a predictive model that accurately identifies customers at risk of churn, allowing for targeted retention strategies.

### Import Libraries

In [None]:
# Import essential libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


### Dataset Loading


In [None]:
# Load the dataset
df = pd.read_csv('customer_churn.csv')


### Dataset First View

In [None]:
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count  
df.shape


### Dataset Information

In [None]:
# Dataset information 

df.info()


#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count 

df.duplicated().sum()


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count 

df.isnull().sum()

In [None]:
# Visualizing the missing values (highlighted cells mark nulls)
sns.heatmap(df.isnull(), cbar=False)

### What did you know about your dataset?

The dataset contains various features such as customer demographics, subscription plans, and usage information. There are missing values in some columns, and some columns may need transformation for better usability. There are also some duplicate rows that need to be handled.
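The checks above can be condensed into a single summary table. A minimal sketch on a tiny invented frame (the column names mirror the dataset, but the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Tiny invented frame standing in for the real dataset (assumption).
sample = pd.DataFrame({
    "Tenure": [1, 34, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 56.95, np.nan, 42.30],
    "Churn": ["No", "No", "No", "Yes", None],
})

# Per-column dtype, missing count, and missing share in percent.
quality = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "missing": sample.isnull().sum(),
    "missing_pct": (sample.isnull().mean() * 100).round(1),
})

# Rows that exactly repeat an earlier row.
n_duplicates = int(sample.duplicated().sum())

print(quality)
print(f"duplicate rows: {n_duplicates}")
```

Running the same three expressions on `df` gives the missing-value and duplicate counts for the real data.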

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns 

df.columns


In [None]:
# Dataset Describe 

df.describe()


### Variables Description

The dataset includes the following columns:

- CustomerID: Unique identifier for the customer.
- Gender: Gender of the customer (Male/Female).
- SeniorCitizen: Whether the customer is a senior citizen (1/0).
- Partner: Whether the customer has a partner (Yes/No).
- Dependents: Whether the customer has dependents (Yes/No).
- Tenure: Number of months the customer has been with the company.
- PhoneService: Whether the customer has phone service (Yes/No).
- MultipleLines: Whether the customer has multiple lines (Yes/No).
- InternetService: The type of internet service the customer has (DSL/Fiber optic/No).
- OnlineSecurity: Whether the customer has online security (Yes/No).
- OnlineBackup: Whether the customer has online backup (Yes/No).
- DeviceProtection: Whether the customer has device protection (Yes/No).
- TechSupport: Whether the customer has tech support (Yes/No).
- StreamingTV: Whether the customer has streaming TV (Yes/No).
- StreamingMovies: Whether the customer has streaming movies (Yes/No).
- Contract: Type of contract the customer has (Month-to-month/One year/Two year).
- PaperlessBilling: Whether the customer has paperless billing (Yes/No).
- PaymentMethod: Payment method used by the customer (Electronic check/Mailed check/Bank transfer/Credit card).
- MonthlyCharges: The monthly charge for the service.
- TotalCharges: Total charges incurred by the customer.
- Churn: Whether the customer has churned (Yes/No).

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

df.nunique()


## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
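# Write your code to make your dataset analysis ready.
# Remove exact duplicate rows.
df.drop_duplicates(inplace=True)
# Convert TotalCharges to numeric; entries that cannot be parsed become NaN.
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
# Fill missing numeric values with the column mean (numeric_only skips text columns).
df.fillna(df.mean(numeric_only=True), inplace=True)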


### What all manipulations have you done and insights you found?

I removed duplicate entries, converted incorrect data types (e.g., converting "TotalCharges" to numeric), and filled missing numeric values with the column mean. I also identified the columns with categorical values, which need to be encoded before they can be used for machine learning.
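To illustrate the type conversion and the encoding step, here is a minimal sketch on invented rows. It assumes that blank strings are what keeps `TotalCharges` from parsing as numeric, and it uses `pd.get_dummies` for one-hot encoding as one possible choice; the project may settle on a different encoding scheme.

```python
import pandas as pd

# Invented rows; the blank TotalCharges entry mimics an unparseable value.
raw = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "TotalCharges": ["29.85", "1889.5", " ", "108.15"],
    "Churn": ["Yes", "No", "No", "Yes"],
})

clean = raw.copy()
# errors='coerce' turns values that cannot be parsed into NaN.
clean["TotalCharges"] = pd.to_numeric(clean["TotalCharges"], errors="coerce")
n_coerced = int(clean["TotalCharges"].isna().sum())
# Impute with the column mean, matching the wrangling cell above.
clean["TotalCharges"] = clean["TotalCharges"].fillna(clean["TotalCharges"].mean())

# Map the binary target to 0/1 and one-hot encode the remaining categorical column.
clean["Churn"] = clean["Churn"].map({"No": 0, "Yes": 1})
encoded = pd.get_dummies(clean, columns=["Contract"], drop_first=True)

print(f"values coerced to NaN: {n_coerced}")
print(encoded.columns.tolist())
```

Counting the coerced values before imputing is a cheap way to see how many entries were silently replaced.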

## ***4. Data Visualization, Storytelling & Experimenting with charts: Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

sns.histplot(df['MonthlyCharges'], kde=True)


##### 1. Why did you pick the specific chart?

A histogram with a KDE overlay shows how monthly charges are distributed across customers.

##### 2. What is/are the insight(s) found from the chart?

Most customers are concentrated in the lower range of monthly charges, with a few customers paying significantly higher amounts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the business can target customers with higher charges for retention programs or identify factors leading to higher charges.

#### Chart - 2

In [None]:
# Chart - 2 visualization code 

sns.countplot(x='Gender', hue='Churn', data=df)


##### 1. Why did you pick the specific chart?

This count plot compares the number of churned and retained customers for male and female customers.

##### 2. What is/are the insight(s) found from the chart?

The churn rate is slightly higher among female customers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, targeted retention strategies for female customers could help lower churn rates.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
sns.countplot(x='Contract', hue='Churn', data=df)

##### 1. Why did you pick the specific chart?

This count plot compares the number of churned and retained customers for each contract type.

##### 2. What is/are the insight(s) found from the chart?

Customers with month-to-month contracts have a higher churn rate.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the business can target customers with month-to-month contracts with retention incentives to reduce churn.
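Because a count plot shows absolute counts, the churn rate per category is easier to compare when it is computed directly. A minimal sketch with invented rows (the helper name `churn_rate_by` is hypothetical, and the numbers are not taken from the real dataset):

```python
import pandas as pd


def churn_rate_by(frame: pd.DataFrame, column: str, target: str = "Churn") -> pd.Series:
    """Return the share of churned customers (target == 'Yes') per category."""
    churned = frame[target].eq("Yes")
    return churned.groupby(frame[column]).mean().sort_values(ascending=False)


# Invented rows for illustration only.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 2,
    "Churn": ["Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No"],
})

rates = churn_rate_by(toy, "Contract")
print(rates)
```

Calling `churn_rate_by(df, 'Contract')` or `churn_rate_by(df, 'Gender')` on the real data would put numbers behind the differences seen in the charts.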

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Correlation is only defined for numeric columns, so restrict to those.
corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')

##### 1. Why did you pick the specific chart?

A heatmap is a compact way to visualize the correlation between numerical variables.

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., a strong negative correlation between variable X and Y).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here (e.g., understanding distribution helps identify the most common values for inventory management).

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Correlation is only defined for numeric columns, so restrict to those.
corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')

##### 1. Why did you pick the specific chart?

Answer Here. (e.g., a heatmap is a great way to visualize the correlation between numerical variables).

##### 2. What is/are the insight(s) found from the chart?

Answer Here  (e.g., a strong negative correlation between variable X and Y).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here (e.g., understanding distribution helps identify the most common values for inventory management).

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# 'column_name' is a template placeholder; replace it with a numeric column of df.
sns.histplot(df['column_name'], kde=True)

##### 1. Why did you pick the specific chart?

Answer Here. (e.g., this histogram helps visualize the distribution of the variable).

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., the variable is normally distributed).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer here (e.g., understanding distribution helps identify the most common values for inventory management).

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# 'column_name' is a template placeholder; replace it with a numeric column of df.
sns.histplot(df['column_name'], kde=True)

##### 1. Why did you pick the specific chart?

Answer here (e.g., this histogram helps visualize the distribution of the variable).

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., the variable is normally distributed).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer here (e.g., understanding distribution helps identify the most common values for inventory management).

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# 'column_name' is a template placeholder; replace it with a numeric column of df.
sns.histplot(df['column_name'], kde=True)

##### 1. Why did you pick the specific chart?

Answer here (e.g., this histogram helps visualize the distribution of the variable).

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., the variable is normally distributed).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer here (e.g., understanding distribution helps identify the most common values for inventory management).

#### Chart - 9

In [None]:
# Chart - 9 visualization code
# 'column_name' is a template placeholder; replace it with a numeric column of df.
sns.histplot(df['column_name'], kde=True)

##### 1. Why did you pick the specific chart?

Answer here (e.g., this histogram helps visualize the distribution of the variable).

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., the variable is normally distributed).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer here (e.g., understanding distribution helps identify the most common values for inventory management).

#### Chart - 10

In [None]:
# Chart - 10 visualization code
# 'column_name' is a template placeholder; replace it with a numeric column of df.
sns.histplot(df['column_name'], kde=True)

##### 1. Why did you pick the specific chart?

Answer here (e.g., this histogram helps visualize the distribution of the variable).

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., the variable is normally distributed).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer here (e.g., understanding distribution helps identify the most common values for inventory management).

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# 'column_name' is a template placeholder; replace it with a numeric column of df.
sns.histplot(df['column_name'], kde=True)

##### 1. Why did you pick the specific chart?

Answer here (e.g., this histogram helps visualize the distribution of the variable).

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., the variable is normally distributed).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer here (e.g., understanding distribution helps identify the most common values for inventory management).

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# 'column_name' is a template placeholder; replace it with a numeric column of df.
sns.histplot(df['column_name'], kde=True)

##### 1. Why did you pick the specific chart?

Answer here (e.g., this histogram helps visualize the distribution of the variable).

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., the variable is normally distributed).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer here (e.g., understanding distribution helps identify the most common values for inventory management).

#### Chart - 13

In [None]:
# Chart - 13 visualization code
# 'column_name' is a template placeholder; replace it with a numeric column of df.
sns.histplot(df['column_name'], kde=True)

##### 1. Why did you pick the specific chart?

Answer here (e.g., this histogram helps visualize the distribution of the variable).

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., the variable is normally distributed).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer here (e.g., understanding distribution helps identify the most common values for inventory management).

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Heatmap of pairwise correlations between the numeric columns.
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')

##### 1. Why did you pick the specific chart?

A correlation heatmap summarizes the pairwise linear relationships between the numeric variables in a single view.

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., the variable is normally distributed).

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Pairwise scatter plots of the numeric columns, colored by churn status.
sns.pairplot(df, hue='Churn')

##### 1. Why did you pick the specific chart?

A pair plot shows the pairwise relationships between the numeric variables together with each variable's own distribution.

##### 2. What is/are the insight(s) found from the chart?

Answer here (e.g., the variable is normally distributed).

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Based on the EDA findings, the client should focus retention efforts on specific customer segments: offer retention incentives to customers on month-to-month contracts, who show a higher churn rate, and consider retention programs for customers with high monthly charges.

# **Conclusion**

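The charts above point to contract type as the clearest churn signal found so far: customers on month-to-month contracts have a higher churn rate. Monthly charges are concentrated in the lower range, with a smaller group of customers paying significantly more, and churn is slightly higher among female customers. These findings support the business objective by identifying segments for targeted retention. As a next step, the classification models named in the project summary (Logistic Regression, Decision Trees, and Random Forests) can be trained and compared on accuracy, precision, recall, and F1-score.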
