# Customer Churn Analysis
Context
The leading telecom company has a massive market share but one big problem: several rivals that are constantly trying to steal customers.  Because this company has been the market leader for so many years, there are not significant opportunities to grow with new customers.  Instead, company executives have decided to focus on their churn: the rate at which they lose customers.

They have two teams especially interested in this data: the marketing team and the customer service team.  Each team has its own reason for wanting the analysis. The marketing team wants to find out who the most likely people to churn are and create content that suits their interests.  The customer service team would like to proactively reach out to customers who are about to churn, and try to encourage them to stay.

They decide to hire you for two tasks:
1- Help them identify the types of customers who churn
2- Predict which of their current customers will churn next month

To do this, they offer you a file of 7,000 customers. Each row is a customer.  The Churn column will say Yes if the customer churned in the past month.  The data also offers demographic data and data on the services that each customer purchases.  Finally there is information on the payments those customers make.

# Deliverables - What is expected

# Week 1
A presentation explaining churn for the marketing team - with links to technical aspects of your work. Tell a story to the marketing team to help them understand the customers who churn and what the marketing team can do to prevent it.  Highlight the information with helpful visualizations.

1- How much is churn affecting the business? How big is churn compared to the existing customer base?

2- Explain churn by the categories below. Are there any factors that combine to be especially impactful?

a- Customer demographics like age and gender
b- Services used
c- Billing information

3- What services are typically purchased by customers who churned? Are any services especially helpful in retaining customers?

4- Bonus! How long will it take for the company to lose all its customers?  Which demographics will they lose first?


# Import all the libraries needed

In [3]:
# Import the libraries used for data handling and plotting
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sns

# Data Preprocessing #

In [2]:
# Read the customer file (it must be in the working directory)
df = pd.read_csv('datasets_13996_18858_WA_Fn-UseC_-Telco-Customer-Churn.csv')
df.head()

FileNotFoundError: [Errno 2] File datasets_13996_18858_WA_Fn-UseC_-Telco-Customer-Churn.csv does not exist: 'datasets_13996_18858_WA_Fn-UseC_-Telco-Customer-Churn.csv'

In [None]:
df.info()

In [None]:
df.nunique()

In [None]:
df.shape

In [None]:
df.isnull().sum(axis=0)
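
`isnull()` only counts true missing values, so a column that stores blanks as text (for example a single space) will report zero nulls. The sketch below, using a small made-up frame rather than the real file, shows one way to count such blanks and turn them into `NaN`. The helper name `blank_counts` and the toy values are assumptions for illustration only.

```python
import pandas as pd

def blank_counts(frame: pd.DataFrame) -> pd.Series:
    """Count cells per column that are empty or whitespace-only once read as text."""
    as_text = frame.astype(str)
    return as_text.apply(lambda col: col.str.strip().eq("").sum())

# Made-up rows: the second customer has a blank TotalCharges stored as " "
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Female"],
    "TotalCharges": ["29.85", " ", "108.15"],
})

nulls = toy.isnull().sum()    # reports no missing values
blanks = blank_counts(toy)    # finds the blank TotalCharges entry

# Coercing to numeric turns the blank into NaN, which dropna can then remove
cleaned = pd.to_numeric(toy["TotalCharges"], errors="coerce")
print(nulls, blanks, cleaned, sep="\n\n")
```

If the real file shows blanks in a numeric-looking column, coercing that column before calling `dropna` is one option worth considering.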

In [None]:
df.boxplot()

In [None]:
df.Churn

In [None]:
df.Churn.value_counts(normalize=True)

In [None]:
df.dtypes

In [None]:
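df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')  # blank or non-numeric entries become NaN
df.dropna(subset=['TotalCharges'], axis=0, inplace=True)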

# Data Analysis #

In [None]:
dum_churn = pd.get_dummies(df['Churn'])

In [None]:
dum_churn

# How much is churn affecting the business?

In [None]:
# How much is churn affecting the business
dum_churn['Yes'].value_counts().to_frame()

In [None]:
# As we can see, there are 1869 people who left the company

In [None]:
display(df.groupby(['Churn']).size())
colors = ["#BDFCC9","#FFDEAD"]
ax = (df['Churn'].value_counts()*100.0 /len(df))\
.plot.pie(autopct='%.1f%%', colors=colors, labels = ['No', 'Yes'],figsize =(4,4), fontsize = 12 )    
#ax.yaxis.set_major_formatter(mtick.PercentFormatter())
ax.set_ylabel('Churn',fontsize = 12)
ax.set_title('Percent of Churn', fontsize = 12)


In [None]:
# Churned customers make up 26.6 percent of the customer base
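
The count and the percentage above can also be put side by side in one small table. This is a minimal sketch on made-up data; the helper name `churn_summary` is hypothetical, and the column name `Churn` with `Yes`/`No` values is taken from the dataset description.

```python
import pandas as pd

def churn_summary(frame: pd.DataFrame, column: str = "Churn") -> pd.DataFrame:
    """Return the number and the percentage of customers per churn label."""
    counts = frame[column].value_counts()
    percent = frame[column].value_counts(normalize=True) * 100
    return pd.DataFrame({"customers": counts, "percent": percent.round(1)})

# Made-up sample: 2 churned customers out of 8
toy = pd.DataFrame({"Churn": ["Yes", "No", "No", "No", "Yes", "No", "No", "No"]})
summary = churn_summary(toy)
print(summary)
```

Calling `churn_summary(df)` on the real data should give the same figures as the two cells above, in a form that is easy to paste into a slide.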

In [None]:
df.describe()

# Customer demographics like age and gender

# Gender #

In [None]:
gender = df.groupby(['SeniorCitizen','gender','Churn']).size().to_frame()
gender

In [None]:
ax = sns.countplot (x="gender",hue="Churn",data=df)
ax.set_title("Churn distribution by gender")

# SeniorCitizen #

In [None]:
ax = sns.countplot (x="SeniorCitizen",hue="Churn",data=df)
ax.set_title("Churn distribution by SeniorCitizen")

# Partner #

In [None]:
ax = sns.countplot (x="Partner",hue="Churn",data=df)
ax.set_title("Churn distribution by Partner")

# Dependents #

In [None]:
ax = sns.countplot (x="Dependents",hue="Churn",data=df)
ax.set_title("Churn distribution by Dependents")
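
The count plots show how many customers churned in each group, but groups of different sizes are hard to compare by eye. A churn rate per group makes them comparable. The sketch below uses `pd.crosstab` with `normalize="index"` on made-up data; the helper name `churn_rate_by` is hypothetical.

```python
import pandas as pd

def churn_rate_by(frame: pd.DataFrame, column: str, target: str = "Churn") -> pd.Series:
    """Percentage of customers with target == 'Yes' within each value of `column`."""
    table = pd.crosstab(frame[column], frame[target], normalize="index")
    return (table["Yes"] * 100).round(1)

# Made-up sample: 4 non-seniors (1 churned) and 2 seniors (1 churned)
toy = pd.DataFrame({
    "SeniorCitizen": [0, 0, 0, 0, 1, 1],
    "Churn": ["No", "No", "No", "Yes", "Yes", "No"],
})
rates = churn_rate_by(toy, "SeniorCitizen")
print(rates)
```

On the real data, the same call could be repeated for `gender`, `Partner` and `Dependents`.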

# The services used

# PhoneService #

In [None]:
ax = sns.countplot (x="PhoneService",hue="Churn",data=df)
ax.set_title("Churn distribution by PhoneService")

# InternetService

In [None]:
ax = sns.countplot (x="InternetService",hue="Churn",data=df)
ax.set_title("Churn distribution by InternetService")

# TechSupport

In [None]:
ax = sns.countplot (x="TechSupport",hue="Churn",data=df)
ax.set_title("Churn distribution by TechSupport")

# StreamingTV

In [None]:
ax = sns.countplot (x="StreamingTV",hue="Churn",data=df)
ax.set_title("Churn distribution by StreamingTV")

# StreamingMovies #

In [None]:
ax = sns.countplot (x="StreamingMovies",hue="Churn",data=df)
ax.set_title("Churn distribution by StreamingMovies")

# OnlineSecurity #

In [None]:
ax = sns.countplot (x="OnlineSecurity",hue="Churn",data=df)
ax.set_title("Churn distribution by OnlineSecurity")

# OnlineBackup #

In [None]:
ax = sns.countplot (x="OnlineBackup",hue="Churn",data=df)
ax.set_title("Churn distribution by OnlineBackup")

# MultipleLines #

In [None]:
ax = sns.countplot (x="MultipleLines",hue="Churn",data=df)
ax.set_title("Churn distribution by MultipleLines")

# DeviceProtection

In [None]:
ax = sns.countplot (x="DeviceProtection",hue="Churn",data=df)
ax.set_title("Churn distribution by DeviceProtection")
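
To see which services might help retain customers, one option is to compare the churn rate of customers who have a service with the rate of those who do not. The sketch below does this on made-up data. The helper name `retention_gap` is hypothetical, and it treats every value other than `Yes` (including `No internet service`) as not having the service, which is an assumption.

```python
import pandas as pd

def retention_gap(frame: pd.DataFrame, service_columns, target: str = "Churn") -> pd.DataFrame:
    """Churn rate (%) with and without each service, sorted by the difference."""
    churned = frame[target].eq("Yes")
    rows = []
    for col in service_columns:
        has_service = frame[col].eq("Yes")
        rows.append({
            "service": col,
            "churn_with": churned[has_service].mean() * 100,
            "churn_without": churned[~has_service].mean() * 100,
        })
    out = pd.DataFrame(rows).set_index("service")
    # A large positive gap means customers without the service churn more often
    out["gap"] = out["churn_without"] - out["churn_with"]
    return out.sort_values("gap", ascending=False)

# Made-up sample of four customers
toy = pd.DataFrame({
    "TechSupport": ["Yes", "Yes", "No", "No"],
    "StreamingTV": ["Yes", "No", "Yes", "No"],
    "Churn": ["No", "No", "Yes", "Yes"],
})
gaps = retention_gap(toy, ["TechSupport", "StreamingTV"])
print(gaps)
```

A gap like this is only an association: customers who buy a service may differ in other ways, so it suggests where to look rather than proving that the service keeps customers.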

# Billing information


# Contract

In [None]:
ax = sns.countplot (x="Contract",hue="Churn",data=df)
ax.set_title("Churn distribution by Contract")

# PaperlessBilling

In [None]:
ax = sns.countplot (x="PaperlessBilling",hue="Churn",data=df)
ax.set_title("Churn distribution by PaperlessBilling")

# PaymentMethod

In [None]:
ax = sns.countplot (x="PaymentMethod",hue="Churn",data=df)
ax.set_title("Churn distribution by PaymentMethod")
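
The deliverables also ask whether factors combine to be especially impactful. One simple way to look at this is a grid of churn rates for two factors at once. The sketch below uses made-up data; the helper name `combined_churn_rate` is hypothetical, and the choice of `Contract` and `InternetService` is only an example pair.

```python
import pandas as pd

def combined_churn_rate(frame: pd.DataFrame, row_factor: str, col_factor: str,
                        target: str = "Churn") -> pd.DataFrame:
    """Churn rate (%) for every combination of two categorical columns."""
    churned = frame[target].eq("Yes")
    rates = churned.groupby([frame[row_factor], frame[col_factor]]).mean()
    return (rates.unstack() * 100).round(1)

# Made-up sample: two customers in each Contract x InternetService cell
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["Two year"] * 4,
    "InternetService": ["Fiber optic", "Fiber optic", "DSL", "DSL"] * 2,
    "Churn": ["Yes", "Yes", "Yes", "No", "No", "Yes", "No", "No"],
})
grid = combined_churn_rate(toy, "Contract", "InternetService")
print(grid)
```

The resulting grid can be passed to `sns.heatmap(grid, annot=True)` to make the strongest combinations stand out in a presentation.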

# What services are typically purchased by customers who churned?

In [None]:
# Encode churn as 1/0 in a separate column so the Yes/No labels stay available for the cells below
df['ChurnFlag'] = df['Churn'].map({'Yes': 1, 'No': 0})

In [None]:
cols =df.columns
cols = list(cols)
display(cols)

In [None]:
# Average of the numeric columns for churned and retained customers
service_a = df.groupby('Churn').mean(numeric_only=True)
service_a

In [None]:
value = df[df.Churn=='Yes']
value

In [None]:
service = value.groupby(['PhoneService','gender']).size().to_frame()
service

In [None]:
# Share of churned customers with and without phone service
sizes = value['PhoneService'].value_counts(sort=True)
colors = ["#BDFCC9","#FFDEAD"]
explode = (0.1,0.1)
labels = sizes.index  # take the labels from the counts so each slice is named correctly
# Plot
plt.pie(sizes,colors=colors,labels=labels,explode=explode,autopct='%1.1f%%',startangle=270)
plt.title('PhoneService among churned customers')
plt.show()

In [None]:
ax = sns.countplot(x="InternetService",data=value)
ax.set_title("Churn distribution by InternetService")


In [None]:
ax = sns.countplot(x="MultipleLines",data=value)
ax.set_title("Churn distribution by MultipleLines")
# Customers who do not use the phone service account for fewer of the churned customers

In [None]:
ax = sns.countplot(x="OnlineSecurity",data=value)
ax.set_title("Churn distribution by OnlineSecurity")

In [None]:
ax = sns.countplot(x="DeviceProtection",data=value)
ax.set_title("Churn distribution by DeviceProtection")

In [None]:
ax = sns.countplot(x="TechSupport",data=value)
ax.set_title("Churn distribution by TechSupport")

In [None]:
ax = sns.countplot(x="StreamingTV",data=value)
ax.set_title("Churn distribution by StreamingTV")

In [None]:
ax = sns.countplot(x="StreamingMovies",data=value,)
ax.set_title("Churn distribution by StreamingMovies")

In [None]:
phone_service = value.groupby(['PhoneService']).size().to_frame()
phone_service
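
For the bonus question in the deliverables (how long until the company loses all its customers), a rough estimate is possible under strong assumptions: the monthly churn rate stays constant and no new customers join. The customer base then shrinks as N0 * (1 - r)^t, and it reaches one customer or fewer after about ln(N0) / -ln(1 - r) months. The function name below is hypothetical and the model is a simplification, not a forecast.

```python
import math

def months_until_no_customers(customers: int, monthly_churn_rate: float) -> int:
    """Months until a base shrinking by a fixed monthly rate reaches one customer or fewer."""
    if not 0 < monthly_churn_rate < 1:
        raise ValueError("monthly_churn_rate must be strictly between 0 and 1")
    if customers <= 1:
        return 0
    # Solve customers * (1 - rate) ** t <= 1 for t and round up to a whole month
    months = math.log(customers) / -math.log(1 - monthly_churn_rate)
    return math.ceil(months)

# Illustrative inputs only: 1000 customers losing half of the base every month
example_months = months_until_no_customers(1000, 0.5)
print(example_months)
```

With the real data, the inputs would be `len(df)` and the churn share computed earlier in the notebook. Applying the function to each demographic group separately would indicate which groups are likely to run out first.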
