# Customer Churn Analysis

The leading telecom company has a massive market share but one big problem: several rivals that are constantly trying to steal customers.  Because this company has been the market leader for so many years, there are not significant opportunities to grow with new customers.  Instead, company executives have decided to focus on their churn: the rate at which they lose customers.

They have two teams especially interested in this data: the marketing team and the customer service team.  Each team has its own reason for wanting the analysis. The marketing team wants to find out who the most likely people to churn are and create content that suits their interests.  The customer service team would like to proactively reach out to customers who are about to churn, and try to encourage them to stay.

They decide to hire you for two tasks:
- Help them identify the types of customers who churn
- Predict which of their current customers will churn next month

To do this, they offer you a file of 7,000 customers. Each row is a customer.  The Churn column will say Yes if the customer churned in the past month.  The data also offers demographic data and data on the services that each customer purchases.  Finally there is information on the payments those customers make.

# Deliverables - What is expected
### Week 1


A presentation explaining churn for the marketing team - with links to technical aspects of your work. Tell a story to the marketing team to help them understand the customers who churn and what the marketing team can do to prevent it.  Highlight the information with helpful visualizations.
- 1- How much is churn affecting the business? 
  How big is churn compared to the existing customer base?
- 2- Explain churn by the below categories. Are there any factors that combine to be especially impactful?
     A- Customer demographics like age and gender
     B- Services used 
     C- Billing information
- 3- What services are typically purchased by customers who churned? Are any services especially helpful in retaining customers?
- 4- Bonus! How long will it take for the company to lose all its customers?  Which demographics will they lose first?

# Data Preprocessing

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression

In [None]:
url='datase_Churn/datasets_13996_18858_WA_Fn-UseC_-Telco-Customer-Churn (1).csv'
df = pd.read_csv(url)
df.head()

### 1- How much is churn affecting the business? How big is churn compared to the existing customer base?

In [None]:
display(df.groupby(['Churn']).size())

# Share of customers in each Churn class, in percent.
# The slice labels come from the value_counts index, so they always match the data.
ax = (df['Churn'].value_counts(normalize=True) * 100.0)\
.plot.pie(autopct='%.1f%%', figsize=(5, 5), fontsize=12)
ax.set_ylabel('Churn', fontsize=12)
ax.set_title('Percent of Churn', fontsize=12)


# How much is churn affecting the business?
Answer: Churn cost the company 1,869 customers in the past month, out of the 7,043 customers in the file.

## How big is churn compared to the existing customer base?
Answer: About 26 percent of customers have unsubscribed, so the company has lost a large portion of its customer base.
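
As a minimal sketch, both answers above can be read off the `Churn` column in one small table. The `toy` frame is made up for illustration and stands in for `df`; `churn_summary` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd

def churn_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the number and percentage of customers for each Churn value."""
    counts = frame['Churn'].value_counts()
    percent = (counts / len(frame) * 100).round(2)
    return pd.DataFrame({'customers': counts, 'percent': percent})

# Made-up stand-in for df: three customers stayed, one churned
toy = pd.DataFrame({'Churn': ['No', 'No', 'Yes', 'No']})
summary = churn_summary(toy)
print(summary)
```

Calling `churn_summary(df)` on the real data should give the customer count and the churn percentage side by side.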

### 2- Explain churn by the below categories. Are there any factors that combine to be especially impactful?
* A- Customer demographics like age and gender 
* B- Services used 
* C- Billing information

### A- Customer demographics like age and gender

In [None]:
df.groupby(['SeniorCitizen','gender','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
df.groupby(['gender','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

## Female customers are more likely to churn vs. male customers, but the difference is minimal

In [None]:
ax = sns.countplot(x="gender", hue="Churn", data = df)
##ax.set_title("trans")

# A- Customer demographics like age and gender
Among the customers who churned, slightly more are women than men.

In [None]:
df.groupby([ "SeniorCitizen", "Churn",]).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
ax = sns.countplot(x="SeniorCitizen", hue="Churn", data = df)
#ax.set_title("trans")
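
Raw counts can hide a difference when the groups are of unequal size, so a churn rate per group is often easier to compare. The sketch below assumes `Churn` holds the strings `Yes` and `No`; `churn_rate_by` is a hypothetical helper and `toy` is made-up data standing in for `df`.

```python
import pandas as pd

def churn_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of customers in each group of `column` who churned."""
    churned = frame['Churn'].eq('Yes')  # True for customers who left
    return (churned.groupby(frame[column]).mean() * 100).round(2)

# Made-up stand-in for df
toy = pd.DataFrame({
    'SeniorCitizen': [0, 0, 0, 0, 1, 1],
    'Churn':         ['No', 'No', 'No', 'Yes', 'Yes', 'No'],
})
rates = churn_rate_by(toy, 'SeniorCitizen')
print(rates)
```

The same call works for any categorical column, for example `churn_rate_by(df, 'gender')`.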

# A- Customer demographics like age and gender
Answer:

### B- Services used

In [None]:
# Group by all the service columns to see which combinations have the most churn

df.groupby(['PhoneService','MultipleLines','InternetService','OnlineSecurity','StreamingMovies','StreamingTV','TechSupport','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
ax = sns.countplot(x="MultipleLines", hue="Churn", data = df)

In [None]:
ax = sns.countplot(x="InternetService", hue="Churn", data = df)

In [None]:
ax = sns.countplot(x="OnlineSecurity", hue="Churn", data = df)

In [None]:
ax = sns.countplot(x="StreamingMovies", hue="Churn", data = df)

In [None]:
ax = sns.countplot(x="StreamingTV", hue="Churn", data = df)
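
To look at factors that combine, one option is a grid of churn rates for every pair of values from two columns. This is a sketch under assumptions: `churn_rate_grid` is a hypothetical helper, and `toy` is made-up data standing in for `df`.

```python
import pandas as pd

def churn_rate_grid(frame: pd.DataFrame, rows: str, cols: str) -> pd.DataFrame:
    """Churn rate (%) for every combination of two categorical columns."""
    data = frame.assign(churned=frame['Churn'].eq('Yes').astype(float))
    grid = data.pivot_table(index=rows, columns=cols,
                            values='churned', aggfunc='mean')
    return grid * 100

# Made-up stand-in for df
toy = pd.DataFrame({
    'InternetService': ['DSL', 'DSL', 'Fiber optic', 'Fiber optic', 'Fiber optic', 'DSL'],
    'OnlineSecurity':  ['Yes', 'No', 'No', 'No', 'Yes', 'Yes'],
    'Churn':           ['No', 'Yes', 'Yes', 'No', 'No', 'No'],
})
grid = churn_rate_grid(toy, 'InternetService', 'OnlineSecurity')
print(grid)
```

The resulting frame can be passed to `sns.heatmap(grid, annot=True)` to highlight the combinations with the highest churn rate.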

### C-Billing information

In [None]:
df.groupby(['PaperlessBilling','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:

df.groupby(['PaymentMethod','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
df.groupby(['MonthlyCharges','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()
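
Grouping by the raw `MonthlyCharges` value gives one row per distinct price, which is hard to read. A possible alternative is to bin the charges first and compare churn rates per bin. In this sketch the bin edges, the helper name `churn_rate_by_bins`, and the `toy` data are all assumptions chosen for illustration.

```python
import pandas as pd

def churn_rate_by_bins(frame: pd.DataFrame, column: str, edges: list) -> pd.DataFrame:
    """Customer count and churn rate (%) within each interval of a numeric column."""
    bins = pd.cut(frame[column], bins=edges)   # intervals are right-closed: (low, high]
    churned = frame['Churn'].eq('Yes')
    grouped = churned.groupby(bins, observed=False)
    return pd.DataFrame({'customers': grouped.size(),
                         'churn_pct': grouped.mean() * 100})

# Made-up stand-in for df
toy = pd.DataFrame({
    'MonthlyCharges': [20.0, 25.0, 70.0, 80.0, 95.0, 99.0],
    'Churn':          ['No', 'No', 'No', 'Yes', 'Yes', 'Yes'],
})
table = churn_rate_by_bins(toy, 'MonthlyCharges', edges=[0, 50, 90, 120])
print(table)
```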

In [None]:

plt.hist(df['PaperlessBilling'])

plt.title("PaperlessBilling", fontsize=10)
plt.ylabel('Number of customers')
plt.xlabel('PaperlessBilling')
plt.show()

In [None]:
ax = sns.countplot(x="PaperlessBilling", hue="Churn", data = df)

In [None]:
df['PaperlessBilling'].describe().to_frame()

In [None]:
plt.hist(df['PaymentMethod'])

plt.title("PaymentMethod", fontsize=10)
plt.ylabel('Number of customers')
plt.xlabel('PaymentMethod')
plt.show()


In [None]:
df['PaymentMethod'].describe().to_frame()

In [None]:
df['MonthlyCharges'].describe().to_frame()

In [None]:
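# TotalCharges may load as text (for example if some entries are blank), so convert it first;
# values that cannot be parsed become NaN and are dropped before plotting.
plt.hist(pd.to_numeric(df['TotalCharges'], errors='coerce').dropna())

plt.title("TotalCharges", fontsize=10)
plt.ylabel('Number of customers')
plt.xlabel('TotalCharges')
plt.show()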

In [None]:

plt.hist(df['MonthlyCharges'])

plt.title("MonthlyCharges", fontsize=10)
plt.ylabel('Number of customers')
plt.xlabel('MonthlyCharges')
plt.show()



In [None]:
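# TotalCharges may load as text, so convert it to numbers before summing
total_charges = pd.to_numeric(df['TotalCharges'], errors='coerce')
# Share of all charges billed to customers who stayed ('No') vs. churned ('Yes')
a = total_charges.groupby(df['Churn']).sum() / total_charges.sum()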


In [None]:
round(100 * a,2)

### 3- What services are typically purchased by customers who churned? Are any services especially helpful in retaining customers?

### The services most often purchased by customers who churned are PhoneService and InternetService


In [None]:
# Keep only the customers who churned
data_Churn = df[df.Churn== 'Yes']
data_Churn

In [None]:
ax = sns.countplot(x="StreamingTV", data = data_Churn)

In [None]:
df.groupby(['StreamingTV','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_Churn['StreamingTV'])

plt.title("StreamingTV", fontsize=10)
plt.ylabel('Number of churned customers')
plt.xlabel('StreamingTV')
plt.show()

### People using the StreamingTV service are more likely to unsubscribe

In [None]:
ax = sns.countplot(x="PhoneService", data = data_Churn)

In [None]:
df.groupby(['PhoneService','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_Churn['PhoneService'])

plt.title("PhoneService", fontsize=10)
plt.ylabel('Number of churned customers')
plt.xlabel('PhoneService')
plt.show()

In [None]:
ax = sns.countplot(x="MultipleLines", data = data_Churn)

In [None]:
df.groupby(['MultipleLines','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_Churn['MultipleLines'])

plt.title("MultipleLines", fontsize=10)
plt.ylabel('Number of churned customers')
plt.xlabel('MultipleLines')
plt.show()

In [None]:
ax = sns.countplot(x="InternetService", data = data_Churn)

In [None]:
df.groupby(['InternetService','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_Churn['InternetService'])

plt.title("InternetService", fontsize=10)
plt.ylabel('Number of churned customers')
plt.xlabel('InternetService')
plt.show()

In [None]:
ax = sns.countplot(x="OnlineSecurity", data = data_Churn)

In [None]:
df.groupby(['OnlineSecurity','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_Churn['OnlineSecurity'])

plt.title("OnlineSecurity", fontsize=10)
plt.ylabel('Number of churned customers')
plt.xlabel('OnlineSecurity')
plt.show()

In [None]:
ax = sns.countplot(x="TechSupport", data = data_Churn)

In [None]:
df.groupby(['TechSupport','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_Churn['TechSupport'])

plt.title("TechSupport", fontsize=10)
plt.ylabel('Number of churned customers')
plt.xlabel('TechSupport')
plt.show()

In [None]:

# Keep only the customers who did not churn
data_no_Churn = df[df.Churn== 'No']
data_no_Churn

In [None]:
ax = sns.countplot(x="StreamingTV", data = data_no_Churn)

In [None]:
df.groupby(['StreamingTV','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_no_Churn['StreamingTV'])

plt.title("StreamingTV", fontsize=10)
plt.ylabel('Number of retained customers')
plt.xlabel('StreamingTV')
plt.show()

In [None]:
ax = sns.countplot(x="PhoneService", data = data_no_Churn)

In [None]:
df.groupby(['PhoneService','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_no_Churn['PhoneService'])

plt.title("PhoneService", fontsize=10)
plt.ylabel('Number of retained customers')
plt.xlabel('PhoneService')
plt.show()


In [None]:
ax = sns.countplot(x="MultipleLines", data = data_no_Churn)

In [None]:
df.groupby(['MultipleLines','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_no_Churn['MultipleLines'])

plt.title("MultipleLines", fontsize=10)
plt.ylabel('Number of retained customers')
plt.xlabel('MultipleLines')
plt.show()


In [None]:
ax = sns.countplot(x="InternetService", data = data_no_Churn)

In [None]:
df.groupby(['InternetService','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_no_Churn['InternetService'])

plt.title("InternetService", fontsize=10)
plt.ylabel('Number of retained customers')
plt.xlabel('InternetService')
plt.show()


In [None]:
ax = sns.countplot(x="OnlineSecurity", data = data_no_Churn)

In [None]:
df.groupby(['OnlineSecurity','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_no_Churn['OnlineSecurity'])

plt.title("OnlineSecurity", fontsize=10)
plt.ylabel('Number of retained customers')
plt.xlabel('OnlineSecurity')
plt.show()

In [None]:
ax = sns.countplot(x="TechSupport", data = data_no_Churn)

In [None]:
df.groupby(['TechSupport','Churn']).size().to_frame().rename(columns ={0: "size"}).reset_index()

In [None]:
plt.hist(data_no_Churn['TechSupport'])

plt.title("TechSupport", fontsize=10)
plt.ylabel('Number of retained customers')
plt.xlabel('TechSupport')
plt.show()
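
The per-service cells above repeat the same steps for each column. As a sketch of a more compact alternative, the loop below builds one table with the churn rate for every option of every service, which makes it easier to spot services associated with retention. `service_churn_table` is a hypothetical helper, and `toy` is made-up data standing in for `df`.

```python
import pandas as pd

# Service columns used in the cells above
SERVICE_COLUMNS = ['PhoneService', 'MultipleLines', 'InternetService',
                   'OnlineSecurity', 'StreamingTV', 'TechSupport']

def service_churn_table(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """One row per (service, option) with the customer count and churn rate in percent."""
    churned = frame['Churn'].eq('Yes')
    parts = []
    for col in columns:
        grouped = churned.groupby(frame[col])
        sizes = grouped.size()
        parts.append(pd.DataFrame({
            'service': col,
            'option': sizes.index,
            'customers': sizes.to_numpy(),
            'churn_pct': (grouped.mean() * 100).round(2).to_numpy(),
        }))
    return pd.concat(parts, ignore_index=True)

# Made-up stand-in for df with only two of the service columns
toy = pd.DataFrame({
    'TechSupport': ['Yes', 'Yes', 'No', 'No'],
    'StreamingTV': ['Yes', 'No', 'Yes', 'No'],
    'Churn':       ['No', 'No', 'Yes', 'No'],
})
table = service_churn_table(toy, ['TechSupport', 'StreamingTV'])
print(table)
```

On the real data the call would be `service_churn_table(df, SERVICE_COLUMNS)`; sorting the result by `churn_pct` lists the options with the lowest churn first.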

### 4- Bonus! How long will it take for the company to lose all its customers?  Which demographics will they lose first?

In [None]:
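quantity = 7043
Churn_rate = 0.2654   # share of customers lost per month
months = 0

In [None]:
# Apply the monthly churn rate until fewer than one customer remains.
# This assumes a constant churn rate and no new customers.
while quantity >= 1:
    quantity = quantity - (quantity * Churn_rate)
    months += 1
print(months)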
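
The same estimate can be sketched in closed form. With a constant churn rate r per period and N starting customers, N * (1 - r)^n customers remain after n periods, so the base drops below one customer once n exceeds ln(N) / -ln(1 - r). Both function names below are hypothetical; the loop version is included only to check the formula, and the constant rate with no new customers is a strong simplifying assumption.

```python
import math

def periods_until_gone(customers: float, churn_rate: float) -> int:
    """Smallest whole number of periods after which fewer than one customer remains."""
    if customers < 1:
        return 0
    # Solve customers * (1 - churn_rate)**n < 1 for n
    n = math.log(customers) / -math.log(1 - churn_rate)
    return math.floor(n) + 1

def periods_until_gone_loop(customers: float, churn_rate: float) -> int:
    """Same quantity computed by repeatedly applying the churn rate."""
    periods = 0
    while customers >= 1:
        customers -= customers * churn_rate
        periods += 1
    return periods

print(periods_until_gone(7043, 0.2654))       # closed form
print(periods_until_gone_loop(7043, 0.2654))  # step-by-step check
```

Passing a group's own customer count and churn rate to the same function gives a rough way to compare which groups would be lost first.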

In [None]:
df.dtypes

In [None]:
ax = sns.countplot(x="InternetService",data= data_Churn )
ax.set_title("Customer demographics InternetService")


In [None]:
df.describe()

In [None]:
# Box plot: summarizes each numeric column through its quartiles
df.boxplot()

### I visualize the linear correlations

In [None]:
df.corr(numeric_only=True)  # skip text columns, which recent pandas versions reject

In [None]:
fig, ax = plt.subplots(figsize=(12,12))         # Sample figsize in inches
cm_df = sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f", cmap="coolwarm", ax=ax)
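
Because `Churn` is stored as text, it is left out of the correlation matrix. One possible way to include it is to encode it as a 0/1 flag first. This is a sketch: `correlations_with_churn` and the `churn_flag` column are hypothetical names, `toy` is made-up data standing in for `df`, and the `numeric_only` argument assumes pandas 1.5 or later.

```python
import pandas as pd

def correlations_with_churn(frame: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each numeric column with a 0/1 churn flag."""
    data = frame.assign(churn_flag=frame['Churn'].eq('Yes').astype(int))
    corr = data.corr(numeric_only=True)['churn_flag']
    return corr.drop('churn_flag').sort_values()

# Made-up stand-in for df
toy = pd.DataFrame({
    'tenure':         [1, 2, 3, 40, 50, 60],
    'MonthlyCharges': [90.0, 85.0, 95.0, 30.0, 45.0, 35.0],
    'Churn':          ['Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
})
result = correlations_with_churn(toy)
print(result)
```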

In [None]:
df.shape

In [None]:
df.groupby('gender').Churn.head()


In [None]:
df.Churn.value_counts(normalize=True)

In [None]:
df.info()

In [None]:
# Number of customers in each Churn class
df['Churn'].value_counts()

In [None]:
df.head()

In [None]:
# Select the tenure and MonthlyCharges columns without overwriting df
df[['tenure', 'MonthlyCharges']]

In [None]:
df.sample(1)
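
The second task, predicting who will churn, is not carried out in this notebook, although `LogisticRegression` is imported at the top. The sketch below shows one way it might be approached using `tenure` and `MonthlyCharges` as features. It is an illustration under assumptions, not the notebook's method: the helper name `fit_churn_model` is hypothetical, `toy` and `new_customers` are made-up data, and there is no train/test split or evaluation.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ['tenure', 'MonthlyCharges']

def fit_churn_model(frame: pd.DataFrame) -> LogisticRegression:
    """Fit a logistic regression that predicts churn (1) vs. no churn (0)."""
    X = frame[FEATURES]
    y = frame['Churn'].eq('Yes').astype(int)
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model

# Made-up stand-in for df
toy = pd.DataFrame({
    'tenure':         [1, 2, 3, 5, 40, 50, 60, 70],
    'MonthlyCharges': [90.0, 85.0, 95.0, 80.0, 30.0, 40.0, 25.0, 35.0],
    'Churn':          ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'No'],
})
model = fit_churn_model(toy)

# Estimated churn probability for two hypothetical customers
new_customers = pd.DataFrame({'tenure': [2, 65], 'MonthlyCharges': [90.0, 30.0]})
churn_probability = model.predict_proba(new_customers)[:, 1]
print(churn_probability)
```

On the real data, holding out part of the rows for evaluation and adding the categorical service columns as encoded features would be natural next steps.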