#### Question: What is the relationship between customer churn and the services customers subscribe to?

##### Expectations:
Predicting customer churn based on the services they subscribe to can be very useful for a telecommunications company. By identifying which services are more strongly associated with churn, the company can take targeted actions to improve those services or offer incentives to encourage customers to keep them. This information can also help the company to develop more targeted marketing campaigns to promote the services that are less likely to lead to churn. Additionally, the insights gained from this analysis can be used to inform future product development and service offerings, helping the company to better meet the needs and preferences of its customers.


##### Information about the data:
The data is stored in an Excel file named `Telco_customer_churn_services.xlsx`. The file contains 7043 rows. Each row represents a customer, and each column holds a customer attribute described in the column metadata. The company provides 8 services:
1. Phone Service
2. Multiple Lines
3. Internet Service
4. Online Security
5. Online Backup
6. Device Protection Plan
7. Premium Tech Support
8. Unlimited Data

#### EDA:

In [None]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
import seaborn as sns
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

In [None]:
# Load the dataset from ../Dataset/Telco_customer_churn_services.xlsx
dataset1 = pd.read_excel('../Dataset/Telco_customer_churn_services.xlsx')

In [None]:
# The churn label is stored in a second Excel file, so load it to join with the services data

# Load the data from ../Dataset/Telco_customer_churn.xlsx
dataset2 = pd.read_excel('../Dataset/Telco_customer_churn.xlsx')

In [None]:
# Rename 'CustomerID' to 'Customer ID' so the join key matches the one in dataset1
dataset2.rename(columns={'CustomerID':'Customer ID'}, inplace=True)

In [None]:
# Join the two datasets on the column 'Customer ID'
dataset = pd.merge(dataset1, dataset2, on='Customer ID')
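
The two files share column names other than `Customer ID`, so `pd.merge` appends `_x` (left frame) and `_y` (right frame) to the overlapping ones. This is where names such as `Phone Service_x` come from. The sketch below shows the behaviour on two toy frames whose values are made up for illustration and are not taken from the dataset. The `validate='one_to_one'` argument is an optional safeguard, assumed here rather than used in this notebook, that raises an error if a customer appears more than once in either frame.

```python
import pandas as pd

# Toy stand-ins for the two Excel files (illustrative values only)
services_toy = pd.DataFrame({
    'Customer ID': ['A1', 'B2', 'C3'],
    'Phone Service': ['Yes', 'No', 'Yes'],
    'Unlimited Data': ['Yes', 'Yes', 'No'],
})
churn_toy = pd.DataFrame({
    'Customer ID': ['A1', 'B2', 'C3'],
    'Phone Service': ['Yes', 'No', 'Yes'],
    'Churn Value': [1, 0, 0],
})

# Overlapping non-key columns receive the _x / _y suffixes
merged = pd.merge(services_toy, churn_toy, on='Customer ID', validate='one_to_one')
print(merged.columns.tolist())
```

Columns that exist in only one frame, such as `Unlimited Data` or `Churn Value`, keep their original names.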

In [None]:
dataset.columns

In [None]:
my_columns = ['Phone Service_x', 'Internet Service_x', 'Multiple Lines_x',
              'Online Security_x', 'Online Backup_x', 'Device Protection Plan', 'Premium Tech Support', 'Unlimited Data', 'Churn Value', 'Total Revenue']

dataset = dataset[my_columns]

In [None]:
# Check for missing values
dataset.isnull().sum()

In [None]:
# Strip the _x suffix that the merge added to the overlapping column names
if 'Phone Service_x' in dataset.columns:
    dataset.rename(columns={'Phone Service_x':'Phone Service', 'Internet Service_x':'Internet Service', 'Multiple Lines_x':'Multiple Lines',
                        'Online Security_x':'Online Security', 'Online Backup_x':'Online Backup'}, inplace=True)

In [None]:
dataset.columns

In [None]:
# check the data types
dataset.dtypes

In [None]:
## turn the categorical variables into dummy variables
dataset = pd.get_dummies(dataset, drop_first=True)

# check the data types after encoding
dataset.dtypes
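
With `drop_first=True`, `pd.get_dummies` drops the first category of each text column, so a Yes/No column becomes a single `<name>_Yes` indicator, while numeric columns pass through unchanged. A minimal sketch on toy data (illustrative values only) shows the resulting column names. The `dtype=int` argument is an assumption added here so the indicators are 0/1 integers rather than booleans; the notebook cell above does not set it.

```python
import pandas as pd

# Toy frame with two Yes/No service columns and a numeric target (illustrative values only)
toy = pd.DataFrame({
    'Phone Service': ['Yes', 'No', 'Yes'],
    'Online Backup': ['No', 'No', 'Yes'],
    'Churn Value': [1, 0, 0],
})

# 'No' is the first category, so it is dropped and only the *_Yes indicator remains
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)
print(encoded.columns.tolist())
print(encoded)
```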

In [None]:
# Strip the _Yes suffix that get_dummies added to the encoded columns
if 'Phone Service_Yes' in dataset.columns:
    dataset.rename(columns={'Phone Service_Yes':'Phone Service', 'Internet Service_Yes':'Internet Service', 'Multiple Lines_Yes':'Multiple Lines',
                        'Online Security_Yes':'Online Security', 'Online Backup_Yes':'Online Backup','Device Protection Plan_Yes':'Device Protection Plan',
                        'Premium Tech Support_Yes':'Premium Tech Support','Unlimited Data_Yes':'Unlimited Data'}, inplace=True)

In [None]:
# check the head of the dataset
dataset.head()

In [None]:
# Box plots of total revenue by subscription status, one subplot per service in a single figure
services = ['Phone Service', 'Internet Service', 'Multiple Lines', 'Online Security',
            'Online Backup', 'Device Protection Plan', 'Premium Tech Support', 'Unlimited Data']

plt.figure(figsize=(20, 10))
for i, service in enumerate(services, start=1):
    plt.subplot(2, 4, i)
    sns.boxplot(x=service, y='Total Revenue', data=dataset)
plt.show()

# Print the median total revenue for each service, split by subscription status
for service in services:
    print(dataset.groupby(service)['Total Revenue'].median())

###### Initial Observations:

The online backup service generates the most revenue out of all the services. The phone service


In [None]:
# Count plots of each service split by churn value, one subplot per service in a single figure
services = ['Phone Service', 'Internet Service', 'Multiple Lines', 'Online Security',
            'Online Backup', 'Device Protection Plan', 'Premium Tech Support', 'Unlimited Data']

plt.figure(figsize=(20, 10))
for i, service in enumerate(services, start=1):
    plt.subplot(2, 4, i)
    sns.countplot(x=service, hue='Churn Value', data=dataset)
plt.show()
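
Count plots show absolute numbers, which can hide differences in churn rate when the subscriber and non-subscriber groups differ in size. One way to compare rates directly, sketched below, relies on `Churn Value` being 0/1, so its mean within a group is the share of customers who churned. The helper name `churn_rate_by_service` and the toy data are hypothetical and only illustrate the idea; they are not part of the original analysis.

```python
import pandas as pd

def churn_rate_by_service(df, services, target='Churn Value'):
    """Return the churn rate of non-subscribers and subscribers for each service."""
    rows = {}
    for service in services:
        # Cast to int so boolean and 0/1 indicator columns are handled the same way
        rates = df.groupby(df[service].astype(int))[target].mean().reindex([0, 1])
        rows[service] = {'without': rates.loc[0], 'with': rates.loc[1]}
    return pd.DataFrame.from_dict(rows, orient='index')

# Toy data (illustrative values only)
toy = pd.DataFrame({
    'Online Security': [1, 1, 0, 0, 0, 1],
    'Unlimited Data': [True, False, True, True, False, False],
    'Churn Value': [0, 0, 1, 1, 0, 1],
})

rates = churn_rate_by_service(toy, ['Online Security', 'Unlimited Data'])
print(rates)
```

On the notebook's data, the same call would take `dataset` and the list of eight service columns. A large gap between the `with` and `without` columns points to a service worth a closer look.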