---
format:
  html:
    embed-resources: true
    page-layout: full
    grid:
      body-width: 1200px
    fontsize: 18pt
  pdf: default
echo: false
warning: false
title: Telco Churn Analysis
jupyter: python3
---

## Overview

As the Alpha Consulting team, we have been tasked with investigating customer churn at Telco.

In this report we present the key findings from our analysis, give a high-level overview of the data, and discuss the main factors driving churn.

At the end of the report we recommend how the business can identify customers who are at risk of churning and reduce churn rates.

::: content-hidden
### Import necessary libraries & load data
:::

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

In [None]:
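# `__file__` is not defined in a notebook, and Path("__file__").parent is just ".",
# so the CSV is read relative to the working directory.
file = Path("telco-customer-churn.csv")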
df = pd.read_csv(file)

plt.style.use('seaborn-v0_8-talk')

## Initial Data Exploration

In [None]:
#| output: false
df.info()

In [None]:
#| fig-cap: Data Sample
#| column: page
from itables import show

show(df.head(50))

### High-level overview of the data

In [None]:
#| layout-ncol: 2
#| column: page
churn = df['Churn'].value_counts()
plt.title('Count of Customer Churn')
plt.bar(churn.index, churn.values)
plt.show()

pct_churn = df['Churn'].value_counts(normalize=True)
plt.pie(pct_churn, labels=pct_churn.index, autopct='%1.1f%%')
plt.show()

## Understanding the Variables That Cause Churn

### Numeric Features

In [None]:
#| column: body-outset-left
#| out-width: 100%
#| fig-align: center
numerical_features = ['tenure', 'MonthlyCharges', 'TotalCharges']
# Coerce TotalCharges once, up front; non-numeric entries become NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, feature in zip(axes, numerical_features):
    # Stacked histogram of the feature, split by churn status
    sns.histplot(data=df, x=feature, hue='Churn', multiple="stack", ax=ax)
    ax.set_title(f'Distribution of {feature}')
plt.tight_layout()
plt.show()

The histograms show that some churned customers have high tenure, high monthly charges and high total charges. This is interesting because we would expect customers with high tenure, or with low monthly and total charges, to be less likely to churn.
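To put numbers behind the histograms, the numeric features can be summarised separately for churned and retained customers. The block below is a minimal sketch, not part of the original analysis: the helper name `summarize_numeric_by_churn` is hypothetical, and it assumes the `Churn` column and the three numeric columns used above.

```python
import pandas as pd


def summarize_numeric_by_churn(data, features=("tenure", "MonthlyCharges", "TotalCharges")):
    """Return the median and mean of each numeric feature, split by Churn.

    Hypothetical helper for illustration; column names are assumed to match
    the dataframe used in this report.
    """
    # Coerce every feature to numeric so that blank strings become NaN
    numeric = data[list(features)].apply(pd.to_numeric, errors="coerce")
    # Group rows by churn status and aggregate; NaN values are skipped
    return numeric.groupby(data["Churn"]).agg(["median", "mean"])
```

Calling `summarize_numeric_by_churn(df)` on the report's dataframe would return one row per churn status and one (feature, statistic) column pair per feature, which makes it easier to check whether churned customers really differ in tenure or charges.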

------------------------------------------------------------------------

### Categorical Features

::::: {layout="[60, -1, 39]"}
::: {#first-column}

In [None]:
categorical_features = ['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'PhoneService', 'InternetService', 'Contract']
fig, axes = plt.subplots(7, 1, figsize=(8, 20))
axes = axes.flatten()

plt.rcParams.update({'font.size': 12})  # Increase base font size

for i, feature in enumerate(categorical_features):
    # Calculate percentages
    percentages = (df.groupby(feature)['Churn']
                    .value_counts(normalize=True)
                    .unstack()
                    .mul(100))
    
    # Create horizontal stacked bars
    percentages.plot(kind='barh', 
                    stacked=True,
                    ax=axes[i],
                    legend=False,
                    width=0.6)  # Bar thickness (pandas uses `width` for barh too)
    
    # Customize the plot
    axes[i].set_title(f'Churn Distribution by {feature}', fontsize=14, pad=-30)
    axes[i].set_ylabel(feature, fontsize=12)
    
    # Add percentage labels on the bars
    for c in axes[i].containers:
        axes[i].bar_label(c, fmt='%.1f%%', label_type='center', fontsize=11)
    
    # Remove x-axis percentage labels
    axes[i].set_xticks([])
    
    # Add border around the subplot
    for spine in axes[i].spines.values():
        spine.set_visible(True)
    
    # Make tick labels larger
    axes[i].tick_params(axis='both', which='major', labelsize=11)
    
    # Adjust plot to reduce white space
    axes[i].margins(y=0.15)  # Reduce vertical margins

# Remove empty subplots
for j in range(i+1, len(axes)):
    fig.delaxes(axes[j])

plt.tight_layout()
plt.show()

:::

::: {#second-column}
This chart highlights a few things:

- Thing 1
- Thing 2
- Thing 3
:::
:::::
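The percentages shown on the stacked bars can also be produced as a table, which is easier to quote in the text. The block below is a minimal sketch under stated assumptions: the helper name `churn_rate_by_category` is hypothetical, and `Churn` is assumed to be coded as `"Yes"`/`"No"`, as in the rest of this report.

```python
import pandas as pd


def churn_rate_by_category(data, feature):
    """Return the percentage of churned customers for each level of `feature`.

    Hypothetical helper for illustration; assumes Churn is coded "Yes"/"No".
    """
    # True where the customer churned; the mean of a boolean is the churn share
    churned = data["Churn"].eq("Yes")
    rates = churned.groupby(data[feature]).mean().mul(100).round(1)
    # Highest churn rate first
    return rates.sort_values(ascending=False)
```

For example, `churn_rate_by_category(df, 'Contract')` would list each contract type with its churn percentage, sorted from highest to lowest.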

### Some more detailed analysis

In [None]:
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='MonthlyCharges', y='TotalCharges', hue='InternetService')
plt.title('Monthly Charges vs. Total Charges by Internet Service Type')
plt.xlabel('Monthly Charges')
plt.ylabel('Total Charges')
plt.show()

In [None]:
#| output: false
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Contract', y='MonthlyCharges', hue='Contract')
plt.title('Scatter plot of Monthly Charges based on Contract type')
plt.xlabel('Contract Type')
plt.ylabel('Monthly Charges')
plt.show()

In [None]:
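def plot_churn_by_tenure(data, contract_type):
    # Work on a copy so the caller's filtered dataframe is left untouched
    data = data.copy()
    # Create 2-month bins; +2 so the last value is included
    bins = np.arange(0, data['tenure'].max() + 2, 2)
    # include_lowest=True keeps customers with a tenure of 0 in the first bin
    data['tenure_bin'] = pd.cut(data['tenure'], bins=bins, include_lowest=True)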
    
    # Calculate percentage of churned customers in each bin
    churn_by_tenure = (data.groupby('tenure_bin')['Churn']
                          .value_counts(normalize=True)
                          .unstack())
    
    plt.figure(figsize=(12, 8))
    churn_by_tenure['Yes'].multiply(100).plot(kind='bar')
    plt.title(f'Percentage of Churned Customers by Tenure Length\n{contract_type} Contracts')
    plt.xlabel('Tenure (months)')
    plt.ylabel('Churn Percentage')
    plt.axhline(y=50, color='r', linestyle='--', alpha=0.3)
    plt.grid(True, alpha=0.3)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
    
    # Print statistics
    # print(f"\nChurn percentage by tenure bins for {contract_type} contracts:")
    # print(churn_by_tenure['Yes'].multiply(100).round(1))

# Create three dataframes
monthly = df[df['Contract'] == 'Month-to-month']
one_year = df[df['Contract'] == 'One year']
two_year = df[df['Contract'] == 'Two year']

::: {.column-screen layout-ncol="3"}

In [None]:
plot_churn_by_tenure(monthly, 'Month-to-month')

In [None]:
plot_churn_by_tenure(one_year, 'One year')

In [None]:
plot_churn_by_tenure(two_year, 'Two year')  

:::