# Internship Program Data Science




## Project Title: Customer Churn Analysis and Prediction

### **Project Overview:**

The project aims to analyze customer churn in a
telecommunications company and develop
predictive models to identify at-risk customers. The
ultimate goal is to provide actionable insights and
recommendations to reduce churn and improve
customer retention.

# Task 1: Data Preparation

### 1. Data description:

In this task, we load the dataset and conduct an initial exploration, handle missing values, and, where necessary, convert categorical variables into numerical representations. Finally, we split the dataset into training and testing sets for subsequent model evaluation.


**Skills:**

- Data loading and data exploration
- Handling missing values
- Data preprocessing
- Categorical variable encoding
- Dataset splitting

### 2. Import necessary packages


In [0]:
# Load the required packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, confusion_matrix, classification_report)
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler


### 3. Data Loading

In [0]:
# Load the data
try:
    df = pd.read_csv('/Workspace/Users/meresagidey0938@gmail.com/SKS-Data-Science-internship/SKS Data Science internship/Telco_Customer_Churn_Dataset .csv')
    print("Data loaded successfully.")
    display(df.head())
except FileNotFoundError:
    print("Error: the path was not found. Please upload your data file or provide the correct path.")


#### 3.1. Data Columns Description (Telco Customer Churn Dataset)


| **Column Name**      | **Data Type**                           | **Description**                                                                                                        |
| -------------------- | --------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| **customerID**       | *object (string)*                       | Unique identifier assigned to each customer.                                                                           |
| **gender**           | *object (string)*                       | Customer's gender. Values: `"Male"`, `"Female"`.                                                                       |
| **SeniorCitizen**    | *int64 (0 or 1)*                        | Indicates whether the customer is a senior citizen. `1 = Yes`, `0 = No`.                                               |
| **Partner**          | *object (string)*                       | Indicates if the customer has a spouse/partner. `"Yes"` or `"No"`.                                                     |
| **Dependents**       | *object (string)*                       | Indicates if the customer has dependent family members. `"Yes"` or `"No"`.                                             |
| **tenure**           | *int64*                                 | Number of months the customer has stayed with the company.                                                             |
| **PhoneService**     | *object (string)*                       | Whether the customer has phone service. `"Yes"` or `"No"`.                                                             |
| **MultipleLines**    | *object (string)*                       | Whether the customer has multiple phone lines. `"Yes"`, `"No"`, or `"No phone service"`.                               |
| **InternetService**  | *object (string)*                       | Customer’s internet service provider. Values: `"DSL"`, `"Fiber optic"`, `"No"`.                                        |
| **OnlineSecurity**   | *object (string)*                       | Whether the customer has online security add-on. `"Yes"`, `"No"`, `"No internet service"`.                             |
| **OnlineBackup**     | *object (string)*                       | Whether the customer has online data backup service.                                                                   |
| **DeviceProtection** | *object (string)*                       | Whether the customer has device protection add-on.                                                                     |
| **TechSupport**      | *object (string)*                       | Indicates if customer has tech support service.                                                                        |
| **StreamingTV**      | *object (string)*                       | Whether customer subscribes to streaming TV service.                                                                   |
| **StreamingMovies**  | *object (string)*                       | Whether customer subscribes to streaming movies.                                                                       |
| **Contract**         | *object (string)*                       | Type of customer contract. `"Month-to-month"`, `"One year"`, `"Two year"`.                                             |
| **PaperlessBilling** | *object (string)*                       | Billing preference. `"Yes"` = paperless, `"No"` = physical bill.                                                       |
| **PaymentMethod**    | *object (string)*                       | Customer’s payment method. Example values: `"Electronic check"`, `"Mailed check"`, `"Credit card"`, `"Bank transfer"`. |
| **MonthlyCharges**   | *float64*                               | Monthly service charges billed to the customer.                                                                        |
| **TotalCharges**     | *object (string)* *(should be numeric)* | Total amount charged to the customer. Often stored as string due to missing or bad entries.                            |
| **Churn**            | *object (string)*                       | Target variable. `"Yes"` if the customer left the company, `"No"` otherwise.                                           |


### 4. Data profile

In [0]:
# Show column names and data types
print("Column names and data types:")
print(df.dtypes)

#### 4.1. Sample Data

In [0]:
# Keep a reproducible 10% random sample; all later cells operate on this sample
df = df.sample(frac=0.1, random_state=42)
df.head()

#### 4.2. Basic Info

In [0]:
# Display basic information about the dataset
print("\nBasic Dataset Info:")
df.info()

#### 4.3. Unique Values in Categorical Columns

In [0]:
# Unique Values in Categorical Columns
print("\nNumber of Unique Values in Categorical Columns:")
categorical_cols = df.select_dtypes(include=['object']).columns
for col in categorical_cols:
    print(col, ": ", df[col].nunique())

#### 4.4. Statistical Summary

In [0]:
# Summary statistics for the numerical columns
numerical_summary = df.describe().transpose()
display(numerical_summary)

#### 4.5. Checking Missing Values

In [0]:
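# TotalCharges is read as strings; convert it so blank entries are counted as missing (NaN)
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

# Check for missing values
print("Missing values before handling:")
print(df.isnull().sum())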
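
The column description notes that `TotalCharges` is stored as a string, so blank entries are not reported by `isnull()` until the column is converted. A minimal sketch of one way to handle this is shown below on a small made-up frame; the values and the choice of median imputation are assumptions for illustration, not something taken from this notebook.

```python
import pandas as pd

# Toy data standing in for the real dataset (values are made up for illustration)
toy = pd.DataFrame({
    "tenure": [0, 12, 24, 5],
    "MonthlyCharges": [20.0, 50.0, 80.0, 30.0],
    "TotalCharges": [" ", "600.5", "1920", "150"],
})

# Blank strings are not NaN, so isnull() reports no missing values yet
missing_before = int(toy["TotalCharges"].isnull().sum())

# Coerce to numeric: entries that cannot be parsed become NaN
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")
missing_after = int(toy["TotalCharges"].isnull().sum())

# Assumption: impute with the median; dropping the affected rows is another option
toy["TotalCharges"] = toy["TotalCharges"].fillna(toy["TotalCharges"].median())

print(f"Missing before conversion: {missing_before}, after conversion: {missing_after}")
print(toy)
```

Whether to impute or drop depends on how many rows are affected; with only a handful of blanks, either choice is likely to have little effect on the later analysis.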


### 5. Data Preprocessing

#### 5.1. Categorical Feature Analysis

In [0]:
# Explore relationship between categorical features and Churn

# Get the list of categorical columns from the original dataframe (excluding 'customerID', 'TotalCharges', and 'Churn')
categorical_cols = df.select_dtypes(include='object').columns.tolist()
categorical_cols.remove('customerID')
if 'TotalCharges' in categorical_cols:
    categorical_cols.remove('TotalCharges')
if 'Churn' in categorical_cols:
    categorical_cols.remove('Churn')

# Plot countplots for each categorical feature against Churn
plt.figure(figsize=(15, 25))
for i, col in enumerate(categorical_cols):
    plt.subplot(6, 3, i + 1) # Adjust subplot grid based on the number of categorical columns
    sns.countplot(x=col, hue='Churn', data=df)
    plt.title(f'{col} vs. Churn')
    plt.xticks(rotation=45, ha='right') # Rotate labels for better readability

plt.tight_layout()
plt.show()

#### 5.2. Categorical Variable Encoding

In [0]:
# Apply one-hot encoding to the identified categorical columns
df_encoded = pd.get_dummies(
    df,
    columns=categorical_cols,
    drop_first=True
)

# Convert all uint8 columns to int32 to avoid Arrow type issues
for col in df_encoded.select_dtypes(include=['uint8']).columns:
    df_encoded[col] = df_encoded[col].astype('int32')

# Display the first few rows of the encoded DataFrame
display(df_encoded.head())
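
To make the effect of `drop_first=True` concrete, here is a small sketch on a made-up frame with a single three-level column. The toy values are assumptions; `dtype=int` is passed only so that the dummy columns print as 0/1.

```python
import pandas as pd

# Toy frame with one three-level categorical column (made-up values)
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "One year"],
    "MonthlyCharges": [70.0, 55.5, 20.0, 60.0],
})

# drop_first=True drops the first level (in sorted order) of each encoded column,
# so that level is represented by all remaining dummy columns being 0
encoded = pd.get_dummies(toy, columns=["Contract"], drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Here `"Month-to-month"` becomes the implicit baseline: a row with 0 in both `Contract_One year` and `Contract_Two year` is a month-to-month customer.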

### 6. Dataset Splitting

In [0]:
# 'Churn' is the target variable; 'customerID' is an identifier, not a feature
X = df_encoded.drop(['customerID', 'Churn'], axis=1) # Features
y = df_encoded['Churn'] # Target

# Convert the target variable to numeric ('Yes' -> 1, anything else -> 0)
y = y.apply(lambda x: 1 if x == 'Yes' else 0)


# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Data splitting complete.")
print(f"Training features shape: {X_train.shape}")
print(f"Testing features shape: {X_test.shape}")
print(f"Training target shape: {y_train.shape}")
print(f"Testing target shape: {y_test.shape}")
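
Because churners are usually the minority class, a plain random split can leave the two sets with slightly different churn rates. The sketch below, on made-up data, shows how the `stratify` argument of `train_test_split` could be used to keep the class proportions equal; the toy frame and variable names are assumptions and are separate from the notebook's `X` and `y`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy imbalanced target: 20 churners out of 100 customers (made-up data)
X_toy = pd.DataFrame({"tenure": np.arange(100)})
y_toy = pd.Series([1] * 20 + [0] * 80, name="Churn")

# stratify=y_toy keeps the class proportions the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(f"Train churn rate: {y_tr.mean():.2f}")
print(f"Test churn rate:  {y_te.mean():.2f}")
```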

# Task 1: Completed

- Data loading
- Data exploration
- Handling missing values
- Data preprocessing
- Categorical variable encoding
- Dataset splitting



> ## **Correlation Analysis with Target Variable**

In [0]:
# Calculate correlation between numerical features and Churn.
# 'Churn' is still a 'Yes'/'No' string in df_encoded (only y was converted during splitting),
# and 'TotalCharges' may still be stored as strings, so both are converted here.
numerical_features_with_churn = df_encoded[['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn']].copy()
numerical_features_with_churn['TotalCharges'] = pd.to_numeric(numerical_features_with_churn['TotalCharges'], errors='coerce')
# Convert 'Churn' to numeric (0 for 'No', 1 for 'Yes')
numerical_features_with_churn['Churn'] = numerical_features_with_churn['Churn'].apply(lambda x: 1 if x == 'Yes' else 0)
correlation_with_churn = numerical_features_with_churn.corr()['Churn'].sort_values(ascending=False)

print("Correlation of Numerical Features with Churn:")
display(correlation_with_churn)

# The one-hot encoded categorical features are boolean/numeric,
# so their correlation with Churn can be computed as well.
# Exclude 'customerID'; 'Churn' is kept because it is the correlation target.
encoded_features_with_churn = df_encoded.drop(['customerID'], axis=1).copy()
encoded_features_with_churn['TotalCharges'] = pd.to_numeric(encoded_features_with_churn['TotalCharges'], errors='coerce')

# Convert boolean columns to integer (0 or 1) for correlation calculation
for col in encoded_features_with_churn.select_dtypes(include='bool').columns:
    encoded_features_with_churn[col] = encoded_features_with_churn[col].astype(int)

# Convert 'Churn' to numeric (0 for 'No', 1 for 'Yes')
encoded_features_with_churn['Churn'] = encoded_features_with_churn['Churn'].apply(lambda x: 1 if x == 'Yes' else 0)


correlation_with_churn_encoded = encoded_features_with_churn.corr()['Churn'].sort_values(ascending=False)

print("\nCorrelation of All Features (including encoded categorical) with Churn:")
display(correlation_with_churn_encoded)

# Visualize the correlation of all features with Churn (excluding Churn itself from the plot)
plt.figure(figsize=(10, 15))
sns.barplot(x=correlation_with_churn_encoded.drop('Churn').values, y=correlation_with_churn_encoded.drop('Churn').index, palette='coolwarm')
plt.title('Correlation of Features with Churn')
plt.xlabel('Correlation Coefficient')
plt.ylabel('Features')
plt.show()

> ## **Correlation Analysis of Numerical Features**

In [0]:
# Select only numerical columns for correlation analysis
numerical_df = df[['tenure', 'MonthlyCharges', 'TotalCharges']].copy()
# Ensure 'TotalCharges' is numeric before calculating correlation
numerical_df['TotalCharges'] = pd.to_numeric(numerical_df['TotalCharges'], errors='coerce')

# Calculate the correlation matrix
correlation_matrix = numerical_df.corr()

# Display the correlation matrix
print("Correlation Matrix of Numerical Features:")
display(correlation_matrix)

# Visualize the correlation matrix using a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Matrix of Numerical Features')
plt.show()

> ### **Relationship between specific pairs of features**

In [0]:
# Visualize the relationship between MonthlyCharges and Tenure
plt.figure(figsize=(8, 6))
sns.scatterplot(x='tenure', y='MonthlyCharges', hue='Churn', data=df)
plt.title('MonthlyCharges vs. Tenure (colored by Churn)')
plt.xlabel('Tenure (Months)')
plt.ylabel('Monthly Charges')
plt.show()

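# Visualize the relationship between TotalCharges and Tenure
# (TotalCharges is converted to numeric so the y-axis is continuous)
plt.figure(figsize=(8, 6))
sns.scatterplot(x='tenure', y='TotalCharges', hue='Churn', data=df.assign(TotalCharges=pd.to_numeric(df['TotalCharges'], errors='coerce')))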
plt.title('TotalCharges vs. Tenure (colored by Churn)')
plt.xlabel('Tenure (Months)')
plt.ylabel('Total Charges')
plt.show()

# Task 2: Exploratory Data Analysis (EDA)
- **Description:**

Calculate and visually represent the overall
churn rate. Explore customer distribution by
gender, partner status, and dependent status.
Analyze tenure distribution and its relation
with churn. Investigate how churn varies
across different contract types and payment
methods.
- **Skills:**
    1. Data visualization
    2. Statistical analysis
    3. Exploratory data analysis
    4. Understanding of customer demographic variables
    5. Churn rate calculation


## 2.1. Calculate churn rate

In [0]:
# Calculate churn rate
churn_rate = df['Churn'].value_counts(normalize=True)['Yes'] * 100
print(f"Churn Rate: {churn_rate:.2f}%")

## 2.2. Explore customer distribution by gender, partner status, and dependent status

In [0]:
import matplotlib.pyplot as plt
import numpy as np

def advanced_subplot(df):

    features = ["gender", "Partner", "Dependents"]
    titles = ["Churn by Gender", "Churn by Partner Status", "Churn by Dependent Status"]

    fig, axes = plt.subplots(3, 1, figsize=(10, 12))

    for ax, col, title in zip(axes, features, titles):

        churn_categories = sorted(df['Churn'].unique())
        counts = df.groupby([col, 'Churn']).size().unstack(fill_value=0)

        index = np.arange(len(counts.index))
        bar_width = 0.35

        # Draw the grid behind the bars
        ax.set_axisbelow(True)
        ax.grid(axis='y', linestyle='--', alpha=0.3)

        # Plot grouped bars
        for i, churn_cat in enumerate(churn_categories):
            ax.bar(
                index + i * bar_width,
                counts[churn_cat],
                bar_width,
                label=f"Churn = {churn_cat}"
            )

            # Add labels
            for j, val in enumerate(counts[churn_cat]):
                ax.text(
                    j + i * bar_width,
                    val + 0.25,
                    str(val),
                    ha='center',
                    va='bottom',
                    fontsize=8
                )

        ax.set_xticks(index + bar_width / 2)
        ax.set_xticklabels(counts.index)
        ax.set_title(title, fontsize=14, fontweight="bold")
        ax.set_ylabel("Count")
        ax.legend()

    plt.tight_layout()
    plt.show()

advanced_subplot(df)

In [0]:
# Explore customer distribution by gender
plt.figure(figsize=(6, 4))
sns.countplot(x='gender', hue='Churn', data=df)
plt.title('Churn by Gender')
plt.show()

# Explore customer distribution by partner status
plt.figure(figsize=(6, 4))
sns.countplot(x='Partner', hue='Churn', data=df)
plt.title('Churn by Partner Status')
plt.show()

# Explore customer distribution by dependent status
plt.figure(figsize=(6, 4))
sns.countplot(x='Dependents', hue='Churn', data=df)
plt.title('Churn by Dependent Status')
plt.show()
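
The count plots show absolute numbers, which can hide differences when the groups have unequal sizes. As a complementary sketch, `pd.crosstab` with `normalize="index"` gives the churn share within each group. The toy frame below is made up for illustration and is not the notebook's `df`.

```python
import pandas as pd

# Made-up data: partner status and churn outcome for eight customers
toy = pd.DataFrame({
    "Partner": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
    "Churn":   ["No",  "No",  "Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# normalize="index" makes each row sum to 1, i.e. the share of each churn outcome within a group
rates = pd.crosstab(toy["Partner"], toy["Churn"], normalize="index").mul(100).round(1)
print(rates)
```

The same call could be applied to `gender` or `Dependents` by swapping the first argument.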

## 2.3. Analyze tenure distribution and its relation with churn



In [0]:
# Analyze tenure distribution and its relation with churn
plt.figure(figsize=(8, 6))
sns.histplot(data=df, x='tenure', hue='Churn', multiple='stack', kde=True)
plt.title('Tenure Distribution by Churn')
plt.xlabel('Tenure (Months)')
plt.ylabel('Count')
plt.show()

## 2.4. Investigate how churn varies across contract types and payment methods

In [0]:
# Investigate how churn varies across different contract types
plt.figure(figsize=(8, 6))
sns.countplot(x='Contract', hue='Churn', data=df)
plt.title('Churn by Contract Type')
plt.show()

# Investigate how churn varies across different payment methods
plt.figure(figsize=(10, 6))
sns.countplot(x='PaymentMethod', hue='Churn', data=df)
plt.title('Churn by Payment Method')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

# Task 2: Completed!

1. Calculate and visually represent the overall churn rate.
2. Explore customer distribution by gender, partner status, and dependent status.
3. Analyze tenure distribution and its relation with churn.
4. Investigate how churn varies across different contract types and payment methods.

# Task 3: Customer Segmentation

> ## **Description:**

Segment customers based on tenure, monthly charges, and contract type. Analyze
churn rates within these segments. Identify
high-value customers who are at risk of
churning and might need special attention.

> ## **Skills**
1. Segmentation techniques
2. Understanding of customer behavior
3. Churn analysis within segments
4. Identifying high-value customers

> ## **Segment customers based on tenure, monthly charges, and contract type**

In [0]:
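# Segment based on tenure
# Define tenure segments (short-, medium-, and long-term). With right-closed bins and
# include_lowest=True the intervals are [0, 12], (12, 36], (36, 72], which matches the labels
# and keeps tenure == 72 inside the last segment instead of turning it into NaN.
df['Tenure_Segment'] = pd.cut(df['tenure'], bins=[0, 12, 36, 72], labels=['0-12', '13-36', '37-72'], include_lowest=True)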

# Segment based on monthly charges
# Let's define monthly charges segments (e.g., low, medium, high)
df['MonthlyCharges_Segment'] = pd.cut(df['MonthlyCharges'], bins=[0, 30, 70, 120], labels=['Low', 'Medium', 'High'], right=False)

# Segment based on contract type (already a categorical column)

# Display the first few rows with new segments
print("DataFrame with Segmentation Columns:")
display(df[['tenure', 'Tenure_Segment', 'MonthlyCharges', 'MonthlyCharges_Segment', 'Contract', 'Churn']].head())
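
The segmentation relies on `pd.cut`, whose treatment of values that fall exactly on a bin edge depends on the `right` and `include_lowest` arguments. The sketch below compares the two settings on a few boundary values; the series is made up for illustration.

```python
import pandas as pd

# Boundary values chosen to sit exactly on the bin edges
tenure = pd.Series([0, 12, 13, 36, 72])
bins = [0, 12, 36, 72]
labels = ["0-12", "13-36", "37-72"]

# right=False: intervals are [0, 12), [12, 36), [36, 72); 72 falls outside and becomes NaN
left_closed = pd.cut(tenure, bins=bins, labels=labels, right=False)

# Default right=True with include_lowest=True: [0, 12], (12, 36], (36, 72]
right_closed = pd.cut(tenure, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"tenure": tenure, "right=False": left_closed, "right=True": right_closed}))
```

With `right=False`, a tenure of 12 is labeled `13-36` and a tenure of 72 is left unassigned, so the right-closed variant is the one whose intervals agree with these labels.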

> ## **Analyze churn rates within these segments**

In [0]:
# Analyze churn rate by Tenure Segment
print("\nChurn Rate by Tenure Segment:")
display(df.groupby('Tenure_Segment')['Churn'].value_counts(normalize=True).unstack().mul(100).fillna(0))

# Analyze churn rate by Monthly Charges Segment
print("\nChurn Rate by Monthly Charges Segment:")
display(df.groupby('MonthlyCharges_Segment')['Churn'].value_counts(normalize=True).unstack().mul(100).fillna(0))

# Analyze churn rate by Contract Type
print("\nChurn Rate by Contract Type:")
display(df.groupby('Contract')['Churn'].value_counts(normalize=True).unstack().mul(100).fillna(0))

# Analyze churn rate by combined segments (e.g., Tenure and Monthly Charges)
print("\nChurn Rate by Tenure and Monthly Charges Segments:")
display(df.groupby(['Tenure_Segment', 'MonthlyCharges_Segment'])['Churn'].value_counts(normalize=True).unstack().mul(100).fillna(0))

# Analyze churn rate by combined segments (e.g., Contract and Monthly Charges)
print("\nChurn Rate by Contract Type and Monthly Charges Segments:")
display(df.groupby(['Contract', 'MonthlyCharges_Segment'])['Churn'].value_counts(normalize=True).unstack().mul(100).fillna(0))
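
For combined segments, a compact alternative is to encode churn as 0/1 and take the group mean, which is the churn rate of that segment. The sketch below uses a made-up frame; the `ChurnFlag` column is a hypothetical helper introduced only for this example.

```python
import pandas as pd

# Made-up customers with two segment columns and a churn outcome
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "One year", "Month-to-month", "One year"],
    "MonthlyCharges_Segment": ["High", "High", "Low", "High", "Low", "Low"],
    "Churn": ["Yes", "No", "No", "Yes", "No", "No"],
})

# Encode churn as 0/1 so the mean of each group is its churn rate
toy["ChurnFlag"] = (toy["Churn"] == "Yes").astype(int)

# Rows: contract type, columns: charge segment, cells: churn rate in percent
rate_table = toy.pivot_table(
    index="Contract", columns="MonthlyCharges_Segment", values="ChurnFlag", aggfunc="mean"
).mul(100)
print(rate_table)
```

Segments with very few customers produce unstable rates, so it may be worth printing the group sizes next to a table like this.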

> ## **Identify high-value customers who are at risk of churning**

In [0]:
# High-value customers are defined here as those in the top 25% of TotalCharges.
# Customers in this group with Churn == 'Yes' are treated as the churn-risk profile.

# TotalCharges may be stored as strings, so convert it before computing the quantile
total_charges = pd.to_numeric(df['TotalCharges'], errors='coerce')
high_value_threshold = total_charges.quantile(0.75)
high_value_customers = df[total_charges >= high_value_threshold]

# Identify high-value customers who are at risk of churning
high_value_churn_risk = high_value_customers[high_value_customers['Churn'] == 'Yes']

print(f"\nNumber of high-value customers (Top 25% TotalCharges): {len(high_value_customers)}")
print(f"Number of high-value customers at risk of churning: {len(high_value_churn_risk)}")

print("\nHigh-Value Customers at Risk of Churning (first 10):")
display(high_value_churn_risk.head(10))

# Further characteristics of these high-value, high-risk customers,
# for example their contract type and internet service
print("\nDistribution of Contract Type among High-Value Churn-Risk Customers:")
display(high_value_churn_risk['Contract'].value_counts(normalize=True).mul(100))

print("\nDistribution of Internet Service among High-Value Churn-Risk Customers:")
display(high_value_churn_risk['InternetService'].value_counts(normalize=True).mul(100))

# Task 3: Completed!
- Segment customers based on tenure, monthly charges, and contract type.
- Analyze churn rates within these segments.
- Identify high-value customers who are at risk of churning and might need special attention.

# ***Task 4: Churn Prediction Model***

> ## **3. Model Tuning**

In [0]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

In [0]:

# Define hyperparameters grids for tuning
# Logistic Regression
param_grid_lr = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100],
    'solver': ['liblinear', 'lbfgs']
}

# Random Forest
param_grid_rf = {
    'n_estimators': [100, 200, 500],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10]
}

# Gradient Boosting
param_grid_gb = {
    'n_estimators': [100, 200, 500],
    'learning_rate': [0.01, 0.1, 0.05],
    'max_depth': [3, 5, 7]
}

# Perform GridSearchCV for each model
tuned_results = {}

print("Starting Hyperparameter Tuning...")

# Logistic Regression Tuning
print("\nTuning Logistic Regression...")
# max_iter is raised so the lbfgs solver can converge on unscaled features
grid_search_lr = GridSearchCV(LogisticRegression(random_state=42, max_iter=1000), param_grid_lr, cv=5, scoring='roc_auc')
grid_search_lr.fit(X_train, y_train)
tuned_results['Logistic Regression (Tuned)'] = grid_search_lr.best_score_
print(f"Best parameters for Logistic Regression: {grid_search_lr.best_params_}")
print(f"Best AUC for Logistic Regression (Tuned): {grid_search_lr.best_score_:.4f}")

# Random Forest Tuning
print("\nTuning Random Forest...")
grid_search_rf = GridSearchCV(RandomForestClassifier(random_state=42), param_grid_rf, cv=5, scoring='roc_auc')
grid_search_rf.fit(X_train, y_train)
tuned_results['Random Forest (Tuned)'] = grid_search_rf.best_score_
print(f"Best parameters for Random Forest: {grid_search_rf.best_params_}")
print(f"Best AUC for Random Forest (Tuned): {grid_search_rf.best_score_:.4f}")


# Gradient Boosting Tuning
print("\nTuning Gradient Boosting...")
grid_search_gb = GridSearchCV(GradientBoostingClassifier(random_state=42), param_grid_gb, cv=5, scoring='roc_auc')
grid_search_gb.fit(X_train, y_train)
tuned_results['Gradient Boosting (Tuned)'] = grid_search_gb.best_score_
print(f"Best parameters for Gradient Boosting: {grid_search_gb.best_params_}")
print(f"Best AUC for Gradient Boosting (Tuned): {grid_search_gb.best_score_:.4f}")

print("\nHyperparameter Tuning Complete.")
print("\nTuned Model AUC Scores:")
display(pd.Series(tuned_results).sort_values(ascending=False))
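
The scores above are cross-validated AUC values on the training data. A sketch of how a tuned model could then be checked on a held-out test set with accuracy, precision, recall, and F1 follows. It uses synthetic data from `make_classification` rather than the churn dataset, and the reduced grid is an assumption chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the churn features and target (not the real data)
X_syn, y_syn = make_classification(
    n_samples=400, n_features=8, weights=[0.75, 0.25], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

# Small grid, similar in spirit to the Logistic Regression search
search = GridSearchCV(
    LogisticRegression(max_iter=1000, random_state=42),
    {"C": [0.1, 1, 10]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_tr, y_tr)

# GridSearchCV refits the best configuration on the full training set by default
best_model = search.best_estimator_
y_pred = best_model.predict(X_te)
y_prob = best_model.predict_proba(X_te)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_te, y_pred),
    "precision": precision_score(y_te, y_pred, zero_division=0),
    "recall": recall_score(y_te, y_pred, zero_division=0),
    "f1": f1_score(y_te, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_te, y_prob),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For churn, recall on the positive class is often the metric of most interest, since a missed churner is usually costlier than a false alarm.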

> ## ***Description:***
Choose suitable machine learning algorithms
(e.g., logistic regression, decision trees) for
churn prediction. Split data into training and
testing sets, train and evaluate multiple
models using metrics like accuracy, precision,
recall, and F1-score. Perform feature selection
and hyperparameter tuning for optimal
performance.

> ## **Skills**
1. Machine learning algorithms
2. Model training and evaluation
3. Feature selection and hyperparameter tuning
4. Understanding of classification metrics
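
The description mentions feature selection, which is not shown in the cells above. One possible approach, sketched here on synthetic data, is univariate selection with `SelectKBest`. The choice of `f_classif` and `k=3` is an assumption for illustration, not the method used in this project.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 10 features, of which 3 carry signal (not the churn dataset)
X_arr, y_syn = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=42
)
X_syn = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(10)])

# Score each feature with an ANOVA F-test and keep the k highest-scoring ones
selector = SelectKBest(score_func=f_classif, k=3)
selector.fit(X_syn, y_syn)

selected = X_syn.columns[selector.get_support()].tolist()
print("Selected features:", selected)
```

To avoid leaking information from the test set, a selector like this would be fitted on the training split only, for example as a step inside a scikit-learn `Pipeline`.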