# Customer Churn Business Prediction Dataset

## I. Exploring the Dataset & Workflow Plan

In [154]:
# from google.colab import drive
# drive.mount('/content/drive')

### 0. Setting up the seed

Let's set the seed to 42, in order to ensure results are reproducible across runs.

In [155]:
import random
import numpy as np

random.seed(42)
np.random.seed(42)


### 1. Printing the Features

First, we load the dataset with pandas to inspect the available features. We read it from the `.csv` file into the `ccb_dataset` data frame and list every feature with the `.info()` method. The column summary can also be exported to an `.xlsx` file (the export line is commented out in the cell below).

In [156]:
import pandas as pd

ccb_dataset = pd.read_csv(
    # '/content/drive/MyDrive/FML/Project/customer_churn_business_dataset.csv'
    './customer_churn_business_dataset.csv'
)
info_df = pd.DataFrame({
    "Column": ccb_dataset.columns,
    "Data Type": ccb_dataset.dtypes,
    "Non-Null Count": ccb_dataset.notnull().sum()
})

# info_df.to_excel("ccb_columns.xlsx", sheet_name="features")

ccb_dataset.info()
print("Dataset shape:", ccb_dataset.shape)
print(ccb_dataset['churn'].value_counts(normalize=True))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 32 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   customer_id             10000 non-null  object 
 1   gender                  10000 non-null  object 
 2   age                     10000 non-null  int64  
 3   country                 10000 non-null  object 
 4   city                    10000 non-null  object 
 5   customer_segment        10000 non-null  object 
 6   tenure_months           10000 non-null  int64  
 7   signup_channel          10000 non-null  object 
 8   contract_type           10000 non-null  object 
 9   monthly_logins          10000 non-null  int64  
 10  weekly_active_days      10000 non-null  int64  
 11  avg_session_time        10000 non-null  float64
 12  features_used           10000 non-null  int64  
 13  usage_growth_rate       10000 non-null  float64
 14  last_login_days_ago     10000 non-null 

### 2. Defining the problem

This is a __classification problem__ which can be solved using __supervised learning__. Our task is to predict whether a given customer will churn, meaning they will unsubscribe from our service. Our target variable is `churn`, which has two possible values:
- __0__ → "No" (customer did not churn)  
- __1__ → "Yes" (customer churned)

Each entry in our dataset represents a certain user, along with various characteristics. Some features are likely more important for predicting churn than others. For example, features such as `avg_session_time`, `last_login_days_ago`, `csat_score`, `nps_score` and `survey_response` may have a strong influence on whether a customer churns.

### 3. Workflow Plan

**Step 1: Data Cleaning**  
This step will consist of removing irrelevant or uninformative features, such as identifiers or columns that do not contribute to predicting customer churn.

**Step 2: Preprocessing**  
Categorical features will be encoded using appropriate encoding techniques. Numeric features will be scaled so that they suit models that are sensitive to feature magnitudes, such as SVM and Logistic Regression.

**Step 3: Training Strategies**  
Two training datasets will be created:
- One dataset using all available features
- One dataset obtained after applying Principal Component Analysis (PCA) to reduce dimensionality while retaining most of the variance

The performance of both approaches will be compared.

**Step 4: Modeling**  
Support Vector Machines (SVM), Logistic Regression and Decision Trees will be used to predict the target variable `churn`. Model performance will be evaluated and compared using appropriate classification metrics.
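
The four steps above can also be chained into a single scikit-learn `Pipeline`. The sketch below is only an illustration on a made-up toy frame (the values, the `toy` and `workflow` names and the chosen columns are assumptions, not the notebook's actual code); the notebook itself performs the steps one by one in the following sections.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the churn data; all values are made up for illustration.
n = 60
toy = pd.DataFrame({
    "contract_type": np.tile(["Monthly", "Quarterly", "Yearly"], n // 3),
    "monthly_logins": np.arange(n) % 40,
    "csat_score": np.tile([1.0, 2.0, 3.0, 4.0, 5.0], n // 5),
})
# Hypothetical rule: customers with a low satisfaction score churn.
toy["churn"] = (toy["csat_score"] <= 2).astype(int)

# Step 2: encode categorical columns and scale numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(drop="first"), ["contract_type"]),
    ("num", StandardScaler(), ["monthly_logins", "csat_score"]),
])

# Step 4: attach a classifier to the preprocessing step.
workflow = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000, random_state=42)),
])

X_toy = toy.drop(columns="churn")
y_toy = toy["churn"]
workflow.fit(X_toy, y_toy)
toy_pred = workflow.predict(X_toy)

transformed_shape = workflow.named_steps["preprocess"].transform(X_toy).shape
print("Transformed shape:", transformed_shape)
```

A pipeline like this fits the encoder and the scaler only on the data passed to `fit`, which makes it harder to leak test information by accident.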

## II. Data Cleaning

### 0. Datasets

For this project we propose four datasets:
- the first keeps a __large__ number of features, regardless of their qualitative relevance
- the second keeps only the features considered relevant for predicting `churn`
- the third keeps only __two features__, chosen by us through a __qualitative__ analysis
- the fourth contains __three__ principal components, selected by the __amount of variance__ they capture

### 1. Many Features Dataset

We will keep all features except the following:

**Identifiers**
- `customer_id`

**Marketing & support**
- `email_open_rate`
- `marketing_click_rate`
- `complaint_type`

In [157]:

many_feat_dataset = ccb_dataset.drop(
    columns=[
        'customer_id',
        'email_open_rate',
        'marketing_click_rate',
        'complaint_type'
    ]
)

print(many_feat_dataset.head())

   gender  age     country      city customer_segment  tenure_months  \
0    Male   68  Bangladesh    London              SME             22   
1  Female   57      Canada    Sydney       Individual              9   
2    Male   24     Germany  New York              SME             58   
3    Male   49   Australia     Dhaka       Individual             19   
4    Male   65  Bangladesh     Delhi       Individual             52   

  signup_channel contract_type  monthly_logins  weekly_active_days  ...  \
0            Web       Monthly              26                   7  ...   
1         Mobile       Monthly               7                   5  ...   
2            Web        Yearly              19                   5  ...   
3         Mobile        Yearly              34                   7  ...   
4            Web       Monthly              20                   6  ...   

   discount_applied  price_increase_last_3m  support_tickets  \
0               Yes                      No         

### 2. Relevant Features Dataset

The following features will be considered irrelevant (based on domain knowledge) for predicting `churn`:

**Identifiers**
- `customer_id`

**Demographics**
- `gender`
- `age`
- `country`
- `city`

**Customer profile**
- `customer_segment`
- `tenure_months`
- `signup_channel`
- `contract_type`

**Usage & engagement**
- `avg_session_time`
- `features_used`
- `last_login_days_ago`

**Pricing & billing**
- `monthly_fee`
- `total_revenue`
- `payment_method`
- `discount_applied`
- `price_increase_last_3m`

**Marketing & support**
- `email_open_rate`
- `marketing_click_rate`
- `complaint_type`

In [158]:
rel_feat_dataset = ccb_dataset.drop(
    columns=[
        'customer_id', 'gender', 'age', 'country', 'city',
        'customer_segment', 'tenure_months', 'signup_channel', 'contract_type',
        'avg_session_time', 'features_used', 'last_login_days_ago',
        'monthly_fee', 'total_revenue', 'payment_method',
        'discount_applied', 'price_increase_last_3m',
        'complaint_type', 'email_open_rate', 'marketing_click_rate'
    ]
)

print(rel_feat_dataset.head())

   monthly_logins  weekly_active_days  usage_growth_rate  payment_failures  \
0              26                   7               0.06                 1   
1               7                   5              -0.28                 1   
2              19                   5               0.13                 2   
3              34                   7              -0.17                 0   
4              20                   6              -0.16                 0   

   support_tickets  avg_resolution_time  csat_score  escalations  nps_score  \
0                4            13.354360         4.0            0         27   
1                1            25.140088         2.0            0        -19   
2                1            27.572928         3.0            0         80   
3                3            26.420822         5.0            1        100   
4                0            26.674579         4.0            0         21   

  survey_response  referral_count  churn  
0       Satis

### 3. High Quality Features Dataset

In order to predict `churn`, we believe `csat_score` and `nps_score` are by far the most important features in the entire dataset.

CSAT stands for __Customer Satisfaction Score__. It typically ranges from __1__ to __5__, with 5 being the highest score.

NPS stands for __Net Promoter Score__. It shows how likely a customer is to recommend a product to someone else.

In [159]:
hq_feat_dataset = ccb_dataset[['csat_score', 'nps_score', 'churn']].copy()

print(hq_feat_dataset.head())

   csat_score  nps_score  churn
0         4.0         27      0
1         2.0        -19      1
2         3.0         80      0
3         5.0        100      0
4         4.0         21      0


The reduced dataset will be created later in this project using the __PCA__ algorithm.

## III. Preprocessing

### 1. Encoding

We can observe that several features in the dataset are categorical and represented as strings. Since machine learning models such as Logistic Regression and Support Vector Machines require numerical input, these categorical variables must be encoded.

Because these features are nominal categorical variables (they do not represent an explicit numerical order), `One-Hot Encoding` is applied. This approach prevents the introduction of artificial ordinal relationships that could negatively impact distance-based and linear models.

**The following categorical features are encoded using One-Hot Encoding:**

- `gender`

- `country`

- `city`

- `customer_segment`

- `signup_channel`

- `contract_type`

- `payment_method`

- `discount_applied`

- `price_increase_last_3m`

- `survey_response`

Numerical features are left unchanged at this stage and will be processed separately during scaling.
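
As a small illustration of what One-Hot Encoding with `drop_first=True` produces, here is a sketch on a toy frame (the `toy` and `encoded` names and the values are made up for this example). One category per feature is dropped, because it is implied when all the remaining indicator columns are false.

```python
import pandas as pd

# Toy frame with one nominal feature and one numerical feature.
toy = pd.DataFrame({
    "contract_type": ["Monthly", "Yearly", "Quarterly", "Monthly"],
    "monthly_logins": [26, 7, 19, 34],
})

# Each category becomes its own indicator column; the first category
# in sorted order ("Monthly") is dropped by drop_first=True.
encoded = pd.get_dummies(toy, columns=["contract_type"], drop_first=True)

print(encoded.columns.tolist())
```

The numerical column is passed through untouched, which matches the plan of handling it later during scaling.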

In [160]:

categorical_cols = many_feat_dataset.select_dtypes(include=['object']).columns.tolist()

many_feat_dataset = pd.get_dummies(
    many_feat_dataset,
    columns=categorical_cols,
    drop_first=True
)


We will perform the same operation with the `rel_feat_dataset` as well.

In [161]:
categorical_cols = rel_feat_dataset.select_dtypes(include=['object']).columns.tolist()

rel_feat_dataset = pd.get_dummies(
    rel_feat_dataset,
    columns=categorical_cols,
    drop_first=True
)

And with the `hq_feat_dataset`, which has no categorical columns, so the call leaves it unchanged.

In [162]:
categorical_cols = hq_feat_dataset.select_dtypes(include=['object']).columns.tolist()

hq_feat_dataset = pd.get_dummies(
    hq_feat_dataset,
    columns=categorical_cols,
    drop_first=True
)

### 2. Splitting the Dataset

The dataset is split into training and test sets using a 70%–30% ratio. Stratified sampling is applied to maintain the original class distribution of the target variable, and a fixed random seed is used to ensure reproducibility.

In [163]:
from sklearn.model_selection import train_test_split

# Separate features and target
X_many_feat = many_feat_dataset.drop(columns='churn')
y_many_feat = many_feat_dataset['churn']

# Train / test split (70% / 30%), stratified on the target
X_train_many_feat, X_test_many_feat, y_train_many_feat, y_test_many_feat = train_test_split(
    X_many_feat,
    y_many_feat,
    test_size=0.3,
    random_state=42,
    stratify=y_many_feat
)

print("Train shape:", X_train_many_feat.shape)
print("Test shape:", X_test_many_feat.shape)


Train shape: (7000, 42)
Test shape: (3000, 42)
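
To see what stratified sampling does, the following sketch splits a toy target with 10% positives (an assumed ratio chosen for illustration, not the real churn rate) and checks that both parts keep the same proportion.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 90 negatives and 10 positives.
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 90 + [1] * 10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy,
    y_toy,
    test_size=0.3,
    random_state=42,
    stratify=y_toy,
)

# With stratify, the share of positives is preserved in both parts.
print("Train positive rate:", y_tr.mean())
print("Test positive rate:", y_te.mean())
```

Without `stratify`, a small minority class could end up over- or under-represented in the test set purely by chance.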


Let's split the other datasets as well.

In [164]:
X_rel_feat = rel_feat_dataset.drop(columns='churn')
y_rel_feat = rel_feat_dataset['churn']

X_train_rel_feat, X_test_rel_feat, y_train_rel_feat, y_test_rel_feat = train_test_split(
    X_rel_feat,
    y_rel_feat,
    test_size=0.3,
    random_state=42,
    stratify=y_rel_feat
)

print("Train shape:", X_train_rel_feat.shape)
print("Test shape:", X_test_rel_feat.shape)

Train shape: (7000, 12)
Test shape: (3000, 12)


In [165]:
X_hq_feat = hq_feat_dataset.drop(columns='churn')
y_hq_feat = hq_feat_dataset['churn']

X_train_hq_feat, X_test_hq_feat, y_train_hq_feat, y_test_hq_feat = train_test_split(
    X_hq_feat,
    y_hq_feat,
    test_size=0.3,
    random_state=42,
    stratify=y_hq_feat
)

print("Train shape:", X_train_hq_feat.shape)
print("Test shape:", X_test_hq_feat.shape)

Train shape: (7000, 2)
Test shape: (3000, 2)


### 3. Scaling
We use `StandardScaler` to standardize the numerical features to zero mean and unit variance. It is chosen because models such as __SVM__ and __Logistic Regression__ are sensitive to the scale of their inputs.

In [166]:
from sklearn.preprocessing import StandardScaler

# Select numerical columns
numerical_cols = X_train_many_feat.select_dtypes(include=['int64', 'float64']).columns

scaler = StandardScaler()

# Fit on training data only
X_train_many_feat[numerical_cols] = scaler.fit_transform(X_train_many_feat[numerical_cols])
X_test_many_feat[numerical_cols] = scaler.transform(X_test_many_feat[numerical_cols])

X_train_many_feat.head()


Unnamed: 0,age,tenure_months,monthly_logins,weekly_active_days,avg_session_time,features_used,usage_growth_rate,last_login_days_ago,monthly_fee,total_revenue,...,signup_channel_Referral,signup_channel_Web,contract_type_Quarterly,contract_type_Yearly,payment_method_Card,payment_method_PayPal,discount_applied_Yes,price_increase_last_3m_Yes,survey_response_Satisfied,survey_response_Unsatisfied
1078,-1.576243,-0.072337,1.050666,-1.511833,0.797427,0.898253,-0.272831,-0.560739,-0.209509,-0.185256,...,False,True,True,False,True,False,False,False,False,True
6331,1.703909,-1.358107,-0.379765,-1.078377,-0.090936,-1.345296,-1.471208,-0.764649,-0.628048,-0.897613,...,False,False,True,False,True,False,False,True,True,False
6323,-0.118398,0.512104,-2.014544,0.221992,0.372326,2.244382,-0.006524,1.172499,-0.628048,-0.273081,...,False,True,False,True,False,False,False,True,False,False
7739,-0.786577,-0.189225,-2.014544,-1.078377,1.200962,-0.447876,-1.338055,-0.152918,0.62757,0.283143,...,False,False,False,True,False,False,True,True,True,False
547,1.21796,-0.013893,1.459361,-0.211465,-0.360284,0.000833,1.0587,-0.050963,-0.628048,-0.448731,...,False,True,False,False,False,True,False,False,True,False
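
The fit-on-train, transform-on-test pattern can be checked on a tiny example. This is a sketch with made-up numbers (`train`, `test` and `toy_scaler` are illustrative names): the scaler learns the mean and standard deviation from the training values only and reuses them for the test value.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[4.0]])

# Learn mean and standard deviation from the training data only.
toy_scaler = StandardScaler().fit(train)

train_scaled = toy_scaler.transform(train)
test_scaled = toy_scaler.transform(test)

print("Learned mean:", toy_scaler.mean_[0])
print("Learned std:", toy_scaler.scale_[0])
print("Scaled test value:", test_scaled[0, 0])
```

The test value is scaled with the training statistics, so it does not have to land inside the range of the scaled training data.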


We will scale the other datasets, too.

In [167]:
numerical_cols = X_train_rel_feat.select_dtypes(include=['int64', 'float64']).columns

scaler = StandardScaler()

X_train_rel_feat[numerical_cols] = scaler.fit_transform(X_train_rel_feat[numerical_cols])
X_test_rel_feat[numerical_cols] = scaler.transform(X_test_rel_feat[numerical_cols])

X_train_rel_feat.head()

Unnamed: 0,monthly_logins,weekly_active_days,usage_growth_rate,payment_failures,support_tickets,avg_resolution_time,csat_score,escalations,nps_score,referral_count,survey_response_Satisfied,survey_response_Unsatisfied
1078,1.050666,-1.511833,-0.272831,-0.701872,-0.192313,-0.9311,1.558244,1.306013,-0.512638,1.022136,False,True
6331,-0.379765,-1.078377,-1.471208,-0.701872,1.620742,-1.299269,-1.516659,-0.535665,0.180394,-0.989943,True,False
6323,-2.014544,0.221992,-0.006524,0.709129,-0.192313,-0.367868,-0.491692,1.306013,2.00281,-0.989943,False,False
7739,-2.014544,-1.078377,-1.338055,-0.701872,0.714214,-0.520712,1.558244,-0.535665,-0.230292,-0.989943,True,False
547,1.459361,-0.211465,1.0587,-0.701872,-1.098841,-0.854083,1.558244,1.306013,-1.282673,0.016097,True,False


In [168]:
numerical_cols = X_train_hq_feat.select_dtypes(include=['int64', 'float64']).columns

scaler = StandardScaler()

X_train_hq_feat[numerical_cols] = scaler.fit_transform(X_train_hq_feat[numerical_cols])
X_test_hq_feat[numerical_cols] = scaler.transform(X_test_hq_feat[numerical_cols])

X_train_hq_feat.head()

Unnamed: 0,csat_score,nps_score
1078,1.558244,-0.512638
6331,-1.516659,0.180394
6323,-0.491692,2.00281
7739,1.558244,-0.230292
547,1.558244,-1.282673


### 4. Dimensionality reduction



PCA with three components is applied to the scaled many-features dataset. The transformation is fitted on the training data only to avoid data leakage.

In [169]:
from sklearn.decomposition import PCA

# Apply PCA with 3 components
pca = PCA(n_components=3, random_state=42)

# Fit PCA on training data only
X_train_pca = pca.fit_transform(X_train_many_feat)
X_test_pca = pca.transform(X_test_many_feat)

print("Original feature space:", X_train_many_feat.shape)
print("Reduced feature space:", X_train_pca.shape)


Original feature space: (7000, 42)
Reduced feature space: (7000, 3)
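
One way to check how much information three components keep is to look at `explained_variance_ratio_`. The sketch below uses synthetic data with assumed per-feature spreads (the `X_toy` and `toy_pca` names are illustrative), so the numbers it prints say nothing about the churn dataset itself.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Synthetic data: 200 samples, 6 independent features with very different spreads.
spreads = np.array([5.0, 3.0, 2.0, 0.5, 0.3, 0.1])
X_toy = rng.normal(size=(200, 6)) * spreads

toy_pca = PCA(n_components=3, random_state=42).fit(X_toy)

# Share of the total variance captured by each retained component.
ratios = toy_pca.explained_variance_ratio_
print("Per-component ratio:", ratios)
print("Total retained:", ratios.sum())
```

Running the same two `print` lines on the notebook's fitted `pca` object would show how much variance the three components retain on the real data.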


## IV. Training

### 1. Logistic Regression

In [170]:
from sklearn.linear_model import LogisticRegression

lr_model_many = LogisticRegression(class_weight='balanced', random_state=42, max_iter=1000).fit(X_train_many_feat, y_train_many_feat)

lr_model_rel = LogisticRegression(class_weight='balanced', random_state=42, max_iter=1000).fit(X_train_rel_feat, y_train_rel_feat)

lr_model_hq = LogisticRegression(class_weight='balanced', random_state=42, max_iter=1000).fit(X_train_hq_feat, y_train_hq_feat)

lr_model_pca = LogisticRegression(class_weight='balanced', random_state=42, max_iter=1000).fit(X_train_pca, y_train_many_feat)
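
The `class_weight='balanced'` option used above reweights each class by `n_samples / (n_classes * count_of_class)`. As a sketch on a toy target with an assumed 90/10 split (not the real churn ratio), the weights can be reproduced by hand:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy target: 90 negatives and 10 positives.
y_toy = np.array([0] * 90 + [1] * 10)
classes = np.unique(y_toy)

# Weights as scikit-learn computes them for class_weight='balanced'.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_toy)

# The same formula written out manually.
manual = len(y_toy) / (len(classes) * np.bincount(y_toy))

print("Balanced weights:", weights)
print("Manual weights:", manual)
```

Errors on the rare class are penalized more heavily, which pushes the model to predict churn more often than an unweighted model would.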

### 2. Support Vector Machines (SVMs)

In [171]:
from sklearn.svm import SVC

svc_model_many = SVC(class_weight='balanced', kernel='rbf', C=0.1, gamma='scale', random_state=42).fit(X_train_many_feat, y_train_many_feat)

svc_model_rel = SVC(class_weight='balanced', kernel='rbf', C=0.1, gamma='scale', random_state=42).fit(X_train_rel_feat, y_train_rel_feat)

svc_model_hq = SVC(class_weight='balanced', kernel='rbf', C=0.1, gamma='scale', random_state=42).fit(X_train_hq_feat, y_train_hq_feat)

svc_model_pca = SVC(class_weight='balanced', kernel='rbf', C=0.1, gamma='scale', random_state=42).fit(X_train_pca, y_train_many_feat)
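
With `gamma='scale'`, scikit-learn sets the RBF kernel width to `1 / (n_features * X.var())`. The sketch below, on synthetic data from `make_classification` (an assumption for illustration only), fits one model with `gamma='scale'` and one with the manually computed value and compares their decision functions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary classification data.
X_toy, y_toy = make_classification(n_samples=200, n_features=5, random_state=42)

# Value that gamma='scale' resolves to for this training matrix.
gamma_manual = 1.0 / (X_toy.shape[1] * X_toy.var())

svc_scale = SVC(kernel="rbf", C=0.1, gamma="scale").fit(X_toy, y_toy)
svc_manual = SVC(kernel="rbf", C=0.1, gamma=gamma_manual).fit(X_toy, y_toy)

# Both models should produce the same decision values.
same = np.allclose(
    svc_scale.decision_function(X_toy),
    svc_manual.decision_function(X_toy),
)
print("gamma:", gamma_manual)
print("Identical decision function:", same)
```

Because the value depends on the variance of the training matrix, each of the four datasets ends up with its own effective `gamma`.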


### 3. Decision Trees

Decision Tree classifiers are trained using the Gini impurity criterion. Although tree-based models do not require feature scaling or dimensionality reduction, the model is also trained on the PCA-reduced dataset for comparison purposes.

In [172]:
from sklearn.tree import DecisionTreeClassifier

dct_clf_many = DecisionTreeClassifier(class_weight='balanced', criterion='gini', random_state=42).fit(X_train_many_feat, y_train_many_feat)

dct_clf_rel = DecisionTreeClassifier(class_weight='balanced', criterion='gini', random_state=42).fit(X_train_rel_feat, y_train_rel_feat)

dct_clf_hq = DecisionTreeClassifier(class_weight='balanced', criterion='gini', random_state=42).fit(X_train_hq_feat, y_train_hq_feat)

dct_clf_pca = DecisionTreeClassifier(class_weight='balanced', criterion='gini', random_state=42).fit(X_train_pca, y_train_many_feat)
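
For reference, the Gini impurity of a node is `1 - sum(p_k ** 2)`, where `p_k` is the share of class `k` in that node. The helper below is a hypothetical illustration (`gini_impurity` is not part of scikit-learn's public API) showing the value for a pure, a balanced and an imbalanced node.

```python
import numpy as np

def gini_impurity(labels):
    """Return the Gini impurity 1 - sum(p_k^2) for a 1-D collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)

pure = gini_impurity([0, 0, 0, 0])              # only one class present
balanced = gini_impurity([0, 0, 1, 1])          # two classes, 50/50
imbalanced = gini_impurity([0] * 90 + [1] * 10)  # two classes, 90/10

print(pure, balanced, imbalanced)
```

The tree picks, at each node, the split that lowers the weighted impurity of the child nodes the most.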


## V. Evaluation

### 1. Logistic Regression

In [173]:
from sklearn.metrics import f1_score, accuracy_score
import numpy as np

y_pred_lr_many = lr_model_many.predict(X_test_many_feat)
results_lr_many_acc = accuracy_score(y_test_many_feat, y_pred_lr_many)
results_lr_many_f1 = f1_score(y_test_many_feat, y_pred_lr_many)

y_pred_lr_rel = lr_model_rel.predict(X_test_rel_feat)
results_lr_rel_acc = accuracy_score(y_test_rel_feat, y_pred_lr_rel)
results_lr_rel_f1 = f1_score(y_test_rel_feat, y_pred_lr_rel)

y_pred_lr_hq = lr_model_hq.predict(X_test_hq_feat)
results_lr_hq_acc = accuracy_score(y_test_hq_feat, y_pred_lr_hq)
results_lr_hq_f1 = f1_score(y_test_hq_feat, y_pred_lr_hq)   

y_pred_lr_pca = lr_model_pca.predict(X_test_pca)
results_lr_pca_acc = accuracy_score(y_test_many_feat, y_pred_lr_pca)
results_lr_pca_f1 = f1_score(y_test_many_feat, y_pred_lr_pca)

print("Accuracy score for Logistic Regression using all features:", results_lr_many_acc)
print("F1 score for Logistic Regression using all features:", results_lr_many_f1, "\n")

print("Accuracy score for Logistic Regression using relevant features:", results_lr_rel_acc)
print("F1 score for Logistic Regression using relevant features:", results_lr_rel_f1, "\n")

print("Accuracy score for Logistic Regression using HQ features:", results_lr_hq_acc)
print("F1 score for Logistic Regression using HQ features:", results_lr_hq_f1, "\n")

print("Accuracy score for Logistic Regression using reduced features:", results_lr_pca_acc)
print("F1 score for Logistic Regression using reduced features:", results_lr_pca_f1)

Accuracy score for Logistic Regression using all features: 0.6753333333333333
F1 score for Logistic Regression using all features: 0.2859237536656892 

Accuracy score for Logistic Regression using relevant features: 0.654
F1 score for Logistic Regression using relevant features: 0.26487252124645894 

Accuracy score for Logistic Regression using HQ features: 0.5403333333333333
F1 score for Logistic Regression using HQ features: 0.2151394422310757 

Accuracy score for Logistic Regression using reduced features: 0.5443333333333333
F1 score for Logistic Regression using reduced features: 0.20845396641574984
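
Accuracy and F1 are reported side by side because accuracy alone can look good on imbalanced data. As a sketch with an assumed 90/10 class split (not the real churn ratio), a model that always predicts "no churn" scores high accuracy but an F1 of zero:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Toy ground truth: 90 non-churners and 10 churners.
y_true = np.array([0] * 90 + [1] * 10)

# Trivial baseline that always predicts the majority class.
y_majority = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_majority)
# zero_division=0 avoids a warning when no positive is ever predicted.
f1 = f1_score(y_true, y_majority, zero_division=0)

print("Accuracy:", acc)
print("F1:", f1)
```

F1 only rewards a model for actually finding churners, which is why it is the more informative number in the results that follow.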


### 2. Support Vector Machines

In [177]:
y_pred_svc_many = svc_model_many.predict(X_test_many_feat)
results_svc_many_acc = accuracy_score(y_test_many_feat, y_pred_svc_many)
results_svc_many_f1 = f1_score(y_test_many_feat, y_pred_svc_many)

y_pred_svc_rel = svc_model_rel.predict(X_test_rel_feat)
results_svc_rel_acc = accuracy_score(y_test_rel_feat, y_pred_svc_rel)
results_svc_rel_f1 = f1_score(y_test_rel_feat, y_pred_svc_rel)

y_pred_svc_hq = svc_model_hq.predict(X_test_hq_feat)
results_svc_hq_acc = accuracy_score(y_test_hq_feat, y_pred_svc_hq)
results_svc_hq_f1 = f1_score(y_test_hq_feat, y_pred_svc_hq)

y_pred_svc_pca = svc_model_pca.predict(X_test_pca)
results_svc_pca_acc = accuracy_score(y_test_many_feat, y_pred_svc_pca)
results_svc_pca_f1 = f1_score(y_test_many_feat, y_pred_svc_pca)

print("Accuracy score for SVM using all features:", results_svc_many_acc)
print("F1 score for SVM using all features:", results_svc_many_f1, "\n")

print("Accuracy score for SVM using relevant features:", results_svc_rel_acc)
print("F1 score for SVM using relevant features:", results_svc_rel_f1, "\n")

print("Accuracy score for SVM using HQ features:", results_svc_hq_acc)
print("F1 score for SVM using HQ features:", results_svc_hq_f1, "\n")

print("Accuracy score for SVM using reduced features:", results_svc_pca_acc)
print("F1 score for SVM using reduced features:", results_svc_pca_f1)

Accuracy score for SVM using all features: 0.6876666666666666
F1 score for SVM using all features: 0.3234657039711191 

Accuracy score for SVM using relevant features: 0.697
F1 score for SVM using relevant features: 0.3045141545524101 

Accuracy score for SVM using HQ features: 0.8163333333333334
F1 score for SVM using HQ features: 0.2778505897771953 

Accuracy score for SVM using reduced features: 0.6243333333333333
F1 score for SVM using reduced features: 0.23281143635125937


### 3. Decision Trees

In [176]:
y_pred_dt_many = dct_clf_many.predict(X_test_many_feat)
results_dt_many_acc = accuracy_score(y_test_many_feat, y_pred_dt_many)
results_dt_many = f1_score(y_test_many_feat, y_pred_dt_many)

y_pred_dt_rel = dct_clf_rel.predict(X_test_rel_feat)
results_dt_rel_acc = accuracy_score(y_test_rel_feat, y_pred_dt_rel)
results_dt_rel = f1_score(y_test_rel_feat, y_pred_dt_rel)

y_pred_dt_hq = dct_clf_hq.predict(X_test_hq_feat)
results_dt_hq_acc = accuracy_score(y_test_hq_feat, y_pred_dt_hq)
results_dt_hq = f1_score(y_test_hq_feat, y_pred_dt_hq)

y_pred_dt_pca = dct_clf_pca.predict(X_test_pca)
results_dt_pca_acc = accuracy_score(y_test_many_feat, y_pred_dt_pca)
results_dt_pca = f1_score(y_test_many_feat, y_pred_dt_pca)

print("Accuracy score for Decision Tree using all features:", results_dt_many_acc)
print("F1 score for Decision Tree using all features:", results_dt_many, "\n")

print("Accuracy score for Decision Tree using relevant features:", results_dt_rel_acc)
print("F1 score for Decision Tree using relevant features:", results_dt_rel, "\n")

print("Accuracy score for Decision Tree using HQ features:", results_dt_hq_acc)
print("F1 score for Decision Tree using HQ features:", results_dt_hq, "\n")

print("Accuracy score for Decision Tree using reduced features:", results_dt_pca_acc)
print("F1 score for Decision Tree using reduced features:", results_dt_pca)

Accuracy score for Decision Tree using all features: 0.8293333333333334
F1 score for Decision Tree using all features: 0.16339869281045752 

Accuracy score for Decision Tree using relevant features: 0.8423333333333334
F1 score for Decision Tree using relevant features: 0.21818181818181817 

Accuracy score for Decision Tree using HQ features: 0.6526666666666666
F1 score for Decision Tree using HQ features: 0.18973561430793157 

Accuracy score for Decision Tree using reduced features: 0.81
F1 score for Decision Tree using reduced features: 0.13636363636363635
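
To compare the runs at a glance, the F1 scores printed above can be collected into one table. The sketch below copies the printed values, rounded to four decimals; the `f1_table` name and the row labels are choices made for this example.

```python
import pandas as pd

# F1 scores copied from the outputs above (rounded to four decimals).
f1_table = pd.DataFrame(
    {
        "Logistic Regression": [0.2859, 0.2649, 0.2151, 0.2085],
        "SVM": [0.3235, 0.3045, 0.2779, 0.2328],
        "Decision Tree": [0.1634, 0.2182, 0.1897, 0.1364],
    },
    index=["many", "relevant", "hq", "pca"],
)

# Model with the highest F1 overall, and the dataset it was achieved on.
best_model = f1_table.max().idxmax()
best_dataset = f1_table[best_model].idxmax()

print(f1_table)
print("Best:", best_model, "on the", best_dataset, "dataset")
```

Reading the table row by row also shows how each dataset variant compares with the many-features baseline for a given model.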


## VI. Conclusions

As a result of this experiment, the following conclusions can be drawn:

- The dataset is highly imbalanced, with significantly fewer churned customers than non-churned ones. This makes the classification task more challenging and accuracy alone misleading, which justifies the use of the F1-score as the main evaluation metric.

- Among the evaluated models, Support Vector Machines achieved the best F1-score (about 0.32) when trained on the many-features dataset, which suggests they capture complex, non-linear relationships in the data.

- Applying Principal Component Analysis (PCA) lowered the F1-score of every algorithm compared with the many-features dataset (for example, from about 0.32 to about 0.23 for SVM), suggesting that dimensionality reduction led to the loss of important predictive information.

- Overall, models trained on the original feature space performed better than those trained on the PCA-reduced data, highlighting the importance of preserving the original features for churn prediction.
