# **Project Code**

**By:** Julius Salomons <br>
**Student number:** 14039559 <br>
**Date:** (19-11-2024) 22 December 2024 <br>
**Course:** Symbolic and Neural AI <br> <br>

**Task description** <br>
This project focuses on a classification task: predicting whether a customer will churn (leave the service) based on their data. The aim is to develop and compare the performance of two models to determine which is better suited for this task. <br>  <br>

**Dataset description** <br>
- https://www.kaggle.com/code/ybifoundation/telecom-customer-churn-prediction  <br>
- 7043 rows and 21 columns, making it manageable for computation on a standard laptop. <br>
- Imbalanced target: churners are the minority class (about 26% of the test split used below, 373 of 1409 rows) <br> <br>
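
A quick way to check the class balance is to look at the relative frequency of each label. The snippet below is a minimal sketch on a small hypothetical `Churn` column (the values are made up for illustration); on the real frame the same call would be applied to `data["Churn"]`.

```python
import pandas as pd

# Hypothetical labels for illustration only, not taken from the dataset
churn = pd.Series(["No", "No", "Yes", "No", "No", "Yes", "No", "No"], name="Churn")

# Relative frequency of each class; values sum to 1
rates = churn.value_counts(normalize=True)
print(rates)
```

If the minority share is well below 50%, accuracy alone can look good for a model that rarely predicts churn, which is why precision, recall and F1 are reported as well.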

**Models**  <br>
1. Logic tensor network (LTN)  <br>
2. Nearest Neighbors Classification (NN) <br>

Model 1 is covered in this course. <br>
Model 2 (https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighborsclassification) is a traditional machine learning model that I am already familiar with. <br>  <br>

**Investigation description** <br>
This investigation will compare model performance using metrics like accuracy, precision, recall, and F1-score. By examining these metrics, we can gain insight into how well each model predicts customer churn. The comparison will provide a deeper understanding of the strengths and limitations of a neural-symbolic approach (LTN) and a traditional machine learning model (NN). <br> <br>

**Expected Outcomes**  <br>
I expect Nearest Neighbors to outperform the LTN due to its proven effectiveness in handling classification tasks with similar feature structures. However, I anticipate the difference in performance may not be substantial.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

In [None]:
# Load dataset
url = "https://raw.githubusercontent.com/dsrscientist/DSData/master/Telecom_customer_churn.csv"  # Update if necessary
data = pd.read_csv(url)

# Display the first rows of the dataset
print("Dataset Head:\n", data.head())

# Drop rows with missing (NaN) values, if any
data = data.dropna()

Dataset Head:
    customerID  gender  SeniorCitizen Partner Dependents  tenure PhoneService  \
0  7590-VHVEG  Female              0     Yes         No       1           No   
1  5575-GNVDE    Male              0      No         No      34          Yes   
2  3668-QPYBK    Male              0      No         No       2          Yes   
3  7795-CFOCW    Male              0      No         No      45           No   
4  9237-HQITU  Female              0      No         No       2          Yes   

      MultipleLines InternetService OnlineSecurity  ... DeviceProtection  \
0  No phone service             DSL             No  ...               No   
1                No             DSL            Yes  ...              Yes   
2                No             DSL            Yes  ...               No   
3  No phone service             DSL            Yes  ...              Yes   
4                No     Fiber optic             No  ...               No   

  TechSupport StreamingTV StreamingMovies      

# **Nearest neighbours code**

In [None]:
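# Encode every string (object) column as integer codes.
# Note: this also encodes customerID, a per-customer identifier, so it stays
# in X as a feature; consider dropping it before modelling.
label_encoders = {}
for column in data.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
    label_encoders[column] = le  # keep the encoder so codes can be mapped back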

# separate features and target
target_column = 'Churn'
X = data.drop(columns=[target_column])
y = data[target_column]

# split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
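
Label encoding gives each category an arbitrary integer, and a distance-based model such as nearest neighbours then treats those integers as ordered quantities. One common alternative, shown here as a sketch and not as the method used in this notebook, is to one-hot encode the categorical columns and scale only the numeric ones. The toy rows are hypothetical; the column names are taken from the dataset head above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows (made up for illustration)
toy = pd.DataFrame({
    "customerID": ["a1", "b2", "c3", "d4", "e5", "f6"],
    "InternetService": ["DSL", "Fiber optic", "DSL", "Fiber optic", "DSL", "Fiber optic"],
    "Partner": ["Yes", "No", "No", "Yes", "Yes", "No"],
    "tenure": [1, 34, 2, 45, 2, 8],
    "Churn": ["No", "No", "Yes", "No", "Yes", "Yes"],
})

# Drop the identifier and the target from the feature matrix
X_toy = toy.drop(columns=["customerID", "Churn"])
y_toy = (toy["Churn"] == "Yes").astype(int)

# One-hot encode categoricals, standardize numerics; sparse_threshold=0 forces dense output
preprocess = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["InternetService", "Partner"]),
        ("num", StandardScaler(), ["tenure"]),
    ],
    sparse_threshold=0,
)

pipe = Pipeline([("prep", preprocess), ("knn", KNeighborsClassifier(n_neighbors=3))])
pipe.fit(X_toy, y_toy)

# Inspect the encoded feature matrix produced by the fitted preprocessing step
encoded = pipe.named_steps["prep"].transform(X_toy)
print(encoded.shape)
```

Putting the preprocessing inside a `Pipeline` also means the encoder and scaler are fitted on the training data only whenever the pipeline is fitted, which matches the train/test discipline already used above.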

In [None]:
# train Nearest Neighbors Classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)
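
To make the decision rule concrete, the following is a simplified sketch of what a nearest-neighbours prediction does conceptually: find the `k` closest training points by Euclidean distance and take a majority vote. It is an illustration on hypothetical toy data, not scikit-learn's actual implementation, which uses optimised neighbour search; `knn_predict` is a made-up helper name.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Predict binary labels by majority vote among the k nearest training points."""
    preds = []
    for q in X_query:
        # Euclidean distance from the query point to every training point
        dists = np.linalg.norm(X_train - q, axis=1)
        # Indices of the k smallest distances
        nearest = np.argsort(dists)[:k]
        # Count votes per class (0 or 1) and pick the most frequent
        votes = np.bincount(y_train[nearest], minlength=2)
        preds.append(int(np.argmax(votes)))
    return np.array(preds)

# Hypothetical toy data: two well-separated clusters
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_pred = knn_predict(X_toy, y_toy, np.array([[0.05, 0.05], [1.0, 0.95]]), k=3)
print(toy_pred)
```

Because the vote depends entirely on distances, features on larger scales would dominate without the standardization step applied earlier.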

In [None]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("\nModel Evaluation:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

# Detailed classification report
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Model Evaluation:
Accuracy: 0.7729
Precision: 0.5810
Recall: 0.5094
F1 Score: 0.5429

Classification Report:
               precision    recall  f1-score   support

           0       0.83      0.87      0.85      1036
           1       0.58      0.51      0.54       373

    accuracy                           0.77      1409
   macro avg       0.71      0.69      0.70      1409
weighted avg       0.76      0.77      0.77      1409
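
The headline metrics can be traced back to the confusion-matrix counts. The notebook does not print those counts, so the values below are an assumption: they are reconstructed from the reported class supports (1036 and 373) together with the reported precision and recall for the churn class.

```python
# Confusion-matrix counts reconstructed from the report above (assumption,
# not notebook output): 373 actual churners, 1036 actual non-churners
tp, fp, fn, tn = 190, 137, 183, 899

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted churners who churned
recall = tp / (tp + fn)                     # share of actual churners that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
```

Under these counts the model misses 183 of the 373 churners in the test split, which the accuracy figure on its own does not reveal.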



# **LTN code**

The first attempt is based on the material from week 3.

In [None]:
# !pip install ltntorch
import ltn
import torch
import torch.nn as nn
import torch.optim as optim

In [None]:
url = "https://raw.githubusercontent.com/dsrscientist/DSData/master/Telecom_customer_churn.csv"  # Update if necessary
data = pd.read_csv(url)

In [None]:
# preprocessing
data = data.dropna()
label_encoders = {}
for column in data.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
    label_encoders[column] = le

X = data.drop(columns=['Churn'])
y = data['Churn']

# split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# convert to pyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train.values, dtype=torch.float32).view(-1, 1)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test.values, dtype=torch.float32).view(-1, 1)

In [None]:
# define the feedforward neural network
class ModelP2(nn.Module):
    def __init__(self):
        super().__init__()
        self.elu = nn.ELU()
        self.sigmoid = nn.Sigmoid()
        self.dense1 = nn.Linear(X_train.shape[1], 5)
        self.dense2 = nn.Linear(5, 1)

    def forward(self, x):
        x = self.elu(self.dense1(x))
        return self.sigmoid(self.dense2(x))

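# Wrap the model in an ltn.Predicate.
# Note: P2 is not used below; training and evaluation call modelP2 directly.
modelP2 = ModelP2()
P2 = ltn.Predicate(model=modelP2)

# Binary cross-entropy loss on the network's sigmoid output
criterion = nn.BCELoss()

# Define an optimizer
optimizer = optim.Adam(modelP2.parameters(), lr=0.01)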

# training loop
epochs = 100
for epoch in range(epochs):
    modelP2.train()
    optimizer.zero_grad()

    # forward pass
    predictions = modelP2(X_train)
    loss = criterion(predictions, y_train)

    # backward pass and optimization
    loss.backward()
    optimizer.step()

    # log progress
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")



Epoch [10/100], Loss: 0.6036
Epoch [20/100], Loss: 0.5232
Epoch [30/100], Loss: 0.4765
Epoch [40/100], Loss: 0.4487
Epoch [50/100], Loss: 0.4373
Epoch [60/100], Loss: 0.4306
Epoch [70/100], Loss: 0.4261
Epoch [80/100], Loss: 0.4224
Epoch [90/100], Loss: 0.4194
Epoch [100/100], Loss: 0.4174
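
The training loop above optimises binary cross-entropy on `modelP2` directly. For comparison, here is a sketch of an LTN-style objective, where the loss is one minus the satisfaction of two axioms: "for all churners x, P(x)" and "for all non-churners x, not P(x)". It is written in plain PyTorch on hypothetical toy data and does not use the LTNtorch wrappers, so it makes no claim about that library's exact API. The `forall` helper is a made-up name that implements the p-mean-error aggregator commonly used in the LTN literature; the small `eps` is an added assumption for numerical stability.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def forall(truth_values, p=2, eps=1e-7):
    """Smooth 'for all' via p-mean error: 1 - (mean((1 - a)^p))^(1/p)."""
    return 1.0 - ((1.0 - truth_values).pow(p).mean() + eps).pow(1.0 / p)

# Hypothetical toy data: 2 features, positive class when the feature sum is > 0
X_toy = torch.randn(200, 2)
y_toy = X_toy.sum(dim=1) > 0

# Same architecture shape as ModelP2: one hidden layer of 5 ELU units, sigmoid output
net = nn.Sequential(nn.Linear(2, 5), nn.ELU(), nn.Linear(5, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

losses = []
for epoch in range(200):
    optimizer.zero_grad()
    truth = net(X_toy).squeeze(1)  # truth value of P(x) in [0, 1]

    sat_pos = forall(truth[y_toy])          # forall positives: P(x)
    sat_neg = forall(1.0 - truth[~y_toy])   # forall negatives: not P(x), standard negation
    sat = forall(torch.stack([sat_pos, sat_neg]))  # aggregate the two axioms

    loss = 1.0 - sat  # maximise satisfaction by minimising 1 - sat
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

With only these two axioms the objective plays the same role as a supervised loss; the neural-symbolic approach becomes distinct once additional logical rules about the domain are added as further axioms.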


In [None]:
# evaluate the model
modelP2.eval()
with torch.no_grad():
    y_pred = modelP2(X_test)
    y_pred_class = (y_pred > 0.5).float()

# convert predictions and labels to 1-D numpy arrays, as the sklearn metrics expect
y_test_np = y_test.numpy().ravel()
y_pred_class_np = y_pred_class.numpy().ravel()

# calculate evaluation metrics
accuracy = accuracy_score(y_test_np, y_pred_class_np)
precision = precision_score(y_test_np, y_pred_class_np)
recall = recall_score(y_test_np, y_pred_class_np)
f1 = f1_score(y_test_np, y_pred_class_np)

# print the evaluation results
print("\nModel Evaluation:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

# detailed classification report
print("\nClassification Report:\n", classification_report(y_test_np, y_pred_class_np))


Model Evaluation:
Accuracy: 0.8197
Precision: 0.6854
Recall: 0.5898
F1 Score: 0.6340

Classification Report:
               precision    recall  f1-score   support

         0.0       0.86      0.90      0.88      1036
         1.0       0.69      0.59      0.63       373

    accuracy                           0.82      1409
   macro avg       0.77      0.75      0.76      1409
weighted avg       0.81      0.82      0.82      1409

