# Assignment 2

### Objective:
This assignment aims to equip students with hands-on experience in predictive modeling using real-world clinical data related to heart failure. The students will implement regression and classification techniques to predict the severity of a condition and patient survival outcomes.

### Dataset:
The dataset contains records of 299 patients who suffered heart failure, with 12 clinical features and two target variables: DEATH_EVENT (binary: 0 or 1) and Severity (numerical). You can download the data from the course files section.

### Reference:
Davide Chicco & Giuseppe Jurman (2020): "Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone", BMC Medical Informatics and Decision Making, 20:16.

## Task 1: Regression Analysis (Predicting Severity)

### 1.1 Data Preparation
- Split the dataset into training (70%) and testing (30%) sets.
- Formulate: Describe the splitting mathematically

In [24]:
import pandas as pd

In [25]:
df = pd.read_csv('heart_failure_clinical_records_with_severity (3).csv')
df.head()

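|   | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT | Severity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 | 6.6 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 | 2.0 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 | 6.4 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 | 4.6 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 | 8.8 |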


In [26]:
data = df.iloc[:,:-2]  # features: every column except the last two (DEATH_EVENT, Severity)
data.head()


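|   | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 |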


In [27]:
Y=df.iloc[:,-1]
Y.head()

0    6.6
1    2.0
2    6.4
3    4.6
4    8.8
Name: Severity, dtype: float64

In [28]:
from sklearn.model_selection import train_test_split
# Split the dataset into training (70%) and testing (30%) sets.
# No random_state is fixed, so the split (and every score below) changes on each run.
x_train, x_test, y_train, y_test = train_test_split(data, Y, test_size=0.3)

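- Formulate: Describe the splitting mathematically
    $$ D = \{(x_i, y_i)\}_{i=1}^{n}, \quad n = |D| = 299 $$
    $$ D = (x_{\text{train}}, y_{\text{train}}) \cup (x_{\text{test}}, y_{\text{test}}), \qquad (x_{\text{train}}, y_{\text{train}}) \cap (x_{\text{test}}, y_{\text{test}}) = \emptyset $$
    $$ |(x_{\text{test}}, y_{\text{test}})| = \lceil 0.3\,|D| \rceil = 90, \qquad |(x_{\text{train}}, y_{\text{train}})| = |D| - 90 = 209 \approx 0.7\,|D| $$
    where the rows assigned to each part are chosen by a uniformly random permutation of the indices $1, \dots, n$.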
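A minimal sketch that checks the partition sizes and disjointness stated above. It assumes only n = 299 rows: index arrays stand in for the real table, and `random_state=0` is an arbitrary choice made for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n = 299  # number of patients in the dataset
indices = np.arange(n)

# train_test_split shuffles the row indices and puts ceil(0.3 * n) of them
# in the test set; the remaining rows form the training set
train_idx, test_idx = train_test_split(indices, test_size=0.3, random_state=0)

n_test = int(np.ceil(0.3 * n))
print(len(train_idx), len(test_idx), n_test)

# the two parts are disjoint and together cover every row exactly once
assert set(train_idx).isdisjoint(test_idx)
assert set(train_idx) | set(test_idx) == set(indices)
```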

### 1.2 Linear Regression
- Train a Linear Regression model to predict Severity using all available clinical features except DEATH_EVENT.
- Formulate: Write the regression model. Derive the least squares solution. 

In [29]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Train a Linear Regression model to predict Severity using all available clinical features except DEATH_EVENT
L = LinearRegression()
L.fit(x_train, y_train) # training 
y_pred_L = L.predict(x_test) 
print("MSE [Linear]: ", mean_squared_error(y_test, y_pred_L), "\n")

MSE [Linear]:  0.628477105371346 



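- Formulate: Write the regression model. Derive the least squares solution. 
    $$ Y = \widetilde{D}\,\widetilde{w} + e, \qquad \widetilde{D} = [\,\mathbf{1} \;\; D\,] \in \mathbb{R}^{n \times (d+1)}, \quad \widetilde{w} = (w_0, w_1, \dots, w_d)^T $$
    $$ \min_{\widetilde{w}} J(\widetilde{w}), \qquad J(\widetilde{w}) = \| Y - \widetilde{D}\widetilde{w} \|^2 = Y^TY - 2\widetilde{w}^T\widetilde{D}^TY + \widetilde{w}^T\widetilde{D}^T\widetilde{D}\widetilde{w} $$
    $$ \frac{\partial J}{\partial \widetilde{w}} = -2\widetilde{D}^TY + 2\widetilde{D}^T\widetilde{D}\widetilde{w} $$
    $$ 0 = -2\widetilde{D}^TY + 2\widetilde{D}^T\widetilde{D}\widetilde{w} $$
    $$ \widetilde{D}^T\widetilde{D}\,\widetilde{w} = \widetilde{D}^TY $$
    $$ \widetilde{w} = (\widetilde{D}^T\widetilde{D})^{-1}\widetilde{D}^TY \quad \text{(assuming } \widetilde{D}^T\widetilde{D} \text{ is invertible)} $$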
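A minimal numerical check of the normal equations on synthetic data (the sizes, true coefficients and noise level are arbitrary assumptions, not the clinical data): solving $\widetilde{D}^T\widetilde{D}\,\widetilde{w} = \widetilde{D}^TY$ directly should reproduce the intercept and coefficients found by `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))  # synthetic features standing in for the clinical data
y = 2.0 + X @ np.array([1.5, -0.5, 0.25]) + 0.1 * rng.normal(size=n)

# augment with a column of ones so the intercept w_0 is part of w_tilde
D = np.column_stack([np.ones(n), X])

# solve D^T D w = D^T y without forming the inverse explicitly
w_tilde = np.linalg.solve(D.T @ D, D.T @ y)

model = LinearRegression().fit(X, y)
print(w_tilde)
print(model.intercept_, model.coef_)
```

`np.linalg.solve` is used instead of an explicit inverse because it is cheaper and numerically more stable; it returns the same $\widetilde{w}$.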


### 1.3 Ridge Regression
- Train a Ridge Regression model with regularization.
- Formulate: Derive the closed-form solution

In [None]:
from sklearn.linear_model import Ridge

R = Ridge(alpha=0.5)
R.fit(x_train, y_train) # training 
y_pred_R = R.predict(x_test) 
print("MSE [Ridge]: ", mean_squared_error(y_test, y_pred_R), "\n")

MSE [Ridge]:  0.6269771655910469 



- Formulate: Derive the closed-form solution
    $$ \widehat{w}^{\text{ridge}} = \arg\min_{\widetilde{w}} \| Y - \widetilde{D}\widetilde{w} \|^2 + \alpha \|\widetilde{w}\|^2 $$
    $$ J = Y^TY - 2\widetilde{w}^T\widetilde{D}^TY + \widetilde{w}^T\widetilde{D}^T\widetilde{D}\widetilde{w} + \alpha \widetilde{w}^T\widetilde{w} $$
    $$ \frac{\partial J}{\partial \widetilde{w}} = -2\widetilde{D}^TY + 2\widetilde{D}^T\widetilde{D}\widetilde{w} + 2\alpha \widetilde{w} $$
    $$ 0 = -2\widetilde{D}^TY + 2\widetilde{D}^T\widetilde{D}\widetilde{w} + 2\alpha \widetilde{w} $$
    $$ \widetilde{D}^T\widetilde{D}\widetilde{w} + \alpha \widetilde{w} = \widetilde{D}^TY $$
    $$ (\widetilde{D}^T\widetilde{D} + \alpha I)\widetilde{w} = \widetilde{D}^TY $$
    $$ \widehat{w}^{\text{ridge}} = (\widetilde{D}^T\widetilde{D} + \alpha I)^{-1}\widetilde{D}^TY $$
    For $\alpha > 0$ the matrix $\widetilde{D}^T\widetilde{D} + \alpha I$ is positive definite, so the inverse always exists. (scikit-learn's `Ridge` does not penalize the intercept; it centers the data instead, so only $w_1, \dots, w_d$ are shrunk in the code above.)
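A sketch comparing the closed form with scikit-learn on synthetic data (made-up values). `fit_intercept=False` is used so that `Ridge` minimizes exactly $\|Y - \widetilde{D}\widetilde{w}\|^2 + \alpha\|\widetilde{w}\|^2$ with every weight penalized, matching the derivation.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n, d = 40, 4
X = rng.normal(size=(n, d))  # synthetic stand-in data
y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
alpha = 0.5  # same regularization strength as Ridge(alpha=0.5) above

# closed form: w = (D^T D + alpha I)^(-1) D^T y, solved without an explicit inverse
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# without an intercept, Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
print(w_closed)
print(ridge.coef_)
```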

### 1.4 Lasso Regression
- Train a Lasso Regression model and identify the most important features.
- Formulate

In [31]:
from sklearn.linear_model import Lasso

L1 = Lasso()  # default regularization strength alpha=1.0
L1.fit(x_train, y_train) # training 
y_pred_L1 = L1.predict(x_test) 
print("MSE [Lasso]: ", mean_squared_error(y_test, y_pred_L1), "\n")

# Important for feature selection: the L1 penalty sets some coefficients exactly to 0
for col, coef in zip(data.columns, L1.coef_):
    print(f"{col}: {coef}")

MSE [Lasso]:  1.2091446297984814 

age: 0.04617194233749001
anaemia: 0.0
creatinine_phosphokinase: -0.00010910027079665507
diabetes: 0.0
ejection_fraction: -0.06211179683259887
high_blood_pressure: 0.0
platelets: -3.071076337915684e-07
serum_creatinine: 0.0
serum_sodium: -0.08104630201199649
sex: 0.0
smoking: 0.0
time: -0.005687918559922949


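- Identify the most important features:
    - age
    - creatinine_phosphokinase
    - ejection_fraction
    - platelets
    - serum_sodium
    - time

- Lasso kept non-zero coefficients only for the six features above and set the other six (anaemia, diabetes, high_blood_pressure, serum_creatinine, sex, smoking) exactly to 0, so the fitted model ignores them. Because the features were not standardized, the magnitudes of the non-zero coefficients are not comparable across features; each coefficient has to be read together with its feature's scale. For example, the platelets coefficient of about $-3.1 \times 10^{-7}$ still amounts to roughly $-0.03$ Severity per 100,000 platelets.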

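- Formulate:  
    $$ \widehat{w}^{\text{lasso}} = \arg\min_{\widetilde{w}} \| Y - \widetilde{D}\widetilde{w} \|^2 + \alpha \sum_{j=1}^{p} |w_j| = \arg\min_{\widetilde{w}} \| Y - \widetilde{D}\widetilde{w} \|^2 + \alpha \|w\|_1 $$
    Unlike ridge, the $\ell_1$ penalty is not differentiable at $w_j = 0$, so there is no closed-form solution and the problem is solved iteratively (scikit-learn uses coordinate descent). The same non-differentiability is what allows Lasso to set coefficients exactly to 0. Note that scikit-learn's `Lasso` scales the squared error by $\frac{1}{2n}$, i.e. it minimizes $\frac{1}{2n}\| Y - \widetilde{D}\widetilde{w} \|^2 + \alpha \|w\|_1$, so its `alpha` is not on the same scale as the one in `Ridge`.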
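Since Lasso has no closed form, here is a minimal coordinate-descent sketch with soft-thresholding for scikit-learn's objective $\frac{1}{2n}\|Y - Dw\|^2 + \alpha\|w\|_1$. No intercept, synthetic data, and a fixed number of sweeps instead of a convergence test are all simplifying assumptions. It should agree with `Lasso(fit_intercept=False)` run to a tight tolerance.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(z, t):
    # proximal operator of t*|w|: shrink z toward 0 by t, clipping at 0
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_sweeps=1000):
    # cyclic coordinate descent on (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(d):
            # partial residual that leaves out feature j's current contribution
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 5))  # synthetic data; features 1, 2, 4 carry no signal
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=60)

w_cd = lasso_cd(X, y, alpha=0.1)
sk = Lasso(alpha=0.1, fit_intercept=False, tol=1e-10, max_iter=100_000).fit(X, y)
print(np.round(w_cd, 4))
print(np.round(sk.coef_, 4))
```

The `soft_threshold` step returns exactly 0 whenever $|\rho| \le \alpha$, which is how Lasso produces the zero coefficients seen in the feature list above.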


### 1.5 Kernel Regression
- Apply Kernel Regression with three kernels:
    - Linear
    - Polynomial
    - Radial Basis Function (RBF)
- Formulate: Derive the kernel regression estimator


In [32]:
from sklearn.kernel_ridge import KernelRidge

KL = KernelRidge(kernel='linear')
KL.fit(x_train, y_train) # training 
y_pred_KL = KL.predict(x_test) 
print("MSE [kernel ridge linear]: ", mean_squared_error(y_test, y_pred_KL), "\n")

MSE [kernel ridge linear]:  0.9334311006256635 



In [33]:
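# note: with degree=1 the polynomial kernel (gamma * <x, x'> + coef0)^degree is only affine;
# degree >= 2 would be needed for a genuinely non-linear polynomial fit
KP = KernelRidge(kernel='poly', degree=1)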
KP.fit(x_train, y_train) # training 
y_pred_KP = KP.predict(x_test) 
print("MSE [kernel ridge poly]: ", mean_squared_error(y_test, y_pred_KP), "\n")

MSE [kernel ridge poly]:  0.8599968338022204 



In [34]:
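# caution: the features are not standardized (platelets ~ 1e5, creatinine_phosphokinase up to ~ 1e4),
# so with gamma=0.1 the RBF kernel is effectively 0 for almost every pair of different patients
KR = KernelRidge(kernel='rbf', gamma=0.1)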
KR.fit(x_train, y_train) # training 
y_pred_KR = KR.predict(x_test) 
print("MSE [kernel ridge rbf]: ", mean_squared_error(y_test, y_pred_KR), "\n")

MSE [kernel ridge rbf]:  13.633333333333294 
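A small illustration of a likely reason for the poor RBF result, using the five feature rows printed by `data.head()` above (these are not the notebook's test set, and `StandardScaler` is a suggested addition that the notebook does not use):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import StandardScaler

# first five rows of the feature table (age ... time), copied from data.head() above
X_head = np.array([
    [75.0, 0, 582, 0, 20, 1, 265000.0, 1.9, 130, 1, 0, 4],
    [55.0, 0, 7861, 0, 38, 0, 263358.03, 1.1, 136, 1, 0, 6],
    [65.0, 0, 146, 0, 20, 0, 162000.0, 1.3, 129, 1, 1, 7],
    [50.0, 1, 111, 0, 20, 0, 210000.0, 1.9, 137, 1, 0, 7],
    [65.0, 1, 160, 1, 20, 0, 327000.0, 2.7, 116, 0, 0, 8],
])

# raw features: squared distances run into the millions (platelets, creatinine_phosphokinase),
# so exp(-0.1 * d^2) underflows to exactly 0 between different patients
K_raw = rbf_kernel(X_head, gamma=0.1)

# standardized features: each column has unit variance, so distances stay small
X_std = StandardScaler().fit_transform(X_head)
K_std = rbf_kernel(X_std, gamma=0.1)

print(K_raw[0, 1], K_std[0, 1])
```

`KernelRidge` has no intercept, so if every kernel value between a test patient and the training patients is 0, the prediction for that patient is 0. Predicting 0 everywhere would make the MSE equal to the mean of `y_test` squared, which is consistent with (though not proof of) the 13.63 reported above.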



- Formulate: Derive the kernel regression estimator
    $$ \hat{f}(x) = \sum_{i=1}^{n} c_i K(x, x_i) $$
    $$ J = \| Y - \widetilde{D}_{\phi}\widetilde{w} \|^2 + \alpha \|\widetilde{w}\|^2 $$
    $$ \hat{y} = \widetilde{D}_{\phi}\widetilde{w} $$
    $$ \widetilde{D}_{\phi} = \begin{bmatrix} \phi(x_1)^T \\ \vdots \\ \phi(x_n)^T \end{bmatrix} $$
    $$ J = (Y - \widetilde{D}_{\phi}\widetilde{w})^T(Y - \widetilde{D}_{\phi}\widetilde{w}) + \alpha \widetilde{w}^T\widetilde{w} $$
    $$ J = Y^TY - 2\widetilde{w}^T\widetilde{D}_{\phi}^TY + \widetilde{w}^T\widetilde{D}_{\phi}^T\widetilde{D}_{\phi}\widetilde{w} + \alpha \widetilde{w}^T\widetilde{w} $$
    $$ \frac{\partial J}{\partial \widetilde{w}} = -2\widetilde{D}_{\phi}^TY + 2\widetilde{D}_{\phi}^T\widetilde{D}_{\phi}\widetilde{w} + 2\alpha \widetilde{w} $$
    $$ 0 = -2\widetilde{D}_{\phi}^TY + 2\widetilde{D}_{\phi}^T\widetilde{D}_{\phi}\widetilde{w} + 2\alpha \widetilde{w} $$
    $$ \alpha \widetilde{w} = \widetilde{D}_{\phi}^TY - \widetilde{D}_{\phi}^T\widetilde{D}_{\phi}\widetilde{w} $$
    $$ \alpha \widetilde{w} = \widetilde{D}_{\phi}^T(Y - \widetilde{D}_{\phi}\widetilde{w}) $$
    $$ \widetilde{w} = \widetilde{D}_{\phi}^T \frac{1}{\alpha}(Y - \widetilde{D}_{\phi}\widetilde{w}) $$

    Define $\vec{c} = \frac{1}{\alpha}(Y - \widetilde{D}_{\phi}\widetilde{w})$ and substitute $\widetilde{w} = \widetilde{D}_{\phi}^T\vec{c}$:
    $$ \widetilde{w} = \widetilde{D}_{\phi}^T\vec{c} $$
    $$ \vec{c} = \frac{1}{\alpha}(Y - \widetilde{D}_{\phi}\widetilde{D}_{\phi}^T\vec{c}) $$
    $$ \widetilde{D}_{\phi}\widetilde{D}_{\phi}^T\vec{c} + \alpha \vec{c} = Y $$
    $$ (\widetilde{D}_{\phi}\widetilde{D}_{\phi}^T + \alpha I)\vec{c} = Y $$
    $$ \vec{c} = (\widetilde{D}_{\phi}\widetilde{D}_{\phi}^T + \alpha I)^{-1}Y $$
    $$ \vec{c} = (\widetilde{K} + \alpha I)^{-1}Y, \qquad \widetilde{K} = \widetilde{D}_{\phi}\widetilde{D}_{\phi}^T, \quad \widetilde{K}_{ij} = \phi(x_i)^T\phi(x_j) = K(x_i, x_j) $$
    $$ \hat{y} = \widetilde{K}(\widetilde{K} + \alpha I)^{-1}Y $$
    For a new point $x$, $\hat{f}(x) = \phi(x)^T\widetilde{w} = \sum_{i=1}^{n} c_i K(x, x_i)$, which is the estimator stated at the top: only kernel evaluations are needed, never $\phi$ itself.
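A sketch checking the dual solution $\vec{c} = (\widetilde{K} + \alpha I)^{-1}Y$ and the prediction $\hat{f}(x) = \sum_i c_i K(x, x_i)$ against `KernelRidge` with an RBF kernel. The data are synthetic; `alpha=1.0` matches `KernelRidge`'s default and `gamma=0.5` is an arbitrary choice.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(3)
X_train = rng.normal(size=(30, 2))  # synthetic data, already on a unit scale
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=30)
X_new = rng.normal(size=(5, 2))
alpha, gamma = 1.0, 0.5

# dual coefficients c = (K + alpha I)^(-1) Y, with K_ij = k(x_i, x_j)
K = rbf_kernel(X_train, gamma=gamma)
c = np.linalg.solve(K + alpha * np.eye(len(X_train)), y_train)

# prediction at new points: f(x) = sum_i c_i k(x, x_i)
y_new = rbf_kernel(X_new, X_train, gamma=gamma) @ c

kr = KernelRidge(kernel="rbf", alpha=alpha, gamma=gamma).fit(X_train, y_train)
print(np.allclose(c, kr.dual_coef_), np.allclose(y_new, kr.predict(X_new)))
```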


### 1.6 Evaluation
- Evaluate all models using:
    - Mean Squared Error (MSE)
    - R-squared 
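Both metrics written out by hand on a tiny example. `y_true` holds the five Severity values shown by `Y.head()` above; `y_hat` is a made-up prediction vector used only for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([6.6, 2.0, 6.4, 4.6, 8.8])  # Severity values from Y.head() above
y_hat = np.array([6.0, 2.5, 6.1, 5.0, 8.0])   # hypothetical predictions

# MSE = (1/n) * sum_i (y_i - y_hat_i)^2
mse = np.mean((y_true - y_hat) ** 2)

# R^2 = 1 - SS_res / SS_tot: the share of variance around the mean that the model explains
r2 = 1 - np.sum((y_true - y_hat) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mse, r2)
```

R² can be negative when a model does worse than always predicting the mean of the targets. For example, predicting 0 for every row of this small example gives R² ≈ −6.25.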

In [35]:
print("Mean Squared Error:")
print("MSE [Linear]: ", mean_squared_error(y_test, y_pred_L))
print("MSE [Ridge]: ", mean_squared_error(y_test, y_pred_R))
print("MSE [Lasso]: ", mean_squared_error(y_test, y_pred_L1))
print("MSE [kernel ridge linear]: ", mean_squared_error(y_test, y_pred_KL))
print("MSE [kernel ridge poly]: ", mean_squared_error(y_test, y_pred_KP))
print("MSE [kernel ridge rbf]: ", mean_squared_error(y_test, y_pred_KR))

print("\nR-squared:")
print("R^2 [Linear]: ", r2_score(y_test, y_pred_L))
print("R^2 [Ridge]: ", r2_score(y_test, y_pred_R))
print("R^2 [Lasso]: ", r2_score(y_test, y_pred_L1))
print("R^2 [kernel ridge linear]: ", r2_score(y_test, y_pred_KL))
print("R^2 [kernel ridge poly]: ", r2_score(y_test, y_pred_KP))
print("R^2 [kernel ridge rbf]: ", r2_score(y_test, y_pred_KR))

Mean Squared Error:
MSE [Linear]:  0.628477105371346
MSE [Ridge]:  0.6269771655910469
MSE [Lasso]:  1.2091446297984814
MSE [kernel ridge linear]:  0.9334311006256635
MSE [kernel ridge poly]:  0.8599968338022204
MSE [kernel ridge rbf]:  13.633333333333294

R-squared:
R^2 [Linear]:  0.829672795186094
R^2 [Ridge]:  0.8300793025162483
R^2 [Lasso]:  0.6723027406262774
R^2 [kernel ridge linear]:  0.7470254542335382
R^2 [kernel ridge poly]:  0.7669272983877579
R^2 [kernel ridge rbf]:  -2.6948482913961413


### 1.7 Discussion
- Discuss the pros and cons of Linear, Ridge, Lasso, and Kernel Regression models.
    - todo***

## Task 2: Classification Analysis (Predicting DEATH_EVENT)

### 2.1 Logistic Regression 
- Train a Logistic Regression model using clinical features to predict DEATH_EVENT
- Formulate: Derive the log-likelihood for MLE

In [36]:
Y=df.iloc[:,-2]  # target: DEATH_EVENT (second-to-last column)
Y.head()

0    1
1    1
2    1
3    1
4    1
Name: DEATH_EVENT, dtype: int64

In [37]:
# Split the dataset into training (70%) and testing (30%) sets
x_train, x_test, y_train, y_test = train_test_split(data, Y, test_size=0.3)

In [38]:
from sklearn.linear_model import LogisticRegression

LR = LogisticRegression()
LR.fit(x_train, y_train) # training 
y_pred_LR = LR.predict(x_test) 
# for 0/1 labels, MSE equals the misclassification rate (1 - accuracy)
print("MSE [LogisticRegression]: ", mean_squared_error(y_test, y_pred_LR), "\n")

MSE [LogisticRegression]:  0.16666666666666666 



STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
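The convergence warning above suggests scaling the data or raising `max_iter`. Below is a minimal sketch of the scaling route. It uses synthetic data from `make_classification` with one column inflated by 1e5 to mimic a large-valued feature such as platelets; the data, the column index, and `max_iter=1000` are assumptions, not values from the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic stand-in data; column 6 is blown up to mimic a feature like platelets
X, y = make_classification(n_samples=300, n_features=12, random_state=0)
X[:, 6] *= 1e5

# standardize every feature before the solver sees it
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# number of lbfgs iterations actually used
n_iter = model.named_steps["logisticregression"].n_iter_[0]
print(n_iter)
```

With the scaler inside the pipeline, fitting on `x_train` fits the scaler on the training data only, so no test-set statistics leak into training.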


- Formulate: Derive the log-likelihood for MLE
    Probability of class 1 for sample $i$ ($\widetilde{x}_i$ is the feature vector augmented with a leading 1):
    $$ \pi(\widetilde{x}_i) = P(Y_i = 1 \mid \widetilde{x}_i) = \sigma(\widetilde{w}^T\widetilde{x}_i) = \frac{1}{1 + e^{-\widetilde{w}^T\widetilde{x}_i}} $$
    Since $Y_i \in \{0, 1\}$, both cases combine into one expression:
    $$ P(Y_i \mid \widetilde{x}_i) = \sigma(\widetilde{w}^T\widetilde{x}_i)^{Y_i}\,\big(1 - \sigma(\widetilde{w}^T\widetilde{x}_i)\big)^{1 - Y_i} $$
    Assuming independent samples, the likelihood of the data $D$ is
    $$ P(D \mid \widetilde{w}) = \prod_{i=1}^{n} P(Y_i \mid \widetilde{x}_i) = \prod_{i=1}^{n} \sigma(\widetilde{w}^T\widetilde{x}_i)^{Y_i}\,\big(1 - \sigma(\widetilde{w}^T\widetilde{x}_i)\big)^{1 - Y_i} $$
    Log-likelihood for MLE ($\ln$ is monotonic, so it has the same maximizer):
    $$ \arg\max_{\widetilde{w}} P(D \mid \widetilde{w}) = \arg\max_{\widetilde{w}} \ln P(D \mid \widetilde{w}) $$
    $$ \ln P(D \mid \widetilde{w}) = \sum_{i=1}^{n} \ln\Big[\sigma(\widetilde{w}^T\widetilde{x}_i)^{Y_i}\,\big(1 - \sigma(\widetilde{w}^T\widetilde{x}_i)\big)^{1 - Y_i}\Big] $$
    $$ = \sum_{i=1}^{n} \Big[Y_i \ln \sigma(\widetilde{w}^T\widetilde{x}_i) + (1 - Y_i) \ln\big(1 - \sigma(\widetilde{w}^T\widetilde{x}_i)\big)\Big] $$
    There is no closed-form maximizer. Iterative solvers use the gradient $\nabla_{\widetilde{w}} \ln P(D \mid \widetilde{w}) = \sum_{i=1}^{n} \big(Y_i - \sigma(\widetilde{w}^T\widetilde{x}_i)\big)\widetilde{x}_i$. scikit-learn's default solver is `lbfgs` (whose iteration limit triggered the warning above), and `LogisticRegression` also adds an L2 penalty by default (`C=1.0`), so it fits a regularized version of this MLE.
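A quick numerical check of the final log-likelihood expression on synthetic data (made-up features and labels). scikit-learn's `log_loss` is the negative mean of the same sum, so the two quantities printed below should agree.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))  # synthetic features
y = (X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
p = clf.predict_proba(X)[:, 1]  # sigma(w^T x_i + b) for each sample

# ln P(D | w) = sum_i [ Y_i ln sigma_i + (1 - Y_i) ln(1 - sigma_i) ]
log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# log_loss returns the negative mean log-likelihood
print(log_lik, -log_loss(y, p) * len(y))
```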

### 2.2 Classifier Comparison
- Train and compare the following classifiers:
    - Support Vector Machine (SVM)
        - Linear Kernel
        - RBF Kernel
        - Formulate: Write the SVM primal and dual problem. Derive the kernelized form.



In [39]:
from sklearn.metrics import accuracy_score, precision_score, recall_score

In [40]:
from sklearn.svm import SVC

# Linear Kernel
svm_model_l = SVC(kernel='linear')
svm_model_l.fit(x_train, y_train) # training
svm_pred_l = svm_model_l.predict(x_test) 
print(accuracy_score(y_test,svm_pred_l))

0.7444444444444445


In [41]:
#RBF Kernel
svm_model_rbf = SVC(kernel='rbf', gamma='scale')
svm_model_rbf.fit(x_train, y_train) # training
svm_pred_rbf = svm_model_rbf.predict(x_test) 
print(accuracy_score(y_test,svm_pred_rbf))

0.6222222222222222
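According to the scikit-learn documentation, `gamma='scale'` in the cell above means `1 / (n_features * X.var())`, where the variance is taken over all entries of `X`. The sketch below computes that value on the five rows shown by `data.head()`, which are not the actual training set, both before and after standardization.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# first five rows of the feature table (age ... time), copied from data.head() above
X_head = np.array([
    [75.0, 0, 582, 0, 20, 1, 265000.0, 1.9, 130, 1, 0, 4],
    [55.0, 0, 7861, 0, 38, 0, 263358.03, 1.1, 136, 1, 0, 6],
    [65.0, 0, 146, 0, 20, 0, 162000.0, 1.3, 129, 1, 1, 7],
    [50.0, 1, 111, 0, 20, 0, 210000.0, 1.9, 137, 1, 0, 7],
    [65.0, 1, 160, 1, 20, 0, 327000.0, 2.7, 116, 0, 0, 8],
])

# gamma='scale' -> 1 / (n_features * variance of all entries of X)
gamma_raw = 1.0 / (X_head.shape[1] * X_head.var())

# after standardization every column has mean 0 and variance 1, so the overall variance is 1
X_std = StandardScaler().fit_transform(X_head)
gamma_std = 1.0 / (X_std.shape[1] * X_std.var())

print(gamma_raw, gamma_std)
```

On the raw rows, `X.var()` is dominated by the platelets column, so `gamma` is tiny; after standardization it is 1/12. On the notebook's split, the RBF SVM predicts DEATH_EVENT = 0 for every test patient (its recall is 0 and its accuracy equals the share of survivors in the test set). This is consistent with, though not proven to be caused by, this scaling issue.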


- Formulate: Write the SVM primal and dual problem. Derive the kernelized form.
    - todo***
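The kernelized SVM classifies with $f(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + b$. The sketch below uses synthetic data; `make_classification` and `gamma=0.5` are assumptions, and `gamma` is fixed explicitly so the kernel can be recomputed by hand. It checks that this sum, built from the fitted `SVC`'s `support_vectors_`, `dual_coef_` (which stores $\alpha_i y_i$) and `intercept_`, matches `decision_function`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
gamma = 0.5  # fixed explicitly so the kernel can be recomputed by hand

clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# f(x) = sum over support vectors of (alpha_i * y_i) * k(x_i, x) + b
K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)
f_manual = K @ clf.dual_coef_.ravel() + clf.intercept_[0]

print(np.allclose(f_manual, clf.decision_function(X)))
```

Only the support vectors (samples with $\alpha_i > 0$) enter the sum, and the feature map never appears explicitly, only kernel values.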

### 2.3 Evaluation
- Use the following metrics:
    - Accuracy
    - Precision
    - Recall
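The three metrics follow directly from the confusion-matrix counts. The sketch below uses hypothetical labels and predictions (not the notebook's) and checks the results against the scikit-learn functions used in the next cell.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # hypothetical labels (1 = death)
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0])  # hypothetical predictions

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of the predicted deaths, the fraction that were real
recall = tp / (tp + fn)     # of the real deaths, the fraction that were caught
print(accuracy, precision, recall)
```

A classifier that never predicts 1 has tp = fp = 0, so precision is 0/0. The `zero_division=0` argument in the next cell makes scikit-learn report 0 in that case.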

In [42]:
print("Accuracy:")
print("Accuracy [SVM Linear]: ", accuracy_score(y_test, svm_pred_l))
print("Accuracy [SVM RBF]: ", accuracy_score(y_test, svm_pred_rbf))

print("Precision:")
print("Precision [SVM Linear]: ", precision_score(y_test, svm_pred_l, zero_division=0))
print("Precision [SVM RBF]: ", precision_score(y_test, svm_pred_rbf, zero_division=0))

print("Recall:")
print("Recall [SVM Linear]: ", recall_score(y_test, svm_pred_l, zero_division=0))
print("Recall [SVM RBF]: ", recall_score(y_test, svm_pred_rbf, zero_division=0))


Accuracy:
Accuracy [SVM Linear]:  0.7444444444444445
Accuracy [SVM RBF]:  0.6222222222222222
Precision:
Precision [SVM Linear]:  0.8235294117647058
Precision [SVM RBF]:  0.0
Recall:
Recall [SVM Linear]:  0.4117647058823529
Recall [SVM RBF]:  0.0


### 2.4 Discussion
- Analyze the effectiveness of each classifier:
    - todo***
- Discuss which model might be most appropriate in a clinical setting:
    - todo***
