## Assignment 6

<br>

#### Exercise 6.1

Consider the dataset from `data_banknote_authentication.csv`.

1) Read data into a pandas dataframe.

2) Pick the column named "class" as target variable `y` and all other columns as feature variables `X`.

3) Split the data into training and testing sets with 80/20 ratio and `random_state=20`.

4) Use support vector classifier with linear kernel to fit to the training data.

5) Predict on the testing data and compute the confusion matrix and classification report.

6) Repeat steps 4 and 5 for the radial basis function kernel.

7) Compare the two SVM models in your own words.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report

# Read the data into a DataFrame
df = pd.read_csv("data_banknote_authentication.csv")

# Target is the "class" column; every other column is a feature
X = df.drop("class", axis=1)
y = df["class"]

# 80/20 train/test split with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)

# Fit a support vector classifier with a linear kernel
linear_svm = SVC(kernel="linear")
linear_svm.fit(X_train, y_train)

# Evaluate the linear model on the held-out test set
y_pred_linear = linear_svm.predict(X_test)
print("Linear Kernel Results:")
print(confusion_matrix(y_test, y_pred_linear))
print(classification_report(y_test, y_pred_linear))

# Fit the same classifier with a radial basis function (RBF) kernel
rbf_svm = SVC(kernel="rbf")
rbf_svm.fit(X_train, y_train)

# Evaluate the RBF model on the same test set
y_pred_rbf = rbf_svm.predict(X_test)
print("RBF Kernel Results:")
print(confusion_matrix(y_test, y_pred_rbf))
print(classification_report(y_test, y_pred_rbf))


Linear Kernel Results:
[[152   2]
 [  0 121]]
              precision    recall  f1-score   support

           0       1.00      0.99      0.99       154
           1       0.98      1.00      0.99       121

    accuracy                           0.99       275
   macro avg       0.99      0.99      0.99       275
weighted avg       0.99      0.99      0.99       275

RBF Kernel Results:
[[154   0]
 [  0 121]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       154
           1       1.00      1.00      1.00       121

    accuracy                           1.00       275
   macro avg       1.00      1.00      1.00       275
weighted avg       1.00      1.00      1.00       275



In [None]:
The RBF kernel classified all 275 test samples correctly, while the linear kernel misclassified 2 of them (both class 0 samples predicted as class 1), giving 99% accuracy. The gap is small and comes from a single train/test split, so it suggests, but does not prove, that the RBF model captures some nonlinear structure that a linear decision boundary misses.
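
To make the comparison concrete, the sketch below recomputes accuracy, precision, recall, and F1 directly from the two confusion matrices printed above. It assumes scikit-learn's convention that rows are true labels and columns are predicted labels; the helper name `per_class_metrics` is made up for this illustration.

```python
def per_class_metrics(cm):
    """Return (accuracy, {class: (precision, recall, f1)}) for a square
    confusion matrix with true labels on rows and predictions on columns."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    report = {}
    for c in range(n):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(n))  # column sum
        actual_c = sum(cm[c])                          # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = (precision, recall, f1)
    return accuracy, report


# Confusion matrices copied from the output above
linear_cm = [[152, 2], [0, 121]]
rbf_cm = [[154, 0], [0, 121]]

for name, cm in [("linear", linear_cm), ("rbf", rbf_cm)]:
    acc, report = per_class_metrics(cm)
    print(name, "accuracy:", round(acc, 4))
    for label, (p, r, f1) in report.items():
        print("  class", label, "precision", round(p, 2),
              "recall", round(r, 2), "f1", round(f1, 2))
```

The only off-diagonal entry in the linear matrix is the 2 in the top row, which is why class 0 recall and class 1 precision are the only values that fall below 1.00.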



#### Exercise 6.2

This exercise is related to exercise 5.2 of the previous week. Consider the data from CSV file `weight-height.csv`.

1) Read data into a pandas dataframe.

2) Pick the target variable `y` as weight in kilograms, and the feature variable `X` as height in centimeters.

3) Split the data into training and testing sets with 80/20 ratio.

4) Scale the training and testing data using normalization and standardization.

5) Fit a KNN regression model with `k=5` to the training data without scaling, predict on the unscaled testing data, and compute the $R^2$ value.

6) Repeat step 5 for normalized data.

7) Repeat step 5 for standardized data.

8) Compare the models in terms of their $R^2$ value.

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score


df = pd.read_csv("weight-height.csv")  


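# ASSUMPTION: the CSV stores Height in inches and Weight in pounds; convert to cm and kg as the exercise asks
X = df[['Height']] * 2.54       # inches -> centimeters
y = df['Weight'] * 0.45359237   # pounds -> kilograms (this rescaling does not change R² beyond floating-point rounding)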


# 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit each scaler on the training data only, then apply it to both sets
minmax = MinMaxScaler()
standard = StandardScaler()

X_train_norm = minmax.fit_transform(X_train)
X_test_norm = minmax.transform(X_test)

X_train_std = standard.fit_transform(X_train)
X_test_std = standard.transform(X_test)

# KNN regression (k=5) on unscaled data
knn_raw = KNeighborsRegressor(n_neighbors=5)
knn_raw.fit(X_train, y_train)
y_pred_raw = knn_raw.predict(X_test)
r2_raw = r2_score(y_test, y_pred_raw)

# Same model on normalized (min-max scaled) data
knn_norm = KNeighborsRegressor(n_neighbors=5)
knn_norm.fit(X_train_norm, y_train)
y_pred_norm = knn_norm.predict(X_test_norm)
r2_norm = r2_score(y_test, y_pred_norm)

# Same model on standardized (zero mean, unit variance) data
knn_std = KNeighborsRegressor(n_neighbors=5)
knn_std.fit(X_train_std, y_train)
y_pred_std = knn_std.predict(X_test_std)
r2_std = r2_score(y_test, y_pred_std)

# Compare the three R² values
print("R² without scaling:", r2_raw)
print("R² with normalization:", r2_norm)
print("R² with standardization:", r2_std)


R² without scaling: 0.8346485438169171
R² with normalization: 0.8346485438169171
R² with standardization: 0.8346485438169171


In [None]:
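Conclusion: All three models gave the same R² ≈ 0.8346. With a single feature, normalization and standardization are both increasing linear transformations of height, so they preserve the ordering of distances between points. Each test point therefore keeps the same 5 nearest neighbors and gets the same prediction, and scaling has no effect.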
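
As a rough illustration of why scaling a single feature leaves KNN predictions unchanged, the sketch below runs a tiny hand-written KNN regressor on raw, min-max normalized, and standardized versions of the same one-dimensional data. The toy heights and weights, the helper names, and `k=3` are all made up for this example; they are not taken from `weight-height.csv` or from scikit-learn. `pstdev` is used because `StandardScaler` divides by the population standard deviation.

```python
from statistics import mean, pstdev


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points nearest to `query` (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return mean(train_y[i] for i in nearest)


# Hypothetical toy data (not from the assignment's CSV)
train_heights = [150.0, 155.0, 161.0, 168.0, 172.0, 180.0, 191.0]  # cm
train_weights = [50.0, 54.0, 59.0, 66.0, 70.0, 79.0, 90.0]         # kg
test_heights = [157.0, 171.0, 185.0]

# Scaling parameters are learned from the training data only
lo, hi = min(train_heights), max(train_heights)
mu, sigma = mean(train_heights), pstdev(train_heights)


def normalize(x):
    return (x - lo) / (hi - lo)


def standardize(x):
    return (x - mu) / sigma


def predict_all(transform):
    """Scale train and test heights with `transform`, then predict each test weight."""
    scaled_train = [transform(x) for x in train_heights]
    return [knn_predict(scaled_train, train_weights, transform(q)) for q in test_heights]


raw_preds = predict_all(lambda x: x)
norm_preds = predict_all(normalize)
std_preds = predict_all(standardize)

# The neighbor sets are identical in all three cases, so the predictions match
print(raw_preds == norm_preds == std_preds)
```

With two or more features on different scales this equality would generally break, because scaling changes how much each feature contributes to the distance.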
