
### Lab 1.3: Multi-Class Linear Classifier

In this lab you will explore multi-class classification and evaluate model generalization using a [dataset for heart disease prediction from the UCI ML repository](https://archive.ics.uci.edu/dataset/45/heart+disease).

In [None]:
!pip install ucimlrepo

The ``ucimlrepo`` package provides a convenient interface for accessing datasets from the UCI ML repository.

In [None]:
from ucimlrepo import fetch_ucirepo

# fetch dataset
heart_disease = fetch_ucirepo(id=45)

# data (as pandas dataframes)
X = heart_disease.data.features
y = heart_disease.data.targets

# variable information
heart_disease.variables


Here I drop every row that has a missing value in any feature, along with its label.

In [None]:
bad = X.isna().any(axis=1)
X = X[~bad]
y = y[~bad]

Finally, I convert the DataFrames to NumPy arrays.

In [None]:
X = X.values
y = y.values.flatten()

The classification target is a number from 0-4 indicating the severity of heart disease.  Let's try fitting a linear model.

In [None]:
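# Submodules must be imported explicitly; `import sklearn` alone does not load them.
import sklearn.linear_model
import sklearn.model_selection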

In [None]:
model = sklearn.linear_model.LogisticRegression().fit(X,y)

In [None]:
model.score(X,y)

### Exercises

1. Compute the $\mathbf{z}$ values for the classifier manually, i.e. compute

$$\mathbf{W}\mathbf{X}+\mathbf{b}.$$

*Hints*:
- Use `.shape` to get the shape of a NumPy matrix.
- ``@`` is the matrix multiplication operator in NumPy.
- The actual computation will be a little different from what is written above.  You will need to use a matrix transpose, which is `.T` in NumPy.
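
A note on shapes may help with the last hint. The sketch below assumes scikit-learn's conventions, where `coef_` has one row per class and `intercept_` has one entry per class. The symbols $n$, $d$, and $k$ (number of examples, features, and classes) are introduced here only for illustration.

$$\mathbf{X}\in\mathbb{R}^{n\times d},\quad \mathbf{W}\in\mathbb{R}^{k\times d},\quad \mathbf{b}\in\mathbb{R}^{k} \;\Longrightarrow\; \mathbf{Z}=\mathbf{X}\mathbf{W}^{\top}+\mathbf{b}\in\mathbb{R}^{n\times k}$$

Each row of $\mathbf{Z}$ then holds the $k$ class scores for one example, with $\mathbf{b}$ added to every row.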

In [None]:
W = model.coef_       # weights, one row per class
b = model.intercept_  # biases, one entry per class
print("X shape", X.shape)
print("W shape", W.shape)
print("b shape", b.shape)

# X stores one example per row, so multiply by the transpose of W;
# b is broadcast across the rows.
z = X @ W.T + b
print("z shape", z.shape)
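
The manual computation can be cross-checked against scikit-learn itself. The sketch below is a minimal illustration, not part of the lab: it assumes synthetic stand-in data from `make_classification` (so it runs without downloading anything), and the names `X_demo`, `y_demo`, `demo_model`, and `z_demo` are hypothetical. It compares the manual scores with `decision_function` and the arg-max class with `predict`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): 300 examples, 13 features, 5 classes.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=13, n_informative=5,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Manual scores: one row per example, one column per class.
z_demo = X_demo @ demo_model.coef_.T + demo_model.intercept_

# decision_function returns the same linear scores.
matches_sklearn = np.allclose(z_demo, demo_model.decision_function(X_demo))

# The predicted class is the one with the largest score.
manual_pred = demo_model.classes_[z_demo.argmax(axis=1)]
matches_predict = np.array_equal(manual_pred, demo_model.predict(X_demo))

print(z_demo.shape, matches_sklearn, matches_predict)
```

The same two checks can be run on the lab's own `model`, `X`, and `z` once they are defined.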



Print out the $\mathbf{z}$ values for the first example in the dataset and the first label.   Determine if the classifier is correctly classifying the first example in the dataset.

In [None]:
print("z[0]:", z[0])
print("label:", y[0], "prediction:", model.classes_[z[0].argmax()])

The classifier correctly classifies the first example: the largest entry of `z[0]` belongs to the same class as the first label.

2. Use ``sklearn.model_selection.train_test_split`` to split ``X`` and ``y`` into 90% train and 10% test splits.  Note that this should be done in a single call to ``train_test_split``.

*Note*: Pass ``random_state=42`` to ``train_test_split`` to ensure you get the same result from random shuffling each time.


In [None]:
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.1, train_size=0.9, random_state=42
)


Fit the model to the training split and calculate accuracy on the test split.  How does it compare to the previous accuracy value (when the model was trained and evaluated on the same data)?

In [None]:

model_2 = sklearn.linear_model.LogisticRegression().fit(X_train, y_train)
model_2.score(X_test, y_test)


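The test accuracy is noticeably higher than the accuracy measured earlier on the training data. That is surprising, because a model usually scores best on the data it was fit to. The likely explanation is the size of the test split: 10% of the data is only about 30 examples, so a single split gives a noisy estimate that can land above the training accuracy by chance.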
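
To gauge how much a single 90/10 split can move the test accuracy, the split can be repeated with several seeds. This is a minimal sketch under stated assumptions: it uses synthetic stand-in data from `make_classification` rather than the heart disease data, and the ten seeds, the dataset parameters, and the names `X_demo`, `y_demo`, and `test_scores` are illustrative choices, not part of the lab.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), roughly the size of the lab's dataset.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=13, n_informative=5,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

test_scores = []
for seed in range(10):  # illustrative seeds
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=0.1, random_state=seed
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    test_scores.append(clf.score(X_te, y_te))

test_scores = np.array(test_scores)
print("min / mean / max:", test_scores.min(), test_scores.mean(), test_scores.max())
```

A wide gap between the minimum and maximum suggests that any one split, including the one with `random_state=42`, should be read with caution.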

3. Run $k$-fold cross validation with $k=5$ and interpret the results (see `sklearn.model_selection.cross_val_score`).

In [None]:
sklearn.model_selection.cross_val_score(model, X, y, cv=5)
# Output: array([0.6       , 0.6       , 0.52542373, 0.55932203, 0.59322034])

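The fold accuracies range from about 0.53 to 0.60, so the score depends on which portion of the data is held out for testing. Some train/test partitions are simply easier than others. Averaging over the five folds gives a steadier estimate of how well the model generalizes than any single split does.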
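
Fold scores are commonly summarized by their mean and spread. The sketch below copies the five values from the `cross_val_score` output shown above. The variable names are hypothetical, and in the notebook the array returned by `cross_val_score` could be used directly instead.

```python
import numpy as np

# Fold accuracies copied from the cross_val_score output above.
fold_scores = np.array([0.6, 0.6, 0.52542373, 0.55932203, 0.59322034])

mean_score = fold_scores.mean()
std_score = fold_scores.std()
print(f"mean accuracy: {mean_score:.3f}, std across folds: {std_score:.3f}")
```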