## Prepare Python environment


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

%matplotlib inline

In [2]:
random_state=5 # use this to control randomness across runs e.g., dataset partitioning

## Preparing the Statlog (Heart) Dataset (2 points)
We will use the heart dataset from the UCI Machine Learning Repository. Details of this dataset can be found [here](https://archive.ics.uci.edu/ml/datasets/statlog+(heart)). 
The dataset contains the following features, listed with their feature types:
1. age in years (real)
2. sex (binary; 1=male/0=female)
3. cp: chest pain type (categorical)
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital) (real)
5. chol: serum cholesterol in mg/dl (real)
6. fbs: (fasting blood sugar > 120 mg/dl) (binary; 1=true/0=false)
7. restecg: resting electrocardiographic results (categorical)
8. thalach: maximum heart rate achieved (real)
9. exang: exercise induced angina (1 = yes; 0 = no) (binary)
10. oldpeak: ST depression induced by exercise relative to rest (real)
11. slope: the slope of the peak exercise ST segment (ordinal)
12. ca: number of major vessels colored by fluoroscopy (categorical)
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect. (categorical)

The objective is to determine whether a person has heart disease or not based on these features.

### Loading the dataset

In [None]:
# Download and load the dataset
import os
if not os.path.exists('heart.csv'): 
    !wget https://raw.githubusercontent.com/JHA-Lab/ece364/main/dataset/heart.csv 
df = pd.read_csv('heart.csv')

# Display the first five instances in the dataset
df.head()

### Check the data type for each column

In [None]:
df.info()

#### There are a total of 303 entries in this dataset. The first 13 columns are features, and the last column indicates whether or not the person has heart disease.

#### Look at some statistics of the data using the `describe` function in pandas.

In [None]:
df.describe()

1. `count` is the number of non-missing values in a feature.

2. `mean` is the mean value of that feature.

3. `std` is the standard deviation of that feature.

4. `min` is the minimum value of that feature.

5. `25%`, `50%`, and `75%` are the percentiles (quartiles) of each feature; `50%` is the median.

6. `max` is the maximum value of that feature.
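
As a small illustration of how these rows behave, the sketch below calls `describe` on a toy DataFrame. The values are hypothetical and are not taken from `heart.csv`; the point is only that `count` skips missing entries and that the `50%` row is the median.

```python
import pandas as pd

# Toy data with hypothetical values (not taken from heart.csv)
toy = pd.DataFrame({
    "age": [40, 50, 60, 70],
    "chol": [200.0, 220.0, None, 260.0],  # one missing value
})

summary = toy.describe()
print(summary)

# count ignores the missing entry, so 'chol' has 3 values rather than 4
print(summary.loc["count", "chol"])

# the 50% row is the median of the column
print(summary.loc["50%", "age"])
```

Note that pandas reports the sample standard deviation (normalized by `n - 1`) in the `std` row.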

#### Look at the distribution of some features across the population. See [here](https://seaborn.pydata.org/generated/seaborn.histplot.html) for details. These have been done for you.

In [None]:
sns.histplot(df['thalach'],bins=30,color='red',stat="density",kde=True)

In [None]:
sns.histplot(df['chol'],bins=30,color='green',stat='density',kde=True)

In [None]:
sns.histplot(df['trestbps'],bins=30,color='blue',stat='density',kde=True)

#### Plot the number of people with and without heart disease at each age. This has been done for you.

In [None]:
plt.figure(figsize=(15,6))
sns.countplot(x='age',data = df, hue = 'target',palette='coolwarm_r')
plt.show()

### Extract target and descriptive features (0.5 points)
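
One common way to separate features from labels is sketched below on a toy DataFrame. The values are hypothetical, and the sketch assumes the label column is named `target` (the name used by the count plot above); adapt it to the actual columns of `df`.

```python
import pandas as pd

# Toy frame with hypothetical values; 'target' is assumed to be the label column
toy = pd.DataFrame({
    "age": [63, 41, 57],
    "chol": [233, 204, 354],
    "target": [1, 1, 0],
})

# Everything except the label column is treated as a feature
X_toy = toy.drop(columns="target").to_numpy()
y_toy = toy["target"].to_numpy()

print(X_toy.shape, y_toy.shape)
```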

In [None]:
# Store all the features from the data in X
X= # TODO
# Store all the labels in y
y= # TODO

In [None]:
# Convert data to numpy array
X = # TODO
y = # TODO

### Create training and test datasets (0.5 points)

Split the data into training and test sets using `train_test_split`. See [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) for details. To get consistent results when splitting, set `random_state` to the value defined earlier. Use 80% of the data for training and 20% for testing.
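
The sketch below shows the general pattern on synthetic arrays rather than on the heart data. The names `X_demo`, `y_demo`, and `seed` are illustrative assumptions; the second call checks that reusing the same seed reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

seed = 5  # illustrative seed, playing the role of `random_state`

# 20 synthetic samples with 2 features each (not the heart dataset)
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0, 1] * 10)

# test_size=0.2 holds out 20% of the samples for testing
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=seed
)
print(Xtr.shape, Xte.shape)

# Splitting again with the same seed gives an identical partition
Xtr2, Xte2, ytr2, yte2 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=seed
)
print(np.array_equal(Xte, Xte2))
```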

In [None]:
X_train,X_test,y_train,y_test = # TODO # 80% training and 20% test

### Preprocess the dataset (1 point)

#### Standardize each feature to have zero mean and unit standard deviation. This can be done using the `StandardScaler` class. See [here](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) for more details.
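
A minimal sketch of the fit-on-train, transform-on-test pattern follows, using randomly generated arrays as stand-ins for the real splits (the arrays and the name `scaler_demo` are assumptions for illustration). The mean and standard deviation are learned from the training split only and then reused on the held-out split, so the held-out data is not exactly zero-mean afterwards.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for a training and a held-out split
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
test = rng.normal(loc=50.0, scale=10.0, size=(25, 3))

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns mean/std from train only
test_scaled = scaler_demo.transform(test)        # reuses the training statistics

print(train_scaled.mean(axis=0).round(6))  # approximately 0 per feature
print(train_scaled.std(axis=0).round(6))   # approximately 1 per feature
print(test_scaled.mean(axis=0).round(2))   # close to 0, but generally not exact
```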

In [None]:
# Define the scaler for scaling the data
scaler = # TODO 

# Normalize the training data
X_train = # TODO

# Use the scaler fitted above to standardize the test data by applying the same transformation.
X_test = # TODO


## Training a Multi-Layer Perceptron (18 points)


#### We will use scikit-learn's neural network module to train a multi-layer perceptron for classification. The model is trained to minimize the cross-entropy loss using stochastic gradient descent. Review ch. 8 and see [here](https://scikit-learn.org/stable/modules/neural_networks_supervised.html) for more details. 
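
To make the loss concrete, the sketch below computes the binary cross-entropy by hand for a few hypothetical predictions and compares it with `sklearn.metrics.log_loss`. The labels and probabilities are made up for illustration and do not come from a trained model.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1])
p_pos = np.array([0.9, 0.2, 0.6, 0.3])  # hypothetical predicted P(y = 1)

# Binary cross-entropy: the mean negative log-probability of the true class
ce = -np.mean(y_true * np.log(p_pos) + (1 - y_true) * np.log(1 - p_pos))

print(ce)
print(log_loss(y_true, p_pos))  # should agree with the manual value
```

Confident but wrong predictions (such as 0.3 for a positive example) contribute the largest terms, which is why the loss penalizes them more heavily than accuracy does.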


#### NOTE: Training each network takes several seconds to minutes.

In [None]:
from sklearn.neural_network import MLPClassifier 
from sklearn.metrics import accuracy_score 
import matplotlib.pyplot as plt

In [None]:
"""
For info on the arguments and attributes, see here: 
(https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier)
"""

def get_mlp(hidden_layer_sizes=(100,),
            activation='relu',
            learning_rate_init=0.1,
            early_stopping=False, 
            validation_fraction=0.15):
  
  # use stochastic gradient descent
  parameters={'solver':'sgd',
              'alpha': 0,
              'momentum': 0,
              'max_iter':20000,
              'n_iter_no_change':100,
              'tol': 1e-5,
              'random_state': random_state
              }

  parameters['hidden_layer_sizes']=hidden_layer_sizes
  parameters['activation']=activation
  parameters['learning_rate_init']=learning_rate_init
  parameters['early_stopping']=early_stopping
  parameters['validation_fraction']=validation_fraction 

  return MLPClassifier(**parameters)

### Exercise 1: Warm up (2 points)

#### Use `get_mlp` defined above to create a Multi-layer perceptron with 1 hidden layer consisting of 100 units and train the classifier on the training dataset. Keep all other parameters at their default values.
 

In [None]:
# TODO


#### Visualize the evolution of the training loss. Hint: use `loss_curve_` attribute of the classifier.
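
A minimal sketch of reading and plotting `loss_curve_` is given below. It trains on synthetic data from `make_classification` rather than the heart dataset, and the hyperparameters are illustrative choices, not the ones fixed by `get_mlp`. With `max_iter=200`, scikit-learn may emit a convergence warning, which is harmless here.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (an assumption; not the heart dataset)
X_syn, y_syn = make_classification(n_samples=200, n_features=13, random_state=0)

clf_demo = MLPClassifier(hidden_layer_sizes=(100,), solver="sgd", momentum=0,
                         learning_rate_init=0.1, max_iter=200, random_state=5)
clf_demo.fit(X_syn, y_syn)

# loss_curve_ holds one training-loss value per epoch
plt.plot(clf_demo.loss_curve_)
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.show()
```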





In [None]:
# TODO

#### Report the classifier's accuracies over the training and test datasets. Hint: use `accuracy_score`.
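
For reference, `accuracy_score` simply returns the fraction of predictions that match the true labels. The toy labels below are hypothetical:

```python
from sklearn.metrics import accuracy_score

y_true_toy = [0, 1, 1, 0]
y_pred_toy = [0, 1, 0, 0]  # one mistake out of four

acc = accuracy_score(y_true_toy, y_pred_toy)
print(acc)  # 3 of 4 correct -> 0.75
```

In practice the second argument would be the output of the classifier's `predict` method on the corresponding split.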

In [None]:
# TODO

#### Explain any performance difference observed between the training and test datasets.

TODO

#### We will next explore several strategies to improve the model's test performance. 

### Exercise 2: Width vs Depth (12 points)

#### Exercise 2a (4 points)

#### Next, we will experiment with the width of the hidden layer, defined by the number of units in the hidden layer. 

#### Do this by using `get_mlp` to create a Multi-layer perceptron with 1 hidden layer. Vary the number of hidden units among 1, 15, 25, 50, by setting `hidden_layer_sizes`. Keep all other parameters at their default values.

#### Fit each classifier on the training dataset and report its training and test accuracies.
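
The general shape of such a sweep can be sketched as follows. This is an illustration on synthetic data with `MLPClassifier` constructed directly; the dataset, split, and `max_iter` are assumptions, so the numbers it prints say nothing about the heart dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (an assumption; not the heart dataset)
X_syn, y_syn = make_classification(n_samples=300, n_features=13, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_syn, y_syn, test_size=0.2, random_state=5)

results = {}
for width in (1, 15, 25, 50):
    # A single hidden layer is specified as a one-element tuple
    clf = MLPClassifier(hidden_layer_sizes=(width,), solver="sgd", momentum=0,
                        learning_rate_init=0.1, max_iter=300, random_state=5)
    clf.fit(Xtr, ytr)
    results[width] = (accuracy_score(ytr, clf.predict(Xtr)),
                      accuracy_score(yte, clf.predict(Xte)))

for width, (train_acc, test_acc) in results.items():
    print(f"width={width:3d}  train={train_acc:.3f}  test={test_acc:.3f}")
```

Collecting the accuracies in a dictionary keyed by width keeps training separate from reporting, which makes it easy to tabulate or plot the results afterwards.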
 

  

In [None]:
# TODO

#### Provide a possible explanation for any effect observed upon increasing the number of hidden units on classifier performance.

TODO

#### Exercise 2b (4 points)

#### Next, we will experiment with the depth of the MLP, by varying the number of hidden layers. 

#### Do this by using `get_mlp` to create a Multi-layer perceptron with 15 units per hidden layer. Vary the number of hidden layers from 1 through 4, by setting `hidden_layer_sizes`. Keep all other parameters at their default values.

#### Fit each classifier on the training dataset and report its training and test accuracies.
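
Depth is controlled by the length of the `hidden_layer_sizes` tuple. The sketch below builds tuples of 15 units for one to four hidden layers and inspects the resulting weight-matrix shapes via `coefs_`. It uses synthetic data and a small `max_iter` purely to keep the illustration fast, so convergence warnings are expected and the fitted models are not meaningful.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data with 13 features (an assumption; not the heart dataset)
X_syn, y_syn = make_classification(n_samples=200, n_features=13, random_state=0)

units = 15
shapes_by_depth = {}
for n_hidden in range(1, 5):
    sizes = (units,) * n_hidden  # e.g. (15, 15, 15) for three hidden layers
    clf = MLPClassifier(hidden_layer_sizes=sizes, solver="sgd",
                        max_iter=50, random_state=5)
    clf.fit(X_syn, y_syn)
    # coefs_ has one weight matrix per pair of consecutive layers
    shapes_by_depth[n_hidden] = [w.shape for w in clf.coefs_]
    print(sizes, "->", shapes_by_depth[n_hidden])
```

For a binary problem the final weight matrix has a single output column, since scikit-learn uses one logistic output unit in that case.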


In [None]:
# TODO

#### Provide a possible explanation for any change in performance upon increasing the model depth. 

TODO

#### Exercise 2c (4 points)

#### Next, we'll explore the role of the hidden activation function when training a deeper network.

#### Do this by using `get_mlp` to create a Multi-layer perceptron with 5 hidden layers, each with 15 hidden units. Vary the activation functions among identity, logistic, tanh, relu. Keep all other parameters at their default values.

#### Fit each classifier on the training dataset and report its training accuracy.

#### Also, plot the training loss curves for each classifier on a single plot. 
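
One way to overlay several loss curves on a single plot is sketched below. It uses synthetic data, and the learning rate of 0.01 is an assumption chosen to keep this illustration numerically tame; it differs from the `get_mlp` default.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (an assumption; not the heart dataset)
X_syn, y_syn = make_classification(n_samples=300, n_features=13, random_state=0)

curves = {}
for act in ("identity", "logistic", "tanh", "relu"):
    clf = MLPClassifier(hidden_layer_sizes=(15,) * 5, activation=act,
                        solver="sgd", momentum=0, learning_rate_init=0.01,
                        max_iter=300, random_state=5)
    clf.fit(X_syn, y_syn)
    curves[act] = clf.loss_curve_  # one training-loss value per epoch

# Draw every curve on the same axes, labeled by activation
for act, curve in curves.items():
    plt.plot(curve, label=act)
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.legend()
plt.show()
```

The curves can have different lengths, because training stops early for any model whose loss stops improving.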


In [None]:
# TODO

#### Explain any effect observed on the training loss trajectories and accuracies when varying the hidden activation function.

TODO

### Exercise 3: Early stopping (4 points)

#### As we've seen from the above exercises, neural networks are prone to overfitting. To mitigate this, we can use a regularization method called early stopping. In early stopping, one monitors the performance of the model on a validation dataset (separate from the training and test datasets) throughout training. Then, the model with the lowest loss on the validation dataset, typically found in the earlier iterations of training, is selected, rather than the model with the lowest training loss. 




#### Do this by calling `get_mlp` and setting `early_stopping=True`, `validation_fraction=0.3`. Keep all other parameters at their default values. This will create a classifier that automatically splits the original training set into nonoverlapping training and validation splits, where the validation split is 30% of the original training set.    

#### Compare this classifier against the same model trained without early stopping.

#### Fit each classifier on the training dataset and report its training and test accuracies.

#### Also, plot the training loss and validation curves for the classifier trained with early stopping. Hint: use the `validation_scores_` attribute (analogous to `loss_curve_`) to plot validation performance; note that it stores the validation accuracy at each epoch rather than a loss.
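
The attributes involved can be explored with the sketch below, which trains on synthetic data with `MLPClassifier` constructed directly (the data and `max_iter` are assumptions). In scikit-learn, `validation_scores_` holds the accuracy on the held-out validation split after each epoch, so higher is better, in contrast to `loss_curve_`.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (an assumption; not the heart dataset)
X_syn, y_syn = make_classification(n_samples=300, n_features=13, random_state=0)

clf_es = MLPClassifier(hidden_layer_sizes=(100,), solver="sgd", momentum=0,
                       learning_rate_init=0.1, early_stopping=True,
                       validation_fraction=0.3, max_iter=500, random_state=5)
clf_es.fit(X_syn, y_syn)  # 30% of X_syn is set aside internally for validation

print("epochs run:", clf_es.n_iter_)
print("best validation accuracy:", clf_es.best_validation_score_)
print("first validation accuracies:", clf_es.validation_scores_[:5])
print("first training losses:", clf_es.loss_curve_[:5])
```

Both lists have one entry per epoch, so they can be plotted against the same x-axis (for example with a second y-axis for the accuracy).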

In [None]:
# TODO

#### Explain the plot and any change in the train and test performance compared to the classifier trained without early stopping.

TODO