## Homework 3 - Feature Engineering for Classification

Goal: Train the best classifier possible for Heart Disease Prediction while trying various feature engineering techniques and learning to run experiments to find the best possible model.


### Feature Preprocessing
For each categorical feature, implement two feature representations:
1. OneHot Encoding: For example, transform `X_train['Sex']` with values `M or F` into two features `X_train['Sex_M']`, `X_train['Sex_F']` with values `0 or 1`.
2. Target Encoding: For example, transform `X_train['Sex']` with values `M or F` into a feature `X_train['Sex-TargetEncoded']` with value equal to the average rate of heart disease of `M and F` respectively.

Please implement these yourself, but you can check against sklearn.preprocessing implementations for correctness.

The set of categorical features is:
`categorical_features = ['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope']`
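
As a rough illustration of the two encodings described above, here is a minimal sketch on a made-up toy frame (the data and the names `train`, `test`, `y_train_toy`, `sex_values`, and `sex_rate` are assumptions for the example, not part of the assignment data). Both encodings learn their parameters from the training rows only and then apply them to the test rows.

```python
import pandas as pd

# Toy data, made up for illustration (not taken from heart.csv).
train = pd.DataFrame({"Sex": ["M", "F", "M", "M", "F"]})
y_train_toy = pd.Series([1, 0, 1, 0, 0], name="HeartDisease")
test = pd.DataFrame({"Sex": ["F", "M"]})

# One-hot: the category values are learned from the training set only,
# so train and test end up with identical columns.
sex_values = sorted(train["Sex"].unique())
for frame in (train, test):
    for value in sex_values:
        frame[f"Sex_{value}"] = (frame["Sex"] == value).astype(int)

# Target encoding: mean label per category, again from training data only.
sex_rate = y_train_toy.groupby(train["Sex"]).mean()
for frame in (train, test):
    frame["Sex-TargetEncoded"] = frame["Sex"].map(sex_rate)

print(test)
```

A category that appears only in the test set would map to `NaN` under this sketch, so a fallback such as the overall training rate may be needed.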


For numerical features, implement two feature normalisations:
1. Standard Scaler (scale numeric values to have zero mean and unit variance). For example `X_train['Age-Scaled'] = (X_train['Age'] - mean) / standard_deviation`
2. MinMax Normalization (subtract min and divide by max - min). For example `X_train['Age-MinMax'] = (X_train['Age'] - min) / (max - min)`

Please implement these yourself but you can check against the sklearn.preprocessing implementations.

The set of numeric features is:
`numeric_features = ["Age", "RestingBP", "Cholesterol", "FastingBS", "MaxHR", "Oldpeak"]`
    
Note, any feature preprocessing parameters (like mean or standard deviation) should be calculated on training data only.
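
The note above can be sketched as follows on a made-up toy column (the values and the names `train`, `test`, `lo`, and `hi` are assumptions for illustration). The mean, standard deviation, min, and max are taken from the training frame and reused unchanged on the test frame.

```python
import pandas as pd

# Toy data, made up for illustration.
train = pd.DataFrame({"Age": [40.0, 50.0, 60.0]})
test = pd.DataFrame({"Age": [55.0, 70.0]})

# All parameters come from the training set only.
mean = train["Age"].mean()
std = train["Age"].std(ddof=0)  # population std; pandas defaults to ddof=1
lo, hi = train["Age"].min(), train["Age"].max()

for frame in (train, test):
    frame["Age-Scaled"] = (frame["Age"] - mean) / std
    frame["Age-MinMax"] = (frame["Age"] - lo) / (hi - lo)

print(test)
```

Two things to keep in mind when checking against sklearn: `StandardScaler` uses the population standard deviation (`ddof=0`), while pandas `.std()` defaults to the sample version (`ddof=1`), so results differ slightly unless `ddof` is set. Also, test values outside the training range fall outside `[0, 1]` after MinMax scaling, which is expected.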

### Feature Engineering:
Create at least 5 custom features as functions of other features:

Some ideas:
  - A binary feature representing High Heart Rate `Custom-BinaryHighMaxHR`
  - A bucketized categorical feature of Oldpeak or of MaxHR, for example `Custom-BucketizedOldpeak`
  - A feature cross of a bucketized version of OldPeak and MaxHR for example `HighOldPeak_X_LowMaxHR`
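
One possible way to build the three ideas above is sketched here on a made-up toy frame. The cut points (140, 0, 2, 110) are illustrative assumptions, not clinically motivated values, and `toy` is a hypothetical name.

```python
import pandas as pd

# Toy data, made up for illustration.
toy = pd.DataFrame({"MaxHR": [96, 125, 142, 185],
                    "Oldpeak": [2.5, 1.0, 0.0, 0.0]})

# Binary feature: 1 when MaxHR is at or above an assumed cut point.
toy["Custom-BinaryHighMaxHR"] = (toy["MaxHR"] >= 140).astype(int)

# Bucketized feature: pd.cut uses right-inclusive bins by default,
# so the buckets are (-inf, 0], (0, 2], (2, inf).
toy["Custom-BucketizedOldpeak"] = pd.cut(
    toy["Oldpeak"],
    bins=[-float("inf"), 0, 2, float("inf")],
    labels=["Low", "Mid", "High"],
)

# Feature cross: 1 only when Oldpeak is in the high bucket AND MaxHR is low.
toy["Custom-HighOldpeak_X_LowMaxHR"] = (
    (toy["Custom-BucketizedOldpeak"] == "High") & (toy["MaxHR"] < 110)
).astype(int)

print(toy)
```

The bucketized column is categorical, so it would still need one of the categorical encodings before being passed to a model.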
  
### Feature Experiments:
Please run at least this set of experiments, but try others as you see fit.
1. All features with no scaling on numeric features and OneHot for categorical. No custom features.
2. All features using StandardScaler for numeric and OneHot for categorical. No custom features.
3. All features using StandardScaler for numeric and TargetEncoding for categorical. No custom features.
4. All features using MinMax-Normalization for numeric and OneHot for categorical. No custom features.
5. The kitchen sink: Include everything to try to get the best performance possible.
6. Only custom features. How good can you get with your own custom feature set?
7. Only categorical features using one of the encodings.
8. Only numerical features using one of the encodings.


### Model Training
Model training Data:
Separate data into a training set with 80% of the samples and a test set with 20%. 
`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)`. Note that ideally we would also have a validation set, but since this dataset is fairly small, we will just do train/test splits. Please use the same random state as above so we have comparable performance across the class.

### Models
Please run all experiments using at least 4 different models:
1. Logistic regression using `sklearn.linear_model.LogisticRegression`. Extra credit of +2 if you use your own gradient descent implementation from Homework 2.
2. Decision Trees: using `sklearn.tree.DecisionTreeClassifier`. Hyperparameter here is the depth of the tree.
3. Neural Net classifier using `sklearn.neural_network.MLPClassifier`. Hyperparameters here include the shape of the network and the learning rate. Consider using the 'adam' solver as a constructor argument and 'relu' activation functions (which is the default), and experiment with the shape of the network, but start small with `hidden_layer_sizes=(4,)`, that is 4 hidden units in one layer, or `(3, 3)`, which is 2 hidden layers of 3 units each.
4. At least 1 other classifier. Some ideas include SVM classifiers or Gradient Boosted Decision Trees, which are all available in sklearn as well.


### Question 1: Run Experiments (15 points)

We have many varying parameters. Often in ML we run multiple experiments to find the best result.
- at least 8 different feature setups, maybe more
- at least 4 different model algorithms
- some number of hyperparameters, let's say we try at least 10 for each model

This gives us a minimum of 320 different experiments. To easily keep track of these, let's create a spreadsheet to track the experiment results. 
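
The count above is just the size of the full grid, 8 × 4 × 10. A small sketch of enumerating that grid is shown below; the setup and model labels are placeholder names chosen for illustration.

```python
from itertools import product

# Placeholder labels, assumed for illustration only.
feature_setups = [f"FeatureSetup{i}" for i in range(1, 9)]      # 8 feature setups
models = ["LogReg", "DecisionTree", "MLP", "GradientBoosting"]  # 4 model types
hyperparameter_ids = range(10)                                  # 10 settings per model

# Every combination of (feature setup, model, hyperparameter setting).
experiment_grid = list(product(feature_setups, models, hyperparameter_ids))
print(len(experiment_grid))  # 320
```

Looping over a grid like this and appending one result row per combination keeps the tracking spreadsheet complete without manual bookkeeping.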

Create a .csv file with the following columns tracking your experiments. Make sure this .csv can be loaded into Google sheets so I can grade it. Upload the .csv separately.

1. Experiment Name string. For example `Experiment1-LogReg-OneHot-Custom`. This name can be anything.
2. List of Features Included in Model, separated by semicolons. Please name these according to this scheme: `<FeatureName>-<Preprocessing>-<Value>`, for example `MaxHR-StandardScaler` or `RestingECG-OneHot-Normal`. For custom features name them `Custom-<Name>`, for example `Custom-BucketizedOldpeak`. As an example, a row in this column might look like `MaxHR; MaxHR-StandardScaler; Custom-BinaryHighHeartRate; ...`. If your features are in your dataframe, you can create this list with `';'.join(X_train.columns)`.
3. Model Type (MyLogisticRegression or DecisionTree or anything else you want to try)
4. Epochs (for logistic regression)
5. Hyperparameters used in your model, for example depth of tree or size of neural network
6. Training Accuracy (w/ threshold of 0.5)
7. Testing Accuracy (w/ threshold of 0.5)
8. Training PR-AUC (Area under the precision/recall curve)
9. Testing PR-AUC (Area under the precision/recall curve)
10. Training Precision (w/ threshold of 0.5)
11. Testing Precision (w/ threshold of 0.5)
12. Training Recall (w/ threshold of 0.5)
13. Testing Recall (w/ threshold of 0.5)
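
The metric columns above could be computed along these lines. This is a sketch under the assumption that the model exposes a positive-class score (for sklearn classifiers, typically `model.predict_proba(X)[:, 1]`); the helper name `metric_row` and the toy labels and scores are made up for illustration.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, auc, precision_recall_curve,
                             precision_score, recall_score)

def metric_row(y_true, y_score, threshold=0.5):
    """Thresholded metrics plus PR-AUC for one data split."""
    # Hard predictions at the given threshold feed accuracy/precision/recall.
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    # The PR curve is traced from the continuous scores, not the hard labels.
    precisions, recalls, _ = precision_recall_curve(y_true, y_score)
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "PR-AUC": auc(recalls, precisions),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
    }

# Toy labels and scores, made up for illustration.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.6, 0.4, 0.9]
row = metric_row(y_true, y_score)
print(row)
```

Passing the output of `predict` instead of a continuous score leaves the curve with a single threshold, so the resulting PR-AUC says little about how the model ranks examples.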

Extra Credit (2 points): Use Weights And Biases (https://wandb.ai/) to track your experiments as an alternative to a spreadsheet (https://towardsdatascience.com/introduction-to-weight-biases-track-and-visualize-your-machine-learning-experiments-in-3-lines-9c9553b0f99d)


### Question 2: Analyze Logistic Regression Experiments (4 points)

- Question 2.1: What is the best set of features and parameters for Logistic Regression in terms of Test PR-AUC? Is this the same as the best in terms of accuracy? Discuss why you think this experiment showed the best results.
- Question 2.2: If features are well normalized, we can get a sense of feature importance by looking at the absolute value of the weights of each feature. Print out the highest 5 weights by absolute value. Which features are these? Discuss what effect this set of 5 features has on the model.
- Question 2.3: Train a model with just those top 5 features. What is the Test Accuracy and PR-AUC?
- Question 2.4: Describe the custom features you added. Why did you pick these?
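
For Question 2.2, ranking weights by absolute value can be sketched as below. The feature names and weight values here are made up for illustration and are not results; with sklearn's `LogisticRegression` the weights would typically be read from `model.coef_[0]` and the names from the training dataframe's columns.

```python
import numpy as np

# Hypothetical names and weights, NOT actual experiment results.
feature_names = np.array(["Age-Scaled", "MaxHR-Scaled", "Oldpeak-Scaled",
                          "Sex_M", "ST_Slope_Flat", "ST_Slope_Up",
                          "FastingBS-Scaled"])
weights = np.array([0.2, -0.4, 0.5, 0.9, 1.1, -1.3, 0.3])

# Sort by descending absolute value and keep the first five indices.
top5 = np.argsort(-np.abs(weights))[:5]
for name, weight in zip(feature_names[top5], weights[top5]):
    print(f"{name}: {weight:+.2f}")
```

Keeping the sign in the printout helps with the discussion: a large negative weight pushes predictions toward the negative class just as strongly as a large positive weight pushes toward the positive class.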

### Question 3: Analyze Other Experiments (4 points)

- Question 3.1: What is the best set of features and parameters for the Decision Tree classifier in terms of Test PR-AUC? Discuss why you think this experiment showed the best results.
- Question 3.2: What is the best set of features and parameters for the Neural Net classifier in terms of Test PR-AUC? Discuss why you think this experiment showed the best results.
- Question 3.3: What is the best set of features and parameters for another chosen model in terms of Test PR-AUC? Discuss why you think this experiment showed the best results.

### Question 4: Achieving particular performance characteristics (2 points)
- Question 4.1: From your trained models, can you produce a classifier that has around 90% recall on the test set? How? What is the precision of this model?
- Question 4.2: From your trained models, can you produce a classifier that has around 90% precision on the test set? How? What is the recall of this model?
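
One common way to approach these questions is to move the decision threshold away from 0.5. The sketch below picks the highest threshold whose recall is still at least the target; the helper name `threshold_for_recall` and the toy labels and scores are assumptions for illustration, and in practice `y_score` would be something like `model.predict_proba(X_test)[:, 1]`.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_recall(y_true, y_score, target_recall=0.90):
    """Highest threshold whose recall is still at least target_recall."""
    precisions, recalls, thresholds = precision_recall_curve(y_true, y_score)
    # precisions/recalls carry one extra trailing point (recall 0) without a threshold.
    candidates = np.where(recalls[:-1] >= target_recall)[0]
    best = candidates[-1]  # recall never increases as the threshold rises
    return thresholds[best], precisions[best], recalls[best]

# Toy labels and scores, made up for illustration.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 1, 1, 1])
y_score = np.array([0.05, 0.20, 0.30, 0.45, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95])

threshold, precision, recall = threshold_for_recall(y_true, y_score)
print(threshold, precision, recall)
```

The precision case is similar but needs more care: precision is not monotonic in the threshold, so a threshold reaching 90% precision may not exist, and when several do, the lowest one keeps the most recall.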

### Question 5: Taking a step back (2 points)
- Question 5.1: What is your best performing model over all experiments? Describe why you think this was the best. What is the PR-AUC and Accuracy of this model? Feel free to share your results on Discord; the best model in the class will receive extra credit of 3 points!
- Question 5.2: Discuss your thoughts on this process. What surprised you? What was hard or tedious? What did you learn?


In [135]:
# Imports
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, precision_recall_curve, auc

In [136]:
# Load data from our csv
df = pd.read_csv('../../Data/heart.csv')

# Select numeric and categorical features
numeric_features = ["Age", "RestingBP", "Cholesterol", "FastingBS", "MaxHR", "Oldpeak"]
categorical_features = ['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope']
label_feature = "HeartDisease"
X = df[numeric_features + categorical_features]
y = df[label_feature]

# Create Train / Test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Keeping track of custom features list
custom_features = []

In [137]:
# Examples of Pandas operations you can use:
# Calculate mean of a column:
print(X_train['Age'].mean())
# Calculate standard deviation of a column:
print(X_train['Age'].std())

# Keep untouched copies of the original train/test features before any new columns are added
X_train_old = X_train.copy()
X_test_old = X_test.copy()

53.65122615803815
9.364289800106517


In [138]:
# Feature Preprocessing
# 1. OneHot Encoding: For example, transform X_train['Sex'] with values M or F into two features X_train['Sex_M'], X_train['Sex_F'] with values 0 or 1
def oneHotEncoder(features, categories, fit_features=None):
    # Category values are read from fit_features (the training data) so that
    # train and test always end up with the same set of columns.
    if fit_features is None:
        fit_features = features
    feature_list = []
    for i in categories:
        for j in fit_features[i].unique():
            # Compare against the frame being encoded, not the global X
            features[i + "_" + j] = (features[i] == j).astype(int)
            feature_list.append(i + "_" + j)
    return feature_list

# Outputs all the categorical columns with OneHot Encoding
OneHot_Encoding_Feature_List = oneHotEncoder(X_train, categorical_features) # New columns are added to X_train in place
oneHotEncoder(X_test, categorical_features, fit_features=X_train) # Same columns are added to X_test, using the training categories
X_train[OneHot_Encoding_Feature_List] # Showing the new columns in X_train

Unnamed: 0,Sex_M,Sex_F,ChestPainType_NAP,ChestPainType_ASY,ChestPainType_TA,ChestPainType_ATA,RestingECG_Normal,RestingECG_LVH,RestingECG_ST,ExerciseAngina_N,ExerciseAngina_Y,ST_Slope_Down,ST_Slope_Up,ST_Slope_Flat
795,1,0,1,0,0,0,1,0,0,1,0,1,0,0
25,1,0,1,0,0,0,1,0,0,1,0,0,1,0
84,1,0,0,1,0,0,1,0,0,0,1,0,0,1
10,0,1,1,0,0,0,1,0,0,1,0,0,1,0
344,1,0,0,1,0,0,1,0,0,1,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
106,0,1,0,1,0,0,0,0,1,1,0,0,1,0
270,1,0,0,1,0,0,1,0,0,1,0,0,1,0
860,1,0,0,1,0,0,1,0,0,0,1,0,1,0
435,1,0,0,1,0,0,0,0,1,0,1,0,1,0


In [139]:
# Feature Preprocessing
# 2. Target Encoding: For example, transform X_train['Sex'] with values M or F into a feature X_train['Sex-TargetEncoded'] with value equal to the average rate of heart disease of M and F respectively.
def targetEncoder(features, categories, fit_features, fit_label):
    feature_list = []
    for i in categories:
        # Average heart disease rate per category value, computed on the training data only
        category_means = fit_label.groupby(fit_features[i]).mean()
        features[i+"-TargetEncoded"] = features[i].map(category_means)
        feature_list.append(i+"-TargetEncoded")
    return feature_list

targetEncoded_feature_List = targetEncoder(X_train, categorical_features, X_train, y_train) # New columns are added to X_train in place
targetEncoder(X_test, categorical_features, X_train, y_train)  # X_test is encoded with the training rates, never with y_test
X_train[targetEncoded_feature_List] # Showing the new columns in X_train

Unnamed: 0,Sex-TargetEncoded,ChestPainType-TargetEncoded,RestingECG-TargetEncoded,ExerciseAngina-TargetEncoded,ST_Slope-TargetEncoded
795,0.632042,0.364198,0.509009,0.327146,0.770833
25,0.632042,0.364198,0.509009,0.327146,0.175896
84,0.632042,0.770574,0.509009,0.858086,0.817942
10,0.253012,0.364198,0.509009,0.327146,0.175896
344,0.632042,0.770574,0.509009,0.327146,0.817942
...,...,...,...,...,...
106,0.253012,0.770574,0.645390,0.327146,0.175896
270,0.632042,0.770574,0.509009,0.327146,0.175896
860,0.632042,0.770574,0.509009,0.858086,0.175896
435,0.632042,0.770574,0.645390,0.858086,0.175896


In [140]:
# Feature Preprocessing
# 1. Standard Scaler (scale numeric values to have zero mean and unit variance). For example X_train['Age-Scaled'] = (X_train['Age'] - mean) / standard_deviation
def standardScaler(features, categories, fit_features=None):
    # Mean and standard deviation are computed on fit_features (the training data)
    if fit_features is None:
        fit_features = features
    features_list = []
    for i in categories:
        features[i+"-Scaled"] = (features[i] - fit_features[i].mean()) / fit_features[i].std()
        features_list.append(i+"-Scaled")
    return features_list

standardScaler_Feature_List = standardScaler(X_train, numeric_features) # New columns are added to X_train in place
standardScaler(X_test, numeric_features, fit_features=X_train) # X_test is scaled with the training mean and std
X_train[standardScaler_Feature_List] # Showing the new columns in X_train

Unnamed: 0,Age-Scaled,RestingBP-Scaled,Cholesterol-Scaled,FastingBS-Scaled,MaxHR-Scaled,Oldpeak-Scaled
795,-1.244219,-0.708502,0.372549,1.841354,2.282796,-0.096995
25,-1.884951,-0.166172,0.086087,-0.542339,1.651116,-0.835717
84,0.250822,0.918489,0.123050,1.841354,-0.441327,0.087685
10,-1.778162,-0.166172,0.104569,-0.542339,0.229834,-0.835717
344,-0.283121,-0.708502,-1.845220,1.841354,-1.270407,-0.835717
...,...,...,...,...,...,...
106,-0.603487,-0.708502,0.501919,-0.542339,-1.033527,-0.835717
270,-0.923853,-0.708502,0.233938,-0.542339,0.150874,-0.835717
860,0.677977,-0.166172,0.492678,-0.542339,0.308794,0.457046
435,0.677977,1.026955,-1.845220,-0.542339,-0.717687,-0.835717


In [141]:
# Feature Preprocessing
# 2. MinMax Normalization (subtract min and divide by max - min). For example X_train['Age-MinMax'] = (X_train['Age'] - min) / (max - min)
def minMax(features, categories, fit_features=None):
    # Min and max are computed on fit_features (the training data)
    if fit_features is None:
        fit_features = features
    features_list = []
    for i in categories:
        low, high = fit_features[i].min(), fit_features[i].max()
        features[i+"-MinMax"] = (features[i] - low) / (high - low)
        features_list.append(i+"-MinMax")
    return features_list

minMax_feature_list = minMax(X_train, numeric_features) # New columns are added to X_train in place
minMax(X_test, numeric_features, fit_features=X_train) # X_test is scaled with the training min and max
X_train[minMax_feature_list] # Showing the new columns in X_train


Unnamed: 0,Age-MinMax,RestingBP-MinMax,Cholesterol-MinMax,FastingBS-MinMax,MaxHR-MinMax,Oldpeak-MinMax
795,0.270833,0.60,0.398010,1.0,0.943662,0.386364
25,0.145833,0.65,0.346600,0.0,0.830986,0.295455
84,0.562500,0.75,0.353234,1.0,0.457746,0.409091
10,0.166667,0.65,0.349917,0.0,0.577465,0.295455
344,0.458333,0.60,0.000000,1.0,0.309859,0.295455
...,...,...,...,...,...,...
106,0.395833,0.60,0.421227,0.0,0.352113,0.295455
270,0.333333,0.60,0.373134,0.0,0.563380,0.295455
860,0.645833,0.65,0.419569,0.0,0.591549,0.454545
435,0.645833,0.76,0.000000,0.0,0.408451,0.295455


In [142]:
# Feature Engineering --> 1st custom feature
# A binary feature representing High Heart Rate BinaryHighMaxHR
def custom_BinaryFeature(features, old_feature, new_feature):
    feature_mean = features[0][old_feature].mean()
    features[0][new_feature] = (features[0][old_feature] >= feature_mean).apply(int)
    features[1][new_feature] = (features[1][old_feature] >= feature_mean).apply(int)
    return new_feature

custom_features.append(custom_BinaryFeature((X_train, X_test), "MaxHR", "BinaryHighMaxHR")) 
X_train[["MaxHR", "BinaryHighMaxHR"]].head(10)  # Showing the new fitted columns in X_train

Unnamed: 0,MaxHR,BinaryHighMaxHR
795,194,1
25,178,1
84,125,0
10,142,1
344,104,0
254,96,0
398,122,0
244,103,0
621,142,1
118,185,1


In [143]:
# Feature Engineering --> 2nd custom feature 
# A bucketized categorical feature of Oldpeak called bucketized_Oldpeak
def bucketized_feature(features, feature, intervals):
    for f in features:
        f["Bucket-Low" + feature] = (f[feature] <= intervals[0]).apply(int)
        f["Bucket-Mid" + feature] = ((f[feature] <  intervals[1]) & (f[feature] >  intervals[0])).apply(int)
        f["Bucket-High" + feature] = (f[feature] >= intervals[1]).apply(int)
    return ["Bucket-Low" + feature, "Bucket-Mid" + feature, "Bucket-High" + feature]


print("Max", X_train["Oldpeak"].max(), "\tMean:", X_train["Oldpeak"].mean(), "\tMin", X_train["Oldpeak"].min(), "\tStd", X_train["Oldpeak"].std())
bucketized_Oldpeak_list = bucketized_feature((X_train, X_test), "Oldpeak", (0, 2))
for i in bucketized_Oldpeak_list: custom_features.append(i)
X_train[bucketized_Oldpeak_list + ["Oldpeak"]].head(10)

Max 6.2 	Mean: 0.9050408719346048 	Min -2.6 	Std 1.0829519487461572


Unnamed: 0,Bucket-LowOldpeak,Bucket-MidOldpeak,Bucket-HighOldpeak,Oldpeak
795,0,1,0,0.8
25,1,0,0,0.0
84,0,1,0,1.0
10,1,0,0,0.0
344,1,0,0,0.0
254,0,0,1,2.0
398,0,1,0,1.0
244,0,1,0,1.0
621,0,1,0,0.6
118,1,0,0,0.0


In [144]:
# Feature Engineering --> 3rd custom feature
# A bucketized categorical feature of MaxHR called bucketized_MaxHR
print("Max", X_train["MaxHR"].max(), "\tMean:", X_train["MaxHR"].mean(), "\tMin", X_train["MaxHR"].min(), "\tStd", X_train["MaxHR"].std())
bucketized_MaxHR_list = bucketized_feature((X_train, X_test), "MaxHR", (110, 160))
for i in bucketized_MaxHR_list: custom_features.append(i)
X_train[bucketized_MaxHR_list + ["MaxHR"]].head(10)


Max 202 	Mean: 136.17847411444143 	Min 60 	Std 25.329253934908074


Unnamed: 0,Bucket-LowMaxHR,Bucket-MidMaxHR,Bucket-HighMaxHR,MaxHR
795,0,0,1,194
25,0,0,1,178
84,0,1,0,125
10,0,1,0,142
344,1,0,0,104
254,1,0,0,96
398,0,1,0,122
244,1,0,0,103
621,0,1,0,142
118,0,0,1,185


In [145]:
# Feature Engineering --> 4th custom feature
# A feature cross of a bucketized version of OldPeak and MaxHR for example HighOldpeak_X_LowMaxHR
def cross_feature(features, feature_one, feature_two, new_feature):
    # The cross is 1 only when both bucket indicators are 1
    for f in features:
        f[new_feature] = ((f[feature_one] == 1) & (f[feature_two] == 1)).astype(int)
    return new_feature

custom_features.append(cross_feature((X_train, X_test), "Bucket-HighOldpeak", "Bucket-LowMaxHR", "HighOldpeak_X_LowMaxHR"))
custom_features.append(cross_feature((X_train, X_test), "Bucket-MidOldpeak", "Bucket-MidMaxHR", "midOldpeak_X_MidMaxHR"))
custom_features.append(cross_feature((X_train, X_test), "Bucket-LowOldpeak", "Bucket-HighMaxHR", "lowOldpeak_X_HighMaxHR"))
X_train[["Oldpeak", "MaxHR"] + custom_features[-4:-1]].head(10)

Unnamed: 0,Oldpeak,MaxHR,Bucket-HighMaxHR,HighOldpeak_X_LowMaxHR,midOldpeak_X_MidMaxHR
795,0.8,194,1,0,0
25,0.0,178,1,0,0
84,1.0,125,0,0,1
10,0.0,142,0,0,0
344,0.0,104,0,0,0
254,2.0,96,0,1,0
398,1.0,122,0,0,1
244,1.0,103,0,0,0
621,0.6,142,0,0,1
118,0.0,185,1,0,0


In [146]:
# Feature Engineering --> 5th custom feature
# A bucketized categorical feature of Age called bucketized_Age
print("Max", X_train["Age"].max(), "\tMean:", X_train["Age"].mean(), "\tMin", X_train["Age"].min(), "\tStd", X_train["Age"].std())
bucketized_Age_list = bucketized_feature((X_train, X_test), "Age", (45, 60))
for i in bucketized_Age_list: custom_features.append(i)
X_train[["Age"] + bucketized_Age_list].head(10)

Max 77 	Mean: 53.65122615803815 	Min 29 	Std 9.364289800106517


Unnamed: 0,Age,Bucket-LowAge,Bucket-MidAge,Bucket-HighAge
795,42,1,0,0
25,36,1,0,0
84,56,0,1,0
10,37,1,0,0
344,51,0,1,0
254,55,0,1,0
398,52,0,1,0
244,48,0,1,0
621,56,0,1,0
118,35,1,0,0


In [147]:
# Feature Experiments
# Experiment 1 - All features with no scaling on numeric features and OneHot for categorical. No custom features.
exp1_X_train = X_train[numeric_features + OneHot_Encoding_Feature_List]
exp1_X_test = X_test[numeric_features + OneHot_Encoding_Feature_List]

# Experiment 2 - All features using StandardScaler for numeric and OneHot for categorical. No custom features.
exp2_X_train = X_train[OneHot_Encoding_Feature_List + standardScaler_Feature_List]
exp2_X_test = X_test[OneHot_Encoding_Feature_List + standardScaler_Feature_List]

# Experiment 3 - All features using StandardScaler for numeric and TargetEncoding for categorical. No custom features.
exp3_X_train = X_train[standardScaler_Feature_List +  targetEncoded_feature_List]
exp3_X_test = X_test[standardScaler_Feature_List + targetEncoded_feature_List]

# Experiment 4 - All features using MinMax-Normalization for numeric and OneHot for categorical. No custom features.
exp4_X_train = X_train[minMax_feature_list +  OneHot_Encoding_Feature_List]
exp4_X_test = X_test[minMax_feature_list + OneHot_Encoding_Feature_List]

# Experiment 5 - The kitchen sink: Include everything to try to get the best performance possible.
exp5_X_train = X_train[numeric_features + standardScaler_Feature_List +  minMax_feature_list + OneHot_Encoding_Feature_List + targetEncoded_feature_List]
exp5_X_test = X_test[numeric_features + standardScaler_Feature_List +  minMax_feature_list + OneHot_Encoding_Feature_List + targetEncoded_feature_List]

# Experiment 6 - Only custom features. How good can you get with your own custom feature set?
exp6_X_train = X_train[custom_features]
exp6_X_test = X_test[custom_features]

# Experiment 7 - Only categorical features using one of the encodings (OneHot).
exp7_X_train = X_train[OneHot_Encoding_Feature_List]
exp7_X_test = X_test[OneHot_Encoding_Feature_List]

# Experiment 8 - Only numerical features using one of the encodings (StandardScaler).
exp8_X_train = X_train[standardScaler_Feature_List]
exp8_X_test = X_test[standardScaler_Feature_List]

# Experiment 9 - Only categorical features using the other encoding (TargetEncoding).
exp9_X_train = X_train[targetEncoded_feature_List]
exp9_X_test = X_test[targetEncoded_feature_List]

# Experiment 10 - Only numerical features using the other encoding (MinMax).
exp10_X_train = X_train[minMax_feature_list]
exp10_X_test = X_test[minMax_feature_list]

experiments_X_train = [exp1_X_train, exp2_X_train, exp3_X_train, exp4_X_train, exp5_X_train, exp6_X_train, exp7_X_train, exp8_X_train, exp9_X_train, exp10_X_train]
experiments_X_test = [exp1_X_test, exp2_X_test, exp3_X_test, exp4_X_test, exp5_X_test, exp6_X_test, exp7_X_test, exp8_X_test, exp9_X_test, exp10_X_test]

In [148]:
# # Code to implement linear regression prediction and mean squared error.
# def linear_regression_predict(x: np.array, w: np.array) -> float:
#     return w[0] + x.dot(w[1:])

# def mean_squared_error(y_true, y_predicted) -> float:
#   n = len(y_true)
#   error = 0
#   for truth, prediction in zip(y_true, y_predicted):
#     absolute_error = truth - prediction
#     error += absolute_error * absolute_error
#   return error/n

# # Solution to 1.4.1
# def gradient_of_mean_squared_error(X, y, w):
#     """Calculate the gradient of MSE loss with respect to weights w
    
#     Args:
#       X: N x d matrix of X values (in this case d=1, we only have 1 feature)
#       y: N x 1 vector of targets
#       w: weight vector of length d+1 (in this case d=1)
      
#     Return:
#       d x 1 gradient vector - consisting of gradient of MSE loss. gradient[j] is
#           the partial derivative of the MSE with respect to weight j averaged
#           over all the samples
#     """
#     n = X.shape[0]
#     k = X.shape[1]    
#     predicted_y = linear_regression_predict(X, w)
#     # Implement this:
#     gradient = [0, 0]
#     for i in range(n):
#         gradient[0] += predicted_y[i] - y[i]
#         gradient[1] += (predicted_y[i] - y[i]) * X[i]
#     gradient[0] = (gradient[0] * 2 )/ n
#     gradient[1] = (gradient[1] * 2 )/ n
#     assert(len(gradient) == 2)
#     return gradient
# mean_gradient = gradient_of_mean_squared_error(X_train, y_train, [0, 0])
# print(f"Solution to 1.4.1: {mean_gradient}")

# # Solution to 1.4.2
# def batch_gradient_descent_epoch(X, y, w, alpha):
#     """Returns new weights after one step of batch gradient descent
#     across the entire dataset.
    
#     Args:
#       X: N x d matrix (in this case d=1, we only have 1 feature)
#       y: N x 1 vector
#       w: weight vector of length d+1 (in this case d=1)
#       alpha: Floating point learning rate.
      
#     Return:
#       updated weights as a length-2 vector after 1 step of gradient descent.
#     """
#     n = X.shape[0]
#     d = X.shape[1]
    
#     # Evaluate the error gradient across the entire dataset. NOTE: there are various
#     # versions of gradient descent that work differently here. In this case we are running
#     # what is called 'Batch' gradient descent that updates based off the entire training
#     # dataset on each epoch. Other techniques are mini-batch which break the dataset into
#     # small chunks and stochastic gradient descent which uses only one sample at a time.
#     error_gradient = gradient_of_mean_squared_error(X, y, w)
#     assert(len(error_gradient) == d+1)
#     return [(w[0] - (alpha * error_gradient[0])), (w[1] - (alpha * error_gradient[1]))]
# update = batch_gradient_descent_epoch(X_train, y_train, [0, 0], 1)
# print(f"Solution to 1.4.2: Single weight update: {update}")

In [149]:
# Models + extra
def results(model, features_train, labels_train, features_test, labels_test):
    # Evaluate on the feature sets passed in, not on a hard-coded experiment
    train_predictions = model.predict(features_train)
    test_predictions = model.predict(features_test)
    training_accuracy = accuracy_score(labels_train, train_predictions)
    testing_accuracy = accuracy_score(labels_test, test_predictions)
    print(f"Training Accuracy: {training_accuracy}")
    print(f"Testing Accuracy: {testing_accuracy}")
    # Note: predict() returns hard 0/1 labels, so each curve below has a single
    # threshold; passing predict_proba(...)[:, 1] would trace the full PR curve.
    training_precisions, training_recalls, training_thresholds = precision_recall_curve(labels_train, train_predictions)
    training_pr_auc = auc(training_recalls, training_precisions)
    print(f"Training PR_AUC: {training_pr_auc}")
    testing_precisions, testing_recalls, testing_thresholds = precision_recall_curve(labels_test, test_predictions)
    testing_pr_auc = auc(testing_recalls, testing_precisions)
    print(f"Testing PR_AUC: {testing_pr_auc}")
    print(f"Training Precision: {training_precisions}")
    print(f"Testing Precision: {testing_precisions}")
    print(f"Training Recall: {training_recalls}")
    print(f"Testing Recall: {testing_recalls}\n")

def LogisticReg():
    # Quick sanity check on the Experiment 1 feature set only
    logr = LogisticRegression(max_iter=9999).fit(exp1_X_train, y_train)
    print(f"Training Accuracy: {accuracy_score(y_train, logr.predict(exp1_X_train))}")
    print(f"Testing Accuracy: {accuracy_score(y_test, logr.predict(exp1_X_test))}")

def experiment(features_train, labels_train, features_test, labels_test):
    # Each model is fit on the feature set of the current experiment
    print("Logistic Regression")
    logr = LogisticRegression(max_iter=9999).fit(features_train, labels_train)
    results(logr, features_train, labels_train, features_test, labels_test)

    print("Decision Tree Classifier")
    dtree = DecisionTreeClassifier().fit(features_train, labels_train)
    results(dtree, features_train, labels_train, features_test, labels_test)

    print("Neural Networks - Multi Perceptron Classifier")
    mlp = MLPClassifier(solver="adam", max_iter=9999).fit(features_train, labels_train)
    results(mlp, features_train, labels_train, features_test, labels_test)

    print("Gradient Boost Classifier")
    gboost = GradientBoostingClassifier().fit(features_train, labels_train)
    results(gboost, features_train, labels_train, features_test, labels_test)

In [150]:
for i, j, k in zip(experiments_X_train, experiments_X_test, range(1, len(experiments_X_train)+1)):
    print(f"Experiment {k}:")
    experiment(i, y_train, j, y_test)

Experiment 1:
Logistic Regression
Training Accuracy: 0.8692098092643051
Testing Accuracy: 0.8532608695652174
Training PR_AUC: 0.9111987073889954
Testing PR_AUC: 0.9167563998374644
Training Precision: [0.54632153 0.87104623 1.        ]
Testing Precision: [0.58152174 0.9        1.        ]
Training Recall: [1.         0.89276808 0.        ]
Testing Recall: [1.        0.8411215 0.       ]

Decision Tree Classifier
Training Accuracy: 1.0
Testing Accuracy: 0.7989130434782609
Training PR_AUC: 1.0
Testing PR_AUC: 0.8853595252607342
Training Precision: [1. 1.]
Testing Precision: [0.58152174 0.86458333 1.        ]
Training Recall: [1. 0.]
Testing Recall: [1.         0.77570093 0.        ]

Neural Networks - Multi Perceptron Classifier
Training Accuracy: 0.8773841961852861
Testing Accuracy: 0.8695652173913043
Training PR_AUC: 0.9139824248795466
Testing PR_AUC: 0.9221974710241675
Training Precision: [0.54632153 0.86416862 1.        ]
Testing Precision: [0.58152174 0.8952381  1.        ]
Training 

In [151]:
import wandb
wandb.init(project="hw03")
wandb.config = {
  "learning_rate": 0.001,
  "epochs": 100,
  "batch_size": 128
}
wandb.log(wandb.config)




Run history:
batch_size,▁
epochs,▁
learning_rate,▁

Run summary:
batch_size,128.0
epochs,100.0
learning_rate,0.001
