# Homework 2: Ensemble Learning

**Task 1 (30 points)**: Implement a Decision Tree Classifier for your classification problem. You may use a built-in package to implement your classifier. Try modifying one or more of the input parameters and describe what changes you notice in your results. Clearly describe how these factors are affecting your output.


Let's start by loading and cleaning our data as we did in the previous homework.

In [235]:
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

#Load our data
data = pd.read_csv('drive/MyDrive/high_diamond_ranked_10min.csv')

#Clean our data
cleaned_data = data.copy()
cleaned_data = cleaned_data[cleaned_data["blueWardsPlaced"] < 75]
cleaned_data = cleaned_data[cleaned_data["redWardsPlaced"] < 75]
cleaned_data = cleaned_data.drop(["blueTotalGold", "redTotalGold", "redGoldDiff", "blueTotalExperience",
                   "redTotalExperience", "redExperienceDiff", "blueAvgLevel", "redAvgLevel",
                   "gameId", "blueFirstBlood", "redFirstBlood"], axis=1)

#Separate data into the target vector (Y) and the feature matrix (X)
Y = cleaned_data.pop("blueWins").values
X = cleaned_data.to_numpy()

#Holdout method replaced with k-fold cross validation
#X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size = 0.9, test_size = 0.1, random_state = 0)

#Initialize our decision tree classifier object
model = DecisionTreeClassifier()

#Train our decision tree
model = model.fit(X, Y)


Here, we've used sklearn's implementation of a Decision Tree Classifier with all of its default parameter values: max_depth = None (nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples), min_samples_split = 2, max_features = None (all n_features are considered at each split), and min_impurity_split = 0.
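The defaults listed above can be checked directly rather than taken from memory. The snippet below is a minimal sketch that reads them from a freshly constructed estimator with `get_params()`. It leaves out min_impurity_split on purpose, since that parameter was deprecated in scikit-learn 0.19 and removed in 1.0, so it may not exist in the installed version.

```python
from sklearn.tree import DecisionTreeClassifier

# Read the default hyperparameters from an unfitted tree.
params = DecisionTreeClassifier().get_params()

# min_impurity_split is skipped: it is absent from scikit-learn >= 1.0.
for name in ("max_depth", "min_samples_split", "max_features"):
    print(f"{name} = {params[name]}")
```

A value of `None` for max_depth means "no limit", and `None` for max_features means "use every feature".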

In [236]:
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold

def cross_val_accuracy(model, X, Y):
  """
  Prints out the accuracy of a model on a given X, Y set 
  by utilizing Repeated KFold Cross Validation
  """
  
  cv = RepeatedKFold(n_splits = 10, n_repeats=3, random_state = 0)
  cv_scores = cross_val_score(model, X, Y, scoring = 'accuracy', cv=cv, n_jobs=-1, error_score = 'raise')
  print('Accuracy: %.3f (%.3f)' % (np.mean(cv_scores), np.std(cv_scores)))

print("Default sklearn DecisionTreeClassifier")
cross_val_accuracy(model, X, Y)


Default sklearn DecisionTreeClassifier
Accuracy: 0.633 (0.015)


With repeated k-fold cross-validation (10 folds, 3 repeats), we can see that the default classification tree has a rather low mean accuracy of 0.633. Next, let's test how different input parameters affect the model and its accuracy.
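One detail of this evaluation is worth making explicit: `cross_val_score` clones the estimator and refits the clone on every fold, so the score does not depend on any earlier call to `fit`. The sketch below illustrates this on synthetic data. `X_demo` and `y_demo` are stand-ins invented for the example, not the League of Legends dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real feature matrix and labels (assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

tree = DecisionTreeClassifier(random_state=0)  # deliberately NOT fitted
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=0)
scores = cross_val_score(tree, X_demo, y_demo, scoring="accuracy", cv=cv)

print(len(scores))             # one score per fold per repeat: 10 * 3 = 30
print(hasattr(tree, "tree_"))  # False: the original object is still unfitted
```

This means the `.fit(X, Y)` calls that precede `cross_val_accuracy` in this notebook are harmless but not required for the reported accuracies.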

In [237]:
# Testing with different max_depths values

model_low_max_depth = DecisionTreeClassifier(max_depth = 5).fit(X,Y)
print("Model with low max_depth input")
cross_val_accuracy(model_low_max_depth, X, Y)

model_high_max_depth = DecisionTreeClassifier(max_depth = 30).fit(X,Y)
print("Model with high max_depth input")
cross_val_accuracy(model_high_max_depth, X, Y)

model_super_high_max_depth = DecisionTreeClassifier(max_depth = 300).fit(X,Y)
print("Model with super high max_depth input")
cross_val_accuracy(model_super_high_max_depth, X, Y)

Model with low max_depth input
Accuracy: 0.724 (0.013)
Model with high max_depth input
Accuracy: 0.632 (0.015)
Model with super high max_depth input
Accuracy: 0.635 (0.016)


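A low max_depth (5) produces noticeably higher accuracy (0.724), while larger values (30 and 300) score about the same as the default tree (roughly 0.63). This suggests that the deeper trees overfit the training data, and that beyond some depth the limit no longer constrains the tree at all.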
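A quick way to check whether a large max_depth actually constrains anything is to compare the depth a fitted tree reaches with and without the cap. The following is a sketch on synthetic data (the `X_demo`/`y_demo` names and the noise level are assumptions made for illustration), using the `get_depth()` method of a fitted tree.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data as a stand-in for the real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, flip_y=0.2, random_state=0
)

unlimited = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)
capped_300 = DecisionTreeClassifier(max_depth=300, random_state=0).fit(X_demo, y_demo)
capped_5 = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_demo, y_demo)

print("no limit      :", unlimited.get_depth())
print("max_depth=300 :", capped_300.get_depth())
print("max_depth=5   :", capped_5.get_depth())
```

If the uncapped tree stops well short of the cap, then the capped and uncapped trees are the same model, which would explain near-identical accuracies for very large max_depth values.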

In [238]:
# Testing with different min_samples_split values

model_low_min_samples_split = DecisionTreeClassifier(min_samples_split = 2).fit(X,Y)
print("Model with low min_samples_split input")
cross_val_accuracy(model_low_min_samples_split, X, Y)

model_high_min_samples_split = DecisionTreeClassifier(min_samples_split = 30).fit(X,Y)
print("Model with high min_samples_split input")
cross_val_accuracy(model_high_min_samples_split, X, Y)

model_super_high_min_samples_split = DecisionTreeClassifier(min_samples_split = 300).fit(X,Y)
print("Model with super high min_samples_split input")
cross_val_accuracy(model_super_high_min_samples_split, X, Y)

Model with low min_samples_split input
Accuracy: 0.632 (0.018)
Model with high min_samples_split input
Accuracy: 0.659 (0.015)
Model with super high min_samples_split input
Accuracy: 0.717 (0.015)


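Low values of min_samples_split produce lower accuracy (0.632 at the default of 2), while larger values produce higher accuracy (0.717 at 300). Requiring more samples before a node may split stops the tree from carving out tiny, noisy regions, so it acts as a regularizer in much the same way as a smaller max_depth.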
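The regularizing effect of min_samples_split can be seen in the size of the fitted tree. The sketch below counts leaves with `get_n_leaves()` for the same three settings used above, again on synthetic stand-in data (`X_demo`, `y_demo` are assumptions, not the real dataset).

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data as a stand-in for the real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, flip_y=0.2, random_state=0
)

leaves = {}
for mss in (2, 30, 300):
    tree = DecisionTreeClassifier(min_samples_split=mss, random_state=0)
    tree.fit(X_demo, y_demo)
    leaves[mss] = tree.get_n_leaves()
    print(f"min_samples_split={mss:>3}: {leaves[mss]} leaves")
```

Fewer leaves means a coarser partition of the feature space, which tends to trade a little training accuracy for better behavior on unseen data.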

In [239]:
# Testing with different max_features values

model_low_max_features = DecisionTreeClassifier(max_features = 5).fit(X,Y)
print("Model with low max_features input")
cross_val_accuracy(model_low_max_features, X, Y)

model_high_max_features = DecisionTreeClassifier(max_features = 15).fit(X,Y)
print("Model with high max_features input")
cross_val_accuracy(model_high_max_features, X, Y)

model_super_high_max_features = DecisionTreeClassifier(max_features = len(X[0])).fit(X,Y)
print("Model with super high max_features input")
cross_val_accuracy(model_super_high_max_features, X, Y)

Model with low max_features input
Accuracy: 0.634 (0.017)
Model with high max_features input
Accuracy: 0.633 (0.014)
Model with super high max_features input
Accuracy: 0.631 (0.012)


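The input parameter max_features appears to have no real effect on the accuracy of our model: all three settings score between 0.631 and 0.634, well within one standard deviation of each other.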

In [240]:
# Testing with different min_impurity_split values
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning) # Ignore deprecation warning on min_impurity_split

model_low_min_impurity_split = DecisionTreeClassifier(min_impurity_split = 0.0).fit(X,Y)
print("Model with low min_impurity_split input")
cross_val_accuracy(model_low_min_impurity_split, X, Y)

model_high_min_impurity_split = DecisionTreeClassifier(min_impurity_split = 0.4).fit(X,Y)
print("Model with high min_impurity_split input")
cross_val_accuracy(model_high_min_impurity_split, X, Y)

model_super_high_min_impurity_split = DecisionTreeClassifier(min_impurity_split = 0.9).fit(X,Y)
print("Model with super high min_impurity_split input")
cross_val_accuracy(model_super_high_min_impurity_split, X, Y)

Model with low min_impurity_split input
Accuracy: 0.633 (0.012)
Model with high min_impurity_split input
Accuracy: 0.716 (0.015)
Model with super high min_impurity_split input
Accuracy: 0.490 (0.008)


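A medium value of min_impurity_split (0.4) produces a more accurate model (0.716) than either a very low value (0.0, accuracy 0.633) or a very high one (0.9, accuracy 0.490). The collapse at 0.9 is expected if the tree uses the default Gini criterion: the impurity of a two-class node never exceeds 0.5, so with a threshold of 0.9 the root is never split and the tree predicts a single class for every sample.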
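To see why thresholds of 0.4 and 0.9 behave so differently, it helps to compute Gini impurity by hand. The helper below is a small illustrative function (the name `gini` is hypothetical, not part of sklearn) that assumes the default Gini criterion, under which min_impurity_split only allows a node to split when its impurity is above the threshold.

```python
def gini(class_counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

balanced = gini([50, 50])   # perfectly mixed two-class node
skewed = gini([80, 20])     # node dominated by one class
pure = gini([100, 0])       # node containing a single class

for label, value in (("50/50", balanced), ("80/20", skewed), ("100/0", pure)):
    print(f"{label}: {value:.2f}")
```

A perfectly mixed binary node has impurity 0.5, which is the maximum for two classes, so a threshold of 0.9 can never be exceeded. With a threshold of 0.4, a node split 80/20 (impurity about 0.32) already counts as pure enough and becomes a leaf.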

In [254]:
# Testing different combinations based off the best values from the above tests

modelOne = DecisionTreeClassifier(max_depth = 5, min_samples_split = 300, min_impurity_split = 0.4).fit(X,Y)
print("Model with supposed best input parameters")
cross_val_accuracy(modelOne, X, Y)

modelTwo = DecisionTreeClassifier(max_depth = 500, min_samples_split = 75, min_impurity_split = 0.4).fit(X,Y)
print("Model that is less shallow but maintains same impurity_split")
cross_val_accuracy(modelTwo, X, Y)

modelThree = DecisionTreeClassifier(max_depth = 500, min_samples_split = 20, min_impurity_split = 0.1).fit(X,Y)
print("Model with more realistic inputs to combat overfitting at the cost of accuracy")
cross_val_accuracy(modelThree, X, Y)


print("\nIdeal model: modelTwo")
cross_val_accuracy(modelTwo, X, Y)

Model with supposed best input parameters
Accuracy: 0.729 (0.015)
Model that is less shallow but maintains same impurity_split
Accuracy: 0.717 (0.016)
Model with more realistic inputs to combat overfitting at the cost of accuracy
Accuracy: 0.651 (0.018)

Ideal model: modelTwo
Accuracy: 0.717 (0.016)


As the tests above show, max_features had essentially no effect on the accuracy of our model, while the other three parameters, max_depth, min_samples_split, and min_impurity_split, each had their own effects and tradeoffs. Having analyzed each parameter individually and in combination, I consider **modelTwo** the best model: it provides relatively high accuracy (0.717) while still having realistic parameters that prevent the tree from being too shallow, so it should be capable of generalizing well outside of $D_N$.
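The manual search over parameter combinations above could also be automated. The following is a minimal sketch using sklearn's `GridSearchCV` on synthetic stand-in data. The grid values mirror some of those tried above, `X_demo`/`y_demo` are assumptions, and min_impurity_split is left out because it is not available in scikit-learn 1.0 and later.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data as a stand-in for the real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, flip_y=0.2, random_state=0
)

# Every combination of these values is cross-validated (2 * 3 = 6 candidates).
param_grid = {"max_depth": [5, 30], "min_samples_split": [2, 30, 300]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, scoring="accuracy", cv=5
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```

A grid search picks whichever combination scores highest, so it would not by itself reproduce the judgment call made above of preferring a slightly less accurate model with more realistic parameters.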

**Task 2 (30 points)**: From the Bagging and Boosting methods pick any one algorithm from each category. Implement both the algorithms using the same data. Use k-fold cross validation to find the effectiveness of both the models. Comment on the difference/similarity of the results.


In [255]:
#Adaptive Boosting with DecisionTreeClassifiers
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# We let modelOne be our estimator since boost methods work best with shallow trees
model_with_adaboost = AdaBoostClassifier(modelOne, random_state = 0, algorithm='SAMME')
model_with_adaboost.fit(X, Y)
print("Adaboost")
cross_val_accuracy(model_with_adaboost, X, Y)

Adaboost
Accuracy: 0.724 (0.012)


In [243]:
# Random Forest
# Initialize with same input parameters as modelTwo
model_with_RF = RandomForestClassifier(max_depth = 500, min_samples_split = 75, min_impurity_split = 0.4)
model_with_RF.fit(X,Y)
print("Random Forest")
cross_val_accuracy(model_with_RF, X, Y)

Random Forest
Accuracy: 0.729 (0.014)


The algorithms I chose from Bagging and Boosting were Random Forest and AdaBoost respectively. Under repeated k-fold cross validation, the Random Forest improved on modelTwo's accuracy (0.729 vs. 0.717) and slightly lowered the spread of its fold scores (std 0.014 vs. 0.016), while AdaBoost also lowered the spread (std 0.012 vs. 0.015) but ended up with a slightly lower accuracy than modelOne (0.724 vs. 0.729).

**Analysis of Adaboost**:

Boosting ensembles are built from weak learners trained in sequence, each one focusing on the examples its predecessors got wrong, with the aim of reducing bias (and ideally variance) so that the weak learners combine into a strong final model. The AdaBoost model, built on modelOne (an extremely shallow tree, i.e. max_depth = 5 and min_samples_split = 300), produced a marginally lower accuracy and a lower standard deviation across folds than modelOne itself. I would still consider the ensemble a success: since accuracy barely changed, modelOne's bias was probably already near its optimal trade-off value, and the ensemble's fold scores are more stable.

**Analysis of Random Forest**:

Bagging ensembles train many learners on bootstrap samples of the data and combine them by averaging their predictions (or taking a majority vote), which mainly reduces variance. The random forest, whose trees use the same parameters as modelTwo (a somewhat shallow tree, but not to the same degree as modelOne), produced a small increase in accuracy along with a lower standard deviation across folds, so this ensemble also appears to have been a success.
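The comparison between a single tree, a boosted ensemble, and a bagged ensemble can be reproduced in a few lines. The sketch below uses synthetic stand-in data and small ensembles to keep it fast. The dataset, the `n_estimators` values, and the `X_demo`/`y_demo` names are all assumptions for illustration and will not match the numbers reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data as a stand-in for the real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, flip_y=0.2, random_state=0
)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "single shallow tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    # Boosting: shallow trees fitted sequentially on reweighted samples.
    "AdaBoost": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=5), n_estimators=25, random_state=0
    ),
    # Bagging: trees fitted independently on bootstrap samples.
    "Random Forest": RandomForestClassifier(n_estimators=25, random_state=0),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, scoring="accuracy", cv=cv)
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} ({scores.std():.3f})")
```

Running all three models through the same `cv` object means each one sees identical folds, which makes the mean and standard deviation directly comparable.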

**Task 3 (40 points)**: Compare the effectiveness of the three models implemented above. Clearly describe the metric you are using for comparison. Describe (with examples) why this metric is suited to the problem at hand. How would the choice of a different metric impact your results? Can you demonstrate that?

In [267]:
# Import metric score analysis functions
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

# sklearn metrics take (y_true, y_pred) in that order; swapping them would turn precision into recall
y_pred = modelTwo.fit(X_train, y_train).predict(X_test)
cm_vanilla = confusion_matrix(y_test, y_pred)
print("modelTwo (Single Decision Tree Classifier):\n", cm_vanilla)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred):.3f}\n")

y_pred = model_with_adaboost.fit(X_train, y_train).predict(X_test)
cm_adaboost = confusion_matrix(y_test, y_pred)
print("Adaptive Boost Model:\n", cm_adaboost)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred):.3f}\n")

y_pred = model_with_RF.fit(X_train, y_train).predict(X_test)
cm_rf = confusion_matrix(y_test, y_pred)
print("Random Forest Model:\n", cm_rf)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred):.3f}")

modelTwo (Single Decision Tree Classifier):
 [[753 419]
 [245 916]]
Accuracy: 0.715
Precision: 0.686

Adaptive Boost Model:
 [[868 304]
 [331 830]]
Accuracy: 0.728
Precision: 0.732

Random Forest Model:
 [[852 320]
 [308 853]]
Accuracy: 0.731
Precision: 0.727


The metric that I will be using to compare the effectiveness of the three models is accuracy, since from our previous EDA we know that the classes in our data are nearly balanced (i.e. blueWins ≈ 0.49...), so accuracy is not distorted by a dominant class. For example, if 95% of games were blue wins, a model that always predicted a blue win would score 95% accuracy while learning nothing; with a roughly 50/50 split, that trivial model scores only about 50%. The data set also suits this kind of analysis because all of the players are of the same skill level, with that skill level itself being high. This reduces the amount of randomness that may occur during a game and allows a more deterministic approach to analyzing the game and the effect various features may have on the outcome (i.e. lower player skill disparity + game is played the way it was meant to be played). Of the three models, AdaBoost (0.728) and the random forest (0.731) clearly had better accuracy than our vanilla modelTwo (a single decision tree, 0.715), with the random forest narrowly outperforming AdaBoost.

Choosing another metric to measure our models' effectiveness could easily change our results and conclusions. Let us demonstrate this by measuring precision, the fraction of predicted blue wins that really were blue wins (TP / (TP + FP)). By precision, AdaBoost comes first (0.732), the random forest second (0.727), and modelTwo last (0.686), so the top two models swap places relative to accuracy. Recall (TP / (TP + FN)), which can be read off the same confusion matrices, gives yet another ranking: modelTwo recovers 916 of the 1161 actual blue wins (≈ 0.789), ahead of the random forest (853 of 1161, ≈ 0.735) and AdaBoost (830 of 1161, ≈ 0.715). Thus, in a world where recall is our metric, we would consider modelTwo to be the superior model in terms of effectiveness, which is a stark difference when compared to utilizing accuracy as a metric.
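When computing these metrics with sklearn, the argument order matters: the functions expect the true labels first and the predictions second. The sketch below uses a tiny made-up set of labels (not taken from the dataset) to show that swapping the arguments of `precision_score` silently returns recall instead.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Tiny hypothetical example: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]

# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 6
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
swapped = precision_score(y_pred, y_true)    # arguments reversed

print(f"precision={precision:.2f} recall={recall:.2f} swapped={swapped:.2f}")
# prints: precision=0.50 recall=0.75 swapped=0.75
```

Accuracy is symmetric in its two arguments, so it is unaffected by the order, which makes this kind of mix-up easy to miss when accuracy is the only metric being checked.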