# Performance analysis of a machine learning model
Author: Izael Manuel Rascón Durán A01562240

In this notebook we go through the performance analysis of a Random Forest classifier fitted to the well-known iris dataset.

Let's start by setting up our dataset.

In [1]:
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from mlxtend.evaluate import bias_variance_decomp
from sklearn.model_selection import GridSearchCV

## Separate the dataset
Because we do not want to test the model on the same data we trained it on, we split the dataset into three groups. The training set takes 60% of the data because we want most of it to be used for fitting, which gives the model more examples to learn from. We split the remaining 40% in half because we want to validate the model before the final test.

This validation step is needed because the hyperparameters we choose first are rarely optimal, and we want to score the model several times before the final test. Because the process is iterative, we can end up overfitting the model to the validation set, even when tuning by hand; that is why we keep the last subset apart and use it only at the end of the process.

Splitting the held-out data into a validation set and a test set can be avoided with cross-validation, which we discuss later.
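The workflow described above can be sketched as follows. This is a minimal illustration rather than the notebook's own code: the candidate values for `n_estimators` and the fixed `random_state` are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 60% training, then the remaining 40% split in half: validation and test
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.6, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=0.5, random_state=0
)

# Score several candidate settings on the validation set only
val_scores = {}
for n in (1, 10, 100):  # hypothetical candidates
    model = RandomForestClassifier(n_estimators=n, random_state=0)
    model.fit(X_train, y_train)
    val_scores[n] = accuracy_score(y_val, model.predict(X_val))

# Pick the best candidate, then touch the test set exactly once
best_n = max(val_scores, key=val_scores.get)
final_model = RandomForestClassifier(n_estimators=best_n, random_state=0)
final_model.fit(X_train, y_train)
test_score = accuracy_score(y_test, final_model.predict(X_test))

print(val_scores, best_n, test_score)
```

The test score is computed only after the choice is made, so it is not influenced by the tuning loop.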

In [2]:
iris = datasets.load_iris()

X = pd.DataFrame(iris['data'], columns = iris['feature_names'])
y = pd.DataFrame(iris['target'], columns = ['species'])

In [3]:
# split the dataset into training (60%), validation (20%) and test (20%) sets
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.6)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, train_size=0.5)
print(y_train.shape)
print(y_val.shape)
print(y_test.shape)

(90, 1)
(30, 1)
(30, 1)
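
Because `train_test_split` shuffles at random, the three subsets are not guaranteed to contain the classes in equal proportion, and the result changes on every run. A hedged variant, assuming we want reproducible and class-balanced splits, passes `random_state` and `stratify` (both are standard `train_test_split` parameters; the seed value 0 is arbitrary):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify keeps the class proportions of y in each subset
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0
)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, train_size=0.5, stratify=y_rest, random_state=0
)

# number of samples per class in each subset
train_counts = np.bincount(y_train)
val_counts = np.bincount(y_val)
test_counts = np.bincount(y_test)
print(train_counts, val_counts, test_counts)
```

With only 30 samples in the validation and test sets, a balanced split makes the accuracy figures less dependent on which classes happened to be drawn.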


## Setup the model

In [4]:
rfc = RandomForestClassifier(n_estimators = 1)

rfc.fit(X_train, y_train)

RandomForestClassifier(n_estimators=1)

## Performance analysis

In [5]:
y_hat = rfc.predict(X_val)
accuracy_score(y_val, y_hat)

0.9333333333333333

In [6]:
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
    rfc, X_train.values, y_train.values, X_val.values, y_val.values,
    loss='0-1_loss', random_seed=11
)

In [7]:
avg_bias, avg_var

(19.7, 0.050833333333333335)

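As we can see, the bias is far larger than the variance. High bias combined with low variance is the usual symptom of underfitting: the model is too simple, so we should add complexity in order to fit the data well. Note, however, that with the 0-1 loss the bias is a fraction of misclassified samples and should lie between 0 and 1, so a value of 19.7 is most likely an artifact of passing the labels as a 2-D column (`y_train.values` has shape (90, 1)) and should be read with caution. Although our accuracy score is high, we know we can still improve our model.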
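
A 0-1 loss bias is a proportion of test samples, so it cannot normally exceed 1. One plausible explanation for a figure near 20 is NumPy broadcasting: comparing a 1-D prediction vector with labels stored as a 2-D column compares every prediction against every label. The sketch below uses made-up labels and assumes the bias is computed as the number of mismatches divided by the number of test labels; it does not call `bias_variance_decomp` itself.

```python
import numpy as np

# Hypothetical labels and predictions, chosen only for illustration
y_true = np.array([0, 0, 1, 1, 2, 2])
main_pred = np.array([0, 0, 1, 2, 2, 2])  # one mistake out of six

# 1-D labels: element-wise comparison, the result is a fraction in [0, 1]
bias_1d = np.sum(main_pred != y_true) / y_true.size

# 2-D column labels, shape (6, 1), as returned by DataFrame.values:
# the comparison broadcasts to a (6, 6) matrix of all prediction/label pairs
y_col = y_true.reshape(-1, 1)
pairwise = main_pred != y_col
bias_2d = np.sum(pairwise) / y_col.size

print(bias_1d)         # 1/6, about 0.167
print(pairwise.shape)  # (6, 6)
print(bias_2d)         # 4.0, no longer a proportion
```

Flattening the labels first, for example with `y_train.values.ravel()`, avoids the broadcast.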

So let's try something different:
## Hyperparameters adjustment
A different way of dealing with underfitting and overfitting is Grid Search Cross Validation. This algorithm searches through a given set of hyperparameter values, tries every combination, and returns the best model. It evaluates each candidate with cross-validation: the training data is divided into subsets (folds), one fold is held out as the validation set while the others are used for training, and this is repeated until every fold has served as the validation set once. At the end, the scores of all folds are averaged to give the candidate's score.
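
The cross-validation loop described above can be written out by hand. This is a minimal sketch for a single hyperparameter setting, not what `GridSearchCV` runs internally line by line; the fold count, forest size, shuffling and seeds are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)

# 5 folds, each keeping the class proportions of y
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, val_idx in skf.split(X, y):
    # train on four folds, validate on the held-out fold
    model = RandomForestClassifier(n_estimators=10, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))

# the candidate's score is the mean over the folds
mean_score = float(np.mean(fold_scores))
print(fold_scores, mean_score)
```

A grid search repeats this loop for every combination in the parameter grid and keeps the combination with the highest mean score.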

In [8]:
param_grid = {
    'n_estimators': [10, 50, 500, 1000],
    'max_depth': [10, 30, 100, 200, None]
}
rfc_2 = RandomForestClassifier()
grid_rfc = GridSearchCV(rfc_2, param_grid, cv=10)
grid_rfc.fit(X_train, y_train)

GridSearchCV(cv=10, estimator=RandomForestClassifier(),
             param_grid={'max_depth': [10, 30, 100, 200, None],
                         'n_estimators': [10, 50, 500, 1000]})

In [9]:
grid_rfc.best_params_, grid_rfc.best_score_

({'max_depth': 10, 'n_estimators': 500}, 0.9888888888888889)

The best max_depth is the smallest value we placed in the grid, so the optimum may lie below it. We may be able to improve the model further by extending the search towards smaller depths.
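
Before extending the search, it can help to check how far apart the candidates really are. The fitted `GridSearchCV` object exposes a `cv_results_` dictionary with the mean and standard deviation of the score for every combination. The sketch below refits a deliberately small grid so that it runs quickly; the grid values, `cv=5` and the seed are assumptions for illustration, not the settings used in this notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# small illustrative grid: 2 x 3 = 6 candidates
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 5, None]}
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)

# one row per candidate, best rank first
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
results = pd.DataFrame(grid.cv_results_)[columns].sort_values("rank_test_score")
print(results.to_string(index=False))
```

If several candidates score within one standard deviation of each other, the differences between them are probably not meaningful on a dataset this small.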

In [10]:
param_grid = {
    'n_estimators': [10, 100, 300, 500, 600],
    'max_depth':[1, 5, 10, 15, 20, None]
}
rfc_3 = RandomForestClassifier()
grid_rfc_2 = GridSearchCV(rfc_3, param_grid, cv=10)
grid_rfc_2.fit(X_train, y_train)

GridSearchCV(cv=10, estimator=RandomForestClassifier(),
             param_grid={'max_depth': [1, 5, 10, 15, 20, None],
                         'n_estimators': [10, 100, 300, 500, 600]})

In [11]:
grid_rfc_2.best_params_, grid_rfc_2.best_score_

({'max_depth': 5, 'n_estimators': 100}, 0.9888888888888889)

The second search selects a smaller model (max_depth of 5 and 100 estimators), but its cross-validated score is identical to the first one (0.989), so we keep the model from the first search, `grid_rfc`.

In [13]:
y_hat = grid_rfc.best_estimator_.predict(X_test)
accuracy_score(y_test, y_hat)



0.9

In [14]:
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
    grid_rfc.best_estimator_, X_train.values, y_train.values,
    X_val.values, y_val.values, loss='0-1_loss', random_seed=11
)

In [15]:
avg_bias, avg_var

(19.533333333333335, 0.029999999999999995)
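
With the 0-1 loss, bias and variance are both fractions of test samples, so each should fall between 0 and 1. The sketch below computes a bootstrap-based decomposition by hand with 1-D labels to show the expected scale. It is an illustration under stated assumptions, not a reimplementation of `bias_variance_decomp`: the number of rounds, the forest size and the seeds are chosen only to keep it fast.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # y is already 1-D here
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0
)

rng = np.random.RandomState(11)
num_rounds = 50  # illustrative; more rounds give smoother estimates
all_pred = np.zeros((num_rounds, y_te.shape[0]), dtype=np.int64)

for i in range(num_rounds):
    # refit on a bootstrap sample of the training set
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)
    model = RandomForestClassifier(n_estimators=10, random_state=i)
    all_pred[i] = model.fit(X_tr[idx], y_tr[idx]).predict(X_te)

# main prediction: the most frequent predicted class for each test sample
main_pred = np.array([np.bincount(col).argmax() for col in all_pred.T])

avg_loss = float(np.mean(all_pred != y_te))      # expected 0-1 loss
avg_bias = float(np.mean(main_pred != y_te))     # error of the main prediction
avg_var = float(np.mean(all_pred != main_pred))  # disagreement with the main prediction

print(avg_loss, avg_bias, avg_var)
```

All three quantities stay within [0, 1] because every comparison is element-wise against a 1-D label vector.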

## Conclusion

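We managed to reduce the bias slightly (from 19.7 to 19.53), but not by enough, while the variance fell from 0.051 to 0.030. We could not increase the accuracy either: the tuned model scores 0.90 on the test set, against 0.93 for the initial model on the validation set. Changing the model or trying different methods may improve it further. Even so, we obtained a good model.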