# Predictor Importance by Class for Random Forest Classifiers
This notebook uses the Gini index of each split in the trees of a random forest to calculate how important each predictor is for accurately classifying each class in a dataset.
![Pseudo-code of the importance calculation algorithm](../2_docs/img/pseudo-code.jpg)
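
As a reminder of the quantity the algorithm is built on, here is a minimal sketch of the Gini index of a node and the weighted Gini decrease produced by one split. The function names (`gini`, `gini_decrease`) and the toy labels are illustrative assumptions, not part of the pseudo-code in the figure.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease of a split, with each child weighted by its share of samples."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


# Toy node (hypothetical labels): six samples from three classes,
# split so that the left child is pure.
parent = [1, 1, 1, 2, 2, 3]
left = [1, 1, 1]
right = [2, 2, 3]

print(gini(parent))                        # impurity before the split
print(gini_decrease(parent, left, right))  # impurity removed by the split
```

A split that isolates one class, as the left child does here, removes a large share of the parent's impurity, which is why per-split Gini values can be used as evidence of a predictor's importance.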

In [30]:
# import packages
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from tqdm import tqdm

In [28]:
# import the dataset
wines = pd.read_csv('../0_data/wines.csv')
data_names = ['Class', 'Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 
                 'Magnesium', 'Total phenols', 'Flavanoids', 'Nonflavanoid phenols', 
                 'Proanthocyanins', 'Color intensity', 'Hue', 'OD280-OD315 of diluted wines', 'Proline']
wines.columns = data_names
wines.head(5)

| | Class | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280-OD315 of diluted wines | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 1 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 2 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 3 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
| 4 | 1 | 14.2 | 1.76 | 2.45 | 15.2 | 112 | 3.27 | 3.39 | 0.34 | 1.97 | 6.75 | 1.05 | 2.85 | 1450 |


In [29]:
# separate the dataset into features (X) and categories (Y)
Y = wines['Class']
feature_names = data_names[1:]
X = np.array(wines[feature_names])

In [33]:
# separate the data into training and testing splits
train_X, test_X, train_Y, test_Y = train_test_split(X, Y, test_size = 0.25, random_state = 42)

In [34]:
# train the random forest with 1000 trees
# note: the model is a RandomForestRegressor fitted on the numeric class labels
rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
rf.fit(train_X, train_Y)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=1000, n_jobs=None,
           oob_score=False, random_state=42, verbose=0, warm_start=False)

In [35]:
# make predictions and calculate the error
predictions = rf.predict(test_X)
errors = abs(predictions - test_Y)
print('Mean Absolute Error: {}'.format(round(np.mean(errors), 2))) # Print out the mean absolute error (mae)

Mean Absolute Error: 0.16


In [37]:
# check the accuracy of the classifier with mean absolute percentage error (MAPE)
mape = 100 * (errors / test_Y)
accuracy = 100 - np.mean(mape)
print('Accuracy: {}%'.format(round(accuracy, 2)))

Accuracy: 91.48%
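
The figure above is 100 minus the MAPE of a regressor's output. That is not the same as the share of samples assigned to the right class, because the same absolute error counts for less when the label is larger. The sketch below contrasts the two on made-up predictions. The helper names and the toy values are assumptions for illustration, and rounding the prediction to the nearest label is only one possible way to turn a regressor's output into a class.

```python
def mape_accuracy(y_true, y_pred):
    """100 minus the mean absolute percentage error, as computed in the cell above."""
    pct_errors = [100 * abs(p - t) / t for t, p in zip(y_true, y_pred)]
    return 100 - sum(pct_errors) / len(pct_errors)


def exact_match_accuracy(y_true, y_pred):
    """Percentage of samples whose rounded prediction equals the true label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if round(p) == t)
    return 100 * hits / len(y_true)


# Hypothetical labels and regressor outputs, not taken from the wine data.
y_true = [1, 2, 3, 3]
y_pred = [1.0, 2.4, 2.6, 1.8]

print(round(mape_accuracy(y_true, y_pred), 2))  # MAPE-based figure
print(exact_match_accuracy(y_true, y_pred))     # share of correct classes
```

On real data the two numbers can differ in either direction, so it may be worth reporting both before relying on the model.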


The MAPE-based accuracy is above 90%, so we proceed with implementing the predictor importance by class (PIBC) algorithm on this model.
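
As a possible starting point for that implementation, the sketch below shows one way to read the per-split Gini values out of a fitted scikit-learn forest. It rests on several assumptions. It uses a `RandomForestClassifier`, because the `RandomForestRegressor` fitted above reports `criterion='mse'` and its node impurities are therefore not Gini values. It loads scikit-learn's bundled `load_wine` data as a stand-in for `../0_data/wines.csv`. It also sums the decreases per predictor over all classes, so it is not the by-class calculation from the pseudo-code. The helper name `split_impurity_decreases` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

# Stand-in data and model (assumptions, see the text above).
X, y = load_wine(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)


def split_impurity_decreases(tree):
    """Return (feature index, weighted Gini decrease) for each split node of a fitted tree."""
    left, right = tree.children_left, tree.children_right
    n = tree.weighted_n_node_samples
    splits = []
    for node in range(tree.node_count):
        if left[node] == -1:  # -1 marks a leaf, which has no split
            continue
        l, r = left[node], right[node]
        decrease = (n[node] * tree.impurity[node]
                    - n[l] * tree.impurity[l]
                    - n[r] * tree.impurity[r]) / n[0]
        splits.append((int(tree.feature[node]), float(decrease)))
    return splits


# Sum the decreases per predictor, then average over the trees.
totals = np.zeros(X.shape[1])
for estimator in forest.estimators_:
    for feature, decrease in split_impurity_decreases(estimator.tree_):
        totals[feature] += decrease
totals /= len(forest.estimators_)

print(totals)
```

A by-class variant would additionally need the class composition of each node, which scikit-learn exposes as `tree_.value`. How to attribute a split's decrease to individual classes is defined by the pseudo-code and is not attempted here.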