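The following code trains a decision tree on the iris dataset and exports it to a Graphviz file, tree.dot. You can turn the Graphviz file into an image by pasting its contents into [WebGraphviz](http://www.webgraphviz.com/).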

In [1]:
from sklearn.datasets import load_iris
from sklearn import tree


clf = tree.DecisionTreeClassifier()
iris = load_iris()


clf = clf.fit(iris.data, iris.target)
tree.export_graphviz(clf, out_file='tree.dot') 
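
The exported tree is easier to read when the nodes carry feature and class names. The following is a minimal sketch, not part of the original notebook: it assumes a fixed random_state=0 for repeatability and passes out_file=None so that export_graphviz returns the DOT source as a string instead of writing a file.

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
# random_state=0 is an assumption made here so the sketch is repeatable
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# With out_file=None, export_graphviz returns the DOT source as a string
dot_source = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,      # label splits with column names
    class_names=list(iris.target_names),   # label nodes with the majority class
    filled=True,                           # color nodes by majority class
)

# Show the beginning of the DOT source
print(dot_source[:200])
```

The resulting string can be pasted into the same Graphviz renderer as the contents of tree.dot.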


First, let's import some packages.

In [1]:
import numpy as np
import pandas as pd
import sklearn


# Plotting library
import matplotlib.pyplot as plt


# Preprocessing functions
from sklearn.model_selection import train_test_split
from sklearn import preprocessing


# ML functions
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor


Second, let's import the data.

In [3]:
# Import the data
# Change the following line according to your directory
data = pd.read_csv('absenteeism/Absenteeism_at_work.csv',
                   header=0, delimiter=";")
print("Size of the data: ", data.shape)


# Convert the pandas DataFrame to a NumPy array (just for convenience).
# The last column is the target; all other columns are features.
col_names = list(data.columns)
data = np.array(data)
data_X = data[:, :-1]
data_Y = data[:, -1]
n_feature = data_X.shape[1]
print("number of features: ", n_feature)


# Generate Train/Test data
X_train, X_test, y_train, y_test = train_test_split(
       data_X, data_Y, test_size=0.33, random_state=0)


Size of the data:  (740, 21)
number of features:  20
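
To see what test_size=0.33 means for a dataset of this size, the sketch below repeats the split on a placeholder array of zeros with the same shape as the data, (740, 21). The placeholder is an assumption used only to make the example self-contained; the real values do not affect the sizes of the two sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder with the same shape as the dataset: 740 rows, 21 columns
data = np.zeros((740, 21))
data_X, data_Y = data[:, :-1], data[:, -1]

# Same call as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    data_X, data_Y, test_size=0.33, random_state=0)

print("train:", X_train.shape, "test:", X_test.shape)
```

The test set gets 245 rows (33% of 740, rounded up) and the remaining 495 rows go to the training set.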


Third, we train a random forest model with n_estimators=30, i.e. a forest of B = 30 trees.

In [4]:
# Train a random forest model
RF_model = RandomForestRegressor(n_estimators=30)
RF_model.fit(X_train, y_train)
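
To make the role of B concrete, here is a minimal sketch on synthetic data (the data, the name model, and random_state=0 are assumptions for illustration, not part of the notebook). It checks that the fitted forest holds 30 trees and that its prediction is the average of the individual tree predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, used only for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

model = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)

# The fitted trees are stored in the estimators_ attribute
print("number of trees:", len(model.estimators_))

# Average the per-tree predictions by hand and compare with the forest
tree_preds = np.stack([t.predict(X) for t in model.estimators_])
manual_mean = tree_preds.mean(axis=0)
print("matches forest prediction:", np.allclose(manual_mean, model.predict(X)))
```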


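The following code prints the input variables in decreasing order of feature importance. In Python terms, RF_model is an object and feature_importances_ is one of its attributes, available once the model has been fitted. Because the forest was trained without a fixed random_state, the exact values change slightly from run to run.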

In [5]:
importances = RF_model.feature_importances_
# Feature indices sorted from most to least important
indices = np.argsort(importances)[::-1]
# Print the feature ranking
print("Feature ranking:")
for rank, idx in enumerate(indices, start=1):
    print("%d. %s (%f)" % (rank, col_names[idx], importances[idx]))


Feature ranking:
1. Reason for absence (0.178507)
2. Month of absence (0.132023)
3. Work load Average/day  (0.122011)
4. Age (0.094946)
5. Hit target (0.072252)
6. ID (0.063259)
7. Seasons (0.060775)
8. Transportation expense (0.058365)
9. Height (0.055908)
10. Day of the week (0.043422)
11. Weight (0.039610)
12. Body mass index (0.021100)
13. Son (0.020294)
14. Distance from Residence to Work (0.011984)
15. Service time (0.008949)
16. Disciplinary failure (0.005007)
17. Pet (0.003985)
18. Social drinker (0.003959)
19. Education (0.002403)
20. Social smoker (0.001241)
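
The ranking above comes from a single fitted forest. The sketch below uses synthetic data in which only the first feature drives the target; the data, the feature names f0 to f3, and random_state=0 are all assumptions for illustration. It prints the same kind of ranking together with the spread of each importance across the 30 trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: only feature 0 influences the target
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=300)
names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names

model = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)

importances = model.feature_importances_
# Standard deviation of each feature's importance across the individual trees
spread = np.std([t.feature_importances_ for t in model.estimators_], axis=0)

# Feature indices sorted from most to least important
order = np.argsort(importances)[::-1]
for rank, idx in enumerate(order, start=1):
    print("%d. %s (%f +/- %f)" % (rank, names[idx], importances[idx], spread[idx]))

print("sum of importances:", importances.sum())
```

scikit-learn normalizes the importances so that they sum to 1, which means each value can be read as a share of the total. A large spread relative to the importance itself suggests that the position of that feature in the ranking is not stable.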
