# A Basic Random Forest Classifier model run on Flask

In this notebook, we build a Random Forest Classifier (RFC) model on the Iris dataset, which ships with scikit-learn, and run a Flask server that takes in a new sample, classifies it using the trained RFC model, and returns the result.

The six package (compatible with both Python 2 and 3; the name comes from 2 × 3 = 6) gives us the pickle module through six.moves.cPickle. We use it to dump the trained RFC model to a file, so that the model can be loaded again later when the server is called to classify a new sample.

In [27]:
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives here
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

import numpy as np

import six.moves.cPickle as pickle
import requests, json

In [28]:
iris = datasets.load_iris()
# print (iris.DESCR)

In [29]:
X = iris.data
y = iris.target
type(X), type(y)

(numpy.ndarray, numpy.ndarray)

In [30]:
X.shape, y.shape

((150, 4), (150,))

In [31]:
X_train, X_test, y_train, y_test = train_test_split(X,y)

In [32]:
rfc = RandomForestClassifier(n_estimators = 100, n_jobs = 2)

In [33]:
rfc.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=2,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

In [34]:
print ("Accuracy = %0.2f" % accuracy_score(y_test, rfc.predict(X_test)))

print(classification_report(y_test, rfc.predict(X_test)))

Accuracy = 0.92
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        14
          1       0.86      0.92      0.89        13
          2       0.90      0.82      0.86        11

avg / total       0.92      0.92      0.92        38



### Model Serialization

In [35]:
with open("iris_rfc.pkl", "wb") as f:  # write
    pickle.dump(rfc, f)

In [36]:
with open("iris_rfc.pkl", "rb") as f:  # read
    my_random_forest = pickle.load(f)

In [37]:
my_random_forest

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=2,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
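
A quick way to confirm that pickling preserves the model is to compare predictions before and after a round trip. The sketch below is self-contained rather than a continuation of the cells above. It assumes Python 3's built-in pickle module, trains a small forest (the n_estimators=10 and random_state=0 values are illustrative choices, not taken from this notebook), and writes to a temporary directory instead of iris_rfc.pkl.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Small, seeded forest so the example runs quickly and repeatably.
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # The with-blocks close the file handles even if pickling fails.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly.
same = np.array_equal(model.predict(X), restored.predict(X))
print("round trip preserves predictions:", same)
```

Pickle files can execute arbitrary code when loaded, so only unpickle models from sources you trust, and preferably with the same scikit-learn version that wrote them.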

In [38]:
url = 'http://localhost:9000/api'
data = json.dumps({'sl':1, 'sw':1, 'pl':1, 'pw':2})
response = requests.post(url, data=data)

In [39]:
response.json() # Yay.

{'results': 0}
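
The request above assumes a server is already listening on http://localhost:9000/api, but the server code is not shown in this notebook. The following is a minimal sketch of what such a Flask app might look like, not the original implementation. The names create_app and serve are hypothetical, and reading sl, sw, pl, pw in that order as the four Iris features is an assumption based on the client cell.

```python
import pickle

import numpy as np
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app that serves predictions from `model` (hypothetical helper)."""
    app = Flask(__name__)

    @app.route("/api", methods=["POST"])
    def predict():
        # The client posts a JSON string without a JSON Content-Type header,
        # so force=True asks Flask to parse the body as JSON anyway.
        payload = request.get_json(force=True)

        # Assumed feature order: sl, sw, pl, pw. Shape (1, 4) = one sample.
        features = np.array(
            [[payload["sl"], payload["sw"], payload["pl"], payload["pw"]]],
            dtype=float,
        )
        prediction = model.predict(features)

        # Convert the NumPy integer to a built-in int before serializing.
        return jsonify(results=int(prediction[0]))

    return app


def serve(model_path="iris_rfc.pkl", port=9000):
    """Load the pickled model and start the development server (hypothetical helper)."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    create_app(model).run(port=port)
```

Calling serve() from a separate script or terminal session would start Flask's development server on port 9000, after which the client cell can post samples to it. Keeping the model as an argument to create_app makes the route easy to exercise with Flask's test client and a stand-in model.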

The cells below illustrate how the data is represented and transferred. The RFC expects its inputs as NumPy arrays, and its predictions are NumPy arrays as well. JSON, however, has no way to represent NumPy arrays or NumPy scalar types. Passing a single prediction to the np.asscalar() function converts it into a built-in Python int, which the server can serialize; response.json() on the client side then decodes the reply. Note that np.asscalar() has been deprecated since NumPy 1.16 and was removed in NumPy 1.23; the equivalent call is the scalar's .item() method.

In [40]:
a = rfc.predict(np.array([1,2,3,4]).reshape(1,-1))

In [41]:
a, type(a)

(array([1]), numpy.ndarray)

In [42]:
b = a[0]
b, type(b)

(1, numpy.int32)

In [43]:
c = b.item()  # same result as np.asscalar(b), which was removed in NumPy 1.23
c, type(c)

(1, int)
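
To see why this conversion matters for the JSON response, the short sketch below tries to serialize a NumPy integer directly and then after converting it. The value np.int64(1) is only a stand-in for a single prediction such as rfc.predict(...)[0]; the exact integer width of real predictions depends on the platform.

```python
import json

import numpy as np

# Stand-in for one predicted label taken from a prediction array.
label = np.int64(1)

try:
    json.dumps({"results": label})
except TypeError as err:
    # NumPy integers are not instances of the built-in int on Python 3.
    print("cannot serialize:", err)

# .item() returns the equivalent built-in Python scalar.
payload = json.dumps({"results": label.item()})
print(payload)

# For a whole batch of predictions, tolist() converts every element.
batch = np.array([0, 2, 1])
batch_payload = json.dumps({"results": batch.tolist()})
print(batch_payload)
```

Wrapping the value in int(...) works as well as .item() for a single label; tolist() is the more convenient choice when the server returns several predictions at once.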