# Flask Random Forest API (local)

Felipe Viacava de Freitas

São Paulo, Oct/2022

#### Hello, stranger

This is my first attempt at deploying a machine learning API. This notebook preprocesses the data so that the API itself can stay simple, which wouldn't be possible in a real application but serves the learning purpose of this introductory project well. It also plots the test results.

In [12]:
# Importing all necessary libraries

import json
import pickle
import requests
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.pipeline import Pipeline

In [13]:
# For the first step, let's load our data into a Pandas DataFrame.
# We'll be working with the 'winequality-white.csv' dataset, downloadable at:
# https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/

df = pd.read_csv('winequality-white.csv',sep=';',decimal=',')

df.head()

,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6


In [14]:
# Since we want to build a binary Random Forest classifier,
# I'll map quality to 0 and 1 (bad and good, respectively):
# 1 for any quality score of 6 or higher, 0 for any score below 6.

df['quality'] = df['quality'].apply(lambda quality: 0 if quality < 6 else 1)

df.head()

,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,1
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,1
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,1
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,1
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,1
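
The same thresholding can also be written as a vectorized comparison instead of a row-by-row `lambda`. The snippet below is a small sketch on made-up scores (the `toy` frame and the `label` column are hypothetical names, not part of the dataset); it applies the same rule as the cell above.

```python
import pandas as pd

# Toy quality scores (hypothetical values, not taken from the dataset)
toy = pd.DataFrame({'quality': [3, 5, 6, 7, 9]})

# Same rule as the lambda above: 1 if quality >= 6, otherwise 0
toy['label'] = (toy['quality'] >= 6).astype(int)

print(toy)
```

On a few thousand rows the difference in speed hardly matters, so this is mostly a question of readability.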


In [15]:
# Splitting our data into train and test sets

X = df.drop(['quality'],axis=1).values
Y = df['quality'].values

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=123)

In [16]:
# Standardize the features and use a Random Forest with default hyperparameters

ss = StandardScaler()
clf = RandomForestClassifier()

In [17]:
# Chain the scaler and the classifier so that both run in a single fit/predict call

pipeline = Pipeline(steps=[('ss',ss),('clf',clf)])

In [18]:
# Fit the scaler and the classifier on the training set only

pipeline.fit(X_train,Y_train)

Pipeline(steps=[('ss', StandardScaler()), ('clf', RandomForestClassifier())])
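
One reason to wrap the scaler and the classifier in a `Pipeline` is that the scaling statistics come from the training split only and are then reused at prediction time. The sketch below illustrates this on synthetic data; `X_demo`, `y_demo`, `demo` and the fixed `random_state` on the forest are assumptions made for the example, not the wine data or the model fitted above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine features (assumption, not the real data)
rng = np.random.default_rng(123)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_demo = (X_demo[:, 0] > 5.0).astype(int)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, random_state=123
)

demo = Pipeline(steps=[('ss', StandardScaler()),
                       ('clf', RandomForestClassifier(random_state=123))])
demo.fit(Xd_train, yd_train)

# The scaler inside the pipeline learned its mean from the training split only
scaler_mean = demo.named_steps['ss'].mean_
print(np.allclose(scaler_mean, Xd_train.mean(axis=0)))

# predict() reuses those training statistics on the unseen rows
test_predictions = demo.predict(Xd_test)
print(test_predictions.shape)
```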

In [19]:
# Persist the fitted pipeline so it can be loaded outside this notebook

with open('model.pickle', 'wb') as f:
    pickle.dump(pipeline, f)
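
Before relying on the pickled file, it can be worth checking that a fitted pipeline survives serialization unchanged. This is a minimal sketch on synthetic data (`X_demo`, `y_demo` and `demo_pipeline` are hypothetical names); it round-trips through bytes with `pickle.dumps`/`pickle.loads`, which is equivalent to writing and reading a file.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic data set standing in for the wine features (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)

demo_pipeline = Pipeline(steps=[
    ('ss', StandardScaler()),
    ('clf', RandomForestClassifier(n_estimators=10, random_state=0)),
])
demo_pipeline.fit(X_demo, y_demo)

# Serialize to bytes and back, as dump/load would do through a file
restored = pickle.loads(pickle.dumps(demo_pipeline))

# The restored pipeline should give exactly the same predictions
same = np.array_equal(demo_pipeline.predict(X_demo), restored.predict(X_demo))
print(same)
```

Keep in mind that a pickle should only be loaded from a trusted source and with a compatible scikit-learn version.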

In [20]:
# The API itself is written in a separate script
# Run RF_API.py in the terminal
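
RF_API.py is not reproduced in this notebook, so the following is only a minimal sketch of what such a script could look like. It assumes the request body is JSON with a `data` key holding a list of feature rows (matching the payload built below). The `create_app` and `main` helpers, the `predictions` response key and the 400 error response are illustrative choices, not the actual script.

```python
import pickle

import numpy as np
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app that serves predictions from a fitted model (hypothetical helper)."""
    app = Flask(__name__)

    @app.route('/Predict', methods=['POST'])
    def predict():
        # silent=True returns None instead of raising when the body is not JSON
        body = request.get_json(silent=True)
        if not body or 'data' not in body:
            return jsonify({'error': "expected a JSON body with a 'data' key"}), 400

        features = np.asarray(body['data'], dtype=float)
        predictions = model.predict(features)
        # Convert NumPy values to plain Python ints so they are JSON serializable
        return jsonify({'predictions': predictions.tolist()})

    return app


def main():
    # Load the pipeline pickled by this notebook and serve it locally
    with open('model.pickle', 'rb') as f:
        model = pickle.load(f)
    create_app(model).run(host='127.0.0.1', port=5000)
```

Calling `main()` at the bottom of the script would start the development server on the address used by the request below. Passing the model into `create_app` instead of loading it at import time also makes the route easy to exercise with Flask's test client.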

In [11]:

# Serialize the test features to JSON and send them to the running API.
# json= sends a JSON body; data= with a dict would form-encode it instead.
payload = {'data': X_test.tolist()}
Y_predict = requests.post('http://127.0.0.1:5000/Predict', json=payload).json()

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /Predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002A7AD8D9C40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
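
The `ConnectionError` above means nothing was listening on port 5000 when the request was sent, which is what happens if the API script has not been started yet. A client-side wrapper can turn that traceback into a readable message. This is only a sketch: `predict_remote` and its `post` parameter are hypothetical names, and it assumes the API accepts a JSON body with a `data` key.

```python
import requests


def predict_remote(rows, url='http://127.0.0.1:5000/Predict', post=requests.post):
    """Return the decoded JSON response, or None if the server is unreachable.

    `post` is injectable so the function can be exercised without a live server.
    """
    try:
        response = post(url, json={'data': rows}, timeout=5)
    except requests.exceptions.ConnectionError:
        print('Could not reach the API. Is RF_API.py running?')
        return None

    # Surface HTTP errors (4xx/5xx) instead of trying to decode them as results
    response.raise_for_status()
    return response.json()
```

With the server running, `predict_remote(X_test.tolist())` would return whatever JSON the API responds with; without it, the call prints the hint and returns `None`.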