# Learning Flask

In the previous tutorial, we created a basic Hello World app in Flask. In this one, we're going to create a simple RESTful API with Flask.

Instruction
This time, we'll be working with a .py file instead of an .ipynb file. Choose any Python IDE and create an empty file named greeting_api.py. You will paste the code from this tutorial there.

Our goal is to build an API that will greet the person who calls it.

Let's begin by importing the necessary libraries we'll need:

In [2]:
# import Flask, jsonify and the request object
from flask import Flask, jsonify, request

# import Resource and Api
from flask_restful import Resource, Api

In [3]:
# Create an application

app = Flask(__name__)

In [4]:
# Create an API from the application

api = Api(app)

Now that our API has been created, we need to add an endpoint. We can do that by creating a class with the name Greet (any other name will work as well). This class must inherit from the Resource class of the flask_restful module.

In [5]:
class Greet(Resource):
    def get(self):

        name = request.args.get('name')

        if name:
            greeting = f'Hello {name}!'
        else:
            greeting = 'Hello person without name!'

        # make json from greeting string 
        return jsonify(greeting=greeting)

The class Greet contains only one method, get. This time the naming convention is strict: a method must be named after the HTTP request method it handles (get, post, put, ...). Inside the get method, we read the optional query-string argument name with request.args.get('name'). The variable name stores the value that was passed in the API call. If the caller doesn't pass the argument name, the value of the variable is None. We then build a different greeting depending on whether name is set.

Now that we have our class created, we need to assign an endpoint. The functionality of the Greet class will be available in the /greet endpoint.

In [6]:
# assign endpoint
api.add_resource(Greet, '/greet')

The last thing to do is to run the application when the file greeting_api.py is executed directly (not imported as a module from another script).

In [7]:
if __name__ == '__main__':
    app.run(debug=True)

 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat
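
With the server running, the endpoint is reachable under /greet at the address printed in the output above, and the optional name goes into the query string. A minimal sketch of building such URLs with the standard library follows. The helper greet_url and the sample name are hypothetical and not part of the tutorial's code.

```python
from urllib.parse import urlencode

# Address taken from the server output; adjust it if your server runs elsewhere.
BASE_URL = "http://127.0.0.1:5000/greet"


def greet_url(name=None):
    """Return the URL of the greeting endpoint, with an optional name argument."""
    if name is None:
        return BASE_URL
    # urlencode escapes characters that are not allowed in a query string
    return f"{BASE_URL}?{urlencode({'name': name})}"


print(greet_url())
print(greet_url("Ada Lovelace"))
```

Opening either URL in a browser while the server is running should show the JSON greeting.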


# Deploying ML using Flask

## Part I: Model Creation

In [8]:
# import of packages we are going to need.

import pandas as pd
from sklearn.datasets import load_wine

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline, FeatureUnion

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

from sklearn.ensemble import RandomForestClassifier

import pickle

In [9]:
# We will load the toy dataset directly from sklearn.

data = load_wine()
df = pd.DataFrame(data['data'])
df.columns = data['feature_names']
y = data['target']
df.head()

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |


During model creation, we will work on the following tasks:

Filter the columns used for PCA
Scaling
PCA
SelectKBest
Random Forest Classifier
and put them all into one pipeline.

## Filter Own Columns
First, we will create our own class that keeps only the features we want in our pipeline. We don't want to run PCA on all features, only on a subset, so the class filters the columns of the original DataFrame. We can put our own classes into pipelines as long as they have the following methods:

.fit()
.transform()
.fit_transform()

In [10]:
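# own class that can be inserted into a pipeline like any other sklearn object.
class RawFeats:
    def __init__(self, feats):
        # list of column names to keep
        self.feats = feats

    def fit(self, X, y=None):
        # nothing to learn; return self, following the sklearn convention
        return self

    def transform(self, X, y=None):
        # keep only the selected columns of the DataFrame
        return X[self.feats]

    def fit_transform(self, X, y=None):
        self.fit(X)
        return self.transform(X)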


# features we want to keep for PCA
feats = ['alcohol','malic_acid','ash','alcalinity_of_ash','magnesium',
         'total_phenols','flavanoids','nonflavanoid_phenols']
# creating the class object with the column names we want to keep.
raw_feats = RawFeats(feats)
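
An alternative to writing all three methods by hand is to inherit from scikit-learn's TransformerMixin and BaseEstimator. TransformerMixin supplies fit_transform, and BaseEstimator supplies get_params and set_params, which lets sklearn.base.clone copy the object. The sketch below is not the class used in this tutorial. The name ColumnSelector and the toy DataFrame are assumptions made for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical column filter; keeps only the listed DataFrame columns."""

    def __init__(self, feats):
        self.feats = feats

    def fit(self, X, y=None):
        # nothing to learn, but fit must return self
        return self

    def transform(self, X):
        return X[self.feats]


# Toy data with three of the wine columns (values from the first two rows).
toy = pd.DataFrame({"alcohol": [14.23, 13.2],
                    "ash": [2.43, 2.14],
                    "hue": [1.04, 1.05]})

selector = ColumnSelector(["alcohol", "ash"])
selected = selector.fit_transform(toy)   # fit_transform comes from TransformerMixin
cloned_params = clone(selector).get_params()

print(list(selected.columns))
print(cloned_params)
```

The hand-written RawFeats class above works for this tutorial as well. The mixin version mainly saves boilerplate.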

## Scaling and PCA

In [11]:
sc = StandardScaler()
pca = PCA(n_components=2)

## SelectKBest

In [12]:
selection = SelectKBest(k=4)

## Random Forest

In [13]:
rf = RandomForestClassifier()

## Combining Everything Into One Pipeline
As in the previous tutorial, we will apply two different feature extraction techniques:

PCA
SelectKBest
and combine them with FeatureUnion. The small difference is that we will use only a subset of the features for PCA.



In [14]:
PCA_pipeline = Pipeline([
    ("rawFeats", raw_feats),
    ("scaler", sc),
    ("pca", pca)
])

kbest_pipeline = Pipeline([("kBest", selection)])

In [15]:
# Now, we will combine these outputs with FeatureUnion:

all_features = FeatureUnion([
    ("pcaPipeline", PCA_pipeline), 
    ("kBestPipeline", kbest_pipeline)
])
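
FeatureUnion fits each branch on the same input and concatenates their outputs column-wise, so the number of output columns is the sum of what each branch produces. The sketch below is a simplified version of the union above: it assumes PCA runs on all 13 columns rather than on a subset, so the custom column filter is left out.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

union = FeatureUnion([
    # branch 1: scale, then reduce to 2 principal components
    ("pcaPipeline", Pipeline([("scaler", StandardScaler()),
                              ("pca", PCA(n_components=2))])),
    # branch 2: keep the 4 best original columns according to the default score function
    ("kBestPipeline", Pipeline([("kBest", SelectKBest(k=4))])),
])

combined = union.fit_transform(X, y)
print(combined.shape)  # (178, 6): 2 PCA components + 4 selected columns
```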

In [16]:
# Now, we will create the main pipeline which ends with the classifier.

main_pipeline = Pipeline([
    ("features", all_features),
    ("rf", rf)
])
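
Parameters of nested steps are addressed by joining the step names with a double underscore, for example features__pcaPipeline__pca__n_components. One way to see which names are available is to list the keys of get_params(). The sketch below rebuilds a pipeline with the same step names as above, again leaving out the column filter and scaler for brevity.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline

pipeline = Pipeline([
    ("features", FeatureUnion([
        ("pcaPipeline", Pipeline([("pca", PCA(n_components=2))])),
        ("kBestPipeline", Pipeline([("kBest", SelectKBest(k=4))])),
    ])),
    ("rf", RandomForestClassifier()),
])

# get_params() returns every parameter of every nested step,
# keyed by its full "outer__inner__parameter" path.
param_names = set(pipeline.get_params())

for name in ["features__pcaPipeline__pca__n_components",
             "features__kBestPipeline__kBest__k",
             "rf__n_estimators",
             "rf__max_depth"]:
    print(name, name in param_names)
```

These are the keys a grid search accepts in its parameter grid.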

In [17]:
# Let's apply grid search to tune the parameters properly:

# set up our parameters grid
param_grid = {
    "features__pcaPipeline__pca__n_components": [1, 2, 3],
    "features__kBestPipeline__kBest__k": [1, 2, 3],
    "rf__n_estimators": [2, 5, 10],
    "rf__max_depth": [2, 4, 6],
}

# create a Grid Search object
grid_search = GridSearchCV(main_pipeline, param_grid, n_jobs=-1, verbose=10, refit=True)

# fit the model and tune parameters
grid_search.fit(df, y)

Fitting 5 folds for each of 81 candidates, totalling 405 fits
[CV 1/5; 1/81] START features__kBestPipeline__kBest__k=1, features__pcaPipeline__pca__n_components=1, rf__max_depth=2, rf__n_estimators=2
[CV 1/5; 1/81] END features__kBestPipeline__kBest__k=1, features__pcaPipeline__pca__n_components=1, rf__max_depth=2, rf__n_estimators=2;, score=0.778 total time=   0.0s
[CV 4/5; 2/81] START features__kBestPipeline__kBest__k=1, features__pcaPipeline__pca__n_components=1, rf__max_depth=2, rf__n_estimators=5
[CV 4/5; 2/81] END features__kBestPipeline__kBest__k=1, features__pcaPipeline__pca__n_components=1, rf__max_depth=2, rf__n_estimators=5;, score=0.857 total time=   0.0s
[CV 5/5; 2/81] START features__kBestPipeline__kBest__k=1, features__pcaPipeline__pca__n_components=1, rf__max_depth=2, rf__n_estimators=5
[CV 5/5; 2/81] END features__kBestPipeline__kBest__k=1, features__pcaPipeline__pca__n_components=1, rf__max_depth=2, rf__n_estimators=5;, score=0.943 total time=   0.0s
[CV 1/5; 3/81] ST


We were able to call the pipeline on the original dataset without any transformations. We can check the best combination of parameters:

In [18]:
print(grid_search.best_params_)

{'features__kBestPipeline__kBest__k': 3, 'features__pcaPipeline__pca__n_components': 3, 'rf__max_depth': 6, 'rf__n_estimators': 10}
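
The log line "Fitting 5 folds for each of 81 candidates, totalling 405 fits" follows directly from the size of the grid: four parameters with three values each give 3 × 3 × 3 × 3 = 81 combinations, and GridSearchCV uses 5-fold cross-validation by default. A short sketch that reproduces the arithmetic with scikit-learn's ParameterGrid:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "features__pcaPipeline__pca__n_components": [1, 2, 3],
    "features__kBestPipeline__kBest__k": [1, 2, 3],
    "rf__n_estimators": [2, 5, 10],
    "rf__max_depth": [2, 4, 6],
}

n_candidates = len(ParameterGrid(param_grid))  # every combination of values
n_folds = 5                                    # default cv of GridSearchCV
total_fits = n_candidates * n_folds

print(n_candidates, total_fits)  # 81 405
```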


In [19]:
# use pickle to store the model on our disk.

with open("model.p", "wb") as f:
    pickle.dump(grid_search, f)
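
Before relying on a pickled model, it can be worth checking that it gives the same predictions after loading. The sketch below does such a round trip in memory. It assumes a small stand-in RandomForestClassifier rather than the tuned grid search object, and an io.BytesIO buffer instead of the file model.p.

```python
import io
import pickle

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)

# Stand-in model; in the tutorial this would be the fitted grid_search object.
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Serialize to an in-memory buffer, then read it back.
buffer = io.BytesIO()
pickle.dump(model, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

# The restored model should reproduce the original predictions exactly.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
```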

## Part II: API Creation

Now we go back to Flask. Our goal is to build an API that receives the measurements of a wine and returns the predicted probability of each class.

Note
We don't have to retrain the model in the cloud. We will use the pickle file of the model that was trained on our local machine.

In a new file (we can call it api.py and store it in the same directory as our previous notebook and the pickle file model.p), let's begin by importing the libraries we'll need:



In [None]:
# import Flask and the request object
from flask import Flask, request
# import Resource and Api
from flask_restful import Resource, Api
import pandas as pd
import pickle

app = Flask(__name__)
api = Api(app)

At the beginning of the file, we need to define the same custom class we used in the model creation part. The pickle file stores only a reference to this class, not its code, so the class must be available when the model is loaded and used for scoring. The sklearn classes are imported automatically during unpickling (as long as scikit-learn is installed), so we don't have to do anything about them in the scoring file.
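
The dependence on the class definition can be demonstrated with the standard library alone. The sketch below pickles an instance of a cut-down RawFeats, loads it while the class exists, and then deletes the class to simulate a scoring script that forgot to define it. It assumes the code is run as a top-level script. The variable names are illustrative.

```python
import pickle


class RawFeats:
    def __init__(self, feats):
        self.feats = feats


payload = pickle.dumps(RawFeats(['alcohol', 'ash']))

# While the class is defined, loading works and the attributes are restored.
restored_feats = pickle.loads(payload).feats

# Simulate a scoring script that does not define the class.
del RawFeats
try:
    pickle.loads(payload)
    load_failed = False
except AttributeError:
    # pickle looks the class up by name and cannot find it any more
    load_failed = True

print(restored_feats)
print(load_failed)
```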

In [20]:
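class RawFeats:
    def __init__(self, feats):
        # list of column names to keep
        self.feats = feats

    def fit(self, X, y=None):
        # nothing to learn; return self, following the sklearn convention
        return self

    def transform(self, X, y=None):
        # keep only the selected columns of the DataFrame
        return X[self.feats]

    def fit_transform(self, X, y=None):
        self.fit(X)
        return self.transform(X)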

In [21]:
# Now, we will load our model (from pickle).

with open("model.p", "rb") as f:
    model = pickle.load(f)



In [22]:
# Now, we need to create an endpoint where we can communicate with our ML model. This time, we are going to use a POST request.

class Scoring(Resource):
    def post(self):
        # parse the JSON body of the request into a dict (feature name -> value)
        json_data = request.get_json()
        # build a one-row DataFrame whose columns are the feature names
        df = pd.DataFrame(json_data.values(), index=json_data.keys()).transpose()
        # getting predictions from our model.
        # it is much simpler because we used pipelines during development
        res = model.predict_proba(df)
        # a NumPy array is not JSON serializable, so we convert it to a list
        return res.tolist()
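
The DataFrame construction inside post turns a flat JSON object into a single row whose columns are the feature names. A small sketch with a hypothetical three-feature payload (the real model expects all 13 wine features):

```python
import pandas as pd

# Hypothetical payload with only three of the 13 features, for illustration.
json_data = {"alcohol": 14.23, "malic_acid": 1.71, "ash": 2.43}

# Same construction as in the Scoring resource:
# one column indexed by feature name, then transposed into one row.
df = pd.DataFrame(json_data.values(), index=json_data.keys()).transpose()

print(df.shape)
print(list(df.columns))

# An equivalent, shorter way to build the same one-row frame.
alt = pd.DataFrame([json_data])
print(alt.values.tolist() == df.values.tolist())
```

The column order follows the order of the keys in the payload, so the request should list the features in the same order as the training data.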

In [None]:
# Now, we need to assign an endpoint to our API.

# assign endpoint
api.add_resource(Scoring, '/scoring')

In [None]:
# The last thing to do is to run the application when the file api.py is executed directly (not imported as a module from another script).

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)

In [None]:
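# Now, we can send our POST request from a Jupyter notebook (the API must be running):

import requests
URL = "http://127.0.0.1:5000/scoring"

# example payload: the 13 feature values of the first row shown by df.head() above
json_data = {
    "alcohol": 14.23, "malic_acid": 1.71, "ash": 2.43, "alcalinity_of_ash": 15.6,
    "magnesium": 127.0, "total_phenols": 2.8, "flavanoids": 3.06,
    "nonflavanoid_phenols": 0.28, "proanthocyanins": 2.29, "color_intensity": 5.64,
    "hue": 1.04, "od280/od315_of_diluted_wines": 3.92, "proline": 1065.0
}

# send the POST request and save the response as a response object
r = requests.post(url=URL, json=json_data)

# and we can check the results with:

print(r.json())
# It should be something like: [[1.0, 0.0, 0.0]], where each value is the probability of belonging to that particular class. We can also test it with the Postman app.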
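
Each inner list of the response holds one probability per wine class, so the predicted class is the position of the largest value. A minimal sketch, using the sample response mentioned above rather than a live call to the API:

```python
# Sample response of the /scoring endpoint (one row, three class probabilities).
probabilities = [[1.0, 0.0, 0.0]]

# For every row, pick the index of the highest probability.
predicted_classes = [row.index(max(row)) for row in probabilities]

print(predicted_classes)  # [0]
```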