<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Lab 10.2 - Deployment via Flask

### Introduction

**Note**: This notebook should work on your local machine. There is no need to use AWS SageMaker for this lab.

The purpose of this lab is to take you through the process of deploying a machine learning web app on a publicly hosted platform (Heroku). A trained model will be created using a Scikit-learn pipeline (combining loading, preprocessing and training steps), then separate files of Python code and text will need to be completed to provide the components necessary for deployment. First the app will be deployed to your local machine (so that you can view it in your browser). Once that is successful, the files will be uploaded to a new repository you create in GitHub, and Heroku will then read from this repository to host the application via a publicly accessible URL.

The app will take in a text string from a user and output a prediction of whether that string is expressing positive or negative sentiment. The model is created using methods from Module 8 (Natural Language Processing). Since the training data used to create the model is small (300 records), the prediction may only be accurate around 70% of the time. In future you may wish to improve this app's performance or develop your own app in a similar manner.

The following files are needed to create the app:

- requirements.txt
- app.py
- Procfile
- model.joblib
- utils.py
- templates/ (folder containing index.html)
- static/ (folder containing css/style.css)


Firstly we will see how a predictive model can be created as a pipe which combines the preprocessing, feature engineering and model training steps. This model is then saved as a joblib pickle file which can be reloaded at any time to avoid retraining.

This trained model can be loaded within your production environment along with required packages and real-time predictions can be made by calling its predict() method. 
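The save-and-reload cycle described above can be sketched as follows, assuming scikit-learn and joblib are installed. The four toy sentences, the file name `toy_model.joblib` and all variable names are illustrative assumptions, not the lab's data or model.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data (hypothetical): 1 = positive, 0 = negative
texts = ["great phone", "terrible battery", "great value", "terrible screen"]
labels = [1, 0, 1, 0]

# Vectorising and classification combined into one pipeline object
toy_pipe = make_pipeline(TfidfVectorizer(), LinearSVC())
toy_pipe.fit(texts, labels)

# Save the fitted pipeline, then reload it as a production script would
model_path = os.path.join(tempfile.mkdtemp(), "toy_model.joblib")
joblib.dump(toy_pipe, model_path)
reloaded = joblib.load(model_path)

# The reloaded pipeline predicts directly from raw text, with no retraining
new_texts = ["great camera", "terrible sound"]
original_pred = toy_pipe.predict(new_texts)
reloaded_pred = reloaded.predict(new_texts)
print(reloaded_pred)
```

The reloaded object should give the same predictions as the one that was saved, which is what makes it usable in a separate production script.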

Flask is a web app framework written in Python. It enables one to run application code whose output can be viewed on a browser. It is installed as a Python library via `pip install flask`. For a sample "Hello World" application see https://palletsprojects.com/p/flask/.

Note that Flask's built-in development server is not designed for large deployments (ones involving many frequent API requests).
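As a rough illustration of how a Flask app maps a URL to Python code, here is a minimal sketch, assuming Flask is installed. The route `/` and the function name `home` are arbitrary choices for this example, not taken from the lab's `app.py`.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def home():
    # The returned string is sent back as the body of the page
    return "Hello, World!"


# Flask's test client requests the page without starting a server
client = app.test_client()
response = client.get("/")
print(response.status_code, response.get_data(as_text=True))
```

Calling `app.run()` instead of using the test client would start the development server so that the page can be viewed in a browser.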

### Model Training and Testing

In [1]:
## Import Libraries
import numpy as np
import pandas as pd
import regex as re
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.base import BaseEstimator, TransformerMixin
import joblib

The training data set is `sentiments.csv`, a dataset used in the NLP module.

In [17]:
# Read in the data
yelp_text = 'yelp_labelled.txt'
imdb_text = 'imdb_labelled_fixed.txt'
amazon_text = 'amazon_cells_labelled.txt'

# ANSWER
# raw strings (r'...') stop backslashes in Windows paths being treated as escape characters
amazon = pd.read_csv(r'D:\IOD Data\sentiment\sentiment labelled sentences/amazon_cells_labelled.txt', sep='\t', header=None)
yelp = pd.read_csv(r'D:\IOD Data\sentiment\sentiment labelled sentences/yelp_labelled.txt', sep='\t', header=None)
imdb = pd.read_csv(r'D:\IOD Data\sentiment\sentiment labelled sentences/imdb_labelled.txt', sep='\t', header=None)


In [18]:
frames = [amazon, yelp, imdb]
# ignore_index=True gives the combined frame a single unique index
df = pd.concat(frames, ignore_index=True)

In [19]:
type(df)

pandas.core.frame.DataFrame

In [20]:
df.head()

|   | 0 | 1 |
|---|---|---|
| 0 | So there is no way for me to plug it in here i... | 0 |
| 1 | Good case, Excellent value. | 1 |
| 2 | Great for the jawbone. | 1 |
| 3 | Tied to charger for conversations lasting more... | 0 |
| 4 | The mic is great. | 1 |


In [21]:
df.columns
df = df.rename(columns={0: "text", 1: "sentiment"})

Next we define a function to do some preprocessing.

In [22]:
def clean_text(text):
    # reduce multiple spaces and newlines to only one
    # (keep the first whitespace character of each run and drop the rest)
    text = re.sub(r'(\s)\s+', r'\1', text)
    # remove double quotes
    text = re.sub(r'"', '', text)

    return text
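A small standalone check of the behaviour described in the comments of `clean_text` may help here. This is one possible reading of the intent, using only the standard library `re` module; the name `collapse_whitespace` and the sample string are hypothetical.

```python
import re


def collapse_whitespace(text):
    # Keep the first whitespace character of each run and drop the rest
    text = re.sub(r'(\s)\s+', r'\1', text)
    # Remove double quotes
    text = re.sub(r'"', '', text)
    return text


sample = 'The  "mic"   is great.\n\n\nLoved it.'
cleaned = collapse_whitespace(sample)
print(repr(cleaned))  # 'The mic is great.\nLoved it.'
```

Printing with `repr` makes the remaining single space and single newline visible.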

In [23]:
df['text'] = df['text'].apply(clean_text)

In [24]:
df.head()

|   | text | sentiment |
|---|---|---|
| 0 | So there is no way for me to plug it in here i... | 0 |
| 1 | Good case, Excellent value. | 1 |
| 2 | Great for the jawbone. | 1 |
| 3 | Tied to charger for conversations lasting more... | 0 |
| 4 | The mic is great. | 1 |


In [44]:
df.to_csv(r'D:\IOD Data\sentimentfull.csv', index = False)

A spaCy NLP model is used for further preprocessing. The steps below are the same as those used in Module 8.

In [30]:
import en_core_web_sm
nlp = en_core_web_sm.load()

In [31]:
def convert_text(text):
    sent = nlp(text)
    ents = {x.text: x for x in sent.ents}
    tokens = []
    for w in sent:
        if w.is_stop or w.is_punct:
            continue
        if w.text in ents:
            tokens.append(w.text)
        else:
            tokens.append(w.lemma_.lower())
    text = ' '.join(tokens)

    return text

In [32]:
df['short'] = df['text'].apply(convert_text)

In [33]:
# Features and Labels
X = df['short']
y = df['sentiment']

In [34]:
# split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 101)

In [35]:
classifier = LinearSVC()

In [36]:
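# create a TF-IDF weighted document-term matrix from the text
tfidf = TfidfVectorizer()
# learn the vocabulary and weights from the training data
# (the vectorizer is unsupervised, so the labels are not needed)
A = tfidf.fit_transform(X_train)

# train the classifier with the training data
# (LinearSVC accepts the sparse matrix directly)
classifier.fit(A, y_train)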

# do the transformation for the test data
# NOTE: use `transform()` instead of `fit_transform()`
B = tfidf.transform(X_test)

# make predictions based on the test data
predictions = classifier.predict(B)

# check the accuracy
print('Accuracy: %.4f' % accuracy_score(y_test, predictions))

Accuracy: 0.7855
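The note about using `transform()` rather than `fit_transform()` on the test data can be illustrated with a small sketch, assuming scikit-learn is installed. The four short documents are made up for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["good food", "bad service"]
test_docs = ["good service", "slow kitchen"]

vec = TfidfVectorizer()
# fit_transform learns the vocabulary and IDF weights from the training text
train_matrix = vec.fit_transform(train_docs)
# transform reuses that vocabulary; unseen words are simply ignored
test_matrix = vec.transform(test_docs)

vocab = sorted(vec.vocabulary_)
print(vocab)
print(train_matrix.shape, test_matrix.shape)
print(test_matrix.toarray())
```

Both matrices have the same number of columns, which the classifier requires, and the second test document becomes a row of zeros because none of its words were seen during fitting.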


We will not attempt to improve on the performance in this lab as we are more interested in how to deploy the model.

Next we create a pipeline to simplify the process of model creation. We first define a preprocessor class which applies the `clean_text` and `convert_text` functions defined earlier.

In [37]:
class preprocessor(TransformerMixin, BaseEstimator):

    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.apply(clean_text).apply(convert_text)
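The same transformer pattern can be tried out without spaCy. The sketch below assumes pandas and scikit-learn are installed; the class `Lowercaser` and the two sample texts are hypothetical stand-ins for the lab's preprocessing.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline


class Lowercaser(TransformerMixin, BaseEstimator):
    """Hypothetical stateless transformer: lower-cases each text."""

    def fit(self, X, y=None):
        # Nothing to learn, so fit simply returns the transformer itself
        return self

    def transform(self, X):
        # X is expected to be a pandas Series of strings
        return X.str.lower()


texts = pd.Series(["Great CASE", "Poor MIC"])
lowered = Lowercaser().fit_transform(texts)
print(list(lowered))

# Inside a pipeline, the transformer's output feeds the next step
demo_pipe = make_pipeline(Lowercaser(), TfidfVectorizer(lowercase=False))
demo_pipe.fit_transform(texts)
demo_vocab = sorted(demo_pipe.named_steps["tfidfvectorizer"].vocabulary_)
print(demo_vocab)
```

Because `fit` learns nothing, the transformer only needs to return `self`; `TransformerMixin` then supplies `fit_transform` automatically.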

Next we combine the preprocessing, feature engineering and modelling steps into a single pipe.

In [38]:
pipe = make_pipeline(preprocessor(), tfidf, classifier)
pipe.fit(df['text'],df['sentiment'])

Pipeline(steps=[('preprocessor', preprocessor()),
                ('tfidfvectorizer', TfidfVectorizer()),
                ('linearsvc', LinearSVC())])

**Exercise**: test the resulting model on phrases of positive and negative sentiment.

In [43]:
pipe.predict(pd.Series('terrible'))

array([0], dtype=int64)

Once satisfied that we have a model ready for deployment, we can write a self-contained script that creates the model and saves it as a joblib file. Doing so from a script rather than the notebook simplifies the deployment process.

**Exercise**: Review the code in model.py and run "python model.py" via an Anaconda prompt (Windows) or Terminal window (Mac). This creates a file model.joblib.

Let us load this model and verify that it alone can be used to make predictions.

In [45]:
# joblib.load accepts a file path directly, so no open file handle is left behind
newpipe = joblib.load('model.joblib')

In [46]:
type(newpipe)

sklearn.pipeline.Pipeline

Testing this out:

In [47]:
print(newpipe.predict(pd.Series('awesome place'))[0])
print(newpipe.predict(pd.Series('terrible!'))[0])
print(newpipe.predict(pd.Series('very interesting'))[0])

1
0
1


We can then write a self-contained script that loads the model and can make predictions on the fly. This is partially done for you in the file "app.py".

**Exercise**: Refer to app.py and fill in the missing code based on the code above using a text editor such as Spyder. Observe how it links to utils.py which contains the preprocessing functions.
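As a rough guide to the shape such a script can take, here is a sketch of a prediction route, assuming Flask is installed. It is not the lab's `app.py`: the class `StandInModel`, the route `/predict` and the form field name `text` are all assumptions made for illustration, and the stand-in replaces the pipeline that would normally be loaded from `model.joblib`.

```python
from flask import Flask, request

app = Flask(__name__)


class StandInModel:
    """Hypothetical placeholder for the pipeline loaded from model.joblib."""

    def predict(self, texts):
        # Crude rule used only so the example runs without a trained model
        return [0 if "terrible" in t.lower() else 1 for t in texts]


model = StandInModel()


@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted text from a form field named "text" (assumed name)
    text = request.form.get("text", "")
    label = model.predict([text])[0]
    return "positive" if label == 1 else "negative"


# Exercise the route with Flask's test client instead of a browser
client = app.test_client()
response = client.post("/predict", data={"text": "terrible!"})
print(response.get_data(as_text=True))
```

With the real pipeline, the input would be wrapped in a `pd.Series` as in the cells above, and the result would typically be passed to an HTML template rather than returned as plain text.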

### Local hosting

**Exercise**: Open the index.html with the text editor and fill in the missing HTML code there.

Using Anaconda prompt (Windows) or a Terminal window (Mac) run "python app.py". This deploys the app locally on http://127.0.0.1:5000/ (or similar) which you can then view on the browser.

**Bonus Exercise**: Feel free to be creative and redesign the webpage by modifying the .css and .html files.

### Deployment via Heroku

So far you have deployed your model on your local machine. Now we seek to deploy it publicly.

There are two additional files needed for external deployment of your model: 
- requirements.txt includes the versions of packages that are to be used with the app. 
- Procfile specifies the processes to be run on the Heroku dyno (see https://blog.heroku.com/the_new_heroku_1_process_model_procfile). Dynos are virtualised Linux containers used to run web apps. 

In the Procfile you will see mention of `gunicorn`. Gunicorn (Green Unicorn) manages the Flask application. It is a Python HTTP server for applications that follow the Web Server Gateway Interface (WSGI). It allows one to serve a Python application concurrently by running multiple worker processes on a single machine. Further information is at https://docs.gunicorn.org/en/stable/.

To update the `requirements.txt` file use the `__version__` attribute to see the version of packages being used. This ensures that your model is reproducible on other computing environments.

In [48]:
joblib.__version__

'1.0.1'

In [49]:
en_core_web_sm.__version__

'3.1.0'
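Checking each package by hand can be tedious, so the lookup can also be scripted with the standard library's `importlib.metadata`. This is a minimal sketch; the function name `pin_versions` and the list of package names are assumptions, so adjust the list to whatever your app actually imports.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_versions(package_names):
    """Return 'name==version' lines for the packages that are installed."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Skip anything not installed in the current environment
            continue
    return lines


# Hypothetical package list; replace with the packages your app uses
pins = pin_versions(["flask", "gunicorn", "joblib", "scikit-learn", "spacy"])
print("\n".join(pins))
```

Each printed line has the `name==version` form that `requirements.txt` expects.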

Log into your GitHub account and create a new repository containing the following files.

- requirements.txt
- app.py
- Procfile
- model.joblib
- utils.py
- templates/ (folder containing index.html)
- static/ (folder containing css/style.css)

Next sign up for a free account at http://signup.heroku.com (Heroku is a Platform as a Service). You will receive an email link to activate the account.

Once signed into heroku.com click on "Create new app". Choose a unique app name and leave the region as USA.

Next connect via GitHub to the repository you recently created. Then select Manual deploy -> Deploy Branch.

Eventually it will say `https://<your app name>.herokuapp.com/ deployed to Heroku`. Navigate your browser to this location to see if your deployment was successful.

If deployment was unsuccessful it may be necessary to download Heroku's command line interface to view error messages. This is available at https://devcenter.heroku.com/articles/heroku-cli#download-and-install. Type `heroku login` at the command prompt or terminal window to start the command line interface, then `heroku logs --tail --app <your app name>` to view the app's log output.

If you managed to see your app successfully, congratulations! You now know how to deploy an app on the cloud.

Note that when working as part of a larger software system it is good practice to keep code under version control (e.g. with Git and GitHub) and to make use of CI/CD software.

### References

More information on pipelines:
- https://gist.github.com/amberjrivera/8c5c145516f5a2e894681e16a8095b5c
- https://scikit-learn.org/stable/modules/compose.html#pipeline

More on Flask for web app deployment:
- https://flask.palletsprojects.com/en/1.1.x/quickstart/



---



© 2021 Institute of Data

