# Deploying a Machine Learning Model into Production using Flask and Heroku

### Let's create our NLP Model

```python
# Import our dataset
from sklearn.datasets import fetch_20newsgroups
```

### About our dataset

The 20 newsgroups dataset comprises around 18,000 newsgroup posts on 20 topics, split into two subsets: one for training (or development) and the other for testing (or performance evaluation). The split between the training and test sets is based on whether a message was posted before or after a specific date. The 20 topics are:

- `alt.atheism`
- `comp.graphics`
- `comp.os.ms-windows.misc`
- `comp.sys.ibm.pc.hardware`
- `comp.sys.mac.hardware`
- `comp.windows.x`
- `misc.forsale`
- `rec.autos`
- `rec.motorcycles`
- `rec.sport.baseball`
- `rec.sport.hockey`
- `sci.crypt`
- `sci.electronics`
- `sci.med`
- `sci.space`
- `soc.religion.christian`
- `talk.politics.guns`
- `talk.politics.mideast`
- `talk.politics.misc`
- `talk.religion.misc`

https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html


```python
# We extract only 5 categories
categories = ['alt.atheism', 'soc.religion.christian', 'talk.politics.misc', 'comp.graphics', 'sci.med']
news_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
news_test = fetch_20newsgroups(subset='test', categories=categories, shuffle=True)
```

```python
# Display the loaded training data (a scikit-learn Bunch with .data, .target and .target_names)
news_train
```

```python
# Create our vectorizer and turn the raw text into a sparse matrix of token counts
from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer()
X_train_tf = count_vect.fit_transform(news_train.data)
```
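
To make the vectorizer step concrete, here is a plain-Python sketch of roughly what `CountVectorizer()` does with its default settings: lowercase the text, keep tokens of two or more word characters (the default pattern `(?u)\b\w\w+\b`), give each distinct token a column in alphabetical order, and count. The helper names `tokenize`, `fit_vocabulary` and `transform` are hypothetical, and unlike scikit-learn, which returns a SciPy sparse matrix, this sketch returns dense lists.

```python
import re
from collections import Counter

# CountVectorizer's default token pattern: runs of 2+ word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    """Lowercase the text and extract tokens, mimicking CountVectorizer's defaults."""
    return TOKEN_PATTERN.findall(doc.lower())


def fit_vocabulary(docs):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def transform(docs, vocabulary):
    """Return one row of counts per document; tokens outside the vocabulary are ignored."""
    rows = []
    for doc in docs:
        counts = Counter(tok for tok in tokenize(doc) if tok in vocabulary)
        row = [0] * len(vocabulary)
        for tok, n in counts.items():
            row[vocabulary[tok]] = n
        rows.append(row)
    return rows


docs = ["The space shuttle launched", "The shuttle landed, the crew cheered"]
vocab = fit_vocabulary(docs)
print(vocab)
print(transform(docs, vocab))
```

Words that were not seen during fitting are simply dropped at transform time, which is why new text has to go through the same fitted `count_vect` rather than a fresh vectorizer.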

```python
# Re-weight the counts with TF-IDF (Term Frequency-Inverse Document Frequency)
from sklearn.feature_extraction.text import TfidfTransformer

tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_tf)
```
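
The weighting that `TfidfTransformer()` applies with its defaults (`smooth_idf=True`, `norm='l2'`) can be sketched in plain Python. Each count is multiplied by a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number of documents containing term t, and each row is then scaled to unit length. The function name `tfidf` and the toy counts are illustrative assumptions, not part of the notebook.

```python
import math


def tfidf(count_rows):
    """Turn rows of raw term counts into L2-normalised TF-IDF rows."""
    n_docs = len(count_rows)
    n_terms = len(count_rows[0])
    # Document frequency: in how many documents each term appears
    df = [sum(1 for row in count_rows if row[j] > 0) for j in range(n_terms)]
    # Smoothed IDF, as with TfidfTransformer(smooth_idf=True)
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    result = []
    for row in count_rows:
        weighted = [row[j] * idf[j] for j in range(n_terms)]
        # Scale each document vector to length 1 (norm='l2')
        norm = math.sqrt(sum(w * w for w in weighted))
        result.append([w / norm if norm else 0.0 for w in weighted])
    return result


counts = [[1, 1, 0], [1, 0, 1]]  # term 0 appears in both documents
for row in tfidf(counts):
    print([round(w, 3) for w in row])
```

Term 0 occurs in every document, so it receives the lowest IDF and ends up with a smaller weight than the rarer term sharing its row.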

```python
# Using the Naive Bayes algorithm, we train our model
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB().fit(X_train_tfidf, news_train.target)
```
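
Multinomial Naive Bayes scores each class by its log prior plus the document's feature values weighted by per-class log feature probabilities, using additive (Laplace) smoothing with `alpha=1.0`, which is `MultinomialNB`'s default. The sketch below is a simplified re-implementation for illustration only; the function names and the two-feature toy data are hypothetical.

```python
import math


def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log priors and per-class log feature probabilities."""
    classes = sorted(set(y))
    n_features = len(X[0])
    log_prior, log_prob = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        # Prior: fraction of training documents in class c
        log_prior[c] = math.log(len(rows) / len(X))
        # Total weight of each feature in class c, plus additive smoothing
        feature_totals = [sum(r[j] for r in rows) + alpha for j in range(n_features)]
        total = sum(feature_totals)
        log_prob[c] = [math.log(t / total) for t in feature_totals]
    return classes, log_prior, log_prob


def predict_nb(model, x):
    """Pick the class maximising log prior + sum of x_j * log P(feature j | class)."""
    classes, log_prior, log_prob = model
    scores = {
        c: log_prior[c] + sum(xj * lp for xj, lp in zip(x, log_prob[c]))
        for c in classes
    }
    return max(scores, key=scores.get)


# Toy features: [count of a graphics word, count of a church word]
X = [[3, 0], [2, 0], [0, 2], [0, 4]]
y = [1, 1, 3, 3]
model = fit_multinomial_nb(X, y)
print(predict_nb(model, [1, 0]))  # 1
print(predict_nb(model, [0, 1]))  # 3
```

In the notebook the features are fractional TF-IDF weights rather than raw counts; the same formulas apply, which is how `MultinomialNB` can be trained on `X_train_tfidf`.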

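```python
# Accuracy on the same data the model was trained on; this overestimates
# how well the model will do on posts it has never seen
from sklearn.metrics import accuracy_score

Y_predict = clf.predict(X_train_tfidf)
score = accuracy_score(news_train.target, Y_predict)
print(score)
```

```
0.9577516531961793
```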
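
The notebook loads `news_test` but never scores the model on it, so the figure above is training accuracy. One way to evaluate on held-out data, sketched below, is to chain the three steps in a scikit-learn `Pipeline` so that test text goes through exactly the same fitted transformations. The tiny document lists stand in for `news_train`/`news_test` and are made up; with the real data you would fit on `news_train.data` and `news_train.target`, then predict on `news_test.data` and compare with `news_test.target`.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Chain the steps so fit/predict always apply them in the same order
text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])

# Hypothetical toy stand-ins for news_train / news_test
train_docs = [
    "opengl rendering of 3d graphics",
    "texture mapping and shading in graphics",
    "the doctor prescribed a new medicine",
    "patients saw the doctor about symptoms",
]
train_labels = [1, 1, 2, 2]
test_docs = ["graphics rendering with opengl", "the doctor treated patients"]
test_labels = [1, 2]

# Fit only on the training documents, then score on documents the model has not seen
text_clf.fit(train_docs, train_labels)
test_predicted = text_clf.predict(test_docs)
acc = accuracy_score(test_labels, test_predicted)
print(acc)
```

The pipeline also removes the need to call `count_vect.transform` and `tfidf_transformer.transform` by hand before every prediction.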


```python
# Let's test some input statements
docs_new = ['Anti-aliasing was turned on using OpenGL',
            'The earth was made in 7 days',
            'Biden can become the President of U.S']

X_new_counts = count_vect.transform(docs_new)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)
predicted = clf.predict(X_new_tfidf)
predicted
```

```
array([1, 3, 4])
```

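```python
# Map the predicted class indices to readable topic names
# (named `labels` rather than `dict` so the built-in dict type isn't shadowed)
labels = {0: "atheism", 1: "computer graphics", 2: "medical science", 3: "christianity", 4: "politics"}

for p in predicted:
    print(labels[p])
```

```
computer graphics
christianity
politics
```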
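
Hard-coding the label dictionary works, but it can silently drift from the data. The object returned by `fetch_20newsgroups` carries `target_names`, where `target_names[i]` is the category for class `i`, so the mapping can be read from the dataset instead. In the notebook that would be `decode_predictions(predicted, news_train.target_names)`. To stay self-contained, the sketch below substitutes a sorted copy of the five category names, assuming (as the output above suggests) that the loader orders them alphabetically; `decode_predictions` is a hypothetical helper.

```python
def decode_predictions(predicted, target_names):
    """Translate class indices into category names using the dataset's own list."""
    return [target_names[int(i)] for i in predicted]


# Stand-in for news_train.target_names, assumed to be in alphabetical order
target_names = sorted(['alt.atheism', 'soc.religion.christian',
                       'talk.politics.misc', 'comp.graphics', 'sci.med'])
print(decode_predictions([1, 3, 4], target_names))
```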


```python
import pickle

# Save the trained classifier to disk so it can be loaded later (e.g. by the web app)
with open('model.pkl', 'wb') as f:
    pickle.dump(clf, f)
```
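
`model.pkl` holds only the classifier. To classify raw text in a web app you also need the fitted `CountVectorizer` (its vocabulary is learned during `fit`) and the fitted `TfidfTransformer`. A minimal sketch, assuming you are free to change what gets saved: pickle a fitted pipeline that bundles all three. The file name `pipeline.pkl` and the toy training data are hypothetical.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; in the notebook this would be news_train
docs = ["opengl shading and textures", "the doctor treated the patient"]
labels = [1, 2]

pipeline = make_pipeline(CountVectorizer(), TfidfTransformer(), MultinomialNB())
pipeline.fit(docs, labels)

# Save vectorizer, TF-IDF weights and classifier together in one file
with open("pipeline.pkl", "wb") as f:
    pickle.dump(pipeline, f)

# Later (for example inside the Flask app): load it and classify raw text directly
with open("pipeline.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded.predict(["new opengl textures"]))
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code. It is also safest to load the file with the same scikit-learn version that saved it.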

# 2. Quick Intro to Flask
![](https://flask.palletsprojects.com/en/1.1.x/_images/flask-logo.png)

**Flask** is a micro web framework for Python that can be used to build web applications, APIs and full websites. A framework "is a code library that makes a developer's life easier when building reliable, scalable, and maintainable web applications" by providing reusable code or extensions for common operations.
https://flask.palletsprojects.com/en/1.1.x/

```python
# Colab runs on a remote machine, so a Flask server started here isn't reachable
# from your browser. flask-ngrok opens a public ngrok tunnel to it
# (ngrok: secure introspectable tunnels to localhost).
!pip install flask-ngrok
```



## Deploy our first Flask App

Wait around 3-5 seconds for the ngrok link to appear below

```python
from flask_ngrok import run_with_ngrok
from flask import Flask

app = Flask(__name__)
run_with_ngrok(app)   # starts ngrok when the app is run

@app.route("/")
def home():
    return "<h1>We're running Flask!</h1>"

app.run()
```

```
 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Running on http://dc18a7d092e6.ngrok.io
 * Traffic stats available on http://127.0.0.1:4040
127.0.0.1 - - [26/Oct/2020 18:13:03] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [26/Oct/2020 18:13:04] "GET /favicon.ico HTTP/1.1" 404 -
```
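
Serving the model means adding a route that accepts text and returns a prediction. Below is a minimal sketch of a JSON `/predict` endpoint, exercised with Flask's built-in test client so that no server or ngrok tunnel is needed. `predict_topic` is a hypothetical stub; in a real app it would call the model loaded from disk. The route name and the JSON field names are also assumptions. To expose it from Colab you would add `run_with_ngrok(app)` and call `app.run()` as in the cell above.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_topic(text):
    """Hypothetical stand-in for the real model (e.g. a pickled pipeline's predict)."""
    return "computer graphics" if "opengl" in text.lower() else "unknown"


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"text": "..."}; silent=True returns None on bad JSON
    payload = request.get_json(silent=True) or {}
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        return jsonify({"error": "send JSON with a non-empty 'text' field"}), 400
    return jsonify({"text": text, "topic": predict_topic(text)})


# Exercise the endpoint without starting a server, using Flask's test client
with app.test_client() as client:
    resp = client.post("/predict", json={"text": "Anti-aliasing was turned on using OpenGL"})
    status = resp.status_code
    result = resp.get_json()
    print(status, result)
```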


Learn about Flask here - https://flask.palletsprojects.com/en/1.1.x/

# Instructions to clone the Flask App we're Deploying

1. Clone this repository - https://github.com/rajeevratan84/flaskapp
2. Create a new repository on your GitHub account with this code (watch the video tutorial)
3. Go to Heroku and create an account
4. Create a new app and connect it to your GitHub repo
5. Deploy manually from the pipeline


## Instructions for setting this up from scratch on your local machine

1. ```pip install virtualenv```
2. Create a new environment - ```virtualenv myvenv```
3. Activate the virtual environment (Mac/Linux) ```source myvenv/bin/activate``` or in Windows ```myvenv\Scripts\activate```
4. ```cd flask_api``` (create the `flask_api` folder first if it doesn't exist)
5. Create a requirements.txt file in the flask_api directory containing only the following:
```
pandas
Flask
scikit-learn
gunicorn
pytest
```
6. ```pip3 install -r requirements.txt```
7. ```mkdir app_test && cd app_test``` (this is just a test app to make sure your environment and Flask are working correctly)
8. Create an app.py file with the code below and test it with ```python app.py```:

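```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "<h1>We're running Flask!</h1>"

# flask-ngrok is only needed in Colab (and isn't in requirements.txt);
# locally, open http://127.0.0.1:5000/ in your browser.
# Start the development server only when this file is run directly.
if __name__ == "__main__":
    app.run()
```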
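
Since `pytest` is in requirements.txt, a small test can confirm the home page works without starting a server. This sketch defines its own copy of the app so that it runs on its own. In your project you would put the test function in a file such as `test_app.py` (a hypothetical name), replace the app definition with `from app import app`, and run `pytest`. That import assumes `app.py` only calls `app.run()` under an `if __name__ == "__main__":` guard, so that importing it does not start the server.

```python
from flask import Flask

# Stand-in for `from app import app` (the app defined in app.py)
app = Flask(__name__)


@app.route("/")
def home():
    return "<h1>We're running Flask!</h1>"


def test_home_page():
    # The test client sends requests straight to the app, no server needed
    client = app.test_client()
    response = client.get("/")
    assert response.status_code == 200
    assert b"We're running Flask!" in response.data


if __name__ == "__main__":
    test_home_page()
    print("ok")
```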

9. Create a repo on your GitHub (give it a name like 'flaskapp').
10. ```git init``` (if your project folder is not a Git repository yet)
11. ```git remote add origin https://github.com/rajeevratan84/flaskapp.git``` (use your own repo's URL)
12. ```git add -A```
13. ```git commit -m "adding all files to my repo"```
14. ```git branch -M main```
15. ```git push -u origin main```
16. Go back to Heroku and log in
17. Create a new app in Heroku
18. Under Production, give the pipeline a name
19. In Settings, ensure the pipeline is connected to your GitHub repo (grant permissions)
20. Under Manual Deploy, deploy the main branch


# About Heroku's **Procfile**

Heroku requires a Procfile in your app's root directory. It tells Heroku how to run the application. Make sure it is a plain file named exactly `Procfile`, with no extension; `Procfile.txt` is not valid. Each line has the form `<process type>: <command>`: the part to the left of the colon is the process type, and the part to the right is the command that starts that process. Heroku tells the app which port to listen on through the `PORT` environment variable, and each process type can be started, stopped and scaled independently (for example, `heroku ps:scale web=1`).
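
For this app the Procfile would likely be the single line `web: gunicorn app:app`, assuming `app.py` defines a Flask object named `app` (the `web` process type is the one Heroku routes HTTP traffic to). The toy parser below is not Heroku's implementation; `parse_procfile` is a hypothetical helper that only illustrates the format rule: the first colon separates the process type from the command, so the colon inside `app:app` stays part of the command.

```python
def parse_procfile(text):
    """Map each process type in a Procfile to the command that starts it."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only; the command itself may contain colons
        process_type, command = line.split(":", 1)
        processes[process_type.strip()] = command.strip()
    return processes


procfile = "web: gunicorn app:app\n"
print(parse_procfile(procfile))
```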

# About Gunicorn

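We use Gunicorn, a WSGI HTTP server, to run our Flask app. Flask's built-in server (the one started by `app.run()`) is intended for development only, which is why it prints "Use a production WSGI server instead." Gunicorn can run several worker processes and handle concurrent requests. There are other options as well; NGINX, for example, is commonly placed in front of Gunicorn as a reverse proxy.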
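
Gunicorn settings can live in a plain Python file. This sketch of a `gunicorn.conf.py` binds to the port Heroku provides in `PORT` and sets a worker count. Recent Gunicorn versions read `./gunicorn.conf.py` by default; otherwise pass it explicitly with `gunicorn -c gunicorn.conf.py app:app`. The values are illustrative assumptions rather than tuned recommendations.

```python
# gunicorn.conf.py: Gunicorn reads these module-level settings
import os

# Listen on all interfaces at the port Heroku provides; fall back to 8000 locally
bind = "0.0.0.0:" + os.environ.get("PORT", "8000")

# Number of worker processes: WEB_CONCURRENCY if set, otherwise 2
workers = int(os.environ.get("WEB_CONCURRENCY", "2"))

# Restart a worker that spends more than this many seconds on one request
timeout = 30
```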