# Sentiment Analysis Web Application for Movie Reviews

## from Python Machine Learning text by Sebastian Raschka

#### Clean and check directory (for fresh run)

In [35]:
%%bash
rm -rf aclImdb*
rm movie_data.csv
pwd
ls

/Users/austin/Desktop/Sentiment_Analysis_Web_App
Sentiment Analysis Web App.ipynb


rm: movie_data.csv: No such file or directory


#### Download IMDb movie review dataset

In [36]:
%%bash
wget "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
ls

Sentiment Analysis Web App.ipynb
aclImdb_v1.tar.gz


--2017-12-22 17:10:42--  http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
Resolving ai.stanford.edu... 171.64.68.10
Connecting to ai.stanford.edu|171.64.68.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84125825 (80M) [application/x-gzip]
Saving to: ‘aclImdb_v1.tar.gz’

     0K .......... .......... .......... .......... ..........  0%  356K 3m51s
    50K .......... .......... .......... .......... ..........  0%  702K 2m54s
   100K .......... .......... .......... .......... ..........  0%  718K 2m34s
   150K .......... .......... .......... .......... ..........  0%  744K 2m23s
   200K .......... .......... .......... .......... ..........  0% 16.9M 1m55s
   250K .......... .......... .......... .......... ..........  0%  769K 1m54s
   300K .......... .......... .......... .......... ..........  0% 7.86M 99s
   350K .......... .......... .......... .......... ..........  0% 20.6M 87s
   400K .......... .......... .......... .......... ......

#### Decompress dataset

In [37]:
%%bash
tar -zxf aclImdb_v1.tar.gz
ls

Sentiment Analysis Web App.ipynb
aclImdb
aclImdb_v1.tar.gz


#### Print data documentation

In [38]:
%%bash
cat aclImdb/README

Large Movie Review Dataset v1.0

Overview

This dataset contains movie reviews along with their associated binary
sentiment polarity labels. It is intended to serve as a benchmark for
sentiment classification. This document outlines how the dataset was
gathered, and how to use the files provided. 

Dataset 

The core dataset contains 50,000 reviews split evenly into 25k train
and 25k test sets. The overall distribution of labels is balanced (25k
pos and 25k neg). We also include an additional 50,000 unlabeled
documents for unsupervised learning. 

In the entire collection, no more than 30 reviews are allowed for any
given movie because reviews for the same movie tend to have correlated
ratings. Further, the train and test sets contain a disjoint set of
movies, so no significant performance is obtained by memorizing
movie-unique terms and their associated with observed labels.  In the
labeled train/test sets, a negative review has a score <= 4 out of 10,
and a positive review has a scor

#### Assemble individual text docs from archive into single CSV file

In [39]:
import pyprind # python progress indicator
import pandas as pd
import os
pbar = pyprind.ProgBar(50000) # n iterations, number of docs to read in
labels = {'pos':1, 'neg':0}
df = pd.DataFrame()
for s in ('test','train'):
    for l in ('pos','neg'):
        path='./aclImdb/%s/%s' % (s,l)
        for file in os.listdir(path):
            #  with to: open a file, process its contents, and make sure to close it
            with open(os.path.join(path, file), 'r') as infile:
                txt = infile.read()
            df = df.append([[txt, labels[l]]], ignore_index=True)
            pbar.update()
df.columns = ['review', 'sentiment']

0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:01:18


#### Check dataframe

In [40]:
print(df.shape)
df.head(3)

(50000, 2)


Unnamed: 0,review,sentiment
0,"Based on an actual story, John Boorman shows t...",1
1,This is a gem. As a Film Four production - the...,1
2,"I really like this show. It has drama, romance...",1


#### Shuffle dataframe in order to split into train/test sets later

In [41]:
import numpy as np
np.random.seed(0)
df = df.reindex(np.random.permutation(df.index))

In [42]:
df.head(3)

Unnamed: 0,review,sentiment
11841,My family and I normally do not watch local mo...,1
19602,"Believe it or not, this was at one time the wo...",0
45519,"After some internet surfing, I found the ""Home...",0


#### Store dataset as CSV file for convenience (and confirm)

In [43]:
df.to_csv('./movie_data.csv', index=False)
df = pd.read_csv('./movie_data.csv')
df.head(3) # (notice reindexing)

Unnamed: 0,review,sentiment
0,My family and I normally do not watch local mo...,1
1,"Believe it or not, this was at one time the wo...",0
2,"After some internet surfing, I found the ""Home...",0


## Bag-of-words model (example)

representing text as numerical feature vectors:

1. Create a vocabulary of unique tokens (words from the entire set of documents)
2. Construct a (sparse) feature vector from each document that contains counts of how often each word occurs in the particular document

#### Transforming words into feature vectors

1-gram or unigram model: each item or token in the vocab represents a single word. For example,

**1-gram**: "the", "sun", "is", "shining"  
**2-gram**: "the sun", "sun is", "is shining"  

note: can switch to a 2-gram representation with the parameter `ngram_range=(2,2)`

In [72]:
from sklearn.feature_extraction.text import CountVectorizer

# Example
count = CountVectorizer()
docs = np.array([
    'The sun is shining',
    'The weather is sweet',
    'The sun is shining and the weather is sweet'])
bag = count.fit_transform(docs)
# print vocabulary
count.vocabulary_

{'and': 0, 'is': 1, 'shining': 2, 'sun': 3, 'sweet': 4, 'the': 5, 'weather': 6}

#### raw term frequencies (tf)

In [73]:
# index positions correspond to integer values stored in vocab dictionary
print(bag.toarray())

[[0 1 1 1 0 1 0]
 [0 1 0 0 1 1 1]
 [1 2 1 1 1 2 1]]


#### Assessing word relevancy via *term frequency-inverse document frequency (tf-idf)* 

Downweight words that occur across multiple documents from both classes which typically don't contain useful or discriminatory information

Note: tf-idf is the product of term frequency and inverse document frequency

#### inverse document frequency (idf)

$$\text{idf}(t,d)=\log{\frac{n_d}{1+\text{df}(d,t)}}$$
scikit-learn actually calculates as:
$$\text{idf}(t,d)=\log{\frac{1+n_d}{1+\text{df}(d,t)}}$$
$$\text{tf-idf}(t,d)=\text{tf}(t,d)x(\text{idf}(t,d)+1)$$

$n_d$ is the total number of documents  
$\text{df}(d,t)$ is the number of documents $d$ that contain the term $t$ 

Optionally adding 1 to the denominator assigns a non-zero value to terms that occur in all training examples ($\log{1}=0$).

The log is used to ensure that low document frequencies are not given too much weight.

#### `TfidfTransformer` takes raw term frequencies from `CountVectorizer` as input and transforms them into tf-idfs

for example, before normalization (`norm=None`),   
tf("is",d3)=2    
idf("is",d3)=log((1+3)/(1+3))=0  
tf-idf("is",d3)=2*(0+1)=2  
(then goes to 0.48 after l2 norm)

In [85]:
from sklearn.feature_extraction.text import TfidfTransformer
tfidf = TfidfTransformer() # l2 normalization by default
np.set_printoptions(precision=2)
tfidf.fit_transform(count.fit_transform(docs)).toarray()

array([[ 0.  ,  0.43,  0.56,  0.56,  0.  ,  0.43,  0.  ],
       [ 0.  ,  0.43,  0.  ,  0.  ,  0.56,  0.43,  0.56],
       [ 0.4 ,  0.48,  0.31,  0.31,  0.31,  0.48,  0.31]])

### Cleaning text data (strip unwanted characters)

In [195]:
# Example
df.loc[0, 'review'][:-500]

'My family and I normally do not watch local movies for the simple reason that they are poorly made, they lack the depth, and just not worth our time.<br /><br />The trailer of "Nasaan ka man" caught my attention, my daughter in law\'s and daughter\'s so we too'

#### Remove all punctuations with regex

In [219]:
import re
def preprocessor(text):
    re.sub('<[^>]*>', '', text) # remove HTML markup
    emoticons = re.findall('(?::|;|=)(?:-)?(?:\)|\(|D|P)', text) # find emoticons
    # remove all non-word chars, convert to lowercase, add emoticons to end, remove nose char (-)
    text = re.sub('[\W]+', ' ', text.lower()) +\
        ' '.join(emoticons).replace('-', '')
    return text

In [224]:
# Example 1
print(preprocessor(df.loc[0, 'review'][:-500]), '\n')
# Example 2
print(preprocessor("</a>This :) is :( a test :-)!"))

my family and i normally do not watch local movies for the simple reason that they are poorly made they lack the depth and just not worth our time br br the trailer of nasaan ka man caught my attention my daughter in law s and daughter s so we too 

 a this is a test :) :( :)


In [225]:
# Apply to all movie reviews in dataframe
df['reviw'] = df['review'].apply(preprocessor)

### Processing documents into tokens

In [226]:
# one way to tokenize:
def tokenizer(text):
    return text.split()
tokenizer('runners like running and thus they run')

['runners', 'like', 'running', 'and', 'thus', 'they', 'run']

#### Word stemming (transforming words into root form to map related words to same stem)

In [228]:
# Porter stemming algorithm implemented in NLTK
# oldest/simplest stemming algo, 
# other popular stemming algos are Snowball stemmer and Lancaster stemmer 
from nltk.stem.porter import PorterStemmer
porter = PorterStemmer()
def tokenizer_porter(text):
    return [porter.stem(word) for word in text.split()]
tokenizer_porter('runners like running and thus they run')

['runner', 'like', 'run', 'and', 'thu', 'they', 'run']

**Note**: While stemming can create non-real words, such as thu, (from thus) as shown in the previous example, a technique called lemmatization aims to obtain the canonical (grammatically correct) forms of individual words— the so-called lemmas. However, lemmatization is computationally more dif cult and expensive compared to stemming and, in practice, it has been observed that stemming and lemmatization have little impact on the performance of text classi cation

#### stop-word removal
Stop-words are words that are extremely common and contain little or no useful info to distinguish classes of documents. 
They're usually useful if working with raw or normalized term frequencies rather than tf-idfs which already downweight frequently occurring words.

In [233]:
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = stopwords.words('english')

removing collection member with no package: hmm_treebank_pos_tagger
removing collection member with no package: hmm_treebank_pos_tagger
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/austin/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [239]:
# Example
[w for w in tokenizer_porter('a runner likes running and runs a lot') if w not in stop]

['runner', 'like', 'run', 'run', 'lot']

## Training a logistic regression model to classify movie reviews into positive and negative

In [240]:
# Split into train and test
X_train = df.loc[:25000, 'review'].values
y_train = df.loc[:25000, 'sentiment'].values
X_test = df.loc[25000:, 'review'].values
y_test = df.loc[25000:, 'sentiment'].values

In [245]:
# GridSearchCV for optimal parameters 
from sklearn.grid_search import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
# TfidfVectorizer is equivalent to CountVectorizer followed by TfidfTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(strip_accents=None, lowercase=False, preprocessor=None)
param_grid = [{'vect__ngram_range': [(1,1)],
             'vect__stop_words': [stop, None],
             'vect__tokenizer': [tokenizer, tokenizer_porter],
             'clf__penalty': ['l1', 'l2'],
             'clf__C': [1.0, 10.0, 100.0]},
             {'vect__ngram_range': [(1,1)],
              'vect__stop_words': [stop, None],
              'vect__tokenizer': [tokenizer, tokenizer_porter],
              'vect__use_idf':[False],
              'vect__norm':[None],
              'clf__penalty':['l1', 'l2'],
              'clf__C': [1.0, 10.0, 100.0]}]
lr_tfidf = Pipeline([('vect', tfidf),
                    ('clf',
                    LogisticRegression(random_state=0))])
gs_lr_tfidf = GridSearchCV(lr_tfidf, param_grid, scoring='accuracy', cv=5, verbose=1, n_jobs=-1)
gs_lr_tfidf.fit(X_train, y_train)

Fitting 5 folds for each of 48 candidates, totalling 240 fits


[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:  9.5min
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed: 54.7min
[Parallel(n_jobs=-1)]: Done 240 out of 240 | elapsed: 73.7min finished


GridSearchCV(cv=5, error_score='raise',
       estimator=Pipeline(memory=None,
     steps=[('vect', TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=False, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), norm='l2', preprocessor=None, smooth_idf=True,
 ...nalty='l2', random_state=0, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False))]),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid=[{'vect__ngram_range': [(1, 1)], 'vect__stop_words': [['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's...se_idf': [False], 'vect__norm': [None], 'clf__penalty': ['l1', 'l2'], 'clf__C': [1.0, 10.0, 100.0]}],
       pre_dispatch='2*n_jobs', refit=True, scoring='ac

Note: Above fitting takes over an hour

param_grid consists of two parameter dictionaries, one with default settings to calculate tf-idfs, and one with parameters set for raw term frequencies. For logistic regression classifier trained models using L1 and L2 regularization and a list of values for the inverse-regularization parameter.

In [247]:
print('Best parameter set: %s ' % gs_lr_tfidf.best_params_)

Best parameter set: {'clf__C': 10.0, 'clf__penalty': 'l2', 'vect__ngram_range': (1, 1), 'vect__stop_words': None, 'vect__tokenizer': <function tokenizer at 0x10e8598c8>} 


In [249]:
print('CV Accuracy: %.3f' % gs_lr_tfidf.best_score_)
clf = gs_lr_tfidf.best_estimator_
print('Test Accuracy: %.3f' % clf.score(X_test, y_test))

CV Accuracy: 0.885
Test Accuracy: 0.894


## Working with bigger data - online algorithms and out-of-core learning
out-of-core learning is a technique that allows for working with large datasets

Stochastic gradient descent is an optimization algorithm that updates the model's weights using one sample at a time. We will make use of the `partial_fit` functuon of the `SGDClassifier` in scikit-learn to stream the documents directly from our local drive and train a logistic regression model using small minibatches of documents.

#### Define tokenizer function that cleans the unprocessed text data from `movie_data.csv` and separates it into word tokens while removing stop words

In [1]:
import numpy as np
import re
from nltk.corpus import stopwords
stop = stopwords.words('english')
def tokenizer(text):
    text = re.sub('<[^>]*>', '', text)    
    emoticons = re.findall('(?::|;|=)(?:-)?(?:\)|\(|D|P)', text) # find emoticons
    # remove all non-word chars, convert to lowercase, add emoticons to end, remove nose char (-)
    text = re.sub('[\W]+', ' ', text.lower()) +\
        ' '.join(emoticons).replace('-', '')
    tokenized = [w for w in text.split() if w not in stop]
    return tokenized

#### Define generator function that reads in and returns one document at a time

In [3]:
def stream_docs(path):
    with open(path, 'r') as csv:
        next(csv) # skip header
        for line in csv:
            text, label = line[:-3], int(line[-2]) # -3: ',1\n' , -2: 1
            yield text, label # like return, but one at a time

In [4]:
# Example
next(stream_docs(path='./movie_data.csv'))

('"My family and I normally do not watch local movies for the simple reason that they are poorly made, they lack the depth, and just not worth our time.<br /><br />The trailer of ""Nasaan ka man"" caught my attention, my daughter in law\'s and daughter\'s so we took time out to watch it this afternoon. The movie exceeded our expectations. The cinematography was very good, the story beautiful and the acting awesome. Jericho Rosales was really very good, so\'s Claudine Barretto. The fact that I despised Diether Ocampo proves he was effective at his role. I have never been this touched, moved and affected by a local movie before. Imagine a cynic like me dabbing my eyes at the end of the movie? Congratulations to Star Cinema!! Way to go, Jericho and Claudine!!"',
 1)

#### Define function to take document stream and return particular number of docs specified by size parameter

In [5]:
def get_minibatch(doc_stream, size):
    docs, y = [], []
    try:
        for _ in range(size):
            text, label = next(doc_stream)
            docs.append(text)
            y.append(label)
    except StopIteration: # Signal the end from iterator
        return None, None
    return docs, y

#### Out-of-core text vectorizing

Can't use `CountVectorizer` or `TfidfVectorizer` for out-of-core learning since it requires holding the complete vocaulary in memory. However, can use **[`HashingVectorizer`](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.HashingVectorizer.html)**. It's data-independent and makes use of the Hashing trick via 32-bit MurmurHash3 algorithm. 

Pros:
- it is very low memory scalable to large datasets as there is no need to store a vocabulary dictionary in memory
- it is fast to pickle and un-pickle as it holds no state besides the constructor parameters
- it can be used in a streaming (partial fit) or parallel pipeline as there is no state computed during fit.

Cons:
- there is no way to compute the inverse transform (from feature indices to string feature names) which can be a problem when trying to introspect which features are most important to a model.
- there can be collisions: distinct tokens can be mapped to the same feature index. However in practice this is rarely an issue if n_features is large enough (e.g. 2\**18 for text classification problems).
- no IDF weighting as this would render the transformer stateful.

In [6]:
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier # ‘log’ loss gives logistic regression (probabilistic)
vect = HashingVectorizer(decode_error='ignore',
                         n_features=2**21, # reduce chance of hash collisions but increase coefficients
                         preprocessor=None,
                         tokenizer=tokenizer)

clf = SGDClassifier(loss='log', random_state=1, n_iter=1)
doc_stream = stream_docs(path='./movie_data.csv')

#### Start out-of-core learning

In [7]:
import pyprind
pbar = pyprind.ProgBar(45) # 45*1000 = 45000 training, so 5000 left for testing
classes = np.array([0, 1])
for _ in range(45):
    X_train, y_train = get_minibatch(doc_stream, size=1000)
    if not X_train:
        break
    X_train = vect.transform(X_train) # HashingVectorizer
    clf.partial_fit(X_train, y_train, classes=classes)
    pbar.update()



0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:23


In [8]:
X_test, y_test = get_minibatch(doc_stream, size=5000)
X_test = vect.transform(X_test)
print('Accuracy: %.3f' % clf.score(X_test, y_test))

Accuracy: 0.868


Slightly below the accuracy using gridsearch, but very memory-efficient and took far less time to complete.

Can finally use the last 5,000 documents to update the model:

In [9]:
clf = clf.partial_fit(X_test, y_test)



**Note**: bag-of-words is most common model for text classification, but it doesn't consider sentence structure. A popular extension to bag-of-words is **Latent Dirichlet allocation** which is a topic model that considers the latent semantics of words.

A more modern alternative to bag-of-words model is **word2vec** from Google in 2013. word2vec is an unsupervised algorithm based on neural networks that attempts to automatically learn the relationship between words. Clusters words with similar meanings via vector-spacing and can reporudce certain words using simple vector math, for example, king - man + woman = queen.

# Embedding a Machine Learning Model into a Web App

- Saving the current state of a trained machine learning model
- Using SQLite databases for storage
- Developing a web application using Flask web framework
- Deploying a machine learning application to a public web server

## Serializing (pickling) fitted scikit-learn estimators
Python pickle for model persistence

Create a `movieclassifier` directory to later store files and data for web app  
within directory, create `pkl_objects` subdirectory to save serialized python objects  
use dump method to serialize stopwords and 
'wb' = binary mode for pickle  
protocol=4 for latest pickle (py 3.4) protocol 

In [10]:
import pickle
import os
dest = os.path.join('movieclassifier', 'pkl_objects')
if not os.path.exists(dest):
    os.makedirs(dest)
pickle.dump(stop, open(os.path.join(dest, 'stopwords.pkl'), 'wb'), protocol=4)
pickle.dump(clf, open(os.path.join(dest, 'classifier.pkl'), 'wb'), protocol=4) # logistic clf

In [506]:
%%bash
ls

Sentiment Analysis Web App.ipynb
aclImdb
aclImdb_v1.tar.gz
app.py
movie_data.csv
movieclassifier


Note: A more efficient way to serialize NumPy arrays is to use the alternative joblib library, but to ensure compatibility with the server environment later on, will instead use standard pickle approach.

`HashingVectorizer` doesn't get fitted, so there's no need to pickle it. Just need to save it to a file:

In [None]:
%%writefile vectorizer.py

from sklearn.feature_extraction.text import HashingVectorizer
import re
import os
import pickle

cur_dir = os.path.dirname(__file__)
stop = pickle.load(open(os.path.join(cur_dir, 'pkl_objects', 'stopwords.pkl'), 'rb'))

def tokenizer(text):
    text = re.sub('<[^>]*>', '', text)    
    emoticons = re.findall('(?::|;|=)(?:-)?(?:\)|\(|D|P)', text) # find emoticons
    # remove all non-word chars, convert to lowercase, add emoticons to end, remove nose char (-)
    text = re.sub('[\W]+', ' ', text.lower()) +\
        ' '.join(emoticons).replace('-', '')
    tokenized = [w for w in text.split() if w not in stop]
    return tokenized

vect = HashingVectorizer(decode_error='ignore',
                         n_features=2**21, # reduce chance of hash collisions but increase coefficients
                         preprocessor=None,
                         tokenizer=tokenizer)

#### test to make sure can deserialize objects without error

In [331]:
import pickle
import re
import os
print(os.getcwd())
os.chdir('./movieclassifier/')
print(os.getcwd())

/Users/austin/Desktop/Sentiment_Analysis_Web_App
/Users/austin/Desktop/Sentiment_Analysis_Web_App/movieclassifier


In [332]:
import numpy as np
label = {0:'negative', 1:'positive'}
example = ['I love this movie']
X = vect.transform(example)
print('Prediction: %s\nProbability: %.2f%%' %\
      (label[clf.predict(X)[0]],
      np.max(clf.predict_proba(X))*100))

Prediction: positive
Probability: 86.74%


## Setting up a SQLite database for data storage
Set up a SQLite database to collect optional feedback about the predictions from users of the web app. Can use this feedback to update classification model.

SQLite is an open source SQL database engine that doesn't require a separate server to operate. Can be understood as a single, self-contained database file that allows direct access to storage files.

#### Create new SQLite database inside `movieclassifier` directory and store to example movie reviews

In [340]:
import sqlite3
import os
conn = sqlite3.connect('reviews.sqlite') # connect to SQLite db file (create if doesn't exist)
c = conn.cursor() # create cursor to traverse over db records using SQL syntax
# create new db table with three columns
c.execute('CREATE TABLE review_db'\
          ' (review TEXT, sentiment INTEGER, date TEXT)')
# Example 1
example1 = 'I Love this movie'
# pass tuples to positional arguments (?)
c.execute("INSERT INTO review_db"\
          " (review, sentiment, date) VALUES"\
          " (?, ?, DATETIME('now'))", (example1, 1))
# Example 2
example2 = 'I disliked this movie'
c.execute("INSERT INTO review_db"\
          " (review, sentiment, date) VALUES"\
          " (?, ?, DATETIME('now'))", (example2,0))
conn.commit() # save changes
conn.close() # close connection

#### check if entries have been stored in db table correctly

In [341]:
conn = sqlite3.connect('reviews.sqlite')
c = conn.cursor()
c.execute("SELECT * FROM review_db WHERE date"\
         " BETWEEN '2015-01-01 00:00:00' AND DATETIME('now')")
results = c.fetchall()
conn.close()
print(results)

[('I Love this movie', 1, '2017-12-24 19:56:50'), ('I disliked this movie', 0, '2017-12-24 19:56:50')]


## Developing a web application with Flask
Flask is a microframework, which means that its core is kept lean and simple but can be easily extended with other libraries.

#### Simple web application example to become familiar with the flask API

Basic directory tree:

1st_flask_app_1/  
&nbsp;&nbsp;&nbsp;&nbsp;app.py  
&nbsp;&nbsp;&nbsp;&nbsp;templates/  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;first_app.html  

In [344]:
%%bash
mkdir 1st_flask_app_1
cd 1st_flask_app_1
touch app.py
mkdir templates
pwd
ls
cd templates
touch first_app.html
pwd
ls

/Users/austin/Desktop/Sentiment_Analysis_Web_App/movieclassifier/1st_flask_app_1
app.py
templates
first_app.html


In [363]:
%%writefile 1st_flask_app_1/app.py

from flask import Flask, render_template

app = Flask(__name__) # initialize a new Flask instance (template in same directory)

@app.route('/') # route decorator to specify the URL to trigger execution of the index function
def index(): # renders the HTML file first_app.html located in templates folder
    return render_template('first_app.html')

if __name__ == '__main__': # ensure below
    app.run() # only run app on server when this script is directly executed by Py interpreter

Overwriting 1st_flask_app_1/app.py


In [364]:
cat 1st_flask_app_1/app.py


from flask import Flask, render_template

app = Flask(__name__) # initialize a new Flask instance (template in same directory)

@app.route('/') # route decorator to specify the URL to trigger execution of the index function
def index(): # renders the HTML file frist_app.html located in templates folder
    return render_template('first_app.html')

if __name__ == '__main__': # ensure below
    app.run() # only run app on server when this script is directly executed by Py interpreter

`app.py` contains the main code that will be executed by the Python interpreter to run the Flask web app. `templates` directory is the directory which Flask will look for static HTML files for rendering in the web browser.

In `app.py`, run app as a single module, so initialized a new Flask instance with the argument `__name__` to let Flask know that it can find the HTML template folder in the same directory where it is located.

#### HTML file to render web browser

In [357]:
%%writefile 1st_flask_app_1/templates/first_app.html

<!doctype html>
<html>
  <head>
    <title>First app</title>
  </head>
  <body>
  <div>Hi, this is my first Flask web app!</div>
  </body>
</html>

Overwriting 1st_flask_app_1/templates/first_app.html


`div` element is a block level element. Flask allows running apps locally which is useful for developing and testing web apps before deployment.

double check:

In [360]:
%%bash
cat 1st_flask_app_1/templates/first_app.html


<!doctype html>
<html>
  <head>
    <title>First app</title>
  </head>
  <body>
  <div>Hi, this is my first Flask web app!</div>
  </body>
</html>

#### Start web app

In [None]:
%%bash
cd 1st_flask_app_1/
python3 app.py

### Example 2: Form validation and rendering
extending Flask web app with HTML form elements to collect data from users using WTForms library

New directory structure:

1st_flask_app_2/  
&nbsp;&nbsp;&nbsp;&nbsp;app.py  
&nbsp;&nbsp;&nbsp;&nbsp;static/  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;style.css  
&nbsp;&nbsp;&nbsp;&nbsp;templates/  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;first_app.html  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_formhelpers.html  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;hello.html  

In [374]:
%%bash
mkdir 1st_flask_app_2
cd 1st_flask_app_2
mkdir static
mkdir templates
pwd
ls

/Users/austin/Desktop/Sentiment_Analysis_Web_App/movieclassifier/1st_flask_app_2
static
templates


In [None]:
%%writefile 1st_flask_app_2/app.py

from flask import Flask, render_template, request
from wtforms import Form, TextAreaField, validators

app = Flask(__name__)

class HelloForm(Form):
    sayhello = TextAreaField('', [validators.DataRequired()]) # wtforms
    
@app.route('/')
def index():
    form = HelloForm(request.form)
    return render_template('first_app.html', form=form)

@app.route('/hello', methods=['POST'])
def hello():
    form = HelloForm(request.form)
    if request.method == 'POST' and form.validate():
        name = request.form['sayhello']
        return render_template('hello.html', name=name)
    return render_template('first_app.html', form=form)

if __name__ == '__main__':
    app.run(debug=True)

Extend `index` function with `wtforms` to include a text field to embed into start page using `textAreaField` class, which automatically checks whether a user has provided a valid input text or not. 

Defined new function `hello` to render an HTML page `hello.html` if the form has been validated. `POST` method used to transport the form data to the server in the message body. 

`debug=TRUE` to activate Flask's debugger. Useful feature for developing new web apps.

In [377]:
cat 1st_flask_app_2/app.py


from flask import Flask, render_template, request
from wtforms import Form, TextAreaField, validators

app = Flask(__name__)

class HelloForm(Form):
    sayhello = TextAreaField('', [validators.DataRequired()]) # wtforms
    
@app.route('/')
def index():
    form = HelloForm(request.form)
    return render_template('first_app.html', form=form)

@app.route('/hello', methods=['POST'])
def hello():
    form = HelloForm(request.form)
    if request.method == 'POST' and form.validate():
        name = request.form['sayhello']
        return render_template('hello.html', name=name)
    return render_template('first_app.html', form=form)

if __name__ == '__main__':
    app.run(debug=True)

#### generic macro in file `_formhelpers.html` with Jina2 templating engine to import `first_app.html` file and render text field

In [None]:
%%writefile 1st_flask_app_2/templates/_formhelpers.html

{% macro render_field(field) %}
  <dt>{{ field.label }}
  <dd>{{ field(**kwargs)|safe }} 
  {% if field.errors %}
    <ul class=errors>
    {% for error in field.errors %}
      <li>{{ error }}</li>
    {% endfor %}
    </ul>
  {% endif %}
  </dd>
{% endmacro %}

Jinja2 is a modern and designer-friendly templating language for Python, modelled after Django’s templates. Jinja is Flask's default template engine.

A web template system is used in web publishing to allow web designers and developers to work with web templates for the automatic generation of custom web pages, such as the results from a search. This allows for reuse of the static elements of a web page, while allowing the dynamic elements to be defined based on the parameters of the web request. Web templates are also used in the creation of static content, providing a basic structure and appearance characteristic for web content. It can be present in content management systems, web application frameworks, and HTML editors.

In [381]:
cat 1st_flask_app_2/templates/_formhelpers.html


{% macro render_field(field) %}
  <dt>{{ field.label }}
  <dd>{{ field(**kwargs)|safe }} 
  {% if field.errors %}
    <ul class=errors>
    {% for error in field.errors %}
      <li>{{ error }}</li>
    {% endfor %}
    </ul>
  {% endif %}
  </dd>
{% endmacro %}

#### Cascading style sheets (CSS) file to modify the look of the HTML document

double font size of HTML body elements
`static` is the default directory where Flask looks for static files such as CSS

In [399]:
%%writefile 1st_flask_app_2/static/style.css

body {
    font-size: 2em;
}

Overwriting 1st_flask_app_2/static/style.css


In [383]:
cat 1st_flask_app_2/static/style.css


body {
    font-size: 2em;
}

####  `first_app.html` to render text form where user can enter a name

In [415]:
%%writefile 1st_flask_app_2/templates/first_app.html

<!doctype html>
<html>
  <head>
    <title>First app</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>
    
{% from "_formhelpers.html" import render_field %}

<div>What's your name?</div>
<form method=post action="/hello">
  <dl>
    {{ render_field(form.sayhello) }}
  </dl>
  <input type=submit value='Say Hello' name='submit_btn'>
</form>
  </body>
</html>

Overwriting 1st_flask_app_2/templates/first_app.html


In [416]:
cat 1st_flask_app_2/templates/first_app.html


<!doctype html>
<html>
  <head>
    <title>First app</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>
    
{% from "_formhelpers.html" import render_field %}

<div>What's your name?</div>
<form method=post action="/hello">
  <dl>
    {{ render_field(form.sayhello) }}
  </dl>
  <input type=submit value='Say Hello' name='submit_btn'>
</form>
  </body>
</html>

Load CSS file in header to alter size of text elements in HTML body. In HTML body section imported the form macro from `_formhelpers.html` and rendered the `sayhello` form specified in the `app.py` file. Also added button to same form element so user can submit text field entry.

#### create `hello.html` file to be rendered inside `hello` function defined in `app.py` script to display text that user submitted.

In [403]:
%%writefile 1st_flask_app_2/templates/hello.html

<!doctype html>
<html>
  <head>
    <title>First app</title>
  <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>

<div>Hello {{ name }}</div>
  </body>
</html>

Overwriting 1st_flask_app_2/templates/hello.html


In [396]:
cat 1st_flask_app_2/templates/hello.html


<!doctype html>
<html>
  <head>
    <title>First app</title>
  <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>

<div>Hello {{ name }}</div>
  </body>
</html>

#### Execute Flask web app with `python3 app.py`

In [417]:
%%bash
python3 1st_flask_app_2/app.py

Process is interrupted.


http://127.0.0.1:5000/

<br><br><br>
# Turning the movie classifier into a web application
After getting some familiarity with the basics of Flask web development, now will implement movie classifier into a web application. 

First prompt user to enter a movie review. After the review has been submitted, user will see a new page that shows the predicted class label and the probability of the prediction. User will be able to provide feedback about this prediction with correct/incorrect buttons. Classification model will be updated with respect to user feedback. Will also store movie review text provided by user as well as suggested class label, in a SQLite database for future reference. Third page will be a thank you screen wiyth a submit another review button to redirect user back to start page.

**Directory tree:**  
app.py  
vectorizer.py  
reviews.sqlite  
pkl_objects/  
&nbsp;&nbsp;&nbsp;&nbsp;classifier.pkl  
&nbsp;&nbsp;&nbsp;&nbsp;stopwords.pkl  
static/  
&nbsp;&nbsp;&nbsp;&nbsp;style.css  
templates/  
&nbsp;&nbsp;&nbsp;&nbsp;_formhelpers.html  
&nbsp;&nbsp;&nbsp;&nbsp;results.html  
&nbsp;&nbsp;&nbsp;&nbsp;reviewform.html  
&nbsp;&nbsp;&nbsp;&nbsp;thanks.html  

Already created `vectorizer.py`, `reviews.sqlite`, and `pkl_objects` subdirectory in previous section

In [435]:
ls

[1m[34m1st_flask_app_1[m[m/ [1m[34m__pycache__[m[m/     reviews.sqlite   [1m[34mtemplates[m[m/
[1m[34m1st_flask_app_2[m[m/ [1m[34mpkl_objects[m[m/     [1m[34mstatic[m[m/          vectorizer.py


#### app.py

In [493]:
%%writefile app.py

# First half:
# import python modules and objects, code to unpickle and set up classification model

from flask import Flask, render_template, request
from wtforms import Form, TextAreaField, validators
import pickle
import sqlite3
import os
import numpy as np
# import HashingVectorizer from local dir
from vectorizer import vect

app = Flask(__name__)

###### PREPARING THE CLASSIFIER
cur_dir = os.path.dirname(__file__)
# Note: clf object will be reset to original pickled state if web app restarts
clf = pickle.load(open(os.path.join(cur_dir, 'pkl_objects/classifier.pkl'), 'rb'))
db = os.path.join(cur_dir, 'reviews.sqlite')

# return predicted class label and corresponding probability
def classify(document):
    label = {0: 'negative', 1: 'positive'}
    X = vect.transform([document])
    y = clf.predict(X)[0]
    proba = np.max(clf.predict_proba(X))
    return label[y], proba

# used to update the classifier provided a document and class label
def train(document, y):
    X = vect.transform([document])
    clf.partial_fit(X, [y], classes=np.array([0, 1])) # ADDED classes thing
    
    
# store submitted movie review in SQLite database along with label and timestamp for record    
def sqlite_entry(path, document, y):
    conn = sqlite3.connect(path)
    c = conn.cursor()
    c.execute("INSERT INTO review_db (review, sentiment, date)"\
              " VALUES (?, ?, DATETIME('now'))", (document, y))
    conn.commit()
    conn.close()
    
# Second half:
# ReviewForm class instantiates a TextAreaField which will be rendered in reviewform.html
# template file (landing page) which will be rendered by index function. 
# validators.length(min=15) to require at least 15 characters
# results function fetches contents of submitted web form and passes it to classifier to predict
# gets displayed in rendered results.html template
# feedback function getches predicted class label from results.html template if user clicked on
# correct/incorrect feedback button, then transforms predicted sentiment back into int class 
# label to use used to update classifier via train function implemented above.
# new entry to SQLite db  made via sqlite_entry function if feedback provided
# thanks.html template rendered to thank user for feedback

class ReviewForm(Form):
    moviereview = TextAreaField('', [validators.DataRequired(), validators.length(min=15)])
    
@app.route('/')    
def index():
    form = ReviewForm(request.form)
    return render_template('reviewform.html', form=form)

@app.route('/results', methods=['POST'])
def results():
    form = ReviewForm(request.form)
    if request.method == 'POST' and form.validate():
        review = request.form['moviereview']
        y, proba = classify(review)
        return render_template('results.html', 
                               content=review, 
                               prediction=y, 
                               probability=round(proba*100, 2))
    return render_template('reviewform.html', form=form)

@app.route('/thanks', methods=['POST'])
def feedback():
    feedback = request.form['feedback_button']
    review = request.form['review']
    prediction = request.form['prediction']
    inv_label = {'negative': 0, 'positive': 1}
    y = inv_label[prediction]
    if feedback == 'Incorrect':
        y = int(not(y))
    train(review, y)
    sqlite_entry(db, review, y)
    return render_template('thanks.html')

if __name__ == '__main__':
    app.run(debug=True)

Writing app.py


In [494]:
cat app.py


# First half:
# import python modules and objects, code to unpickle and set up classification model

from flask import Flask, render_template, request
from wtforms import Form, TextAreaField, validators
import pickle
import sqlite3
import os
import numpy as np
# import HashingVectorizer from local dir
from vectorizer import vect

app = Flask(__name__)

###### PREPARING THE CLASSIFIER
cur_dir = os.path.dirname(__file__)
# Note: clf object will be reset to original pickled state if web app restarts
clf = pickle.load(open(os.path.join(cur_dir, 'pkl_objects/classifier.pkl'), 'rb'))
db = os.path.join(cur_dir, 'reviews.sqlite')

# return predicted class label and corresponding probability
def classify(document):
    label = {0: 'negative', 1: 'positive'}
    X = vect.transform([document])
    y = clf.predict(X)[0]
    proba = np.max(clf.predict_proba(X))
    return label[y], proba

# used to update the classifier provided a document and class label
def train(do

#### reviewform.html template (starting page of application)

imported `_formhelpers.html` template. `render_field` function of this macro is used to render a `TextFieldArea` where user can provide movie review and submit it via Submit review button.

In [468]:
%%writefile templates/reviewform.html

<!doctype html>
<html>
  <head>
    <title>Movie Classification</title>
	<link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>

<h2>Please enter your movie review:</h2>

{% from "_formhelpers.html" import render_field %}

<form method=post action="/results">
  <dl>
	{{ render_field(form.moviereview, cols='30', rows='10') }}
  </dl>
  <div>
	  <input type=submit value='Submit review' name='submit_btn'>
  </div>
</form>

  </body>
</html>

Overwriting templates/reviewform.html


#### _formhelpers.html template

In [459]:
%%writefile templates/_formhelpers.html

{% macro render_field(field) %}
  <dt>{{ field.label }}
  <dd>{{ field(**kwargs)|safe }}
  {% if field.errors %}
    <ul class=errors>
    {% for error in field.errors %}
      <li>{{ error }}</li>
    {% endfor %}
    </ul>
  {% endif %}
  </dd>
{% endmacro %}

Overwriting templates/_formhelpers.html


In [460]:
cat templates/_formhelpers.html


{% macro render_field(field) %}
  <dt>{{ field.label }}
  <dd>{{ field(**kwargs)|safe }}
  {% if field.errors %}
    <ul class=errors>
    {% for error in field.errors %}
      <li>{{ error }}</li>
    {% endfor %}
    </ul>
  {% endif %}
  </dd>
{% endmacro %}

#### results.html template

First insert submitted review as well as results of prediction in fields `{{ content }}`, `{{ prediction }}`, annd `{{ probability }}`.  
imported CSS file which limits width of web app contents to 600 pixels and moves correct/incorrect buttons down by 20 pixels  

In [442]:
%%writefile templates/results.html

<!doctype html>
<html>
  <head>
    <title>Movie Classification</title>
  <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>

<h3>Your movie review:</h3>
<div>{{ content }}</div>

<h3>Prediction:</h3>
<div>This movie review is <strong>{{ prediction }}</strong>
  (probability: {{ probability }}%).</div>
    
<div id='button'>
  <form action="/thanks" method="post">
    <input type=submit value='Correct' name='feedback_button'>
    <input type=submit value='Incorrect' name='feedback_button'>
    <input type=hidden value='{{ prediction }}' name='prediction'>
    <input type=hidden value='{{ content }}' name='review'>
  </form>
</div>

<div id='button'>
  <form action="/">
    <input type=submit value='Submit another review'>
  </form>
</div>

  </body>
</html>

Writing templates/results.html


In [443]:
cat templates/results.html


<!doctype html>
<html>
  <head>
    <title>Movie Classification</title>
  <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>

<h3>Your movie review:</h3>
<div>{{ content }}</div>

<h3>Prediction:</h3>
<div>This movie review is <strong>{{ prediction }}</strong>
  (probability: {{ probability }}%).</div>
    
<div id='button'>
  <form action="/thanks" method="post">
    <input type=submit value='Correct' name='feedback_button'>
    <input type=submit value='Incorrect' name='feedback_button'>
    <input type=hidden value='{{ prediction }}' name='prediction'>
    <input type=hidden value='{{ content }}' name='review'>
  </form>
</div>

<div id='button'>
  <form action="/">
    <input type=submit value='Submit another review'>
  </form>
</div>

  </body>
</html>

#### thanks.html template
provides thank you message after user provides feedback and Submit another review button at bottom to redirect to starting page

In [473]:
%%writefile templates/thanks.html

<!doctype html>
<html>
  <head>
    <title>Movie Classification</title>
</head>
  <body>
    
<h3>Thank you for your feedback!</h3>
<div id='button'>
  <form action="/">
    <input type=submit value='Submit another review'>
  </form>
</div>

  </body>
</html>

Overwriting templates/thanks.html


#### CSS file

In [474]:
%%writefile static/style.css

body{
  width:600px;
} 
#button{
  padding-top: 20px;
}

Writing static/style.css


#### Start web app locally to test

In [475]:
%%bash
python3 app.py

Process is interrupted.


http://127.0.0.1:5000/

## Deploying the web application to a public server

https://www.pythonanywhere.com/user/austinmw/webapps/#tab_id_austinmw_pythonanywhere_com

Using **PythonAnywhere** web hosting service which specializes in hosting Python web applications making it extremely simple. Also offers beginner account option that allows running single web application free of charge.

Free beginner account doesn't allow access to remote server via SSH from terminal, so need to use web inteface to manage web application.  
First need to create a new web application for PythonAnywhere account by clicking on Dashboard button in top-right, then Add a new web app in Web tab to create new Python 3.4 Flask web application to be named `movieclassifer`.

Create directories and upload all files, then click Reload in Web tab to apply changes and refresh web app.

Web app will live at http://austinmw.pythonanywhere.com

## Updating the movie review classifier

Predictive model is updated on-the-fly with user feedback, but the updates to `clf` object are rest if the web server crashes or restarts. One option to apply updates permanently would be to pickle the `clf` object once again after each update. This would become computationally inefficient with growing number of users and could corrupt pickle file if users provide feedback simultaneously. 

Alternative solution is to update the predictive model from the feedback data that is being collected in the SQLite database. One option is to download SQLite db from PythonAnywhere server, update `clf` object locally, and upload new pickle file to PythonAnywhere. To update classifier locally create an `update.py` script in the `movieclassifier` directory:

#### update.py

In [13]:
os.chdir('movieclassifier/')

In [14]:
pwd

'/Users/austin/Desktop/Sentiment_Analysis_Web_App/movieclassifier'

In [15]:
%%writefile update.py
import pickle
import sqlite3
import numpy as np
import os

# import HashingVectorizer from local dir
from vectorizer import vect

def update_model(db_path, model, batch_size=10000):
    
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute('SELECT * from review_db')
    results = c.fetchmany(batch_size)
    while results:
        data = np.array(results)
        X = data[:, 0]
        y = data[:, 1].astype(int)
        
        classes = np.array([0, 1])
        X_train = vect.transform(X)
        clf.partial_fit(X_train, y, classes=classes)
        results = c.fetchmany(batch_size)
        
    conn.close()
    return None

cur_dir = os.path.dirname(__file__)

clf = pickle.load(open(os.path.join(cur_dir, 
                                    'pkl_objects',
                                    'classifier.pkl'), 'rb'))
db = os.path.join(cur_dir, 'reviews.sqlite')

update_model(db_path=db, model=clf, batch_size=10000)

# Uncomment the following lines if you are sure that 
# you want to update your classifier.pkl file permanently

#pickle.dump(clf, open(os.path.join(cur_dur, 
#                                   'pkl_objects',
#                                   'classifier,pkl'), 'wb'), protocol=4)

Writing update.py


The update_model function will fetch entries from the SQLite database in batches of 10,000 entries at a time unless the database contains fewer entries. Alternatively, could also fetch one entry at a time by using `fetchone` instead of `fetchmany`, but that would be very inefficient. 

#### Update app.py

Now can import `update_model` function into `app.py` to update classifier from SQLite database every time web app is restarted. Just need to add line of code to import `update_model` function from `update.py` script at top of `app.py`.

In [17]:
%%writefile app.py

# First half:
# import python modules and objects, code to unpickle and set up classification model

from flask import Flask, render_template, request
from wtforms import Form, TextAreaField, validators
import pickle
import sqlite3
import os
import numpy as np
# import HashingVectorizer from local dir
from vectorizer import vect
# import update function from local dir
from update import update_model

app = Flask(__name__)

###### PREPARING THE CLASSIFIER
cur_dir = os.path.dirname(__file__)
# Note: clf object will be reset to original pickled state if web app restarts
clf = pickle.load(open(os.path.join(cur_dir, 'pkl_objects/classifier.pkl'), 'rb'))
db = os.path.join(cur_dir, 'reviews.sqlite')

# return predicted class label and corresponding probability
def classify(document):
    label = {0: 'negative', 1: 'positive'}
    X = vect.transform([document])
    y = clf.predict(X)[0]
    proba = np.max(clf.predict_proba(X))
    return label[y], proba

# used to update the classifier provided a document and class label
def train(document, y):
    X = vect.transform([document])
    clf.partial_fit(X, [y], classes=np.array([0, 1])) # ADDED classes thing
    
    
# store submitted movie review in SQLite database along with label and timestamp for record    
def sqlite_entry(path, document, y):
    conn = sqlite3.connect(path)
    c = conn.cursor()
    c.execute("INSERT INTO review_db (review, sentiment, date)"\
              " VALUES (?, ?, DATETIME('now'))", (document, y))
    conn.commit()
    conn.close()
    
# Second half:
# ReviewForm class instantiates a TextAreaField which will be rendered in reviewform.html
# template file (landing page) which will be rendered by index function. 
# validators.length(min=15) to require at least 15 characters
# results function fetches contents of submitted web form and passes it to classifier to predict
# gets displayed in rendered results.html template
# feedback function getches predicted class label from results.html template if user clicked on
# correct/incorrect feedback button, then transforms predicted sentiment back into int class 
# label to use used to update classifier via train function implemented above.
# new entry to SQLite db  made via sqlite_entry function if feedback provided
# thanks.html template rendered to thank user for feedback

class ReviewForm(Form):
    moviereview = TextAreaField('', [validators.DataRequired(), validators.length(min=15)])
    
@app.route('/')    
def index():
    form = ReviewForm(request.form)
    return render_template('reviewform.html', form=form)

@app.route('/results', methods=['POST'])
def results():
    form = ReviewForm(request.form)
    if request.method == 'POST' and form.validate():
        review = request.form['moviereview']
        y, proba = classify(review)
        return render_template('results.html', 
                               content=review, 
                               prediction=y, 
                               probability=round(proba*100, 2))
    return render_template('reviewform.html', form=form)

@app.route('/thanks', methods=['POST'])
def feedback():
    feedback = request.form['feedback_button']
    review = request.form['review']
    prediction = request.form['prediction']
    inv_label = {'negative': 0, 'positive': 1}
    y = inv_label[prediction]
    if feedback == 'Incorrect':
        y = int(not(y))
    train(review, y)
    sqlite_entry(db, review, y)
    return render_template('thanks.html')

if __name__ == '__main__':
    app.run(debug=True)
    update_model(filepath=db, model=clf, batch_size=10000)

Overwriting app.py


#### Check db for current user feedback

In [31]:
import sqlite3

conn = sqlite3.connect('reviews.sqlite')
c = conn.cursor()

def get_posts():
    c.execute("SELECT * FROM review_db")
    print(c.fetchall())
    
get_posts()
conn.close()

[('I Love this movie', 1, '2017-12-24 19:56:50'), ('I disliked this movie', 0, '2017-12-24 19:56:50'), ('this was a terrible movie', 0, '2017-12-26 22:38:36'), ('this movie was lit fam', 1, '2017-12-26 22:40:06'), ('this movie was lit fam', 1, '2017-12-26 22:40:14'), ('this movie was lit fam', 1, '2017-12-26 22:40:22'), ('this movie was lit fam', 1, '2017-12-26 22:40:31'), ('this movie was lit fam', 1, '2017-12-26 22:41:20'), ('this movie was lit fam', 1, '2017-12-26 22:41:28'), ('this was a fantastic movie', 1, '2017-12-27 20:18:50'), ('this movie was the tits', 1, '2017-12-27 20:29:10')]


### Try web app!

In [30]:
%%bash
python3 app.py

Process is terminated.


http://127.0.0.1:5000/