# Context

Build a machine learning model to determine the variety of the wine being reviewed based on the review text. Create an API to deploy that model.

## Data

We use the wine magazine (Wine Reviews) dataset at https://www.kaggle.com/zynicide/wine-reviews, provided by Kaggle user zackthoutt.

`winemag-data-130k-v2.csv` contains about 130K rows of wine reviews in 13 columns (after the unnamed first column is used as the index).

## Acknowledgements

The data was scraped from WineEnthusiast during the week of November 22nd, 2017.

## Import Necessary Libraries

```python
import numpy as np
import pandas as pd
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
```

## Read Data

```python
# Path is relative to this notebook; adjust it to where the Kaggle CSV was saved
wine = pd.read_csv('../Data/winemag-data-130k-v2.csv', index_col=0)
```

## Explore Data

Perform exploratory data analysis (EDA) on the dataset, starting with the first few rows.

```python
wine.head()
```

## Find the Top 10 Varieties

Find the 10 most common varieties and build a dictionary that maps each variety name to an integer label:

    {
        'Pinot Noir': 0,
        'Chardonnay': 1,
        'Cabernet Sauvignon': 2,
        'Red Blend': 3,
        'Bordeaux-style Red Blend': 4,
        'Riesling': 5,
        'Sauvignon Blanc': 6,
        'Syrah': 7,
        'Rosé': 8,
        'Merlot': 9
    }

```python
wine['variety'].value_counts()[:10]
```

## Create a DataFrame with Only the Top 10 Varieties

The dataset has about 130K rows covering 707 varieties. Select only the rows whose variety is one of the top 10.

```python
# Map each of the 10 most common varieties to a label 0-9 (most common first)
counter = Counter(wine['variety'].tolist())
varieties = {name: idx for idx, (name, _) in enumerate(counter.most_common(10))}
varieties
```

```python
# Keep only rows whose variety is one of the keys of `varieties`
top10_wine = wine[wine['variety'].isin(list(varieties))].reset_index(drop=True)
top10_wine
```

## Create a List of Descriptions

Collect the description of every remaining row into a list.

```python
descriptions = top10_wine['description'].tolist()
descriptions
```

## Create the Target Variable, `variety`

Create an array holding each row's variety number as the target variable, using the dictionary created earlier.

```python
variety_no = np.array([varieties[v] for v in top10_wine['variety'].tolist()])
variety_no
```

```
array([5, 0, 2, ..., 2, 5, 0])
```
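The model predicts integer labels, so turning a prediction back into a variety name means reading the dictionary in reverse. A minimal sketch with a shortened dictionary; `label_to_variety` is a hypothetical name:

```python
# Forward mapping: variety name -> integer label (same shape as `varieties` above)
varieties = {
    'Pinot Noir': 0,
    'Chardonnay': 1,
    'Cabernet Sauvignon': 2,
}

# Inverse mapping: integer label -> variety name, used to decode model output
label_to_variety = {label: name for name, label in varieties.items()}

print(label_to_variety[2])  # → Cabernet Sauvignon
```

Building the inverse once makes each lookup a single dictionary access instead of a scan over all items.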

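## Count Vectorizer

Fit a `CountVectorizer` on the list of descriptions to turn each description into a vector of word counts.

```python
count_vect = CountVectorizer()
x_train_counts = count_vect.fit_transform(descriptions)  # sparse (n_descriptions, n_words) matrix
```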
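To see what `CountVectorizer` produces, here it is on two made-up descriptions (placeholder text, not from the dataset). With the default settings, text is lowercased, words of two or more characters become tokens, and the vocabulary is sorted alphabetically:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Crisp apple and citrus",
    "Ripe cherry and apple",
]

vect = CountVectorizer()
counts = vect.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

# vocabulary_ maps term -> column index; sort terms by their column
terms = sorted(vect.vocabulary_, key=vect.vocabulary_.get)
print(terms)              # → ['and', 'apple', 'cherry', 'citrus', 'crisp', 'ripe']
print(counts.toarray())   # → [[1 1 0 1 1 0]
                          #    [1 1 1 0 0 1]]
```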

## TF-IDF Transformer

Convert the raw word counts into TF-IDF weights with `TfidfTransformer`.

```python
tfidf_transformer = TfidfTransformer()
x_train_tfidf = tfidf_transformer.fit_transform(x_train_counts)
```
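With its default settings (`smooth_idf=True`, `norm='l2'`), `TfidfTransformer` weights each raw count by a smoothed inverse document frequency and then scales each row to unit length. For a term $t$ with raw count $\mathrm{tf}(t,d)$ in description $d$, $n$ descriptions in total, and $\mathrm{df}(t)$ descriptions containing $t$:

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
w(t,d) = \mathrm{tf}(t,d)\,\mathrm{idf}(t),
\qquad
\hat{w}(t,d) = \frac{w(t,d)}{\sqrt{\sum_{t'} w(t',d)^2}}
```

For example, with $n = 2$ descriptions, a term found in both gets $\mathrm{idf} = \ln(3/3) + 1 = 1$, while a term found in only one gets $\ln(3/2) + 1 \approx 1.405$, so common words such as "and" count for less than distinctive ones.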

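## Split the TF-IDF Matrix into Train and Test Sets

```python
# Hold out 30% of the rows for testing (no random_state, so the split changes on every run)
train_x, test_x, train_y, test_y = train_test_split(x_train_tfidf, variety_no, test_size=0.3)
```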

## Naive Bayes

Train a multinomial naive Bayes classifier and measure its accuracy on the test set.

```python
clf = MultinomialNB().fit(train_x, train_y)
NB_score = clf.score(test_x, test_y)  # mean accuracy on the held-out rows
print("Accuracy: %.2f%%" % (NB_score * 100))
```

```
Accuracy: 63.67%
```
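Because the `CountVectorizer` and `TfidfTransformer` above are fitted on all descriptions before the split, the test rows influence the vocabulary and IDF weights, so the reported accuracy can be slightly optimistic. A minimal sketch that splits the raw text first and fits all three steps on the training portion only, using scikit-learn's `Pipeline`; the toy descriptions and labels are placeholders standing in for `descriptions` and `variety_no`:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Placeholder data standing in for `descriptions` and `variety_no`
descriptions = [
    "bright cherry and raspberry", "earthy cherry with soft tannins",
    "buttery oak and ripe apple", "toasty oak with apple and pear",
    "black currant and firm tannins", "cassis cedar and firm tannins",
] * 5
variety_no = [0, 0, 1, 1, 2, 2] * 5

# Split the raw text first so test rows never influence the fitted vocabulary or IDF
train_x, test_x, train_y, test_y = train_test_split(
    descriptions, variety_no, test_size=0.3, random_state=0, stratify=variety_no
)

model = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("nb", MultinomialNB()),
])
model.fit(train_x, train_y)  # every step is fitted on the training split only

print("Accuracy: %.2f%%" % (model.score(test_x, test_y) * 100))
```

A side benefit: the fitted `Pipeline` is a single object, so it could be pickled as one file instead of three.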


## Save the Trained Model as Pickle Files (Serialization)

### Import Necessary Libraries

```python
import pickle
```

### Save the Trained Model and Supporting Objects as Pickle Files

Save the trained classifier together with the fitted vectorizer, the TF-IDF transformer, and the variety dictionary; the API needs all of them to prepare new text and to name its predictions.

```python
with open('count_vect.pkl', 'wb') as f:
    pickle.dump(count_vect, f)

with open('tfidf_transformer.pkl', 'wb') as f:
    pickle.dump(tfidf_transformer, f)

with open('clf.pkl', 'wb') as f:
    pickle.dump(clf, f)

# Save the `varieties` dict (name -> label); the API reverses it to name a predicted label
with open('top10_wine.pkl', 'wb') as f:
    pickle.dump(varieties, f)
```
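Before moving on to Flask, it can be worth checking that the pickles load back and work together, the way a separate program such as the Flask app would use them. A minimal sketch of that round trip, using toy placeholder descriptions and a two-entry dictionary instead of the real data, and a temporary directory instead of the notebook folder:

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for the notebook's fitted objects (placeholder data)
docs = ["bright cherry and raspberry", "buttery oak and ripe apple"]
labels = [0, 1]
varieties = {"Pinot Noir": 0, "Chardonnay": 1}

count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
clf = MultinomialNB().fit(
    tfidf_transformer.fit_transform(count_vect.fit_transform(docs)), labels)

out_dir = tempfile.mkdtemp()
objects = {"count_vect": count_vect, "tfidf_transformer": tfidf_transformer,
           "clf": clf, "varieties": varieties}

# Save each object to its own pickle file
for name, obj in objects.items():
    with open(os.path.join(out_dir, name + ".pkl"), "wb") as f:
        pickle.dump(obj, f)

# Reload every object from disk
loaded = {}
for name in objects:
    with open(os.path.join(out_dir, name + ".pkl"), "rb") as f:
        loaded[name] = pickle.load(f)

# Run one description through the reloaded objects and decode the label
features = loaded["tfidf_transformer"].transform(
    loaded["count_vect"].transform(["oak and apple"]))
label = loaded["clf"].predict(features)[0]
label_to_variety = {v: k for k, v in loaded["varieties"].items()}
print(label_to_variety[label])  # → Chardonnay
```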

## Install Flask

Flask lets us create an API endpoint that wraps the trained model: it receives input (a wine description) through POST requests over HTTP/HTTPS, loads (deserializes) the objects pickled earlier, and returns the predicted variety.

```python
# If Flask is missing, install it first with: pip install flask
import flask
```

### Create a Folder For Your Project

Create a folder named `Wine Predictor`.

### Create a Subfolder `templates` in Your Project Folder

All HTML files go here, because Flask's `render_template` looks for templates in the `templates` folder.

Create `index.html` with a text area for the description and a Submit button that sends it to the API.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Predict Type of Wine</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link href="static/bootstrap.min.css" rel="stylesheet" media="screen">
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
  </head>
  <body>
    <div class="container">
      <h1>Wine Predictor</h1>
    </div>
    <div class="col-lg-12">
        <div class="row">
            <div class="col-lg-8">
                <textarea id='word' rows="15" cols="100"></textarea>
            </div>
            <div class="col-lg-4">
                <div id='wordResult'></div>
            </div>
        </div>
        <div class="row">
            <!-- col-lg-offset-4 is the Bootstrap 3 offset class, offset-lg-4 the Bootstrap 4 one -->
            <div class="col-lg-offset-4 offset-lg-4 col-lg-4">
                <button type='button' id='retrieve'>Submit</button>
            </div>
        </div>
    </div>

    <script>
        // Send the textarea contents as JSON and show the server's reply
        $(document).ready(function() {
            $('#retrieve').click(function() {
                var helloName = $('#word').val();

                $.ajax({
                    url: "/sayHello",
                    type: "POST",
                    data: JSON.stringify({ helloName: helloName }),
                    contentType: "application/json",
                    success: function(response) {
                        // .text() displays the reply as plain text, not as HTML markup
                        $("#wordResult").text(response.html);
                    },
                    error: function(xhr) {
                        alert("Request failed: " + xhr.status + " " + xhr.statusText);
                    }
                });
            });
        });
    </script>
  </body>
</html>
```

### Create a Hello World API First

Let's start with a simple hello world API.

Create `app.py` in your project folder.

**1. Import the necessary libraries.**

```python
from flask import Flask, render_template, jsonify, request
import joblib  # standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23
app = Flask(__name__)
```

**2. Define a function to render `index.html`**

```python
@app.route("/")
def homepage():
    return render_template("index.html")
```

**3. Define a POST endpoint called `sayHello`**

This endpoint receives JSON from the web page and returns a greeting built by concatenating `'Hello, '`, the submitted text, and `'!'`.

```python
# Receive JSON from the client and send a JSON response back
@app.route('/sayHello', methods=['POST'])
def sayHello():
    json_ = request.get_json()
    return jsonify({'html': getHello(json_['helloName'])})


def getHello(name):
    # Use + to build one string; separating the parts with commas would create a tuple
    hellostring = "Hello, " + name + "!"
    return hellostring
```

**4. Define a function to run the app**
```python
if __name__ == "__main__":
    app.run(debug=True)  # debug mode is meant for local development only
```

## Test the Flask App

Start the local development server with the following command, then open http://127.0.0.1:5000/ in a browser:

```
python app.py
```
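Besides the browser, the endpoint can be checked from Python with Flask's built-in test client, which sends requests without starting a server. A minimal sketch that declares a standalone copy of the `sayHello` route so it runs on its own:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/sayHello", methods=["POST"])
def sayHello():
    json_ = request.get_json()
    return jsonify({"html": "Hello, " + json_["helloName"] + "!"})


# The test client issues requests directly against the app object
client = app.test_client()
response = client.post("/sayHello", json={"helloName": "World"})
print(response.status_code, response.get_json())  # → 200 {'html': 'Hello, World!'}
```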

## Create an API to Predict Wine

Now that the API works, let's create another endpoint, `getPrediction`, that predicts the wine variety from a description:

1. Load all pickle files needed to transform the input and predict.
2. Transform the input string with the count vectorizer.
3. Transform the resulting counts with the `TfidfTransformer`.
4. Predict the label with the `MultinomialNB` model.
5. Look up the variety name for that label in the dictionary.
6. Return the variety name.

```python
import joblib


def getPrediction(json_):
    """Return the predicted variety name for one wine description string."""
    input_str = json_

    print('Transforming to Count Vectorizer...')
    count_vect = joblib.load("count_vect.pkl")  # fitted CountVectorizer
    count_vect_input = count_vect.transform([input_str])

    print('Transforming to TFIDF...')
    tfidf_transformer = joblib.load("tfidf_transformer.pkl")  # fitted TfidfTransformer
    transform_input = tfidf_transformer.transform(count_vect_input)

    print('Predict From Model...')
    clf = joblib.load("clf.pkl")  # trained MultinomialNB
    predict_wine = clf.predict(transform_input)

    print('Predicted Value', predict_wine)

    print('Load Top 10 Varieties...')
    # Must contain the `varieties` dict (variety name -> label)
    varieties = joblib.load("top10_wine.pkl")

    # Reverse lookup: find the name whose label matches the prediction
    lValue = predict_wine[0]
    lKey = [key for key, value in varieties.items() if value == lValue][0]

    return lKey
```
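The function above reloads all four pickle files on every call and is not yet attached to a route. A minimal sketch of one way to wire it up: `create_app` and `load_pickle` are illustrative helpers, the objects are loaded once at startup, the page is assumed to keep sending its text under the `helloName` key, and `varieties.pkl` in the comment is a hypothetical file name for the pickled dictionary. Flask's test client exercises the route with toy placeholder objects instead of the real wine model:

```python
import pickle

from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB


def load_pickle(path):
    """Load one object saved earlier with pickle.dump."""
    with open(path, "rb") as f:
        return pickle.load(f)


def create_app(count_vect, tfidf_transformer, clf, varieties):
    """Build a Flask app whose /getPrediction route reuses already-loaded objects."""
    app = Flask(__name__)
    # Invert name -> label once, so each request needs a single dict lookup
    label_to_variety = {label: name for name, label in varieties.items()}

    @app.route("/getPrediction", methods=["POST"])
    def get_prediction():
        text = request.get_json()["helloName"]  # same key that index.html sends
        features = tfidf_transformer.transform(count_vect.transform([text]))
        label = int(clf.predict(features)[0])
        return jsonify({"html": label_to_variety[label]})

    return app


# In app.py the objects would be loaded once at startup, for example:
# app = create_app(load_pickle("count_vect.pkl"), load_pickle("tfidf_transformer.pkl"),
#                  load_pickle("clf.pkl"), load_pickle("varieties.pkl"))  # hypothetical file name

# Self-contained demo with toy objects (placeholder data, not the wine model)
docs = ["bright cherry and raspberry", "buttery oak and ripe apple"]
count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
clf = MultinomialNB().fit(
    tfidf_transformer.fit_transform(count_vect.fit_transform(docs)), [0, 1])

app = create_app(count_vect, tfidf_transformer, clf, {"Pinot Noir": 0, "Chardonnay": 1})
response = app.test_client().post("/getPrediction", json={"helloName": "oak and apple"})
print(response.get_json())  # → {'html': 'Chardonnay'}
```

To call this route from the page, the `url` in the `$.ajax` call of `index.html` would point at `/getPrediction` instead of `/sayHello`.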