# UNSPSC Reverse Lookup

Let's walk through the beginning of the lifecycle of an "intelligent" system: a classification model serving a web application in real time.

The purpose of this notebook is to show how far you can get with "off the shelf" techniques when classifying into the [United Nations Standard Products and Services Code®](https://www.unspsc.org/) (UNSPSC).



## Data preparation

In [1]:
# Download data
import wget
import os

data_folder = './data'
os.makedirs(data_folder, exist_ok=True)

url = 'https://www.unspsc.dk/media/1054/unspsc-190501.xlsx'
file = './data/unspsc-190501.xlsx'
wget.download(url, file)

100% [..........................................................................] 5325615 / 5325615

'./data/unspsc-190501 (1).xlsx'

In [20]:
# Read .xlsx file
import pandas as pd
file = './data/unspsc-190501.xlsx'

unspsc = pd.read_excel(file, index_col=None)  # index_col=None: use a default integer index
unspsc.head()

|   | Code | Title | Title [da] | Definition |
|---|------|-------|------------|------------|
| 0 | 10000000 | Live Plant and Animal Material and Accessories... | Levende planter og animalsk materiale og tilbe... | This segment includes live, wild and domestica... |
| 1 | 10100000 | Live animals | Levende dyr | |
| 2 | 10101500 | Livestock | Husdyr | |
| 3 | 10101501 | Cats | Katte | |
| 4 | 10101502 | Dogs | Hunde | |


In [21]:
# Add hierarchy c1 to c4 targets
import numpy as np
unspsc["c1"] = unspsc["Code"].astype(str).str.slice(0,2)
unspsc["c2"] = unspsc["Code"].astype(str).str.slice(0,4)
unspsc.loc[unspsc.c2.str.slice(2,4) == '00', 'c2'] = np.nan
unspsc["c3"] = unspsc["Code"].astype(str).str.slice(0,6)
unspsc.loc[unspsc.c3.str.slice(4,6) == '00', 'c3'] = np.nan
unspsc["c4"] = unspsc["Code"].astype(str).str.slice(0,8)
unspsc.loc[unspsc.c4.str.slice(6,8) == '00', 'c4'] = np.nan
unspsc.head()

|   | Code | Title | Title [da] | Definition | c1 | c2 | c3 | c4 |
|---|------|-------|------------|------------|----|----|----|----|
| 0 | 10000000 | Live Plant and Animal Material and Accessories... | Levende planter og animalsk materiale og tilbe... | This segment includes live, wild and domestica... | 10 | | | |
| 1 | 10100000 | Live animals | Levende dyr | | 10 | 1010.0 | | |
| 2 | 10101500 | Livestock | Husdyr | | 10 | 1010.0 | 101015.0 | |
| 3 | 10101501 | Cats | Katte | | 10 | 1010.0 | 101015.0 | 10101501.0 |
| 4 | 10101502 | Dogs | Hunde | | 10 | 1010.0 | 101015.0 | 10101502.0 |
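UNSPSC codes are hierarchical, with two digits per level: Segment (c1), Family (c2), Class (c3) and Commodity (c4). A trailing `00` pair means the code describes a node higher up the tree, which is why the cell above sets those levels to NaN. The following sketch applies the same rule to a single code. `unspsc_levels` is a hypothetical helper name and is not part of the notebook.

```python
def unspsc_levels(code):
    """Split an 8-digit UNSPSC code into its c1..c4 prefixes.

    A level is None when its own two-digit pair is '00', i.e. the code
    describes a higher node of the hierarchy (same rule as the cell above).
    """
    s = str(code)
    levels = {"c1": s[:2]}
    for name, end in (("c2", 4), ("c3", 6), ("c4", 8)):
        # The last two digits of each prefix decide whether the level is used
        levels[name] = None if s[end - 2:end] == "00" else s[:end]
    return levels

print(unspsc_levels(10101500))
# {'c1': '10', 'c2': '1010', 'c3': '101015', 'c4': None}
```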


### Download word embedding
Here we use a pretrained Danish word embedding in word2vec text format, trained on Danish newspapers from 1880 to 2013.

In [3]:
import wget
url = 'https://loar.kb.dk/bitstream/handle/1902/329/danish_newspapers_1880To2013.txt?sequence=4&isAllowed=y'
#url = 'http://vectors.nlpl.eu/repository/11/0.zip'
file = 'danish_newspapers_1880To2013.txt'
#file = 'English.zip'
wget.download(url, file) 

100% [....................................................................] 6869762980 / 6869762980

'danish_newspapers_1880To2013.txt'

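First, print the first lines of the file to inspect its format. The file uses the word2vec text format. The header line gives the vocabulary size and the vector dimension (here 2,404,836 words and 300 dimensions). Each following line holds a word followed by its vector.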

In [4]:
# Print the first 11 lines of the file to inspect its format
from itertools import islice

with open('danish_newspapers_1880To2013.txt', 'r', encoding="utf8", errors="ignore") as f:
    for line in islice(f, 11):
        print(line)
        print("################")

2404836 300

################
</s> 0.001334 0.001473 -0.001277 -0.001093 0.000456 0.001007 0.000314 0.000070 -0.001201 0.000739 -0.001452 0.000417 -0.000250 -0.000319 -0.001105 -0.000627 0.000860 0.001008 0.000990 0.000532 0.000515 -0.001268 -0.001365 0.001657 0.001267 0.001030 -0.000201 0.001339 -0.000165 0.000245 -0.000050 -0.000994 0.000437 -0.000446 -0.001275 0.001585 0.001460 -0.000365 -0.000075 0.000170 -0.001213 -0.001336 0.001518 0.000021 -0.000861 -0.001014 -0.001025 0.000566 0.000067 0.000444 -0.001405 -0.001269 -0.000043 0.000381 0.000850 -0.001057 0.001360 0.000309 0.000373 -0.000203 0.000995 -0.000763 -0.000699 0.000719 -0.000251 0.000344 0.000602 -0.001363 -0.000661 0.000971 0.001411 0.001311 -0.001016 -0.000703 -0.000303 0.000667 -0.001263 0.000999 0.000929 -0.000533 -0.000517 -0.000746 0.001410 0.001304 -0.000393 0.001405 0.001607 0.000605 0.001661 -0.001037 -0.000511 -0.000702 -0.000969 0.000938 0.000526 0.000142 -0.000731 0.000508 0.001020 0.000065 0.000760 -0.001651 

In [5]:
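import numpy as np

def read_vecs(file):
    """Read a word2vec text-format file into word/index/vector lookup dictionaries."""
    word_to_vec_map = {}
    with open(file, 'r', encoding="utf8", errors="ignore") as f:
        next(f)  # skip the header line "<vocabulary size> <dimensions>"; it is not a word vector
        for line in f:
            line = line.strip().split()
            if not line:
                continue  # ignore empty lines
            curr_word = line[0]
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)

    # Number the words from 1 in sorted order (index 0 is left free, e.g. for padding)
    words_to_index = {}
    index_to_words = {}
    for i, w in enumerate(sorted(word_to_vec_map), start=1):
        words_to_index[w] = i
        index_to_words[i] = w
    return words_to_index, index_to_words, word_to_vec_map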

In [7]:
import numpy as np
word_to_index, index_to_word, word_to_vec_map = read_vecs('danish_newspapers_1880To2013.txt')


- `word_to_index`: dictionary mapping from words to their indices in the vocabulary
- `index_to_word`: dictionary mapping from indices to their corresponding words in the vocabulary
- `word_to_vec_map`: dictionary mapping words to their vector representation.
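`read_vecs` stores every vector as float64. For 2,404,836 words × 300 dimensions, that comes to roughly 5.8 GB of raw vector data before any dictionary overhead, or about 2.9 GB as float32. The sketch below parses the same word2vec text format and makes three changes: it reads the header, stores float32 vectors, and skips lines that do not have the expected number of fields. It is tested on a tiny in-memory sample. `parse_word2vec_text` is a hypothetical name, and switching to float32 is our assumption, not the notebook's choice.

```python
import io
import numpy as np

def parse_word2vec_text(lines):
    """Parse word2vec text format: a '<vocab size> <dims>' header, then 'word v1 ... vN' lines."""
    vocab_size, dims = (int(x) for x in next(lines).split())
    word_to_vec_map = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dims + 1:
            continue  # skip malformed lines instead of storing vectors of the wrong length
        word_to_vec_map[parts[0]] = np.array(parts[1:], dtype=np.float32)
    # Number words from 1 in sorted order, as read_vecs does
    word_to_index = {w: i for i, w in enumerate(sorted(word_to_vec_map), start=1)}
    index_to_word = {i: w for w, i in word_to_index.items()}
    return vocab_size, dims, word_to_index, index_to_word, word_to_vec_map

sample = io.StringIO("3 2\nhund 0.1 0.2\nkat 0.3 0.4\ns 0.5 0.6\n")
vocab_size, dims, w2i, i2w, w2v = parse_word2vec_text(iter(sample))
print(vocab_size, dims, w2i)  # 3 2 {'hund': 1, 'kat': 2, 's': 3}
```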

Run the following cell to check if it works.

In [None]:
word = "hund"
index = 220873
print("the index of", word, "in the vocabulary is", word_to_index[word])

print("the", str(index) + "th word in the vocabulary is", index_to_word[index])

In [15]:
word_to_vec_map["s"]

array([-2.33980e-01,  1.78998e-01,  1.67700e-02,  7.65800e-02,
        1.87588e-01,  3.61010e-02,  4.47900e-02,  1.24178e-01,
       -5.63750e-02, -2.34177e-01,  6.30470e-02, -1.47584e-01,
       -1.46727e-01,  3.48770e-02,  1.88587e-01, -1.38400e-03,
       -5.45660e-02, -1.38300e-02,  5.26770e-02, -1.31301e-01,
        9.13810e-02,  1.55886e-01,  2.19124e-01,  1.84022e-01,
       -2.04227e-01,  1.26770e-01,  4.03200e-03, -4.20050e-02,
        1.90915e-01, -1.25860e-02, -5.69700e-03, -1.88410e-01,
       -1.55600e-02,  1.75784e-01,  4.22050e-02, -8.14840e-02,
        2.60110e-02, -1.49508e-01,  2.90240e-02,  1.06910e-01,
       -4.51270e-02, -6.51920e-02,  2.65647e-01,  1.29000e-04,
       -1.71128e-01,  1.76186e-01,  3.90780e-02, -1.71357e-01,
       -1.94162e-01, -1.28015e-01,  1.88456e-01, -2.69564e-01,
       -2.61286e-01,  1.29039e-01, -1.56106e-01,  2.49521e-01,
        2.15385e-01,  1.58470e-02, -5.67130e-02, -1.15955e-01,
        2.08890e-02, -1.55923e-01, -1.05002e-01, -2.009

Implement a function that averages the word vectors of a UNSPSC description.

In [16]:
import numpy as np
def sentence_to_avg(sentence, word_to_vec_map):
    """
    Converts a sentence (string) into a list of lower-case words, looks up the vector
    of each word and averages them into a single vector encoding the meaning of the sentence.
    
    Arguments:
    sentence -- string, e.g. a Danish UNSPSC title
    word_to_vec_map -- dictionary mapping every word in a vocabulary to its 300-dimensional vector representation
    
    Returns:
    avg -- average vector encoding information about the sentence, numpy array of shape (300,)
    """
    # Step 1: split the sentence into a list of lower-case words
    words = sentence.lower().split()

    # Initialize the average word vector with the same shape as the word vectors
    avg = np.zeros(300)
    
    # Step 2: sum the vectors of known words; out-of-vocabulary words contribute zeros
    for w in words:
        if w in word_to_vec_map:
            avg += word_to_vec_map[w]

    # Divide by the total number of words, guarding against an empty sentence
    avg = avg / max(len(words), 1)
    
    return avg

Test the function

In [18]:
avg = sentence_to_avg("giraf spiser høns til dessert i rummet", word_to_vec_map)
print("avg = ", avg)

avg =  [-0.10909943  0.03165071 -0.05054214  0.24265729  0.07667843  0.04567086
  0.032499    0.16373214 -0.12535729 -0.02718314  0.03562271  0.006839
 -0.109758   -0.10198629  0.07005371 -0.29007186  0.17946171 -0.11337514
 -0.15653657 -0.13415443  0.09230343  0.18953771  0.04034757 -0.26147
 -0.04978414 -0.00067686 -0.04010357  0.05363629  0.13913357 -0.099602
 -0.11634071 -0.21359586 -0.08397586  0.35702    -0.05198343 -0.00658657
  0.08509529 -0.06346986 -0.02447914 -0.13594871  0.02642329  0.142592
  0.15051114  0.089526   -0.17066    -0.059775    0.00261671 -0.18522314
 -0.03358343  0.01038971 -0.00591686 -0.00040429 -0.17490286 -0.11516443
 -0.11476129 -0.02477414 -0.03450343 -0.068796    0.15356129 -0.103149
 -0.20955171  0.15719914 -0.14592771 -0.22004629 -0.13400257  0.24132357
  0.05008129 -0.10735857  0.081912   -0.184674    0.13634171 -0.126649
 -0.056414   -0.22750986  0.05508114  0.25116129 -0.13360929 -0.15988757
  0.06653571 -0.05098086 -0.063329    0.11386514  0.17341
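`sentence_to_avg` divides by the total number of words. Out-of-vocabulary words therefore contribute zero vectors and pull the average towards the origin, which changes the Euclidean distances used below. The sketch below shows a variant that averages only the known words and also reports how much of the sentence the vocabulary covered. `sentence_to_avg_known` is a hypothetical name; whether this works better for UNSPSC titles is untested here.

```python
import numpy as np

def sentence_to_avg_known(sentence, word_to_vec_map, dims=300):
    """Average only the in-vocabulary word vectors and return (average, coverage)."""
    words = sentence.lower().split()
    known = [word_to_vec_map[w] for w in words if w in word_to_vec_map]
    if not known:
        return np.zeros(dims), 0.0  # nothing to average
    coverage = len(known) / len(words)  # share of words found in the vocabulary
    return np.mean(known, axis=0), coverage

toy_map = {"levende": np.array([1.0, 0.0]), "dyr": np.array([0.0, 1.0])}
avg, coverage = sentence_to_avg_known("Levende dyr xyz", toy_map, dims=2)
print(avg, coverage)  # [0.5 0.5] 0.6666666666666666
```

A low coverage value could also be returned by the API as a hint that the query words are unknown to the embedding.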

In [23]:
unspsc['avg_vec'] = unspsc.apply(lambda x: sentence_to_avg(x['Title [da]'], word_to_vec_map), axis = 1)

In [24]:
unspsc.head()

|   | Code | Title | Title [da] | Definition | c1 | c2 | c3 | c4 | avg_vec |
|---|------|-------|------------|------------|----|----|----|----|---------|
| 0 | 10000000 | Live Plant and Animal Material and Accessories... | Levende planter og animalsk materiale og tilbe... | This segment includes live, wild and domestica... | 10 | | | | [-0.17536444444444446, 0.11350777777777779, -0... |
| 1 | 10100000 | Live animals | Levende dyr | | 10 | 1010.0 | | | [-0.1717895, -0.0016765, -0.2357295, 0.2113195... |
| 2 | 10101500 | Livestock | Husdyr | | 10 | 1010.0 | 101015.0 | | [-0.313744, 0.048229, -0.106783, 0.028268, -0.... |
| 3 | 10101501 | Cats | Katte | | 10 | 1010.0 | 101015.0 | 10101501.0 | [-0.452873, 0.053838, 0.037965, 0.094089, -0.2... |
| 4 | 10101502 | Dogs | Hunde | | 10 | 1010.0 | 101015.0 | 10101502.0 | [-0.481852, -0.06862, -0.2121, 0.124607, -0.06... |


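Next, calculate the Euclidean distance between the sentence vector and the vector of every description. The cell below shows two equivalent ways to compute a Euclidean distance, one with NumPy and one with SciPy.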

In [25]:
from scipy.spatial import distance
a = np.array((1, 2, 3))
b = np.array((4, 5, 6))

# method 1
dist_1 = np.linalg.norm(a-b)

# method 2
dist_2 = distance.euclidean(a, b)

print('method 1', dist_1)
print('method 2', dist_2)

method 1 5.196152422706632
method 2 5.196152422706632
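`nnearest` below computes the distances with a row-wise `DataFrame.apply`, which loops over all descriptions in Python. A possible alternative is sketched here: stack the description vectors into one matrix and let NumPy compute every distance in a single call. `distances_to_all` is a hypothetical helper name.

```python
import numpy as np

def distances_to_all(avg_vecs, query_vec):
    """Euclidean distance from query_vec to every vector in avg_vecs, computed in one call."""
    matrix = np.stack(avg_vecs)                      # shape (n_descriptions, dims)
    return np.linalg.norm(matrix - query_vec, axis=1)

vecs = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
d = distances_to_all(vecs, np.array([1.0, 2.0, 3.0]))
print(d)  # first distance is 0.0, second is about 5.196 (as in the cell above)
```

In the notebook, `np.stack(unspsc['avg_vec'].values)` could be computed once and reused for every query. The resulting array can then be assigned as the `Distance` column.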


### Aggregate up through the levels

In [26]:
from scipy import stats

def nnearest(unspsc_df, sentence, word_to_vec_map, agg='min', n=10, level='c4'):
    """
    Finds the n nearest classes based on the Euclidean distance between
    the sentence and each description in the Danish UNSPSC 2019.
    If level is coarser than c4 (c1, c2 or c3), the function groups by the selected level
    and finds the n nearest groups of that level based on the aggregation function.
    The aggregation function can be mean, hmean or min.
    When mean is selected, the function averages all distances for each group/level
    and returns the n closest averages. The same applies to the harmonic mean.
    When min is selected, the function returns the n groups with the single closest
    description.
    
    Arguments:
    unspsc_df -- dataframe with an 'avg_vec' column and the c1..c4 level columns
    sentence -- string to look up
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 300-dimensional vector representation
    agg -- string, 'mean', 'hmean' or 'min'
    n -- int, number of groups to return
    level -- string, 'c1', 'c2', 'c3' or 'c4'
    
    
    Returns:
    df -- dataframe with the title rows of the n nearest groups, sorted by their own Distance
    """
    
    avg_vec = sentence_to_avg(sentence, word_to_vec_map)
    distances = unspsc_df.apply(lambda x: np.linalg.norm(x['avg_vec']-avg_vec), axis = 1)
    # assign() returns a copy, so the caller's DataFrame is not modified
    unspsc_df = unspsc_df.assign(Distance=distances)
    
    cn_index = None
    if agg=='mean':  # Min n by mean
        cn_index = unspsc_df['Distance'].groupby(unspsc_df[level]).mean().nsmallest(n)
    elif agg=='hmean':  # Min n by harmonic mean
        cn_index = unspsc_df['Distance'].groupby(unspsc_df[level]).apply(stats.hmean).nsmallest(n)
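    elif agg=='min':  # Min n by the single closest description
        cn_index = unspsc_df['Distance'].groupby(unspsc_df[level]).min().nsmallest(n)
    else:
        raise ValueError("agg must be 'mean', 'hmean' or 'min'")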
    
    level_int = level[1]
    if int(level_int) < 4:
        deeper_level = ('c'+str(int(level_int)+1))        
        level_unspsc = unspsc_df.loc[unspsc_df[deeper_level].isna()]
    else:
        level_unspsc = unspsc_df
    
    df = level_unspsc.loc[level_unspsc[level].isin(cn_index.index)]
    
    return df.sort_values(by=['Distance'])
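`nnearest` selects the groups by their aggregated score (`cn_index`). It then sorts the returned title rows by each row's own `Distance`, so the displayed order need not follow the ranking that selected the groups. The sketch below keeps the aggregated value as an explicit `Score` column, ordered best first. `rank_groups` and `Score` are names introduced here for illustration.

```python
import pandas as pd
from scipy import stats

AGGREGATIONS = {"mean": "mean", "min": "min", "hmean": stats.hmean}

def rank_groups(df, level, agg="min", n=10):
    """Return the n best groups at `level`, ordered by their aggregated distance.

    Expects a 'Distance' column and the level column (c1..c4) built earlier.
    """
    if agg not in AGGREGATIONS:
        raise ValueError(f"agg must be one of {sorted(AGGREGATIONS)}")
    scores = df.groupby(level)["Distance"].agg(AGGREGATIONS[agg]).nsmallest(n)
    return scores.rename("Score").reset_index()

toy = pd.DataFrame({
    "c1": ["10", "10", "42", "42"],
    "Distance": [5.0, 1.0, 2.0, 2.5],
})
print(rank_groups(toy, "c1", agg="min"))   # group 10 (score 1.0) before 42 (2.0)
print(rank_groups(toy, "c1", agg="mean"))  # group 42 (2.25) before 10 (3.0)
```

The result could be merged onto the title rows with `pd.merge(..., on=level)` and sorted by `Score` rather than `Distance`.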

In [33]:
df = nnearest(unspsc, 'panodil', word_to_vec_map, agg='min', level='c1') #painkillers
df

|   | Code | Title | Title [da] | Definition | c1 | c2 | c3 | c4 | avg_vec | Distance |
|---|------|-------|------------|------------|----|----|----|----|---------|----------|
| 28514 | 50000000 | Food Beverage and Tobacco Products | Mad-, drikke- og tobaksvarer | This segment includes human food and beverages... | 50 | | | | [-0.14833525, -0.0408095, -0.008888, 0.122464,... | 5.937093 |
| 0 | 10000000 | Live Plant and Animal Material and Accessories... | Levende planter og animalsk materiale og tilbe... | This segment includes live, wild and domestica... | 10 | | | | [-0.17536444444444446, 0.11350777777777779, -0... | 5.955304 |
| 22089 | 42000000 | Medical Equipment and Accessories and Supplies | Medicinsk udstyr og tilbehør og forbrugsstoffer | This segment includes the tools and machines, ... | 42 | | | | [-0.3227785, 0.09025966666666667, -0.235348666... | 5.959161 |
| 71577 | 53000000 | Apparel and Luggage and Personal Care Products | Beklædning og bagage og produkter til personli... | This segment includes non-industrial clothing,... | 53 | | | | [-0.310435625, 0.126715625, -0.091275125, 0.26... | 5.979342 |
| 20011 | 41000000 | Laboratory and Measuring and Observing and Tes... | Laboratorie- og måleudstyr til test og observa... | This segment includes the machines, equipment ... | 41 | | | | [-0.22140614285714286, 0.1331292857142857, -0.... | 5.989371 |
| 14999 | 31000000 | Manufacturing Components and Supplies | Produktionskomponenter og forbrugsstoffer | This segment includes components and supplies ... | 31 | | | | [-0.053313333333333345, 0.15585666666666667, -... | 6.12258 |
| 8656 | 12000000 | Chemicals including Bio Chemicals and Gas Mate... | Kemikalier, herunder bio-kemikalier og gas-mat... | This segment includes inorganic and organic ch... | 12 | | | | [-0.045668200000000006, 0.10747060000000001, -... | 6.159912 |
| 10384 | 21000000 | Farming and Fishing and Forestry and Wildlife ... | Landbrug og fiskeri og skovbrug og dyreliv, ma... | This segment includes machines and accessories... | 21 | | | | [-0.1668833, -0.0187565, -0.1032236, 0.2124722... | 6.165094 |
| 62007 | 51000000 | Drugs and Pharmaceutical Products | Drugs and Pharmaceutical Products | This segment includes natural or synthetic mat... | 51 | | | | [-0.5822405, 0.25387325, -0.07832125, 0.258543... | 6.471451 |
| 77112 | 85000000 | Healthcare Services | Sundhedsydelser | This segment includes services associated with... | 85 | | | | [-0.549421, 0.372684, -0.210456, 0.482928, 0.5... | 6.799257 |


In [36]:
import json
print(json.dumps(json.loads(df[['Code', 'Title', 'Definition']].to_json()), indent = 4))

{
    "Code": {
        "11930": 25101713,
        "82443": 92101902,
        "83100": 95121706,
        "75918": 78111815,
        "71680": 53102709,
        "75892": 78111506,
        "23130": 42171602,
        "23144": 42171616,
        "71683": 53102712,
        "77115": 85101501
    },
    "Title": {
        "11930": "Armored ambulance",
        "82443": "Ambulance services",
        "83100": "Ambulance station",
        "75918": "Medical evacuation by ambulance",
        "71680": "Ambulance officers uniforms",
        "75892": "Medical evacuation by air ambulance",
        "23130": "Mobile medical services ambulance cots",
        "23144": "Mobile medical services ambulance cot accessories",
        "71683": "Paramedic uniforms",
        "77115": "Emergency or surgical hospital services"
    },
    "Definition": {
        "11930": "Mine resistant, ambush protected, armored ambulance.",
        "82443": null,
        "83100": "A building where ambulances are stored and maintained"

## Deploy first model
Now we are ready to deploy the model
1. Save the model
2. Create web api for serving
3. Containerize service
4. Deploy

5. Create frontend
6. Deploy frontend
7. Consume API

### 1 Save the model
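The model consists of the averaged vector for each description plus the distance function applied to these vectors. Saving it therefore means pickling two objects: the `unspsc` DataFrame, including its `avg_vec` column, and the word vector map, which is needed to embed new sentences.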

In [37]:
import os
api_folder = '.'
os.makedirs(api_folder, exist_ok=True)

app_folder = "./app"
os.makedirs(app_folder, exist_ok=True)

In [38]:
import pickle
# Store data (serialize)
with open((app_folder+'/unspsc.pickle'), 'wb') as handle:
    pickle.dump(unspsc, handle, protocol=pickle.HIGHEST_PROTOCOL)
    
with open((app_folder+'/word_to_vec_map_eng.pickle'), 'wb') as handle:
    pickle.dump(word_to_vec_map, handle, protocol=pickle.HIGHEST_PROTOCOL)

### 2 Create REST API

In [39]:
%%writefile $app_folder/__init__.py
 

Overwriting ./app/__init__.py


In [40]:
%%writefile $app_folder/app.py
import os
import sys                       
import time
import warnings
import pandas as pd
import json
from flask import Flask
from flask import request
from flask import jsonify
from flask_restplus import Resource,Api,fields
import numpy as np
import pickle
from scipy import stats

api = Api()

app = Flask(__name__)
api.init_app(app)

def sentence_to_avg(sentence, word_to_vec_map):
    """
    Converts a sentence (string) into a list of words (strings). Extracts the representation of each word
    and averages its value into a single vector encoding the meaning of the sentence.
    
    Arguments:
    sentence -- string
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 300-dimensional vector representation
    
    Returns:
    avg -- average vector encoding information about the sentence, numpy-array of shape (300,)
    """
    words = sentence.lower().split()
    # Initialize the average word vector with the same shape as the word vectors
    avg = np.zeros(300)
    # Sum the vectors of known words; out-of-vocabulary words contribute zeros
    for w in words:
        if w in word_to_vec_map:
            avg += word_to_vec_map[w]
    # Divide by the total word count, guarding against an empty sentence
    avg = avg / max(len(words), 1)
    return avg

def nnearest(unspsc_df, sentence, word_to_vec_map, agg='min', n=10, level='c4'):
    """
    Finds the n nearest classes based on the Euclidean distance between
    the sentence and each description in the Danish UNSPSC 2019.
    If level is coarser than c4 (c1, c2 or c3), the function groups by the selected level
    and finds the n nearest groups of that level based on the aggregation function.
    The aggregation function can be mean, hmean or min.
    When mean is selected, the function averages all distances for each group/level
    and returns the n closest averages. The same applies to the harmonic mean.
    When min is selected, the function returns the n groups with the single closest
    description.
    
    Arguments:
    unspsc_df -- dataframe with an 'avg_vec' column and the c1..c4 level columns
    sentence -- string to look up
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 300-dimensional vector representation
    agg -- string, 'mean', 'hmean' or 'min'
    n -- int, number of groups to return
    level -- string, 'c1', 'c2', 'c3' or 'c4'
    
    
    Returns:
    df -- dataframe with the title rows of the n nearest groups, sorted by their own Distance
    """
    
    avg_vec = sentence_to_avg(sentence, word_to_vec_map)
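    distances = unspsc_df.apply(lambda x: np.linalg.norm(x['avg_vec']-avg_vec), axis = 1)
    # assign() returns a copy: the global DataFrame is shared between requests and must not be modified
    unspsc_df = unspsc_df.assign(Distance=distances)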
    
    cn_index = None
    if agg=='mean':  # Min n by mean
        cn_index = unspsc_df['Distance'].groupby(unspsc_df[level]).mean().nsmallest(n)
    elif agg=='hmean':  # Min n by harmonic mean
        cn_index = unspsc_df['Distance'].groupby(unspsc_df[level]).apply(stats.hmean).nsmallest(n)
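    elif agg=='min':  # Min n by the single closest description
        cn_index = unspsc_df['Distance'].groupby(unspsc_df[level]).min().nsmallest(n)
    else:
        raise ValueError("agg must be 'mean', 'hmean' or 'min'")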
    
    level_int = level[1]
    if int(level_int) < 4:
        deeper_level = ('c'+str(int(level_int)+1))        
        level_unspsc = unspsc_df.loc[unspsc_df[deeper_level].isna()]
    else:
        level_unspsc = unspsc_df
    
    df = level_unspsc.loc[level_unspsc[level].isin(cn_index.index)]
    
    return df.sort_values(by=['Distance'])

input_sentence = api.model("input_sentence",{
    "Sentence": fields.String(description="Sentence", required=True),
    "Level": fields.String(description="c1, c2, c3, c4 for level", required=True),
    "Aggregation": fields.String(description="mean, hmean, min for aggregation function used in grouping by level", required=True)
})

# Load data (deserialize)
with open('unspsc.pickle', 'rb') as handle:
    unspsc = pickle.load(handle)

with open('word_to_vec_map_eng.pickle', 'rb') as handle:
    word_to_vec_map = pickle.load(handle)

    
@api.route("/test", endpoint='Test that Service is running')
class Test(Resource):
    def get(self):
        return jsonify('Hello from 8i')
    
@api.route("/reverselookup", endpoint='reverselookup')
class reverselookup(Resource):
    @api.expect(input_sentence)
    def post(self):
              
        sentence = request.json["Sentence"]
        level = request.json["Level"]
        agg = request.json["Aggregation"]
        df = nnearest(unspsc, sentence, word_to_vec_map, agg=agg, level=level)
        
        json_result = json.dumps(json.loads(df[['Code', 'Title [da]', 'Definition']].to_json()), indent = 4)
        
        return jsonify(json_result)

if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=False, use_reloader=False, port=80)

Overwriting ./app/app.py
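The `/reverselookup` handler passes `Level` and `Aggregation` straight to `nnearest`. An unexpected value such as `c5` or `max` therefore surfaces as an internal server error rather than a clear client error. The sketch below shows a small, framework-independent check that could run before `nnearest`. `validate_payload` is a hypothetical helper name, and how its result is turned into an HTTP response is left open.

```python
VALID_LEVELS = {"c1", "c2", "c3", "c4"}
VALID_AGGREGATIONS = {"mean", "hmean", "min"}

def validate_payload(payload):
    """Return a list of human-readable problems with a /reverselookup request body."""
    if not isinstance(payload, dict):
        return ["body must be a JSON object"]
    problems = []
    sentence = payload.get("Sentence")
    if not isinstance(sentence, str) or not sentence.strip():
        problems.append("Sentence must be a non-empty string")
    if payload.get("Level") not in VALID_LEVELS:
        problems.append("Level must be one of " + ", ".join(sorted(VALID_LEVELS)))
    if payload.get("Aggregation") not in VALID_AGGREGATIONS:
        problems.append("Aggregation must be one of " + ", ".join(sorted(VALID_AGGREGATIONS)))
    return problems

print(validate_payload({"Sentence": "tomat", "Level": "c1", "Aggregation": "min"}))  # []
print(validate_payload({"Sentence": "", "Level": "c5", "Aggregation": "min"}))
```

Inside `post()`, a non-empty list could be returned with status 400 instead of calling `nnearest`.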


### 3 Containerize service

In [None]:
!az account set --subscription 26402166-7d8e-426e-af0d-7c1321369cc0
!az configure --defaults acr=theeighthi
!az configure --defaults group=theeighthi_acr
!az acr list -o table

In [41]:
%%writefile $app_folder/Dockerfile
FROM python:3.6-stretch
LABEL maintainer="Mikkel Buchvardt <mbb@kmd.com>"

RUN apt-get update
RUN apt-get install -y \
    apt-utils \
    gettext-base \
    supervisor \
    apt-transport-https \
    locales

COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
# Run flask app
CMD ["sh", "-c", "python3.6 app.py"]

# Port opened: Flask (app.py listens on port 80)
EXPOSE 80

Overwriting ./app/Dockerfile


In [42]:
%%writefile $app_folder/requirements.txt
Flask==1.0.2
pandas==0.24.2
flask-restplus
Flask-Cors
scikit-learn
requests
emoji
numpy
scipy

Overwriting ./app/requirements.txt


In [None]:
!az acr build -r theeighthi -t reverselookupapi:1 ./app

### 4 Deploy

In [None]:
# Do not commit registry credentials to the notebook; fill in the password locally
!az container create -n reverselookupapi --image theeighthi.azurecr.io/reverselookupapi:1 -g theeighthi_acr --ip-address public --dns-name-label reverselookupapi --port 80 --registry-username theeighthi --registry-password <registry-password>

In [None]:
# Test connection
import requests
sentence = "tomat"


payload ={"Sentence": sentence,
                "Level": "c1",
                "Aggregation": "min"}


url = 'http://reverselookupapi.westeurope.azurecontainer.io/reverselookup'
response = requests.post(url, json = payload)

In [None]:
print(response.json())

In [None]:
df = pd.read_json(response.json())

In [None]:
df
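The API serializes the result with `json.dumps` and then wraps that string in `jsonify`. The response body is therefore a JSON string that itself contains JSON. That is why `pd.read_json(response.json())` works: `response.json()` returns the inner JSON text. With only the standard library, decoding takes two `json.loads` calls. In the sketch below, the sample document is made up for illustration and `decode_double_encoded` is a hypothetical name.

```python
import json

def decode_double_encoded(body):
    """Decode a JSON document that was serialized to a string and then JSON-encoded again."""
    inner = json.loads(body)      # first pass: the inner JSON text (a str)
    return json.loads(inner) if isinstance(inner, str) else inner

inner_doc = {"Code": {"0": 50000000}, "Title [da]": {"0": "Mad-, drikke- og tobaksvarer"}}
body = json.dumps(json.dumps(inner_doc, indent=4))   # mimics what the endpoint sends
result = decode_double_encoded(body)
print(result["Code"]["0"])  # 50000000
```

Returning `jsonify(json.loads(df[...].to_json()))` from the endpoint would remove the extra layer of encoding. Clients such as the `pd.read_json(response.json())` cell would then need adjusting.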

In [None]:
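import re
# Only meaningful for endpoints that return a bare number (such as the emojify APIs below);
# the reverselookup response is JSON. The built-in int replaces the deprecated np.int.
int(re.sub('["\n]', '', response.text))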

### 5 Create frontend
Done in Visual Studio Code, outside this notebook.

### 6 Containerize & deploy frontend
Done in Visual Studio Code, outside this notebook.

### 7 Consume API
Done in Visual Studio Code, outside this notebook.



<center>
<img src="images/image_1.png" style="width:900px;height:300px;">
<caption><center> **Figure 2**: Baseline model (Emojifier-V1).</center></caption>
</center>

The input of the model is a string corresponding to a sentence (e.g. "I love you"). In the code, the output is a probability vector of shape (1, 5). This vector is then passed through an argmax to extract the index of the most likely emoji.

The average function is `sentence_to_avg()`, implemented above.

#### Model

After representing the words with the pretrained embedding and averaging them with `sentence_to_avg()`, the rest of the network is unchanged: a single softmax layer.

$$ z^{(i)} = W \cdot avg^{(i)} + b $$
$$ a^{(i)} = \text{softmax}(z^{(i)}) $$
$$ \mathcal{L}^{(i)} = - \sum_{k = 0}^{n_y - 1} Y_{oh,k}^{(i)} \log(a^{(i)}_k) $$
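The three equations can be checked numerically for a single example. The sketch below uses toy sizes (3 classes, 4-dimensional vectors) with random `W` and a random stand-in for `avg`. `softmax` and `one_hot` are our own helper definitions. The last line computes `dz = a - y_oh`, the gradient that `model()` uses for the update.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def one_hot(label, n_classes):
    """One-hot vector with a 1 at position `label`."""
    v = np.zeros(n_classes)
    v[label] = 1.0
    return v

rng = np.random.default_rng(0)
n_y, n_h = 3, 4                                   # toy sizes: 3 classes, 4-dim vectors
W = rng.normal(size=(n_y, n_h)) / np.sqrt(n_h)    # same scaling as in model()
b = np.zeros(n_y)
avg = rng.normal(size=n_h)                        # stands in for sentence_to_avg(...)

z = np.dot(W, avg) + b              # z = W . avg + b
a = softmax(z)                      # a = softmax(z)
y_oh = one_hot(1, n_y)              # the true class is 1
loss = -np.sum(y_oh * np.log(a))    # cross-entropy, equal to -log(a[1])
dz = a - y_oh                       # gradient of the loss with respect to z

print("probabilities:", a, "sum:", a.sum())
print("loss:", loss)
```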


In [None]:
def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Model to train word vector representations in numpy.
    
    Arguments:
    X -- input data, numpy array of sentences as strings, of shape (m, 1)
    Y -- labels, numpy array of integers between 0 and 4 (n_y - 1), of shape (m, 1)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its n_h-dimensional vector representation
    learning_rate -- learning_rate for the stochastic gradient descent algorithm
    num_iterations -- number of iterations
    
    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """
    
    np.random.seed(1)

    # Define number of training examples
    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes
    n_h = len(next(iter(word_to_vec_map.values())))  # dimension of the word vectors
    
    # Initialize parameters using Xavier initialization
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))
    
    # Convert Y to Y_onehot with n_y classes
    Y_oh = convert_to_one_hot(Y, C = n_y) 
    
    # Optimization loop
    for t in range(num_iterations):                       # Loop over the number of iterations
        for i in range(m):                                # Loop over the training examples
            
            # Average the word vectors of the words from the i'th training example
            avg = sentence_to_avg(X[i], word_to_vec_map)

            # Forward propagate the avg through the softmax layer
            z = np.dot(W, avg) + b
            a = softmax(z)

            # Cross-entropy cost of the i'th example, from its one-hot label and the softmax output a
            cost = -np.sum(Y_oh[i] * np.log(a))
            
            # Compute gradients 
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
            db = dz

            # Update parameters with Stochastic Gradient Descent
            W = W - learning_rate * dW
            b = b - learning_rate * db
        
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)

    return pred, W, b
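`model()` and the cells below call `predict`, `softmax` and `convert_to_one_hot`, but none of them is defined in this notebook. As an illustration only, the sketch below shows a `predict`-like helper with the same `(X, Y, W, b, word_to_vec_map)` inputs. It is named `predict_labels` to avoid clashing with the real helper, and it averages words the same way `sentence_to_avg` does.

```python
import numpy as np

def average_vector(sentence, word_to_vec_map, dims):
    """Sum of the known word vectors divided by the total word count (like sentence_to_avg)."""
    words = sentence.lower().split()
    total = np.zeros(dims)
    for w in words:
        if w in word_to_vec_map:
            total += word_to_vec_map[w]
    return total / max(len(words), 1)

def predict_labels(X, Y, W, b, word_to_vec_map):
    """Hypothetical stand-in for the missing predict(): predicted class per sentence plus accuracy."""
    dims = W.shape[1]  # n_h, the word-vector dimension
    # argmax of z equals argmax of softmax(z), so softmax can be skipped here
    preds = np.array([int(np.argmax(np.dot(W, average_vector(s, word_to_vec_map, dims)) + b))
                      for s in np.ravel(X)])
    print("Accuracy:", np.mean(preds == np.ravel(Y)))
    return preds

toy_map = {"love": np.array([1.0, 0.0]), "ball": np.array([0.0, 1.0])}
toy_preds = predict_labels(np.array(["i love you", "play ball"]), np.array([0, 1]),
                           np.eye(2), np.zeros(2), toy_map)
```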

We can now train the second model and learn the softmax parameters (W, b).

In [None]:
pred, W, b = model(X_train, Y_train, word_to_vec_map)

Let's compare the accuracy on the training set with the accuracy on the test set.

In [None]:
print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)

Random guessing would have had 20% accuracy given that there are 5 classes. This is pretty good performance after training on only 127 examples. 

In the training set, the algorithm saw the sentence "*I love you*" with the label ❤️.
Let's check whether the model generalizes to "adore", even though that word does not appear in the training set.



In [None]:
X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4],[3]])

pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)

Note that it does not get "not feeling happy" right. This algorithm ignores word ordering, so it is not good at understanding phrases like "not happy."

Printing the confusion matrix can also help understand which classes are more difficult for your model. A confusion matrix shows how often an example whose label is one class ("actual" class) is mislabeled by the algorithm with a different class ("predicted" class). 
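`pd.crosstab` in the next cell builds this table with labels and margins. The counting underneath is simple: rows are actual classes, columns are predicted classes, and each example adds one to its cell. The NumPy sketch below illustrates that counting. `confusion_matrix` is our own helper here, not an import from a library.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """Count how often each actual class (rows) is predicted as each class (columns)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(np.ravel(actual), np.ravel(predicted)):
        cm[int(a), int(p)] += 1
    return cm

cm = confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 0], n_classes=3)
print(cm)
# [[1 1 0]
#  [0 1 0]
#  [1 0 1]]
```

The diagonal holds the correct predictions, so `np.trace(cm) / cm.sum()` is the accuracy.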

In [None]:
import pandas as pd
print('           '+ label_to_emoji(0)+ '   ' + label_to_emoji(1) + '   ' +  label_to_emoji(2)+ '   ' + label_to_emoji(3)+'   ' + label_to_emoji(4))
# ravel() flattens both arrays to 1-D instead of hard-coding the test set size
print(pd.crosstab(Y_test.ravel(), pred_test.ravel(), rownames=['Actual'], colnames=['Predicted'], margins=True))
plot_confusion_matrix(Y_test, pred_test)

### CI/CD
Now it is time to put the second model into the production system, next to the first one.


In [None]:
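# Save model v2
import os
import pickle
import numpy as np

os.makedirs("api", exist_ok=True)  # the api folder must exist before saving into it
np.savetxt("api/W2.csv", W, delimiter=",")
np.savetxt("api/b2.csv", b, delimiter=",")
# vectorizer comes from the first (bag-of-words) model and is not defined in this notebook
with open('api/vectorizer2.pkl', 'wb') as fout:
    pickle.dump(vectorizer, fout)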

In [None]:
app_folder = "./api/app2"
os.makedirs(app_folder, exist_ok=True)

In [None]:
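def predict(X, W, b, word_to_vec_map):
    """Predict the class index of a single sentence X with the softmax model (W, b)."""
    words = X.lower().split()
    avg = np.zeros(W.shape[1])          # same dimension as the word vectors
    for w in words:
        if w in word_to_vec_map:         # skip out-of-vocabulary words instead of raising KeyError
            avg += word_to_vec_map[w]
    avg = avg / max(len(words), 1)
    # Forward propagation
    Z = np.dot(W, avg) + b
    A = softmax(Z)
    return np.argmax(A)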

In [None]:
predict("ball", W, b, word_to_vec_map)

In [None]:
%%writefile $app_folder/__init__.py
import os
import sys                       
import time
import warnings
import pandas as pd
import json
from flask import Flask
from flask import request
from flask import jsonify
from flask_restplus import Resource,Api,fields
import numpy as np
import pickle
from sklearn.feature_extraction.text import CountVectorizer

api = Api()

app = Flask(__name__)
api.init_app(app)

input_sentence = api.model("input_sentence",{
    "Sentence": fields.String(description="Sentence", required=True)
})



b = np.loadtxt("b1.csv", delimiter = ",")
W = np.loadtxt("W1.csv", delimiter = ",")
vec = pickle.load(open( "vectorizer1.pkl", "rb"))

def read_glove_vecs(glove_file):
    with open(glove_file, 'r', encoding="utf8", errors="ignore") as f:
        words = set()
        word_to_vec_map = {}
        for line in f:
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)
        
        i = 1
        words_to_index = {}
        index_to_words = {}
        for w in sorted(words):
            words_to_index[w] = i
            index_to_words[i] = w
            i = i + 1
    return words_to_index, index_to_words, word_to_vec_map

b2 = np.loadtxt("b2.csv", delimiter = ",")
W2 = np.loadtxt("W2.csv", delimiter = ",")
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('glove.6B.50d.txt')

 
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
        
def pred(X):
    # One-hot (bag-of-words) transform with the first model's vectorizer
    X_oh = vec.transform(X).toarray()
    # Forward propagation
    Z = np.dot(X_oh, np.transpose(W)) + b
    A = softmax(Z)
    return np.argmax(A)

def predict(X):
    words = X.lower().split()
    avg = np.zeros(W2.shape[1])  # same dimension as the word vectors
    for w in words:
        if w in word_to_vec_map:  # skip out-of-vocabulary words instead of raising KeyError
            avg += word_to_vec_map[w]
    avg = avg / max(len(words), 1)
    # Forward propagation
    Z = np.dot(W2, avg) + b2
    A = softmax(Z)
    return np.argmax(A)
    
    
    
@api.route("/test", endpoint='Test that Service is running')
class Test(Resource):
    def get(self):
        return jsonify('Hello from KMD!')
    
@api.route("/emojify", endpoint='emojify')
class Emojify(Resource):
    @api.expect(input_sentence)
    def post(self):
        
      
        sentence = request.json["Sentence"]
        emoji = pred([sentence])
        
        return jsonify(str(emoji))
    
@api.route("/emojify2", endpoint='emojify2')
class Emojify2(Resource):
    @api.expect(input_sentence)
    def post(self):
        
      
        sentence = request.json["Sentence"]
        emoji = predict(sentence)
        
        return jsonify(str(emoji))

In [None]:
%%writefile $api_folder/emojifyapi2.py
from app2 import app

if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=False, use_reloader=False, port = '80')

In [None]:
# Set the CMD in ./api/Dockerfile to run emojifyapi2.py before building
!az acr build -r letbetalacraikkekic -t emojifyapi:2 ./api

In [None]:

# Do not commit registry credentials to the notebook; fill in the password locally
!az container create -n emojifyapi2 --image letbetalacraikkekic.azurecr.io/emojifyapi:2 -g LetBetaling-Container-Registry --ip-address public --dns-name-label mbbkeynoteapi2 --port 80 --registry-username letbetalacraikkekic --registry-password <registry-password>

## Third iteration: Using LSTMs in Keras

This model takes word ordering into account. It continues to use pretrained word embeddings to represent words, but feeds them into an LSTM whose job is to predict the most appropriate emoji.

In [None]:
import numpy as np
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation, Embedding

np.random.seed(1)

### 2.1 - Overview of the model


<img src="images/emojifier-v2.png" style="width:700px;height:400px;"> <br>
<caption><center> **Figure 3**: Emojifier-V2. A 2-layer LSTM sequence classifier. </center></caption>



### 2.2 - Keras and mini-batching

We want to train the Keras model using mini-batches. However, most deep learning frameworks require all sequences in the same mini-batch to have the same length. This is what makes vectorization work. A 3-word sentence and a 4-word sentence need different computations (one takes 3 LSTM steps, the other takes 4), so they cannot be processed in the same vectorized operation.

The common solution is padding: set a maximum sequence length and pad all sequences to that length. For example, if the maximum sequence length is 20, we can pad every sentence with "0"s so that each input sentence has length 20. The sentence "i love you" would then be represented as $(e_{i}, e_{love}, e_{you}, \vec{0}, \vec{0}, \ldots, \vec{0})$. In this example, sentences longer than 20 words would have to be truncated. A simple way to choose the maximum sequence length is to pick the length of the longest sentence in the training set.
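
Here is a minimal sketch of right-padding and truncation on lists of word indices. The indices and the name `pad_indices` are hypothetical. Keras's own `keras.preprocessing.sequence.pad_sequences` does the same job but pads and truncates at the front by default.

```python
import numpy as np

def pad_indices(sequences, max_len, pad_value=0):
    """Right-pad (or truncate) lists of word indices to a fixed length."""
    batch = np.full((len(sequences), max_len), pad_value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        trimmed = seq[:max_len]              # drop words beyond max_len
        batch[row, :len(trimmed)] = trimmed  # the rest of the row stays pad_value
    return batch

# Hypothetical word indices for "i love you" and for a 6-word sentence
batch = pad_indices([[5, 12, 7], [3, 9, 4, 8, 2, 6]], max_len=5)
# batch.tolist() == [[5, 12, 7, 0, 0], [3, 9, 4, 8, 2]]
```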


### 2.3 - The Embedding layer

In Keras, the embedding matrix is represented as a "layer", and maps positive integers (indices corresponding to words) into dense vectors of fixed size (the embedding vectors). It can be trained or initialized with a pretrained embedding. Because the training set is quite small, we will not update the word embeddings but will instead leave their values fixed.

The `Embedding()` layer takes an integer matrix of size (batch size, max input length) as input. This corresponds to sentences converted into lists of indices (integers), as shown in the figure below.

<img src="images/embedding1.png" style="width:700px;height:250px;">
<caption><center> **Figure 4**: Embedding layer. This example shows the propagation of two examples through the embedding layer. Both have been zero-padded to a length of `max_len=5`. The final dimension of the representation is  `(2,max_len,50)` because the word embeddings we are using are 50 dimensional. </center></caption>

The largest integer (i.e. word index) in the input must be smaller than the layer's `input_dim`, which Keras defines as the maximum index + 1. The layer outputs an array of shape (batch size, max input length, dimension of word vectors).
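
Conceptually, an embedding layer is a row lookup into a matrix. This sketch reproduces the shapes from Figure 4 with NumPy fancy indexing. It uses a toy 6 x 4 matrix instead of the 400,001 x 50 GloVe matrix, and the index values are made up.

```python
import numpy as np

vocab_size, emb_dim = 6, 4                       # toy sizes; GloVe here is 400,001 x 50
emb_matrix = np.arange(vocab_size * emb_dim, dtype=float).reshape(vocab_size, emb_dim)
emb_matrix[0] = 0.0                              # row 0 used for padding

indices = np.array([[1, 4, 0, 0, 0],
                    [2, 3, 5, 0, 0]])            # (batch size, max_len), zero-padded
embedded = emb_matrix[indices]                   # one embedding row per index
# embedded.shape == (2, 5, 4): (batch size, max_len, embedding dimension)
```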

The first step is to convert all your training sentences into lists of indices, and then zero-pad all these lists so that their length is the length of the longest sentence. 


In [None]:
def sentences_to_indices(X, word_to_index, max_len):
    """
    Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
    The output shape should be such that it can be given to `Embedding()` (described in Figure 4). 
    
    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary containing the each word mapped to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this. 
    
    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """
    
    m = X.shape[0]                                   # number of training examples
    
    # Initialize X_indices as a matrix of zeros; unused positions stay 0 (padding)
    X_indices = np.zeros((m, max_len))
    
    for i in range(m):                               # loop over training examples
        # Convert the ith sentence to lower case and split it into words
        sentence_words = X[i].lower().split()
        
        # Set the (i, j)th entry of X_indices to the index of the jth word,
        # ignoring any words beyond max_len
        for j, w in enumerate(sentence_words[:max_len]):
            X_indices[i, j] = word_to_index[w]
    
    return X_indices

Run the following cell to check what `sentences_to_indices()` does, and check your results.

In [None]:
X1 = np.array(["funny lol", "lets play football", "food is ready for you"])
X1_indices = sentences_to_indices(X1,word_to_index, max_len = 5)
print("X1 =", X1)
print("X1_indices =", X1_indices)

**Expected Output**:

<table>
    <tr>
        <td>
            **X1 =**
        </td>
        <td>
           ['funny lol' 'lets play football' 'food is ready for you']
        </td>
    </tr>
    <tr>
        <td>
            **X1_indices =**
        </td>
        <td>
           [[ 155345.  225122.       0.       0.       0.] <br>
            [ 220930.  286375.  151266.       0.       0.] <br>
            [ 151204.  192973.  302254.  151349.  394475.]]
        </td>
    </tr>
</table>

Next, we build the `Embedding()` layer in Keras using pre-trained word vectors. Once it is built, you pass the output of `sentences_to_indices()` to it as input, and the `Embedding()` layer returns the word embeddings for each sentence.

We need to carry out the following steps:
1. Initialize the embedding matrix as a numpy array of zeros with the correct shape.
2. Fill in the embedding matrix with all the word embeddings extracted from `word_to_vec_map`.
3. Define the Keras embedding layer with [Embedding()](https://keras.io/layers/embeddings/). Make this layer non-trainable by setting `trainable = False` when calling `Embedding()`. Setting `trainable = True` would let the optimization algorithm modify the values of the word embeddings.
4. Set the embedding weights to be equal to the embedding matrix.

In [None]:
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
    
    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """
    
    vocab_len = len(word_to_index) + 1                  # +1 so the largest word index fits (Keras input_dim = max index + 1)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # dimensionality of the GloVe word vectors (= 50)
    
    # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, emb_dim)
    emb_matrix = np.zeros((vocab_len, emb_dim))
    
    # Set row "index" of the embedding matrix to the vector of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Define the Keras embedding layer with the correct input/output sizes and make it non-trainable
    embedding_layer = Embedding(vocab_len, emb_dim, trainable = False)

    # Build the layer; this is required before setting its weights. Do not modify the "None".
    embedding_layer.build((None,))
    
    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])
    
    return embedding_layer

In [None]:
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights[0][1][3] =", embedding_layer.get_weights()[0][1][3])

**Expected Output**:

<table>
    <tr>
        <td>
            **weights[0][1][3] =**
        </td>
        <td>
           -0.3403
        </td>
    </tr>
</table>

### 2.4 - Building the final model

<img src="images/emojifier-v2.png" style="width:700px;height:400px;"> <br>
<caption><center> **Figure 3**: Emojifier-v2. A 2-layer LSTM sequence classifier. </center></caption>

In [None]:
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.
    
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape = input_shape, dtype = "int32")
    
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)   
    
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # The output must be a batch of sequences so the next LSTM layer can consume it
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a rate of 0.2
    X = Dropout(0.2)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # This time the output is a single hidden state per sentence, not a batch of sequences
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a rate of 0.2
    X = Dropout(0.2)(X)
    # Propagate X through a Dense layer to get back a batch of 5-dimensional vectors
    X = Dense(units = 5)(X)
    # Add a softmax activation (applied exactly once)
    X = Activation('softmax')(X)
    
    # Create Model instance which converts sentence_indices into X
    model = Model(inputs=sentence_indices, outputs=X)
    
    
    return model
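
Why apply softmax only once? Stacking `Dense(activation='softmax')` with a further `Activation('softmax')` squashes the probabilities toward uniform. The argmax, and hence the predicted emoji, is unchanged, but the probabilities fed into categorical cross-entropy carry a much weaker training signal. Here is a small NumPy check with made-up logits:

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.0, -1.0, -2.0])   # made-up scores for 5 emoji classes
once = softmax(logits)    # confident: top class gets about 0.93
twice = softmax(once)     # inputs now lie in [0, 1], so the output is close to uniform
```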

Run the following cell to create your model and check its summary. Because all sentences in the dataset have fewer than 10 words, we chose `max_len = 10`. The summary should show 20,223,927 parameters in total: 20,000,050 are non-trainable (the word embeddings) and the remaining 223,877 are trainable. Because our vocabulary has 400,001 words (with valid indices from 0 to 400,000), there are 400,001\*50 = 20,000,050 non-trainable parameters.
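
The trainable count can be checked by hand. An LSTM layer has four gates, and each gate has an input kernel, a recurrent kernel and a bias. That gives 4 × (units × (input_dim + units) + units) weights, assuming Keras's default `use_bias=True`. Dropout and Activation layers have no weights. The helper names below are hypothetical.

```python
def lstm_params(input_dim, units):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix plus bias vector
    return input_dim * units + units

embedding = 400_001 * 50                                 # non-trainable GloVe weights
trainable = (lstm_params(50, 128)                        # first LSTM: 91,648
             + lstm_params(128, 128)                     # second LSTM: 131,584
             + dense_params(128, 5))                     # Dense: 645
total = embedding + trainable
```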

In [None]:
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

It's time to train the model.

In [None]:
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)
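
`convert_to_one_hot` is defined earlier in the notebook. The sketch below is a hypothetical equivalent (`one_hot` is our name, not the notebook's) that selects rows of an identity matrix, the same `np.eye` trick the mislabelled-examples cell uses.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer labels of shape (m,) or (m, 1) to a (m, num_classes) one-hot matrix."""
    labels = np.asarray(labels).reshape(-1)
    return np.eye(num_classes)[labels]   # row k of the identity matrix is the one-hot vector for class k

Y = np.array([0, 3, 1])
Y_oh = one_hot(Y, num_classes=5)
# Y_oh.shape == (3, 5)
```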

Fit the Keras model on `X_train_indices` and `Y_train_oh`. We use `epochs = 120` and `batch_size = 12`.

In [None]:
model.fit(X_train_indices, Y_train_oh, epochs = 120, batch_size = 12, shuffle=True)

Evaluate the model on the test set.

In [None]:
X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print()
print("Test accuracy = ", acc)

You should get a test accuracy between 80% and 95%. Run the cell below to see the mislabelled examples. 
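
With one-hot targets and `categorical_crossentropy`, Keras computes the `accuracy` metric as categorical accuracy: it compares the argmax of each prediction with the true class. A minimal NumPy equivalent on made-up probabilities follows. It also returns the mislabelled rows the next cell prints. `accuracy_from_probs` is a hypothetical helper.

```python
import numpy as np

def accuracy_from_probs(probs, labels):
    """Fraction of rows whose highest-probability class equals the integer label."""
    predicted = np.argmax(probs, axis=1)
    return float(np.mean(predicted == np.asarray(labels).reshape(-1)))

# Made-up model outputs for 3 sentences and 5 emoji classes
probs = np.array([[0.7, 0.1, 0.1, 0.05, 0.05],
                  [0.2, 0.5, 0.1, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.6, 0.1]])
labels = np.array([0, 2, 3])

acc = accuracy_from_probs(probs, labels)                      # 2 of 3 correct
mislabelled = np.flatnonzero(np.argmax(probs, axis=1) != labels)
```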

In [None]:
# Print the test examples the model gets wrong
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    num = np.argmax(pred[i])
    if num != Y_test[i]:
        print('Expected emoji: ' + label_to_emoji(Y_test[i]) + '  prediction: ' + X_test[i] + ' ' + label_to_emoji(num).strip())

Try it on your own example: write your own sentence below.

In [None]:
# Change the sentence below to see your prediction. Make sure all the words are in the GloVe embeddings.
x_test = np.array(['not feeling happy'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+  label_to_emoji(np.argmax(model.predict(X_test_indices))))

## Hyperparameter tuning
Let's go to the cloud for parallel power!
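
Hyperparameter tuning parallelizes well because every configuration can be trained independently. Here is a minimal sketch of the fan-out using values that appear in this notebook (dropout 0.2/0.5, batch size 12/32, epochs 50/120). The grid itself is an illustrative assumption, and `hypothetical_train_and_score` is a placeholder that returns a dummy score. In practice each call would submit one training job to the cloud.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Values taken from this notebook; the choice of grid is an assumption
grid = {
    "dropout": [0.2, 0.5],
    "batch_size": [12, 32],
    "epochs": [50, 120],
}

def hypothetical_train_and_score(config):
    """Placeholder for a real training run (e.g. one cloud job per config); returns a dummy score."""
    return 0.0

# One dict per combination: 2 x 2 x 2 = 8 configurations
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

# Threads stand in for remote workers here; each config is evaluated independently
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(hypothetical_train_and_score, configs))

best_score, best_config = max(zip(scores, configs), key=lambda pair: pair[0])
```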


## Acknowledgments
Andrew Ng's Coursera course on sequence models.


