# 1. Intro

**FAQ-style, retrieval-based chat bot**
- Train three model types (in different configurations) to interpret user input and map it to a response class, then compare how well each performs. The models are:
    - a TF-IDF cosine-similarity document classifier
    - a TF-IDF based n-gram MLP multi-class classifier (supervised)
    - an RNN classifier (supervised, trained on the same labelled phrases)

**The Data**
- Pull data from recognized disease/pandemic authorities such as the CDC and WHO.

- Also include Kenyan (KE) national government content. This is static data: knowledge already in place. TODO: add a channel for news updates.

- The data is maintained in a Google Sheet, where updates and additions can be made.

- Clean and classify the above data into two datasets:
    - FAQ_db: the knowledge base. A one-to-one mapping of class categories to response paragraphs. Main fields: class_category, response_p. Additional fields: src, src_link
    - Phrases_db: the training set of questions/inputs that users may present to the bot. Main fields: input_phrase, class_category
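
The relationship between the two datasets can be sketched in plain Python. This is only an illustration: the rows, category names, and helper functions below are hypothetical and are not taken from the Google Sheet or from `zbot_logic`.

```python
# Illustrative rows only; the real rows are maintained in the Google Sheet.
FAQ_DB = [
    {"class_category": "about_covid", "response_p": "Placeholder answer about what COVID-19 is.",
     "src": "example", "src_link": ""},
    {"class_category": "face_masks", "response_p": "Placeholder answer about face masks.",
     "src": "example", "src_link": ""},
]

PHRASES_DB = [
    {"input_phrase": "what is corona", "class_category": "about_covid"},
    {"input_phrase": "tell me about covid", "class_category": "about_covid"},
    {"input_phrase": "should i wear a mask", "class_category": "face_masks"},
]


def build_response_index(faq_rows):
    """Map each class_category to its single response paragraph (one-to-one)."""
    index = {}
    for row in faq_rows:
        category = row["class_category"]
        if category in index:
            raise ValueError(f"duplicate class_category: {category}")
        index[category] = row["response_p"]
    return index


def unmapped_categories(phrase_rows, response_index):
    """Return training categories that have no response in the FAQ table."""
    return sorted({r["class_category"] for r in phrase_rows} - set(response_index))


response_index = build_response_index(FAQ_DB)
print(unmapped_categories(PHRASES_DB, response_index))  # prints: []
```

A check like `unmapped_categories` is one way to catch a phrase whose class has no answer after the sheet has been edited.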
    
**Approach**
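- Retrieval-based chat bot: classify the user input into a class category, then return the matching response paragraph from FAQ_db.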
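
As a rough, dependency-free sketch of what retrieval by TF-IDF cosine similarity involves, the snippet below scores a query against a few training phrases and returns the class of the best match. It is not the `zbot_logic` implementation: the phrases, category names, the smoothed IDF formula, and the `min_score` cut-off are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical training phrases: (input_phrase, class_category).
PHRASES = [
    ("what is corona", "about_covid"),
    ("what is covid", "about_covid"),
    ("should i wear a mask", "face_masks"),
    ("do face masks help", "face_masks"),
]


def tokenize(text):
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in docs."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; tokens outside the fitted vocabulary are dropped."""
    counts = Counter(tok for tok in tokens if tok in idf)
    return {tok: n * idf[tok] for tok, n in counts.items()}


def cosine(a, b):
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(text, phrase_vectors, idf, min_score=0.3):
    """Return (class_category, score), or (None, score) when below min_score."""
    query = tfidf(tokenize(text), idf)
    score, category = max((cosine(query, vec), cat) for vec, cat in phrase_vectors)
    return (category if score >= min_score else None), score


docs = [tokenize(phrase) for phrase, _ in PHRASES]
idf = fit_idf(docs)
phrase_vectors = [(tfidf(doc, idf), cat) for doc, (_, cat) in zip(docs, PHRASES)]

print(predict("what is corona", phrase_vectors, idf))
print(predict("should i get my dog tested too", phrase_vectors, idf))
print(predict("hello there", phrase_vectors, idf))
```

In this toy data the dog question shares only "should" and "i" with a face-mask phrase, yet it still scores about 0.63 and is classified as `face_masks`. When stop words are kept, function words alone can drive a match, which is one possible explanation for off-topic answers from a similarity model.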


# 2. Corona Dashboard by Johns Hopkins University

[Link to map FAQ](https://coronavirus.jhu.edu/map-faq)

In [1]:
### Johns Hopkins Dashboard - https://coronavirus.jhu.edu/map.html
from IPython.display import IFrame
## default 77.3846,11.535 
start_coordz = "77.3846,11.535"  # alternative: Rabat, Morocco "33.9693414,-6.9273026"
center_coordz = "28.8189834,-2.5117154"  # center on Bukavu, DRC as lon,lat; lat,lon form is "-2.5117154,28.8189834"

IFrame(src="//arcgis.com/apps/Embed/index.html?webmap=14aa9e5660cf42b5b4b546dec6ceec7c&extent="+start_coordz+",163.5174,52.8632"+
       "&center="+center_coordz+
       "&zoom=true&previewImage=false&scale=true&disable_scroll=true&theme=light", 
    width="650", height="400", frameborder="0", scrolling="no", marginheight="0", marginwidth="0", title="2019-nCoV" )

# 3. FAQ Chat bot - Part 4

- Final model 
- Test interaction

## 3.1. Load Model

In [2]:
import numpy as np
import pandas as pd

import nltk

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [3]:
import warnings
warnings.filterwarnings('ignore')

import logging
logging.getLogger('tensorflow').disabled = True
# logging.getLogger('google_auth_oauthlib').disabled = True

In [4]:
## plot settings
params = {
    'font.size' : 14.0,
    'figure.figsize': (20.0, 12.0),
    'figure.dpi' : 40
}
plt.rcParams.update(params)
plt.style.use('fivethirtyeight') #tableau-colorblind10 ggplot

In [5]:
#### Helpers
headline_counter = 1
def printHeadline(src, msg):
    global headline_counter
    print( "\n{0} {1}. {2} : {3} {0}".format("-"*6, headline_counter, str(src).upper(), msg))
    headline_counter += 1

In [6]:
import sys
sys.path.append("../../../shared") 
import zdata_source
import zbot_logic
from zbot_logic import ZBotLogicFlow

from termcolor import colored 

In [7]:
## setup bot

## 1. path to FAQ db
sheet_id = '1EuvcPe9WXSQTsmSqhq0LWJG4xz2ZRQ1FEdnQ_LQ-_Ks'
faq_path = [
    (sheet_id, 'FAQ responses!A1:G1000'),
    (sheet_id, 'Classify_Phrases!A1:G1000'),
]
faq_typ = zdata_source.zGSHEET_FAQ
      
## 2. create bot
bot_app = ZBotLogicFlow()
bot_app.loadFaqDbz(faq_path, faq_typ)

## 3. load model 

## 3.2. Using Cosine Similarity Model

In [8]:
## 3. load model 
bot_app.loadModel( zbot_logic.MODEL_COSINE_TFIDF, "cov_Cosine_Tfidf")

print( repr(bot_app.model.model) )

INFOR   : 2020-04-09 01:38:57.318567 [<class 'zmodel_cosine_similarity.ZCosineSimilarity'>.model.load] Model loaded from file successfully
INFOR   : 2020-04-09 01:38:57.320534 [<class 'zmodel_cosine_similarity.ZCosineSimilarity'>.model.load] Persist unpacked successfully


TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.float64'>, encoding='utf-8',
                input='content', lowercase=True, max_df=1.0, max_features=None,
                min_df=1, ngram_range=(1, 1), norm='l2', preprocessor=None,
                smooth_idf=True, stop_words=None, strip_accents=None,
                sublinear_tf=False, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, use_idf=True, vocabulary=None)


In [9]:
## 4. Let's chat
while True:
    user_input = input(colored("Talk to me: ", "yellow"))
    prompt = colored(">>>: ", "green")

    response, rcode = bot_app.getResponse(user_input)

    print("{} {}\n".format(prompt, "I don't understand. Try that again" if response is None else response))

    # stop when the bot returns its exit code (-99), e.g. after the user says bye
    if rcode == -99:
        break


Talk to me: hey
>>>: hey

Talk to me: what is corona


INFOR   : 2020-04-09 01:39:07.759229 [cosine.predict] IN: 'what is corona'
INFOR   : 2020-04-09 01:39:07.759229 [cosine.predict] IN.PREPROC: {'cleanup_and_lemmatize': {'remove_stopwordz': False, 'stop_wordz': None, 'remove_numberz': True, 'lemmatized': True, 'unique': False}}
INFOR   : 2020-04-09 01:39:07.760225 [cosine.predict] IN.CLEAN: ['what', 'is', 'corona']


>>>: COVID-19 is a new strain of coronavirus that has not been previously identified in humans. It was first identified in Wuhan, Hubei Province, China, where it has caused a large and ongoing outbreak. It has since spread more widely in China. Cases have since been identified in several other countries. The COVID-19 virus is closely related to a bat coronavirus.

There is much more to learn about how COVID-19 is spread, its severity, and other features associated with the virus; epidemiological and clinical investigations are ongoing.

Outbreaks of new coronavirus infections among people are always a public health concern. The situation is evolving rapidly.

Talk to me: should i get my dog tested too


INFOR   : 2020-04-09 01:39:15.463333 [cosine.predict] IN: 'should i get my dog tested too'
INFOR   : 2020-04-09 01:39:15.464359 [cosine.predict] IN.PREPROC: {'cleanup_and_lemmatize': {'remove_stopwordz': False, 'stop_wordz': None, 'remove_numberz': True, 'lemmatized': True, 'unique': False}}
INFOR   : 2020-04-09 01:39:15.464359 [cosine.predict] IN.CLEAN: ['should', 'i', 'get', 'my', 'dog', 'tested', 'too']


>>>: Face masks are not recommended for the general population unless otherwise advised by the Ministry of Health

People who have symptoms and might be infected with COVID-19 are required to stay in isolation at home and should wear a surgical face mask when in the same room as another person and when seeking medical advice to reduce the risk of transmitting COVID-19 to anyone else.

Health care workers who are caring for patients with suspected COVID-19 should use appropriate personal protective equipment to protect themselves against COVID-19. For more information refer to Clinical Excellence Commission (CEC) - Coronavirus COVID-19

Talk to me: not right. bye
>>>: anytime. baadaye



## 3.3. Using Ngram MLP Neural Net

In [None]:
## 3. load model 
bot_app.loadModel( zbot_logic.MODEL_NGRAM_MLP, "cov_mlpNN_real")

print( repr(bot_app.model.model) )

In [11]:
## 4. Let's chat
while True:
    user_input = input(colored("Talk to me: ", "yellow"))
    prompt = colored(">>>: ", "green")

    response, rcode = bot_app.getResponse(user_input)

    print("{} {}\n".format(prompt, "I don't understand. Try that again" if response is None else response))

    if rcode == -99:
        break
    

Talk to me: hey
>>>: hello

Talk to me: what is corona


INFOR   : 2020-04-09 01:39:59.087156 [mlp.predict] IN: ['corona']


ValueError: Error when checking input: expected dropout_2040_input to have shape (244,) but got array with shape (1,)
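
The traceback suggests that the cleaned token list `['corona']` (length 1) reached the network directly, while the model expects a 244-feature vector. One plausible remedy, assuming the MLP was trained on fixed-length vectors from a fitted vectorizer, is to pass every input through that same vectorizer and check the length before predicting. The sketch below illustrates the idea with a hypothetical five-word vocabulary and hypothetical helper names; it does not reproduce the `zbot_logic` code.

```python
from collections import Counter

# Hypothetical vocabulary; the model in the traceback expects 244 features.
VOCAB = ["corona", "covid", "mask", "symptom", "test"]
TOKEN_INDEX = {tok: i for i, tok in enumerate(VOCAB)}


def vectorize(tokens, token_index=TOKEN_INDEX):
    """Turn a token list of any length into one fixed-length count vector."""
    vec = [0.0] * len(token_index)
    for tok, n in Counter(tokens).items():
        if tok in token_index:
            vec[token_index[tok]] = float(n)
    return vec


def check_features(vec, expected_len):
    """Fail early with a readable message instead of inside the network."""
    if len(vec) != expected_len:
        raise ValueError(f"expected {expected_len} features, got {len(vec)}")
    return vec


tokens = ["corona"]  # what the log shows reaching mlp.predict
features = check_features(vectorize(tokens), len(VOCAB))
print(len(tokens), len(features))  # prints: 1 5
```

Passing `tokens` straight to `check_features` raises a `ValueError`, which mirrors the shape mismatch reported above but surfaces it before the model is called.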