# COVID-19 FAQ Bot
- User input is classified into a category, and the answer mapped to that category is returned
- Data Sources: WHO, CDC, JHU, MoH KE


**The Data**
- Pulling data from recognized disease/pandemic authorities such as the CDC and WHO

- Also pulling content from the Kenyan national government (MoH KE). This is static data, i.e. knowledge already in place. TODO: add a channel for news updates

- The data is maintained in a Google Sheet (Gsheet), where entries can be updated or added

- Clean and classify the above data into two datasets
    - FAQ_db: the knowledge base, a one-to-one mapping of class categories to response paragraphs. Main fields: class_category, response_p. Additional fields: src, src_link
    - Phrases_db: the training set of questions/inputs that users may present to the bot. Main fields: input_phrase, class_category
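A minimal sketch of how the two datasets could be loaded and checked, assuming each sheet range arrives as a list of rows with the header row first and that rows may be missing trailing empty cells (as spreadsheet exports often are). `FaqEntry`, `Phrase`, `rows_to_dicts`, `build_faq_db` and `build_phrases_db` are hypothetical names for illustration, not functions from `zdata_source`; only the field names come from the description above.

```python
from dataclasses import dataclass


@dataclass
class FaqEntry:
    """One FAQ_db row: a class category and its response paragraph(s)."""
    class_category: str
    response_p: str
    src: str = ""
    src_link: str = ""


@dataclass
class Phrase:
    """One Phrases_db row: an example user input labelled with its category."""
    input_phrase: str
    class_category: str


def rows_to_dicts(rows):
    """Turn [header, *data] rows into dicts, padding rows missing trailing cells with ''."""
    header, *data = rows
    return [dict(zip(header, row + [""] * (len(header) - len(row)))) for row in data]


def build_faq_db(rows):
    """Map class_category -> FaqEntry, rejecting duplicates to keep the mapping one-to-one."""
    faq_db = {}
    for rec in rows_to_dicts(rows):
        cat = rec["class_category"]
        if cat in faq_db:
            raise ValueError(f"duplicate class_category: {cat!r}")
        faq_db[cat] = FaqEntry(cat, rec["response_p"], rec.get("src", ""), rec.get("src_link", ""))
    return faq_db


def build_phrases_db(rows, faq_db):
    """Build Phrases_db, checking that every phrase points at a known FAQ category."""
    phrases = [Phrase(r["input_phrase"], r["class_category"]) for r in rows_to_dicts(rows)]
    unknown = {p.class_category for p in phrases} - set(faq_db)
    if unknown:
        raise ValueError(f"phrases reference unknown categories: {sorted(unknown)}")
    return phrases


# Hypothetical sample rows; the category name is made up for illustration
faq_rows = [
    ["class_category", "response_p", "src", "src_link"],
    ["what_is_covid", "<answer paragraph>", "WHO"],  # src_link cell left empty
]
phrase_rows = [
    ["input_phrase", "class_category"],
    ["what is corona virus", "what_is_covid"],
]
faq_db = build_faq_db(faq_rows)
phrases_db = build_phrases_db(phrase_rows, faq_db)
```

Validating the category link at load time catches a typo in the Gsheet before it turns into a classification that has no answer to return.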
    
**Approach**
- Retrieval-based chatbot: answers are selected from FAQ_db rather than generated.


In [1]:
%store -r __jhu_map

**Here's the JHU tracker**
[Link to map FAQ](https://coronavirus.jhu.edu/map-faq)

In [2]:
__jhu_map

In [3]:
%run utilz_includez.ipynb

In [6]:
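import sys
sys.path.append('..')
sys.path.append('../envbin')

import logging

import zdata_source
import zbot_logic
from zbot_logic import ZBotLogicFlow

# Silence the 'zmoi' logger during the chat session
logging.getLogger('zmoi').disabled = True

# App name passed to loadModel() below to identify the saved model
ZAPP_NAME = 'ncov19_tfidf_faq'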

# Interact with the trained model
- Recall: user input is matched using TF-IDF vectors and cosine similarity.
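A from-scratch sketch of TF-IDF cosine-similarity retrieval, written to mirror the vectorizer defaults printed by the setup cell (`lowercase=True`, `token_pattern='(?u)\b\w\w+\b'`, `smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`). It is not the `ZCosineSimilarity` implementation, whose internals are not shown here; the `TfidfCosine` class and the sample phrases/labels are hypothetical, and the bot's own preprocessing step (stop-word removal, lemmatization) is left out.

```python
import math
import re
from collections import Counter

# Same token pattern as the printed TfidfVectorizer: words of 2+ characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class TfidfCosine:
    def fit(self, docs):
        tokenized = [tokenize(d) for d in docs]
        n = len(tokenized)
        df = Counter(t for toks in tokenized for t in set(toks))
        # smooth_idf=True: idf(t) = ln((1 + n) / (1 + df(t))) + 1
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.doc_vecs = [self._vectorize(toks) for toks in tokenized]
        return self

    def _vectorize(self, tokens):
        # Raw term counts (sublinear_tf=False); terms unseen during fit are dropped
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        # norm='l2': unit-length vectors, so a dot product is the cosine similarity
        return {t: v / norm for t, v in vec.items()} if norm else {}

    def most_similar(self, text):
        """Return (index of best-matching training phrase, cosine score)."""
        q = self._vectorize(tokenize(text))
        scores = [sum(w * d.get(t, 0.0) for t, w in q.items()) for d in self.doc_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        return best, scores[best]


# Hypothetical training phrases and their class categories
phrases = ["what is coronavirus", "how does covid spread", "who is at high risk"]
categories = ["what_is_covid", "transmission", "risk_groups"]
model = TfidfCosine().fit(phrases)
best, score = model.most_similar("how is it spread")
print(categories[best])  # transmission
```

The best-matching phrase's `class_category` is then used to look up the answer in FAQ_db, which is why the two datasets share that field.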

In [8]:
## Set up the bot

## 1. Google Sheet ID and A1 ranges for the FAQ responses and the classification phrases
SHEET_ID = '1EuvcPe9WXSQTsmSqhq0LWJG4xz2ZRQ1FEdnQ_LQ-_Ks'
faq_path = [
    (SHEET_ID, 'FAQ responses!A1:G1000'),
    (SHEET_ID, 'Classify_Phrases!A1:G1000'),
]
faq_typ = zdata_source.zGSHEET_FAQ

## 2. Create the bot and load the FAQ data
bot_app = ZBotLogicFlow()
bot_app.loadFaqDbz(faq_path, faq_typ)

## 3. Load the trained TF-IDF cosine-similarity model
bot_app.loadModel(zbot_logic.MODEL_COSINE_TFIDF, ZAPP_NAME)

print(repr(bot_app.model.model))

INFOR   : 2020-04-21 20:51:42.605831 [<class 'envbin.zmodel_cosine_similarity.ZCosineSimilarity'>.model.load] Model loaded from file successfully
INFOR   : 2020-04-21 20:51:42.607826 [<class 'envbin.zmodel_cosine_similarity.ZCosineSimilarity'>.model.load] Persist unpacked successfully


TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.float64'>, encoding='utf-8',
                input='content', lowercase=True, max_df=1.0, max_features=None,
                min_df=1, ngram_range=(1, 1), norm='l2', preprocessor=None,
                smooth_idf=True, stop_words=None, strip_accents=None,
                sublinear_tf=False, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, use_idf=True, vocabulary=None)


In [9]:
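from termcolor import colored  # assumption: `colored` comes from termcolor (likely already loaded by utilz_includez.ipynb)

## 4. Let's chat. The loop ends when getResponse() returns rcode == -99
while True:
    user_input = input(colored("Talk to me: ", "yellow"))
    prompt = colored(">>>: ", "green")

    response, rcode = bot_app.getResponse(user_input)

    # getResponse() returns None as the response when the input is not understood
    print("{} {}\n".format(prompt, "I don't understand. Try that again" if response is None else response))

    if rcode == -99:
        break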


Talk to me: hi there
[32m>>>: [0m hello

Talk to me: what is corona virus


INFOR   : 2020-04-21 20:51:58.891497 [cosine.predict] IN: 'what is corona virus'
INFOR   : 2020-04-21 20:51:58.891497 [cosine.predict] IN.PREPROC: {'cleanup_and_lemmatize': {'remove_stopwordz': True, 'stop_wordz': ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over

[32m>>>: [0m <p>COVID-19 is a new strain of coronavirus that has not been previously identified in humans. It was first identified in Wuhan, Hubei Province, China</p>

<p>There is much more to learn about how COVID-19 is spread, its severity, and other features associated with the virus; epidemiological and clinical investigations are ongoing.</p>

Talk to me: how is it spread


INFOR   : 2020-04-21 20:52:07.323916 [cosine.predict] IN: 'how is it spread'
INFOR   : 2020-04-21 20:52:07.323916 [cosine.predict] IN.PREPROC: {'cleanup_and_lemmatize': {'remove_stopwordz': True, 'stop_wordz': ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', '

[32m>>>: [0m <p>COVID-19 virus spreads through contaminated droplets spread by coughing or sneezing, or by contact with contaminated hands, surfaces or objects.</p>

<p>The time between when a person is exposed to the virus and when symptoms first appear is typically 5 to 6 days, although may range from 2 to 14 days. For this reason, people who might have been in contact with a confirmed case are being asked to self-isolate for 14 days.</p>

Talk to me: who is most at risk


INFOR   : 2020-04-21 20:52:14.171268 [cosine.predict] IN: 'who is most at risk'
INFOR   : 2020-04-21 20:52:14.171268 [cosine.predict] IN.PREPROC: {'cleanup_and_lemmatize': {'remove_stopwordz': True, 'stop_wordz': ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over'

[32m>>>: [0m The people most at risk of getting COVID-19 coronavirus infections are those who have:
<ul><li> Recently returned from overseas</li>
<li> Been in close contact with someone who has been diagnosed with COVID-19.</li>
<li> People with compromised immune systems including cancer treatment </li>
<li> People with diagnosed chronic medical conditions including chronic lung disease or moderate to severe asthma , heart disease with complications </li>
<li> Elderly people: Older adults and people of any age who have serious underlying medical conditions might be at higher risk for severe illness from COVID-19. </li>
</ul>

<p><b>Children: </b>At this stage the risk to children and babies, and the role children play in the transmission of COVID-19, is not clear. However, there has so far been a low rate of confirmed COVID-19 cases among children, relative to the broader population. It is important to remember that even healthy young adults can have severe disease caused by COVID-1
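The chat transcript shows two things about what happens behind `getResponse()`: the `IN.PREPROC` log lines show a `cleanup_and_lemmatize` step that removes stop words, and the chat loop treats a `None` response as "not understood" and `rcode == -99` as "stop". A minimal sketch of that contract follows. Only the `(response, rcode)` shape, the `None` fallback and the `-99` exit code come from the notebook; the stop-word subset is taken from the visible part of the (truncated) logged list, lemmatization is omitted, and `EXIT_WORDS`, `MIN_SCORE`, `OK_CODE`, `get_response` and `toy_match` are hypothetical.

```python
import re

# Subset of the stop_wordz list visible in the (truncated) IN.PREPROC log lines
STOP_WORDS = {
    "i", "me", "my", "we", "you", "it", "its", "they", "what", "which", "who",
    "whom", "this", "that", "is", "are", "was", "be", "do", "does", "a", "an",
    "the", "and", "or", "of", "at", "by", "for", "with", "about", "to", "from",
    "in", "on",
}
EXIT_CODE = -99                      # the chat loop breaks when rcode == -99
OK_CODE = 0                          # hypothetical "normal reply" code
EXIT_WORDS = {"bye", "quit", "exit"}  # hypothetical exit phrases
MIN_SCORE = 0.3                      # hypothetical similarity threshold


def cleanup(text):
    """Lowercase, tokenize and drop stop words (lemmatization is omitted here)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def get_response(user_input, match, faq_db):
    """Return (response, rcode); response is None when no category scores high enough.

    `match` is any callable mapping a token list to (class_category, score).
    """
    if user_input.strip().lower() in EXIT_WORDS:
        return "Goodbye!", EXIT_CODE
    tokens = cleanup(user_input)
    if not tokens:
        return None, OK_CODE
    category, score = match(tokens)
    if category is None or score < MIN_SCORE:
        return None, OK_CODE
    return faq_db.get(category), OK_CODE


def toy_match(tokens):
    """Hypothetical matcher: score = share of tokens that hit a category keyword."""
    keywords = {"spread": "transmission", "risk": "risk_groups"}
    hits = [keywords[t] for t in tokens if t in keywords]
    if not hits:
        return None, 0.0
    return hits[0], len(hits) / len(tokens)


faq_db = {"transmission": "<transmission answer>", "risk_groups": "<risk answer>"}
print(cleanup("how is it spread"))                           # ['how', 'spread']
print(get_response("how is it spread", toy_match, faq_db))   # ('<transmission answer>', 0)
print(get_response("tell me a joke", toy_match, faq_db))     # (None, 0)
```

In the notebook, the TF-IDF cosine model plays the role of `toy_match`; a threshold like `MIN_SCORE` is one common way to turn a low-similarity match into the "I don't understand" fallback instead of returning an unrelated answer.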