# A chatbot for AI Skunkworks - SkunkBot


### Abstract

This project aims at creating a chatbot named SkunkBot in Slack for AI Skunkworks, a group of people at Northeastern University who research and develop Artificial Intelligence, Machine Learning, and Deep Learning projects primarily for the sake of innovation and learning. The chatbot is trained to answer all the questions related to AI Skunkworks such as the purpose and projects. The chatbot is created by integrating Google Cloud Platform to DialogFlow API and then integrating it with Slack. Three phases are carried out. In the first phase, the knowledge base for the data is created. The data is preprocessed by performing semi-structured analysis and stored on Google Cloud Datastore. Also, the synonyms for all these extracted topics are created using Natural Language Processing (NLP). In the second phase, a chatbot is built using DialogFlow. The topic entries from Cloud Datastore are imported into DialogFlow to populate entity.  After gathering all the data to train the chatbot, intents are created to capture request response action between the chatbot and a user and training phases are added to train a chatbot intent. API/REST web-service is published that maps to DialogFlow’s specification for webhooks. The webhook is published on the internet by using ngrok service to create an HTTPs tunnel from service running behind GCP firewall to a public URL on the internet. The webhook is enabled and HTTPs tunnel is mentioned in the fulfillment section of the DialogFlow API. In the third and the final phase, the DialogFlow is integrated to Slack app, the DialogFlow contains an inbuilt integration platform for Slack. Finally, we test the bot in Slack app by asking it the questions.

### Introduction

Artificial Intelligence is growing day by day and chatbots are a big part of this growth. Many companies have started using chatbots as a medium for automated customer support. Using chatbots is not only convenient but very helpful to huge companies in terms of labor cost. A chatbot is an AI-powered piece of software in a device, application, website or other networks that try to estimate a consumer’s needs and then assist them to perform a task. It is forecasted by Gartner that by 2020, over 85% of customer interactions will be handled without a human. A chatbot consists of three main NLP categories: Entities, Intents, and Actions. Entities are specific mappings of natural language word combinations in the human conversations to standard phases conveying their clear meaning. Intents are general characteristics that map the user’s input to the corresponding action. For example, the input “Can I work on a skunkworks project?” will map to ‘Skunkwork_projects’ intent by its entire wording. Actions are the responses to the corresponding intents. These are usually the conventional functions, which may take optional parameters from the caller with context. API are annotation for application programming interface which are a set of functions and procedures that allow the creation of applications which access the features or data of an operating system, application, or other service. API calls are used retrieve data either in XML, JSON or Excel format. This format forms the basic blocks to create the ontology template. A chatbot can be created using various APIs such as DialogFlow/api.ai, wit.ai, Rasa, Msg.ai, etc. A DialogFlow API is a Google-owned product that does not require installation and the data is hosted on cloud. It provides integration with various services such as Google Assistant, Skype, Slack, FB Messenger, etc. A chatbot intent creation consists of two main tasks: Training and Classification. In the training task, the training data with word embedding and the known labeled data are input to Machine Learning algorithms. In the classification task, the user text input along with word embedding is passed to a model which also takes as an input the algorithms from training task. This model then gives an intent/label. Figure 1. shows the block diagram for the same 



<img src="flow1.JPG">

## Semi-structured analysis

In this part, we extract topic headings and associated text from the file and store this information as key-value pairs in Cloud Database to give the chatbot a basic vocabulary

In [1]:
# Installing the google cloud datastore
!pip install google-cloud-datastore

[33mDEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.[0m


In [2]:
from google.cloud import datastore

In [3]:
#datastore_client ia a variable which stores the values to be store in Cloud Datastore
datastore_client = datastore.Client()

If a line of text consisting of less than 5 words is followed by paragraphs of text the assume the line of text with less than 5 words is a topic (i.e. the topic of a question an employee might ask) and that the paragraphs of text are the answer to that question (called action_text for the lack of a better term).

When a topic and action_text are found these are stored in Cloud Datastore as a key-value pair with the topic as the key and the action_text as the value.

In [4]:
skunk = open('skunk.txt', 'r')
while True:
  
  topic = skunk.readline()
  if not(topic):
    break
  
  if (topic != '\r\n') and (len(topic.split(' ')) < 5):
  
    action_text = ''
        
    last_line = ''
    line = skunk.readline()
    
    while (last_line != '\r\n') and (line != '\r\n') and (len(line.split(' ')) > 5):
      
      action_text += line
      last_line = line
      line = skunk.readline()
      
    if action_text != '':
      
      kind = 'Topic'
      topic_key = datastore_client.key(kind, topic.strip().lower())
      
      topic = datastore.Entity(key=topic_key)
      topic['action_text'] = action_text

      datastore_client.put(topic)

      print('Saved {}: {}'.format(topic.key.name, topic['action_text']))

Saved ﻿skunkwork project: A skunkworks project is a project developed by a small and loosely structured group of people who research and develop a project primarily for the sake of radical innovation. 

Saved ai skunkwork: AI Skunkworks at Northeastern University is a group of people who research and develop Artificial Intelligence, Machine Learning, and Deep Learning projects primarily for the sake of innovation and learning. We provide open-mic, mentorship, workshops, seminars, hack-a-thons, and events that assist those exploring the edges of AI.

Saved social butterfly: Social Butterfly is social engagement software using NEU AI Skunkworks(or you can choose
something else) as a model. In this you will develop models that enhance one of the five aspects of

Saved *profile community members: Create statistical profiles of the NEU AI Skunkworks community.

Saved *publish on social media: Create and approve content for multiple social networks and accounts. Create models that will
optim

## Processing incrorrect words and synonyms

Incorrectly tagged questions are hard to find and answer. If you know of common, alternate spellings or phrasings for this tag, add them here so we can automatically correct them in the future. For example, suggest “Skunkworks” as a synonym for AI Skunkworks, or “Projects” for project.

- Import inflect() for Plurals
The methods of the class engine in module inflect.py provide plural inflections, singular noun inflections, “a”/”an” selection for English words, and manipulation of numbers as words.
Plural forms of all nouns, most verbs, and some adjectives are provided. Where appropriate, “classical” variants (for example: “brother” -> “brethren”, “dogma” -> “dogmata”, etc.) are also provided.
Single forms of nouns are also provided. The gender of singular pronouns can be chosen (for example “they” -> “it” or “she” or “he” or “they”).
Pronunciation-based “a”/”an” selection is provided for all English words, and most initialisms.
It is also possible to inflect numerals (1,2,3) to ordinals (1st, 2nd, 3rd) and to English words (“one”, “two”, “three”).
In generating these inflections, inflect.py follows the Oxford English Dictionary and the guidelines in Fowler’s Modern English Usage, preferring the former where the two disagree.

In [5]:
!pip install inflect

[33mDEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.[0m


- Stop words: stop words are words which are filtered out before or after processing of natural language data (text).[1] Though "stop words" usually refers to the most common words in a language, there is no single universal list of stop words used by all natural language processing tools, and indeed not all tools even use such a list. Some tools specifically avoid removing these stop words to support phrase search. 
- Any group of words can be chosen as the stop words for a given purpose. For some search engines, these are some of the most common, short function words, such as the, is, at, which, and on. In this case, stop words can cause problems when searching for phrases that include them, particularly in names such as "The Who", "The The", or "Take That". Other search engines remove some of the most common words including lexical words, such as "want" from a query in order to improve performance. 

In [6]:
import nltk
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package stopwords to /content/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /content/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [7]:
from nltk.corpus import stopwords
stop = set(stopwords.words('english'))

In [8]:
client = datastore.Client()
query = client.query(kind='Topic')
results = list(query.fetch())

In [9]:
import inflect
plurals = inflect.engine()

In [10]:
from nltk.corpus import wordnet
from sets import Set

for result in results:
  for word in result.key.name.split():
    
    if word in stop:
        continue

    
    synonyms = Set()
    for syn in wordnet.synsets(word):
      
      if ".n." in str(syn):

        for l in syn.lemmas():
          lemma = l.name()
          if (lemma.isalpha()):
            synonyms.add(lemma)
            synonyms.add(plurals.plural(lemma))
      
      if ".a." in str(syn):
        synonyms = Set()
        break

    print result.key.name, word, synonyms
    
    kind = 'Synonym'
    synonym_key = datastore_client.key(kind, result.key.name)

    synonym = datastore.Entity(key=synonym_key)
    synonym['synonym'] = result.key.name

    datastore_client.put(synonym)
    
    synonym_key = datastore_client.key(kind, word)

    synonym = datastore.Entity(key=synonym_key)
    synonym['synonym'] = result.key.name

    datastore_client.put(synonym)
    
    for dictionary_synonym in synonyms:
      
      synonym_key = datastore_client.key(kind, dictionary_synonym)

      synonym = datastore.Entity(key=synonym_key)
      synonym['synonym'] = result.key.name

      datastore_client.put(synonym)
      
    synonym_key = datastore_client.key(kind, plurals.plural(word))

    synonym = datastore.Entity(key=synonym_key)
    synonym['synonym'] = result.key.name

    datastore_client.put(synonym)

  from ipykernel import kernelapp as app


*profile community members *profile Set([])


  _warn_if_not_unicode(string)


*profile community members community Set([u'communities', u'community'])
*profile community members members Set([u'phalluses', u'penises', u'extremity', u'appendages', u'member', u'penis', u'members', u'appendage', u'extremities', u'phallus'])
*publish on social media *publish Set([])
*publish on social media social Set([])
*publish on social media media Set([u'spiritualists', u'medium', u'sensitive', u'sensitives', u'spiritualist', u'metiers', u'mediums', u'metier'])
ai skunkwork ai Set([u'AI', u'ai', u'ais', u'AIS'])
ai skunkwork skunkwork Set([])
social butterfly social Set([])
social butterfly butterfly Set([u'butterfly', u'butterflies'])
﻿skunkwork project ﻿skunkwork Set([])
﻿skunkwork project project Set([u'task', u'projection', u'undertaking', u'labors', u'labor', u'project', u'tasks', u'projections', u'undertakings', u'projects'])


In [11]:
!pip install --upgrade pip
!pip install dialogflow==0.3.0

[33mDEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.[0m
Requirement already up-to-date: pip in /usr/local/envs/py2env/lib/python2.7/site-packages (19.1)
[33mDEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.[0m


## DialogFlow

Dialogflow by itself is a platform service that allows developers to build “engaging voice and text based conversational interfaces powered by AI”.

Why Dialogflow?
We are Aiming to create a slack channel which can easily integrate with Dialogflow which is provides one-click integrations to most popular messaging Apps like Facebook Messenger, Slack, Twitter, Kik, Line, Skype, Telegram, Twilio and Viber. Even to some voice assistants like Google Assistant, Amazon Alexa and Microsoft Cortana. So in future even if we need a facebookbot or a bot anywhere for AI Skunkworks we can easily integrate it. Also, Dialogflow service has become free and available to Google Cloud Platform users. And we did our coding part in GCP.

- Compared to some platforms which works on predefined questions like Chatfuel, Dialogflow can offer better user experience with NLP. DialogFlow Agents are pretty good at NLP.
- Entities: These are the knowledge repository that the agent would use to answer the user’s question. There are a variety of entities: system entities to include information about time etc, weather/location entities etc.


In [12]:
client = datastore.Client()
query = client.query(kind='Topic')
results = list(query.fetch())

In [13]:
import dialogflow

entity_types_client = dialogflow.EntityTypesClient()

project_id = !(gcloud config get-value project)

project_agent_path = entity_types_client.project_agent_path(
        project_id[0])

for element in entity_types_client.list_entity_types(project_agent_path):
  if (element.display_name == 'Topic'):
    entity_type_path = element.name

project_id = !(gcloud config get-value project)

entities = []

for result in results:
  
  entity = dialogflow.types.EntityType.Entity()
  entity.value = result.key.name
  entity.synonyms.append(result.key.name)

  entities.append(entity)

print entities

response = entity_types_client.batch_create_entities(
        entity_type_path, entities)

print('Entity created: {}'.format(response))

[value: "*profile community members"
synonyms: "*profile community members"
, value: "*publish on social media"
synonyms: "*publish on social media"
, value: "ai skunkwork"
synonyms: "ai skunkwork"
, value: "social butterfly"
synonyms: "social butterfly"
, value: "\357\273\277skunkwork project"
synonyms: "\357\273\277skunkwork project"
]
Entity created: <google.api_core.operation.Operation object at 0x7f40ebc0ccd0>


## Webhook

API/REST web-service is published that maps to DialogFlow’s specification for webhooks. The webhook is published on the internet by using ngrok service to create an HTTPs tunnel from service running behind GCP firewall to a public URL on the internet. The webhook is enabled and HTTPs tunnel is mentioned in the fulfillment section of the DialogFlow API.

In [14]:
!pip install flask

[33mDEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.[0m


In [15]:
import json
import re
from flask import Flask, request, jsonify, make_response

Here the webhook is running on port 5000

In [None]:
app = Flask(__name__)


@app.route('/webhook/', methods=['POST'])
def handle():
    req = request.get_json(silent=True, force=True)
    print 'Request:'
    print(json.dumps(req, indent=4))
    if req.get('queryResult').get('action') != 'lookup':
        return {}
    topic = req.get('queryResult').get('parameters').get('Topic')
    topic = re.sub(r'[^\w\s]', '', topic)
    print topic
    rsp = getResponse(topic)
    rsp = json.dumps(rsp, indent=4)
    print rsp
    r = make_response(rsp)
    r.headers['Content-Type'] = 'application/json'
    return r

def getResponse(topic):
    
    client = datastore.Client()
    query = client.query(kind='Synonym')
    key = client.key('Synonym', topic)
    query.key_filter(key, '=')
    results = list(query.fetch())
    
    if len(results) == 0:
        return buildReply('I can\'t find that in the handbook...')
    
    print results[0]['synonym']
    
    query = client.query(kind='Topic')
    key = client.key('Topic', results[0]['synonym'])
    query.key_filter(key, '=')
    results = list(query.fetch())
    
    print results[0]['action_text']
    
    return buildReply(results[0]['action_text'])

def buildReply(info):
    return {
        'fulfillmentText': info,
    }

if __name__ == '__main__':
    app.run(host='0.0.0.0')

 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)


* The ngrok file is separately attached

### Conclusion

We can conclude that DialogFlow proves to be a great API to create a chatbot compared to other platforms. Using the Analytics components, it can be concluded that the SkunkBot responds 16.5% incorrectly.

### Results

<img src=analytics.JPG>

### References

- https://dialogflow.com/
- https://en.wikipedia.org/wiki/Chatbot#Limitations_of_Chatbots
- https://cloud.google.com/solutions
- https://www.analyticsvidhya.com/


### License

Copyright <2019> <SPOORTHI BELLAM ><PRACHI PATEL>
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
All writing in the document is licensed bt The Creative Commons Attribution 3.0 https://creativecommons.org/licenses/by/3.0/us/.
