Description

Understand what Natural Language Generation (NLG) is and learn how to build one of its most popular applications: chatbots.

Overview

This concept introduces you to Natural Language Generation (NLG). In this concept, you will learn:

- What NLG is

- How NLG systems work

- What a chatbot is

- Types of chatbots and their implementation

- RASA and its implementation

Pre-requisite

Before you start learning this concept, be sure you have already covered

- Data Wrangling with Pandas
- Foundations of Text Analytics


Learning Outcomes

By the end of this concept, you will be able to:

- Understand why NLG is important

- Explain what a chatbot is and describe its different types

- Create a chatbot using NLTK

- Create a chatbot using RASA


# 1. Natural Language Generation

Description: In this chapter, we will learn what Natural Language Generation is and what its architecture looks like.


## 1.1 What is NLG


Over the last few modules, we have talked about how NLP helps machines comprehend human languages.

NLP is broadly defined as a machine's ability to take natural language as input, tokenize it, understand its meaning, and determine an appropriate response. At a very high level, we can split NLP into four subfields:

![](images/NLP_2.PNG)

***
**Speech recognition**

The input phase of NLP.

It covers all the methods for taking speech as input and converting it to text for further processing.

**Natural language understanding (NLU):**

This is a subfield of NLP that deals with machine reading comprehension.

It deals with how best to handle unstructured inputs and convert them into a structured form that a machine can understand and act upon. This sounds easy because we humans effortlessly handle mispronunciations or swapped words, but machines are far less adept at handling such irregular input.

Put simply, this is the task of improving a machine's ability to understand what we say.

Examples include:

- Rerouting an email to the right person based on its content

- Showing restaurant options in response to the search query "I am in the mood to eat Italian food"



**Natural language generation (NLG):**

This is a subfield of NLP that focuses on computer systems that can produce understandable text in human languages. NLG processes turn structured data into text.

If NLU deals with reading text, you can think of NLG as dealing with writing it.

This subfield is going to be the focus throughout this module.


**Speech synthesis:**

The output phase of NLP.

These are the methods that speak or read out text.

***
Most NLP problems fall within a single one of these subfields, though some applications combine several of them.

For instance, speech-to-speech applications (Apple’s Siri, Amazon's Alexa) use all four.


As mentioned before, this module focuses mainly on NLG.

The first question to answer before we proceed is: why do we need NLG?

**Need for NLG**

Humans have always needed data in order to formulate new ideas and communicate them. With the rise of big data, however, there is a huge influx of data that needs to be assessed, along with pressure to significantly reduce the cost of doing so.


The advantage of NLG is its ability to convert a dataset into readable narratives that humans can understand. Where NLU assesses existing text to give us insights, NLG can process statistical data, such as the figures in a spreadsheet, and produce data-rich narratives. With the smart automation of routine tasks that NLG provides, humans can focus on more creative activities.

For example, the Associated Press used NLG's report-generating capability to produce reports from its data. This freed up human reporters who had been spending time and energy going through the data to write each report. As NLG automatically produced thousands of reports, the reporters could be assigned to more critical tasks.

This use case captures the essence of NLG: automating routine tasks on which humans would otherwise spend time and energy.
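To make "turning structured data into text" concrete, here is a minimal template-based sketch. The record, its field names, and the 5% tolerance are all hypothetical values chosen for illustration; real NLG systems are considerably more elaborate.

```python
# Hypothetical monthly weather figures; field names are illustrative only.
record = {
    "month": "March",
    "mean_temp_c": 18.4, "normal_temp_c": 16.9,
    "rain_mm": 42.0, "normal_rain_mm": 61.0,
}


def compare(value, normal, higher_word, lower_word, tolerance=0.05):
    """Pick a phrase describing value relative to normal (tolerance is an assumption)."""
    if value > normal * (1 + tolerance):
        return higher_word
    if value < normal * (1 - tolerance):
        return lower_word
    return "about average"


def describe_month(rec):
    """Fill a fixed sentence template from the structured record."""
    temp = compare(rec["mean_temp_c"], rec["normal_temp_c"],
                   "warmer than average", "cooler than average")
    rain = compare(rec["rain_mm"], rec["normal_rain_mm"],
                   "wetter than average", "drier than average")
    return f"The month of {rec['month']} was {temp} and {rain}."


print(describe_month(record))
```

A fixed template like this is the simplest form of data-to-text generation: the numbers decide which phrases are chosen, and the template decides how they are arranged.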

To truly see NLG's potential, let's look at some exciting applications in which it is used.

**Applications of NLG:**

- Auto Response Aids:

NLG can be employed to produce routine factual sections of a document (which humans find monotonous to write), leaving analytical and explanatory sections to the human author.

Examples include automated personalised replies to customer queries.

For instance, ALETHGEN helps customer-service representatives write response letters to customers.


- Summary generation

Many experimental NLG systems have been developed to summarise data for further use.

Examples include summarising weather data for news articles and giving hospital patients explanations of the information in their patient records.


- Chatbots/ Personal Assistants

Perhaps the most popular use of NLG is in creating chatbots capable of answering queries.

A chatbot is artificial intelligence software that can simulate a conversation (or chat) with a user in natural language through messaging applications, websites, or mobile apps.

For example:

[Wysa](https://www.wysa.io/) is an AI-powered chatbot that acts as your health coach.

[Collect Chat](https://collect.chat/) is another chatbot that aims to replace web forms with an interactive way of collecting data.



Many online services also use chatbots as the first level of customer support.


- Miscellaneous

Many experimental systems have been explored for other practical applications.

Marketing: The DYD system (van Deemter and Odijk) was built to generate descriptions of a music CD, intended to increase interest in and sales of that CD.

Behaviour Change: The STOP system (Reiter, Robertson, and Osman) was built to generate personalised letters that encourage people to stop smoking.

Entertainment: The JAPE system (Binsted and Ritchie) was built to generate jokes.


NLG systems are capable of much more.

To be really successful, though, NLG systems need to embody models of the domain they operate in. For instance, to build a chatbot that supports behaviour change, we need a proper model of how people break addictive habits.

Currently, we do not have precise computational models in many of the areas where NLG could be used. This means that building effective NLG systems is also a research project in the computational modelling of human activities.

Let's now understand in detail what an NLG architecture looks like.

## 1.2 How NLG systems work

We just briefly discussed some of the exciting applications of NLG.

To understand NLG better, let's look broadly at the NLG architecture.

Currently, there is no agreed standard architecture for NLG systems.

We will look in detail at one specific architecture, which decomposes the NLG process into distinct modules and uses one particular set of representations for the information passed between them. It is not the only possible architecture for NLG systems, but it is similar to the architecture of a number of existing NLG systems, and it is broadly compatible with a wide range of work in the field.

The following are, broadly, the processes involved in this NLG architecture (suggested by [Ehud Reiter](https://ehudreiter.com/)):


![](images/nlg_arch.PNG)

Suppose we want to build a weather reporter application using NLG. The following is the ideal weather summary we are aiming for:

`"The month of March was warmer and drier than average, with an average number of rain and sunny days. The total rain for the year so far is just above average. There was rain on every day for five days from the 20th to the 25th, with fog patches on the 23rd and 24th. Rainfall amounts were mostly less, with strong dry winds."`

Let's look at each step in NLG one by one by examining the issues involved in generating the above summary.

**CONTENT DETERMINATION**

This is the task of deciding what information should be communicated in the output document. The choice of content depends on several factors, including:


**Communicative Goals**

The same problem may require different information to be expressed depending on the goal.

For example, if we have an NLG weather system that can both describe the current weather and provide definitions and explanations of meteorological phenomena, different content is required in each case.


**Content Consumer**

The content may also depend on the assumed or known characteristics of the person who is going to consume it.

For example, suppose we build a chatbot for a Q&A site like `stack overflow`. Someone who is a novice in the domain of their query may require more explanatory information than someone who is considered an expert.


**Constraints**

Constraints on the output produced by the machine also play a role in determining the content.

For example, the text produced may have to fit within a limited space.

**Information Source**

The underlying source of information also plays an important role in determining what the content of the text should be.


Even with all the above points, the information to be included in a text is ultimately very application dependent. That is why it is not possible to specify general rules for content determination.

For example, if we create a summary for a transportation logistics company that will look at the data and take appropriate action, the summary should cover how the weather is affecting road and transport services.



***

`"The month of March was warmer and drier than average, with an average number of rain and sunny days. The total rain for the year so far is just above average. There was rain on every day for five days from the 20th to the 25th, with fog patches on the 23rd and 24th. Rainfall amounts were mostly less, with strong dry winds."`


In our summary, the application is a weather report for the general public. We mentioned that the month was drier than average and that there were strong dry winds. This helps the public decide what clothes to buy, how to plan their work commute, and so on.

***
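The factors above (consumer, constraints, information source) can be sketched as a simple filter over candidate facts. This is only an illustration: the facts, audience labels, and importance scores are hypothetical, and a real system would derive them from the underlying data.

```python
# Hypothetical candidate facts extracted from the raw weather data.
facts = [
    {"topic": "temperature", "importance": 3, "audiences": {"public"}},
    {"topic": "rainfall", "importance": 3, "audiences": {"public", "logistics"}},
    {"topic": "wind", "importance": 2, "audiences": {"public", "logistics"}},
    {"topic": "road_closures", "importance": 3, "audiences": {"logistics"}},
    {"topic": "sensor_drift", "importance": 1, "audiences": {"maintenance"}},
]


def determine_content(facts, audience, max_items):
    """Keep facts relevant to the audience, most important first, within a space limit."""
    relevant = [fact for fact in facts if audience in fact["audiences"]]
    # sorted() is stable, so facts with equal importance keep their original order.
    ranked = sorted(relevant, key=lambda fact: fact["importance"], reverse=True)
    return ranked[:max_items]


for audience in ("public", "logistics"):
    chosen = determine_content(facts, audience, max_items=3)
    print(audience, [fact["topic"] for fact in chosen])
```

The same data yields different content for the general public and for a logistics company, and `max_items` stands in for a space constraint on the output.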
**DOCUMENT STRUCTURING**

Next is document structuring, which is concerned with imposing order and structure on the information to be conveyed. A text is not just a randomly ordered collection of pieces of information.

Our summary would be incomprehensible if it were ordered randomly.

For eg: `"There was rain on every day for five days from the 25th to the 20th March, with fog patches on the 23rd and 24th.Rainfall with strong dry winds amounts were mostly less, the total rain for the year so far is just above average. The month March of was warmer and drier than average, with an average number of rain and sunny days.  "`

Consider another example, a news story on road pollution. You don't expect a story in which the sentences are arbitrarily reordered, as in the following:

Original News:

`
" The Welsh Government told Caerphilly council their decision does not meet legal requirements and they should buy homes on Woodside Terrace in Crumlin to be within clean air guidelines quicker.

Nitrogen dioxide levels on the street breach world health and EU rules. But councillors said greener vehicles would reduce pollution levels by 2025. Environment minister Lesley Griffiths will write to Caerphilly council bosses "reminding them of the legal requirements" set by the High Court and said "cost cannot be a consideration". 


Homeowners actually wanted the council to use compulsory purchase powers to buy their terraced houses on the busy Hafodyrynys Road, which has an estimated 21,000 vehicle movements a day. "

`

Reordered News:

`
"
But councillors said greener vehicles would reduce pollution levels by 2025.


Environment minister Lesley Griffiths will write to Caerphilly council bosses "reminding them of the legal requirements" set by the High Court and said "cost cannot be a consideration". 
Homeowners wanted the council to use compulsory purchase powers to buy their terraced houses on the busy Hafodyrynys Road, which has an estimated 21,000 vehicle movements a day. 

The Welsh Government told Caerphilly council their decision does not meet legal requirements and they should buy homes on Woodside Terrace in Crumlin to be within clean air guidelines quicker.
Nitrogen dioxide levels on the street breach world health and EU rules.
"
`

*Source:* [BBC News](https://www.bbc.com/news/uk-wales-47709095)

Readers also have genre-dependent expectations about the structure of texts, and meeting these expectations makes a text easier to read.

For example, business news will contain more numbers and statistics than, say, a news item on the opening of a museum in the city.


There is therefore an underlying structure to the presentation of information in a text.

So this step involves deciding how chunks of content should be grouped and how these chunks relate to one another.

The text can be grouped based on certain criteria:

- Grouped according to information

Text may be grouped for presentation in terms of what the information is about.

For example, in the case of weather reporting summaries, all the information about rainfall may be collected in one paragraph and presented before another paragraph that presents all the information about the temperature.

- Grouped according to discourse relations

Information may be presented in a way that indicates particular discourse relations between the text elements.

For example, the weather summary might start with a description of the month's rainfall in general terms and then go on to describe the rainfall in detail. The second part of the text is therefore related to the first by the discourse relation of Elaboration.


*** 
`"The month of March was warmer and drier than average, with an average number of rain and sunny days. The total rain for the year so far is just above average. There was rain on every day for five days from the 20th to the 25th, with fog patches on the 23rd and 24th. Rainfall amounts were mostly less, with strong dry winds."`

In our summary, the rainy spell is described first and then elaborated with information about fog and a description of the rainfall quantity. This ordering is decided in this step.
***
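Both grouping criteria can be sketched as a sort over messages. The message fields (`topic`, `role`) and the two ordering lists are assumptions made for this illustration, not part of any standard representation.

```python
# Hypothetical messages chosen by content determination, in arbitrary order.
messages = [
    {"topic": "rainfall", "role": "elaboration", "text": "fog patches on the 23rd and 24th"},
    {"topic": "temperature", "role": "overview", "text": "warmer than average"},
    {"topic": "rainfall", "role": "overview", "text": "rain from the 20th to the 25th"},
    {"topic": "rainfall", "role": "elaboration", "text": "rainfall amounts were low"},
]

TOPIC_ORDER = ["temperature", "rainfall"]  # group by what the information is about
ROLE_ORDER = ["overview", "elaboration"]   # general statement first, details after


def structure(messages):
    """Order messages by topic, then place overviews before their elaborations."""
    return sorted(
        messages,
        key=lambda m: (TOPIC_ORDER.index(m["topic"]), ROLE_ORDER.index(m["role"])),
    )


for message in structure(messages):
    print(message["topic"], "|", message["role"], "|", message["text"])
```

Sorting on the pair (topic, role) groups related information together and keeps each elaboration next to the statement it elaborates.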


**LEXICALISATION**

This is the task of deciding which specific words should be used to express the content selected by the earlier `content determination` component.


For example: The same information about football can be expressed in two ways:

- Ronaldo is a football player.

- Football is played by Ronaldo.

You can see that lexicalisation is a problem because there are numerous ways in which a piece of information can be expressed. The choice again depends on the communicative goals.


For example, if you want to create an automatic tweet bot for Twitter, you have to choose words such that the information is conveyed within the platform's character limit (originally 140 characters).

Lexicalisation becomes even more difficult in multilingual systems, because the same information must be expressed in different languages.

*** 
`"The month of March was warmer and drier than average, with an average number of rain and sunny days. The total rain for the year so far is just above average. There was rain on every day for five days from the 20th to the 25th, with fog patches on the 23rd and 24th. Rainfall amounts were mostly less, with strong dry winds."`


Consider the phrase `There was rain on every day for five days from the 20th to the 25th`. The time period in this phrase can be expressed in many ways, including:

- There was rain on the 20th, 21st, 22nd, 23rd, 24th and 25th.
- There was rain on every day for five days from the 20th.
- Starting from the 20th March, there was rain on every day for five days till the 25th.
- There was rain on every day for five days from the 20th to the 25th.

The first option, listing all the dates, could be convenient for a data-processing use case but is not so useful in most others. In our summary, the lexicaliser went with the last one because our report is for the general public. We could also have gone with the third option if we wanted to emphasise to the reader that there was an unusually long rain spell.

***
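As a rough sketch of this choice, the function below expresses the same rain spell in two of the ways listed above. The `style` names are hypothetical labels for this illustration, and the code assumes the spell covers at least two days.

```python
def ordinal(day):
    """Return the day number with its English ordinal suffix, e.g. 21 -> '21st'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def lexicalise_rain_spell(start, end, style):
    """Express the same rain spell in different words (assumes end > start)."""
    if style == "enumerate":
        names = [ordinal(day) for day in range(start, end + 1)]
        return "There was rain on the " + ", ".join(names[:-1]) + " and " + names[-1] + "."
    if style == "range":
        return f"There was rain on every day from the {ordinal(start)} to the {ordinal(end)}."
    raise ValueError(f"unknown style: {style}")


print(lexicalise_rain_spell(20, 25, "enumerate"))
print(lexicalise_rain_spell(20, 25, "range"))
```

The underlying content (a start day and an end day) is identical in both calls; only the wording differs, which is exactly the decision the lexicaliser makes.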



**REFERRING EXPRESSION GENERATION**

This step is concerned with how we produce a description of an entity that enables the hearer to identify that entity in a given context.

In other words, it deals with deciding what expressions should be used to refer to entities.

This is difficult because the same entity can be referred to in multiple ways.
There are also different approaches for when an entity is mentioned for the first time and for when it is subsequently referred to.

When introducing, say, Geoffrey Hinton into a conversation, we may introduce him as a professor with a PhD, a Google engineer, or perhaps as the 'Godfather of AI'.

With subsequent references, the problem is to distinguish the intended entity from other entities it might be confused with.

For example, we could refer to Geoffrey Hinton by means of any of the following expressions, depending on the context of use:

- Geoffrey

- Hinton

- He

- Professor

Choosing among the above referring expressions usually takes contextual factors into account.

***
`"The month of March was warmer and drier than average, with an average number of rain and sunny days. The total rain for the year so far is just above average. There was rain on every day for five days from the 20th to the 25th, with fog patches on the 23rd and 24th. Rainfall amounts were mostly less, with strong dry winds."`

In our summary, we could have referred to the month of March using any of the following expressions:

- March 2018
- March
- Month of March
- Last month

We went with the third option for our first reference to the month. Note that in the subsequent references to dates (20th to the 25th, 23rd and 24th), we have not explicitly mentioned the month, because the general reader can infer it from context.


***
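The difference between a first mention and later mentions can be sketched with a small amount of state. The entity record and its `full` and `short` fields are hypothetical; real systems also weigh other entities in the context before choosing an expression.

```python
def refer(entity, already_mentioned):
    """Use the full description on first mention and a shorter form afterwards."""
    if entity["id"] in already_mentioned:
        return entity["short"]
    already_mentioned.add(entity["id"])
    return entity["full"]


# Hypothetical entity record for illustration.
hinton = {"id": "hinton", "full": "Professor Geoffrey Hinton", "short": "Hinton"}

mentioned = set()
first = refer(hinton, mentioned)
second = refer(hinton, mentioned)
print(f"{first} gave a talk. Afterwards, {second} took questions.")
```

Tracking what has already been mentioned is the minimum context such a component needs; choosing between a name, a title, or a pronoun would require richer context than this sketch keeps.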
**AGGREGATION** 

This is the task of deciding how the structures created by document structuring should be mapped to sentences and paragraphs.

Think of it as an extension or refinement of the document structuring phase.

For example, consider the two sentences:

- Football player Ronaldo scored 3 goals

- Ronaldo was awarded the 'man of the match' for his performance

The fluency of the above text can be improved in the following ways:

- Football player Ronaldo scored 3 goals and was awarded the 'man of the match' for his performance.

- Football player Ronaldo was awarded the 'man of the match' in a game where he scored 3 goals.

In the above case, sentence aggregation has been performed.

This step might feel redundant, but natural languages offer many different ways of expressing the same information. As the previous examples show, appropriate use of aggregation can enhance the fluency and readability of a text.


***

`"The month of March was warmer and drier than average, with an average number of rain and sunny days. The total rain for the year so far is just above average. There was rain on every day for five days from the 20th to the 25th, with fog patches on the 23rd and 24th. Rainfall amounts were mostly less, with strong dry winds."`


In our summary, we started with two statements:

- The month was warmer than average.
- The month was drier than average.


We combined them in the aggregation step as 'The month of March was warmer and drier than average'.

***
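A minimal sketch of this kind of aggregation is shown below. It assumes each message is a hypothetical `(subject, adjective, comparison)` triple and merges neighbouring messages that share the same subject and comparison.

```python
def aggregate(messages):
    """Merge consecutive messages that share a subject and a comparison phrase."""
    groups = []
    for subject, adjective, comparison in messages:
        last = groups[-1] if groups else None
        if last and last["subject"] == subject and last["comparison"] == comparison:
            last["adjectives"].append(adjective)
        else:
            groups.append(
                {"subject": subject, "adjectives": [adjective], "comparison": comparison}
            )
    return groups


def to_sentence(group):
    """Join the collected adjectives with commas and a final 'and'."""
    adjectives = group["adjectives"]
    if len(adjectives) == 1:
        joined = adjectives[0]
    else:
        joined = ", ".join(adjectives[:-1]) + " and " + adjectives[-1]
    return f"{group['subject']} was {joined} {group['comparison']}."


messages = [
    ("The month of March", "warmer", "than average"),
    ("The month of March", "drier", "than average"),
]
print([to_sentence(group) for group in aggregate(messages)])
```

Two separate statements become one sentence because their shared parts are stated once and only the differing adjectives are coordinated.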

**LINGUISTIC REALISATION**

This is the task of converting abstract representations of sentences into real text. It is the final refinement step, ensuring that sentences are grammatically and morphologically correct.

The advantage of handling this in a separate step is that detailed grammatical knowledge is encapsulated in a single component.



***
`"The month of March was warmer and drier than average, with an average number of rain and sunny days. The total rain for the year so far is just above average. There was rain on every day for five days from the 20th to the 25th, with fog patches on the 23rd and 24th. Rainfall amounts were mostly less, with strong dry winds."`


In our summary, suppose the rainy spell is realised as the phrase `It rained for five days from the 20th to the 25th`.

Here, this step is responsible for adding the words 'for', 'from' and 'to' to mark the parts of the sentence that specify the duration, beginning, and end of the rain spell. The morphological component produces the past tense form 'rained' from the root form 'rain'.
***
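The two jobs described here, inserting function words and applying morphology, can be sketched as follows. The abstract specification format and the tiny irregular-verb table are assumptions for illustration; a real realiser relies on a full grammar and lexicon.

```python
# A tiny, incomplete table of irregular verbs, for illustration only.
IRREGULAR_PAST = {"be": "was", "have": "had", "fall": "fell"}


def past_tense(verb):
    """Very rough English past-tense morphology (ignores consonant doubling etc.)."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"


def realise_rain_spell(spec):
    """Turn an abstract specification into a sentence, adding 'for', 'from' and 'to'."""
    verb = past_tense(spec["verb"])
    return (
        f"It {verb} for {spec['duration']} days "
        f"from the {spec['start']} to the {spec['end']}."
    )


# Hypothetical abstract representation of the rainy spell.
spec = {"verb": "rain", "duration": "five", "start": "20th", "end": "25th"}
print(realise_rain_spell(spec))
```

Keeping `past_tense` separate from the sentence builder mirrors the idea that grammatical knowledge lives in one place rather than being scattered across the system.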

**STRUCTURE REALISATION**

The final step is the task of converting the paragraphs and sections of text into the format understood by the document viewer component.

For example, consider the output of our summary as the following:

***
# Weather Summary

`"The month of March was warmer and drier than average, with an average number of rain and sunny days. The total rain for the year so far is just above average. There was rain on every day for five days from the 20th to the 25th, with fog patches on the 23rd and 24th. Rainfall amounts were mostly less, with strong dry winds."`

***


This step involves adding the appropriate markup symbols to the sentences, for instance:
***
`"# Weather Summary\n\n"`

`"The month of March was warmer and drier than average, with an average number of rain and sunny days. The total rain for the year so far is just above average. There was rain on every day for five days from the 20th to the 25th, with fog patches on the 23rd and 24th. Rainfall amounts were mostly less, with strong dry winds."`
***
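As a sketch of structure realisation, the same title and paragraphs can be wrapped in whichever markup the viewer expects. The two target formats chosen here (Markdown and HTML) and the function names are assumptions for illustration.

```python
import html


def to_markdown(title, paragraphs):
    """Render a title and paragraphs using Markdown markup."""
    return f"# {title}\n\n" + "\n\n".join(paragraphs) + "\n"


def to_html(title, paragraphs):
    """Render the same content as HTML, escaping special characters."""
    body = "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return f"<h1>{html.escape(title)}</h1>{body}"


paragraphs = ["The month of March was warmer and drier than average."]
print(to_markdown("Weather Summary", paragraphs))
print(to_html("Weather Summary", paragraphs))
```

The sentences themselves are untouched; only the surrounding markup changes with the target format, which is why this step can be kept separate from linguistic realisation.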



The above seven steps are, more or less, how NLG systems work when solving a specific problem. In some cases, one or two steps may be omitted or a few steps combined. The architecture depends on the use case.

**Application Specificity**

As with all types of machine automation, NLG should be applied so as to allow humans and computers to complement each other by each doing what they are best at. 

A spell-checking program excels at spotting general misspellings, yet becomes a hindrance in documents containing many proper names. Similarly, an NLG system will be a hindrance if it tries to solve all of the problems all of the time. As mentioned previously, both the application use case and the intended audience play a major role in the success of NLG applications. In fact, NLG systems have been most successful when tailor-made for a specific use case.




With the understanding of how a basic NLG system works, let's now try to learn about one of the most popular NLG applications i.e. a chatbot.



# 2. Chatbots

Description: In this chapter, we will learn what a chatbot is, along with its different types.

## 2.1 What is a chatbot?

**Rise of chatbots**        
        
Today, many major companies have chatbots deployed to engage with their users. This sudden rise is no fluke.

We live in times where customers are tech-savvy and more connected to online social apps than ever. An interesting recent statistic suggests that people now use messenger apps more than social network apps.

![](images/chatbot.PNG)


**What is a chatbot?**

A chatbot is an application (web or mobile) that tries to understand a consumer’s queries and then assists them in performing the required task.

Popular uses of chatbots include:

- Acting as customer support
- Providing a medium to connect customers with their finances
- Showing booking information (flights, movies, etc.)

You can check out the following sites for a comprehensive list of chatbots available in the market:

- [There is a bot for that](https://thereisabotforthat.com/)
- [Chatbottle](https://chatbottle.co/)
                  
        



Chatbots can be broadly divided into two categories:

- Rule-Based Approach

The bot answers queries based on a set of specified rules.

Every command that the bot answers must be written separately using regular expressions or other forms of string analysis.

An example of a rule-based chatbot is a basic text messaging application like the ones some shopping companies use to send coupon codes or offers. You might have received a text message that says “Summer Sale! Text SUMMER to 54321 to get 10% off coupon code.” When you send the word “SUMMER”, the application recognises it as a command and simply follows a rule that tells it to send back the coupon code.

Though effective, this approach is too naive for more complex situations.

If you sent “I want the coupon” to the same chatbot, it would not understand what you wanted. If the user asks a question without using a recognised keyword, the bot cannot understand it and typically responds with a message like “Sorry, I didn't understand” or "Sorry. Invalid text".
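The coupon example can be sketched in a few lines. The rule table and the coupon code `SUMMER10` are made up for illustration; the point is that anything outside the exact keyword falls through to the fallback message.

```python
import re

# Hypothetical rules: each keyword pattern maps to one canned reply.
RULES = [
    (re.compile(r"summer", re.IGNORECASE), "Here is your 10% off coupon code: SUMMER10"),
]
FALLBACK = "Sorry, I didn't understand"


def reply(message):
    """Return the reply of the first rule whose keyword is the entire message."""
    text = message.strip()
    for pattern, response in RULES:
        if pattern.fullmatch(text):
            return response
    return FALLBACK


print(reply("SUMMER"))
print(reply("I want the coupon"))
```

The second message expresses the same intent as the first, yet the bot cannot handle it because no rule was written for that wording.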

- Self-Learning Approach

These are bots that use machine learning techniques. Instead of returning pre-defined answers tied to exact commands, the bot responds with suitable suggestions on the topic.

A self-learning chatbot is considered intelligent because it responds by collecting information from the user and presenting it in the way most appropriate to serve the user’s purpose.

The self-learning approach is itself divided into two categories:
    
#### 1. Retrieval Based

A retrieval-based chatbot picks its responses from a set of predefined responses, using a heuristic to select an appropriate one based on the input and context. This heuristic can be something very simple (a rule-based expression match) or complex (an ensemble of machine learning classifiers), depending on the situation.

A retrieval-based bot is trained to rank the best response from a finite set of predefined responses. The responses are either entered manually or drawn from a knowledge base of pre-existing information.


For example:

Q: What is the duration of the Jet Airways flight from Mumbai to New York?

A: 16 hours

These kinds of systems can also be extended to integrate with third-party systems.

For example:

Q: Where is my flight arriving?

A: It’s on schedule and should land in 15 mins.


Retrieval-based bots are the most common type of chatbot in the market today. One reason is that they allow developers and UX designers to control the experience and match it to the expectations of their customers.

They provide very good results when used for goal-oriented tasks like customer support, lead generation, and feedback collection.
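A very simple ranking heuristic can be sketched with word overlap. Jaccard similarity and the 0.3 threshold are choices made for this illustration, not the method of any particular product; production bots usually rely on trained classifiers or embeddings instead.

```python
import re

# Hypothetical knowledge base of (known question, predefined answer) pairs.
KNOWLEDGE_BASE = [
    ("what is the duration of the flight from mumbai to new york", "16 hours"),
    ("where is my flight arriving", "It's on schedule and should land in 15 mins."),
]
FALLBACK = "Sorry, I don't have an answer for that."


def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of words the two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, threshold=0.3):
    """Return the answer whose known question overlaps most with the query."""
    query_tokens = tokens(query)
    score, answer = max(
        (jaccard(query_tokens, tokens(question)), answer)
        for question, answer in KNOWLEDGE_BASE
    )
    return answer if score >= threshold else FALLBACK


print(retrieve("How long is the flight from Mumbai to New York?"))
print(retrieve("Tell me a joke"))
```

Every possible reply is written in advance, which is what gives developers control over the experience; the bot only decides which stored reply fits best, or admits it has none.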
    
#### 2. Generative

A generative chatbot doesn’t use any sort of predefined repository. It learns from historical conversations to generate new replies for the user. Language models are used to generate new content, and they do not rely on a manually managed set of responses.


These bots generate answers on their own, so there is no finite set of outputs.
To do so, they typically use a sequence-to-sequence (seq2seq) model, which encodes the sequence of previous messages as context and then generates a reply.

For example: Consider the following simple conversation:

![](images/seq.PNG)


You can see how every new sequence becomes part of the updated dataset, i.e. sequences based on the conversation are built in real time.
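One common way to prepare such sequences is to pair each reply with the turns that preceded it. The sketch below only builds these (context, reply) pairs; it is not a seq2seq model, and the `<sep>` marker and `max_context` window are assumptions for illustration.

```python
def build_training_pairs(turns, max_context=3):
    """Pair every turn with up to max_context preceding turns as its input context."""
    pairs = []
    for i in range(1, len(turns)):
        context = turns[max(0, i - max_context):i]
        pairs.append((" <sep> ".join(context), turns[i]))
    return pairs


# Hypothetical conversation, alternating between user and bot.
conversation = [
    "Hi",
    "Hello, how can I help?",
    "I want to book a flight",
    "Sure, where to?",
]

for source, target in build_training_pairs(conversation):
    print(repr(source), "->", repr(target))
```

Each new turn extends the context for the next pair, which is how the growing conversation feeds back into the data a generative model would be trained on.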

Generative chatbots have been a major area of research because of their potential to come closest to human conversation. So far, though, they haven't been as popular in the market as retrieval models.

You can go through this video on [Conversational AI at scale](https://www.youtube.com/watch?v=I9Em87eXRvE) (up to the 23:00 mark) to get a better understanding of the two kinds of self-learning chatbot.

Let's now look at both of the self-learning types in a little more detail.

## 2.2 Retrieval Based Chatbot

As the name suggests, a retrieval-based chatbot retrieves its responses from a set of predefined responses, using a heuristic to pick an appropriate one based on the input and context.

The following is a simple flow of its architecture:

![](images/ret.PNG)



Its advantages are:

- Easier to control conversations (very useful when dealing with a specific domain)

- Less data required (precision is high even with little data)

- Timely retrieval (current chatbots reply in under 2 seconds)

- Training and retraining are easy and cost-effective

Unfortunately, it also has some major disadvantages:

- Requires a lot of manual entry of predefined data

- Rigid and unable to handle surprising or unexpected queries

- Not scalable to open domains (domains where the user can take the conversation anywhere)


Let's try to implement a simple retrieval-based system using NLTK.

**Retrieval Based Chatbot in Python**

We will use NLTK to implement it, in particular the `Chat` class and the `reflections` dictionary from `nltk.chat.util`.

***Chat***

`nltk.chat` is an NLTK module that simplifies building chatbot engines by providing a generic framework.

***Reflections***

It is a dictionary that maps input phrases to their corresponding output phrases (for example, first-person forms to second-person forms), so that the bot can reflect the user's words back in its reply.

```python

reflections = {
  "i am"       : "you are",
  "i was"      : "you were",
  "i"          : "you",
  "i'm"        : "you are",
  "i'd"        : "you would",
  "i've"       : "you have",
  "i'll"       : "you will",
  "my"         : "your",
  "you are"    : "I am",
  "you were"   : "I was",
  "you've"     : "I have",
  "you'll"     : "I will",
  "your"       : "my",
  "yours"      : "mine",
  "you"        : "me",
  "me"         : "you"
}

```

You can also create custom reflections and pass them to `Chat`.

Let's now look at how it works.

```python
from nltk.chat.util import Chat, reflections

# Each pair is [regex pattern, list of candidate replies].
# Chat tries the patterns in order and replies using the first one that matches;
# %1 in a reply is replaced by the text captured by the first group.
pairs = [
    [
        r"what is your name ?",
        ["My name is Botty.",]
    ],
    [
        r"how are you ?",
        ["I'm fine\nWhat about you ?",]
    ],
    [
        r"my name is (.*)",
        ["Hello %1, How are you today ?",]
    ],
    [
        # No trailing space required, so a plain "sorry" also matches
        r"(.*)sorry(.*)",
        ["It's alright", "It's OK",]
    ],
    [
        r"hi|hey|hello",
        ["Hola", "Hey", "Hi"]
    ],
    [
        # \?? matches an optional literal question mark
        r"(.*) age\??",
        ["I'm a computer program \nTime doesn't affect me!",]
    ],
    [
        r"(.*) created (.*) ?",
        ["God created me using Python's NLTK library ",]
    ],
    [
        r"(.*) (location|city|live|place) ?",
        ["In your mind",]
    ],
    [
        r"how (.*) health(.*)",
        ["I'm a computer program, so I'm always healthy except for the occasional bugs ;)",]
    ],
    [
        r"(.*) (sports|game) ?",
        ["I'm a very big fan of football",]
    ],
    [
        r"any favourite player ?",
        ["Ronaldo. Just kidding, it's Messi",]
    ],
    [
        r"(bye|see ya|goodbye)",
        ["Talk to you soon :) ", "Finally. Good Bye"]
    ],
]

# Create the chatbot from the pairs defined above (and the default reflections)
chat = Chat(pairs, reflections)

# Default message shown at the start
print("Hi, I'm Botty and we can chat if you want\nPlease type commands in lowercase English. Type quit to leave ")

# Start the conversation loop; typing "quit" ends it
chat.converse()

```
The following is a sample chat:
![](images/convo_2.PNG)    



Why don't you try the same in [Google Colab](https://colab.research.google.com) or on your local system?



Consider another sample chat with the same chatbot:

![](images/convo_3.PNG)


You can see that, despite being able to manage conversations using regular expressions, it fails when it encounters unexpected queries.

Let's now move on to generative chatbots, which help to overcome this problem.

## 2.3 Generative Chatbot

In theory, generative models, which can produce responses they have never seen before, sound more flexible because they don't need a predefined set of responses. The problem is that generative models don’t work well in practice (we will talk more about this later).

Generative chatbots are the chatbots that come closest to resembling human interactions (at least on paper) because of their ability to generate new responses.


The following is a sample high-level architecture of a generative chatbot:

![](images/chatbot_flow.jpeg)


*Source:[Verloop](https://blog.verloop.io/chatbot-18-how-do-chatbots-work/)*


It has the following advantages:

- It takes into account the previous context before generating a reply.

- Can work on unseen data

It also has the following disadvantages:

- Requires a 'lot' of training data

- Both building and subsequent retraining are time-consuming

- It is difficult to keep the bot within a closed domain (you can't manually control the conversation flow; the model replies based on patterns learned from previous data)


Let's try to understand it better with an example.

Suppose you want to buy shoes on an apparel website, and you encounter a chatbot there.

The chatbot helps you through the following process:


- You find a shoe model you like and send the bot a message saying you want this shoe.

- Using NLP in the backend, the machine converts this plain-text request into codified commands that it can understand.

- The chatbot then passes the command into a decision engine (data layer). In the decision engine, the bot works out which criteria must be met before it can exit the engine.

- Suppose that, in this case, the exit criterion is knowing how many pairs of shoes you want. Using NLG, the bot generates a sentence asking for this.

- The sentence goes back to the messaging backend and is presented to you as a question. You reply that you want 1 pair, and your answer goes back through NLP into the decision engine.

- The bot then analyzes pre-fed data about the product, the stores that carry it, and their proximity to your location. It identifies the closest store that has the product in stock and tells you the price.

- After you confirm the order, it directs you to the payment portal and, once it receives payment confirmation from the gateway, places the order.
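
The loop described above (understand the message, check what the decision engine still needs, generate the next question) can be sketched as follows. This is a toy illustration only: the keyword-based `understand` function, the `REQUIRED_SLOTS` list and the question templates are assumptions made for the example and do not come from any real chatbot framework.

```python
import re

# Slots the hypothetical decision engine needs before it can place an order.
REQUIRED_SLOTS = ["product", "quantity"]

# Template-based NLG: one question per missing slot.
QUESTIONS = {
    "product": "Which shoe would you like?",
    "quantity": "How many pairs would you like?",
}


def understand(message):
    """Toy NLU step: pull a product mention and a quantity out of plain text."""
    slots = {}
    if "shoe" in message.lower():
        slots["product"] = "shoe"
    number = re.search(r"\d+", message)
    if number:
        slots["quantity"] = int(number.group())
    return slots


def next_reply(state, message):
    """Update the order state, then ask for the first slot that is still missing."""
    state.update(understand(message))
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return QUESTIONS[slot]
    return "Order ready: {} x {}".format(state["quantity"], state["product"])


state = {}
print(next_reply(state, "I want this shoe"))   # asks for the quantity
print(next_reply(state, "1 pair please"))      # all slots filled
```

The store lookup and payment steps are left out; they would be further checks in the same decision loop.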




The above type of chatbot would work brilliantly if human interactions were simple. Unfortunately, that is not the case. The most glaring difference between state-of-the-art chatbots and humans today is empathy: chatbots are not as adept as humans at picking up conversational undertones.

Take Siri (Apple's voice assistant) for example: if you ask her a question about cholesterol, her response is “here is what I found on cholesterol…” and she goes on to provide facts. Wouldn't 'Are you OK?' be a more empathetic response?


Humanizing chatbots in this way is a major area of research because, regardless of industry, conversational marketing and support are the future.

We won't be creating a generative chatbot, as it is outside the scope of this syllabus.

# 3. RASA 

Description: In this chapter, you will learn what RASA is and how to create a chatbot using its open source library


## What is RASA?

To build an advanced chatbot, we are going to use an open-source framework called [RASA](https://rasa.com/)


Rasa is a set of open-source machine learning tools that developers can use to create contextual AI assistants and chatbots.

You should go through this [video](https://www.youtube.com/watch?v=0hZay4KSLFw) (from the 8:00 to the 21:00 mark) to get a high-level understanding of RASA.

Rasa has two tools which make it powerful:

- Rasa NLU:

Rasa NLU is a tool for intent classification and entity extraction in chatbots. 

For example, taking a sentence like:

"I am looking for a Mexican restaurant in the center of town" and returning structured data like:

{
  "intent": "search_restaurant",
  "entities": {
    "cuisine" : "Mexican",
    "location" : "center"
  }
}
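
Once the text has been reduced to this structure, ordinary code can act on it. The following is a minimal sketch that assumes the result arrives as the simplified JSON shown above (the exact shape of real Rasa NLU output differs from this simplified form); `describe` is a hypothetical helper, not part of Rasa.

```python
import json

# The structured result from the example above, as a JSON string.
raw = """
{
  "intent": "search_restaurant",
  "entities": {
    "cuisine": "Mexican",
    "location": "center"
  }
}
"""

parsed = json.loads(raw)


def describe(result):
    """Turn the structured result into a backend-style query string."""
    entities = result["entities"]
    return "{}(cuisine={}, location={})".format(
        result["intent"], entities["cuisine"], entities["location"]
    )


print(describe(parsed))
```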




- Rasa Core: 

Rasa Core is an open-source chatbot framework for handling contextual conversations. It moves beyond simple question-answer pairs and rigid decision trees. Its dialogue management is based on ML, which allows for smarter conversations and makes scaling easier.

RASA is one of the most popular tools on the market for building `'retrieval chatbots'`.

Before we start exploring Chatbot with Rasa, let's familiarise ourselves with certain keywords.

**Intent:** 
The intent is the aim or target of the user input. If the user says, “Which day is today?”, the intent is to find the day of the week.

Rasa uses the concept of intents to describe how user messages should be categorized. 

**Entity:** 

Understanding the user’s intent is only part of the problem. It is also important to extract relevant information from the user’s message, such as dates and places. Each required piece of information is called an entity, and the process of extracting entities is called entity recognition.

Consider the previous example again: we understood that the aim is to find the day of the week, but for which date? “Today” is the entity that tells us.


**Actions:**

An action is any operation performed by the bot. It could be sending a reply or querying a database. 

**Stories:**

Stories are sample interactions between the user and the bot. They are defined in terms of the intents captured and the actions performed. 

For example, if the user's intent is to find the day of the week and the entity is “today”, the story is to look up the day of the week for today and reply with it.


Following is the high level architecture RASA follows:

![](images/rasa_arch.png)


The steps are:

- The message is received and passed to an Interpreter, which converts it into a dictionary including the original text, the intent, and any entities that were found.

- The Tracker is the object which keeps track of conversation state. It receives the info that a new message has come in.

- The policy receives the current state of the tracker.

- The policy chooses which action to take next.

- The chosen action is logged by the tracker.

- A response is sent to the user.
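
The six steps above can be imitated with a few plain Python objects. The sketch below is a toy model of the flow, not Rasa's implementation: the `Tracker` class, the prefix-based `interpret` function and the one-line `policy` are all simplifying assumptions.

```python
class Tracker:
    """Keeps the conversation state: every message and action seen so far."""

    def __init__(self):
        self.events = []
        self.latest_intent = None

    def log(self, kind, value):
        self.events.append((kind, value))


def interpret(text):
    """Toy interpreter: returns the text, an intent and any entities found."""
    is_greeting = text.lower().startswith(("hi", "hello", "hey"))
    intent = "greet" if is_greeting else "unknown"
    return {"text": text, "intent": intent, "entities": []}


def policy(tracker):
    """Toy policy: choose the next action from the current tracker state."""
    return "utter_greet" if tracker.latest_intent == "greet" else "utter_unclear"


RESPONSES = {
    "utter_greet": "Hey! How are you?",
    "utter_unclear": "I am not sure what you are aiming for.",
}


def handle_message(tracker, text):
    parsed = interpret(text)                  # interpreter parses the message
    tracker.latest_intent = parsed["intent"]
    tracker.log("user", parsed)               # tracker learns about the new message
    action = policy(tracker)                  # policy reads the state, picks an action
    tracker.log("action", action)             # chosen action is logged
    return RESPONSES[action]                  # response goes back to the user


tracker = Tracker()
print(handle_message(tracker, "hello there"))
print(handle_message(tracker, "book me a flight"))
```

In Rasa, the interpreter and the policy are trained models; here they are hard-coded so that the flow of data stays visible.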


RASA has the following advantages, which make it a suitable choice:

- It can learn the 'intent' of the user
- It is free to use
- It provides an efficient learning mechanism
- It interacts intelligently and performs actions when the user asks for them

Too much theory, right?!
Let's get our hands dirty now and see how we can make a chatbot using RASA.

# Building a chatbot with RASA


We will make a simple RASA chatbot that will fetch us `data science` links.



For a better understanding, you can check out the following code walkthroughs:

- [Link 1](https://www.youtube.com/watch?v=15J68Apk3go)

- [Link 2](https://www.youtube.com/watch?v=XM6kj7eHrag)

- [Link 3](https://rasa.com/docs/nlu/0.12.0/tutorial/)



**NOTE:** This is a fairly heavy topic, so you are advised to go through it a couple of times to really understand it.

Let's start:

***Step 1: Load the intent/entities to create a model***

Before we implement the code, there are two files you need to familiarise yourself with:


***nlu.md (NLU file)*** - It contains examples of user input, each mapped to a suitable intent, with the entities present in each example marked up.

The more varied the examples are, the better the bot’s NLU capabilities.

File content sample:

```python
## intent:greet
- hey
- hello there
- good morning
- good evening
- hey there
- good afternoon

## intent:goodbye
- bye
- goodbye
- have a nice day
- see you around
- see you later


## intent:mood_curious
- I want to learn
- I am curious
- I want to learn DS
- I want to learn [NLP](group:B)
- I want to learn [CV](group:A)
- [CV](group:A)
- [NLP](group:B)
- Data Science
- curious
- wanna learn
- wanna learn DS
- wanna learn [NLP](group:B)

```
You can see that in the above file we have defined a specific intent called 'mood_curious'. It lists the sentences that tell the bot that the user's intent is to learn.

You will also see 'group:A' and 'group:B' attached to CV and NLP respectively; we will use these later to retrieve the links.
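
To make the file format concrete, the sketch below reads a shortened copy of the file with plain regular expressions. It is only an illustration of the markup, not how Rasa loads training data, and it assumes that `[NLP](group:B)` should be read as the text "NLP" carrying an entity named `group` with the value `B`. The names `parse_nlu` and `SAMPLE` are hypothetical.

```python
import re

SAMPLE = """\
## intent:greet
- hey
- hello there

## intent:mood_curious
- I want to learn [NLP](group:B)
- [CV](group:A)
"""

# Matches annotations of the form [text](entity:value).
ENTITY = re.compile(r"\[([^\]]+)\]\(([^:)]+):([^)]+)\)")


def parse_nlu(markdown):
    """Return a list of (intent, plain_text, entities) training examples."""
    examples, intent = [], None
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("## intent:"):
            intent = line[len("## intent:"):]
        elif line.startswith("- ") and intent:
            raw = line[2:]
            entities = [
                {"text": m.group(1), "entity": m.group(2), "value": m.group(3)}
                for m in ENTITY.finditer(raw)
            ]
            # Strip the annotation markup, keeping only the visible text.
            text = ENTITY.sub(r"\1", raw)
            examples.append((intent, text, entities))
    return examples


for example in parse_nlu(SAMPLE):
    print(example)
```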
 
***config.yml(Config file)*** - This is a file that defines the pipeline to use for training the NLU model.

File content sample:

```python

language: "en"

pipeline:
- name: "nlp_spacy"                   # loads the spacy language model
- name: "tokenizer_spacy"             # splits the sentence into tokens
- name: "ner_crf"                     # trains a CRF model to extract the entities
- name: "intent_featurizer_spacy"     # transforms the sentence into a feature vector
    
```
You can see that in the file we first use nlp_spacy to load the language model, then tokenizer_spacy to split the sentence into tokens, and so on down the pipeline.
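
The idea of a pipeline, where each component receives the message, adds its own output and hands it to the next component, can be sketched in plain Python. The two components below are simplified stand-ins: the real spaCy-based components rely on a language model rather than whitespace splitting and word counts, and the names `tokenize`, `featurize` and `run_pipeline` are assumptions made for this example.

```python
def tokenize(message):
    """Stand-in for a tokenizer component: split the text on whitespace."""
    message["tokens"] = message["text"].lower().split()
    return message


def featurize(message):
    """Stand-in for a featurizer component: count how often each token occurs."""
    counts = {}
    for token in message["tokens"]:
        counts[token] = counts.get(token, 0) + 1
    message["features"] = counts
    return message


# Components run in the order listed, like the entries under `pipeline:`.
PIPELINE = [tokenize, featurize]


def run_pipeline(text):
    """Pass a message through every component in turn."""
    message = {"text": text}
    for component in PIPELINE:
        message = component(message)
    return message


print(run_pipeline("wanna learn learn NLP"))
```

The order matters: `featurize` depends on the `tokens` key that `tokenize` adds.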


Following is the code:

```python

from rasa_nlu.training_data import load_data
from rasa_nlu.model import Trainer
from rasa_nlu import config

# loading the training samples
training_data = load_data("nlu.md")

# creating a trainer from the pipeline defined in config.yml
trainer = Trainer(config.load("config.yml"))

# training the model
interpreter = trainer.train(training_data)

# storing it for future use
model_directory = trainer.persist("./models/nlu", fixed_model_name="current")

```
In the code:

- We first load the file of intent and entities as train data. 

- Then we specify the configuration to the trainer using 'config.yml'. 

- After that we train the model on the train data and save the model for future use. 

***Step 2: Create an action function for the bot.***

This is the function which will be called to retrieve the required links.

```python
from rasa_core.actions import Action

class ApiAction(Action):
    # Name of the action, as referenced in the stories
    def name(self):
        return "action_retrieve_link"

    # Function to execute the action
    def run(self, dispatcher, tracker, domain):

        # Reading the 'group' slot filled from the user's message
        group = tracker.get_slot('group')

        # Checking the group (A: CV, anything else is treated as B: NLP)
        if group == 'A':
            response = "https://blog.algorithmia.com/introduction-to-computer-vision/"
        else:
            response = "http://www.nlp.com/what-is-nlp/"

        # Message associated with the action
        dispatcher.utter_message("Here is something to help you up: {}".format(response))

        # No events (such as slot updates) need to be recorded
        return []
```
In the code: 
- We have defined a class called 'ApiAction' and given it the action name 'action_retrieve_link'. 

- We have also defined a `run` function that retrieves the link. In the function, you can see that the link is chosen using the group we defined in 'nlu.md'.
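
The decision logic of the action can be tried without installing Rasa by passing in small stand-in objects. The sketch below is an assumption-laden variant, not the action itself: `StubTracker` and `StubDispatcher` are hypothetical classes that mimic only the two methods the action calls, and the lookup table adds an explicit reply for an unknown group instead of defaulting to the NLP link.

```python
# Hypothetical lookup table: slot value -> link (URLs taken from the action above).
LINKS = {
    "A": "https://blog.algorithmia.com/introduction-to-computer-vision/",
    "B": "http://www.nlp.com/what-is-nlp/",
}


class StubTracker:
    """Minimal stand-in exposing only get_slot, as used by the action."""

    def __init__(self, slots):
        self.slots = slots

    def get_slot(self, key):
        return self.slots.get(key)


class StubDispatcher:
    """Minimal stand-in that records messages instead of sending them."""

    def __init__(self):
        self.messages = []

    def utter_message(self, text):
        self.messages.append(text)


def retrieve_link(dispatcher, tracker):
    """Pick a link from the 'group' slot, with an explicit unknown-group case."""
    link = LINKS.get(tracker.get_slot("group"))
    if link is None:
        dispatcher.utter_message("I don't have a link for that topic yet.")
    else:
        dispatcher.utter_message("Here is something to help you up: {}".format(link))
    return []  # no events to record


dispatcher = StubDispatcher()
retrieve_link(dispatcher, StubTracker({"group": "A"}))
retrieve_link(dispatcher, StubTracker({}))
print(dispatcher.messages)
```

A dictionary lookup makes it easy to add more groups later without growing an if/else chain.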

***Step 3: Load stories/domain and train the chatbot agent***

Before we look at the code, there are two files here too, that we need to familiarise ourselves with

***domain.yml(Domain File)*** - 

This file contains all the intents, entities, actions and similar information. Sample bot reply templates can also be added and used as actions.

Sample file:

```python
intents:
- greet
- goodbye
- mood_affirm
- mood_deny
- mood_great
- mood_unhappy
- inform
- mood_curious

slots:
  group:
    type: text
    
entities:
- group

actions:
- utter_greet
- utter_did_that_help
- utter_happy
- utter_goodbye
- utter_unclear
- utter_ask_link
- __main__.ApiAction

templates:
  utter_greet:
  - text: "Hey! How are you?"

  utter_did_that_help:
  - text: "Did that help you?"

  utter_unclear:
  - text: "I am not sure what you are aiming for."
  
...
```

***stories.md(Stories)***

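This file contains the stories, i.e. sample conversations written as sequences of user intents (lines starting with `*`) and bot actions (indented lines starting with `-`).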

```python
## happy path               <!-- name of the story - just for debugging -->
* greet              
  - utter_greet
* mood_great               <!-- user utterance, in format intent[entities] -->
  - utter_happy
* mood_affirm
  - utter_happy
* mood_affirm
  - utter_goodbye
  
## curious path 1               <!-- this is already the start of the next story -->
* greet
  - utter_greet             <!-- action the bot should execute -->
* mood_curious
  - utter_ask_link
* inform{"group":"A"}  
  - action_retrieve_link
  - utter_did_that_help
* mood_affirm
  - utter_happy

## curious path 2
* greet
  - utter_greet
* mood_curious
  - utter_ask_link
...
```
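
Because stories refer to intents by name, a quick consistency check between the stories and the domain can catch typos before training. The sketch below is a hypothetical helper built on regular expressions, not a Rasa feature; `DOMAIN_INTENTS` is a deliberately shortened example set and `STORIES` is a shortened copy of one story.

```python
import re

# Shortened, illustrative set of intents declared in a domain file.
DOMAIN_INTENTS = {"greet", "goodbye", "mood_affirm", "mood_great", "inform"}

STORIES = """\
## curious path 1
* greet
  - utter_greet
* mood_curious
  - utter_ask_link
* inform{"group":"A"}
  - action_retrieve_link
"""


def intents_in_stories(stories):
    """Collect intent names from lines starting with '*', ignoring entity payloads."""
    found = set()
    for line in stories.splitlines():
        match = re.match(r"\*\s*([A-Za-z_]+)", line.strip())
        if match:
            found.add(match.group(1))
    return found


def undeclared_intents(stories, declared):
    """Return story intents that are missing from the declared set, sorted."""
    return sorted(intents_in_stories(stories) - declared)


print(undeclared_intents(STORIES, DOMAIN_INTENTS))
```

With the sample data above, the check reports `mood_curious`, because it appears in the story but not in the example set.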

Following is the code:

```python

from rasa_core.policies import FallbackPolicy, KerasPolicy, MemoizationPolicy
from rasa_core.agent import Agent

# Fallback is to handle predictions the model isn't very certain about
# We define a threshold for both the NLU predictions and the action predictions

fallback = FallbackPolicy(fallback_action_name="utter_unclear",
                          core_threshold=0.2,
                          nlu_threshold=0.1)


# Creating agent
agent = Agent('domain.yml', policies=[MemoizationPolicy(), KerasPolicy(), fallback])

# Loading the stories(training dialogues)
training_data = agent.load_data('stories.md')


# Training the agent
agent.train(
    training_data,
    validation_split=0.2,
    epochs=200
)

# Saving the trained dialogue model
agent.persist('models/dialogue')
```
In the code:


- We have defined a fallback policy that takes over when the bot is not confident about the intent or about the next action

- After that, we create the chatbot agent by associating the domain and the policies with it, and then load the stories through the agent

- We then train our agent and save it for future use
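
The role of the two thresholds can be illustrated with a few lines of plain Python. This is a simplified reading of how a fallback behaves, not the library's implementation; the function `choose_action` and its inputs are assumptions, while the threshold values and the fallback action name mirror the code above.

```python
def choose_action(intent_confidence, action_scores,
                  nlu_threshold=0.1, core_threshold=0.2,
                  fallback_action="utter_unclear"):
    """Return the best-scoring action unless either confidence is too low."""
    best_action = max(action_scores, key=action_scores.get)
    if intent_confidence < nlu_threshold:
        return fallback_action      # the NLU model was unsure about the intent
    if action_scores[best_action] < core_threshold:
        return fallback_action      # the dialogue model was unsure about the action
    return best_action


# Confident intent and confident action: the predicted action is used.
print(choose_action(0.85, {"utter_greet": 0.9, "utter_goodbye": 0.1}))
# Intent confidence below nlu_threshold: fall back.
print(choose_action(0.05, {"utter_greet": 0.9, "utter_goodbye": 0.1}))
# Best action score below core_threshold: fall back.
print(choose_action(0.85, {"utter_greet": 0.15, "utter_goodbye": 0.1}))
```

Raising either threshold makes the bot say "I am not sure" more often; lowering it makes the bot guess more often.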


***Step 4: Run the chatbot***

```python
from rasa_core.agent import Agent

agent = Agent.load('models/dialogue', interpreter=model_directory)
print("Your bot is ready to talk! Type your messages here or send 'stop'")

while True:
    a = input()
    if a == 'stop':
        break
    responses = agent.handle_message(a)
    for response in responses:
        print(response["text"])

```

This is the final step, where we simply run our chatbot agent.

In the code:

- We load the previously created chatbot agent

- We then run the chatbot for a normal conversation until the user types 'stop'


Following is a sample conversation from running the above code:

![](../images/convo_4.PNG)


# Implementation


You can implement the above code in [Google Colab](https://colab.research.google.com).

Alternatively, you can run it on your local system with the following requirements:

***
Libraries required

- python==3.6.7
- rasa_core==0.9.6
- rasa_nlu==0.14.4
- spacy==2.0.18
- pandas==0.22.0
- sklearn_crfsuite==0.3.6
- pygraphviz==1.5 (optional)

Extra data installs required:
- English spaCy model: `python -m spacy download en` (run from the Anaconda prompt)

***

You can find the full end to end code for the same [here]()



**Note:**

The above code does not produce a perfect chatbot. It will fail and give bizarre responses at times. 
Why don't you try improving it?

