<b>Group 9 - Innovation</b>
>- D. Jeauneau (Group leader)
>- A. Gahn (Assistant)
>- M. Pouchan (Quality manager)
>- N. Enjalbert
>- F. Estermann
>- T. Gallou
>- M. Lesavourey
>- N. Kired
>- M. Manson

<b>Contributors</b>  
> <b>Authors</b>: All

# Introduction
<b>Goal</b>  
The goal of this report is to present the work of the "Innovation" group during the "Interpromo 2020" project, organised by the SID program from 6 to 20 January 2020. The project was commissioned by the client Sogeclair on behalf of Airbus.

<b>Group</b>  
Our group is composed of 9 students: two L3, four M1 and three M2. Unlike the 8 other groups, we did not have a defined subject from the start: we had to innovate and find an idea not already covered by another group.  

The first two days of the project were dedicated to finding ideas, which were presented to the client. After several changes of direction, the final subject was settled on the afternoon of Thursday 9 January, leaving us 8 days for our work: the creation of a chatbot integrated into the dashboard built by the "Dashboard" group. The chatbot lets the user interact with the dashboard.  

Furthermore, we implemented a predictive system that recommends actions on the site.
This project was a real challenge for our group because we had to show creativity and adaptability to produce something useful.  

<b>Functionality</b>  
When the user sends a request to the chatbot (to interact with the dashboard), the chatbot corrects any typos, parses the request and transforms it into a logical representation that both the chatbot and the dashboard can interpret. This representation is sent back to the dashboard, which can then respond to the user's request. The chatbot also predicts the user's next request and recommends it in the chat. This requires a database holding the history of interactions/actions on the dashboard; since we had no such data, we had to generate some to train our models.  

<b>Different parts</b>  
To understand a new request from a user, we implemented our own NER (Named Entity Recognition) model with a BERT (Bidirectional Encoder Representations from Transformers) neural network. The first step of the project was to create a corpus of tagged sentences to train this model. By combining different sentence structures, different phrasings and a knowledge base, we can generate an almost unique sentence every time.
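A minimal sketch of this kind of template-based corpus generation follows. The templates, the knowledge-base entries and the `_I` suffix for middle words are hypothetical; only the tag names (`STAT`, `STUDIED`, `MANU`, `COUN`) and the `_B`/`_E` suffixes are taken from the tagger output shown later in this report.

```python
import random

# Hypothetical knowledge base: entity type -> possible surface forms.
KNOWLEDGE_BASE = {
    "STAT": ["graph", "evolution"],
    "STUDIED": ["passengers contentment", "delays"],
    "MANU": ["airbus", "matra"],
    "COUN": ["france", "congo"],
}

# Hypothetical sentence templates; each {TYPE} slot is filled from the knowledge base.
TEMPLATES = [
    "Show me the {STAT} of {STUDIED} for {MANU} in {COUN}",
    "I want the {STUDIED} {STAT} in {COUN}",
]


def tag_entity(words, entity_type):
    """Tag an entity: first word _B, last word _E, middle words _I (assumed scheme)."""
    if len(words) == 1:
        return [(words[0], f"{entity_type}_B")]
    tags = ([f"{entity_type}_B"]
            + [f"{entity_type}_I"] * (len(words) - 2)
            + [f"{entity_type}_E"])
    return list(zip(words, tags))


def generate_sentence(template, kb, rng):
    """Fill every slot of a template and return a list of (word, tag) pairs."""
    tagged = []
    for token in template.split():
        if token.startswith("{") and token.endswith("}"):
            entity_type = token[1:-1]
            value = rng.choice(kb[entity_type])
            tagged.extend(tag_entity(value.split(), entity_type))
        else:
            tagged.append((token, "O"))  # words outside any entity
    return tagged


rng = random.Random(0)  # seeded so the corpus is reproducible
corpus = [generate_sentence(rng.choice(TEMPLATES), KNOWLEDGE_BASE, rng)
          for _ in range(3)]
```

Each generated sentence is a list of (word, tag) pairs; flattened with a sentence index, it can be turned into the `sentence_id` / `words` / `labels` DataFrame layout that simpletransformers' `NERModel.train_model` accepts.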

The second step was to generate the chatbot-user interaction database. Once the database is created, the result of every interaction is transformed into a unique representation, which becomes a state. We chose to model this database as a Markov chain, where the probability of moving from one state to another is encoded in a transition matrix.
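As a rough illustration of how such a transition matrix can be estimated, the sketch below counts consecutive state pairs in user sessions and normalises each row. The short state labels stand in for the real JSON states and, like the sessions themselves, are made up for the example.

```python
from collections import Counter, defaultdict


def transition_matrix(sessions):
    """Estimate P(next_state | state) from sequences of states visited by users."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each consecutive pair (state -> next_state) within a session.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    # Normalise each row so that the outgoing probabilities sum to 1.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


sessions = [
    ["general", "delays", "general"],
    ["general", "delays", "satisfaction"],
    ["general", "satisfaction"],
]
P = transition_matrix(sessions)
print(P["general"])  # {'delays': 0.6666666666666666, 'satisfaction': 0.3333333333333333}
```

States that are never followed by another state (here `satisfaction`) get no row at all, so a predictor built on this matrix needs a fallback for them.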

An interface between the "Dashboard" group and our group was needed so that the site and the chatbot could communicate automatically. This interface required representing the information in JSON format. We mapped each of our tags to the corresponding part of the JSON, together with all the required data transformations. This format was then used by every part of the chatbot.
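A minimal sketch of such a tag-to-JSON mapping is given below; it is not the notebook's `tag_to_filters`/`apply_date` code (which lives in separate notebooks). It only maps the tag prefixes visible in the example request later in this report (`MANU`, `COUN`, `DATE1`, `DATE2`), handles single-word entities only, and assumes that a start date means the first day of the period and an end date the last day, which matches that example's output.

```python
import calendar
import json

# Tag prefix -> key of the "filters" object (only prefixes seen in the example).
TAG_TO_FILTER = {"MANU": "manufacturer", "COUN": "country"}
FILTER_KEYS = ["aircraft", "category", "company", "country", "date", "manufacturer"]


def normalise_date(value, end):
    """Turn 'MM/YYYY' or 'YYYY' into DDMMYYYY: first day of the period, or last day if end."""
    if "/" in value:
        month, year = (int(p) for p in value.split("/"))
        months = [month]
    else:
        year, months = int(value), [1, 12]
    month = months[-1] if end else months[0]
    day = calendar.monthrange(year, month)[1] if end else 1
    return f"{day:02d}{month:02d}{year:04d}"


def tags_to_event(tagged_words, tab="general"):
    """Build the dashboard JSON from (word, tag) pairs (single-word entities only)."""
    filters = {key: [] for key in FILTER_KEYS}
    dates = {}
    for word, tag in tagged_words:
        prefix = tag.rsplit("_", 1)[0]  # "MANU_B" -> "MANU", "O" -> "O"
        if prefix in TAG_TO_FILTER:
            filters[TAG_TO_FILTER[prefix]].append(word.lower())
        elif prefix in ("DATE1", "DATE2"):
            dates[prefix] = normalise_date(word, end=(prefix == "DATE2"))
    # Deduplicate and sort, so "Matra" and "MATRA" collapse into one value.
    filters = {k: sorted(set(v)) for k, v in filters.items()}
    filters["date"] = [dates[d] for d in ("DATE1", "DATE2") if d in dates]
    return json.dumps({"filters": filters, "tab": tab}, sort_keys=True)


tagged = [("Matra", "MANU_B"), ("and", "O"), ("MATRA", "MANU_B"), ("and", "O"),
          ("AIRBUS", "MANU_B"), ("in", "O"), ("CONGO", "COUN_B"), ("from", "O"),
          ("10/1993", "DATE1_B"), ("to", "O"), ("1946", "DATE2_B")]
print(tags_to_event(tagged))
# {"filters": {"aircraft": [], "category": [], "company": [], "country": ["congo"], "date": ["01101993", "31121946"], "manufacturer": ["airbus", "matra"]}, "tab": "general"}
```

Serialising with `sort_keys=True` gives every state a single canonical string, which is convenient when states are used as keys of a transition matrix.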


By the end of the project we had a functional V0. Unfortunately, due to lack of time, the chatbot could not be integrated into the dashboard.

# Environment
## Libraries

In [1]:
import pandas as pd
import numpy as np
import nltk
import json
from simpletransformers.ner import NERModel
from gensim.models import KeyedVectors
from strsimpy.jaro_winkler import JaroWinkler
from spellchecker import SpellChecker  # pyspellchecker
from nltk.corpus import stopwords

## Data loading

##### Load the NER and W2V models
Source for the NER model: https://ufile.io/ichyycfe (or train it with bert_tagger.ipynb)  
Source for the W2V model: https://github.com/eyaler/word2vec-slim/blob/master/GoogleNews-vectors-negative300-SLIM.bin.gz  
Source for df_transitions: markov.ipynb

In [2]:
%run constants.ipynb
%run dictionnaries.ipynb

In [3]:
all_tags = get_all_tags()
tagger = NERModel(model_type='bert',
                  model_name=data_directory+'bert/current_model/',
                  labels=all_tags,
                  use_cuda=False)

model_w2v = KeyedVectors.load_word2vec_format(pathword2vec, binary=True)
voc_stopwords = set(stopwords.words('english'))
db = get_DB()

In [4]:
df_transitions = pd.read_csv(bdd_directory+'df_transitions.csv',
                             sep='§',
                             engine='python',
                             index_col=0,
                             encoding='utf-8')
df_transitions.shape

(10, 10)

In [5]:
df_facts = pd.read_csv(bdd_directory+'df_facts.csv',
                       sep='§',
                       engine='python',
                       index_col=0,
                       encoding='utf-8')
df_facts.shape

(104, 2)

## Functions

In [6]:
%run functions.ipynb
%run tag_to_filter.v1.ipynb

# Chatbot  
The chatbot is implemented as a class ChatBot that holds several intents (class Intent), together with a classification system (class Classifier) that chooses the best intent for the user's request. One of the intents also uses a predictor (class Predictor).

## Intents
We create an Intent class. An intent is a capability of the chatbot that a user wants to use; a chatbot can have several of them.

An Intent has an input state (interface_in): the representation of the information (JSON format) sent to the chatbot by the dashboard. It also has an output state (interface_out): the chatbot's response to the interface, retrieved with get_interface_out(). A user interacts with the chatbot through the interact() method, while the dashboard can synchronize the chatbot with the synchronize() method when the user is not using the chatbot.

In [7]:
class Intent:
    """ An intent is a capability of a chatbot; a chatbot can have several.
    This is the parent class."""

    def __init__(self, name: str, state: str = None):
        if state is None:
            state = json.dumps(init_event(tab=CT_tabs_default))
        self.name = name
        self.interface_in = state
        self.interface_out = None

    def synchronize(self, event: str):
        """ Even when there is no interaction with the chatbot, the dashboard
        can synchronize it to get information back"""

        self.interface_in = event
        self.interface_out = None

    def get_interface_in(self) -> str:
        """ Return the representation of the input received from
        the dashboard interface"""

        return self.interface_in

    def get_interface_out(self) -> str:
        """ Return the representation of the output sent to
        the dashboard interface"""

        return self.interface_out

    def interact(self, sentence: str = None) -> str:
        """ This method is called by the chatbot and must return the sentence
        to show in the chat"""

        return "Please implement this method"

## Displayer - recommendation intent
The main intent of our chatbot is to understand what the user wants to see on the dashboard. The user sends a request as a sentence in the chat; the chatbot corrects any typos and sends the decoded information back to the chatbot-dashboard interface, where the dashboard interprets it.

It also predicts the next graphs the user would like to see and asks whether they are interested.

##### Creation of the Predictor class for the recommendation (prediction of the next state with a Markov chain)
This object holds the transition matrix built from all past user-dashboard interactions, from which the most probable next state can be predicted. Here, as an example, we pick a random state and predict the next one from the transition matrix.

In [8]:
class Predictor:
    """ This class is used to create the recommendation module"""

    def __init__(self, name: str, transitions: pd.DataFrame):
        self.name = name
        self.transitions = transitions

    def predict(self, event: str) -> str:
        """ give the recommended state"""
        predict = predict_next_state(event, self.transitions)
        return predict

    def random_state(self):
        """ give a random state"""
        return np.random.choice(self.transitions.index)


markov = Predictor(name='Markov', transitions=df_transitions.dropna())
markov.predict(markov.random_state())

'{"filters": {"aircraft": ["a318", "a320", "a340-300", "a350-900"], "category": [], "company": [], "country": ["england", "france", "germany", "italy", "spain"], "date": ["01092017", "30092017"], "manufacturer": ["airbus"]}, "tab": "general"}'

##### Creation of the Displayer-Recommendation intent

In [9]:
class Displayer(Intent):
    """ A displayer is an intent that understands a user sentence and returns,
    through the interface, the board the dashboard should show and the next
    recommendation """

    def __init__(self,
                 name: str,
                 model_tagger: NERModel,
                 model_predicteur: Predictor,
                 model_w2v: KeyedVectors,
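                 db: dict = None,
                 voc_stopwords: set = None,
                 state: str = None):

        # Pass the initial state on to Intent (it was previously ignored)
        super(Displayer, self).__init__(name, state)
        self.tagger = model_tagger
        self.predicteur = model_predicteur
        self.w2v = model_w2v
        # Avoid shared mutable default arguments
        self.voc_stopwords = voc_stopwords if voc_stopwords is not None else set()
        self.db = db if db is not None else {}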

    def synchronize(self, event: str):
        self.interface_in = json_string_to_hash(event)
        self.interface_out = self.predicteur.predict(self.interface_in)

    def get_tags(self, sentence: str) -> list:
        """ Associate a tag to each word"""

        sentence_corrected = self.auto_correction(sentence)
        request = self.tagger.predict([sentence_corrected])[0][0]
        print(request, '\n')
        return request

    def get_filters(self, sentence: str) -> dict:
        """ create the state to communicate with the dashboard"""

        request = self.get_tags(sentence)
        tags_values = extract_tags(request)
        filters = tag_to_filters(tags_values)
        filters = apply_date(filters)
        event = {
            CT_tabs: CT_tabs_default,
            CT_filt: filters,
        }
        return event

    def get_output_sentence(self, pred_state_hash: str) -> str:
        """ create a sentence from a state """

        pred_sentence = make_sentence_fom_json(json.loads(pred_state_hash))
        return pred_sentence

    def interact(self, sentence: str) -> str:
        event = self.get_filters(sentence)
        event_state = json.dumps(event)
        self.synchronize(event_state)

        pred_state_hash = self.get_interface_out()
        pred_sentence = self.get_output_sentence(pred_state_hash)

        res = pred_sentence
        return res

    def auto_correction(self, sentence: str) -> str:
        """ correct typos before processing """

        # Use the intent's own database and stopwords rather than the globals
        return auto_correction(self.w2v.vocab, self.db, self.voc_stopwords, sentence)

We build our Displayer_Recommandation intent from our NER model, Markov model, W2V model and the information contained in the database. The W2V model, the database and a list of stopwords are used for automatic typo correction, while the tagger model is used to understand the request and the Markov model to predict the next state.
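The notebook's auto_correction function is defined in functions.ipynb and is not shown; the imports suggest it relies on Jaro-Winkler similarity and pyspellchecker. As a hedged stand-in, the sketch below swaps in the standard library's `difflib.get_close_matches` (a different similarity measure) to show the general idea: keep stopwords, known words and non-alphabetic tokens such as dates, and replace any other word by its closest known word. The function name, vocabulary, database terms and cutoff are hypothetical.

```python
import difflib


def auto_correct(sentence, vocabulary, db_terms, stopwords, cutoff=0.8):
    """Replace unknown words by the closest known word (difflib ratio, not Jaro-Winkler)."""
    known = sorted(set(vocabulary) | set(db_terms))  # sorted for deterministic ties
    corrected = []
    for token in sentence.split():
        low = token.lower()
        # Keep stopwords, known words and non-alphabetic tokens (e.g. "10/1993") as-is.
        if low in stopwords or low in known or not low.isalpha():
            corrected.append(token)
            continue
        match = difflib.get_close_matches(low, known, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)


VOCAB = ["show", "graph", "passengers", "contentment", "airbus"]
DB_TERMS = ["matra", "congo"]
STOPWORDS = {"me", "the", "of", "for", "and", "in"}
print(auto_correct("Show me the graph of passengrs contentments for Matra",
                   VOCAB, DB_TERMS, STOPWORDS))
# Show me the graph of passengers contentment for Matra
```

Including the database terms in the known vocabulary matters: domain names such as manufacturers or countries may be missing from a general-purpose W2V vocabulary and would otherwise be "corrected" into unrelated words.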

In [10]:
Intent_displayer = Displayer(name='Displayer_Recommandation',
                             model_tagger=tagger,
                             model_predicteur=markov,
                             model_w2v=model_w2v,
                             voc_stopwords=voc_stopwords,
                             db=db)
print(Intent_displayer.name)
print(Intent_displayer.get_interface_out())
Intent_displayer.synchronize(Intent_displayer.interface_in)
print(Intent_displayer.get_interface_out())

Displayer_Recommandation
None
{"filters": {"aircraft": [], "category": [], "company": [], "country": ["england", "france", "germany", "italy", "spain"], "date": ["01092017", "30092017"], "manufacturer": ["airbus"]}, "tab": "general"}


## Random intent
This intent exists mainly so that the chatbot has several intents, which lets us build an intent classifier. Its purpose is to show various facts about the aircraft industry.

In [11]:
class Random_info(Intent):
    """ Another intent implementation, which randomly chooses a sentence
    from a database when called"""

    def __init__(self, name, db=None):
        super(Random_info, self).__init__(name)
        self.db = db if db is not None else {}

    def interact(self, sentence: str = None) -> str:
        output = None
        # Use the intent's own database (previously the globals db and random_bdd)
        if self.db:
            k = np.random.choice(list(self.db.keys()))
            output = np.random.choice(list(self.db[k]["sentences"]))
        return output

In [12]:
random_bdd = {}
for s in list(df_facts['subject'].unique()):
    random_bdd[s] = {}
    random_bdd[s]['sentences'] = set(
        df_facts.loc[df_facts['subject'] == s, 'facts'].values)

Intent_random_info = Random_info(name='Random_Info', db=random_bdd)
print(Intent_random_info.interact())

A woman from Stockholm, Sweden, attempted to smuggle 75 live snakes onto an airplane by placing them in her bra. She also had six lizards under her shorts.


## Intent classifier
Here we create an intent classifier. If the user writes 'fact' in the chatbot, the classifier returns 1 and the random-fact intent is used; otherwise it returns 0 and the displayer is used.

In [13]:
class Classifier:
    """ This classifier let the chatbot choose the best intent when
    receiving a user request"""

    def __init__(self, name: str):
        self.name = name

    def predict(self, sentence: str) -> int:
        if sentence.lower() == 'fact':
            c = 1
        else:
            c = 0
        return c


classifier = Classifier(name='Intent classifier')
print(classifier.predict(sentence=""))

0


In [14]:
intents = {"classifier": classifier,
           "intents": {0: Intent_displayer,
                       1: Intent_random_info}
           }
intents

{'classifier': <__main__.Classifier at 0x7f4dabb078d0>,
 'intents': {0: <__main__.Displayer at 0x7f4dabad37f0>,
  1: <__main__.Random_info at 0x7f4dabae0940>}}

## Chatbot
Finally, we create the ChatBot class.

In [15]:
class ChatBot:
    """ The Chatbot
    """

    def __init__(self, name, intents: dict):
        self.name = name
        self.classifier = intents['classifier']
        self.intents = intents['intents']
        # Default to the first intent key (0), not the first key of the outer dict
        self.active_intent = list(self.intents.keys())[0]

    def classify(self, sentence: str) -> int:
        """ Choose the best intent and return its key """

        self.active_intent = self.classifier.predict(sentence)
        return self.active_intent

    def interact(self, sentence: str) -> str:
        """ Call the interact method of the selected intent 
        and return the sentence generated"""

        self.classify(sentence)
        intent = self.intents[self.active_intent]
        interaction = intent.interact(sentence)
        return interaction

    def synchronize(self, event: str):
        """ synchronize all the intents """

        for intent in self.intents.values():
            intent.synchronize(event)

    def get_interface_in(self) -> str:
        """ Return the representation of the input received from
        the dashboard interface"""

        return self.intents[self.active_intent].get_interface_in()

    def get_interface_out(self) -> str:
        """ Return the representation of the output sent to
        the dashboard interface"""

        return self.intents[self.active_intent].get_interface_out()


Hubert = ChatBot(name='Chatbot Hubert', intents=intents)
Hubert.classify('aertetreb')
Hubert.active_intent

0

In [16]:
# The typos ("passengrs", "contentments") are intentional, to exercise auto-correction
sent = ('Show me the graph of passengrs contentments for Matra '
        'and MATRA and AIRBUS in CONGO from 10/1993 to the end of the year 1946')

output_sentence = Hubert.interact(sent)
input_interfac = Hubert.get_interface_in()
output_interfac = Hubert.get_interface_out()

print("IN :\t", input_interfac, '\n')
print(output_sentence, '\n')
print("OUT :\t", output_interfac, '\n')

[{'Show': 'O'}, {'me': 'O'}, {'the': 'O'}, {'graph': 'STAT_B'}, {'of': 'O'}, {'passengers': 'STUDIED_B'}, {'contentment': 'STUDIED_E'}, {'for': 'O'}, {'Matra': 'MANU_B'}, {'and': 'O'}, {'MATRA': 'MANU_B'}, {'and': 'O'}, {'AIRBUS': 'MANU_B'}, {'in': 'O'}, {'CONGO': 'COUN_B'}, {'from': 'O'}, {'10/1993': 'DATE1_B'}, {'to': 'O'}, {'the': 'O'}, {'end': 'O'}, {'of': 'O'}, {'the': 'O'}, {'year': 'O'}, {'1946': 'DATE2_B'}] 

IN :	 {"filters": {"aircraft": [], "category": [], "company": [], "country": ["congo"], "date": ["01101993", "31121946"], "manufacturer": ["airbus", "matra"]}, "tab": "general"} 

We suggest you the global study from 01-09-2017 to 30-09-2017 for the manufacturer airbus and for the aircrafta350-900 in the countries england, france, germany, italy, spain. If you agree, click on the following link ;) 

OUT :	 {"filters": {"aircraft": ["a350-900"], "category": [], "company": [], "country": ["england", "france", "germany", "italy", "spain"], "date": ["01092017", "30092017"], "

# Next
The integration with the dashboard is not complete yet, because of rapid changes in the dashboard design and because our subject changed direction during the project.

With more time, we could have created more tags and more training sentences with different structures, so that our NER model captures more of the user's meaning. We would also have implemented a better recommendation module, with better-chosen states.