# P10 - CHATBOT

In [1]:
from p10_00_helper_func import azure_helper
from p10_01_luis.utils import *

from azure.cognitiveservices.language.luis.authoring import LUISAuthoringClient
from azure.cognitiveservices.language.luis.authoring.models import ApplicationCreateObject
from azure.cognitiveservices.language.luis.runtime import LUISRuntimeClient
from msrest.authentication import CognitiveServicesCredentials
from functools import reduce

import os, json, time, uuid

ws = azure_helper.get_ws()

import dotenv
dotenv.load_dotenv()

True

In [6]:
get_luis_env = LuisEnv()
authoringKey = get_luis_env.LUIS_AUTH_KEY
authoringEndpoint = get_luis_env.LUIS_AUTH_ENDPOINT

### Authenticate the client
Create a CognitiveServicesCredentials object with your key and use it with your endpoint to create a LUISAuthoringClient object.

In [7]:
client = LUISAuthoringClient(authoringEndpoint, CognitiveServicesCredentials(authoringKey))

# 1. Create a LUIS app

<center>A LUIS app stores the natural language processing model, which contains 

    - the intents, 
    - the entities, and 
    - the example utterances.
</center>

Call the add method of the AppsOperations object to create the app. The name and the language culture are required properties.

In [8]:
# define app basics
appName = 'p10app'
culture = 'en-us'  # the language the bot understands
versionId = '1.0'

try:
    # create the app (raises an error if it already exists; in that case fetch the existing one)
    appDefinition = ApplicationCreateObject(
            name=appName, 
            initial_version_id=versionId, 
            culture=culture)
            
    app_id = client.apps.add(appDefinition)
except Exception:
    # fetch the existing app; note that apps.get returns the full app info object, not just the ID
    app_id = client.apps.get(get_luis_env.LUIS_APP_ID)

# app ID - needed for all later changes
print("LUIS app ID {}".format(app_id))

LUIS app ID {'additional_properties': {'ownerEmail': None, 'tokenizerVersion': '1.0.0'}, 'id': 'cfde1d4c-2cf0-437c-98b9-cdfb6abdbecb', 'name': 'p10app', 'description': '', 'culture': 'en-us', 'usage_scenario': '', 'domain': '', 'versions_count': 2, 'created_date_time': '2022-02-10T14:03:40Z', 'endpoints': {'STAGING': {'versionId': '1.1', 'directVersionPublish': False, 'endpointUrl': 'https://westeurope.api.cognitive.microsoft.com/luis/v2.0/apps/cfde1d4c-2cf0-437c-98b9-cdfb6abdbecb', 'isStaging': True, 'assignedEndpointKey': None, 'region': None, 'endpointRegion': 'westeurope', 'publishedDateTime': '2022-02-15T07:17:16Z', 'failedRegions': None}, 'PRODUCTION': {'versionId': '1.1', 'directVersionPublish': False, 'endpointUrl': 'https://westeurope.api.cognitive.microsoft.com/luis/v2.0/apps/cfde1d4c-2cf0-437c-98b9-cdfb6abdbecb', 'isStaging': False, 'assignedEndpointKey': None, 'region': None, 'endpointRegion': 'westeurope', 'publishedDateTime': '2022-02-16T10:44:55Z', 'failedRegions': None}

### 1.1. Create an intent for the app

<center> <i>The primary object in a LUIS app model is the intent</i></center>

[more here](https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/cognitive-services/LUIS/concepts/intents.md)

    !!! All applications come with the predefined intent, "None", which is the fallback intent. The None intent is a required intent and can't be deleted or renamed. Fill it with utterances that are outside of your domain. The None intent is the fallback intent, and should have 10% of the total utterances. It is important in every app, because it’s used to teach LUIS utterances that are not important in the app domain (subject area). If you do not add any utterances for the None intent, LUIS forces an utterance that is outside the domain into one of the domain intents. This will skew the prediction scores by teaching LUIS the wrong intent for the utterance.
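As a quick check of the 10% guideline above, the share of None utterances can be computed from per-intent counts. This is a minimal sketch: `none_share` is a hypothetical helper and the counts are illustrative, not taken from a real app.

```python
def none_share(intent_counts):
    """Return the fraction of example utterances labeled with the None intent."""
    total = sum(intent_counts.values())
    if total == 0:
        return 0.0
    return intent_counts.get("None", 0) / total


# Illustrative counts (assumed values)
counts = {"ReserverVoyage": 90, "None": 10}
print("None share: {:.0%}".format(none_share(counts)))
```

A share far below the guideline suggests adding more out-of-domain utterances to None before training.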


An intent corresponds to a group of user utterances that share the same purpose. A user may ask a question or make a statement expecting a particular response from a bot (or another client application). Booking a flight, asking about the weather in a destination city, and asking for customer service contact information are examples of intents.

Use the model.add_intent method with a unique intent name, passing the app ID, the version ID, and the name of the new intent.

The intentName value is hard-coded to ReserverVoyage in the cell below.

In [9]:
intentName = "ReserverVoyage"

existing_intents = client.model.list_intents(get_luis_env.LUIS_APP_ID, versionId)

# assume the intent is missing until it is found among the existing intents
already_exists = False
for i in existing_intents:
    if intentName == i.name:
        already_exists = True

if not already_exists:
    client.model.add_intent(get_luis_env.LUIS_APP_ID, versionId, intentName)
    print('intent', intentName, 'added')
    # add as many intents as needed for your app
    # intent2= 'Greetings'
    # intent3= 'CheckWeather'

### 1.2. Create entities for the app
Although entities are not required, they are found in most apps. An entity extracts from the user utterance the information needed to fulfill the user's intent. There are several types of [prebuilt](https://docs.microsoft.com/fr-fr/python/api/azure-cognitiveservices-language-luis/azure.cognitiveservices.language.luis.authoring.operations.modeloperations?view=azure-python#add-prebuilt-app-id--version-id--prebuilt-extractor-names--custom-headers-none--raw-false----operation-config-) and custom entities, each with its own Data Transformation Object (DTO) model. Common prebuilt entities to add to your app include number, datetimeV2, geographyV2, and ordinal.

- It is important to know that entities are not tied to an intent. They can apply to many intents. Only example user utterances are labeled for a single, specific intent.

The entity creation methods are part of the ModelOperations class. Each entity type has its own Data Transformation Object (DTO) model.

The entity creation code creates a machine-learning entity with subentities and features applied to the Quantity subentities.

Entity example: 
<table>
  <tr>
    <th>Utterance</th>
    <th>Entity</th>
    <th>Resolution</th>
  </tr>
  <tr>
    <td>one thousand times</td>
    <td>"one thousand"</td>
    <td>"1000"</td>
  </tr>
  <tr>
    <td>1,000 people</td>
    <td>"1,000"</td>
    <td>"1000"</td>
  </tr>
  <tr>
    <td>one hundred fifty orders</td>
    <td>"one hundred fifty"</td>
    <td>"150"</td>
  </tr>
  <tr>
    <td>buy two dozen eggs</td>
    <td>"two dozen"</td>
    <td>"24"</td>
  </tr>  
</table>

    !!! The entity represents a word or phrase inside the utterance that you want extracted. Entities describe information relevant to the intent, and sometimes they are essential for your app to perform its task.

In [39]:
# NOT WORKING FROM SDK, DO IT FROM PORTAL

# entities = ["from_city", "to_city", "from_date", "to_date", "budget"]
# for e in entities:
#     client.model.add_entity(get_luis_env.LUIS_APP_ID, versionId,e)



### 1.3 Add an example utterance to an intent
To determine an utterance's intent and extract entities, the app needs example utterances. Each example must target a single, specific intent and must label all custom entities. Prebuilt entities do not need to be labeled.

Add example utterances by creating a list of ExampleLabelObject objects, one per example utterance. Each example must label all entities with a dictionary of name/value pairs giving the entity name and the entity value. The entity value must be exactly as it appears in the text of the example utterance.
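Because the entity value has to match the utterance text exactly, the character indices can be derived from the text rather than counted by hand. The sketch below assumes the label keys used in this notebook's labeled example (`startCharIndex`, `endCharIndex`, `entityName`, `children`) and an inclusive end index; `label_entity` is a hypothetical helper, not part of the LUIS SDK.

```python
def label_entity(text, entity_name, value):
    """Build an entity label for the first occurrence of `value` in `text`."""
    start = text.find(value)
    if start == -1:
        # The value must appear verbatim in the utterance
        raise ValueError("{!r} does not appear in the utterance".format(value))
    return {
        "startCharIndex": start,
        "endCharIndex": start + len(value) - 1,  # inclusive end index
        "entityName": entity_name,
        "children": [],
    }


utterance = "I'd like to book a trip to Atlantis from Caprica"
print(label_entity(utterance, "to_city", "Atlantis"))
print(label_entity(utterance, "from_city", "Caprica"))
```

Only the first occurrence is labeled, so an utterance that repeats the same value would need an explicit start offset.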


 - Group data for batch test
    It is important that utterances used for batch testing are new to LUIS. If you have a data set of utterances, divide the utterances into three sets: example utterances added to an intent, utterances received from the published endpoint, and utterances used to batch test LUIS after it is trained.

    The batch JSON file you use should include utterances with top-level machine-learning entities labeled including start and end position. The utterances should not be part of the examples already in the app. They should be utterances you want to positively predict for intent and entities.

    You can separate out tests by intent and/or entity or have all the tests (up to 1000 utterances) in the same file.
    
    - <b>Batch syntax template </b> for intents with entities:
    
        {
        "LabeledTestSetUtterances": [
            {
                "text": "play a song",
                "intent": "play_music",
                "entities": [
                    {
                        "entity": "song_parent",
                        "startPos": 0,
                        "endPos": 15,
                        "children": [
                            {
                                "entity": "pre_song",
                                "startPos": 0,
                                "endPos": 3
                            },
                            {
                                "entity": "song_info",
                                "startPos": 5,
                                "endPos": 15
                            }
                        ]
                    }
                ]
            }
        ]
        }
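Before uploading a batch file, it can help to check it against the constraints described above (at most 1000 utterances, an intent per utterance, start and end positions on each entity). This is a rough sketch under the assumption that `endPos` is an inclusive index into `text`; `validate_batch` is a hypothetical helper and it only inspects top-level entities, not `children`.

```python
def validate_batch(batch, max_utterances=1000):
    """Return a list of human-readable problems found in a batch-test dict."""
    problems = []
    utterances = batch.get("LabeledTestSetUtterances", [])
    if len(utterances) > max_utterances:
        problems.append("too many utterances: {}".format(len(utterances)))
    for i, utt in enumerate(utterances):
        text = utt.get("text", "")
        if not utt.get("intent"):
            problems.append("utterance {}: missing intent".format(i))
        for ent in utt.get("entities", []):
            start, end = ent.get("startPos"), ent.get("endPos")
            if start is None or end is None:
                problems.append("utterance {}: entity without positions".format(i))
            elif not (0 <= start <= end < len(text)):
                problems.append(
                    "utterance {}: span {}-{} is outside the text".format(i, start, end)
                )
    return problems


batch = {
    "LabeledTestSetUtterances": [
        {
            "text": "go to Paris",
            "intent": "ReserverVoyage",
            "entities": [{"entity": "to_city", "startPos": 6, "endPos": 10}],
        }
    ]
}
print(validate_batch(batch))
```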


In [8]:
# Define labeled example
intentName = "ReserverVoyage"

labeledExampleUtteranceWithMLEntity = {
    "text": "I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.",
    "intentName": intentName,
    "entityLabels": [
        {
            "startCharIndex": 27,     #startPos for batch
            "endCharIndex": 34,       #endPos for batch
            "entityName": "to_city",  #entity for batch
            "children": []
        },
        
        {
            "startCharIndex": 41,
            "endCharIndex": 47,
            "entityName": "from_city",
            "children": []
        },
        {
            "startCharIndex": 52,
            "endCharIndex":76,
            "entityName": "from_date",
            "children": []
        },
        {
            "startCharIndex": 117,
            "endCharIndex":120,
            "entityName": "budget",
            "children": []
        }
    ]
}

print("Labeled Example Utterance:", labeledExampleUtteranceWithMLEntity)

# Add an example for the entity.
# Enable nested children to allow using multiple models with the same name.
# The quantity subentity and the phraselist could have the same exact name if this is set to True
client.examples.add(get_luis_env.LUIS_APP_ID, versionId, labeledExampleUtteranceWithMLEntity, { "enableNestedChildren": True })

Labeled Example Utterance: {'text': "I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.", 'intentName': 'ReserverVoyage', 'entityLabels': [{'startCharIndex': 27, 'endCharIndex': 34, 'entityName': 'to_city', 'children': []}, {'startCharIndex': 41, 'endCharIndex': 47, 'entityName': 'from_city', 'children': []}, {'startCharIndex': 52, 'endCharIndex': 76, 'entityName': 'from_date', 'children': []}, {'startCharIndex': 117, 'endCharIndex': 120, 'entityName': 'budget', 'children': []}]}


<azure.cognitiveservices.language.luis.authoring.models._models_py3.LabelExampleResponse at 0x2a6a7da6280>

# 2. Train the app
Once the model is created, the LUIS app must be trained for this version of the model. A trained model can be used in a container or published to the staging or production slots.

The train.train_version method takes the app ID and the version ID.

A very small model, such as the one in this quickstart, trains very quickly. For production-level apps, training should include a polling call to the get_status method to determine whether training succeeded. The response is a list of ModelTrainingInfo objects with a separate status for each object. All objects must succeed for training to be considered complete.

In [None]:
# this logic is wrapped in a function in utils.py for reuse
client.train.train_version(app_id, versionId)
waiting = True
while waiting:
    info = client.train.get_status(app_id, versionId)

    # get_status returns a list of training statuses, one for each model. Loop through them and make sure all are done.
    waiting = any(map(lambda x: 'Queued' == x.details.status or 'InProgress' == x.details.status, info))
    if waiting:
        print ("Waiting 10 seconds for training to complete...")
        time.sleep(10)
    else: 
        print ("trained")
        waiting = False
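The loop above treats anything that is not queued or in progress as finished, so a failed model would also end the wait. A small sketch of how the per-model statuses could be summarized instead, assuming the status strings are `Queued`, `InProgress`, `Success`, `UpToDate` and `Fail` (an assumption to verify against your SDK version); `training_state` is a hypothetical helper.

```python
PENDING = {"Queued", "InProgress"}
FAILED = {"Fail"}


def training_state(statuses):
    """Summarize per-model training statuses as 'failed', 'pending' or 'done'."""
    statuses = list(statuses)
    if any(s in FAILED for s in statuses):
        return "failed"
    if any(s in PENDING for s in statuses):
        return "pending"
    return "done"


print(training_state(["Success", "UpToDate"]))
print(training_state(["Success", "InProgress"]))
print(training_state(["Fail", "Queued"]))
```

With the authoring client, the statuses would come from `[x.details.status for x in client.train.get_status(app_id, versionId)]`, as in the loop above.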


# 3. Publish the app to the production slot
Publish the LUIS app with the apps.publish method. This publishes the current trained version to the specified slot at the endpoint. Your client application uses this endpoint to send user utterances for intent prediction and entity extraction.

In [None]:
# Mark the app as public so we can query it using any prediction endpoint.
# Note: For production scenarios, you should instead assign the app to your own LUIS prediction endpoint. See:
# https://docs.microsoft.com/en-gb/azure/cognitive-services/luis/luis-how-to-azure-subscription#assign-a-resource-to-an-app
client.apps.update_settings(app_id, is_public=True)

responseEndpointInfo = client.apps.publish(app_id, versionId, is_staging=False)

# 4. Authenticate the prediction runtime client
Create a credentials object with your key, then use it with your endpoint to create a LUISRuntimeClient object.

In [14]:
runtimeCredentials = CognitiveServicesCredentials(get_luis_env.LUIS_PRED_KEY)
clientRuntime = LUISRuntimeClient(endpoint=get_luis_env.LUIS_PRED_ENDPOINT, credentials=runtimeCredentials)

# 5. Get a prediction from the runtime
Add the following code to build the request sent to the prediction runtime.

The user utterance is part of the prediction_request object.

The get_slot_prediction method takes several parameters, such as the app ID, the slot name, and the prediction request object. The other options, verbose, showAllIntents, and log, are optional. The request returns a PredictionResponse object.

In [16]:
# Production == slot name
querytext = 'I want two small pepperoni pizzas with more salsa'

predictionRequest = { "query" : querytext }
app_id = get_luis_env.LUIS_APP_ID
predictionResponse = clientRuntime.prediction.get_slot_prediction(app_id, "Production", predictionRequest)
print("Top intent: {}".format(predictionResponse.prediction.top_intent))
print("Sentiment: {}".format (predictionResponse.prediction.sentiment))
print("Intents: ")

for intent in predictionResponse.prediction.intents:
    print("\t{}".format (json.dumps (intent)))
print("Entities: {}".format (predictionResponse.prediction.entities))

AttributeError: 'PredictionOperations' object has no attribute 'get_slot_prediction'

The prediction response is a JSON object that includes the intent and all entities found. The example below is for a pizza-ordering app:

{
    "query": "I want two small pepperoni pizzas with more salsa",
    "prediction": {
        "topIntent": "OrderPizzaIntent",
        "intents": {
            "OrderPizzaIntent": {
                "score": 0.753606856
            },
            "None": {
                "score": 0.119097039
            }
        },
        "entities": {
            "Pizza order": [
                {
                    "Pizza": [
                        {
                            "Quantity": [
                                2
                            ],
                            "Type": [
                                "pepperoni"
                            ],
                            "Size": [
                                "small"
                            ],
                            "$instance": {
                                "Quantity": [
                                    {
                                        "type": "builtin.number",
                                        "text": "two",
                                        "startIndex": 7,
                                        "length": 3,
                                        "score": 0.968156934,
                                        "modelTypeId": 1,
                                        "modelType": "Entity Extractor",
                                        "recognitionSources": [
                                            "model"
                                        ]
                                    }
                                ],
                                "Type": [
                                    {
                                        "type": "Type",
                                        "text": "pepperoni",
                                        "startIndex": 17,
                                        "length": 9,
                                        "score": 0.9345611,
                                        "modelTypeId": 1,
                                        "modelType": "Entity Extractor",
                                        "recognitionSources": [
                                            "model"
                                        ]
                                    }
                                ],
                                "Size": [
                                    {
                                        "type": "Size",
                                        "text": "small",
                                        "startIndex": 11,
                                        "length": 5,
                                        "score": 0.9592077,
                                        "modelTypeId": 1,
                                        "modelType": "Entity Extractor",
                                        "recognitionSources": [
                                            "model"
                                        ]
                                    }
                                ]
                            }
                        }
                    ],
                    "Toppings": [
                        {
                            "Type": [
                                "salsa"
                            ],
                            "Quantity": [
                                "more"
                            ],
                            "$instance": {
                                "Type": [
                                    {
                                        "type": "Type",
                                        "text": "salsa",
                                        "startIndex": 44,
                                        "length": 5,
                                        "score": 0.7292897,
                                        "modelTypeId": 1,
                                        "modelType": "Entity Extractor",
                                        "recognitionSources": [
                                            "model"
                                        ]
                                    }
                                ],
                                "Quantity": [
                                    {
                                        "type": "Quantity",
                                        "text": "more",
                                        "startIndex": 39,
                                        "length": 4,
                                        "score": 0.9320932,
                                        "modelTypeId": 1,
                                        "modelType": "Entity Extractor",
                                        "recognitionSources": [
                                            "model"
                                        ]
                                    }
                                ]
                            }
                        }
                    ],
                    "$instance": {
                        "Pizza": [
                            {
                                "type": "Pizza",
                                "text": "two small pepperoni pizzas",
                                "startIndex": 7,
                                "length": 26,
                                "score": 0.812199831,
                                "modelTypeId": 1,
                                "modelType": "Entity Extractor",
                                "recognitionSources": [
                                    "model"
                                ]
                            }
                        ],
                        "Toppings": [
                            {
                                "type": "Toppings",
                                "text": "more salsa",
                                "startIndex": 39,
                                "length": 10,
                                "score": 0.7250252,
                                "modelTypeId": 1,
                                "modelType": "Entity Extractor",
                                "recognitionSources": [
                                    "model"
                                ]
                            }
                        ]
                    }
                }
            ],
            "$instance": {
                "Pizza order": [
                    {
                        "type": "Pizza order",
                        "text": "two small pepperoni pizzas with more salsa",
                        "startIndex": 7,
                        "length": 42,
                        "score": 0.769223332,
                        "modelTypeId": 1,
                        "modelType": "Entity Extractor",
                        "recognitionSources": [
                            "model"
                        ]
                    }
                ]
            }
        }
    }
}
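A response of this shape can be reduced to the top intent, the intent scores, and the matched text of each top-level entity. The sketch below works on a plain dictionary trimmed from the JSON above; `summarize_prediction` is a hypothetical helper, and it assumes the top-level `$instance` entries carry `text`, `startIndex` and `length` as shown.

```python
def summarize_prediction(response):
    """Return (top intent, intent scores, matched text per top-level entity)."""
    prediction = response["prediction"]
    scores = {name: info["score"] for name, info in prediction.get("intents", {}).items()}
    # "$instance" holds the span metadata for each top-level entity
    instances = prediction.get("entities", {}).get("$instance", {})
    spans = {name: [item["text"] for item in items] for name, items in instances.items()}
    return prediction["topIntent"], scores, spans


sample = {
    "query": "I want two small pepperoni pizzas with more salsa",
    "prediction": {
        "topIntent": "OrderPizzaIntent",
        "intents": {
            "OrderPizzaIntent": {"score": 0.753606856},
            "None": {"score": 0.119097039},
        },
        "entities": {
            "$instance": {
                "Pizza order": [
                    {
                        "text": "two small pepperoni pizzas with more salsa",
                        "startIndex": 7,
                        "length": 42,
                    }
                ]
            }
        },
    },
}

top_intent, scores, spans = summarize_prediction(sample)
print(top_intent, scores, spans)

# startIndex and length locate the matched text inside the query
item = sample["prediction"]["entities"]["$instance"]["Pizza order"][0]
start, length = item["startIndex"], item["length"]
print(sample["query"][start:start + length] == item["text"])
```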

# 6. Run the app
Run the app with the python command on your quickstart file.

python authoring_and_predict.py

# 7. Clean up resources
You can delete the app from the LUIS portal and delete the Azure resources from the Azure portal.

If you use the REST API, delete the ExampleUtterances.JSON file from the file system once you finish this quickstart.

# 8. Iterative development of the LUIS app

A LUIS app learns and performs most efficiently when you iteratively develop it. Here's a typical iteration cycle:

1. Create a new version
2. Edit the LUIS app schema. This includes:
    - Intents with example utterances
    - Entities
    - Features
3. Train, test, and publish
4. Test for active learning by reviewing utterances sent to the prediction endpoint
5. Gather data from endpoint queries

# DATASET ANALYSIS

In [10]:
with open("./00_data/frames.json") as f:
    frames = json.load(f)

print(len(frames))    

1369


What is inside the frames?

In [13]:
for item in frames[0]:
    print(item, len(frames[0][item]))

user_id 9
turns 7
wizard_id 9
id 36
labels 2


The conversation between the user and the bot is stored in 'turns'. Let's look at an example:

In [14]:
for n, i in enumerate(frames[80]["turns"]):
    print(str(n+1)+'.', i['author'].upper(), ':', i['text'])
    if n%2 ==0:
        print('-'*35)
    else:
        print('_'*70)

1. USER : Hi im from Houston and i want to go to San Antonio, but i have strict dates can you help?
-----------------------------------
2. WIZARD : Sure thing! What days are you hoping to travel?
______________________________________________________________________
3. USER : August 25th until September 11th are the days im available
-----------------------------------
4. WIZARD : And will you be traveling with anyone?
______________________________________________________________________
5. USER : ill be alone
-----------------------------------
6. WIZARD : Well, let me tell you about the best deal I've found: The Vertex Inn is a 2-star hotel with a 7.15/10 guest rating, free wifi, free parking and free breakfast. They have a vacancy from the 7th to the 10th of September. You would be flying business class, bringing your total to 516.9 USD. Do you want me to book this trip?
______________________________________________________________________
7. USER : that sounds very tempting! are 
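Since each frame alternates user and wizard turns, the opening user request can be pulled out with a short loop. This is only a sketch based on the `turns`, `author` and `text` keys printed above; `first_user_text` is a hypothetical helper and is not the notebook's `user_turns_to_luis_ds`. The toy frame is made up for illustration.

```python
def first_user_text(frame):
    """Return the text of the first turn written by the user, or None."""
    for turn in frame.get("turns", []):
        # Compare case-insensitively in case the author casing varies
        if str(turn.get("author", "")).lower() == "user":
            return turn.get("text")
    return None


# Toy frame with the same keys as the dataset (content is illustrative)
toy_frame = {
    "turns": [
        {"author": "wizard", "text": "Hello, how can I help?"},
        {"author": "user", "text": "I want to go to San Antonio"},
        {"author": "user", "text": "From Houston"},
    ]
}
print(first_user_text(toy_frame))
```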

## Transforming the data into LUIS format

1. Keep the data we are interested in 
2. Convert it to LUIS format
3. Split the data into train/test sets 
4. Save the data to the Azure workspace


Identify your intents:
- Think about the intents that are important to your application's task.

- Let's take the example of a travel app, with functions to book a flight and check the weather at the user's destination. You can define two intents, BookFlight and GetWeather for these actions.

- In a more complex app with more functions, you likely would have more intents, and you should define them carefully so they aren't too specific. For example, BookFlight and BookHotel may need to be separate intents, but BookInternationalFlight and BookDomesticFlight may be too similar.

!!! It is a best practice to use only as many intents as you need to perform the functions of your app. If you define too many intents, it becomes harder for LUIS to classify utterances correctly. If you define too few, they may be so general that they overlap.

!!! If you don't need to identify overall user intention, add all the example user utterances to the None intent. If your app grows into needing more intents, you can create them later.

See the [app schema definition](https://docs.microsoft.com/en-us/azure/cognitive-services/luis/app-schema-definition) for the structure of the params.json file.

In [7]:
# try:
#     with open("./00_data/params.json") as f:
#         params = json.load(f)

#     print("Intents :", [i["name"] for i in params["model"]["intents"]])
#     print("Entities :", [i["name"] for i in params["model"]["entities"]])
# except:
#     pass

In [15]:
label_to_entity = {
    "or_city": "from_city",
    "dst_city": "to_city",
    "str_date": "from_date",
    "end_date": "to_date",
    "budget": "budget"
}

intentName = "ReserverVoyage"
df = user_turns_to_luis_ds(
    frames,
    intentName,
    label_to_entity,
    keep_only_first=True
)
    
df.shape

(1369, 11)

In [16]:
# Drop duplicate utterances
df = df.drop_duplicates(["text"])
df.shape

(1329, 11)

In [17]:
df.head()

Unnamed: 0,user_turn_id,text,intent,entities,entity_total_nb,from_city_nb,to_city_nb,from_date_nb,to_date_nb,budget_nb,text_word_nb
0,0,I'd like to book a trip to Atlantis from Capri...,ReserverVoyage,"[{'entity': 'to_city', 'startPos': 27, 'endPos...",4,1,1,1,0,1,25
1,0,"Hello, I am looking to book a vacation from Go...",ReserverVoyage,"[{'entity': 'to_city', 'startPos': 59, 'endPos...",3,1,1,0,0,1,16
2,0,Hello there i am looking to go on a vacation w...,ReserverVoyage,"[{'entity': 'to_city', 'startPos': 63, 'endPos...",1,0,1,0,0,0,20
3,0,"Hi I'd like to go to Caprica from Busan, betwe...",ReserverVoyage,"[{'entity': 'to_city', 'startPos': 21, 'endPos...",4,1,1,1,1,0,19
4,0,"Hello, I am looking to book a trip for 2 adult...",ReserverVoyage,"[{'entity': 'budget', 'startPos': 67, 'endPos'...",3,1,1,0,0,1,25


Let's look at the utterances that have no labeled entity and no intent.

In [18]:
df[(df["intent"] == "None") & (df["entity_total_nb"] == 0)]['text']

40                                                Hi!
48                                              Heyo!
52                                      Good morning.
63                                      Hello wozbot!
106                                      ay whats up?
                            ...                      
1146                                I need your help!
1158                         Vacay time woooohooooooo
1165    Hi. First time trying this out. What do I do?
1259                           I have 9 days vacation
1348                        Hi! I am very excited!!!!
Name: text, Length: 69, dtype: object

In [19]:
df["intent"].value_counts()

ReserverVoyage    1135
None               194
Name: intent, dtype: int64

Let's look at the utterances with intent = None.

In [20]:
for text in df[df["intent"] == "None"]["text"].iloc[:10]:
    print(text)

Hello, I have 15 vacation days available between June 1st and August 31st. I am leaving from Theed. I would like to go somewhere with lots of sunshine.
Hi!
Hi! I'd like to go to Boston from Mos Eisley on August 15th.
Heyo!
Good morning.
Hi. I need to book a vacation to Long Beach between August 25 and September 3. Departure is from Paris
Hi we're from Miami and we want to go to paris, can you help out?
Hello wozbot!
Hi im fro termina and i want to go on vacation on August 13th
Hi i am looking to go to Punta Cana with my three friends


Some utterances labeled with intent=None carry information about certain entities (from_city, to_city, dates, etc.). Let's change their intent so that they fall under intent=ReserverVoyage.

In [21]:
def change_val(row):
    # keep 'None' only for utterances that have no labeled entity at all
    if row["intent"] == "None" and row["entity_total_nb"] == 0:
        return 'None'
    else:
        return 'ReserverVoyage'

df["intent"] = df.apply(change_val, axis=1)


What is left with intent=None?

In [22]:
print(df['intent'].value_counts())
print('-'*70)
for text in df[df["intent"] == "None"]["text"].iloc[:10]:
    print(text)

ReserverVoyage    1260
None                69
Name: intent, dtype: int64
----------------------------------------------------------------------
Hi!
Heyo!
Good morning.
Hello wozbot!
ay whats up?
hi there
me again... I'm still burnt out from work
hey
hello hello
HEY


Let's look at the statistics for the utterances that carry five entity labels (entity_total_nb == 5), which in most cases means every entity is present. There are 33 such utterances.

In [23]:
df[df['entity_total_nb']==5].describe()

Unnamed: 0,user_turn_id,entity_total_nb,from_city_nb,to_city_nb,from_date_nb,to_date_nb,budget_nb,text_word_nb
count,33.0,33.0,33.0,33.0,33.0,33.0,33.0,33.0
mean,0.0,5.0,1.030303,0.969697,1.0,1.0,1.0,28.636364
std,0.0,0.0,0.174078,0.174078,0.0,0.0,0.0,14.439136
min,0.0,5.0,1.0,0.0,1.0,1.0,1.0,8.0
25%,0.0,5.0,1.0,1.0,1.0,1.0,1.0,19.0
50%,0.0,5.0,1.0,1.0,1.0,1.0,1.0,26.0
75%,0.0,5.0,1.0,1.0,1.0,1.0,1.0,35.0
max,0.0,5.0,2.0,1.0,1.0,1.0,1.0,69.0


In [24]:
# Randomly sample 167 utterances to reach 200 in total (33 with five entity labels + 167)
# Only utterances whose intent is not 'None' are considered
nb = 167
df_tmp = df[(df["intent"] == "ReserverVoyage") & 
            (df["entity_total_nb"] < len(label_to_entity)) & 
            (df["entity_total_nb"] > 0)]
df_tmp = df_tmp.sample(n=nb, random_state=42)

In [25]:
df_tmp.head()

Unnamed: 0,user_turn_id,text,intent,entities,entity_total_nb,from_city_nb,to_city_nb,from_date_nb,to_date_nb,budget_nb,text_word_nb
1070,0,Hi. i need to get to cairo on the low\nkeep it...,ReserverVoyage,"[{'entity': 'to_city', 'startPos': 21, 'endPos...",2,1,1,0,0,0,28
1192,0,Need to go to Vancouver,ReserverVoyage,"[{'entity': 'to_city', 'startPos': 14, 'endPos...",1,0,1,0,0,0,5
1362,0,"Hey, I need to get back home to Milan!",ReserverVoyage,"[{'entity': 'to_city', 'startPos': 32, 'endPos...",1,0,1,0,0,0,9
1269,0,Minneapolis to Punta Cana,ReserverVoyage,"[{'entity': 'from_city', 'startPos': 0, 'endPo...",2,1,1,0,0,0,4
958,0,"santiago to burlington, go!",ReserverVoyage,"[{'entity': 'from_city', 'startPos': 0, 'endPo...",2,1,1,0,0,0,4


Let's build the final DataFrame from the utterances belonging to the two intents 'ReserverVoyage' and 'None'.

In [26]:
df_utterances=pd.concat([df[df['entity_total_nb']==5],
                     df_tmp,
                     df[df["intent"] == "None"]
                    ])

In [27]:
df_utterances['intent'].value_counts()

ReserverVoyage    200
None               69
Name: intent, dtype: int64

## Train/test split

In [28]:
df_train = df_utterances.sample(frac=0.7, random_state=42)
df_test = df_utterances.drop(df_train.index)

In [29]:
print(df_train["intent"].value_counts().tolist())
print(df_test["intent"].value_counts().tolist())

[142, 46]
[58, 23]


### Convert the final DataFrames to LUIS format (list of dictionaries)

[See the details of the batch-format data template](https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-how-to-batch-test?tabs=portal#batch-syntax-template-for-intents-with-entities)

In [30]:
utterances_train = df_train[["text", "intent", "entities"]].to_dict("records")

utterances_test = df_test[["text", "intent", "entities"]].to_dict("records")
utterances_test = {
    "LabeledTestSetUtterances": utterances_test
}

In [35]:
# save utterances_test for manual batch test in portal
path = './00_data/datasets/'
with open(path + "utterances_train.json", "w") as outfile:
    json.dump(utterances_train, outfile)

with open(path + "utterances_test.json", "w") as outfile:
    json.dump(utterances_test, outfile)

##### register dataset to azure workspace

In [None]:
# see azure helper for more details on this one, it gets default folder
path_in_datastore = 'utterances'
azure_helper.upload_and_register_datasets(ds_path=path_in_datastore)

### Train the model

In [57]:
# download actual parameters
latest_v = get_latest_version(get_luis_env)
get_p = get_params(get_luis_env, latest_v)

##### - Load new utterances

In [61]:
# increment version number
versionId_next = str(round(float(latest_v) + 0.1,1))
# the new utterances are appended to the current params['utterances']
create_new_version(get_luis_env, versionId_next, get_p, utterances_train)

##### - Train the model

In [62]:
train_luis(get_luis_env, versionId)

trained


##### - Deploy the model

In [63]:
url = deploy_luis(get_luis_env, versionId, is_staging=True)
print(url)
# client.apps.update_settings(get_luis_env.LUIS_APP_ID, is_public=True)
# responseEndpointInfo = client.apps.publish(get_luis_env.LUIS_APP_ID, 
#                                            versionId, 
#                                            is_staging=True)

None


##### - evaluate the model

In [None]:
# Send the request that starts the evaluation
res = evaluate(get_luis_env, True, utterances_test)
print(res)

For each intent and each entity, we get 3 scores:
- `precision`: of the labels the model predicted, the fraction that are correct.
- `recall`: of the labels that should have been detected, the fraction the model actually detected.
- `f_score`: harmonic mean of precision and recall.
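The three scores can be illustrated from raw counts of true positives, false positives and false negatives. This is a generic sketch of the standard definitions, not the internals of the `evaluate` helper; the counts below are made up for illustration.

```python
def precision_recall_f(true_positives, false_positives, false_negatives):
    """Compute precision, recall and their harmonic mean from raw counts."""
    predicted = true_positives + false_positives   # labels the model emitted
    expected = true_positives + false_negatives    # labels that should be found
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Illustrative counts: 8 correct labels, 2 spurious, 4 missed
p, r, f = precision_recall_f(8, 2, 4)
print("precision={:.2f} recall={:.2f} f_score={:.2f}".format(p, r, f))
```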

##### - get prediction

In [None]:
text = 'I want to book a flight from paris to marseille'
res = get_prediction_luis(get_luis_env,'',text)
print(res)

##### - Delete temporary model

In [None]:
delete_luis(get_luis_env, versionId)

### Push the code to GitHub


