In [39]:
import json
import datetime

import pandas as pd
import numpy as np

In [9]:
data = pd.read_json('data/frames.json')

In [10]:
data

Unnamed: 0,user_id,turns,wizard_id,id,labels
0,U22HTHYNP,[{'text': 'I'd like to book a trip to Atlantis...,U21DKG18C,e2c0fc6c-2134-4891-8353-ef16d8412c9a,"{'userSurveyRating': 4.0, 'wizardSurveyTaskSuc..."
1,U21E41CQP,"[{'text': 'Hello, I am looking to book a vacat...",U21DMV0KA,4a3bfa39-2c22-42c8-8694-32b4e34415e9,"{'userSurveyRating': 3.0, 'wizardSurveyTaskSuc..."
2,U21RP4FCY,[{'text': 'Hello there i am looking to go on a...,U21E0179B,6e67ed28-e94c-4fab-96b6-68569a92682f,"{'userSurveyRating': 2.0, 'wizardSurveyTaskSuc..."
3,U22HTHYNP,[{'text': 'Hi I'd like to go to Caprica from B...,U21DKG18C,5ae76e50-5b48-4166-9f6d-67aaabd7bcaa,"{'userSurveyRating': 5.0, 'wizardSurveyTaskSuc..."
4,U21E41CQP,"[{'text': 'Hello, I am looking to book a trip ...",U21DMV0KA,24603086-bb53-431e-a0d8-1dcc63518ba9,"{'userSurveyRating': 5.0, 'wizardSurveyTaskSuc..."
...,...,...,...,...,...
1364,U2AMZ8TLK,[{'text': 'Hi I've got 9 days free and I'm loo...,U21DMV0KA,957fd205-bb7c-4b81-8cb6-13c81c51c5c9,"{'userSurveyRating': 3.5, 'wizardSurveyTaskSuc..."
1365,U2AMZ8TLK,[{'text': 'I need to get to Fortaleza on Septe...,U260BGVS6,71b21b86-2d05-4372-a0ee-6ed64b0ddc42,"{'userSurveyRating': 4.5, 'wizardSurveyTaskSuc..."
1366,U231PNNA3,[{'text': 'We're finally going on vacation isn...,U21T9NMKM,ef2cd70e-c1f2-42be-8839-cb465af0bf41,"{'userSurveyRating': 5.0, 'wizardSurveyTaskSuc..."
1367,U2AMZ8TLK,"[{'text': 'Hi there, I'm looking for a place t...",U21DMV0KA,ffa79d2c-14eb-45e6-8573-b0817a1a1803,"{'userSurveyRating': 4.0, 'wizardSurveyTaskSuc..."


>#### This dataset contains information describing dialogues about booking hotels, trips, etc.
> - `user_id`: Refers to a unique identifier for the user taking part in the dialogue.
> - `turns`:
>      * `author`: The author of the message in a dialogue. i.e. “user” or “wizard”.
>      * `text`: The sentence that the author uttered. It is the exact text that the author of a turn said. E.g. “text”: “Consider it done. Have a great trip!”.
>      * `labels`: JSON object which has three keys: `active_frame`, `acts`, and `acts_without_refs`.
>           * `active_frame`: The id of the currently active frame.
>           * `acts`: The dialogue acts for the current utterance. Each act has a name and arguments args. The name is the name of the dialogue act, for instance, offer, or inform. The args contain the slot types (key) and slot values (val), for instance budget=$2000. Slot values are optional. An act contains a ref tag whenever a user or wizard refers to a past frame.
>           * `acts_without_refs`: They are similar to the `acts` except that they do not have these ref tags. We define the frame tracking task as the task that takes as input the acts_without_refs and outputs the acts.
>      * `timestamp`: Unix timestamp denoting the time at which the current turn occurred.
>      * `frames`: List of frames up to the current turn. Each frame has the following keys: frame_id, frame_parent_id, requests, binary_questions, compare_requests, and info.
>           * `frame_id`: Id of the frame.
>           * `frame_parent_id`: Id of the parent frame.
>           * `requests, binary_questions, compare_requests`: Requests are questions related to one frame, for instance “what is the price of this package?”. Compare_requests concern several frames. For example, the user might ask to compare different packages: “What is the guest rating of these two hotels?”. Binary_questions are questions with both a slot type and a slot value. These are special cases of requests and compare_requests, for instance “are both hotels 3.5 stars?”.
>           * `info`: The info contains all the constraints set by the user or the wizard in the frame. These constraints are expressed as slot types which have a value. Note that each slot can have multiple values, which accumulate as long as the frame does not change. For example, the price can be both “1000 USD” and “cheapest”. There are two additional fields to keep track of specific aspects of the dialogue: 
>                * `REJECTED`: a boolean value expressing whether the user negated or affirmed an offer made by the wizard.
>                * `MOREINFO`: a boolean value expressing whether the user wants to know more about the frame in question.
>      * `db`: It can only occur during a wizard’s turn. It is a list of search queries made by the wizard with the associated list of search results. E.g. “db”: {“search”: [{“ORIGIN_CITY”: “Montreal”}], “result”: []}
> - `wizard_id`: Refers to a unique identifier for the wizard taking part in the dialogue.
> - `id`: Refers to a unique identifier for the dialogue.
> - `labels`:
>      * `userSurveyRating`: A value that represents the user’s satisfaction with the Wizard’s service, ranging from 1 – complete dissatisfaction to 5 – complete satisfaction.
>      * `wizardSurveyTaskSuccessful`: A boolean which is true if the wizard thinks at the end of the dialogue that the user’s goal was achieved.
>
> This dataset consists of **1369 dialogues** described by **5 variables** (`user_id`, `turns`, `wizard_id`, `id`, `labels`).
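
Because `turns` holds a list of dictionaries per dialogue, it can help to flatten it into one row per turn before analysing it. The sketch below is one possible way to do that, not necessarily how the rest of this notebook proceeds. It runs on a tiny hand-made `sample` frame (hypothetical ids, texts and timestamps) that only mimics the structure described above; on the real data, `sample` would be replaced by `data`.

```python
import pandas as pd

# Hand-made sample mimicking the structure described above (not real data).
sample = pd.DataFrame({
    "id": ["dlg-a", "dlg-b"],
    "user_id": ["U1", "U2"],
    "wizard_id": ["W1", "W1"],
    "turns": [
        [
            {"author": "user", "text": "Hi", "timestamp": 1471277417000.0},
            {"author": "wizard", "text": "Hello", "timestamp": 1471277531000.0},
        ],
        [
            {"author": "user", "text": "I need a trip", "timestamp": 1471280000000.0},
        ],
    ],
})

# One row per turn: explode the list column, then expand each dict into columns.
exploded = sample.explode("turns").reset_index(drop=True)
turn_cols = pd.json_normalize(exploded["turns"].tolist())
turns_flat = pd.concat([exploded.drop(columns="turns"), turn_cols], axis=1)

# Number of turns in each dialogue.
turns_per_dialogue = turns_flat.groupby("id").size()

print(turns_flat)
print(turns_per_dialogue)
```

Each row of `turns_flat` keeps the dialogue-level identifiers next to the turn-level fields, so per-dialogue statistics reduce to an ordinary `groupby`.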

In [11]:
data.describe(include='all').T

Unnamed: 0,count,unique,top,freq
user_id,1369,11,U22K1SX9N,345
turns,1369,1369,[{'text': 'i wanna go to sacramento from hiros...,1
wizard_id,1369,12,U21T9NMKM,301
id,1369,1369,0ab749ee-6bd0-4560-85c1-59aef778672a,1
labels,1369,16,"{'userSurveyRating': 5.0, 'wizardSurveyTaskSuc...",929


In [56]:
x = pd.DataFrame(data['turns'][12])
display(x)

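# Print each turn as "author  date time: text"; timestamps are in milliseconds.
for _, turn in x.iterrows():
    seconds = int(turn['timestamp'] / 1000)
    date_time = datetime.datetime.fromtimestamp(seconds)  # converted to the local time zone
    print(f"{turn['author']:8} {date_time.strftime('%Y-%m-%d %H:%M:%S')}: {turn['text']}")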

Unnamed: 0,text,labels,author,timestamp,db
0,Hi im looking for a nice destination that i co...,"{'acts': [{'args': [{'val': 'book', 'key': 'in...",user,1471277000000.0,
1,"I have 2 choices, Vancouver or Toronto.","{'acts': [{'args': [{'val': 'Vancouver', 'key'...",wizard,1471278000000.0,{'result': [[{'trip': {'returning': {'duration...
2,What options are there for both,"{'acts': [{'args': [{'key': 'count_name'}], 'n...",user,1471278000000.0,
3,I have a few. Do you have a budget in mind?,"{'acts': [{'args': [{'key': 'budget'}], 'name'...",wizard,1471278000000.0,"{'result': [], 'search': []}"
4,$3700 is my budget at the moment,"{'acts': [{'args': [{'val': '$3700', 'key': 'b...",user,1471278000000.0,
5,I have the Obsidian Gem Inn in Vancouver which...,"{'acts': [{'args': [{'val': '14 day', 'key': '...",wizard,1471278000000.0,"{'result': [], 'search': []}"
6,Toronto seems like a better place this time of...,"{'acts': [{'args': [{'key': 'dep_time_dst'}, {...",user,1471278000000.0,
7,I have a direct flight to Toronto departing on...,"{'acts': [{'args': [{'val': 'August 15th', 'ke...",wizard,1471278000000.0,"{'result': [], 'search': []}"
8,would by chance have one leaving on the 19th,"{'acts': [{'args': [{'val': '19th', 'key': 'st...",user,1471278000000.0,
9,I have a flight available departing on August ...,"{'acts': [{'args': [{'val': 'August 24th', 'ke...",wizard,1471278000000.0,{'result': [[{'trip': {'returning': {'duration...


user     2016-08-15 18:10:17: Hi im looking for a nice destination that i could go to from Columbus
wizard   2016-08-15 18:12:11: I have 2 choices, Vancouver or Toronto.
user     2016-08-15 18:13:31: What options are there for both
wizard   2016-08-15 18:14:24: I have a few. Do you have a budget in mind?
user     2016-08-15 18:14:49: $3700 is my budget at the moment
wizard   2016-08-15 18:17:51: I have the Obsidian Gem Inn in Vancouver which is a 4-star hotel at $1934.64 or the Hotel Richard in Toronto which is a 3-star hotel at $2393.00. Both are for a 14 day stay.
user     2016-08-15 18:19:06: Toronto seems like a better place this time of year, when would the flights be?
wizard   2016-08-15 18:20:36: I have a direct flight to Toronto departing on August 15th and returning on August 28th.  Would you like me to book that for you?
user     2016-08-15 18:20:54: would by chance have one leaving on the 19th
wizard   2016-08-15 18:24:02: I have a flight available departing on August 24th a
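
The loop above converts timestamps one row at a time and in the machine's local time zone. As an alternative sketch, pandas can convert a whole column at once with `pd.to_datetime(..., unit="ms")`. The two timestamp values below are hypothetical and only follow the millisecond format of the `timestamp` field; forcing UTC is an assumption made here to keep the result independent of the machine running the notebook.

```python
import pandas as pd

# Hypothetical millisecond timestamps in the same format as the `timestamp` field.
ts_ms = pd.Series([1471277417000.0, 1471277531000.0])

# Cast to integers first so the conversion does not depend on float rounding,
# then interpret the values as milliseconds since the Unix epoch, in UTC.
ts_utc = pd.to_datetime(ts_ms.astype("int64"), unit="ms", utc=True)

# Time elapsed between consecutive turns (the first entry has no predecessor).
gaps = ts_utc.diff()

print(ts_utc.dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
print(gaps.iloc[1])
```

Working with a datetime column also makes response delays between user and wizard turns available through `diff()`, without any manual arithmetic on raw milliseconds.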

In [14]:
data['turns'][0]

[{'text': "I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.",
  'labels': {'acts': [{'args': [{'val': 'book', 'key': 'intent'}],
     'name': 'inform'},
    {'args': [{'val': 'Atlantis', 'key': 'dst_city'},
      {'val': 'Caprica', 'key': 'or_city'},
      {'val': 'Saturday, August 13, 2016', 'key': 'str_date'},
      {'val': '8', 'key': 'n_adults'},
      {'val': '1700', 'key': 'budget'}],
     'name': 'inform'}],
   'acts_without_refs': [{'args': [{'val': 'book', 'key': 'intent'}],
     'name': 'inform'},
    {'args': [{'val': 'Atlantis', 'key': 'dst_city'},
      {'val': 'Caprica', 'key': 'or_city'},
      {'val': 'Saturday, August 13, 2016', 'key': 'str_date'},
      {'val': '8', 'key': 'n_adults'},
      {'val': '1700', 'key': 'budget'}],
     'name': 'inform'}],
   'active_frame': 1,
   'frames': [{'info': {'intent': [{'val': 'book', 'negated': False}],
      'budget': [{'val': '1700.0', 'negated': False}],
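
The nested `acts` structure shown above can be walked with a small helper that yields one `(act name, slot key, slot value)` triple per argument. This is a minimal sketch: `iter_slots` is a hypothetical helper name, and the `turn` dictionary is abridged from the output above, with an added `request` act (an assumption) to illustrate an argument that has no `val`.

```python
def iter_slots(turn):
    """Yield (act_name, slot_key, slot_value) for every argument in a turn's acts."""
    for act in turn["labels"]["acts"]:
        for arg in act.get("args", []):
            # Slot values are optional, so `val` may be missing.
            yield act["name"], arg["key"], arg.get("val")


# Abridged from the output above; the last act is a hypothetical addition.
turn = {
    "text": "I'd like to book a trip to Atlantis from Caprica ...",
    "labels": {
        "acts": [
            {"args": [{"val": "book", "key": "intent"}], "name": "inform"},
            {"args": [{"val": "Atlantis", "key": "dst_city"},
                      {"val": "Caprica", "key": "or_city"}],
             "name": "inform"},
            {"args": [{"key": "budget"}], "name": "request"},
        ],
    },
}

slots = list(iter_slots(turn))
for name, key, val in slots:
    print(f"{name:8} {key:10} {val}")
```

Using `arg.get("val")` rather than `arg["val"]` matters because, as the field description notes, slot values are optional.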

In [13]:
data['labels'][0]

{'userSurveyRating': 4.0, 'wizardSurveyTaskSuccessful': True}
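
Since each `labels` entry is a flat dictionary with two keys, it can be expanded into two ordinary columns. The sketch below shows one way to do so on a hand-made `sample` frame (hypothetical ids and ratings, same keys as the output above); on the real data, `sample` would be replaced by `data`.

```python
import pandas as pd

# Hand-made sample with the same `labels` keys as the real data (values are made up).
sample = pd.DataFrame({
    "id": ["dlg-a", "dlg-b", "dlg-c"],
    "labels": [
        {"userSurveyRating": 4.0, "wizardSurveyTaskSuccessful": True},
        {"userSurveyRating": 3.0, "wizardSurveyTaskSuccessful": False},
        {"userSurveyRating": 5.0, "wizardSurveyTaskSuccessful": True},
    ],
})

# Expand each dict into its own columns and attach them to the dialogue ids.
label_cols = pd.json_normalize(sample["labels"].tolist())
expanded = pd.concat([sample.drop(columns="labels"), label_cols], axis=1)

# Compare user ratings depending on whether the wizard judged the task successful.
success = expanded["wizardSurveyTaskSuccessful"]
mean_if_success = expanded.loc[success, "userSurveyRating"].mean()
mean_if_failure = expanded.loc[~success, "userSurveyRating"].mean()

print(expanded)
print(mean_if_success, mean_if_failure)
```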