# Description of the task and code

We analyze the `train.py` file, which is the main script to run in order to create a model and test its results.
A couple of considerations:
* The main goal of this file is to explain the details of the paper covered in this [tutorial](https://towardsdatascience.com/training-a-goal-oriented-chatbot-with-deep-reinforcement-learning-part-i-introduction-and-dce3af21d383)
* The code is downloaded from this [repository](https://github.com/maxbren/GO-Bot-DRL)
* Since the aim is to explain the lines of code and the meaning of the objects, many lines of code and objects are created and shown here for the sake of the description, even though they are not part of the original `train.py` script.

The important thing to keep in mind is the GOAL of the code as a whole:
* The RL algorithm is not given any example of how to conduct the conversation.
* We only have what the user wants (the user goal) and the options available for the bot to offer.
* The objective is to build a chatbot that reaches the user goal by asking the right questions to fill in the missing slots.


First of all, we bring in the necessary imports. Many of them are scripts contained in the same folder, so we'll refer to those.

In [2]:
from user_simulator import UserSimulator
from error_model_controller import ErrorModelController
from dqn_agent import DQNAgent
from state_tracker import StateTracker
import pickle, argparse, json, math
from utils import remove_empty_slots, timeprint
from user import User
from time import time
import datetime

Immediately after, the main block is introduced:

## 1.1. Main
`if __name__ == "__main__":`

The path to the constants file is normally parsed from the command-line arguments. We comment out these lines because we are **running a notebook, not the script.**
We use the default set of constants, shown below:

### 1.1.1. Constants

We import the variables that will not change during the execution of the code, or CONSTANTS. These are normally either written in uppercase or saved in a separate file, as seen in this case.

In [6]:
# Can provide constants file path in args OR run it as is and change 'CONSTANTS_FILE_PATH' below
# 1) In terminal: python train.py --constants_path "constants.json"
# 2) Run this file as is
#parser = argparse.ArgumentParser()
#parser.add_argument('--constants_path', dest='constants_path', type=str, default='')
#args = parser.parse_args()
#params = vars(args)

# Load constants json into dict
CONSTANTS_FILE_PATH = 'constants.json'
#if len(params['constants_path']) > 0:
#    constants_file = params['constants_path']
#else:
#    constants_file = CONSTANTS_FILE_PATH

constants_file = CONSTANTS_FILE_PATH # Set directly so the cell runs correctly in the notebook

with open(constants_file) as f:
    constants = json.load(f)
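
As an alternative to commenting out the argument parsing, the same logic can be kept working in both the script and the notebook. The following is only a sketch under assumptions: the helper names `resolve_constants_path` and `load_constants` are hypothetical and do not exist in the repository. It relies on `argparse`'s `parse_known_args`, which ignores arguments it does not recognize, such as the ones a notebook kernel passes.

```python
import argparse
import json

DEFAULT_CONSTANTS_PATH = "constants.json"  # same default as CONSTANTS_FILE_PATH


def resolve_constants_path(argv=None):
    """Return the path given with --constants_path, or the default if none is given.

    Hypothetical helper, not part of the original train.py.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--constants_path", dest="constants_path", type=str, default="")
    # parse_known_args skips unrecognized arguments instead of exiting
    args, _unknown = parser.parse_known_args(argv)
    return args.constants_path or DEFAULT_CONSTANTS_PATH


def load_constants(argv=None):
    """Load the constants JSON file into a dict (hypothetical helper)."""
    with open(resolve_constants_path(argv)) as f:
        return json.load(f)
```

In a notebook, calling `resolve_constants_path([])` returns the default path, while `resolve_constants_path(["--constants_path", "other.json"])` returns `other.json`.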

In [8]:
print(json.dumps(constants, indent=4, sort_keys=True))

{
    "agent": {
        "batch_size": 16,
        "dqn_hidden_size": 80,
        "epsilon_init": 0.0,
        "gamma": 0.9,
        "learning_rate": 0.001,
        "load_weights_file_path": "",
        "max_mem_size": 500000,
        "save_weights_file_path": "",
        "vanilla": true
    },
    "db_file_paths": {
        "database": "data/movie_db.pkl",
        "dict": "data/movie_dict.pkl",
        "user_goals": "data/movie_user_goals.pkl"
    },
    "emc": {
        "intent_error_prob": 0.0,
        "slot_error_mode": 0,
        "slot_error_prob": 0.05
    },
    "run": {
        "max_round_num": 20,
        "num_ep_run": 40000,
        "success_rate_threshold": 0.3,
        "train_freq": 100,
        "usersim": true,
        "warmup_mem": 1000
    }
}


Next, I describe the set of variables found here.  
Do not copy the content of the block below to use as a constants file: I have written comments on top of it, so it is no longer valid JSON.

``` javascript
{
    "agent": { 
            // All the hyperparameters needed to modify the behavior of the agent. These should be changed during the GRID SEARCH.
        
        "batch_size": 16, 
        "dqn_hidden_size": 80,
        "epsilon_init": 0.0,
        "gamma": 0.9,
        "learning_rate": 0.001,
        "load_weights_file_path": "",
        "max_mem_size": 500000,
        "save_weights_file_path": "",
        "vanilla": true
    },
        
        // The paths from the 3 different databases that I will explain in detail next. All 3 are necessary
        
    "db_file_paths": {
        "database": "data/movie_db.pkl",
        "dict": "data/movie_dict.pkl",
        "user_goals": "data/movie_user_goals.pkl"
    },
        
        // The ERROR MODEL CONTROLLER is the component that induces error on the agent and the environment.
        
    "emc": {
        "intent_error_prob": 0.0,
        "slot_error_mode": 0,
        "slot_error_prob": 0.05
    },
        
        // The parameters used in the training process
        
    "run": {
        "max_round_num": 20, // The maximum number of steps during an episode. Once reached, the episode is done.
        "num_ep_run": 40000, // The number of episodes to run during training
        "success_rate_threshold": 0.3, // Not really sure, we'll come back to that
        "train_freq": 100, // The model does not train every time it predicts, but only once every "train_freq" iterations
        "usersim": true, // Whether the user is simulated
        "warmup_mem": 1000 // The amount of memory used during the warm-up of the algorithm.
    }
}
```
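
The comment on the `agent` block mentions a grid search. As a rough sketch of what that could look like, the snippet below generates one constants dict per hyperparameter combination. The search space values and the helper name `iter_configs` are hypothetical; they are chosen for illustration and do not come from the tutorial or the repository.

```python
import copy
import itertools

# Hypothetical search space: illustrative values only
search_space = {
    "learning_rate": [0.001, 0.0005],
    "dqn_hidden_size": [60, 80],
    "batch_size": [16, 32],
}


def iter_configs(base_constants, space):
    """Yield a deep-copied constants dict for every combination in `space`."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        candidate = copy.deepcopy(base_constants)  # never mutate the base dict
        candidate["agent"].update(dict(zip(keys, values)))
        yield candidate


# Reduced stand-in for the dict loaded from constants.json
base = {"agent": {"batch_size": 16, "dqn_hidden_size": 80,
                  "learning_rate": 0.001, "gamma": 0.9}}
configs = list(iter_configs(base, search_space))
print(len(configs))  # 2 * 2 * 2 = 8 combinations
```

Each generated dict could then be passed wherever `constants` is used, so that every run trains an agent with a different setup.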

Next, we load the values from the constants dict into code variables.

In [9]:
# Load file path constants
file_path_dict = constants['db_file_paths']
DATABASE_FILE_PATH = file_path_dict['database']
DICT_FILE_PATH = file_path_dict['dict']
USER_GOALS_FILE_PATH = file_path_dict['user_goals']

# Load run constants
run_dict = constants['run']
USE_USERSIM = run_dict['usersim']
WARMUP_MEM = run_dict['warmup_mem']
NUM_EP_TRAIN = run_dict['num_ep_run']
TRAIN_FREQ = run_dict['train_freq']
MAX_ROUND_NUM = run_dict['max_round_num']
SUCCESS_RATE_THRESHOLD = run_dict['success_rate_threshold']
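
To make the role of `TRAIN_FREQ` more concrete, here is a minimal sketch. It assumes that a training step is triggered once every `TRAIN_FREQ` episodes; that unit is an assumption and should be checked against the training loop in `train.py`. The function name `training_episodes` is hypothetical.

```python
def training_episodes(num_ep_train, train_freq):
    """Return the 1-based episode numbers after which a training step would run.

    Assumes training is triggered every `train_freq` episodes (an assumption).
    """
    return [ep for ep in range(1, num_ep_train + 1) if ep % train_freq == 0]


# Small illustrative numbers; the constants file uses 40000 and 100
print(training_episodes(10, 5))  # [5, 10]
```

Under that assumption, the default values would give 40000 / 100 = 400 training steps over the whole run.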

### 1.1.2. Databases

Next, we load the three databases.

#### Movie database

This file contains the options that the chatbot has available to offer. The user does not pick from it directly: the chatbot offers the options and the user confirms them.
The example shown below depicts the different parameters of an option, such as city, theater, critic_rating, etc.

In [15]:
# Load movie DB
# Note: If you get an unpickling error here then run 'pickle_converter.py' and it should fix it
database = pickle.load(open(DATABASE_FILE_PATH, 'rb'), encoding='latin1')
# Clean DB
remove_empty_slots(database)

database[0]

{'city': 'hamilton',
 'theater': 'manville 12 plex',
 'zip': '08835',
 'critic_rating': 'good',
 'date': 'tomorrow',
 'state': 'nj',
 'starttime': '10:30am',
 'genre': 'comedy',
 'moviename': 'zootopia'}
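
To illustrate how such entries can be matched against what a user wants, the sketch below filters a tiny database by a set of constraints. This is not the repository's query code: the helper name `find_matches` is hypothetical, and the second entry of `sample_db` is made up for the example.

```python
def find_matches(db, constraints):
    """Return the entries whose slots equal every key/value pair in `constraints`."""
    # Accept both a dict keyed by index and a plain list of entries
    entries = db.values() if isinstance(db, dict) else db
    return [e for e in entries
            if all(e.get(slot) == value for slot, value in constraints.items())]


sample_db = {
    0: {"city": "hamilton", "theater": "manville 12 plex", "zip": "08835",
        "critic_rating": "good", "date": "tomorrow", "state": "nj",
        "starttime": "10:30am", "genre": "comedy", "moviename": "zootopia"},
    # Hypothetical second entry, for illustration only
    1: {"city": "seattle", "theater": "regal meridian 16", "date": "tomorrow",
        "starttime": "9:10 pm", "moviename": "zootopia"},
}

print(len(find_matches(sample_db, {"moviename": "zootopia"})))  # 2
print(find_matches(sample_db, {"city": "hamilton"})[0]["theater"])  # manville 12 plex
```

The more constraints the conversation pins down, the fewer entries remain as candidates to offer.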

#### Dictionary database
It lists the unique values found for each slot of the movie database, so that they can be tagged and fed into the network later.
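
As a hedged illustration of the idea, a dictionary of this kind could be derived from a movie database as follows. The repository ships the dictionary as a ready-made pickle file, so this sketch (with the hypothetical helper `build_slot_dict`) only shows the relationship between the two structures, not how the file was actually produced.

```python
from collections import defaultdict


def build_slot_dict(db):
    """Map every slot name to the list of distinct values it takes in `db`."""
    entries = db.values() if isinstance(db, dict) else db
    values_by_slot = defaultdict(list)
    for entry in entries:
        for slot, value in entry.items():
            # Comparison is case-sensitive, so 'seattle' and 'Seattle' both stay
            if value not in values_by_slot[slot]:
                values_by_slot[slot].append(value)
    return dict(values_by_slot)


# Tiny illustrative database, not the real data
tiny_db = [
    {"city": "seattle", "moviename": "zootopia"},
    {"city": "Seattle", "moviename": "zootopia"},
    {"city": "hamilton", "moviename": "deadpool"},
]
print(build_slot_dict(tiny_db))
```

The case-sensitive comparison is consistent with the real dictionary, where both `'royal oak'` and `'Royal Oak'` appear under `city`.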


In [16]:
# Load movie dict
db_dict = pickle.load(open(DICT_FILE_PATH, 'rb'), encoding='latin1')
db_dict

{'city': ['hamilton',
  'manville',
  'bridgewater',
  'seattle',
  'bellevue',
  'birmingham',
  'san francisco',
  'portland',
  'royal oak',
  'Royal Oak',
  'madison heights',
  'detroit',
  'des moines',
  'johnstown',
  'boston',
  'carbondale',
  'los angeles',
  'stony brook',
  '94952',
  'tampa',
  'hoover',
  'dramas',
  'Sacramento',
  'nashville',
  'Seattle',
  'st louis',
  'whittier village stadium cinemas',
  'southeast portland',
  'miami',
  'chicago',
  'nyc',
  'sacramento',
  'pittsburgh',
  'atlanta',
  'south barrington',
  'over seattle',
  'dallas',
  'st',
  'louis park',
  'Portland',
  'Monroe',
  'cary',
  'whittier',
  'sparta',
  'Shiloh',
  'Belleville',
  "o'fallon",
  'fairview heights',
  'springfield',
  'albany',
  'houma',
  'la',
  'evanston',
  'Southfield',
  'monroe',
  'Long Island',
  'northern san francisco',
  '94109',
  'louis',
  'sappington',
  'norfolk',
  'Los Angeles CA 90015',
  'campcreek area',
  'regency',
  'arlington',
  'phila

#### User goals
These are the training inputs: each element represents a service request from the user and starts one episode. The more user goals, the more episodes to train with.
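
A minimal sketch of how one user goal could start an episode is shown below. It assumes that a goal is drawn at random from the list, which is an assumption about the simulator and not something confirmed here; the helper name `start_episode` is hypothetical. The two sample goals follow the structure of the real file.

```python
import random

sample_goals = [
    {"request_slots": {}, "diaact": "request",
     "inform_slots": {"city": "birmingham", "numberofpeople": "1",
                      "theater": "carmike summit 16", "state": "al",
                      "starttime": "around 2pm", "date": "today",
                      "moviename": "zootopia"}},
    {"request_slots": {}, "diaact": "request",
     "inform_slots": {"city": "seattle", "numberofpeople": "2",
                      "theater": "amc pacific place 11 theater",
                      "starttime": "9:00 pm", "date": "tomorrow",
                      "moviename": "deadpool"}},
]


def start_episode(goals, rng=random):
    """Pick one goal and split it into what the user knows and what they ask for."""
    goal = rng.choice(goals)
    known = dict(goal["inform_slots"])    # constraints the user can state
    wanted = list(goal["request_slots"])  # slots the user wants filled in
    return goal, known, wanted


goal, known, wanted = start_episode(sample_goals, random.Random(0))
print(known["moviename"], wanted)
```

Passing a seeded `random.Random` instance keeps the choice reproducible between runs.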

In [18]:
# Load goal File
user_goals = pickle.load(open(USER_GOALS_FILE_PATH, 'rb'), encoding='latin1')
# Init. Objects
if USE_USERSIM:
    user = UserSimulator(user_goals, constants, database)
else:
    user = User(constants)

user_goals

[{'request_slots': {},
  'diaact': 'request',
  'inform_slots': {'city': 'birmingham',
   'numberofpeople': '1',
   'theater': 'carmike summit 16',
   'state': 'al',
   'starttime': 'around 2pm',
   'date': 'today',
   'moviename': 'zootopia'}},
 {'request_slots': {},
  'diaact': 'request',
  'inform_slots': {'city': 'seattle',
   'numberofpeople': '2',
   'theater': 'amc pacific place 11 theater',
   'starttime': '9:00 pm',
   'date': 'tomorrow',
   'moviename': 'deadpool'}},
 {'request_slots': {},
  'diaact': 'request',
  'inform_slots': {'city': 'birmingham',
   'numberofpeople': '4',
   'theater': 'carmike summit 16',
   'state': 'al',
   'starttime': 'around 6pm',
   'date': 'today',
   'moviename': 'deadpool'}},
 {'request_slots': {},
  'diaact': 'request',
  'inform_slots': {'city': 'seattle',
   'numberofpeople': '2',
   'theater': 'regal meridian 16',
   'starttime': '9:10 pm',
   'date': 'tomorrow',
   'moviename': 'zootopia'}},
 {'request_slots': {},
  'diaact': 'request',
 

In [None]:

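# Error model controller, built from the dictionary database and the constants
emc = ErrorModelController(db_dict, constants)
# State tracker, built from the movie database and the constants
state_tracker = StateTracker(database, constants)
# DQN agent, whose input size is the state size reported by the state tracker
dqn_agent = DQNAgent(state_tracker.get_state_size(), constants)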