# Practice 2. Building a Pipelined Conversation System using ConvLab

# Index

## Step 1. Take a look at the MultiWOZ dataset

## Step 2. Build a Pipelined Conversation System Using ConvLab

## Step 3. Configure, diagnose, and evaluate models with modules provided by ConvLab

## Additional. Try the End-to-end Neural Pipeline (ACL 2020) model

# Step 1. Take a look at the MultiWOZ dataset

## Step 1.0 Preset

These imports and helper functions support the code below and can be modified if necessary.

In [None]:
import json
import os
import zipfile
import sys
from collections import Counter
from nltk.tokenize import word_tokenize

from textwrap import indent
from pprint import pformat
from pprint import pprint

CUDA_IDX = '0'
os.environ['CUDA_VISIBLE_DEVICES'] = CUDA_IDX

def read_zipped_json(filepath, filename):
    """load the JSON file 'filename' stored inside the zip archive 'filepath'"""
    with zipfile.ZipFile(filepath, 'r') as archive:
        with archive.open(filename) as f:
            return json.load(f)

def pprint_manual(user_manual, name):
    """print user manual
        argument 'name' is needed to discriminate 'WOZ' from others
    """
    print('    User manual (message) : ')
    if 'WOZ' in name:
        print(" "*8, user_manual)
    else:
        for manual_one in user_manual:
            print(" "*8, manual_one)


def pprint_goal(goal, name):
    """print user's goal
        argument 'name' is needed to discriminate "WOZ" from others.
    """
    if 'WOZ' in name:
        pass
    else:
        for i, mes in enumerate(goal['message']):
            mes = mes.replace('<span class=\'emphasis\'>', '')
            mes = mes.replace('</span>', '')
            goal['message'][i] = mes

    print("[Goals]")
    user_manual = None
    for key, value in goal.items():
        if not value:           # empty
            continue
        elif key == 'message':  # user manual
            user_manual = value
        else:                   # valid domain
            domain = key        
            print(indent(pformat({domain : value}), ' '*4))
    if user_manual is not None:  # skip when the goal has no 'message' entry
        pprint_manual(user_manual, name)

def get_valid_domains(goal):
    """return valid domains for pretty print"""
    domains = []
    for key, value in goal.items():
        if not value:           # empty
            continue
        elif key == 'message':  # user manual
            continue
        else:                   # valid domain
            domains.append(key)
    return domains

def pprint_turns(log, domains):
    """pretty print for dialogue turns"""
    
    # signal for stopping print
    signal = None
    
    for i, log_one in enumerate(log):
        
        # dummy input function for pausing
        print('-' * 20 + '1. Enter to keep going 2. Type \'stop\' and Enter to stop printing ' + '-' * 40)
        signal = input()
        if 'stop' in signal:
            break

        # check whether system turn or not
        bool_sys_turn = False
        if log_one['metadata']:
            bool_sys_turn = True

        # delete span_info
        if 'span_info' in log_one:
            del log_one['span_info']

        # delete unnecessary domains
        domain_pairs = log_one['metadata']
        del_domains = []
        for dom, _ in domain_pairs.items():
            if dom not in domains:
                del_domains.append(dom)
        for dom in del_domains:
            del domain_pairs[dom]
    
        # pretty print
        if bool_sys_turn: print("[SYS]", end=" ")
        else:             print("[USR]", end=" ")
        print("(turn {})".format(i))

        log_one['1. dialogue_state'] = log_one['metadata']
        log_one['2. dialogue_act'] = log_one['dialog_act']
        log_one['3. text'] = log_one['text']
        del log_one['metadata']
        del log_one['dialog_act']
        del log_one['text']
        print(indent(pformat(log_one, width=100), ' ' * 4))
    
    # transform signal to boolean (signal stays None when the log is empty)
    return signal is not None and 'stop' in signal

## Step 1.1 Import the MultiWOZ dataset

The MultiWOZ dataset covers 7 domains ('hotel', 'train', 'attraction', 'restaurant', 'taxi', 'police', 'hospital'). It is a dataset of conversations in which the 'system' assists a user.
It consists of about 10,000 dialogues, split into train, validation, and test sets.

Running the code below prints the names of the first 100 dialogues in the train set.

In [None]:
cur_dir = os.path.abspath(os.curdir)
print("current directory :", cur_dir)
data_dir = "ConvLab-2/data/multiwoz"
processed_data_dir = os.path.join(cur_dir, 'multiwoz_data/all_data')
if not os.path.exists(processed_data_dir):
    os.makedirs(processed_data_dir)

data_key = ['train', 'val', 'test']
data = {}
for key in data_key:
    data[key] = read_zipped_json(os.path.join(data_dir, key + '.json.zip'), key + '.json')
    print('load {}.json...! '.format(os.path.join(data_dir, key)))
    print('number of dialogues : {}'.format(len(data[key])))
print()

# print the first 100 available dialogue names
print(list(data['train'].keys())[:100])


## Step 1.2 Let's see what the data looks like.

You can inspect dialogues by putting some of the names printed above into the Python list (e.g. `dialogue_names = ['SNG0943', 'MUL1801']`).

Each dialogue is broadly divided into two parts:

1. The user's goal (`goal` in the code)

2. The conversation between the system and the user (`dialogue_turns` in the code)

***

The user (\[USR]) reads the defined goal and manual, and conducts a conversation to achieve the goal.

The system (\[SYS]) does not know the user's goal. It (1) works out the conditions the user wants through conversation, (2) provides the information the user asks for, and (3) makes a reservation if necessary.
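
The helper cell in Step 1.0 treats a turn as a system turn when its `metadata` field is non-empty. A minimal sketch of that convention follows, using a hand-written toy log (the texts and state are invented, not taken from the dataset) and a hypothetical helper `speaker_of`:

```python
# Toy log: user turns carry empty metadata, system turns carry a dialogue state.
toy_log = [
    {"text": "I need a cheap hotel.", "metadata": {}},
    {"text": "Which area do you prefer?",
     "metadata": {"hotel": {"semi": {"pricerange": "cheap"}}}},
    {"text": "The north, please.", "metadata": {}},
]

def speaker_of(turn):
    """Return 'SYS' if the turn has a non-empty dialogue state, else 'USR'."""
    return "SYS" if turn["metadata"] else "USR"

for i, turn in enumerate(toy_log):
    print("[{}] (turn {}) {}".format(speaker_of(turn), i, turn["text"]))
```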

***

Within the goal,

`info` holds the conditions and needs that the user tells the system about,

`reqt` holds the information that the user wants to request from the system.
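
As a rough illustration of these two fields, the sketch below uses a small hand-written goal (not taken from the dataset) and a hypothetical helper `summarize_goal` that lists, per domain, what the user informs and what the user requests. The slot names and values are assumptions made up for the example.

```python
# Hand-written toy goal; domains, slots, and values are illustrative only.
toy_goal = {
    "hotel": {
        "info": {"pricerange": "moderate", "area": "north"},  # told to the system
        "reqt": ["address", "phone"],                         # asked of the system
    },
    "taxi": {},                                               # empty: not part of this goal
    "message": ["You are looking for a place to stay."],     # user manual
}

def summarize_goal(goal):
    """Return {domain: (info_dict, reqt_list)} for every non-empty domain."""
    summary = {}
    for domain, content in goal.items():
        if not content or domain == "message":
            continue
        summary[domain] = (content.get("info", {}), list(content.get("reqt", [])))
    return summary

for domain, (info, reqt) in summarize_goal(toy_goal).items():
    print(domain, "| informs:", info, "| requests:", reqt)
```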

***

With this dataset, our objective is to build a __system model__.

***

If you run the code below, you can see the goal and the utterances; press Enter to move on to the next turn.

If you want to stop, type stop and then press Enter.

In [None]:
# Edit dialogue_names to choose which dialogues to view
dialogue_names = ['SNG0943', 'MUL1801']

for name in dialogue_names:
    
    print()
    print('-' * 125)
    print("[Dialogue name] \'{}\'".format(name))

    # access datum using name
    datum = data['train'][name]
    goal = datum['goal']
    dialogue_turns = datum['log']

    # print goal and dialogue turns
    pprint_goal(goal, name)
    valid_domains = get_valid_domains(goal)
    # Comment out the pprint_turns line below if you don't want to print the turns.
    break_signal = False
    break_signal = pprint_turns(dialogue_turns, valid_domains)

    if break_signal:
        break

# Step 2. Build a Pipelined Conversation System using ConvLab

## Step 2.0 Define required modules

These imports and settings support the code below and can be modified if necessary.

In [None]:
!python -m spacy download en_core_web_sm

In [None]:
# common import: convlab2.$module.$model.$dataset
from convlab2.nlu.jointBERT.multiwoz import BERTNLU
from convlab2.nlu.milu.multiwoz import MILU
from convlab2.dst.rule.multiwoz import RuleDST
from convlab2.policy.rule.multiwoz import RulePolicy
from convlab2.nlg.template.multiwoz import TemplateNLG
from convlab2.dialog_agent import BiSession, Agent, PipelineAgent
from convlab2.evaluator.multiwoz_eval import MultiWozEvaluator
from pprint import pprint
import random
import numpy as np
import torch
import spacy

import logging 
# silence unnecessary logging
mpl_logger = logging.getLogger('matplotlib') 
mpl_logger.setLevel(logging.WARNING) 
cntp_logger = logging.getLogger('urllib3.connectionpool')
cntp_logger.setLevel(logging.WARNING)
ttu_logger = logging.getLogger('transformers.tokenization_utils')
ttu_logger.setLevel(logging.WARNING)
tcu_logger = logging.getLogger('transformers.configuration_utils')
tcu_logger.setLevel(logging.WARNING)
tmu_logger = logging.getLogger('transformers.modeling_utils')
tmu_logger.setLevel(logging.WARNING)
logging.getLogger().setLevel(logging.INFO)
import warnings
warnings.filterwarnings("ignore")

def set_seed(r_seed):
    random.seed(r_seed)
    np.random.seed(r_seed)
    torch.manual_seed(r_seed)


## Step 2.1 Let's look at an example Pipelined dialog system.

First, let's see how a conversation proceeds when the system model is configured as a pipelined dialog system.

The pipelined dialog model consists of four main parts.

NLU (Natural Language Understanding): Understands and interprets the other party's most recent utterance.

DST (Dialogue State Tracking): Tracks the context of the conversation so far and updates it with any changes.

Dialogue Policy: Decides the next utterance in structured form. (It determines only the intent, not the natural-language sentence.)

NLG (Natural Language Generation): Generates human-readable natural language from the output of the policy.
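
To make the data flow between the four parts concrete, here is a toy sketch in plain Python. It does not use ConvLab; the `toy_*` functions, the keyword rules, and the act format `[intent, domain, slot, value]` are assumptions chosen for illustration only.

```python
def toy_nlu(utterance):
    """Keyword NLU: turn text into dialogue acts [intent, domain, slot, value]."""
    text = utterance.lower()
    acts = []
    for price in ("cheap", "moderate", "expensive"):
        if price in text and "hotel" in text:
            acts.append(["Inform", "Hotel", "Price", price])
    for area in ("north", "south", "east", "west", "centre"):
        if area in text:
            acts.append(["Inform", "Hotel", "Area", area])
    return acts

def toy_dst(state, acts):
    """Rule DST: copy informed slot values into the running dialogue state."""
    for intent, domain, slot, value in acts:
        if intent == "Inform":
            state.setdefault(domain, {})[slot] = value
    return state

def toy_policy(state):
    """Rule policy: request the first missing slot, otherwise make an offer."""
    hotel = state.get("Hotel", {})
    for slot in ("Price", "Area"):
        if slot not in hotel:
            return [["Request", "Hotel", slot, "?"]]
    return [["Offer", "Hotel", "Area", hotel["Area"]]]

def toy_nlg(acts):
    """Template NLG: fill slot values into fixed sentence templates."""
    sentences = []
    for intent, domain, slot, value in acts:
        if intent == "Request":
            sentences.append("Which {} would you like for the {}?".format(
                slot.lower(), domain.lower()))
        else:
            sentences.append("I found a {} in the {}.".format(domain.lower(), value))
    return " ".join(sentences)

def toy_respond(state, utterance):
    """Run one system turn: NLU -> DST -> Policy -> NLG."""
    state = toy_dst(state, toy_nlu(utterance))
    return toy_nlg(toy_policy(state))

state = {}
print(toy_respond(state, "I want to find a moderate hotel"))
print(toy_respond(state, "Somewhere in the north , please"))
```

Each stage consumes only the previous stage's output, which is why the individual modules can be swapped independently.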

-----------------

Below is an example of configuring the most basic pipelined dialog system.

NLU: the code below uses MILU; BERTNLU, covered in Practice 1, is imported above and can be substituted

RuleDST: rule-based DST module

RulePolicy: rule-based policy module

TemplateNLG: template-based NLG module (fills slot values into predefined templates)

Once the four modules are defined, wrap them in `PipelineAgent` to declare `sys_agent`.

In [None]:
spacy.load('en_core_web_sm')
# MILU
sys_nlu = MILU()
# simple rule DST
sys_dst = RuleDST()
# rule policy
sys_policy = RulePolicy()
# template NLG
sys_nlg = TemplateNLG(is_user=False)
# assemble
sys_agent = PipelineAgent(sys_nlu, sys_dst, sys_policy, sys_nlg, name='sys')

Calling `sys_agent.response("user's utterance", print_nlu=False, print_dst=False, print_pol=False)` returns the system's response to the user's utterance.

If you change print_nlu, print_dst, or print_pol to True, the output of the corresponding module is printed.

In [None]:
sys_agent.init_session()
sys_agent.response("I want to find a moderate hotel", print_nlu=False, print_dst=False, print_pol=False)

In [None]:
sys_agent.response("Which type of hotel is it ?")

In [None]:
sys_agent.response("OK , where is its address ?")

In [None]:
sys_agent.response("Thank you !")

In [None]:
sys_agent.response("Try to find me a Chinese restaurant in south area .")

In [None]:
sys_agent.response("Which kind of food it provides ?")

In [None]:
sys_agent.response("Book a table for 5 , this Sunday .")

## Step 2.2 Let's configure the user simulator to talk to the system agent.

To check the performance of the system model, we need a user simulator that can play the role of the user.

Having a person play the user and exchange conversations every time would take a lot of labor.

In particular, when the dialogue policy is an RL agent, a user simulator is essential for trying out many different conversations.

In ConvLab, setting RulePolicy(character='usr') switches the policy to the `Agenda` policy, which defines a user model based on the user's goal.

Because the `Agenda` policy also includes the DST model, `user_dst` is set to `None`.
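
The general idea of an agenda-driven user can be sketched as a stack of pending user acts built from the goal. The code below is a simplified illustration under that assumption, not ConvLab's implementation; the class `ToyAgendaUser` and its act format `(intent, slot, value)` are hypothetical.

```python
class ToyAgendaUser:
    """Toy agenda: a stack of pending user acts derived from the goal."""

    def __init__(self, goal):
        self.agenda = []
        # Items pushed first are popped last, so requests come after constraints.
        for slot in reversed(goal["reqt"]):
            self.agenda.append(("request", slot, "?"))
        for slot, value in reversed(list(goal["info"].items())):
            self.agenda.append(("inform", slot, value))
        self.answers = {}  # minimal state tracking done by the user itself

    def observe(self, system_acts):
        """Record answers from the system and drop requests already fulfilled."""
        for intent, slot, value in system_acts:
            if intent == "inform":
                self.answers[slot] = value
        self.agenda = [act for act in self.agenda
                       if not (act[0] == "request" and act[1] in self.answers)]

    def next_act(self):
        """Pop the next pending act, or say goodbye when the agenda is empty."""
        return self.agenda.pop() if self.agenda else ("bye", None, None)

toy_user = ToyAgendaUser({"info": {"area": "south", "food": "chinese"},
                          "reqt": ["phone", "address"]})
print(toy_user.next_act())
print(toy_user.next_act())
toy_user.observe([("inform", "phone", "01223 000000")])
print(toy_user.next_act())
print(toy_user.next_act())
```

Because the simulated user keeps track of what it has already been told, no separate DST module is needed on the user side in this sketch.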

In [None]:
# MILU
user_nlu = MILU()
# no DST is used
user_dst = None
# rule policy
user_policy = RulePolicy(character='usr')   # UserPolicyAgendaMultiWoz()
# template NLG
user_nlg = TemplateNLG(is_user=True)
# user_nlg = SCLSTM(is_user=True)
# assemble
user_agent = PipelineAgent(user_nlu, user_dst, user_policy, user_nlg, name='user')

## Step 2.3 Let's conduct a conversation between the user simulator and the system model.

So far, we have defined a user simulator and a system model.

The `MultiWozEvaluator` class is used to evaluate performance. (It also holds the user's goal.)

The `BiSession` class runs the conversation between the user simulator and the system model and evaluates it.

The `next_turn` function performs one turn of conversation.
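
The turn-taking pattern can be sketched with stub agents in plain Python. This is only an illustration of the loop structure; `ScriptedUser`, `EchoSystem`, and `ToySession` are hypothetical stand-ins and do not reproduce the `BiSession` interface (for example, no reward is returned).

```python
class ScriptedUser:
    """Stub user that walks through a fixed list of utterances."""
    def __init__(self, script):
        self.script = list(script)

    def response(self, sys_utterance):
        return self.script.pop(0) if self.script else "bye"

    def is_terminated(self):
        return not self.script

class EchoSystem:
    """Stub system that simply echoes the user."""
    def response(self, user_utterance):
        return "You said: " + user_utterance

class ToySession:
    """Lets the user answer the last system response, then the system reply."""
    def __init__(self, sys_agent, user_agent):
        self.sys_agent = sys_agent
        self.user_agent = user_agent

    def next_turn(self, last_sys_response):
        user_response = self.user_agent.response(last_sys_response)
        session_over = self.user_agent.is_terminated()
        sys_response = self.sys_agent.response(user_response)
        return sys_response, user_response, session_over

toy_session = ToySession(EchoSystem(), ScriptedUser(["hello", "find a hotel"]))
transcript = []
toy_sys_response = ''
for _ in range(20):  # upper bound on the number of turns
    toy_sys_response, toy_user_response, over = toy_session.next_turn(toy_sys_response)
    transcript.append((toy_user_response, toy_sys_response))
    if over:
        break
print(transcript)
```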

### Evaluation metrics

- Success rate: A dialogue counts as successful when the reservation succeeds and inform recall == 1, that is, the system makes a reservation that meets the user's conditions and outputs an appropriate value for all requested information.

- Book rate: Reservation completion rate (= Number of successful reservations / Number of correct reservations)

- Inform precision: TP / (TP + FP). Low precision means the system provides a lot of unnecessary information beyond the requested slots.

- Inform recall: TP / (TP + FN). Low recall means the system does not answer the requested slots.

- Inform F1: Harmonic mean of precision and recall
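
The inform metrics above can be worked through on a small example. The sketch below treats the requested and informed slots as plain sets, which is a simplification; ConvLab's evaluator does its own bookkeeping over the whole dialogue, and the helper `inform_prf` and the slot names are hypothetical.

```python
def inform_prf(requested, informed):
    """Compute precision, recall, and F1 over sets of slot names."""
    requested, informed = set(requested), set(informed)
    tp = len(requested & informed)   # requested and answered
    fp = len(informed - requested)   # answered but never requested
    fn = len(requested - informed)   # requested but never answered
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# The user requested three slots; the system answered two of them plus two extras.
p, r, f1 = inform_prf(requested=["address", "phone", "postcode"],
                      informed=["address", "phone", "area", "stars"])
print("precision={:.3f} recall={:.3f} f1={:.3f}".format(p, r, f1))
```

Here TP = 2, FP = 2, and FN = 1, so precision is 2/4, recall is 2/3, and F1 is 4/7.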

In [None]:
evaluator = MultiWozEvaluator()
sess = BiSession(sys_agent=sys_agent, user_agent=user_agent, kb_query=None, evaluator=evaluator)

set_seed(20200804)

sys_response = ''
sess.init_session()
print('init goal:')
pprint(sess.evaluator.goal)
print('-'*100)
for i in range(20):
    sys_response, user_response, session_over, reward = sess.next_turn(sys_response)
    print('user:', user_response)
    print('sys:', sys_response)
    print()
    if session_over is True:
        break
print('task success:', sess.evaluator.task_success())
print('book rate:', sess.evaluator.book_rate())
print('inform precision/recall/f1:', sess.evaluator.inform_F1())
print('-'*50)
print('final goal:')
pprint(sess.evaluator.goal)

print('='*100)

# Step 3. Configure, diagnose, and evaluate models with modules provided by ConvLab

## Step 3.0 Load the models supported by ConvLab

Available models:

- NLU: BERTNLU, MILU, SVMNLU
- DST: RuleDST
- Word-DST: SUMBT, TRADE (set `sys_nlu` to `None`)
- Policy: RulePolicy, Imitation, REINFORCE, PPO, GDPL
- Word-Policy: MDRG, HDSA, LaRL (set `sys_nlg` to `None`)
- NLG: Template, SCLSTM
- End2End: Sequicity, DAMD, RNN_rollout (directly used as `sys_agent`)
- Simulator policy: Agenda, VHUS (for `user_policy`)

In [None]:
# available NLU models
from convlab2.nlu.svm.multiwoz import SVMNLU
from convlab2.nlu.jointBERT.multiwoz import BERTNLU
from convlab2.nlu.milu.multiwoz import MILU
# available DST models
from convlab2.dst.rule.multiwoz import RuleDST
#from convlab2.dst.mdbt.multiwoz import MDBT
from convlab2.dst.sumbt.multiwoz import SUMBT
from convlab2.dst.trade.multiwoz import TRADE
# available Policy models
from convlab2.policy.rule.multiwoz import RulePolicy
from convlab2.policy.ppo.multiwoz import PPOPolicy
from convlab2.policy.pg.multiwoz import PGPolicy
from convlab2.policy.mle.multiwoz import MLEPolicy
from convlab2.policy.gdpl.multiwoz import GDPLPolicy
#from convlab2.policy.vhus.multiwoz import UserPolicyVHUS
from convlab2.policy.mdrg.multiwoz import MDRGWordPolicy
from convlab2.policy.hdsa.multiwoz import HDSA
from convlab2.policy.larl.multiwoz import LaRL
# available NLG models
from convlab2.nlg.template.multiwoz import TemplateNLG
from convlab2.nlg.sclstm.multiwoz import SCLSTM
# available E2E models
from convlab2.e2e.sequicity.multiwoz import Sequicity
from convlab2.e2e.damd.multiwoz import Damd

## Step 3.1. Let's make our own dialog system with models supported by ConvLab.

Word-DST models combine NLU and DST in a single model, so they can be used without a separate NLU model.

You can therefore use either (1) NLU + RuleDST or (2) a Word-DST model.

**\[Caution!]** For Word-DST, sys_nlu = None.

Word-Policy models combine the dialogue policy and NLG in a single model, so they can be used without a separate NLG model.

You can therefore use either (1) Policy + NLG or (2) a Word-Policy model.

**\[Caution!]** For Word-Policy, sys_nlg = None.

You can use the `PipelineAgent` class to create a pipelined dialog system. Alternatively, you can use an End-to-End model.
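
The two cautions can be expressed as a small sanity check on a planned combination. The sketch below works on model names as strings (taken from the list of available models in Step 3.0) rather than on real ConvLab objects; the helper `check_pipeline` is hypothetical.

```python
# Model names from the list of available models; None means "not used".
WORD_DST = {"SUMBT", "TRADE"}
WORD_POLICY = {"MDRG", "HDSA", "LaRL"}

def check_pipeline(nlu, dst, policy, nlg):
    """Return a list of problems found in a combination of module names."""
    problems = []
    if dst in WORD_DST and nlu is not None:
        problems.append("{} is a Word-DST model: set sys_nlu to None".format(dst))
    if policy in WORD_POLICY and nlg is not None:
        problems.append("{} is a Word-Policy model: set sys_nlg to None".format(policy))
    return problems

print(check_pipeline("MILU", "RuleDST", "RulePolicy", "TemplateNLG"))
print(check_pipeline("MILU", "TRADE", "LaRL", None))
```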

In [None]:
# NLU+RuleDST:
sys_nlu = MILU()
# sys_nlu = SVMNLU()
# sys_nlu = BERTNLU()
sys_dst = RuleDST()

# or Word-DST:
# sys_nlu = None
# sys_dst = SUMBT()
# sys_dst = TRADE()
# sys_dst = MDBT()  # (not working!)

# [Caution] In Word-DST case, sys_nlu must be "None"

# Policy+NLG:
sys_policy = RulePolicy()
# sys_policy = PPOPolicy()
# sys_policy = PGPolicy()
# sys_policy = MLEPolicy()
# sys_policy = GDPLPolicy()
sys_nlg = TemplateNLG(is_user=False)
#sys_nlg = SCLSTM(is_user=False)

# or Word-Policy:
# sys_policy = LaRL()
# sys_policy = HDSA()
# sys_policy = MDRGWordPolicy()
# sys_nlg = None

# [Caution] In Word-Policy case, sys_nlg must be "None"

sys_agent = PipelineAgent(sys_nlu, sys_dst, sys_policy, sys_nlg, 'sys')
# sys_agent = Sequicity()
# sys_agent = Damd()

As we did before, we also define the user simulator.

(In ConvLab, when RulePolicy(character='usr') is set, it is converted to `Agenda` policy, which can define a user model based on the user's goal.)

In [None]:
user_nlu = BERTNLU()
# user_nlu = MILU()
# user_nlu = SVMNLU()
user_dst = None
user_policy = RulePolicy(character='usr')
# user_policy = UserPolicyVHUS(load_from_zip=True)
user_nlg = TemplateNLG(is_user=True)
# user_nlg = SCLSTM(is_user=True)
user_agent = PipelineAgent(user_nlu, user_dst, user_policy, user_nlg, name='user')

## Step 3.2 Let's diagnose the system model using the analysis tool.

ConvLab-2 provides an analysis tool that lets you analyze and diagnose the performance and weaknesses of the defined system model.

It can also create an HTML report with richer statistical information. (See results/\$model_name\$.)

In [None]:
from convlab2.util.analysis_tool.analyzer import Analyzer

# if sys_nlu!=None, set use_nlu=True to collect more information
analyzer = Analyzer(user_agent=user_agent, dataset='multiwoz')

set_seed(20200131)
analyzer.comprehensive_analyze(sys_agent=sys_agent, model_name='sys_agent', total_dialog=20)

## Step 3.3 Let's compare the performance between several system models.

Let's fill in the results for three different system models below. (In VS Code, double-click the cell to edit it.)

| NLU | DST | Policy | NLG | Success rate | Book rate | Inform P | Inform R | Inform F1 | Turn (succ/all) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| - | - | - | - | - | - | - | - | - | - |
| - | - | - | - | - | - | - | - | - | - |
| - | - | - | - | - | - | - | - | - | - |

In [None]:
set_seed(20200805)
# define your own system agent2
# sys_agent2 = PipelineAgent(...)
# define your own system agent3
# sys_agent3 = PipelineAgent(...)

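# sys_agent is listed three times as a placeholder; once sys_agent2 and
# sys_agent3 are defined above, put them in agent_list instead.
analyzer.compare_models(agent_list=[sys_agent, sys_agent, sys_agent], model_name=['sys_agent1', 'sys_agent2', 'sys_agent3'], total_dialog=20)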

# Additional. Try the End-to-end Neural Pipeline (ACL 2020) model

Paper : Donghoon Ham *, Jeong-Gwan Lee *, Youngsoo Jang, and Kee-Eung Kim. 2020. End-to-End Neural Pipeline for Goal-Oriented Dialogue System using GPT-2. ACL 2020

![Model architecture](image/e2e_model.png)

First, import the model from ConvLab-2 and download the weights pretrained on MultiWOZ.

In [None]:
from convlab2.e2e.Transformer import Transformer
sys_agent = Transformer()

Unlike the other e2e agents listed above, the neural pipeline model lets you inspect the dialogue state and the system action (dialogue policy).

In [None]:
sys_agent.init_session()
sys_agent.response("I want to find a moderate hotel")

## Let's talk to the neural pipeline model!

In [None]:
sys_agent.init_session()
while True:
    raw_text = input(">>> ")
    # ask again until the user types something
    while not raw_text:
        print('Input must not be empty.')
        raw_text = input(">>> ")
    # 'r' resets the session
    if raw_text == 'r':
        sys_agent.init_session()
        continue
    # 'stop' ends the chat
    if raw_text == 'stop':
        break
    out_text = sys_agent.response(raw_text)
    print('sys: ', out_text)