<h1>DialogueAct Tagger</h1>

<h3>Abstract</h3>
This notebook gives an overview of the main features of the DialogueAct Tagger repository, including how to configure, train and test the Dialogue Act tagger on the provided corpora. The project is still under development and has known bugs and missing features. You are welcome to report ideas or problems in the "Issues" section of the repository, or to contact anyone listed under the "Contacts" section for help and support. If you use this work, please cite:

<i>Mezza, Stefano, et al. "ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents." Proceedings of the 27th International Conference on Computational Linguistics. 2018.</i>

<h3> 1. Getting started </h3>
This notebook requires Python 3.5+ to work correctly.

After cloning the repository, run the <code>install.sh</code> script, which installs all the necessary Python dependencies and downloads all the publicly available corpora, placing them in their default directories.

<h3> 2. Training an SVM Dialogue Act Tagger </h3>

We will begin by training a Dialogue Act Tagger based on Support Vector Machines, using scikit-learn classifiers. The first step is to create an <code>SVMConfig</code>:

In [None]:
import os
from pathlib import Path
from config import SVMConfig

The SVM Config takes the following parameters, which you can change in the code below to obtain different Dialogue Act Taggers:

<ul>
    <li><b>taxonomy:</b> this is the taxonomy (i.e. set of tags) that you want to use. We currently support all the default taxonomies for the provided datasets, plus the ISO Standard for Dialogue Act Tagging [1]. 
    </li>
    <li><b>dep, indexed_dep, indexed_pos, prev, ngrams:</b> whether the SVM classifier should use each of these features during training and inference. The features are, in order: <i>Dependency tags</i>, <i>Indexed dependency tags</i> (i.e. dependency tags with the index of the corresponding token), <i>Indexed Part-Of-Speech (POS) tags</i>, <i>Previous Dialogue Act label</i>, <i>Length of the n-grams for lexical features</i>
    </li>
    <li><b>corpora_list:</b> the corpora to use for training, passed as a list of tuples (type of the corpus, folder containing the corpus)</li>
</ul>

In [None]:
from corpora.taxonomy import Taxonomy
from corpora.maptask import Maptask
from corpora.switchboard import Switchboard
from corpora.ami import AMI

config = SVMConfig(taxonomy=Taxonomy.ISO, 
                   dep=True, 
                   indexed_dep=True, 
                   indexed_pos=True, 
                   prev=True, 
                   ngrams=True,
                   corpora_list=[(Maptask, str(Path("data/Maptask").resolve())),
                                 (AMI, str(Path("data/AMI/corpus").resolve())),
                                 (Switchboard, str(Path("data/Switchboard").resolve()))])
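
To make the feature flags above more concrete, the sketch below builds indexed tags and word n-grams from a hand-annotated utterance. The function names <code>indexed_tags</code> and <code>word_ngrams</code>, the <code>TAG_index</code> string format and the example POS tags are all illustrative assumptions; the repository's own feature extractor may encode these features differently.

```python
from typing import List, Tuple


def indexed_tags(tags: List[str]) -> List[str]:
    """Pair each tag with the position of its token, e.g. 'AUX_0'.

    The 'TAG_index' format is an assumption made for this illustration.
    """
    return [f"{tag}_{i}" for i, tag in enumerate(tags)]


def word_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram of the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["do", "you", "like", "chicken"]
pos_tags = ["AUX", "PRON", "VERB", "NOUN"]  # hand-written example tags

print(indexed_tags(pos_tags))
print(word_ngrams(tokens, 2))
```

Indexing a tag by token position lets a classifier tell an auxiliary verb at the start of an utterance (typical of questions) apart from one in the middle, which plain tag counts cannot do.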

Now that we have a config object, we can create the SVM trainer, which takes only that config as input.

In [None]:
from trainers.svm_trainer import SVMTrainer
trainer = SVMTrainer(config)

The trainer's <code>train</code> method trains a dialogue act tagger. It both returns the tagger and saves it in the <code>models</code> folder, in a subfolder based on the current timestamp.

In [None]:
# Directory that contains the trainer's output folder (where the model is saved)
path = Path(os.path.dirname(trainer.config.out_folder))

In [None]:
da_tagger = trainer.train()
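
As a rough picture of the timestamp-based subfolder mentioned above, the sketch below derives a folder name from a date and time. The helper <code>timestamped_model_dir</code> and the <code>%Y-%m-%d_%H-%M-%S</code> naming pattern are assumptions for illustration; the trainer's real naming scheme may differ, so check <code>trainer.config.out_folder</code> for the actual location.

```python
from datetime import datetime
from pathlib import Path


def timestamped_model_dir(root: Path, now: datetime) -> Path:
    """Build a per-run output folder under `root`, named after `now`.

    The name pattern is a hypothetical choice, not the repository's.
    """
    return root / now.strftime("%Y-%m-%d_%H-%M-%S")


# A fixed datetime keeps the example reproducible; a real run would use datetime.now().
example = timestamped_model_dir(Path("models"), datetime(2018, 8, 20, 14, 30, 0))
print(example.name)
```

Naming each run's folder after its start time keeps successive training runs from overwriting one another.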

We can now finally use our DA tagger to tag an input utterance. The tagger is contextual, meaning that it will use the previous utterance as context when predicting the next one. It is possible to use the <code>Utterance</code> class as input to provide this information. Alternatively, the tagger will use the previous DA it predicted, which is stored internally by the class. We will now see an example of both these behaviours:

In [None]:
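from corpora.corpus import Utterance  # assumed module path; adjust to where Utterance is defined

# Plain strings: the tagger falls back on the previous DA it predicted (stored internally)
da_tagger.tag("Do you like chicken?")
da_tagger.tag("Yes")

# Utterance input: the context is supplied through the Utterance object
da_tagger.tag(Utterance("yes", [], [], 0))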
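
The fallback on the internally stored label can be pictured with a toy stand-in. <code>ToyContextualTagger</code>, its hand-written rules and its label names are hypothetical and exist only to show the state handling; the real tagger uses the trained SVM and its own interface.

```python
from typing import Optional


class ToyContextualTagger:
    """Rule-based stand-in that mimics remembering the previous DA label."""

    def __init__(self) -> None:
        self.prev_da: Optional[str] = None  # last label this tagger predicted

    def tag(self, text: str, prev_da: Optional[str] = None) -> str:
        # Explicit context wins; otherwise fall back on the stored label.
        context = prev_da if prev_da is not None else self.prev_da
        if text.strip().endswith("?"):
            label = "Question"
        elif context == "Question":
            label = "Answer"
        else:
            label = "Statement"
        self.prev_da = label  # remember the prediction for the next call
        return label


tagger = ToyContextualTagger()
print(tagger.tag("Do you like chicken?"))       # no context yet
print(tagger.tag("Yes"))                        # uses the stored previous label
print(tagger.tag("Yes", prev_da="Statement"))   # explicit context overrides it
```

The same input, "Yes", receives a different label depending on the context, which is why the order of calls matters when the tagger is fed plain strings.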