In [1]:
!pip install warpctc-pytorch11-cuda101
!git clone https://github.com/facebookresearch/ParlAI.git ParlAI
!!cd ParlAI; python setup.py develop

Cloning into 'ParlAI'...
remote: Enumerating objects: 199, done.[K
remote: Counting objects: 100% (199/199), done.[K
remote: Compressing objects: 100% (151/151), done.[K
remote: Total 30558 (delta 100), reused 115 (delta 48), pack-reused 30359[K
Receiving objects: 100% (30558/30558), 58.64 MiB | 36.00 MiB/s, done.
Resolving deltas: 100% (21706/21706), done.


['running develop',
 'running egg_info',
 'creating parlai.egg-info',
 'writing parlai.egg-info/PKG-INFO',
 'writing dependency_links to parlai.egg-info/dependency_links.txt',
 'writing entry points to parlai.egg-info/entry_points.txt',
 'writing requirements to parlai.egg-info/requires.txt',
 'writing top-level names to parlai.egg-info/top_level.txt',
 "writing manifest file 'parlai.egg-info/SOURCES.txt'",
 "reading manifest template 'MANIFEST.in'",
 "writing manifest file 'parlai.egg-info/SOURCES.txt'",
 'running build_ext',
 'Creating /usr/local/lib/python3.6/dist-packages/parlai.egg-link (link to .)',
 'Adding parlai 0.1.20200517 to easy-install.pth file',
 '',
 'Installed /content/ParlAI',
 'Processing dependencies for parlai==0.1.20200517',
 'Searching for websocket-server==0.4',
 'Reading https://pypi.org/simple/websocket-server/',
 'Downloading https://files.pythonhosted.org/packages/74/64/e86581ee7775a2e08aca530b41e1a1e3ee6b320233b1eff301dcb86d1636/websocket_server-0.4.tar.gz#

# Getting a New Dataset Into ParlAI: the simplest way
Here is an example dataset containing a single episode of two examples:

In [0]:
text = "text:hello how are you today? \tlabels:i'm great thanks! what are you doing?\ntext:i've just been biking. \tlabels:oh nice, i haven't got on a bike in years! \tepisode_done:True"
with open('/tmp/data.txt', 'w') as handler:
    handler.write(text)

In [16]:
!python ParlAI/parlai/scripts/display_data.py -t fromfile:parlaiformat --fromfile_datapath /tmp/data.txt

[creating task(s): fromfile:parlaiformat]
[loading parlAI text data:/tmp/data.txt]
[1;31m- - - NEW EPISODE: tmp/data.txt - - -[0;0m
[0mhello how are you today? [0;0m
   [1;94mi'm great thanks! what are you doing?[0;0m
[0mi've just been biking. [0;0m
   [1;94moh nice, i haven't got on a bike in years! [0;0m
EPOCH DONE
[ loaded 1 episodes with a total of 2 examples ]


Essentially, each line holds one example. Each field of a ParlAI message is written as the field name, a colon, and the value, and fields are separated by tabs. The last example of an episode carries `episode_done:True`. The usual fields such as `text`, `labels`, and `label_candidates` can all be used, and you can add your own fields if you have a special use for them.
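As a sketch of the format rules above, the hypothetical helper `to_parlai_format` below (not part of ParlAI) serializes a list of episodes into this text format. It assumes each message is a dict of string-valued fields and that no value contains a tab or a newline, since those characters delimit fields and examples.

```python
def to_parlai_format(episodes):
    """Serialize episodes into ParlAI's tab-separated text format.

    Hypothetical helper, not part of ParlAI. `episodes` is a list of
    episodes; each episode is a list of dicts mapping field names
    (e.g. 'text', 'labels') to string values without tabs or newlines.
    """
    lines = []
    for episode in episodes:
        for i, message in enumerate(episode):
            # Each field is written as name:value.
            fields = [f'{name}:{value}' for name, value in message.items()]
            # Flag the last example so the loader knows the episode ended.
            if i == len(episode) - 1:
                fields.append('episode_done:True')
            # Fields are tab-separated; one example per line.
            lines.append('\t'.join(fields))
    return '\n'.join(lines)


episodes = [[
    {'text': 'hello how are you today?',
     'labels': "i'm great thanks! what are you doing?"},
    {'text': "i've just been biking.",
     'labels': "oh nice, i haven't got on a bike in years!"},
]]
serialized = to_parlai_format(episodes)
print(serialized)
```

Writing `serialized` to a file should give data that `display_data.py -t fromfile:parlaiformat` reads the same way as `/tmp/data.txt` above.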

# Creating a New Task: the more complete way
Task code lives in the `parlai/tasks` directory.
If your data is already in the ParlAI format, you need only a little boilerplate to load it; see, for example, the code for the `fromfile` task agent we just used.

Here, though, we will walk through every step. You will need to:
1. Add an `__init__.py` file to make sure imports work correctly.

2. Implement `build.py` to download and build any needed data.

3. Implement `agents.py`, with at least a `DefaultTeacher` that extends `Teacher` or one of its subclasses.

4. Add the task to the task list (`parlai/tasks/task_list.py`).
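Steps 1 to 3 amount to creating one directory with three files. The hypothetical helper `scaffold_task` below sketches that layout; it is not a ParlAI utility, and the task name `my_task` is only an example. In practice the directory would sit under `parlai/tasks`; here it is created in a temporary directory so the snippet is safe to run.

```python
import os
import tempfile


def scaffold_task(tasks_dir, task_name):
    """Create the directory and empty files a new task needs.

    Hypothetical helper, not part of ParlAI. `tasks_dir` would normally
    be ParlAI/parlai/tasks.
    """
    task_dir = os.path.join(tasks_dir, task_name)
    os.makedirs(task_dir, exist_ok=True)
    for fname in ('__init__.py', 'build.py', 'agents.py'):
        path = os.path.join(task_dir, fname)
        # Create an empty placeholder, but never overwrite existing work.
        if not os.path.exists(path):
            open(path, 'w').close()
    return task_dir


# Demo in a throwaway directory rather than a real ParlAI checkout.
demo_root = tempfile.mkdtemp()
task_dir = scaffold_task(demo_root, 'my_task')
print(sorted(os.listdir(task_dir)))  # ['__init__.py', 'agents.py', 'build.py']
```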

Below we go into more details for each of these steps.

## Part 1: Building the Data
We first need to create functionality for downloading and setting up the dataset that is going to be used for the task.

In [0]:
import parlai.core.build_data as build_data
import os

Now we define our build function. It takes a single argument, `opt`, which holds the parsed command-line arguments (or their defaults), including `opt['datapath']`, the path to the data directory.

In [0]:
def build(opt):
    # get path to data directory
    dpath = os.path.join(opt['datapath'], 'mnist')
    # define version if any
    version = None

    # check if data had been previously built
    if not build_data.built(dpath, version_string=version):
        print('[building data: ' + dpath + ']')

        # make a clean directory if needed
        if build_data.built(dpath):
            # an older version exists, so remove these outdated files.
            build_data.remove_dir(dpath)
        build_data.make_dir(dpath)

        # download the data.
        fname = 'mnist.tar.gz'
        url = 'http://parl.ai/downloads/mnist/' + fname # dataset URL
        build_data.download(url, dpath, fname)

        # uncompress it
        build_data.untar(dpath, fname)

        # mark the data as built
        build_data.mark_done(dpath, version_string=version)
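The version check at the top of `build` relies on a marker written by `build_data.mark_done` and read by `build_data.built`. The two functions below are an illustrative re-implementation of that idea, not ParlAI's source: they assume a marker file named `.built` whose first line is a timestamp and whose second line, if present, is the version string. The demo shows why changing `version` triggers a rebuild.

```python
import datetime
import os
import tempfile

MARKER = '.built'  # assumed marker-file name


def mark_done_sketch(path, version_string=None):
    """Record that the data in `path` is built, optionally with a version."""
    with open(os.path.join(path, MARKER), 'w') as f:
        f.write(str(datetime.datetime.today()))
        if version_string:
            f.write('\n' + version_string)


def built_sketch(path, version_string=None):
    """Return True if `path` holds a build matching `version_string`."""
    fname = os.path.join(path, MARKER)
    if not os.path.isfile(fname):
        return False
    if version_string is None:
        # With no version requested, any previous build counts.
        return True
    with open(fname) as f:
        lines = f.read().split('\n')
    # The version is expected on the second line of the marker.
    return len(lines) > 1 and lines[1] == version_string


dpath = tempfile.mkdtemp()
print(built_sketch(dpath))          # False: nothing built yet
mark_done_sketch(dpath, version_string='v1.0')
print(built_sketch(dpath))          # True: some build exists
print(built_sketch(dpath, 'v1.0'))  # True: versions match
print(built_sketch(dpath, 'v2.0'))  # False: outdated, so build() would rebuild
```

With `version = None`, as in `build` above, any existing marker counts as built; setting `version` to a new string is what makes the `remove_dir` branch run.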

## Part 2: Creating the Teacher
Now that we have our data, we need an agent that understands the task’s structure and can present it: in other words, a Teacher. Every task requires an `agents.py` file that defines the task's agents, and that is where we will define our teacher.



The teachers already in ParlAI are built on a chain of subclasses, each adding functionality (and leaving fewer methods for you to implement). The chain is:

`Agent` => `Teacher` => `FixedDialogTeacher` => `DialogTeacher` => `ParlAIDialogTeacher`


- The simplest way to create a teacher is the `ParlAIDialogTeacher` class, which needs almost no code if the text data is already in the ParlAI Dialog format.

- If the data is not in this format, one can still use `DialogTeacher`, which automates much of the work of setting up a dialog task but gives more flexibility in how the data is loaded from disk.

- If the data is still a fixed set (i.e. not dynamic, but based on fixed files) and even more control is needed, such as providing extra information like the answer indices for the `SQuAD` dataset, one can use the `FixedDialogTeacher` class.

- Finally, if the requirements for the task do not fit any of the above, one can still write a task from scratch without much trouble.
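For the `DialogTeacher` route, the main work is a `setup_data(path)` generator that reads the data and yields each example together with a flag saying whether it starts a new episode. The standalone function below sketches such a generator for the ParlAI text format. It does not subclass ParlAI, its name is hypothetical, and the exact shape `DialogTeacher` expects from `setup_data` may differ between ParlAI versions, so check the teacher documentation for your version. Splitting `labels` and `label_candidates` on `|` is likewise an assumption.

```python
import os
import tempfile

# Fields assumed to hold several values separated by '|'.
LIST_FIELDS = ('labels', 'label_candidates')


def setup_data_sketch(path):
    """Yield (message, new_episode) pairs from a ParlAI-format text file.

    Standalone, hypothetical sketch of a DialogTeacher-style setup_data().
    """
    new_episode = True
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')
            if not line.strip():
                continue  # skip blank lines
            message = {}
            for field in line.split('\t'):
                # Split on the first colon only: values may contain colons.
                name, _, value = field.partition(':')
                message[name] = value.strip()
            for name in LIST_FIELDS:
                if name in message:
                    message[name] = message[name].split('|')
            episode_done = message.pop('episode_done', 'False') == 'True'
            yield message, new_episode
            # The next example opens a new episode only if this one closed it.
            new_episode = episode_done


# Demo on the example data from earlier, written to a temporary file.
sample_text = (
    "text:hello how are you today? \tlabels:i'm great thanks! what are you doing?\n"
    "text:i've just been biking. \tlabels:oh nice, i haven't got on a bike in years! \tepisode_done:True"
)
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as handler:
    handler.write(sample_text)

examples = list(setup_data_sketch(sample_path))
for message, new_episode in examples:
    print(new_episode, message)
```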

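```python
# Methods you implement when subclassing each teacher class.
# Bodies are elided with `...`; the real classes live in parlai/core/teachers.py.
class Teacher:
    def __init__(self, opt, shared=None): ...
    def observe(self, observation): ...
    def act(self): ...

class FixedDialogTeacher:
    def __init__(self, opt, shared=None): ...
    def get(self, episode_idx, entry_idx=0): ...
    def num_examples(self): ...
    def num_episodes(self): ...

class DialogTeacher:
    def __init__(self, opt, shared=None): ...
    def setup_data(self, path): ...

class ParlAIDialogTeacher:
    def __init__(self, opt, shared=None): ...
```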
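To make the `FixedDialogTeacher` contract concrete, here is a plain-Python stand-in (not a ParlAI class; `InMemoryEpisodes` is a hypothetical name) that answers `num_episodes()`, `num_examples()`, and `get(episode_idx, entry_idx)` over episodes held in memory, setting `episode_done` on the last entry of each episode. A real teacher would subclass `FixedDialogTeacher` and load its data from disk instead.

```python
class InMemoryEpisodes:
    """Hypothetical stand-in showing a FixedDialogTeacher-style contract."""

    def __init__(self, episodes):
        # episodes: list of episodes, each a list of message dicts
        self.episodes = episodes

    def num_episodes(self):
        return len(self.episodes)

    def num_examples(self):
        # Total examples across all episodes.
        return sum(len(episode) for episode in self.episodes)

    def get(self, episode_idx, entry_idx=0):
        episode = self.episodes[episode_idx]
        # Copy so callers cannot mutate the stored data.
        message = dict(episode[entry_idx])
        # The last entry of an episode marks the episode as done.
        message['episode_done'] = entry_idx == len(episode) - 1
        return message


data = InMemoryEpisodes([[
    {'text': 'hello how are you today?',
     'labels': ["i'm great thanks! what are you doing?"]},
    {'text': "i've just been biking.",
     'labels': ["oh nice, i haven't got on a bike in years!"]},
]])
print(data.num_episodes(), data.num_examples())  # 1 2
print(data.get(0, 1)['episode_done'])            # True
```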