# DAgger on Part of Speech tagging

This notebook shows how to run the imitation learning algorithm DAgger (Dataset Aggregation, [Ross et al. (2011)](https://arxiv.org/pdf/1011.0686.pdf)) on a toy part-of-speech tagging dataset and illustrates its benefits. It follows the terminology of the EACL 2017 tutorial on imitation learning for structured prediction ([Vlachos et al. 2017](http://sheffieldnlp.github.io/ImitationLearningTutorialEACL2017/)) and uses the code from this [GitHub repository](http://github.com/andreasvlachos/structured_imitation_demo). The repository uses [scikit-learn](http://scikit-learn.org/stable/) classifiers in Python 3 to facilitate adoption by academic researchers and software developers. The notebook closely follows the code in this [file](http://github.com/andreasvlachos/structured_imitation_demo/blob/master/src/POSdemo.py), if you would rather go straight there.

In what follows we show how to do this step by step. First, import the library:

In [5]:
import imitation

Define the (typically structured) input and the structured output, combined in an instance:

In [6]:
class POSInput(imitation.StructuredInput):
    def __init__(self, tokens):
        self.tokens = tokens  

class POSOutput(imitation.StructuredOutput):
    def __init__(self, tags=None):
        # default to an empty tag list when no gold tags are given
        self.tags = tags if tags is not None else []

class POSInstance(imitation.StructuredInstance):
    def __init__(self, tokens, tags=None):
        super().__init__()
        self.input = POSInput(tokens)
        self.output = POSOutput(tags)
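
To make these data structures concrete, the sketch below builds one labelled and one unlabelled instance. It is only an illustration: the three base classes are minimal stand-ins (an assumption, so that the block runs without the `imitation` package installed), and the toy sentence and tags are hypothetical.

```python
# Stand-ins for imitation.StructuredInput / StructuredOutput / StructuredInstance.
# Assumption: the real base classes may do more; these exist only for the demo.
class StructuredInput:
    pass


class StructuredOutput:
    pass


class StructuredInstance:
    def __init__(self):
        self.input = None
        self.output = None


class POSInput(StructuredInput):
    def __init__(self, tokens):
        self.tokens = tokens


class POSOutput(StructuredOutput):
    def __init__(self, tags=None):
        self.tags = tags if tags is not None else []


class POSInstance(StructuredInstance):
    def __init__(self, tokens, tags=None):
        super().__init__()
        self.input = POSInput(tokens)
        self.output = POSOutput(tags)


# A training instance carries gold tags; a test instance starts with none.
labelled = POSInstance(["I", "can", "fly"], ["PRP", "MD", "VB"])
unlabelled = POSInstance(["I", "can", "fly"])

print(labelled.input.tokens, labelled.output.tags)
print(unlabelled.input.tokens, unlabelled.output.tags)
```

The tokens and tags are parallel lists, so the tag at position `i` labels the token at position `i`.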

Most of the work lies in defining the transition system. The package provides a `TransitionSystem` class to help with this; the comments in the code give some hints about its construction:

In [None]:
class POSTransitionSystem(imitation.TransitionSystem):

    class WordAction(imitation.TransitionSystem.Action):
        def __init__(self):
            # The superclass constructor initializes the label and the features that each action has
            super().__init__()

    # the agenda for word prediction is one action per token, left-to-right
    def __init__(self, structured_instance=None):
        super().__init__(structured_instance)
        if structured_instance is None:
            return
        for tokenNo, token in enumerate(structured_instance.input.tokens):
            newAction = self.WordAction()
            newAction.tokenNo = tokenNo
            self.agenda.append(newAction)

    # the expert policy is trivial in the case of PoS tagging: just return the correct label from gold
    def expert_policy(self, structured_instance, action):
        # look up the gold tag for the token this action refers to
        return structured_instance.output.tags[action.tokenNo]

    # In principle we could do more book-keeping here
    def updateWithAction(self, action, structuredInstance):
        # for now, just record the action so that later feature extraction can see it
        self.actionsTaken.append(action)

    # all the feature engineering goes here
    def extractFeatures(self, structured_instance, action):
        # e.g. the word itself that we are tagging
        features = {"currentWord=" + structured_instance.input.tokens[action.tokenNo]: 1}

        # features based on the previous predictions of this stage are accessed via self.actionsTaken,
        # e.g. the previous action
        if len(self.actionsTaken) > 0:
            features["prevPrediction=" + self.actionsTaken[-1].label] = 1
        else:
            features["prevPrediction=NULL"] = 1

        # features based on earlier stages via the state variable.

        return features

    def to_output(self):
        """
        Convert the action sequence in the state to the
        actual prediction, i.e. a sequence of tags
        """
        tags = [action.label for action in self.actionsTaken]
        return POSOutput(tags)
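
To show how a transition system like this is used during training, here is a minimal pure-Python sketch of the DAgger loop from Ross et al. (2011). It is not the `imitation` package's implementation. It assumes a simple perceptron in place of a scikit-learn classifier, a hypothetical toy dataset, and a geometric decay for the probability of following the expert. All names (`extract_features`, `Perceptron`, `dagger`, `tag`, `TRAIN`, `LABELS`) are illustrative.

```python
import random
from collections import defaultdict


def extract_features(tokens, i, prev_tag):
    # Same two features as the transition system: current word and previous prediction.
    return {"currentWord=" + tokens[i]: 1,
            "prevPrediction=" + (prev_tag or "NULL"): 1}


class Perceptron:
    """Tiny multiclass perceptron over sparse feature dicts (illustrative only)."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.weights = {label: defaultdict(float) for label in self.labels}

    def score(self, features, label):
        w = self.weights[label]
        return sum(w.get(f, 0.0) * v for f, v in features.items())

    def predict(self, features):
        return max(self.labels, key=lambda label: self.score(features, label))

    def fit(self, examples, epochs=10):
        for _ in range(epochs):
            for features, gold in examples:
                guess = self.predict(features)
                if guess != gold:
                    for f, v in features.items():
                        self.weights[gold][f] += v
                        self.weights[guess][f] -= v


def dagger(train, labels, iterations=5, beta_decay=0.5, seed=0):
    rng = random.Random(seed)
    dataset = []   # aggregated (features, expert action) pairs across iterations
    model = None
    for it in range(iterations):
        beta = beta_decay ** it   # probability of following the expert; 1.0 at first
        for tokens, gold_tags in train:
            prev = None
            for i in range(len(tokens)):
                # Features come from the state actually visited during roll-in...
                feats = extract_features(tokens, i, prev)
                # ...but the training label is always the expert's action.
                expert_action = gold_tags[i]
                dataset.append((feats, expert_action))
                if model is None or rng.random() < beta:
                    prev = expert_action
                else:
                    prev = model.predict(feats)
        # Retrain from scratch on everything collected so far.
        model = Perceptron(labels)
        model.fit(dataset)
    return model


def tag(model, tokens):
    tags = []
    for i in range(len(tokens)):
        prev = tags[-1] if tags else None
        tags.append(model.predict(extract_features(tokens, i, prev)))
    return tags


# Hypothetical toy data, not the dataset used in the notebook.
TRAIN = [
    (["I", "can", "fly"], ["PRP", "MD", "VB"]),
    (["open", "the", "can"], ["VB", "DT", "NN"]),
    (["the", "fly", "can", "fly"], ["DT", "NN", "MD", "VB"]),
]
LABELS = {"PRP", "MD", "VB", "DT", "NN"}

model = dagger(TRAIN, LABELS)
print(tag(model, ["I", "can", "fly"]))
```

The key design point is that the `prevPrediction` feature is computed from the roll-in policy's own choices, so after the first iteration the classifier also sees states reached through its own mistakes, each labelled with what the expert would have done there.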


### Acknowledgments

Gerasimos, Sebastian