## Task

Use basic NLP to recognise food orders. A food order includes a food, its ingredients, and the quantity desired. This exercise requires food orders to be in Spanish. Given an input food order, the system must return the food (comida), ingredients (ingredientes), and quantity (cantidad). There is no need to construct an intent classifier, as the intent of every phrase is 'order food'. There is also no need to normalise the output. For example, it is not necessary to convert 'tres' to '3', nor 'pizzas' to 'pizza'.

An example input food order and the output would be:

    'Quiero 3 bocadillos de anchoas y 2 pizzas' -> 
    
    [
        {comida:'bocadillo', ingrediente:'anchoas', cantidad:'3'},
        {comida:'pizza', ingrediente:'null', cantidad:'2'}
    ]
     
The output is an array of 2 dictionaries with 3 keys each (food, ingredient, and quantity). When a quantity is not detected, the default quantity is 1.

Chapter 7 of the NLTK book, Natural Language Processing with Python by Bird, Klein, and Loper, is very instructive, particularly section 3.3 (Training Classifier-Based Chunkers).

Four different approaches are required: `RegexpParser`, a unigram tagger, a bigram tagger, and `NaiveBayesClassifier`, all from the NLTK library.

Training phrases must be in Spanish, and POS taggers are not very good for Spanish, so low precision can be expected.

An NLP chain needs to be constructed with NLTK, with the following elements:
* Segmentation of phrases
* Tokenisation
* POS tagger

The POS tags will be used by the `RegexpParser`, the unigram chunker, the bigram chunker, and the `NaiveBayesClassifier`.
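The first two links of the chain can be illustrated without NLTK. The sketch below is a dependency-free stand-in, assuming simple regular expressions are good enough for short food orders. The function names `segment` and `tokenise` are hypothetical; the notebook itself relies on `nltk.word_tokenize` for tokenisation.

```python
import re

def segment(text):
    """Split text into phrases after sentence-final punctuation (simplified)."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def tokenise(phrase):
    """Split a phrase into word tokens and single punctuation tokens."""
    return re.findall(r'\w+|[^\w\s]', phrase)

phrases = segment('Quiero una pizza. Yo quería 3 bocadillos de jamón.')
tokens = [tokenise(p) for p in phrases]
print(phrases)
print(tokens)
```

The token lists produced this way have the same shape as the input expected by a POS tagger's `tag` method, so the third link of the chain can be attached directly.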

For the `RegexpParser`, there is no need to create a training corpus. A specific grammar needs to be written and passed to NLTK's `RegexpParser` class.

For the other methods, a training corpus needs to be created in the IOB format:

    yo PRP O                        Personal Pronoun
    quería VBD O                    Verb Past Tense
    una DT B-Cantidad               Determiner
    hamburguesa NN B-Comida         Noun, singular or mass
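
A training corpus in this shape does not have to be typed by hand. As a minimal sketch, assuming a phrase has already been chunked into an NLTK tree, `tree2conlltags` turns it into (word, POS tag, chunk tag) triples. The tree below is built manually for illustration and reuses the simplified tags from the listing. Note that `tree2conlltags` emits `O` for tokens outside any chunk and a `B-` prefix for the first token of each chunk.

```python
from nltk.tree import Tree
from nltk.chunk import tree2conlltags

# Hand-built chunk tree for 'yo quería una hamburguesa' (illustrative tags)
chunked = Tree('S', [
    ('yo', 'PRP'),
    ('quería', 'VBD'),
    Tree('Cantidad', [('una', 'DT')]),
    Tree('Comida', [('hamburguesa', 'NN')]),
])

iob = tree2conlltags(chunked)  # List of (word, pos, chunk_tag) triples
for word, pos, chunk_tag in iob:
    print(word, pos, chunk_tag)
```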

### Spanish POS Tagger, Corpus Creation, Grammar Generation

* We begin by importing the libraries used throughout this notebook
* Then we import a Spanish corpus containing 188,650 syntactically annotated words
* We then split the corpus into training and test sets
* Next, we train our Spanish POS tagger using the training corpus and evaluate its accuracy on the test corpus
* Then we create and tag our own corpus for the purposes of this exercise
* Finally, we write the grammar required to chunk our corpus into the desired chunks

In [87]:
# Import libraries

import nltk
from nltk.corpus import cess_esp # We will use this corpus to train our Spanish POS tagger
from nltk.tag.hmm import HiddenMarkovModelTagger # This will be our Spanish POS tagger

In [88]:
cess_sents = cess_esp.tagged_sents() # Get tagged sentences from corpus


# Split tagged sentences into 90% train, 10% test:
# every 10th sentence (i % 10 == 0) goes to test, all others go to training

training = [] 
test = []

for i in range(len(cess_sents)):
    if i % 10:
        training.append(cess_sents[i])
    else:
        test.append(cess_sents[i])

In [89]:
# Train spanish pos tagger

hmm_tagger = HiddenMarkovModelTagger.train(training)

In [90]:
# Print the accuracy of our spanish pos tagger

print('Tagger has an accuracy of' , round(hmm_tagger.evaluate(test)*100, 2), '%')

Tagger has an accuracy of 89.89 %
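
The accuracy reported above is a token-level score: the share of test tokens whose predicted tag equals the gold tag. The sketch below reproduces that calculation in plain Python on a hand-made example. The function name `tagging_accuracy` and the tiny data set are hypothetical and are not taken from the corpus. Newer NLTK releases appear to expose the same figure through an `accuracy` method, with `evaluate` kept as a deprecated alias, although this has not been checked against the version used here.

```python
def tagging_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for pred_sent, gold_sent in zip(predicted, gold):
        for (_, pred_tag), (_, gold_tag) in zip(pred_sent, gold_sent):
            correct += pred_tag == gold_tag
            total += 1
    return correct / total if total else 0.0

# Hand-made example: one of three tokens carries the wrong tag
gold = [[('Quiero', 'vmip1s0'), ('una', 'di0fs0'), ('pizza', 'ncfs000')]]
predicted = [[('Quiero', 'sps00'), ('una', 'di0fs0'), ('pizza', 'ncfs000')]]

score = round(tagging_accuracy(predicted, gold) * 100, 2)
print('Accuracy:', score, '%')  # Accuracy: 66.67 %
```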


In [91]:
# Generate a corpus of food orders
sents = ['Yo quiero 3 hamburguesas', 'Yo quería 3 bocadillos de jamón', 'Yo quiero una pizza',
         'Queremos un bocadillo de pavo', 'Yo quiero 4 croquetas de jamón', 'Quiero un brownie',
         'Quiero cuatro galletas', 'Quiero una tarta de queso', 'Queiro 3 tacos al pastor', 'Quiero una Coca Cola',
         'Quiero 6 alitas de pollo']

sents_tagged = [] # Empty list to store tagged phrases

# Iterate through each phrase to obtain tags
for sent in sents:
    tokens = nltk.word_tokenize(sent) # Tokenize phrase
    tokens_tagged = hmm_tagger.tag(tokens) # Tag tokens
    sents_tagged.append(tokens_tagged) # Append tokens to sents_tagged list

In [92]:
# Generate grammar to chunk 'Cantidad', 'Ingrediente', and 'Comida' for each order 
grammar = r"""
        Cantidad: {<Z|di.*|c.*|dn.*>}
        Ingrediente: {<sp.*><da.*|ncm.*>}
        Comida: {<nc.*|sn.e-SUJ>*<aq.*>?}
"""

* `Cantidad` is detected when a token's tag is `Z` or begins with `di`, `c`, or `dn`.
* `Ingrediente` is detected when a tag beginning with `sp` is followed by a tag beginning with `da` or `ncm`.
    * The `sp` tag refers to 'de', as in 'tarta de chocolate'
* `Comida` is detected as any number of consecutive tokens whose tags begin with `nc` (or equal `sn.e-SUJ`), optionally followed by a token whose tag begins with `aq`.
    * The `aq` tag is optional when detecting 'Comida' and is not required to generate a 'Comida' chunk
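
To see which tags each alternative accepts, the prefixes can be tried against individual tags. This is only an approximation: it uses Python's `re` module rather than NLTK's own tag-pattern matching, and the tag `cc` is included as a hypothetical example of another tag starting with `c`.

```python
import re

# Tag alternatives copied from the grammar above
CANTIDAD = re.compile(r'Z|di.*|c.*|dn.*')
COMIDA_NOUN = re.compile(r'nc.*|sn.e-SUJ')

def matches(pattern, tag):
    """True when the whole tag is accepted by the pattern."""
    return pattern.fullmatch(tag) is not None

results = {}
for tag in ['Z', 'di0fs0', 'dn0cp0', 'cs', 'cc', 'ncms000', 'sn.e-SUJ', 'vmip1s0']:
    results[tag] = (matches(CANTIDAD, tag), matches(COMIDA_NOUN, tag))
    print(tag, results[tag])
```

The broad `c.*` alternative accepts every tag that starts with `c`, so any such tag would be chunked as `Cantidad` whether or not the token is a number.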

### Regex Parser

* First we create our parser using NLTK's `RegexpParser` and the grammar written above
* Then we parse our tagged corpus using our regex parser
* Then we go through each parsed phrase's tree to extract the desired information, storing each order in a dictionary and all order dictionaries in a list
* Finally, we print each order to the screen

In [93]:
parser = nltk.RegexpParser(grammar) # Create chunk parser

sents_parsed = [] # Empty list to store parsed phrases

# Parse phrases and store in sents_parsed list

for sent in sents_tagged:
    results = parser.parse(sent)
    sents_parsed.append(results)

print(sents_parsed) # Print the parsed trees

[Tree('S', [('Yo', 'pp1csn00'), ('quiero', 'vmip1s0'), Tree('Cantidad', [('3', 'cs')]), Tree('Comida', [('hamburguesas', 'sn.e-SUJ')])]), Tree('S', [('Yo', 'pp1csn00'), ('quería', 'vmii3s0'), Tree('Cantidad', [('3', 'Z')]), Tree('Comida', [('bocadillos', 'ncmp000')]), Tree('Ingrediente', [('de', 'sps00'), ('jamón', 'ncms000')])]), Tree('S', [('Yo', 'pp1csn00'), ('quiero', 'vmip1s0'), Tree('Cantidad', [('una', 'di0fs0')]), Tree('Comida', [('pizza', 'ncfs000')])]), Tree('S', [('Queremos', 'sps00'), Tree('Cantidad', [('un', 'di0ms0')]), Tree('Comida', [('bocadillo', 'ncms000')]), Tree('Ingrediente', [('de', 'sps00'), ('pavo', 'da0fs0')])]), Tree('S', [('Yo', 'pp1csn00'), ('quiero', 'vmip1s0'), Tree('Cantidad', [('4', 'di0fp0')]), Tree('Comida', [('croquetas', 'ncfp000')]), Tree('Ingrediente', [('de', 'sps00'), ('jamón', 'ncms000')])]), Tree('S', [('Quiero', 'sps00'), Tree('Cantidad', [('un', 'di0ms0')]), Tree('Comida', [('brownie', 'ncms000')])]), Tree('S', [('Quiero', 'da0mp0'), Tree('Ca

In [94]:
orders = [] # Empty list to store orders

# Iterate over parsed phrases

for sent in sents_parsed:
    order = {} # Empty dictionary to store an order

    for n in sent: # Access each child node of the tree
        if isinstance(n, nltk.tree.Tree): # Chunks are Tree objects, while unchunked tokens are plain (word, tag) tuples with no label(), so only chunks pass this check
            if n.label() == 'Cantidad': # Check if a chunk is labelled 'Cantidad'
                if n.leaves()[0][0] in ['una', 'un']: # Convert 'una' and 'un' to '1'
                    order['Cantidad'] = '1' # Add a key value pair to the order dictionary
                else:
                    order['Cantidad'] = n.leaves()[0][0] # Add the quantity detected in the chunk to the 'Cantidad' key of the order dictionary
            elif n.label() == 'Comida': # Check if a chunk is labelled 'Comida'
                comida = '' # Empty string for formatting the output
                # Format output of 'Comida' to be more easily read if chunk has more than one element
                for i in range(len(n.leaves())):
                    comida += n.leaves()[i][0] + ' '
                order['Comida'] = comida.strip()
            elif n.label() == 'Ingrediente': # Check if a chunk is labelled 'Ingrediente'
                order['Ingrediente'] = n.leaves()[1][0] # Add ingredient (second leaf, after 'de') to order dictionary
                
    orders.append(order) # append order dictionary to orders list

In [95]:
# Print each order
for i in range(len(orders)):
    print('Order', i+1, ':', orders[i])

Order 1 : {'Cantidad': '3', 'Comida': 'hamburguesas'}
Order 2 : {'Cantidad': '3', 'Comida': 'bocadillos', 'Ingrediente': 'jamón'}
Order 3 : {'Cantidad': '1', 'Comida': 'pizza'}
Order 4 : {'Cantidad': '1', 'Comida': 'bocadillo', 'Ingrediente': 'pavo'}
Order 5 : {'Cantidad': '4', 'Comida': 'croquetas', 'Ingrediente': 'jamón'}
Order 6 : {'Cantidad': '1', 'Comida': 'brownie'}
Order 7 : {'Cantidad': 'cuatro', 'Comida': 'galletas'}
Order 8 : {'Cantidad': '1', 'Comida': 'tarta', 'Ingrediente': 'queso'}
Order 9 : {'Cantidad': '3', 'Comida': 'tacos', 'Ingrediente': 'pastor'}
Order 10 : {'Cantidad': '1', 'Comida': 'Coca Cola'}
Order 11 : {'Cantidad': '6', 'Comida': 'alitas', 'Ingrediente': 'pollo'}


#### In the cell above we can see the final output using the `RegexpParser`. For the given format of ordering a food item, we can accurately obtain what food is being ordered and in which quantity along with the ingredients.
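
Every phrase in the corpus contains a single item, whereas the example in the task statement contains two ('3 bocadillos de anchoas y 2 pizzas'). The sketch below shows one way the extraction could be extended to several items per phrase, including the default quantity of 1. It works on IOB triples rather than trees; the function name `orders_from_iob`, the lowercase keys (borrowed from the task example), and the hand-labelled input are all assumptions made for illustration.

```python
def orders_from_iob(triples):
    """Group (word, pos, chunk_tag) triples into one dictionary per food item."""
    orders = []
    current = None       # Order currently being filled
    pending_qty = None   # Quantity seen before its food item
    for word, pos, chunk_tag in triples:
        if chunk_tag == 'O':
            continue
        prefix, label = chunk_tag.split('-', 1)
        if label == 'Cantidad':
            pending_qty = word
        elif label == 'Comida':
            if prefix == 'B' or current is None:
                # A new food starts a new order; quantity defaults to '1'
                current = {'comida': word, 'ingrediente': None,
                           'cantidad': pending_qty or '1'}
                orders.append(current)
                pending_qty = None
            else:
                current['comida'] += ' ' + word  # Multi-token food
        elif label == 'Ingrediente' and prefix == 'I' and current is not None:
            current['ingrediente'] = word  # Token after the preposition
    return orders

# Hand-labelled triples for the example in the task statement
example = [('Quiero', 'vmip1s0', 'O'), ('3', 'Z', 'B-Cantidad'),
           ('bocadillos', 'ncmp000', 'B-Comida'), ('de', 'sps00', 'B-Ingrediente'),
           ('anchoas', 'ncfp000', 'I-Ingrediente'), ('y', 'cc', 'O'),
           ('2', 'Z', 'B-Cantidad'), ('pizzas', 'ncfp000', 'B-Comida')]
result = orders_from_iob(example)
print(result)
```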

#### Issues encountered and my solutions
* Writing a grammar to chunk the desired tokens. The complexity of the Spanish POS tags meant matching on tag prefixes rather than entire tags. For example, the tag for 'bocadillos' is 'ncmp000', whereas the tag for 'pizza' is 'ncfs000'. These two tags share their first two characters, so I decided to chunk on those characters.
* To chunk ingredients, it made sense to write a rule that finds 'de' followed by another noun. This meant placing 'Ingrediente' before 'Comida' in the grammar. If 'Comida' were written first, the actual ingredients would be chunked as food.
* Extracting the tokens for quantity, food, and ingredient. I initially had some trouble accessing the tree for each parsed phrase. Checking each node with `isinstance` against the `nltk.tree.Tree` type solved this initial issue. Next, I had to find the token in each node. For the quantity chunk, this was relatively simple: convert 'una' and 'un' to '1', and otherwise use the token as the value of the 'Cantidad' key of the order dictionary. Food was slightly more complicated because of the Coca Cola order. In this case, the 'Comida' chunk had 2 elements, both of which I wanted to add to the dictionary under the 'Comida' key. To solve this, I created an empty string, appended each token followed by a space, and finally added the resulting string to the dictionary after stripping the trailing whitespace. Finally, the 'Ingrediente' chunk also had 2 elements, 'de' and the ingredient itself. Fortunately I only wanted the second element, so accessing it by index is straightforward.
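
The effect of rule order described in the second point can be reproduced on a single phrase. The sketch below uses the POS tags that appear in the parsed output for 'Yo quería 3 bocadillos de jamón' and compares the grammar as written with a variant in which `Comida` comes before `Ingrediente`. The helper `chunk_labels` is a hypothetical name introduced here.

```python
import nltk
from nltk.tree import Tree

# POS tags copied from the parsed output for this phrase
tagged = [('Yo', 'pp1csn00'), ('quería', 'vmii3s0'), ('3', 'Z'),
          ('bocadillos', 'ncmp000'), ('de', 'sps00'), ('jamón', 'ncms000')]

ingrediente_first = r"""
    Cantidad: {<Z|di.*|c.*|dn.*>}
    Ingrediente: {<sp.*><da.*|ncm.*>}
    Comida: {<nc.*|sn.e-SUJ>*<aq.*>?}
"""
comida_first = r"""
    Cantidad: {<Z|di.*|c.*|dn.*>}
    Comida: {<nc.*|sn.e-SUJ>*<aq.*>?}
    Ingrediente: {<sp.*><da.*|ncm.*>}
"""

def chunk_labels(grammar):
    """Return (label, words) for every chunk found with the given grammar."""
    tree = nltk.RegexpParser(grammar).parse(tagged)
    return [(sub.label(), [word for word, _ in sub.leaves()])
            for sub in tree if isinstance(sub, Tree)]

as_written = chunk_labels(ingrediente_first)
swapped = chunk_labels(comida_first)
print(as_written)
print(swapped)
```

With `Comida` first, 'jamón' is consumed by the `Comida` rule before the `Ingrediente` rule runs, so no `Ingrediente` chunk is produced.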

### Unigram Chunker and Parser
* First we will create our training phrases from the corpus created earlier. This means adding the IOB tags to the tagged phrases
* Next we will create a test phrase and tag it with the Spanish POS tagger made earlier
* Then we will create the `UnigramChunker` class object adapted from Chapter 7 of the Natural Language Processing with Python book by Bird, Klein, and Loper
* We will then train the `UnigramChunker` with our training corpus and parse our test phrase
* We will then print the order to the screen as above

In [96]:
# Create a training corpus in the IOB format

train_sents = [] # Empty list to store training phrases

for sent in sents_parsed: # Iterate through parsed corpus
    IOB = nltk.chunk.tree2conlltags(sent) # Apply IOB tags to each phrase
    train_sents.append(IOB) # Append phrases in IOB format to train_sents list
    
# print(train_sents) # Uncomment to print

In [97]:
test_sent = 'Yo quiero 4 croquetas de jamón' # Create test phrase/order
test_tokens = nltk.word_tokenize(test_sent) # Tokenize
test_tagged = hmm_tagger.tag(test_tokens) # Tag tokens with Spanish POS tagger

# print(test_tagged) # Uncomment to print

In [98]:
# Create Unigram Chunker

class UnigramChunker(nltk.ChunkParserI):
    def __init__(self, train_sents):
        # Train UnigramTagger with training corpus
        train_data = [[(t,c) for w,t,c in sent] for sent in train_sents] # Extract tags and chunk tags for each phrase/order

        self.tagger = nltk.UnigramTagger(train_data) # Train tagger and create tagger object for parsing new phrases
        
    def parse(self, sentence):
        # Parse POS tagged phrase/order
        pos_tags = [pos for word,pos in sentence] # Extract POS tags from the sentence passed in (not the global test_tagged)
        tagged_pos_tags = self.tagger.tag(pos_tags) # Apply IOB tag using trained tagger
        chunktags = [chunktag for tag,chunktag in tagged_pos_tags] # Extract chunk tag 
        conlltags = [(word,pos,chunktag) for ((word,pos),chunktag) in zip(sentence, chunktags)] # Put token, POS tag and chunk tag together in IOB format 
        return nltk.conlltags2tree(conlltags) # Return IOB format tree for each phrase/order

In [99]:
unigram_chunker = UnigramChunker(train_sents) # Create and train chunker object

test_tree = unigram_chunker.parse(test_tagged) # Parse POS tagged test phrase/order

print(test_tree) # Print the chunk tree

(S
  Yo/pp1csn00
  quiero/vmip1s0
  (Cantidad 4/di0fp0)
  (Comida croquetas/ncfp000)
  (Ingrediente de/sps00 jamón/ncms000))


In [100]:
order = {} # Empty dictionary to store order

for n in test_tree: # Iterate through nodes of tree
    if isinstance(n, nltk.tree.Tree): # As above, only chunks are Tree objects; unchunked tokens are plain tuples
        if n.label() == 'Cantidad': # Check if a chunk is labelled 'Cantidad'
            if n.leaves()[0][0] in ['una', 'un']: # Convert 'una' and 'un' to '1'
                order['Cantidad'] = '1' # Add a key value pair to the order dictionary
            else:
                order['Cantidad'] = n.leaves()[0][0] # Add the quantity detected in the chunk to the 'Cantidad' key of the order dictionary
        elif n.label() == 'Comida': # Check if a chunk is labelled 'Comida'
            order['Comida'] = n.leaves()[0][0] # Add food to 'Comida' key of order dictionary
        elif n.label() == 'Ingrediente': # Check if a chunk is labelled 'Ingrediente'
            order['Ingrediente'] = n.leaves()[1][0] # Add ingredient (second leaf, after 'de') to order dictionary
                

print(order)

{'Cantidad': '4', 'Comida': 'croquetas', 'Ingrediente': 'jamón'}


#### The final output using a `UnigramChunker` and `UnigramTagger` is as expected, accurately detecting the desired chunks.

#### Issues encountered and my solutions
* The only real issue encountered in this section of the exercise was adapting the `UnigramChunker` from the Natural Language Processing with Python book. Initially, copying the code from the book did not work with the format of the training corpus, because the training corpus was already in IOB format. Removing `nltk.chunk.tree2conlltags` from the list comprehension that generates the training data within the `UnigramChunker` class solved this issue.
* Aside from this, there were no other issues of note
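
Conceptually, a unigram tagger trained on (POS tag, chunk tag) pairs learns a lookup table: for each POS tag, the chunk tag it carried most often in training. The plain-Python sketch below illustrates that idea on a hand-made training phrase. It is a simplification rather than NLTK's implementation, and `train_unigram` and `tag_chunks` are hypothetical names.

```python
from collections import Counter, defaultdict

def train_unigram(train_sents):
    """Map each POS tag to the chunk tag it most often carries in training."""
    counts = defaultdict(Counter)
    for sent in train_sents:
        for _word, pos, chunk_tag in sent:
            counts[pos][chunk_tag] += 1
    return {pos: tags.most_common(1)[0][0] for pos, tags in counts.items()}

def tag_chunks(model, tagged_sent):
    """Attach a chunk tag to every (word, pos) pair; unseen POS tags get None."""
    return [(word, pos, model.get(pos)) for word, pos in tagged_sent]

# Hand-made training phrase in IOB format
toy_train = [[('Yo', 'pp1csn00', 'O'), ('quiero', 'vmip1s0', 'O'),
              ('una', 'di0fs0', 'B-Cantidad'), ('pizza', 'ncfs000', 'B-Comida')]]
model = train_unigram(toy_train)

tagged_new = tag_chunks(model, [('quiero', 'vmip1s0'), ('una', 'di0fs0'),
                                ('tarta', 'ncfs000'), ('grande', 'aq0cs0')])
print(tagged_new)
```

A POS tag that never occurred in training receives `None` here. NLTK's `UnigramTagger` behaves similarly when no backoff tagger is supplied, which is worth keeping in mind with a training corpus of only eleven phrases.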

### Bigram Chunker and Parser
* Next we create a bigram chunker and parser in the same way we created the unigram chunker and parser
* We already have the training corpus and test phrase, so all that is needed is to adapt the `UnigramChunker` to use the `BigramTagger` instead of the `UnigramTagger`

In [101]:
# Create Bigram Chunker

class BigramChunker(nltk.ChunkParserI):
    def __init__(self, train_sents):
        # Train BigramTagger with training corpus
        train_data = [[(t,c) for w,t,c in sent] for sent in train_sents] # Extract tags and chunk tags for each phrase/order

        self.tagger = nltk.BigramTagger(train_data) # Train tagger and create tagger object for parsing new phrases
        
    def parse(self, sentence):
        # Parse POS tagged phrase/order
        pos_tags = [pos for word,pos in sentence] # Extract POS tags from the sentence passed in (not the global test_tagged)
        tagged_pos_tags = self.tagger.tag(pos_tags) # Apply IOB tag using trained tagger
        chunktags = [chunktag for tag,chunktag in tagged_pos_tags] # Extract chunk tag 
        conlltags = [(word,pos,chunktag) for ((word,pos),chunktag) in zip(sentence, chunktags)] # Put token, POS tag and chunk tag together in IOB format 
        return nltk.conlltags2tree(conlltags) # Return IOB format tree for each phrase/order

In [102]:
bigram_chunker = BigramChunker(train_sents) # Create and train chunker object

test_tree = bigram_chunker.parse(test_tagged) # Parse POS tagged test phrase/order

print(test_tree) # Print the chunk tree

(S
  Yo/pp1csn00
  quiero/vmip1s0
  (Cantidad 4/di0fp0)
  (Comida croquetas/ncfp000)
  (Ingrediente de/sps00 jamón/ncms000))


In [103]:
order = {} # Empty dictionary to store order

for n in test_tree: # Iterate through nodes of tree
    if isinstance(n, nltk.tree.Tree): # As above, only chunks are Tree objects; unchunked tokens are plain tuples
        if n.label() == 'Cantidad': # Check if a chunk is labelled 'Cantidad'
            if n.leaves()[0][0] in ['una', 'un']: # Convert 'una' and 'un' to '1'
                order['Cantidad'] = '1' # Add a key value pair to the order dictionary
            else:
                order['Cantidad'] = n.leaves()[0][0] # Add the quantity detected in the chunk to the 'Cantidad' key of the order dictionary
        elif n.label() == 'Comida': # Check if a chunk is labelled 'Comida'
            order['Comida'] = n.leaves()[0][0] # Add food to 'Comida' key of order dictionary
        elif n.label() == 'Ingrediente': # Check if a chunk is labelled 'Ingrediente'
            order['Ingrediente'] = n.leaves()[1][0] # Add ingredient (second leaf, after 'de') to order dictionary
                

print(order)

{'Cantidad': '4', 'Comida': 'croquetas', 'Ingrediente': 'jamón'}


#### The output is identical to that of the `UnigramChunker`. This is completely expected, as the `BigramTagger` adds little extra context given the simple nature of the training corpus and test phrase.

#### There were no issues encountered adapting the `UnigramChunker` to a `BigramChunker`
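
One caveat applies to phrases whose opening differs from the training data. A bigram tagger keys each decision on the previous chunk tag together with the current POS tag, so an unseen combination yields `None`, and that `None` then becomes the context for the next token. The sketch below uses a hand-made one-phrase training set (an assumption, not the notebook's corpus) to show this, along with the `backoff` argument of `nltk.BigramTagger` as a possible remedy.

```python
import nltk

# Hand-made (POS tag, chunk tag) training phrase: 'Yo quiero 4 croquetas'
train_data = [[('pp1csn00', 'O'), ('vmip1s0', 'O'),
               ('di0fp0', 'B-Cantidad'), ('ncfp000', 'B-Comida')]]

bigram_only = nltk.BigramTagger(train_data)
with_backoff = nltk.BigramTagger(train_data, backoff=nltk.UnigramTagger(train_data))

# POS tags for a phrase without the leading pronoun: 'quiero 4 croquetas'
pos_tags = ['vmip1s0', 'di0fp0', 'ncfp000']

only_tags = [tag for _, tag in bigram_only.tag(pos_tags)]
backoff_tags = [tag for _, tag in with_backoff.tag(pos_tags)]
print(only_tags)
print(backoff_tags)
```

Because `conlltags2tree` expects a string chunk tag for every token, a backoff tagger is one way to avoid passing `None` values to it.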

### Naive-Bayes Classifier for Chunking and Parsing

* First we create a function, adapted from Chapter 7 of Natural Language Processing with Python, to extract the features of our training phrases. In this case we want to extract the POS tags and the IOB chunk tags
* Next we create a `NaiveBayesChunkTagger` class, also adapted from Chapter 7, that uses the training corpus to train a `NaiveBayesClassifier`
* Then we create a `NaiveBayesChunker` class, again adapted from Chapter 7, that takes the `NaiveBayesChunkTagger` and is capable of parsing a new phrase to apply IOB chunk tags

In [104]:
def chunk_features(sentence, i, history):
     pos, chunktag = sentence[i]
     if i == 0:
         prevpos, prevchunktag = "<START>", "<START>"
     else:
         prevpos, prevchunktag = sentence[i-1]
     if i == len(sentence)-1:
         nextpos, nextchunktag = "<END>", "<END>"
     else:
         nextpos, nextchunktag = sentence[i+1]
     return {"chunktag": chunktag,
             "pos": pos,
             "prevchunktag": prevchunktag,
             "nextchunktag": nextchunktag,
             "prevchunktag+chunktag": "%s+%s" % (prevchunktag, chunktag),
             "chunktag+nextchunktag": "%s+%s" % (chunktag, nextchunktag)}

In [105]:
class NaiveBayesChunkTagger(nltk.TaggerI):
    
    def __init__(self, train_sents):
        train_set = []
        for sent in train_sents:
            tags_chunks = []
            for w,t,c in sent: 
                tag_chunk = (t,c)
                tags_chunks.append(tag_chunk) # Create tags_chunks

            history = []
            for i, (t,c) in enumerate(tags_chunks): # Enumerate to access index of tags_chunks
                    featureset = chunk_features(tags_chunks, i, history) # Extract features using chunk_features defined above
                    train_set.append((featureset, c)) # Append features and chunk tag to train_set list
                    history.append(c) # Append chunk tag to history

        self.classifier = nltk.NaiveBayesClassifier.train(train_set) # Train classifier using train_set
        
    def tag(self, sentence):
        history = []
        for i, word in enumerate(sentence): # Access index of input sentence
            featureset = chunk_features(sentence, i, history) # Extract features (i.e. POS tags)
            tag = self.classifier.classify(featureset) # Tag POS tags with chunk tag
            history.append(tag) # Append chunk tag to history
        return zip(sentence, history) # Zip input sentence and history for IOB format
    
class NaiveBayesChunker(nltk.ChunkParserI):
    def __init__(self, train_sents):

        tagged_sents = [[(w,t,c) for w,t,c in sent] for sent in train_sents]

        self.tagger = NaiveBayesChunkTagger(tagged_sents) # Pass training corpus to tagger

    def parse(self, sentence):
        tagged_sents = self.tagger.tag(sentence) # Tags a new phrase with chunk tags
        conlltags = [(w,t,c) for ((w,t),c) in tagged_sents] # Recapitulate sentence with chunk tags for tree creation
        return nltk.conlltags2tree(conlltags) # Return tree

In [106]:
NB_chunker = NaiveBayesChunker(train_sents) # Define chunker

In [107]:
chunked = NB_chunker.parse(test_tagged) # Parse new phrase
print(chunked) # Print tree

(S
  (Comida
    Yo/pp1csn00
    quiero/vmip1s0
    4/di0fp0
    croquetas/ncfp000
    de/sps00
    jamón/ncms000))


As we can see, the `NaiveBayesChunker` is not capable of accurately applying IOB chunk tags to the test phrase/order. This appears to be because the `NaiveBayesChunkTagger` tags every POS tag as 'inside of chunk' ('I'). Therefore, when converting back to a tree, the entire phrase becomes part of a single 'Comida' chunk. Presumably this is due to the small training corpus, and a much larger training corpus (over 100 phrases) might solve this issue. Having said that, I don't understand why all POS tags have been tagged with the 'I-Comida' chunk tag, as this chunk tag appears very infrequently, even in this small training corpus.

Below we can see that each element (POS tag) is assigned the 'I-Comida' IOB chunk tag for the entire training corpus.

In [108]:
# Modify NaiveBayesChunker to return the chunk tags rather than tree
class NaiveBayesChunker_mod(nltk.ChunkParserI):
    def __init__(self, train_sents):

        tagged_sents = [[(w,t,c) for w,t,c in sent] for sent in train_sents]

        self.tagger = NaiveBayesChunkTagger(tagged_sents) # Pass training corpus to tagger

    def parse(self, sentence):
        tagged_sents = self.tagger.tag(sentence) # Tags a new phrase with chunk tags
        conlltags = [(w,t,c) for ((w,t),c) in tagged_sents] # Recapitulate sentence with chunk tags for tree creation
#         return nltk.conlltags2tree(conlltags) # Commented out to show chunk tags
        return conlltags # Allows to view chunk tags rather than tree

In [109]:
NB_chunker_mod = NaiveBayesChunker_mod(train_sents) # Train modified chunker

# Iterate through training corpus
parsed_sents = []
for sent in sents_tagged:
    chunked = NB_chunker_mod.parse(sent) # Parse phrase with modified chunker
    parsed_sents.append(chunked)
    
for parsed_sent in parsed_sents:
    print(parsed_sent)

[('Yo', 'pp1csn00', 'I-Comida'), ('quiero', 'vmip1s0', 'I-Comida'), ('3', 'cs', 'I-Comida'), ('hamburguesas', 'sn.e-SUJ', 'I-Comida')]
[('Yo', 'pp1csn00', 'I-Comida'), ('quería', 'vmii3s0', 'I-Comida'), ('3', 'Z', 'I-Comida'), ('bocadillos', 'ncmp000', 'I-Comida'), ('de', 'sps00', 'I-Comida'), ('jamón', 'ncms000', 'I-Comida')]
[('Yo', 'pp1csn00', 'I-Comida'), ('quiero', 'vmip1s0', 'I-Comida'), ('una', 'di0fs0', 'I-Comida'), ('pizza', 'ncfs000', 'I-Comida')]
[('Queremos', 'sps00', 'I-Comida'), ('un', 'di0ms0', 'I-Comida'), ('bocadillo', 'ncms000', 'I-Comida'), ('de', 'sps00', 'I-Comida'), ('pavo', 'da0fs0', 'I-Comida')]
[('Yo', 'pp1csn00', 'I-Comida'), ('quiero', 'vmip1s0', 'I-Comida'), ('4', 'di0fp0', 'I-Comida'), ('croquetas', 'ncfp000', 'I-Comida'), ('de', 'sps00', 'I-Comida'), ('jamón', 'ncms000', 'I-Comida')]
[('Quiero', 'sps00', 'I-Comida'), ('un', 'di0ms0', 'I-Comida'), ('brownie', 'ncms000', 'I-Comida')]
[('Quiero', 'da0mp0', 'I-Comida'), ('cuatro', 'dn0cp0', 'I-Comida'), ('gall

#### Issues encountered and my solutions
* Issue 1 was having to adapt the `ConsecutiveNPChunkTagger` from Chapter 7 of Natural Language Processing with Python. After breaking it down and understanding what each part was doing, it became clear that the 'untagged_sent' variable was unnecessary in this exercise. Also, in this exercise we were attempting to tag POS tags with chunk tags, so the classifier had to be trained not on (word, pos) pairs but on (pos, chunktag) pairs. This was easy enough to fix.
* Additionally, the `chunk_features` function had to be adapted slightly, although this only amounted to removing `tags_since_dt` from the `npchunk_features` function from Chapter 7.
* The main issue, however, is the one described above: all POS tags appear to be tagged with the 'I-Comida' chunk tag. As noted above, perhaps a much larger training corpus would solve this issue. If so, extracting the desired tokens from the chunked phrases would be as simple as in the previous sections.
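
Besides corpus size, one possible contributing factor is a mismatch between the features seen in training and at prediction time. This is a hypothesis and has not been checked against the original run. During training, `chunk_features` receives (pos, chunktag) pairs, so the gold chunk tag itself becomes a feature. In `parse`, the same function receives (word, pos) pairs, so the `pos` feature holds a word and the `chunktag` feature holds a POS tag. Almost every feature value would then be unseen, and with the default smoothing of `NaiveBayesClassifier.train` such values may favour the label with the fewest training examples, which here would be 'I-Comida'. The sketch below shows the mismatch on a reduced copy of the extractor; `pos_history_features` is a hypothetical alternative, not code from the book.

```python
def chunk_features(sentence, i, history):
    # Reduced copy of the notebook's extractor (two of its features)
    pos, chunktag = sentence[i]
    return {"pos": pos, "chunktag": chunktag}

# Training: NaiveBayesChunkTagger.__init__ passes (pos, chunktag) pairs
train_view = [('pp1csn00', 'O'), ('vmip1s0', 'O'), ('di0fp0', 'B-Cantidad')]
# Prediction: NaiveBayesChunker.parse passes (word, pos) pairs
infer_view = [('Yo', 'pp1csn00'), ('quiero', 'vmip1s0'), ('4', 'di0fp0')]

train_feats = chunk_features(train_view, 2, [])
infer_feats = chunk_features(infer_view, 2, [])
print(train_feats)  # {'pos': 'di0fp0', 'chunktag': 'B-Cantidad'}
print(infer_feats)  # {'pos': '4', 'chunktag': 'di0fp0'}

def pos_history_features(pos_tags, i, history):
    """Hypothetical alternative: POS context plus previously predicted chunk tags."""
    return {
        "pos": pos_tags[i],
        "prevpos": pos_tags[i - 1] if i > 0 else "<START>",
        "nextpos": pos_tags[i + 1] if i < len(pos_tags) - 1 else "<END>",
        "prevchunktag": history[i - 1] if i > 0 else "<START>",
    }

alt_feats = pos_history_features(['pp1csn00', 'vmip1s0', 'di0fp0'], 2, ['O', 'O'])
print(alt_feats)
```

An extractor of this shape takes the same kind of input (a list of POS tags plus the chunk tags predicted so far) in both training and prediction, and never reads the label it is supposed to predict.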