# LELA70331 Coursework Assignment

This document contains instructions, guidance and code for the coursework assignment for this module. 

The assignment focuses on the task of intent classification. You heard about this in the lecture on dialogue systems, and read about it in this week's reading. It is an important step in modern task-based dialogue systems - given a particular piece of input from the speaker, the system tries to determine what goal the speaker is trying to achieve, in order that it can then produce an appropriate response. 

Your task is to build a system that takes a transcribed user utterance as input and outputs one of seven different intents:

'PlayMusic', e.g. "play easy listening" <br>
'AddToPlaylist' e.g. "please add this song to road trip" <br>
'RateBook' e.g. "give this novel 5 stars"  <br>
'SearchScreeningEvent' e.g. "give me a list of local movie times"  <br>
'BookRestaurant' e.g. "i'd like a table for four at 7pm at Asti"   <br>
'GetWeather' e.g. "what's it like outside"  <br>
'SearchCreativeWork' e.g. "show me the new James Bond trailer"  <br>

You are going to evaluate the performance of this system using a test set of 700 user utterances.

In order to create the system you have a training set of 700 utterances and a validation/development set of 700 utterances.

This notebook contains code that you can use in the development of your system. Once you have created and evaluated your system, you are going to write a report of no more than 1500 words that describes and evaluates the task, the system and the experiments.

A guide as to what should go in your report can be found [here](https://www.dropbox.com/s/7u1yazkok6hmeth/Writing%20your%20Computational%20Linguistics%20Research%20Report%20-%20postgrad.docx?dl=0).

Your report should be submitted via turnitin using the link on Blackboard. Once you have completed your work in this notebook please download a copy including all of your changes and additions (select File -> Download -> Download .ipynb) and email it to me at colin.bannard@manchester.ac.uk. This is just so that I can check what you have done if it isn't clear from your report. You will not be marked on the quality of any code you might include.

Please feel free to ask any questions about any part of the coursework. So that my response can benefit everyone please do so using the Coursework Discussion board on Blackboard. You can find a link to this down the left of the module Blackboard page.

## Preparation

### Link drive

Before you begin to build a system, there are a few steps necessary to set things up. The first of these is to link Colab to your Google Drive so that you can save files there. 

In [None]:
from google.colab import drive
drive.mount("/content/gdrive")

### Download data and some utilities

The next step is to download the data and some supporting code for the project and move it over to the Google Drive.

In [None]:
!wget https://www.dropbox.com/s/ztq3lolnvs57qop/Intent_Classification.zip
!unzip -q Intent_Classification.zip
!cp -r Intent_Classification /content/gdrive/My\ Drive/
!cp /content/gdrive/My\ Drive/Intent_Classification/nn_tools.py .
!cp /content/gdrive/My\ Drive/Intent_Classification/nn_tools2.py .

### Import packages

Finally, we need to import some packages to use later.

In [None]:
from argparse import Namespace
from collections import Counter
import json
import os
import re
import string
import random

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm_notebook
from nn_tools import Vocabulary, IntentVectorizer, IntentDataset, IntentClassifier
from nn_tools2 import *
from sklearn.metrics import confusion_matrix
import nltk
nltk.download('punkt')
nltk.download('wordnet')



## Rule-based approach

Your task here is to create a classifier using rules, in the form of regular expressions. I have provided you with the basic code for doing this. You will just need to edit the regular expressions in order to improve performance.

### Loading and inspecting project data
A valuable first step is to inspect the data in order to understand the different intents being detected.

First you need to load the data:


In [None]:
intent_data=pd.read_csv('/content/gdrive/My Drive/Intent_Classification/intent_classification_with_splits.csv')

You can then examine example utterances for each of the intent types as follows.

##### Play Music examples


In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "PlayMusic"]["text"].head(10).tolist()

##### AddToPlaylist examples


In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "AddToPlaylist"]["text"].head(10).tolist()

##### RateBook examples

In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "RateBook"]["text"].head(10).tolist()

##### SearchScreeningEvent examples

In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "SearchScreeningEvent"]["text"].head(10).tolist()

##### BookRestaurant examples

In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "BookRestaurant"]["text"].head(10).tolist()

##### GetWeather examples

In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "GetWeather"]["text"].head(10).tolist()

##### SearchCreativeWork examples

In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "SearchCreativeWork"]["text"].head(10).tolist()

## Building a rule-based classifier

### Define patterns

The function below takes an utterance as input and applies a series of regular expressions in order to identify the intent of the speaker. If none of the patterns match then an intent is randomly selected.

The regular expressions currently just look for keywords taken from the intent name. You should update these patterns to be more appropriate and capture a wider range of utterances for each intent. Note that the function will return the first intent that it finds a match for, so the order in which the if statements occur (which currently matches the order in which the patterns are listed) is important.

Each time you update the code you will need to run the code cell in order to then use the function.

In [None]:
def assign_intent(utt, verbose=False):
  # verbose is unused in this version; it is accepted so that both versions can be called the same way

  PlayMusic_Pattern = re.compile("play |music")
  AddToPlaylist_Pattern = re.compile("add|playlist")
  RateBook_Pattern = re.compile("rate|book")
  SearchScreeningEvent_Pattern = re.compile("screening")
  BookRestaurant_Pattern = re.compile("book|restaurant")
  GetWeather_Pattern = re.compile("get|weather")
  SearchCreativeWork_Pattern = re.compile("creative")
 
  intents = ['PlayMusic', 'AddToPlaylist', 'RateBook', 'SearchScreeningEvent', 'BookRestaurant', 'GetWeather', 'SearchCreativeWork']

  if re.search(PlayMusic_Pattern,  utt):
     return "PlayMusic"
  if re.search(AddToPlaylist_Pattern,  utt):
     return "AddToPlaylist"
  if re.search(RateBook_Pattern,  utt):
     return "RateBook"
  if re.search(SearchScreeningEvent_Pattern,  utt):
     return "SearchScreeningEvent"
  if re.search(BookRestaurant_Pattern,  utt):
     return "BookRestaurant"
  if re.search(GetWeather_Pattern,  utt):
     return "GetWeather"
  if re.search(SearchCreativeWork_Pattern,  utt):
     return "SearchCreativeWork"

  return random.choice(intents)
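
As a hedged illustration of why both the order and the specificity of the patterns matter, the sketch below uses a cut-down, hypothetical `first_match_intent` helper (not part of the provided code) covering only two of the seven intents. The keyword choices and the use of `\b` word boundaries are assumptions made for this example, not recommended final patterns.

```python
import re

# Hypothetical, cut-down rule lists: (intent, pattern) pairs tried in order.
LOOSE_RULES = [
    ("RateBook", re.compile(r"rate|book")),
    ("BookRestaurant", re.compile(r"book|restaurant")),
]

# Same intents, but with more specific patterns and word boundaries (\b)
# so that "book" on its own no longer triggers the RateBook rule.
TIGHTER_RULES = [
    ("RateBook", re.compile(r"\brate\b|\bstars?\b")),
    ("BookRestaurant", re.compile(r"\bbook\b.*\btable\b|\brestaurant\b")),
]

def first_match_intent(utt, rules):
    """Return the first intent whose pattern matches, or None if none do."""
    for intent, pattern in rules:
        if pattern.search(utt):
            return intent
    return None

utt = "book me a table for two"
print(first_match_intent(utt, LOOSE_RULES))    # RateBook: "book" matches the earlier rule
print(first_match_intent(utt, TIGHTER_RULES))  # BookRestaurant
```

With the loose rules the utterance never reaches the BookRestaurant pattern, because the earlier RateBook pattern already matched on "book".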

The next cell contains a different version of assign_intent. You can use whichever version you like. Just edit the patterns and then run the cell with the version you prefer.

This version uses the re.findall function (see week 3 seminar) in order to make as many matches as possible with each pattern. The number of matches is then counted (using the len function) for each pattern. The intent with the largest number of matches is then returned as the predicted intent. Imagine for example that your patterns for PlayMusic and GetWeather were as follows: <br>
PlayMusic_Pattern = re.compile("play|music") <br>
GetWeather_Pattern = re.compile("get|weather") <br>
while the input utterance was "play some music by the weather girls".
In this case the PlayMusic pattern would match twice (for play and music) while the GetWeather pattern would only match once (for weather). PlayMusic would be returned as the predicted intent. Where there is a tie a prediction is randomly sampled from among the tied intents. 

In [None]:
def assign_intent(utt, verbose=False):
  PlayMusic_Pattern = re.compile("play|music")
  AddToPlaylist_Pattern = re.compile("add|playlist")
  RateBook_Pattern = re.compile("rate|book")
  SearchScreeningEvent_Pattern = re.compile("screening")
  BookRestaurant_Pattern = re.compile("book|restaurant")
  GetWeather_Pattern = re.compile("get|weather")
  SearchCreativeWork_Pattern = re.compile("creative")
 
  weights = {}
  weights['PlayMusic'] = len(re.findall(PlayMusic_Pattern,  utt))
  weights['AddToPlaylist'] = len(re.findall(AddToPlaylist_Pattern,  utt))
  weights['RateBook'] = len(re.findall(RateBook_Pattern,  utt))
  weights['SearchScreeningEvent'] = len(re.findall(SearchScreeningEvent_Pattern,  utt))
  weights['BookRestaurant'] = len(re.findall(BookRestaurant_Pattern,  utt))
  weights['GetWeather'] = len(re.findall(GetWeather_Pattern,  utt))
  weights['SearchCreativeWork'] = len(re.findall(SearchCreativeWork_Pattern,  utt))
  if verbose:
      print(weights)
  if max(weights.values()) == 0:
      return random.choice(list(weights.keys()))
  else:
      weights_as_list = list(weights.items())
      random.shuffle(weights_as_list)
      weights=dict(weights_as_list)
      return max(weights, key=lambda key: weights[key])
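
The counting and tie-breaking idea can be sketched in isolation as follows. This is a minimal illustration using only two of the patterns; `count_matches` and `pick_intent` are hypothetical helper names, not functions from the provided code.

```python
import random
import re

# Two of the patterns from the cell above, used only for illustration.
patterns = {
    "PlayMusic": re.compile("play|music"),
    "GetWeather": re.compile("get|weather"),
}

def count_matches(utt):
    """Count how many non-overlapping matches each pattern finds in utt."""
    return {intent: len(p.findall(utt)) for intent, p in patterns.items()}

def pick_intent(utt, rng=random):
    """Return an intent with the highest count, choosing randomly among ties."""
    counts = count_matches(utt)
    best = max(counts.values())
    tied = [intent for intent, n in counts.items() if n == best]
    return rng.choice(tied)

print(count_matches("play some music by the weather girls"))
# {'PlayMusic': 2, 'GetWeather': 1}
print(count_matches("play the weather girls"))
# {'PlayMusic': 1, 'GetWeather': 1}  -> a tie, resolved at random
```

Printing the counts like this (which is what `verbose=True` does in the function above) is a quick way to see which keyword is responsible for a wrong prediction.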

### Evaluation
When you run this cell you will be asked to enter an utterance. When you press return a classification for your input will be printed. You can use this to check whether your assign_intent function is working as intended.

In [None]:
new_input = input("Enter a utterance to classify: ")
prediction = assign_intent(new_input)

print(prediction)

If you are using the second version of the assign_intent function, and would like to see how many matches are found for each pattern for the purposes of debugging, you can use the following code to test it on individual utterances. 

In [None]:
new_input = input("Enter a utterance to classify: ")
prediction = assign_intent(new_input,verbose=True)
print(prediction)

Here are some example utterances to stress-test your patterns.

In [None]:
example_utts=['play the weather girls','add this to my italian film soundtrack playlist','give the restaurant guidebook 5 stars','find screenings of the book thief at around 7','book me a table outside for 2 for dinner at the national theatre restaurant','will it be warm enough to eat dinner outside at around 7 tonight','find me songs films or books about restaurants']
for utt in example_utts:
    prediction = assign_intent(utt)
    print(prediction + " : " + utt)

In order to perform a stricter assessment of the performance of your classifier, you should examine its performance on the validation dataset. If you run the cell below you will be told the accuracy score on those 700 utterances.

In [None]:
predicted = [assign_intent(item) for item in intent_data[intent_data['split'] == "val"]['text']]
true = intent_data[intent_data['split'] == "val"]['intent']
accuracy = (predicted == true).sum()/len(true)
print(accuracy)

Running the next cell will give you a "confusion matrix". Each row corresponds to a true intent and each column to a predicted intent, so each cell tells you the number of times that the intent in the row was classified (correctly or otherwise) as the intent in the column. The columns are displayed as numbers, but you can check what each number stands for by looking at the parentheses after each intent in the rows.

Studying this will give you an idea of where the classifier might be going wrong, and therefore of how you might update your patterns.

In [None]:
# confusion_matrix expects (y_true, y_pred): rows are true intents, columns are predicted intents
print(pd.DataFrame(confusion_matrix(true, predicted), index=["AddToPlaylist (0)", "BookRestaurant (1)", "GetWeather (2)", "PlayMusic (3)", "RateBook (4)", "SearchCreativeWork (5)", "SearchScreeningEvent (6)"]))
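
To make the layout of such a table concrete, here is a small pure-Python sketch that builds one by hand, with true intents in the rows and predicted intents in the columns. The labels are made up for five imaginary utterances, and `confusion_table` is a hypothetical helper written for this example rather than the scikit-learn function used above.

```python
from collections import Counter

def confusion_table(true_labels, predicted_labels):
    """Return (labels, table) where table[i][j] counts items whose true
    intent is labels[i] and whose predicted intent is labels[j]."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    table = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, table

# Made-up labels for five utterances, purely for illustration.
true_labels = ["GetWeather", "GetWeather", "PlayMusic", "PlayMusic", "RateBook"]
predicted = ["GetWeather", "PlayMusic", "PlayMusic", "PlayMusic", "RateBook"]

labels, table = confusion_table(true_labels, predicted)
for i, (label, row) in enumerate(zip(labels, table)):
    total = sum(row)
    # Per-intent recall: correct predictions divided by the row total.
    recall = row[i] / total if total else float("nan")
    print(f"{label} ({i}): {row}  recall={recall:.2f}")
```

Here one GetWeather utterance was misclassified as PlayMusic, which shows up as an off-diagonal count in the GetWeather row.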

Once you are happy that you have defined the best patterns that you can, you should evaluate the performance of your classifier on the test data (a set of 700 utterances that you haven't looked at). The accuracy printed is what you should include in your report.

In [None]:
predicted = [assign_intent(item) for item in intent_data[intent_data['split'] == "test"]['text']]
true = intent_data[intent_data['split'] == "test"]['intent']
accuracy = (predicted == true).sum()/len(true)
print(accuracy)

You can also generate a confusion matrix for the test data and use this in the discussion of the results in your write up.

In [None]:
# confusion_matrix expects (y_true, y_pred): rows are true intents, columns are predicted intents
print(pd.DataFrame(confusion_matrix(true, predicted), index=["AddToPlaylist (0)", "BookRestaurant (1)", "GetWeather (2)", "PlayMusic (3)", "RateBook (4)", "SearchCreativeWork (5)", "SearchScreeningEvent (6)"]))

## Single Layer Perceptron Classifier

The next classifier you can build is a single-layer perceptron. The specification of this model is in the next cell. You shouldn't make any changes to this.

In [82]:
class IntentClassifierPerceptron(nn.Module):
    """A simple single-layer perceptron classifier."""
    def __init__(self, input_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input feature vector
            output_dim (int): the number of output classes (intents)
        """
        super(IntentClassifierPerceptron, self).__init__()
        # A single fully connected layer mapping features to class scores
        self.fc1 = nn.Linear(input_dim, output_dim)

    def forward(self, x_in, apply_softmax=True):
        """The forward pass of the classifier
        
        Args:
            x_in (torch.Tensor): an input data tensor. 
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """

        y_out = self.fc1(x_in)
        if apply_softmax:
            y_out = F.softmax(y_out,dim=1)
        return y_out
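
For intuition about what `forward` computes, the following is a plain-Python sketch (no PyTorch) of one linear layer followed by softmax for a single input vector. The weights, the bias and the three-word vocabulary are made-up values, and `linear` and `softmax` are illustrative helpers rather than functions from the provided modules.

```python
import math

def linear(x, weights, bias):
    """Compute one score per output class: weights[k] . x + bias[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical bag-of-words input over a 3-word vocabulary: ["play", "book", "weather"]
x = [1.0, 0.0, 0.0]
# Hypothetical weights: one row per intent (2 intents), one column per word.
weights = [[2.0, -1.0, 0.0],   # row used to score "PlayMusic"
           [-1.0, 2.0, 0.0]]   # row used to score "RateBook"
bias = [0.0, 0.0]

probs = softmax(linear(x, weights, bias))
print(probs)  # the first probability is the larger one for this input
```

Training adjusts the weights and bias so that the highest probability tends to fall on the correct intent.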


### Preprocessing
The utterances in our data (training, validation and test) are sequences of whole words. This means that morphological variants are considered to be completely unrelated entities (e.g. the system doesn't know that run, ran and running are related in any way). To increase the generalisation of your system you can preprocess the text to, for example, convert morphological variants to a single "lemma". We discussed methods for doing this earlier in the semester and saw a few tools that are built into NLTK in week 3, such as the PorterStemmer or the WordNetLemmatizer. 
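
The effect of such normalisation on vocabulary size can be sketched with a deliberately crude suffix stripper. The `toy_stem` function below is a stand-in written for this example only; it is not the Porter algorithm or the WordNet lemmatizer, and real tools handle many more cases.

```python
def toy_stem(token):
    """Strip a few common English suffixes (illustration only)."""
    for suffix in ("ing", "s"):
        # Only strip when a reasonably long stem would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

utterances = ["play songs", "playing a song", "plays the songs"]

raw_vocab = {tok for utt in utterances for tok in utt.split()}
stemmed_vocab = {toy_stem(tok) for tok in raw_vocab}

print(sorted(raw_vocab))      # ['a', 'play', 'playing', 'plays', 'song', 'songs', 'the']
print(sorted(stemmed_vocab))  # ['a', 'play', 'song', 'the']
```

Seven distinct word forms collapse to four, so the classifier sees "play", "plays" and "playing" as the same feature.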



Before we make changes to the data we should create a backup of the unaltered file in case we need to revert to it.


In [83]:
!cp /content/gdrive/My\ Drive/Intent_Classification/intent_classification_with_splits.csv .

When we need to revert to the original unaltered file, we can then run the following cell.

In [89]:
!cp intent_classification_with_splits.csv /content/gdrive/My\ Drive/Intent_Classification/
intent_data=pd.read_csv('/content/gdrive/My Drive/Intent_Classification/intent_classification_with_splits.csv')

The preprocessing steps that we are going to apply will be specified in a function called preprocess_utterance.

If you want to apply the Porter Stemmer, you should run the following code cell:

In [85]:
porter = nltk.PorterStemmer()
def preprocess_utterance(utt):
  utterance = nltk.word_tokenize(utt)
  newutt = ""
  for t in utterance:
    newutt = newutt + porter.stem(t) + " "
  return newutt.rstrip()

If you want to apply the WordNetLemmatizer you should instead run the following cell:

In [None]:
wnl = nltk.WordNetLemmatizer()
def preprocess_utterance(utt):
  utterance = nltk.word_tokenize(utt)
  newutt = ""
  for t in utterance:
    newutt = newutt + wnl.lemmatize(t) + " "
  return newutt.rstrip()

You can check that your preprocessing is doing what you want by testing it on examples using the following cell.

In [87]:
test_input = input("Enter a utterance to test your preprocessing on: ")
print(preprocess_utterance(test_input))



Enter a utterance to test your preprocessing on: runs
run


Once you are happy with your preprocessing you can then transform the text in the training, validation and test data using the following cell. This alters the data on the disk so if you want to undo any changes later you will have to revert to the original data (as described above).


In [88]:
intent_data['text'] = [preprocess_utterance(item) for item in intent_data['text']]
intent_data.to_csv('/content/gdrive/My Drive/Intent_Classification/intent_classification_with_splits.csv', index=False)


### Training the model
In order to first initialise your model and then train it, you should run the following cell. The training will take a minute or two to complete - the progress bars will tell you how far along it is.


In [90]:
params = initialise()
classifier = IntentClassifierPerceptron(input_dim=len(params.vectorizer.text_vocab),output_dim=len(params.vectorizer.intent_vocab))
train_state = trainModel(params, params.dataset, classifier)

Expanded filepaths: 
	/content/gdrive/My Drive/Intent_Classification/vectorizer.json
	/content/gdrive/My Drive/Intent_Classification/model.pth
Using CUDA: False


training routine:   0%|          | 0/100 [00:00<?, ?it/s]

split=train:   0%|          | 0/5 [00:00<?, ?it/s]

split=val:   0%|          | 0/5 [00:00<?, ?it/s]

### Test on individual utterance

As with the rule-based classifier you can look at the performance of the system on example utterances. This will allow you to explore where the system is succeeding and where it is failing. 

In [91]:
torch.manual_seed(0)
new_utterance = input("Enter a utterance to classify: ")
classifier = classifier.to("cpu")
prediction = predict_intent(new_utterance, classifier, params.vectorizer)
print("{} -> {} (p={:0.2f})".format(new_utterance,
                                    prediction['intent'],
                                    prediction['probability']))

Enter a utterance to classify: book me table for 7
book me table for 7 -> BookRestaurant (p=0.47)


### Evaluate performance on validation data
In order to evaluate the performance of your system while you tweak your preprocessing you can evaluate performance on the validation data, and look at a confusion matrix.

In [92]:
torch.manual_seed(0)
predicted, true = evaluate(params, classifier, train_state, 'val')
print("Validation Accuracy: {:.2f}".format(train_state['val_acc']))
print(pd.DataFrame(confusion_matrix(predicted, true), index=["AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic", "RateBook", "SearchCreativeWork", "SearchScreeningEvent"]))

Validation Accuracy: 87.50
                       0   1   2   3   4   5   6
AddToPlaylist         83   0   0   2   1   4   0
BookRestaurant         0  96   3   2   1   7  13
GetWeather             2   0  91   1   0   6   5
PlayMusic              2   1   1  79   0  11   7
RateBook               0   0   0   0  93   3   0
SearchCreativeWork     0   0   0   0   0  58   1
SearchScreeningEvent   0   0   2   0   0   5  60


### Evaluate performance on test data

Once your model is trained you should evaluate its accuracy on the test data. You can also use the confusion matrix for error analysis in your write up.

In [93]:
torch.manual_seed(0)
predicted, true = evaluate(params, classifier, train_state, 'test')
print("Test Accuracy: {:.2f}".format(train_state['test_acc']))
print(pd.DataFrame(confusion_matrix(predicted, true), index=["AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic", "RateBook", "SearchCreativeWork", "SearchScreeningEvent"]))

Test Accuracy: 86.72
                        0   1   2   3   4   5   6
AddToPlaylist         103   0   1   3   1   0   1
BookRestaurant          1  85   5   0   0   4   4
GetWeather              1   0  86   1   1   3   1
PlayMusic               4   1   0  74   1  19   7
RateBook                1   1   0   1  67   4   2
SearchCreativeWork      1   0   1   0   0  66  10
SearchScreeningEvent    0   0   1   0   0   4  74


## Multilayer neural network classifier

The third kind of classifier that you should build is a two-layer neural network. The model is defined below. Again you don't need to change this code.

In [94]:
class IntentClassifierMLP(nn.Module):
    """A simple two-layer (multilayer perceptron) classifier."""
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input feature vector
            hidden_dim (int): the number of nodes in the hidden layer
            output_dim (int): the number of output classes (intents)
        """
        super(IntentClassifierMLP, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)


    def forward(self, x_in, apply_softmax=True):
        """The forward pass of the classifier
        
        Args:
            x_in (torch.Tensor): an input data tensor. 
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate_vector = F.relu(self.fc1(x_in))
        prediction_vector = self.fc2(intermediate_vector)

        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector,dim=1)
        return prediction_vector


### Preprocessing
The utterances in our data (training, validation and test) are sequences of whole words. This means that morphological variants are considered to be completely unrelated entities (e.g. the system doesn't know that run, ran and running are related in any way). To increase the generalisation of your system you can preprocess the text to, for example, convert morphological variants to a single "lemma". We discussed methods for doing this earlier in the semester and saw a few tools that are built into NLTK in week 3, such as the PorterStemmer or the WordNetLemmatizer. 



Before we make changes to the data we should create a backup of the unaltered file in case we need to revert to it.


In [None]:
!cp /content/gdrive/My\ Drive/Intent_Classification/intent_classification_with_splits.csv .

When we need to revert to the original unaltered file, we can then run the following cell.

In [None]:
!cp intent_classification_with_splits.csv /content/gdrive/My\ Drive/Intent_Classification/
intent_data=pd.read_csv('/content/gdrive/My Drive/Intent_Classification/intent_classification_with_splits.csv')

The preprocessing steps that we are going to apply will be specified in a function called preprocess_utterance.

If you want to apply the Porter Stemmer, you should run the following code cell:

In [None]:
porter = nltk.PorterStemmer()
def preprocess_utterance(utt):
  utterance = nltk.word_tokenize(utt)
  newutt = ""
  for t in utterance:
    newutt = newutt + porter.stem(t) + " "
  return newutt.rstrip()

If you want to apply the WordNetLemmatizer you should instead run the following cell:

In [None]:
wnl = nltk.WordNetLemmatizer()
def preprocess_utterance(utt):
  utterance = nltk.word_tokenize(utt)
  newutt = ""
  for t in utterance:
    newutt = newutt + wnl.lemmatize(t) + " "
  return newutt.rstrip()

You can check that your preprocessing is doing what you want by testing it on examples using the following cell.

In [None]:
test_input = input("Enter a utterance to test your preprocessing on: ")
print(preprocess_utterance(test_input))



Once you are happy with your preprocessing you can then transform the text in the training, validation and test data using the following cell. This alters the data on the disk so if you want to undo any changes later you will have to revert to the original data (as described above).


In [None]:
intent_data['text'] = [preprocess_utterance(item) for item in intent_data['text']]
intent_data.to_csv('/content/gdrive/My Drive/Intent_Classification/intent_classification_with_splits.csv', index=False)


### Training model
You can now move on to training the model.

One important variable that you can change is the number of nodes to include in your hidden layer. You can set this parameter in the cell below. The default is 100.

In [102]:
n_hidden_dims = 10
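
To get a feel for what this setting changes, one can count the trainable parameters of a two-layer network: each fully connected layer has one weight per input-output pair plus one bias per output. The sketch below assumes a hypothetical vocabulary size of 2000 (not taken from the data), and `mlp_parameter_count` is an illustrative helper, not part of the provided code.

```python
def mlp_parameter_count(input_dim, hidden_dim, output_dim):
    """Weights plus biases for two fully connected layers."""
    first_layer = input_dim * hidden_dim + hidden_dim
    second_layer = hidden_dim * output_dim + output_dim
    return first_layer + second_layer

vocab_size = 2000  # hypothetical vocabulary size, not taken from the data
n_intents = 7      # the seven intents in this task

for hidden in (10, 100):
    print(hidden, mlp_parameter_count(vocab_size, hidden, n_intents))
# 10 20087
# 100 200807
```

A larger hidden layer gives the model more capacity but also many more parameters to fit from the same training set, which is why it is worth comparing several values on the validation data.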

You can then initialise and train the model by running the following cell.

In [103]:
params = initialise()
classifier = IntentClassifierMLP(input_dim=len(params.vectorizer.text_vocab),hidden_dim=n_hidden_dims,output_dim=len(params.vectorizer.intent_vocab))
train_state = trainModel(params, params.dataset, classifier)

Expanded filepaths: 
	/content/gdrive/My Drive/Intent_Classification/vectorizer.json
	/content/gdrive/My Drive/Intent_Classification/model.pth
Using CUDA: False


training routine:   0%|          | 0/100 [00:00<?, ?it/s]

split=train:   0%|          | 0/5 [00:00<?, ?it/s]

split=val:   0%|          | 0/5 [00:00<?, ?it/s]

The following cell allows you to look at the performance of the system on a single example utterance. This can be used to see how your choice of hidden dimensions is impacting performance.

In [99]:
torch.manual_seed(0)
new_utterance = input("Enter a utterance to classify: ")
classifier = classifier.to("cpu")
prediction = predict_intent(new_utterance, classifier, params.vectorizer)
print("{} -> {} (p={:0.2f})".format(new_utterance,
                                    prediction['intent'],
                                    prediction['probability']))

Enter a utterance to classify: book me table for 2 at 7
book me table for 2 at 7 -> BookRestaurant (p=0.98)


### Evaluate model on validation data

In order to evaluate the performance of your system while you tweak it you can evaluate performance on the validation data, and look at a confusion matrix.

In [105]:
torch.manual_seed(0)
predicted, true = evaluate(params, classifier, train_state, 'val')
print("Validation Accuracy: {:.2f}".format(train_state['val_acc']))
print(pd.DataFrame(confusion_matrix(predicted, true), index=["AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic", "RateBook", "SearchCreativeWork", "SearchScreeningEvent"]))

Validation Accuracy: 92.19
                       0   1   2   3   4   5   6
AddToPlaylist         84   0   0   1   0   0   0
BookRestaurant         0  95   2   2   0   1   5
GetWeather             1   0  84   0   0   1   4
PlayMusic              1   0   1  79   0   1   5
RateBook               0   0   0   0  94   0   0
SearchCreativeWork     1   1   1   2   1  89   7
SearchScreeningEvent   0   1   9   0   0   2  65


### Evaluate model on test data

Once you are happy that model performance is as good as you can achieve with this model type you should evaluate its accuracy on the test data. You can also use the confusion matrix for error analysis in your write up.

In [106]:
torch.manual_seed(0)
predicted, true = evaluate(params, classifier, train_state, 'test')
print("Test Accuracy: {:.2f}".format(train_state['test_acc']))
print(pd.DataFrame(confusion_matrix(predicted, true), index=["AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic", "RateBook", "SearchCreativeWork", "SearchScreeningEvent"]))

Test Accuracy: 88.59
                        0   1   2   3   4   5   6
AddToPlaylist         100   0   1   1   0   0   1
BookRestaurant          1  82   3   0   0   3   0
GetWeather              1   0  85   1   1   0   1
PlayMusic               5   1   0  73   0  12   4
RateBook                0   0   0   0  68   0   1
SearchCreativeWork      4   3   2   4   1  81  14
SearchScreeningEvent    0   1   3   0   0   4  78


## That's it!

Once you have run through all of this notebook, made all of the additions and changes you want, and run all of the experiments you want, you should write up the results following the guidance [here](https://www.dropbox.com/s/7u1yazkok6hmeth/Writing%20your%20Computational%20Linguistics%20Research%20Report%20-%20postgrad.docx?dl=0).

Your report should be submitted via turnitin using the link on Blackboard. 

To repeat a couple of things from the top of the sheet: 

- Once you have completed your work in this notebook please download a copy including all of your changes and additions (select File -> Download -> Download .ipynb) and email it to me at colin.bannard@manchester.ac.uk. This is just so that I can check what you have done if it isn't clear from your report. You will not be marked on the quality of any code you might include.

- Please feel free to ask any questions about any part of the coursework. So that my response can benefit everyone please do so using the Coursework Discussion board on Blackboard. You can find a link to this down the left of the module Blackboard page.

I hope you find it an enjoyable and worthwhile exercise.
