**In this homework, you will implement several AI models to conduct the intent detection task.**
![alt text](https://i.ibb.co/fXmYHRq/ec5.jpg)

# Part 0: Data Preprocessing

### Data Acquisition

In this section, you will have a general idea of how the data looks like and do some simple transformation.

In [1]:
# download the data
!wget "https://drive.google.com/uc?export=download&id=1dLUN9oSB4u27NOleYE-Uksoh6RNQlZbi" -O sample.p

--2021-12-09 06:31:45--  https://drive.google.com/uc?export=download&id=1dLUN9oSB4u27NOleYE-Uksoh6RNQlZbi
Resolving drive.google.com (drive.google.com)... 142.250.157.138, 142.250.157.101, 142.250.157.100, ...
Connecting to drive.google.com (drive.google.com)|142.250.157.138|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-10-a0-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/lgsenkogkf9lriku3pi5gr8ihg3f7dd7/1639031475000/15787019596848476183/*/1dLUN9oSB4u27NOleYE-Uksoh6RNQlZbi?e=download [following]
--2021-12-09 06:31:46--  https://doc-10-a0-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/lgsenkogkf9lriku3pi5gr8ihg3f7dd7/1639031475000/15787019596848476183/*/1dLUN9oSB4u27NOleYE-Uksoh6RNQlZbi?e=download
Resolving doc-10-a0-docs.googleusercontent.com (doc-10-a0-docs.googleusercontent.com)... 142.251.8.132, 2404:6800:4008:c15::84
Connecting to doc-10-a0-docs.googleusercontent.com (doc

In [2]:
# test sentences for evaluation
!wget "https://drive.google.com/uc?export=download&id=1gEW_qY5x8uPAhriiobubheYo6FC35btQ" -O test_sentences.p

--2021-12-09 06:31:46--  https://drive.google.com/uc?export=download&id=1gEW_qY5x8uPAhriiobubheYo6FC35btQ
Resolving drive.google.com (drive.google.com)... 142.250.157.139, 142.250.157.101, 142.250.157.113, ...
Connecting to drive.google.com (drive.google.com)|142.250.157.139|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-0o-a0-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/evdl9qlq1glqndks0mvckjh21uhdqrlv/1639031475000/15787019596848476183/*/1gEW_qY5x8uPAhriiobubheYo6FC35btQ?e=download [following]
--2021-12-09 06:31:46--  https://doc-0o-a0-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/evdl9qlq1glqndks0mvckjh21uhdqrlv/1639031475000/15787019596848476183/*/1gEW_qY5x8uPAhriiobubheYo6FC35btQ?e=download
Resolving doc-0o-a0-docs.googleusercontent.com (doc-0o-a0-docs.googleusercontent.com)... 142.251.8.132, 2404:6800:4008:c15::84
Connecting to doc-0o-a0-docs.googleusercontent.com (doc

In [3]:
import pickle
samples = pickle.load(open("sample.p", "rb"))
test_sentences = pickle.load(open("test_sentences.p", "rb"))

In [4]:
###data structure###
### [[sentence, label]] ###
print(samples[:3])

[['Turn off the holoemitter.', 2], ['Halt.', 1], ['Get off tiptoes', 6]]


There are nine categories for these sentences, which are 'no', 'driving', 'light', 'head', 'state', 'connection', 'stance', 'animation' and 'grid'. The mapping from index to category name are shown below.

In [5]:
ind2cat = {0: 'no', 1: 'driving', 2: 'light', 3: 'head', 4: 'state', 5: 'connection', 6: 'stance', 7: 'animation', 8: 'grid'}

In [6]:
### Distribution on categories ###
cat2sentence = {}
for sample in samples:
  sentence = sample[0]
  cat = ind2cat[sample[1]]
  if cat not in cat2sentence:
    cat2sentence[cat] = [sentence]
  else:
    cat2sentence[cat].append(sentence)

print("number of sentences for each category")
for cat, sentences in cat2sentence.items():
  print(cat, ": ", len(sentences))

number of sentences for each category
light :  716
driving :  784
stance :  758
head :  698
grid :  678
state :  676
animation :  645
no :  629
connection :  673


### Train/Validation Split

In [7]:
from sklearn.model_selection import train_test_split
SENTENCES = [sample[0] for sample in samples]
LABELS = [sample[1] for sample in samples]
X_train, X_val, y_train, y_val = train_test_split(SENTENCES, LABELS, test_size=0.2, random_state=7)

### Clean Text
Write a tokenization function clean(sentence) which takes as input a string of text and returns a list of tokens derived from that text. Here, we define a token to be a contiguous sequence of non-whitespace characters. We will remove punctuation marks and convert the text to lowercase. Hint: Use the built-in constant string.punctuation, found in the string module, and/or python's regex library, re.

In [8]:
import nltk
import re
nltk.download('stopwords')
from nltk.corpus import stopwords
STOPWORDS = stopwords.words('english')

def clean(sentence):
    ''' 1. tokenize the sentence (remove punctuation)
        2. remove the stop words
        3. convert all words to lowercase
    '''
    sentence = sentence.lower()
    no_punct = re.sub(r'[^\w\s]', '', sentence)
    tokens = no_punct.split(' ')

    stop = set(STOPWORDS)
    sentence = [token for token in tokens if token not in stop]
    
    return sentence;
#pass

X_train_token = [clean(sentence) for sentence in X_train]
X_val_token = [clean(sentence) for sentence in X_val]
X_train_val = X_train_token + X_val_token

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [9]:
max_len = 0# Find the maximum length of tokens in train/val
for tokenized in X_train_token:
    l = len(tokenized)
    if l > max_len:
        max_len = l
for tokenized in X_val_token:
    l = len(tokenized)
    if l > max_len:
        max_len = l

print('The maximum length of tokens in our dataset is: ', max_len, ' tokens')

The maximum length of tokens in our dataset is:  31  tokens


### Build a Vocabulary
Build a vocabulary to map each word to an index, you need to first find the unique words in train/val set.

Once you build a vocabulary, it's better to save it to a file for future use. Because the vocabulary may change each time you run the code.

In [10]:
from collections import Counter
temp = [element for sample in X_train_val for element in sample]

X_train_val_counts = Counter(temp)
counts = Counter(X_train_val_counts)

In [11]:
word_count = dict(counts) # count the frequency of each word
word2ind = {} # build your vocabulary

words = list(word_count.keys())
for i, w in enumerate(words):
    word2ind[w] = i+1

vocab_size = len(word2ind)

# Part 1: Recurrent Neural Network

In [13]:
import numpy as np

from numpy.random import seed
seed(1)
from tensorflow.random import set_seed
set_seed(2)

### Convert token to vector
Convert each list of tokens into an array use the vocabulary you built before. The length of the vector is the max_len and remember to do zero-padding if a list's lenghth is smaller than max_len.

In [14]:
def vectorize(tokens, max_len, word2ind):
    '''
        Input: list of tokens
        Output: 1D numpy array (length = max_len)
    '''
    inds = []
    for token in tokens:
        if token in word2ind.keys():
            ind = word2ind[token]
        else:
            ind = 0
        inds.append(ind)
    n = len(inds)
    if n < max_len:
        zero_pad = [0 for i in range(max_len - n)]
        inds.extend(zero_pad)
    elif n > max_len:
        inds = inds[:max_len]
    return np.array(inds);
#pass

X_train_array = np.array([vectorize(tokens, max_len, word2ind) for tokens in X_train_token])
X_val_array = np.array([vectorize(tokens, max_len, word2ind) for tokens in X_val_token])
assert X_train_array.shape[-1] == max_len

### One-hot label
Convert the scalar label to 1D array (length = 9), e.g 0 -> array([1, 0, 0, 0, 0, 0, 0, 0, 0])

In [15]:
y_train_onehot = []
for label in y_train:
    zeros = [0 for i in range(9)]
    zeros[label] = 1
    y_train_onehot.append(zeros)
y_train_onehot = np.array(y_train_onehot)

y_val_onehot = []
for label in y_val:
    zeros = [0 for i in range(9)]
    zeros[label] = 1
    y_val_onehot.append(zeros)
y_val_onehot = np.array(y_val_onehot)

assert y_train_onehot.shape[1] == 9

### Build the Recurrent Neural Network
Now it's time to build the RNN network to do the classification task, you could just refer to this [official document](https://www.tensorflow.org/guide/keras/rnn).

You will need the Embedding layer, RNN layer and Dense layer, your last layer should project to the number of labels.

In [16]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
# Embedding Layer, Input Dimension = vocab_size, Output Dimension = 64
model.add(layers.Embedding(vocab_size, 64))

# Two LSTM layers with 64 Units
model.add(layers.LSTM(64, return_sequences=True))
model.add(layers.LSTM(64))

# Dense to the number of classes with softmax activation function
model.add(layers.Dense(9, activation='softmax'))

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 64)          91968     
                                                                 
 lstm (LSTM)                 (None, None, 64)          33024     
                                                                 
 lstm_1 (LSTM)               (None, 64)                33024     
                                                                 
 dense (Dense)               (None, 9)                 585       
                                                                 
Total params: 158,601
Trainable params: 158,601
Non-trainable params: 0
_________________________________________________________________


In [17]:
model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(X_train_array, y_train_onehot, batch_size=8, epochs=12, validation_data=(X_val_array, y_val_onehot))

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f70c02628d0>

### Evaluate on the test sentences
Now run your model to predict on the test sentences, you need to do the preprocessing on these sentences first and save your prediction to a list of labels, e.g [0, 2, 1, 5, ....]

In [18]:
test_prediction = []
#TODO
X_test_token = [clean(sentence) for sentence in test_sentences]

first = True
for tokens in X_test_token:
    if first:
        X_test_array = vectorize(tokens, max_len, word2ind)
        X_test_array = X_test_array[np.newaxis, :]
        first = False
    else:
        temp = vectorize(tokens, max_len, word2ind)
        temp = temp[np.newaxis, :]
        X_test_array = np.vstack((X_test_array, temp))
y_preds = model.predict(X_test_array, batch_size=8)
test_prediction = np.argmax(y_preds, axis=1)

In [19]:
# Save the results and upload to Gradescope
pickle.dump(test_prediction, open("rnn.p", "wb"))

#Part 2. Word Embedding via pymagnitude
Instead of using the vocabulary to convert word to number, you could use pretrained word embeddings to do the task.

In [20]:
! echo "Installing Magnitude.... (please wait, can take a while)"
! (curl https://raw.githubusercontent.com/plasticityai/magnitude/master/install-colab.sh | /bin/bash 1>/dev/null 2>/dev/null)
! echo "Done installing Magnitude."

Installing Magnitude.... (please wait, can take a while)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   137  100   137    0     0    324      0 --:--:-- --:--:-- --:--:--   324
Done installing Magnitude.


Next, you'll need to download a pre-trained set of word embeddings. We'll get a set trained with Google's word2vec algorithm, which we discussed in class. [Here](https://gitlab.com/Plasticity/magnitude), you can check the full list of available embeddings, feel free to try different embeddings.

In [21]:
# Download Pretrained Word-Embedding
! wget http://magnitude.plasticity.ai/word2vec/light/GoogleNews-vectors-negative300.magnitude

--2021-12-09 06:34:34--  http://magnitude.plasticity.ai/word2vec/light/GoogleNews-vectors-negative300.magnitude
Resolving magnitude.plasticity.ai (magnitude.plasticity.ai)... 52.216.81.162
Connecting to magnitude.plasticity.ai (magnitude.plasticity.ai)|52.216.81.162|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4211335168 (3.9G) [binary/octet-stream]
Saving to: ‘GoogleNews-vectors-negative300.magnitude.4’


2021-12-09 06:39:41 (13.1 MB/s) - ‘GoogleNews-vectors-negative300.magnitude.4’ saved [4211335168/4211335168]



In [22]:
# Load the embedding
from pymagnitude import *
vectors = Magnitude("GoogleNews-vectors-negative300.magnitude") 
D = vectors.query("cat").shape[0]

### Convert tokens to embeddings
You could now use the pymagnitude to query each token and convert them to a list of embeddings. Note that you need to do zero padding to match the maximum length.

In [23]:
def embedding(list_tokens, max_len, vectors, D):
    '''
    return an array with the shape (n_of_samples, max_len, D)
    '''
    embeddings = np.empty((len(list_tokens), max_len, D))

    for i, tokens in enumerate(list_tokens):
        for j, t in enumerate(tokens):
            if j == max_len:
                break;
            v = vectors.query(t)
            embeddings[i, j, :] = v
    return embeddings;
#pass
X_train_embedding = embedding(X_train_token, max_len, vectors, D)
X_val_embedding = embedding(X_val_token, max_len, vectors, D)

assert X_train_embedding.shape[-1] == D
assert X_train_embedding.shape[-2] == max_len

### Build the RNN model
Similar to Part 1, build a RNN model using your new embedding.

In [24]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
#TODO
# LSTM Layer with input shape (max_len, D), output shape (max_len, 256)
model.add(layers.LSTM(256, return_sequences=True))

# LSTM Layer with 128 units
model.add(layers.LSTM(128))

# Dense to 64 with tanh activation function
model.add(layers.Dense(64, activation='tanh'))

# Dense to number of classes with softmax function
model.add(layers.Dense(9, activation='softmax'))

model.build(input_shape=(8, max_len, D))
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_2 (LSTM)               (8, 31, 256)              570368    
                                                                 
 lstm_3 (LSTM)               (8, 128)                  197120    
                                                                 
 dense_1 (Dense)             (8, 64)                   8256      
                                                                 
 dense_2 (Dense)             (8, 9)                    585       
                                                                 
Total params: 776,329
Trainable params: 776,329
Non-trainable params: 0
_________________________________________________________________


In [25]:
model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(X_train_embedding, y_train_onehot, batch_size=8, epochs=12, validation_data=(X_val_embedding, y_val_onehot))

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f6ee2158450>

### Evaluate on the test sentences
Now run your model to predict on the test sentences, you need to do the preprocessing on these sentences first and save your prediction to a list of labels, e.g [0, 2, 1, 5, ....]

In [26]:
test_prediction = []
#TODO

X_test_embedding = embedding(X_test_token, max_len, vectors, D)
y_pred = model.predict(X_test_embedding, batch_size=8)
test_predictions = np.argmax(y_pred, axis=1)

In [27]:
# Save the results and upload to Gradescope
pickle.dump(test_predictions, open("embedding.p", "wb"))

# Part 3: BERT

In this part, you will use the BERT pipeline to further improve the performance.

This part is open-ended, we just provide one example of using BERT, feel free to find other tutorial online to customize on this task.

[Here](https://huggingface.co/models) is the list of all existing models.

In [28]:
!pip install transformers
!pip install --upgrade tensorflow



In [29]:
import transformers
from transformers import BertTokenizer, TFBertModel, BertConfig, TFBertForSequenceClassification
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased") #feel free to change the model
bert_model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=9)

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Use BERT Tokenizer to preprocess the data
The BERT Tokenizer will return a dictionary which contains 'input_ids', 'token_type_ids' and 'attention_mask', we will use the 'input_ids' and 'attention_mask' later

In [41]:
# Test the tokenizer
sent = X_train[0]
tokenized_sequence = bert_tokenizer.encode_plus(sent,add_special_tokens = True,
                                              max_length =30,pad_to_max_length = True, 
                                              return_attention_mask = True)
print(tokenized_sequence)
print(bert_tokenizer.decode(tokenized_sequence['input_ids']))

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


{'input_ids': [101, 7632, 1010, 1045, 2052, 2066, 2000, 2173, 2019, 2344, 2005, 2796, 2833, 2005, 2202, 5833, 2005, 2093, 2111, 1010, 3531, 1012, 102, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]}
[CLS] hi, i would like to place an order for indian food for takeout for three people, please. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]




Use the bert tokenizer described above, encode the training and validations sentences, note that the max length should be 64.

In [30]:
def BERT_Tokenizer(sentences):
    '''Input: list of sentences
        Output: two numpy array
    '''
    max_len = 64
    input_ids_arr = np.empty((len(sentences), max_len), dtype=np.int64)
    attention_mask_arr = np.empty((len(sentences), max_len), dtype=np.int64)
    for i, sentence in enumerate(sentences):
        tokenized_sequence = bert_tokenizer.encode_plus(sentence, add_special_tokens=True, truncation=True,
                                                max_length=max_len, padding='max_length', 
                                                return_attention_mask=True)
        input_ids = tokenized_sequence['input_ids']
        attention_mask = tokenized_sequence['attention_mask']
        input_ids_arr[i, :] = input_ids
        attention_mask_arr[i, :] = attention_mask
    return input_ids_arr, attention_mask_arr;
#pass

X_train_ids, X_train_masks = BERT_Tokenizer(X_train)
X_val_ids, X_val_masks = BERT_Tokenizer(X_val)
y_train_array = np.array(y_train)
y_val_array = np.array(y_val)
assert X_train_ids.shape[-1] == 64

In [31]:
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-6,epsilon=1e-08)
bert_model.compile(loss=loss,optimizer=optimizer,metrics=[metric])

In [32]:
bert_model.fit([X_train_ids,X_train_masks],y_train_array,batch_size=16,epochs=6,validation_data=([X_val_ids,X_val_masks],y_val_array))

Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


<keras.callbacks.History at 0x7f6de7de7c90>

### Evaluate on test sentences
Again, use BERT to predict on the test sentences and submit to Gradescope.

In [33]:
#TODO
X_test_ids, X_test_masks = BERT_Tokenizer(test_sentences)
y_pred = bert_model.predict([X_test_ids, X_test_masks], batch_size=8)
test_predictions = np.argmax(y_pred['logits'], axis=1)

In [34]:
pickle.dump(test_predictions, open("bert.p", "wb"))

In [35]:
pickle.dump(test_predictions, open("best.p", "wb"))

# Part 4: Write your own commands

Please write 10 sentences for each category, this will be very helpful for future students!

In [37]:
my_no_sentences = []
my_driving_sentences = ['Drive Forward for 2 seconds and then come beck.',
                        'Drive in a small square.',
                        'Stop!',
                        'Drive in a serpentine pattern.',
                        'Drive North for 3 seconds, then stop.',
                        'I dont want to talk to you right now.',
                        'Pick up the pace!',
                        'Go for a joyride.',
                        'Tokyo Drifting!',
                        'Make a boston left turn.']

my_light_sentences = ['Sir, did you know that your tail light is out?',
                      'The lights are too bright!',
                      'I cant see anything!',
                      'Turn of the holoemitter.',
                      'Turn on party mode.',
                      'Turn your front light green and your tail light blue.',
                      'Maximum Brightness.',
                      'The room could use some color.',
                      'Let it shine!',
                      'Kill all the lights.']

my_head_sentences = ['Turn and face me like a man!',
                     'Face the corner and think about what you have done.',
                     'Turn your head like the girl from The Exorcist.',
                     'Look over there!',
                     'Turn your head 90 degrees.',
                     'Look around, its beautiful out!',
                     'Shake your head.',
                     'Turn on surveillance mode', # rotate head like a satelite dish
                     'Act like you are malfunctioning.' # quickly rotate head back an forth
                     'Look away.']

my_state_sentences = ['Is there a ledge in front of you?',
                      'What are the colors of each of your lights?',
                      'Tell me your current heading',
                      'Tell me where your head is at.',
                      'What is your battery level?',
                      'What stance are you in?',
                      'Tell me if there are any AprilTags in your camera view.',
                      'Give me your accelerometer data.',
                      'How fast are you moving right now?',
                      'How long have you been driving in this direction.']

my_connection_sentences = ['DC.',
                           'Disconnect from the server.',
                           'Were being hacked, abort, abort.',
                           'Connect to the nearest server.',
                           'Give me your IP address.',
                           'Start a Camera Server.',
                           'Start a Ultrasonic Server.',
                           'Start servers for all sensors.',
                           'Disconnect from all servers.',
                           'Stay connected to the main server, but disconnect from sensor servers.']

my_stance_sentences = ['Assume attack position.',
                       'Start waddling.',
                       'Stop waddling.',
                       'Assume resting position.',
                       'Hey R2D2.',
                       'Get ready to move.',
                       'Ok R2.',
                       'Act excited.',
                       'Stand up straight.',
                       'Use the force to move your wheel down.']

my_animation_sentences = ['Get Hype!',
                          'Act sad.',
                          'Act happy.',
                          'Do your happy dance.',
                          'Do some donuts to celebrate.',
                          'Do the Robot Dance.',
                          'Can you come sit next to me?',
                          'Pretend to be scared.',
                          'Tip yourself over.',
                          'Act like a turtle on its back.']

my_grid_sentences = ['The square in front of you has an obstacle in it.',
                     'Move to the square to your left.',
                     'Find the shorteset path between the bottom left square and the top right square.',
                     'The grid has 10x10 gridcells.',
                     'Are there any paths between the current cell and the bottom right cell?',
                     'What is the fastest way from point A to point B?',
                     'How many gridcells have you visited so far?',
                     'Visit every node once.',
                     'How many gridcells left in the current path?',
                     'Go to the left of the nearest obstacle.']

In [38]:
my_commands = {'no': my_no_sentences, 
               'driving': my_driving_sentences, 
               'light': my_light_sentences,
               'head': my_head_sentences,
               'state': my_state_sentences,
               'connection': my_connection_sentences, 
               'stance': my_stance_sentences, 
               'animation': my_animation_sentences,
               'grid': my_grid_sentences}

In [39]:
pickle.dump(my_commands, open("my_commands.p", "wb"))