# 1. Language Modeling

In this part, let's generate text using a trigram language model.

Go to https://drive.google.com/drive/folders/1pR0koayRSgXfTD72HZUHN14uec0SrnXy?usp=sharing and click add shortcut to drive. This will add the data required for this problem set to your Google drive.

<img src="https://drive.google.com/uc?id=1LqHisiziX8Ri94Xs6Cv8mhx6vivFM3kS" alt="Drawing" height="300"/>


Run the below code snippet. It will generate a URL which generates an authorization code.* Enter it below to give Colab access to your Google drive. 

*Copy function may not work. If so, manually copy the authorization code.

In [2]:
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)

Mounted at /content/drive/


When you run the `ls` command below, you should see these folders.




In [3]:
!ls "/content/drive/My Drive/nl2ds"

semantic-parser  tweets


Let's load the trigrams first. You can change the below code as you see fit.

In [4]:
from math import log

bigram_prefix_to_trigram = {}
bigram_prefix_to_trigram_weights = {}

lines = open("/content/drive/My Drive/nl2ds/tweets/covid-tweets-2020-08-10-2020-08-21.trigrams.txt").readlines()
for line in lines:
  word1, word2, word3, count = line.strip().split()
  if (word1, word2) not in bigram_prefix_to_trigram:
    bigram_prefix_to_trigram[(word1, word2)] = []
    bigram_prefix_to_trigram_weights[(word1, word2)] = []
  bigram_prefix_to_trigram[(word1, word2)].append(word3)
  bigram_prefix_to_trigram_weights[(word1, word2)].append(int(count))

# freeup memory
lines = None

## Problem 1.1: Retrieve top next words and their probability given a bigram prefix.

For the following prefixes **word1=middle, word2=of, and n=10**, the output is:



```
a 0.807981220657277
the 0.06948356807511737
pandemic 0.023943661971830985
this 0.016901408450704224
an 0.0107981220657277
...
...
...
```



In [None]:
import numpy as np
def top_next_word(word1, word2, n=10):
  # write your code here
  weights = bigram_prefix_to_trigram_weights[(word1, word2)]
  b_t = bigram_prefix_to_trigram[(word1, word2)]

  probs = weights/np.sum(weights)
  arr = np.array(
      list(zip(b_t, probs)))[0:n]
  return arr[:,0], arr[:,1]


next_words, probs = top_next_word("middle", "of", 10)
for word, prob in zip(next_words, probs):
  print(word, prob)

a 0.807981220657277
the 0.06948356807511737
pandemic 0.023943661971830985
this 0.016901408450704224
an 0.0107981220657277
covid 0.009389671361502348
nowhere 0.008450704225352112
it 0.004694835680751174
lockdown 0.002347417840375587
summer 0.002347417840375587


## Problem 1.2: Sampling n words

Sample next n words given a bigram prefix. Use the probablity distribution defined by the frequency counts. Functions like **numpy.random.choice** will be useful here. Sample without repitition, otherwise all your samples will contain the most frequent trigram.


For the following prefixes **word1=middle, word2=of, and n=10**, the output could be as follows (our outputs may differ): 

```
a 0.807981220657277
pandemic 0.023943661971830985
nowhere 0.008450704225352112
the 0.06948356807511737
...
...
...
...
...
```



In [None]:
import numpy as np
import random
def sample_next_word(word1, word2, n=10):
  # write your code here
  b_t = bigram_prefix_to_trigram[(word1,word2)]
  weights = bigram_prefix_to_trigram_weights[(word1,word2)]

  words, prob = top_next_word(word1, word2, n)
  sample_size = min(n, len(words))
  probs = weights/np.sum(weights)
  
  words = np.random.choice(b_t, size = sample_size, replace=False, p = probs)

  return_probs = []
  for i in range(len(words)):
    for count, next_word in zip(weights, b_t):
      if words[i] == next_word:
        return_probs.append(count/np.sum(weights))
  
  return words, return_probs


next_words, probs = sample_next_word("middle", "of", 10)
for word, prob in zip(next_words, probs):
  print(word, prob)

a 0.807981220657277
nowhere 0.008450704225352112
this 0.016901408450704224
the 0.06948356807511737
lockdown 0.002347417840375587
an 0.0107981220657277
it 0.004694835680751174
planning 0.00046948356807511736
pandemic 0.023943661971830985
covid 0.009389671361502348


## Problem 1.3: Generate sentences starting with a prefix

Generates n-sentences starting with a given sentence prefix. Use [beam search](https://en.wikipedia.org/wiki/Beam_search) to generate multiple sentences. Depending on which method you use to generate next word, you will get different outputs. When you generate <EOS> in a path, stop exploring that path. If you are not careful with your implementation, you may end up in an infinite loop.

If you use the method `word_generator=top_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> trump`, your output is as follows:
```
<BOS1> <BOS2> trump eyes new unproven coronavirus treatment URL <EOS> 0.00021893147502903603
<BOS1> <BOS2> trump eyes new unproven coronavirus cure URL <EOS> 0.0001719607222046247
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven therapeutic URL <EOS> 9.773272077557522e-05
...
...
...
```


If you use the method `word_generator=top_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> biden`, your output is as follows:
```
<BOS1> <BOS2> biden calls for a 30 bonus URL #cashgem #cashappfriday #stayathome <EOS> 0.0002495268686322749
<BOS1> <BOS2> biden says all u.s. governors should mandate masks <EOS> 1.6894510541025754e-05
<BOS1> <BOS2> biden says all u.s. governors question cost of a pandemic <EOS> 8.777606198953028e-07
...
...
...
```


If you use the method `word_generator=sample_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> trump`, your output may look as follows (since this is sampling, our outputs will difer):

```
<BOS1> <BOS2> trump signs executive orders URL <EOS> 7.150992253427233e-05
<BOS1> <BOS2> trump signs executive actions URL <EOS> 7.117242889600614e-05
<BOS1> <BOS2> trump news president attacked over it <EOS> 1.0546494007903964e-05
<BOS1> <BOS2> trump news president attacked over executive orders URL <EOS> 1.0126405114118984e-05
```

If you use the method `word_generator=sample_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> biden`, your output may look as follows:

```
<BOS1> <BOS2> biden harris 2020 <EOS> 0.0015758924114719264
<BOS1> <BOS2> biden harris 2020 URL <EOS> 0.0006443960952032196
<BOS1> <BOS2> biden calls for evictions ban so marylander 's do it URL <EOS> 4.105215709355001e-07
<BOS1> <BOS2> biden calls for evictions ban so marylander 's do our best to stay home <EOS> 1.3158806336098573e-09
...
...
...
...
...
```

Hope you see that sampling gives different outputs compared to deterministically picking the top n-words.


In [None]:
def generate_sentences(prefix, beam, sampler):
  #Code does not fully work, prints different than expected but not sure what to do so alas
  # write your code
  sentences = []
  sentences.extend(prefix.split())
  words, probs = sampler(
      sentences[1],sentences[2]) #sampling top 10 pairs

  curr_sentences = []
  
  for word, prob in zip(words,probs):
    sentences.append(word)
    curr_sentences = curr_sentences + [((sentences,prob))]

  i = 0
  while i < 10:
    t = []
    for w1, p1 in curr_sentences:
      if w1[-1] != '<EOS>':
        t_w, t_p = sampler(w1[-2], w1[-1])

        for w2, p2 in zip(t_w, t_p):
          t = t + [((w1.copy() + [w2], p1*p2))]
      else:
        t = t + [((w1,p1))]
        i += 1
    
    s = []
    sentence, probs = zip(*sorted(t, reverse=True, key=lambda x: x[1])[:10])
    for word in sentence:
      sentence_string = ''
      for w in word:
        sentence_string = sentence_string + w + ' '
      s.append(sentence_string)
    return list(s), list(probs)



In [None]:
sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> trump", beam=10, sampler=top_next_word)
for sent, prob in zip(sentences, probs):
  print(sent, prob)
print("#########################\n")

sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> biden", beam=10, sampler=top_next_word)
for sent, prob in zip(sentences, probs):
  print(sent, prob)
print("#########################\n")

sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> trump", beam=10, sampler=sample_next_word)
for sent, prob in zip(sentences, probs):
  print(sent, prob)
print("#########################\n")

sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> biden", beam=10, sampler=sample_next_word)
for sent, prob in zip(sentences, probs):
  print(sent, prob)

TypeError: ignored

# 2. Semantic Parsing

In this part, you are going to build your own virtual assistant! We will be developing two modules: an intent classifier and a slot filler.

In [6]:
!ls "/content/drive/My Drive/nl2ds/semantic-parser"
parser_files = "/content/drive/My Drive/nl2ds/semantic-parser"

test_answers.txt  test_questions.txt  train_questions_answers.txt


In [7]:
import json

train_data = []
for line in open(f'{parser_files}/train_questions_answers.txt'):
    train_data.append(json.loads(line))

# print a few examples
for i in range(5):
    print(train_data[i])
    print("-"*80)

{'question': 'Add an album to my Sylvia Plath playlist.', 'intent': 'AddToPlaylist', 'slots': {'music_item': 'album', 'playlist_owner': 'my', 'playlist': 'Sylvia Plath'}}
--------------------------------------------------------------------------------
{'question': 'add Diarios de Bicicleta to my la la playlist', 'intent': 'AddToPlaylist', 'slots': {'playlist': 'Diarios de Bicicleta', 'playlist_owner': 'my', 'entity_name': 'la la'}}
--------------------------------------------------------------------------------
{'question': 'book a table at a restaurant in Lucerne Valley that serves chicken nugget', 'intent': 'BookRestaurant', 'slots': {'restaurant_type': 'restaurant', 'city': 'Lucerne Valley', 'served_dish': 'chicken nugget'}}
--------------------------------------------------------------------------------
{'question': 'add iemand als jij to my playlist named In The Name Of Blues', 'intent': 'AddToPlaylist', 'slots': {'entity_name': 'iemand als jij', 'playlist_owner': 'my', 'playlist'

In [8]:
test_questions = []
for line in open(f'{parser_files}/test_questions.txt'):
    test_questions.append(json.loads(line))

test_answers = []
for line in open(f'{parser_files}/test_answers.txt'):
    test_answers.append(json.loads(line))

# print a few examples
for i in range(5):
    print(test_questions[i])
    print(test_answers[i])
    print("-"*80)

Add an artist to Jukebox Boogie Rhythm & Blues
{'intent': 'AddToPlaylist', 'slots': {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}}
--------------------------------------------------------------------------------
Will it be rainy at Sunrise in Ramey Saudi Arabia?
{'intent': 'GetWeather', 'slots': {'condition_description': 'rainy', 'timeRange': 'Sunrise', 'city': 'Ramey', 'country': 'Saudi Arabia'}}
--------------------------------------------------------------------------------
Weather in two hours  in Uzbekistan
{'intent': 'GetWeather', 'slots': {'timeRange': 'in two hours', 'country': 'Uzbekistan'}}
--------------------------------------------------------------------------------
Will there be a cloud in VI in 14 minutes ?
{'intent': 'GetWeather', 'slots': {'condition_description': 'cloud', 'state': 'VI', 'timeRange': 'in 14 minutes'}}
--------------------------------------------------------------------------------
add nuba to my Metal Party playlist
{'intent': 

## Problem 2.1: Keyword-based intent classifier

In this part, you will build a keyword-based intent classifier. For each intent, come up with a list of keywords that are important for that intent, and then classify a given question into an intent. If an input question matches multiple intents, pick the best one. If it does not match any keyword, return None.

Caution: You are allowed to look at training questions and answers to come up with a set of keywords, but it is a bad practice to look at test answers. 

In [9]:
# List of all intents
intents = set()
for example in train_data:
    intents.add(example['intent'])
print(intents)

{'GetWeather', 'AddToPlaylist', 'BookRestaurant'}


In [10]:
import numpy as np
def predict_intent_using_keywords(question):
  # Fill in your code here.
  questions = question.split()

  resto = {"reserve","book","restaurant","table","serve","people","seat","food","drink","eat","hungry"}
  playlist = {"song","playlist","music","add","track","artist","singer","album","genre","hit","Add"}
  weather = {"weather","cool","forecast","rain","snow","rainy","snowy","warm","warmer","cold","colder","hot","hottest","sunny","fog","foggy","chill","freeze","freezing","heat"}
  
  i, res, play, weath = 0, 0, 0, 0

  for q in questions:
    if q in resto:
      res += 1
    elif q in playlist:
      play += 1
    elif q in weather:
      weath += 1

  if res >= play and res >= weath:
    return 'BookRestaurant'
  if weath >= play and weath >= res:
    return 'GetWeather'
  if play >= res and play >= weath:
    return 'AddToPlaylist'

  return None

In [11]:
from collections import Counter

'''Gives intent wise accuracy of your model'''
def evaluate_intent_accuracy(prediction_function_name):
  correct = Counter()
  total = Counter()
  for i in range(len(test_questions)):
    q = test_questions[i]
    gold_intent = test_answers[i]['intent']
    if prediction_function_name(q) == gold_intent:
      correct[gold_intent] += 1
    total[gold_intent] += 1
  for intent in intents:
    print(intent, correct[intent]/total[intent], total[intent])
    
# Evaluating the intent classifier. 
# In our implementation, a simple keyword based classifier has achieved an accuracy of greater than 65 for each intent
evaluate_intent_accuracy(predict_intent_using_keywords)

GetWeather 0.78 100
AddToPlaylist 0.95 100
BookRestaurant 1.0 100


## Problem 2.2: Statistical intent classifier

Now, let's build a statistical intent classifier. Instead of making use of keywords like what you did above, you will first extract features from a given input question. In order to build a feature representation for a given sentence, make use of word2vec embeddings of each word and take an average to represent the sentence. Then train a logistic regression. Feel free to use any libraries you like.

In [12]:
import nltk
nltk.download('word2vec_sample')

[nltk_data] Downloading package word2vec_sample to /root/nltk_data...
[nltk_data]   Unzipping models/word2vec_sample.zip.


True

In [13]:
from nltk.data import find
import gensim

word2vec_sample = str(find('models/word2vec_sample/pruned.word2vec.txt'))
word2vec_model = gensim.models.KeyedVectors.load_word2vec_format(word2vec_sample, binary=False)

In [14]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 
'''Trains a logistic regression model on the entire training data. For an input question (x), the model learns to predict an intent (Y).'''
def train_logistic_regression_intent_classifier():
    # Fill in your code here
    # Feel free to add more cells or functions if needed
    arr = []

    encoder = OneHotEncoder()
    intents = np.array([x['intent'] for x in test_answers]).reshape(-1,1)
    encoder.fit(intents)
    enc_trans = encoder.transform(intents).toarray()

    for q in test_questions:
      words = q.split()
      sum = 0
      vec = np.zeros((300,))
      for w in words:
        if w in word2vec_model.wv:
          sum += 1
          vec += word2vec_model.wv[w]
      vec /= sum
      arr.append(vec)
      
    arr = np.array(arr)

    model = LogisticRegression()
    model.fit(arr, enc_trans.argmax(-1))

    return model, encoder

model, encoder = train_logistic_regression_intent_classifier()

In [15]:
from gensim.utils import tokenize
'''For an input question, the model predicts an intent'''
def predict_intent_using_logistic_regression(question):
    # Fill in your code here
    # Feel free to add more cells or functions if needed
    words = question.split()
    vec = np.zeros((300,))
    sum = 0
    for w in words:
      if w not in word2vec_model.wv:
        continue
      vec += word2vec_model.wv[w]
      sum += 1
    vec /= sum
    pred = model.predict(vec.reshape(1,-1))
    return encoder.inverse_transform(np.array([1 if i == pred[0] else 0 for i in range(3)]).reshape(1,-1)).item()

In [16]:
# Evaluate the intent classifier
# Your intent classifier performance will be close to 100 if you have done a good job.
evaluate_intent_accuracy(predict_intent_using_logistic_regression)

GetWeather 1.0 100
AddToPlaylist 1.0 100
BookRestaurant 1.0 100


## Problem 2.3: Slot filling

Build a slot filling model. We will just work with `AddToPlaylist` intent. Ignore other intents.

Hint: No need to rely on machine learning here. You can use ideas like maximum string matching to identify which slots are active and what thier values are. This problem's solution is intentionally left underspecified.

In [None]:
# Let's stick to one target intent.
target_intent = "AddToPlaylist"

# This intent has the following slots
target_intent_slot_names = set()
for sample in train_data:
    if sample['intent'] == target_intent:
        for slot_name in sample['slots']:
            target_intent_slot_names.add(slot_name)
print(target_intent_slot_names)


# Extract all the relevant questions of this target intent from the test examples.
target_intent_questions = [] 
for i, question in enumerate(test_questions):
    if test_answers[i]['intent'] == target_intent:
        target_intent_questions.append(question)
print(len(target_intent_questions))

{'playlist_owner', 'music_item', 'artist', 'playlist', 'entity_name'}
100


In [None]:
def initialize_slots():
    slots = {}
    for slot_name in target_intent_slot_names:
        slots[slot_name] = None
    return slots

def predict_slot_values(question):
    slots = initialize_slots()    
    for slot_name in target_intent_slot_names:
        # Fill in your code to idenfity the slot value. By default, they are initialized to None.
        pass
    return slots

def evaluate_slot_prediction_recall(slot_prediction_function):
    correct = Counter()
    total = Counter()
    # predict slots for each question
    for i, question in enumerate(target_intent_questions):
        i = test_questions.index(question) # This line is added after the assignment release
        gold_slots = test_answers[i]['slots']
        predicted_slots = slot_prediction_function(question)
        for name in target_intent_slot_names:
            if name in gold_slots:
                total[name] += 1.0
                if predicted_slots.get(name, None) != None and predicted_slots.get(name).lower() == gold_slots.get(name).lower(): # This line is updated after the assignment release
                    correct[name] += 1.0
    for name in target_intent_slot_names:
        print(f"{name}: {correct[name] / total[name]}")


# Our reference implementation got these numbers. You can ask others on Slack what they got.
# music_item 1.0
# playlist 0.67
# artist  0.021739130434782608
# playlist_owner 0.9444444444444444
# entity_name 0.05555555555555555
print("Slot accuracy for your slot prediction model")
evaluate_slot_prediction_recall(predict_slot_values)


Slot accuracy for your slot prediction model
playlist_owner: 0.0
music_item: 0.0
artist: 0.0
playlist: 0.0
entity_name: 0.0


In [None]:
# Find a true positive prediction for each slot
# Fill in your code below along with printing your prediction and gold answer


In [None]:
# Find a false positive prediction for each slot
# Fill in your code below along with print statement


In [None]:
# Find a true negative prediction for each slot
# Fill in your code below along with a print statement


In [None]:
# Find a false negative prediction for each slot
# Fill in your code below along with a print statement
