# 1. Language Modeling

In this part, let's generate text using a trigram language model.

Go to https://drive.google.com/drive/folders/1pR0koayRSgXfTD72HZUHN14uec0SrnXy?usp=sharing and click **Add shortcut to Drive**. This adds the data required for this problem set to your Google Drive.

<img src="https://drive.google.com/uc?id=1LqHisiziX8Ri94Xs6Cv8mhx6vivFM3kS" alt="Drawing" height="300"/>


Run the code snippet below. It prints a URL where you can obtain an authorization code.* Enter that code below to give Colab access to your Google Drive.

*The copy button may not work. If so, copy the authorization code manually.

In [None]:
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)

Mounted at /content/drive/


When you run the `ls` command below, you should see these folders.




In [None]:
!ls "/content/drive/My Drive/nl2ds"

semantic-parser  tweets


Let's load the trigrams first. You can change the code below as you see fit.

In [None]:
from math import log

bigram_prefix_to_trigram = {}
bigram_prefix_to_trigram_weights = {}

# Stream the file line by line so the whole file is never held in memory.
with open("/content/drive/My Drive/nl2ds/tweets/covid-tweets-2020-08-10-2020-08-21.trigrams.txt") as trigram_file:
  for line in trigram_file:
    word1, word2, word3, count = line.strip().split()
    prefix = (word1, word2)
    # The two dictionaries hold parallel lists: next words and their counts.
    bigram_prefix_to_trigram.setdefault(prefix, []).append(word3)
    bigram_prefix_to_trigram_weights.setdefault(prefix, []).append(int(count))

## Problem 1.1: Retrieve top next words and their probability given a bigram prefix.

For the prefix **word1=middle, word2=of** with **n=10**, the output is:



```
a 0.807981220657277
the 0.06948356807511737
pandemic 0.023943661971830985
this 0.016901408450704224
an 0.0107981220657277
...
...
...
```
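
The probabilities in the expected output are relative frequencies: each trigram count is divided by the total count of all trigrams that share the same bigram prefix. Writing $c(\cdot)$ for the counts stored in the trigram file (notation introduced here for illustration only):

$$
P(w_3 \mid w_1, w_2) = \frac{c(w_1, w_2, w_3)}{\sum_{w'} c(w_1, w_2, w')}
$$

For example, the value 0.808 for `a` means that roughly 81% of the observed occurrences of `middle of` were followed by `a`.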



In [None]:
def top_next_word(word1, word2, n=10):
  # Return the n most frequent next words for the prefix and their probabilities.
  prefix = (word1, word2)
  words = bigram_prefix_to_trigram.get(prefix, [])
  weights = bigram_prefix_to_trigram_weights.get(prefix, [])
  total = sum(weights)  # normalizer over ALL continuations, computed once
  # Sort by count explicitly instead of relying on the order of the file.
  ranked = sorted(zip(words, weights), key=lambda pair: pair[1], reverse=True)[:n]
  top_words = [word for word, _ in ranked]
  top_probs = [weight / total for _, weight in ranked]
  return top_words, top_probs
next_words, probs = top_next_word("middle", "of", 10)
for word, prob in zip(next_words,probs):
  print(word, prob)

a 0.807981220657277
the 0.06948356807511737
pandemic 0.023943661971830985
this 0.016901408450704224
an 0.0107981220657277
covid 0.009389671361502348
nowhere 0.008450704225352112
it 0.004694835680751174
lockdown 0.002347417840375587
summer 0.002347417840375587


## Problem 1.2: Sampling n words

Sample the next n words given a bigram prefix. Use the probability distribution defined by the frequency counts. Functions like **numpy.random.choice** will be useful here. Sample without repetition; otherwise most of your samples will be the most frequent trigram.


For the prefix **word1=middle, word2=of** with **n=10**, the output could be as follows (our outputs may differ): 

```
a 0.807981220657277
pandemic 0.023943661971830985
nowhere 0.008450704225352112
the 0.06948356807511737
...
...
...
...
...
```
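
To make "sampling without repetition" concrete, here is a minimal standard-library sketch. The helper name `sample_without_replacement` and the toy counts are assumptions made for illustration; they are not part of the assignment data. Each draw is proportional to the counts of the words that are still available, and a drawn word is removed so it cannot appear twice.

```python
import random

def sample_without_replacement(words, counts, n, rng=random):
    """Draw up to n distinct words, each draw proportional to the remaining counts."""
    pool = list(zip(words, counts))
    sampled = []
    for _ in range(min(n, len(pool))):
        weights = [count for _, count in pool]
        # random.choices picks an index with probability proportional to its weight.
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        sampled.append(pool.pop(index)[0])  # remove the word so it cannot repeat
    return sampled

# Hypothetical counts for the continuations of one bigram prefix.
toy_words = ["a", "the", "pandemic", "this"]
toy_counts = [80, 10, 6, 4]
print(sample_without_replacement(toy_words, toy_counts, 3, random.Random(0)))
```

Because the number of draws is capped at the number of available words, asking for more samples than there are continuations simply returns all of them in a random order.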



In [None]:
import numpy as np

def sample_next_word(word1, word2, n=10):
  # Sample up to n distinct next words, proportionally to their counts.
  prefix = (word1, word2)
  words = bigram_prefix_to_trigram[prefix]
  weights = np.array(bigram_prefix_to_trigram_weights[prefix], dtype=float)
  all_probs = weights / weights.sum()  # normalize once
  prob_dict = dict(zip(words, all_probs))

  # Without replacement we cannot draw more words than the prefix has continuations.
  size = min(n, len(words))
  next_words = np.random.choice(words, size, p=all_probs, replace=False)

  probs = [prob_dict[word] for word in next_words]
  return next_words, probs
next_words, probs = sample_next_word("<BOS2>", "trump", 10)
for word, prob in zip(next_words, probs):
  print(word, prob)

criticizes 0.0013550135501355014
proposes 0.0013550135501355014
's 0.06639566395663957
could 0.0033875338753387536
covid 0.0020325203252032522
news 0.0033875338753387536
is 0.0989159891598916
keeps 0.0033875338753387536
should 0.008807588075880758
continues 0.0040650406504065045


## Problem 1.3: Generate sentences starting with a prefix

Generate n sentences starting with a given sentence prefix. Use [beam search](https://en.wikipedia.org/wiki/Beam_search) to generate multiple sentences. The output depends on which method you use to generate the next word. When a path generates `<EOS>`, stop exploring that path. A careless implementation may end up in an infinite loop.

If you use the method `word_generator=top_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> trump`, your output is as follows:
```
<BOS1> <BOS2> trump eyes new unproven coronavirus treatment URL <EOS> 0.00021893147502903603
<BOS1> <BOS2> trump eyes new unproven coronavirus cure URL <EOS> 0.0001719607222046247
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven therapeutic URL <EOS> 9.773272077557522e-05
...
...
...
```


If you use the method `word_generator=top_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> biden`, your output is as follows:
```
<BOS1> <BOS2> biden calls for a 30 bonus URL #cashgem #cashappfriday #stayathome <EOS> 0.0002495268686322749
<BOS1> <BOS2> biden says all u.s. governors should mandate masks <EOS> 1.6894510541025754e-05
<BOS1> <BOS2> biden says all u.s. governors question cost of a pandemic <EOS> 8.777606198953028e-07
...
...
...
```


If you use the method `word_generator=sample_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> trump`, your output may look as follows (since this is sampling, our outputs will differ):

```
<BOS1> <BOS2> trump signs executive orders URL <EOS> 7.150992253427233e-05
<BOS1> <BOS2> trump signs executive actions URL <EOS> 7.117242889600614e-05
<BOS1> <BOS2> trump news president attacked over it <EOS> 1.0546494007903964e-05
<BOS1> <BOS2> trump news president attacked over executive orders URL <EOS> 1.0126405114118984e-05
```

If you use the method `word_generator=sample_next_word`, `beam=10` and prefix is `<BOS1> <BOS2> biden`, your output may look as follows:

```
<BOS1> <BOS2> biden harris 2020 <EOS> 0.0015758924114719264
<BOS1> <BOS2> biden harris 2020 URL <EOS> 0.0006443960952032196
<BOS1> <BOS2> biden calls for evictions ban so marylander 's do it URL <EOS> 4.105215709355001e-07
<BOS1> <BOS2> biden calls for evictions ban so marylander 's do our best to stay home <EOS> 1.3158806336098573e-09
...
...
...
...
...
```

Note that sampling gives different outputs from deterministically picking the top n words.
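
The mechanics of beam search can be sketched on a tiny hand-built model. Everything in this example (the `TOY_MODEL` table, its probabilities, and the `toy_beam_search` helper) is hypothetical and only illustrates the control flow: expand every unfinished path, keep the `beam` most probable ones, and stop when all surviving paths end in `<EOS>`. A hard step limit guards against infinite loops.

```python
import heapq

# Hypothetical trigram model: (w1, w2) -> list of (next word, probability).
TOY_MODEL = {
    ("<BOS1>", "<BOS2>"): [("stay", 0.6), ("wear", 0.4)],
    ("<BOS2>", "stay"): [("home", 0.7), ("safe", 0.3)],
    ("<BOS2>", "wear"): [("masks", 1.0)],
    ("stay", "home"): [("<EOS>", 1.0)],
    ("stay", "safe"): [("<EOS>", 1.0)],
    ("wear", "masks"): [("<EOS>", 1.0)],
}

def toy_beam_search(prefix, beam=2, max_steps=20):
    """Return up to `beam` complete sentences with their probabilities, best first."""
    hypotheses = [(1.0, prefix.split())]
    for _ in range(max_steps):  # hard cap so a cyclic model cannot loop forever
        candidates = []
        for prob, words in hypotheses:
            if words[-1] == "<EOS>":
                candidates.append((prob, words))  # finished paths are not expanded
                continue
            for next_word, next_prob in TOY_MODEL.get((words[-2], words[-1]), []):
                candidates.append((prob * next_prob, words + [next_word]))
        # Keep only the `beam` most probable hypotheses.
        hypotheses = heapq.nlargest(beam, candidates, key=lambda item: item[0])
        if all(words[-1] == "<EOS>" for _, words in hypotheses):
            break
    return [(" ".join(words), prob) for prob, words in hypotheses if words[-1] == "<EOS>"]

for sentence, prob in toy_beam_search("<BOS1> <BOS2>"):
    print(sentence, prob)
```

With `beam=2`, the path ending in `stay safe` (probability 0.18) is pruned in the second step because two more probable paths exist; a wider beam would keep it.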


In [None]:
import heapq
 
class Beam(object):
    # Heap entries are tuples (prefix_probability, complete_sentence, prefix).
    # If two prefixes have equal probabilities, a complete sentence is preferred
    # over an incomplete one, since (0.5, False) < (0.5, True).
 
    def __init__(self, beam_width):
        self.heap = list()
        self.beam_width = beam_width
 
    def add(self, prob, complete, prefix):
        heapq.heappush(self.heap, (prob, complete, prefix))
        if len(self.heap) > self.beam_width:
            heapq.heappop(self.heap)
     
    def __iter__(self):
        return iter(self.heap)

def generate_sentences(prefix, sampler, beam=10, clip_len=-1):
  # Beam search over trigram continuations.
  # Assumes the prefix has at least two tokens, e.g. "<BOS1> <BOS2> trump".
  # If clip_len > 0, the search stops after that many expansion steps.
  prev_beam = Beam(beam)
  prev_beam.add(1.0, False, prefix.split())
  steps = 0
  while True:
    curr_beam = Beam(beam)
    for (prefix_prob, complete, sentence) in prev_beam:
      if complete:
        # A path that already produced <EOS> is kept but not explored further
        curr_beam.add(prefix_prob, complete, sentence)
        continue
      next_words, next_probs = sampler(sentence[-2], sentence[-1], beam)
      for word, prob in zip(next_words, next_probs):
        word = str(word)  # the sampler may return numpy strings
        curr_beam.add(prefix_prob * prob, word == "<EOS>", sentence + [word])
    steps += 1
    ranked = sorted(curr_beam, reverse=True)  # most probable first
    finished = [(prob, sent) for prob, complete, sent in ranked if complete]
    # Stop once every surviving path is complete (or the step limit is hit);
    # this also ends the search when fewer than `beam` sentences exist.
    if len(finished) == len(ranked) or (clip_len > 0 and steps >= clip_len):
      return [' '.join(sent) for _, sent in finished], [prob for prob, _ in finished]
    prev_beam = curr_beam

     

sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> trump", beam=10, sampler=top_next_word)
for sent, prob in zip(sentences, probs):
  print(sent, prob)
print("#########################\n")

sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> biden", beam=10, sampler=top_next_word)
for sent, prob in zip(sentences, probs):
  print(sent, prob)
print("#########################\n")

sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> trump", beam=10, sampler=sample_next_word)
for sent, prob in zip(sentences, probs):
  print(sent, prob)
print("#########################\n")

sentences, probs = generate_sentences(prefix="<BOS1> <BOS2> biden", beam=10, sampler=sample_next_word)
for sent, prob in zip(sentences, probs):
  print(sent, prob)

<BOS1> <BOS2> trump eyes new unproven coronavirus treatment URL <EOS> 0.00021893147502903603
<BOS1> <BOS2> trump eyes new unproven coronavirus cure URL <EOS> 0.0001719607222046247
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven therapeutic URL <EOS> 9.773272077557522e-05
<BOS1> <BOS2> trump eyes new unproven coronavirus therapeutic mypillow creator over unproven therapeutic URL <EOS> 8.212549111137046e-05
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven therapeutic URL via @USER <EOS> 7.432226908194607e-06
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven and dangerous <EOS> 5.61685494684627e-06
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by mypillow ceo over unproven and dangerous covid-19 treatment URL <EOS> 5.235550241426875e-06
<BOS1> <BOS2> trump eyes new unproven virus cure promoted by ben carson and mypillow founder and ceo of mypillow URL <EOS> 2.14

# 2. Semantic Parsing

In this part, you are going to build your own virtual assistant! We will be developing two modules: an intent classifier and a slot filler.

In [None]:
!ls "/content/drive/My Drive/nl2ds/semantic-parser"
parser_files = "/content/drive/My Drive/nl2ds/semantic-parser"

test_answers.txt  test_questions.txt  train_questions_answers.txt


In [None]:
import json

train_data = []
for line in open(f'{parser_files}/train_questions_answers.txt'):
    train_data.append(json.loads(line))

# print a few examples
for i in range(5):
    print(train_data[i])
    print("-"*80)

{'question': 'Add an album to my Sylvia Plath playlist.', 'intent': 'AddToPlaylist', 'slots': {'music_item': 'album', 'playlist_owner': 'my', 'playlist': 'Sylvia Plath'}}
--------------------------------------------------------------------------------
{'question': 'add Diarios de Bicicleta to my la la playlist', 'intent': 'AddToPlaylist', 'slots': {'playlist': 'Diarios de Bicicleta', 'playlist_owner': 'my', 'entity_name': 'la la'}}
--------------------------------------------------------------------------------
{'question': 'book a table at a restaurant in Lucerne Valley that serves chicken nugget', 'intent': 'BookRestaurant', 'slots': {'restaurant_type': 'restaurant', 'city': 'Lucerne Valley', 'served_dish': 'chicken nugget'}}
--------------------------------------------------------------------------------
{'question': 'add iemand als jij to my playlist named In The Name Of Blues', 'intent': 'AddToPlaylist', 'slots': {'entity_name': 'iemand als jij', 'playlist_owner': 'my', 'playlist'

In [None]:
test_questions = []
for line in open(f'{parser_files}/test_questions.txt'):
    test_questions.append(json.loads(line))

test_answers = []
for line in open(f'{parser_files}/test_answers.txt'):
    test_answers.append(json.loads(line))

# print a few examples
for i in range(5):
    print(test_questions[i])
    print(test_answers[i])
    print("-"*80)

Add an artist to Jukebox Boogie Rhythm & Blues
{'intent': 'AddToPlaylist', 'slots': {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}}
--------------------------------------------------------------------------------
Will it be rainy at Sunrise in Ramey Saudi Arabia?
{'intent': 'GetWeather', 'slots': {'condition_description': 'rainy', 'timeRange': 'Sunrise', 'city': 'Ramey', 'country': 'Saudi Arabia'}}
--------------------------------------------------------------------------------
Weather in two hours  in Uzbekistan
{'intent': 'GetWeather', 'slots': {'timeRange': 'in two hours', 'country': 'Uzbekistan'}}
--------------------------------------------------------------------------------
Will there be a cloud in VI in 14 minutes ?
{'intent': 'GetWeather', 'slots': {'condition_description': 'cloud', 'state': 'VI', 'timeRange': 'in 14 minutes'}}
--------------------------------------------------------------------------------
add nuba to my Metal Party playlist
{'intent': 

## Problem 2.1: Keyword-based intent classifier

In this part, you will build a keyword-based intent classifier. For each intent, come up with a list of keywords that are important for that intent, and then classify a given question into an intent. If an input question matches multiple intents, pick the best one. If it does not match any keyword, return None.

Caution: You may look at the training questions and answers to come up with a set of keywords, but it is bad practice to look at the test answers.
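
One possible way to "pick the best one" when several intents match is to count keyword hits per intent and take the maximum. The sketch below assumes small, hypothetical keyword sets (`INTENT_KEYWORDS`); real lists should be derived from the training questions. The function names are also made up for this illustration.

```python
# Hypothetical keyword lists; in practice derive them from the training questions.
INTENT_KEYWORDS = {
    "AddToPlaylist": {"add", "playlist", "song", "album"},
    "BookRestaurant": {"book", "table", "restaurant", "reservation"},
    "GetWeather": {"weather", "rainy", "forecast", "cloud"},
}

def score_intents(question):
    """Count how many keywords of each intent occur in the question."""
    # Strip trailing punctuation so that "playlist." still matches "playlist".
    tokens = {token.strip(".,?!") for token in question.lower().split()}
    return {intent: len(tokens & keywords) for intent, keywords in INTENT_KEYWORDS.items()}

def predict_intent_by_score(question):
    """Return the intent with the most keyword hits, or None if nothing matches."""
    scores = score_intents(question)
    best_intent = max(scores, key=scores.get)
    return best_intent if scores[best_intent] > 0 else None

print(predict_intent_by_score("Add the album to my playlist."))
print(predict_intent_by_score("book a table at a restaurant"))
print(predict_intent_by_score("hello there"))
```

Ties are resolved in favor of the intent listed first in `INTENT_KEYWORDS`, which is an arbitrary choice in this sketch; weighting keywords by how specific they are would be a natural refinement.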

In [None]:
# List of all intents
intents = set()
for example in train_data:
    intents.add(example['intent'])
print(intents)

{'BookRestaurant', 'GetWeather', 'AddToPlaylist'}


In [None]:
def predict_intent_using_keywords(question):
  # Later checks override earlier ones, so the priority is
  # AddToPlaylist > BookRestaurant > GetWeather.
  intent = None  # returned when no keyword matches
  q = question.lower().split()
  # 'in' and 'be' are weak weather cues; the stronger keywords below override them
  if 'in' in q or 'be' in q or 'weather' in q:
    intent = 'GetWeather'
  if 'book' in q:
    intent = 'BookRestaurant'
  if 'add' in q or 'playlist' in q:
    intent = 'AddToPlaylist'
  return intent

In [None]:
from collections import Counter

def evaluate_intent_accuracy(prediction_function_name):
  '''Prints the per-intent accuracy of your model, followed by the number of test examples.'''
  correct = Counter()
  total = Counter()
  for i in range(len(test_questions)):
    q = test_questions[i]
    gold_intent = test_answers[i]['intent']
    if prediction_function_name(q) == gold_intent:
      correct[gold_intent] += 1
    total[gold_intent] += 1
  for intent in intents:
    print(intent, correct[intent]/total[intent], total[intent])
    
# Evaluating the intent classifier. 
# In our implementation, a simple keyword-based classifier achieved an accuracy greater than 65% (0.65) for each intent
evaluate_intent_accuracy(predict_intent_using_keywords)

BookRestaurant 0.86 100
GetWeather 0.84 100
AddToPlaylist 0.9 100


## Problem 2.2: Statistical intent classifier

Now, let's build a statistical intent classifier. Instead of using keywords as you did above, you will first extract features from a given input question. To build a feature representation for a sentence, look up the word2vec embedding of each word and average them. Then train a logistic regression model on these sentence vectors. Feel free to use any libraries you like.
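
The averaging step can be illustrated with a toy embedding table. The 3-dimensional vectors in `TOY_EMBEDDINGS` and the helper `average_embedding` are hypothetical stand-ins for the real word2vec vectors; the point is only that the sentence must be split into words first and that unknown words are skipped.

```python
# Hypothetical 3-dimensional embeddings standing in for word2vec vectors.
TOY_EMBEDDINGS = {
    "add": [1.0, 0.0, 0.0],
    "song": [0.0, 1.0, 0.0],
    "weather": [0.0, 0.0, 1.0],
}

def average_embedding(sentence, embeddings, dim=3):
    """Average the vectors of the known words; unknown words are skipped."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        return [0.0] * dim  # no known word: fall back to the zero vector
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]

print(average_embedding("Add a song", TOY_EMBEDDINGS))  # [0.5, 0.5, 0.0]
```

Each question then becomes one fixed-length row of the feature matrix that the logistic regression is trained on, regardless of how many words the question contains.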

In [None]:
import nltk
nltk.download('word2vec_sample')

[nltk_data] Downloading package word2vec_sample to /root/nltk_data...
[nltk_data]   Unzipping models/word2vec_sample.zip.


True

In [None]:
from nltk.data import find
import gensim

word2vec_sample = str(find('models/word2vec_sample/pruned.word2vec.txt'))
word2vec_model = gensim.models.KeyedVectors.load_word2vec_format(word2vec_sample, binary=False)

In [None]:
train_question=[d['question'] for d in train_data] #Extract questions and intents from the training dataset 
train_intent=[d['intent'] for d in train_data]

In [None]:
from sklearn.linear_model import LogisticRegression
import numpy as np

def sentence_vector(sentence):
    '''Represents a sentence as the average of the word2vec embeddings of its words.'''
    # Split into words first; iterating over the raw string would yield characters.
    words = [word for word in sentence.split() if word in word2vec_model]
    if not words:  # no known word: fall back to a zero vector
        return np.zeros(word2vec_model.vector_size)
    return np.mean(word2vec_model[words], axis=0)

def train_logistic_regression_intent_classifier(questions, intents):
    '''Trains a logistic regression model on the entire training data. For an input question (x), the model learns to predict an intent (y).'''
    X = np.array([sentence_vector(question) for question in questions])
    lr_model = LogisticRegression()
    lr_model.fit(X, intents)
    return lr_model
lr_model=train_logistic_regression_intent_classifier(train_question,train_intent)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


In [None]:
def predict_intent_using_logistic_regression(question):
    '''For an input question, the model predicts an intent'''
    test_em = sentence_vector(question)
    # predict() expects a 2-D array and returns an array; take its single element
    return lr_model.predict(test_em.reshape(1, -1))[0]

In [None]:
# Evaluate the intent classifier
# Your intent classifier performance will be close to 100 if you have done a good job.
evaluate_intent_accuracy(predict_intent_using_logistic_regression)

GetWeather 0.91 100
BookRestaurant 0.93 100
AddToPlaylist 0.9 100


## Problem 2.3: Slot filling

Build a slot filling model. We will just work with `AddToPlaylist` intent. Ignore other intents.

Hint: There is no need to rely on machine learning here. You can use ideas like maximum string matching to identify which slots are active and what their values are. The solution to this problem is intentionally left underspecified.
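
As one reading of "maximum string matching", the sketch below scans the question from left to right and, at each position, looks for the longest phrase that is a known slot value. The `KNOWN_VALUES` table and the `longest_match_slots` helper are hypothetical; in the assignment, the table would be collected from the slot values in the training examples.

```python
# Hypothetical slot values, as they might be collected from training examples.
KNOWN_VALUES = {
    "my": "playlist_owner",
    "album": "music_item",
    "song": "music_item",
    "metal party": "playlist",
}

def longest_match_slots(question, known_values, max_len=5):
    """Greedy left-to-right matching that prefers the longest known phrase."""
    tokens = [token.strip(".,?!") for token in question.lower().split()]
    slots, i = {}, 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + length])
            if phrase in known_values:
                slots.setdefault(known_values[phrase], phrase)  # keep the first match per slot
                i += length
                break
        else:
            i += 1  # no known phrase starts at this token
    return slots

print(longest_match_slots("add a song to my Metal Party playlist", KNOWN_VALUES))
```

This only finds values that were seen before, so open-ended slots such as unseen playlist or artist names would need extra rules, for example using the words around the span ("to my ... playlist").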

In [None]:
# Let's stick to one target intent.
target_intent = "AddToPlaylist"

# This intent has the following slots
target_intent_slot_names = set()
for sample in train_data:
    if sample['intent'] == target_intent:
        for slot_name in sample['slots']:
            target_intent_slot_names.add(slot_name)
print(target_intent_slot_names)


# Extract all the relevant questions of this target intent from the test examples.
target_intent_questions = []
for i, question in enumerate(test_questions):
    if test_answers[i]['intent'] == target_intent:
        target_intent_questions.append(question)
print(target_intent_questions)

{'playlist', 'playlist_owner', 'artist', 'entity_name', 'music_item'}
['Add an artist to Jukebox Boogie Rhythm & Blues', 'add nuba to my Metal Party playlist', 'Add the album to the The Sweet Suite playlist.', 'Add a song to my playlist madden nfl 16', 'Can you put this song from Yutaka Ozaki onto my this is miles davis playlist?', 'Add give us rest to my 70s Smash Hits playlist.', 'Add this album to the Spanish Beat playlist', 'Add lofty fake anagram to the la mejor música de bso playlist.', 'Add the track to the drum & breaks playlist.', 'Add porter wagoner to the The Sleep Machine Waterscapes playlist.', 'Add Richard McNamara newest song to the Just Smile playlist', 'Add Recalled to Life to This Is Alejandro Fernández', 'Add the chris clark tune to my women of the blues playlist.', 'Put Jean Philippe Goncalves onto my running to rock 170 to 190 bpm.', 'add a song in my All Out 60s', 'Add The Famous Flower of Serving-Men to my evening acoustic playlist.', 'add tommy johnson to The Me

In [None]:
target_intent_answers = []#Build a list for answers with intent "AddToPlaylist"
for i, answer in enumerate(test_answers):
    if test_answers[i]['intent'] == target_intent:
        target_intent_answers.append(answer['slots'])

In [None]:
train_slots=[d['slots'] for d in train_data] #Retrieve slots from the training dataset 
slot_dict={} #Build a dictionary with slot values as keys and slot names as values 
for slot in train_slots:
    # Iterate over every (name, value) pair, including the last one
    for slot_name, slot_value in slot.items():
      slot_dict[slot_value.lower()]=slot_name # lowercase, because questions are lowercased before lookup


In [None]:
def initialize_slots():
    slots = {}
    for slot_name in target_intent_slot_names:
        slots[slot_name] = None
    return slots

def predict_slot_values(question):
    slots = initialize_slots() 
    q=question.lower().split() #Tokenize the question
    for word in q:
      slot_name = slot_dict.get(word) #Slot that this value filled in the training data, if any
      #Fill only slots of the target intent, keeping the first match for each slot
      if slot_name in slots and slots[slot_name] is None:
        slots[slot_name]=word
    return slots

def evaluate_slot_prediction_recall(slot_prediction_function):
    correct = Counter()
    total = Counter() 
    # predict slots for each question
    for i, question in enumerate(target_intent_questions):
        i = test_questions.index(question) # This line is added after the assignment release
        gold_slots = test_answers[i]['slots']
        predicted_slots = slot_prediction_function(question)
        for name in target_intent_slot_names:
            if name in gold_slots:
                total[name] += 1.0
                predicted = predicted_slots.get(name)
                # Case-insensitive comparison; a missing prediction never counts as correct
                if predicted is not None and predicted.lower() == gold_slots[name].lower():
                    correct[name] += 1.0
    for name in target_intent_slot_names:
        print(f"{name}: {correct[name] / total[name]}")



# Our reference implementation got these numbers. You can ask others on Slack what they got.
# music_item 1.0
# playlist 0.67
# artist  0.021739130434782608
# playlist_owner 0.9444444444444444
# entity_name 0.1111111111111111
print("Slot accuracy for your slot prediction model")
evaluate_slot_prediction_recall(predict_slot_values)


Slot accuracy for your slot prediction model
playlist: 0.84
playlist_owner: 0.9629629629629629
artist: 0.782608695652174
entity_name: 0.7222222222222222
music_item: 0.9636363636363636


In [None]:
# Find a true positive prediction for each slot
# Fill in your code below along with printing your prediction and gold answer
break_num=0
for i in range(len(target_intent_questions)):
   predicted_slot=predict_slot_values(target_intent_questions[i])
   True_POS={}#Empty dict for output

   for predicted_key, predicted_value in predicted_slot.items():
       gold_value=target_intent_answers[i].get(predicted_key)
       #True positive: a value was predicted and it matches the gold value (case-insensitive)
       if predicted_value is not None and gold_value is not None and predicted_value.lower()==gold_value.lower():
         True_POS[predicted_key]=predicted_value

   if len(True_POS)!=0 and break_num<4:#Print 4 examples and break 
    break_num+=1
    print("Question:")
    print(target_intent_questions[i])
    print("Target:") #Print target answers
    print(target_intent_answers[i])
    print("True Positive:")
    print(True_POS)
    print('-'*80)
  

Question:
Add an artist to Jukebox Boogie Rhythm & Blues
Target:
{'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}
True Positive:
{'music_item': 'artist'}
--------------------------------------------------------------------------------
Question:
add nuba to my Metal Party playlist
Target:
{'entity_name': 'nuba', 'playlist_owner': 'my', 'playlist': 'Metal Party'}
True Positive:
{'playlist_owner': 'my'}
--------------------------------------------------------------------------------
Question:
Add the album to the The Sweet Suite playlist.
Target:
{'music_item': 'album', 'playlist': 'The Sweet Suite'}
True Positive:
{'music_item': 'album'}
--------------------------------------------------------------------------------
Question:
Add a song to my playlist madden nfl 16
Target:
{'music_item': 'song', 'playlist_owner': 'my', 'playlist': 'madden nfl 16'}
True Positive:
{'playlist_owner': 'my'}
--------------------------------------------------------------------------------

In [None]:
import torch
import torch.nn as nn
a=torch.rand(1,4)
b=torch.rand(4,4)
cos = nn.CosineSimilarity(dim=1)
output = cos(a, b)
print(a)
print(b)
print(output)

tensor([[0.0523, 0.5786, 0.2013, 0.6792]])
tensor([[0.3773, 0.9794, 0.6747, 0.2990],
        [0.6708, 0.5720, 0.3575, 0.5668],
        [0.8677, 0.2762, 0.2937, 0.1248],
        [0.4726, 0.9698, 0.0773, 0.3016]])
tensor([0.7872, 0.8112, 0.3949, 0.7838])


In [None]:
# Find a false positive prediction for each slot
# Fill in your code below along with print statement
break_num=0
false_positive={}
for i in range(len(target_intent_questions)):
   predicted_slot={}
   predicted_slot=predict_slot_values(target_intent_questions[i])
   False_POS={} #Empty dict to save output
   for predicted_key in predicted_slot.keys():
       if predicted_slot[predicted_key]!=None and predicted_key not in target_intent_answers[i].keys():
         False_POS[predicted_key]=predicted_slot[predicted_key]
   if len(False_POS)!=0 and break_num<4:
    break_num+=1
    print("Question:")
    print(target_intent_questions[i])
    print("Target:")
    print(target_intent_answers[i])
    print("False Positive:")
    print(False_POS)
    print('-'*80)
   

Question:
Add an artist to Jukebox Boogie Rhythm & Blues
Target:
{'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}
False Positive:
{'playlist_owner': 'artist', 'artist': 'artist', 'entity_name': 'artist'}
--------------------------------------------------------------------------------
Question:
add nuba to my Metal Party playlist
Target:
{'entity_name': 'nuba', 'playlist_owner': 'my', 'playlist': 'Metal Party'}
False Positive:
{'artist': 'my', 'music_item': 'my'}
--------------------------------------------------------------------------------
Question:
Add the album to the The Sweet Suite playlist.
Target:
{'music_item': 'album', 'playlist': 'The Sweet Suite'}
False Positive:
{'playlist_owner': 'album', 'artist': 'album', 'entity_name': 'album'}
--------------------------------------------------------------------------------
Question:
Add a song to my playlist madden nfl 16
Target:
{'music_item': 'song', 'playlist_owner': 'my', 'playlist': 'madden nfl 16'}
False Pos

In [None]:
v1=[0,1,0,0,0]

In [None]:
# Find a true negative prediction for each slot
# Fill in your code below along with a print statement
break_num=0
for i in range(len(target_intent_questions)):
   predicted_slot={}
   predicted_slot=predict_slot_values(target_intent_questions[i])
   True_NEG={}
   target_slot={}
   for predicted_key in predicted_slot.keys():
       if predicted_slot[predicted_key]==None and predicted_key not in target_intent_answers[i].keys():
         True_NEG[predicted_key]=predicted_slot[predicted_key]
   if len(True_NEG)!=0 and break_num<4:
    break_num+=1
    print("Question:")
    print(target_intent_questions[i])
    print("Target:")
    print(target_intent_answers[i])
    print("True Negative:")
    print(True_NEG)
    print('-'*80)

Question:
Add lofty fake anagram to the la mejor música de bso playlist.
Target:
{'entity_name': 'lofty fake anagram', 'playlist': 'la mejor música de bso'}
True Negative:
{'playlist_owner': None, 'artist': None, 'music_item': None}
--------------------------------------------------------------------------------
Question:
Add porter wagoner to the The Sleep Machine Waterscapes playlist.
Target:
{'artist': 'porter wagoner', 'playlist': 'The Sleep Machine Waterscapes'}
True Negative:
{'playlist_owner': None, 'entity_name': None, 'music_item': None}
--------------------------------------------------------------------------------
Question:
Add Recalled to Life to This Is Alejandro Fernández
Target:
{'entity_name': 'Recalled to Life', 'playlist': 'This Is Alejandro Fernández'}
True Negative:
{'playlist_owner': None, 'artist': None, 'music_item': None}
--------------------------------------------------------------------------------
Question:
add tommy johnson to The MetalSucks Playlist
Targe

In [None]:
# Find a false negative prediction for each slot
# Fill in your code below along with a print statement
break_num=0
for i in range(len(target_intent_questions)):
   predicted_slot={}
   predicted_slot=predict_slot_values(target_intent_questions[i])
   False_NEG={}
   target_slot={}
   for predicted_key in predicted_slot.keys():
       if predicted_slot[predicted_key]==None and predicted_key in target_intent_answers[i].keys():
         False_NEG[predicted_key]=None
   if len(False_NEG)!=0 and break_num<4:
    break_num+=1
    print("Question:")
    print(target_intent_questions[i])
    print("Target:")
    print(target_intent_answers[i])
    print("False Negative:")
    print(False_NEG)
    print('-'*80)

Question:
Add lofty fake anagram to the la mejor música de bso playlist.
Target:
{'entity_name': 'lofty fake anagram', 'playlist': 'la mejor música de bso'}
False Negative:
{'playlist': None, 'entity_name': None}
--------------------------------------------------------------------------------
Question:
Add porter wagoner to the The Sleep Machine Waterscapes playlist.
Target:
{'artist': 'porter wagoner', 'playlist': 'The Sleep Machine Waterscapes'}
False Negative:
{'playlist': None, 'artist': None}
--------------------------------------------------------------------------------
Question:
Add Recalled to Life to This Is Alejandro Fernández
Target:
{'entity_name': 'Recalled to Life', 'playlist': 'This Is Alejandro Fernández'}
False Negative:
{'playlist': None, 'entity_name': None}
--------------------------------------------------------------------------------
Question:
add tommy johnson to The MetalSucks Playlist
Target:
{'artist': 'tommy johnson', 'playlist': 'The MetalSucks Playlist'}
