# Homework 6: Neural Coreference Resolution

**Due April 20, 2020 at 11:59pm**

In this homework,  you will be implementing parts of a Pytorch implementation for neural coreference resolution, inspired by [Lee et al.(2017), “End-to-end Neural Coreference Resolution” (EMNLP)](https://arxiv.org/pdf/1707.07045.pdf). 

### REMEMBER TO UPLOAD THE DATASET!
Click the Files icon > Upload > Upload train.conll and dev.conll that you have downloaded from bCourses: Files/HW_6

### Setup

In [0]:
import sys, re
from collections import Counter

import torch
from torch import nn
import torch.optim as optim

import numpy as np
from scipy.stats import spearmanr

We noticed that running this on CPU is faster than running on GPU. Thus, we will default to running on CPU. However, feel free to change it to GPU if you wish.

In [9]:
device = torch.device("cpu")
print("Running on {}".format(device))

Running on cpu


### Download and process data
Note: You do **not** have to modify this section.

In [45]:
!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip glove*.zip

--2020-04-21 01:42:31--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2020-04-21 01:42:31--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2020-04-21 01:42:32--  http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip’


2020-0

In [0]:
def read_conll(filename):

  docid=None
  partID=None

  # collection
  all_sents=[]
  all_ents=[]

  # for one doc
  all_doc_sents=[]
  all_doc_ents=[]

  # for one sentence
  sent=[]
  ents=[]

  named_ents=[]
  cur_tid=0
  open_count=0

  global_eid=0
  doc_eid_to_global_eid={}

  with open(filename, encoding="utf-8") as file:
    for line in file:
      if line.startswith("#begin document"):

        all_doc_ents=[]
        all_doc_sents=[]

        open_ents={}
        open_named_ents={}

        docid=None
        matcher=re.match("#begin document \((.*)\); part (.*)$", line.rstrip())
        if matcher != None:
          docid=matcher.group(1)
          partID=matcher.group(2)

      elif line.startswith("#end document"):

        all_sents.append(all_doc_sents)
        all_ents.append(all_doc_ents)

        
      else:

        parts=re.split("\s+", line.rstrip())

        # sentence boundary
        if len(parts) < 2:
    
          all_doc_sents.append(sent)

          ents=sorted(ents, key=lambda x: (x[0], x[1]))

          all_doc_ents.append(ents)

          sent=[]
          ents=[]

          cur_tid=0

          continue

        tid=cur_tid
        token=parts[3]
        cur_tid+=1

        identifier="%s.%s" % (docid, partID)

        coref=parts[-1].split("|")

        for c in coref:
          if c.startswith("(") and c.endswith(")"):
            c=re.sub("\(", "", c)
            c=int(re.sub("\)", "", c))

            if (identifier, c) not in doc_eid_to_global_eid:
              doc_eid_to_global_eid[(identifier, c)]=len(doc_eid_to_global_eid)

            ents.append((tid, tid, doc_eid_to_global_eid[(identifier, c)], identifier))

          elif c.startswith("("):
            c=int(re.sub("\(", "", c))

            if c not in open_ents:
              open_ents[c]=[]
            open_ents[c].append(tid)
            open_count+=1

          elif c.endswith(")"):
            c=int(re.sub("\)", "", c))

            assert c in open_ents

            start_tid=open_ents[c].pop()
            open_count-=1

            if (identifier, c) not in doc_eid_to_global_eid:
              doc_eid_to_global_eid[(identifier, c)]=len(doc_eid_to_global_eid)

            ents.append((start_tid, tid, doc_eid_to_global_eid[(identifier, c)], identifier))

        sent.append(token)

  return all_sents, all_ents

def load_embeddings(filename, vocab_size):
  # 0 idx is for padding
  # 1 idx is for unknown words

  # get the embedding size from the first embedding
  with open(filename, encoding="utf-8") as file:
    word_embedding_dim=len(file.readline().split(" "))-1

  vocab={"[PAD]":0, "[UNK]":1}

  print("word_embedding_dim:", word_embedding_dim)

  embeddings=np.zeros((vocab_size, word_embedding_dim))

  with open(filename, encoding="utf-8") as file:
    for idx,line in enumerate(file):

      if idx + 2 >= vocab_size:
        break

      cols=line.rstrip().split(" ")
      val=np.array(cols[1:])
      word=cols[0]
      embeddings[idx+2]=val
      vocab[word]=idx+2

  return torch.FloatTensor(embeddings), vocab

In [46]:
#embeddingFile = "glove.6B.50d.txt"
embeddingFile = "glove.6B.100d.txt"
trainFile = "train.conll"
devFile = "dev.conll"

all_sents, all_ents=read_conll(trainFile)	
dev_all_sents, dev_all_ents=read_conll(devFile)

embeddings, vocab=load_embeddings(embeddingFile, 50000)


word_embedding_dim: 100


### **Part 1. Implement B3**

In this part, you’ll implement the B3 coreference metric as discussed in class without importing external libraries. 

Recall the definition: 
$B^{_{precision}^{3}} = \frac{1}{n}\sum_{i}^{n} \frac{\left |Gold_{i} \cap  System_{i} \right |}{\left | System_{i} \right |}$
$B^{_{recall}^{3}} = \frac{1}{n}\sum_{i}^{n} \frac{\left |Gold_{i} \cap  System_{i} \right |}{\left | Gold_{i} \right |}$

You should be able to pass the sanity check b3_test() after implementing it.


In [0]:
def b3(gold, system):
  """ Calculate B3 metrics given the gold and system output
    Args:
        gold  : A dictionary that contains true refereneces. The key is a tuple, (docid, absolute_start_idx, absolute_end_idx)
                representing a target to be predicted; the value is the true reference entity id.
        system: A dictionary that contains predicted referenece. The key in gold and system should be identical; the value
                is the predicted entity generated by the model.
    Returns:
        precision, recall, F(following the formula above)

    """
  precision=0.
  recall=0.
  F = 0.
  #####
  # Your code here
  #Build a dictionary with key as entity and value as the actual word for both gold and system
  gold_ref = {}
  for key in system:
    if(gold[key] in gold_ref):
      gold_ref[gold[key]].append(key)
    else:
      gold_ref[gold[key]] = [key]
  #print(gold_ref)

  system_ref = {}
  for key in system:
    if(system[key] in system_ref):
      system_ref[system[key]].append(key)
    else:
      system_ref[system[key]] = [key]
  #print(system_ref)

  gold_mapping = {}
  for key in gold_ref:
    for mention in gold_ref[key]:
      if(mention not in gold_mapping):
        gold_mapping[mention] = gold_ref[key]
  #print(gold_mapping)

  system_mapping = {}
  for key in system_ref:
    for mention in system_ref[key]:
      if(mention not in system_mapping):
        system_mapping[mention] = system_ref[key]
  #print(system_mapping)

  #Now loop over all the mentions and find the intersections and calculate precision, recall, F
  precision = 0
  recall = 0
  F=0

  for key in gold_mapping:
    intersection = set(system_mapping[key]).intersection(set(gold_mapping[key]))
    if(len(intersection)!=0):
      precision+= len(intersection)/len(system_mapping[key])
      recall+=  len(intersection)/len(gold_mapping[key])
    
  precision = precision / len(gold_mapping)   #Note: gold_mapping and system_mapping has same number of keys
  recall = recall / len(gold_mapping)
  
  if(precision !=0 and recall !=0):
    F = 2 * precision * recall / (precision+recall)

  #####

  return precision, recall, F

In [13]:
def b3_test():
  gold={"a1":1, "a2": 2, "a3": 1, "a4":1, "a5": 3, "a6":3, "a7":2, "a8":2, "a9":1, "a10":1}
  system={"a1":5, "a2": 6, "a3": 6, "a4":6, "a5": 7, "a6":7, "a7":5, "a8":5, "a9":5, "a10":8}

  precision, recall, F=b3(gold, system)
  print("P: %.3f, R: %.3f, F: %.3f" % (precision, recall, F))

  assert abs(precision-0.667) < 0.001
  assert abs(recall-0.547) < 0.001
  assert abs(F-0.601) < 0.001
  
  print ("B3 sanity check passed")
b3_test()

P: 0.667, R: 0.547, F: 0.601
B3 sanity check passed


### **Part 2. Neural coref**
In part 2, the skeleton code for mention-ranking model is provided to you, you will not need to change any code until Part 2.1 begins. The following section provides the Mention class which is used to store relavant information about a mention and the BasicCorefModel. You will, at the very least, need to carefully read these two classes and understand the information stored in Mention and the structure of the model to complete this homework.


In [0]:
class Mention():

  """
  An object to contain information about each mention
  """

  def __init__(self, mention_id, docid, absolute_start_idx, absolute_end_idx, sentence_start_idx, sentence_end_idx, sentence, vocab):
    self.docid=docid

    # mention id (globally unique within one file, but not across different train and test files)
    self.mention_id=mention_id
    # the token index of the mention start position, measured from the beginning of the document
    self.absolute_start_idx=absolute_start_idx
    # the token index of the mention end position, measured from the beginning of the document
    self.absolute_end_idx=absolute_end_idx
    # the token index of the mention start position, measured from the beginning of the sentence
    self.sentence_start_idx=sentence_start_idx
    # the token index of the mention end position, measured from the beginning of the sentence
    self.sentence_end_idx=sentence_end_idx
    # a list of tokens for all the words in the mention's sentence
    self.sentence=sentence
    # a list of tokens ids for all the words in the mention's sentence
    self.sentence_ids=[]
    self.sentence_length=len(sentence)

    for word in sentence:
      word=word.lower()
      self.sentence_ids.append(vocab[word] if word in vocab else vocab["[UNK]"])

In [0]:
def convert_data_to_training_instances(all_sents, all_ents, vocab):
  X=[]
  Y=[]
  M=[]

  global_id=0
  truth={}

  for doc_idx, doc_ent in enumerate(all_ents):

    current_token_position=0

    existing_mentions=[]

    for sent_idx, mention_list in enumerate(doc_ent):
      sent=all_sents[doc_idx][sent_idx]

      for mention_idx, mention in enumerate(mention_list):

        start_sent_idx, end_sent_idx, entity_id, identifier=mention

        mention=Mention(global_id, identifier, current_token_position+start_sent_idx, current_token_position+end_sent_idx, start_sent_idx, end_sent_idx, sent, vocab)
        M.append(mention)
        truth[global_id]=entity_id

        global_id+=1

        x=[]
        y=[]

        for aidx, antecedent in enumerate(existing_mentions):
          x.append(antecedent)
          if truth[antecedent.mention_id] == truth[mention.mention_id]:
            y.append(aidx)

        X.append(x)
        Y.append(torch.LongTensor(y).to(device))

        existing_mentions.append(mention)

      current_token_position+=len(sent)

  return X, Y, M, truth

In [0]:
class BasicCorefModel(nn.Module):

	def __init__(self, vocab, embeddings):
		super(BasicCorefModel, self).__init__()

		self.vocab=vocab

		self.embeddings = nn.Embedding.from_pretrained(embeddings)

		_, embedding_size=embeddings.shape

		self.hidden_dim=50

		self.input_size=2 * embedding_size

		self.W1 = nn.Linear(self.input_size, self.hidden_dim)
		self.tanh=nn.Tanh()
		self.W2 = nn.Linear(self.hidden_dim, 1)	

	def scorer(self, batch_x, batch_m):

		"""
		Input: a batch containing:
			-- batch_m [list of Mention objects]: mention to resolve.  batch_m[i] contains a single Mention
			-- batch_x [list of [list of Mention objects]]: candidate antecedents. batch_x[i] contains a list of candidate antecedents for mention batch_m[i]

		Each input batch is batched to contain the same number of candidate antecedents

		Output: numpy matrix [batch_size, number_of_antecedents + 1, 1] containing scores for all antecedents
			-- for j < number_of_antecedents, output[i,j] contains the score of batch_x[i][j] being the correct antecedent for batch_m[i] 
			-- for j == number_of_antecedents, output[i,j] = 0 (the score for batch_m[i] being linked to no antecedent)

		"""

		this_batch_size=len(batch_x)
		num_ants=len(batch_x[0])

		# get representations for mentions
		lastWordID=[]

		for idx, mention in enumerate(batch_m):
			lastWordID.append(mention.sentence_ids[mention.sentence_end_idx])

		# [this_batch_size, 1, embedding_size]
		mention_LW_embeddings=self.embeddings(torch.LongTensor(lastWordID).to(device)).unsqueeze(1)

		# get representations for antecedents
		antLastWords=[]
		for idx in range(len(batch_x)):
			antWords=[]
			for ant_idx, ant in enumerate(batch_x[idx]):
				antWords.append(ant.sentence_ids[ant.sentence_end_idx])

			antLastWords.append(antWords)

		# [this_batch_size, num_ants, embedding_size]
		antecedent_LW_embeddings=self.embeddings(torch.LongTensor(antLastWords).to(device))

		# We want to generate a score for each antecedent for each mention. However,
		# mention_LW_embeddings is [this_batch_size, 1, embedding_size] while,
		# antecedent_LW_embeddings is [this_batch_size, num_ants, embedding_size].
		# So let's make a bunch of copies of mention_LW_embeddings (one for each of its candidate antecedents)

		# [this_batch_size, num_ants, embedding_size]
		mention_LW_embeddings_copies=mention_LW_embeddings.expand_as(antecedent_LW_embeddings)

		# Now that they're the same size, we can concatenate them together into one big matrix

		# [this_batch_size, num_ants, (embedding_size + embedding_size)]
		all_features=torch.cat([mention_LW_embeddings_copies, antecedent_LW_embeddings], 2)
		
		# [this_batch_size, num_ants, 1]
		preds=self.W2(self.tanh(self.W1(all_features))).squeeze(-1)

		# Let's fix the score for starting a new entity to be 0; all of the other scores for candidate antecedents will end up 
		# being relative to that.

		# [this_batch_size, 1]
		zeros=torch.FloatTensor(np.zeros((this_batch_size, 1))).to(device)

		# [this_batch_size, num_ants + 1, 1]		
		preds=torch.cat((preds, zeros), 1)

		return preds

In [0]:
######### HELPER FUNCTION FOR TRAINING STARTS #########
#########  DONT'T EDIT THIS SECTION OF CODE   #########
def forward_predict(batch_x, batch_m, scoring_function):

  this_batch_size=len(batch_x)
  num_ants=len(batch_x[0])

  # if this batch has no antecedents, then it must start a new entity
  if num_ants == 0:
    return torch.LongTensor([0]*this_batch_size)
  
  # get predictions
  preds=scoring_function(batch_x, batch_m)

  # 
  arg_sorts=torch.argsort(preds, descending=True, dim=1)
  tops=arg_sorts[:,0]

  return tops


def forward_train(batch_x, batch_y, batch_m, scoring_function):

  num_batch=len(batch_x)
  num_ants=len(batch_x[0])

  # if this batch has no candidate antecedents, then each mention must start a new entity so there is only only choice we could make (hence no loss)
  if num_ants == 0:
    return None

  preds=scoring_function(batch_x, batch_m)
  preds_sum=torch.logsumexp(preds, 1)

  running_loss=None


  for i in range(num_batch):

    # optimize marginal log-likelihood of true antecedents
    if batch_y[i].nelement() == 0:
      golds_sum=0.
    else:
      golds=torch.index_select(preds[i], 0, batch_y[i])
      golds_sum=torch.logsumexp(golds, 0)

    diff=preds_sum[i]-golds_sum

    running_loss = diff if running_loss is None else running_loss + diff

  return running_loss

def get_batches(X, Y, M, batchsize):
  sizes={}
  for i in range(len(M)):
    size=len(X[i])
    if size not in sizes:
      sizes[size]=[]
    sizes[size].append((X[i], Y[i], M[i]))

  batches=[]

  for size in sizes:
    i=0
    while (i < len(sizes[size])):

      data=sizes[size][i:i+batchsize]
      batch_x=[]
      batch_y=[]
      batch_m=[]
      for x, y, m in data:
        batch_x.append(x)
        batch_y.append(y)
        batch_m.append(m)

      batches.append((batch_x, batch_y, batch_m))
      i+=batchsize

  return batches


def train(X, Y, M, train_gold, test_X, test_Y, test_M, test_gold, model):

  batches=get_batches(X, Y, M, 32)
  test_batches=get_batches(test_X, test_Y, test_M, 32)
  optimizer = optim.Adam(model.parameters(), lr=0.001)

  for epoch in range(10):

    model.train()
    # train
    bigloss=0.
    for batch_x, batch_y, batch_m in batches:
      model.zero_grad()
      loss=forward_train(batch_x, batch_y, batch_m, model.scorer)
      if loss is not None:
        loss.backward()
        optimizer.step()
        bigloss+=loss

    # evaluate
    model.eval()

    gold={}
    predicted={}

    eid=0
    tot=0

    for batch_x, batch_y, batch_m in test_batches:
      predictions=forward_predict(batch_x, batch_m, model.scorer)

      for idx, mention in enumerate(batch_m):

        gold[mention.docid, mention.absolute_start_idx, mention.absolute_end_idx]=test_gold[mention.mention_id]
        prediction=predictions[idx]
        tot+=1
      
        # prediction is to start a new entity
        if prediction >= len(batch_x[idx]):
          predicted[mention.docid, mention.absolute_start_idx, mention.absolute_end_idx]=eid
          eid+=1

        # prediction is to link to a previous mention
        else:

          best_antecedent=batch_x[idx][prediction]
          predicted_entity_id=predicted[best_antecedent.docid, best_antecedent.absolute_start_idx, best_antecedent.absolute_end_idx]
          predicted[mention.docid, mention.absolute_start_idx, mention.absolute_end_idx]=predicted_entity_id

    P, R, F=b3(gold, predicted)
    print("loss: %.3f, B3 F: %.3f, unique entities: %s, num mentions: %s" % (bigloss, F, eid, tot))

def set_seed(seed):
  """
  Sets random seeds and sets model in deterministic
  training mode. Ensures reproducible results
  """
  torch.manual_seed(seed)
  torch.backends.cudnn.deterministic = True
  torch.backends.cudnn.benchmark = False
  np.random.seed(seed)
######### HELPER FUNCTION FOR TRAINING ENDS #########
#########  DONT'T EDIT THIS SECTION OF CODE   #########

Now, everything is set up to run the BasicCorefModel. Let's run the cell below to train the model and look at the result of the model.

In [21]:
X, Y, M, train_truth=convert_data_to_training_instances(all_sents, all_ents, vocab)
dev_X, dev_Y, dev_M, dev_truth=convert_data_to_training_instances(dev_all_sents, dev_all_ents, vocab)
set_seed(159)
model=BasicCorefModel(vocab, embeddings)
model=model.to(device)
print ("Training BasicCorefModel")
train(X, Y, M, train_truth, dev_X, dev_Y, dev_M, dev_truth, model)

Training BasicCorefModel
loss: 41274.824, B3 F: 0.764, unique entities: 29578, num mentions: 29597
loss: 33196.121, B3 F: 0.764, unique entities: 29569, num mentions: 29597
loss: 29578.076, B3 F: 0.765, unique entities: 29323, num mentions: 29597
loss: 27773.998, B3 F: 0.771, unique entities: 28797, num mentions: 29597
loss: 26788.645, B3 F: 0.779, unique entities: 27948, num mentions: 29597
loss: 26151.428, B3 F: 0.782, unique entities: 27427, num mentions: 29597
loss: 25682.391, B3 F: 0.783, unique entities: 27372, num mentions: 29597
loss: 25319.143, B3 F: 0.786, unique entities: 26932, num mentions: 29597
loss: 25024.688, B3 F: 0.789, unique entities: 26854, num mentions: 29597
loss: 24776.920, B3 F: 0.793, unique entities: 26450, num mentions: 29597


### **Part 2.1 Incorporate distance**

In this part, you should incorporate the word distance information to BasicCorefModel described in the HW. The below code structure provided to you is exactly the same as BasicCorefModel, your job is to add code into both __init__() and scorer() functions as you see fit.

Hint: You might consider initialize distance embedding in __init__() function, then concatenate the original embedding and the corresponding distance embedding in scorer(). 

After implementing this, run the sanity check, test_distance(), provided to you.

In [0]:
class DistanceCorefModel(nn.Module):

	""" The code provided here starts out as just a copy of BasicCorefModel """

	def __init__(self, vocab, embeddings):
		super(DistanceCorefModel, self).__init__()
		self.vocab=vocab
		self.embeddings = nn.Embedding.from_pretrained(embeddings)
		dist_embeddings=torch.FloatTensor(np.eye(10)) #There are 10 buckets
		self.dist_embeddings = nn.Embedding.from_pretrained(dist_embeddings, freeze = False) #Trainable
		
		_, embedding_size= embeddings.shape
		_, dist_embed_size = dist_embeddings.shape
		
		self.hidden_dim=50
		self.input_size= 2 * embedding_size + dist_embed_size
		self.W1 = nn.Linear(self.input_size, self.hidden_dim)
		self.tanh=nn.Tanh()
		self.W2 = nn.Linear(self.hidden_dim, 1)	

	def scorer(self, batch_x, batch_m):

		"""
		Input: a batch containing:
			-- batch_m [list of Mention objects]: mention to resolve.  batch_m[i] contains a single Mention
			-- batch_x [list of [list of Mention objects]]: candidate antecedents. batch_x[i] contains a list of candidate antecedents for mention batch_m[i]

		Each input batch is batched to contain the same number of candidate antecedents

		Output: numpy matrix [batch_size, number_of_antecedents + 1, 1] containing scores for all antecedents
			-- for j < number_of_antecedents, output[i,j] contains the score of batch_x[i][j] being the correct antecedent for batch_m[i] 
			-- for j == number_of_antecedents, output[i,j] = 0 (the score for batch_m[i] being linked to no antecedent)

		"""
		device = torch.device("cpu")

		this_batch_size=len(batch_x)
		num_ants=len(batch_x[0])

		# get representations for mentions
		lastWordID=[]

		for idx, mention in enumerate(batch_m):
			lastWordID.append(mention.sentence_ids[mention.sentence_end_idx])

		# [this_batch_size, 1, embedding_size]
		mention_LW_embeddings=self.embeddings(torch.LongTensor(lastWordID).to(device)).unsqueeze(1)

		# get representations for antecedents
		antLastWords=[]
		for idx in range(len(batch_x)):
			antWords=[]
			for ant_idx, ant in enumerate(batch_x[idx]):
				antWords.append(ant.sentence_ids[ant.sentence_end_idx])
			antLastWords.append(antWords)
	 

		# [this_batch_size, num_ants, embedding_size]
		antecedent_LW_embeddings=self.embeddings(torch.LongTensor(antLastWords).to(device))

		# We want to generate a score for each antecedent for each mention. However,
		# mention_LW_embeddings is [this_batch_size, 1, embedding_size] while,
		# antecedent_LW_embeddings is [this_batch_size, num_ants, embedding_size].
		# So let's make a bunch of copies of mention_LW_embeddings (one for each of its candidate antecedents)

		# [this_batch_size, num_ants, embedding_size]
		mention_LW_embeddings_copies=mention_LW_embeddings.expand_as(antecedent_LW_embeddings)
		# Now that they're the same size, we can concatenate them together into one big matrix

		#Adding distance feature
		all_mention_dist=[]
		for idx, mention in enumerate(batch_m):
			mention_id = mention.absolute_end_idx
			mention_dist=[]

			for ant in batch_x[idx]:
				ant_id = ant.absolute_end_idx
				dist = abs(mention_id - ant_id)
				if(dist<5):
					mention_dist.append(dist)
				elif(dist<8):
					mention_dist.append(5)
				elif(dist<16):
					mention_dist.append(6)
				elif(dist<32):
					mention_dist.append(7)
				elif(dist<64):
					mention_dist.append(8)
				else:
					mention_dist.append(9)
			all_mention_dist.append(mention_dist)
		 
		dist_result_embeddings=self.dist_embeddings(torch.LongTensor(all_mention_dist).to(device)) 		 
		
		# [this_batch_size, num_ants, (embedding_size + embedding_size + distance_embedding_size)]
		all_features=torch.cat([mention_LW_embeddings_copies, antecedent_LW_embeddings, dist_result_embeddings], 2)
	
		# [this_batch_size, num_ants, 1]
		preds=self.W2(self.tanh(self.W1(all_features))).squeeze(-1)

		# Let's fix the score for starting a new entity to be 0; all of the other scores for candidate antecedents will end up 
		# being relative to that.

		# [this_batch_size, 1]
		zeros=torch.FloatTensor(np.zeros((this_batch_size, 1))).to(device)

		# [this_batch_size, num_ants + 1, 1]		
		preds=torch.cat((preds, zeros), 1)

		return preds

In [0]:
def test_distance(model):
  batch_x=[]
  maxLen=100
  for i in range(maxLen):
    mention=Mention(i, "testdoc", i, i+1, 0, 1, ["John", "Smith", "is", "a", "person"], model.vocab)
    batch_x.append(mention)

  mention=Mention(maxLen, "testdoc", maxLen, maxLen, 0, 0, ["He", "is", "a", "person"], model.vocab)

  preds=model.scorer([batch_x], [mention])
  preds=preds.detach().cpu().numpy()[0]
  spearman, _=spearmanr(preds, np.arange(len(preds)))
  print("Distance check: %.3f" % spearman)
  with open("distance_predictions.txt", "w", encoding="utf-8") as out:
    out.write(' '.join(["%.5f" % x for x in preds]))

In [32]:
set_seed(159)
model=DistanceCorefModel(vocab, embeddings)
model=model.to(device)
print ("Training DistanceCorefModel")
train(X, Y, M, train_truth, dev_X, dev_Y, dev_M, dev_truth, model)
test_distance(model)

Training DistanceCorefModel
loss: 38149.914, B3 F: 0.773, unique entities: 27258, num mentions: 29597
loss: 31470.498, B3 F: 0.793, unique entities: 25529, num mentions: 29597
loss: 27259.770, B3 F: 0.801, unique entities: 24760, num mentions: 29597
loss: 24698.734, B3 F: 0.810, unique entities: 24304, num mentions: 29597
loss: 23273.674, B3 F: 0.813, unique entities: 24084, num mentions: 29597
loss: 22417.852, B3 F: 0.815, unique entities: 23925, num mentions: 29597
loss: 21831.350, B3 F: 0.817, unique entities: 23835, num mentions: 29597
loss: 21388.816, B3 F: 0.818, unique entities: 23737, num mentions: 29597
loss: 21033.633, B3 F: 0.818, unique entities: 23649, num mentions: 29597
loss: 20734.506, B3 F: 0.819, unique entities: 23599, num mentions: 29597
Distance check: 0.890


### **Part 2.2 Design a fancier model**
Here comes the fun part! After completing DistanceCorefModel, you have certain degree of familiarity with the model architecture. In the section, you will be implementing a fancier model using any features you'd like. Feel free to make changes to the architecture you see fit.

Submit this notebook to gradescope and a writeup file "fancymodel.txt" describing your model and the features you use.
**Your code must implement exactly what you describe in your writeup**

In [0]:
class FancyCorefModel(nn.Module):

	""" The code provided here starts out as just a copy of BasicCorefModel """

	def __init__(self, vocab, embeddings):
		super(FancyCorefModel, self).__init__()

		self.vocab=vocab
		self.embeddings = nn.Embedding.from_pretrained(embeddings)

		#Distance embeddings
		dist_embeddings=torch.FloatTensor(np.eye(10)) #There are 10 buckets
		self.dist_embeddings = nn.Embedding.from_pretrained(dist_embeddings, freeze = False) #Trainable

		#Additional features: Gender, Number(Singular vs Plural), Use better pretrained embeddding
		gender_embeddings = torch.FloatTensor(np.eye(4)) #NGender:Male, Female, Neuter, Unknown
		self.gender_embeddings = nn.Embedding.from_pretrained(gender_embeddings, freeze = False) #Trainable
		number_embeddings = torch.FloatTensor(np.eye(3)) #Number:Singular, Plural, Unknown
		self.number_embeddings = nn.Embedding.from_pretrained(number_embeddings, freeze = False) #Trainable
		
		_, embedding_size= embeddings.shape
		_, dist_embed_size = dist_embeddings.shape
		_, gender_embed_size = gender_embeddings.shape
		_, number_embed_size = number_embeddings.shape
		
		self.hidden_dim= 128
		self.input_size= 2 * (embedding_size + gender_embed_size + number_embed_size) + dist_embed_size
		#self.lstm = nn.LSTM(self.input_size, self.hidden_dim, bidirectional=False, dropout = 0.2)
		self.W1 = nn.Linear(self.input_size, self.hidden_dim)
		#self.W1 = nn.Linear(self.hidden_dim, self.hidden_dim)
		self.tanh=nn.Tanh()
		self.W2 = nn.Linear(self.hidden_dim, 1)	

	def scorer(self, batch_x, batch_m):

		"""
		Input: a batch containing:
			-- batch_m [list of Mention objects]: mention to resolve.  batch_m[i] contains a single Mention
			-- batch_x [list of [list of Mention objects]]: candidate antecedents. batch_x[i] contains a list of candidate antecedents for mention batch_m[i]

		Each input batch is batched to contain the same number of candidate antecedents

		Output: numpy matrix [batch_size, number_of_antecedents + 1, 1] containing scores for all antecedents
			-- for j < number_of_antecedents, output[i,j] contains the score of batch_x[i][j] being the correct antecedent for batch_m[i] 
			-- for j == number_of_antecedents, output[i,j] = 0 (the score for batch_m[i] being linked to no antecedent)

		"""

		this_batch_size=len(batch_x)
		num_ants=len(batch_x[0])

		# get representations for mentions
		lastWordID=[]
		mention_gender=[]
		mention_number=[]
		male_pronouns=['he', 'him', 'himself', 'his', 'mister', 'mr.']
		female_pronouns=['she', 'her', 'hers', 'herself', 'miss', 'mrs.', 'ms.']
		neuter_pronouns=['inc.', 'ltd.', 'it', 'thing', 'it', 'its']
		singular_pronouns=['mine', 'yours', 'hers', 'his', 'her', 'my', 'myself', 'herself', 'himself', 'it', 'I', 'me', 'a', 'an', 'this']
		plural_pronouns = ['they', 'we', 'both', 'their', 'them', 'us', 'those', 'these', 'some']
		for idx, mention in enumerate(batch_m):
			lastWordID.append(mention.sentence_ids[mention.sentence_end_idx])
			#mention_entity=' '.join(z.lower() for z in mention.sentence[mention.sentence_start_idx : mention.sentence_end_idx+1])
			mention_entity=[x.lower() for x in mention.sentence[mention.sentence_start_idx : mention.sentence_end_idx+1]]
			#Check Gender
			if(any([x in mention_entity for x in male_pronouns])):
				mention_gender.append(0)
			elif(any([x in mention_entity for x in female_pronouns])):
				mention_gender.append(1)
			elif(any([x in mention_entity for x in neuter_pronouns])):
				mention_gender.append(2)
			else:
				mention_gender.append(3)
			
			#Check number agreement
			if(any([x in mention_entity for x in singular_pronouns])):
				mention_number.append(0)
			elif(any([x in mention_entity for x in plural_pronouns])):
				mention_number.append(1)
			else:
				mention_number.append(2)

			




		# [this_batch_size, 1, embedding_size]
		mention_LW_embeddings=self.embeddings(torch.LongTensor(lastWordID).to(device)).unsqueeze(1)
		mention_gender_embeddings=self.gender_embeddings(torch.LongTensor(mention_gender).to(device)).unsqueeze(1)	
		mention_number_embeddings=self.number_embeddings(torch.LongTensor(mention_number).to(device)).unsqueeze(1)

		# get representations for antecedents
		antLastWords=[]
		ant_gender_all=[]
		ant_number_all=[]
		for idx in range(len(batch_x)):
			antWords=[]
			ant_gender=[]
			ant_number=[]
			for ant_idx, ant in enumerate(batch_x[idx]):
				antWords.append(ant.sentence_ids[ant.sentence_end_idx])
				ant_entity=[x.lower() for x in ant.sentence[ant.sentence_start_idx : ant.sentence_end_idx+1]]
				#Check Gender
				if(any([x in ant_entity for x in male_pronouns])):
					ant_gender.append(0)
				elif(any([x in ant_entity for x in female_pronouns])):
					ant_gender.append(1)
				elif(any([x in ant_entity for x in neuter_pronouns])):
					ant_gender.append(2)
				else:
					ant_gender.append(3)
				
				#Check number agreement
				if(any([x in ant_entity for x in singular_pronouns])):
					ant_number.append(0)
				elif(any([x in ant_entity for x in plural_pronouns])):
					ant_number.append(1)
				else:
					ant_number.append(2)
		 
			antLastWords.append(antWords)
			ant_gender_all.append(ant_gender)
			ant_number_all.append(ant_number)

		# [this_batch_size, num_ants, embedding_size]
		antecedent_LW_embeddings=self.embeddings(torch.LongTensor(antLastWords).to(device))
		antecedent_gender_embeddings=self.gender_embeddings(torch.LongTensor(ant_gender_all).to(device))
		antecedent_number_embeddings=self.number_embeddings(torch.LongTensor(ant_number_all).to(device))

		# We want to generate a score for each antecedent for each mention. However,
		# mention_LW_embeddings is [this_batch_size, 1, embedding_size] while,
		# antecedent_LW_embeddings is [this_batch_size, num_ants, embedding_size].
		# So let's make a bunch of copies of mention_LW_embeddings (one for each of its candidate antecedents)

		# [this_batch_size, num_ants, embedding_size]
		mention_LW_embeddings_copies=mention_LW_embeddings.expand_as(antecedent_LW_embeddings)
		mention_gender_embeddings_copies=mention_gender_embeddings.expand_as(antecedent_gender_embeddings)	
		mention_number_embeddings_copies=mention_number_embeddings.expand_as(antecedent_number_embeddings)

		# Now that they're the same size, we can concatenate them together into one big matrix
		#Adding distance feature
		all_mention_dist=[]
		for idx, mention in enumerate(batch_m):
			mention_id = mention.absolute_end_idx
			mention_dist=[]
			for ant in batch_x[idx]:
				ant_id = ant.absolute_end_idx
				dist = abs(mention_id - ant_id)
				if(dist<5):
					mention_dist.append(dist)
				elif(dist<8):
					mention_dist.append(5)
				elif(dist<16):
					mention_dist.append(6)
				elif(dist<32):
					mention_dist.append(7)
				elif(dist<64):
					mention_dist.append(8)
				else:
					mention_dist.append(9)
			all_mention_dist.append(mention_dist)
		 
		dist_result_embeddings=self.dist_embeddings(torch.LongTensor(all_mention_dist).to(device)) 		 	



		# [this_batch_size, num_ants, (embedding_size + embedding_size + distance_embedding_size)]
		all_features=torch.cat([mention_LW_embeddings_copies, antecedent_LW_embeddings, mention_gender_embeddings_copies, antecedent_gender_embeddings,mention_number_embeddings_copies,antecedent_number_embeddings,dist_result_embeddings], 2)
		
		# [this_batch_size, num_ants, 1]
		#packed_output, _ = self.lstm(all_features)
		#preds=self.W2(self.tanh(self.W1(packed_output))).squeeze(-1)		
		preds=self.W2(self.tanh(self.W1(all_features))).squeeze(-1)

		# Let's fix the score for starting a new entity to be 0; all of the other scores for candidate antecedents will end up 
		# being relative to that.

		# [this_batch_size, 1]
		zeros=torch.FloatTensor(np.zeros((this_batch_size, 1))).to(device)

		# [this_batch_size, num_ants + 1, 1]		
		preds=torch.cat((preds, zeros), 1)

		return preds

In [62]:
model=FancyCorefModel(vocab, embeddings)
model=model.to(device)

print ("Training FancyCorefModel")
train(X, Y, M, train_truth, dev_X, dev_Y, dev_M, dev_truth, model)

Training FancyCorefModel
loss: 34787.516, B3 F: 0.793, unique entities: 25378, num mentions: 29597
loss: 26634.689, B3 F: 0.813, unique entities: 24364, num mentions: 29597
loss: 22444.150, B3 F: 0.824, unique entities: 23718, num mentions: 29597
loss: 20458.004, B3 F: 0.827, unique entities: 23347, num mentions: 29597
loss: 19305.000, B3 F: 0.831, unique entities: 23000, num mentions: 29597
loss: 18466.619, B3 F: 0.833, unique entities: 22795, num mentions: 29597
loss: 17778.592, B3 F: 0.834, unique entities: 22640, num mentions: 29597
loss: 17177.316, B3 F: 0.833, unique entities: 22526, num mentions: 29597
loss: 16631.020, B3 F: 0.833, unique entities: 22516, num mentions: 29597
loss: 16125.800, B3 F: 0.833, unique entities: 22442, num mentions: 29597
