# AIR - Exercise in Google Colab

## Colab Preparation

Open via Google Drive -> right-click the notebook -> Open with -> Colab

**Get a GPU**

Toolbar -> Runtime -> Change Runtime Type -> GPU

**Mount Google Drive**

* Download the data and clone your GitHub repo into your Google Drive folder
* Use Google Drive as the connection between GitHub and Colab (direct GitHub access also works, but re-submitting credentials might be annoying)
* Commit to GitHub locally from the synced Drive folder

**Keep Alive**

Colab tends to disconnect you during long training runs. This might help: https://medium.com/@shivamrawat_756/how-to-prevent-google-colab-from-disconnecting-717b88a128c0

**Get Started**

Run the following cells to mount Google Drive and install the needed Python packages. PyTorch comes pre-installed.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [47]:
import pandas as pd
import numpy as np
from statistics import mean

from core_metrics import compute_f1, compute_exact

In [17]:
# Reading the data. Rows have a variable number of tab-separated fields, so the
# column names are fixed up front; malformed lines are skipped with a warning.
# (on_bad_lines replaces the deprecated error_bad_lines and needs pandas >= 1.3.)
answers = pd.read_csv("../msmarco-fira-21.qrels.qa-answers.tsv", sep='\t', on_bad_lines='warn', names=list(range(40)))
tuples = pd.read_csv("../msmarco-fira-21.qrels.qa-tuples.tsv", sep='\t', on_bad_lines='warn', names=list(range(45)))


In [18]:
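# Data preparation/formatting
# Join the string cells from column 4 onward into a single text, stopping at
# the first non-string (NaN) cell, and store the result in column 3.

for y in range(len(answers)):
  res = answers.loc[y,4]
  for x in range(5,40):
    if not isinstance(answers.loc[y,x], str):
      answers.loc[y,3] = res
      break
    res += " " + answers.loc[y,x]

# Keep only columns 0 to 3
answers = answers.loc[:,0:3]

# Same procedure for the tuples: join columns 6 onward into column 5
for y in range(len(tuples)):
  res = tuples.loc[y,6]
  for x in range(7,44):
    if not isinstance(tuples.loc[y,x], str):
      tuples.loc[y,5] = res
      break
    res += " " + tuples.loc[y,x]

# Keep only columns 0 to 5
tuples = tuples.loc[:,0:5]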
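
The row-wise joining in this cell can also be expressed as a small helper. The following is a minimal sketch on plain Python lists, not the notebook's own code; `join_until_missing` is a hypothetical name introduced for illustration. Unlike the loops in the cell, it also returns a result when every cell in the row is a string.

```python
import math

def join_until_missing(cells):
    """Join the leading string cells with spaces, stopping at the first non-string."""
    parts = []
    for cell in cells:
        if not isinstance(cell, str):  # pandas represents empty cells as float NaN
            break
        parts.append(cell)
    return " ".join(parts)

# Hypothetical row: two text fragments, then an empty cell
row = ["the formation", "of a mental image", math.nan, "ignored"]
joined = join_until_missing(row)
print(joined)
```

With a DataFrame, such a helper could be applied per row to the relevant column slice instead of indexing cell by cell.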

In [None]:
!pip install transformers

In [None]:
# Predicting answers using Roberta Base Squad 2
from transformers import RobertaTokenizer, RobertaForQuestionAnswering
import torch

tokenizer = RobertaTokenizer.from_pretrained("deepset/roberta-base-squad2")
model = RobertaForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")

results = []

for i in range(len(tuples)):
  # Column 3 holds the question, column 4 the passage text
  question, text = tuples.loc[i,3], tuples.loc[i,4]
  inputs = tokenizer(question, text, return_tensors="pt")

  # Inference only, so no gradients are needed
  with torch.no_grad():
    outputs = model(**inputs)

  # Take the most likely start and end token positions independently
  answer_start_index = outputs.start_logits.argmax()
  answer_end_index = outputs.end_logits.argmax()

  # Decode the predicted token span back into text
  predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
  answer = tokenizer.decode(predict_answer_tokens)
  results.append(answer)
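
Taking the start and end argmax independently, as in the loop above, can produce an end index that lies before the start index, in which case the token slice is empty. One common alternative is to score every valid span jointly. The following is a minimal sketch on plain Python lists, not what this notebook ran; `best_span` and `max_answer_len` are hypothetical names and not part of the transformers API. It falls back to position 0, which for this model is the `<s>` token.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logit + end_logit with start <= end."""
    best = (0, 0)
    best_score = start_logits[0] + end_logits[0]
    for s, s_logit in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Toy logits: independent argmax would pick start=3 and end=0 (an invalid span)
start_logits = [0.1, 2.0, 0.3, 3.0]
end_logits = [4.0, 0.2, 3.0, 0.1]
span = best_span(start_logits, end_logits)
print(span)
```

With real model outputs, the logits would first be converted to lists, for example via `outputs.start_logits[0].tolist()`.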

In [19]:
# Appending the predicted answers to the tuples table
tuples[6] = results

In [29]:
# Inspecting results
tuples.head(10)

| Unnamed: 0 | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 135386 | 100163 | 3 | definition of imagination | imagination - the formation of a mental image ... | the formation of a mental image of something t... |
| 1 | 290779 | 101026 | 3 | how many oscars has clint eastwood won?pdrijgh... | Clint Eastwood -- five-time Oscar winner and e... | five |
| 2 | 21741 | 1021598 | 3 | are cold sores and fever blisters the same | Cold sores, sometimes called fever blisters, a... | Cold sores, sometimes called fever blisters |
| 3 | 810210 | 1029662 | 3 | what is the cause of blood in the stool | Having blood in the stool can be the result of... | wide variety of conditions, such as hemorrhoid... |
| 4 | 1097448 | 103635 | 3 | how many calories in slim fast shakes | The chocolate-flavored shake contains 190, whi... | chocolate-flavored shake contains 190 Cafe Cla... |
| 5 | 36133 | 103776 | 3 | average download speed | So what’s the average US Internet speed? Accor... | 8.6 Mbps |
| 6 | 85018 | 1042657 | 2 | causes for shingles | Shingles is caused by the varicella-zoster vir... | caused by the varicella-zoster virus chickenpox |
| 7 | 987100 | 10462 | 2 | where is magma found within our earth | Magma (from Ancient Greek μάγμα (mágma) meanin... | beneath the surface of the Earth |
| 8 | 709560 | 1050990 | 3 | what is all in basic metabolic panel | Basic Metabolic Panel. The basic metabolic pan... | seven to eight blood tests thatmeasure certain... |
| 9 | 285729 | 1054505 | 2 | how many hours are in fmla | FMLA Eligibility. Employees are considered to ... | 1,250 hours |


In [16]:
# Counting questions without predicted answers
len(tuples[tuples[6]=="<s>"])

11971

In [55]:
# Returning the number of total questions
len(tuples)

52606
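
Taken together, the two counts above give the share of questions for which the prediction is just `<s>`. A quick check of the arithmetic, using only the numbers printed above:

```python
unanswered = 11971  # predictions equal to "<s>"
total = 52606       # all question-passage tuples

share = unanswered / total
print(f"{share:.1%} of the questions have no predicted answer")  # 22.8% ...
```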

In [31]:
# Because the question-answering model takes a long time to run, the predictions
# were saved to "task3_results.txt" (one per line) and are reloaded here.
with open("../task3_results.txt", "r") as f:
    results = f.read().split("\n")[:-1]
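
The cell that wrote the results file is not part of this notebook. A minimal sketch of a writer that matches the one-prediction-per-line format read above could look as follows; `save_results` and `load_results` are hypothetical helper names, and replacing embedded newlines with spaces is an assumption made so that each prediction stays on one line.

```python
import os
import tempfile

def save_results(results, path):
    """Write one prediction per line; embedded newlines would break the format."""
    with open(path, "w", encoding="utf-8") as f:
        for answer in results:
            f.write(answer.replace("\n", " ") + "\n")

def load_results(path):
    """Read the predictions back; the final split element is empty and dropped."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read().split("\n")[:-1]

# Round trip in a temporary directory, including an empty prediction
demo = ["five", "<s>", "", "8.6 Mbps"]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_results.txt")
    save_results(demo, demo_path)
    restored = load_results(demo_path)
print(restored == demo)
```

Writing a trailing newline after every prediction is what makes the `[:-1]` slice in the reader correct, including when the last prediction is empty.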

In [32]:
tuples[6] = results

In [34]:
# Computing F1 scores using the answers from the provided FiRA gold-label pairs
f1_tuples = []
for i in range(len(tuples)):
    res = compute_f1(tuples.loc[i,5] , tuples.loc[i,6])
    f1_tuples.append(res)

In [36]:
# Computing exact matching using the answers from provided FiRA gold-label pairs
compute_exact_tuples = []
for i in range(len(tuples)):
    res = compute_exact(tuples.loc[i,5] , tuples.loc[i,6])
    compute_exact_tuples.append(res)
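
The `core_metrics` module is not included in this notebook. Assuming `compute_f1` and `compute_exact` follow the SQuAD evaluation conventions (an assumption, the source does not confirm it), the two metrics can be sketched as follows. `sketch_f1` and `sketch_exact` are hypothetical names chosen to avoid confusion with the real functions.

```python
import collections
import re
import string

PUNCTUATION = set(string.punctuation)

def normalize_answer(s):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def sketch_exact(a_gold, a_pred):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(a_gold) == normalize_answer(a_pred))

def sketch_f1(a_gold, a_pred):
    """Token-level F1 between the normalized gold and predicted answers."""
    gold_toks = normalize_answer(a_gold).split()
    pred_toks = normalize_answer(a_pred).split()
    if len(gold_toks) == 0 or len(pred_toks) == 0:
        # If either side is empty, F1 is 1 only when both are empty
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

# Two of four predicted tokens match, and both gold tokens are covered
f1 = sketch_f1("1,250 hours", "at least 1,250 hours")
em = sketch_exact("1,250 hours", "at least 1,250 hours")
print(f1, em)
```

Under this definition a prediction that partially overlaps the gold answer earns partial F1 credit but no exact-match credit, which is consistent with exact match being the stricter of the two metrics.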

In [40]:
# Computing F1 scores using results from the best re-ranking model
f1_answers = []
for i in range(len(tuples)):
    res = compute_f1(str(answers.loc[i,3]) , tuples.loc[i,6])
    f1_answers.append(res)

In [42]:
# Computing exact matching using results from the best re-ranking model 
compute_exact_answers = []
for i in range(len(tuples)):
    res = compute_exact(str(answers.loc[i,3]) , tuples.loc[i,6])
    compute_exact_answers.append(res)

Calculating the mean evaluation scores:

In [50]:
# Mean F1 score with gold-label pairs
mean(f1_tuples)

0.31953954058379935

In [52]:
# Exact matching with gold-label pairs
mean(compute_exact_tuples)

0.08768961715393682

In [53]:
# Mean F1 score with pairs resulting from the best re-ranking model
mean(f1_answers)

0.31953145154614404

In [54]:
# Exact matching with pairs resulting from best re-ranking model
mean(compute_exact_answers)

0.08768961715393682