<a href="https://colab.research.google.com/github/arinakosovskaia/SQuAD2.0/blob/main/T5_Squad2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
import os
os.chdir('/content/drive/MyDrive/Colab Notebooks/Practical_09/tutorial_9_revised')
os.getcwd()

'/content/drive/MyDrive/Colab Notebooks/Practical_09/tutorial_9_revised'

In [3]:
!pip -q install datasets

In [4]:
import datasets

squad2_qa = datasets.load_dataset('squad_v2', split='validation')
squad2_qa[:5]



{'id': ['56ddde6b9a695914005b9628',
  '56ddde6b9a695914005b9629',
  '56ddde6b9a695914005b962a',
  '56ddde6b9a695914005b962b',
  '56ddde6b9a695914005b962c'],
 'title': ['Normans', 'Normans', 'Normans', 'Normans', 'Normans'],
 'context': ['The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse ("Norman" comes from "Norseman") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.',
  'The Normans (N

In [5]:
!pip -q install transformers==4.22.2
!pip -q install -U sentence-transformers

In [6]:
import os
from sentence_transformers import util

wikipedia_filepath = 'simplewiki-2020-11-01.jsonl.gz'
if not os.path.exists(wikipedia_filepath):
    util.http_get('http://sbert.net/datasets/simplewiki-2020-11-01.jsonl.gz', wikipedia_filepath)


In [7]:
import json
import gzip

# NOTE: Change this flag to use only first paragraph
only_first = False

passages = []
# Open the file with the dump of Simple Wikipedia
with gzip.open(wikipedia_filepath, 'rt', encoding='utf8') as f:
    # Iterate over the lines
    for line in f:
        # Parse the document using JSON
        data = json.loads(line.strip())
        if only_first:
            # Only add the first paragraph
            passages.append(data['paragraphs'][0])
        else:
            # Add all paragraphs
            passages.extend(data['paragraphs'])

print(f"Retrieved {len(passages)} passages")

Retrieved 509663 passages
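To make the effect of the `only_first` flag concrete, here is a minimal sketch that runs the same loop over a tiny in-memory dump. The two articles and the helper name `load_passages` are made up for illustration; only the `paragraphs` field mirrors the real file.

```python
import gzip
import io
import json

# Two made-up articles, one JSON object per line (illustrative data only)
articles = [
    {"title": "A", "paragraphs": ["A first.", "A second."]},
    {"title": "B", "paragraphs": ["B first."]},
]
raw = "\n".join(json.dumps(article) for article in articles).encode("utf8")
compressed = gzip.compress(raw)


def load_passages(gz_bytes, only_first=False):
    """Hypothetical helper: collect paragraphs from a gzipped JSONL dump."""
    passages = []
    with gzip.open(io.BytesIO(gz_bytes), "rt", encoding="utf8") as f:
        for line in f:
            data = json.loads(line.strip())
            if only_first:
                # Keep only the first paragraph of each article
                passages.append(data["paragraphs"][0])
            else:
                # Keep every paragraph of each article
                passages.extend(data["paragraphs"])
    return passages


all_passages = load_passages(compressed)
first_passages = load_passages(compressed, only_first=True)
print(len(all_passages), len(first_passages))
```

With `only_first=True` the corpus shrinks to one passage per article, which should make the embedding step cheaper at the cost of dropping detail from later paragraphs.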


In [8]:
from sentence_transformers import SentenceTransformer, CrossEncoder

semb_model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')
xenc_model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

In [9]:
import os
import pickle

# Define embeddings cache path
embeddings_cache_path = './qa_embeddings_cache.pkl'

# Load cache if available
if os.path.exists(embeddings_cache_path):
    print('Loading embeddings cache')
    with open(embeddings_cache_path, 'rb') as f:
        corpus_embeddings = pickle.load(f)
# Else compute embeddings
else:
    print('Computing embeddings')
    corpus_embeddings = semb_model.encode(passages, convert_to_tensor=True, show_progress_bar=True)
    # Save the embeddings to a file for future loading
    print(f'Saving embeddings to: \'{embeddings_cache_path}\'')
    with open(embeddings_cache_path, 'wb') as f:
        pickle.dump(corpus_embeddings, f)

Loading embeddings cache
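The load-or-compute pattern used for the embeddings can be isolated into a small helper. The sketch below is an assumption-laden illustration: `load_or_compute` is a hypothetical name, and a plain list stands in for the embedding tensor so the example runs without a model.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute_fn):
    """Hypothetical helper: return (object, cache_hit) using a pickle cache."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), True
    result = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result, False


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_cache.pkl")
    # First call computes and writes the cache
    first, first_hit = load_or_compute(path, lambda: [[0.1, 0.2], [0.3, 0.4]])
    # Second call ignores compute_fn and reads the cached object
    second, second_hit = load_or_compute(path, lambda: [])
```

Note that a cache like this is keyed only by file path. If the passages or the embedding model change, the stale file is still loaded, so it is worth deleting the cache file whenever either one changes.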


In [10]:
!pip -q install hnswlib

In [11]:
import os
import hnswlib

# Create empty index
index = hnswlib.Index(space='cosine', dim=384)

# Define hnswlib index path
index_path = './qa_hnswlib.index'

# Load index if available
if os.path.exists(index_path):
    print('Loading index...')
    index.load_index(index_path)
# Else index data collection
else:
    # Initialise the index
    print('Start creating HNSWLIB index')
    index.init_index(max_elements=corpus_embeddings.size(0), ef_construction=400, M=64)
    # Add the embeddings to the HNSWLIB index (it may take a while)
    index.add_items(corpus_embeddings.cpu(), list(range(len(corpus_embeddings))))
    # Save the index to a file for future loading
    print(f'Saving index to: {index_path}')
    index.save_index(index_path)

Loading index...


In [12]:
!pip -q install transformers sentencepiece accelerate

In [13]:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large", device_map="auto", torch_dtype=torch.float16)

In [14]:
input_text = "Translate the following sentence from English to Romanian: \"Daria is the best and loves to cook\""
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

output_ids = model.generate(input_ids, max_new_tokens=32)
output_text = tokenizer.decode(output_ids[0])
print(output_text)

<pad> "Daria este cea mai bună şi a o dragoste cu cozine"</s>


In [16]:
import random

random.seed(1995)

idx = random.choice(range(len(squad2_qa)))

sample = squad2_qa[idx]
sample

{'id': '5ad28846d7d075001a429931',
 'title': 'Force',
 'context': "Torque is the rotation equivalent of force in the same way that angle is the rotational equivalent for position, angular velocity for velocity, and angular momentum for momentum. As a consequence of Newton's First Law of Motion, there exists rotational inertia that ensures that all bodies maintain their angular momentum unless acted upon by an unbalanced torque. Likewise, Newton's Second Law of Motion can be used to derive an analogous equation for the instantaneous angular acceleration of the rigid body:",
 'question': "Which of Newton's Laws described a rotational inertia equation?",
 'answers': {'text': [], 'answer_start': []}}

In [17]:
question = sample['question']
question

"Which of Newton's Laws described a rotational inertia equation?"

In [21]:
target_answer = sample['answers']['text']
target_answer

[]

In [22]:
print(f'Question {idx}: {question}')

Question 11828: Which of Newton's Laws described a rotational inertia equation?


In [24]:
question_embedding = semb_model.encode(question, convert_to_tensor=True)

In [25]:
corpus_ids, distances = index.knn_query(question_embedding.cpu(), k=64)
scores = 1 - distances

print("Cosine similarity model search results")
print(f"Query: \"{question}\"")
print("---------------------------------------")
for passage_id, score in zip(corpus_ids[0][:5], scores[0][:5]):
    print(f"Score: {score:.4f}\nDocument: \"{passages[passage_id]}\"\n\n")

Cosine similarity model search results
Query: "Which of Newton's Laws described a rotational inertia equation?"
---------------------------------------
Score: 0.5642
Document: "The formula invented by Newton is called the "Law of gravitation"."


Score: 0.5627
Document: "Isaac Newton developed three laws of motion that are fundamental to dynamics."


Score: 0.5608
Document: "Moment of inertia (formula_1), also called "angular mass" (kg·m), is the inertia of a rotating body with respect to its rotation."


Score: 0.5606
Document: "Rotation is the movement of an object in a circular motion."


Score: 0.5559
Document: "Inertia is the resistance of the object to any change in its motion, including a change in direction. An object will stay still or keep moving at the same speed and in a straight line, unless it is acted upon by an external unbalanced force."
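The line `scores = 1 - distances` relies on the cosine space reporting a distance of one minus the cosine similarity. As a reference point for the approximate index, the following sketch computes exact cosine nearest neighbours with NumPy. The random vectors and the helper name `exact_cosine_knn` are assumptions for illustration; they are not the notebook's embeddings.

```python
import numpy as np

# Random stand-ins for corpus and query embeddings (illustrative only)
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 384)).astype(np.float32)
query = rng.normal(size=(384,)).astype(np.float32)


def exact_cosine_knn(query_vec, corpus_mat, k):
    """Hypothetical helper: brute-force top-k by cosine similarity."""
    corpus_norm = corpus_mat / np.linalg.norm(corpus_mat, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    similarities = corpus_norm @ query_norm
    # Sort by descending similarity and keep the k best ids
    top_ids = np.argsort(-similarities)[:k]
    return top_ids, similarities[top_ids]


ids, sims = exact_cosine_knn(query, corpus, k=5)
# A cosine-space index would report these as distances
distances = 1 - sims
```

Brute force is linear in the corpus size per query, which is why an approximate index is attractive for roughly half a million passages; comparing both on a small subset is one way to sanity-check recall.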




In [26]:
import numpy as np

model_inputs = [(question, passages[idx]) for idx in corpus_ids[0]]
cross_scores = xenc_model.predict(model_inputs)

print("Cross-encoder model re-ranking results")
print(f"Query: \"{question}\"")
print("---------------------------------------")
for rank_idx in np.argsort(-cross_scores)[:5]:
    print(f"Score: {cross_scores[rank_idx]:.4f}\nDocument: \"{passages[corpus_ids[0][rank_idx]]}\"\n\n")

Cross-encoder model re-ranking results
Query: "Which of Newton's Laws described a rotational inertia equation?"
---------------------------------------
Score: 2.1127
Document: "In 1687 Isaac Newton published the "Principia". He included a proof that a rotating self-gravitating fluid body in equilibrium takes the form of an oblate ellipsoid of revolution (a spheroid). The amount of flattening depends on the density and the balance of gravitational force and centrifugal force."


Score: 1.4326
Document: "From a practical point of view, this means that Newton's laws of motion are valid in all inertial systems, which means those at rest or those moving with constant speed relative to one considered at rest. This is the law of inertia: a body at rest continues at rest and a body in motion continues in motion in a straight line unless influenced by an external force. A Galilean coordinate system is one where the law of inertia is valid. The laws of mechanics of Galileo and Newton are valid i
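The re-ranking step boils down to sorting the retrieved candidates by the cross-encoder score. The sketch below shows that ordering on made-up scores; the candidate strings and `toy_scores` are illustrative values, not model output.

```python
import numpy as np

# Made-up candidates and scores (illustrative only, not model output)
candidates = ["passage a", "passage b", "passage c", "passage d"]
toy_scores = np.array([0.3, 2.1, -1.5, 1.4])

# Negating the scores makes argsort return indices in descending score order
order = np.argsort(-toy_scores)
reranked = [(candidates[i], float(toy_scores[i])) for i in order]
top_passage = reranked[0][0]
```

The scores printed above are not bounded to [0, 1], so they are best read as a ranking signal rather than as a probability or a similarity.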

In [27]:
passage_idx = np.argsort(-cross_scores)[0]
passage = passages[corpus_ids[0][passage_idx]]

input_text = f"Given the following passage, answer the related question.\n\nPassage:\n\n{passage}\n\nQ: {question}?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
print(input_text, "\n")

output_ids = model.generate(input_ids, max_new_tokens=32)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_text, "\n")

print(f"A (target): {target_answer}")

Given the following passage, answer the related question.

Passage:

In 1687 Isaac Newton published the "Principia". He included a proof that a rotating self-gravitating fluid body in equilibrium takes the form of an oblate ellipsoid of revolution (a spheroid). The amount of flattening depends on the density and the balance of gravitational force and centrifugal force.

Q: Which of Newton's Laws described a rotational inertia equation?? 

Newton's Law of Inertia 

A (target): []
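The prompt printed above ends in `??` because the f-string appends a question mark to a question that already has one. A small prompt builder avoids this. The helper name `build_prompt` is hypothetical, and whether the doubled punctuation changes the model's answer is not tested here.

```python
def build_prompt(question, passage=None):
    """Hypothetical helper: format a prompt with or without a retrieved passage."""
    question = question.strip()
    # Add a question mark only when the question lacks one
    if not question.endswith("?"):
        question += "?"
    if passage is None:
        return f"Answer the following question.\n\nQ: {question}"
    return (
        "Given the following passage, answer the related question.\n\n"
        f"Passage:\n\n{passage}\n\nQ: {question}"
    )


closed_book = build_prompt("Which law applies?")
open_book = build_prompt("Which law applies", passage="Some retrieved passage.")
```

Keeping both prompt variants in one function also makes it easier to compare answers with and without the retrieved passage under otherwise identical wording.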


In [32]:
# SQuAD 2.0 marks this question as unanswerable, so the model should ideally give no answer
input_text = f"Answer the following question.\n\nQ: {question}?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
print(input_text)

output_ids = model.generate(input_ids, max_new_tokens=32)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(f"\nA: {output_text}")

Answer the following question.

Q: Which of Newton's Laws described a rotational inertia equation??

A: law of conservation of mass
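Because the target for this sample is an empty list, any non-empty generation counts as wrong under an exact-match criterion. The following is a simplified sketch of such a check in the spirit of SQuAD-style answer normalisation; the function names are hypothetical and this is not the official evaluation script.

```python
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """Hypothetical helper: 1 if the prediction matches any gold answer, else 0."""
    if not gold_answers:
        # Unanswerable question: only an empty prediction is correct
        return int(normalize_answer(prediction) == "")
    normalized = normalize_answer(prediction)
    return int(any(normalized == normalize_answer(gold) for gold in gold_answers))


score_generated = exact_match("law of conservation of mass", [])
score_abstained = exact_match("", [])
```

Under this check the model would need to be prompted or post-processed to emit an empty string (or a sentinel mapped to one) for unanswerable questions; the prompts in this notebook do not ask for that.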
