<a href="https://colab.research.google.com/github/Taaniya/exploring-gpt2-language-model/blob/main/Explore_gpt2_for_QA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook explores how to prompt the GPT-2 model so that its completion answers a question given in the prompt.

It experiments with ways of getting valid answers from the model across a variety of questions, ranging from vague, standalone questions to questions preceded by a context passage that contains the answer, using samples from the SQuAD dataset.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
pip install transformers

In [3]:
import transformers
from transformers import pipeline
from transformers import GPT2TokenizerFast

In [4]:
transformers.__version__

'4.27.1'

In [None]:
# Load text generation pipeline 

model = pipeline(task='text-generation', model="gpt2")
gpt2_tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')  # same checkpoint as the pipeline model
gpt2_tokenizer.vocab_size

In [6]:
# Helper function that prints the model's completion using greedy decoding (do_sample=False).
# Note: max_length counts the prompt tokens plus the generated tokens,
# and no_repeat_ngram_size=2 prevents any bigram from being generated twice.

def getModelCompletions(prefix, max_len=500):
  text = model(prefix, max_length=max_len, no_repeat_ngram_size=2,
               pad_token_id=gpt2_tokenizer.eos_token_id,
               do_sample=False, return_full_text=False)[0]
  print(f"output - {text['generated_text']}")
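
The `no_repeat_ngram_size=2` argument keeps greedy decoding from emitting the same two-token sequence twice. As a rough illustration of that idea, here is a toy sketch over word strings. It is not the `transformers` implementation, and `banned_next_tokens` and `greedy_pick` are hypothetical names introduced only for this example.

```python
def banned_next_tokens(tokens, ngram_size=2):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    prefix_len = ngram_size - 1
    # The last (n-1) generated tokens form the prefix of the next n-gram.
    current_prefix = tuple(tokens[len(tokens) - prefix_len:])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == current_prefix:
            banned.add(ngram[-1])
    return banned


def greedy_pick(scores, banned):
    """Pick the highest-scoring token that is not banned (toy greedy step)."""
    allowed = {tok: s for tok, s in scores.items() if tok not in banned}
    if not allowed:  # nothing left to choose from, fall back to all candidates
        allowed = scores
    return max(allowed, key=allowed.get)


generated = ["the", "norman", "king", "the"]
banned = banned_next_tokens(generated, ngram_size=2)
next_token = greedy_pick({"norman": 0.9, "king": 0.1}, banned)
print(banned, next_token)
```

Here "norman" is blocked because the bigram "the norman" has already been generated, so the greedy step falls back to the next best candidate.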

**Let's start with an ordinary completion-style prompt whose natural continuation should contain the desired answer.**

In [7]:
prefix = "Pythagoras theorem was discovered by"

In [12]:
getModelCompletions(prefix, max_len=30)

output -  the Greek mathematician Pythagorus in the year 476. The theorem states that the number of possible numbers is


**Let's ask questions directly**

In [13]:
question = "Who discovered the Pythagoras theorem?"
getModelCompletions(prefix, max_len=30)

output -  the Greek mathematician Pythagorus in the year 476. The theorem states that the number of possible numbers is


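**The response is identical, but only because the cell above passes `prefix` rather than `question` to `getModelCompletions`, so the direct question was not actually tested here. The topic is also a very common one that the model has likely learnt well during pre-training. Let's try questions on less common topics.**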

**Asking questions from [SQuAD on Normans](https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/Normans.html)**

In [17]:
question = "In what country is Normandy located?"
getModelCompletions(question, max_len=30)

output - 

The Normandy is located in the north of France. The Normandy was founded in 1789 by the French,


**Still working: the completion names France, although the rest of the sentence is invented.**

In [20]:
question = "From which countries did the Norse originate?"
getModelCompletions(question, max_len=100)

output - 

The Norse were the first to have a language, and the language of the people was called the "Norse tongue". The Norse language was the most common language in the world, with over 100,000 languages spoken in Europe. The language is still spoken today in many parts of Europe, including the United States.
...
, the word "norse" is a common name for the Nordic people. It is also used in a


**Not quite right: the completion never names any country of origin.**

In [23]:
question = "What religion were the Normans?"
getModelCompletions(question, max_len=50)

output - 

The Norman religion was the religion of the Anglo-Saxons, and the Norman religion is the belief that the gods are the same as the human beings. The Normaans were a people of great


In [24]:
question = "When were the Normans in Normandy?"
getModelCompletions(question, max_len=50)

output - 

The Norman people were a small group of people who lived in the mountains of France. They were very religious people. The Normannians were not religious.
...
 (The French


**As described in the GPT-2 paper, providing context in the prompt can help the model generate a relevant response.**
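
One detail to watch when prepending context: plain string concatenation puts no space between the last sentence of the context and the question. A minimal sketch of a helper that joins the two explicitly follows. The name `build_prompt` and the shortened context string are assumptions made for this example and are not part of the notebook.

```python
def build_prompt(context: str, question: str, sep: str = " ") -> str:
    """Join a context passage and a question with an explicit separator."""
    return context.strip() + sep + question.strip()


short_context = "The Normans gave their name to Normandy, a region in France."
question = "In what country is Normandy located?"

# Plain concatenation glues the question onto the final full stop.
print(short_context + question)
# The helper keeps the two sentences apart.
print(build_prompt(short_context, question))
```

Whether the separator changes GPT-2's completions is something to check empirically; the sketch only makes the prompt layout explicit.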

In [25]:
context = "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse ('Norman' comes from 'Norseman') raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries."

In [26]:
question_prompt = "From which countries did the Norse originate?"
prompt = context + question_prompt
getModelCompletions(prompt, max_len=300)

output -  The Norse were originally from the Scandinavian countries of Norway, Sweden and Denmark. In the early 10 th century they were introduced to the English Isles, where they settled in England and Scotland. From there they spread to Ireland, Scotland and Wales.The Norse also settled on the shores of Africa, the Middle East and the Pacific Ocean. Their first settlement was in Africa in 1520, when they established their first colony in South Africa. By the end of this century the Vikings had settled the continent of South America, but they had not yet settled much of Europe.In the 15th to 16th Centuries, they began to settle in Europe,


In [28]:
question_prompt = "When were the Normans in Normandy?"
prompt = context + question_prompt
getModelCompletions(prompt, max_len=300)

output -  The Norman people lived in a small, isolated area of Normandy. In the early 10 th century the population was about 10,000, but by the end of that century it had grown to about 20,500. By the mid-10 th centuries, the number of Normannians had reached about 30,200.By the middle of this century there were about 40,600 Normandans living in England, Wales and Northern Ireland. These numbers were much higher than the numbers of other European populations.The Norman invasion of England in 1066 was the largest invasion in recorded history. It was a major blow to the Norman empire


In [29]:
question_prompt = "Who was the Norse leader?"
prompt = context + question_prompt
getModelCompletions(prompt, max_len=300)

output -  The Norman leader was a man named Rollos, who was born in Normandy in 1066. He was one of three men who were to become the Norman king. Rollot was also the son of a nobleman named Oskar, the father of King Arthur. Oslar was an important figure in Norman history, as he was responsible for the founding of Norman rule in England. In 1071, Osmond was crowned king of Normandy. His son, King Oth, was named king by the king's son Othin. King Rollott was not a Norman, but he had a strong connection to the culture of his people.


**As mentioned in the GPT-2 paper, we can make the responses more concise by seeding the prompt with a few question-answer pairs, which induces the model to infer the QA task and respond with short answers. Let's include QA pairs for questions that it answered correctly earlier.**
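
The seeding idea can be sketched as a small prompt builder that lays out the example pairs and the new question in one consistent `Question : ... Answer : ...` pattern separated by `|`. The function name `build_few_shot_prompt` and the shortened context string are hypothetical and used only for illustration.

```python
def build_few_shot_prompt(context, examples, question):
    """Build: context + 'Question : q Answer : a | ... | Question : new_q Answer :'."""
    parts = [f"Question : {q} Answer : {a}" for q, a in examples]
    # The final question is left unanswered so the model completes it.
    parts.append(f"Question : {question} Answer :")
    return context.strip() + " " + " | ".join(parts)


examples = [
    ("In what country is Normandy located?", "France"),
    ("What century did the Normans first gain their separate identity?", "10th century"),
]
short_context = "The Normans gave their name to Normandy, a region in France."
prompt = build_few_shot_prompt(short_context, examples, "Who was the Norse leader?")
print(prompt)
```

Keeping every pair in exactly the same format gives the model a single pattern to continue, which is the point of seeding.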

In [65]:
qa_pairs = " Question : In what country is Normandy located? Answer : France | Question: What century did the Normans first gain their separate identity? Answer : 10th century | Question : "

In [48]:
question_prompt = "From which countries did the Norse originate? Answer : "
prompt = context + qa_pairs + question_prompt
getModelCompletions(prompt, max_len=300)

output -  France,  (Norsk) | �� France (Norway) ��� � ����������� | Answer: _____ ____ ________ ______________________ |
The Norman Conquest of Normandy
In the early 10 th century the Norman invasion of France was a major event in European history. It was the culmination of a long series of events that began in 1066, when the French invaded the Netherlands and the Dutch were


In [49]:
question_prompt = "When were the Normans in Normandy? Answer : "
prompt = context + qa_pairs + question_prompt
getModelCompletions(prompt, max_len=300)

output -  10th-11th Century | Answer: 10-12th Centuries |
Question : What is the origin of Normandy's name? Question 1 : The Norman name is derived from the Latin word for 'Norm' (Norma) which means 'to be' or 'in' in French. This is a common name for the French people of France, which is also the name of a large number of other European countries. In the


In [50]:
question_prompt = "Who was the Norse leader? Answer : "
prompt = context + qa_pairs + question_prompt
getModelCompletions(prompt, max_len=300)

output -  The Norman King of France, King Arthur, the King Henry VIII of England, Henry VI of Scotland, Edward IV of Ireland, William of Orange, Richard of Cornwall, John of Wales, Thomas of York, George of Saxony, Charles of Gloucester, James of Normandy and William the Conqueror. Answer: The Norman King, Norman of Norway, was a Norman king who ruled over a small island in Normandy. He was known as the 'King


In [51]:
question_prompt = "Who was the Norman leader? Answer : "
prompt = context + qa_pairs + question_prompt
getModelCompletions(prompt, max_len=300)

output -  King Louis XVI of France

The Norman King of England
.
 (Photo: Wikimedia Commons)
,
-
(Photo : Wikimedia Foundation)


In [55]:
question_prompt = "Which region is Normandy located in? Answer : "
prompt = context + qa_pairs + question_prompt
getModelCompletions(prompt, max_len=300)

output -  France

The first Norman settlement in Normandy was in 1604, when the Norman Conquest of Normandy took place. It was the largest settlement of its kind in Europe, with a population of about 1,000, including about 2,500 children.
. In 1605, the French and the English conquered Normandy. This was followed by the conquest of England in 1710, which was a major victory for the British Empire. After the war,


In [56]:
question_prompt = "In which century did the Normans gain their identity? Answer :"
prompt = context + qa_pairs + question_prompt
getModelCompletions(prompt, max_len=300)

output -  11 th century

The Norman Conquest of Normandy
 (10th Century)
.
,
:
-
"The first Norman conquest of France was in 10 th cent. and was followed by the Norman invasion of England in 11 cent."
(N.C.A.E. - The Norman Invasion of Europe - A History of Norman France - by William H. Houghton, p. 5) The Norman


In [58]:
# A question with a long descriptive answer

question_prompt = "Who are the Norse? Answer : "
prompt = context + qa_pairs + question_prompt
getModelCompletions(prompt, max_len=300)

output -  The Norman people of Normandy were a group of people from the North of France who were known as the 'Normans' (or 'Omen'). They lived in a small, isolated area of land in Normandy. Their culture was very different from that of other people in Europe. In the early 10 th century the Norman people were divided into two groups: the Omen and the Franks. These two peoples were not very friendly to each other, but they were
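
With seeded QA pairs, the useful part of a completion is often only the text before the first `|` or line break, and everything after it is continuation noise. A minimal post-processing sketch, assuming those two markers end the answer, is shown here. `extract_answer` is a hypothetical helper and not part of the notebook.

```python
def extract_answer(completion: str, stop_markers=("|", "\n")) -> str:
    """Keep only the text before the first stop marker in a completion."""
    text = completion.lstrip()  # drop leading spaces and newlines first
    cut = len(text)
    for marker in stop_markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()


print(extract_answer(" 10th-11th Century | Answer: 10-12th Centuries |\nQuestion : ..."))
print(extract_answer(" France\n\nThe first Norman settlement in Normandy was in 1604"))
```

This only trims the completion. It does not make a wrong answer right, and for long descriptive answers it returns the whole first line.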


**Providing a new context**

In [60]:
context2 = "The Norman dynasty had a major political, cultural and military impact on medieval Europe and even the Near East. The Normans were famed for their martial spirit and eventually for their Christian piety, becoming exponents of the Catholic orthodoxy into which they assimilated. They adopted the Gallo-Romance language of the Frankish land they settled, their dialect becoming known as Norman, Normaund or Norman French, an important literary language. The Duchy of Normandy, which they formed by treaty with the French crown, was a great fief of medieval France, and under Richard I of Normandy was forged into a cohesive and formidable principality in feudal tenure. The Normans are noted both for their culture, such as their unique Romanesque architecture and musical traditions, and for their significant military accomplishments and innovations. Norman adventurers founded the Kingdom of Sicily under Roger II after conquering southern Italy on the Saracens and Byzantines, and an expedition on behalf of their duke, William the Conqueror, led to the Norman conquest of England at the Battle of Hastings in 1066. Norman cultural and military influence spread from these new European centres to the Crusader states of the Near East, where their prince Bohemond I founded the Principality of Antioch in the Levant, to Scotland and Wales in Great Britain, to Ireland, and to the coasts of north Africa and the Canary Islands."


In [66]:
question_prompt = "What religion were the Normans? Answer :"
prompt2 = context2 + qa_pairs + question_prompt
getModelCompletions(prompt2, max_len=350)

output -  Protestant | Answer: Catholic |

The Norman religion was founded by the Franks in 1150, after the death of King Richard II of France


In [67]:
question_prompt = "Which region was Norman dynasty in? Answer :"
prompt2 = context2 + qa_pairs + question_prompt
getModelCompletions(prompt2, max_len=350)

output -  North America | Answer: Europe |

The Norman Empire was founded by the Franks in 1150, the first of which was the Saxons


In [68]:
question_prompt = "Which language did Normans speak? Answer :"
prompt2 = context2 + qa_pairs + question_prompt
getModelCompletions(prompt2, max_len=350)

output -  English | Answer: French |

Question : What is the origin of Norman culture? Question 1 : The Norman language was first spoken in France in


In [70]:
question_prompt = "In which year did battle of Hastings take place? Answer :"
prompt2 = context2 + qa_pairs + question_prompt
getModelCompletions(prompt2, max_len=350)

output -  11th | Answer: In the year 1070 |

The Norman Empire was founded by the Franks in 1150, after


In [74]:
question_prompt = "Which conquerer led the Norman conquest of England in the battle of Hastings? Answer :"
prompt2 = context2 + qa_pairs + question_prompt
getModelCompletions(prompt2, max_len=350)

output -  Richard II | Answer: Richard III |

The Norman Empire was founded by Richard the Great in 11


In [75]:
question_prompt = "Who ruled the duchy of Normandy? Answer :"
prompt2 = context2 + qa_pairs + question_prompt
getModelCompletions(prompt2, max_len=350)

output -  Richard II | Answer: Richard III |

The Norman Empire was founded by Richard the Great in 1112, after the death of
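
Several of the short answers above never occur in the supplied context. Since SQuAD answers are spans taken from the context passage, one crude sanity check is to test whether the generated answer appears in the context after light normalisation. The sketch below assumes that whole-word matching is good enough. `normalize` and `is_grounded` are hypothetical helpers, and the snippet is one sentence taken from `context2`.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def is_grounded(answer: str, context: str) -> bool:
    """True if the normalised answer occurs as a whole-word span in the context."""
    ans = normalize(answer)
    if not ans:
        return False
    # Pad with spaces so that only whole-word matches count.
    return f" {ans} " in f" {normalize(context)} "


snippet = ("The Normans were famed for their martial spirit and eventually for their "
           "Christian piety, becoming exponents of the Catholic orthodoxy into which "
           "they assimilated.")
print(is_grounded("Catholic", snippet))
print(is_grounded("Protestant", snippet))
```

A check like this flags answers the model invented, but it cannot tell whether a span that does occur in the context is the right answer to the question.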


### References
* [GPT-2 paper: Language Models are Unsupervised Multitask Learners (2019)](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
* [Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing (2021)](https://arxiv.org/abs/2107.13586)
* [SQuAD samples - Normans](https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/Normans.html)
