## Import libraries

In [1]:
import json
import numpy as np
import random
import torch
import time
import pandas as pd
import gc

Making the notebook deterministic

In [2]:
def fix_random(seed: int) -> None:
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)

    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True


fix_random(seed=42)

In [3]:
data_path = "/kaggle/input/squad0"

This function loads a small subset of the SQuAD validation set for evaluation.

In [4]:
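def load(path):
    data = []
    # Read the JSON Lines file relative to data_path, one record per line.
    with open(f"{data_path}/{path}", "r", encoding="utf-8") as file:
        for line in file:
            try:
                data.append(json.loads(line))
            except json.JSONDecodeError as e:
                # Skip malformed lines instead of aborting the whole load.
                print("json processing exception", e)
                continue
    return data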

In [5]:
data = load('squad_val_1k.jsonl')
print(data[0])

{'context': 'The Panthers beat the Seattle Seahawks in the divisional round, running up a 31–0 halftime lead and then holding off a furious second half comeback attempt to win 31–24, avenging their elimination from a year earlier. The Panthers then blew out the Arizona Cardinals in the NFC Championship Game, 49–15, racking up 487 yards and forcing seven turnovers.', 'prompt': 'How many yards did the Panthers get for the division championshipt game?', 'response': '487'}


These functions are needed to evaluate the models. Each one is described in the comments below.

In [6]:
# Builds one prompt per item. few_shot selects the prompting mode: 'two-shot' or 'three-shot'
# prepends two or three worked examples; any other value (e.g. 'none') yields a zero-shot prompt.
# The two-shot option exists because three-shot prompts can grow longer than the model's
# maximum context window.
def generate_prompt(few_shot, question_with_context):
    questions = []
    if few_shot == 'two-shot':
        two_shot_prompting = [
        {
            "context" : "On April 12, 1961, Soviet cosmonaut Yuri Gagarin became the first person to fly in space, reinforcing American fears about being left behind in a technological competition with the Soviet Union. At a meeting of the US House Committee on Science and Astronautics one day after Gagarin's flight, many congressmen pledged their support for a crash program aimed at ensuring that America would catch up. Kennedy was circumspect in his response to the news, refusing to make a commitment on America's response to the Soviets.",
            "question": "How many days after Gagarin's flight did the US House Committee on Science and Astronautics meet?",
            "answer": "one day",
        },
        {
            "context" : "The customary law of Normandy was developed between the 10th and 13th centuries and survives today through the legal systems of Jersey and Guernsey in the Channel Islands. Norman customary law was transcribed in two customaries in Latin by two judges for use by them and their colleagues: These are the Très ancien coutumier (Very ancient customary), authored between 1200 and 1245; and the Grand coutumier de Normandie (Great customary of Normandy, originally Summa de legibus Normanniae in curia laïcali), authored between 1235 and 1245.",
            "question": "Where are Jersey and Guernsey?",
            "answer": "Channel Islands",
        }
        ]
        for item in question_with_context:
            prompt = f"You are a question answering bot. Your task is to answer the questions based on the appropriate contexts and your own knowledge. Your answers should contain only the most important things and should be as short as possible."
            prompt = f"{prompt}\n\n" + "\n\n".join([f"{p['context']}\n\nQ: {p['question']}\n\nA: {p['answer']}" for p in two_shot_prompting])
            prompt = f"{prompt}\n\n{item}\n\nA:"
            questions.append(prompt)
    elif few_shot == 'three-shot':
        three_shot_prompting = [
        {
            "context" : "In 1993, the FCC repealed the Financial Interest and Syndication Rules, once again allowing networks to hold interests in television production studios. That same year, Capital Cities/ABC purchased the French animation studio DIC Entertainment; it also signed an agreement with Time Warner Cable to carry its owned-and-operated television stations on the provider's systems in ABC O&O markets. By that year, ABC had a total viewership share of 23.63% of American households, just below the limit of 25% imposed by the FCC.",
            "question": "What French animation studio did ABC purchase in 1993?",
            "answer": "DIC Entertainment",
        },
        {
            "context" : "On April 12, 1961, Soviet cosmonaut Yuri Gagarin became the first person to fly in space, reinforcing American fears about being left behind in a technological competition with the Soviet Union. At a meeting of the US House Committee on Science and Astronautics one day after Gagarin's flight, many congressmen pledged their support for a crash program aimed at ensuring that America would catch up. Kennedy was circumspect in his response to the news, refusing to make a commitment on America's response to the Soviets.",
            "question": "How many days after Gagarin's flight did the US House Committee on Science and Astronautics meet?",
            "answer": "one day",
        },
        {
            "context" : "The customary law of Normandy was developed between the 10th and 13th centuries and survives today through the legal systems of Jersey and Guernsey in the Channel Islands. Norman customary law was transcribed in two customaries in Latin by two judges for use by them and their colleagues: These are the Très ancien coutumier (Very ancient customary), authored between 1200 and 1245; and the Grand coutumier de Normandie (Great customary of Normandy, originally Summa de legibus Normanniae in curia laïcali), authored between 1235 and 1245.",
            "question": "Where are Jersey and Guernsey?",
            "answer": "Channel Islands",
        }
        ]
        for item in question_with_context:
            prompt = f"You are a question answering bot. Your task is to answer the questions based on the appropriate contexts and your own knowledge. Your answers should contain only the most important things and should be as short as possible."
            prompt = f"{prompt}\n\n" + "\n\n".join([f"{p['context']}\n\nQ: {p['question']}\n\nA: {p['answer']}" for p in three_shot_prompting])
            prompt = f"{prompt}\n\n{item}\n\nA:"
            questions.append(prompt)
    else:
        for item in question_with_context:
            prompt = f"You are a question answering bot. Your task is to answer the questions based on the appropriate contexts and your own knowledge. Your answers should contain only the most important things and should be as short as possible."
            prompt = f"{prompt}\n\n{item}\n\nA:"
            questions.append(prompt)
    return questions

# Builds the prompts for a batch of questions, generates completions, and extracts the answers.
def run_with_SQuAD(model, tokenizer, question, few_shot):
    prompt = generate_prompt(few_shot, question)
    
    inputs = tokenizer(prompt, padding=True, truncation=True, return_tensors="pt").to(0)
    model.to(0)
    
    out = model.generate(
        input_ids=inputs['input_ids'],
        max_new_tokens= 128,
        attention_mask=inputs['attention_mask'],
        eos_token_id=tokenizer.eos_token_id
    )
    
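    # Keep only the newly generated tokens. With left padding, every prompt
    # occupies the first inputs['input_ids'].shape[1] positions of each row.
    generated = out[:, inputs['input_ids'].shape[1]:]

    answers = []
    num_tokens = []

    decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)

    for tokens, decodedText in zip(generated, decoded):
        # Count generated tokens rather than characters; the pad/EOS tokens
        # that fill the row after the end of a sequence are not counted.
        num_tokens.append(int((tokens != tokenizer.pad_token_id).sum()))
        # The answer is the text up to the first blank line.
        answer = decodedText.split("\n\n")[0].strip()
        answers.append(answer)

    return answers, num_tokens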

# Evaluates the model on every data point, batch by batch, and collects statistics
# about the generation (correctness, elapsed time, and generated length).
def eval(data, model, tokenizer, output_file, few_shot, batch_size):
    numberOfTokens = 0
    timeOfStart = 0
    results = []
    total_qs = len(data)
    correct = 0
    i = 0

    for batch_start in range(0, total_qs, batch_size):
        batch_end = min(batch_start + batch_size, total_qs)
        batch_data = data[batch_start:batch_end] 

        # Prepare batch inputs
        questions = [item['prompt'] for item in batch_data]
        contexts = [item['context'] for item in batch_data]
        answers = [item['response'] for item in batch_data]
        inputs = [f"Question: {que}\n\nContext: {ctx}\n\nQuestion: {que}" for ctx, que in zip(contexts, questions)]
        
        # Generate outputs in a batch
        start_time = time.time()
        batch_guesses, batch_num_tokens = run_with_SQuAD(model, tokenizer, inputs, few_shot)
        end_time = time.time()
        timeOfStart += end_time - start_time

        # Process results 
        for guess, num_tokens, answer, question, context in zip(batch_guesses, batch_num_tokens, answers, questions, contexts):
            if guess.endswith('.') and not answer.endswith('.'):
                guess = guess[:-1]
            numberOfTokens += num_tokens
            tkps = num_tokens / (end_time - start_time)
            is_correct = (answer.strip().lower() == guess.strip().lower())
            print(f"Question {i+1}/{total_qs}")
            print(f"Q: {question}")
            print(f"A: {answer}")
            print(f"?: {guess}")
            if is_correct:
                print(f"✅")
                correct += 1
            else:
                print(f"❌")
            print("="*80)
            result = {
                "idx": i,
                "question": question,
                "context": context,
                "answer": answer,
                "guess": guess,
                "is_correct": is_correct,
                "time": end_time - start_time,
                "num_tokens": num_tokens,
                "tokens_per_sec": tkps
            }
            results.append(result)
            i += 1
            
            if len(results) % 20 == 0:
                write_results(results, output_file)

    # Flush the remaining results in case their number is not a multiple of 20.
    write_results(results, output_file)

    print(f"Accuracy: {correct / total_qs * 100}% -- {correct} correct and {total_qs - correct} incorrect")
    print(f"Number of tokens generated: {numberOfTokens} -- Time: {timeOfStart} -- Tokens-Per-Sec: {numberOfTokens / timeOfStart}")
    
# Writes the collected results to output_file in JSON Lines format.
def write_results(results, output_file):
    df = pd.DataFrame(results)
    df = df[["idx", "question", "context", "answer", "guess", "is_correct", "time", "num_tokens", "tokens_per_sec"]]
    print(f"Writing {output_file}")
    df.to_json(output_file, orient="records", lines=True)
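
The `is_correct` check in `eval` is a strict, case-insensitive string comparison, so a guess such as "The David" does not match the reference "David". The block below is a minimal sketch of a more lenient scorer, written in the spirit of the normalization commonly used for SQuAD (lowercasing, dropping punctuation and English articles, collapsing whitespace) plus a token-level F1. The function names `normalize_answer`, `exact_match`, and `token_f1` are illustrative and are not part of this notebook.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(guess: str, answer: str) -> bool:
    """True if guess and answer are identical after normalization."""
    return normalize_answer(guess) == normalize_answer(answer)


def token_f1(guess: str, answer: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    guess_tokens = normalize_answer(guess).split()
    answer_tokens = normalize_answer(answer).split()
    if not guess_tokens or not answer_tokens:
        # Both empty counts as a match, exactly one empty as a miss.
        return float(guess_tokens == answer_tokens)
    overlap = sum((Counter(guess_tokens) & Counter(answer_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(guess_tokens)
    recall = overlap / len(answer_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The David", "David"))
print(round(token_f1("The Panthers got 487 yards for the divisional round game", "487"), 3))
```

Under this sketch, short guesses that differ only by an article or trailing punctuation count as exact matches, while long guesses that merely contain the answer receive partial F1 credit instead of a flat miss.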

## Transformer-based LLM - GPT-2 124M

In [7]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer_gpt2 = GPT2Tokenizer.from_pretrained('gpt2', padding_side='left')
model_gpt2 = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer_gpt2.eos_token = "<|endoftext|>"
tokenizer_gpt2.pad_token = tokenizer_gpt2.eos_token

In [8]:
eval(data[:100], model_gpt2, tokenizer_gpt2, "evalGPT2_124M.json", few_shot="none", batch_size=32)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 1/100
Q: How many yards did the Panthers get for the division championshipt game?
A: 487
?: The Panthers beat the Seattle Seahawks in the divisional round, running up a 31–0 halftime lead and then holding off a furious second half comeback attempt to win 31–24, avenging their elimination from a year earlier. The Panthers then blew out the Arizona Cardinals in the NFC Championship Game, 49–15, racking up 487 yards and forcing seven turnovers
❌
Question 2/100
Q: What caused Jacksonville's tourism to become less desirable at the latter half of the 19th century?
A: yellow fever outbreaks
?: The first major change in the city's tourism was the construction of the new downtown. The new downtown was built on the site of the former Confederate Soldiers and Sailors Home. The new downtown was built on the site of the former Confederate Soldiers and Sailors Home. The new downtown was built on the site of the former Confederate Soldiers and Sailors Home. The new downtown was built on the 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 33/100
Q: What type of materials inside the cabin were removed to help prevent more fire hazards in the future?
A: flammable cabin and space suit materials
?: The Block II crew members would wear modified, fire-resistant Block II space suits, and would be designated by the Block II titles, regardless of whether a LM was present on the flight or not
❌
Question 34/100
Q: How many nations contain "Amazonas" in their names?
A: States or departments in four nations contain "Amazonas" in their names.
?: The Amazon is a tropical rainforest, with a total area of over 1,000 square kilometres (1,000 sq mi). The Amazon is a tropical rainforest with a total area of over 1,000 square kilometres (1,000 sq mi). The Amazon is a tropical rainforest with a total area of over 1,000 square kilometres (1,000 sq mi).
❌
Question 35/100
Q: What was Tesla's position with Westinghouse?
A: consultant
?: Tesla was a pioneer in the electric motor industry. He was the first to develop a motor that could be

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 65/100
Q: What ABC action series went up against NBC's variety shows in Fall 1957?
A: Zorro
?: The first series of the series, "The Big Bang Theory", was broadcast on ABC in the fall of 1957. The second series, "The Big Bang Theory: The Next Generation", was broadcast on ABC in the fall of 1958. The third series, "The Big Bang Theory: The Next Generation: The Next Generation", was broadcast on ABC in the fall of 1959
❌
Question 66/100
Q: What was the significance of victory at Forth Niagara for British?
A: cut off the French frontier forts further to the west and south
?: The victory at Forth Niagara was a major victory for the British. The British were able to hold the French and French-speaking territories of the North Atlantic for a long time, and they were able to hold the British-speaking territories of the South Atlantic for a long time. The British were able to hold the British-speaking territories of the South Atlantic for a long time, and they were able to hold the Br

In [9]:
eval(data[:100], model_gpt2, tokenizer_gpt2, "evalGPT2_124M_few_shot.json", few_shot="two-shot", batch_size=32)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 1/100
Q: How many yards did the Panthers get for the division championshipt game?
A: 487
?: The Panthers beat the Seattle Seahawks in the divisional round, running up a 31–0 halftime lead and then holding off a furious second half comeback attempt to win 31–24, avenging their elimination from a year earlier. The Panthers then blew out the Arizona Cardinals in the NFC Championship Game, 49–15, racking up 487 yards and forcing seven turnovers. Question: How many yards did the Panthers get for the division championshipt game?
❌
Question 2/100
Q: What caused Jacksonville's tourism to become less desirable at the latter half of the 19th century?
A: yellow fever outbreaks
?: The Great Depression
❌
Question 3/100
Q: What are pharmacists in the United Kingdom being increasingly paid for?
A: medicine use reviews
?: The United Kingdom has a long history of pharmacists in the United Kingdom. In the early 19th century, the British Medical Association (BMA) was established to provide pharm

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 33/100
Q: What type of materials inside the cabin were removed to help prevent more fire hazards in the future?
A: flammable cabin and space suit materials
?: The Block II design already called for replacement of the Block I plug-type hatch cover with a quick-release, outward opening door. NASA discontinued the manned Block I program, using the Block I spacecraft only for unmanned Saturn V flights. Crew members would also exclusively wear modified, fire-resistant Block II space suits, and would be designated by the Block II titles, regardless of whether a LM was present on the flight or not
❌
Question 34/100
Q: How many nations contain "Amazonas" in their names?
A: States or departments in four nations contain "Amazonas" in their names.
?: The Amazon is a tropical rainforest, with a total area of over 1,000 square kilometres (1,000 sq mi). The Amazon is a tropical rainforest with a total area of over 1,000 square kilometres (1,000 sq mi). The Amazon is a tropical rainforest wi

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 65/100
Q: What ABC action series went up against NBC's variety shows in Fall 1957?
A: Zorro
?: The ABC action series, which was broadcast on the ABC-owned Channel 4, was a series of short stories that were based on the popular television series "The Good Wife". The series was based on the popular television series "The Good Wife", which was broadcast on the ABC-owned Channel 4, and was based on the popular television series "The Good Wife", which was broadcast on the ABC-owned Channel 4. The series was based on the popular television series "The Good Wife", which was broadcast on the ABC-owned Channel 4, and was based on the popular television series "The Good Wife", which was broadcast on the ABC
❌
Question 66/100
Q: What was the significance of victory at Forth Niagara for British?
A: cut off the French frontier forts further to the west and south
?: The British victory at Forth Niagara was the first major victory for the British in the Annus Mirabilis of 1759. The British w

In [10]:
model_gpt2 = None
tokenizer_gpt2 = None

gc.collect()

30
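
The comment on `generate_prompt` notes that three-shot prompts can exceed the model's context window. One way to handle this per question is to drop examples until the prompt fits a token budget. The block below is a rough sketch of that idea, not the approach used in this notebook: `build_prompt`, `fit_shots`, and `count_whitespace_tokens` are hypothetical helpers, and the whitespace counter is only a stand-in for a real tokenizer count such as `len(tokenizer(text)["input_ids"])`.

```python
from typing import Callable, Dict, List, Tuple


def count_whitespace_tokens(text: str) -> int:
    # Stand-in counter (assumption): a real run would count tokenizer ids instead.
    return len(text.split())


def build_prompt(instruction: str, shots: List[Dict[str, str]], item: str) -> str:
    """Join the instruction, the worked examples, and the item in the notebook's layout."""
    parts = [instruction]
    parts += [f"{s['context']}\n\nQ: {s['question']}\n\nA: {s['answer']}" for s in shots]
    parts.append(f"{item}\n\nA:")
    return "\n\n".join(parts)


def fit_shots(
    instruction: str,
    shots: List[Dict[str, str]],
    item: str,
    max_prompt_tokens: int,
    count_tokens: Callable[[str], int] = count_whitespace_tokens,
) -> Tuple[str, int]:
    """Return the prompt using the most leading shots that fits the budget, and that count."""
    for n in range(len(shots), -1, -1):
        prompt = build_prompt(instruction, shots[:n], item)
        if count_tokens(prompt) <= max_prompt_tokens:
            return prompt, n
    # Even the zero-shot prompt is too long; return it and let the caller decide.
    return build_prompt(instruction, [], item), 0


demo_shots = [
    {"context": "c1 c1", "question": "q1", "answer": "a1"},
    {"context": "c2 c2", "question": "q2", "answer": "a2"},
]
prompt, used = fit_shots("Answer briefly.", demo_shots, "Question: q Context: c", 15)
print(used)
```

For GPT-2, whose context window is 1024 tokens, a budget of roughly 1024 minus `max_new_tokens` (128 in `run_with_SQuAD`) would leave room for the generated answer.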

## Transformer-based LLM - GPT-Neo 2.7B

In [7]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer_gpt = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B", padding_side='left')
model_gpt = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
tokenizer_gpt.eos_token = "<|endoftext|>"
tokenizer_gpt.pad_token = tokenizer_gpt.eos_token

In [12]:
eval(data[:100], model_gpt, tokenizer_gpt, "evalGPT_Neo.json", few_shot="none", batch_size=4)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 1/100
Q: How many yards did the Panthers get for the division championshipt game?
A: 487
?: The Panthers got 487 yards for the divisional round game
❌
Question 2/100
Q: What caused Jacksonville's tourism to become less desirable at the latter half of the 19th century?
A: yellow fever outbreaks
?: The city's tourism was dealt major blows in the late 19th century by yellow fever outbreaks. In addition, extension of the Florida East Coast Railway further south drew visitors to other areas
❌
Question 3/100
Q: What are pharmacists in the United Kingdom being increasingly paid for?
A: medicine use reviews
?: The remuneration of pharmacists in the United Kingdom is increasing. In the United Kingdom, pharmacists are being paid for their services in the following ways:
❌
Question 4/100
Q: Geoglyphs dating to what period were found in deforested land along the Amazon River?
A: AD 0–1250
?: The first European to travel the length of the Amazon River was Francisco de Orellana in 1542. The

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 5/100
Q: Who does a gender pay gap tend to favor?
A: males in the labor market
?: Women
❌
Question 6/100
Q: What is the annual construction industry revenue in 2014?
A: $960 billion
?: $960 billion
✅
Question 7/100
Q: Which sculpture by Michelangelo has a full-size replica in the Cast Courts?
A: David
?: The David
❌
Question 8/100
Q: Richard Allen and Absalom Jones were licensed by St. George's Church in what year?
A: 1784
?: 1784
✅


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 9/100
Q: Where does the Rhine end?
A: Hoek van Holland
?: The Rhine ends at the confluence of the Rhine and Moselle rivers in the French city of Strasbourg
❌
Question 10/100
Q: The V&A has the world's most comprehensive collection of sculptures from which period?
A: post-classical European
?: The V&A has the world's most comprehensive collection of sculptures from which period?
❌
Question 11/100
Q: Which country rationed gasoline and heating gas?
A: Sweden
?: The UK
❌
Question 12/100
Q: The United States is divided into how many jurisdictions?
A: five
?: There are five jurisdictions: Northeastern, Southeastern, North Central, South Central and Western
❌


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 13/100
Q: What is an example of a machine model that deviates from a generally accepted multi-tape Turing machine?
A: random access machines
?: 
❌
Question 14/100
Q: What company developed the most successful steam engine indicator?
A: Charles Porter
?: The most useful instrument for analyzing the performance of steam engines is the steam engine indicator. Early versions were in use by 1851, but the most successful indicator was developed for the high speed engine inventor and manufacturer Charles Porter by Charles Richard and exhibited at London Exhibition in 1862. The steam engine indicator traces on paper the pressure in the cylinder throughout the cycle, which can be used to spot various problems and calculate developed horsepower. It was routinely used by engineers, mechanics and insurance inspectors. The engine indicator can also be used on internal combustion engines. See image of indicator diagram below (in Types of motor units section)
❌
Question 15/100
Q: Who dates r

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 17/100
Q: What is the second most abundant element?
A: helium
?: Oxygen is the second most abundant element in the universe, after hydrogen. At standard temperature and pressure, two atoms of the element bind to form dioxygen, a colorless and odorless diatomic gas with the formula O2. Diatomic oxygen gas constitutes 20.8% of the Earth's atmosphere. However, monitoring of atmospheric oxygen levels show a global downward trend, because of fossil-fuel burning. Oxygen is the most abundant element by mass in the Earth's crust as part of oxide compounds such as silicon dioxide, making up almost half of the crust's mass
❌
Question 18/100
Q: what was invented in 1880 that revolutionized warfare?
A: the machine gun
?: The machine gun
✅
Question 19/100
Q: When did Watt finish the development of his improvements to Newcomen's engine?
A: 1775
?: Watt's improvements to Newcomen's engine were not complete until 1775
❌
Question 20/100
Q: What is the largest item from Italy that is part of t

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 21/100
Q: Antibodies are transferred to the gut of the infant through what means?
A: Breast milk or colostrum
?: Antibodies are transferred to the gut of the infant through what means?
❌
Question 22/100
Q: Which century was there a program to straighten the Rhine? 
A: 19th Century
?: The Rhine was straightened in the 19th Century
❌
Question 23/100
Q: Where was the first horse racetrack located?
A: Pole Mokotowskie
?: In the city of Praga, in the area of the Praga Park
❌
Question 24/100
Q: What did the Islamic State proclaim itself in 2014?
A: a caliphate
?: The Islamic State is a terrorist organization that has declared itself a caliphate, or Islamic state, and has declared its intention to impose Islamic law on all Muslims worldwide
❌


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 25/100
Q: What plants don't need light to make chloroplasts?
A: Gymnosperms
?: 
❌
Question 26/100
Q: Where was Dyrrachium located?
A: the Adriatic
?: Dyrrachium was located in the northern part of the Peloponnese, on the coast of the Gulf of Corinth. It was a small town, with a population of about 10,000
❌
Question 27/100
Q: What is the expression used to denote a worst case complexity as expressed by time taken?
A: O(n2)
?: The worst case is when the input is sorted or sorted in reverse order, and the algorithm takes time O(n2) for this case
❌
Question 28/100
Q: How would the capabilities approach achieve it's goal?
A: through increasing functionings
?: The capabilities approach would achieve its goal by increasing the freedom to enjoy functionings and capabilities
❌


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 29/100
Q: Within what variable is L constrained according to the space hierarchy theorem?
A: PSPACE
?: The space hierarchy theorem tells us that L is strictly contained in PSPACE
❌
Question 30/100
Q: What are the "Big Five" animals in Kenya?
A: lion, leopard, buffalo, rhinoceros, and elephant
?: The "Big Five" game animals of Africa, that is the lion, leopard, buffalo, rhinoceros, and elephant, can be found in Kenya and in the Masai Mara in particular. A significant population of other wild animals, reptiles and birds can be found in the national parks and game reserves in the country. The annual animal migration occurs between June and September with millions of animals taking part, attracting valuable foreign tourism. Two million wildebeest migrate a distance of 2,900 kilometres (1,802 mi) from the Serengeti in neighbouring Tanzania to the Masai Mara in Kenya, in a constant
❌
Question 31/100
Q: What is another way of referring to stators?
A: static discs
?: A stator is a sta

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 33/100
Q: What type of materials inside the cabin were removed to help prevent more fire hazards in the future?
A: flammable cabin and space suit materials
?: The cabin was designed to be fireproof, and the materials used to make it fireproof were selected to be fireproof. The materials used to make the cabin fireproof were selected to be fireproof
❌
Question 34/100
Q: How many nations contain "Amazonas" in their names?
A: States or departments in four nations contain "Amazonas" in their names.
?: 
❌
Question 35/100
Q: What was Tesla's position with Westinghouse?
A: consultant
?: Tesla was hired by Westinghouse to design and build a new type of induction motor that would be used in the new Westinghouse electric power plant. Tesla's design was based on the principle of the "magneto-hydrodynamic" (MHD) motor, which was a new type of motor that used a rotating magnetic field to generate a rotating electric field. Tesla's design was a "rotating magnetic field" motor, which was a n

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 37/100
Q: What is the term for a task that generally lends itself to being solved by a computer?
A: computational problems
?: A problem is a task that is in principle amenable to being solved by a computer
❌
Question 38/100
Q: What was happening to subscriber numbers in other areas of europe?
A: flattened
?: The number of subscribers to BSkyB's direct-to-home satellite service in the UK has reached 25m, the broadcaster said on Wednesday
❌
Question 39/100
Q: Dutch architect Janjaap Ruijssenaars's performative architecture 3D-printed building is scheduled to be built when?
A: Working versions of 3D-printing building technology are already printing
?: 2014
❌
Question 40/100
Q: When was the most recent Super Bowl hosted in the South Florida/Miami area?
A: 2010
?: The league announced on October 16, 2012, that the two finalists were Sun Life Stadium and Levi's Stadium. The South Florida/Miami area has previously hosted the event 10 times (tied for most with New Orleans), with the m

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 41/100
Q: The atomic number of the periodic table for oxygen?
A: 8
?: 8
✅
Question 42/100
Q: How were enemy prisoners used tactically by Mongol armies?
A: driving them in front of the army
?: The Mongol military was also successful in siege warfare, cutting off resources for cities and towns by diverting certain rivers, taking enemy prisoners and driving them in front of the army, and adopting new ideas, techniques and tools from the people they conquered, particularly in employing Muslim and Chinese siege engines and engineers to aid the Mongol cavalry in capturing cities. Another standard tactic of the Mongol military was the commonly practiced feigned retreat to break enemy formations and to lure small enemy groups away from the larger group and defended position for ambush and counterattack
❌
Question 43/100
Q: What is the CJEU's duty?
A: ensure that in the interpretation and application of the Treaties the law is observed
?: The CJEU's duty is to ensure that in the interp

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 45/100
Q: When did the ban on cigarette advertising take effect for television networks?
A: January 2, 1971
?: The ban on cigarette advertising took effect on January 2, 1971
❌
Question 46/100
Q: In what form is oxygen transported in smaller containers?
A: compressed gas
?: Liquid oxygen is transported in bulk as a liquid in specially insulated tankers, since one liter of liquefied oxygen is equivalent to 840 liters of gaseous oxygen at atmospheric pressure and 20 °C (68 °F). Such tankers are used to refill bulk liquid oxygen storage containers, which stand outside hospitals and other institutions with a need for large volumes of pure oxygen gas. Liquid oxygen is passed through heat exchangers, which convert the cryogenic liquid into gas before it enters the building. Oxygen is also stored and shipped in smaller cylinders containing the compressed gas; a form that is useful in certain portable medical applications and oxy-
❌
Question 47/100
Q: When rich countries trade with po

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 49/100
Q: In what year did the NFL switch to a 16-game regular season?
A: 1978
?: In what year did the NFL switch to a 16-game regular season?
❌
Question 50/100
Q: What does computational complexity theory most specifically seek to answer? 
A: computational problems
?: 
❌
Question 51/100
Q: What was the final score of Super Bowl 50? 
A: 24–10
?: 24–10
✅
Question 52/100
Q: What did Luther call the mass instead of sacrifice?
A: a gift
?: The Mass is a sacrifice
❌


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 53/100
Q: What did Queen Elizabeth II open in Newcastle in 1981?
A: A bridge
?: The Queen opened the Tyne and Wear Metro in 1981
❌
Question 54/100
Q: What kind of disorders are the result of an overactive immune response?
A: autoimmune disorders
?: The immune system is a complex system that is responsible for protecting the body from foreign invaders. It is also responsible for recognizing and eliminating self-antigens
❌
Question 55/100
Q: Where hotel did the Panthers stay at?
A: San Jose Marriott.
?: The Panthers stayed at the San Jose Marriott.
❌
Question 56/100
Q: Why did Westinghouse not secure a patent for a similar motor?
A: decided Tesla's patent would probably control the market
?: Westinghouse did not secure a patent for a similar motor because it was not a patentable invention
❌


Question 57/100
Q: What words to they think are linked to the work Kenya?
A: Kirinyaga, Kirenyaa and Kiinyaa
?: Kenya is a country in East Africa. It is bordered by Uganda to the north, Tanzania to the east, and the Indian Ocean to the south. It is the largest country in Africa and the second largest in the world. It is the most populous country in Africa, with an estimated population of about 30 million people
❌
Question 58/100
Q: When was the first known historical reference to immunity?
A: Athens in 430 BC
?: The first known reference to immunity was during the plague of Athens in 430 BC. Thucydides noted that people who had recovered from a previous bout of the disease could nurse the sick without contracting the illness a second time. In the 18th century, Pierre-Louis Moreau de Maupertuis made experiments with scorpion venom and observed that certain dogs and mice were immune to this venom. This and other observations of acquired immunity were later exploited by Louis Pasteur in h

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 61/100
Q: How to Baptized Members become Professing Members?
A: confirmation and sometimes the profession of faith
?: The UMC practices infant and adult baptism. Baptized Members are those who have been baptized as an infant or child, but who have not subsequently professed their own faith. These Baptized Members become Professing Members through confirmation and sometimes the profession of faith. Individuals who were not previously baptized are baptized as part of their profession of faith and thus become Professing Members in this manner. Individuals may also become a Professing Member through transfer from another Christian denomination
❌
Question 62/100
Q: After asking if the books were his, what else did Eck ask Luther?
A: stood by their contents
?: Luther confirmed he was their author, but requested time to think about the answer to the second question. He prayed, consulted friends, and gave his response the next day:
❌
Question 63/100
Q: When did the Pope warned Luther 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 65/100
Q: What ABC action series went up against NBC's variety shows in Fall 1957?
A: Zorro
?: The series that ABC's western series went up against in the fall of 1957 were the variety shows that NBC aired in the fall of 1957
❌
Question 66/100
Q: What was the significance of victory at Forth Niagara for British?
A: cut off the French frontier forts further to the west and south
?: The victory at Forth Niagara was a major turning point in the war. It was the first time that the British had been able to cut off the French from the French-Canadian frontier. The victory was also a major turning point in the war because it was the first time that the British had been able to cut off the French from the French-Canadian frontier
❌
Question 67/100
Q: What is the Latin name for Black Death?
A: atra mors
?: Black Death
❌
Question 68/100
Q: What are the phagocytes that are located in tissues in contact with the external environment called?
A: Dendritic cells
?: Dendritic cells (DC) are p

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 69/100
Q: What profession was Nathan Alterman?
A: poet
?: Poet
✅
Question 70/100
Q: What type of vote must the Parliament have to either block or suggest changes to the Commission's proposals?
A: a majority
?: The Parliament must vote by a majority of all MEPs (not just those present) to block or suggest changes, and the Council must vote by qualified majority to approve changes, but by unanimity to block Commission amendment
❌
Question 71/100
Q: What percentage of money raised was earmarked for causes in the San Francisco area?
A: 25 percent
?: The Super Bowl 50 Host Committee has vowed to be "the most giving Super Bowl ever", and will dedicate 25 percent of all money it raises for philanthropic causes in the Bay Area. The committee created the 50 fund as its philanthropic initiative and focuses on providing grants to aid with youth development, community investment and sustainable environments
❌
Question 72/100
Q: In which year was the Musical Instruments gallery closed?
A: 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 73/100
Q: What scientific field's theory has received contributions from the steam engine?
A: thermodynamic
?: The steam engine contributed much to the development of thermodynamic theory; however, the only applications of scientific theory that influenced the steam engine were the original concepts of harnessing the power of steam and atmospheric pressure and knowledge of properties of heat and steam. The experimental measurements made by Watt on a model steam engine led to the development of the separate condenser. Watt independently discovered latent heat, which was confirmed by the original discoverer Joseph Black, who also advised Watt on experimental procedures. Watt was also aware of the change in the boiling point of water with pressure. Otherwise, the improvements to the engine itself were more mechanical in nature. The therm
❌
Question 74/100
Q: What type of organization would need large quantities of pure oxygen?
A: hospitals
?: An organization that needs large quan

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 77/100
Q: What type of people were being brought back to the way of truth?
A: misguided
?: The people who were brought back to the way of truth were those who had been brought up in the Lutheran church. They were people who had been brought up in the Lutheran church and had been brought up in the Lutheran church and had been brought up in the Lutheran church and had been brought up in the Lutheran church and had been brought up in the Lutheran church and had been brought up in the Lutheran church and had been brought up in the Lutheran church and had been brought up in the Lutheran church and had been brought up in the Lutheran church and had been brought up in the Lutheran church and had been brought up in the Lutheran church and had been
❌
Question 78/100
Q: What did the Greek root pharmakos imply?
A: sorcery or even poison
?: The Greek word pharmakos (φαρμακος) is derived from the Greek word pharma (φάρμακον), which means "drug", "medicine" (or "poison"). The word pharmakos

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 81/100
Q: What ranking does the Super Bowl 50 halftime show have on the list of most watched TV broadcasts?
A: third
?: The Super Bowl 50 halftime show was the third-most watched U.S. broadcast ever
❌
Question 82/100
Q: What music did the fourth soundtrack feature?
A: music from the 2008–2010 specials
?: The soundtrack for Series 5 featured music from the 2008–2010 specials (The Next Doctor to End of Time Part 2)
❌
Question 83/100
Q: What administration did Ludwig Mies van der Rohe designa buiding?
A: School of Social Service Administration
?: Ludwig Mies van der Rohe
❌
Question 84/100
Q: What was Manning's passer rating for the 2015 season?
A: 67.9
?: 67.9
✅


Question 85/100
Q: Issues dealt with at Westminster are not ones who is able to deal with?
A: Scottish Parliament
?: The Scottish Parliament is unable to legislate on such issues that are reserved to, and dealt with at, Westminster (and where Ministerial functions usually lie with UK Government ministers). These include abortion, broadcasting policy, civil service, common markets for UK goods and services, constitution, electricity, coal, oil, gas, nuclear energy, defence and national security, drug policy, employment, foreign policy and relations with Europe, most aspects of transport safety and regulation, National Lottery, protection of borders, social security and stability of UK's fiscal, economic and monetary system
❌
Question 86/100
Q: What was the wedding date?
A: 12 May 1191
?: 12 May 1191
✅
Question 87/100
Q: What did the development of this fertile soil provide in hostile environment?
A: agriculture and silviculture
?: The development of this fertile soil allowed agriculture

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 89/100
Q: Which country does the Rhine encounter it's main tributaries?
A: Germany
?: Germany
✅
Question 90/100
Q: What French animation studio did ABC purchase in 1993?
A: DIC Entertainment
?: DIC Entertainment
✅
Question 91/100
Q: What rive joins the Rhine in Duisburg? 
A: Ruhr
?: The Rhine in Duisburg is joined by the Ruhr, which is a tributary of the Rhine. The Ruhr is a tributary of the Rhine, which is a tributary of the Rhine. The Ruhr is a tributary of the Rhine, which is a tributary of the Rhine
❌
Question 92/100
Q: Which animal's venom did Pierre-Louis Moreau de Maupertuis work with?
A: scorpion
?: The venom of the scorpion, which is a type of arthropod
❌


Question 93/100
Q: In the capabilities approach, grow and income are considered a means to an end rather than what?
A: the end itself
?: In the capabilities approach, income is considered a means to an end rather than what?
❌
Question 94/100
Q: Where are Jersey and Guernsey
A: Channel Islands
?: Jersey and Guernsey are two of the Channel Islands. They are both located in the English Channel, between England and France
❌
Question 95/100
Q: What was the boat called?
A: teleautomaton
?: The boat was called the "Tesla-Boat."
❌
Question 96/100
Q: What did the SNP publicly opine about the oil revenues?
A: not benefitting Scotland as much as they should
?: The SNP argued that the revenues from the oil were not benefitting Scotland as much as they should. The combined effect of these events led to Prime Minister Wilson committing his government to some form of devolved legislature in 1974. However, it was not until 1978 that final legislative proposals for a Scottish Assembly were passed by th

In [8]:
eval(data[:100], model_gpt, tokenizer_gpt, "evalGPT_Neo.json", few_shot="three-shot", batch_size=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 1/100
Q: How many yards did the Panthers get for the division championshipt game?
A: 487
?: 487 yards
❌
Question 2/100
Q: What caused Jacksonville's tourism to become less desirable at the latter half of the 19th century?
A: yellow fever outbreaks
?: Yellow fever outbreaks
✅


Question 3/100
Q: What are pharmacists in the United Kingdom being increasingly paid for?
A: medicine use reviews
?: pharmacists in the United Kingdom are being increasingly paid for their services
❌
Question 4/100
Q: Geoglyphs dating to what period were found in deforested land along the Amazon River?
A: AD 0–1250
?: AD 0–1250
✅


Question 5/100
Q: Who does a gender pay gap tend to favor?
A: males in the labor market
?: Men
❌
Question 6/100
Q: What is the annual construction industry revenue in 2014?
A: $960 billion
?: $960 billion
✅


Question 7/100
Q: Which sculpture by Michelangelo has a full-size replica in the Cast Courts?
A: David
?: David
✅
Question 8/100
Q: Richard Allen and Absalom Jones were licensed by St. George's Church in what year?
A: 1784
?: 1784
✅


Question 9/100
Q: Where does the Rhine end?
A: Hoek van Holland
?: The Rhine ends at the North Sea
❌
Question 10/100
Q: The V&A has the world's most comprehensive collection of sculptures from which period?
A: post-classical European
?: The V&A has the world's most comprehensive collection of sculptures from which period?
❌


Question 11/100
Q: Which country rationed gasoline and heating gas?
A: Sweden
?: The UK
❌
Question 12/100
Q: The United States is divided into how many jurisdictions?
A: five
?: five
✅


Question 13/100
Q: What is an example of a machine model that deviates from a generally accepted multi-tape Turing machine?
A: random access machines
?: A Turing machine with a single tape and a single head
❌
Question 14/100
Q: What company developed the most successful steam engine indicator?
A: Charles Porter
?: Charles Richard
❌


Question 15/100
Q: Who dates rocks, precisely, within the stratigraphic section?
A: Geochronologists
?: geochronologists
✅
Question 16/100
Q: Who formulated the idea of clonal selection theory of immunity?
A: Frank Burnet
?: Frank Burnet
✅


Question 17/100
Q: What is the second most abundant element?
A: helium
?: Oxygen
❌
Question 18/100
Q: what was invented in 1880 that revolutionized warfare?
A: the machine gun
?: the machine gun
✅


Question 19/100
Q: When did Watt finish the development of his improvements to Newcomen's engine?
A: 1775
?: 1775
✅
Question 20/100
Q: What is the largest item from Italy that is part of the sculpture collection?
A: Chancel Chapel
?: The Chancel Chapel from Santa Chiara Florence
❌
Writing evalGPT_Neo.json


Question 21/100
Q: Antibodies are transferred to the gut of the infant through what means?
A: Breast milk or colostrum
?: through breast milk
❌
Question 22/100
Q: Which century was there a program to straighten the Rhine? 
A: 19th Century
?: 19th Century
✅


Question 23/100
Q: Where was the first horse racetrack located?
A: Pole Mokotowskie
?: in Praga
❌
Question 24/100
Q: What did the Islamic State proclaim itself in 2014?
A: a caliphate
?: The Islamic State
❌


Question 25/100
Q: What plants don't need light to make chloroplasts?
A: Gymnosperms
?: Gymnosperms
✅
Question 26/100
Q: Where was Dyrrachium located?
A: the Adriatic
?: Dyrrachium
❌


Question 27/100
Q: What is the expression used to denote a worst case complexity as expressed by time taken?
A: O(n2)
?: The worst case is when the input is sorted or sorted in reverse order, and the algorithm takes time O(n2) for this case. If we assume that all possible permutations of the input list are equally likely, the average time taken for sorting is O(n log n). The best case occurs when each pivoting divides the list in half, also needing O(n log n) time
❌
Question 28/100
Q: How would the capabilities approach achieve it's goal?
A: through increasing functionings
?: It would increase the freedom to enjoy functionings and increase the level of achieved well-being
❌


Question 29/100
Q: Within what variable is L constrained according to the space hierarchy theorem?
A: PSPACE
?: L is constrained according to the space hierarchy theorem
❌
Question 30/100
Q: What are the "Big Five" animals in Kenya?
A: lion, leopard, buffalo, rhinoceros, and elephant
?: The "Big Five" game animals of Africa, that is the lion, leopard, buffalo, rhinoceros, and elephant, can be found in Kenya and in the Masai Mara in particular. A significant population of other wild animals, reptiles and birds can be found in the national parks and game reserves in the country. The annual animal migration occurs between June and September with millions of animals taking part, attracting valuable foreign tourism. Two million wildebeest migrate a distance of 2,900 kilometres (1,802 mi) from the Serengeti in neighbouring Tanzania to the Masai Mara in Kenya, in a constant
❌


Question 31/100
Q: What is another way of referring to stators?
A: static discs
?: Turbine stators
❌
Question 32/100
Q: When was the new two-year affiliate agreement approved?
A: 2002
?: 2002
✅


Question 33/100
Q: What type of materials inside the cabin were removed to help prevent more fire hazards in the future?
A: flammable cabin and space suit materials
?: cabin interior
❌
Question 34/100
Q: How many nations contain "Amazonas" in their names?
A: States or departments in four nations contain "Amazonas" in their names.
?: nine nations
❌


Question 35/100
Q: What was Tesla's position with Westinghouse?
A: consultant
?: Tesla was hired by Westinghouse to be a consultant at the company's Pittsburgh labs
❌
Question 36/100
Q: What year were the first Doctor Who stories available on CD?
A: 1999
?: 1999
✅


Question 37/100
Q: What is the term for a task that generally lends itself to being solved by a computer?
A: computational problems
?: NP-complete
❌
Question 38/100
Q: What was happening to subscriber numbers in other areas of europe?
A: flattened
?: BSkyB's direct-to-home satellite service became available in 10 million homes in 2010, Europe's first pay-TV platform in to achieve that milestone. Confirming it had reached its target, the broadcaster said its reach into 36% of households in the UK represented an audience of more than 25m people. The target was first announced in August 2004, since then an additional 2.4m customers had subscribed to BSkyB's direct-to-home service. Media commentators had debated whether the figure could be reached as the growth in subscriber numbers elsewhere in Europe flattened
❌


Question 39/100
Q: Dutch architect Janjaap Ruijssenaars's performative architecture 3D-printed building is scheduled to be built when?
A: Working versions of 3D-printing building technology are already printing
?: 2014
❌
Question 40/100
Q: When was the most recent Super Bowl hosted in the South Florida/Miami area?
A: 2010
?: Super Bowl XLIV in 2010
❌
Writing evalGPT_Neo.json


Question 41/100
Q: The atomic number of the periodic table for oxygen?
A: 8
?: 8
✅
Question 42/100
Q: How were enemy prisoners used tactically by Mongol armies?
A: driving them in front of the army
?: They were used tactically by Mongol armies to divert resources from cities and towns by diverting certain rivers, taking enemy prisoners and driving them in front of the army, and adopting new ideas, techniques and tools from the people they conquered, particularly in employing Muslim and Chinese siege engines and engineers to aid the Mongol cavalry in capturing cities
❌


Question 43/100
Q: What is the CJEU's duty?
A: ensure that in the interpretation and application of the Treaties the law is observed
?: to interpret the treaties and to ensure that the law is observed
❌
Question 44/100
Q: Who was an important figure in the twentieth-century Islamic revival in India?
A: Sayyid Abul Ala Maududi
?: Sayyid Abul Ala Maududi
✅


Question 45/100
Q: When did the ban on cigarette advertising take effect for television networks?
A: January 2, 1971
?: January 2, 1971
✅
Question 46/100
Q: In what form is oxygen transported in smaller containers?
A: compressed gas
?: Liquid
❌


Question 47/100
Q: When rich countries trade with poor countries, whose wages increase?
A: workers in the poor countries
?: The wages of low-skilled workers in the rich countries
❌
Question 48/100
Q: Who led the Science and Environmental Policy Project?
A: Fred Singer
?: Fred Singer
✅


Question 49/100
Q: In what year did the NFL switch to a 16-game regular season?
A: 1978
?: 1978
✅
Question 50/100
Q: What does computational complexity theory most specifically seek to answer? 
A: computational problems
?: The complexity of a problem is the amount of time it takes to solve it
❌


Question 51/100
Q: What was the final score of Super Bowl 50? 
A: 24–10
?: 24–10
✅
Question 52/100
Q: What did Luther call the mass instead of sacrifice?
A: a gift
?: Luther called the mass a sacrifice, not a sacrifice
❌


Question 53/100
Q: What did Queen Elizabeth II open in Newcastle in 1981?
A: A bridge
?: the Tyne and Wear Metro
❌
Question 54/100
Q: What kind of disorders are the result of an overactive immune response?
A: autoimmune disorders
?: Autoimmune disorders
✅


Question 55/100
Q: Where hotel did the Panthers stay at?
A: San Jose Marriott.
?: Stanford University
❌
Question 56/100
Q: Why did Westinghouse not secure a patent for a similar motor?
A: decided Tesla's patent would probably control the market
?: Westinghouse did not want to be seen as a competitor to Tesla
❌


Question 57/100
Q: What words to they think are linked to the work Kenya?
A: Kirinyaga, Kirenyaa and Kiinyaa
?: Kenya
❌
Question 58/100
Q: When was the first known historical reference to immunity?
A: Athens in 430 BC
?: Thucydides
❌


Question 59/100
Q: What was the name of the Mongol Empire's legal code?
A: Yassa
?: Yassa
✅
Question 60/100
Q: When had the Six Ministries existed?
A: since the Sui and Tang dynasties
?: The Six Ministries were established in the Yuan dynasty (1271–1368) and continued to exist in the Ming dynasty (1368–1644)
❌
Writing evalGPT_Neo.json


Question 61/100
Q: How to Baptized Members become Professing Members?
A: confirmation and sometimes the profession of faith
?: The UMC practices infant and adult baptism. Baptized Members are those who have been baptized as an infant or child, but who have not subsequently professed their own faith. These Baptized Members become Professing Members through confirmation and sometimes the profession of faith. Individuals who were not previously baptized are baptized as part of their profession of faith and thus become Professing Members in this manner. Individuals may also become a Professing Member through transfer from another Christian denomination
❌
Question 62/100
Q: After asking if the books were his, what else did Eck ask Luther?
A: stood by their contents
?: Eck asked Luther if he stood by the contents of the books
❌


Question 63/100
Q: When did the Pope warned Luther of excommunication? 
A: 15 June 1520
?: 15 June 1520
✅
Question 64/100
Q: When large groups of people all boycott a system or don't pay taxes it can be considered?
A: coercive
?: No
❌


Question 65/100
Q: What ABC action series went up against NBC's variety shows in Fall 1957?
A: Zorro
?: The ABC series The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The Fugitive, The
❌
Question 66/100
Q: What was the significance of victory at Forth Niagara for British?
A: cut off the French frontier forts further to the west and south
?: British victories continued in all theaters in the Annus Mirabilis of 1759, when they finally captured Ticonderoga, James Wolfe defeated Montcalm at Quebec (in a battle that claimed the lives of both commanders), and victory at Fort Niagara successfully cut off the French frontier for

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question 67/100
Q: What is the Latin name for Black Death?
A: atra mors
?: pestilentia
❌
Question 68/100
Q: What are the phagocytes that are located in tissues in contact with the external environment called?
A: Dendritic cells
?: Dendritic cells (DC) are phagocytes in tissues that are in contact with the external environment; therefore, they are located mainly in the skin, nose, lungs, stomach, and intestines. They are named for their resemblance to neuronal dendrites, as both have many spine-like projections, but dendritic cells are in no way connected to the nervous system. Dendritic cells serve as a link between the bodily tissues and the innate and adaptive immune systems, as they present antigens to T cells, one of the key cell types of the adaptive immune system
❌


Question 69/100
Q: What profession was Nathan Alterman?
A: poet
?: Poet
✅
Question 70/100
Q: What type of vote must the Parliament have to either block or suggest changes to the Commission's proposals?
A: a majority
?: a majority of all MEPs (not just those present)
❌


Question 71/100
Q: What percentage of money raised was earmarked for causes in the San Francisco area?
A: 25 percent
?: 25%
❌
Question 72/100
Q: In which year was the Musical Instruments gallery closed?
A: 2010
?: 2010
✅


Question 73/100
Q: What scientific field's theory has received contributions from the steam engine?
A: thermodynamic
?: Thermodynamics
❌
Question 74/100
Q: What type of organization would need large quantities of pure oxygen?
A: hospitals
?: Oxygen storage methods include high pressure oxygen tanks, cryogenics and chemical compounds. For reasons of economy, oxygen is often transported in bulk as a liquid in specially insulated tankers, since one liter of liquefied oxygen is equivalent to 840 liters of gaseous oxygen at atmospheric pressure and 20 °C (68 °F). Such tankers are used to refill bulk liquid oxygen storage containers, which stand outside hospitals and other institutions with a need for large volumes of pure oxygen gas. Liquid oxygen is passed through heat exchangers, which convert the cryogenic liquid into gas before it enters the building. Oxygen is also stored and
❌


Question 75/100
Q: How much Saharan dust falls over the Amazon basin each year?
A: 27.7 million tons
?: 182 million tons
❌
Question 76/100
Q: How long after a banquet with Tugh Temur did Kusala die?
A: four days
?: four days
✅


Question 77/100
Q: What type of people were being brought back to the way of truth?
A: misguided
?: The people who were brought back to the way of truth were those who had been brought back to the way of truth by Luther
❌
Question 78/100
Q: What did the Greek root pharmakos imply?
A: sorcery or even poison
?: The Greek word pharmakos (φαρμακος) means "poison" or "poisonous". It is derived from the Greek word pharma (φάρμα), which means "drug" or "medicine". The word pharmakos is also used in the sense of "poison" in the sense of "poisonous" in the sense of "poisonous" in the sense of "poisonous" in the sense of "poisonous" in the sense of "poisonous" in the sense of "poisonous" in
❌


Question 79/100
Q: What was Fort Caroline renamed to after the Spanish attack?
A: San Mateo
?: Fort Caroline
❌
Question 80/100
Q: Which territory did Jochi try to protect that may have led to a dispute with his brothers?
A: Urgench
?: Khorasan
❌
Writing evalGPT_Neo.json


Question 81/100
Q: What ranking does the Super Bowl 50 halftime show have on the list of most watched TV broadcasts?
A: third
?: The Super Bowl 50 halftime show was the third-most watched U.S. broadcast ever
❌
Question 82/100
Q: What music did the fourth soundtrack feature?
A: music from the 2008–2010 specials
?: The soundtrack for Series 5 featured music from the 2008–2010 specials (The Next Doctor to End of Time Part 2)
❌


Question 83/100
Q: What administration did Ludwig Mies van der Rohe designa buiding?
A: School of Social Service Administration
?: Ludwig Mies van der Rohe
❌
Question 84/100
Q: What was Manning's passer rating for the 2015 season?
A: 67.9
?: 67.9
✅


Question 85/100
Q: Issues dealt with at Westminster are not ones who is able to deal with?
A: Scottish Parliament
?: The Scottish Parliament is unable to legislate on such issues that are reserved to, and dealt with at, Westminster (and where Ministerial functions usually lie with UK Government ministers). These include abortion, broadcasting policy, civil service, common markets for UK goods and services, constitution, electricity, coal, oil, gas, nuclear energy, defence and national security, drug policy, employment, foreign policy and relations with Europe, most aspects of transport safety and regulation, National Lottery, protection of borders, social security and stability of UK's fiscal, economic and monetary system
❌
Question 86/100
Q: What was the wedding date?
A: 12 May 1191
?: 12 May 1191
✅


Question 87/100
Q: What did the development of this fertile soil provide in hostile environment?
A: agriculture and silviculture
?: It allowed agriculture and silviculture in the previously hostile environment; meaning that large portions of the Amazon rainforest are probably the result of centuries of human management, rather than naturally occurring as has previously been supposed
❌
Question 88/100
Q: Who did the North declare for during the English civil war?
A: the King
?: Charles I
❌


Question 89/100
Q: Which country does the Rhine encounter it's main tributaries?
A: Germany
?: Germany
✅
Question 90/100
Q: What French animation studio did ABC purchase in 1993?
A: DIC Entertainment
?: DIC Entertainment
✅


Question 91/100
Q: What rive joins the Rhine in Duisburg? 
A: Ruhr
?: the Ruhr
❌
Question 92/100
Q: Which animal's venom did Pierre-Louis Moreau de Maupertuis work with?
A: scorpion
?: Scorpion
✅


Question 93/100
Q: In the capabilities approach, grow and income are considered a means to an end rather than what?
A: the end itself
?: The capabilities approach – sometimes called the human development approach – looks at income inequality and poverty as form of “capability deprivation”. Unlike neoliberalism, which “defines well-being as utility maximization”, economic growth and income are considered a means to an end rather than the end itself. Its goal is to “wid[en] people’s choices and the level of their achieved well-being” through increasing functionings (the things a person values doing), capabilities (the freedom to enjoy functionings) and agency (the ability to pursue valued goals)
❌
Question 94/100
Q: Where are Jersey and Guernsey
A: Channel Islands
?: Channel Islands
✅
Question 95/100
Q: What was the boat called?
A: teleautomaton
?: Tesla's "Teleautomaton"
❌
Question 96/100
Q: What did the SNP publicly opine about the oil revenues?
A: not benefitting Scotland as much as they should
?: The SNP argued that the revenues from the oil were not benefitting Scotland as much as they should
❌
Question 97/100
Q: How many days after Gagarin's flight did the US House Committee on Science and Astronautics meet?
A: one day
?: one day
✅
Question 98/100
Q: How can function problems typically be restated?
A: decision problems
?: The set of triples (a, b, c) such that the relation a × b = c holds. Deciding whether a given triple is a member of this set corresponds to solving the problem of multiplying two numbers
❌
Question 99/100
Q: What does Fortiter Defendit Triumphans mean?
A: Triumphing by a brave defence
?: Fortiter Defendit Triumphans
❌
Question 100/100
Q: Recent studies believe  that ctenophores are the sister lineage to what?
A: all other animals
?: Cnidaria
❌
Writing evalGPT_Neo.json
Accuracy: 33.0% -- 33 correct and 67 incorrect
Number of tokens generated: 51982 -- Time: 608.1826891899109 -- Tokens-Per-Sec: 85.47102856419532
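
In the log above, the prediction "the Ruhr" is marked incorrect against the reference "Ruhr", while "Scorpion" is accepted for "scorpion". This suggests that the comparison ignores case but not articles or punctuation. A more lenient scoring, in the style of the usual SQuAD evaluation (normalized exact match plus token-level F1), could be sketched as follows. The helper names `normalize_answer`, `exact_match` and `token_f1` are assumptions for illustration and are not part of this notebook's evaluation code.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True if both answers are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match, exactly one empty as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Pairs taken from the log above: (prediction, reference)
print(exact_match("the Ruhr", "Ruhr"))
print(exact_match("Charles I", "the King"))
print(token_f1("Tesla's \"Teleautomaton\"", "teleautomaton"))
```

With this scoring, "the Ruhr" would count as an exact match, and long predictions that contain the reference would receive partial F1 credit instead of a flat miss. Predictions that are semantically right but worded differently, such as "Charles I" for "the King", would still be scored as wrong.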


In [9]:
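# Drop the references to the evaluated tokenizer and model so their memory can be reclaimed.
tokenizer_gpt = None
model_gpt = None

# Force a garbage-collection pass; the return value is the number of unreachable objects found.
gc.collect()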

390
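
Dropping the references and calling `gc.collect()` frees the Python objects, but PyTorch's caching allocator may still hold the GPU memory blocks it has reserved. A minimal sketch of a fuller cleanup, assuming the model was loaded on a CUDA device, is shown below. The helper name `release_gpu_memory` is hypothetical, and the guarded import only exists so that the snippet also runs where PyTorch is not installed.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run without PyTorch installed
    torch = None


def release_gpu_memory() -> int:
    """Run garbage collection, then release cached CUDA blocks if a GPU is present.

    Returns the number of bytes still allocated by tensors on the current
    CUDA device, or 0 when no GPU (or no PyTorch) is available.
    """
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Hand unused cached blocks back to the driver.
        torch.cuda.empty_cache()
        return torch.cuda.memory_allocated()
    return 0


remaining_bytes = release_gpu_memory()
print(f"Bytes still allocated on the GPU: {remaining_bytes}")
```

Calling this before loading the next model to evaluate would make it less likely that leftover cached memory causes an out-of-memory error, although tensors that are still referenced elsewhere in the notebook would remain allocated.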