![NVIDIA Logo](images/nvidia.png)

# LoRA for Extractive Question Answering

In this notebook you will fine-tune GPT8B with LoRA to perform extractive question answering.

![Extract LoRA](images/extract_lora.png)

---

## Learning Objectives

By the time you complete this notebook you will be able to:
- Fine-tune a GPT8B model with LoRA for extractive question answering.

---

## Imports

In [1]:
import json

from llm_utils.models import LoraModels, Models
from llm_utils.nemo_service_models import NemoServiceBaseModel
from llm_utils.mocks import upload_qa as upload
from llm_utils.mocks import create_qa_lora_customization as create_customization

---

## List Models

In [2]:
LoraModels.list_models()

gpt8b: gpt-8b-000-lora
gpt43b: gpt-43b-002-lora


---

## Load Train Data From File

We will begin this notebook by loading the prompt and label data we created in the previous notebook. We will split it into training, validation, and test sets in the next section.

---

In [3]:
with open('data/squad_prompts_and_answers.json', 'r') as f:
    prompts_and_answers = json.load(f)

In [4]:
len(prompts_and_answers)

2349

---

## Split Data

In preparation for fine-tuning, let's split the data, which currently contains over 2000 samples. We'll create a training set of 1000 samples, a validation set of 200 samples, and a small test set of 20 samples.

In [5]:
train_n = 1000
val_n = 200
test_n = 20

train_end = train_n
val_end = train_end + val_n
test_end = val_end + test_n

train_prompts_and_answers = prompts_and_answers[:train_end]
val_prompts_and_answers = prompts_and_answers[train_end:val_end]
test_prompts_and_answers = prompts_and_answers[val_end:test_end]

In [6]:
len(train_prompts_and_answers)

1000

In [7]:
len(val_prompts_and_answers)

200

In [8]:
len(test_prompts_and_answers)

20
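The split above takes consecutive slices in file order, without shuffling. As a minimal sketch, the same logic can be wrapped in a helper that also guards against asking for more samples than are available. The name `sequential_split` is hypothetical and not part of `llm_utils`, and integers stand in for the real `(prompt, answer)` pairs.

```python
def sequential_split(data, train_n, val_n, test_n):
    """Slice data into consecutive, non-overlapping train/val/test lists."""
    needed = train_n + val_n + test_n
    if len(data) < needed:
        raise ValueError(f'Need {needed} samples but only {len(data)} are available')
    train_end = train_n
    val_end = train_end + val_n
    test_end = val_end + test_n
    return data[:train_end], data[train_end:val_end], data[val_end:test_end]


# Integers stand in for the 2349 (prompt, answer) pairs loaded earlier.
example_data = list(range(2349))
train_split, val_split, test_split = sequential_split(example_data, 1000, 200, 20)

# Samples left over after the three splits are simply unused.
unused_n = len(example_data) - len(train_split) - len(val_split) - len(test_split)
print(len(train_split), len(val_split), len(test_split), unused_n)
```

Because the slices are consecutive, no sample can appear in more than one split.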

---

## Exercise: Format Data for Fine-tuning

For this exercise, you will format `train_prompts_and_answers` and `val_prompts_and_answers` for NeMo Service fine-tuning.

As a reminder, NeMo Service expects data in JSON Lines (`jsonl`) format, with each line in the file in the following format:

```python
{"prompt": <prompt>, "completion": <completion/label>}
```

Your task is to populate the `qa_lora_train_data` and `qa_lora_val_data` lists with one dictionary for each data sample in `train_prompts_and_answers` and `val_prompts_and_answers` respectively, formatted as needed for NeMo Service LoRA fine-tuning.

If you get stuck, feel free to look at the solution below.

In [9]:
qa_lora_train_data = [{'prompt': prompt, 'completion': answer} for prompt, answer in train_prompts_and_answers]

In [10]:
qa_lora_val_data = [{'prompt': prompt, 'completion': answer} for prompt, answer in val_prompts_and_answers]

Here we see examples of data well-formatted for LoRA fine-tuning.

In [11]:
qa_lora_train_data[0]

{'prompt': 'The National Archives and Records Administration (NARA) is an independent agency of the United States government charged with preserving and documenting government and historical records and with increasing public access to those documents, which comprise the National Archives. NARA is officially responsible for maintaining and publishing the legally authentic and authoritative copies of acts of Congress, presidential proclamations and executive orders, and federal regulations. The NARA also transmits votes of the Electoral College to Congress.\nNARA is responsible for what collection of archives? answer: ',
 'completion': 'National Archives'}

In [12]:
qa_lora_val_data[0]

{'prompt': 'The Space Race was a 20th-century competition between two Cold War rivals, the Soviet Union (USSR) and the United States (US), for supremacy in spaceflight capability. It had its origins in the missile-based nuclear arms race between the two nations that occurred following World War II, enabled by captured German rocket technology and personnel. The technological superiority required for such supremacy was seen as necessary for national security, and symbolic of ideological superiority. The Space Race spawned pioneering efforts to launch artificial satellites, unmanned space probes of the Moon, Venus, and Mars, and human spaceflight in low Earth orbit and to the Moon. The competition began on August 2, 1955, when the Soviet Union responded to the US announcement four days earlier of intent to launch artificial satellites for the International Geophysical Year, by declaring they would also launch a satellite "in the near future". The Soviet Union beat the US to this, with th

---

## Write NeMo Customization Data to File

We will ultimately upload our LoRA fine-tuning data to the NeMo Service, where it can be used for fine-tuning. First we need to write it to file.

In [13]:
qa_nemo_train_filename = 'data/squad_nemo_train_prompts_and_answers_1000.jsonl'
qa_nemo_val_filename = 'data/squad_nemo_val_prompts_and_answers_200.jsonl'

In [14]:
with open(qa_nemo_train_filename, 'w') as f:
    for p_and_a in qa_lora_train_data:
        f.write(json.dumps(p_and_a) + '\n')

In [15]:
with open(qa_nemo_val_filename, 'w') as f:
    for p_and_a in qa_lora_val_data:
        f.write(json.dumps(p_and_a) + '\n')
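One quick way to confirm that a file is valid JSON Lines is to read it back and check that every line parses to a dictionary with the expected keys. The sketch below is self-contained: it writes two made-up samples to a temporary file rather than touching the notebook's data files, and `read_jsonl` is a hypothetical helper, not part of `llm_utils`.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, 'r') as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up samples standing in for qa_lora_train_data.
samples = [
    {'prompt': 'Context A\nQuestion A? answer: ', 'completion': 'Answer A'},
    {'prompt': 'Context B\nQuestion B? answer: ', 'completion': 'Answer B'},
]

# Write the samples the same way the notebook does: one JSON object per line.
with tempfile.NamedTemporaryFile('w', suffix='.jsonl', delete=False) as f:
    for sample in samples:
        f.write(json.dumps(sample) + '\n')
    tmp_path = f.name

records = read_jsonl(tmp_path)
os.remove(tmp_path)

# Each record should round-trip unchanged and carry exactly the two expected keys.
round_trip_ok = records == samples
keys_ok = all(set(record) == {'prompt', 'completion'} for record in records)
print(round_trip_ok, keys_ok)
```

Note that `json.dumps` escapes the newline inside each prompt, so every sample still occupies exactly one line in the file.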

---

## Upload Data to NeMo Service

With the data written to file in JSON Lines format, we can now upload it to NeMo Service. As we did earlier, we will mock this step.

In [16]:
train_response = upload(qa_nemo_train_filename)

In [17]:
train_response

{'id': 'f17e25cd-fd08-42b4-a508-12f48985be35',
 'name': 'data/squad_nemo_train_prompts_and_answers_1000.jsonl',
 'size': 834612,
 'number_of_samples': 1000,
 'format': 'jsonl',
 'usage_category': 'dataset',
 'org_id': 'abcdefghijkl',
 'user_id': 'abcdefghijklmnopqrstuvwxyz',
 'ready_at': '0001-01-01T00:00:00Z',
 'created_at': '2024-05-29T17:07:41.645836Z'}

In [18]:
val_response = upload(qa_nemo_val_filename)

In [19]:
val_response

{'id': '30655aa3-17de-41b1-8d73-ddd4a3fadded',
 'name': 'data/squad_nemo_val_prompts_and_answers_200.jsonl',
 'size': 172165,
 'number_of_samples': 200,
 'format': 'jsonl',
 'usage_category': 'dataset',
 'org_id': 'abcdefghijkl',
 'user_id': 'abcdefghijklmnopqrstuvwxyz',
 'ready_at': '0001-01-01T00:00:00Z',
 'created_at': '2024-05-29T17:07:46.835624Z'}
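The `id` field of each response is the file ID needed when launching a customization. Rather than copying the IDs by hand, they can be read from the response dictionaries. The sketch below uses abbreviated copies of the two responses shown above so that it runs on its own; the `*_example` names are illustrative only.

```python
# Abbreviated copies of the mock upload responses shown above.
train_response_example = {
    'id': 'f17e25cd-fd08-42b4-a508-12f48985be35',
    'name': 'data/squad_nemo_train_prompts_and_answers_1000.jsonl',
    'number_of_samples': 1000,
    'format': 'jsonl',
}
val_response_example = {
    'id': '30655aa3-17de-41b1-8d73-ddd4a3fadded',
    'name': 'data/squad_nemo_val_prompts_and_answers_200.jsonl',
    'number_of_samples': 200,
    'format': 'jsonl',
}

# Check that each upload registered the number of samples we wrote to file.
if train_response_example['number_of_samples'] != 1000:
    raise ValueError('Unexpected number of training samples')
if val_response_example['number_of_samples'] != 200:
    raise ValueError('Unexpected number of validation samples')

# Pull out the file IDs for later use.
training_file_id = train_response_example['id']
validation_file_id = val_response_example['id']
print(training_file_id)
print(validation_file_id)
```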

---

## Exercise: LoRA Fine-tune GPT8B for Extractive QA

For this exercise you will perform LoRA fine-tuning on GPT8B with the training and validation data you just wrote to file.

### Your Work Here

Correctly launch a (mock) LoRA customization using `create_customization` immediately below. On success, when you ascertain the customization ID, set the `customization_id` variable below to it for use later in the notebook.

In order to complete this task you'll need to pass `create_customization` the following arguments:
- `model`: This should be a LoRA fine-tuneable GPT8B model. You can use the `LoraModels` enum provided above if you wish.
- `training_dataset_file_id`: This should be the file ID returned to you above when you (mock) uploaded the training data to NeMo Service.
- `validation_dataset_file_id`: This should be the file ID returned to you above when you (mock) uploaded the validation data to NeMo Service.
- `adapter_dim`: Use the default value of `32`.
- `epochs`: Train for 1 epoch.
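As background on the `adapter_dim` argument in the list above: LoRA leaves a weight matrix of shape `d_out x d_in` frozen and trains a low-rank update made of two small matrices, `A` with shape `rank x d_in` and `B` with shape `d_out x rank`. The sketch below counts the trainable parameters this adds for a single matrix. It assumes that `adapter_dim` corresponds to the LoRA rank, which this notebook does not state explicitly, and the hidden size of 4096 is purely illustrative rather than the actual GPT8B dimension.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * d_in + d_out * rank


# Illustrative hidden size (an assumption, not the real GPT8B value).
d_in = d_out = 4096
rank = 32  # assumed to correspond to adapter_dim=32

full_params = d_in * d_out
lora_params = lora_param_count(d_in, d_out, rank)
fraction = lora_params / full_params
print(f'full: {full_params:,}  lora: {lora_params:,}  fraction: {fraction:.2%}')
```

Under these assumptions the adapter trains roughly 1.6% as many parameters as the full matrix, which is why LoRA fine-tuning is comparatively cheap.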

Note that we are providing a validation dataset explicitly here. If we did not, NeMo Service would simply use 10% of the training data we provide for validation.

If you get stuck, feel free to check out the *Solution* below.

In [20]:
create_customization(model=LoraModels.gpt8b.value,
                     training_dataset_file_id='f17e25cd-fd08-42b4-a508-12f48985be35',
                     validation_dataset_file_id='30655aa3-17de-41b1-8d73-ddd4a3fadded',
                     adapter_dim=32,
                     epochs=1)

'LoRA customization job for GPT8B succesfully launched! Customization ID: ebd552dc-a050-4987-afca-9136d45fbad1'

In [21]:
customization_id = 'ebd552dc-a050-4987-afca-9136d45fbad1'
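Instead of copying the customization ID by hand, it can be parsed out of the launch message. This is a minimal sketch that assumes `create_customization` returns the message as a string in the form displayed above; the message is copied verbatim here so the example runs on its own.

```python
import re

# Message copied verbatim from the mock output above.
message = ('LoRA customization job for GPT8B succesfully launched! '
           'Customization ID: ebd552dc-a050-4987-afca-9136d45fbad1')

# A UUID is five hyphen-separated groups of hex digits (8-4-4-4-12).
uuid_pattern = r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'

match = re.search(uuid_pattern, message)
if match is None:
    raise ValueError('No customization ID found in message')

parsed_customization_id = match.group(0)
print(parsed_customization_id)
```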

---

## Perform Extractive QA with GPT8B LoRA

Next we will try the LoRA fine-tuned GPT8B model for the extractive QA task. First we create a model instance, using the LoRA GPT8B base model and providing the model customization ID ascertained from NeMo Service.

In [22]:
gpt8b_lora = NemoServiceBaseModel(LoraModels.gpt8b.value, customization_id=customization_id)

### Sanity Check

Let's try a single QA prompt on the LoRA fine-tuned GPT8B model.

In [23]:
prompt, label = test_prompts_and_answers[10]

In [24]:
prompt

"The Sahara (Arabic: الصحراء الكبرى\u200e, aṣ-ṣaḥrāʾ al-kubrā\u202f, 'the Greatest Desert') is the largest hot desert in the world. It is the third largest desert after Antarctica and the Arctic. Its surface area of 9,400,000 square kilometres (3,600,000 sq mi)[citation needed]—including the Libyan Desert—is comparable to the respective land areas of China or the United States. The desert comprises much of the land found within North Africa, excluding the fertile coastal region situated against the Mediterranean Sea, the Atlas Mountains of the Maghreb, and the Nile Valley of Egypt and Sudan. The Sahara stretches from the Red Sea in the east and the Mediterranean in the north, to the Atlantic Ocean in the west, where the landscape gradually transitions to a coastal plain. To the south, it is delimited by the Sahel, a belt of semi-arid tropical savanna around the Niger River valley and Sudan Region of Sub-Saharan Africa. The Sahara can be divided into several regions, including the weste

In [25]:
label

'9,400,000 square kilometres (3,600,000 sq mi)'

In [26]:
gpt8b_lora.generate(prompt).strip()

'9,400,000 square kilometres (3,600,000 sq mi)'

At a glance, the LoRA fine-tuned GPT8B model appears to be doing well. Unlike the base GPT8B model we used in the previous notebook, its response does not go on and on: the answer is extracted directly from the text and is correct.

### Try on Test Data

Now let's try the fine-tuned GPT8B model on the full test set.

In [27]:
for prompt, answer in test_prompts_and_answers:
    response = gpt8b_lora.generate(prompt).strip()
    print(f'Response: {response}')
    print(f'Label: {answer}\n')

Response: 
Label: Chekiang

Response: 1503
Label: 1503

Response: 1999, the Detroit River
Label: Detroit River

Response: 19th century revival of Georgian architecture
Label: Colonial Revival

Response: 1) volcanic
Label: volcanic

Response: 1588
Label: 1588

Response: ‘The Church of Jesus Christ of Latter-day Saints’
Label: The Church of Jesus Christ of Latter-day Saints

Response: 1,214 km (754 mi) long and considered the longest uninterrupted border within the European Union.
Label: Atlantic Ocean

Response: 161,785 people resided on Guam
Label: United States

Response: 1. spheres 2. rods 3. spirals
Label: spheres to rods and spirals

Response: 9,400,000 square kilometres (3,600,000 sq mi)
Label: 9,400,000 square kilometres (3,600,000 sq mi)

Response: 55% are non-denominational Muslims
Label: (55%) are non-denominational Muslims

Response: 1. Roman Catholicism and 2. Eastern Orthodoxy
Label: Roman Catholicism and Eastern Orthodoxy

Response: furthering a particular social cause or 

### Analysis

The LoRA fine-tuned GPT8B model is not performing perfectly, but it is doing a relatively good job. At times its answers are incorrect, and it sometimes formats its responses as numbered lists, but for the most part it is able to perform the task we would like. We will be interested to see how it performs on the task we actually intend it for, rather than on the SQuAD questions.
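To put rough numbers on this impression, responses can be scored against labels with a normalized exact match and a token-level F1, in the spirit of the metrics commonly used for SQuAD. The sketch below is a simplified illustration, not the official evaluation script. The helper names are hypothetical, and the five pairs are copied from the test output above.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip ASCII punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())


def exact_match(response, label):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(response) == normalize(label))


def token_f1(response, label):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(response).split()
    gold_tokens = normalize(label).split()
    if not pred_tokens or not gold_tokens:
        # An empty side only scores when both sides are empty.
        return float(pred_tokens == gold_tokens)
    common = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# (response, label) pairs copied from the test output above.
pairs = [
    ('1503', '1503'),
    ('1999, the Detroit River', 'Detroit River'),
    ('1) volcanic', 'volcanic'),
    ('1588', '1588'),
    ('55% are non-denominational Muslims', '(55%) are non-denominational Muslims'),
]

mean_em = sum(exact_match(r, l) for r, l in pairs) / len(pairs)
mean_f1 = sum(token_f1(r, l) for r, l in pairs) / len(pairs)
print(f'Exact match: {mean_em:.2f}  F1: {mean_f1:.2f}')
```

Token-level F1 gives partial credit to responses such as `1) volcanic` that contain the label alongside extra tokens, while exact match counts them as misses.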