# Fine-Tuning Nemotron-3 8B using Low-Rank Adaptation (LoRA)
Nemotron-3 is a family of Large Language Models that performs well on a wide range of tasks. While the 8B-parameter base model is a strong baseline for many downstream tasks, it can lack domain-specific knowledge and has no access to proprietary or otherwise sensitive information. Fine-tuning is often used to adapt a model to one or more specific tasks so that it responds better to domain-specific prompts. This notebook walks through preparing a dataset and using Low-Rank Adaptation (LoRA) to fine-tune the base Nemotron-3 8B model from Hugging Face on that dataset.

The implementation of LoRA is based on the paper, [LoRA: Low-Rank Adaptation of Large Language Models](https://openreview.net/pdf?id=nZeVKeeFYf9) by Hu et al.
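
As a rough illustration of the idea in the paper, LoRA keeps the pretrained weight matrix `W0` frozen and learns a low-rank update `B A`, so the adapted layer computes `W0 x + (alpha / r) * B A x`. The sketch below uses tiny hand-written matrices in plain Python. The helper names (`matvec`, `lora_forward`) and the toy values are illustrative assumptions, not NeMo's implementation.

```python
# Minimal LoRA forward pass in plain Python (illustrative only, not NeMo code).

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(w0, a, b, x, alpha):
    """Compute W0 x + (alpha / r) * B (A x) for a rank-r adapter.

    w0: frozen pretrained weights, shape (d_out, d_in)
    a:  trainable down-projection, shape (r, d_in)
    b:  trainable up-projection, shape (d_out, r)
    """
    rank = len(a)
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    return [h + (alpha / rank) * u for h, u in zip(base, update)]


# Hypothetical toy values: d_in = d_out = 3, rank r = 1.
w0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
a = [[0.5, 0.5, 0.0]]
x = [2.0, 4.0, 6.0]

# B starts at zero in the paper, so the adapted layer initially matches the base layer.
b_init = [[0.0], [0.0], [0.0]]
y_init = lora_forward(w0, a, b_init, x, alpha=1.0)

# After training, B is non-zero and shifts the output.
b_trained = [[1.0], [0.0], [-1.0]]
y_trained = lora_forward(w0, a, b_trained, x, alpha=1.0)

print(y_init, y_trained)
```

Only `a` and `b` would be updated during fine-tuning, which is why the number of trainable parameters stays small relative to the frozen base model.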

# Getting the model
You will need to request access to the [Nemotron-3-8B-Base-4K Model](https://huggingface.co/nvidia/nemotron-3-8b-base-4k) through Hugging Face. 

Once you have access, download the model into a persistent storage location (either a Domino Dataset or an External Data Volume). You can do so by uncommenting and running the cells below.

In [2]:
# !curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash

In [3]:
# !sudo apt-get install git-lfs

In [4]:
# !git clone https://<USER_NAME>:<TOKEN>@huggingface.co/nvidia/nemotron-3-8b-base-4k <FOLDER_NAME>

**Set the path below to where your model is located**

In [1]:
MODEL_PATH = "/mnt/dgx-models/Nemotron-3-8B-Base-4k.nemo"

# Preparing the dataset
We will use LoRA to teach our model extractive question answering. The fine-tuning dataset needs to be converted to a `.jsonl` file that follows a specific format. Question-answering datasets are usually easiest to work with when each example provides a context (if applicable), a question, and the expected answer, though other downstream tasks work as well.

### Downloading the dataset
We will be using the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) reading comprehension dataset, consisting of questions posed by crowd workers on a set of Wikipedia articles, where the answer to every question is a segment of text. More information on [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) can be found on the project website or in the paper by Rajpurkar et al., "[Know What You Don’t Know: Unanswerable Questions for SQuAD](https://arxiv.org/pdf/1806.03822.pdf)".

In [2]:
DATA_DIR = "/mnt/code/data"

In [3]:
import os 
import wget
import sys

os.environ['OPENBLAS_NUM_THREADS'] = '8'
os.makedirs(DATA_DIR, exist_ok=True)
SQUAD_DIR = os.path.join(DATA_DIR, "SQuAD")
os.makedirs(SQUAD_DIR, exist_ok=True)

In [4]:
# Download the SQuAD dataset
!wget -nc https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
!wget -nc https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
!mv train-v1.1.json {SQUAD_DIR}
!mv dev-v1.1.json {SQUAD_DIR}

--2024-03-19 16:01:58--  https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30288272 (29M) [application/json]
Saving to: ‘train-v1.1.json’


2024-03-19 16:01:59 (98.1 MB/s) - ‘train-v1.1.json’ saved [30288272/30288272]

--2024-03-19 16:01:59--  https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4854279 (4.6M) [application/json]
Saving to: ‘dev-v1.1.json’


2024-03-19 16:02:00 (60.7 MB/s) - ‘dev-v1.1.json’ saved [4854279/4854279]



### Preprocessing the dataset
Datasets often need some preprocessing to convert them into a form ready for fine-tuning. LoRA (like all PEFT methods) expects at least two fields in each record of the `.jsonl` file: `input` and `output`. The `input` field should contain all the tokens the model needs to generate the `output`. For example, for extractive QA, the `input` should contain the context text as well as the question.

```
[
    {"input": "User: Context: [CONTEXT_1] Question: [QUESTION_1]\n\nAssistant:", "output": [ANSWER_1]},
    {"input": "User: Context: [CONTEXT_2] Question: [QUESTION_2]\n\nAssistant:", "output": [ANSWER_2]},
    {"input": "User: Context: [CONTEXT_3] Question: [QUESTION_3]\n\nAssistant:", "output": [ANSWER_3]},
]
```
Note that we use keywords such as `Context:` and `Question:` in the input to separate the text representing the context from the question. We also start each input with the keyword `User:` and end it with `\n\nAssistant:`. These are recommended because NeMo's instruction-tuned models are trained with a prefix of `User:` and a suffix of `\n\nAssistant:`.

The SQuAD dataset does not already reflect this, so let's go ahead and preprocess it to fit the above format. 

To do so, a processing script has been included with this project template. Feel free to take a look inside the `prompt_learning_squad_preprocessing.py` script.
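
To give a feel for what this kind of preprocessing does, here is a minimal sketch that turns one SQuAD-style example into an `input`/`output` record and serializes it as a single JSONL line. It is not the NeMo script itself: the function name `to_sft_record` and the toy example are assumptions made for illustration, and the prompt template simply mirrors the format described above.

```python
import json


def to_sft_record(context, question, answer):
    """Build one input/output record in the prompt format described above."""
    prompt = f"User: Context: {context} Question: {question}\n\nAssistant:"
    return {"input": prompt, "output": answer}


# Hypothetical toy example (not taken from SQuAD).
record = to_sft_record(
    context="The Eiffel Tower is located in Paris.",
    question="Where is the Eiffel Tower located?",
    answer="Paris",
)

# Each record becomes exactly one line of the .jsonl file.
jsonl_line = json.dumps(record)
print(jsonl_line)
```

Because `json.dumps` escapes the newlines inside the prompt, every record stays on one physical line, which is what the JSONL format requires.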

In [5]:
# Preprocess squad data
!python /opt/NeMo/scripts/dataset_processing/nlp/squad/prompt_learning_squad_preprocessing.py --sft-format --data-dir {SQUAD_DIR}

Saving train split to /mnt/data/SQuAD/squad_train.jsonl
100%|█████████████████████████████████| 87599/87599 [00:00<00:00, 167577.87it/s]
Saving val split to /mnt/data/SQuAD/squad_val.jsonl
100%|█████████████████████████████████| 10570/10570 [00:00<00:00, 173191.06it/s]
Saving test split to /mnt/data/SQuAD/squad_test_ground_truth.jsonl
100%|█████████████████████████████████| 10570/10570 [00:00<00:00, 148402.11it/s]
Saving test split to /mnt/data/SQuAD/squad_test.jsonl
100%|█████████████████████████████████| 10570/10570 [00:00<00:00, 172725.48it/s]


Let's create shortened train and validation files from the first 5,000 and 500 records respectively, and take a look at a few samples of the data to confirm the preprocessing is satisfactory.

In [6]:
# What the squad dataset looks like after processing
! head -5000 $SQUAD_DIR/squad_train.jsonl > $SQUAD_DIR/squad_short_train.jsonl
! head -500 $SQUAD_DIR/squad_val.jsonl > $SQUAD_DIR/squad_short_val.jsonl
! head -4 $SQUAD_DIR/squad_short_val.jsonl
! head -4 $SQUAD_DIR/squad_short_train.jsonl

{"input": "User: Context:Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. Question:Which NFL team represented the AFC at Super Bowl 50?\n\nAssistant:", "output": "Denver Broncos"}
{"input": "User: Context:Super Bowl 50 was an American football game to determine th
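
A quick programmatic check can complement eyeballing the samples. The sketch below assumes each line is a JSON object with string `input` and `output` fields and that the input ends with the `\n\nAssistant:` suffix, as described earlier. The helper name `check_sft_line` is hypothetical.

```python
import json


def check_sft_line(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(record, dict):
        return ["not a JSON object"]

    problems = []
    for field in ("input", "output"):
        if not isinstance(record.get(field), str):
            problems.append(f"missing or non-string field: {field}")
    prompt = record.get("input")
    if isinstance(prompt, str) and not prompt.endswith("\n\nAssistant:"):
        problems.append("input does not end with the Assistant: suffix")
    return problems


# Hypothetical records: one well-formed, one missing its output and suffix.
good = json.dumps({"input": "User: Context:c Question:q\n\nAssistant:", "output": "a"})
bad = json.dumps({"input": "User: Context:c Question:q"})

print(check_sft_line(good))
print(check_sft_line(bad))
```

To check a whole file, the same function could be applied to every line of `squad_short_train.jsonl` and `squad_short_val.jsonl`.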

# Training

Now that the model is available and the data is prepared, we are ready to start the training.

### Load Config

The NeMo toolkit uses a configuration file to make it easy to define and experiment with training parameters without having to change the code. For this project template, a default configuration for fine-tuning has been included.

We will start by loading in that configuration.

In [7]:
from omegaconf import OmegaConf

cfg = OmegaConf.load("/mnt/code/conf/nemotron-finetune-config.yaml")

With the config loaded, we can override certain settings for our environment. The default values should work, but here are some parameters that you may want to adjust:

* `cfg.trainer.precision` - The numerical precision used during fine-tuning. Higher precision may make the model more accurate, but it also uses more memory. If the fine-tuning process runs out of memory, try reducing the precision here.
* `cfg.trainer.devices` - The number of devices that will be used. If running on a multi-GPU system, increase this number as appropriate.
* `cfg.model.global_batch_size` - If using a higher GPU count or if additional GPU memory allows, this value can be increased for higher throughput. Note that larger batch sizes use more GPU memory.
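
To make the relationship between the device count and the global batch size concrete: in Megatron-style training, the global batch is split across data-parallel replicas and processed in micro-batches, with gradient accumulation covering the rest. The sketch below shows that arithmetic under the assumption of pure data parallelism (no tensor or pipeline parallelism). The `micro_batch_size` parameter and the function name are illustrative and are not mentioned in the list above.

```python
def grad_accumulation_steps(global_batch_size, micro_batch_size, devices):
    """Number of micro-batches each device processes per optimizer step.

    Assumes every device is one data-parallel replica.
    """
    per_step = micro_batch_size * devices
    if global_batch_size % per_step != 0:
        raise ValueError(
            f"global_batch_size ({global_batch_size}) must be divisible by "
            f"micro_batch_size * devices ({per_step})"
        )
    return global_batch_size // per_step


# Hypothetical values for illustration.
steps_one_gpu = grad_accumulation_steps(global_batch_size=8, micro_batch_size=1, devices=1)
steps_four_gpus = grad_accumulation_steps(global_batch_size=8, micro_batch_size=1, devices=4)
print(steps_one_gpu, steps_four_gpus)
```

With these toy numbers, moving from one device to four cuts the accumulation steps per optimizer update from 8 to 2 while keeping the same global batch size.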

One setting that you will want to update is `cfg.model.restore_from_path`. This should point to the `.nemo` file where your model is stored.

In [8]:
cfg.model.restore_from_path=MODEL_PATH

By default, this notebook doesn't use distributed training so we will set some environment variables accordingly. If you do choose to use distributed training methods, you may want to change the environment variables below.

In [9]:
os.environ["LOCAL_RANK"] = '0'
os.environ["RANK"] = '0'
os.environ["WORLD_SIZE"] = '1'

### Configure Training

We now load in our model and configure the trainer using the loaded config.

In [10]:
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder
from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig

trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()
model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg)
model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)
model.add_adapter(LoraPEFTConfig(model_cfg))

[NeMo I 2024-03-19 16:02:15 megatron_trainer_builder:51] Detected interactive environment, using NLPDDPStrategyNotebook


GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[NeMo W 2024-03-19 16:02:28 megatron_base_model:1104] The model: MegatronGPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-19 16:02:28 megatron_base_model:1104] The model: MegatronGPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-19 16:02:28 megatron_base_model:1104] The model: MegatronGPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-19 16:02:28 megatron_base_model:1104] The model: MegatronGPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make

[NeMo I 2024-03-19 16:02:28 megatron_init:241] Rank 0 has data parallel group : [0]
[NeMo I 2024-03-19 16:02:28 megatron_init:247] Rank 0 has combined group of data parallel and context parallel : [0]
[NeMo I 2024-03-19 16:02:28 megatron_init:252] All data parallel group ranks with context parallel combined: [[0]]
[NeMo I 2024-03-19 16:02:28 megatron_init:255] Ranks 0 has data parallel rank: 0
[NeMo I 2024-03-19 16:02:28 megatron_init:272] Rank 0 has context parallel group: [0]
[NeMo I 2024-03-19 16:02:28 megatron_init:275] All context parallel group ranks: [[0]]
[NeMo I 2024-03-19 16:02:28 megatron_init:276] Ranks 0 has context parallel rank: 0
[NeMo I 2024-03-19 16:02:28 megatron_init:287] Rank 0 has model parallel group: [0]
[NeMo I 2024-03-19 16:02:28 megatron_init:288] All model parallel group ranks: [[0]]
[NeMo I 2024-03-19 16:02:28 megatron_init:298] Rank 0 has tensor model parallel group: [0]
[NeMo I 2024-03-19 16:02:28 megatron_init:302] All tensor model parallel group ranks: 

[NeMo W 2024-03-19 16:02:28 megatron_base_model:1104] The model: MegatronGPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-19 16:02:28 megatron_base_model:1104] The model: MegatronGPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-19 16:02:28 megatron_base_model:1104] The model: MegatronGPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-19 16:02:28 megatron_base_model:1104] The model: MegatronGPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-19 16:02:28 megatron_base_model:1104] The model: MegatronGPTSFTModel() does not have field.name: tp_comm_atomic_ag in 

[NeMo I 2024-03-19 16:02:28 tokenizer_utils:191] Getting SentencePiece with model: /tmp/tmptbtbbirm/586f3f51a9cf43bc9369bd53fa08868c_a934dc7c3e1e46a6838bb63379916563_3feba89c944047c19d5a1d0c07a85c32_mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model
[NeMo I 2024-03-19 16:02:28 megatron_base_model:539] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0.


Loading distributed checkpoint with TensorStoreLoadShardedStrategy
[NeMo I 2024-03-19 16:04:19 nlp_overrides:1108] Model MegatronGPTSFTModel was successfully restored from /domino/edv/dgx-models/Nemotron-3-8B-Base-4k.nemo.
[NeMo I 2024-03-19 16:04:19 nlp_adapter_mixins:184] Before adding PEFT params:
      | Name  | Type     | Params
    -----------------------------------
    0 | model | GPTModel | 8.5 B 
    -----------------------------------
    0         Trainable params
    8.5 B     Non-trainable params
    8.5 B     Total params
    34,160.542Total estimated model params size (MB)
[NeMo I 2024-03-19 16:04:21 nlp_adapter_mixins:197] After adding PEFT params:
      | Name  | Type     | Params
    -----------------------------------
    0 | model | GPTModel | 8.6 B 
    -----------------------------------
    16.8 M    Trainable params
    8.5 B     Non-trainable params
    8.6 B     Total params
    34,227.651Total estimated model params size (MB)
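
The jump from 0 to roughly 16.8 M trainable parameters can be reproduced with a little arithmetic. The sketch below assumes a hidden size of 4096, 32 transformer layers, a LoRA rank of 32, and a single adapter on the fused key/query/value projection of each layer. None of these values are stated in this notebook; they are assumptions chosen because they are consistent with the summary above.

```python
def lora_param_count(hidden_size, num_layers, rank):
    """Trainable parameters for one LoRA adapter per layer on a fused QKV projection.

    Each adapter has a down-projection (hidden_size x rank) and an
    up-projection (rank x 3 * hidden_size), with no bias terms.
    """
    down = hidden_size * rank
    up = rank * 3 * hidden_size
    return num_layers * (down + up)


# Assumed values (not confirmed by this notebook).
trainable = lora_param_count(hidden_size=4096, num_layers=32, rank=32)
print(f"{trainable:,} trainable parameters (~{trainable / 1e6:.1f} M)")
```

Under these assumptions the adapters add 16,777,216 parameters, roughly 0.2% of the 8.5 B frozen parameters.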


### Configure experiment
We will also activate the experiment logging so that we can create checkpoints to resume from later on.

In [11]:
from nemo.utils.exp_manager import exp_manager

exp_dir = exp_manager(trainer, cfg.get("exp_manager", None))

[NeMo W 2024-03-19 16:04:21 exp_manager:759] No version folders would be created under the log folder as 'resume_if_exists' is enabled.


[NeMo I 2024-03-19 16:04:21 exp_manager:644] Resuming training from checkpoint: /mnt/nemo_experiments/megatron_gpt_peft_lora_tuning/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=0.391-step=1000-consumed_samples=1000.0-last.ckpt
[NeMo I 2024-03-19 16:04:21 exp_manager:396] Experiments will be logged at /mnt/nemo_experiments/megatron_gpt_peft_lora_tuning
[NeMo I 2024-03-19 16:04:21 exp_manager:842] TensorboardLogger has been set up


[NeMo W 2024-03-19 16:04:21 exp_manager:952] The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to 1000. Please ensure that max_steps will run for at least 1 epochs to ensure that checkpointing will not error out.


### Train model
Finally, we can train our model!

In [12]:
trainer.fit(model)

      rank_zero_warn(
    
      rank_zero_warn(
    
      rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
    


[NeMo I 2024-03-19 16:04:21 megatron_gpt_sft_model:767] Building GPT SFT validation datasets.
[NeMo I 2024-03-19 16:04:21 text_memmap_dataset:116] Building data files
[NeMo I 2024-03-19 16:04:21 text_memmap_dataset:525] Processing 1 data files using 2 workers
[NeMo I 2024-03-19 16:04:21 text_memmap_dataset:535] Time building 0 / 1 mem-mapped files: 0:00:00.105439
[NeMo I 2024-03-19 16:04:21 text_memmap_dataset:525] Processing 1 data files using 2 workers
[NeMo I 2024-03-19 16:04:21 text_memmap_dataset:535] Time building 0 / 1 mem-mapped files: 0:00:00.088355
[NeMo I 2024-03-19 16:04:21 text_memmap_dataset:158] Loading data files
[NeMo I 2024-03-19 16:04:21 text_memmap_dataset:249] Loading /mnt/data/SQuAD/squad_short_val.jsonl
[NeMo I 2024-03-19 16:04:21 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.001443
[NeMo I 2024-03-19 16:04:21 text_memmap_dataset:165] Computing global indices
[NeMo I 2024-03-19 16:04:21 megatron_gpt_sft_model:770] Length of val dataset: 20
[Ne

      counts = torch.cuda.LongTensor([1])
    


[NeMo I 2024-03-19 16:04:22 dataset_utils:1341]  > loading indexed mapping from /mnt/data/SQuAD/squad_short_train.jsonl_squad_short_train.jsonl_indexmap_1005mns_2046msl_0.00ssp_1234s.npy
[NeMo I 2024-03-19 16:04:22 dataset_utils:1344]     loaded indexed file in 0.000 seconds
[NeMo I 2024-03-19 16:04:22 dataset_utils:1345]     total number of samples: 1200
make: Entering directory '/opt/NeMo/nemo/collections/nlp/data/language_modeling/megatron'
make: Nothing to be done for 'default'.
make: Leaving directory '/opt/NeMo/nemo/collections/nlp/data/language_modeling/megatron'
[NeMo I 2024-03-19 16:04:22 blendable_dataset:67] > elapsed time for building blendable dataset indices: 0.10 (sec)
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 1, achieved: 1
[NeMo I 2024-03-19 16:04:22 megatron_gpt_sft_model:783] Length of train dataset: 1005
[NeMo I 2024-03-19 16:04:22 megatron_gpt_sft_model:788] Building dataloader with consumed samples: 1000
[NeMo I 2024-03-1

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


[NeMo I 2024-03-19 16:04:22 nlp_overrides:227] Configuring DDP for model parallelism.


[NeMo W 2024-03-19 16:04:22 megatron_base_model:1145] Ignoring `trainer.max_epochs` when computing `max_steps` because `trainer.max_steps` is already set to 1000.


[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_adapter
[NeMo I 2024-03-19 16:04:22 adapter_mixins:435] Unfrozen adapter : lora_kqv_


  | Name  | Type     | Params
-----------------------------------
0 | model | GPTModel | 8.6 B 
-----------------------------------
16.8 M    Trainable params
8.5 B     Non-trainable params
8.6 B     Total params
34,227.651Total estimated model params size (MB)
Restoring states from the checkpoint path at /mnt/nemo_experiments/megatron_gpt_peft_lora_tuning/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=0.391-step=1000-consumed_samples=1000.0-last.ckpt
Restored all states from the checkpoint at /mnt/nemo_experiments/megatron_gpt_peft_lora_tuning/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=0.391-step=1000-consumed_samples=1000.0-last.ckpt
      rank_zero_warn(
    


ValueError:  `val_check_interval` (50) must be less than or equal to the number of the training batches (5). If you want to disable validation set `limit_val_batches` to 0.0 instead. If you want to validate based on the total training batches, set `check_val_every_n_epoch=None`.
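
The `ValueError` above is raised because `val_check_interval` (50) is larger than the number of training batches available (5). In this run the log shows training resuming from a checkpoint that had already consumed 1000 of the 1005 training samples, which appears to be why so few batches remain. A minimal sketch of the constraint, using a hypothetical helper that is not part of NeMo or PyTorch Lightning:

```python
def safe_val_check_interval(val_check_interval, num_train_batches):
    """Clamp an integer val_check_interval so it never exceeds the batch count."""
    if num_train_batches <= 0:
        raise ValueError("there are no training batches to validate against")
    return min(val_check_interval, num_train_batches)


# Values taken from the error message above.
clamped = safe_val_check_interval(val_check_interval=50, num_train_batches=5)
print(clamped)
```

Possible remedies include lowering the validation interval in the trainer section of the config or starting from a fresh experiment directory so that training does not resume from an already finished run. Which one applies depends on your setup.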

# Evaluate
Now that we have finished fine-tuning, let's use the fine-tuned model to make some predictions on our test dataset.

### Load config
Just like with fine-tuning, we have prepared a config for this project template. Let's start by loading that in.

In [None]:
config_eval = OmegaConf.load("/mnt/code/conf/nemotron-eval-config.yaml")

We will point the config at the base model and at the LoRA checkpoint that was saved during fine-tuning.

In [None]:
CHECKPOINT_PATH="/mnt/code/nemo_experiments/megatron_gpt_peft_lora_tuning/checkpoints/megatron_gpt_peft_lora_tuning.nemo"
config_eval.model.restore_from_path=MODEL_PATH
config_eval.model.peft.restore_from_path=CHECKPOINT_PATH

### Load model
Now we load in the model and trainer that we will use for evaluation.

In [None]:
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronTrainerBuilder
from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig

trainer_eval = MegatronTrainerBuilder(config_eval).create_trainer()
eval_model_cfg = MegatronGPTSFTModel.merge_inference_cfg(config_eval.model.peft.restore_from_path, config_eval)
model_eval = MegatronGPTSFTModel.restore_from(config_eval.model.restore_from_path, eval_model_cfg, trainer=trainer_eval)
model_eval.load_adapters(config_eval.model.peft.restore_from_path)
model_eval.freeze()

print("Parameter count manually:\n", model_eval.summarize())

### Load test dataset
We load in the test dataset as well.

In [None]:
from torch.utils.data import DataLoader

# Build the test dataset(s) described in the eval config
_test_ds = model_eval._build_dataset(eval_model_cfg.data.test_ds, is_train=False)

# Wrap the first test dataset in a DataLoader for prediction
request_dl = DataLoader(
    dataset=_test_ds[0],
    batch_size=eval_model_cfg.data.test_ds.global_batch_size,
    collate_fn=_test_ds[0].collate_fn,
)

# Apply the inference settings, resolved from the config into a plain dict
config_inference = OmegaConf.to_container(config_eval.inference, resolve=True)
model_eval.set_inference_config(config_inference)

### Run predictions
And now it is time to run the predictions through the model and see the results!

**Keep in mind that the results you see may vary in quality. The hyperparameters in this notebook are not optimal and only serve as examples. Could the model be underfitting? Overfitting? The hyperparameters can be adjusted in the configs to improve performance. The point is that fine-tuning the out-of-the-box model for a general QA task is easy and straightforward with this workflow!**

In [None]:
response = trainer_eval.predict(model_eval, request_dl)
for batch in response:
    for s in batch['sentences']:
        print(f"{s}\n\n")
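
To go beyond reading the generations by eye, predictions can be scored against the reference answers. The sketch below implements the usual SQuAD-style exact-match and token-level F1 metrics in plain Python. It assumes each generated sentence contains the prompt followed by the answer after the final `Assistant:` marker. That output layout and the helper names are assumptions rather than documented NeMo behavior.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_answer(sentence, marker="Assistant:"):
    """Return the text after the last marker (assumed output layout)."""
    return sentence.rsplit(marker, 1)[-1].strip()


def exact_match(prediction, reference):
    """True if the normalized prediction equals the normalized reference."""
    return normalize(prediction) == normalize(reference)


def f1_score(prediction, reference):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical generation, shaped like the SQuAD sample shown earlier.
sentence = (
    "User: Context:... Question:Which NFL team represented the AFC?"
    "\n\nAssistant: The Denver Broncos."
)
answer = extract_answer(sentence)
print(exact_match(answer, "Denver Broncos"))
print(f1_score(answer, "Broncos"))
```

Averaging these two scores over the test set, with the references read from `squad_test_ground_truth.jsonl`, gives a simple way to compare runs with different hyperparameters.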