# H02C8b Information Retrieval and Search Engines: RAG Project

Welcome to the notebook companion for the IRSE project. You will find all starter code here. You are encouraged to use this code, as it has been confirmed to work for the RAG pipeline described in the assignment handout. However, you are certainly welcome to make any changes you see fit, provided that your code is written in Python and runs without issue.

**IMPORTANT**: Do not submit a notebook as your final solution; it will not be graded. Refer to the assignment handout for more information about the submission format.

**IMPORTANT**: If you are working in Colab, be mindful of your runtime usage. At the beginning of every session, go to the top menu bar in Colab and select **Runtime > Change runtime type > CPU (Python 3)**. This ensures that your session runs on CPU and that you do not waste your GPU allocation for the day. Google provides GPUs on a limited daily basis, and access resets every 24 hours. It is best to complete the TF-IDF/search component before loading models and running inference on the GPU runtime.


If you have any questions, feel free to email [Thomas](mailto:thomas.bauwens@kuleuven.be) or [Kushal](mailto:kushaljayesh.tatariya@kuleuven.be).

In [1]:
import os
import json

import datasets
import dotenv  # provided by the python-dotenv package

# Load variables from a local .env file (if present) into the environment.
dotenv.load_dotenv()
HF_TOKEN = os.getenv("HF_TOKEN")

# In Colab, read the token from the Secrets panel instead:
# from google.colab import userdata
# HF_TOKEN = userdata.get("HF_TOKEN")

if HF_TOKEN:
    print("Hugging Face token loaded successfully!")
else:
    print("Warning: HF_TOKEN not found in environment variables.")

Hugging Face token loaded successfully!


## RAG for recipe recommendation

We use the Hugging Face `datasets` library to load our data, since it integrates easily with `transformers`. If it is not installed yet, install it with `pip install datasets`.

Let's first download the recipes dataset. After that, we can load it with `datasets`.

In [4]:
!wget https://people.cs.kuleuven.be/~thomas.bauwens/irse_documents_2025_recipes.parquet

--2025-04-09 11:48:57--  https://people.cs.kuleuven.be/~thomas.bauwens/irse_documents_2025_recipes.parquet
Resolving people.cs.kuleuven.be (people.cs.kuleuven.be)... 134.58.40.32, 2a02:2c40:500:a030:c515:1337:0:32
Connecting to people.cs.kuleuven.be (people.cs.kuleuven.be)|134.58.40.32|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 119645533 (114M)
Saving to: ‘irse_documents_2025_recipes.parquet.2’


2025-04-09 11:49:19 (5.42 MB/s) - ‘irse_documents_2025_recipes.parquet.2’ saved [119645533/119645533]



In [5]:
# With data_files, load_dataset returns a DatasetDict whose only split is "train".
dataset = datasets.load_dataset("parquet", data_files="./irse_documents_2025_recipes.parquet")['train']

`datasets` allows us to index directly into a `Dataset` object and easily access the data associated with a sample. You can find more information about working with datasets [here](https://huggingface.co/docs/datasets/access).

In [6]:
print("One document:")
# Index directly into the Dataset: each row is returned as a dict of field -> value.
for k, v in dataset[0].items():
    print(f"'{k}' = {v}\n")

One document:
'name' = arriba baked winter squash mexican style

'ingredients' = winter squash, mexican seasoning, mixed spice, honey, butter, olive oil, salt

'steps' = make a choice and proceed with recipe, depending on size of squash , cut into half or fourths, remove seeds, for spicy squash , drizzle olive oil or melted butter over each cut squash piece, season with mexican seasoning mix ii, for sweet squash , drizzle melted honey , butter , grated piloncillo over each cut squash piece, season with sweet mexican spice mix, bake at 350 degrees , again depending on size , for 40 minutes up to an hour , until a fork can easily pierce the skin, be careful not to burn the squash especially if you opt to use sugar or butter, if you feel more comfortable , cover the squash with aluminum foil the first half hour , give or take , of baking, if desired , season with salt

'tags' = 60-minutes-or-less, time-to-make, course, main-ingredient, cuisine, preparation, occasion, north-american, side-

We can also load the queries file (`irse_queries_2025_recipes.json`), which contains the gold queries created by the instructors. You can use it to debug your retriever and estimate its mean average precision (MAP); see the project instructions for details.

In [7]:
!wget https://people.cs.kuleuven.be/~thomas.bauwens/irse_queries_2025_recipes.json
# Use a context manager so the file handle is closed after reading.
with open("./irse_queries_2025_recipes.json", "r") as f:
    queries = json.load(f)

--2025-04-09 11:49:20--  https://people.cs.kuleuven.be/~thomas.bauwens/irse_queries_2025_recipes.json
Resolving people.cs.kuleuven.be (people.cs.kuleuven.be)... 134.58.40.32, 2a02:2c40:500:a030:c515:1337:0:32
Connecting to people.cs.kuleuven.be (people.cs.kuleuven.be)|134.58.40.32|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 27562 (27K) [application/json]
Saving to: ‘irse_queries_2025_recipes.json.2’


2025-04-09 11:49:20 (616 KB/s) - ‘irse_queries_2025_recipes.json.2’ saved [27562/27562]



In [8]:
print(queries["queries"][0])

{'q': 'What temperature should I pre-heat my oven to when making chicken quesadillas?', 'r': [[167945, 1], [167954, 1], [21548, 1], [218187, 1], [168524, 1], [68174, 1], [34390, 1], [34410, 1], [85623, 1], [46749, 1], [83613, 1], [210101, 1], [192707, 1], [19157, 1], [46809, 1], [168697, 1], [139022, 1], [168732, 1], [151851, 1], [45356, 1], [45357, 1], [40241, 1], [6453, 1], [179511, 1], [168270, 1], [19788, 1], [223067, 1], [19339, 1], [98716, 1], [191431, 1], [25072, 1]], 'a': '375 is a good temperature, but can go as low as 350 or as high as 400. Adjust times accordingly (longer for lower temperatures).'}


The `queries` object is a dictionary whose `"queries"` key holds a list of dictionaries, each consisting of a query (`q`), an answer (`a`), and relevant documents (`r`). Each entry in `r` is a pair `[official_id, relevance]`: the first integer corresponds to the `official_id` field in the recipes dataset loaded above, and the second is a relevance score.
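Because each `r` entry pairs an `official_id` with a relevance score, MAP can be estimated by comparing your retriever's ranked `official_id` list against these gold ids. Below is a minimal sketch of uncut average precision, assuming that any relevance above 0 counts as relevant and that ranked lists contain no duplicate ids; the helper names are hypothetical, and the project instructions remain authoritative for the exact definition (for example, whether a rank cutoff applies).

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant hit
    # Relevant documents that were never retrieved contribute 0.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, gold_r) pairs, where gold_r is a query's 'r' list."""
    scores = []
    for ranked_ids, gold_r in runs:
        # Assumption: relevance > 0 means "relevant".
        relevant_ids = [doc_id for doc_id, relevance in gold_r if relevance > 0]
        scores.append(average_precision(ranked_ids, relevant_ids))
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with made-up ids (not from the real dataset).
runs = [
    ([3, 1, 7, 2], [[1, 1], [2, 1]]),
    ([5, 9], [[5, 1]]),
]
print(mean_average_precision(runs))
```

With the real data, each `ranked_ids` would be the `official_id` values your retriever returns for `queries["queries"][i]["q"]`, paired with that query's `r` list.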



Now that the dataset and queries have been loaded, you are free to implement your RAG pipeline. You may use any implementation of TF-IDF that you are familiar with. Keep in mind that the TF-IDF model must be fit on the recipes dataset provided above, although you are free to experiment with which field (or combination of fields) is most salient for retrieving relevant documents.

In [9]:
# TODO: implement TF-IDF
# TODO: fit TF-IDF model on recipes dataset
# TODO: implement nearest neighbors search
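As one possible starting point, here is a minimal pure-Python sketch of TF-IDF with cosine-similarity nearest-neighbour search. It assumes a simple lowercase alphanumeric tokenizer, raw term-frequency weighting, and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` (the default in scikit-learn's `TfidfVectorizer`). The names `TfidfIndex`, `document_text`, and the default field choice are hypothetical, and the toy recipes stand in for the real dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def document_text(doc, fields=("name", "ingredients")):
    """Concatenate the chosen fields of one recipe; which fields to use is up to you."""
    return " ".join(str(doc.get(field, "")) for field in fields)


class TfidfIndex:
    def __init__(self, texts):
        tokenized = [tokenize(t) for t in texts]
        n_docs = len(tokenized)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for tokens in tokenized for term in set(tokens))
        # Smoothed IDF: ln((1 + N) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        self.doc_vectors = [self._vectorize(tokens) for tokens in tokenized]

    def _vectorize(self, tokens):
        """Term frequency times IDF, L2-normalised; terms unseen at fit time are dropped."""
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {t: tf * self.idf[t] for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm > 0 else {}

    def search(self, query, k=5):
        """Return (doc_index, cosine_similarity) pairs for the k most similar documents."""
        q = self._vectorize(tokenize(query))
        scores = []
        for i, d in enumerate(self.doc_vectors):
            # Both vectors are unit length, so the dot product is the cosine similarity.
            scores.append((i, sum(w * d.get(t, 0.0) for t, w in q.items())))
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


# Toy recipes (made up) in the same dict-per-row shape as the dataset.
recipes = [
    {"name": "cajun chicken gumbo", "ingredients": "chicken, okra, flour, oil"},
    {"name": "shrimp tacos", "ingredients": "shrimp, tortillas, lime"},
    {"name": "vegetarian lasagna", "ingredients": "lasagna noodles, spinach, ricotta"},
]
index = TfidfIndex([document_text(r) for r in recipes])
print(index.search("a cajun style gumbo with an easy roux", k=2))
```

The returned positions index into the list you fit on, so with the real dataset you would map them back through each row's `official_id` field before computing MAP. Looping over every document per query is slow at full scale; a sparse-matrix implementation such as scikit-learn's `TfidfVectorizer` will be considerably faster.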

Once your TF-IDF model has been implemented and fit on the recipes dataset, you can experiment with retrieving the k most relevant documents for the queries below:

In [10]:
sample_queries = [
    "a cajun style gumbo with an easy roux",
    "I am feeling like eating shrimp tacos tonight. What's a good recipe?",
    "recipe for easy vegetarian lasagna",
    "How do I make spageti and meatballs?",
    "15 minute lunch recipe",
    "Give me suggestion for some easy vegetarian weeknight dinner recipes"
]

For a given query and set of relevant documents, you are also required to create a prompt that instructs a model to complete a certain task (e.g. recipe recommendation). You should experiment with the formatting of the prompt, as language models have been shown to be sensitive to the exact wording of instructions.
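One way to combine a query with retrieved documents is to render each recipe as a numbered context block and place the question after it. The sketch below assumes retrieved documents are dicts with the dataset's `name`, `ingredients`, and `steps` fields; the function name `build_rag_prompt` and the instruction wording are hypothetical starting points meant to be varied, not a recommended final prompt.

```python
def build_rag_prompt(query, retrieved_docs, fields=("name", "ingredients", "steps")):
    """Format retrieved recipes as numbered context blocks, followed by the query."""
    context_blocks = []
    for i, doc in enumerate(retrieved_docs, start=1):
        # Only include fields that are present in this document.
        lines = [f"{field}: {doc[field]}" for field in fields if field in doc]
        context_blocks.append(f"[Recipe {i}]\n" + "\n".join(lines))
    context = "\n\n".join(context_blocks) if context_blocks else "(no recipes retrieved)"
    return (
        "You are a helpful cooking assistant. Answer the question using only the "
        "recipes below. If they do not contain the answer, say so.\n\n"
        f"{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Toy document (made up) standing in for a retrieved recipe.
docs = [{"name": "easy vegetarian lasagna", "ingredients": "noodles, spinach, ricotta"}]
prompt_text = build_rag_prompt("recipe for easy vegetarian lasagna", docs)
print(prompt_text)
```

Keeping the template in one function makes it easy to compare variants (field selection, instruction phrasing, number of documents) while holding everything else fixed.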

In [11]:
prompt = f"""

YOUR PROMPT GOES HERE

"""

In [12]:
irrelevant_context = """
Richard Gary Brautigan (January 30, 1935 – c. September 16, 1984)
was an American novelist, poet, and short story writer. A prolific writer,
he wrote throughout his life and published ten novels, two collections of
short stories, and four books of poetry. Brautigan's work has been published
both in the United States and internationally throughout Europe, Japan,
and China. He is best known for his novels Trout Fishing in America (1967),
In Watermelon Sugar (1968), and The Abortion: An Historical Romance 1966 (1971).
"""

Before loading a model from the Hugging Face Hub, you will likely want to create an account at https://huggingface.co/ so that you can get an [**access token**](https://huggingface.co/docs/hub/security-tokens) for models which require identification before usage.

- On your personal machine, you can input this access token by running `huggingface-cli login` in a terminal window.

- In Colab, click the icon in the left sidebar that looks like a key, and *Add a secret* called `HF_TOKEN`.

If you don't do this, you risk running into a "Cannot access gated repo" error.

**IMPORTANT**: Only run the following code once you have implemented a working retrieval system. When you are ready to work with language models, go to the menu bar in Colab and select **Runtime > Change runtime type > T4 GPU**. If you find yourself working on tasks in this notebook that are not GPU-intensive, switch your runtime back to CPU to preserve access.


In [13]:
# ! uv pip -q install git+https://github.com/huggingface/transformers

In [14]:
# ! uv pip -q install datasets bitsandbytes accelerate xformers einops

In [15]:
import torch
import transformers
import numpy as np

from transformers import AutoTokenizer, AutoModelForCausalLM

The code below loads a Mistral 7B Instruct model and quantizes it to 4 bits with `bitsandbytes`. This keeps the model's memory footprint small enough to fit comfortably on a single GPU. Note that the call to `AutoModelForCausalLM.from_pretrained()` will take a while, as the model's weights must be downloaded from the Hugging Face Hub. You are not restricted to Mistral and are welcome to experiment with other models, though you will have more luck with chat and instruction-tuned variants.

In [16]:
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
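bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,                     # store the weights in 4 bits
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat data type
    bnb_4bit_compute_dtype=torch.bfloat16  # dtype used for the actual computations
)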
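To see why 4-bit quantization matters on a T4 GPU (16 GB of memory), a back-of-the-envelope estimate of weight storage helps. The sketch below assumes roughly 7.2 billion parameters for a "7B" model (an approximation) and ignores everything except the raw weights: activations, the KV cache, quantization constants, and framework overhead all add to the real footprint.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in GB (10**9 bytes), ignoring all overheads."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 7.2e9  # rough parameter count for a "7B" model (approximation)
for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit NF4", 4)]:
    print(f"{label:>10}: ~{weight_memory_gb(n_params, bits):.1f} GB")
```

At 16 bits the weights alone would take up most of a T4's memory, leaving little room for generation, whereas at 4 bits they need only about a quarter of that.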

In [None]:
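model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=bnb_config,  # apply the 4-bit config defined above
    device_map='auto'                # let accelerate place the weights on the available device(s)
)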



A tokenizer is required to convert strings into sequences of integer token IDs that can be passed as input to the model.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

In [None]:
input_string = prompt + sample_queries[0]

In [None]:
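# Tokenize without adding special tokens (e.g. no BOS token is prepended).
encoded_prompt = tokenizer(input_string, return_tensors="pt", add_special_tokens=False)
# Move the input tensors to the same device as the model.
encoded_prompt = encoded_prompt.to(model.device)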

This is the final generation step. The model produces its output autoregressively, running a forward pass through the entire network for each new token (up to `max_new_tokens`), so even after quantization it may take a while.

In [None]:
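# Generate up to 1000 new tokens; do_sample=True samples, so outputs differ between runs.
generated_ids = model.generate(**encoded_prompt, max_new_tokens=1000, do_sample=True)
# For decoder-only models the returned sequence begins with the prompt tokens.
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])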
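The `model.generate` call hides an autoregressive loop: each step scores the next token given everything generated so far, appends one token, and repeats until an end-of-sequence token or `max_new_tokens` is reached. The toy sketch below shows greedy decoding; the bigram score table is a hypothetical stand-in for a real model forward pass, and `do_sample=True` in the cell above samples from the distribution instead of taking the highest-scoring token. Real implementations typically also cache intermediate attention states (the KV cache) so that each step avoids recomputing the whole prefix.

```python
def greedy_generate(next_token_scores, prompt_ids, max_new_tokens, eos_id):
    """Greedy autoregressive decoding: one 'forward pass' per generated token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)        # stands in for a full model forward pass
        next_id = max(scores, key=scores.get)  # greedy: pick the highest-scoring token
        ids.append(next_id)
        if next_id == eos_id:                  # stop early at end-of-sequence
            break
    return ids


# Hypothetical bigram "model": scores for the next token given only the last token.
BIGRAM = {
    0: {1: 0.9, 3: 0.1},  # 0 plays the role of a start token
    1: {2: 0.8, 3: 0.2},
    2: {3: 0.7, 1: 0.3},
    3: {3: 1.0},          # 3 plays the role of end-of-sequence
}


def toy_scores(ids):
    return BIGRAM[ids[-1]]


print(greedy_generate(toy_scores, [0], max_new_tokens=10, eos_id=3))
```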

Even without additional context or reference documents, the model can generate fairly coherent recipe instructions. It is now up to you to experiment with the RAG framework and see whether relevant documents further improve the quality of the model's generations. Refer to the assignment handout for the exact questions we expect you to answer.