# Getting started with RAG in Python

This notebook gives a basic introduction to Retrieval-Augmented Generation (RAG) with a small LLM in Python. The intent is to provide an extremely transparent (if simple) run-through of a basic RAG setup. It is not intended as a technical reference for production RAG-based systems: for those, you should consider a dedicated vector database (e.g. Qdrant, Chroma, Vespa) and dedicated LLM tooling such as LangChain. To get started, make sure you're using Python 3.10 or later, then install the following packages:

In [None]:
!pip uninstall -y transformers
!pip install git+https://github.com/huggingface/transformers
#!pip install openai==0.28
!pip install torch
!pip install scikit-learn
!pip install accelerate==0.31.0 # installed to fix the error "cannot import name 'split_torch_state_dict_into_shards' from 'huggingface_hub'"

Found existing installation: transformers 4.46.3
Uninstalling transformers-4.46.3:
  Successfully uninstalled transformers-4.46.3
Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-qvbpkvof
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-qvbpkvof
  Resolved https://github.com/huggingface/transformers to commit ca03842cdcf2823301171ab27aec4b6b1cafdbc1
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tokenizers<0.22,>=0.21 (from transformers==4.48.0.dev0)
  Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)

## Concepts

The basic RAG pattern involves two key components: the retriever and the generator. The generator is typically an LLM; in this case it will be a small LLM from Hugging Face. The retriever can be any external data source, but is commonly built around some form of vector database loaded with _embeddings_. However it is set up, the aim is for the retriever to find relevant documents (or snippets of documents) in its data store and use them to _augment_ the input (prompt) to the generator, allowing it to produce better responses. Sounds simple enough, right?
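Before bringing in any models, the three stages can be sketched in plain Python. This is a toy illustration under loud assumptions: `toy_retrieve` ranks documents by shared words instead of embeddings, and `toy_generate` is a hypothetical stand-in that simply echoes the first retrieved snippet rather than calling an LLM. None of these names are part of the notebook.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric 'words'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(query, docs, k=2):
    """Rank docs by how many words they share with the query (a stand-in for embedding similarity)."""
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def toy_augment(query, retrieved_docs):
    """Prepend the retrieved snippets to the question."""
    context = "\n".join(retrieved_docs)
    return f"Here is some relevant information:\n{context}\n\nQ: {query}\nA:"


def toy_generate(prompt):
    """Hypothetical stand-in for an LLM: echoes the first context line instead of generating text."""
    return prompt.splitlines()[1]


docs = [
    "JUICE launched in April to study Jupiter's icy moons.",
    "Chandrayaan-3 landed near the lunar south pole in August.",
    "Finland joined NATO in April 2023.",
]
query = "Where did Chandrayaan-3 land?"
prompt = toy_augment(query, toy_retrieve(query, docs, k=1))  # R + A
print(toy_generate(prompt))                                  # G
# prints: Chandrayaan-3 landed near the lunar south pole in August.
```

Swapping `toy_retrieve` for an embedding-based search and `toy_generate` for a real LLM gives the setup built in the rest of this notebook.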


## The data

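For this example, you'll use some short snippets about recent events. The first snippets, about space missions in 2023, were taken from [Wikipedia](https://en.wikipedia.org/wiki/2023_in_spaceflight) on 29th October 2023 with small modifications; later snippets cover other notable events from 2023 and 2024. At the time this notebook was written, GPT-4 had access to data up to October 2021, so all of these events were outside its parametric memory. The model used below, Phi-3.5-mini-instruct, reports a knowledge cutoff of March 2023 in its own answer at the end of this notebook, so many of these events are likely outside its parametric memory too. Here are the documents: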

In [None]:
documents = [
  "On 14 April, ESA launched the Jupiter Icy Moons Explorer (JUICE) spacecraft to explore Jupiter and its large ice-covered moons following an eight-year transit.",
  "ISRO launched its third lunar mission Chandrayaan-3 on 14 July 2023 at 9:05 UTC; it consists of lander, rover and a propulsion module, and successfully landed in the south pole region of the Moon on 23 August 2023.",
  "Russian lunar lander Luna 25 was launched on 10 August 2023, 23:10 UTC, atop a Soyuz-2.1b rocket from the Vostochny Cosmodrome, it was the first Russian attempt to land a spacecraft on the Moon since the Soviet lander Luna 24 in 1974, it crashed on the Moon on 19 August after technical glitches.",
  "JAXA launched SLIM (Smart Lander for Investigating Moon) lunar lander (carrying a mini rover) and a space telescope (XRISM) on 6 September.",
  "The OSIRIS-REx mission returned to Earth on 24 September with samples collected from asteroid Bennu.",
  "NASA launched the Psyche spacecraft on 13 October 2023, an orbiter mission that will explore the origin of planetary cores by studying the metallic asteroid 16 Psyche, on a Falcon Heavy launch vehicle.",
  "Sometime in mid-2023, NASA further developed its preparations for the Europa Clipper mission, which aims to examine Jupiter's moon Europa, focusing on its icy surface and whether it could be habitable. Launch is scheduled for October 2024, with the objective of determining whether Europa's subsurface ocean could support life.",
  "Following ISRO's third lunar mission, Chandrayaan-3 (launched on 14 July 2023 at 9:05 UTC), ISRO launched its first-ever solar mission, Aditya-L1, in September 2023. It was designed to examine the Sun's corona (corona being Latin for crown), the chromosphere (the second of the Sun's three main atmospheric layers, literally the 'sphere of colour', lying above the photosphere between roughly 400 km (250 miles) and 2,100 km (1,300 miles) above the Sun's surface), and solar wind interactions.",
  "Scientists produced the very first neutrino map of our galaxy, the Milky Way, using the IceCube Neutrino Observatory (neutrinos are fundamental particles with virtually no mass). This major discovery helps reveal high-energy cosmic events, offering another way to examine the galaxy beyond traditional visible-light imaging.",
  "The James Webb Space Telescope (JWST) identified galaxies that challenge our current understanding of the early Universe; JWST's observations imply that these galaxies formed much faster than previously thought possible, just 500-700 million years after the Big Bang, and were also more massive.",
  "Private space firms such as SpaceX have been using and examining reusable rocket technology to push the boundaries of commercial space travel further. SpaceX is preparing for further Starship test flights, while various Chinese companies and others such as Virgin Galactic are also testing reusable rocket technologies.",
  "Blue Origin, a commercial space company owned by Amazon founder Jeff Bezos, flew a suborbital mission called NS-28 in November 2024, with Emily Calandrelli becoming the 100th woman in space, marking a milestone.",
  "The European Space Agency (ESA) launched a mission in October 2024 to examine the asteroid Didymos and analyze the effects of NASA's DART (Double Asteroid Redirection Test) mission on the asteroid system. DART, launched at 10:21 p.m. PST on 23 November 2021 (1:21 a.m. EST, 24 November) aboard a SpaceX Falcon 9 rocket from Vandenberg Space Force Base in California, tested asteroid deflection to protect Earth by deliberately crashing a spacecraft (a kinetic impactor) into the asteroid system on 26 September 2022.",
  "The Japanese company ispace is expecting to launch its ispace Mission 2 in December 2024 to demonstrate a lunar landing and deploy various payloads, including scientific instruments.",
  "SpaceX launched 1,984 Starlink satellites across 63 missions in 2023, deploying upgraded Starlink V2 Mini satellites. These new models offer quadruple the data capacity compared to previous versions, featuring enhancements like inter-satellite laser links and Argon-based Hall Effect thrusters.",
  "NASA's Psyche spacecraft was launched on October 13, 2023, on a Falcon Heavy rocket to study the metal-rich asteroid 16 Psyche, aiming to explore planetary core formation.",
  "SpaceX executed significant launches in 2023. These included carrying ESA's Euclid telescope, designed to study dark energy and dark matter, and deploying military payloads such as a GPS III satellite.",
  "ESA's JUICE Mission launched on April 14, 2023, to explore Jupiter and its moons, marking the beginning of an eight-year journey.",
  "The release of GPT-4 by OpenAI in March 2023 introduced multimodal capabilities, accepting both text and image inputs.",
  "Tesla's Cybertruck deliveries began in November 2023 after years of delays.",
  "Finland officially joined NATO on April 4, 2023, following its application in response to Russia's actions in Ukraine.",
  "On September 10, 2023, India hosted the G20 Summit, emphasizing the Global South's role in world affairs.",
  "Lionel Messi led Inter Miami to their first Leagues Cup title in August 2023, boosting soccer's popularity in the United States.",
  "The 2024 Paris Olympics torch relay began in late 2023, marking preparations for the Summer Games.",
  "UN's Beyond Oil and Gas Alliance gained traction in 2024, aiming to phase out fossil fuel investments amid rising climate crises.",
  "2023 saw July declared the hottest month on record, with global average temperatures exceeding pre-industrial levels.",
  "Barbie (2023) by Greta Gerwig became one of the highest-grossing films of all time, sparking global conversations on gender roles and capitalism. A major Hollywood blockbuster.",
  "Taylor Swift's Eras Tour generated a massive cultural and economic impact, creating what has been dubbed a 'Swift-conomy' in various cities.",
  "The 2023 Nobel Prize in Physiology or Medicine was awarded to Katalin Karikó and Drew Weissman for their development of mRNA technology used in COVID-19 vaccines.",
  "WHO (World Health Organisation) declared an end to the global COVID-19 emergency in May 2023.",
  "On October 14, 2023, a rare Ring of Fire solar eclipse was visible across parts of the Americas.",
  "Kylie Jenner's Timothée Chalamet romance made headlines in the celebrity world in mid-2023.",
  "A second malaria vaccine, R21, was approved, with plans for rollout in 2024.",
  "Finland joined NATO in April 2023, doubling NATO's border with Russia.",
  "BRICS announced the addition of six new members starting in 2024, including Saudi Arabia, United Arab Emirates, Argentina, Ethiopia, Iran, and Egypt.",
  "NASA's OSIRIS-REx mission successfully returned asteroid samples to Earth on September 24, 2023.",
  "2023 saw the world's first High Seas Treaty aimed at protecting international waters, and global shipping began steps toward decarbonization with new IMO (International Maritime Organization) policies.",
  "Extreme weather events, such as record heat waves, highlighted the growing impact of climate change.",
  "Inflation moderated in several regions, but challenges persisted globally, especially around housing and energy costs.",
  "Remote work patterns stabilized, reshaping cities and economies.",
  "The Maui wildfires in August 2023 became the deadliest in U.S. history.",
  "Catastrophic floods affected Libya, Pakistan, and parts of the Horn of Africa, displacing thousands.",
  "Youth Leadership: The world reached a record high of young people among its population, with significant contributions from youth-led climate and social movements globally.",
  "The 2023 Nobel Prize winners are as follows: Physiology or Medicine: Katalin Karikó and Drew Weissman, awarded for their pioneering work on mRNA vaccine technology, which was crucial in the development of COVID-19 vaccines; Physics: Pierre Agostini, Ferenc Krausz, and Anne L'Huillier, recognized for experimental methods that generate attosecond pulses of light, enabling the study of electron dynamics in matter; Chemistry: Moungi G. Bawendi, Louis E. Brus, and Alexei I. Ekimov, honored for the discovery and synthesis of quantum dots, which have applications in technologies such as medical imaging and optoelectronics; Literature: Jon Fosse, a Norwegian author and playwright, awarded for his innovative plays and prose, which delve into existential themes and give voice to the unsayable; Peace: Narges Mohammadi, an Iranian human rights advocate, recognized for her efforts to combat the oppression of women in Iran and her activism in promoting human rights and freedom for all; and finally Economics: Claudia Goldin, honored for her work on gender and labor economics, which has significantly advanced the understanding of women's participation in the labor force over centuries.",
  "The 2024 Nobel Prize winners are as follows: Physics: John Hopfield and Geoffrey Hinton received the award for foundational discoveries and inventions that enable machine learning with artificial neural networks. Chemistry: David Baker was honored for computational protein design; Demis Hassabis and John Jumper shared the other half for predicting protein structures using AI (AlphaFold). Physiology or Medicine: Victor Ambros and Gary Ruvkun were recognized for discovering microRNA and elucidating its regulatory roles in gene expression. Literature: Han Kang, from South Korea, received the prize for her poetic prose that reflects on historical traumas and human fragility. Peace Prize: Awarded to Nihon Hidankyo, representing atomic bomb survivors (Hibakusha), for efforts towards global nuclear disarmament. Economic Sciences: Daron Acemoglu, Simon Johnson, and James Robinson were recognized for their studies on the formation and impact of institutions on economic prosperity.",
  "In July 2024, a major IT (Information Technology) outage occurred, caused by a defective update to CrowdStrike's Endpoint Detection and Response (EDR) software. The update caused computers running the Windows operating system (OS) to malfunction and crash, affecting approximately 8.5 million devices worldwide. The outage had widespread implications, disrupting daily life, businesses, and government operations across various industries, including airlines, banks, hospitals, and hotels. The health care and banking sectors were particularly hard-hit, with estimated financial losses of $1.94 billion and $1.15 billion, respectively. CrowdStrike has since addressed the issue and implemented measures to prevent future occurrences.",
  "Donald Trump won the November 2024 US presidential election.",
  "OpenAI's CEO Sam Altman was temporarily ousted and then reinstated in late 2023.",
  "Apple's Vision Pro virtual and augmented reality (VR and AR) headset was released in early 2024.",
  "Politics & Global Events: Ongoing Ukraine War - Russia's invasion continued with key counteroffensives and shifting frontlines. Israel-Hamas Conflict - War erupted in October 2023, with global calls for a ceasefire amid heavy casualties. Sudan Civil War - Fighting broke out in April 2023 between the Sudanese Armed Forces and the Rapid Support Forces (RSF). French Protests - Massive strikes erupted in March-April 2023 over pension reform (raising the retirement age). Indian Elections 2024 - Prime Minister Narendra Modi secured a third term. US Politics - Preparations for the 2024 Presidential Elections intensified with candidates campaigning. COP28 in Dubai - Hosted in November-December 2023, it addressed climate challenges and energy transitions.",
  "Sports: FIFA Women's World Cup - Hosted by Australia and New Zealand in July-August 2023. Spain won their first title. T20 Cricket World Cup - The West Indies and USA co-hosted the T20 World Cup in June 2024. NBA Finals 2024 - The Boston Celtics won their 18th title, securing the most championships in NBA history. Paris Olympics 2024 - Anticipated games set to begin in July 2024.",
  "Entertainment & Celebrity: Hollywood Strikes - The Writers Guild of America (WGA) and Screen Actors Guild (SAG-AFTRA) strikes disrupted TV and film productions in 2023. Barbie & Oppenheimer - Released in July 2023; the phenomenon was dubbed Barbenheimer. Taylor Swift's Eras Tour - Became the first tour to gross over $1 billion, boosting global economies. Beyoncé's Renaissance Tour - Also dominated the charts and arenas. Kanye West & Controversies - Continued to make headlines with music and erratic public behavior. P Diddy Controversies - Sexual assault and rape allegations.",
  "Business & Economy: Banking Crisis 2023 - The collapse of Silicon Valley Bank (SVB) triggered global financial concerns. Amazon, Meta, Google Layoffs - Widespread tech industry layoffs occurred as companies adjusted after pandemic booms. Bitcoin Surge - Cryptocurrency prices rebounded sharply by mid-2024."
]

## The retriever

When you use an off-the-shelf LLM as the generator, the retriever is the part of the system you have the most control over. A common design for retrievers is to use a pre-trained language model to convert your documents into embeddings, and then use a vector database to store those embeddings and query them at run time.
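To make the "store and query" idea concrete, here is a minimal in-memory sketch of what a vector database does at its core. `InMemoryVectorIndex` is a hypothetical class for illustration, not the API of Qdrant, Chroma or any other product. It normalises vectors on insert so that a dot product gives cosine similarity, and it searches exhaustively, which real vector databases avoid at scale. It assumes no vector is all zeros.

```python
import numpy as np


class InMemoryVectorIndex:
    """Hypothetical minimal vector store: unit-normalised embeddings plus exact cosine search."""

    def __init__(self, dim):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.payloads = []  # the documents (or IDs) stored alongside each vector

    def add(self, vectors, payloads):
        vectors = np.asarray(vectors, dtype=np.float32)
        # normalise each row so that a dot product equals cosine similarity
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        self.vectors = np.vstack([self.vectors, vectors / norms])
        self.payloads.extend(payloads)

    def query(self, vector, k=3):
        v = np.asarray(vector, dtype=np.float32)
        v = v / np.linalg.norm(v)
        scores = self.vectors @ v            # cosine similarity to every stored vector
        top = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
        return [(self.payloads[i], float(scores[i])) for i in top]


index = InMemoryVectorIndex(dim=3)
index.add([[1, 0, 0], [0, 1, 0], [1, 1, 0]], ["x-axis", "y-axis", "diagonal"])
print(index.query([1, 0.1, 0], k=2))
```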

In this simple example, you'll use a pre-trained embedding model available through the `transformers` library from [Hugging Face](https://huggingface.co). This [pre-trained model](https://huggingface.co/BAAI/bge-base-en) is the English-language version of the BAAI General Embedding (BGE) model from the Beijing Academy of Artificial Intelligence (BAAI).

Here's a function that creates embeddings from a set of documents:

In [None]:
import torch
from transformers import AutoTokenizer, AutoModel


def embed_documents(docs, model_name):
  """Embed the provided documents to create a document index"""
  # load the tokenizer and model (note: these are reloaded on every call)
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModel.from_pretrained(model_name)

  # encode the docs with the tokenizer, padding and truncating them to a common length
  encoded_docs = tokenizer(
      docs, padding=True, truncation=True,
      return_tensors='pt'
  )

  # generate your output embedding vectors
  with torch.no_grad():
      model_output = model(**encoded_docs)
      # model_output[0] is the last hidden state; [:, 0] keeps the first ([CLS]) token of each doc
      doc_embeddings = model_output[0][:, 0]

  # convert to numpy vectors for ease of use
  return doc_embeddings.numpy()

As you can see, there are two main elements here: the `tokenizer` and the `model`. Both use the same `model_name`, because language models often come with their own tokenizers. The tokenizer converts human-readable natural language into a machine-readable format (token IDs) compatible with the model, and the model then uses this to generate your embeddings.
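The line `model_output[0][:, 0]` in `embed_documents` is where per-token outputs become one vector per document. The first element of the model output is the last hidden state, with shape `(batch, seq_len, hidden)`, and `[:, 0]` keeps the hidden state of the first token (the `[CLS]` token) of each document. To the best of our knowledge this CLS pooling is what BGE models are designed for, but check the model card for your model. The NumPy sketch below uses made-up numbers in place of a real model output and, for comparison, also shows mean pooling, a common alternative that this notebook does not use.

```python
import numpy as np

# Made-up stand-in for model_output[0] (the last hidden state): shape (batch, seq_len, hidden).
batch, seq_len, hidden = 2, 4, 3
last_hidden_state = np.arange(batch * seq_len * hidden, dtype=np.float32).reshape(batch, seq_len, hidden)

# What embed_documents does: keep the vector at token position 0 ([CLS]) for every document.
cls_embeddings = last_hidden_state[:, 0]
print(cls_embeddings.shape)  # (2, 3)

# A common alternative, NOT used in this notebook: average only the real (non-padding) tokens.
# 1 marks a real token, 0 marks padding added by tokenizer(..., padding=True).
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=np.float32)
summed = (last_hidden_state * attention_mask[:, :, None]).sum(axis=1)
mean_embeddings = summed / attention_mask.sum(axis=1, keepdims=True)
print(mean_embeddings.shape)  # (2, 3)
```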

You can now generate your document index as:

In [None]:
document_index = embed_documents(documents, model_name="BAAI/bge-base-en")
document_index[0].shape # shape of each vector in the index

(768,)

Note that it can be a good idea to 'chunk' your documents before embedding them and creating your document index. Smaller chunks reduce the context length the LLM subsequently needs, which can improve inference speed and reduce cost. They also make it easier for the resulting prompt to reference specific facts or segments within a document, which can improve the quality of responses in some cases.
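A minimal chunking sketch, assuming simple word-based windows with a fixed overlap. `chunk_text` and its defaults are illustrative assumptions; production pipelines often split by tokens, sentences or document structure instead. The overlap means that a passage shorter than the overlap which straddles a chunk boundary still appears whole in at least one chunk.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to chunk_size words, each sharing `overlap` words with the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("one two three four five six seven", chunk_size=4, overlap=1))
# prints: ['one two three four', 'four five six seven']
```

To use it here, you could embed `[chunk for doc in documents for chunk in chunk_text(doc, chunk_size=40, overlap=10)]` instead of `documents`, and pass that same list to `retrieve_documents`.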

Now that you have your document index, you can create a simple retrieval function to search the index for matching documents:

In [None]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def retrieve_documents(query_string, doc_index, docs, k=5, doc_model_name="BAAI/bge-base-en"):
  # embed the query string with the same model used to build the index
  query_vector = embed_documents(
      [query_string],
      model_name=doc_model_name
  ).reshape(1, -1)

  # compute the cosine similarity between the query vector and every document vector
  similarity = cosine_similarity(query_vector, doc_index).flatten()

  # return the top k most similar docs
  # here, argsort returns the indices that would order the similarities
  # from least similar to most similar. The [::-1] slice reverses this
  # to most similar first, and [:k] keeps the top k of these
  return [docs[i] for i in np.argsort(similarity)[::-1][:k]]

In this case you're using [cosine similarity](https://developers.google.com/machine-learning/clustering/similarity/measuring-similarity), a common method for comparing embedding vectors. The approach implemented here computes the similarity of your query vector to _every_ document vector, so its cost grows linearly with the number of documents and can become expensive for large collections. In these circumstances, tools that provide _approximate_ search over the document index are useful. You can check out [FAISS](https://faiss.ai/index.html) from Facebook AI Research as an example of a tool that supports efficient similarity search over very large document indexes. For production environments, this is also where vector databases like [Qdrant](https://qdrant.tech/), [Chroma](https://www.trychroma.com/) or [Vespa](https://vespa.ai/) come in handy: they manage efficient similarity search for you!
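For reference, the same exact (brute-force) search can be written directly in NumPy. This sketch is not from the notebook: `top_k_cosine` is a hypothetical helper that normalises the vectors so a single matrix-vector product gives the cosine similarity against every document, and it uses `np.argpartition` to avoid fully sorting all the scores. It still scores every document, so it does not remove the scaling problem that approximate search tools address.

```python
import numpy as np


def top_k_cosine(query_vector, doc_matrix, k=3):
    """Exact cosine top-k: one dot product per document, then a partial sort of the scores."""
    q = query_vector / np.linalg.norm(query_vector)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q                    # cosine similarity against every document
    k = min(k, len(scores))
    # argpartition moves the k largest scores to the front in O(n) time;
    # only those k are then fully sorted, best first
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])], scores


docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
top, _ = top_k_cosine(np.array([1.0, 0.2]), docs, k=2)
print(top)  # [0 2]
```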

With all that said, you can now test this simple retriever (i.e. `retrieve_documents`) using:

In [None]:
example_retrieved_docs = retrieve_documents(
    "Tell me about the Japanese lunar mission.",
    document_index,
    documents,
    k=3
)

example_retrieved_docs

['JAXA launched SLIM (Smart Lander for Investigating Moon) lunar lander (carrying a mini rover) and a space telescope (XRISM) on 6 September.',
 'ISRO launched its third lunar mission Chandrayaan-3 on 14 July 2023 at 9:05 UTC; it consists of lander, rover and a propulsion module, and successfully landed in the south pole region of the Moon on 23 August 2023.',
 'Russian lunar lander Luna 25 was launched on 10 August 2023, 23:10 UTC, atop a Soyuz-2.1b rocket from the Vostochny Cosmodrome, it was the first Russian attempt to land a spacecraft on the Moon since the Soviet lander Luna 24 in 1974, it crashed on the Moon on 19 August after technical glitches.']

There aren't many documents in this index, but you should see that the top document is indeed the most relevant to the query text: exactly what you want to see! The next two results describe lunar missions by other countries, which are related but less relevant.
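Eyeballing results works for one query, but with more queries it helps to measure the retriever. A common metric is recall@k: the fraction of the documents you labelled as relevant for a query that appear in the top k results. The functions below are a minimal sketch with hypothetical names; the relevance labels have to come from you.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k retrieved list."""
    if not relevant:
        raise ValueError("need at least one relevant document")
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over runs, a list of (retrieved_list, relevant_list) pairs, one per test query."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)
```

For the query above, if you label the JAXA snippet (`documents[3]`) as the only relevant document, `recall_at_k(example_retrieved_docs, [documents[3]], k=1)` should return `1.0` given the output shown.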

With this done, it is time to use the retrieved documents to create an augmented input for the LLM (i.e. create an augmented prompt). You'll use a very simple prompt in this case (you should think about ways to make it better!). Here's a simple function to achieve this:

In [None]:
def create_augmented_prompt(query_string, docs):
  # concatenate the retrieved docs as context for the LLM
  # you could do other pre-processing here too
  context = "\n".join(docs)
  # define your prompt template
  prompt_template = """Here is some relevant information:
  {context}

  Q: {query}
  A:
  """
  # render the prompt template
  return prompt_template.format(context=context, query=query_string)

And you can see how this behaves with the following:

In [None]:
example_augmented_prompt = create_augmented_prompt(
    "Tell me about the Japanese lunar mission.",
    example_retrieved_docs
)
example_augmented_prompt

'Here is some relevant information:\n  JAXA launched SLIM (Smart Lander for Investigating Moon) lunar lander (carrying a mini rover) and a space telescope (XRISM) on 6 September.\nISRO launched its third lunar mission Chandrayaan-3 on 14 July 2023 at 9:05 UTC; it consists of lander, rover and a propulsion module, and successfully landed in the south pole region of the Moon on 23 August 2023.\nRussian lunar lander Luna 25 was launched on 10 August 2023, 23:10 UTC, atop a Soyuz-2.1b rocket from the Vostochny Cosmodrome, it was the first Russian attempt to land a spacecraft on the Moon since the Soviet lander Luna 24 in 1974, it crashed on the Moon on 19 August after technical glitches.\n\n  Q: Tell me about the Japanese lunar mission.\n  A:\n  '
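One way to improve the prompt, sketched under assumptions: number each retrieved snippet, ask the model to answer only from them, and give it an explicit way out when the answer is missing. `create_grounded_prompt` is a hypothetical variant, not a recommended template from any library. Building the string without an indented triple-quoted literal also avoids the stray leading spaces visible in the output above (`\n  Q:`).

```python
def create_grounded_prompt(query_string, docs):
    """Hypothetical variant of create_augmented_prompt that numbers the snippets and asks for grounded answers."""
    # number the snippets so the model (and you) can refer to them
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))
    return (
        "Answer the question using only the numbered information below. "
        "Cite the numbers you used, and say \"I don't know\" if the answer is not there.\n\n"
        f"{context}\n\n"
        f"Q: {query_string}\n"
        "A:"
    )
```

It takes the same arguments as `create_augmented_prompt`, so you could swap it into the RAG function later and compare the answers.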

The last important piece is querying the LLM itself. The cell below loads the small `microsoft/Phi-3.5-mini-instruct` model from Hugging Face, wraps it in a `text-generation` pipeline, and defines a simple function to query it:

In [None]:
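from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# load the small instruction-tuned Phi-3.5 model (device_map="auto" places it on a GPU if available)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")

# wrap the model and tokenizer in a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,      # upper bound on the length of the answer
    "return_full_text": False,  # return only the generated text, not the prompt
    "temperature": 0.0,
    "do_sample": False,         # greedy decoding, so the temperature value is not used
}

def generate_response(query_string, chosen_model, generation_arguments):
  # wrap the query as a single user message; the pipeline applies the model's chat template
  messages = [{"role": "user", "content": query_string}]
  output = chosen_model(messages, **generation_arguments)
  return output[0]['generated_text']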

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3.5-mini-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3.5-mini-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Device set to use cuda:0
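`generate_response` sends the query as a single user message, and the `text-generation` pipeline applies the model's chat template to that list of messages. If you want to add standing instructions (for example, "answer only from the provided information"), a common approach is a system message. The helper below is a hypothetical sketch; whether a given model's chat template supports a system role, and how much weight the model gives it, is an assumption you should check for your model.

```python
def build_messages(user_content, system_prompt=None):
    """Build chat-format input for the pipeline, optionally prefixed with a system message."""
    messages = []
    if system_prompt is not None:
        # standing instructions that apply to the whole conversation
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_content})
    return messages
```

For example: `pipe(build_messages(prompt, system_prompt='Answer briefly.'), **generation_args)[0]['generated_text']`.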


Once again, you can see how this behaves with:

In [None]:
generate_response("Hello, world!", chosen_model=pipe, generation_arguments=generation_args)

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48


" Hello! I'm Phi, Microsoft's language model. I'm here to help you with any questions or tasks you have. What can I do for you today?"

Okay, you've now seen all of the core components of using RAG to query an LLM. Time to bring it all together! 🚀

In [None]:
def generate_rag_response(
    query_string,
    docs,
    doc_index,
    model_name=pipe,  # the text-generation pipeline used as the generator
    generation_arguments=generation_args,
    k=3
):

  # R: retrieve the top k documents from the provided docs and index
  retrieved_docs = retrieve_documents(
      query_string, doc_index, docs, k=k
  )
  # A: create augmented prompt
  augmented_prompt = create_augmented_prompt(query_string, retrieved_docs)

  # G: generate response!
  generated_response = generate_response(
      augmented_prompt, chosen_model=model_name,
      generation_arguments=generation_arguments
  )
  return generated_response

Now generate a RAG response with:

In [None]:
generate_rag_response("Tell me about the status of the latest Indian lunar mission.",
                      documents, document_index, model_name=pipe,
                      generation_arguments=generation_args)



" The latest Indian lunar mission, Chandrayaan-3, was launched by the Indian Space Research Organisation (ISRO) on 14 July 2023 at 9:05 UTC. The mission consisted of a lander, rover, and a propulsion module. After a successful journey through space, Chandrayaan-3 successfully landed in the south pole region of the Moon on 23 August 2023.\n\nAs of the latest information available, the status of Chandrayaan-3 would be considered successful, given the successful landing in the desired lunar region. However, for the most up-to-date status, it would be necessary to check the latest reports from ISRO or other reliable sources.\n\nISRO's Chandrayaan-3 mission is part of India's ongoing efforts to explore the Moon and contribute to the global understanding of lunar science. The mission's objectives include studying the lunar surface, analyzing the lunar exosphere, and conducting experiments to understand the Moon's geology and mineralogy.\n\nIn summary, the Chandrayaan-3 mission has successful

Up-to-date and accurate. Nice. Let's compare the RAG response to a 'raw' LLM response:

In [None]:
generate_response("Tell me about the status of the latest Indian lunar mission.", chosen_model=pipe, generation_arguments=generation_args)

" As of my knowledge cutoff in March 2023, the latest Indian lunar mission is Chandrayaan-3. India's space agency, the Indian Space Research Organisation (ISRO), announced plans for this mission in August 2021. The objective of Chandrayaan-3 is to demonstrate the capability to land on the Moon and conduct in-situ resource utilization (ISRU) experiments.\n\nThe mission is planned to include a lander and a rover, similar to the Chandrayaan-2 mission that experienced a setback during its landing phase. Chandrayaan-3 aims to build on the experience gained from Chandrayaan-2 and to ensure a successful soft landing on the lunar surface.\n\nThe mission is expected to be launched in the second half of 2023. The exact launch date and details about the mission's progress are subject to change and would be best obtained from ISRO's official updates or news releases.\n\nFor the most current status of the Chandrayaan-3 mission, it is recommended to check ISRO's official website or other reliable ne