# Basic Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a young and constantly evolving field. It is based on the assumption that so-called hallucinations (factually incorrect statements by the LLM) can be avoided if relevant context is provided along with a user's question.

In a nutshell, RAG attempts to automate the step of providing relevant context. In its most common implementation, RAG involves a large collection of potentially relevant documents (the _knowledge base_), which are then indexed and accessed based on the user's message using some sort of retrieval mechanism.

In this example, we will deal with a simpler version: We expect the user to specify the knowledge base in the form of a single website or a YouTube video transcript. We will then automate the step of retrieving this context and add it, in its entirety, to any question the user might have (a.k.a. _context stuffing_).

In [1]:
from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())

True

In [2]:
from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="llama-3-1-8b-instruct")

Since RAG is a new concept, an LLM may not give a helpful answer if asked without any context:

In [10]:
response = llm.invoke("What is RAG?")

response

AIMessage(content="RAG can have multiple meanings depending on the context. \n\n1. **Roller Coaster**: RAG is an abbreviation for Roller Coaster. \n2. **Reactive Attachment Disorder**: RAG is an abbreviation for Reactive Attachment Disorder, a mental health disorder in children that involves difficulty forming emotional connections with others.\n3. **Reagent**: In chemistry, RAG can refer to a reagent, a substance used in chemical reactions.\n4. **Radio Amateur Guild**: RAG is an abbreviation for Radio Amateur Guild, an amateur radio club. \n5. **RAG**: In some gaming communities, RAG is an abbreviation for Rolled Against God, which is typically used as a humorous expression in the context of tabletop role-playing games.\n\nWithout more context, it's difficult to determine which definition is most relevant.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 166, 'prompt_tokens': 40, 'total_tokens': 206}, 'model_name': 'meta-llama/Meta-Llama-3

If we know [a potentially helpful website](https://en.wikipedia.org/wiki/Retrieval-augmented_generation), we can leverage the LangChain framework to stuff its contents into our question:

In [14]:
from langchain_community.document_loaders import WebBaseLoader

URL = "https://en.wikipedia.org/wiki/Retrieval-augmented_generation"

pages = WebBaseLoader(URL).load()

context = pages[0].page_content
context

'\n\n\n\nRetrieval-augmented generation - Wikipedia\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nJump to content\n\n\n\n\n\n\n\nMain menu\n\n\n\n\n\nMain menu\nmove to sidebar\nhide\n\n\n\n\t\tNavigation\n\t\n\n\nMain pageContentsCurrent eventsRandom articleAbout WikipediaContact usDonate\n\n\n\n\n\n\t\tContribute\n\t\n\n\nHelpLearn to editCommunity portalRecent changesUpload file\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSearch\n\n\n\n\n\n\n\n\n\n\n\nSearch\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAppearance\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCreate account\n\nLog in\n\n\n\n\n\n\n\n\nPersonal tools\n\n\n\n\n\n Create account Log in\n\n\n\n\n\n\t\tPages for logged out editors learn more\n\n\n\nContributionsTalk\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nContents\nmove to sidebar\nhide\n\n\n\n\n(Top)\n\n\n\n\n\n1\nProcess\n\n\n\n\nToggle Process subsection\n\n\n\n\n\n1.1\nIndexing\n\n\n\n\n\n\n\n\n1.2\nRetrieval\n\n\n\n\n\n\n\n\n1.3\nAugmentation\n\n\n\n\n\n\n\n\n
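The raw page content above contains long runs of newlines and tabs left over from the page's navigation markup. These take up space in the prompt without adding information. As a minimal sketch (the helper name `collapse_whitespace` is our own and not part of LangChain), the text could be tidied with the standard library before it is stuffed into the prompt:

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse whitespace runs left over from HTML extraction.

    Hypothetical helper for illustration; not part of LangChain.
    """
    # Replace runs of spaces and tabs with a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Drop a single space on either side of a newline
    text = re.sub(r" ?\n ?", "\n", text)
    # Collapse runs of newlines into a single newline
    text = re.sub(r"\n+", "\n", text)
    return text.strip()
```

Calling `context = collapse_whitespace(context)` would shorten the prompt, but it does not remove the navigation text itself (menu entries, login links, and so on).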

In [17]:
prompt = "Based on the following information, what is RAG? Context: \n\n" + context
response = llm.invoke(prompt)

response.pretty_print()


RAG stands for Retrieval-Augmented Generation. It is a type of generative artificial intelligence that combines the capabilities of information retrieval and language generation. RAG uses a large language model (LLM) to generate responses to user queries, but instead of relying solely on its own training data, it also incorporates relevant information from a specified set of documents or databases. This allows the model to provide more accurate and up-to-date responses to user queries.
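If we want to reuse this pattern for different questions and sources, the string concatenation can be wrapped in a small function. The following is only a sketch: the function name `build_stuffed_prompt`, the prompt wording, and the `max_chars` parameter are assumptions made for illustration, and cutting the context off by character count is a crude safeguard rather than a recommended practice.

```python
from typing import Optional


def build_stuffed_prompt(
    question: str, context: str, max_chars: Optional[int] = None
) -> str:
    """Combine a question with the full context (context stuffing).

    Hypothetical helper for illustration; the prompt wording is an assumption.
    """
    if max_chars is not None:
        # Crude safeguard: cut the context off after max_chars characters
        context = context[:max_chars]
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

With this helper, the cell above could be written as `llm.invoke(build_stuffed_prompt("What is RAG?", context))`.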


There are many [other loaders available](https://python.langchain.com/v0.2/docs/integrations/document_loaders/) we can leverage to obtain context information from a variety of sources. 

As one more example, let's try to summarize a YouTube video based on its transcript:

In [24]:
from langchain_community.document_loaders import YoutubeLoader

VIDEO_URL = "https://www.youtube.com/watch?v=pqWUuYTcG-o"

transcripts = YoutubeLoader.from_youtube_url(VIDEO_URL, add_video_info=True).load()

context = transcripts[0].page_content

In [25]:
prompt = "Summarize the following video transcript. Transcript: \n\n" + context
response = llm.invoke(prompt)

response.pretty_print()


The video transcript appears to be a commencement speech delivered by Roger Federer, a renowned tennis player, at Dartmouth College's Class of 2024 graduation ceremony. Federer starts by expressing his excitement and gratitude for being awarded an honorary degree by the college.

He then shares his personal story, mentioning that he left school at the age of 16 to pursue a full-time tennis career. Federer talks about how he had to work hard to achieve his goals and how he overcame obstacles, such as the perception that he played effortlessly. He emphasizes the importance of discipline, grit, and patience in achieving success.

Federer shares three key lessons he has learned from his tennis career, which he believes can be applied to life beyond the court:

1. Effortless is a myth: Federer explains that while people often perceive him as playing effortlessly, he had to work extremely hard to achieve his success. He encourages the graduates to recognize that success is not achieved over

As the size of the knowledge base increases, it will eventually exceed the LLM's context window, and the model will no longer be able to process all of this information at once. This is where the conventional RAG paradigm comes in: Instead of stuffing the entire context into the prompt, the knowledge base is indexed and queried based on the user's question to retrieve only the relevant chunks of context. This sort of architecture is beyond the scope of this workshop, but check out the [`langchain_dartmouth` Cookbook](https://dartmouth-libraries.github.io/langchain-dartmouth-cookbook/) to explore more!
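To give a rough feel for the idea of retrieving only relevant chunks, here is a toy sketch in plain Python. It is not how production RAG systems work: they typically rank chunks by embedding similarity using a vector store, whereas this sketch swaps in simple word overlap as a stand-in. All function names and default values below are assumptions chosen for illustration.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


def tokenize(text: str) -> set[str]:
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks that share the most words with the question.

    Word overlap is a stand-in for the embedding similarity used in practice.
    """
    question_words = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]
```

Only the retrieved chunks, for example `"\n\n".join(retrieve(question, split_into_chunks(context)))`, would then be added to the prompt instead of the full context.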