# Basic Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a young and constantly evolving field. It is based on the assumption that so-called hallucinations (factually incorrect statements by the LLM) can be reduced if relevant context is provided along with a user's question.

In a nutshell, RAG attempts to automate the step of providing relevant context. In its most common implementation, RAG involves a large collection of potentially relevant documents (the _knowledge base_), which are then indexed and accessed based on the user's message using some sort of retrieval mechanism.

In this example, we will deal with a simpler version: we expect the user to specify the knowledge base in the form of a single website or a YouTube video transcript. We will then automate the step of retrieving this context and add it in its entirety to any question the user might have (a.k.a. _context stuffing_).

In [None]:
from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())

In [2]:
from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="llama-3-1-8b-instruct")

Since RAG is a relatively new concept, an LLM may not give a helpful answer if asked without any context:

In [None]:
response = llm.invoke("What is RAG?")

response

If we know [a potentially helpful website](https://en.wikipedia.org/wiki/Retrieval-augmented_generation), we can leverage the LangChain framework to stuff its contents into our question:

In [None]:
from langchain_community.document_loaders import WebBaseLoader

URL = "https://en.wikipedia.org/wiki/Retrieval-augmented_generation"

pages = WebBaseLoader(URL).load()

context = pages[0].page_content
context

In [None]:
prompt = "Based on the following information, what is RAG? Context: \n\n" + context
response = llm.invoke(prompt)

response.pretty_print()
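
The string concatenation above works, but it can help to wrap context stuffing in a small helper that keeps the context and the question clearly separated and caps the context length. The following is a minimal sketch, not part of LangChain or `langchain_dartmouth`: the function name `build_stuffed_prompt`, the `max_chars` parameter, and its default of 20,000 characters are assumptions chosen purely for illustration.

```python
def build_stuffed_prompt(question: str, context: str, max_chars: int = 20_000) -> str:
    """Combine a question with its context in a single prompt (context stuffing).

    max_chars is a hypothetical, crude character budget; it is not the real
    token limit of any particular model.
    """
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    # Truncate overly long context so the prompt stays within the budget.
    trimmed = context[:max_chars]
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{trimmed}\n\n"
        f"Question: {question}"
    )


# Stand-in text; in the notebook, this would be pages[0].page_content.
sample_context = "Retrieval-augmented generation (RAG) adds retrieved documents to the prompt."
sample_prompt = build_stuffed_prompt("What is RAG?", sample_context)
print(sample_prompt)
```

In the notebook, the resulting string could be passed to `llm.invoke(...)` in the same way as the hand-built `prompt`.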

There are many [other loaders available](https://python.langchain.com/v0.2/docs/integrations/document_loaders/) we can leverage to obtain context information from a variety of sources. 

As one more example, let's try to summarize a YouTube video based on its transcript:

In [24]:
from langchain_community.document_loaders import YoutubeLoader

VIDEO_URL = "https://www.youtube.com/watch?v=pqWUuYTcG-o"

transcripts = YoutubeLoader.from_youtube_url(VIDEO_URL, add_video_info=True).load()

context = transcripts[0].page_content

In [None]:
prompt = "Summarize the following video transcript. Transcript: \n\n" + context
response = llm.invoke(prompt)

response.pretty_print()

As the size of the knowledge base increases, it will eventually exceed the amount of text the LLM can process at once (its context window). This is where the conventional RAG paradigm comes in: instead of stuffing the entire context into the prompt, the knowledge base is indexed and queried based on the user's question, so that only the relevant chunks of context are retrieved. This sort of architecture is beyond the scope of this workshop, but check out the [`langchain_dartmouth` Cookbook](https://dartmouth-libraries.github.io/langchain-dartmouth-cookbook/) to explore more!
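
To give a rough idea of what "retrieving only relevant chunks" means, here is a toy sketch in plain Python. It is not how production RAG systems usually work: they typically rank chunks with embeddings and a vector store, whereas this example uses simple word overlap as a stand-in. All names (`split_into_chunks`, `tokenize`, `retrieve`, `top_k`) and the sample text are hypothetical.

```python
import re


def split_into_chunks(text: str) -> list[str]:
    """Split text into chunks, one per non-empty paragraph."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks that share the most words with the question."""
    question_words = tokenize(question)
    # sorted() is stable, so chunks with equal scores keep their original order.
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


# A tiny stand-in knowledge base with one paragraph per chunk.
knowledge_base = (
    "RAG retrieves relevant documents and adds them to the prompt.\n\n"
    "Dartmouth College is located in Hanover, New Hampshire.\n\n"
    "A context window limits how much text an LLM can process at once."
)

chunks = split_into_chunks(knowledge_base)
question = "How much text can an LLM process?"
relevant = retrieve(question, chunks, top_k=1)

# Only the retrieved chunk is added to the prompt, not the whole knowledge base.
retrieval_prompt = "Context:\n" + "\n".join(relevant) + "\n\nQuestion: " + question
print(retrieval_prompt)
```

The design choice to show here is the separation of steps: the knowledge base is chunked once, and each question then selects a small subset of chunks to place in the prompt.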