# RAG - Retrieval Augmented Generation

A Retrieval-Augmented Generation (RAG) system answers a user's query by first retrieving relevant information from a knowledge base and then using that information as context when generating the response. This notebook demonstrates how to build a RAG pipeline that generates responses to user queries.

## Requirements

Before we start, make sure an embedding model is available for the RAG system. This tutorial uses the `nomic-embed-text` model.

In your terminal, run the following command:

```bash
ollama pull nomic-embed-text
```

Verify that the model has been downloaded by listing the locally available models:

```bash
ollama list
```


First, we need a document to work with. We will use a code of conduct as our document.
LangChain provides several tools for loading documents, including:
- `PyPDFLoader`
- `WebBaseLoader`
- `ObsidianLoader` 
- `RedditPostsLoader`
- `RecursiveUrlLoader`
- `TextLoader`

In [None]:
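from langchain_community.document_loaders import WebBaseLoader

# Plain-text code of conduct used as the knowledge base for this tutorial.
urls = [
    "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/6JDbUb_L3egv_eOkouY71A.txt"
]

# WebBaseLoader fetches each URL and returns the content as Document objects.
loader = WebBaseLoader(urls)
documents = loader.load()

# Preview the first 100 characters of the first document.
print(documents[0].page_content[:100])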


1.	Code of Conduct

Our Code of Conduct outlines the fundamental principles and ethical standards th


# Documents
Loaders return a list of `Document` objects. Each document has two attributes:
- `page_content`: the text of the document
- `metadata`: a dictionary with additional information about the document, such as its source

The `Document` object is the central data structure in the RAG pipeline: its text is what gets chunked and turned into embeddings.
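
To make the shape of a document concrete, here is a minimal sketch using only the standard library. `SimpleDocument` is a hypothetical stand-in that mirrors the two fields described above. It is not LangChain's own class, and the `source` value is a made-up example.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in with the same two fields as a loaded document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Sample values for illustration only; "example.txt" is an assumed source name.
doc = SimpleDocument(
    page_content="1. Code of Conduct\n\nOur Code of Conduct outlines the fundamental principles",
    metadata={"source": "example.txt"},
)

# Fields are read as attributes, not as dictionary keys.
title = doc.page_content[:18]
source = doc.metadata["source"]
print(title)
print(source)
```

The point to take away is the access pattern: `doc.page_content` and `doc.metadata` are attributes, while `metadata` itself is an ordinary dictionary.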

# Chunking
After loading the documents, we split them into smaller chunks using one of LangChain's text splitter classes. Each splitter lets us specify the chunk size (the maximum length of a chunk) and the chunk overlap (how much text consecutive chunks share).

Notable splitters are:
- `RecursiveCharacterTextSplitter`
- `MarkdownTextSplitter`
- `TextSplitter`
- `CharacterTextSplitter`
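
To see what chunk size and chunk overlap mean in practice, the sketch below splits a string into fixed-size character windows. `chunk_text` is a hypothetical helper written for illustration. It is deliberately simpler than the LangChain splitters listed above, which also try to break on separators such as paragraph and line boundaries.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and less than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
no_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=0)
with_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(no_overlap)    # ['abcdefghij', 'klmnopqrst', 'uvwxyz']
print(with_overlap)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

With an overlap of 3, each chunk starts with the last three characters of the previous one. Overlap costs some storage but reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.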

In [20]:
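from langchain_classic.text_splitter import RecursiveCharacterTextSplitter

# Split into chunks of at most 1000 characters, with no overlap between chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = splitter.split_documents(documents=documents)

# Preview the first 500 characters of the third chunk.
print(chunks[2].page_content[:500])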

Accountability: We take responsibility for our actions and decisions. We follow all relevant laws and regulations, and we strive to continuously improve our practices. We report any potential violations of this code and support the investigation of such matters.
Safety: We prioritize the safety of our employees, clients, and the communities we serve. We maintain a culture of safety, including reporting any unsafe conditions or practices.
Environmental Responsibility: We are committed to minimizi


# Vector Database
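With our documents chunked, we can now create a vector database, which stores an embedding for each chunk so that the chunks most similar to a query can be retrieved. We will use the `ChromaDB` library for this.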
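
Before reaching for a real library, it may help to see the idea behind a vector database in a few lines. The sketch below is a toy, in-memory store and not ChromaDB's API. `embed`, `cosine_similarity`, and `TinyVectorStore` are hypothetical names, and the word-count "embedding" is only a stand-in for a real embedding model such as `nomic-embed-text`.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Keeps (text, vector) pairs and ranks them against a query."""

    def __init__(self):
        self._entries = []

    def add_texts(self, texts):
        for text in texts:
            self._entries.append((text, embed(text)))

    def similarity_search(self, query, k=1):
        query_vec = embed(query)
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine_similarity(query_vec, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


# Sample chunks shortened from the code of conduct output shown earlier.
store = TinyVectorStore()
store.add_texts([
    "We prioritize the safety of our employees",
    "We take responsibility for our actions and decisions",
])
top = store.similarity_search("safety of employees", k=1)
print(top)
```

A real vector database follows the same two steps, embedding text on insert and ranking stored vectors by similarity on query, but uses learned embeddings and indexes that scale far beyond a sorted list.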