# LLM RAG Tutorial
<a target="_blank" href="https://colab.research.google.com/github/SamHollings/llm_tutorial/blob/main/llm_tutorial_rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This tutorial gives you a simple introduction to getting started with an LLM and building a small RAG app.

RAG (Retrieval Augmented Generation) lets us give foundation models local context without expensive fine-tuning, and it can be done even on normal everyday machines like your laptop.
The basic idea is that we store documents as vectors in a database. When the user asks the LLM a question, we use LangChain to first pass that question to the vector database, which retrieves relevant documents (these can be broken up into chunks, given metadata, summarised, and put through various other steps to improve retrieval). The original question and these documents are then passed to the LLM (e.g. Claude), which gives back the answer. In effect, the model seems to know what is in the database (e.g. local knowledge about your business, your hobby or whatever), when in reality that information was simply injected into the prompt just before the model saw it!
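
To make the idea concrete, here is a toy sketch in plain Python with no LLM or vector database involved. Word overlap stands in for the embedding similarity search, and `retrieve`, `build_prompt` and the example documents are all made up for illustration. The point is that the "knowledge" the model appears to have is just text pasted into the prompt.

```python
import re

# Toy "database" of documents (made up for illustration); a real RAG app
# would store embedded chunks in a vector database such as Chroma.
DOCUMENTS = [
    "Opening hours: the shop is open from 9am to 5pm on weekdays.",
    "The cafe sells coffee, tea and homemade cake.",
    "Customers can park for free behind the building.",
]


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the question.

    Word overlap is a stand-in for the embedding similarity search
    that a vector database performs.
    """
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Inject ("stuff") the retrieved documents into the prompt before the question."""
    docs = "\n".join(f"- {c}" for c in context)
    return (
        f"Using the following documents:\n{docs}\n"
        f"Answer the following question: {question}\nAnswer:"
    )


question = "What are the shop opening hours?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS, k=1))
print(prompt)  # this augmented prompt, not the bare question, is what the LLM sees
```

Swapping the word-overlap scoring for real embeddings and a vector database is exactly what LangChain, sentence-transformers and ChromaDB do for us in the rest of this notebook.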

The main libraries we will use are:
- LangChain: essentially a wrapper around the various LLMs and other tools that gives them a consistent interface (so you can, say, swap OpenAI for Anthropic easily)
- Anthropic: the library through which we will access the Claude model (more on why this is chosen below)
- ChromaDB: a simple vector database, which is a key part of the RAG pipeline
- sentence-transformers: an open-source library of models for embedding text

None of the above are "the best" tools; they're just examples, and you may wish to use different embedding models, LLMs, vector databases, etc.

## Setup
- **Add documents to docs folder**: First there is a bit of setup. In this tutorial we won't go through how to take arbitrary sources and turn them into text files - that can be covered elsewhere. Instead, simply place some plain text documents ending in ".txt" in the "docs" folder.
    - There is a flat text version of the [Goldacre review](https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis/better-broader-safer-using-health-data-for-research-and-analysis) already there to get you started
- **.env** file: to use the Anthropic Claude model you'll need an API key, which you can create at https://console.anthropic.com. Then copy the env_example file, rename it ".env" and add your key (the notebook reads it under the name `anthropic_key`).
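
Before going further it can help to check that both pieces of setup are in place. The following is a minimal sketch: `check_setup` is a hypothetical helper (not part of the repo), and its `.env` parsing is deliberately naive compared with python-dotenv, which the notebook uses later to actually load the key.

```python
from pathlib import Path


def check_setup(docs_dir: str = "docs", env_path: str = ".env",
                key_name: str = "anthropic_key") -> list[str]:
    """Return a list of setup problems; an empty list means everything looks in place."""
    problems = []
    docs = Path(docs_dir)
    if not docs.is_dir() or not any(docs.glob("*.txt")):
        problems.append(f"no .txt files found in {docs_dir!r}")

    env_file = Path(env_path)
    if not env_file.is_file():
        problems.append(f"{env_path!r} not found")
    else:
        # naive KEY=value parsing; python-dotenv handles quoting and other edge cases
        keys = {
            line.split("=", 1)[0].strip()
            for line in env_file.read_text(encoding="utf-8").splitlines()
            if "=" in line and not line.lstrip().startswith("#")
        }
        if key_name not in keys:
            problems.append(f"{key_name!r} is not set in {env_path!r}")
    return problems


for problem in check_setup():
    print("Setup problem:", problem)
```

Note that the `.env` file must be plain text: a file saved by a word processor (for example as `.env.rtf`) will not be read correctly.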

In [1]:
# on Google Colab: clone the repo, install the dependencies and upload your .env file
if "google.colab" in str(get_ipython()):
    print("Running on Colab")
    !git clone https://github.com/SamHollings/llm_tutorial.git -q
    %cd llm_tutorial
    !pip install -r requirements.txt -q -q

    import src.utils.colab as colab

    colab.upload_dot_env_file()

Running on Colab
/content/llm_tutorial

In [2]:
%pip install -r requirements.txt




In [3]:
import src.utils.colab as colab
colab.upload_dot_env_file()

Saving .env.rtf to .env.rtf
File uploaded and renamed successfully.


In [3]:
import glob
import os

import toml
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
# the integrations below live in the langchain_community package in LangChain 0.3
from langchain_community.chat_models import ChatAnthropic
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from tqdm import tqdm

config = toml.load("config.toml")


In [1]:
%pip install --upgrade langchain



In [2]:
%pip install langchain_community



In [None]:
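load_dotenv(".env")

# copy the key stored in .env as "anthropic_key" into the ANTHROPIC_API_KEY
# environment variable (the OpenAI line is kept in case you swap providers)
# os.environ["OPENAI_API_KEY"] = os.getenv("openai_key")
anthropic_key = os.getenv("anthropic_key")
if anthropic_key is None:
    raise ValueError("No 'anthropic_key' found in .env (see the Setup section)")
os.environ["ANTHROPIC_API_KEY"] = anthropic_key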

## Initialise objects

We use a few different types of objects in a RAG pipeline.
- **chunk**: because LLMs can often only take in relatively small amounts of text, we need to break larger bodies of text into small chunks. For this we use the `text_splitter`. Exactly how to chunk up the text is an art in itself; in this example we simply break it into chunks of up to about 1000 characters (a very simple approach!).
- **embed**: the `embedding` model (by default we've chosen HuggingFace's "sentence-transformers/all-mpnet-base-v2") converts the text in each chunk into a vector representation (if you want to learn more about why it does this, have a look into natural language processing theory).
- **store**: the `vectorstore` is the database in which we store, and later retrieve, the embedded vectors for each chunk.
- **Question and Answer Chain**: the `RetrievalQA` chain is a LangChain object which does a few things for us:
    - it passes our question to the `retriever`, which in this case hands it to the `vectorstore`; the vectorstore embeds the question with the same `embedding` model and returns the 4 nearest chunks in vector space (`k=4`)
    - these chunks are then "stuffed" into a new prompt along with your question. The default prompt is something like this:
        - `"using the following documents: {stuffed documents} answer the following question: {question}. Answer:"`
    - this new prompt is then sent off to the `llm`, in this case the Anthropic Claude model.
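
A minimal sketch of the chunk and retrieve steps in plain Python. `chunk_text` is a hypothetical helper that cuts text into fixed-size character windows with some overlap (the 200-character overlap here is an arbitrary choice); LangChain's `RecursiveCharacterTextSplitter` is smarter and prefers to break on separators such as paragraph and line breaks. `top_k` ranks chunk vectors by cosine similarity to the question's vector; Chroma's own default distance metric may differ, so treat this as an illustration of nearest-neighbour retrieval rather than what Chroma does internally.

```python
import math


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # stop once the remaining text is already covered by the previous window's overlap
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query vector."""
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)[:k]


chunks = chunk_text("abcdefghij" * 250)  # 2,500 characters of toy text
print([len(c) for c in chunks])  # [1000, 1000, 900]

# toy 2-D "embeddings"; real sentence-transformer vectors have hundreds of dimensions
query_vec = [1.0, 0.0]
chunk_vecs = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
print(top_k(query_vec, chunk_vecs, k=2))  # [1, 2]
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.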

In [1]:
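DEV_MODE = True  # in dev mode, reuse an existing database instead of re-embedding the docs
PERSIST_DIRECTORY = "db"  # where Chroma saves the vector database on disk
EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"

if DEV_MODE:
    PERSIST_DIRECTORY += "/dev"

# embed: turns text into vectors (the model is downloaded from HuggingFace on first use)
embedding = HuggingFaceEmbeddings(
    model_name=EMBEDDING_MODEL
)  # embedding_functions.DefaultEmbeddingFunction()
# store: a Chroma database persisted to PERSIST_DIRECTORY
vectorstore = Chroma(persist_directory=PERSIST_DIRECTORY, embedding_function=embedding)
# chunk: split documents into pieces of up to about 1000 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
# retrieve the 4 nearest chunks for each question
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
llm = ChatAnthropic(anthropic_api_key=os.getenv("ANTHROPIC_API_KEY"))
# "stuff" the retrieved chunks into the prompt that is sent to the LLM
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)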


## Populate Vector Database

The cell below loads the text files into the vector database:
- first it uses `glob` to get a list of all of the text files in "docs"
- next it converts each file into the `Document` class that LangChain expects
- each document is run through the `text_splitter` to break it down into manageable chunks
- these chunks are added to the `vectorstore` (where they are first run through the `embedding` model before being inserted into the database).
    - the database itself is backed by a SQLite file; you can even open it and look inside if you go to the db folder.

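**NOTE**: this cell may take a while to run, as it needs to chew through and embed quite a lot of text. Go away and make a cup of coffee. It only populates the database when `DEV_MODE` is `False`; in dev mode it reuses whatever was already loaded into `db/dev`.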

In [None]:
# Only populate the database when not in dev mode; in dev mode we reuse what was already loaded.
if not DEV_MODE:
    for text_file_path in tqdm(glob.glob("docs/*.txt"), desc="Processing Files", position=0):
        with open(text_file_path, "r", encoding="utf-8") as text_file:
            # wrap the file contents in a LangChain Document, remembering where it came from
            doc = Document(
                page_content=text_file.read(), metadata={"file_path": text_file_path}
            )
        # split into chunks, then embed and store them in Chroma
        texts = text_splitter.split_documents([doc])
        vectorstore.add_documents(documents=texts)
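
Once the database has been populated, you can peek inside the SQLite file mentioned above. A minimal sketch using Python's built-in `sqlite3` module that lists the tables without modifying anything. The file name `chroma.sqlite3` is an assumption about how recent Chroma versions lay out the persist directory, and the table layout is internal to Chroma and may change between versions, so treat this as a way to poke around rather than an API.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of the tables in a SQLite database file, opened read-only."""
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(db_path)
    # mode=ro opens the file read-only, so we can't accidentally change the vector store
    uri = path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# assumption: Chroma names its file chroma.sqlite3 inside the persist directory ("db/dev" in dev mode)
db_file = Path("db/dev/chroma.sqlite3")
if db_file.is_file():
    print(list_tables(str(db_file)))
```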

## Question and Retrieve
Now we can do the fun part - **ask the model questions**.

In [None]:
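question = "Describe what the Goldacre review says about RAP (Reproducible Analytical Pipelines) and what we need to do to make them work."

# RetrievalQA takes its input under "query" and returns the answer under "result"
answer = qa.invoke({"query": question})["result"]
print(answer)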

 Based on the context provided, here are some key points about RAP (Reproducible Analytical Pipelines) and what's needed to make them work according to Goldacre:

- RAPs represent a modern, efficient approach to delivering high quality, reproducible analytics compared to manual processes. They adopt standard practices from software development like good documentation, open source tools like R and Python, flexibility, and extensibility.

- RAPs meet criteria like reproducibility, reusability, auditability, efficiency, high quality, and being less prone to error.

- RAPs were first developed by the UK's Government Digital Service in 2017 around the core principle of being able to reproduce everything done today at any point in the future.

- To make RAPs work, open source languages like R and Python should be used rather than proprietary tools to ensure future access. 

- RAPs reflect an emphasis on documentation, flexibility, and extensibility - principles important for all data analysi