## Project 4 - Retrieval-augmented generation
The task is to create a Retrieval-Augmented Generation (RAG) system using an embedding model, a vector store, and a local LLM (Large Language Model).

In [None]:
!pip install --quiet --upgrade langchain langchain-core langchain-community langchain-text-splitters langchain-huggingface langchain-milvus text-generation datasets


#### API login

In [None]:
import getpass
import os

# Send traces of each run to LangSmith (requires a LangChain API key).
os.environ["LANGCHAIN_TRACING_V2"] = "true"
if not os.environ.get("LANGCHAIN_API_KEY"):
    os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("Enter LangChain API Key")
# Token used to authenticate requests to the Hugging Face endpoint.
if not os.getenv("HUGGINGFACEHUB_API_TOKEN"):
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass.getpass("Enter your HuggingFace token: ")


Enter LangChain API Key··········
Enter your HuggingFace token: ··········


#### Create local LLM endpoint
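For this purpose Llama 3.2 3B Instruct was chosen, as a model of this size can easily be handled by a Google Colab T4 machine. Note that `HuggingFaceEndpoint` with a `repo_id` sends requests to the Hugging Face Inference API, so with the code below generation actually runs remotely rather than on the T4.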

In [None]:
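from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

endpoint = HuggingFaceEndpoint(
    repo_id="meta-llama/Llama-3.2-3B-Instruct",
    task="text-generation",
    max_new_tokens=512,       # upper bound on the length of each answer
    do_sample=False,          # greedy decoding: deterministic output
    repetition_penalty=1.03,  # mildly discourage repeating tokens
)

# Wrap the endpoint in a chat-model interface so it can be invoked with chat messages.
llm = ChatHuggingFace(llm=endpoint)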

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
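`do_sample=False` means greedy decoding (the most likely next token is always picked), and `repetition_penalty=1.03` slightly lowers the scores of tokens that have already been generated. The sketch below illustrates both on made-up logits, assuming the CTRL-style rule used by Hugging Face `transformers` (a positive score of an already-seen token is divided by the penalty, a negative one is multiplied by it). The vocabulary and numbers are hypothetical, and the hosted endpoint may implement the penalty differently.

```python
from typing import Dict, List


def apply_repetition_penalty(logits: Dict[str, float], generated: List[str], penalty: float) -> Dict[str, float]:
    """Return a copy of `logits` where already-generated tokens are penalised (CTRL-style)."""
    adjusted = dict(logits)
    for token in set(generated):
        if token in adjusted:
            score = adjusted[token]
            # Dividing a positive score or multiplying a negative one both make the token less likely.
            adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted


def greedy_pick(logits: Dict[str, float]) -> str:
    """Greedy decoding (do_sample=False): take the highest-scoring token."""
    return max(logits, key=logits.get)


# Hypothetical next-token scores after the model has already produced "bee".
logits = {"bee": 2.00, "hive": 1.98, "honey": 1.50}
print(greedy_pick(logits))
print(greedy_pick(apply_repetition_penalty(logits, ["bee"], 1.03)))
```

Because 1.03 is close to 1, the penalty only changes the choice between nearly tied tokens, as with "bee" and "hive" here.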


#### Embeddings
The embedding model used in this project is stella_en_1.5B_v5, chosen because it is relatively lightweight and support for languages other than English is not needed.

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="dunzhang/stella_en_1.5B_v5", model_kwargs={"trust_remote_code": True})
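An embedding model maps each text to a fixed-length vector so that semantically similar texts end up close together. The toy sketch below shows the comparison step with cosine similarity on hand-made 3-dimensional vectors; real stella embeddings have far more dimensions, the vectors and the texts they stand for are invented for illustration, and the vector store used later may rank results with a different metric (for example L2 distance).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: the query and the hive chunk point in similar directions.
query = [0.9, 0.1, 0.0]        # e.g. "Where do bees live?"
chunk_hive = [0.8, 0.2, 0.1]   # e.g. a chunk describing the hive
chunk_court = [0.0, 0.3, 0.9]  # e.g. a chunk set in the courtroom

print(round(cosine_similarity(query, chunk_hive), 3))
print(round(cosine_similarity(query, chunk_court), 3))
```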

#### Load and split document
The Bee Movie script is used as test input. It is loaded from a GitHub gist and split into chunks of 500 characters with a 250-character overlap.

In [None]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

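# Download the gist page and keep only the element that holds the script text.
loader = WebBaseLoader(
    web_paths=("https://gist.github.com/MattIPv4/045239bc27b16b2bcf7a3a9a4648c08a",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            id="file-bee-movie-script"
        )
    ),
)
docs = loader.load()

# chunk_size and chunk_overlap are measured in characters (the default length function is len).
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=250)
all_splits = text_splitter.split_documents(docs)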
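With `chunk_size=500` and `chunk_overlap=250`, consecutive chunks share up to half of their characters, so text cut at one chunk boundary still appears whole in a neighbouring chunk. The simplified sketch below shows the overlap arithmetic with a fixed-size character window. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter first tries to break on paragraphs, newlines and spaces), so real chunk boundaries will differ; `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split `text` into fixed-size character windows that overlap by `chunk_overlap`."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical 1200-character stand-in for the script.
text = "".join(chr(ord("a") + i % 26) for i in range(1200))
chunks = sliding_window_chunks(text, chunk_size=500, chunk_overlap=250)
print(len(chunks), [len(c) for c in chunks])
```

The 50% overlap improves the chance that an answer-bearing passage is retrieved intact, at the cost of storing and embedding roughly twice as many characters.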

#### Vector store
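Milvus, an open-source vector database, is used not only because it is free but also because of its high performance and scalability. Here we initialize a local Milvus Lite database file (`./milvus_demo.db`) and load the document chunks into it. The logged `create_index` error comes from Milvus Lite's local mode, which only supports the FLAT, IVF_FLAT and AUTOINDEX index types and rejects HNSW; the retrieval results further below show that the store still works.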

In [None]:
from langchain_milvus import Milvus

vector_store = Milvus.from_documents(
    documents=all_splits,
    embedding=embeddings,
    connection_args={
        "uri": "./milvus_demo.db",
    },
    drop_old=True,
)

2025-01-08 13:59:30,152 [ERROR][handler]: RPC error: [create_index], <MilvusException: (code=65535, message=invalid index type: HNSW, local mode only support FLAT IVF_FLAT AUTOINDEX: )>, <Time:{'RPC start': '2025-01-08 13:59:30.150977', 'RPC error': '2025-01-08 13:59:30.152357'}> (decorators.py:140)
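A FLAT index, one of the types accepted in local mode, performs exact brute-force search: the query vector is compared with every stored vector and the closest ones are returned. The sketch below illustrates that idea in plain Python with L2 distance and `k=4` (the default `k` of LangChain's `similarity_search`); the stored vectors and chunk labels are hypothetical, and this is not Milvus's own implementation.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(query: Sequence[float], store: List[Tuple[str, List[float]]], k: int = 4) -> List[str]:
    """Exact nearest-neighbour search: score every stored vector, keep the k closest."""
    scored = ((l2_distance(query, vec), text) for text, vec in store)
    return [text for _, text in heapq.nsmallest(k, scored)]


# Hypothetical (chunk text, embedding) pairs.
store = [
    ("chunk about the hive", [0.9, 0.1]),
    ("chunk about the courtroom", [0.1, 0.9]),
    ("chunk about honey", [0.8, 0.3]),
    ("chunk about flowers", [0.6, 0.6]),
    ("chunk about Vanessa", [0.2, 0.7]),
]
print(flat_search([1.0, 0.0], store, k=2))
```

Exact search costs time linear in the number of chunks. For a single movie script that is cheap, whereas approximate indexes such as HNSW trade a little accuracy for speed on much larger collections.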


In [None]:
!pip install --quiet langgraph

#### Graph
Using LangGraph, a graph combining the retrieval and generation steps of the RAG pipeline is built. A few test questions are then asked to check that it works.

In [None]:
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Standard RAG prompt from the LangChain hub (inputs: "question" and "context").
prompt = hub.pull("rlm/rag-prompt")


# State shared between the graph nodes.
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


def retrieve(state: State):
    # Fetch the chunks most similar to the question from Milvus.
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    # Concatenate the retrieved chunks and fill them into the prompt.
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Chain retrieve -> generate and start the graph at retrieve.
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
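Conceptually, the compiled graph runs `retrieve` and then `generate`, and each node returns a partial dict that is merged into the shared state. The plain-Python sketch below mimics that data flow under stated assumptions: a hypothetical word-overlap retriever over three invented chunks, a paraphrased prompt (not the exact `rlm/rag-prompt` text) and a fake LLM that simply echoes the first context line. All names are hypothetical and prefixed with `toy_`/`Sketch` so they do not overwrite the notebook's own `State`, `retrieve` and `generate`; this does not reproduce LangGraph's internals.

```python
from typing import Callable, Dict, List, TypedDict


class SketchState(TypedDict, total=False):
    question: str
    context: List[str]
    answer: str


# Hypothetical corpus standing in for the Milvus collection.
TOY_CHUNKS = [
    "Barry B. Benson is a bee who has just graduated.",
    "Vanessa Bloome is a florist in New York.",
    "The Thomas 3000 is a bee smoker.",
]


def toy_retrieve(state: SketchState) -> Dict:
    # Hypothetical retriever: rank chunks by how many words they share with the question.
    words = set(state["question"].lower().rstrip("?").split())
    ranked = sorted(
        TOY_CHUNKS,
        key=lambda c: len(words & set(c.lower().rstrip(".").split())),
        reverse=True,
    )
    return {"context": ranked[:2]}


def fake_llm(prompt_text: str) -> str:
    # Stand-in for llm.invoke: answer with the first line after "Context:".
    return prompt_text.split("Context:\n", 1)[1].split("\n", 1)[0]


def toy_generate(state: SketchState) -> Dict:
    docs_content = "\n\n".join(state["context"])
    # Paraphrased RAG prompt, not the exact hub template.
    prompt_text = (
        "Answer using only the context.\n"
        f"Question: {state['question']}\nContext:\n{docs_content}\nAnswer:"
    )
    return {"answer": fake_llm(prompt_text)}


def run_sequence(nodes: List[Callable[[SketchState], Dict]], state: SketchState) -> SketchState:
    """Run nodes in order, merging each partial update into the state (later keys overwrite)."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


result = run_sequence([toy_retrieve, toy_generate], {"question": "What is the Thomas 3000?"})
print(result["answer"])
```

Because each node only returns the keys it produces, nodes stay independent: swapping the retriever or the LLM does not require touching the other step.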

In [None]:
response = graph.invoke({"question": "What flower did donkey need to find?"})
print(response["answer"])

Donkey needed to find a hive, and he needed to stop a fly. However, before that, Donkey had to fill the seat of the Flower Cart with 4 Roses. Roses were flowers.


In [None]:
response = graph.invoke({"question": "What is the name of the bee smoker that workers at Honey Farms want?"})
print(response["answer"])

The workers at Honey Farms want the Thomas 3000 bee smoker, as seen in the context of the "smoking gun". The Thomas 3000 is a semi-automatic smoker that produces a consistent flow of smoke, knocking bees out more efficiently. It is described as having ninety puffs a minute and containing twice the nicotine and tar of other smokers.


In [None]:
response = graph.invoke({"question": "Who likes Mosquito girls?"})
print(response["answer"])

It's not explicitly stated in the context, but it seems that no one, including mosquitoes, likes mosquitoes, and they are often met with hostility ("Just smack," "They just smack").


In [None]:
response = graph.invoke({"question": "Whats the name of the boyfriend?"})
print(response["answer"])

The boyfriend's name is Mr. Benson Bee (or Barry Bee to those close to him). However, it is later confirmed that he is actually his bee parent's son, due to royal bee anatomy. His name is her human boyfriend, Mr. Benson, but a local bee or companion, named Barry, in the Hive
