# Setting up a Retrieval-Augmented Generation (RAG) pipeline

## Loading in the data

Below, we load the scraped pages into a dictionary whose keys are the condition names (the folder name of each scraped page) and whose values are the text extracted from that page.

In [1]:
import os
from tqdm.notebook import tqdm
from bs4 import BeautifulSoup

# path to the condition folder (download them from sharepoint or scrape them again)
conditions_folder = "../nhs-use-case/conditions/"

# set to True if you want to extract only the main element
main_only = True

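# read all conditions and put them in a dictionary: {condition name: page text}
conditions = {}
for condition in tqdm(os.listdir(conditions_folder)):
    try:
        # use a context manager so the file handle is always closed
        with open(
            os.path.join(conditions_folder, condition, "index.html"),
            "r",
            encoding="utf-8",
        ) as f:
            content = f.read()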
        soup = BeautifulSoup(content, "html.parser")
        if main_only:
            # extract the main element
            main_element = soup.find("main", class_="nhsuk-main-wrapper")
            # extract the text from the main element
            text = main_element.get_text(separator="\n", strip=True)
        else:
            text = soup.get_text(separator="\n", strip=True)
        conditions[condition] = text
    except Exception as e:
        print(f"Error reading condition {condition}: {e}")
        continue

  0%|          | 0/913 [00:00<?, ?it/s]

Error reading condition index.html: [Errno 20] Not a directory: '../nhs-use-case/conditions/index.html/index.html'
Error reading condition .DS_Store: [Errno 20] Not a directory: '../nhs-use-case/conditions/.DS_Store/index.html'
Error reading condition README.txt: [Errno 20] Not a directory: '../nhs-use-case/conditions/README.txt/index.html'
Error reading condition mental-health: [Errno 2] No such file or directory: '../nhs-use-case/conditions/mental-health/index.html'
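
Three of the four errors above come from stray files in the conditions folder (`index.html`, `.DS_Store`, `README.txt`), and the fourth from a folder that has no `index.html`. A minimal sketch of filtering such entries before reading them, assuming the same folder layout as above; `list_condition_dirs` is a hypothetical helper and not part of the original notebook:

```python
import os


def list_condition_dirs(conditions_folder: str) -> list[str]:
    """Return the sorted names of sub-folders that contain an index.html file."""
    names = []
    for name in sorted(os.listdir(conditions_folder)):
        page = os.path.join(conditions_folder, name, "index.html")
        # os.path.isfile is False for stray files such as .DS_Store or README.txt
        # (they cannot contain index.html) and for folders without an index.html
        if os.path.isfile(page):
            names.append(name)
    return names
```

Looping over `list_condition_dirs(conditions_folder)` instead of `os.listdir(conditions_folder)` would skip those entries silently, so whether to use it depends on whether we want to be told about missing pages.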


## Using the `langchain` library

We use the `langchain` library to build a vector store. Each page is split into chunks, each chunk is embedded with a sentence-transformer model from Hugging Face, and the embeddings are stored in a `Chroma` vector store, which we then use for similarity search.

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import SentenceTransformersTokenTextSplitter

sentence_transformer_model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(model_name=sentence_transformer_model_name)

splitter = SentenceTransformersTokenTextSplitter(
    model_name=sentence_transformer_model_name,
    chunk_overlap=256,
)

# one metadata dict per page, so every chunk records which condition it came from
documents = splitter.create_documents(
    list(conditions.values()),
    metadatas=[{"document": cond} for cond in conditions],
)
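
With `chunk_overlap=256`, consecutive chunks of the same page share 256 tokens, so a sentence that falls on a chunk boundary still appears whole in at least one chunk. The sketch below illustrates the sliding-window idea on a toy list of whitespace-separated words. It is an illustration under simplifying assumptions and not the library's implementation: the real splitter works on the model's tokenizer output, and `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(
    tokens: list[str], tokens_per_chunk: int, chunk_overlap: int
) -> list[list[str]]:
    """Split tokens into windows that overlap by chunk_overlap tokens."""
    if not 0 <= chunk_overlap < tokens_per_chunk:
        raise ValueError("chunk_overlap must be >= 0 and smaller than tokens_per_chunk")
    # each new window starts this many tokens after the previous one
    stride = tokens_per_chunk - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start : start + tokens_per_chunk])
        # stop once the current window has reached the end of the text
        if start + tokens_per_chunk >= len(tokens):
            break
        start += stride
    return chunks


# toy example: 10 "tokens", windows of 4 that overlap by 2
words = "a b c d e f g h i j".split()
chunks = chunk_tokens(words, tokens_per_chunk=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

A larger overlap means more chunks per page, and so more embeddings to compute and store, in exchange for fewer passages being cut mid-thought.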

We can create a vector database by passing in the documents to `Chroma`.

In [6]:
from langchain_chroma import Chroma

db = Chroma.from_documents(documents, embeddings)
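
At query time, the vector store embeds the question and returns the chunks whose embeddings are closest to it. The sketch below shows that idea in plain Python using cosine similarity. It is only an illustration: the vectors and condition names are made-up toy values, `top_k` is a hypothetical helper, and the distance metric `Chroma` actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 4
) -> list[tuple[str, float]]:
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions
toy_vectors = {
    "malnutrition": [0.9, 0.1, 0.0],
    "acne": [0.0, 1.0, 0.0],
    "anaemia": [0.7, 0.0, 0.7],
}
results = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(results)
```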

We use `HuggingFacePipeline` to load an instruction-tuned text-generation model locally; this is the model that writes the responses.

In [None]:
from langchain_huggingface import HuggingFacePipeline

model_name = "Qwen/Qwen2.5-1.5B-Instruct"
llm = HuggingFacePipeline.from_model_id(
    model_id=model_name,
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "do_sample": True,
        "return_full_text": False,
    },
)

Device set to use mps:0


We create a `RAG` class that puts these components together as a two-step graph: retrieve the relevant chunks, then generate an answer from them.

In [None]:
from langchain_core.documents import Document
from typing_extensions import List, TypedDict
from langgraph.graph import START, StateGraph


class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


class RAG:
    def __init__(self, vector_store, prompt, llm):
        self.vector_store = vector_store
        self.prompt = prompt
        self.llm = llm
        self.graph = self.build_graph()

    def retrieve(self, state: State) -> dict[str, List[Document]]:
        retrieved_docs = self.vector_store.similarity_search(state["question"])
        return {"context": retrieved_docs}

    def generate(self, state: State) -> dict[str, str]:
        docs_content = "\n\n".join(doc.page_content for doc in state["context"])
        messages = self.prompt.invoke(
            {"question": state["question"], "context": docs_content}
        )
        response = self.llm.invoke(messages)
        return {"answer": response}

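    def build_graph(self):
        # add_sequence names each node after its function ("retrieve", "generate")
        graph_builder = StateGraph(State).add_sequence([self.retrieve, self.generate])
        graph_builder.add_edge(START, "retrieve")
        # compile() returns a runnable compiled graph, not the StateGraph builder
        return graph_builder.compile()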

    def query(self, question: str) -> State:
        result = self.graph.invoke({"question": question})
        return result
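
Conceptually, each step in the class above receives the current state and returns a partial update (`{"context": ...}` or `{"answer": ...}`) that is merged into the state before the next step runs. The sketch below mimics that flow in plain Python, without LangGraph. It is a simplification for illustration only: `run_pipeline`, `fake_retrieve` and `fake_generate` are hypothetical stand-ins, and no vector store or model is involved.

```python
from typing import Callable


def run_pipeline(question: str, steps: list[Callable[[dict], dict]]) -> dict:
    """Run the steps in order, merging each step's partial update into the state."""
    state: dict = {"question": question}
    for step in steps:
        state.update(step(state))
    return state


def fake_retrieve(state: dict) -> dict:
    # stand-in for a vector store lookup: returns a single made-up chunk
    return {"context": [f"chunk about: {state['question']}"]}


def fake_generate(state: dict) -> dict:
    # stand-in for prompt formatting plus an LLM call
    docs_content = "\n\n".join(state["context"])
    return {"answer": f"(answer based on {len(docs_content)} characters of context)"}


result = run_pipeline("What is malnutrition?", [fake_retrieve, fake_generate])
print(result)
```

The returned dictionary has the same three keys as `State`: `question`, `context` and `answer`.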

Lastly, we need a prompt template to format the question and the retrieved context. We pull a basic RAG template from the LangChain Hub for this.

In [19]:
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")



In [20]:
prompt.invoke(
    {"context": "(context goes here)", "question": "(question goes here)"}
).to_messages()

[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: (question goes here) \nContext: (context goes here) \nAnswer:", additional_kwargs={}, response_metadata={})]

We can now create a RAG object and query from it.

In [53]:
rag = RAG(
    vector_store=db,
    prompt=prompt,
    llm=llm,
)

In [54]:
rag.query(
    "What should I do if I have lost a lot of weight over the last 3 to 6 months?",
)

{'question': 'What should I do if I have lost a lot of weight over the last 3 to 6 months?',
 'context': [Document(id='4707d17a-1564-4f89-8a60-a29d8d7e90e3', metadata={'document': 'malnutrition'}, page_content='weight over the last 3 to 6 months you have other symptoms of malnutrition you \' re worried someone in your care, such as a child or older person, may be malnourished if you \' re concerned about a friend or family member, try to encourage them to see a gp. a gp can check if you \' re at risk of malnutrition by measuring your weight and height, and asking about any medical problems you have or any recent changes in your weight or appetite. if they think you could be malnourished, they may refer you to a healthcare professional such as a dietitian to discuss treatment. who \' s at risk of malnutrition malnutrition is a common problem that affects millions of people in the uk. anyone can become malnourished, but it \' s more common in people who : have a long - term health condit