# Retrieval-Augmented Generation (RAG)

Pretrained language models such as T5 or Qwen generate fluent text but can “hallucinate” or rely on stale knowledge. Retrieval-Augmented Generation (RAG) mitigates these problems by drawing on an external, updatable knowledge base. When you ask a question, the system:

* Retrieves the most relevant passages from its document store.
* Augments the generation model’s input with those passages.
* Generates an answer grounded in real text, reducing hallucinations and easing updates.
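
The three steps above can be sketched in plain Python. This is a toy illustration rather than the pipeline built later in this notebook: the word-overlap scorer stands in for a real embedding-based retriever, and `fake_generate` is a hypothetical placeholder for a language model call.

```python
def retrieve(question, store, k=2):
    """Rank passages by how many question words they share (toy scorer)."""
    q_words = set(question.lower().split())
    scored = sorted(
        store,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return scored[:k]


def augment(question, passages):
    """Put the retrieved passages in front of the question."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def fake_generate(prompt):
    """Hypothetical stand-in for an LLM call; it only reports the prompt size."""
    return f"<answer conditioned on {len(prompt.split())} prompt words>"


store = [
    "FAISS is a library for vector similarity search.",
    "Flan-T5 is an instruction-tuned sequence-to-sequence model.",
    "Chunking splits long documents into smaller passages.",
]
question = "What is FAISS used for?"
passages = retrieve(question, store, k=1)
prompt = augment(question, passages)
print(prompt)
print(fake_generate(prompt))
```

Updating the knowledge base in this picture means appending to `store`; the generator itself is never retrained.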



You’ll learn how to:

*   Load and preprocess the Open Australian Legal Corpus via Hugging Face’s `datasets` library
*   Embed and index passages using Sentence-Transformers + FAISS
*   Wire a retriever and a lightweight Hugging Face generation model together via LangChain
*   Run queries and dynamically add new documents to your knowledge base




# What is LangChain?

**LangChain** is a Python framework that simplifies building applications with large language models by providing higher-level abstractions—called chains—to stitch together components like prompt templates, LLMs, document loaders, vector stores, and external tools. It lets you:



*   Load and preprocess text from files, webpages, or datasets.
*   Split and index documents into vector databases (e.g., FAISS, Chroma, Milvus).
*   Define chains (e.g., RetrievalQA, Summarization chains) that orchestrate retrieval, prompt construction, and generation.   
*   Manage memory for multi-turn conversations or agents that need context.

With LangChain, you can prototype complex RAG systems, chatbots, agents, and other LLM-powered workflows in just a few lines of code, without wiring each piece manually.

# 1. Install Dependencies

In [1]:
# Force-upgrade fsspec
!pip install --upgrade fsspec

# Core RAG stack
!pip install --upgrade \
  langchain \
  datasets \
  langchain-community \
  transformers \
  sentence-transformers \
  faiss-cpu




# 2. Loading the Dataset
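We’ll stream the Open Australian Legal Corpus (`isaacus/open-australian-legal-corpus`) from the Hugging Face Hub and stop once roughly 150 million characters of text have been collected: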







In [2]:
from datasets import load_dataset

streamed = load_dataset("isaacus/open-australian-legal-corpus", split="corpus", streaming=True)

docs = []
total_chars = 0
for row in streamed:
    text = row["text"].strip()
    if not text:
        continue
    docs.append(text)
    total_chars += len(text)
    if total_chars >= 150_000_000:  # ≈ 150 MB
        break

print(f"Loaded {len(docs)} streamed passages (~150 MB)")


Loaded 3362 streamed passages (~150 MB)


# 3. Preprocessing & Chunking
Long documents need splitting into manageable “chunks” for embedding. Here each chunk is a window of 200 whitespace-separated words that overlaps the previous window by 50 words:



In [3]:
import itertools

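def chunk_text(text, chunk_size=200, overlap=50):
    """Yield windows of `chunk_size` words; consecutive windows share `overlap` words."""
    tokens = text.split()  # Whitespace-separated words, not model tokens
    step = chunk_size - overlap  # Must be positive, so keep overlap < chunk_size
    for i in range(0, len(tokens), step):
        yield " ".join(tokens[i : i + chunk_size])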

# Build a list of chunks
chunks = list(itertools.chain.from_iterable(chunk_text(doc) for doc in docs))
print(f"Created {len(chunks)} text chunks")


Created 157824 text chunks
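
To see how the overlapping windows line up, here is the same function run on a tiny input. The 12-word string and the small `chunk_size` and `overlap` values are chosen only to keep the output readable.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Yield overlapping windows of whitespace-separated words."""
    tokens = text.split()
    for i in range(0, len(tokens), chunk_size - overlap):
        yield " ".join(tokens[i : i + chunk_size])


words = " ".join(f"w{n}" for n in range(12))  # "w0 w1 ... w11"
small_chunks = list(chunk_text(words, chunk_size=5, overlap=2))
for c in small_chunks:
    print(c)
```

Each window starts `chunk_size - overlap = 3` words after the previous one, so neighbouring chunks share two words and the final window is shorter than the rest.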


# 4. Embedding the Documents
We’ll use a Sentence-Transformers model via LangChain’s HuggingFaceEmbeddings:


In [4]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Choose a compact, high-quality embedding model
embed_model = "sentence-transformers/all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=embed_model)
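
An embedding model maps each text to a fixed-length vector (384 dimensions for `all-MiniLM-L6-v2`) so that related texts end up close together. A common closeness measure is cosine similarity, sketched below with made-up 3-D vectors; real vectors would come from the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-D "embeddings" for illustration only.
query_vec = [1.0, 0.0, 1.0]
close_vec = [0.9, 0.1, 0.8]
far_vec = [0.0, 1.0, 0.0]

print(cosine_similarity(query_vec, close_vec))  # close to 1: similar direction
print(cosine_similarity(query_vec, far_vec))    # 0: orthogonal vectors
```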




# 5. Building the Vector Store
Index the embedded chunks with FAISS:


In [5]:
from langchain_community.vectorstores import FAISS

# Build a FAISS index from the chunks (this embeds every chunk, so it can take a while)
vectorstore = FAISS.from_texts(chunks, embeddings)
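
Conceptually, a flat (exhaustive) index answers a query by comparing the query vector with every stored vector and returning the closest ones. The sketch below does this by brute force with squared L2 distance and toy 2-D vectors. `ToyIndex` is a hypothetical class written for this illustration and is not part of FAISS or LangChain, which do the same job far more efficiently.

```python
import heapq


class ToyIndex:
    """Hypothetical exhaustive-search index; not the FAISS API."""

    def __init__(self):
        self.vectors = []
        self.texts = []

    def add(self, vector, text):
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query, k=2):
        """Return the k (squared L2 distance, text) pairs closest to query."""
        scored = (
            (sum((q - v) ** 2 for q, v in zip(query, vec)), text)
            for vec, text in zip(self.vectors, self.texts)
        )
        return heapq.nsmallest(k, scored)


index = ToyIndex()
index.add([0.0, 0.0], "origin")
index.add([1.0, 1.0], "diagonal")
index.add([5.0, 5.0], "far away")

results = index.search([0.9, 1.2], k=2)
print(results)
```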


# 6. Creating the RAG Pipeline
We’ll pair a lightweight Hugging Face generation model (Flan-T5) with our retriever:


In [6]:
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
from langchain_community.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# 1. Load a text2text-generation pipeline
gen_model_id = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(gen_model_id)
model     = AutoModelForSeq2SeqLM.from_pretrained(gen_model_id)
hf_pipeline = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=256,
)

# 2. Wrap it for LangChain
llm = HuggingFacePipeline(pipeline=hf_pipeline)

# 3. Build the RetrievalQA chain
rag = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",          # “stuff” for simple prompt-plus-context
    retriever=vectorstore.as_retriever(),
)


Device set to use cuda:0


# 7. Querying Your RAG System
Now ask a question:


In [7]:
question = "When was the Melbourne Courts were formed?"
answer = rag.run(question)
print("Answer:", answer)


Token indices sequence length is longer than the specified maximum sequence length for this model (1180 > 512). Running this sequence through the model will result in indexing errors


Answer: The trial of the action took place in Melbourne, delivered judgment in Sydney on the footing that the law of Victoria applied "to avoid keeping the parties waiting until the Court next sits in Melbourne" and "on the view, which both parties urge, that jurisdiction in the case is to be taken as having been exercised in Victoria" [48]
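
The warning in the output above reports a 1,180-token prompt against a 512-token limit, which likely contributes to the unfocused answer. One mitigation is to retrieve fewer chunks, for example with `vectorstore.as_retriever(search_kwargs={"k": 2})`. Another is to cap the context before it reaches the model, as in the sketch below. It assumes whitespace word counts as a rough proxy for tokens (a real tokenizer will usually count more), and `stuff_with_budget` is a hypothetical helper, not a LangChain function.

```python
def stuff_with_budget(question, passages, max_words=300):
    """Concatenate passages in rank order until the word budget is spent."""
    header = "Use the context to answer the question.\n\n"
    footer = f"\n\nQuestion: {question}\nAnswer:"
    budget = max_words - len(header.split()) - len(footer.split())

    kept = []
    for passage in passages:
        n_words = len(passage.split())
        if n_words > budget:
            break  # Stop at the first passage that does not fit.
        kept.append(passage)
        budget -= n_words
    return header + "\n\n".join(kept) + footer, len(kept)


# Three dummy passages of 120 words each.
passages = ["alpha " * 120, "beta " * 120, "gamma " * 120]
prompt, n_kept = stuff_with_budget("What is alpha?", passages, max_words=300)
print(n_kept, len(prompt.split()))
```

Only the first two passages fit under the 300-word budget here; the third is dropped instead of being silently truncated mid-sentence by the model.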


# 8. Dynamically Updating the Knowledge Base
You can update your knowledge base on the fly and query it again straight away. The example below:

* Queries the system before adding a new document (and gets no answer or a stale one).
* Adds a new document containing a specific fact.
* Rebuilds the RAG chain. This step is kept for clarity; the retriever wraps the same vector store object, so newly added texts should be searchable without it.
* Queries again after adding the document. Whether the new fact surfaces depends on how closely the query matches the new text, as the outputs further down show.

## Before adding documents


In [8]:
question = "when australia passed aborginal act?"
answer = rag.run(question)
print("Answer:", answer)


Answer: Act 1999 of the Commonwealth. which this Act commenced. action Where any provision of an Act, regulation, rule or by-law requires a person to give notice of his intention to bring an action, or of any claim that he intends to prosecute in an action, before the action is instituted in a court, the court may, if the justice of the case so requires, at any time before or after the close of pleadings, dispense with that requirement. Legislative history Notes • Please note—References in the legislation to other legislation or instruments or to titles of bodies or offices are not automatically updated as part of the program for the revision and publication of legislation and therefore may be obsolete. • Earlier versions of this Act (historical versions) are listed at the end of the legislative history. • For further information relating to the Act and subordinate legislation made under the Act see the Index of South Australian Statutes or www.legislation.sa.gov.au. Principal Act Year

## Add a new document

In [9]:
new_docs = [
    "The Aboriginal Land Rights (Northern Territory) Act was passed in December 1976. This legislation was a landmark piece of social reform, allowing First Nations people to claim land title if they could prove a traditional association. ",
]
vectorstore.add_texts(new_docs)

# Build the RetrievalQA chain
rag = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",          # “stuff” for simple prompt-plus-context
    retriever=vectorstore.as_retriever(),
)
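
Whether a chain needs rebuilding after an update depends on whether its retriever copies the index or keeps a reference to it. The toy classes below are hypothetical (they are not LangChain's) and illustrate the reference case, in which texts added later are visible to a retriever created earlier. LangChain's `as_retriever()` is expected to behave this way, but it is worth verifying against the version you have installed.

```python
class ToyStore:
    """Hypothetical in-memory store; keyword match stands in for vector search."""

    def __init__(self, texts):
        self.texts = list(texts)

    def add_texts(self, new_texts):
        self.texts.extend(new_texts)

    def as_retriever(self):
        return ToyRetriever(self)


class ToyRetriever:
    def __init__(self, store):
        self.store = store  # A reference to the store, not a copy.

    def get(self, keyword):
        return [t for t in self.store.texts if keyword.lower() in t.lower()]


store = ToyStore(["a passage about coroners", "a passage about evidence"])
retriever = store.as_retriever()       # Created before the update.
before = retriever.get("land rights")
store.add_texts(["Aboriginal Land Rights (Northern Territory) Act 1976"])
after = retriever.get("land rights")   # Same retriever, no rebuild.
print(before, after)
```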



## After adding the new document



In [10]:
question = "when australia passed aborginal act?"
answer = rag.run(question)
print("Answer:", answer)


Answer: Act 1999 of the Commonwealth. which this Act commenced. action Where any provision of an Act, regulation, rule or by-law requires a person to give notice of his intention to bring an action, or of any claim that he intends to prosecute in an action, before the action is instituted in a court, the court may, if the justice of the case so requires, at any time before or after the close of pleadings, dispense with that requirement. Legislative history Notes • Please note—References in the legislation to other legislation or instruments or to titles of bodies or offices are not automatically updated as part of the program for the revision and publication of legislation and therefore may be obsolete. • Earlier versions of this Act (historical versions) are listed at the end of the legislative history. • For further information relating to the Act and subordinate legislation made under the Act see the Index of South Australian Statutes or www.legislation.sa.gov.au. Principal Act Year

The answer is unchanged, which suggests the new document was not among the chunks retrieved for this loosely worded query. A query phrased closer to the new text does pick it up:

In [14]:
rag.run("When was the Aboriginal Land Rights Act passed?")


'December 1976'

In [15]:
# Inspect what the retriever returns for the original query
retriever = vectorstore.as_retriever()
hits = retriever.invoke("when australia passed aboriginal act?")  # avoid overwriting `docs`
for d in hits:
    print(d.page_content)


Volume 7 of The Public General Acts of South Australia 1837-1975 at page 540. • Certain textual alterations were made to this Act by the Commissioner of Statute Revision when preparing the reprint of the Act that incorporated all amendments in force as at 11 July 1988. A schedule of these alterations was laid before Parliament on 16 August 1988. New entries appear in bold. Entries that relate to provisions that have been deleted appear in italics. Provision|How varied|Commencement| Pt 1||| ss 2 and 3|deleted in pursuance of the Acts Republication Act 1967 as their function is now exhausted|11.7.1988| s 4|amended by 35/1978 s 3|8.6.1978| |amended by 100/1978 s 3|14.8.1980| |deleted by 94/1987 Sch|17.12.1987| s 5||| Aboriginal|inserted by 27/2004 s 13(1)|29.7.2004| Aboriginal-owned land|inserted by 27/2004 s 13(1)|29.7.2004| Aboriginal person|inserted by 27/2004 s 13(1)|29.7.2004| Adelaide Dolphin Sanctuary|inserted by 5/2005 Sch 2 (cl 42)|4.6.2005| aircraft|inserted by 94/1987 s 3(a)|17



## Before adding documents

In [11]:
question = "when australia abortion law passed?"
answer = rag.run(question)
print("Answer:", answer)


Answer: No, it's not possible to tell.


## Add a new document

In [12]:
new_docs = [
    "In Victoria, abortion was decriminalized by the Abortion Law Reform Act 2008. This act also established guidelines for when abortions could take place. Abortion laws vary by state and territory in Australia, with some states requiring medical assessments and others allowing for broader access.",
]
vectorstore.add_texts(new_docs)

# Build the RetrievalQA chain
rag = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",          # “stuff” for simple prompt-plus-context
    retriever=vectorstore.as_retriever(),
)



## After adding the new document

In [13]:
question = "when australia abortion law passed?"
answer = rag.run(question)
print("Answer:", answer)


Answer: In Victoria, abortion was decriminalized by the Abortion Law Reform Act 2008. This act also established guidelines for when abortions could take place. Abortion laws vary by state and territory in Australia, with some states requiring medical assessments and others allowing for broader access. abortion does not constitute an offence under a written law. (2) Subsection (1) applies whether the death occurs before, on or after the day on which the Abortion Legislation Reform Act 2023 section 20 comes into operation. [Section 3B inserted: No. 27 of 2020 s. 54.] Part 2 — Coroners and Coroner's court Division 1 — Coroner's court 5. Establishment of court (1) A court to be known as the Coroner's Court of Western Australia is established. (2) The court is to be constituted by a coroner and has exclusive jurisdiction to hold all inquests under this Act. (3) The court constituted by a coroner may sit and exercise the jurisdiction of a person who has selfadministered, or has been administ