# Tutorial: Building a RAG System with Groq and HuggingFace

This notebook demonstrates how to build a retrieval-augmented generation (RAG) pipeline using LangChain, with **Groq** serving the LLM and **HuggingFace** providing the embeddings.

## 0. Setup and Environment Check

We load environment variables from a `.env` file and define a helper function, `check_env()`, that verifies `GROQ_API_KEY` is set before any request is made.

In [1]:
import os

from dotenv import load_dotenv
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.vectorstores import InMemoryVectorStore

load_dotenv()

def check_env():
    """Verify that required environment variables are set."""
    if not os.getenv("GROQ_API_KEY"):
        print("\n[!] ERROR: GROQ_API_KEY not found in .env or environment.")
        print("Please create a .env file with your key: GROQ_API_KEY=gsk_...\n")
        return False
    return True


USER_AGENT environment variable not set, consider setting it to identify your requests.
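
The `USER_AGENT` warning above appears to be emitted when `langchain_community` is imported, so one way to silence it is to set the variable before the imports in the first cell. This is a minimal sketch; the value `rag-tutorial/0.1` is a hypothetical identifier, not something the notebook defines.

```python
import os

# Assumption: this runs before `from langchain_community... import WebBaseLoader`.
# "rag-tutorial/0.1" is a made-up identifier; any descriptive string should work.
# setdefault() leaves an existing USER_AGENT value untouched.
os.environ.setdefault("USER_AGENT", "rag-tutorial/0.1")
```

Putting `USER_AGENT=...` in the `.env` file would likely only help if `load_dotenv()` were called before the LangChain imports, which is not the order used in the cell above.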


## 1. Load Documents

We fetch web pages with `WebBaseLoader`, which returns one LangChain `Document` per URL.

In [2]:
def load_documents(urls: list[str]):
    """Load web pages and return a list of LangChain Document objects."""
    loader = WebBaseLoader(urls)
    docs = loader.load()
    print(f"[load] Loaded {len(docs)} document(s).")
    return docs

## 2. Split into Chunks

In [3]:
def split_documents(docs, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Split documents into smaller chunks for embedding."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    splits = splitter.split_documents(docs)
    print(f"[split] Created {len(splits)} chunk(s).")
    return splits
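
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker in plain Python. It is only an illustration of the overlap arithmetic: `RecursiveCharacterTextSplitter` splits on separators such as paragraph and line breaks rather than at fixed offsets, and the helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.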

## 3. Embed & Store (HuggingFace)

We use a **HuggingFace** sentence-transformers model (`all-MiniLM-L6-v2`), which runs locally at no cost, and keep the resulting vectors in an in-memory store.

In [4]:
def build_vectorstore(splits):
    """Embed chunks and store them in an in-memory vector store using HuggingFace."""
    # This uses a free embedding model from HuggingFace
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    vectorstore = InMemoryVectorStore.from_documents(
        documents=splits,
        embedding=embeddings,
    )
    print("[store] Vector store built successfully (HuggingFace In-Memory).")
    return vectorstore
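
Conceptually, an in-memory vector store keeps each chunk next to its embedding and ranks chunks by similarity to the query embedding. The sketch below shows that idea with hand-written 3-dimensional vectors and cosine similarity; the vectors, chunk texts, and the `top_k` helper are made up for illustration and are not the internals of `InMemoryVectorStore`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": real ones from all-MiniLM-L6-v2 have 384 dimensions.
toy_store = [
    ("chunk about planning", [1.0, 0.0, 0.0]),
    ("chunk about memory", [0.0, 1.0, 0.0]),
    ("chunk about tools", [0.7, 0.7, 0.0]),
]

result = top_k([0.9, 0.1, 0.0], toy_store, k=2)
print(result)  # ['chunk about planning', 'chunk about tools']
```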


## 4. Build RAG Chain (Groq)

We use **ChatGroq** with the Llama 3.3 70B model (`llama-3.3-70b-versatile`).

In [5]:
def build_rag_chain(vectorstore):
    """
    Build a RAG chain that:
    - Retrieves the top-k most relevant chunks.
    - Formats them into a prompt.
    - Generates an answer with a Groq-hosted LLM.
    """
    retriever = vectorstore.as_retriever(search_kwargs={"k": 6})

    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                (
                    "You are an assistant for question-answering tasks. "
                    "Use the following pieces of retrieved context to answer "
                    "the question. If you don't know the answer, say that you "
                    "don't know. Use three sentences maximum and keep the "
                    "answer concise.\n\nContext:\n{context}"
                ),
            ),
            ("human", "{question}"),
        ]
    )

    llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0)

    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    rag_chain = (
        RunnableParallel(
            context=retriever | format_docs,
            question=RunnablePassthrough(),
        )
        | prompt
        | llm
        | StrOutputParser()
    )

    return rag_chain
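
The data flow through the chain can be traced in plain Python. In this sketch the retriever and the LLM are replaced by hypothetical stand-ins (`fake_retriever`, `fake_llm`), and the prompt is a shortened version of the template above, so it shows the shape of the pipeline rather than LangChain's actual behaviour.

```python
def format_docs(docs: list[str]) -> str:
    """Join retrieved chunk texts with blank lines, as the chain's format_docs does."""
    return "\n\n".join(docs)


def fake_retriever(question: str) -> list[str]:
    """Stand-in for the retriever: always returns the same two chunks."""
    return ["Chunk A.", "Chunk B."]


def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM: reports the prompt size instead of generating text."""
    return f"(stub answer based on {len(prompt)} prompt characters)"


def build_prompt(question: str) -> str:
    # Step 1 (RunnableParallel): the same input feeds both branches.
    inputs = {
        "context": format_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,                               # RunnablePassthrough()
    }
    # Step 2 (prompt): fill the template slots from the dict.
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"


def run_chain(question: str) -> str:
    # Step 3 (llm | StrOutputParser): generate and return a plain string.
    return fake_llm(build_prompt(question))


print(build_prompt("What is X?"))
print(run_chain("What is X?"))
```

The point to notice is that the question string enters once and is fanned out: one copy drives retrieval, the other is passed through unchanged to the prompt.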


## 5. Usage

Run the cell below to start an interactive question-answering session; type `exit` to quit.

In [6]:
def main():
    if not check_env():
        return

    # Source URLs – feel free to replace with your own documents.
    urls = [
        "https://lilianweng.github.io/posts/2023-06-23-agent/",
    ]

    docs = load_documents(urls)
    splits = split_documents(docs)
    vectorstore = build_vectorstore(splits)
    rag_chain = build_rag_chain(vectorstore)

    print("\n=== RAG Question-Answering Session ===")
    print("Type 'exit' to quit.\n")

    while True:
        question = input("Your question: ").strip()
        if question.lower() in {"exit", "quit", "q"}:
            print("Goodbye!")
            break
        if not question:
            continue
        answer = rag_chain.invoke(question)
        print(f"\nAnswer: {answer}\n")


if __name__ == "__main__":
    main()


[load] Loaded 1 document(s).
[split] Created 66 chunk(s).
[store] Vector store built successfully (HuggingFace In-Memory).

=== RAG Question-Answering Session ===
Type 'exit' to quit.

Goodbye!
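
The `input()` loop is convenient for exploration, but a fixed list of questions is easier to rerun. The helper below is a minimal sketch that works with any object exposing an `invoke` method; `answer_questions` and `EchoChain` are hypothetical names, and `EchoChain` only stands in for the real chain so the example runs without an API key.

```python
def answer_questions(chain, questions: list[str]) -> dict[str, str]:
    """Ask each question in turn and collect the answers keyed by question."""
    return {question: chain.invoke(question) for question in questions}


class EchoChain:
    """Stand-in for the RAG chain; returns a canned string instead of calling Groq."""

    def invoke(self, question: str) -> str:
        return f"stub answer to: {question}"


answers = answer_questions(EchoChain(), ["What is task decomposition?"])
for question, answer in answers.items():
    print(f"Q: {question}\nA: {answer}\n")
```

With the real pipeline, the object returned by `build_rag_chain(vectorstore)` could be passed in place of `EchoChain()`.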
