1. Load your `.env` and Hugging Face token
2. Process your PDF
3. Generate embeddings and build a retriever
4. Query it using a remote Hugging Face model (via API)

### ✅ Step 1: Load environment variables

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

if not hf_token:
    raise ValueError("❌ Missing Hugging Face API token in your .env file.")
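
If later cells need more than one secret, the same check can be wrapped in a small helper. This is a minimal sketch using only the standard library. `require_env` is a hypothetical name introduced here for illustration; it is not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or blank.

    Hypothetical helper for illustration; not part of python-dotenv.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise ValueError(f"Missing {name}; add it to your .env file.")
    return value
```

After `load_dotenv()`, a call such as `hf_token = require_env("HUGGINGFACEHUB_API_TOKEN")` would replace the manual `if not hf_token` check.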

### 📄 Step 2: Load and split the PDF

In [2]:
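# Current LangChain releases ship these classes in split-out packages.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

pdf_path = "Tolkien-J.-The-lord-of-the-rings-HarperCollins-ebooks-2010.pdf"
loader = PyPDFLoader(pdf_path)
docs = loader.load()  # one Document per PDF page

# Split pages into chunks of up to 500 characters that share up to 100 characters with the previous chunk.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
splits = text_splitter.split_documents(docs)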
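
To see how `chunk_size` and `chunk_overlap` interact, here is a simplified sketch using plain fixed-width windows. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraphs and sentences). `sliding_chunks` is a hypothetical function, and the sample text is placeholder data.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split `text` into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))  # 1,200 placeholder characters
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])              # [500, 500, 400]
print(chunks[0][-100:] == chunks[1][:100])   # True: neighbours share 100 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.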

### 🧠 Step 3: Create embeddings and FAISS retriever

In [3]:
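# Current LangChain releases ship these classes in split-out packages.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

# The embedding model runs locally; only the LLM in Step 4 is called remotely.
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(splits, embedding_model)  # embed every chunk and index the vectors
retriever = db.as_retriever()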
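
Conceptually, the retriever embeds the question and returns the chunks whose vectors lie closest to it. The toy sketch below illustrates that idea with the standard library only. The three-dimensional vectors are invented for the example (the real model produces 384-dimensional embeddings), `top_k` is a hypothetical function, and FAISS uses optimized index structures rather than a full sort.

```python
import math


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are closest to query_vec (Euclidean distance)."""
    ranked = sorted(indexed, key=lambda item: math.dist(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Invented 3-dimensional vectors standing in for real embeddings.
toy_index = [
    ("The Balrog of Moria", (0.9, 0.1, 0.0)),
    ("Second breakfast in the Shire", (0.0, 0.9, 0.2)),
    ("The Bridge of Khazad-dum", (0.8, 0.2, 0.1)),
]

print(top_k((1.0, 0.0, 0.0), toy_index))
# ['The Balrog of Moria', 'The Bridge of Khazad-dum']
```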


### 🤖 Step 4: Set up Hugging Face LLM

In [4]:
from langchain_huggingface import HuggingFaceEndpoint

# HuggingFaceEndpoint builds its own inference client from the model ID and
# token, so a separate InferenceClient is not needed.
llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.1",
    huggingfacehub_api_token=hf_token,
    temperature=0.7,
    max_new_tokens=256,
)

### ✨ Step 5: Add a custom prompt and set up RetrievalQA

In [5]:
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

custom_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""
Use the following context to answer the question.
If you don't know the answer, just say you don't know; do not make up an answer.

Context:
{context}

Question:
{question}

Answer:
"""
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",
    return_source_documents=True,
    chain_type_kwargs={"prompt": custom_prompt}
)
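
The `"stuff"` chain type places all retrieved chunks into the `{context}` slot of a single prompt. The sketch below imitates that step with plain string formatting so the final prompt text is visible. `stuff_prompt` is a hypothetical function, the chunks are placeholders, and joining with a blank line is an assumption about the separator.

```python
TEMPLATE = """Use the following context to answer the question.

Context:
{context}

Question:
{question}

Answer:
"""


def stuff_prompt(chunks: list[str], question: str) -> str:
    """Join all retrieved chunks and substitute them into the template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return TEMPLATE.format(context=context, question=question)


prompt = stuff_prompt(
    ["Placeholder chunk one.", "Placeholder chunk two."],
    "What happened in the mines of Moria?",
)
print(prompt)
```

Because every chunk lands in one prompt, the number of retrieved chunks times the chunk size has to fit within the model's context window.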

### ❓ Step 6: Ask a question

In [6]:
question = "What happened in the mines of Moria?"
result = qa_chain.invoke({"query": question})
print("🧙 Gandalf says:\n", result['result'])



🧙 Gandalf says:
 The mines of Moria were delved by the Dwarves for the metal mithril, which was becoming increasingly rare and difficult to find. The Hobbit, Gandalf, and their companions were on a quest to find this metal. They reached the upper reaches of the mines but were unable to see the morning because it was night. They continued their journey but were eventually plundered by goblins. Moria became a dark and empty place as its numbers dwindled and the life-span of the Númenořians waned. After many years, the Istari or Wizards appeared in Middle-earth, and some spoke of the mighty works of the past called Khazad-du'm. However, the mines of Moria remained a place of fear and emptiness, as the children of Durin had fled long ago.
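
Because the chain was built with `return_source_documents=True`, the result also carries the retrieved chunks under `"source_documents"`, which helps when checking an answer against the text. The sketch below shows one way to summarize them. `summarize_sources` and `FakeDocument` are hypothetical names, the data is a placeholder, and the `page` metadata key is an assumption about what the PDF loader records.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in with the two attributes read below; not a LangChain class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(result: dict, preview_chars: int = 60) -> list[str]:
    """Return one line per source chunk: page label plus a short text preview."""
    lines = []
    for doc in result.get("source_documents", []):
        page = doc.metadata.get("page", "?")  # assumed metadata key
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"page {page}: {preview}")
    return lines


fake_result = {
    "result": "placeholder answer",
    "source_documents": [FakeDocument("Placeholder   chunk\ntext.", {"page": 12})],
}
print(summarize_sources(fake_result))  # ['page 12: Placeholder chunk text.']
```

With the real chain, `summarize_sources(result)` would be called on the dictionary returned by `qa_chain.invoke`.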


### Step 7: Launch a Gradio interface

In [7]:
import gradio as gr

def ask_gandalf(question):
    result = qa_chain.invoke({"query": question})
    return result["result"]

interface = gr.Interface(
    fn=ask_gandalf,
    inputs=gr.Textbox(lines=2, placeholder="Ask Gandalf a question..."),
    outputs="text",
    title="Gandalf Q&A",
    description="Ask questions about Lord of the Rings lore using your uploaded PDF."
)

interface.launch()


* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
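
Since the handler calls a remote API, a blank question or a network failure would otherwise surface as a raw error in the UI. The sketch below wraps any answer function with basic guards. `make_safe_handler` is a hypothetical helper, and the messages are placeholders.

```python
def make_safe_handler(answer_fn):
    """Wrap answer_fn so blank input and runtime errors return readable messages."""
    def handler(question: str) -> str:
        if not question or not question.strip():
            return "Please enter a question."
        try:
            return answer_fn(question.strip())
        except Exception as exc:  # e.g. a network or API failure
            return f"Something went wrong: {exc}"
    return handler


def flaky(question):
    """Placeholder answer function that always fails."""
    raise RuntimeError("endpoint unavailable")


print(make_safe_handler(lambda q: q.upper())("  moria? "))  # MORIA?
print(make_safe_handler(flaky)(""))                         # Please enter a question.
print(make_safe_handler(flaky)("moria?"))                   # Something went wrong: endpoint unavailable
```

In the notebook, `fn=make_safe_handler(ask_gandalf)` could be passed to `gr.Interface` in place of `fn=ask_gandalf`.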




