# 🤖 Step 3: Load Mistral 7B on Apple Silicon and Define Reasoning Agents

This notebook:
- Loads Mistral 7B on a Mac (CPU or Apple GPU via MPS)
- Defines four agents:
  - Query Rewriter
  - Document Retriever
  - Fact Checker
  - Synthesizer
- Runs the full agent chain manually, then wires the same chain into a LangGraph `StateGraph`

In [7]:
!pip uninstall bitsandbytes -y
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
!pip install transformers accelerate sentence-transformers

Looking in indexes: https://download.pytorch.org/whl/cpu


In [8]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "mistralai/Mistral-7B-Instruct-v0.1"

# Make sure you're logged in to Hugging Face (e.g. via `huggingface-cli login`)
tokenizer = AutoTokenizer.from_pretrained(model_name, token=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"  # will use MPS or CPU on Mac
)

llm = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    return_full_text=False,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.7,
    repetition_penalty=1.1,
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:16<00:00,  8.39s/it]
Some parameters are on the meta device because they were offloaded to the disk.
Device set to use mps


In [9]:
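def chat_with_mistral(prompt, system_prompt=None):
    user_turn = prompt.strip()
    if system_prompt:
        # Keep the system prompt inside the [INST] block with the user turn,
        # so the model generates after [/INST] rather than seeing a finished reply.
        user_turn = f"{system_prompt.strip()}\n\n{user_turn}"
    full_prompt = f"<s>[INST] {user_turn} [/INST]"

    output = llm(full_prompt)[0]["generated_text"]
    return output.strip()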
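The `[INST] ... [/INST]` wrapping can be checked without loading the model. The sketch below is a hypothetical helper (`build_mistral_prompt` and its `add_bos` flag are not part of the notebook) that builds only the prompt string. It assumes system instructions are folded into the user turn, and it leaves the choice of writing `<s>` explicitly to the caller, since whether the tokenizer already prepends it depends on the library version.

```python
from typing import Optional


def build_mistral_prompt(
    prompt: str,
    system_prompt: Optional[str] = None,
    add_bos: bool = True,
) -> str:
    """Wrap one user turn in Mistral's [INST] ... [/INST] markers."""
    user_turn = prompt.strip()
    if system_prompt:
        # Assumption: instructions are folded into the user turn rather than
        # sent as a separate system role.
        user_turn = f"{system_prompt.strip()}\n\n{user_turn}"
    # The notebook writes <s> explicitly; pass add_bos=False if your
    # tokenizer or pipeline already prepends the BOS token.
    bos = "<s>" if add_bos else ""
    return f"{bos}[INST] {user_turn} [/INST]"


print(build_mistral_prompt("What is the capital of France?"))
print(build_mistral_prompt("Name the capital.", system_prompt="Answer in one word."))
```

Printing the prompt before calling `llm` is a cheap way to confirm that the string ends with `[/INST]`, which is where generation should begin.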

In [10]:
chat_with_mistral("What is the capital of France?")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


'The capital city of France is Paris.'

In [11]:
def query_rewriter_agent(user_question):
    prompt = (
        "Rewrite the user's question to make it clearer for a document retrieval system. "
        "Focus on making it specific and search-friendly.\n\n"
        f"User question: {user_question}"
    )
    return chat_with_mistral(prompt)
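Instruction-tuned models often wrap a rewrite in labels, quotes, or trailing commentary, which adds noise to the embedding query. The sketch below is one possible post-processing step, not something the notebook does: `clean_rewritten_query` is a hypothetical helper, and the label list and length cap are illustrative assumptions.

```python
def clean_rewritten_query(text: str, max_chars: int = 300) -> str:
    """Reduce a chatty rewrite to one compact search string."""
    # Keep the first non-empty line; later lines are usually commentary.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    first = lines[0] if lines else ""
    # Drop a leading label if the model adds one (assumed label list).
    for label in ("rewritten question:", "rewritten query:", "query:"):
        if first.lower().startswith(label):
            first = first[len(label):].strip()
            break
    # Remove wrapping quotes and cap the length.
    return first.strip("\"'").strip()[:max_chars]


raw = 'Rewritten question: "Principia publication date"\n\nThis version is clearer.'
print(clean_rewritten_query(raw))
```

It could be applied to the return value of `query_rewriter_agent` before the text reaches the retriever.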

In [12]:
from sentence_transformers import SentenceTransformer
import faiss
import pickle

In [13]:
from datasets import Dataset
import json
import os

# Load raw JSON
with open("../data/raw/hotpot_train_v1.1.json", "r", encoding="utf-8") as f:
    raw_data = json.load(f)

# Flatten into a list of {title, text} records.
# Each `context` item is a [title, sentences] pair, so one record is one sentence.
flattened_docs = []
for entry in raw_data:
    for title, sentences in entry["context"]:
        for sentence in sentences:
            flattened_docs.append({"title": title, "text": sentence})

print(f"✅ Parsed {len(flattened_docs)} documents")

✅ Parsed 3703344 documents
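The flattening step is easier to reason about on a toy record. The sketch below uses a made-up entry shaped like a HotpotQA item (the titles and sentences are placeholders, not real data) and shows that every inner string becomes its own record, which is why the count above is in the millions. The row position of each record matters later, because retrieval looks documents up by position.

```python
# Hypothetical record shaped like one HotpotQA entry.
sample = [
    {
        "question": "hypothetical question",
        "context": [
            ["Title A", ["Sentence A1.", "Sentence A2."]],
            ["Title B", ["Sentence B1."]],
        ],
    }
]

# Same flattening logic as the notebook, written as a comprehension.
flattened = [
    {"title": title, "text": sentence}
    for entry in sample
    for title, sentences in entry["context"]
    for sentence in sentences
]

for row_id, doc in enumerate(flattened):
    print(row_id, doc["title"], "|", doc["text"])
```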


In [14]:
# Convert to HF Dataset
dataset = Dataset.from_list(flattened_docs)

# Create save folder
os.makedirs("../data/hotpot_docs", exist_ok=True)

# Save to disk
dataset.save_to_disk("../data/hotpot_docs")
print("✅ Saved to ../data/hotpot_docs")

Saving the dataset (2/2 shards): 100%|██████████| 3703344/3703344 [00:01<00:00, 3087128.24 examples/s]

✅ Saved to ../data/hotpot_docs





In [15]:
from sentence_transformers import SentenceTransformer
from datasets import load_from_disk
import faiss
import pickle

# Load the FAISS index
index = faiss.read_index("../data/vector_index/faiss_hotpot_ip.index")

# Load the document ID mapping (list of indices)
with open("../data/vector_index/doc_lookup.pkl", "rb") as f:
    id_store = pickle.load(f)

# Load the full document text dataset
documents = load_from_disk("../data/hotpot_docs")

# Load the embedding model (same one used in Step 2)
model_emb = SentenceTransformer("BAAI/bge-large-en")

# Define the retrieval agent
def retrieval_agent(query, k=5):
    # Convert the question into a normalized embedding vector
    q_vec = model_emb.encode(query, normalize_embeddings=True).astype("float32")

    # Search for the top-k most similar docs
    D, I = index.search(q_vec.reshape(1, -1), k)

    # FAISS pads missing results with -1; skip those.
    # This assumes FAISS row i corresponds to documents[i]; `id_store` is not consulted.
    return [documents[int(idx)]["text"] for idx in I[0] if idx >= 0]
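To make the search step concrete, here is a brute-force sketch of what a top-k inner-product lookup over normalized vectors computes. It is plain Python on toy 2-D vectors, not FAISS, and it assumes the index is an inner-product index (suggested by the `_ip` file name, but not confirmed here). `normalize` and `top_k_inner_product` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_inner_product(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 5
) -> List[Tuple[float, int]]:
    """Return (score, row_id) pairs for the k rows most similar to the query."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(v))), row_id)
        for row_id, v in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k_inner_product([2.0, 0.1], toy_vectors, k=2)
print([row_id for _, row_id in results])
```

On unit vectors the inner product equals cosine similarity, so the returned row ids are only meaningful if they index the same ordering of documents that was embedded when the index was built.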

In [16]:
def fact_checker_agent(original_question, retrieved_docs):
    joined = "\n\n".join(retrieved_docs)
    prompt = (
        f"Given the question:\n{original_question}\n\n"
        f"And the evidence:\n{joined}\n\n"
        "Decide whether the evidence supports the answer to the question. "
        "If yes, summarize the proof. If not, say it's insufficient."
    )
    return chat_with_mistral(prompt)
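The fact checker returns free text, so downstream code cannot branch on it directly. One rough option is a keyword heuristic that turns the verdict into a boolean. The sketch below is an assumption-laden illustration: `evidence_is_sufficient` and the marker list are hypothetical, and a keyword match like this is brittle compared with asking the model for a structured verdict.

```python
# Assumed phrases that signal a negative verdict.
INSUFFICIENT_MARKERS = (
    "insufficient",
    "does not support",
    "do not support",
    "not enough",
)


def evidence_is_sufficient(fact_check_text: str) -> bool:
    """Return False if the verdict contains any known 'insufficient' phrase."""
    lowered = fact_check_text.lower()
    return not any(marker in lowered for marker in INSUFFICIENT_MARKERS)


verdict = "The given evidence does not support the question. It is insufficient."
print(evidence_is_sufficient(verdict))
```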

In [17]:
def synthesis_agent(question, retrieved_docs):
    joined = "\n\n".join(retrieved_docs)
    prompt = (
        "Answer the following question using the information below.\n\n"
        f"Question: {question}\n\n"
        f"Evidence:\n{joined}\n\n"
        "Give a well-reasoned answer with traceable justification."
    )
    return chat_with_mistral(prompt)
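The synthesis prompt asks for traceable justification, which is easier if each passage carries a label the model can cite. The sketch below numbers the passages and stops adding them once a character budget is used up. `build_evidence_block` and the `max_chars` default are hypothetical; a character budget is only a crude stand-in for a token budget.

```python
from typing import List


def build_evidence_block(docs: List[str], max_chars: int = 4000) -> str:
    """Number passages and stop adding them once the character budget is used."""
    parts: List[str] = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        entry = f"[{i}] {doc.strip()}"
        # Always keep the first passage, even if it alone exceeds the budget.
        if parts and used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)


print(build_evidence_block(["First passage.", "Second passage."]))
```

The result could replace the plain `"\n\n".join(retrieved_docs)` string, with the prompt asking the model to cite passage numbers.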

In [18]:
user_question = "When did Isaac Newton publish Principia?"

# Step 1: Query Rewrite
print("🔁 Rewriting...")
rewritten = query_rewriter_agent(user_question)
print(rewritten)

# Step 2: Retrieve Docs
print("\n📚 Retrieving...")
docs = retrieval_agent(rewritten)
print(f"{len(docs)} documents retrieved")

# Step 3: Fact Check
print("\n✅ Fact Checking...")
fact_check = fact_checker_agent(user_question, docs)
print(fact_check)

# Step 4: Synthesis
print("\n🧠 Synthesizing Answer...")
final = synthesis_agent(user_question, docs)
print(final)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


🔁 Rewriting...
Could you please provide me with the precise date when Sir Isaac Newton published his work "Principia Mathematica"?

📚 Retrieving...


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


5 documents retrieved

✅ Fact Checking...


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


The given evidence does not support the question about when Isaac Newton published Principia as it only provides information about other individuals such as Hermann Minkowski and Edmund Halley. Therefore, this evidence is insufficient to provide an accurate answer to the question.

🧠 Synthesizing Answer...
The question asks when Isaac Newton published Principia. Unfortunately, there is no specific information provided in the evidence that can help us accurately determine the year of publication. The first edition of Principia Mathematica, which was edited by John Collins and published by George Bentley, was released on March 31, 1727. However, it is possible that this is not the edition or version that the original question refers to. Without more context or specific information about the edition or version in question, we cannot provide a definitive answer.


In [19]:
from langgraph.graph import StateGraph, END
from typing import TypedDict, List

class AgentState(TypedDict):
    user_question: str
    rewritten_question: str
    retrieved_docs: List[str]
    fact_check: str
    answer: str

In [20]:
def rewrite_node(state: AgentState) -> AgentState:
    rewritten = query_rewriter_agent(state["user_question"])
    return {**state, "rewritten_question": rewritten}

def retrieve_node(state: AgentState) -> AgentState:
    docs = retrieval_agent(state["rewritten_question"])
    return {**state, "retrieved_docs": docs}

def factcheck_node(state: AgentState) -> AgentState:
    check = fact_checker_agent(state["user_question"], state["retrieved_docs"])
    return {**state, "fact_check": check}

def synthesize_node(state: AgentState) -> AgentState:
    final = synthesis_agent(state["user_question"], state["retrieved_docs"])
    return {**state, "answer": final}

In [21]:
graph = StateGraph(AgentState)

graph.add_node("rewrite", rewrite_node)
graph.add_node("retrieve", retrieve_node)
graph.add_node("factcheck", factcheck_node)
graph.add_node("synthesize", synthesize_node)

graph.set_entry_point("rewrite")
graph.add_edge("rewrite", "retrieve")
graph.add_edge("retrieve", "factcheck")
graph.add_edge("factcheck", "synthesize")
graph.add_edge("synthesize", END)

rag_graph = graph.compile()
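Because the graph is a straight line, its behavior can be mimicked in a few lines of plain Python, which is handy for testing node logic without loading the model or LangGraph. The sketch below is an emulation, not LangGraph itself: `run_linear_chain` and the `*_stub` nodes are hypothetical stand-ins that follow the same "take state, return updated state" contract as the real nodes.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def run_linear_chain(state: State, nodes: List[Callable[[State], State]]) -> State:
    """Apply each node in order, feeding every node the previous node's state."""
    for node in nodes:
        state = node(state)
    return state


# Stub nodes that mirror the shape of the real ones without calling any model.
def rewrite_stub(state: State) -> State:
    return {**state, "rewritten_question": str(state["user_question"]).rstrip("?")}


def retrieve_stub(state: State) -> State:
    return {**state, "retrieved_docs": ["stub passage"]}


def factcheck_stub(state: State) -> State:
    return {**state, "fact_check": "insufficient"}


def synthesize_stub(state: State) -> State:
    return {**state, "answer": "stub answer"}


final = run_linear_chain(
    {"user_question": "When was X published?"},
    [rewrite_stub, retrieve_stub, factcheck_stub, synthesize_stub],
)
print(sorted(final))
```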

In [22]:
user_input = "When did Isaac Newton publish the Principia?"

initial_state = {
    "user_question": user_input
}

final_state = rag_graph.invoke(initial_state)

print("🔁 Rewritten Query:\n", final_state["rewritten_question"])
print("\n📚 Top Retrieved Doc:\n", final_state["retrieved_docs"][0][:500])
print("\n✅ Fact Check:\n", final_state["fact_check"])
print("\n🧠 Final Answer:\n", final_state["answer"])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


🔁 Rewritten Query:
 Could you please provide me with the specific date when Sir Isaac Newton published his magnum opus, "Mathematical Principles of Natural Philosophy," commonly known as the "Principia"?

📚 Top Retrieved Doc:
 Aristotle ( ; Greek: Ἀριστοτέλης , , "Aristotélēs"; 384–322 BC) was an ancient Greek philosopher and scientist born in the city of Stagira, Chalkidice, on the northern periphery of Classical Greece.

✅ Fact Check:
 The evidence provided does not support the answer to the question. It includes information about various notable scientists and the founding of a scientific journal, but none of it directly relates to when Isaac Newton published the Principia.

🧠 Final Answer:
 It is not clear from the provided evidence when Isaac Newton published the Principia.
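The top retrieved passage in this run is about Aristotle, so the failure appears to sit in retrieval rather than in the later agents. A cheap sanity check is to measure how many content words of the question appear in each retrieved passage. The sketch below is a rough lexical diagnostic, not a relevance model: `keyword_overlap`, `content_words`, and the stopword list are hypothetical.

```python
import re
from typing import Set

# Small, assumed stopword list; extend as needed.
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "to", "is", "was", "did", "do",
    "does", "when", "what", "who", "where", "which", "how",
}


def content_words(text: str) -> Set[str]:
    """Lowercase ASCII word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def keyword_overlap(question: str, doc: str) -> float:
    """Fraction of the question's content words that also occur in the doc."""
    q_words = content_words(question)
    if not q_words:
        return 0.0
    return len(q_words & content_words(doc)) / len(q_words)


question = "When did Isaac Newton publish the Principia?"
top_doc = "Aristotle was an ancient Greek philosopher and scientist born in the city of Stagira"
print(keyword_overlap(question, top_doc))
```

A score near zero across all retrieved passages suggests checking that the FAISS row ids line up with the rows of the saved dataset before tuning any prompts.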
