# 🤖 Step 3: Load Mistral 7B on Apple Silicon and Define Reasoning Agents

In this notebook we:
- Load Mistral 7B on a Mac (CPU or Apple GPU via MPS)
- Define four agents:
  - Query Rewriter
  - Document Retriever
  - Fact Checker
  - Synthesizer
- Run a full agent chain manually

In [26]:
!pip uninstall bitsandbytes -y
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
!pip install transformers accelerate sentence-transformers

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)





Looking in indexes: https://download.pytorch.org/whl/cpu
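
The `tokenizers` warning in the output above names its own fix: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it runs in a cell before any tokenizer is used and before shell commands such as `!pip` fork the process:

```python
import os

# Set this before the first tokenizer call. "false" turns off the
# tokenizers library's internal parallelism, so a later fork
# (for example from `!pip ...`) no longer triggers the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Setting it to `"true"` also silences the message, but keeps the deadlock risk the warning describes.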






In [27]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "mistralai/Mistral-7B-Instruct-v0.1"

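# Requires a Hugging Face login (e.g. `huggingface-cli login`) if the model repo is gated.
# `token=True` reads the stored token; it replaces the deprecated `use_auth_token`.
tokenizer = AutoTokenizer.from_pretrained(model_name, token=True)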
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"  # will use MPS or CPU on Mac
)

llm = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    return_full_text=False,
    temperature=0.7,  # ignored unless sampling is enabled (do_sample=True)
    repetition_penalty=1.1,
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:14<00:00,  7.33s/it]
Some parameters are on the meta device because they were offloaded to the disk.
Device set to use mps
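
The log above reports that the pipeline chose MPS. To confirm which backend is available before loading a 7B model, a small check can help. This is a sketch only: `pick_device` is a hypothetical helper, not part of the notebook, and it falls back to `"cpu"` when PyTorch is missing.

```python
def pick_device():
    """Return "mps", "cuda", or "cpu" depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed, nothing else to choose from

    # torch.backends.mps only exists in newer PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

Note that `device_map="auto"` may still offload some weights to disk when memory is tight, as the "meta device" message above indicates.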


In [28]:
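def chat_with_mistral(prompt, system_prompt=None):
    # Mistral-7B-Instruct-v0.1 has no separate system role, so a system prompt
    # is prepended to the user message inside the same [INST] block.
    user_text = prompt.strip()
    if system_prompt:
        user_text = f"{system_prompt.strip()}\n\n{user_text}"
    full_prompt = f"<s>[INST] {user_text} [/INST]"

    output = llm(full_prompt)[0]["generated_text"]
    return output.strip()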
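
The `[INST] ... [/INST]` wrapper used above extends to multi-turn conversations: each completed assistant reply is closed with `</s>`, and the final user turn is left open for the model to answer. A minimal sketch of that layout, where `build_mistral_prompt` is a hypothetical helper and folding the system prompt into the first user turn is an assumption rather than an official convention:

```python
def build_mistral_prompt(turns, system_prompt=None):
    """Build a Mistral-instruct style prompt string.

    turns: list of (user_message, assistant_reply_or_None) tuples.
    The last tuple normally has None as its reply, leaving the prompt open.
    """
    parts = ["<s>"]
    for i, (user_msg, assistant_msg) in enumerate(turns):
        user_msg = user_msg.strip()
        if i == 0 and system_prompt:
            # Assumption: no system role, so fold it into the first user turn.
            user_msg = f"{system_prompt.strip()}\n\n{user_msg}"
        parts.append(f"[INST] {user_msg} [/INST]")
        if assistant_msg is not None:
            parts.append(f" {assistant_msg.strip()}</s>")
    return "".join(parts)


print(build_mistral_prompt([("Hi", "Hello!"), ("What is 2 + 2?", None)]))
```

In practice, `tokenizer.apply_chat_template` from `transformers` is likely the safer route, since it uses the template shipped with the model.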

In [29]:
chat_with_mistral("What is the capital of France?")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


'The capital city of France is Paris.'

In [30]:
def query_rewriter_agent(user_question):
    prompt = (
        "Rewrite the user's question to make it clearer for a document retrieval system. "
        "Focus on making it specific and search-friendly.\n\n"
        f"User question: {user_question}"
    )
    return chat_with_mistral(prompt)
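
The rewriter returns free-form model text, which may include a label, quotes, or a trailing explanation that would add noise to the embedding query. One possible cleanup step is sketched below; `clean_rewrite` is a hypothetical helper, and the "Rewritten ...:" label it strips is an assumed output pattern, not something observed in this notebook.

```python
def clean_rewrite(raw_output, fallback):
    """Return the first usable line of the rewriter output, or `fallback`."""
    for line in raw_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading label such as "Rewritten question:" (assumed pattern).
        head, sep, tail = line.partition(":")
        if sep and head.lower().startswith("rewritten"):
            line = tail.strip()
        # Remove surrounding quotes the model may have added.
        line = line.strip("\"'")
        if line:
            return line
    return fallback


raw = 'Rewritten question: "Publication date of Newton\'s Principia"\n\nThis is clearer.'
print(clean_rewrite(raw, fallback="When did Isaac Newton publish Principia?"))
```

Passing the original question as `fallback` keeps retrieval working even when the model returns an empty string.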

In [31]:
from sentence_transformers import SentenceTransformer
import faiss, pickle

In [32]:
from datasets import Dataset
import json
import os

# Load raw JSON
with open("../data/raw/hotpot_train_v1.1.json", "r") as f:
    raw_data = json.load(f)

# Flatten into list of {title, text}
flattened_docs = []
for entry in raw_data:
    for title, paragraphs in entry["context"]:
        for para in paragraphs:
            flattened_docs.append({"title": title, "text": para})

print(f"✅ Parsed {len(flattened_docs)} documents")

✅ Parsed 3703344 documents
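
The loop above creates one document per inner list item. In HotpotQA each `context` entry is a `[title, sentences]` pair, so this yields sentence-level documents, which likely explains the count of about 3.7 million. The toy sample below (invented data, not from the dataset) contrasts that with a hypothetical alternative that joins the sentences into one passage per title:

```python
sample = [
    {
        "context": [
            ["Title A", ["Sentence one. ", "Sentence two."]],
            ["Title B", ["Only sentence."]],
        ]
    }
]


def flatten_by_sentence(data):
    """Same behaviour as the notebook cell: one document per sentence."""
    return [
        {"title": title, "text": sentence}
        for entry in data
        for title, sentences in entry["context"]
        for sentence in sentences
    ]


def flatten_by_title(data):
    """Alternative: one document per title, sentences joined into a passage."""
    return [
        {"title": title, "text": "".join(sentences).strip()}
        for entry in data
        for title, sentences in entry["context"]
    ]


print(len(flatten_by_sentence(sample)), len(flatten_by_title(sample)))
```

Whichever granularity is used here has to match the one used when the FAISS index was built, otherwise row numbers and documents will not line up.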


In [33]:
# Convert to HF Dataset
dataset = Dataset.from_list(flattened_docs)

# Create save folder
os.makedirs("../data/hotpot_docs", exist_ok=True)

# Save to disk
dataset.save_to_disk("../data/hotpot_docs")
print("✅ Saved to ../data/hotpot_docs")

Saving the dataset (2/2 shards): 100%|██████████| 3703344/3703344 [00:04<00:00, 742515.18 examples/s] 

✅ Saved to ../data/hotpot_docs





In [34]:
from sentence_transformers import SentenceTransformer
from datasets import load_from_disk
import faiss
import pickle

# Load the FAISS index
index = faiss.read_index("../data/vector_index/faiss_hotpot_ip.index")

# Load the document ID mapping (list of indices).
# Note: retrieval_agent below never reads id_store; it assumes FAISS row i is documents[i].
with open("../data/vector_index/doc_lookup.pkl", "rb") as f:
    id_store = pickle.load(f)

# Load the full document text dataset
documents = load_from_disk("../data/hotpot_docs")

# Load the embedding model (same one used in Step 2)
model_emb = SentenceTransformer("BAAI/bge-large-en")

# Define the retrieval agent
def retrieval_agent(query, k=5):
    # Convert question into embedding vector
    q_vec = model_emb.encode(query, normalize_embeddings=True).astype("float32")
    
    # Search the top-k most similar docs
    D, I = index.search(q_vec.reshape(1, -1), k)
    
    # Return the top-k documents (FAISS pads missing results with -1, so skip those)
    return [documents[int(idx)]["text"] for idx in I[0] if idx != -1]
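
The retrieval step normalizes the query embedding and searches an index whose filename suggests inner-product scoring. With unit-length vectors, the inner product equals cosine similarity. The dependency-free sketch below illustrates that ranking on toy 2-D vectors; it is an illustration of the idea, not FAISS's implementation, and `top_k_inner_product` is a hypothetical name.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k_inner_product(query, corpus, k):
    """Return the k best (score, index) pairs by inner product of unit vectors."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(doc))), i)
        for i, doc in enumerate(corpus)
    ]
    # Highest score first; ties broken by lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k_inner_product([1.0, 0.1], corpus, k=2)
print([index for _, index in results])
```

This only holds if the document vectors were also normalized when the index was built in Step 2.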

In [35]:
def fact_checker_agent(original_question, retrieved_docs):
    joined = "\n\n".join(retrieved_docs)
    prompt = (
        f"Given the question:\n{original_question}\n\n"
        f"And the evidence:\n{joined}\n\n"
        "Decide whether the evidence supports the answer to the question. "
        "If yes, summarize the proof. If not, say it's insufficient."
    )
    return chat_with_mistral(prompt)
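
The fact checker returns prose, which is hard to branch on in code. One option is to extend the prompt so the model starts its reply with a fixed label and then parse that label. The sketch below assumes the labels `SUPPORTED` and `INSUFFICIENT`, which are hypothetical and would need to be requested in the prompt; `parse_verdict` is not part of the notebook.

```python
def parse_verdict(response):
    """Map a fact-checker reply to True, False, or None (unparseable).

    Assumes the model was asked to begin with SUPPORTED or INSUFFICIENT.
    """
    text = response.strip()
    if not text:
        return None
    first_line = text.splitlines()[0].upper()
    if first_line.startswith("SUPPORTED"):
        return True
    if first_line.startswith("INSUFFICIENT"):
        return False
    return None


print(parse_verdict("INSUFFICIENT\nThe evidence lists later editions only."))
```

Returning `None` for anything unexpected lets the caller decide how to treat a reply that ignored the format.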

In [36]:
def synthesis_agent(question, retrieved_docs):
    joined = "\n\n".join(retrieved_docs)
    prompt = (
        "Answer the following question using the information below.\n\n"
        f"Question: {question}\n\n"
        f"Evidence:\n{joined}\n\n"
        "Give a well-reasoned answer with traceable justification."
    )
    return chat_with_mistral(prompt)
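
The synthesis prompt asks for "traceable justification", but the evidence is joined without identifiers, so the model has nothing to cite. A possible refinement is to number each passage and cap the total length. `format_evidence` and its `max_chars` budget are hypothetical, and the character cap is only a rough stand-in for a token limit.

```python
def format_evidence(docs, max_chars=2000):
    """Number each passage as [1], [2], ... and stop once the budget is spent."""
    entries = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        entry = f"[{i}] {doc.strip()}"
        if used + len(entry) > max_chars:
            break  # keep whole passages only, never cut one mid-text
        entries.append(entry)
        used += len(entry)
    return "\n\n".join(entries)


print(format_evidence(["First passage. ", "Second passage."]))
```

The prompt could then ask the model to cite passages by number, for example "support each claim with [n]".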

In [37]:
user_question = "When did Isaac Newton publish Principia?"

# Step 1: Query Rewrite
print("🔁 Rewriting...")
rewritten = query_rewriter_agent(user_question)
print(rewritten)

# Step 2: Retrieve Docs
print("\n📚 Retrieving...")
docs = retrieval_agent(rewritten)
print(f"{len(docs)} documents retrieved")

# Step 3: Fact Check
print("\n✅ Fact Checking...")
fact_check = fact_checker_agent(user_question, docs)
print(fact_check)

# Step 4: Synthesis
print("\n🧠 Synthesizing Answer...")
final = synthesis_agent(user_question, docs)
print(final)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


🔁 Rewriting...
Can you provide me with the exact date when Sir Isaac Newton published his work "Principia Mathematica"?

📚 Retrieving...




5 documents retrieved

✅ Fact Checking...




The evidence provided does not support the answer to the question. There are two ISBNs provided, which correspond to different editions of "Principia" with different publication dates. One edition was published in 1727, and the other in 1740. However, the question asks specifically about the publication date of the original edition, which is not mentioned anywhere in the evidence. Therefore, it is insufficient to determine the publication date of the original edition based on the information provided.

🧠 Synthesizing Answer...
Isaac Newton published his work "Principia Mathematica" in three volumes in 1687. The evidence provided includes the ISBN numbers for these publications: ISBN 9781437788118 (for the first volume), ISBN 9781437788126 (for the second volume), and ISBN 9781437788133 (for the third volume).
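
In the run above, the fact checker judged the evidence insufficient, yet the synthesizer still produced an answer and cited ISBNs as support. One way to avoid that is to gate synthesis on the fact-check result. The sketch below wires the four steps through injected callables so it can be tried with stubs; `run_chain` and `is_supported` are hypothetical names, not part of the notebook.

```python
def run_chain(question, rewrite, retrieve, fact_check, synthesize, is_supported):
    """Run rewrite -> retrieve -> fact check, and synthesize only if supported."""
    rewritten = rewrite(question)
    docs = retrieve(rewritten)
    verdict = fact_check(question, docs)
    result = {"rewritten": rewritten, "fact_check": verdict, "answer": None}
    if is_supported(verdict):
        result["answer"] = synthesize(question, docs)
    return result


# Stub agents standing in for the model-backed ones defined earlier.
demo = run_chain(
    "When did Isaac Newton publish Principia?",
    rewrite=lambda q: q,
    retrieve=lambda q: ["A passage about later editions."],
    fact_check=lambda q, docs: "INSUFFICIENT",
    synthesize=lambda q, docs: "should not be called",
    is_supported=lambda verdict: verdict.startswith("SUPPORTED"),
)
print(demo["answer"])
```

With the notebook's own functions, the call would pass `query_rewriter_agent`, `retrieval_agent`, `fact_checker_agent`, and `synthesis_agent`, plus an `is_supported` check suited to the fact checker's actual output.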
