<a href="https://colab.research.google.com/github/CrisMcode111/DI_Bootcamp/blob/main/w8_D2_ninjaXP_Prompt_Engineering_Ninja_Student.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercises XP Ninja: Intelligent Document Assistant

XP Ninja: Build an Intelligent Document Assistant

## What you'll learn
- Combine foundational and advanced prompt engineering techniques.
- Apply CoT, role prompting, and memory chaining in a real scenario.
- Design a dynamic, multi-step LLM workflow that adapts to user input.
- Mitigate limitations like hallucination and bias.

## What you'll build
- An LLM-powered Document Assistant that: summarizes the contract, answers follow-up questions, keeps context, and self-critiques.

### Document (input)
```
Service Agreement: Excerpt

This Service Agreement ("Agreement") is made effective as of March 1, 2025, by and between BrightLine Technologies Ltd. ("Provider") and NovaWare Systems Inc. ("Client").

Scope of Work: Provider shall deliver cloud infrastructure management services, including monitoring, incident response, and monthly reporting, as described in Exhibit A.

Payment Terms: Client agrees to pay a fixed monthly fee of $12,000, payable within 30 days of receipt of invoice. Late payments will incur a 2% penalty per month.

Term and Termination: This Agreement shall commence on March 1, 2025, and remain in effect for 12 months. Either party may terminate with 30 days' written notice.

Confidentiality: Both parties agree to protect the confidentiality of proprietary or sensitive information shared during the course of the engagement.

Limitation of Liability: Provider's total liability shall not exceed the fees paid by Client in the 3 months prior to a claim. Provider is not liable for indirect or consequential damages.

Governing Law: This Agreement shall be governed by the laws of the State of California.
```

## Helper: Run a prompt
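Loads a local Hugging Face model with `transformers` (default model: `Qwen/Qwen2.5-1.5B-Instruct`, override with the `LOCAL_MODEL_ID` environment variable) and uses the GPU when one is available. `run_prompt` generates a completion for a prompt, prints it, and returns the decoded text.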

In [None]:
# If running on Colab, uncomment the next line to install deps
# !pip -q install --upgrade transformers torch accelerate
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
DEFAULT_MODEL = os.environ.get("LOCAL_MODEL_ID", "Qwen/Qwen2.5-1.5B-Instruct")
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32
def load_model(model_id: str):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=DTYPE).to(DEVICE)
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token_id = tokenizer.eos_token_id
    model.eval()
    return tokenizer, model
tokenizer, model = load_model(DEFAULT_MODEL)
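def run_prompt(prompt: str, model_id: str | None = None, temperature: float = 0.7, max_new_tokens: int = 200):
    """Generate a completion for `prompt`, print it, and return it (the decoded text includes the prompt)."""
    global tokenizer, model, DEFAULT_MODEL
    # Reload only when a different model is requested
    if model_id and model_id != DEFAULT_MODEL:
        tokenizer, model = load_model(model_id)
        DEFAULT_MODEL = model_id
    inputs = tokenizer(prompt, return_tensors="pt").to(DEVICE)
    # Sampling needs a strictly positive temperature; fall back to greedy decoding at 0
    gen_kwargs = {"do_sample": True, "temperature": temperature} if temperature > 0 else {"do_sample": False}
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.pad_token_id,
            **gen_kwargs,
        )
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(text)
    return text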










## Step 1: Initial Summary Prompt
Write a few-shot or Chain-of-Thought prompt to summarize the document in plain English. Include: responsibilities, payment terms, termination, liability.

In [None]:

summary_prompt = '''You are an AI assistant specialized in summarizing legal and technical service agreements in clear, plain English.
Your task: read the document provided and produce a concise but complete summary.

Follow this reasoning approach (Chain-of-Thought, but keep the final answer concise):
1. Identify the involved parties.
2. Extract their main responsibilities.
3. Clearly state payment structure and financial obligations.
4. Explain the duration and termination conditions.
5. Summarize confidentiality duties.
6. Explain liability limitations.
7. State the governing law.

Keep the final delivered summary:
- 1–2 short paragraphs
- No jargon
- Only the information present in the document (no assumptions)

Example (few-shot):
Document:
"Provider will deliver weekly system monitoring. Client pays $5,000/month. Either party may terminate with 15 days notice."

Summary:
"The Provider must perform weekly system monitoring. The Client pays a fixed monthly fee of $5,000. Either side can end the agreement with 15 days’ notice."

Now summarize the following document:

<<<DOCUMENT START>>>
{{DOCUMENT}}
<<<DOCUMENT END>>>
'''
summary_prompt


In [None]:
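# Optional: test
# Note: replace the {{DOCUMENT}} placeholder with the contract text first;
# otherwise the model receives the literal placeholder.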
run_prompt(summary_prompt)
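
The `{{DOCUMENT}}` marker in `summary_prompt` is plain text, so nothing substitutes it automatically. Below is a minimal sketch of one way to fill it before calling the model. The helper `fill_template`, the shortened template, and the one-sentence document are illustrative assumptions, not part of the original notebook.

```python
import re

# Matches markers such as {{DOCUMENT}} or {{USER_QUESTION}}
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def fill_template(template: str, **values: str) -> str:
    """Hypothetical helper: replace every {{NAME}} marker with values[NAME].

    Raises KeyError if the template contains a marker with no matching value,
    so an unfilled placeholder never reaches the model silently.
    """
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"Unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


# Shortened stand-ins for the notebook's prompt and contract text
template = (
    "Now summarize the following document:\n\n"
    "<<<DOCUMENT START>>>\n{{DOCUMENT}}\n<<<DOCUMENT END>>>\n"
)
document = "Client agrees to pay a fixed monthly fee of $12,000."

prompt = fill_template(template, DOCUMENT=document)
print(prompt)
```

With the real notebook objects, the same call would take `summary_prompt` as the template and the full Service Agreement excerpt as `DOCUMENT`, and the result would be passed to `run_prompt`.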


## Step 2: Role-Based Follow-Up Q&A
Role: contract lawyer. Use Narrative-of-Thought or Instance-Adaptive CoT to answer user questions about the agreement.

In [None]:

# Sampling temperature is a decoding setting, not prompt text:
# pass a low `temperature` to run_prompt instead of writing it in the prompt.
qa_prompt = '''You are an experienced contract lawyer.
Your task is to answer follow-up questions about the Service Agreement provided earlier.

Use an internal reasoning process (Narrative-of-Thought or Instance-Adaptive Chain-of-Thought),
but DO NOT reveal that reasoning.

Guidelines:
- Cite only information that is explicitly in the document.
- When a user asks about obligations, rights, risks, or limits, explain them in plain English.
- If the user asks about consequences, compliance issues, or risk exposure, give a structured explanation.
- If the document does not contain the requested information, say so directly and avoid assumptions.
- Keep tone: confident, concise, legally accurate.

Example:
Question: “Can the provider raise the price during the contract?”
Answer: “The agreement specifies a fixed monthly fee of $12,000 and does not include any clause allowing price adjustments. Therefore, the provider cannot raise the price unless both parties agree to modify the contract.”

Now answer the user’s questions about the agreement.
'''
qa_prompt
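
The role prompt refers to the agreement "provided earlier", but `run_prompt` only sees the string it is given on each call. A minimal sketch of how the role instructions, the contract text, and a user question could be assembled into one prompt follows. The function name `build_qa_prompt` and the shortened inputs are assumptions made for illustration.

```python
def build_qa_prompt(role_instructions: str, document: str, question: str) -> str:
    """Hypothetical helper: join role instructions, contract text, and question.

    The document is wrapped in explicit delimiters so the model can tell
    contract text apart from instructions.
    """
    sections = [
        role_instructions.strip(),
        "<<<DOCUMENT START>>>\n" + document.strip() + "\n<<<DOCUMENT END>>>",
        "Question: " + question.strip(),
        "Answer:",
    ]
    return "\n\n".join(sections)


# Shortened stand-ins for qa_prompt and the contract excerpt
role = "You are an experienced contract lawyer. Cite only what the document states."
contract = "Either party may terminate with 30 days' written notice."
qa_input = build_qa_prompt(role, contract, "How much notice is needed to terminate?")
print(qa_input)
```

Ending the prompt with `Answer:` gives a base or lightly tuned model a clear position from which to continue.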


## Step 3: Memory Integration
Simulate context chaining so the assistant remembers the summary and answers accordingly. Choose a method: prior message passing, structured history, or describe vector store retrieval.

In [None]:

memory_plan = ''' We will simulate memory and context chaining using a hybrid approach combining:

1. Prior Message Passing
   - After Step 1 (summary) is generated, we store the summary text as a dedicated variable: SUMMARY_MEMORY.
   - All subsequent prompts (Q&A, analysis, redlines) prepend this stored summary before the user’s new question.
   - This simulates a long-running assistant that “remembers” what the document contained without reprocessing everything.

2. Structured History Buffer
   - We maintain a compact state object with three fields:
       • summary: the Step 1 distilled text
       • key_clauses: a dict with items {scope, payment, termination, confidentiality, liability, governing_law}
       • user_questions_log: last 3 questions + system answers
   - Before generating any new answer, the assistant retrieves this state and uses it for contextual grounding.
   - The history buffer is truncated to prevent drift and hallucination.

3. Vector Store Retrieval Simulation
   - The full document and extracted clause-level chunks are “stored” into a hypothetical embedding index (like FAISS or Chroma).
   - For each new user query, we simulate a retrieval step:
       • Convert the user que
'''
memory_plan
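
The plan above can be sketched with the standard library alone. The state object mirrors the three fields listed in the plan, and a `deque(maxlen=3)` keeps only the last three exchanges. Retrieval here is a bag-of-words cosine similarity, used as a simple stand-in for an embedding index such as FAISS or Chroma; it is not how those libraries work internally. The names `AssistantState`, `retrieve`, and `STOPWORDS` are hypothetical.

```python
import math
import re
from collections import Counter, deque
from dataclasses import dataclass, field

# Small illustrative stopword list (an assumption, not exhaustive)
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "or", "the", "to", "what", "with"}


@dataclass
class AssistantState:
    """Structured history buffer: summary, clause map, and last 3 Q&A pairs."""
    summary: str = ""
    key_clauses: dict[str, str] = field(default_factory=dict)
    user_questions_log: deque = field(default_factory=lambda: deque(maxlen=3))

    def remember(self, question: str, answer: str) -> None:
        # Oldest entries are dropped automatically once the deque is full
        self.user_questions_log.append((question, answer))


def _bag(text: str) -> Counter:
    """Lowercased word counts with stopwords removed."""
    return Counter(t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS)


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(state: AssistantState, query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k clauses most similar to the query (word-overlap stand-in)."""
    q = _bag(query)
    ranked = sorted(state.key_clauses.items(), key=lambda item: _cosine(q, _bag(item[1])), reverse=True)
    return ranked[:k]


state = AssistantState(
    summary="BrightLine provides cloud infrastructure management to NovaWare for $12,000 per month.",
    key_clauses={
        "termination": "This Agreement shall remain in effect for 12 months. Either party may terminate with 30 days' written notice.",
        "payment": "Client agrees to pay a fixed monthly fee of $12,000, payable within 30 days of receipt of invoice. Late payments will incur a 2% penalty per month.",
        "liability": "Provider's total liability shall not exceed the fees paid by Client in the 3 months prior to a claim.",
    },
)
for i in range(4):
    state.remember(f"question {i}", f"answer {i}")

top = retrieve(state, "What is the penalty for late payments?", k=1)
print(top[0][0])
print(len(state.user_questions_log))
```

A real embedding index would also match paraphrases that share no words with the clause, which word overlap cannot do.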


In [1]:

memory_prompt = '''You are a contract-law assistant with simulated memory.

Below is the contextual information that must guide your answer:

--- SUMMARY MEMORY ---
{{SUMMARY_MEMORY}}

--- STRUCTURED HISTORY (last interactions) ---
{{HISTORY_BUFFER}}

--- RETRIEVED CONTRACT CHUNKS (simulated vector search) ---
{{RETRIEVED_CHUNKS}}

Task:
Use the summary and retrieved clauses to answer the user’s new question accurately.
If the question concerns specific obligations, terms, limitations, or risks, reference the clauses from the retrieved chunks or summary.
If the document does not contain the requested information, say so explicitly and avoid assumptions.

Rules:
- Rely only on the information in the blocks above.
- Do not reveal internal reasoning or the memory mechanism.
- Keep answers concise, clear, and legally grounded.
- If the user’s question shows confusion, clarify using information from the summary.

Now answer the user’s question:
{{USER_QUESTION}}
'''
memory_prompt
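
The four `{{...}}` markers in `memory_prompt` need to be filled on every turn. The sketch below shows one possible rendering step, assuming the history is a list of (question, answer) pairs and the retrieved chunks are (clause name, text) pairs. The function `render_memory_prompt` and the shortened template are illustrative, not part of the notebook.

```python
# Shortened stand-in for memory_prompt, keeping the same four markers
MEMORY_TEMPLATE = """--- SUMMARY MEMORY ---
{{SUMMARY_MEMORY}}

--- STRUCTURED HISTORY (last interactions) ---
{{HISTORY_BUFFER}}

--- RETRIEVED CONTRACT CHUNKS (simulated vector search) ---
{{RETRIEVED_CHUNKS}}

Now answer the user's question:
{{USER_QUESTION}}
"""


def render_memory_prompt(template, summary, history, chunks, question):
    """Hypothetical helper: serialize the memory blocks and fill the template."""
    history_text = "\n".join(f"Q: {q}\nA: {a}" for q, a in history) or "(no previous questions)"
    chunks_text = "\n".join(f"[{name}] {text}" for name, text in chunks) or "(no clauses retrieved)"
    fields = {
        "SUMMARY_MEMORY": summary,
        "HISTORY_BUFFER": history_text,
        "RETRIEVED_CHUNKS": chunks_text,
        "USER_QUESTION": question,
    }
    prompt = template
    for name, value in fields.items():
        prompt = prompt.replace("{{" + name + "}}", value)
    return prompt


rendered = render_memory_prompt(
    MEMORY_TEMPLATE,
    summary="Provider manages cloud infrastructure for a fixed monthly fee of $12,000.",
    history=[("Who are the parties?", "BrightLine Technologies Ltd. and NovaWare Systems Inc.")],
    chunks=[("payment", "Late payments will incur a 2% penalty per month.")],
    question="What happens if the Client pays late?",
)
print(rendered)
```

Using explicit fallback text for an empty history or empty retrieval result keeps the section headers meaningful on the first turn.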


## Step 4: Mitigation & Refinement
Add a self-reflection or multi-agent critique step to review answers for accuracy and clarity.

In [2]:

critique_prompt = '''You are now operating as a Self-Critique Legal Reviewer.

Your task:
Evaluate the assistant’s previous answer for accuracy, clarity, and alignment with the contract summary and retrieved clauses.

Process:
1. Internally review the previous answer (do NOT reveal chain-of-thought).
2. Check for:
   - Faithfulness to the contract text
   - Legal accuracy
   - Clarity and readability
   - Missing obligations, exceptions, or limitations
   - Unnecessary speculation
   - Ambiguities or misleading phrasing
3. If the answer is correct and clear, output:
   “The answer is accurate and requires no changes.”
4. If refinement is needed:
   - Provide a corrected and improved version of the answer
   - Keep the tone objective and legally precise
   - Include only the final polished output, not the reasoning

Rules:
- Never invent terms not present in the summary or retrieved chunks.
- Never reveal your internal reasoning.
- Prioritize accuracy and plain-English clarity.

Provide your final refined answer or the confirmation message.
'''
critique_prompt
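
The critique prompt asks the model to evaluate "the previous answer", so that answer and its supporting context have to be included in the text sent to the reviewer. A minimal sketch of that assembly follows. The name `build_critique_prompt`, the section labels, and the sample strings are assumptions.

```python
def build_critique_prompt(critique_instructions: str, context: str, question: str, draft_answer: str) -> str:
    """Hypothetical helper: attach context, question, and draft to the reviewer instructions."""
    parts = [
        critique_instructions.strip(),
        "--- CONTEXT ---\n" + context.strip(),
        "--- USER QUESTION ---\n" + question.strip(),
        "--- PREVIOUS ANSWER ---\n" + draft_answer.strip(),
        "Review:",
    ]
    return "\n\n".join(parts)


# Shortened stand-ins for critique_prompt and one earlier exchange
instructions = "You are now operating as a Self-Critique Legal Reviewer."
context = "Late payments will incur a 2% penalty per month."
draft = "Late payments incur a 5% penalty."  # deliberately wrong draft for the reviewer to catch

critique_input = build_critique_prompt(
    instructions, context, "What is the late payment penalty?", draft
)
print(critique_input)
```

Placing the contract context ahead of the draft lets the reviewer check the answer against the source text rather than against its own recollection.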


In [3]:
# Optional: chain answer + critique
combined_prompt = '''You are a contract-law assistant with a two-step answer generation pipeline:
(1) produce an initial answer based on context,
(2) run an internal legal self-critique and refine the output.

Follow this flow:

---------------- STEP 1 — INITIAL ANSWER ----------------
Role: Contract Lawyer
- Read the provided context (summary, retrieved clauses, history, and user question).
- Generate a clear, legally accurate answer.
- Do NOT include chain-of-thought in the final output.

Store this internally as DRAFT_ANSWER.

---------------- STEP 2 — SELF-CRITIQUE & REFINEMENT ----------------
Role: Legal Reviewer
- Evaluate the DRAFT_ANSWER for:
    • accuracy relative to the contract terms
    • completeness
    • legal clarity
    • absence of speculation
    • plain-English readability
- Do not reveal reasoning.
- If the draft is accurate, simply output:
      “The answer is accurate and requires no changes.”
- Otherwise, output a refined, corrected version of the answer (REVISED_ANSWER).

---------------- FINAL OUTPUT ----------------
'''
combined_prompt
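
An alternative to a single combined prompt is to run the two roles as two separate model calls, which makes the draft and the review individually inspectable. The sketch below assumes a `generate` callable that takes a prompt and returns only the completion. The notebook's `run_prompt` returns the prompt followed by the completion, so it would need adapting before being used this way. The function `answer_with_critique` and the stub `fake_generate` are hypothetical.

```python
from typing import Callable

CONFIRMATION = "The answer is accurate and requires no changes."


def answer_with_critique(generate: Callable[[str], str], context: str, question: str) -> dict[str, str]:
    """Hypothetical two-call pipeline: draft an answer, then review it."""
    # Step 1: initial answer in the contract-lawyer role
    draft_prompt = f"Role: Contract Lawyer\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
    draft = generate(draft_prompt).strip()

    # Step 2: self-critique in the legal-reviewer role
    review_prompt = (
        "Role: Legal Reviewer\n"
        f"If the draft is accurate, reply exactly: {CONFIRMATION}\n"
        "Otherwise reply with a corrected answer only.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\n\nDRAFT_ANSWER:\n{draft}\n\nReview:"
    )
    review = generate(review_prompt).strip()

    # Keep the draft when the reviewer confirms it; otherwise use the revision
    final = draft if CONFIRMATION.lower() in review.lower() else review
    return {"draft": draft, "review": review, "final": final}


def fake_generate(prompt: str) -> str:
    """Stub standing in for a model call so the flow can be run offline."""
    if "DRAFT_ANSWER" in prompt:
        return "Either party may terminate with 30 days' written notice."
    return "The Client can terminate at any time."


result = answer_with_critique(
    fake_generate,
    context="Either party may terminate with 30 days' written notice.",
    question="Can the Client end the agreement early?",
)
print(result["final"])
```

Passing the model call in as a parameter keeps the control flow testable without loading a model.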
