# 🗂️ Multi-Modal Document Studio
**Week 3 Exercise: End-to-End Open-Source Document Analysis**

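Automates first-pass review of any document (contract, email, article, policy) using only open-source models run locally. No paid API keys are required; you only need a free HuggingFace access token to download the gated Llama model (Cell 2).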

**Produces:**
- Token structure & chat template preview
- Structured LLM brief auto-adapted to document type
- Named entity extraction (people, orgs, locations)
- Per-clause risk scoring
- Overall tone / sentiment analysis

---
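**Hardware:** Cell 3 detects the device automatically: `DEVICE = "mps"` on Apple Silicon, `"cuda"` on an NVIDIA GPU, otherwise `"cpu"`. Only the LLM (Cell 8) is moved to MPS; the pipelines in Cells 4–6 use the GPU only when `DEVICE == "cuda"` and run on the CPU otherwise.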

## Cell 1: Install Dependencies

In [1]:
# Run once, then restart the kernel
# On NVIDIA/CUDA machines, also install bitsandbytes (Cell 8 uses it for 4-bit loading)
!uv pip install -q transformers torch accelerate pdfplumber gradio sentencepiece optimum-quanto huggingface_hub

print("✅ All packages installed. Restart the kernel, then run from Cell 2 onwards.")

✅ All packages installed. Restart the kernel, then run from Cell 2 onwards.


## Cell 2: HuggingFace Authentication

Models are downloaded from the [HuggingFace Hub](https://huggingface.co). You need a free account and an access token.

1. Go to https://huggingface.co/settings/tokens
2. Create a token with **Read** permissions
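3. Set it as an environment variable before launching Jupyter (`export HF_TOKEN=hf_...`), or paste it directly into the cell below (avoid committing it to git).
4. Request access to `meta-llama/Llama-3.2-1B-Instruct` on its Hub page. The model is gated, so the download in Cell 8 fails until access is granted.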

In [2]:
import os
from huggingface_hub import login

# Option A: read from environment variable (recommended)
hf_token = os.environ.get("HF_TOKEN", "")

# Option B: paste token directly (remove before sharing this notebook)
# hf_token = "hf_YOUR_TOKEN_HERE"

if not hf_token:
    raise EnvironmentError(
        "HuggingFace token not found.\n"
        "Set it with: export HF_TOKEN=hf_... (in your terminal before launching Jupyter)\n"
        "Or paste it into the hf_token variable above."
    )

login(token=hf_token, add_to_git_credential=False)
print("✅ Logged in to HuggingFace Hub.")

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


✅ Logged in to HuggingFace Hub.


## Cell 3: Constants & Device Detection

In [3]:
import torch

# ── Device ──────────────────────────────────────────────────────────────
if torch.backends.mps.is_available():
    DEVICE = "mps"
    QUANT_BACKEND = "optimum-quanto"   # MPS path
elif torch.cuda.is_available():
    DEVICE = "cuda"
    QUANT_BACKEND = "bitsandbytes"     # CUDA path
else:
    DEVICE = "cpu"
    QUANT_BACKEND = "none"

print(f"Device      : {DEVICE}")
print(f"Quant backend: {QUANT_BACKEND}")

# ── Model IDs ───────────────────────────────────────────────────────────
NER_MODEL          = "dslim/bert-base-NER"           # Fast NER
ZERO_SHOT_MODEL    = "facebook/bart-large-mnli"      # Zero-shot classification
SENTIMENT_MODEL    = "distilbert-base-uncased-finetuned-sst-2-english"
LLM_MODEL          = "meta-llama/Llama-3.2-1B-Instruct"  # Lightweight LLM, runs on CPU/MPS

# ── Risk labels for zero-shot ───────────────────────────────────────────
RISK_LABELS = [
    "low risk", "medium risk", "high risk",
    "financial obligation", "liability", "termination clause",
    "data privacy", "intellectual property", "indemnification"
]

print("\n✅ Constants set.")

Device      : mps
Quant backend: optimum-quanto

✅ Constants set.
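The device string feeds two different code paths. A minimal sketch of the mapping that the pipelines in Cells 4–6 use (`pipeline_device_arg` is a hypothetical name; the notebook inlines the expression `0 if DEVICE == "cuda" else -1`):

```python
def pipeline_device_arg(device: str) -> int:
    """Hypothetical helper: map the notebook's DEVICE string to the integer
    `device` argument passed to transformers.pipeline() in Cells 4-6.

    0  -> first CUDA GPU
    -1 -> CPU (used for both "mps" and "cpu" in this notebook)
    """
    return 0 if device == "cuda" else -1


for dev in ("cuda", "mps", "cpu"):
    print(f"{dev:>4} -> pipeline device={pipeline_device_arg(dev)}")
```

Because only `"cuda"` maps to a GPU index, the NER, sentiment and zero-shot models run on the CPU on Apple Silicon, which is why their load logs print `Device set to use cpu` even though `DEVICE` is `mps`. Recent transformers releases also accept a device string such as `"mps"` for `pipeline(device=...)`; whether that helps for these small models is worth benchmarking on your setup rather than assuming.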


## Cell 3b: Text Extraction Utility

In [4]:
import pdfplumber, pathlib

def extract_text(source) -> str:
    """Extract text from a PDF path, TXT path, or raw string."""
    if source is None:
        return ""
    if hasattr(source, "name"):          # Gradio UploadedFile
        source = source.name
    # Raw text strings (multi-line or long) are not valid file paths
    if isinstance(source, str) and ('\n' in source or len(source) > 260):
        return source
    p = pathlib.Path(str(source))
    try:
        if p.exists():
            if p.suffix.lower() == ".pdf":
                with pdfplumber.open(p) as pdf:
                    return "\n".join(page.extract_text() or "" for page in pdf.pages)
            else:
                return p.read_text(errors="ignore")
    except OSError:
        pass
    return str(source)

# Quick smoke-test
sample = "This Agreement is entered into between Acme Corp and John Smith on 1 Jan 2025."
# print(extract_text(sample)[:200])
print("✅ Text extractor ready.")

✅ Text extractor ready.
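`extract_text` decides between "this is a path" and "this is the document itself" with a cheap heuristic before touching the filesystem. A small sketch that mirrors its branch order and reports which branch a string would take (`classify_source` is hypothetical, and reading the 260-character cutoff as the classic Windows `MAX_PATH` limit is an assumption):

```python
import pathlib
import tempfile

# Threshold copied from extract_text; that it was chosen to match the classic
# Windows MAX_PATH limit (260 characters) is an assumption.
MAX_PATH_LEN = 260


def classify_source(source: str) -> str:
    """Hypothetical helper mirroring the branch order of extract_text.

    "raw"      -> multi-line or very long string, returned as text immediately
    "file"     -> an existing path, read as PDF or plain text
    "fallback" -> short single-line string that is not a file; extract_text
                  returns it unchanged, so it is also analysed as text
    """
    if "\n" in source or len(source) > MAX_PATH_LEN:
        return "raw"
    if pathlib.Path(source).exists():
        return "file"
    return "fallback"


with tempfile.TemporaryDirectory() as tmp:
    doc = pathlib.Path(tmp) / "contract.txt"
    doc.write_text("Sample contract text")
    print(classify_source(str(doc)))                      # file
print(classify_source("Acme Corp and John Smith, 2025."))  # fallback
print(classify_source("Line one\nLine two"))              # raw
```

The "fallback" branch is why the single-line `sample` string still works as input: it is not a file, so it is passed through as text.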


## Cell 4 (Week 3 / Day 2): NER Pipeline

In [5]:
from transformers import pipeline, AutoTokenizer

print(f"Loading NER model onto {DEVICE} …")
ner_pipeline = pipeline(
    "ner",
    model=NER_MODEL,
    aggregation_strategy="simple",
    device=0 if DEVICE == "cuda" else -1  # 0 = first CUDA GPU, -1 = CPU (MPS falls back to CPU here)
)
ner_tokenizer = AutoTokenizer.from_pretrained(NER_MODEL)

def run_ner(text: str) -> str:
    """Return a formatted string of named entities."""
    if not text.strip():
        return "No text provided."
    chunk_size = 400
    words = text.split()
    chunks = [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]
    all_entities = []
    for chunk in chunks:
        # Leave room for the [CLS] and [SEP] tokens the pipeline adds (510 + 2 = 512)
        ids = ner_tokenizer.encode(chunk, truncation=True, max_length=510, add_special_tokens=False)
        safe_chunk = ner_tokenizer.decode(ids)
        all_entities.extend(ner_pipeline(safe_chunk))
    
    if not all_entities:
        return "No named entities found."
    
    lines = [f"  [{e['entity_group']}] {e['word']}  (score: {e['score']:.2f})" for e in all_entities]
    return "\n".join(lines)

# Smoke-test
# print(run_ner(sample))
print("\n✅ NER pipeline ready.")

Loading NER model onto mps …


Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu



✅ NER pipeline ready.
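`run_ner` concatenates the entities from every 400-word chunk, so a party that is named throughout a contract is listed once per mention. A minimal post-processing sketch that could be applied to `all_entities` before formatting (`dedupe_entities` is a hypothetical helper; it assumes the dict shape produced by `aggregation_strategy="simple"`):

```python
from typing import Iterable


def dedupe_entities(entities: Iterable[dict]) -> list[dict]:
    """Collapse repeated (entity_group, word) pairs, keeping the highest score.

    Expects dicts shaped like the aggregated NER pipeline output, e.g.
    {"entity_group": "ORG", "word": "Acme Corp", "score": 0.99, ...}.
    """
    best: dict[tuple[str, str], dict] = {}
    for ent in entities:
        key = (ent["entity_group"], ent["word"].strip())
        if key not in best or ent["score"] > best[key]["score"]:
            # Reassigning an existing key keeps its original (first-seen) position
            best[key] = ent
    return list(best.values())


raw = [
    {"entity_group": "ORG", "word": "Acme Corp", "score": 0.97},
    {"entity_group": "PER", "word": "John Smith", "score": 0.99},
    {"entity_group": "ORG", "word": "Acme Corp", "score": 0.99},
]
for e in dedupe_entities(raw):
    print(e["entity_group"], e["word"], f"{e['score']:.2f}")
```

Keying on the entity group as well as the text keeps, say, "Jordan" as a person and "Jordan" as a location separate.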


## Cell 5 (Week 3 / Day 2): Sentiment Pipeline

In [6]:
print("Loading sentiment model …")
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model=SENTIMENT_MODEL,
    device=0 if DEVICE == "cuda" else -1
)

def run_sentiment(text: str) -> str:
    """Return overall document sentiment."""
    if not text.strip():
        return "No text provided."
    # Sentiment models also cap at 512 tokens; use the first 400 words as a proxy
    snippet = " ".join(text.split()[:400])
    result = sentiment_pipeline(snippet, truncation=True, max_length=512)[0]
    emoji = "🟢" if result["label"] == "POSITIVE" else "🔴"
    return f"{emoji} {result['label']}  (confidence: {result['score']:.2f})"

# print(run_sentiment(sample))
print("✅ Sentiment pipeline ready.")

Loading sentiment model …


Device set to use cpu


✅ Sentiment pipeline ready.
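`run_sentiment` only scores the first 400 words, so the tone of anything later in a long document is ignored. One way to cover the whole text is to score every chunk (the pipeline accepts a list of strings and returns one `{"label", "score"}` dict per input) and combine the results. A sketch of the combining step only, using a signed mean; `aggregate_sentiment` is hypothetical and the signed mean is one simple choice, not what the notebook does:

```python
def aggregate_sentiment(results: list[dict]) -> tuple[str, float]:
    """Combine per-chunk SST-2 style results into one document label.

    Each result looks like {"label": "POSITIVE" | "NEGATIVE", "score": float}.
    POSITIVE maps to +score and NEGATIVE to -score; the sign of the mean
    decides the label and its absolute value is a rough strength in [0, 1].
    """
    if not results:
        raise ValueError("need at least one chunk result")
    signed = [r["score"] if r["label"] == "POSITIVE" else -r["score"] for r in results]
    mean = sum(signed) / len(signed)
    return ("POSITIVE" if mean >= 0 else "NEGATIVE"), abs(mean)


# Illustrative per-chunk results in the shape the sentiment pipeline returns
chunk_results = [
    {"label": "POSITIVE", "score": 0.91},
    {"label": "NEGATIVE", "score": 0.97},
    {"label": "NEGATIVE", "score": 0.62},
]
label, strength = aggregate_sentiment(chunk_results)
print(label, round(strength, 2))
```

The SST-2 model was fine-tuned on movie-review sentences, so its labels on formal legal text are best read as a rough tone signal.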


## Cell 6 (Week 3 / Day 2): Zero-Shot Risk Scorer

In [7]:
import textwrap

print("Loading zero-shot classification model …")
zsc_pipeline = pipeline(
    "zero-shot-classification",
    model=ZERO_SHOT_MODEL,
    device=0 if DEVICE == "cuda" else -1
)

def split_into_clauses(text: str, max_words: int = 80) -> list[str]:
    """Split text into sentence-level chunks suitable for per-clause scoring."""
    import re
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    clauses, current = [], []
    for sent in sentences:
        current.append(sent)
        if len(" ".join(current).split()) >= max_words:
            clauses.append(" ".join(current))
            current = []
    if current:
        clauses.append(" ".join(current))
    return clauses[:10]  # Cap at 10 clauses for speed

def run_risk_scoring(text: str) -> str:
    """Score each clause and return a formatted risk report."""
    if not text.strip():
        return "No text provided."
    clauses = split_into_clauses(text)
    lines = []
    for i, clause in enumerate(clauses, 1):
        result = zsc_pipeline(clause, candidate_labels=RISK_LABELS, truncation=True, max_length=512)
        top_label = result["labels"][0]
        top_score = result["scores"][0]
        risk_icon = "🔴" if "high" in top_label or top_label in ("liability","indemnification") else \
                    "🟡" if "medium" in top_label or top_label in ("termination clause","financial obligation") else "🟢"
        snippet = textwrap.shorten(clause, width=90, placeholder="…")
        lines.append(f"Clause {i:02d}: {risk_icon} {top_label} ({top_score:.2f})\n          \"{snippet}\"")
    return "\n\n".join(lines)

# Smoke-test on a two-sentence doc
test_doc = (
    "The Licensor shall not be liable for any indirect damages arising from the use of this software. "
    "Either party may terminate this agreement with 30 days written notice."
)
# print(run_risk_scoring(test_doc))
print("\n✅ Risk scorer ready.")

Loading zero-shot classification model …


Device set to use cpu



✅ Risk scorer ready.
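Under the hood, the `zero-shot-classification` pipeline with `bart-large-mnli` turns each candidate label into an NLI hypothesis (the pipeline's default template is `"This example is {}."`) and, in its default single-label mode, softmaxes the entailment logits across all labels. The scores are therefore relative to the label set: severity labels such as "high risk" and topic labels such as "liability" in `RISK_LABELS` compete for the same probability mass. A sketch of that final scoring step (the logits are made up for illustration, not model output; `zero_shot_scores` is hypothetical):

```python
import math

# Default hypothesis template of the transformers zero-shot pipeline
HYPOTHESIS_TEMPLATE = "This example is {}."


def zero_shot_scores(entailment_logits: dict[str, float]) -> list[tuple[str, float]]:
    """Single-label zero-shot scoring from per-label NLI entailment logits.

    Softmax over the labels' entailment logits, so the scores always sum to 1
    and depend on which other labels are in the candidate set.
    """
    top = max(entailment_logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(logit - top) for label, logit in entailment_logits.items()}
    total = sum(exps.values())
    ranked = [(label, e / total) for label, e in exps.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Illustrative logits only (not produced by bart-large-mnli)
logits = {"liability": 2.0, "termination clause": 0.5, "low risk": -1.0}
for label in logits:
    print(HYPOTHESIS_TEMPLATE.format(label))
for label, score in zero_shot_scores(logits):
    print(f"{label:<20} {score:.2f}")
```

Passing `multi_label=True` to the pipeline instead scores each label independently, which may suit overlapping categories better.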


## Cell 7 (Week 3 / Day 3): Tokenizer + Chat Template Preview

In [8]:
from transformers import AutoTokenizer

print("Loading tokenizer …")
tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL)

def run_token_preview(text: str) -> str:
    """Show raw and chat-template token counts (the formatted-prompt preview is commented out below)."""
    if not text.strip():
        return "No text provided."
    
    # Raw tokenization
    tokens = tokenizer.encode(text)
    token_count = len(tokens)
    first_10 = tokens[:10]
    
    # Chat template: the Day 3 "aha moment"
    messages = [
        {"role": "system", "content": "You are a professional document analyst."},
        {"role": "user",   "content": f"Briefly summarise this document:\n\n{' '.join(text.split()[:300])}"}
    ]
    # chat_prompt = tokenizer.apply_chat_template(
    #     messages, tokenize=False, add_generation_prompt=True
    # )
    chat_tokens = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True
    )
    
    return (
        f"📊 Token Stats\n"
        f"  Raw token count    : {token_count}\n"
        f"  First 10 token IDs : {first_10}\n"
        f"  Chat prompt tokens : {len(chat_tokens)}\n\n"
        # f"üìù Chat-Template Prompt (first 600 chars):\n"
        # f"{'-'*60}\n"
        # f"{chat_prompt[:600]}"
    )

print(run_token_preview(sample))
print("\n✅ Tokenizer ready.")

Loading tokenizer …
📊 Token Stats
  Raw token count    : 21
  First 10 token IDs : [128000, 2028, 23314, 374, 10862, 1139, 1990, 6515, 2727, 22621]
  Chat prompt tokens : 69



✅ Tokenizer ready.
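The chat prompt (69 tokens) is much longer than the raw text (21 tokens) because the template wraps every message in header and end-of-turn special tokens around the system prompt and the instruction. A simplified, string-only sketch of the Llama 3 layout (`llama3_style_prompt` is hypothetical; the official Llama 3.2 template also injects a date preamble into the system message, which this omits). Use `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` to see the exact string:

```python
def llama3_style_prompt(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Approximate the Llama 3 chat layout as a plain string.

    Simplified sketch: the official Llama 3.2 template adds extra system text
    (knowledge-cutoff and date lines) that is not reproduced here.
    """
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n{msg['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


print(llama3_style_prompt([
    {"role": "system", "content": "You are a professional document analyst."},
    {"role": "user", "content": "Briefly summarise this document: ..."},
]))
```

`add_generation_prompt=True` is what makes the model answer as the assistant instead of continuing the user's message.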


## Cell 8 (Week 3 / Days 4–5): LLM Brief Generator (Streaming)

> **Note:** The first run downloads the `meta-llama/Llama-3.2-1B-Instruct` weights (roughly 2.5 GB). Subsequent runs use the local cache.
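Download size and memory footprint both scale as parameter count × bytes per parameter. A back-of-the-envelope sketch (`weight_memory_gb` is a hypothetical helper, and the ~1.24 B parameter count is an approximation to check against the model card):

```python
# Bytes needed to store one weight in each format
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Rough size of the model weights alone, in decimal gigabytes (1e9 bytes).

    Ignores activations, the KV cache, and quantization overhead such as
    per-channel scales, so real memory use is higher.
    """
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


n = 1.24e9  # approximate parameter count of Llama-3.2-1B (assumption; check the model card)
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>5}: ~{weight_memory_gb(n, dtype):.1f} GB")
```

The cell below calls `from_pretrained` without a `torch_dtype`; in many transformers versions that loads the weights as float32, so the model can briefly occupy about twice its bf16 download size in RAM, while int8 weights need roughly one byte per parameter once quantized.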

In [9]:
import gc, threading
from transformers import AutoModelForCausalLM, TextIteratorStreamer

print(f"Loading LLM on {DEVICE} …")

# Quantization ‚Äî MPS path uses optimum-quanto; CUDA path uses bitsandbytes
load_kwargs = dict(device_map="auto" if DEVICE == "cuda" else None)

if QUANT_BACKEND == "bitsandbytes":
    from transformers import BitsAndBytesConfig
    load_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_4bit=True)
elif QUANT_BACKEND == "optimum-quanto":
    from optimum.quanto import quantize, freeze, qint8
    # Quantization is applied after loading (see below)

llm_model = AutoModelForCausalLM.from_pretrained(LLM_MODEL, **load_kwargs)

if QUANT_BACKEND == "optimum-quanto":
    quantize(llm_model, weights=qint8)
    freeze(llm_model)  # convert float weights to int8; without this they stay in float

if DEVICE == "mps":
    llm_model = llm_model.to("mps")

llm_model.eval()
print("✅ LLM loaded.")

# ── Document-type detection ─────────────────────────────────────────────
def detect_doc_type(text: str) -> str:
    text_lower = text.lower()
    if any(w in text_lower for w in ["whereas", "licensor", "indemnif", "herein", "party"]):
        return "legal contract"
    elif any(w in text_lower for w in ["dear", "regards", "sincerely", "subject:"]):
        return "email"
    elif any(w in text_lower for w in ["privacy", "data controller", "gdpr", "personal data"]):
        return "privacy policy"
    elif any(w in text_lower for w in ["abstract", "methodology", "conclusion", "references"]):
        return "research article"
    else:
        return "general document"

def build_brief_prompt(text: str) -> list[dict]:
    doc_type = detect_doc_type(text)
    instruction = (
        f"You are a professional document analyst. The following is a {doc_type}.\n"
        "Provide a structured brief with:\n"
        "1. One-sentence summary\n"
        "2. Key parties or stakeholders\n"
        "3. Main obligations or key points (up to 5 bullet points)\n"
        "4. Notable risks or red flags\n"
        "5. Recommended next action\n"
        "Be concise. Use bullet points."
    )
    return [
        {"role": "system", "content": instruction},
        {"role": "user",   "content": text[:1500]}  # first 1,500 characters only, to keep generation fast
    ]

def run_llm_brief(text: str, max_new_tokens: int = 400):
    """Generate a structured LLM brief. Yields accumulated text as each token arrives."""
    if not text.strip():
        yield "No text provided."
        return

    messages = build_brief_prompt(text)
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(DEVICE)
    attention_mask = torch.ones_like(input_ids).to(DEVICE)

    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )

    gen_kwargs = dict(
        input_ids=input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        streamer=streamer,
        attention_mask=attention_mask,
        pad_token_id=tokenizer.eos_token_id,
    )

    thread = threading.Thread(target=llm_model.generate, kwargs=gen_kwargs)
    thread.start()

    accumulated = ""
    for token_text in streamer:
        accumulated += token_text
        yield accumulated

    thread.join()

    del input_ids
    gc.collect()
    if DEVICE == "cuda":
        torch.cuda.empty_cache()
    elif DEVICE == "mps":
        torch.mps.empty_cache()

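# Smoke-test (uncomment to run; run_llm_brief is a generator, so it must be iterated)
print("=== LLM Brief (streaming) ===")
# for partial in run_llm_brief(test_doc):
#     pass
# print(partial)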

Loading LLM on mps …




✅ LLM loaded.
=== LLM Brief (streaming) ===
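`TextIteratorStreamer` behaves like a queue: `generate()` runs in a background thread and pushes decoded text, while the `for` loop in `run_llm_brief` blocks on that queue and yields the growing string. A dependency-free sketch of the same producer/consumer pattern, where `fake_generate` is a hypothetical stand-in for `llm_model.generate`:

```python
import queue
import threading
from typing import Iterator

_END = object()  # sentinel marking the end of generation


def fake_generate(tokens: list[str], out: "queue.Queue[object]") -> None:
    """Stand-in for model.generate(): push text pieces, then the sentinel."""
    for tok in tokens:
        out.put(tok)
    out.put(_END)


def stream_accumulated(tokens: list[str]) -> Iterator[str]:
    """Run the producer in a background thread and yield the growing text,
    the way run_llm_brief consumes its streamer."""
    q: "queue.Queue[object]" = queue.Queue()
    worker = threading.Thread(target=fake_generate, args=(tokens, q))
    worker.start()
    accumulated = ""
    while True:
        piece = q.get()  # blocks until the producer sends something
        if piece is _END:
            break
        accumulated += piece
        yield accumulated
    worker.join()


for partial in stream_accumulated(["The ", "Client ", "pays."]):
    print(partial)
```

Running generation on a worker thread is what lets the consumer (a notebook loop or a Gradio handler) update the display while tokens are still being produced.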


## Cell 9: Full Analysis Pipeline (no UI)

In [10]:
def analyse_document(source, verbose: bool = True) -> dict:
    """
    Run the full 5-stage analysis on any document source.
    source: file path (PDF/TXT) or raw text string.
    Returns a dict with keys: text, token_preview, entities, risk_scores, sentiment, llm_brief
    """
    sep = "=" * 70
    
    text = extract_text(source)
    if not text.strip():
        print("⚠️  No text found in document.")
        return {}
    
    results = {"text": text}
    
    # Stage 1 ‚Äî Token preview (Day 3)
    print(f"{sep}\n[1/5] TOKENIZER PREVIEW\n{sep}")
    results["token_preview"] = run_token_preview(text)
    if verbose: print(results["token_preview"])
    
    # Stage 2 ‚Äî Named entities (Day 2)
    print(f"\n{sep}\n[2/5] NAMED ENTITY EXTRACTION\n{sep}")
    results["entities"] = run_ner(text)
    if verbose: print(results["entities"])
    
    # Stage 3 ‚Äî Risk scoring (Day 2)
    print(f"\n{sep}\n[3/5] PER-CLAUSE RISK SCORING\n{sep}")
    results["risk_scores"] = run_risk_scoring(text)
    if verbose: print(results["risk_scores"])
    
    # Stage 4 ‚Äî Sentiment (Day 2)
    print(f"\n{sep}\n[4/5] OVERALL SENTIMENT\n{sep}")
    results["sentiment"] = run_sentiment(text)
    if verbose: print(results["sentiment"])
    
    # Stage 5 ‚Äî LLM brief (Days 4-5)
    print(f"\n{sep}\n[5/5] LLM DOCUMENT BRIEF (streaming)\n{sep}")
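    # run_llm_brief yields the accumulated text so far, so keep only the last value
    # (joining every yield would repeat each prefix) and print just the new part
    brief = ""
    for partial in run_llm_brief(text):
        if verbose:
            print(partial[len(brief):], end="", flush=True)
        brief = partial
    results["llm_brief"] = brief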
    
    print(f"\n{sep}\n✅ Analysis complete.\n{sep}")
    return results


# ── Run on a sample contract excerpt ────────────────────────────────────
CONTRACT_SAMPLE = """
SERVICE AGREEMENT

This Agreement is entered into as of January 1, 2025 between Acme Corporation
("Client") and TechSolutions Ltd ("Service Provider").

1. SERVICES. Service Provider agrees to develop and deliver a custom analytics
dashboard by March 31, 2025. Deliverables are specified in Exhibit A.

2. PAYMENT. Client shall pay $50,000 USD within 30 days of invoice. Late payments
shall incur a penalty of 1.5% per month.

3. TERMINATION. Either party may terminate this Agreement with 30 days written
notice. Client may terminate immediately for material breach.

4. LIMITATION OF LIABILITY. In no event shall either party be liable for indirect,
incidental, or consequential damages.

5. GOVERNING LAW. This Agreement shall be governed by the laws of the State of
California. Any disputes shall be resolved by arbitration in San Francisco.
"""

print("Document analysis ready")

# results = analyse_document(CONTRACT_SAMPLE)

Document analysis ready
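`run_llm_brief` yields the accumulated text so far, so each value already contains every earlier one. Any consumer should keep only the last value; `"".join(...)` over such a generator concatenates every prefix. A toy illustration with hypothetical helpers that follow the same contract:

```python
from typing import Iterator


def accumulated_chunks(pieces: list[str]) -> Iterator[str]:
    """Toy generator with run_llm_brief's contract: yields the full text so far."""
    text = ""
    for piece in pieces:
        text += piece
        yield text


def final_text(gen: Iterator[str]) -> str:
    """Drain a generator of accumulated text and keep only the last value."""
    last = ""
    for last in gen:
        pass
    return last


pieces = ["Payment ", "is due ", "in 30 days."]
print(repr("".join(accumulated_chunks(pieces))))    # 'Payment Payment is due Payment is due in 30 days.'
print(repr(final_text(accumulated_chunks(pieces))))  # 'Payment is due in 30 days.'
```

Yielding the accumulated text suits Gradio, which replaces a component's value on each yield; a console consumer that wants incremental output has to print only the newly added suffix.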


## Cell 10: Gradio UI (Optional)

Launches a single-page web UI with file upload and streaming LLM output.

> Run this cell to start the app. A local URL (e.g. `http://127.0.0.1:7860`) will appear below.

In [None]:
import gradio as gr

def gradio_analyse(file_obj, raw_text: str):
    """Gradio handler ‚Äî streams results stage-by-stage as each completes."""
    source = file_obj if file_obj is not None else raw_text
    if not source:
        yield "", "Please upload a file or paste text.", "", "", "", ""
        return

    text = extract_text(source)
    yield text, "[1/5] Running token preview…", "", "", "", ""

    tok = run_token_preview(text)
    yield text, tok, "[2/5] Running NER…", "", "", ""

    ents = run_ner(text)
    yield text, tok, ents, "[3/5] Running risk scoring…", "", ""

    risk = run_risk_scoring(text)
    yield text, tok, ents, risk, "[4/5] Running sentiment…", ""

    sent = run_sentiment(text)
    yield text, tok, ents, risk, sent, "[5/5] Generating LLM brief…"

    for partial_brief in run_llm_brief(text):
        yield text, tok, ents, risk, sent, partial_brief

with gr.Blocks(title="Multi-Modal Document Studio") as demo:
    gr.Markdown("# 🗂️ Multi-Modal Document Studio\nUpload a PDF/TXT or paste text below.")
    
    with gr.Row():
        file_input = gr.File(label="Upload PDF or TXT", file_types=[".pdf", ".txt"])
        text_input = gr.Textbox(label="Or paste text here", lines=8, placeholder="Paste document text…")
    
    run_btn = gr.Button("🔍 Analyse Document", variant="primary")
    
    with gr.Tabs():
        # Line-oriented plain-text outputs use Textbox so single newlines are kept
        # (gr.Markdown merges them into one paragraph by default)
        with gr.Tab("📊 Token Preview"):  tok_out  = gr.Textbox(lines=15, show_label=False)
        with gr.Tab("🏷️ Named Entities"): ent_out  = gr.Textbox(lines=15, show_label=False)
        with gr.Tab("⚠️ Risk Scores"):    risk_out = gr.Textbox(lines=15, show_label=False)
        with gr.Tab("😐 Sentiment"):      sent_out = gr.Markdown()
        with gr.Tab("🤖 LLM Brief"):      llm_out  = gr.Markdown()
    
    run_btn.click(
        gradio_analyse,
        inputs=[file_input, text_input],
        outputs=[text_input, tok_out, ent_out, risk_out, sent_out, llm_out]
    )

demo.launch(inbrowser=True)

* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)




The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


---

## Concepts Demonstrated

| Cell | Week 3 Concept | What it shows |
|------|---------------|---------------|
| 4 | Day 2: `pipeline('ner')` | Extract people, organisations, locations (PER / ORG / LOC / MISC) |
| 5 | Day 2: `pipeline('sentiment-analysis')` | Overall document tone |
| 6 | Day 2: `pipeline('zero-shot-classification')` | Per-clause risk without labelled data |
| 7 | Day 3: `AutoTokenizer` + `apply_chat_template` | Token IDs & prompt format |
| 8 | Day 4: `AutoModelForCausalLM` + `TextIteratorStreamer` + quantization | Local LLM + streaming |
| 8 | Day 4: `gc.collect()` + `empty_cache()` | MPS/CUDA memory management |
| 9 | Day 5: End-to-end chaining | All components wired together |
| 10 | – | Gradio UI with file upload |