# Agent CFO — Performance Optimization & Design

---
This is the starter notebook for your project. Follow the required structure below.


You will design and optimize an Agent CFO assistant for a listed company. The assistant should answer finance/operations questions using RAG (Retrieval-Augmented Generation) + agentic reasoning, with response time (latency) as the primary metric.

Your system must:
*   Ingest the company’s public filings.
*   Retrieve relevant passages efficiently.
*   Compute ratios/trends via tool calls (calculator, table parsing).
*   Produce answers with valid citations to the correct page/table.


In [1]:
import os
from getpass import getpass
if not os.environ.get("GEMINI_API_KEY"):  # prompt at runtime; never commit a real key
    os.environ["GEMINI_API_KEY"] = getpass("GEMINI_API_KEY: ")

## 1. Config & Secrets

Fill in your API keys in secrets. **Do not hardcode keys** in cells.

In [2]:
import os

# Example:
# os.environ['GEMINI_API_KEY'] = 'your-key-here'
# os.environ['OPENAI_API_KEY'] = 'your-key-here'

COMPANY_NAME = "DBS Bank"


## 2. Data Download (Dropbox)

*   Annual Reports: last 3–5 years.
*   Quarterly Results Packs & MD&A (Management Discussion & Analysis).
*   Investor Presentations and Press Releases.
*   These files must be submitted later as a deliverable in the Dropbox data pack.
*   Upload them under `/content/data/`.

Scope: each team must ingest at least 15 PDF files in total.


## 3. System Requirements

**Retrieval & RAG**
*   Use a vector index (e.g., FAISS, LlamaIndex) + a keyword filter (BM25/ElasticSearch).
*   Citations must include: report name, year, page number, section/table.
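
The four required citation fields can be kept together in a small record type so that every answer cites in the same shape. The sketch below is illustrative only: `Citation` and `render` are hypothetical names that do not appear in the starter code, and the field values are placeholders rather than real filing data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical record holding the four required citation fields."""
    report: str   # report name, e.g. the source file name
    year: str     # year or quarter label
    page: int     # 1-based page number
    section: str  # section or table label

    def render(self) -> str:
        # Join the fields in a fixed order so every citation looks the same.
        return f"[{self.report}, {self.year}, p.{self.page}, {self.section}]"


# Placeholder values for illustration only.
example = Citation(report="Example Report", year="2024", page=22, section="Income statement")
print(example.render())
```

A frozen dataclass is used here so a citation cannot be mutated after retrieval; a plain dict would work as well if that guarantee is not needed.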

**Agentic Reasoning**
*   Support at least 3 tool types: calculator, table extraction, multi-document compare.
*   Reasoning must follow a plan-then-act pattern (not a single unstructured call).
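
A plan-then-act loop can be as simple as building a list of tool calls first and executing it second. The following is a minimal sketch under that assumption, not the required design: `calculator`, `make_plan`, `run_plan`, and the `TOOLS` registry are hypothetical names, and the income and expense figures are made up.

```python
from typing import Callable, Dict, List, Tuple


def calculator(op: str, a: float, b: float) -> float:
    """Tiny calculator tool covering the operations ratio questions need."""
    if op == "sub":
        return a - b
    if op in ("div", "pct_change"):
        if b == 0:
            raise ZeroDivisionError("denominator is zero")
        return a / b if op == "div" else (a - b) / b * 100.0
    raise ValueError(f"unknown op: {op}")


# Hypothetical tool registry; table extraction and document compare would be added here.
TOOLS: Dict[str, Callable[..., float]] = {"calculator": calculator}

Step = Tuple[str, Tuple]


def make_plan(income: float, expenses: float) -> List[Step]:
    # Plan: decide every tool call before executing any of them.
    return [("calculator", ("div", expenses, income))]


def run_plan(plan: List[Step]) -> List[float]:
    # Act: execute the planned calls in order and collect the results.
    return [TOOLS[name](*args) for name, args in plan]


# Made-up figures, not taken from any filing.
plan = make_plan(income=200.0, expenses=90.0)
results = run_plan(plan)
print(f"Cost-to-income: {results[0]:.2%}")
```

Keeping the plan as plain data makes it easy to log the tools invoked and, later, to prune or parallelise steps.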

**Instrumentation**
*   Log timings for: T_ingest, T_retrieve, T_rerank, T_reason, T_generate, T_total.
*   Log: tokens used, cache hits, tools invoked.
*   Record p50/p95 latencies.
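
For reference, p50 and p95 can be computed from a list of per-query latencies with linear interpolation between sorted samples, which is also the default behaviour of `pandas.Series.quantile`. The helper below is a standard-library sketch; `percentile` is a hypothetical name and the sample latencies are made up.

```python
import math
from typing import List


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile for q in [0, 100]."""
    if not values:
        raise ValueError("no samples")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0   # fractional index into the sorted samples
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Made-up latencies in seconds; one slow outlier pulls p95 far above p50.
samples = [0.8, 1.0, 1.2, 1.4, 5.0]
p50 = percentile(samples, 50)
p95 = percentile(samples, 95)
print(f"p50={p50:.2f}s p95={p95:.2f}s")
```

With only a handful of queries, p95 is dominated by the single slowest run, so it is worth repeating each benchmark query several times before reporting it.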

In [None]:
# === STAGE 1: INGESTION & INDEXING (Vector + BM25) ===
# Builds TF-IDF (vector) + BM25 (keyword) indices, saves artifacts, exposes hybrid_search().
# Logs T_ingest and T_retrieve; saves manifest and index files under ART_DIR (./agent_cfo_index).

import os, re, time, json, math, pickle, sys, subprocess
from dataclasses import dataclass
from typing import Dict, Any, List
from contextlib import contextmanager
from collections import Counter
import numpy as np
import pandas as pd

# ---------- ensure a PDF extractor ----------
def _ensure_pypdf():
    try:
        import pypdf  # noqa
        return True
    except Exception:
        try:
            print("[SETUP] Installing pypdf ...")
            subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "pypdf"])
            import pypdf  # noqa
            print("[SETUP] pypdf installed.")
            return True
        except Exception as e:
            print("[WARN] Failed to install pypdf:", e)
            return False
HAVE_PYPDF = _ensure_pypdf()

# ---------- instrumentation ----------
class Instrumentor:
    def __init__(self): self.logs: List[Dict[str, Any]] = []
    def log(self, row: Dict[str, Any]): self.logs.append(row)
    def df(self) -> pd.DataFrame: return pd.DataFrame(self.logs)
    def latency_stats(self) -> Dict[str, float]:
        df = self.df()
        if df.empty or "T_total" not in df.columns: return {}
        return {"p50_total": round(float(df["T_total"].median()),3),
                "p95_total": round(float(df["T_total"].quantile(0.95)),3)}
instr = Instrumentor()
@contextmanager
def timeblock(payload: Dict[str, Any], key: str):
    t0 = time.time()
    try: yield
    finally: payload[key] = payload.get(key, 0.0) + (time.time() - t0)

# ---------- config & scan ----------
# DATA_DIRS = ["./", "/content", "/mnt/data"]   # adjust if needed
DATA_DIRS = ["./All"]   # adjust if needed
# ART_DIR   = "/mnt/data/agent_cfo_index"  # adjust if needed
ART_DIR   = "./agent_cfo_index"
os.makedirs(ART_DIR, exist_ok=True)

def scan_files() -> pd.DataFrame:
    found = []
    for d in DATA_DIRS:
        if os.path.isdir(d):
            for root, _, files in os.walk(d):
                for fn in files:
                    if fn.lower().endswith((".pdf", ".xls", ".xlsx")) and "agent_cfo_index" not in root:
                        found.append(os.path.join(root, fn))
    df = pd.DataFrame({
        "path": found,
        "file": [os.path.basename(p) for p in found],
        "ext": [os.path.splitext(p)[1].lower() for p in found],
    }).sort_values(["file","path"]).reset_index(drop=True)
    # de-duplicate by filename preferring the first occurrence
    df = df.drop_duplicates(subset=["file"], keep="first").reset_index(drop=True)
    return df

manifest = scan_files()
print(f"[MANIFEST] {len(manifest)} unique files")
try: display(manifest)
except NameError: print(manifest.head(20))
manifest.to_csv(os.path.join(ART_DIR, "manifest.csv"), index=False)

if len(manifest)==0:
    raise SystemExit(f"[STOP] No files found. Put PDFs under one of {DATA_DIRS} then re-run.")

# ---------- PDF extract ----------
def extract_pages_with_pypdf(path: str) -> List[str]:
    if not HAVE_PYPDF: return []
    try:
        import pypdf
        reader = pypdf.PdfReader(path)
        pages = []
        for i in range(len(reader.pages)):
            try: pages.append(reader.pages[i].extract_text() or "")
            except Exception: pages.append("")
        return pages
    except Exception as e:
        print(f"[WARN] Cannot parse PDF: {os.path.basename(path)} ({e})")
        return []

# ---------- chunking ----------
@dataclass
class Chunk:
    doc_id: str
    file: str
    path: str
    year_qtr: str
    page: int
    section_hint: str
    text: str

def infer_year_qtr(filename: str) -> str:
    m = re.search(r'([1-4]Q)\s*(\d{2})', filename.upper())
    if m: return f"{m.group(1)}{m.group(2)}"
    m = re.search(r'(20\d{2})', filename)
    return m.group(1) if m else ""

SECTION_HOOKS = [
    r"key ratios|highlights|summary",
    r"net interest margin|nim\b",
    r"cost[- ]?to[- ]?income|cti|efficiency ratio",
    r"operating expenses|opex|expenses",
    r"income statement|statement of (comprehensive )?income",
    r"balance sheet|statement of financial position",
    r"management discussion|md&a"
]
def guess_section(text: str) -> str:
    txt = (text or "").lower()
    for pat in SECTION_HOOKS:
        if re.search(pat, txt): return pat
    return ""
def clean_text(t: str) -> str:
    t = t.replace("\x00"," ")
    t = re.sub(r'[ \t]+',' ', t)
    t = re.sub(r'\n{2,}','\n', t)
    return t.strip()

# Ingest
row = {"Query":"[ingest]","Tools":[],"CacheHits":0,"Tokens":0}
with timeblock(row, "T_total"):
    with timeblock(row, "T_ingest"):
        chunks: List[Chunk] = []
        for _, r in manifest.iterrows():
            if r["ext"] != ".pdf": 
                continue   # (XLS handled later by table tools)
            pages = extract_pages_with_pypdf(r["path"])
            yq = infer_year_qtr(r["file"])
            for i, raw in enumerate(pages):
                text = clean_text(raw or "")
                chunks.append(Chunk(
                    doc_id=f"{r['file']}#p{i+1}",
                    file=r["file"], path=r["path"], year_qtr=yq,
                    page=i+1, section_hint=guess_section(text), text=text[:20000]
                ))
instr.log(row)
print(f"[INGEST] Built {len(chunks)} page-chunks")
if not any(c.text for c in chunks):  # scanned PDFs yield page-chunks with empty text
    raise SystemExit("[STOP] PDFs found but no text extracted (likely scanned). Add OCR or upload text PDFs.")

# ---------- indices (BM25 + TF-IDF) ----------
TOKEN_PAT = re.compile(r"[A-Za-z0-9_.%\-]+")
def tokenize(s: str) -> List[str]: return [w.lower() for w in TOKEN_PAT.findall(s or "")]
tokenized = [tokenize(c.text) for c in chunks]

class BM25:
    def __init__(self, tokenized_docs: List[List[str]], k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.N = len(tokenized_docs)
        self.doc_lens = np.array([len(d) for d in tokenized_docs], dtype=float)
        self.avgdl = float(self.doc_lens.mean()) if self.N>0 else 0.0
        df = Counter()
        for d in tokenized_docs:
            for t in set(d): df[t]+=1
        self.idf = {t: math.log((self.N - df_t + 0.5) / (df_t + 0.5) + 1.0) for t, df_t in df.items()}
        self.tf = [Counter(d) for d in tokenized_docs]
    def score(self, q_tokens: List[str]) -> np.ndarray:
        scores = np.zeros(self.N, dtype=float)
        for qi in set(q_tokens):
            if qi not in self.idf: continue
            idf = self.idf[qi]
            for idx, tf_d in enumerate(self.tf):
                f = tf_d.get(qi, 0.0)
                if f<=0: continue
                denom = f + self.k1*(1.0 - self.b + self.b*(self.doc_lens[idx]/(self.avgdl if self.avgdl>0 else 1.0)))
                scores[idx] += idf * (f*(self.k1+1.0)) / (denom if denom>0 else 1.0)
        return scores

bm25 = BM25(tokenized)

# TF-IDF
vocab: Dict[str, int] = {}
for toks in tokenized:
    for t in toks:
        if t not in vocab: vocab[t] = len(vocab)
V = len(vocab); N = len(tokenized)
df_counts = np.zeros(V, dtype=np.int32)
for toks in tokenized:
    for t in set(toks): df_counts[vocab[t]] += 1
idf = np.log((N+1)/(df_counts+1)) + 1.0

def tfidf_vec(tokens: List[str]) -> np.ndarray:
    vec = np.zeros(V, dtype=np.float32)
    if not tokens: return vec
    tf = Counter(tokens); max_tf = max(tf.values())
    for tok, f in tf.items():
        j = vocab.get(tok)
        if j is None: continue
        vec[j] = (0.5 + 0.5*(f/max_tf)) * idf[j]
    n = np.linalg.norm(vec)
    return vec / n if n>0 else vec

tfidf_matrix = np.vstack([tfidf_vec(t) for t in tokenized])

def cosine_sim(qvec: np.ndarray, mat: np.ndarray) -> np.ndarray:
    qn = np.linalg.norm(qvec)
    return np.zeros(mat.shape[0], dtype=np.float32) if qn==0 else mat @ (qvec/qn)
def norm01(v: np.ndarray) -> np.ndarray:
    lo, hi = v.min(), v.max()
    if hi-lo<1e-9: return np.zeros_like(v)
    return (v-lo)/(hi-lo)
def preview_line(s: str, maxlen: int = 160) -> str:
    s = (s or "").replace("\n"," ")
    return (s[:maxlen] + "...") if len(s)>maxlen else s

def hybrid_search(query: str, top_k=8, alpha=0.6) -> List[Dict[str, Any]]:
    row = {"Query": query, "Tools": ["retriever"], "CacheHits": 0, "Tokens": 0}
    with timeblock(row, "T_total"):
        with timeblock(row, "T_retrieve"):
            q_tokens = tokenize(query)  # same tokenizer as the index, so query terms match
            bm = bm25.score(q_tokens)
            qvec = tfidf_vec(q_tokens)
            cs  = cosine_sim(qvec, tfidf_matrix)
            bm_n, cs_n = norm01(bm), norm01(cs)
            hybrid = alpha*bm_n + (1-alpha)*cs_n
            idxs = np.argsort(-hybrid)[:top_k]
            out = []
            for rank, i in enumerate(idxs, start=1):
                c = chunks[int(i)]
                out.append({
                    "rank": rank,
                    "score": float(hybrid[i]),
                    "bm25": float(bm_n[i]),
                    "cos": float(cs_n[i]),
                    "file": c.file,
                    "page": c.page,
                    "year_qtr": c.year_qtr,
                    "section_hint": c.section_hint,
                    "doc_id": c.doc_id,
                    "preview": preview_line(c.text)
                })
    instr.log(row)
    return out

# ---------- save artifacts ----------
np.save(os.path.join(ART_DIR, "tfidf.npy"), tfidf_matrix)
with open(os.path.join(ART_DIR, "vocab.json"), "w") as f: json.dump(vocab, f)
with open(os.path.join(ART_DIR, "bm25.pkl"), "wb") as f: pickle.dump(
    {"doc_lens": bm25.doc_lens, "avgdl": float(bm25.avgdl), "idf": bm25.idf}, f)
with open(os.path.join(ART_DIR, "chunks.pkl"), "wb") as f: pickle.dump(chunks, f)
print(f"[SAVE] Artifacts -> {ART_DIR}")

print("\n[INSTRUMENTATION]")
try: display(instr.df())
except NameError: print(instr.df().head())
print(instr.latency_stats())

## 4. Baseline Pipeline

**Baseline (starting point)**
*   Naive chunking.
*   Single-pass vector search.
*   One LLM call, no caching.
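
"Naive chunking" usually means splitting text into fixed-size character windows with no regard for sentence or table boundaries. The sketch below shows that idea under stated assumptions; `naive_chunks` is a hypothetical helper, and the Stage 1 code in this notebook uses one chunk per PDF page instead.

```python
from typing import List


def naive_chunks(text: str, size: int = 1000, overlap: int = 0) -> List[str]:
    """Split text into fixed-size character windows, optionally overlapping."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    return [text[i:i + size] for i in range(0, len(text), step)]


parts = naive_chunks("abcdefghij", size=4, overlap=1)
print(parts)
```

Fixed windows are fast to build but can split a table row or a ratio from its label, which is one reason the baseline is expected to lose accuracy on table-heavy pages.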

In [10]:
# === STAGE 2: BASELINE RETRIEVAL & ONE LLM CALL (NO CACHING) ===
# - Single-pass retrieve (hybrid BM25+TF-IDF) -> light MMR rerank
# - Exactly ONE Gemini call per query (no caching)
# - Returns answer + guaranteed explicit citations

%pip install --upgrade --force-reinstall google-generativeai


import os
from typing import List, Dict, Any
import numpy as np

# -- tiny MMR-like reranker (NumPy 2.0-safe) --
def mmr_rerank(hits: List[Dict[str,Any]], lambda_mult=0.7, top_k=5) -> List[Dict[str,Any]]:
    if not hits:
        return []
    rel = np.array([h['score'] for h in hits], dtype=float)
    if np.ptp(rel) > 1e-9:
        rel = (rel - rel.min()) / (rel.max() - rel.min())
    sim = np.zeros((len(hits), len(hits)))
    for i,a in enumerate(hits):
        for j,b in enumerate(hits):
            sim[i,j] = 1.0 if (a['file']==b['file'] and a['page']==b['page']) else (0.3 if a['file']==b['file'] else 0.0)
    picked, out = set(), []
    while len(out) < min(top_k, len(hits)):
        scores = []
        for i in range(len(hits)):
            if i in picked:
                scores.append(-1e9); continue
            red = 0.0 if not picked else max(sim[i,j] for j in picked)
            scores.append(lambda_mult*rel[i] - (1-lambda_mult)*red)
        i_best = int(np.argmax(scores))
        picked.add(i_best); out.append(hits[i_best])
    for r,h in enumerate(out,1):
        h['rank'] = r
    return out

def retrieve_then_rerank(query: str, top_k=8, alpha=0.6):
    row = {"Query": query, "Tools": ["retriever"], "CacheHits": 0, "Tokens": 0}
    with timeblock(row, "T_total"):
        with timeblock(row, "T_retrieve"):
            # hybrid_search() also logs its own row, so retrieval time appears in two log rows.
            raw_hits = hybrid_search(query, top_k=top_k, alpha=alpha)
        with timeblock(row, "T_rerank"):
            hits = mmr_rerank(raw_hits, lambda_mult=0.7, top_k=min(5, top_k))
    instr.log(row)
    return hits

# -- citation helpers --
def format_citation(hit: dict) -> str:
    parts = [hit["file"]]
    if hit.get("year_qtr"): parts.append(hit["year_qtr"])
    parts.append(f"p.{hit['page']}")
    if hit.get("section_hint"): parts.append(hit["section_hint"])
    return " — ".join(parts)

# Map doc_id -> full chunk text to give richer context to the LLM (not just preview)
_doc_text_map = None
def _build_doc_text_map():
    global _doc_text_map
    if _doc_text_map is None:
        _doc_text_map = {c.doc_id: c.text for c in chunks}
    return _doc_text_map

def _context_from_hits(hits: List[Dict[str,Any]], max_chars_per_chunk=1200, top_ctx=3) -> str:
    m = _build_doc_text_map()
    ctx_blocks = []
    for h in hits[:top_ctx]:
        text = m.get(h["doc_id"], h.get("preview","")) or ""
        text = text.strip().replace("\u0000"," ")
        if len(text) > max_chars_per_chunk:
            text = text[:max_chars_per_chunk] + " ..."
        ctx_blocks.append(
            f"[{format_citation(h)}]\n{text}"
        )
    return "\n\n".join(ctx_blocks)

# -- ONE Gemini call (no caching) --
def answer_with_gemini(query: str, top_k_retrieval=8, top_ctx=3, model_name="gemini-2.5-flash-preview-09-2025") -> Dict[str,Any]:
    # 1) Retrieve + rerank
    hits = retrieve_then_rerank(query, top_k=top_k_retrieval, alpha=0.6)

    # 2) Prepare prompt with explicit instruction to cite
    context = _context_from_hits(hits, max_chars_per_chunk=1200, top_ctx=top_ctx)
    system_task = (
        "You are Agent CFO. Answer the user's finance/operations question using ONLY the provided context. "
        "When you state any figures, also provide citations in the format: "
        "[Report, Year/Quarter, p.X, Section/Table]. Keep the answer concise and factual."
    )
    user_prompt = (
        f"Question:\n{query}\n\n"
        f"Context passages (use for citations):\n{context}\n\n"
        "Instructions:\n"
        "1) If a value cannot be supported by the context, say so.\n"
        "2) Include citations inline like: (DBS 3Q24 CFO Presentation — p.14 — Cost/Income table).\n"
        "3) End with a short one-line takeaway."
    )
    prompt = f"{system_task}\n\n{user_prompt}"

    # 3) Call Gemini ONCE
    row = {"Query": f"[generate] {query}", "Tools": ["generator"], "CacheHits": 0, "Tokens": 0}
    try:
        import google.generativeai as genai
    except Exception as e:
        raise SystemExit("google-generativeai package not installed. Run: pip install google-generativeai") from e

    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise SystemExit("Missing GEMINI_API_KEY. Set os.environ['GEMINI_API_KEY'] = '...'.")
    genai.configure(api_key=api_key)

    with timeblock(row, "T_total"):
        with timeblock(row, "T_reason"):
            # (Place for any pre-LLM reasoning like light parsing if needed)
            pass
        with timeblock(row, "T_generate"):
            model = genai.GenerativeModel(model_name)
            resp = model.generate_content(prompt)
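            try:
                text = resp.text or ""
            except (ValueError, AttributeError):
                # resp.text raises ValueError when the response has no usable text part
                text = ""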
            # Try to record token usage if available; else estimate
            try:
                usage = resp.usage_metadata
                row["Tokens"] = int((usage.prompt_token_count or 0) + (usage.candidates_token_count or 0))
            except Exception:
                # naive estimate: 4 chars ≈ 1 token
                row["Tokens"] = int(len(prompt)//4 + len(text)//4)

    instr.log(row)

    # 4) Guarantee citations are present (append explicit list of top contexts)
    explicit_citations = "\n".join(f"- {format_citation(h)}" for h in hits[:top_ctx])
    final_answer = text.strip()
    if not final_answer:
        final_answer = "No answer generated."
    final_answer += "\n\nCitations:\n" + explicit_citations

    return {
        "answer": final_answer,
        "hits": hits[:top_ctx],
        "raw_model_text": text
    }

# --- quick demo calls (ONE LLM CALL EACH; no caching) ---
demo_queries = [
    "Net Interest Margin (NIM) trend over the last five quarters; provide the values and 1–2 lines of explanation.",
    "Operating expenses YoY for the last three years; list the top three drivers from MD&A.",
    "Cost-to-Income ratio for the last three years; show your working and implications."
]
for q in demo_queries:
    out = answer_with_gemini(q, top_k_retrieval=10, top_ctx=3)
    print("\nQ:", q, "\n")
    print(out["answer"])


Q: Net Interest Margin (NIM) trend over the last five quarters; provide the values and 1–2 lines of explanation. 

The specific numerical values for the Net Interest Margin (NIM) over the last five quarters cannot be determined from the provided context. However, the trend is described as follows:

In 1Q24, Group NIM was stable, supported by the Commercial book NIM increasing by 2 basis points quarter-on-quarter (qoq) due to fixed-rate asset repricing [1Q24, p.2, net interest margin]. By 3Q24, Group NIM was lower QoQ [3Q24, p.2, net interest margin]. This decline was primarily due to Markets Trading's deployment in products that were accretive to income but dilutive to NIM, even though the Commercial book NIM remained unchanged during that quarter [3Q24, p.2, net interest margin].

Group NIM showed stability in 1Q24 but declined by 3Q24 due to strategic asset deployment by Markets Trading.

Citations:
- 1Q24_CEO_presentation.pdf — 1Q24 — p.2 — net interest margin|nim\b
- 3Q24_CEO_pres


Q: Operating expenses YoY for the last three years; list the top three drivers from MD&A. 

Based solely on the provided context, the operating expenses YoY for the last three years and the top three drivers from the MD&A cannot be determined.

The available context focuses on corporate governance [dbs-annual-report-2020.pdf, 2020, p.25, key ratios|highlights|summary], digitalization efforts [dbs-annual-report-2022.pdf, 2022, p.14, net interest margin|nim\b], and overall income performance, such as the total income decline of 2% to SGD 9.16 billion in 2024 [dbs-annual-report-2024.pdf, 2024, p.22, net interest margin|nim\b], but it does not contain detailed operating expense figures or a list of their specific drivers.

The requested data is not available in the provided passages.

Citations:
- dbs-annual-report-2020.pdf — 2020 — p.25 — key ratios|highlights|summary
- dbs-annual-report-2022.pdf — 2022 — p.14 — net interest margin|nim\b
- dbs-annual-report-2024.pdf — 2024 — p.22 — net i


Q: Cost-to-Income ratio for the last three years; show your working and implications. 

The Cost-to-Income (CTI) ratio for the last three years (2019–2021) is calculated using the formula: Total Operating Expenses / Total Income. Since Total Operating Expenses are not provided directly, they are derived by subtracting Profit before allowances from Total Income (Operating Expenses = Total Income - Profit before allowances).

| Year | Total Income (I) ($ millions) | Profit before allowances (PBA) ($ millions) | Total Operating Expenses (E = I - PBA) ($ millions) | CTI Ratio (E / I) | Citation |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **2021** | 14,297 | 7,828 | 6,469 | 45.25% | [dbs-annual-report-2021, 2021, p.98, FIVE-YEAR SUMMARY] |
| **2020** | 14,592 | 8,434 | 6,158 | 42.20% | [dbs-annual-report-2021, 2020, p.98, FIVE-YEAR SUMMARY] |
| **2019** | 14,544 | 8,286 | 6,258 | 43.03% | [dbs-annual-report-2021, 2019, p.98, FIVE-YEAR SUMMARY] |

**Implications:**
The Cost-to-Income ra

In [9]:
import google.generativeai as genai
import os

# Read the key from the environment; do not paste it into the notebook.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

print("Available Models:\n")

# List all models and check which ones support the 'generateContent' method
for model in genai.list_models():
    if 'generateContent' in model.supported_generation_methods:
        print(f"- {model.name}")

Available Models:



- models/gemini-2.5-pro-preview-03-25
- models/gemini-2.5-flash-preview-05-20
- models/gemini-2.5-flash
- models/gemini-2.5-flash-lite-preview-06-17
- models/gemini-2.5-pro-preview-05-06
- models/gemini-2.5-pro-preview-06-05
- models/gemini-2.5-pro
- models/gemini-2.0-flash-exp
- models/gemini-2.0-flash
- models/gemini-2.0-flash-001
- models/gemini-2.0-flash-exp-image-generation
- models/gemini-2.0-flash-lite-001
- models/gemini-2.0-flash-lite
- models/gemini-2.0-flash-preview-image-generation
- models/gemini-2.0-flash-lite-preview-02-05
- models/gemini-2.0-flash-lite-preview
- models/gemini-2.0-pro-exp
- models/gemini-2.0-pro-exp-02-05
- models/gemini-exp-1206
- models/gemini-2.0-flash-thinking-exp-01-21
- models/gemini-2.0-flash-thinking-exp
- models/gemini-2.0-flash-thinking-exp-1219
- models/gemini-2.5-flash-preview-tts
- models/gemini-2.5-pro-preview-tts
- models/learnlm-2.0-flash-experimental
- models/gemma-3-1b-it
- models/gemma-3-4b-it
- models/gemma-3-12b-it
- models/gemma-3-2

## 5. Benchmark Runner

Run the three standardized queries below. For each, produce a JSON result followed by a prose answer with citations.

*   Net Interest Margin (NIM) trend over last 5 quarters, values and 1–2 lines of explanation.
    *   Expected: quarterly financial highlights.
*   Operating Expenses (Opex) YoY for last 3 years; top 3 drivers from MD&A.
    *   Expected: Opex table + MD&A commentary.
*   Cost-to-Income Ratio (CTI) for last 3 years; show working + implications.
    *   Expected: Operating Income & Opex lines.
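
One possible shape for the runner is a loop that times each query, collects a JSON-serialisable record, and then prints the prose answer. This is a sketch, not the required implementation: `run_benchmark` and `stub_answer` are hypothetical names, and the `citations` key is an assumption about what your answer function returns. Swap the stub for your real pipeline function.

```python
import json
import time
from typing import Any, Callable, Dict, List

BENCHMARK_QUERIES = [
    "Net Interest Margin (NIM) trend over last 5 quarters, values and 1-2 lines of explanation.",
    "Operating Expenses (Opex) YoY for last 3 years; top 3 drivers from MD&A.",
    "Cost-to-Income Ratio (CTI) for last 3 years; show working + implications.",
]


def run_benchmark(answer_fn: Callable[[str], Dict[str, Any]],
                  queries: List[str]) -> List[Dict[str, Any]]:
    """Run each query once and record the answer plus wall-clock latency."""
    results = []
    for q in queries:
        t0 = time.perf_counter()
        out = answer_fn(q)
        elapsed = time.perf_counter() - t0
        results.append({
            "query": q,
            "answer": out.get("answer", ""),
            "citations": out.get("citations", []),  # assumed key, see lead-in
            "T_total": round(elapsed, 4),
        })
    return results


def stub_answer(query: str) -> Dict[str, Any]:
    # Stand-in for the real pipeline so the runner can be exercised offline.
    return {"answer": f"(stub) {query}", "citations": []}


results = run_benchmark(stub_answer, BENCHMARK_QUERIES)
print(json.dumps(results, indent=2))   # JSON first
for r in results:                      # then prose
    print(f"\nQ: {r['query']}\n{r['answer']}")
```

Passing the answer function as an argument lets the same runner score both the baseline and the optimized pipeline.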


In [None]:
# TODO: Implement benchmark runner


## 6. Instrumentation

Log timings: T_ingest, T_retrieve, T_rerank, T_reason, T_generate, T_total. Log tokens, cache hits, tools.

In [None]:
# Example instrumentation schema
import pandas as pd
logs = pd.DataFrame(columns=['Query','T_ingest','T_retrieve','T_rerank','T_reason','T_generate','T_total','Tokens','CacheHits','Tools'])
logs

## 7. Optimizations

**Required Optimizations**

Each team must implement at least:
*   2 retrieval optimizations (e.g., hybrid BM25+vector, smaller embeddings, dynamic k).
*   1 caching optimization (query cache or ratio cache).
*   1 agentic optimization (plan pruning, parallel sub-queries).
*   1 system optimization (async I/O, batch embedding, memory-mapped vectors).
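
As an illustration of the caching requirement, a query cache can be a small LRU map keyed on a normalised query string, with a hit counter that feeds the `CacheHits` log column. The class below is a sketch under those assumptions; `QueryCache` and `get_or_compute` are hypothetical names, not part of the starter code.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class QueryCache:
    """Small LRU cache keyed on a normalised query string (hypothetical helper)."""

    def __init__(self, max_items: int = 128):
        self.max_items = max_items
        self.hits = 0
        self._store: "OrderedDict[str, Any]" = OrderedDict()

    @staticmethod
    def _key(query: str) -> str:
        # Lower-case and collapse whitespace so trivial variants share an entry.
        return " ".join(query.lower().split())

    def get_or_compute(self, query: str, compute: Callable[[str], Any]) -> Any:
        key = self._key(query)
        if key in self._store:
            self._store.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._store[key]
        value = compute(query)
        self._store[key] = value
        if len(self._store) > self.max_items:
            self._store.popitem(last=False)  # evict the least recently used entry
        return value


calls: List[str] = []


def slow_answer(q: str) -> str:
    # Stand-in for the expensive retrieve + generate path.
    calls.append(q)
    return q.upper()


cache = QueryCache(max_items=2)
first = cache.get_or_compute("NIM trend", slow_answer)
second = cache.get_or_compute("  nim   TREND ", slow_answer)
print(first, second, cache.hits, len(calls))
```

Normalising the key trades a small risk of conflating distinct queries for a higher hit rate; a stricter key (for example, including `top_k`) may be safer if retrieval parameters vary between calls.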

In [None]:
# TODO: Implement optimizations


## 8. Results & Plots

Show baseline vs optimized. Include latency plots (p50/p95) and accuracy tables.
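
Before plotting, it can help to reduce the two runs to a small comparison table of p50, p95, and speedup. The sketch below uses only the standard library; `summarize` and `compare` are hypothetical helpers and the latency lists are made-up numbers, not measured results.

```python
import statistics
from typing import Dict, List


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Return p50 and p95 using linear interpolation between samples."""
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}  # cut points 50 and 95 of 99


def compare(baseline: List[float], optimized: List[float]) -> List[Dict[str, float]]:
    b, o = summarize(baseline), summarize(optimized)
    rows = []
    for metric in ("p50", "p95"):
        speedup = b[metric] / o[metric] if o[metric] > 0 else float("inf")
        rows.append({"metric": metric, "baseline": b[metric],
                     "optimized": o[metric], "speedup": speedup})
    return rows


# Made-up latencies in seconds, for illustration only.
baseline_s = [2.0, 2.2, 2.4, 2.6, 4.0]
optimized_s = [1.0, 1.1, 1.2, 1.3, 2.0]
rows = compare(baseline_s, optimized_s)

print(f"{'metric':<8}{'baseline':>10}{'optimized':>11}{'speedup':>9}")
for r in rows:
    print(f"{r['metric']:<8}{r['baseline']:>10.2f}{r['optimized']:>11.2f}{r['speedup']:>8.2f}x")
```

The same `rows` list can be passed to a bar chart, with one group per metric and one bar per pipeline.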

In [None]:
# TODO: Generate plots with matplotlib
