## Report Assistant - MCP Use Case

Interactive notebook that fetches recent scholarly works (OpenAlex), synthesizes an executive‑style report with Gemini, and optionally writes the report into a Google Doc.

#### Key components
- OpenAlex client (OpenAlexMCP) to search and format paper metadata.
- Gemini integration (google.generativeai) to synthesize a concise Markdown report from retrieved papers.
- Google Docs helpers (GDocsHelper) to create documents, append text, insert TOC/images, add comments and log actions.
- UI built with ipywidgets: connect to Google, create doc, discover tools, run searches, and send research requests.
- Logging to a local JSONL file (LOG_PATH) for audit of create/append actions.
- Optional Colab one‑click auth and fallback local OAuth flow.

#### Configurable options
- GEMINI_MODEL, CLIENT_SECRETS_JSON, DEFAULT_TOPIC, and DEFAULT_TITLE.
- OPENALEX_MAX_RESULTS, OPENALEX_SORT, and RUN_ID/LOG_PATH.
- Widgets to toggle OpenAlex usage and set number of results.

#### Brief workflow
- Connect to Google APIs (Docs & Drive) with the **1) Connect Google** button.
- Create a new Google Doc (header + timestamp) with the **2) Create Doc** button.
- Discover the OpenAlex tool capabilities with the **Discover Tools** button, then run a paper search for the user query.
- Synthesize a short executive report (Overview, Key Themes, Noteworthy Papers, Gaps & Risks, Next Steps, References) using Gemini.
- Append the synthesized Markdown to the Google Doc and record actions in LOG_PATH.
- View results and interact via the chat UI.
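
The steps above can be sketched as a plain function, independent of the widgets. This is only a minimal sketch under stated assumptions: `run_pipeline` and its `search`, `synthesize`, and `append` parameters are hypothetical names standing in for `OpenAlexMCP.search_works`, `synthesize_report`, and `GDocsHelper.append_text`, and the stubs below replace the real network calls.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List


def run_pipeline(
    query: str,
    search: Callable[[str], List[Dict]],
    synthesize: Callable[[str, List[Dict]], str],
    append: Callable[[str], None],
) -> str:
    """Retrieve works, synthesize a report, and append it to a document."""
    try:
        works = search(query)
    except Exception:
        works = []  # a retrieval failure degrades to an empty literature list
    report = synthesize(query, works)
    stamp = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    append(f"\n\nQuery @ {stamp}: {query}\n\n{report}")
    return f"Fetched {len(works)} papers and appended a report."


# Hypothetical stubs standing in for the real components (no network access).
def fake_search(query: str) -> List[Dict]:
    return [{"title": "Placeholder paper", "year": 2024}]


def fake_synthesize(query: str, works: List[Dict]) -> str:
    return f"# Report\n\n{len(works)} item(s) for: {query}"


doc_buffer: List[str] = []  # stands in for the Google Doc body
status = run_pipeline("medical vision AI", fake_search, fake_synthesize, doc_buffer.append)
print(status)
```

Passing the three steps in as callables keeps the orchestration testable without Google or Gemini credentials; the notebook itself wires the same steps together inside `handle_research_turn`.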

#### Outputs & artifacts
- Google Doc containing generated reports and appended queries.
- Local audit JSONL of actions (create_doc, append_text, replace_all, insert_toc, insert_image, add_comment).
- On‑screen chat history and discovery output.
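
Because the audit log is one JSON object per line, it can be inspected with a few lines of standard-library Python. The sketch below is illustrative only: `load_actions` and `summarize_actions` are hypothetical helpers that are not part of the notebook, and the sample entries are made-up values shaped like those written by `GDocsHelper._log`.

```python
import json
import os
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, List


def load_actions(log_path: str) -> List[Dict]:
    """Parse one JSON object per non-empty line of the audit log."""
    events = []
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            events.append(json.loads(line))
    return events


def summarize_actions(events: List[Dict]) -> Dict[str, int]:
    """Count events by their "event" field."""
    return dict(Counter(e.get("event", "unknown") for e in events))


# Made-up sample entries written to a temporary file for the demo.
sample = [
    {"event": "create_doc", "title": "Demo", "docId": "abc", "ts": "2024-01-01T00:00:00Z"},
    {"event": "append_text", "docId": "abc", "chars": 42, "ts": "2024-01-01T00:00:01Z"},
    {"event": "append_text", "docId": "abc", "chars": 7, "ts": "2024-01-01T00:00:02Z"},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as f:
    for entry in sample:
        f.write(json.dumps(entry) + "\n")
    tmp_path = f.name

summary = summarize_actions(load_actions(tmp_path))
os.remove(tmp_path)
print(summary)
```

In the notebook, the same functions could be pointed at `LOG_PATH` after a session to review which actions were taken.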

#### Primary use cases
- Rapid research synthesis for executives or product teams.
- Prototype end‑to‑end retrieval → LLM → document automation with auditable actions.
- Teaching/demo of integrating OpenAlex, Gemini, and Google Docs APIs.

#### Extensibility
- Swap OpenAlex for other retrieval sources or add PDF ingestion.
- Replace Gemini with an alternate LLM or adjust prompt/response_mime_type.
- Extend GDocsHelper to support richer formatting or export workflows.
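
One way to make the retrieval source swappable is to depend only on the two methods that `OpenAlexMCP` exposes (`capabilities` and `search_works`). The sketch below is an assumption about how that could look, not the notebook's design: `RetrievalSource`, `InMemorySource`, and `fetch_works` are hypothetical names, and the records are placeholder data.

```python
from typing import Dict, List, Protocol


class RetrievalSource(Protocol):
    """Structural interface: any object with these members can be used."""
    name: str

    def capabilities(self) -> Dict: ...

    def search_works(self, query: str, per_page: int = 10, sort: str = "") -> List[Dict]: ...


class InMemorySource:
    """Toy source that filters a fixed list by substring match on the title."""
    name = "inmemory.demo"

    def __init__(self, records: List[Dict]):
        self._records = list(records)

    def capabilities(self) -> Dict:
        return {
            "server": self.name,
            "tools": [{
                "name": "search_works",
                "description": "Substring search over an in-memory list.",
                "args_schema": {"query": "str", "per_page": "int", "sort": "str"},
            }],
        }

    def search_works(self, query: str, per_page: int = 10, sort: str = "") -> List[Dict]:
        q = query.lower()
        hits = [r for r in self._records if q in (r.get("title") or "").lower()]
        return hits[:per_page]  # `sort` is accepted but ignored in this toy source


def fetch_works(source: RetrievalSource, query: str, per_page: int = 10) -> List[Dict]:
    """Call whichever source was supplied; the caller does not care which one."""
    return source.search_works(query, per_page=per_page)


# Placeholder records, not real papers.
records = [
    {"title": "Vision transformers for radiology", "year": 2023},
    {"title": "Graph methods in chemistry", "year": 2022},
]
source = InMemorySource(records)
caps = source.capabilities()
hits = fetch_works(source, "vision")
print(caps["server"], [h["title"] for h in hits])
```

Since `OpenAlexMCP` already has the same method names, an instance of it would satisfy this interface without modification.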

In [None]:
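import os

try:
    import google.generativeai as genai
except ImportError:
    genai = None
    print("Install google-generativeai to enable live calls.")

try:
    # Colab secrets are only available inside Google Colab.
    from google.colab import userdata
    _api_key = userdata.get('GOOGLE_API_KEY')
except Exception:
    # Assumed fallback for local Jupyter: read the key from the environment.
    _api_key = os.environ.get("GOOGLE_API_KEY")

if genai is not None and _api_key:
    genai.configure(api_key=_api_key)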

GEMINI_MODEL = "gemini-2.5-pro"
CLIENT_SECRETS_JSON = None  # for local Jupyter OAuth; Colab does one-click auth

DEFAULT_TOPIC = "Executive Report: Recent trends in medical vision AI"
DEFAULT_TITLE = "Research Report — Medical Vision AI"

OPENALEX_MAX_RESULTS = 10
OPENALEX_SORT = "publication_date:desc"

from datetime import datetime, timezone
import uuid, os
RUN_ID = f"gdocs-openalex-{uuid.uuid4().hex[:8]}"
LOG_PATH = f"/mnt/data/{RUN_ID}_actions.jsonl"
print({"RUN_ID": RUN_ID, "LOG_PATH": LOG_PATH})

# Auth: connect to Google Docs & Drive
from typing import Optional
import json, os

SCOPES = [
"https://www.googleapis.com/auth/documents",
"https://www.googleapis.com/auth/drive.file",
"https://www.googleapis.com/auth/drive.metadata.readonly"
]

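def google_auth(creds_path: Optional[str] = None):
    """Return (docs, drive) clients: Colab one-click auth first, local OAuth as fallback."""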
    try:
        from google.colab import auth as colab_auth  # type: ignore
        colab_auth.authenticate_user()
        from googleapiclient.discovery import build
        docs  = build("docs", "v1")
        drive = build("drive", "v3")
        return docs, drive
    except Exception:
        from google_auth_oauthlib.flow import InstalledAppFlow
        from google.auth.transport.requests import Request
        from google.oauth2.credentials import Credentials
        from googleapiclient.discovery import build
        token_file = "/mnt/data/token.json"
        creds = None
        if os.path.exists(token_file):
            creds = Credentials.from_authorized_user_file(token_file, SCOPES)
        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                if not creds_path or not os.path.exists(creds_path):
                    raise RuntimeError("CLIENT_SECRETS_JSON not set or file missing.")
                flow = InstalledAppFlow.from_client_secrets_file(creds_path, SCOPES)
                creds = flow.run_local_server(port=0)
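            # Make sure the token directory exists before caching the credentials.
            os.makedirs(os.path.dirname(token_file), exist_ok=True)
            with open(token_file, "w") as f:
                f.write(creds.to_json())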
        docs  = build("docs", "v1", credentials=creds)
        drive = build("drive", "v3", credentials=creds)
        return docs, drive
    

import httpx
class OpenAlexMCP:
    name = "openalex.scholarly"
    def capabilities(self):
        return {
            "server": self.name,
            "tools": [{
            "name":"search_works",
            "description":"Query OpenAlex for scholarly works (papers). No API key required.",
            "args_schema":{"query":"str","per_page":"int","sort":"str"}
            }]
        }
    def search_works(self, query: str, per_page: int=10, sort: str="publication_date:desc"):
        # OpenAlex documents the page-size parameter as "per-page" (hyphenated).
        params = {"search": query, "per-page": per_page, "sort": sort}
        r = httpx.get("https://api.openalex.org/works", params=params, timeout=30.0)
        r.raise_for_status()
        data = r.json()
        results = []
        for w in data.get("results", []):
            loc   = w.get("primary_location") or {}
            title = w.get("display_name")
            year  = w.get("publication_year")
            date  = w.get("publication_date")
            # host_venue is deprecated in OpenAlex; prefer primary_location.source and
            # fall back to host_venue only if it is still present in the payload.
            venue = (loc.get("source") or {}).get("display_name") or (w.get("host_venue") or {}).get("display_name")
            doi   = (w.get("doi") or "")
            id_   = w.get("id")
            url   = loc.get("landing_page_url") or loc.get("pdf_url") or id_
            # Keep at most six author names, skipping entries without a name.
            names = [(a.get("author") or {}).get("display_name") for a in (w.get("authorships") or [])]
            authors = ", ".join([n for n in names if n][:6])
            cited  = w.get("cited_by_count")
            results.append({
            "title": title, "authors": authors, "venue": venue, "year": year, "date": date,
            "doi": doi, "url": url, "openalex_id": id_, "cited_by": cited
            })
        return results

openalex_server = OpenAlexMCP()

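def format_refs_md(items):
    """Render works as a numbered reference list, one line per item."""
    lines = []
    for i, it in enumerate(items, 1):
        doi = (it.get("doi") or "").replace("https://doi.org/", "")
        # Use `or ''` so missing (None) fields do not render as the string "None".
        line = f"{i}. {it.get('title') or ''} ({it.get('year') or ''}) — {it.get('authors') or ''}. {it.get('venue') or ''}. DOI: {doi}  URL: {it.get('url') or ''}"
        lines.append(line.strip())
    return "\n".join(lines)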


# Gemini — synthesis from query + retrieved papers
import json, os
def ensure_gemini_text_model():
    return genai.GenerativeModel(
        GEMINI_MODEL,
        generation_config={"response_mime_type":"text/plain","temperature":0.2},
        )
                    
def synthesize_report(user_request: str, works: list):
    model = ensure_gemini_text_model()
    refs_json = json.dumps(works, ensure_ascii=False)
    prompt = f"""You are a research synthesis assistant for executives.
        Task: Produce a concise, well-structured report in markdown based on the user's request and the retrieved literature list (OpenAlex).
        Return sections with headings:
            •	Overview (2-4 sentences in plain language)
            •	Key Themes (bulleted; cite papers by # index)
            •	Noteworthy Papers (short bullets of 5-8 items; each bullet: [#index] Title — 1-2 insights)
            •	Gaps & Risks (bulleted)
            •	Next Steps (bulleted, actionable for leaders)
            •	References (numbered, reuse the indices)

        Rules:
            •	Ground claims in the provided items; do not invent facts.
            •	Use the numbering of the provided references (1..N).
            •	Keep the whole report under ~800 words.

        User request:
        {user_request}

        Retrieved items (JSON array):
        {refs_json}
        """
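    resp = model.generate_content(prompt)
    try:
        # resp.text raises ValueError when the response has no usable text part
        # (for example, a blocked or empty candidate), so guard the access.
        text = (resp.text or "").strip()
    except (ValueError, AttributeError):
        text = ""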
    if not text:
        text = f"# Report\n\n## Overview\n(no content)\n\n## References\n{format_refs_md(works)}"
    return text

# Google Docs helpers
from googleapiclient.errors import HttpError
class GDocsHelper:
    def __init__(self, docs, drive, log_path):
        self.docs = docs; self.drive = drive; self.log_path = log_path
    def _log(self, e):
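        # datetime.utcnow() is deprecated; use an explicitly UTC-aware timestamp.
        e["ts"] = datetime.now(timezone.utc).isoformat().replace("+00:00","Z")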
        os.makedirs(os.path.dirname(self.log_path), exist_ok=True)
        with open(self.log_path, "a") as f: f.write(json.dumps(e)+"\n")
    def create_doc(self, title):
        d = self.docs.documents().create(body={"title": title}).execute()
        doc_id = d["documentId"]; link = f"https://docs.google.com/document/d/{doc_id}/edit"
        self._log({"event":"create_doc","title":title,"docId":doc_id}); return doc_id, link
    def append_text(self, doc_id, text):
        reqs=[{"insertText":{"text":text,"endOfSegmentLocation":{}}}]
        self.docs.documents().batchUpdate(documentId=doc_id, body={"requests": reqs}).execute()
        self._log({"event":"append_text","docId":doc_id,"chars":len(text)})
    def replace_all(self, doc_id, query, replace, match_case=False):
        reqs=[{"replaceAllText":{"containsText":{"text":query,"matchCase":match_case},"replaceText":replace}}]
        self.docs.documents().batchUpdate(documentId=doc_id, body={"requests": reqs}).execute()
        self._log({"event":"replace_all","docId":doc_id,"query":query,"replace_len":len(replace)})
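    def insert_toc(self, doc_id):
        # NOTE: "insertTableOfContents" does not appear among the documented Docs API v1
        # batchUpdate request types, so this call may be rejected with an HTTP 400.
        # The failure is logged and then re-raised so callers still see it.
        reqs=[{"insertTableOfContents":{"location":{"index":1},"tableOfContents":{"suggestedInsertionIds":[]}}}]
        try:
            self.docs.documents().batchUpdate(documentId=doc_id, body={"requests": reqs}).execute()
        except HttpError as err:
            self._log({"event":"insert_toc_failed","docId":doc_id,"error":str(err)})
            raise
        self._log({"event":"insert_toc","docId":doc_id})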
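    def insert_image(self, doc_id, url, width_pts=360):
        # In InsertInlineImageRequest, endOfSegmentLocation is an alternative to
        # "location" (a sibling field), not a value nested inside it.
        # The height assumes a 16:9 aspect ratio (0.5625 = 9/16).
        reqs=[{"insertInlineImage":{"endOfSegmentLocation":{},"uri":url,"objectSize":{"height":{"magnitude":width_pts*0.5625,"unit":"PT"},"width":{"magnitude":width_pts,"unit":"PT"}}}}]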
        self.docs.documents().batchUpdate(documentId=doc_id, body={"requests": reqs}).execute()
        self._log({"event":"insert_image","docId":doc_id,"url":url})
    def add_comment(self, file_id, content):
        # Drive v3 comment methods require an explicit `fields` parameter.
        self.drive.comments().create(fileId=file_id, body={"content": content}, fields="id").execute()
        self._log({"event":"add_comment","fileId":file_id})

# Run the assistant — connect, create, and chat
import ipywidgets as widgets
from IPython.display import display, Markdown

_state = {
    "docs": None,
    "drive": None,
    "helper": None,
    "doc_id": None,
    "file_id": None,
    "link": None,
    "title": DEFAULT_TITLE,
    "topic": DEFAULT_TOPIC
    }

topic_in     = widgets.Text(value=DEFAULT_TOPIC, description="Topic")
title_in     = widgets.Text(value=DEFAULT_TITLE, description="Doc title")
connect_btn  = widgets.Button(description="1) Connect Google", button_style="info")
create_btn   = widgets.Button(description="2) Create Doc", button_style="success")

use_openalex = widgets.Checkbox(value=True, description="Use OpenAlex scholarly search")
per_page_sl  = widgets.IntSlider(value=OPENALEX_MAX_RESULTS, min=1, max=25, step=1, description="Results")

discover_btn = widgets.Button(description="Discover Tools")
chat_box     = widgets.Output(layout=widgets.Layout(border='1px solid #ddd', min_height='180px'))
input_box    = widgets.Text(placeholder="Ask: e.g., 'Generate report for top 10 most recent papers on medical vision AI'")
send_btn     = widgets.Button(description="Send")
doc_link     = widgets.HTML(value="")
tools_out    = widgets.Output()

history=[]

def render_chat():
    chat_box.clear_output()
    with chat_box:
        for u,a in history:
            display(Markdown(f"You: {u}"))
            display(Markdown(a))

def on_connect(_):
    try:
        docs,drive=google_auth(CLIENT_SECRETS_JSON)
        _state["docs"], _state["drive"] = docs, drive
        _state["helper"] = GDocsHelper(docs, drive, LOG_PATH)
        doc_link.value = "<span style='color:green;'>Connected to Google APIs ✓</span>"
    except Exception as e:
        doc_link.value = f"<span style='color:red;'>Auth error: {e}</span>"

def on_create(_):
    title = title_in.value.strip() or "Research Report"
    _state["title"] = title
    if not _state.get("helper"):
        doc_link.value = "<span style='color:red;'>Connect to Google first.</span>"
        return
    try:
        doc_id, link = _state["helper"].create_doc(title)
        _state.update({"doc_id": doc_id, "file_id": doc_id, "link": link})
        header = f"# {title}\nGenerated: {datetime.now(timezone.utc).isoformat().replace('+00:00','Z')}\n\n"
        _state["helper"].append_text(doc_id, header)
        doc_link.value = f"<a href='{link}' target='_blank'>Open Google Doc</a>"
        history.append(("Create doc", f"Created doc → {title}"))
        render_chat()
    except Exception as e:
        doc_link.value = f"<span style='color:red;'>Create error: {e}</span>"

def on_discover(_):
    tools = openalex_server.capabilities()
    tools_out.clear_output()
    with tools_out:
        display(Markdown(f"Discovered Server: {tools['server']}"))
        # Render each tool inside the output widget, not in the cell output.
        for t in tools["tools"]:
            display(Markdown(f"- {t['name']} — {t['description']}"))
    history.append(("Discover tools", "Listed available tools."))
    render_chat()

def handle_research_turn(user_msg: str):
    # Retrieve from OpenAlex (if enabled)
    works = []
    if use_openalex.value:
        try:
            works = openalex_server.search_works(user_msg, per_page=int(per_page_sl.value), sort=OPENALEX_SORT)
        except Exception as e:
            works = []
            history.append(("system", f"(OpenAlex error) {e}"))
    # Synthesize via Gemini
    try:
        report_md = synthesize_report(user_msg, works)
    except Exception as e:
        report_md = f"Report generation failed: {e}"
    # Append to Doc
    helper = _state.get("helper")
    if helper and _state.get("doc_id"):
        try:
            stamp = datetime.now(timezone.utc).isoformat().replace("+00:00","Z")
            header = f"\n\n—\nQuery @ {stamp}: {user_msg}\n\n"
            helper.append_text(_state["doc_id"], header + report_md)
            link = _state.get("link")
            # Render the doc URL as a Markdown link in the chat history.
            return f"Fetched {len(works)} papers from OpenAlex and added a synthesized report.\n\n[Open Doc]({link})"
        except Exception as e:
            return f"(Doc append error) {e}"
    else:
        return "Connect to Google and create the doc first."

def on_send(_=None):
    user_msg = input_box.value.strip()
    if not user_msg:
        return
    input_box.value = ""
    assistant_reply = handle_research_turn(user_msg)
    history.append((user_msg, assistant_reply))
    render_chat()

connect_btn.on_click(on_connect)
create_btn.on_click(on_create)
discover_btn.on_click(on_discover)
send_btn.on_click(on_send)
input_box.on_submit(on_send)

display(widgets.VBox([
    widgets.HBox([topic_in, title_in]),
    widgets.HBox([connect_btn, create_btn]),
    widgets.HBox([use_openalex, per_page_sl, discover_btn]),
    doc_link,
    tools_out,
    chat_box,
    widgets.HBox([input_box, send_btn])
    ]))
render_chat()