
In [None]:
# Google Gen AI SDK + utilities
!pip install --upgrade google-genai arxiv beautifulsoup4 requests readability-lxml

Collecting google-genai
  Downloading google_genai-1.49.0-py3-none-any.whl.metadata (46 kB)
Collecting arxiv
  Downloading arxiv-2.3.0-py3-none-any.whl.metadata (5.2 kB)
Collecting beautifulsoup4
  Downloading beautifulsoup4-4.14.2-py3-none-any.whl.metadata (3.8 kB)
Collecting requests
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting readability-lxml
  Downloading readability_lxml-0.8.4.1-py3-none-any.whl.metadata (4.0 kB)
Collecting feedparser~=6.0.10 (from arxiv)
  Downloading feedparser-6.0.12-py3-none-any.whl.metadata (2.7 kB)
Collecting cssselect (from readability-lxml)
  Downloading cssselect-1.3.0-py3-none-any.whl.metadata (2.6 kB)
Collecting sgmllib3k (from feedparser~=6.0.10->arxiv)
  Downloading sgmllib3k-1.0.0.ta

In [None]:
from getpass import getpass
import os

In [None]:
api_key = getpass("Paste your Google Gemini API key here: ")
os.environ["GEMINI_API_KEY"] = api_key   # optional - keep in memory only

Paste your Google Gemini API key here: ··········


In [None]:
from google import genai

# create client using the key we provided
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])


In [None]:
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize in one sentence: Why is reproducibility important in research?",
)
print(resp.text)

Reproducibility is important because it validates research findings, ensuring their reliability and building the trust necessary for scientific progress.


In [None]:
import arxiv

def search_arxiv(query, max_results=5):
    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.Relevance
    )
    results = []
    # Search.results() is deprecated in arxiv 2.x; iterate through a Client instead.
    for result in arxiv.Client().results(search):
        results.append({
            "title": result.title,
            "summary": result.summary,
            "authors": [a.name for a in result.authors],
            "pdf_url": result.pdf_url,
            "id": result.get_short_id()
        })
    return results

In [None]:
# quick test
papers = search_arxiv("multimodal transformers", max_results=3)
for p in papers:
    print(p['title'])


Multimodal Learning with Transformers: A Survey
MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning
Multimodal Transformer With a Low-Computational-Cost Guarantee


In [None]:
import requests
from bs4 import BeautifulSoup

def fetch_page_text(url, max_chars=4000):
    try:
        r = requests.get(url, timeout=10, headers={"User-Agent": "research-agent/1.0"})
        r.raise_for_status()
    except requests.RequestException:
        # Network error or non-2xx status: treat the page as having no text.
        return ""

    soup = BeautifulSoup(r.text, "html.parser")
    # simple extraction: join visible <p> text
    paragraphs = [p.get_text(separator=" ", strip=True) for p in soup.find_all("p")]
    content = "\n".join(paragraphs)
    return content[:max_chars] # limit length for API

In [None]:
def chunk_text(text, max_chars=3000):
    chunks = []
    start = 0
    while start < len(text):
        end = min(len(text), start + max_chars)
        chunks.append(text[start:end])
        start = end
    return chunks
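
Splitting on a fixed character count can cut a sentence in half at a chunk boundary. One common mitigation is to let consecutive chunks overlap by a few characters. The sketch below is a minimal illustration of that idea; `chunk_text_overlap` and its `overlap` parameter are hypothetical names that do not appear in the notebook.

```python
def chunk_text_overlap(text, max_chars=3000, overlap=200):
    """Split text into windows of at most max_chars that share `overlap` characters.

    Hypothetical helper, not part of the original notebook.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    chunks = []
    step = max_chars - overlap  # how far the window start advances each time
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


sample = "abcdefghij" * 5  # 50 characters
pieces = chunk_text_overlap(sample, max_chars=20, overlap=5)
print([len(p) for p in pieces])
```

With `overlap=0` the helper splits the text the same way as `chunk_text`. A larger overlap gives each chunk more shared context, at the cost of sending some characters to the model twice.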

In [None]:
SYSTEM_PROMPT = (
    "You are a concise research assistant. For each text provided, return: "
    "1) a one-paragraph summary (3-5 sentences), 2) three bullet key contributions/findings, "
    "and 3) a suggested short title. Be factual and include no hallucinated facts."
)

def summarize_chunk(chunk):
    # The parameter is named `chunk` so it does not shadow the chunk_text() helper.
    prompt = SYSTEM_PROMPT + "\n\nText to summarize:\n" + chunk
    resp = client.models.generate_content(
        model="gemini-2.5-flash",  # adjust if unavailable
        contents=prompt
    )
    return resp.text
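
Requests to a hosted model can fail transiently, for example on rate limits or timeouts, so a call like the one in `summarize_chunk` is often wrapped in a retry loop. The sketch below shows one minimal way to do that with exponential backoff. `call_with_retries` and its parameters are hypothetical and not part of the google-genai SDK, and catching bare `Exception` is a simplification for illustration.

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds or max_attempts is reached.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    between failed attempts and re-raises the last error if all attempts fail.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:  # broad on purpose in this sketch
            last_error = err
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


# Demo with a stand-in function that fails twice, then succeeds.
attempts = {"count": 0}
delays = []  # records the requested waits instead of actually sleeping


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = call_with_retries(flaky, max_attempts=4, base_delay=0.5, sleep=delays.append)
print(result, attempts["count"], delays)
```

In the notebook this could wrap the model call as `call_with_retries(lambda: summarize_chunk(c))`. In real use it would be safer to catch only the error types that are known to be retryable.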

In [None]:
def summarize_text_long(text):
    chunks = chunk_text(text, max_chars=3000)
    summaries = []
    for c in chunks:
        s = summarize_chunk(c)
        summaries.append(s)

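    # Empty input produces no chunks: return early without calling the API.
    if not summaries:
        return ""
    # A single chunk needs no aggregation; otherwise combine the chunk summaries.
    if len(summaries) == 1:
        return summaries[0]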
    else:
        combined = "\n\n".join(summaries)
        final_prompt = SYSTEM_PROMPT + "\n\nCombine the following chunk summaries into a single concise summary:\n\n" + combined
        final_resp = client.models.generate_content(model="gemini-2.5-flash", contents=final_prompt)
        return final_resp.text
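
The chunk-then-combine flow in `summarize_text_long` can be exercised without any API call by passing the summarizer in as a function. The sketch below is a simplified version under that assumption. `summarize_map_reduce` and `fake_summarize` are hypothetical stand-ins, and the prompt text is omitted.

```python
def chunk_text(text, max_chars=3000):
    """Fixed-size splitter, repeated here so this cell runs on its own."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def summarize_map_reduce(text, summarize, max_chars=3000):
    """Summarize each chunk, then summarize the joined chunk summaries.

    `summarize` is any callable str -> str; in the notebook it would wrap
    the model call.
    """
    chunks = chunk_text(text, max_chars=max_chars)
    if not chunks:
        return ""  # empty input: nothing to send to the model
    partials = [summarize(c) for c in chunks]  # one call per chunk
    if len(partials) == 1:
        return partials[0]  # a single chunk needs no second pass
    return summarize("\n\n".join(partials))  # one extra call to combine


calls = []


def fake_summarize(s):
    """Stand-in for the model: records the call, returns the first 10 characters."""
    calls.append(s)
    return s[:10]


result = summarize_map_reduce("x" * 25, fake_summarize, max_chars=10)
print(len(calls), repr(result))
```

A 25-character input with `max_chars=10` yields three chunks, so the stand-in is called three times for the chunks and once more to combine them. Injecting the summarizer this way makes the control flow testable without spending API quota.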

In [None]:
query = "multimodal transformers"
papers = search_arxiv(query, max_results=5)

for p in papers:
    print("\n---")
    print("Title:", p['title'])
    text_to_summarize = p['summary'] # arXiv abstract (short)
    summary = summarize_text_long(text_to_summarize)
    print("Summary:\n", summary)
    print("PDF:", p['pdf_url'])


---
Title: Multimodal Learning with Transformers: A Survey
Summary:
 1) This paper presents a comprehensive survey of Transformer techniques applied to multimodal data, an emerging field in AI research driven by multimodal applications and big data. The survey begins with a background on multimodal learning, the Transformer ecosystem, and the multimodal big data era. It then provides a theoretical review of Vanilla, Vision, and multimodal Transformers from a geometrically topological perspective, followed by an examination of their applications in multimodal pretraining and specific tasks. The paper concludes by summarizing common challenges and designs, and discussing open problems and future research directions for the community.

2)
*   Provides a comprehensive survey specifically focused on Transformer techniques for multimodal data.
*   Includes a theoretical review of various Transformer architectures (Vanilla, Vision, and Multimodal) from a geometrically topological perspective

In [None]:
from google.colab import drive
drive.mount('/content/drive')

out_path = '/content/drive/MyDrive/research_summaries.txt'
with open(out_path, 'w', encoding='utf-8') as f:
    for p in papers:
        f.write("TITLE: " + p['title'] + "\n")
        s = summarize_text_long(p['summary'])
        f.write(s + "\n\n---\n\n")

print("Saved to", out_path)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Saved to /content/drive/MyDrive/research_summaries.txt
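
The save loop above calls `summarize_text_long` again for papers that were already summarized in the previous cell, which repeats every model call. One way to avoid that is to collect the summaries once and write the collected list out. The sketch below assumes a hypothetical `write_summaries` helper and a list of dicts whose `"summary"` key holds the model output rather than the arXiv abstract. The demo entries are placeholders, and a temporary file stands in for Drive.

```python
import os
import tempfile


def write_summaries(entries, out_path):
    """Write precomputed summaries to a text file in the notebook's TITLE/--- layout.

    Hypothetical helper: `entries` is a list of dicts with "title" and
    "summary" keys. Returns the number of entries written.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write("TITLE: " + entry["title"] + "\n")
            f.write(entry["summary"] + "\n\n---\n\n")
    return len(entries)


# Demo with made-up placeholder entries and a temporary file instead of Drive.
demo_entries = [
    {"title": "Paper A", "summary": "Placeholder summary A."},
    {"title": "Paper B", "summary": "Placeholder summary B."},
]
demo_path = os.path.join(tempfile.mkdtemp(), "research_summaries.txt")
written = write_summaries(demo_entries, demo_path)

with open(demo_path, encoding="utf-8") as f:
    saved = f.read()
print(written, saved.count("TITLE: "))
```

In the notebook, the list could be filled inside the earlier printing loop, so each abstract is sent to the model only once.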
