## Setup and Dependencies
This section installs the necessary libraries and configures the Gemini API key.

- google-generativeai: For accessing the Gemini LLM.

- pypdf: For extracting text from PDF files.

- pandas: For tabular data handling. The core logic does not require it; it is included mainly for potential future use.

- ipywidgets: For creating the interactive display of flashcards in a notebook environment.

In [None]:
%pip install google-generativeai pypdf pandas ipywidgets

In [None]:
import google.generativeai as genai
from pypdf import PdfReader
import textwrap
import json
import os

In [None]:
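# Read the key from the environment instead of hardcoding it in the notebook.
# GOOGLE_API_KEY is an assumed variable name; set it before running this cell.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])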

MODEL = "gemini-2.5-flash"

## Core Utilities: I/O and Chunking
These helper functions manage reading the PDF file and splitting the raw text into model-safe chunks.
- read_pdf: Extracts all text from a PDF file page-by-page.
- chunk_text: Splits the long, raw text into smaller sections (default $\approx$ 6000 characters) to avoid exceeding the LLM's context window and to ensure focused summarization.
- call_llm: A simple wrapper function to call the Gemini model with a specified prompt and temperature.

In [None]:
def read_pdf(path):
    """Extracts text from a PDF file."""
    reader = PdfReader(path)
    text = ""
    for page in reader.pages:
        # Guard against pages with no extractable text (e.g. scanned images).
        text += (page.extract_text() or "") + "\n"
    return text


def chunk_text(text, max_chars=6000):
    """Splits long text into model-safe chunks."""
    chunks = []
    current = []
    current_len = 0

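    for line in text.split("\n"):
        # Flush only a non-empty chunk, so an over-long first line
        # does not produce an empty chunk.
        if current and current_len + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current = []
            current_len = 0
        current.append(line)
        current_len += len(line) + 1  # +1 for the newline added by join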

    if current:
        chunks.append("\n".join(current))

    return chunks


def call_llm(prompt, temperature=0.2):
    """Single wrapper for Gemini calls."""
    response = genai.GenerativeModel(MODEL).generate_content(
        prompt,
        generation_config={"temperature": temperature}
    )
    return response.text
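
To make the chunking behaviour concrete, here is a minimal standalone sketch of the same line-based strategy, run with a toy limit. The name `chunk_by_lines` and the sample text are illustrative assumptions, not part of the notebook; this variant also counts the newline separators toward the limit.

```python
def chunk_by_lines(text, max_chars=6000):
    """Group whole lines into chunks of at most max_chars characters."""
    chunks, current, current_len = [], [], 0
    for line in text.split("\n"):
        # Start a new chunk when adding this line would exceed the budget.
        if current and current_len + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, current_len = [], 0
        current.append(line)
        current_len += len(line) + 1  # +1 accounts for the newline separator
    if current:
        chunks.append("\n".join(current))
    return chunks


# Ten 7-character lines, split with a deliberately tiny limit.
sample = "\n".join(f"line {i:02d}" for i in range(10))
chunks = chunk_by_lines(sample, max_chars=24)
for c in chunks:
    print(repr(c))
```

Lines are never split, so a single line longer than `max_chars` still ends up in a chunk of its own that exceeds the limit.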

## Sequential Agent Pipeline
This project uses two distinct LLM agents operating in sequence: the Summarization Agent and the Flashcard Agent.

1. Summarization Agent


This agent takes a chunk of text and transforms it into structured Markdown notes, focusing only on key educational elements (concepts, definitions, examples).

In [None]:
def summarization_agent(chunk):
    prompt = f"""
You are a Summarization Agent.

Summarize the following content into clear, structured study notes.
Focus on:
- Key concepts
- Definitions
- Examples
- Important relationships
- Remove irrelevant text

Return only the notes.

Content:
{chunk}
"""
    return call_llm(prompt)

2. Flashcard Agent


This agent takes the output from the Summarization Agent (the structured notes) and generates concise question-answer pairs in a clean JSON format. This structure is crucial for easy parsing and interactive display.

In [None]:
def flashcard_agent(notes):
    prompt = f"""
You are a Flashcard Agent.

Create concise flashcards from the following study notes.
Output JSON list like:

[
  {{"question": "...", "answer": "..."}},
  ...
]

Notes:
{notes}
"""
    output = call_llm(prompt)
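    try:
        return json.loads(output)
    except json.JSONDecodeError:
        # Fallback: the model sometimes wraps the JSON in a Markdown fence.
        # removeprefix/removesuffix (Python 3.9+) drop exact strings, unlike strip.
        cleaned = output.strip().removeprefix("```json").removeprefix("```")
        return json.loads(cleaned.removesuffix("```").strip())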
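
Since the display step relies on every card having a `question` and an `answer`, it can help to validate the parsed output as well. The following is a minimal sketch, not the notebook's own method: `parse_flashcards` is a hypothetical helper that removes an optional Markdown fence, parses the JSON, and keeps only well-formed cards.

```python
import json
import re


def parse_flashcards(raw):
    """Parse model output into a list of {'question', 'answer'} dicts."""
    text = raw.strip()
    # Drop a surrounding Markdown fence such as ```json ... ``` if present.
    match = re.match(r"^```[a-zA-Z]*\s*\n(.*?)\n?```$", text, flags=re.DOTALL)
    if match:
        text = match.group(1)
    cards = json.loads(text)
    if not isinstance(cards, list):
        raise ValueError("expected a JSON list of flashcards")
    # Keep only entries that carry both required keys.
    return [
        c for c in cards
        if isinstance(c, dict) and "question" in c and "answer" in c
    ]


fenced = "```json\n[{\"question\": \"Q1\", \"answer\": \"A1\"}, {\"question\": \"Q2\"}]\n```"
print(parse_flashcards(fenced))
```

Cards missing either key are silently dropped here; raising an error instead would be an equally reasonable choice if incomplete output should stop the run.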

## Main Processing Function

The process_file function orchestrates the entire workflow: reading, chunking, running agents sequentially, and combining results.

In [None]:
def process_file(file_path):
    print("Reading file...")
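    if file_path.endswith(".pdf"):
        text = read_pdf(file_path)
    else:
        # Close the file promptly; UTF-8 is assumed for plain-text input.
        with open(file_path, encoding="utf-8") as f:
            text = f.read()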

    print("Chunking...")
    chunks = chunk_text(text)

    print(f"Processing {len(chunks)} chunks...\n")
    all_notes = []

    # Run the summarization agent sequentially
    for i, chunk in enumerate(chunks):
        print(f"Summarizing chunk {i+1} / {len(chunks)}...")
        summary = summarization_agent(chunk)
        all_notes.append(summary)

    full_notes = "\n\n".join(all_notes)

    print("Generating flashcards...")
    flashcards = flashcard_agent(full_notes)

    return full_notes, flashcards
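
The data flow of the pipeline can be checked without any API calls by substituting stand-ins for the two agents. This is a sketch under stated assumptions: `run_pipeline`, `fake_summarize`, and `fake_make_cards` are hypothetical names that only mimic the shape of the agents' inputs and outputs.

```python
def run_pipeline(chunks, summarize, make_cards):
    """Summarize each chunk in order, join the notes, then build flashcards once."""
    notes = [summarize(chunk) for chunk in chunks]
    full_notes = "\n\n".join(notes)
    return full_notes, make_cards(full_notes)


# Stand-ins for the two agents so the flow can be exercised offline.
def fake_summarize(chunk):
    return f"- note about: {chunk}"


def fake_make_cards(notes):
    return [
        {"question": f"What is note {i + 1}?", "answer": block}
        for i, block in enumerate(notes.split("\n\n"))
    ]


demo_notes, demo_cards = run_pipeline(
    ["chunk A", "chunk B"], fake_summarize, fake_make_cards
)
print(demo_notes)
print(len(demo_cards))
```

Passing the agents in as arguments keeps the orchestration logic separate from the model calls, which makes it easy to test on its own.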

In [None]:
full_notes, flashcards = process_file('sample.pdf')

## Interactive Display (ipywidgets)
This final section uses ipywidgets and IPython.display to render the results interactively in the Jupyter Notebook.

- The Structured Notes are displayed directly using Markdown formatting.

- The Flashcards are displayed in an Accordion widget with one collapsible panel per card; expanding a panel reveals its question and answer, which supports active recall.

In [None]:
from IPython.display import Markdown, display
from ipywidgets import Accordion, HTML, VBox
import pandas as pd

def show_notes(notes):
    display(Markdown(f"## 📘 Study Notes\n\n{notes}"))

def show_flashcards(cards):
    items = []
    for c in cards:
        q = HTML(f"<b>Q:</b> {c['question']}")
        a = HTML(f"<b>A:</b> {c['answer']}")
        items.append(VBox([q, a]))

    accordion = Accordion(children=items)

    for i, c in enumerate(cards):
        accordion.set_title(i, f"Card {i+1}")

    return accordion

In [None]:
show_notes(full_notes)

In [None]:
show_flashcards(flashcards)
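
The card text comes from the model and is interpolated into HTML, so characters such as `<` or `&` could break the rendering. Here is a minimal sketch of escaping the text first with the standard library's `html.escape`; `card_to_html` is a hypothetical helper, and its two return values are meant to be passed to the `HTML` widgets.

```python
import html


def card_to_html(card):
    """Return escaped HTML strings for a card's question and answer."""
    question = html.escape(str(card["question"]))
    answer = html.escape(str(card["answer"]))
    return f"<b>Q:</b> {question}", f"<b>A:</b> {answer}"


q_html, a_html = card_to_html({"question": "Is a < b?", "answer": "Yes & no"})
print(q_html)
print(a_html)
```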