# AI STUDY PLANNER


## Setup: Install required packages

In [1]:
!pip install -q google-generativeai gradio pandas fpdf pypdf

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pandas-profiling 3.2.0 requires joblib~=1.1.0, but you have joblib 1.4.2 which is incompatible.
anaconda-cloud-auth 0.1.3 requires pydantic<2.0, but you have pydantic 2.12.4 which is incompatible.
datasets 2.12.0 requires huggingface-hub<1.0.0,>=0.11.0, but you have huggingface-hub 1.1.5 which is incompatible.
jupyter-server 1.23.4 requires anyio<4,>=3.1.0, but you have anyio 4.11.0 which is incompatible.
s3fs 2023.4.0 requires fsspec==2023.4.0, but you have fsspec 2025.10.0 which is incompatible.
transformers 4.32.1 requires huggingface-hub<1.0,>=0.15.1, but you have huggingface-hub 1.1.5 which is incompatible.


## Imports: Libraries and modules

**Key modules:**
- `google.generativeai`: Gemini client (for LLM calls)
- `gradio`: lightweight UI for demoing the agent
- `pandas`: schedule and analytics data structures
- `pypdf` / `PdfReader`: reading uploaded PDF syllabi
- `FPDF`: generating a printable PDF report


In [2]:
import os
import json
import re
import datetime
import pandas as pd
import gradio as gr
import google.generativeai as genai
from pypdf import PdfReader
from fpdf import FPDF
from getpass import getpass

## API Key Setup (secure input)

**Inputs:** Interactive prompt from `getpass()`; the key is not echoed to the screen.  
**Outputs:** `genai` client configured for subsequent LLM calls.

**Security notes:**
- **Never** store API keys in the notebook as plaintext or commit them to version control.
- For automated deployments, use environment variables or a secret manager instead of interactive prompts.
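
For the non-interactive case mentioned in the security notes, a minimal sketch of reading the key from the environment might look like the following. The variable name `GEMINI_API_KEY` and the helper `load_gemini_key` are assumptions made for illustration; the notebook itself does not define them.

```python
import os

def load_gemini_key(env_var="GEMINI_API_KEY"):
    """Return the API key from the environment, or None if unset or blank.

    `env_var` is a hypothetical name; use whatever your deployment defines.
    """
    key = os.environ.get(env_var, "").strip()
    return key or None
```

In the notebook this could be combined with the interactive prompt, for example `api_key = load_gemini_key() or getpass(...)`, so the prompt only appears when no environment value is present.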

In [12]:
# --- API KEY SETUP ---
api_key = getpass("Enter your Google Gemini API Key: ")
genai.configure(api_key=api_key)

Enter your Google Gemini API Key: ··········


## Configuration: Model & generation settings

**Purpose:** Centralized configuration for LLM usage and generation behavior.

**Key values:**
- `MODEL_NAME`: Gemini model used for planning.
- `GENERATION_CONFIG`: low-variance settings (temperature 0.0), JSON response type, and output token limit.

**Why:** Keeping these parameters in one place enables easy tuning (e.g., more creative plans vs deterministic structured JSON output). Use `temperature=0.0` when you need reliable, schema-conformant responses.
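
As a rough illustration of why a single base config is convenient, the sketch below derives a looser variant without touching the base values. The names `BASE_GENERATION_CONFIG`, `with_overrides`, and `creative_config`, and the temperature of 0.8, are hypothetical and chosen only for the example.

```python
# Same keys and values as the notebook's GENERATION_CONFIG
BASE_GENERATION_CONFIG = {
    "temperature": 0.0,
    "response_mime_type": "application/json",
    "max_output_tokens": 25000,
}

def with_overrides(base, **overrides):
    """Return a new config dict; the base dict is left unmodified."""
    return {**base, **overrides}

# Hypothetical variant: higher temperature for more varied plans
creative_config = with_overrides(BASE_GENERATION_CONFIG, temperature=0.8)
```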

In [13]:
# --- CONFIGURATION ---
MODEL_NAME = "gemini-2.5-flash-preview-09-2025"
GENERATION_CONFIG = {
    "temperature": 0.0,  # Deterministic for JSON structure
    "response_mime_type": "application/json",
    "max_output_tokens": 25000,
}

## File processing helpers

**Purpose:** Utility function(s) to extract text from uploaded files (PDF, TXT, CSV, MD, HTML).

**Function:**
- `extract_text_from_file(filepath)`: returns the file's text for the planner to ingest.

**Inputs:** Path to a local file (uploaded syllabus or notes).  
**Outputs:** Plain text string extracted from the file.

**Notes:**
- PDF parsing uses `pypdf.PdfReader`; results depend on PDF encoding.
- We truncate long file contents when sending them to the LLM to avoid token limits.
- Errors are caught and returned as descriptive strings to aid debugging.
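
Because failures come back as ordinary strings, a caller may want to tell them apart from real content before building a prompt. The sketch below assumes the two bracketed prefixes used by `extract_text_from_file`; the helper names `is_extraction_error` and `usable_snippet` are hypothetical.

```python
# Prefixes of the descriptive strings returned on failure
ERROR_PREFIXES = ("[Unsupported file type:", "[Error reading file:")

def is_extraction_error(text):
    """True if the string looks like an error marker rather than file content."""
    return text.startswith(ERROR_PREFIXES)

def usable_snippet(text, limit=10000):
    """Return the text truncated to `limit` characters, or None on failure."""
    if is_extraction_error(text):
        return None
    return text[:limit]
```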

In [14]:
# --- FILE PROCESSING HELPERS ---
def extract_text_from_file(filepath):
    """Extracts text from PDF, TXT, MD, CSV, or HTML files."""
    try:
        ext = os.path.splitext(filepath)[1].lower()
        if ext == '.pdf':
            reader = PdfReader(filepath)
            text = ""
            for page in reader.pages:
                text += (page.extract_text() or "") + "\n"  # Guard against pages with no extractable text
            return text
        elif ext in ['.txt', '.md', '.csv', '.html']:
            with open(filepath, 'r', encoding='utf-8') as f:
                return f.read()
        else:
            return f"[Unsupported file type: {ext}]"
    except Exception as e:
        return f"[Error reading file: {str(e)}]"

## Export helpers: HTML & PDF generation

**Purpose:** Convert the generated plan JSON into human-friendly artifacts:
- An HTML file for quick viewing
- A printable PDF for offline use / submission

**Function:**
- `export_plan_to_files(plan_json)`: creates `<base_name>.html` and `<base_name>.pdf` and returns their filenames.

**Inputs:** `plan_json` (structure returned by the PlannerAgent)  
**Outputs:** `["<base>.pdf", "<base>.html"]`

**Notes:**
- HTML is generated using `pandas.DataFrame.to_html()` for fast rendering.
- PDF uses `FPDF` with a straightforward layout; replace with wkhtmltopdf/WeasyPrint for higher-fidelity styling.
- Long topic strings are truncated for PDF width constraints.
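
Since `suggested_filename_base` comes from the model, it may be worth reducing it to a conservative filename stem before writing files. The following is a minimal sketch under that assumption; `safe_base_name` and its defaults are hypothetical and not part of the notebook.

```python
import re

def safe_base_name(raw, default="study_plan", max_len=60):
    """Reduce an untrusted string to a conservative filename stem."""
    # Collapse anything other than ASCII letters, digits, underscore, or hyphen
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", str(raw)).strip("_")
    # Fall back to the default when nothing usable remains
    return cleaned[:max_len] or default
```

One possible use would be `base_name = safe_base_name(plan_json.get("suggested_filename_base", "study_plan"))` at the top of the export function.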

In [15]:
# --- EXPORT HELPERS ---
def export_plan_to_files(plan_json):
    """Generates HTML and PDF files from the plan JSON."""
    base_name = plan_json.get("suggested_filename_base", "study_plan")
    schedule = plan_json.get("schedule", [])

    # 1. Generate HTML
    df = pd.DataFrame(schedule)
    html_filename = f"{base_name}.html"
    html_content = f"<h1>Study Plan: {base_name}</h1>"
    html_content += f"<div>{plan_json.get('plan_summary', '')}</div><hr>"
    if not df.empty:
        html_content += df.to_html(index=False, classes='table table-striped')
    # Always write the file: its name is returned below even when the schedule is empty
    with open(html_filename, "w", encoding='utf-8') as f:
        f.write(html_content)

    # 2. Generate PDF
    pdf_filename = f"{base_name}.pdf"
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=12)
    pdf.cell(200, 10, txt=f"Study Plan: {base_name}", ln=True, align='C')
    pdf.ln(10)

    # Add Summary (Plain text approximation)
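    summary_text = re.sub('<[^<]+?>', '', plan_json.get('plan_summary', '')) # Strip HTML
    summary_text = summary_text.encode('latin-1', 'replace').decode('latin-1') # Core FPDF fonts are Latin-1 only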
    pdf.set_font("Arial", size=10)
    pdf.multi_cell(0, 10, txt=summary_text)
    pdf.ln(10)

    # Add Schedule Rows
    pdf.set_font("Arial", 'B', 10)
    pdf.cell(30, 10, "Date", 1)
    pdf.cell(90, 10, "Topic", 1)
    pdf.cell(30, 10, "Status", 1)
    pdf.ln()

    pdf.set_font("Arial", size=9)
    for item in schedule:
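        # Core FPDF fonts are Latin-1 only: replace characters they cannot encode
        date_str = str(item.get('date', '')).encode('latin-1', 'replace').decode('latin-1')
        topic_str = str(item.get('topic', ''))[:45].encode('latin-1', 'replace').decode('latin-1') # Truncate for PDF width
        status_str = str(item.get('status', '')).encode('latin-1', 'replace').decode('latin-1')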

        pdf.cell(30, 10, date_str, 1)
        pdf.cell(90, 10, topic_str, 1)
        pdf.cell(30, 10, status_str, 1)
        pdf.ln()

    pdf.output(pdf_filename)

    return [pdf_filename, html_filename]

## System prompt & JSON schema for the PlannerAgent

**Purpose:** Provide the LLM with a strict system instruction and the exact JSON schema it must return.

**Why:** A detailed system prompt reduces hallucination and enforces a consistent output structure the notebook can parse programmatically.

**Contents:**
- Role & objective: PlannerAgent acts as an academic coach.
- Operational rules: input analysis, scheduling rules (session durations, no overlaps), recalibration behavior.
- JSON schema: defines `plan_summary`, `suggested_filename_base`, `stats`, and `schedule` item fields.

**Important:** Keep this prompt stable for reproducibility. If Gemini output deviates, first try lowering `temperature` or tightening schema instructions.
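
Even with a strict prompt, it can help to check the parsed response against the schema before using it. The sketch below is one possible check; `find_schema_problems` is a hypothetical helper, and treating `study_tip` and `source_reference` as optional is an assumption rather than something the notebook states.

```python
REQUIRED_TOP_LEVEL = ("plan_summary", "suggested_filename_base", "stats", "schedule")
# Assumption: study_tip and source_reference are treated as optional here
REQUIRED_SESSION_FIELDS = (
    "id", "subject", "topic", "date",
    "start_time", "end_time", "duration_minutes", "status",
)

def find_schema_problems(plan):
    """Return a list of human-readable problems; an empty list means none found."""
    if not isinstance(plan, dict):
        return ["plan is not a JSON object"]
    problems = [f"missing top-level key: {key}"
                for key in REQUIRED_TOP_LEVEL if key not in plan]
    schedule = plan.get("schedule", [])
    if not isinstance(schedule, list):
        problems.append("schedule is not a list")
        return problems
    for index, session in enumerate(schedule):
        if not isinstance(session, dict):
            problems.append(f"schedule[{index}] is not an object")
            continue
        for field in REQUIRED_SESSION_FIELDS:
            if field not in session:
                problems.append(f"schedule[{index}] missing field: {field}")
    return problems
```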

In [16]:
SYSTEM_PROMPT = """
## ROLE & OBJECTIVE
You are PlannerAgent, an expert academic coach. Analyze user goals and uploaded materials to generate a structured study plan.

## OPERATIONAL RULES
1. **Input Analysis:** Identify subjects, deadlines, and weak areas.
2. **Scheduling:**
   - Break subjects into sessions (45-90 mins).
   - Assign dates/times based on "Current Date".
   - Ensure logical progression. NO overlaps.
3. **Recalibration:**
   - If recalibrating: KEEP 'Completed' sessions fixed.
   - Reschedule 'Missed' or 'Pending' sessions to the future. If 'start_time' or 'end_time' is in the past, update it to be in the future, maintaining the duration.
4. **Output:** Return **ONLY** a single, valid JSON object.

## JSON SCHEMA
{
  "plan_summary": "Rich-text summary (HTML allowed). IMPORTANT: Ensure all double quotes within HTML attribute values are properly escaped for JSON, e.g., attribute=\\\"value\\\".",
  "suggested_filename_base": "plan_name_YYYY",
  "stats": { "total_sessions": 0, "estimated_hours": 0, "subjects_count": 0 },
  "schedule": [
    {
      "id": 1,
      "subject": "Math",
      "topic": "Calculus I",
      "date": "YYYY-MM-DD",
      "start_time": "HH:MM",
      "end_time": "HH:MM",
      "duration_minutes": 60,
      "status": "Pending",
      "study_tip": "Tip text",
      "source_reference": "syllabus.pdf"
    }
  ]
}
"""


## PlannerAgent class: generate and recalibrate study plans

**Purpose:** Encapsulate the LLM call workflow and translate user goals + file context into a validated plan JSON.

**Key method:**
- `generate_plan(user_goal, file_paths, current_plan_json=None)`
  - If `current_plan_json` provided: runs recalibration mode (preserve Completed sessions and reschedule Missed/Pending).
  - Otherwise: creates an initial plan from the natural language goal and any uploaded files.

**Process steps:**
1. Read uploaded files and add truncated snippets (first 10,000 characters per file) to the prompt to avoid token overflow.
2. Create a prompt that includes current date, user goal, and optional recalibration instructions.
3. Call the Gemini model with the `SYSTEM_PROMPT` and `GENERATION_CONFIG`.
4. Parse the returned JSON and return `plan_data`.

**Error handling:** Returns a dict with `error` and `plan_summary` keys if the model call or JSON parsing fails.

**Notes for reviewers:**
- This is the LLM interface layer; keep it minimal and validate outputs before using them downstream.
- For local testing, consider mocking `self.model.generate_content` to avoid API calls.
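
The mocking suggestion could be sketched as follows. To stay self-contained, the example uses a standalone `parse_reply` function that mirrors the clean-then-parse step rather than the real class; `parse_reply`, `fake_model`, and the canned reply are all hypothetical.

```python
import json
from types import SimpleNamespace
from unittest.mock import MagicMock

FENCE = "`" * 3  # Markdown code-fence marker the model sometimes wraps JSON in

def parse_reply(model, prompt):
    """Call the model, strip any Markdown fences, and parse the JSON body."""
    response = model.generate_content(prompt)
    clean_text = response.text.replace(FENCE + "json", "").replace(FENCE, "").strip()
    return json.loads(clean_text)

# Stand-in for the Gemini model: returns a canned, fenced JSON reply
fake_model = MagicMock()
fake_model.generate_content.return_value = SimpleNamespace(
    text=FENCE + 'json\n{"plan_summary": "demo", "schedule": []}\n' + FENCE
)

plan = parse_reply(fake_model, "Current Date: 2025-01-01 09:00\n\nUser Goal: demo")
```

With the real class, the same idea would presumably be applied by assigning a mock to `agent.model` before calling `agent.generate_plan(...)`, so no request leaves the machine.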

In [17]:
class PlannerAgent:
    def __init__(self):
        self.model = genai.GenerativeModel(
            model_name=MODEL_NAME,
            system_instruction=SYSTEM_PROMPT
        )

    def generate_plan(self, user_goal, file_paths, current_plan_json=None):
        """Generates or Recalibrates a plan."""

        # 1. Process Files
        file_context = ""
        if file_paths:
            file_context = "\n--- UPLOADED FILES CONTENT ---\n"
            for fpath in file_paths:
                fname = os.path.basename(fpath)
                content = extract_text_from_file(fpath)
                # Limit content length to avoid token limits (approx 10k chars per file)
                file_context += f"Filename: {fname}\nContent: {content[:10000]}\n...\n"

        # 2. Build Prompt
        current_date = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
        prompt = f"Current Date: {current_date}\n\nUser Goal: {user_goal}\n{file_context}"

        if current_plan_json:
            prompt += "\n\n--- RECALIBRATION REQUEST ---\n"
            prompt += "The user wants to recalibrate this plan. Preserve 'Completed' sessions. "
            prompt += "Reschedule 'Missed' and 'Pending' sessions to the future starting from Current Date. "
            prompt += "If a session's 'start_time' or 'end_time' is in the past, update it to be in the future while maintaining the original duration.\n"
            prompt += json.dumps(current_plan_json)

        # 3. Call Gemini
        try:
            response = self.model.generate_content(prompt, generation_config=GENERATION_CONFIG)

            # Clean response (remove markdown code blocks if present)
            clean_text = response.text.replace("```json", "").replace("```", "").strip()

            plan_data = json.loads(clean_text)
            return plan_data

        except json.decoder.JSONDecodeError as e:
            print(f"JSON Decode Error: {e}")
            print(f"Problematic text: {clean_text[:500]}...") # Print a snippet of the problematic text
            return {"error": f"Failed to parse AI response as JSON: {e}", "plan_summary": "⚠️ Generation Failed due to invalid AI response. Please try again."}
        except Exception as e:
            error_message = f"An unexpected error occurred: {str(e)}"
            if "503" in str(e):
                error_message = "Gemini API service unavailable (503). Please try again later or check API status." \
                                "\nIt could also mean the model name is incorrect or your API key has issues."
            elif "authentication" in str(e).lower() or "api key" in str(e).lower():
                error_message = "Authentication failed. Please check your Google Gemini API key."
            elif "blocked" in str(e).lower():
                error_message = "Prompt blocked by safety settings or content policy. Please revise your query."

            print(f"Agent Error: {error_message}")
            return {"error": error_message, "plan_summary": "⚠️ Generation Failed. Please try again."}

agent = PlannerAgent()


## Gradio UI: Interactive Planner demo

**Purpose:** Provide a simple, interactive web interface for:
- Entering plain-English study goals
- Uploading syllabus files
- Generating a plan
- Marking sessions Completed or Missed
- Triggering AI-driven recalibration
- Downloading HTML/PDF exports
- Viewing basic stats and charts

**Major components:**
- **Plan Setup tab:** text input, file upload, Generate button, summary preview, download links
- **Tracker tab:** stats (total/completed/missed/pending), session update controls, charts, schedule table
- **Details tab:** inspect full session JSON

**Key functions wired to UI:**
- `handle_generation`: calls `agent.generate_plan()` and updates the UI
- `update_dashboard`: transforms plan JSON into DataFrame, stats, charts, download files
- `update_session_status`: updates in-memory plan state (Completed/Missed) without API call
- `handle_recalibration`: sends the current plan back to the agent to get an updated schedule

**Notes:**
- The UI uses `gr.State` to persist the plan in memory during the session.
- For production, persist state to a database and secure the Gemini key using environment secrets.
- This cell wires all UI events; review callback signatures carefully when adding features.
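
The tracker statistics reduce to counting session statuses. A dependency-free sketch of that calculation is shown below; `summarize_statuses` is a hypothetical helper, and it follows the notebook's convention that anything neither Completed nor Missed counts as pending.

```python
from collections import Counter

def summarize_statuses(schedule):
    """Count sessions by status for the tracker's four stat boxes."""
    counts = Counter(session.get("status", "Pending") for session in schedule)
    total = len(schedule)
    done = counts["Completed"]
    missed = counts["Missed"]
    return {
        "total": total,
        "completed": done,
        "missed": missed,
        "pending": total - done - missed,  # everything not Completed or Missed
    }
```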

In [18]:
def planner_ui():
    with gr.Blocks(theme=gr.themes.Soft(primary_hue="indigo"), title="PlannerAgent") as demo:

        # --- STATE MANAGEMENT ---
        # Stores the full JSON plan in memory
        plan_state = gr.State({})

        # --- HEADER ---
        gr.Markdown("# 🎓 AI Study Planner")

        with gr.Tabs():

            # ================= TAB 1: SETUP =================
            with gr.Tab("📝 Plan Setup"):
                with gr.Row():
                    with gr.Column(scale=2):
                        user_goal = gr.Textbox(
                            label="Study Goals & Constraints",
                            placeholder="E.g., I have a Data Structures exam on Dec 15th. I can study 2 hours every evening...",
                            lines=5
                        )
                        file_upload = gr.File(
                            label="Upload Syllabus or Notes (PDF/TXT)",
                            file_count="multiple",
                            file_types=[".pdf", ".txt", ".csv", ".md"]
                        )
                        generate_btn = gr.Button("🚀 Generate New Plan", variant="primary")

                    with gr.Column(scale=1):
                        error_message_display = gr.Markdown("", elem_classes="error-message") # New component for errors
                        summary_box = gr.HTML(label="Strategy Summary", value="<p><i>Plan summary will appear here...</i></p>")
                        with gr.Row(visible=False) as download_row:
                            download_files = gr.File(label="Download Plan", interactive=False)

            # ================= TAB 2: TRACKER =================
            with gr.Tab("📊 Tracker"):
                # Statistics Row
                with gr.Row():
                    stat_total = gr.Number(label="Total Sessions", value=0)
                    stat_done = gr.Number(label="Completed", value=0)
                    stat_missed = gr.Number(label="Missed", value=0)
                    stat_pending = gr.Number(label="Pending", value=0)

                # Download option for Tracker tab
                download_row_tracker = gr.Row(visible=False)
                with download_row_tracker:
                    download_files_tracker = gr.File(label="Download Current Plan", interactive=False)

                with gr.Row():
                    # Left: Interactive Actions
                    with gr.Column(scale=1):
                        gr.Markdown("### Session Actions")
                        session_selector = gr.Dropdown(label="Select Session to Update", choices=[], type="value")

                        with gr.Row():
                            mark_done_btn = gr.Button("✅ Complete", size="sm")
                            mark_missed_btn = gr.Button("❌ Missed", size="sm")

                        gr.Markdown("---")
                        recalibrate_btn = gr.Button("🔄 Recalibrate Schedule (AI)", variant="secondary")
                        gr.Markdown("*Recalibrating sends your progress back to Gemini to reorganize future tasks.*")

                    # Right: The Schedule Table
                    with gr.Column(scale=3):
                        schedule_df = gr.Dataframe(
                            headers=["ID", "Date", "Subject", "Topic", "Status"],
                            datatype=["number", "str", "str", "str", "str"],
                            interactive=False,
                            label="Current Schedule"
                        )

                # Charts Row
                with gr.Row():
                    status_plot = gr.BarPlot(
                        x="Status",
                        y="Count",
                        title="Progress Overview",
                        tooltip=["Status", "Count"],
                        y_lim=[0, 20],
                        width=300
                    )
                    subject_plot = gr.BarPlot(
                        x="Subject",
                        y="Count",
                        title="Subject Distribution",
                        tooltip=["Subject", "Count"],
                        y_lim=[0, 20],
                        width=300
                    )

            # ================= TAB 3: DETAILS =================
            with gr.Tab("📅 Details"):
                gr.Markdown("### Selected Session Details")
                detail_json = gr.JSON(label="Full Session Data")

        # ================= LOGIC FUNCTIONS =================

        def update_dashboard(plan_data):
            """Refreshes all UI components based on the current JSON state."""
            if not plan_data or "error" in plan_data:
                plan_data = plan_data or {}  # Tolerate None as well as an empty dict
                return [
                    plan_data.get("plan_summary", ""), # summary_box
                    gr.update(visible=False), None, # Downloads Setup Tab
                    0, 0, 0, 0, # Stats
                    pd.DataFrame(), gr.update(choices=[], value=None), # Table, Dropdown
                    None, None, # Plots
                    {}, # plan_state (reset)
                    gr.update(value=plan_data.get("error", "")), # error_message_display
                    gr.update(visible=False), None # Downloads Tracker Tab
                ]

            # 1. Prepare Data
            schedule = plan_data.get("schedule", [])
            df = pd.DataFrame(schedule)
            summary = plan_data.get("plan_summary", "No summary available.")

            # 2. Stats
            total = len(df)
            done = len(df[df['status'] == 'Completed']) if not df.empty else 0
            missed = len(df[df['status'] == 'Missed']) if not df.empty else 0
            pending = total - done - missed

            # 3. Plots Data Preparation
            if not df.empty:
                status_counts = df['status'].value_counts().reset_index()
                status_counts.columns = ['Status', 'Count']

                subject_counts = df['subject'].value_counts().reset_index()
                subject_counts.columns = ['Subject', 'Count']

                # Dropdown choices
                dropdown_choices = [f"{row['id']} - {row['topic'][:30]}..." for _, row in df.iterrows()]
            else:
                status_counts = pd.DataFrame({'Status': [], 'Count': []})
                subject_counts = pd.DataFrame({'Subject': [], 'Count': []})
                dropdown_choices = []

            # 4. Table view (subset of columns)
            display_df = df[['id', 'date', 'subject', 'topic', 'status']] if not df.empty else pd.DataFrame()

            # 5. Exports
            files = export_plan_to_files(plan_data)

            return [
                summary,
                gr.update(visible=True), files,
                total, done, missed, pending,
                display_df,
                gr.update(choices=dropdown_choices),
                status_counts,
                subject_counts,
                plan_data, # Update state
                gr.update(value=""), # Clear error message on success
                gr.update(visible=True), files # Downloads Tracker Tab
            ]

        def handle_generation(goal, files):
            if not goal:
                # Change only error_message_display (index 12 of the 15 outputs); leave the rest untouched
                return [gr.update()] * 12 + [gr.update(value="Please enter a goal."), gr.update(), gr.update()]

            # Call Agent
            plan = agent.generate_plan(goal, files)

            return update_dashboard(plan)

        def handle_recalibration(current_plan):
            if not current_plan or not current_plan.get("schedule"):
                # Change only error_message_display (index 12 of the 15 outputs); leave the rest untouched
                return [gr.update()] * 12 + [gr.update(value="No plan to recalibrate."), gr.update(), gr.update()]

            # Call Agent with existing plan context
            new_plan = agent.generate_plan("Recalibrate based on status updates.", [], current_plan_json=current_plan)

            return update_dashboard(new_plan)

        def update_session_status(current_plan, selected_str, new_status):
            """Updates local JSON state without calling API."""
            if not selected_str or not current_plan:
                return update_dashboard(current_plan) # No change

            # Extract ID from string "1 - Topic..."
            try:
                s_id = int(selected_str.split(' - ')[0])

                # Update in memory
                for session in current_plan['schedule']:
                    if session['id'] == s_id:
                        session['status'] = new_status
                        break

                return update_dashboard(current_plan)
            except (ValueError, KeyError, TypeError):
                return update_dashboard(current_plan)

        def show_session_details(current_plan, selected_str):
            if not selected_str or not current_plan:
                return {}
            try:
                s_id = int(selected_str.split(' - ')[0])
                for session in current_plan['schedule']:
                    if session['id'] == s_id:
                        return session
                return {}
            except (ValueError, KeyError, TypeError):
                return {}

        # ================= EVENT WIRING =================

        # 1. Generate Plan
        generate_btn.click(
            handle_generation,
            inputs=[user_goal, file_upload],
            outputs=[
                summary_box, download_row, download_files,
                stat_total, stat_done, stat_missed, stat_pending,
                schedule_df, session_selector, status_plot, subject_plot,
                plan_state, error_message_display,
                download_row_tracker, download_files_tracker
            ]
        )

        # 2. Update Status (Completed)
        mark_done_btn.click(
            lambda p, s: update_session_status(p, s, "Completed"),
            inputs=[plan_state, session_selector],
            outputs=[
                summary_box, download_row, download_files,
                stat_total, stat_done, stat_missed, stat_pending,
                schedule_df, session_selector, status_plot, subject_plot,
                plan_state, error_message_display,
                download_row_tracker, download_files_tracker
            ]
        )

        # 3. Update Status (Missed)
        mark_missed_btn.click(
            lambda p, s: update_session_status(p, s, "Missed"),
            inputs=[plan_state, session_selector],
            outputs=[
                summary_box, download_row, download_files,
                stat_total, stat_done, stat_missed, stat_pending,
                schedule_df, session_selector, status_plot, subject_plot,
                plan_state, error_message_display,
                download_row_tracker, download_files_tracker
            ]
        )

        # 4. Recalibrate
        recalibrate_btn.click(
            handle_recalibration,
            inputs=[plan_state],
            outputs=[
                summary_box, download_row, download_files,
                stat_total, stat_done, stat_missed, stat_pending,
                schedule_df, session_selector, status_plot, subject_plot,
                plan_state, error_message_display,
                download_row_tracker, download_files_tracker
            ]
        )

        # 5. Show Details on Dropdown Change
        session_selector.change(
            show_session_details,
            inputs=[plan_state, session_selector],
            outputs=[detail_json]
        )

    return demo

# Launch the App
app = planner_ui()
app.launch(debug=True)




It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://e0c2dea8d6f6af81ec.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


ERROR:tornado.access:503 POST /v1beta/models/gemini-2.5-flash-preview-09-2025:generateContent?%24alt=json%3Benum-encoding%3Dint (::1) 50854.34ms


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://e0c2dea8d6f6af81ec.gradio.live


