<a href="https://colab.research.google.com/github/Pakeetharan/ai-study-guide/blob/main/Study_Guide_Generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📚 AI Study Guide Generator
**Turn your lecture slides and textbooks into professional exam notes and practice questions.**

### **How to use this tool:**
1. **Check Settings:** In the top menu, go to `Runtime` -> `Change runtime type` and make sure **T4 GPU** is selected.
2. **Initialize:** Click the **Play** button on **Step 1** below. Wait for it to say "System Ready" (~2 mins).
3. **Upload & Run:** Click the **Play** button on **Step 2**.
    * You will be asked to connect to **Google Drive** so the final PDF can be saved there.
    * Click **"Choose Files"** to upload your PDFs. You can upload multiple files (e.g., *Week1.pdf, Week2.pdf*) at once.
4. **Get Results:** The AI will analyze each document separately and save a `Study_Guide_TIMESTAMP.pdf` into your Google Drive folder: `My Drive > AI_Study_Notes`.
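
As a rough illustration of the naming in step 4, the output path can be sketched as below. The helper name `build_output_path` is hypothetical; the folder and the `%Y%m%d_%H%M%S` timestamp format follow the Step 2 cell.

```python
import os
from datetime import datetime

def build_output_path(folder, now=None):
    """Return a timestamped PDF path inside `folder` (hypothetical helper)."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    return os.path.join(folder, f"Study_Guide_{timestamp}.pdf")

# A fixed datetime is used here only so the example is reproducible.
example = build_output_path("/content/drive/My Drive/AI_Study_Notes",
                            datetime(2024, 1, 31, 9, 5, 0))
print(example)
```

Because the timestamp goes down to the second, each run writes a new file instead of overwriting an earlier guide.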

---
**💡 Pro Tip:** Upload separate PDF files for each lecture topic instead of merging them. This helps the AI generate specific practice questions for every single topic.
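
One way to see why separate files help: the Step 2 cell builds practice questions from a single window of text per uploaded file, so a merged PDF gets one window for all of its topics. A minimal sketch of that sampling follows; `sample_window` is a hypothetical name, and the quarter-point start mirrors the slice used in the cell.

```python
def sample_window(text, size):
    """Return up to `size` characters starting a quarter of the way into `text`."""
    start = len(text) // 4
    return text[start:start + size]

merged = "A" * 100 + "B" * 100            # two topics merged into one file
print(set(sample_window(merged, 40)))     # only topic A is sampled
print(set(sample_window("B" * 100, 40)))  # a separate file for B gets its own window
```

With the merged input, topic B never reaches the question prompt; uploaded on its own, it does.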

In [4]:
# @title üöÄ Step 1: Initialize System
# @markdown Installs the AI engine, OCR tools, and PDF processors. It takes about **2 minutes**.
# @markdown You only need to run this once per session.

import os, sys, subprocess
import logging, warnings

logging.getLogger("pdfminer").setLevel(logging.ERROR)
warnings.filterwarnings("ignore")

print("⏳ Installing System Dependencies & Fonts...")
with open(os.devnull, 'w') as devnull:
    subprocess.run(["apt-get", "update"], stdout=devnull, stderr=devnull)
    # Added 'fonts-roboto' for better typography
    subprocess.run(["apt-get", "install", "-y", "tesseract-ocr", "poppler-utils",
                    "libcairo2", "libpango-1.0-0", "libgdk-pixbuf2.0-0", "libffi-dev",
                    "fonts-roboto"], stdout=devnull, stderr=devnull)

    pkgs = [
        "transformers", "accelerate", "bitsandbytes", "langchain-huggingface",
        "langchain-text-splitters", "langchain-community", "langchain-core",
        "pdfplumber", "pdf2image", "pytesseract", "markdown", "weasyprint",
        "tiktoken", "tqdm", "numpy"
    ]
    subprocess.run([sys.executable, "-m", "pip", "install"] + pkgs, stdout=devnull, stderr=devnull)

import torch
import pdfplumber
import pytesseract
import markdown
from datetime import datetime
from tqdm import tqdm
from pdf2image import convert_from_path
from google.colab import files, drive
from weasyprint import HTML, CSS
from weasyprint.text.fonts import FontConfiguration
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
from langchain_huggingface import HuggingFacePipeline
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import PromptTemplate

print("⏳ Loading Llama-3-8B (Context-Aware Mode)...")
model_id = "NousResearch/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16, bnb_4bit_use_double_quant=True
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
# Pass temperature as a generation argument (with sampling enabled) so it takes effect.
pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=2048,
    do_sample=True, temperature=0.3, return_full_text=False
)
llm = HuggingFacePipeline(pipeline=pipe)

print("✅ System Ready.")

⏳ Installing System Dependencies & Fonts...
⏳ Loading Llama-3-8B (Context-Aware Mode)...


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Device set to use cuda:0


✅ System Ready.


In [12]:
# `gc` is part of the Python standard library, so it does not need to be installed.
# `import gc` in Step 2 is enough.

In [None]:
# @title üìÇ Step 2: Upload Files & Generate Guide
# @markdown **Instructions:**
# @markdown 1. Run this cell to connect to Drive.
# @markdown 2. Upload your PDFs when the button appears.
# @markdown 3. The AI will process each file and save the result to `My Drive > AI_Study_Notes`.

import os
from google.colab import drive, files
import gc
import torch

# --- 1. Drive Connection ---
print("🔌 Checking Google Drive connection...")
if not os.path.exists('/content/drive'):
    drive.mount('/content/drive')
else:
    print("✅ Drive is already connected.")

output_folder = "/content/drive/My Drive/AI_Study_Notes"
os.makedirs(output_folder, exist_ok=True)

# --- 2. Configuration ---
generate_exercises = True # @param {type:"boolean"}
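# Llama 3 has an 8,192-token context window, and the prompt, the chunk, and up to
# 1,500 generated tokens must all fit. 3000 is an assumed starting point that leaves
# headroom on a T4; adjust it for your GPU.
OPTIMAL_CHUNK_SIZE = 3000
CHUNK_OVERLAP = 200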

def extract_text_from_file(filename):
    text = ""
    try:
        with pdfplumber.open(filename) as pdf:
            for page in pdf.pages:
                extracted = page.extract_text()
                if extracted: text += extracted + "\n"
        if len(text) < 500:  # Too little text: likely a scanned PDF, so fall back to OCR
            print(f"   ⚠️ Scanned content detected in {filename}. Running OCR...")
            images = convert_from_path(filename)
            for img in images:
                text += pytesseract.image_to_string(img) + "\n"
    except Exception as e:
        print(f"   ❌ Error reading {filename}: {e}")
    return text

def run_pipeline():
    print("\n" + "="*40)
    print("   ⬇️  CLICK THE BUTTON BELOW TO UPLOAD  ⬇️")
    print("="*40)
    uploaded = files.upload()

    if not uploaded:
        print("❌ No files uploaded.")
        return

    all_notes_markdown = ""
    all_exercises_markdown = ""

    for i, filename in enumerate(uploaded.keys()):
        print(f"\n🚀 Processing File {i+1}/{len(uploaded)}: {filename}...")

        # Memory Cleanup
        torch.cuda.empty_cache()
        gc.collect()

        raw_text = extract_text_from_file(filename)
        if not raw_text.strip(): continue

        splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
            tokenizer, chunk_size=OPTIMAL_CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
        )
        docs = splitter.create_documents([raw_text])

        # --- Context-Aware Notes ---
        print(f"   📝 Generating Notes ({len(docs)} sections)...")
        # Removed emoji from header to prevent box character in PDF
        file_notes = f"# Module: {filename}\n"

        note_prompt = PromptTemplate.from_template(
            """
            You are an expert Professor. Analyze this text section:
            "{text}"

            TRANSFORM THIS INTO STUDY NOTES.

            Formatting Rules:
            1. **Comparisons:** If comparing items, create a Markdown Table.
            2. **Formulas:** Use Code Blocks (```) for math.
            3. **Concepts:** Use bold headers.
            4. **Summary:** End with a bullet-point summary.

            Output strictly in Markdown.
            """
        )

        for doc_idx, doc in enumerate(tqdm(docs, desc="   > Analyzing", leave=False)):
            try:
                messages = [{"role": "user", "content": note_prompt.format(text=doc.page_content)}]
                fmt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

                outputs = pipe(fmt, max_new_tokens=1500, pad_token_id=tokenizer.eos_token_id)
                res = outputs[0]["generated_text"]

                # return_full_text=False already drops the prompt; splitting on "assistant" could cut real content
                clean_res = res.strip()
                file_notes += f"\n{clean_res}\n"

            except Exception as e:
                print(f"\n   ❌ Error on Section {doc_idx}: {str(e)}")
                torch.cuda.empty_cache()

        all_notes_markdown += file_notes + "\n\n<div class='page-break'></div>\n\n"

        if generate_exercises:
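            print(f"   🧠 Designing Practice Questions...")
            # Sample a window starting a quarter of the way into the document.
            # This slice counts characters, whereas the splitter above counts tokens.
            start = len(raw_text) // 4
            sample_context = raw_text[start : start + OPTIMAL_CHUNK_SIZE]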

            # UPDATED PROMPT: Explicitly asks for vertical list formatting
            ex_prompt = f"""
            Create an Exam Section based on:
            "{sample_context}"

            Requirements:
            1. **3 Multiple Choice Questions.** - CRITICAL: Format options on new lines.
               - Example:
                 1. Question?
                    a) Option
                    b) Option

            2. **2 Short Answer Questions.**

            3. **Answer Key:**
               - Format as Blockquote (>).
               - Example: > 1. a) Explanation...
            """

            try:
                messages = [{"role": "user", "content": ex_prompt}]
                fmt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
                # return_full_text=False means the output contains only the generated text
                res = pipe(fmt, max_new_tokens=1500, pad_token_id=tokenizer.eos_token_id)[0]["generated_text"].strip()
                # Removed emoji from header
                all_exercises_markdown += f"## Practice: {filename}\n{res}\n\n<div class='page-break'></div>\n\n"
            except Exception as e:
                print(f"   ❌ Error generating exercises: {e}")

    # --- PDF Rendering ---
    print("\n💾 Rendering Professional PDF...")
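    # Build the document without leading indentation: Markdown treats lines indented
    # by four spaces as code blocks, which would break the headings.
    final_md = f"{all_notes_markdown}\n\n# Part 2: Practice Workbook\n\n{all_exercises_markdown}"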

    html_content = markdown.markdown(final_md, extensions=['extra', 'codehilite', 'tables', 'fenced_code'])

    # CSS Updates:
    # 1. Removed @import (uses local fonts).
    # 2. Added specific styling for lists (li) to fix MCQ bunching.
    css = CSS(string="""
        @page { size: A4; margin: 2cm; }
        body {
            font-family: 'Roboto', 'Helvetica', 'Arial', sans-serif;
            font-size: 11pt;
            line-height: 1.6;
            color: #2d3436;
        }
        h1 { color: #2c3e50; border-bottom: 3px solid #3498db; padding-bottom: 10px; margin-top: 40px; font-weight: 700;}
        h2 { color: #e67e22; margin-top: 25px; font-weight: 400; border-left: 5px solid #e67e22; padding-left: 10px;}
        strong { color: #2980b9; }

        /* List Styling for MCQs */
        ul, ol { margin-bottom: 15px; padding-left: 20px; }
        li { margin-bottom: 5px; }

        table { width: 100%; border-collapse: collapse; margin: 20px 0; box-shadow: 0 2px 5px rgba(0,0,0,0.1); }
        th { background-color: #34495e; color: white; padding: 12px; text-align: left; }
        td { border: 1px solid #dfe6e9; padding: 10px; }
        tr:nth-child(even) { background-color: #f1f2f6; }

        pre { background-color: #f5f6fa; border: 1px solid #dcdde1; border-radius: 5px; padding: 15px; font-family: 'Courier New', monospace; }
        blockquote { background: #f0f8ff; border-left: 5px solid #3498db; margin: 10px 0; padding: 10px 20px; color: #555; }
        .page-break { page-break-after: always; }
    """)

    font_config = FontConfiguration()
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    output_filename = os.path.join(output_folder, f"Study_Guide_{timestamp}.pdf")

    HTML(string=html_content, base_url='.').write_pdf(output_filename, stylesheets=[css], font_config=font_config)
    print(f"🎉 Guide Saved: {output_filename}")

run_pipeline()

🔌 Checking Google Drive connection...
✅ Drive is already connected.

   ⬇️  CLICK THE BUTTON BELOW TO UPLOAD  ⬇️


Saving CogSys_MCS4201_Note01.pdf to CogSys_MCS4201_Note01 (4).pdf
Saving CogSys_MCS4201_Note02.pdf to CogSys_MCS4201_Note02 (4).pdf

🚀 Processing File 1/2: CogSys_MCS4201_Note01 (4).pdf...
   📝 Generating Notes (1 sections)...


                                                     


   ❌ Error on Section 0: CUDA out of memory. Tried to allocate 2.90 GiB. GPU 0 has a total capacity of 14.74 GiB of which 1.56 GiB is free. Process 2478 has 13.18 GiB memory in use. Of the allocated memory 8.79 GiB is allocated by PyTorch, and 4.26 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
   🧠 Designing Practice Questions...


