# 🎓 Lecture to Notes - GPU Accelerated

Transform long lecture videos (2-3 hours) into detailed, structured markdown notes using:
- **Whisper Large-v3** for transcription (via faster-whisper)
- **Qwen 2.5-14B** for intelligent note generation

⚡ **Requirements**: Google Colab with a T4 GPU (free tier works!)

---

## 1️⃣ Setup & Dependencies
Run this cell first - takes ~2-3 minutes

In [None]:
# Check GPU availability
!nvidia-smi

# Install dependencies
!pip install -q faster-whisper transformers accelerate bitsandbytes sentencepiece
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

print("\n✅ Dependencies installed!")

## 2️⃣ Upload Your Lecture Video
Supports MP4 files up to 600MB

In [None]:
from google.colab import files
import os

print("📤 Please upload your lecture video (MP4, max 600MB)...")
uploaded = files.upload()

VIDEO_PATH = list(uploaded.keys())[0]
VIDEO_NAME = os.path.splitext(VIDEO_PATH)[0]

print(f"\n✅ Uploaded: {VIDEO_PATH}")
print(f"📊 Size: {os.path.getsize(VIDEO_PATH) / (1024*1024):.1f} MB")
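
The upload cell states the MP4 / 600MB limits but does not enforce them. A minimal sketch of a pre-flight check is shown below; `validate_upload` and `MAX_UPLOAD_MB` are hypothetical names introduced for illustration, not part of the original notebook.

```python
import os

MAX_UPLOAD_MB = 600  # limit stated in this notebook (hypothetical constant name)

def validate_upload(path, max_mb=MAX_UPLOAD_MB, allowed_exts=(".mp4",)):
    """Return the file size in MB, or raise ValueError if the file is unsuitable."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in allowed_exts:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_mb:
        raise ValueError(f"File is {size_mb:.1f} MB; limit is {max_mb} MB")
    return size_mb
```

Calling `validate_upload(VIDEO_PATH)` right after the upload would fail early instead of partway through transcription.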

## 3️⃣ Transcribe with Whisper Large-v3
This uses faster-whisper for GPU-accelerated transcription

In [None]:
from faster_whisper import WhisperModel
import time

print("🔄 Loading Whisper Large-v3 model...")
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
print("✅ Model loaded!")

print(f"\n🎤 Transcribing: {VIDEO_PATH}")
print("⏳ This may take 5-15 minutes for a 2-3 hour video...\n")

start_time = time.time()

segments, info = model.transcribe(
    VIDEO_PATH,
    beam_size=5,
    language="en",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500)
)

# Collect all segments with timestamps
transcript_segments = []
full_transcript = ""

for segment in segments:
    transcript_segments.append({
        "start": segment.start,
        "end": segment.end,
        "text": segment.text.strip()
    })
    full_transcript += segment.text.strip() + " "
    
    # Progress indicator
    if len(transcript_segments) % 50 == 0:
        print(f"  Processed {len(transcript_segments)} segments...")

elapsed = time.time() - start_time
print(f"\n✅ Transcription complete!")
print(f"⏱️ Time taken: {elapsed/60:.1f} minutes")
print(f"📝 Total segments: {len(transcript_segments)}")
print(f"📊 Transcript length: {len(full_transcript.split())} words")

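# Free up GPU memory before loading the LLM
import gc
import torch

del model
gc.collect()  # drop lingering Python references first
torch.cuda.empty_cache()  # then release cached CUDA blocks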

## 4️⃣ Load Qwen 2.5-14B for Note Generation
Using 4-bit quantization to fit in T4 GPU memory
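
A rough back-of-the-envelope sketch of why 4-bit quantization matters here. It assumes a nominal 14 billion parameters and counts weights only (no quantization constants, activations, or KV cache), so real usage is higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 14e9  # nominal "14B"; the exact parameter count differs slightly

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"fp16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the half-precision weights alone exceed a T4's 16 GB, while the 4-bit weights leave headroom for generation.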

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

MODEL_ID = "Qwen/Qwen2.5-14B-Instruct"

print(f"🔄 Loading {MODEL_ID} with 4-bit quantization...")
print("⏳ This takes 3-5 minutes on first run...\n")

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
llm_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

print("✅ LLM loaded and ready!")

## 5️⃣ Generate Structured Notes
Processing transcript in chunks for comprehensive notes

In [None]:
def chunk_transcript(text, chunk_size=3000, overlap=200):
    """Split transcript into overlapping chunks for processing."""
    words = text.split()
    chunks = []
    start = 0
    
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunk = ' '.join(words[start:end])
        chunks.append(chunk)
        start = end - overlap if end < len(words) else end
    
    return chunks

def generate_notes(text, section_num, total_sections):
    """Generate detailed notes for a transcript section."""
    
    prompt = f"""You are an expert note-taker. Create detailed, comprehensive study notes from this lecture transcript section ({section_num}/{total_sections}).

TRANSCRIPT:
{text}

Create notes following this structure:
1. **Main Topics** - Key subjects covered
2. **Detailed Explanations** - In-depth coverage of concepts
3. **Key Definitions** - Important terms and their meanings
4. **Examples Given** - Any examples or case studies mentioned
5. **Important Points** - Crucial takeaways
6. **Connections** - How topics relate to each other

Be thorough and detailed. Use markdown formatting with headers, bullet points, and emphasis.
Do NOT summarize - expand and explain the concepts clearly for study purposes."""

    messages = [
        {"role": "system", "content": "You are a meticulous academic note-taker who creates comprehensive, detailed study notes."},
        {"role": "user", "content": prompt}
    ]
    
    text_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text_input, return_tensors="pt").to(llm_model.device)
    
    with torch.no_grad():
        outputs = llm_model.generate(
            **inputs,
            max_new_tokens=4096,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    return response

# Process transcript in chunks
print("📝 Processing transcript and generating notes...\n")
chunks = chunk_transcript(full_transcript)
print(f"📊 Split into {len(chunks)} sections for processing\n")

all_notes = []
for i, chunk in enumerate(chunks, 1):
    print(f"🔄 Processing section {i}/{len(chunks)}...")
    notes = generate_notes(chunk, i, len(chunks))
    all_notes.append(notes)
    print(f"   ✅ Section {i} complete ({len(notes.split())} words)")

print("\n✅ All sections processed!")
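
To see how the overlapping chunks behave, here is a toy run with deliberately tiny values (`chunk_size=4`, `overlap=1`) that are for illustration only. The function body is copied from the cell above so the snippet runs on its own.

```python
def chunk_transcript(text, chunk_size=3000, overlap=200):
    """Split transcript into overlapping chunks (same logic as the cell above)."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(' '.join(words[start:end]))
        # Step back by `overlap` words unless this was the final chunk
        start = end - overlap if end < len(words) else end
    return chunks

toy_text = ' '.join(f"w{i}" for i in range(10))  # "w0 w1 ... w9"
toy_chunks = chunk_transcript(toy_text, chunk_size=4, overlap=1)
for c in toy_chunks:
    print(c)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

Each chunk begins with the last word of the previous one, so a sentence that straddles a boundary appears in both sections. Note that `overlap` must stay smaller than `chunk_size`, otherwise the loop never advances.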

## 6️⃣ Generate Final Summary & Compile Notes

In [None]:
def generate_executive_summary(notes_text):
    """Generate an executive summary of the entire lecture."""
    
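    # Use only the first 4000 words of the notes to keep the prompt short;
    # for long lectures the summary therefore reflects mainly the early sections
    summary_input = ' '.join(notes_text.split()[:4000])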
    
    prompt = f"""Based on these lecture notes, create a comprehensive executive summary:

{summary_input}

Create:
1. **Lecture Overview** (3-4 paragraphs)
2. **Key Learning Objectives** (bullet points)
3. **Main Topics Covered** (with brief descriptions)
4. **Critical Takeaways** (most important points to remember)
5. **Study Recommendations** (what to focus on for exams/understanding)

Be comprehensive but concise."""

    messages = [
        {"role": "system", "content": "You are an expert at synthesizing academic content into clear summaries."},
        {"role": "user", "content": prompt}
    ]
    
    text_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text_input, return_tensors="pt").to(llm_model.device)
    
    with torch.no_grad():
        outputs = llm_model.generate(
            **inputs,
            max_new_tokens=2048,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    return tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)

# Combine all notes
combined_notes = "\n\n".join(all_notes)

print("📋 Generating executive summary...")
executive_summary = generate_executive_summary(combined_notes)
print("✅ Summary generated!")

# Compile final document
from datetime import datetime

final_notes = f"""# 📚 {VIDEO_NAME}

**Generated**: {datetime.now().strftime('%Y-%m-%d %H:%M')}
**Source**: {VIDEO_PATH}
**Duration**: ~{(transcript_segments[-1]['end'] if transcript_segments else 0) / 60:.0f} minutes (from the last segment timestamp)
**Model**: Qwen 2.5-14B + Whisper Large-v3

---

# 📋 Executive Summary

{executive_summary}

---

# 📝 Detailed Notes

"""

for i, notes in enumerate(all_notes, 1):
    final_notes += f"\n## Part {i}\n\n{notes}\n\n---\n"

# Add transcript reference at the end
final_notes += f"""
# 📜 Full Transcript

<details>
<summary>Click to expand full transcript ({len(full_transcript.split())} words)</summary>

{full_transcript}

</details>
"""

print(f"\n📊 Final notes: {len(final_notes.split())} words")
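
Because `generate_executive_summary` reads only the first 4000 words of the notes, the later parts of a long lecture may not reach the summary. One possible alternative, offered as a sketch and not the notebook's method, is to take an equal share of words from every section. `sample_notes_evenly` is a hypothetical helper.

```python
def sample_notes_evenly(sections, word_budget=4000):
    """Take an equal share of words from the start of each section.

    `sections` is a list of note strings (e.g. `all_notes`); the result
    stays within roughly `word_budget` words in total.
    """
    if not sections:
        return ""
    per_section = max(1, word_budget // len(sections))
    excerpts = [" ".join(s.split()[:per_section]) for s in sections]
    return "\n\n".join(excerpts)

# Toy example: three sections, a budget of six words -> two words each
demo = sample_notes_evenly(["a a a a", "b b b b", "c c c c"], word_budget=6)
print(demo)
```

Passing `sample_notes_evenly(all_notes)` instead of `combined_notes` would give the summary a view of every section, at the cost of seeing less of each one.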

## 7️⃣ Save & Download Notes

In [None]:
# Save to file
output_filename = f"{VIDEO_NAME}_notes.md"

with open(output_filename, 'w', encoding='utf-8') as f:
    f.write(final_notes)

print(f"✅ Notes saved to: {output_filename}")
print(f"📊 File size: {os.path.getsize(output_filename) / 1024:.1f} KB")

# Download the file
print("\n📥 Downloading notes file...")
files.download(output_filename)

print("\n🎉 Done! Your detailed lecture notes are ready.")

## 🔧 Optional: Save Transcript Separately

In [None]:
# Save timestamped transcript
transcript_filename = f"{VIDEO_NAME}_transcript.txt"

with open(transcript_filename, 'w', encoding='utf-8') as f:
    for seg in transcript_segments:
        mins = int(seg['start'] // 60)
        secs = int(seg['start'] % 60)
        f.write(f"[{mins:02d}:{secs:02d}] {seg['text']}\n")

print(f"✅ Transcript saved to: {transcript_filename}")
files.download(transcript_filename)
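
The cell above writes `[MM:SS]` stamps, so for lectures longer than an hour the minutes field runs past 59 (for example `[125:30]`). If an explicit hours field is preferred, a small helper along these lines could be used; `format_timestamp` is a hypothetical name, not part of the original notebook.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours:02d}:{mins:02d}:{secs:02d}"

print(format_timestamp(3725.4))  # 01:02:05
```

Inside the write loop this would become `f.write(f"[{format_timestamp(seg['start'])}] {seg['text']}\n")`.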

---

## ⏱️ Expected Processing Times

| Video Length | Transcription | Note Generation | Total |
|-------------|---------------|-----------------|-------|
| 1 hour | ~3-5 min | ~5-8 min | ~10-15 min |
| 2 hours | ~6-10 min | ~10-15 min | ~20-25 min |
| 3 hours | ~10-15 min | ~15-20 min | ~30-40 min |

**Note**: First run takes longer due to model downloads (~10-15 min additional).