Coverage-aware course notes from lecture slides
Turn PPT/PDF into readable, traceable notes with images, OCR/vision, Lecture-Weave writing, and coverage checks.
Not just a slide summarizer, but a faithful study-document pipeline.
- Quick Start
- Optional GUI
- Pipeline And Presets
- Origin
- Setup
- Common Workflows
- Technical Docs
- Future Outlook
- License And Acknowledgements
Quick Start
git clone https://github.com/Cat-blizzard/SlideNote.git
cd SlideNote
.\install.ps1
.\run_gui.ps1
The setup script creates .venv, installs SlideNote with the GUI and LLM extras, and runs slidenote doctor. The GUI lets you paste API keys into the page for a single run, so you do not need to set terminal environment variables first.
Manual setup is still available:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -e ".[dev,llm]"
python -m slidenote doctor
For a local preview without API calls:
python -m slidenote build path\to\lecture.pdf --out outputs\local --preset local --export markdown-zip
After the first install, run this local preview first and confirm that notes.md and the shareable notes.zip are generated before switching to the lecture-quality workflow.
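If you want to script that confirmation, a small standard-library helper can inspect the output folder. This is a sketch, not part of SlideNote: check_local_preview is a hypothetical name, and because the exact archive layout is not specified here, it accepts notes.md at any depth inside notes.zip.

```python
from pathlib import Path
import zipfile


def check_local_preview(out_dir: str) -> list[str]:
    """Return problems found in a SlideNote output folder; an empty list means OK.

    Hypothetical helper: it only reads files on disk and never calls SlideNote.
    """
    out = Path(out_dir)
    problems: list[str] = []

    notes_md = out / "notes.md"
    if not notes_md.is_file():
        problems.append(f"missing {notes_md}")
    elif notes_md.stat().st_size == 0:
        problems.append(f"{notes_md} is empty")

    notes_zip = out / "notes.zip"
    if not notes_zip.is_file():
        problems.append(f"missing {notes_zip}")
    else:
        with zipfile.ZipFile(notes_zip) as zf:
            # Compare base names so notes.md is found at the archive root or in a subfolder.
            names = {Path(n).name for n in zf.namelist()}
        if "notes.md" not in names:
            problems.append(f"{notes_zip} does not contain notes.md")
    return problems
```

Calling check_local_preview(r"outputs\local") after the local preview command returns an empty list when both files are in place.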
For higher-quality notes with visual understanding:
$env:DASHSCOPE_API_KEY="..."
$env:DEEPSEEK_API_KEY="..."
python -m slidenote build path\to\lecture.pdf --out outputs\lecture --provider deepseek --export markdown-zip
Open outputs\lecture\notes.md after generation. Images are copied into outputs\lecture\notes.assets\ by default.
Optional GUI
SlideNote Studio is a Streamlit interface around the same CLI pipeline. It supports uploading PPT/PDF files, entering API keys on the page, selecting presets, watching progress and ETA, reviewing token/cost reports, checking page-level sources, and downloading generated results.
.\run_gui.ps1
See gui/README_GUI.md and gui/README_GUI.zh-CN.md for GUI details.
Pipeline And Presets
SlideNote is organized as a five-stage product pipeline. Low-level modules can stay fine-grained for caching, debugging, and partial refresh, while the user-facing workflow stays simple.
Ingest -> Understand -> Write -> Guard -> Export
| Stage | Purpose | Main artifacts |
|---|---|---|
| 1. Ingest | Parse PPT/PDF into stable, traceable structure. | content.json, element_ir.json, source_map.json, screenshots, assets, parser adapters |
| 2. Understand | Decide what the courseware is teaching. | deck_understanding.json, page_understanding.json, sections.json, deck_brief.json, figure/table understanding |
| 3. Write | Turn structured material into readable study notes. | notes.md, Lecture-Weave page notes, teaching enrichment |
| 4. Guard | Check faithfulness, coverage, and study quality. | coverage.json, coverage.md, content_guard.json, quality_report.json |
| 5. Export | Publish notes and reports. | notes.zip, notes.toc.md, notes.docx, notes.pdf, notes.tex; review/exam packs are generated separately by study-pack |
More detail: SlideNote Pipeline.
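The ordering in the table can be pictured as plain functions that each read an artifact registry and add their own outputs. This sketch is illustrative only: the stage functions, the require helper, and the registry shape are hypothetical, and only the artifact file names come from the table. It makes one constraint explicit: Guard compares the written notes with the extracted elements before anything is exported.

```python
from typing import Callable

# Artifact registry: artifact file name -> stage that produced it.
Registry = dict[str, str]


def require(reg: Registry, *names: str) -> None:
    """Fail fast if an upstream artifact is missing (hypothetical check)."""
    missing = [n for n in names if n not in reg]
    if missing:
        raise RuntimeError(f"missing upstream artifacts: {missing}")


# Hypothetical stages: each returns the artifact names it would write.
def ingest(reg: Registry) -> list[str]:
    return ["content.json", "element_ir.json", "source_map.json"]


def understand(reg: Registry) -> list[str]:
    require(reg, "element_ir.json")
    return ["deck_understanding.json", "page_understanding.json"]


def write(reg: Registry) -> list[str]:
    require(reg, "page_understanding.json")
    return ["notes.md"]


def guard(reg: Registry) -> list[str]:
    # Guard checks the notes against the extracted elements.
    require(reg, "notes.md", "element_ir.json")
    return ["coverage.json", "quality_report.json"]


def export(reg: Registry) -> list[str]:
    require(reg, "notes.md", "coverage.json")
    return ["notes.zip"]


STAGES: list[tuple[str, Callable[[Registry], list[str]]]] = [
    ("Ingest", ingest),
    ("Understand", understand),
    ("Write", write),
    ("Guard", guard),
    ("Export", export),
]


def run_pipeline() -> Registry:
    reg: Registry = {}
    for name, stage in STAGES:
        for artifact in stage(reg):
            reg[artifact] = name
    return reg
```

run_pipeline() returns a registry that maps each artifact to the stage that produced it. Moving Export ahead of Guard in STAGES makes require() raise RuntimeError, which is the kind of dependency a fine-grained cache would also need to respect when skipping stages whose artifacts already exist.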
Use the top-level --preset option to choose a product workflow. Everyday users need only two modes: the default lecture mode and the no-API local mode.
| Preset | Best for | Behavior |
|---|---|---|
| lecture | Teacher-style detailed lecture notes. | Enables LLM, OCR auto, Vision auto, Lecture-Weave, deck brief, content guard, and teaching enrichment. |
| local | No API key, offline preview, parser checks. | Uses local rules only and does not call text, vision, or OCR APIs. |
python -m slidenote build lecture.pdf --out outputs\lecture --provider deepseek
python -m slidenote build lecture.pdf --out outputs\local --preset local
The first command uses the default lecture preset; the second runs the no-API local preset. More detail: User Presets.
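One way to picture a preset is as a bundle of defaults that explicit flags then override. The sketch below is an assumption-laden illustration, not SlideNote's configuration schema: RunSettings, PRESETS, and resolve are hypothetical names, and only the three switches the preset table names directly (LLM, OCR, Vision) are modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunSettings:
    # Hypothetical fields mirroring the "Behavior" column; not a real schema.
    llm: bool
    ocr: str     # "auto" or "off"
    vision: str  # "auto" or "off"


PRESETS = {
    # lecture: LLM on, OCR and Vision in auto mode.
    "lecture": RunSettings(llm=True, ocr="auto", vision="auto"),
    # local: local rules only, no text, vision, or OCR API calls.
    "local": RunSettings(llm=False, ocr="off", vision="off"),
}


def resolve(preset: str = "lecture", **overrides: object) -> RunSettings:
    """Start from a preset (lecture is the default) and apply explicit overrides."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(PRESETS)}")
    return replace(PRESETS[preset], **overrides)
```

For example, resolve(vision="off") keeps the lecture defaults but disables vision, roughly what the text-only workflow's --vision off flag asks for, while resolve("local") turns every API-backed switch off.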
Origin
SlideNote started from a very personal learning problem.
I have never been the kind of student who learns best by simply listening to lectures. Sometimes I cannot fully follow a teacher's explanation in real time, and I usually learn more efficiently by reading. Reading lets me slow down, go back, skip ahead, and set the pace of my own understanding.
But lecture slides are not the same as readable notes. After class, reading the PPT directly often feels incomplete: the bullets are fragmented, the logic is implicit, and many important details live in diagrams, screenshots, formulas, or the teacher's spoken explanation. Manually rewriting everything into notes is possible, but it is time-consuming, hard to keep complete, and not always pleasant to revisit later.
So I wanted to build a tool that could turn course slides into structured, readable, traceable notes: not just a summary, but a faithful learning document that preserves images, keeps page references, checks coverage, and helps convert lecture materials into something I can actually study from.
That idea became SlideNote.
Setup
SlideNote does not require a local GPU. The local parser runs with Python dependencies alone; LLM rewriting, OCR, and visual understanding require API keys for the providers you choose.
Minimum setup:
- Python 3.10 or newer.
- A virtual environment is recommended.
- New users can run .\install.ps1 and then .\run_gui.ps1.
- python -m pip install -e ".[dev]" for local parsing.
- python -m pip install -e ".[dev,llm]" for LLM providers.
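Before a lecture run, a quick preflight can confirm the interpreter version and whether provider keys are set. This is a hypothetical sketch and not what slidenote doctor actually checks: the key names DEEPSEEK_API_KEY and DASHSCOPE_API_KEY come from the Quick Start example, while the preset suggestion is a rough heuristic assumed for illustration. Key values are never printed, only key names.

```python
import os
import sys
from typing import Optional

# Key names come from the Quick Start example. Which stage uses which key is
# not assumed here, so the check only reports which keys are set.
PROVIDER_KEYS = ("DEEPSEEK_API_KEY", "DASHSCOPE_API_KEY")


def preflight(env: Optional[dict[str, str]] = None,
              version: tuple[int, int] = tuple(sys.version_info[:2])) -> dict:
    """Hypothetical pre-run check; not the output of `slidenote doctor`."""
    env = dict(os.environ) if env is None else env
    present = [k for k in PROVIDER_KEYS if env.get(k, "").strip()]
    return {
        # The minimum setup above asks for Python 3.10 or newer.
        "python_ok": tuple(version) >= (3, 10),
        "keys_present": present,
        # Rough heuristic: with no provider keys, only the no-API local preset can run.
        "suggested_preset": "lecture" if present else "local",
    }


print(preflight())
```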
Optional software:
| Software | Purpose |
|---|---|
| LibreOffice | Converts .ppt / .pptx to PDF and enables full-slide screenshots when PowerPoint is unavailable. |
| Microsoft PowerPoint + pywin32 | Windows-only PPTX screenshot export route. |
| Pandoc | Word and LaTeX export. |
| LibreOffice + Pandoc | PDF export from notes.docx, usually more stable for CJK layout. |
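To see which of these optional tools are reachable, Python's standard shutil.which can probe PATH. The executable names are assumptions to verify on your machine: LibreOffice usually provides a soffice command (on Windows it is often not on PATH by default), and Pandoc provides pandoc. The PowerPoint route is approximated by whether pywin32's win32com package is importable, which does not prove that PowerPoint itself is installed.

```python
import importlib.util
import shutil
import sys


def optional_tools() -> dict[str, bool]:
    """Heuristic availability check for the optional software listed above."""
    libreoffice = shutil.which("soffice") is not None  # LibreOffice command-line entry point
    pandoc = shutil.which("pandoc") is not None
    # pywin32 ships the win32com package; this does NOT verify that PowerPoint is installed.
    pywin32 = sys.platform == "win32" and importlib.util.find_spec("win32com") is not None
    return {
        "libreoffice": libreoffice,
        "powerpoint_pywin32": pywin32,
        "pandoc": pandoc,
        # PDF export goes through notes.docx, so it needs both tools.
        "pdf_export": libreoffice and pandoc,
    }


for name, ok in optional_tools().items():
    print(f"{name:20} {'found' if ok else 'not found'}")
```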
Configuration details live in CONFIG.zh-CN.md. The build entrypoint is intentionally small; provider, OCR, Vision, and cache details are handled mostly through strong defaults and environment variables.
Common Workflows
Local rule-based draft:
python -m slidenote build path\to\lecture.pptx --out outputs\local --preset local --export markdown-zip
Teacher-style lecture notes:
python -m slidenote build path\to\lecture.pdf `
--out outputs\lecture-notes `
--provider deepseek `
--export markdown-zip
Review and exam pack:
python -m slidenote build path\to\lecture.pdf `
--out outputs\lecture-review `
--provider deepseek
python -m slidenote study-pack outputs\lecture-review --question-count 20
Text-only lecture notes:
python -m slidenote build path\to\lecture.pdf `
--out outputs\text-only `
--provider deepseek `
--vision off
Technical Docs
This README is intentionally kept as a landing page. Detailed behavior lives in the docs:
| Topic | Link |
|---|---|
| Documentation index | docs/index.zh-CN.md |
| Pipeline stages | docs/pipeline.zh-CN.md |
| Presets | docs/presets.zh-CN.md |
| Coverage, content guard, quality report, review/exam packs | docs/quality-and-guard.zh-CN.md |
| Element IR, source map, assets | docs/ir-and-source-map.zh-CN.md |
| LLM providers, OCR, vision, cache, cost | docs/providers-and-cost.zh-CN.md |
| Roadmap design notes | docs/roadmap/extension-notes.zh-CN.md |
The main output is notes.md. To share Markdown notes with images, export notes.zip; it contains notes.md and the notes.assets/ image folder. Depending on options, SlideNote can also write content.json, deck_understanding.json, page_understanding.json, element_ir.json, source_map.json, coverage.md, quality_report.json, review.md, exam.md, exam.json, exam.html, notes.docx, notes.pdf, and other reports.
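The notes.zip layout described above (notes.md next to a notes.assets/ folder) is what keeps relative image links working after the archive is shared and unpacked. As an illustration only, this standard-library sketch bundles an output folder the same way; make_notes_zip is a hypothetical helper, and SlideNote's own --export markdown-zip may differ in details such as compression or extra files.

```python
from pathlib import Path
import zipfile


def make_notes_zip(out_dir: str) -> Path:
    """Bundle notes.md and notes.assets/ into notes.zip, keeping relative paths."""
    out = Path(out_dir)
    notes = out / "notes.md"
    assets = out / "notes.assets"
    if not notes.is_file():
        raise FileNotFoundError(notes)
    target = out / "notes.zip"
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(notes, "notes.md")
        if assets.is_dir():
            for path in sorted(assets.rglob("*")):
                if path.is_file():
                    # Store as notes.assets/<...> so links such as notes.assets/p1.png still resolve.
                    zf.write(path, path.relative_to(out).as_posix())
    return target
```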
Future Outlook
SlideNote is built with a hopeful assumption: future AI systems will become stronger, faster, cheaper, and easier to orchestrate through mature open-source agent frameworks. If that happens, this project should not merely run the same prompts for less money. Its ceiling should rise.
Models and providers such as DeepSeek are one example of the direction that makes this exciting: better price/performance, broader access, and a more open ecosystem can make high-quality multi-pass workflows practical for ordinary study materials. When API latency drops and agent frameworks become more reliable, SlideNote can afford to run richer stages by default: deeper deck understanding, page-level visual reasoning, teacher-style section writing, teaching enrichment, coverage repair, exam generation, wrong-answer review, and source verification.
The reason this matters is that SlideNote's bottleneck is not only "can the model summarize a slide?" The harder problem is coordinating parsing, vision, writing, grounding, quality checks, and revision without losing traceability. That is why the project invests in element_ir.json, source_map.json, coverage reports, artifact registries, presets, cache keys, and review/exam packs. Those structures let SlideNote absorb future model gains without being tied to one model, provider, or agent runtime.
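Cache keys are one of the structures that let a single stage be re-run when a model or prompt changes, without redoing everything else. A common pattern, shown here as a hypothetical sketch rather than SlideNote's actual key format, is to hash everything that can change a stage's output: stage name, provider, model, a prompt version, and the input payload. The model names in any usage are placeholders.

```python
import hashlib
import json


def cache_key(stage: str, provider: str, model: str, prompt_version: str, payload: dict) -> str:
    """Deterministic key for one stage call (hypothetical scheme).

    json.dumps with sort_keys=True makes the key independent of dict key order.
    """
    material = json.dumps(
        {
            "stage": stage,
            "provider": provider,
            "model": model,
            "prompt_version": prompt_version,
            "payload": payload,
        },
        sort_keys=True,
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Because the model name is part of the key, switching to a newer model makes only that stage's entries miss, while results keyed on unchanged inputs, such as parsing output, can still be reused.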
The long-term vision is:
SlideNote should grow from a courseware converter into a course learning operating system.
In that version, slides, readings, personal notes, figures, formulas, quizzes, mistakes, and revisions all live in one traceable learning workflow.
SlideNote deliberately avoids this shortcut:
PPT -> LLM -> Summary
Instead, it follows:
PPT/PDF -> structured extraction -> source inventory -> note generation -> coverage check -> export
The local rule-based draft is only a baseline for debugging extraction and coverage. Production notes should use the default lecture preset, while coverage checks still rely on element IDs so the model cannot silently summarize away details.
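The coverage idea can be sketched with two inputs: the element inventory from extraction and the set of element IDs the generated notes cite. The Element record shape and the ID strings are hypothetical; SlideNote's element_ir.json and coverage.json have their own schemas. What the sketch shows is why IDs matter: an element that no note cites shows up as missing, with its page, instead of disappearing silently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    element_id: str  # hypothetical ID, e.g. "p3-e2"
    page: int        # source slide/page number, kept for traceability
    kind: str        # e.g. "text", "figure", "formula"


def coverage_report(elements: list[Element], cited_ids: set[str]) -> dict:
    """Compare extracted elements against the element IDs cited by the notes."""
    known = {e.element_id for e in elements}
    missing_by_page: dict[int, list[str]] = defaultdict(list)
    for e in elements:
        if e.element_id not in cited_ids:
            missing_by_page[e.page].append(e.element_id)
    covered = len(known & cited_ids)
    return {
        "coverage": covered / len(known) if known else 1.0,
        "missing_by_page": dict(missing_by_page),
        # Cited IDs that extraction never produced: possible invented references.
        "unknown_citations": sorted(cited_ids - known),
    }
```

With four elements on two pages and notes citing two of them plus one unknown ID, the report gives a coverage of 0.5, one missing ID per page, and the unknown ID flagged separately.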
License And Acknowledgements
SlideNote uses a dual-license structure:
- Source code is licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later). See LICENSE.
- Documentation and example educational materials are licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). See LICENSES/CC-BY-4.0.txt.
The SlideNote name, logo, and other brand assets are not licensed for standalone reuse. See NOTICE for the exact scope.
- SlideNote's optional review/exam study-pack workflow was conceptually inspired by WUBING2023/ExamPass-Assistant and the extended MIKUZ12/ExamPass-Assistant fork. SlideNote does not reuse their code, templates, prompts, or assets.
- GUI development contributions from hongzuoj-pixel.
- Testing contributions from MOm0-000.
- SlideNote's parser-adapter and document-IR roadmap is informed by prior art such as Microsoft MarkItDown, Docling, Marker, MinerU, and Unstructured.
- SlideNote's future retrieval, source tracing, and post-generation QA direction is informed by systems such as RAGFlow. These projects are references and inspirations, not bundled dependencies unless explicitly listed elsewhere.
Provider API documentation references:
- OpenAI Chat Completions API
- OpenAI Images and vision
- DeepSeek API
- Alibaba Cloud Model Studio OpenAI-compatible API
- Volcengine Ark OpenAI SDK compatibility
- Zhipu GLM OpenAI compatibility
- Baidu OCR API
- Mathpix OCR API
- Google Cloud Vision OCR
- Gemini generateContent API
- Gemini image understanding
- Claude Messages API
- Claude Vision
