Video Course Cards

Turn long course videos into timestamp-grounded knowledge cards, local card memory, and Obsidian-friendly Markdown.

Download | Desktop build | Local LLM setup | Roadmap | RAG plan

Overview

Video Course Cards is a local-first AI learning workspace for lecture videos. It turns a video into a transcript, cuts the transcript into semantic chunks, drafts grounded knowledge cards with a local LLM, stores everything in SQLite, embeds cards for retrieval, and exports portable Markdown snapshots.

The project is not trying to be another generic "chat with your transcript" demo. The core object is a claim-grounded knowledge card: a structured learning unit whose claims point back to transcript evidence and timestamps. The longer-term plan is to turn these cards into a graph that serves as an external world model: a controller plans over the graph, and a decoder takes the plan and the graph as input to generate answers.

video -> transcript -> semantic chunks -> grounded cards -> card embeddings -> retrieval -> Markdown export

SQLite is the source of truth. Markdown is an export format.
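
The split between durable storage and export can be sketched as follows. This is a minimal illustration, not the project's actual schema: the one-table `cards` layout, the helper names (`open_store`, `save_card`, `card_to_markdown`, `export_markdown`), and the Markdown layout are all assumptions made for the example.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite store with a hypothetical one-table card schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cards ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def save_card(conn: sqlite3.Connection, card: dict) -> int:
    """Persist the structured card as JSON; this row is the durable record."""
    cursor = conn.execute(
        "INSERT INTO cards (payload) VALUES (?)", (json.dumps(card),)
    )
    conn.commit()
    return cursor.lastrowid


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS for display in exported notes."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def card_to_markdown(card: dict) -> str:
    """Render a one-way Markdown snapshot of a single card."""
    lines = [f"# {card['title']}", "", card["summary"], ""]
    for claim in card.get("claims", []):
        lines.append(f"- {claim['text']}")
        for evidence in claim.get("evidence", []):
            start = format_timestamp(evidence["start_seconds"])
            end = format_timestamp(evidence["end_seconds"])
            quote = evidence["text"]
            lines.append(f'  - [{start} to {end}] "{quote}"')
    return "\n".join(lines) + "\n"


def export_markdown(conn: sqlite3.Connection) -> list[str]:
    """Regenerate Markdown from SQLite; nothing is read back from the files."""
    rows = conn.execute("SELECT payload FROM cards ORDER BY id").fetchall()
    return [card_to_markdown(json.loads(payload)) for (payload,) in rows]
```

Because the export is regenerated from the database each time, editing an exported file does not change the stored card.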

Why This Exists

Long technical lectures contain more than raw transcript text. A useful learning system should preserve where an idea came from, what evidence supports it, how it connects to other cards, and how the user later edits or rejects it.

This repository explores that pipeline as a local desktop application:

Grounded generation: cards keep claims, evidence, and source timestamps.
Local-first storage: videos, transcripts, cards, embeddings, and notes stay on the user's machine.
Structured memory: cards are JSON/SQLite records before they become Markdown.
Retrieval baseline: card embeddings support ordinary dense retrieval before more advanced graph-guided methods.
Portable output: exports are Obsidian-friendly Markdown snapshots.

Current Demo

The current demo runs on Windows as a Tauri desktop app with a packaged FastAPI sidecar.

It can:

upload local videos;
validate media with ffprobe;
extract audio with FFmpeg;
transcribe with faster-whisper;
show transcript segments next to the course workspace;
create semantic transcript chunks with Sentence Transformer embeddings;
generate cards manually from selected transcript text or automatically from chunks;
save, edit, delete, tag, and review cards;
attach user notes to cards;
embed cards and run dense card retrieval;
export one job or all cards as Markdown folders;
check local runtime dependencies such as FFmpeg, Ollama/Qwen, and embedding models.
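
One common way to cut a transcript into semantic chunks is to start a new chunk wherever the embedding similarity between neighbouring segments drops. The sketch below shows that idea only; it is not taken from `transcript_chunker.py`. The function names and the `threshold` value are assumptions, and the embeddings are passed in as plain lists so the example runs without loading a Sentence Transformers model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def chunk_segments(
    segments: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> list[list[str]]:
    """Group consecutive segments, splitting where similarity falls below threshold."""
    if len(segments) != len(embeddings):
        raise ValueError("segments and embeddings must have the same length")
    chunks: list[list[str]] = []
    current: list[str] = []
    for index, segment in enumerate(segments):
        # Compare each segment with its predecessor; a low score marks a topic shift.
        if current and cosine_similarity(
            embeddings[index - 1], embeddings[index]
        ) < threshold:
            chunks.append(current)
            current = []
        current.append(segment)
    if current:
        chunks.append(current)
    return chunks
```

In practice the threshold would need tuning per embedding model, and a real chunker would likely also enforce minimum and maximum chunk lengths.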

Still rough:

the installer is not code-signed;
Windows is the only packaged target currently exercised;
Ollama, Qwen, FFmpeg, and model caches are user-installed dependencies;
RAG currently retrieves cards, but answer generation with citations is still planned;
exported Markdown does not sync edits back into SQLite.

Install

Download the latest Windows installer from:

https://github.com/eatoften/Video_Course_Cards/releases/latest

The installer includes:

Tauri desktop shell;
React UI;
packaged FastAPI backend;
SQLite schema and app code.

The installer does not bundle large model assets. Install local AI dependencies separately:

ollama pull qwen3:4b

Desktop data is stored under:

C:\Users\<user>\AppData\Local\Video Course Cards\

See docs/local-llm.md for local model configuration.
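
Since FFmpeg and Ollama are installed separately, a quick check that their executables are reachable can save a failed first run. The snippet below is a small sketch that only looks for the commands on `PATH`; it is an assumption that the tools are installed that way, and it does not reproduce what the app's own runtime status check does.

```python
import shutil


def find_missing_tools(
    names: tuple[str, ...] = ("ffmpeg", "ffprobe", "ollama"),
) -> list[str]:
    """Return the command names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```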

Developer Setup

Clone the repository, then install backend and frontend dependencies.

git clone https://github.com/eatoften/Video_Course_Cards.git
cd Video_Course_Cards

Run the backend:

cd backend
$env:PYTHONUTF8='1'
$env:PYTHONDONTWRITEBYTECODE='1'
uv run python -B -m uvicorn app.main:app --host 127.0.0.1 --port 8001 --reload

Run the frontend:

cd frontend
npm.cmd install
npm.cmd run dev

Open:

http://127.0.0.1:5174

Desktop Build

Tauri requires Rust/Cargo and the Visual Studio C++ build tools on Windows.

Build the backend sidecar:

powershell -NoProfile -ExecutionPolicy Bypass -File .\scripts\build-desktop-backend.ps1

Run the desktop app in development:

cd frontend
npm.cmd run tauri:dev

Build the Windows installer:

powershell -NoProfile -ExecutionPolicy Bypass -File .\scripts\build-windows-installer.ps1

Output:

frontend\src-tauri\target\release\bundle\nsis\Video Course Cards_0.1.0_x64-setup.exe

GitHub Actions can build the installer and attach it to a tagged release:

git tag v0.1.0
git push origin v0.1.0

See docs/tauri-desktop.md.

Architecture

flowchart LR
    subgraph Desktop["Tauri desktop"]
        UI["React / TypeScript"]
        RUST["Rust shell"]
    end

    subgraph Backend["FastAPI sidecar"]
        API["HTTP routes"]
        JOBS["Job service"]
        PIPELINE["Video pipeline"]
        CARDS["Card services"]
        RAG["Retrieval services"]
    end

    subgraph LocalAI["Local AI runtime"]
        FFMPEG["FFmpeg / ffprobe"]
        WHISPER["faster-whisper"]
        EMB["Sentence Transformers"]
        LLM["Ollama / Qwen"]
    end

    subgraph Storage["Local data"]
        DB["SQLite"]
        FILES["uploads / transcripts / exports"]
    end

    UI <--> API
    RUST --> API
    API --> JOBS
    JOBS --> PIPELINE
    PIPELINE --> FFMPEG
    PIPELINE --> WHISPER
    CARDS --> LLM
    RAG --> EMB
    API <--> DB
    API <--> FILES

The backend is deliberately split by responsibility:

Layer	Responsibility
`main.py`	HTTP routes and response mapping
`job_service.py`	video job orchestration
`job_store.py`	SQLite CRUD for jobs
`video_pipeline.py`	media probe, audio extraction, transcription
`transcript_chunker.py`	semantic transcript chunking
`knowledge_card_service.py`	card persistence and updates
`card_embedding_service.py`	card text -> embedding workflow
`rag_service.py`	card retrieval baseline
`desktop_server.py`	packaged backend sidecar entrypoint

Knowledge Cards

A card is stored as structured data, not just Markdown text.

{
  "title": "Singular Value Decomposition",
  "summary": "SVD factors a matrix into orthogonal and diagonal structure.",
  "tags": ["linear algebra", "matrix factorization"],
  "source_start_seconds": 724.0,
  "source_end_seconds": 738.0,
  "claims": [
    {
      "text": "SVD decomposes a matrix using orthogonal and diagonal components.",
      "evidence": [
        {
          "text": "called the singular value decomposition",
          "start_seconds": 724.0,
          "end_seconds": 738.0
        }
      ]
    }
  ],
  "question": "What structure does SVD use to factor a matrix?",
  "answer": "It uses orthogonal matrices and a diagonal matrix."
}

This shape makes later work possible: duplicate detection, related-card search, graph edges, citation-aware RAG, and feedback-based evaluation.
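
To show how the structure supports grounding checks, here is a small sketch that parses the JSON shape above into dataclasses and flags claims that lack evidence or cite timestamps outside the card's source range. The class names, `parse_card`, and `grounding_problems` are illustrative assumptions, not the repository's actual models.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    text: str
    start_seconds: float
    end_seconds: float


@dataclass
class Claim:
    text: str
    evidence: list[Evidence] = field(default_factory=list)


@dataclass
class Card:
    title: str
    summary: str
    source_start_seconds: float
    source_end_seconds: float
    claims: list[Claim] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    question: str = ""
    answer: str = ""


def parse_card(raw: dict) -> Card:
    """Build a Card from a dict shaped like the JSON example."""
    claims = [
        Claim(
            text=item["text"],
            evidence=[
                Evidence(e["text"], e["start_seconds"], e["end_seconds"])
                for e in item.get("evidence", [])
            ],
        )
        for item in raw.get("claims", [])
    ]
    return Card(
        title=raw["title"],
        summary=raw["summary"],
        source_start_seconds=raw["source_start_seconds"],
        source_end_seconds=raw["source_end_seconds"],
        claims=claims,
        tags=list(raw.get("tags", [])),
        question=raw.get("question", ""),
        answer=raw.get("answer", ""),
    )


def grounding_problems(card: Card) -> list[str]:
    """List claims that have no evidence or evidence outside the source range."""
    problems: list[str] = []
    for index, claim in enumerate(card.claims):
        if not claim.evidence:
            problems.append(f"claim {index} has no evidence")
        for evidence in claim.evidence:
            if evidence.end_seconds < evidence.start_seconds:
                problems.append(f"claim {index} has a reversed time range")
            elif (
                evidence.start_seconds < card.source_start_seconds
                or evidence.end_seconds > card.source_end_seconds
            ):
                problems.append(f"claim {index} cites time outside the card source")
    return problems
```

An empty result from `grounding_problems` means every claim has at least one evidence span inside the card's own time range; it says nothing about whether the quoted text actually supports the claim.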

API Surface

Selected endpoints:

Endpoint	Purpose
`POST /videos`	upload and register a local video
`POST /jobs/{job_id}/run`	run probe -> audio -> transcription
`GET /jobs/{job_id}/transcript`	return timestamped transcript segments
`POST /jobs/{job_id}/chunks`	generate semantic transcript chunks
`POST /jobs/{job_id}/cards/auto-generate`	generate cards from chunks
`GET /jobs/{job_id}/cards`	list cards for one video job
`PATCH /cards/{card_id}`	edit a saved card
`POST /cards/{card_id}/embedding`	embed one card
`POST /courses/{course_id}/card-embeddings`	embed all cards in a course
`POST /rag/retrieve`	retrieve relevant cards for a question
`POST /jobs/{job_id}/cards/export/markdown/folder`	export one job as Markdown
`POST /cards/export/markdown/folder`	export all cards as Markdown
`GET /runtime/status`	inspect local runtime dependencies
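
A minimal client for these routes can be written with the standard library alone. The sketch assumes the development backend from Developer Setup is listening on `127.0.0.1:8001` and that responses are JSON. The request body for `POST /rag/retrieve` is not documented here, so the `question` and `top_k` fields in the usage example are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the development backend started with uvicorn on port 8001.
BASE_URL = "http://127.0.0.1:8001"


def build_request(
    method: str, path: str, payload: Optional[dict] = None
) -> urllib.request.Request:
    """Create a request for one endpoint; a payload is sent as a JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running backend; the body fields below are hypothetical.
    print(send(build_request("GET", "/runtime/status")))
    print(send(build_request(
        "POST", "/rag/retrieve", {"question": "What is SVD?", "top_k": 3}
    )))
```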

Tests

Backend:

cd backend
uv run pytest

Frontend:

cd frontend
npm.cmd run build

Tauri:

cd frontend\src-tauri
cargo check

Sidecar smoke test:

powershell -NoProfile -ExecutionPolicy Bypass -File .\scripts\test-desktop-backend.ps1

Roadmap

Near term:

improve automatic card generation reliability;
improve semantic chunk boundary quality;
detect duplicate or near-duplicate cards with embeddings;
show related cards in the UI;
turn card retrieval into a citation-grounded answer assistant;
add evaluation records for latency, unsupported claims, duplicates, and retrieval misses.

Longer term:

build a card similarity graph;
add relation types such as prerequisite, example_of, contrast_with, and part_of;
support human-in-the-loop graph editing;
compare ordinary dense RAG against graph-guided retrieval;
use user edits and save/delete decisions as a feedback dataset for future agentic learning loops.
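
As a rough picture of what a typed card graph could look like, the sketch below stores edges labelled with the four relation types named above and answers a simple question: which cards are transitive prerequisites of a given card. The `CardGraph` class, its method names, and the edge direction (source is `<relation>` of target) are assumptions for illustration; the project has not fixed a graph design.

```python
from collections import defaultdict
from typing import Optional

RELATION_TYPES = {"prerequisite", "example_of", "contrast_with", "part_of"}


class CardGraph:
    """Directed, typed edges between card ids (hypothetical design)."""

    def __init__(self) -> None:
        # source card id -> set of (relation, target card id)
        self._edges: dict[int, set[tuple[str, int]]] = defaultdict(set)

    def add_edge(self, source: int, relation: str, target: int) -> None:
        """Record that `source` is `relation` of `target`."""
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        if source == target:
            raise ValueError("a card cannot be related to itself")
        self._edges[source].add((relation, target))

    def related(self, source: int, relation: Optional[str] = None) -> list[int]:
        """Targets reachable from `source` in one step, optionally by relation."""
        return sorted(
            target
            for edge_relation, target in self._edges.get(source, ())
            if relation is None or edge_relation == relation
        )

    def prerequisites_of(self, card_id: int) -> list[int]:
        """All cards that are direct or transitive prerequisites of `card_id`."""
        found: set[int] = set()
        stack = [card_id]
        while stack:
            current = stack.pop()
            for source, edges in self._edges.items():
                if ("prerequisite", current) in edges and source not in found:
                    found.add(source)
                    stack.append(source)
        found.discard(card_id)  # guard against cycles that loop back
        return sorted(found)
```

Keeping the relation vocabulary in one validated set makes human-in-the-loop editing safer, since a typo in a relation name is rejected instead of silently creating a new edge type.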

Project Principles

Local data should stay local by default.
Claims should be traceable to evidence.
SQLite should remain the durable source of truth.
Markdown should be portable, inspectable, and tool-friendly.
Advanced AI features should be compared against simple baselines.
User corrections should become evaluation data before they become training data.

License

To be determined before the first public release.
