
ytlang

Generate language teaching materials from YouTube videos, so that you can tailor your learning process.

Currently supports learning English, French, Spanish, Chinese, Japanese, and Korean from YouTube.

Given a YouTube URL, produces:

  • recording.html — embedded video, live bilingual transcript strip, vocab flashcard panel
  • handout.html — printable worksheet with vocab by level (basic / intermediate / advanced)
  • transcript.html — bilingual reference transcript with context annotations
  • quiz.html — interactive quiz with vocab and comprehension questions

Screenshots: Recording view, Handout, Transcript, Quiz (Flashcards, Fill in the Blank, Q&A).

Demo

Two pre-generated lessons are included in examples/ — no setup or API key needed. Clone the repo and open them directly in your browser:

| Video | Source | Native | Open |
| --- | --- | --- | --- |
| Gordon Ramsay Answers Cooking Questions From Twitter | English | Chinese | recording · handout · transcript · quiz |
| Sénégal : arbres de vie — ARTE Reportage | French | Korean | recording · handout · transcript · quiz |

Setup

Requirements: Python 3.11+, uv

| Package | Purpose |
| --- | --- |
| youtube-transcript-api | Fetch YouTube captions |
| yt-dlp | Fetch enriched video metadata (subprocess) |
| xai-sdk | Grok API client (preclean, translation, analysis) |
| typer | CLI framework |
| python-dotenv | Load .env file |

1. Clone and install

git clone https://github.com/QoriZii/ytlang.git
cd ytlang
uv sync

2. Config

cp .env.example .env

Edit .env and fill in your values — at minimum, set your xAI API key (get one free at console.x.ai):

XAI_API_KEY=xai-...              # required
XAI_MODEL=grok-4-1-fast-non-reasoning   # LLM model for all calls
YTLANG_OUTDIR=examples                   # where lessons are saved
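
As a rough sketch of how these three variables might be read at startup: the `Settings` class and `load_settings` helper below are hypothetical names, not ytlang's actual code, and the fallback values simply mirror the sample above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str
    model: str
    outdir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables (hypothetical helper)."""
    api_key = env.get("XAI_API_KEY", "").strip()
    if not api_key:
        # The key is the only value marked as required in .env.
        raise RuntimeError("XAI_API_KEY is not set; add it to .env")
    return Settings(
        api_key=api_key,
        # Assumed fallbacks, copied from the sample .env values.
        model=env.get("XAI_MODEL", "grok-4-1-fast-non-reasoning"),
        outdir=env.get("YTLANG_OUTDIR", "examples"),
    )
```

Failing early on a missing key keeps the error close to its cause instead of surfacing later as a failed API call.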

3. Generate your first lesson

Copy a YouTube video URL from your browser and paste it into the command. By default, the video is treated as English and the lesson is generated for Chinese-speaking learners. See Usage below for other language combinations.

uv run ytlang prep 'https://www.youtube.com/watch?v=...' --render

This fetches the transcript, translates it, generates vocab and quiz, and renders four HTML files into {YTLANG_OUTDIR}/<video_id>/.
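
To illustrate how the output folder relates to the URL, here is a minimal sketch that derives `<video_id>` from common YouTube URL shapes. Both helpers are hypothetical and only assume that the folder is named after the video ID, as described above; ytlang's own parsing may differ.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Extract the video ID from common YouTube URL shapes (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        video_id = parsed.path.lstrip("/").split("/")[0]
    else:
        # Watch links carry it in the query string: ...watch?v=<id>
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    if not video_id:
        raise ValueError(f"no video ID found in {url!r}")
    return video_id


def lesson_dir(outdir: str, url: str) -> Path:
    """Directory that would hold the lesson files for this URL."""
    return Path(outdir) / video_id_from_url(url)
```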

4. View the lesson

uv run ytlang serve

This command serves recording.html over HTTP and opens it in your browser, showing the embedded video player, synced bilingual transcript, and vocab cards.

How it works

YouTube URL + --lang / --native
    │
    ├─ fetch       youtube-transcript-api + yt-dlp metadata
    ├─ preclean    LLM → restore punctuation, merge ASR fragments into sentences
    ├─ translate   LLM → source lang to native lang
    └─ analyze     LLM → vocab, key points, quiz, transcript notes
         │
         └─ lesson.json 
              │
              ├─ recording.html
              ├─ handout.html
              ├─ transcript.html
              └─ quiz.html
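
The stage order in the diagram can be sketched as a chain of functions that each take the lesson state and return an extended copy. Every function body here is a placeholder (no network or LLM calls), and the names and dictionary keys are illustrative assumptions rather than ytlang's real internals.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def fetch(state: dict) -> dict:
    # Placeholder: the real stage pulls captions and metadata for state["url"].
    return {**state, "fragments": ["hello there", "how are you"]}


def preclean(state: dict) -> dict:
    # Placeholder for the LLM call that restores punctuation and merges fragments.
    sentence = " ".join(state["fragments"]).capitalize() + "."
    return {**state, "sentences": [sentence]}


def translate(state: dict) -> dict:
    # Placeholder for the LLM translation from source to native language.
    pairs = [
        {"source": s, "native": f"[{state['native']}] {s}"}
        for s in state["sentences"]
    ]
    return {**state, "pairs": pairs}


def analyze(state: dict) -> dict:
    # Placeholder for the LLM analysis (vocab, key points, quiz, notes).
    return {**state, "vocab": [], "quiz": []}


PIPELINE: list[Stage] = [fetch, preclean, translate, analyze]


def run(url: str, lang: str = "en", native: str = "zh") -> dict:
    """Run every stage in order and return the final lesson state."""
    state = {"url": url, "lang": lang, "native": native}
    for stage in PIPELINE:
        state = stage(state)
    return state
```

Keeping each stage a pure function of the previous state is one way to make individual stages re-runnable without repeating the earlier ones.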

lesson.json is the source of truth. Edit its vocab (add or remove entries, adjust levels or definitions) to suit your needs before rendering, and re-run render at any time to regenerate the HTML from it.
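
Vocab edits can also be scripted. The sketch below assumes a hypothetical schema in which lesson.json holds a top-level `vocab` list of objects with `word`, `level`, and `definition` keys; check your own lesson.json for the real field names before adapting it.

```python
import json
from pathlib import Path


def edit_vocab(lesson_path: Path, drop: set[str], relevel: dict[str, str]) -> int:
    """Remove and re-level vocab entries in place; returns the new entry count.

    Assumes a hypothetical schema: {"vocab": [{"word", "level", "definition"}, ...]}.
    """
    lesson = json.loads(lesson_path.read_text(encoding="utf-8"))
    kept = [v for v in lesson.get("vocab", []) if v["word"] not in drop]
    for entry in kept:
        if entry["word"] in relevel:
            entry["level"] = relevel[entry["word"]]
    lesson["vocab"] = kept
    # ensure_ascii=False keeps Chinese, Japanese, or Korean text readable on disk.
    lesson_path.write_text(
        json.dumps(lesson, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(kept)
```

After a change like this, `uv run ytlang render VIDEO_ID` regenerates the HTML from the edited file.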

Usage

Single video

# English video, Chinese learner (default)
uv run ytlang prep 'https://www.youtube.com/watch?v=VIDEO_ID'

# French video, Chinese learner
uv run ytlang prep 'https://www.youtube.com/watch?v=VIDEO_ID' --lang fr

# Japanese video, English learner
uv run ytlang prep 'https://www.youtube.com/watch?v=VIDEO_ID' --lang ja --native en

# Spanish video, Korean learner
uv run ytlang prep 'https://www.youtube.com/watch?v=VIDEO_ID' --lang es --native ko

# Render lesson.json → 4 HTML files (uses most recent video if no ID given)
uv run ytlang render
uv run ytlang render VIDEO_ID

# Open recording.html in browser after rendering
uv run ytlang render --open

# Prep + render in one step
uv run ytlang prep 'https://www.youtube.com/watch?v=VIDEO_ID' --lang fr --render

# Serve recording.html over HTTP and open in browser
uv run ytlang serve
uv run ytlang serve VIDEO_ID
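
When no ID is given, render and serve fall back to the most recent video. One plausible way to pick it is sketched here, on the assumption that "most recent" means the lesson.json with the latest modification time; the helper name is hypothetical and the tool may use a different rule.

```python
from pathlib import Path
from typing import Optional


def most_recent_video_id(outdir: Path) -> Optional[str]:
    """Return the ID of the lesson whose lesson.json changed last, or None."""
    candidates = [p for p in outdir.glob("*/lesson.json") if p.is_file()]
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    # Each lesson lives in a folder named after its video ID.
    return newest.parent.name
```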

Reprocess (without re-fetching transcripts)

# Re-run translate + analyze on existing lesson (uses lang from lesson.json)
uv run ytlang reprocess VIDEO_ID

# Override language on reprocess
uv run ytlang reprocess VIDEO_ID --lang fr --native en

# Re-run from preclean stage (uses saved raw.json)
uv run ytlang reprocess VIDEO_ID --from-preclean

# Only re-translate, skip analysis
uv run ytlang reprocess VIDEO_ID --no-analyze

# Only re-analyze, skip translation
uv run ytlang reprocess VIDEO_ID --no-translate

# Reprocess + render
uv run ytlang reprocess VIDEO_ID --render
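
The flag combinations above can be read as selecting a subset of pipeline stages. This sketch encodes that reading; the function is hypothetical, and the mapping is inferred from the comments above rather than taken from the CLI source.

```python
def stages_to_run(
    from_preclean: bool = False,
    translate: bool = True,
    analyze: bool = True,
) -> list[str]:
    """Map reprocess options to an ordered list of stage names."""
    stages = []
    if from_preclean:
        # --from-preclean starts again from the saved raw.json.
        stages.append("preclean")
    if translate:
        # Skipped when --no-translate is passed.
        stages.append("translate")
    if analyze:
        # Skipped when --no-analyze is passed.
        stages.append("analyze")
    if not stages:
        raise ValueError("nothing to do: every stage was disabled")
    # "fetch" never appears: reprocess does not re-fetch transcripts.
    return stages
```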

Output

With YTLANG_OUTDIR=examples, all output goes to examples/<video_id>/:

examples/
└── abc123xyz/
    ├── raw.json           ← original ASR fragments (for reprocess --from-preclean)
    ├── lesson.json        ← edit this before re-rendering
    ├── recording.html
    ├── handout.html
    ├── transcript.html
    └── quiz.html
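
A small check like the following can confirm that a lesson folder is complete before sharing it. The file names come from the tree above; the helper itself is a hypothetical convenience, not part of ytlang.

```python
from pathlib import Path

# File names taken from the output tree above.
EXPECTED_FILES = (
    "raw.json",
    "lesson.json",
    "recording.html",
    "handout.html",
    "transcript.html",
    "quiz.html",
)


def missing_outputs(folder: Path) -> list[str]:
    """List expected files that are absent from a lesson folder."""
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

An empty list means all six files are present; a list containing only the HTML files suggests prep ran without render.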

Supported languages

| Code | Language | As source (learn) | As native | Vocab levels |
| --- | --- | --- | --- | --- |
| en | English | Yes | Yes | CEFR A2–C1 |
| fr | French | Yes | Yes | DELF A1–DALF C1 |
| es | Spanish | Yes | Yes | DELE A1–C1 |
| zh | Chinese (Simplified) | Yes | Yes | HSK 1–6+ |
| ja | Japanese | Yes | Yes | JLPT N5–N1 |
| ko | Korean | Yes | Yes | TOPIK I–II |
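
For scripting around the CLI, the table can be mirrored as a lookup that validates a --lang / --native pair before calling prep. The data is copied from the table; the `describe_pair` helper is hypothetical, and it assumes the vocab scale follows the source language.

```python
# code -> (language name, vocab level scale), copied from the table above
LANGUAGES = {
    "en": ("English", "CEFR A2–C1"),
    "fr": ("French", "DELF A1–DALF C1"),
    "es": ("Spanish", "DELE A1–C1"),
    "zh": ("Chinese (Simplified)", "HSK 1–6+"),
    "ja": ("Japanese", "JLPT N5–N1"),
    "ko": ("Korean", "TOPIK I–II"),
}


def describe_pair(lang: str = "en", native: str = "zh") -> tuple[str, str, str]:
    """Return (source language, native language, vocab scale) or raise ValueError."""
    for flag, code in (("--lang", lang), ("--native", native)):
        if code not in LANGUAGES:
            supported = ", ".join(sorted(LANGUAGES))
            raise ValueError(f"{flag} {code!r} is not supported; choose from {supported}")
    source_name, levels = LANGUAGES[lang]
    native_name, _ = LANGUAGES[native]
    # Assumption: vocab is graded on the scale of the language being learned.
    return source_name, native_name, levels
```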

Future work

  • More LLM providers — support OpenAI, Claude, Gemini, and local models alongside Grok
  • More languages — Arabic, German, Portuguese, Hindi, Thai, Vietnamese
  • More quiz types — listening comprehension, sentence reordering, dictation, shadowing exercises
  • Batch processing — prep multiple videos from a URL list in one command
