MediaFlusher is a public baseline for a Frozen CLIP scoring stack and a Telegram media curation pipeline.
This repository intentionally keeps only the parts that are in active use:
- Frozen CLIP training and inference
- Telegram gated download, scoring, rebucketing, and flat-link materialization
- Minimal API and storage code needed by the current pipeline
Removed from the public baseline:
- Early `heretic` / abliteration experiments
- Qwen / large-model inference routes
- LoRA training flows and related scripts
The main published flow is:
- Build/label data in webui (`/label`), export `labels.json`.
- Import to sqlite if needed via `scripts/import_to_db.py`.
- Train or resume the Frozen CLIP scorer with `scripts/train_frozen_clip.py`.
- Run Telegram gated download with `scripts/run_tg_gated_download.py`.
- Orchestrate full end-to-end via `scripts/run_telegram_global_pipeline.py` (download, optional bulk re-score, bucket, prune).
- Materialize high-score outputs under the configured `target_root` and `flat_links_root`.
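The script-backed steps of the flow above can be sketched as an ordered dry-run plan. This is a minimal illustration, not part of the repository: `STAGES`, `build_commands`, and `print_plan` are hypothetical names, the scripts are assumed to be launched with the current Python interpreter, and no command-line arguments are shown because this README does not list them. Steps with no script named here (WebUI labeling, output materialization) are left out.

```python
import sys
from pathlib import Path

# Script paths come from the flow above. Arguments are intentionally omitted:
# this README does not document them, so none are invented here.
STAGES = [
    ("import labels into sqlite (optional)", "scripts/import_to_db.py"),
    ("train or resume the Frozen CLIP scorer", "scripts/train_frozen_clip.py"),
    ("Telegram gated download", "scripts/run_tg_gated_download.py"),
    ("end-to-end orchestration", "scripts/run_telegram_global_pipeline.py"),
]


def build_commands(repo_root, python=sys.executable):
    """Return (description, argv) pairs for each stage, in published order."""
    root = Path(repo_root)
    return [(desc, [python, str(root / script)]) for desc, script in STAGES]


def print_plan(repo_root="."):
    """Print the commands without executing them (a dry run)."""
    for number, (desc, argv) in enumerate(build_commands(repo_root), start=1):
        print(f"{number}. {desc}: {' '.join(argv)}")


if __name__ == "__main__":
    print_plan()
```

The sketch only prints the plan. Since the global pipeline script already covers the download stage, a real run would likely use either the single gated download or the full orchestration, not both.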
Detailed Telegram pipeline behavior is documented in docs/telegram_global_pipeline.md.
The Frozen CLIP model details and training reference are in docs/frozen_clip_model.md.
- Labeling entry (WebUI): `./start.sh frontend` -> `webui` -> `/label` (labeling) / `/pipeline` (job entry)
- WebAPI: `./start.sh api` -> `src/main.py`
- Training: `scripts/train_frozen_clip.py`
- Training data import: `scripts/import_to_db.py` (used with `--train` / `--val`)
- Label export: `webui` `/label` page
- Single gated download: `scripts/run_tg_gated_download.py`
- Full pipeline orchestration: `scripts/run_telegram_global_pipeline.py`
- Full re-inference: `scripts/bulk_infer_telegram.py`
- Rebucketing by score: `scripts/rebucket_telegram_by_score.py`
- Low-score pruning: `scripts/prune_telegram_below_score.py`
- Dataset splitting: `scripts/split_dataset.py`
- Frozen CLIP training: `scripts/train_frozen_clip.py`
- Telegram gated download: `scripts/run_tg_gated_download.py`
- Telegram global pipeline: `scripts/run_telegram_global_pipeline.py`
- Frozen CLIP runtime config: `src/config.py`
- Public project config: `configs/config.yaml`
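As a quick sanity check after cloning, the key files listed above can be verified with a few lines of Python. This is a sketch only: `KEY_FILES` and `missing_key_files` are hypothetical names, not part of the repository, and the check assumes it is run from (or pointed at) the repository root.

```python
from pathlib import Path

# Paths copied from the key-file list above.
KEY_FILES = [
    "scripts/train_frozen_clip.py",
    "scripts/run_tg_gated_download.py",
    "scripts/run_telegram_global_pipeline.py",
    "src/config.py",
    "configs/config.yaml",
]


def missing_key_files(repo_root="."):
    """Return the key files that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in KEY_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_key_files()
    if missing:
        print("Missing key files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All key files are present.")
```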
Use one command as the single Linux entry:
`./start.sh`
This starts both API and WebUI in the foreground with shared shutdown. Other options:
- `./start.sh api`: start backend API only.
- `./start.sh frontend`: start WebUI only.
- `./start.sh --del-cache`: clear cache dirs before startup.
- `./start.sh --build-webui`: build frontend and preview it instead of dev mode.
The public repository does not ship:
- Model checkpoints or pretrained weights
- Telegram sessions, caches, local databases, or downloaded media
- Local override config such as `configs/config.local.yaml`
Those artifacts are intentionally gitignored.