A UV script is a single Python file that declares its own dependencies inline: a portable unit you run with `uv run` where you have the hardware, or hand to `hf jobs uv run` on Hugging Face Jobs for a GPU. Chain several into a pipeline.
Each script carries its own dependencies, so people and agents can run one without cloning a repo, making a virtualenv, or installing a requirements.txt first.
A recipe here is one such script. Most read and write the Hugging Face Hub, so one script's output dataset becomes the next one's input.
First, install uv — it's the only thing you install; every script brings its own Python dependencies:
curl -LsSf https://astral.sh/uv/install.sh | sh
Run a recipe on a GPU: point Hugging Face Jobs at the script's URL and it runs on managed hardware, so you don't need a GPU of your own. Here davanstrien/ufo-ColPali is a small public image dataset you can use as-is; the output lands in your namespace:
hf jobs uv run --flavor l4x1 --secrets HF_TOKEN \
https://huggingface.co/datasets/uv-scripts/ocr/raw/main/glm-ocr.py \
davanstrien/ufo-ColPali your-username/ufo-ocr
No pip install, no local setup. `--secrets HF_TOKEN` forwards your token so the job can write the output dataset back to the Hub. (Jobs needs the `hf` CLI, installed with `uv tool install huggingface_hub`, and a Hugging Face account with pay-as-you-go credit. No subscription is needed: usage is billed by the second, and a small CPU job costs ~$0.01/hr. Run `hf jobs hardware` for current flavors and prices.)
Prefer your own machine? A recipe is just a UV script, so on a box with the hardware it needs — most recipes here want a CUDA GPU — you can run it (or inspect it with --help) directly, no Jobs required:
uv run https://huggingface.co/datasets/uv-scripts/ocr/raw/main/glm-ocr.py --help
A UV script is a normal Python file with a metadata block at the top that lists its dependencies:
# /// script
# requires-python = ">=3.10"
# dependencies = ["datasets", "transformers", "torch"]
# ///
Normally, running someone's Python script means cloning their repo, making a virtual environment, and pip-installing a requirements.txt first, and if your versions don't match theirs, it can still break. Here the dependencies live inside the file, in that comment block, so uv (and `hf jobs uv run`) reads them, installs exactly what the block specifies into a throwaway environment, and runs the file straight from a URL, with nothing to set up. This is the standard PEP 723 inline-script-metadata format; see the uv scripts guide to learn more.
A self-contained, pinned script is easy to run and reuse, for a few reasons:
- Discrete & single-purpose: one script, one job. That job can be a two-second transform or a multi-hour fine-tune; either way it's one self-contained unit you pick by reading a header instead of a whole codebase.
- Self-describing: the PEP 723 dependency block, the docstring, and `--help` tell you what it needs and how to call it.
- Reproducible: dependencies are pinned in the file, so there's no env drift and no "works on my machine."
- Composable: recipes hand off through the Hub (usually a dataset in, a dataset or model out), so you can chain them into a pipeline.
- Portable: one self-contained file; run it with `uv run` where you have the hardware (most recipes need a GPU), or `hf jobs uv run` it on a managed GPU.
Built for agents, too. Every recipe takes its arguments in the same order (input, then output) and runs from a URL, so an AI agent can pick a tool from its header and run it with no setup. On Jobs the agent runs in a sandbox with a throwaway disk, access limited to what the token's repo permissions allow, and a cost cap per job, rather than running arbitrary code on your machine. (Hugging Face also ships an hf CLI skill for agents for driving Jobs from an editor.) This repo also ships a ready-to-use uv-recipes agent skill; point your agent at it to discover, run, and adapt recipes.
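Since every recipe shares the same calling convention, an agent (or a small wrapper) only needs one function to build a run command. The sketch below assumes nothing beyond the flags already shown in this README (`--flavor`, `--secrets HF_TOKEN`); the helper name `build_command` is hypothetical, and it only builds the command without executing it.

```python
import shlex


def build_command(script_url, input_dataset, output_dataset, flavor=None):
    """Return the argv list for one recipe run.

    flavor=None builds a local `uv run`; any other value builds an
    `hf jobs uv run` on that hardware flavor. Hypothetical helper.
    """
    if flavor is None:
        prefix = ["uv", "run"]
    else:
        prefix = ["hf", "jobs", "uv", "run", "--flavor", flavor, "--secrets", "HF_TOKEN"]
    # Same positional order for every recipe: script, input, output.
    return prefix + [script_url, input_dataset, output_dataset]


SCRIPT = "https://huggingface.co/datasets/uv-scripts/ocr/raw/main/glm-ocr.py"

local_cmd = build_command(SCRIPT, "davanstrien/ufo-ColPali", "your-username/ufo-ocr")
jobs_cmd = build_command(
    SCRIPT, "davanstrien/ufo-ColPali", "your-username/ufo-ocr", flavor="l4x1"
)

# Print shell-ready strings; pass the lists to subprocess.run(cmd, check=True)
# to actually launch them.
print(shlex.join(local_cmd))
print(shlex.join(jobs_cmd))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.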
| Domain | What it does | On the Hub |
|---|---|---|
| ocr ⭐ | OCR / document → text & structured data — GLM, PaddleOCR-VL, Nanonets, olmOCR, dots, … (30+ models) | uv-scripts/ocr |
| vision | Zero-shot detection & segmentation over image datasets | sam3 · object-detection · vlm-object-detection |
| audio | Transcription & speech translation | transcription |
| embeddings & atlas | Embed a dataset; build an interactive map | build-atlas |
| data processing | Filter / dedup / stats over large datasets | dataset-stats · deduplication · classification |
| dataset creation | Turn PDFs / image URLs into Hub datasets | dataset-creation · iiif-tiles |
| synthetic data | Generate datasets with LLMs | synthetic-data |
| inference | Run any open LLM / VLM over a dataset | vllm · openai-oss · transformers-inference |
| entity extraction | NER / structured extraction over text | gliner |
| …and more | Training, evaluation, RAG indexing — migrating as they mature | training · transformers-training |
Most recipes now live in this repo; the rest link to the uv-scripts Hugging Face org, where they run today, and will migrate here over time. (Each folder mirrors to its Hub dataset repo.)
What fits here: any self-contained UV script for data or ML work on the Hub. OCR and dataset work are the current focus, but inference, evaluation, RAG indexing, and training (fine-tuning with TRL / transformers, producing a model) are all in scope. If it's one pinned script that reads from or writes to the Hub, it belongs.
Because recipes hand off through the Hub, you can chain them — each step's output dataset is the next step's input. A document-collection pipeline, end to end:
PDFs / scans      →  OCR to markdown  →  dedup + stats  →  embed + visualise
dataset-creation     ocr/glm-ocr.py      deduplication     build-atlas
Each arrow is a Hub dataset; each box is one hf jobs uv run (or uv run), and every box runs today from its Hub URL, even before it's migrated into this repo. A pipeline can also end in a trained model instead of another dataset. You can write the chain as a shell script, or an agent can generate it — the scripts are the same.
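The hand-off can be sketched as a small planning function that threads dataset names through a list of steps. This is an illustration under stated assumptions, not a shipped tool: the `chain` helper, the dataset names, and the `<script>.py` file names are placeholders (only the glm-ocr.py URL appears in this README), and it assumes each step follows the input-then-output argument order.

```python
# Placeholder URLs: the "<script>.py" file names are NOT real paths. Substitute
# the actual script from each Hub repo before running anything.
BASE = "https://huggingface.co/datasets/uv-scripts"
STEPS = [
    (f"{BASE}/ocr/raw/main/glm-ocr.py", "docs-ocr"),
    (f"{BASE}/deduplication/raw/main/<script>.py", "docs-dedup"),
    (f"{BASE}/build-atlas/raw/main/<script>.py", "docs-atlas"),
]


def chain(first_input, steps, namespace):
    """Plan a pipeline: each step's output dataset becomes the next step's input.

    Returns a list of (script_url, input_dataset, output_dataset) tuples.
    """
    plan = []
    current = first_input
    for script_url, output_name in steps:
        output = f"{namespace}/{output_name}"
        plan.append((script_url, current, output))
        current = output  # hand off through the Hub
    return plan


# "your-username/docs-pages" stands in for the dataset produced by dataset-creation.
plan = chain("your-username/docs-pages", STEPS, "your-username")
for script_url, src, dst in plan:
    print(f"{src} -> {dst}  via {script_url}")
```

Each tuple maps onto one `hf jobs uv run` (or `uv run`) invocation, so the same plan works locally or on Jobs.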
A recipe is the same file wherever you run it — on a machine with the hardware it needs, or on Hugging Face Jobs for a managed GPU. Same file, same arguments:
SCRIPT=https://huggingface.co/datasets/uv-scripts/ocr/raw/main/glm-ocr.py
# locally — needs the right hardware (a GPU for most recipes)
uv run $SCRIPT davanstrien/ufo-ColPali your-username/ufo-ocr
# on a managed GPU — pick hardware with --flavor; --secrets forwards your write token
hf jobs uv run --flavor l4x1 --secrets HF_TOKEN $SCRIPT davanstrien/ufo-ColPali your-username/ufo-ocr
Why reach for Jobs:
- Pay by the second: billed only while the job runs. Run `hf jobs hardware`, or see the flavors and pricing.
- No infra: `hf jobs uv run <url>` and you're done. See the `hf jobs` CLI.
- Hub-native: read and write datasets, models, and storage buckets directly. Running from the `https://huggingface.co/datasets/uv-scripts/…` URL also attributes usage to the recipe.
These scripts are orchestration code: they download third-party models from the Hugging Face Hub at runtime and run inference. This repo does not redistribute any model weights. Each model you run carries its own license (MIT, Apache-2.0, OpenRAIL-M, and some with non-commercial or other use-based terms); those terms govern your use of the model, not this repo's code. You are responsible for checking each model's license — on its Hugging Face model card — before using it, especially in production.
The code and documentation in this repository are licensed under the Apache License 2.0. See NOTICE for attribution.
Recipes mirror to the uv-scripts Hugging Face org via GitHub Actions. See CONTRIBUTING.md to add one.