uv-scripts-for-ai

A UV script is a single Python file that declares its own dependencies inline — a portable unit you run with uv run where you have the hardware, or hand to hf jobs uv run on Hugging Face Jobs for a GPU. Chain several into a pipeline.

Each script carries its own dependencies, so people and agents can run one without cloning a repo, making a virtualenv, or installing a requirements.txt first.

A recipe here is one such script. Most read and write the Hugging Face Hub, so one script's output dataset becomes the next one's input.

Quickstart

First, install uv — it's the only thing you install; every script brings its own Python dependencies:

curl -LsSf https://astral.sh/uv/install.sh | sh

Run a recipe on a GPU — point Hugging Face Jobs at the script's URL and it runs on managed hardware, no GPU of your own needed. Here davanstrien/ufo-ColPali is a small public image dataset you can use as-is; the output lands in your namespace:

hf jobs uv run --flavor l4x1 --secrets HF_TOKEN \
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/glm-ocr.py \
  davanstrien/ufo-ColPali your-username/ufo-ocr

No pip install, no local setup. --secrets HF_TOKEN forwards your token so the job can write the output dataset back to the Hub. (Jobs needs the hf CLI — uv tool install huggingface_hub — and a Hugging Face account with pay-as-you-go credit — no subscription needed; it's billed by the second, and a small CPU job costs ~$0.01/hr. Run hf jobs hardware for current flavors and prices.)

Prefer your own machine? A recipe is just a UV script, so on a box with the hardware it needs — most recipes here want a CUDA GPU — you can run it (or inspect it with --help) directly, no Jobs required:

uv run https://huggingface.co/datasets/uv-scripts/ocr/raw/main/glm-ocr.py --help

What's a UV script?

A normal Python file with a metadata block at the top that lists its dependencies:

# /// script
# requires-python = ">=3.10"
# dependencies = ["datasets", "transformers", "torch"]
# ///

Normally, running someone's Python script means cloning their repo, making a virtual environment, and pip install-ing a requirements.txt first — and if your versions don't match theirs, it can still break. Here the dependencies live inside the file, in that comment block, so uv (and hf jobs uv run) reads them, installs exactly those versions into a throwaway environment, and runs the file — straight from a URL, with nothing to set up. This is the standard PEP 723 inline-script-metadata format; see the uv scripts guide to learn more.

Why UV scripts

A self-contained, pinned script is easy to run and reuse, for a few reasons:

Discrete & single-purpose — one script, one job. That job can be a two-second transform or a multi-hour fine-tune; either way it's one self-contained unit you pick by reading a header instead of a whole codebase.
Self-describing — the PEP 723 dependency block, the docstring, and --help tell you what it needs and how to call it.
Reproducible — dependencies are pinned in the file, so there's no env drift and no "works on my machine."
Composable — recipes hand off through the Hub (usually a dataset in, a dataset or model out), so you can chain them into a pipeline.
Portable — one self-contained file; run it with uv run where you have the hardware (most recipes need a GPU), or hf jobs uv run it on a managed GPU.

Built for agents, too. Every recipe takes its arguments in the same input output order and runs from a URL, so an AI agent can pick a tool from its header and run it with no setup. On Jobs the agent runs in a sandbox: a throwaway disk, access limited to what the token's repo permissions allow, and a cost cap per job — not arbitrary code on your machine. (Hugging Face also ships an hf CLI skill for agents for driving Jobs from an editor.) This repo also ships a ready-to-use uv-recipes agent skill — point your agent at it to discover, run, and adapt recipes.

Recipes

Domain	What it does	On the Hub
ocr ⭐	OCR / document → text & structured data — GLM, PaddleOCR-VL, Nanonets, olmOCR, dots, … (30+ models)	`uv-scripts/ocr`
vision	Zero-shot detection & segmentation over image datasets	`sam3` · `object-detection` · `vlm-object-detection`
audio	Transcription & speech translation	`transcription`
embeddings & atlas	Embed a dataset; build an interactive map	`build-atlas`
data processing	Filter / dedup / stats over large datasets	`dataset-stats` · `deduplication` · `classification`
dataset creation	Turn PDFs / image URLs into Hub datasets	`dataset-creation` · `iiif-tiles`
synthetic data	Generate datasets with LLMs	`synthetic-data`
inference	Run any open LLM / VLM over a dataset	`vllm` · `openai-oss` · `transformers-inference`
entity extraction	NER / structured extraction over text	`gliner`
…and more	Training, evaluation, RAG indexing — migrating as they mature	`training` · `transformers-training`

Most recipes now live in this repo; the rest link to the uv-scripts Hugging Face org where they run today, and migrate here over time. (each folder mirrors to its Hub dataset repo.)

What fits here: any self-contained UV script for data or ML work on the Hub. OCR and dataset work are the current focus, but inference, evaluation, RAG indexing, and training (fine-tuning with TRL / transformers, producing a model) are all in scope. If it's one pinned script that reads from or writes to the Hub, it belongs.

Compose a pipeline

Because recipes hand off through the Hub, you can chain them — each step's output dataset is the next step's input. A document-collection pipeline, end to end:

PDFs / scans          →   OCR to markdown      →   dedup + stats        →   embed + visualise
dataset-creation          ocr/glm-ocr.py           deduplication            build-atlas

Each arrow is a Hub dataset; each box is one hf jobs uv run (or uv run), and every box runs today from its Hub URL, even before it's migrated into this repo. A pipeline can also end in a trained model instead of another dataset. You can write the chain as a shell script, or an agent can generate it — the scripts are the same.

Portable: run it locally or on Jobs

A recipe is the same file wherever you run it — on a machine with the hardware it needs, or on Hugging Face Jobs for a managed GPU. Same file, same arguments:

SCRIPT=https://huggingface.co/datasets/uv-scripts/ocr/raw/main/glm-ocr.py

# locally — needs the right hardware (a GPU for most recipes)
uv run $SCRIPT davanstrien/ufo-ColPali your-username/ufo-ocr

# on a managed GPU — pick hardware with --flavor; --secrets forwards your write token
hf jobs uv run --flavor l4x1 --secrets HF_TOKEN $SCRIPT davanstrien/ufo-ColPali your-username/ufo-ocr

Why reach for Jobs:

Pay by the second — billed only while the job runs. Run hf jobs hardware, or see the flavors and pricing.
No infra — hf jobs uv run <url> and you're done. See the hf jobs CLI.
Hub-native — read and write datasets, models, and storage buckets directly. Running from the https://huggingface.co/datasets/uv-scripts/… URL also attributes usage to the recipe.

Model licenses

These scripts are orchestration code: they download third-party models from the Hugging Face Hub at runtime and run inference. This repo does not redistribute any model weights. Each model you run carries its own license (MIT, Apache-2.0, OpenRAIL-M, and some with non-commercial or other use-based terms); those terms govern your use of the model, not this repo's code. You are responsible for checking each model's license — on its Hugging Face model card — before using it, especially in production.

License

The code and documentation in this repository are licensed under the Apache License 2.0. See NOTICE for attribution.

Recipes mirror to the uv-scripts Hugging Face org via GitHub Actions. See CONTRIBUTING.md to add one.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.claude/skills		.claude/skills
.github		.github
build-atlas		build-atlas
dataset-stats		dataset-stats
deduplication		deduplication
gliner		gliner
iiif-tiles		iiif-tiles
jobs-utils		jobs-utils
object-detection		object-detection
ocr		ocr
org-card		org-card
sam3		sam3
skills/uv-recipes		skills/uv-recipes
tools		tools
transcription		transcription
vllm		vllm
vlm-object-detection		vlm-object-detection
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

uv-scripts-for-ai

Quickstart

What's a UV script?

Why UV scripts

Recipes

Compose a pipeline

Portable: run it locally or on Jobs

Model licenses

License

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

uv-scripts-for-ai

Quickstart

What's a UV script?

Why UV scripts

Recipes

Compose a pipeline

Portable: run it locally or on Jobs

Model licenses

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages