A single-host diagnostic daemon that records NVIDIA GPU utilization to SQLite and produces a retrospective report separating active use from allocated-but-idle ("idle-held") and truly idle (no process at all).
Conventional dashboards collapse the latter two. Surfacing
idle-held as its own number is the entire point. Someone left a
Jupyter notebook open with an 8 GB tensor on the GPU and went to
lunch — nvidia-smi will show 1% utilization, but the card is
unusable by anyone else. This tool measures that.
Status: bare-metal 1.0.
gua doctor checks only the current machine, daemon records NVML telemetry from the current NVIDIA host, report reads the resulting SQLite database, and demo runs anywhere with fake telemetry. The Go v0.1.0 implementation remains downloadable at tag v0.1.0 / branch go-archive.
The recommended install path is PyPI via uv.
Requires uv. In normal online environments, uv creates the isolated tool environment and manages the needed Python runtime. If Python downloads are disabled by local policy, install Python 3.12+ first.
uv tool install gpu-usage-audit
gua doctor
gua daemon --interval 30s
gua status
gua report --since 1h --interval 30s
gua stop
gua doctor is intentionally read-only. It checks only the current
machine: OS/kernel/Python, /dev/nvidia*, nvidia-smi -L, NVML
load/init/device count/driver version, and the database path the daemon
would write to. The default is /tmp/gua.db; pass gua doctor --db PATH
when you plan to use a different daemon database.
Use gua doctor --json for the same report in a machine-readable form.
The JSON includes local paths, command stderr, and nvidia-smi -L output
with GPU UUIDs, so review it before sharing it outside your team.
gua doctor does not need sudo; run it as the same user that will run
the daemon.
Available gua subcommands: doctor, daemon, start, status,
stop, report, demo, version, help.
Update or remove the installed tool with uv:
uv tool upgrade gpu-usage-audit
uv tool uninstall gpu-usage-audit
uv tool uninstall gpu-usage-audit removes the installed Python tool and
its gua / gpu-usage-audit commands.
GitHub Release assets are also available for manual download:
BASE="https://github.com/AI-Ocean/gpu-usage-audit/releases/download/v1.0.2"
WHEEL="gpu_usage_audit-1.0.2-py3-none-any.whl"
curl -fsSLO "$BASE/$WHEEL"
curl -fsSLO "$BASE/SHA256SUMS"
sha256sum -c SHA256SUMS --ignore-missing
uvx --from "./$WHEEL" gua doctor
$ gua report --since 1h --interval 30s
gua — lab-a100 (bare, driver 560.35.05) Window: 1:00:00
§1 Headline
basis: one sample = one GPU card at one daemon tick
rules: active >=10% util; idle-held <10% util with >100 MB process memory
█████████▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░
active █ 15.7%
idle-held ▒ 45.1% ← this is the number conventional tools miss
truly-idle ░ 39.2%
(51 samples)
§2 Idle capacity
converted from card-ticks to GPU-hours using the report --interval
idle-held: ~0.31 GPU-hours, ~1.53 GPUs equivalently unavailable
truly-idle: ~0.12 GPU-hours, ~1.00 GPUs equivalently free
§3 Per-GPU
per-card share of samples in the same three states
GPU-0 active 47.1% idle-held 35.3% truly-idle 17.6%
GPU-1 active 0.0% idle-held 100.0% truly-idle 0.0%
GPU-2 active 0.0% idle-held 0.0% truly-idle 100.0%
§4 Top identities
one identity counts once per GPU/tick after its processes are summed
identity gpu-hours idle-held samples
alice 0.42 42.9% 51
bob 0.28 100.0% 34
§5 Time-of-day heatmap (UTC)
darker means higher active share; blank means no samples
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
Mon .
The 3-bar collapses every card × every tick over the window into the
active / idle-held / truly-idle split. idle-held rows are the
embarrassing category: a process is holding GPU memory but the SM
utilization is below 10%. §2 converts those card-ticks into GPU-hours
with --interval; §4 groups process rows by identity, GPU, and tick
before ranking users, so multiple same-user processes on one GPU/tick
count once.
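The §4 counting rule can be sketched as follows. This is a minimal illustration, not the tool's actual code: the row shape (dicts with identity, gpu, tick, and pid keys) and the function name identity_gpu_hours are assumptions made for the example.

```python
from collections import defaultdict


def identity_gpu_hours(process_rows, interval_s):
    """Convert per-process rows into GPU-hours per identity.

    Assumed row shape (hypothetical): {"identity", "gpu", "tick", "pid"}.
    """
    # Collapse processes first: one identity on one GPU at one tick counts once,
    # no matter how many of its processes were present.
    seen = {(row["identity"], row["gpu"], row["tick"]) for row in process_rows}

    card_ticks = defaultdict(int)
    for identity, _gpu, _tick in seen:
        card_ticks[identity] += 1

    # Each card-tick represents interval_s seconds on one GPU.
    return {who: n * interval_s / 3600 for who, n in card_ticks.items()}


rows = [
    {"identity": "alice", "gpu": 0, "tick": 1, "pid": 100},
    {"identity": "alice", "gpu": 0, "tick": 1, "pid": 101},  # same GPU/tick: not counted twice
    {"identity": "alice", "gpu": 1, "tick": 1, "pid": 102},
    {"identity": "bob", "gpu": 1, "tick": 1, "pid": 200},
]
hours = identity_gpu_hours(rows, interval_s=30)
```

Here alice ends up with two card-ticks (GPU 0 and GPU 1 at tick 1) and bob with one, even though alice had three processes.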
The demo subcommand records 30 ticks of fake telemetry and prints the
report — all in one process, no second shell needed.
gua demo
The bundled FakeTier produces a deterministic 5-tick workload (active
learning → idle-held memory → cleanup), so the output is the same every
run. Adjust the shape with --ticks N and --interval D.
On an NVIDIA host, start with doctor:
gua doctor
Doctor should show the current machine, visible /dev/nvidia* device
files, nvidia-smi -L GPUs, NVML device count, and /tmp/gua.db status.
nvidia-ml-py is installed by default with gpu-usage-audit; if doctor
reports that pynvml is not importable, reinstall the isolated tool
environment:
uv tool install --force gpu-usage-audit
If pynvml imports but NVML init fails, fix the host NVIDIA driver
installation instead. libnvidia-ml.so.1 must be available and match the
loaded kernel driver; nvidia-smi -L should list GPUs before the daemon
can collect real telemetry.
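To tell the two failure modes apart by hand (pynvml not importable versus NVML init failing), a small probe along these lines may help. It is a sketch, not what gua doctor runs internally; the function name probe_nvml and the shape of the returned dict are invented for illustration. It uses only the pynvml calls nvmlInit, nvmlDeviceGetCount, nvmlSystemGetDriverVersion, and nvmlShutdown.

```python
def probe_nvml() -> dict:
    """Report how far NVML setup gets on this host (hypothetical helper)."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
    except ImportError as exc:
        # Packaging problem: reinstall the tool environment.
        return {"ok": False, "stage": "import", "error": str(exc)}

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as exc:
        # Host problem: driver or libnvidia-ml.so.1 is missing or mismatched.
        return {"ok": False, "stage": "init", "error": str(exc)}

    try:
        driver = pynvml.nvmlSystemGetDriverVersion()
        if isinstance(driver, bytes):  # older bindings return bytes
            driver = driver.decode()
        return {
            "ok": True,
            "stage": "done",
            "device_count": pynvml.nvmlDeviceGetCount(),
            "driver": driver,
        }
    finally:
        pynvml.nvmlShutdown()


result = probe_nvml()
print(result)
```

A stage of "import" points at the Python environment, while "init" points at the host driver installation.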
Then run the collector:
gua daemon --interval 30s
gua status
Run the report:
gua report --since 1h --interval 30s
Stop the background collector when the collection window is done:
gua stop
If --db is omitted, both daemon and report use /tmp/gua.db.
daemon refuses to start when that database file already exists, so a
new collection run does not silently append to an old test database. If
gua doctor reports that the database already exists, either run
gua report against the existing data or choose a fresh --db PATH for
the next daemon run.
The daemon requires the NVIDIA driver and
libnvidia-ml.so.1. On a driverless host it exits with a friendly NVML initialization error. For a driverless box, use demo instead.
gua has several commands that share one SQLite file. The gpu-usage-audit entry
point remains installed for compatibility, but new examples use gua.
| Command | What it does |
|---|---|
| daemon | Starts the collector in the background. Samples real NVML telemetry on every tick and writes to a new database. NVIDIA host required. |
| start | Alias for gua daemon. |
| status | Shows whether the background collector PID is still running. Also clears a stale PID file when it points to a missing or unrelated process. |
| stop | Stops the background collector with SIGTERM. |
| report | One-shot read against the accumulated database. Safe to run while the daemon is still writing; SQLite WAL mode handles the concurrency. |
| demo | Self-contained showcase. Records N fake ticks and immediately prints the report. No GPU, no second shell, no operational meaning; it exists only to show the output shape. |
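The claim that report can read while the daemon writes rests on standard SQLite WAL behavior, which can be checked with Python's built-in sqlite3 module. The sketch below uses a throwaway database and a hypothetical samples table; the real gua schema is not described here, so treat the table and column names as placeholders.

```python
import sqlite3
import tempfile
from pathlib import Path


def wal_demo() -> tuple[str, int, int]:
    """Return (journal mode, rows seen mid-write, rows seen after commit)."""
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "gua-demo.db"  # throwaway path, not /tmp/gua.db

        writer = sqlite3.connect(db)
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
        # Hypothetical table, for illustration only.
        writer.execute("CREATE TABLE samples (tick INTEGER, gpu INTEGER, util_pct REAL)")
        writer.executemany(
            "INSERT INTO samples VALUES (?, ?, ?)",
            [(1, 0, 47.0), (1, 1, 0.0)],
        )
        writer.commit()

        # Start another insert but leave the write transaction open,
        # like a daemon that is in the middle of a tick.
        writer.execute("INSERT INTO samples VALUES (2, 0, 3.0)")

        # A second connection plays the role of the report.
        reader = sqlite3.connect(db)
        query = "SELECT COUNT(*) FROM samples"
        mid_write = reader.execute(query).fetchall()[0][0]

        writer.commit()
        after_commit = reader.execute(query).fetchall()[0][0]

        reader.close()
        writer.close()
        return mode, mid_write, after_commit


print(wal_demo())
```

The reader is not blocked by the open write transaction: it sees the two committed rows, and the third only after the writer commits.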
gua daemon [--db PATH] [--interval D] [--pid-file PATH] [--log-file PATH]
gua start [--db PATH] [--interval D] [--pid-file PATH] [--log-file PATH]
gua daemon --foreground [--db PATH] [--interval D]
- --db PATH (default /tmp/gua.db): SQLite file to create and write to. The daemon exits with an error if the file already exists. WAL mode is enabled automatically.
- --interval D (default 30s): how often to sample. Accepts 30s, 1m, 200ms, etc.
- --pid-file PATH (default /tmp/gua.pid): background PID file.
- --log-file PATH (default /tmp/gua.log): stdout/stderr from the background collector.
- --foreground: keep the collector attached to the current process. Use this for systemd or debugging.
By default, gua daemon returns after the collector starts. Each tick is
written to the log file; on shutdown the cumulative row count is written
there too. gua daemon --foreground prints the tick summaries directly
to the terminal and exits on Ctrl+C, SIGTERM, or systemctl stop.
gua status and gua stop verify that the PID file points to the
managed collector before acting on it; stale PID files are cleared.
gua report [--db PATH] [--since D] [--interval D] [--width N]
- --db PATH (default /tmp/gua.db): same SQLite file the daemon writes to. The report exits with an error if the file does not exist.
- --since D (default 1h): the report window. There is no upper bound; --since 365d is accepted. The effective window is min(--since, age of oldest sample), so passing a huge --since is the same as "all data". Units: ms, s, m, h, d (d is now supported; use 7d).
- --interval D (default 30s): must match what the daemon used. This is how §2 (Idle capacity) and §4 (Top identities) convert tick counts to GPU-hours. Mismatched intervals produce wrong GPU-hours.
- --width N (default 60): width of the §1 three-bar in characters.
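The duration units, the effective-window rule, and the tick-to-GPU-hours conversion can be illustrated together. This is a sketch under assumptions: parse_duration, effective_window, and gpu_hours are hypothetical names, and the parser accepts only a single number followed by one of the listed units, which may be stricter or looser than what gua actually accepts.

```python
import re
from datetime import datetime, timedelta

# Units listed for --since, mapped to timedelta keyword arguments.
_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_DURATION = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h|d)")


def parse_duration(text: str) -> timedelta:
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: float(value)})


def effective_window(since: timedelta, oldest_sample: datetime, now: datetime) -> timedelta:
    # min(--since, age of oldest sample)
    return min(since, now - oldest_sample)


def gpu_hours(card_ticks: int, interval: timedelta) -> float:
    # Each card-tick stands for one interval of time on one GPU.
    return card_ticks * interval.total_seconds() / 3600


# 120 card-ticks recorded at 30s are 1.0 GPU-hour ...
correct = gpu_hours(120, parse_duration("30s"))
# ... but reporting them with --interval 1m would double the figure.
mismatched = gpu_hours(120, parse_duration("1m"))
```

The last two lines show why the report interval has to match the daemon interval: the same tick count yields 1.0 or 2.0 GPU-hours depending only on the value passed.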
gua demo [--db PATH] [--ticks N] [--interval D]
- --db PATH (optional): if omitted, a fresh temporary database is created and its path is printed to stderr.
- --ticks N (default 30): how many fake ticks to record before printing the report.
- --interval D (default 1s): tick spacing.
- Same --interval on both sides. If you ran the daemon with --interval 30s, run gua report --interval 30s too.
- Let it run for a while. §1/§3 are meaningful after one tick; §4 (Top identities) needs hours; §5 (Heatmap) needs days.
- WAL leaves sidecar files (gua.db-wal, gua.db-shm). They are cleaned up automatically when the last connection closes.
- DB size: ~50 MB per host per 30 days at 12 GPUs (extrapolated from Go v0.1.0; not yet re-measured for the Python rewrite).
For a long-running deployment, drop a unit file in
/etc/systemd/system/gpu-usage-audit.service:
[Unit]
Description=gua daemon
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/gua daemon --foreground --db /var/lib/gua/gua.db --interval 30s
Restart=on-failure
User=gua
[Install]
WantedBy=multi-user.target
Then run systemctl enable --now gpu-usage-audit.
Each tick of the daemon records:
- per-card: util_pct (SM utilization)
- per-process: mem_used_mb per (card, pid)
The report aggregates per card × per tick:
util >= 10 → active (compute is happening)
util < 10 AND mem > 100 → idle-held (memory is held, SM is cold)
util < 10 AND mem <= 100 → truly-idle (the card is genuinely free)
The 100 MB threshold absorbs the PyTorch/TF runtime baseline so importing torch doesn't count as "holding the GPU".
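The three rules translate directly into a small classifier. The sketch below is illustrative rather than the tool's own code: the names classify and shares are invented, and it assumes the memory figure for a card is the sum of mem_used_mb over the processes on that card at that tick.

```python
from collections import Counter
from enum import Enum

ACTIVE_UTIL_PCT = 10  # util >= 10 counts as active
HELD_MEM_MB = 100     # more than 100 MB of process memory counts as held


class State(str, Enum):
    ACTIVE = "active"
    IDLE_HELD = "idle-held"
    TRULY_IDLE = "truly-idle"


def classify(util_pct: float, mem_used_mb: float) -> State:
    """Classify one sample (one card at one tick)."""
    if util_pct >= ACTIVE_UTIL_PCT:
        return State.ACTIVE
    if mem_used_mb > HELD_MEM_MB:
        return State.IDLE_HELD
    return State.TRULY_IDLE


def shares(samples) -> dict:
    """Share of samples in each state; samples are (util_pct, mem_used_mb) pairs."""
    counts = Counter(classify(util, mem) for util, mem in samples)
    total = sum(counts.values())
    if total == 0:
        return {state.value: 0.0 for state in State}
    return {state.value: counts[state] / total for state in State}


# The notebook-left-open case: 1% utilization with 8 GB held.
notebook = classify(1, 8192)
```

The lunch-break notebook from the introduction lands in idle-held, while a card at 9% utilization with exactly 100 MB in use is still truly-idle because the memory rule is a strict greater-than.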
Requires uv (uv pins the Python version
automatically; requires-python = ">=3.12").
git clone https://github.com/AI-Ocean/gpu-usage-audit
cd gpu-usage-audit
uv sync # create .venv, install dev deps
uv run pytest # run the test suite
uv run ruff check # lint
uv run mypy # type-check (strict)
uv run gua demo # see the report shape locally
CI runs ruff + format check + mypy + pytest, then builds and smoke-tests
the wheel on every push and PR. Tag pushes (v*) rerun the same checks,
build sdist + wheel, smoke-test the wheel, and create a GitHub Release
with auto-generated notes. Release tags also publish the wheel and sdist
to PyPI through Trusted Publishing.
This is a single-host retrospective tool. Live dashboards, multi-host aggregation, quotas, Kubernetes cluster scans, Slurm scheduler joins, Docker/Podman fallback runtimes, and pod-name resolution are out of scope for bare-metal 1.0. Those belong above the host layer. If this tool surfaces enough idle-held to make scheduling worth solving, see ocean-all.
Apache License 2.0 — see LICENSE.