feat(agents): multi-device support — CPU, GPU, NPU per-agent selection #1252

Merged: kovtcharov-amd merged 2 commits into main from kalin/npu-flm-profile on May 29, 2026

@kovtcharov-amd
Collaborator

GAIA agents couldn't leverage AMD Ryzen AI NPU hardware — inference defaulted to GPU via llamacpp with no way to select an alternative device. Users with XDNA2 NPUs had no path to power-efficient local inference, and there was no framework for per-agent device selection across CPU, GPU, and NPU.

Now each agent declares which devices it supports via DeviceConfig tuples. Users pick a device per-agent — dropdown in Agent UI, --device {cpu,gpu,npu} on CLI. GPU remains the default. NPU uses the FLM backend (gemma4-it-e2b-FLM); CPU falls back automatically with a latency warning. gaia init --profile npu handles NPU detection, FLM backend installation, and model download. Eval-verified on Ryzen AI MAX+ PRO 395: NPU matches or exceeds GPU quality (personality 3/3 @ 9.5/10, context_retention 4/4 @ 9.8/10) at ~24 tok/s.
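
As a rough illustration of the CLI side described above, a `--device` flag limited to the three choices, with GPU as the default, could be declared with `argparse` as follows. This is a sketch only, not GAIA's actual `cli.py`: the parser name, the `announce` helper, and the wording of the CPU warning are assumptions made for the example.

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real `gaia chat` argument parser.
    parser = argparse.ArgumentParser(prog="gaia-chat-sketch")
    parser.add_argument(
        "--device",
        choices=["cpu", "gpu", "npu"],
        default="gpu",  # GPU remains the default, per the PR description
        help="inference device for this agent",
    )
    return parser


def announce(device: str) -> str:
    # Assumed wording; the PR only states that CPU use comes with a latency warning.
    if device == "cpu":
        warnings.warn("CPU inference selected; expect slow response times")
    return f"Using device: {device}"


args = build_parser().parse_args(["--device", "npu"])
print(announce(args.device))
```

Using `choices` makes an unsupported value fail at parse time, before any model is loaded.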

Test plan

  • python -m pytest tests/unit/test_npu_device_support.py -xvs — 25 tests
  • gaia chat --device gpu — announces device, loads correct model
  • gaia chat --device npu — loads FLM model on NPU hardware
  • gaia chat --device cpu — warns about slow response times
  • gaia init --profile npu — detects NPU, installs FLM backend, downloads model
  • gaia init --profile npu on non-NPU hardware — fails loudly
  • gaia eval agent --device npu --category personality — all scenarios pass
  • Agent UI: device dropdown visible on agent cards, filtered by detected hardware

#1220)

Agents now declare which (device, model, recipe, backend) tuples they support
via DeviceConfig. Users pick a device per-agent in the Agent UI dropdown or
via `--device {cpu,gpu,npu}` on CLI. GPU is the default. NPU uses the FLM
backend on Ryzen AI XDNA2 hardware; CPU falls back with a latency warning.
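
To make the (device, model, recipe, backend) tuple concrete, here is a minimal sketch of how such a declaration and lookup might look. It is not the `DeviceConfig` class from this PR: the class name `DeviceConfigSketch`, the field layout, the `pick_config` helper, and every `example-*` value are placeholders assumed for illustration. Only `gemma4-it-e2b-FLM`, the FLM and llamacpp backends, and the GPU default come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DeviceConfigSketch:
    # Field names mirror the tuple in the commit message; the real class may differ.
    device: str
    model: str
    recipe: str
    backend: str


# Placeholder declarations for one agent; "example-*" values are invented.
SUPPORTED = (
    DeviceConfigSketch("gpu", "example-gpu-model", "example-gpu-recipe", "llamacpp"),
    DeviceConfigSketch("npu", "gemma4-it-e2b-FLM", "example-npu-recipe", "flm"),
    DeviceConfigSketch("cpu", "example-cpu-model", "example-cpu-recipe", "llamacpp"),
)


def pick_config(
    configs: Sequence[DeviceConfigSketch], device: Optional[str] = None
) -> DeviceConfigSketch:
    # Fall back to GPU when the user did not choose a device.
    wanted = device or "gpu"
    for cfg in configs:
        if cfg.device == wanted:
            return cfg
    raise ValueError(f"agent does not support device '{wanted}'")


print(pick_config(SUPPORTED, "npu").model)
```

Keeping the declarations per agent means an agent that was never verified on a device simply omits that entry, and the lookup fails instead of silently picking another model.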

Backend: DeviceConfig dataclass, GaiaConfig persistence (~/.gaia/config.json),
LemonadeClient.install_backend/uninstall_backend/get_recipe_status,
LemonadeManager device parameter, session-level device column + migration.
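
For the session-level device column, an idempotent SQLite migration might look roughly like the sketch below. The table name `sessions`, the column name `device`, and the `'gpu'` default are assumptions inferred from the description; the actual schema and migration code in this PR may differ.

```python
import sqlite3


def ensure_device_column(conn: sqlite3.Connection) -> bool:
    """Add a `device` column to `sessions` if it is missing.

    Returns True when the column was added, False when it already existed.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(sessions)")}
    if "device" in columns:
        return False
    # Existing rows pick up the default, so older sessions stay on GPU.
    conn.execute("ALTER TABLE sessions ADD COLUMN device TEXT NOT NULL DEFAULT 'gpu'")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO sessions (title) VALUES ('existing session')")
print(ensure_device_column(conn), ensure_device_column(conn))
```

Checking the column list first makes the migration safe to run on every startup.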

CLI: `gaia init --profile npu` (NPU detection, FLM backend install, recipe-
aware model download), `gaia chat --device npu`, `gaia eval agent --device npu`.

Frontend: DeviceConfig type, activeDevice/detectedDevices store state,
per-agent device dropdown with verified/unverified badges.

Eval-verified on Ryzen AI MAX+ PRO 395: NPU matches or exceeds GPU quality
(personality 3/3 9.5avg, context_retention 4/4 9.8avg) at ~24 tok/s.
@github-actions (bot) added the labels documentation, llm, cli, tests, electron, performance, and agents on May 29, 2026
@kovtcharov-amd requested a review from itomek-amd on May 29, 2026, 07:33
@kovtcharov-amd self-assigned this on May 29, 2026
- Black: reformat cli.py
- Pylint: remove reimport of LemonadeClient (already at module level)
- Flake8: remove unused imports (tempfile, Path) in test file
@kovtcharov-amd
Collaborator Author

@itomek-amd, this feature requires manual testing.

Collaborator

@itomek left a comment

Solid, well-layered feature: the device abstraction threads cleanly from DeviceConfig/DEFAULT_DEVICE_CONFIGS in registry.py through the CLI --device, the sessions DB migration, the agents API response, and the UI dropdown, with docs (npu.mdx + docs.json) and a 382-line test file. GPU stays the default and explicit-device requests fail loudly when hardware is missing — the right shape. One process note: this is an LLM-affecting change (it swaps the model per device), and CLAUDE.md asks for a gaia eval agent run compared to the committed baseline. The description cites strong NPU numbers but no baseline scorecard / --compare output is included — worth attaching so a reviewer can see GPU-vs-NPU parity rather than take the numbers on faith, especially since the NPU path runs at a much smaller ctx (4096 vs 32768). Two small inline notes on the device-probe except: pass and the _DEVICE_TO_MIN cpu omission. Approving on the understanding the eval evidence gets attached.


Generated by Claude Code

Comment thread src/gaia/cli.py
Comment thread src/gaia/llm/lemonade_manager.py
@kovtcharov-amd enabled auto-merge on May 29, 2026, 16:26
@kovtcharov-amd added this pull request to the merge queue on May 29, 2026
Merged via the queue into main with commit d12c79f on May 29, 2026
48 checks passed
@kovtcharov-amd deleted the kalin/npu-flm-profile branch on May 29, 2026, 16:26
kovtcharov-amd pushed a commit that referenced this pull request Jun 2, 2026
…untime

Multi-device support (#1252) shipped non-functional: the Agent UI device
dropdown and CLI --device flag never changed the model, and an unavailable
device degraded silently. Fixes v0.20.0 release-review blockers B1/B2/H1
plus a GPU-tier validation bug.

B1: resolve the device's DeviceConfig.model (and ctx_size) at every UI
agent-build site instead of always using the session/GPU model; rewrite the
session model on a device switch so eviction/recreation picks it up. A guard
keeps an agent's own pinned model from being clobbered on the default GPU.

B2: add a device field to ChatAgentConfig, thread it through the base Agent
into LemonadeManager.ensure_ready(device=...), and fail loudly with an
actionable HardwareRequirementError when the requested device is absent.
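
The "fail loudly" behaviour in B2 can be pictured with a sketch like the one below. The exception class here is a local stand-in that only borrows the name `HardwareRequirementError` from the commit message; the `require_device` helper, its signature, and the message text are assumptions, not GAIA's implementation.

```python
from typing import Iterable


class HardwareRequirementError(RuntimeError):
    """Local stand-in for the exception named in the commit message."""


def require_device(requested: str, detected: Iterable[str]) -> str:
    # Raise instead of silently degrading to another device.
    available = sorted(set(detected))
    if requested not in available:
        listing = ", ".join(available) or "none"
        raise HardwareRequirementError(
            f"Device '{requested}' was requested but not detected "
            f"(available: {listing}). Pick a different --device or "
            f"install the required hardware and backend."
        )
    return requested


try:
    require_device("npu", ["cpu", "gpu"])
except HardwareRequirementError as exc:
    print(exc)
```

Listing the detected devices in the message is what makes the error actionable: the user can see which value of `--device` would work.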

H1: narrow the CLI device-probe except clause to only swallow connection/
timeout errors; a reachable-but-broken Lemonade now surfaces. Make the
GPU->CPU fallback state its reason.
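
The H1 change, replacing a blanket `except: pass` with a narrow clause, might be sketched as follows. The `probe_devices` function and its `fetch` callable are hypothetical; the sketch catches Python's built-in `ConnectionError` and `TimeoutError`, whereas the real code presumably catches the equivalent exceptions of whatever HTTP client it uses.

```python
from typing import Callable, Optional, Set


def probe_devices(fetch: Callable[[], Set[str]]) -> Optional[Set[str]]:
    """Return the detected devices, or None if the server is unreachable."""
    try:
        return fetch()
    except (ConnectionError, TimeoutError) as exc:
        # Only "server not reachable" is treated as a soft failure.
        print(f"Device probe skipped: server unreachable ({exc})")
        return None
    # Any other exception (bad payload, server-side bug) propagates to the caller.


def unreachable() -> Set[str]:
    raise ConnectionRefusedError("connection refused")


print(probe_devices(lambda: {"cpu", "gpu"}) == {"cpu", "gpu"})
print(probe_devices(unreachable))
```

With this shape, a server that answers but returns something broken raises at the call site instead of being mistaken for "no server running".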

Medium: map the gpu selector to amd_dgpu (lowest GPU tier) so a discrete-
Radeon-only host satisfies an explicit GPU requirement.
Labels

agents, cli, documentation, electron, llm, performance, tests

2 participants