
Agent Lifecycle Manager: install, configure, health check, and status #465

@kovtcharov


Summary

Build a lifecycle manager that treats agents as installable units — checking requirements, installing dependencies, downloading models, running health checks, and reporting status. This is what the onboarding agent calls to set up the user's system.

Problem

Today there is no way to:

  • Check if an agent's dependencies are satisfied
  • Install just the deps needed for one agent
  • Know if an agent is in a working, degraded, or broken state
  • Recover from partial installation

`gaia init` downloads models but doesn't validate Python extras, external tools, or env vars.
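The checks that `gaia init` skips today map onto standard-library probes. The sketch below assumes an agent manifest can list the importable modules behind its Python extras, its external tools, its env vars, and a minimum free-disk figure. `RequirementsSketch` and `check_basic_requirements` are hypothetical names, not existing GAIA code. RAM is left out because the Python standard library has no portable way to read total memory.

```python
import importlib.util
import os
import shutil
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RequirementsSketch:
    """Hypothetical, simplified stand-in for the RequirementsReport in the design."""
    extras_status: Dict[str, str] = field(default_factory=dict)
    tools_status: Dict[str, str] = field(default_factory=dict)
    env_vars_status: Dict[str, str] = field(default_factory=dict)
    disk_ok: bool = True


def check_basic_requirements(
    modules: List[str],
    tools: List[str],
    env_vars: List[str],
    min_free_disk_gb: float = 0.0,
    path: str = ".",
) -> RequirementsSketch:
    report = RequirementsSketch()
    # Assumption: an extra counts as installed if its top-level module resolves.
    for mod in modules:
        found = importlib.util.find_spec(mod) is not None
        report.extras_status[mod] = "installed" if found else "missing"
    # External tools are looked up on PATH.
    for tool in tools:
        report.tools_status[tool] = "found" if shutil.which(tool) else "missing"
    # An env var set to the empty string is treated as missing.
    for var in env_vars:
        report.env_vars_status[var] = "set" if os.environ.get(var) else "missing"
    free_gb = shutil.disk_usage(path).free / 1e9
    report.disk_ok = free_gb >= min_free_disk_gb
    return report
```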

Design

from __future__ import annotations  # keeps the not-yet-defined types below unevaluated

from typing import Callable, Dict, Optional

# AgentRegistry, RequirementsReport, InstallResult, HealthStatus, AgentStatus
# and RepairResult are not specified in this issue.


class AgentLifecycle:
    """Manages the full lifecycle of agents as installable units."""

    def __init__(self, registry: AgentRegistry):
        self.registry = registry

    def check_requirements(self, name: str) -> RequirementsReport:
        """Check all requirements against the current system.

        Returns:
            RequirementsReport with:
                - models_status: {name: downloaded|missing|wrong_version}
                - extras_status: {name: installed|missing}
                - tools_status: {name: found|missing, path: str}
                - env_vars_status: {name: set|missing}
                - system_status: {ram: ok|insufficient, disk: ok|insufficient}
        """

    def install(
        self,
        name: str,
        progress_cb: Optional[Callable[..., None]] = None,
    ) -> InstallResult:
        """Install everything needed for an agent.

        Steps:
        1. Validate system requirements (RAM, disk) from manifest
        2. Install Python extras: pip install amd-gaia[{extras}]
        3. Download required models via Lemonade
        4. Verify external tools present (warn if missing)
        5. Check env vars (warn if missing)
        6. Run agent self-test (instantiate + health check)
        7. Save state to ~/.gaia/agents/{name}/installed.json
        """

    def uninstall(self, name: str) -> bool:
        """Remove agent state (doesn't remove shared models)."""

    def configure(self, name: str, config: dict) -> bool:
        """Update agent-specific configuration."""

    def health_check(self, name: str) -> HealthStatus:
        """Check if an installed agent is working.

        Returns: healthy | degraded | error | not_installed
        - healthy: all deps met, model loaded, self-test passes
        - degraded: working but missing optional deps (e.g., VLM for PDF images)
        - error: required dep missing or self-test fails
        - not_installed: no installed.json state saved for the agent
        """

    def status(self) -> Dict[str, AgentStatus]:
        """Status of all discovered agents."""

    def repair(self, name: str) -> RepairResult:
        """Attempt to fix a degraded/errored agent."""

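The rules in the `health_check()` docstring reduce to a small, ordered decision: installation first, then hard failures, then optional gaps. This is a minimal sketch assuming the manifest already splits dependencies into required and optional sets and that "model loaded" is treated as one of the required checks; `classify_health` is a hypothetical helper.

```python
from enum import Enum
from typing import Iterable


class HealthStatus(str, Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    ERROR = "error"
    NOT_INSTALLED = "not_installed"


def classify_health(
    installed: bool,
    missing_required: Iterable[str],
    missing_optional: Iterable[str],
    self_test_passed: bool,
) -> HealthStatus:
    # No saved install state: nothing else is worth checking.
    if not installed:
        return HealthStatus.NOT_INSTALLED
    # Any missing required dependency, or a failing self-test, is an error.
    if list(missing_required) or not self_test_passed:
        return HealthStatus.ERROR
    # Working, but an optional capability (e.g. a VLM) is unavailable.
    if list(missing_optional):
        return HealthStatus.DEGRADED
    return HealthStatus.HEALTHY
```

Ordering matters: a missing required dependency outranks a missing optional one, so an agent lacking both reports `error`, not `degraded`.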
Agent Status States

not_installed → available → installing → installed → healthy
                                ↓                      ↓
                             failed                 degraded
                                                       ↓
                                                     error
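One way to enforce the diagram is an explicit transition table, so an agent cannot jump from `not_installed` straight to `healthy`. The sketch encodes only the edges drawn above. Transitions that `repair()` and `uninstall()` would presumably need (for example `degraded` back to `healthy`) are not specified in this issue and are left out. `AgentState`, `TRANSITIONS`, and `transition` are hypothetical names.

```python
from enum import Enum
from typing import Dict, FrozenSet


class AgentState(str, Enum):
    NOT_INSTALLED = "not_installed"
    AVAILABLE = "available"
    INSTALLING = "installing"
    INSTALLED = "installed"
    FAILED = "failed"
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    ERROR = "error"


# Only the edges drawn in the state diagram; repair/uninstall edges are omitted.
TRANSITIONS: Dict[AgentState, FrozenSet[AgentState]] = {
    AgentState.NOT_INSTALLED: frozenset({AgentState.AVAILABLE}),
    AgentState.AVAILABLE: frozenset({AgentState.INSTALLING}),
    AgentState.INSTALLING: frozenset({AgentState.INSTALLED, AgentState.FAILED}),
    AgentState.INSTALLED: frozenset({AgentState.HEALTHY}),
    AgentState.HEALTHY: frozenset({AgentState.DEGRADED}),
    AgentState.DEGRADED: frozenset({AgentState.ERROR}),
    AgentState.FAILED: frozenset(),
    AgentState.ERROR: frozenset(),
}


class InvalidTransition(ValueError):
    """Raised when a requested state change is not an edge in the diagram."""


def transition(current: AgentState, target: AgentState) -> AgentState:
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```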

Persistent State

~/.gaia/agents/
├── chat/
│   ├── installed.json    # {version, installed_at, models, extras, status}
│   └── config.json       # Agent-specific overrides
├── code/
│   └── installed.json
└── registry_cache.json   # Cached discovery results
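Because `installed.json` is what lets a later run detect, and recover from, a partial installation, it should never be left half-written. A minimal sketch, assuming the fields listed in the tree above and one plain JSON file per agent, writes to a temporary file in the same directory and swaps it in with `os.replace`, which is atomic on POSIX when source and destination are on the same filesystem. `save_installed_state` and `load_installed_state` are hypothetical helpers.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional


def save_installed_state(
    root: Path,
    name: str,
    version: str,
    models: List[str],
    extras: List[str],
    status: str,
) -> Path:
    """Write {root}/{name}/installed.json, where root is e.g. ~/.gaia/agents."""
    agent_dir = root / name
    agent_dir.mkdir(parents=True, exist_ok=True)
    state = {
        "version": version,
        "installed_at": datetime.now(timezone.utc).isoformat(),
        "models": models,
        "extras": extras,
        "status": status,
    }
    target = agent_dir / "installed.json"
    # Temp file in the same directory, then replace, so a crash mid-write
    # never leaves a truncated installed.json behind.
    fd, tmp_path = tempfile.mkstemp(dir=agent_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
        os.replace(tmp_path, target)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return target


def load_installed_state(root: Path, name: str) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None if the agent has never been installed."""
    target = root / name / "installed.json"
    if not target.exists():
        return None
    return json.loads(target.read_text(encoding="utf-8"))
```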

Integration with Existing Code

  • `gaia init --profile chat` → calls `lifecycle.install("chat")`
  • `gaia init --profile all` → calls `lifecycle.install(name)` for each agent
  • New: `gaia agent install chat` → `lifecycle.install("chat")`
  • New: `gaia agent status` → `lifecycle.status()`
  • New: `gaia agent health chat` → `lifecycle.health_check("chat")`

Files to Create/Modify

  • `src/gaia/agents/base/lifecycle.py` (NEW, ~500 lines)
  • `src/gaia/installer/init_command.py` (MODIFY) — Delegate to lifecycle
  • `src/gaia/cli.py` (MODIFY) — Add `gaia agent` subcommand group

Acceptance Criteria

  • `check_requirements()` accurately reports status for all dep types
  • `install()` handles models, extras, tools, env vars with progress callbacks
  • `health_check()` distinguishes healthy/degraded/error/not_installed
  • Persistent state in `~/.gaia/agents/` survives restarts
  • `gaia init` works identically (backward compatible)
  • Unit tests with mocked deps (no real downloads)
  • Integration test: install → health check → verify status
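The `install()` criteria (progress callbacks, warnings for missing tools and env vars, unit tests without real downloads) are easier to meet if the seven docstring steps are plain callables run by a small driver. A minimal sketch, assuming each step returns `(ok, message)` and is marked hard (abort on failure) or soft (warn only); `run_install`, `Step`, `ProgressCallback`, and `InstallOutcome` are hypothetical and simpler than the `InstallResult` in the design.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical callback shape: (step name, 1-based index, total steps).
ProgressCallback = Callable[[str, int, int], None]

# (name, step function returning (ok, message), hard?)
# Hard steps abort on failure; soft steps (tools, env vars) only warn,
# matching steps 4 and 5 of the install() docstring.
Step = Tuple[str, Callable[[], Tuple[bool, str]], bool]


@dataclass
class InstallOutcome:
    success: bool
    failed_step: Optional[str] = None
    warnings: List[str] = field(default_factory=list)


def run_install(
    steps: List[Step], progress_cb: Optional[ProgressCallback] = None
) -> InstallOutcome:
    outcome = InstallOutcome(success=True)
    total = len(steps)
    for i, (name, fn, hard) in enumerate(steps, start=1):
        if progress_cb is not None:
            progress_cb(name, i, total)
        ok, message = fn()
        if ok:
            continue
        if hard:
            # Stop here; later steps (e.g. saving installed.json) never run.
            outcome.success = False
            outcome.failed_step = name
            return outcome
        outcome.warnings.append(f"{name}: {message}")
    return outcome
```

In a unit test, each step can be a `unittest.mock.Mock(return_value=(True, ""))`, so the model download step never touches the network, and `assert_called_once()` on the mock confirms it ran.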

Depends On

Enables

Metadata

Assignees

No one assigned

Labels

agent, enhancement (New feature or request), installer (Installer changes), orchestration (Cross-agent orchestration), p0 (high priority), sdk (SDK/framework changes)
