Setup executor: progressive installation with resume, rollback, and parallel downloads #468

@kovtcharov

Summary

Execute the install plan generated by the onboarding agent, with progress tracking, crash recovery, parallel model downloads, and rollback on failure. The smallest model downloads first so the user can start chatting while larger models are still downloading.

Design

Execution Strategy

1. Pre-flight validation
   - Verify disk space for total plan
   - Verify network connectivity
   - Check for conflicting installs in progress

2. Install Lemonade Server (if needed)
   - Version check → upgrade if needed
   - Platform-specific MSI/DEB install

3. Download models (priority order — smallest first)
   - Qwen3-0.6B (400MB) → user can start minimal chat immediately
   - nomic-embed (500MB) → enables RAG
   - Qwen3-VL-4B (3GB) → enables vision
   - Qwen3-Coder-30B (18GB) → full capability
   
   Parallel downloads where safe (models are independent).
   After each model: update state, notify user of new capability.

4. Install Python extras
   - pip install "amd-gaia[rag,audio,mcp]" as needed (quoted so shells such as zsh don't glob the brackets)
   - Verify imports after install

5. Configure services
   - Create ~/.gaia/ directory structure
   - Set up MCP config if needed
   - Write agent install state files

6. Validation
   - Run health check on each installed agent
   - Report final status
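A minimal sketch of step 3, assuming a thread pool is an acceptable way to run independent downloads in parallel. `download_model` and `run_downloads` are hypothetical names, the download itself is stubbed out, and the sizes are the approximate figures from the list above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Sizes in MB, taken from the plan above; capability labels paraphrase it.
MODELS = [
    ("Qwen3-Coder-30B", 18_000, "full capability"),
    ("Qwen3-0.6B", 400, "minimal chat"),
    ("nomic-embed", 500, "RAG"),
    ("Qwen3-VL-4B", 3_000, "vision"),
]


def download_model(name, size_mb):
    """Hypothetical stand-in for the real download; returns the model name."""
    return name


def run_downloads(models, max_workers=2, on_ready=print):
    """Submit downloads smallest-first and report each capability as it lands."""
    ordered = sorted(models, key=lambda m: m[1])
    capability = {name: cap for name, _, cap in ordered}
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submission order is the priority order: the pool hands queued
        # work to free workers first-in first-out, so small models start first.
        futures = [pool.submit(download_model, name, size) for name, size, _ in ordered]
        for future in as_completed(futures):
            name = future.result()
            finished.append(name)
            on_ready(f"{name} ready: enables {capability[name]}")
    return ordered, finished
```

Capping `max_workers` keeps the 18GB download from starving the small ones of bandwidth. The right cap is a tuning question this sketch does not answer.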

Persistent State for Resume

// ~/.gaia/setup_state.json
{
  "plan_id": "abc123",
  "started_at": "2026-03-08T10:00:00Z",
  "status": "in_progress",
  "steps": [
    {"type": "lemonade", "status": "completed", "completed_at": "..."},
    {"type": "model", "name": "Qwen3-0.6B", "status": "completed"},
    {"type": "model", "name": "Qwen3-Coder-30B", "status": "downloading", "progress": 0.45},
    {"type": "extras", "names": ["rag"], "status": "pending"},
    {"type": "validation", "status": "pending"}
  ],
  "user_profile": { ... }
}

On restart, the executor reads the state file, skips steps marked completed, and resumes from the first step that is not.
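One way the resume logic could look, as a sketch only. It assumes the file on disk is plain JSON (without the `//` path comment shown above) and that any status other than `"completed"` means the step still needs work. `load_state`, `save_state`, and `remaining_steps` are hypothetical helper names.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path):
    """Return the saved state dict, or None when no earlier run exists."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None


def save_state(path, state):
    """Write to a temp file and rename, so a crash never leaves half a file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp, path)


def remaining_steps(state):
    """Steps still to run: everything whose status is not "completed"."""
    return [step for step in state["steps"] if step["status"] != "completed"]
```

Writing through a temporary file matters here because the state file is the crash-recovery record: a partial write during a crash would otherwise corrupt the file the next run depends on.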

Progress Callbacks

from dataclasses import dataclass
from typing import Optional

@dataclass
class SetupProgress:
    current_step: int
    total_steps: int
    step_name: str
    step_progress: float     # 0.0 - 1.0 within current step
    overall_progress: float  # 0.0 - 1.0 across the whole plan
    bytes_downloaded: int
    bytes_total: int
    estimated_remaining_s: int
    message: str             # Human-readable status
    new_capability: Optional[str] = None  # e.g. "You can now use: gaia chat"

Callbacks work for both CLI (Rich progress bar) and Chat UI (SSE events).
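A rough sketch of how one callback signature could serve both front ends. `ProgressEvent` is a trimmed stand-in for `SetupProgress` so the block runs on its own. The equal step weighting, the 0-based `current_step`, and the helper names are all assumptions, not part of the design above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class ProgressEvent:
    """Trimmed stand-in for SetupProgress, so the sketch runs on its own."""
    step_name: str
    overall_progress: float
    message: str
    new_capability: Optional[str] = None


ProgressCallback = Callable[[ProgressEvent], None]


def overall(current_step, total_steps, step_progress):
    """Overall fraction, assuming equal step weights and a 0-based step index."""
    return (current_step + step_progress) / total_steps


def cli_line(event):
    """Plain-text line; a Rich progress bar would consume the same fields."""
    return f"[{event.overall_progress:4.0%}] {event.step_name}: {event.message}"


def sse_frame(event):
    """One Server-Sent Events frame: a data field ended by a blank line."""
    return f"data: {json.dumps(asdict(event))}\n\n"


def fan_out(*callbacks):
    """Combine several callbacks into one the executor can call."""
    def emit(event):
        for callback in callbacks:
            callback(event)
    return emit
```

The executor would only ever call a single `ProgressCallback`. Whether that writes to a terminal, an SSE stream, or both is decided by whoever constructs it.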

Files to Create/Modify

  • `src/gaia/installer/setup_executor.py` (NEW, ~500 lines)
  • `src/gaia/installer/init_command.py` (MODIFY) — delegate to executor
  • `tests/unit/installer/test_setup_executor.py` (NEW)

Acceptance Criteria

  • Smallest models download first → progressive capability unlock
  • Resume after crash: reads `setup_state.json`, skips completed steps
  • Progress callbacks work for CLI and headless modes
  • Pre-flight rejects if insufficient disk space
  • Failed step doesn't block independent subsequent steps
  • Final validation runs health check on all installed agents
  • Unit tests with mocked downloads and installs
  • `gaia init` backward compatible (uses executor internally)
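The disk-space criterion could be checked along these lines. This is a sketch: `InsufficientDiskSpace`, `check_disk_space`, and the 10% headroom are hypothetical choices, while `shutil.disk_usage` is the standard-library call for free space. The `free_bytes` parameter exists so unit tests can inject a value instead of touching the real filesystem.

```python
import shutil


class InsufficientDiskSpace(Exception):
    """Hypothetical error type raised when the plan does not fit on disk."""


def check_disk_space(required_bytes, target_dir=".", headroom=0.10, free_bytes=None):
    """Raise unless free space covers the plan plus a safety margin.

    free_bytes can be injected for tests; otherwise it is read from the
    filesystem that holds target_dir.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(target_dir).free
    needed = int(required_bytes * (1 + headroom))
    if free_bytes < needed:
        raise InsufficientDiskSpace(
            f"need {needed} bytes, only {free_bytes} free"
        )
    return free_bytes - needed
```

For the example plan in this issue, `required_bytes` would be the sum of the four model sizes (roughly 21.9GB by the listed figures) plus whatever the Lemonade Server and Python extras need.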

Depends On

Labels: enhancement, installer, onboarding, p1 (medium priority)
