Skip to content

Gemma4Brain: core process_frame() — unified vision + nav JSON #2

@chatde

Description

@chatde

Parent PRD

#1

What to build

The core of gemma4_brain.py: a process_frame(pil_image) method that makes a single Gemma 4 call and returns a navigation + scene analysis dict compatible with the existing RoverBrain/BehaviorArbiter interface. No memory injection yet — just the bare pipe working.

Acceptance criteria

  • Gemma4Brain.process_frame(pil_image) calls Ollama /api/chat with image_b64 and returns a dict
  • Dict has required keys: scene, nav_decision, danger_level, reaction, objects
  • nav_decision is one of: FORWARD, TURN_LEFT, TURN_RIGHT, BACKUP, STOP
  • danger_level is an integer 0-3
  • JSON parse failures fall back to _fallback_result() (no crash)
  • Gemma4Brain._check_ollama() returns True when gemma4:e4b is loaded
  • BehaviorArbiter.translate_ai_decision(result) works with Gemma4Brain output (same shape as AIBrain)

Blocked by

None — can start immediately

User stories addressed

  • Gemma 4 makes navigation decisions using camera vision
  • Single model replaces both LLaVA and Llama 3.1

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions