Conversation
Add comprehensive unified baseline adapters supporting Claude, GPT, and Gemini models across multiple evaluation tracks.

Provider Abstraction (models/providers/):
- BaseAPIProvider ABC with a common interface for all providers
- AnthropicProvider: Base64 PNG encoding, Messages API
- OpenAIProvider: Data URL format, Chat Completions API
- GoogleProvider: Native PIL Image support, GenerateContent API
- Factory functions: get_provider(), resolve_model_alias()
- Error hierarchy: ProviderError, AuthenticationError, RateLimitError

Baseline Module (baselines/):
- TrackType enum: TRACK_A (coords), TRACK_B (ReAct), TRACK_C (SoM)
- TrackConfig dataclass with factory methods for each track
- BaselineConfig with model alias resolution and registry
- PromptBuilder for track-specific system prompts and user content
- UnifiedResponseParser supporting JSON, function-call, and PyAutoGUI formats
- ElementRegistry for element_id-to-coordinate conversion

Benchmark Integration:
- UnifiedBaselineAgent wrapping UnifiedBaselineAdapter for benchmarks
- Converts BenchmarkObservation -> adapter format -> BenchmarkAction
- Supports all three tracks via --track flag

CLI Commands (baselines/cli.py):
- run: Single-model prediction with track selection
- compare: Multi-model comparison on the same task
- list-models: Show available models and providers

All 92 tests pass. Ready for model comparison experiments.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
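The ABC-plus-factory pattern described above can be sketched as follows. This is a minimal illustration, not the real implementation: the alias table, provider internals, and `complete()` signature are assumptions; only the names `BaseAPIProvider`, `AnthropicProvider`, `ProviderError`, `get_provider()`, and `resolve_model_alias()` come from the commit message.

```python
from abc import ABC, abstractmethod


class ProviderError(Exception):
    """Base class for provider failures."""


class BaseAPIProvider(ABC):
    """Common interface every vendor-specific provider implements."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Send a prompt to the backing API and return the text response."""


class AnthropicProvider(BaseAPIProvider):
    def complete(self, prompt: str) -> str:
        # The real provider would call the Messages API with
        # base64-encoded PNG screenshots; stubbed here.
        return f"[anthropic] {prompt}"


# Hypothetical alias table; the real registry lives in BaselineConfig.
_ALIASES = {"claude": "claude-sonnet-4-5", "gpt": "gpt-4o"}
_PROVIDERS = {"claude-sonnet-4-5": AnthropicProvider}


def resolve_model_alias(name: str) -> str:
    return _ALIASES.get(name, name)


def get_provider(model: str) -> BaseAPIProvider:
    model = resolve_model_alias(model)
    try:
        return _PROVIDERS[model]()
    except KeyError:
        raise ProviderError(f"no provider registered for {model!r}")
```

The factory keeps callers independent of vendor SDKs: benchmark code asks for a model name and receives whichever provider handles it.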
…atibility

All dependencies (torch, transformers, pillow, peft, etc.) support Python 3.10+. The 3.12 requirement was unnecessarily restrictive and broke `pip install openadapt[all]` on Python 3.10 and 3.11.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add CI workflow that runs on pull requests and main branch pushes:
- Tests on Python 3.10 and 3.11
- Runs on Ubuntu and macOS
- Uses uv for dependency management
- Runs ruff linter and formatter
- Runs pytest suite

Matches the pattern used by openadapt-viewer and follows OpenAdapt ecosystem conventions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- cluster_id: default=0
- cluster_centroid_distance: default=0.0
- internal_similarity: default=1.0

Fixes 1/14 test failures in test_segmentation.py
Phase 1 of viewer consolidation plan.

Foundation Changes:
- Add openadapt-viewer as a local file dependency in pyproject.toml
- Create openadapt_ml/training/viewer_components.py adapter module:
  * screenshot_with_predictions() - Screenshot with human/AI overlays
  * training_metrics() - Training stats metrics grid
  * playback_controls() - Playback UI controls
  * correctness_badge() - Pass/fail badge component
  * generate_comparison_summary() - Model comparison summary
- Add tests/test_viewer_screenshots.py with component validation tests
- Add openadapt_ml/training/viewer_migration_example.py validation example

Design:
- Zero breaking changes to existing viewer.py code
- Adapter pattern wraps openadapt-viewer with ML-specific context
- Functions accept openadapt-ml data structures
- Can be incrementally adopted in future phases

Next steps (Phase 2):
- Gradually migrate viewer.py to use these adapters
- Replace inline HTML generation with component calls

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed 158 linting errors in openadapt_ml/benchmarks/cli.py:
- F541: Removed extraneous f-string prefixes (150 instances auto-fixed)
- E402: Moved warnings import to top of file with other imports
- F841: Removed unused variables (qemu_commands, run_name, all_ready, server_process)
- E741: Renamed ambiguous variable 'l' to 'line'
- F821: Added missing time import to cmd_vm function

Also updated README.md with documentation about openadapt-evals integration.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed across all modules:
- E402: Moved imports to top (benchmarks/__init__.py)
- E741: Renamed ambiguous variable 'l' to 'loss_entry' (trainer.py, lambda_labs.py)
- E722: Replaced bare except with specific exceptions (lambda_labs.py)
- F401: Added noqa comments for re-exported imports (ingest/)
- F811: Renamed shadowing variable in config.py
- F821: Added Episode import to TYPE_CHECKING block (grounding.py)
- F541: Removed extraneous f-string prefixes (auto-fixed 95 instances)
- F841: Removed unused variables (auto-fixed 20 instances)

All modules now pass ruff check without errors.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Applied ruff format to ensure consistent code style across all modules. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Restored and enhanced the workflow segmentation system from commit dd9a393
with new integration for openadapt-capture format.
## What's Added
### Core Segmentation Pipeline (4 stages):
1. **Stage 1 - Frame Description (VLM)**:
   - Converts screenshots + actions into semantic descriptions
   - Supports Gemini, Claude, GPT-4o backends
   - Automatic caching for efficiency
   - File: openadapt_ml/segmentation/frame_describer.py
2. **Stage 2 - Episode Extraction (LLM)**:
   - Identifies coherent workflow boundaries
   - Few-shot prompting for better quality
   - Confidence-based filtering
   - File: openadapt_ml/segmentation/segment_extractor.py
3. **Stage 3 - Deduplication (Embeddings)**:
   - Finds similar workflows across recordings
   - Agglomerative clustering with cosine similarity
   - Supports OpenAI or local HuggingFace embeddings
   - File: openadapt_ml/segmentation/deduplicator.py
4. **Stage 4 - Annotation (VLM Quality Control)**:
   - Auto-annotates episodes for training data quality
   - Detects failures, boundary issues, incompleteness
   - Human-in-the-loop review workflow
   - File: openadapt_ml/segmentation/annotator.py
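Stage 3's similarity-based deduplication can be illustrated with a dependency-free sketch: greedy threshold clustering over embedding vectors. This is a simplified stand-in for the module's agglomerative clustering; the function names and embedding format here are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def cluster_episodes(embeddings, threshold=0.85):
    """Greedy clustering: assign each embedding to the first cluster whose
    representative is within the similarity threshold, else start a new one."""
    clusters = []  # list of (representative_embedding, member_indices)
    for idx, emb in enumerate(embeddings):
        for rep, members in clusters:
            if cosine_similarity(rep, emb) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((emb, [idx]))
    return [members for _, members in clusters]
```

Each resulting cluster would map to one CanonicalEpisode; the real deduplicator uses proper agglomerative linkage rather than this first-fit pass.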
### Integration Features:
- **CaptureAdapter**: Loads recordings from openadapt-capture SQLite format
- File: openadapt_ml/segmentation/adapters/capture_adapter.py
- Automatically used when capture.db is detected
- Converts events to segmentation format
- **Unified Pipeline**: Run all stages with single API
- File: openadapt_ml/segmentation/pipeline.py
- Automatic intermediate result caching
- Resume support for interrupted runs
- **CLI Interface**: Full command-line interface for all stages
- File: openadapt_ml/segmentation/cli.py
- Commands: describe, extract, deduplicate, annotate, review, export-gold
- **Comprehensive Documentation**:
- File: openadapt_ml/segmentation/README.md
- 20+ code examples
- Complete API reference
- Integration guide
- Cost estimates and performance benchmarks
## Use Cases
1. **Training Data Curation**: Extract and filter high-quality demonstration episodes
2. **Demo Retrieval**: Build searchable libraries for demo-conditioned prompting
3. **Workflow Documentation**: Auto-generate step-by-step guides from recordings
## Data Schemas
All schemas use Pydantic for type safety (openadapt_ml/segmentation/schemas.py):
- ActionTranscript: Frame-by-frame semantic descriptions
- Episode: Coherent workflow segment with boundaries
- CanonicalEpisode: Deduplicated workflow definition
- EpisodeAnnotation: Quality assessment for training data
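A rough sketch of what the Episode schema might look like, written as a plain dataclass stand-in (the real schemas use Pydantic; all field names beyond those documented above are assumptions):

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """A coherent workflow segment with frame boundaries.

    Dataclass stand-in for the Pydantic model in schemas.py; field names
    beyond the documented ones are hypothetical.
    """

    episode_id: str
    start_frame: int
    end_frame: int
    description: str = ""
    confidence: float = 1.0

    def __post_init__(self):
        # Validation that Pydantic would express as field constraints.
        if self.end_frame < self.start_frame:
            raise ValueError("end_frame must be >= start_frame")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
```

The confidence field supports the Stage 2 confidence-based filtering described above.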
## Example Usage
```python
from openadapt_ml.segmentation import SegmentationPipeline, PipelineConfig

config = PipelineConfig(
    vlm_model="gemini-2.0-flash",
    llm_model="gpt-4o",
    similarity_threshold=0.85,
)

pipeline = SegmentationPipeline(config)
result = pipeline.run(
    recordings=["/path/to/recording1", "/path/to/recording2"],
    output_dir="workflow_library",
)

print(f"Found {result.unique_episodes} unique workflows")
```
## Next Steps
See openadapt_ml/segmentation/README.md for:
- P0: Integration tests with real openadapt-capture recordings
- P0: Visualization generator for segment boundaries
- P1: Improved prompt engineering and cost optimization
- P2: Active learning and multi-modal features
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Features added:
- Azure ML job tracking: Shows recent jobs from the last 7 days with status
- Cost tracking: Real-time uptime, hourly rate, and cost estimation
- VM activity detection: Identifies what the VM is currently doing
- Evaluation history: Past benchmark runs and success rates (--details flag)
- Enhanced UI: Structured dashboard with clear sections and icons

New utility functions in vm_monitor.py:
- fetch_azure_ml_jobs(): Fetch recent Azure ML jobs with filtering
- calculate_vm_costs(): Calculate VM costs with hourly/daily/weekly rates
- get_vm_uptime_hours(): Get VM uptime from Azure activity logs
- detect_vm_activity(): Detect current VM activity (idle, running, setup)
- get_evaluation_history(): Load past evaluation runs from the results dir

CLI enhancements:
- Added --details flag for extended information
- Improved output formatting with sections and separators
- Better error handling and status icons
- Preserved existing SSH tunnel and dashboard functionality

Documentation:
- Updated CLAUDE.md with new features and usage examples
- Added detailed docstrings to all new functions

This consolidates VM monitoring into a single enhanced command rather than creating duplicate dashboards, following the viewer consolidation strategy.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
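The uptime-based cost estimation could look roughly like this; the real signature and return shape of calculate_vm_costs() are not shown in the commit, so this is an assumed sketch:

```python
def calculate_vm_costs(uptime_hours: float, hourly_rate: float) -> dict:
    """Estimate VM spend from uptime and hourly rate.

    Hypothetical sketch of the calculate_vm_costs() helper described above;
    the actual function in vm_monitor.py may take Azure-specific inputs.
    """
    return {
        "hourly_rate": hourly_rate,
        "daily_rate": hourly_rate * 24,
        "weekly_rate": hourly_rate * 24 * 7,
        "total_cost": round(uptime_hours * hourly_rate, 2),
    }
```

Combined with get_vm_uptime_hours() from the Azure activity logs, this yields the real-time cost line in the dashboard.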
Update CaptureAdapter to work with the actual openadapt-capture database format.

Key changes:
- Use screen.frame events instead of generic event types
- Pair action events (mouse.down + mouse.up → single click)
- Map frame events to screenshots via timestamp matching
- Update event type filtering to match the openadapt-capture schema
- Improve frame-to-action association logic

This enables the segmentation pipeline to process real capture recordings from openadapt-capture instead of requiring simulated data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
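The mouse-event pairing step can be sketched like this; the dict-based event shape is an assumption for illustration, since the real adapter reads rows from the capture.db SQLite file:

```python
def pair_click_events(events):
    """Collapse mouse.down/mouse.up pairs into single click events.

    Events other than the down/up pair pass through unchanged. This is a
    simplified sketch of the pairing logic described in the commit above.
    """
    paired = []
    pending_down = None
    for event in events:
        if event["type"] == "mouse.down":
            pending_down = event
        elif event["type"] == "mouse.up" and pending_down is not None:
            paired.append({
                "type": "click",
                "x": pending_down["x"],
                "y": pending_down["y"],
                "timestamp": pending_down["timestamp"],
            })
            pending_down = None
        else:
            paired.append(event)
    return paired
```

Pairing before frame association means each click is matched to a single screenshot rather than two half-events straddling a frame boundary.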
Enhance vm monitor command to provide complete VM usage tracking:
- Real-time VM status (size, IP, power state)
- Activity detection (idle, benchmark running, setup)
- Cost tracking (uptime hours, hourly rate, total cost)
- Azure ML jobs list (last 7 days with status)
- Evaluation history (with --details flag)
- Mock mode for testing without VM (--mock flag)
Add new API endpoints to local.py dashboard server:
- /api/benchmark/status - current job status with ETA
- /api/benchmark/costs - cost breakdown (Azure VM, API, GPU)
- /api/benchmark/metrics - performance metrics by domain
- /api/benchmark/workers - worker status and utilization
- /api/benchmark/runs - list all benchmark runs
- /api/benchmark/tasks/{run}/{task} - task execution details
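An ETA computation of the kind /api/benchmark/status might return can be sketched as follows; this is purely illustrative, as the actual handler logic in local.py is not shown here:

```python
def estimate_eta(tasks_total: int, tasks_done: int, elapsed_seconds: float):
    """Estimate remaining seconds from the average per-task time so far.

    Returns None until at least one task has finished (no rate to
    extrapolate from). Hypothetical helper, not the real local.py code.
    """
    if tasks_done <= 0:
        return None
    per_task = elapsed_seconds / tasks_done
    return per_task * (tasks_total - tasks_done)
```

A status endpoint would fold this value into its JSON response alongside the current job state.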
Update README with VM monitor section including screenshots and
usage examples.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive test plan and results for workflow segmentation pipeline:
- Test plan with 8 stages from environment setup to documentation
- Test results documenting real capture processing outcomes
- Test files for CaptureAdapter and segmentation pipeline

Add VM monitor screenshot generation scripts and documentation:
- Scripts for automated dashboard screenshot generation
- Implementation plan for VM monitor screenshot feature
- Analysis of screenshot capture approaches

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Archive OpenAdapter (incomplete pre-refactor cloud deployment POC)
- Document key takeaways and lessons learned
- Reference modern cloud infrastructure in openadapt-ml
- Add guidelines for when to archive repositories

OpenAdapter was an incomplete proof-of-concept from October 2024 with only 165 lines of code and no ecosystem usage. Cloud deployment is now production-ready in openadapt_ml/cloud/ and benchmarks/azure.py.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add search bar to viewer controls with Ctrl+F / Cmd+F keyboard shortcut
- Implement advanced token-based search across step indices, action types, and text
- Search filters the step list in real time with a result count display
- Clear button and Escape key support for resetting search
- Consistent UI styling with existing viewer components
- Integrates with existing step list filtering

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
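Token-based filtering of this kind can be sketched as follows. The viewer itself implements this client-side; this Python sketch only illustrates the matching rule, and the step field names are assumptions:

```python
def matches_query(step: dict, query: str) -> bool:
    """True if every whitespace-separated query token appears somewhere in
    the step's searchable text (index, action type, or text content)."""
    haystack = " ".join(
        str(step.get(key, "")) for key in ("index", "action_type", "text")
    ).lower()
    return all(token in haystack for token in query.lower().split())


def filter_steps(steps, query):
    """Return steps matching the query; an empty query matches everything."""
    if not query.strip():
        return list(steps)
    return [s for s in steps if matches_query(s, query)]
```

Requiring every token to match (AND semantics) lets queries like "click submit" narrow the step list progressively as the user types.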
This PR has merge conflicts that need to be resolved before it can be merged. The PR contains significant work:

- 15 commits with substantial changes (20,612 additions, 3,094 deletions)
- Multiple important features: viewer consolidation, workflow segmentation, VM monitoring, CI improvements
- All ruff linting issues resolved

Next steps:
1. Resolve merge conflicts with main branch
2. Ensure all tests pass after conflict resolution
3. Request review once conflicts are resolved

This is high-priority active work and should be merged soon after conflicts are addressed.
Closing in favor of PR #9, which is a cleaned and rebased version of this PR. PR #9 contains only the new features from this PR (viewer consolidation, workflow segmentation, VM monitoring) while removing commits that were already merged to main via PR #6 (unified baseline adapters, CI workflow, extensive linting fixes). This resolves all merge conflicts and provides a cleaner commit history for review.
Summary
Implements Phase 1 (Foundation) of the viewer consolidation plan. Establishes the foundation for migrating from inline HTML generation to the reusable `openadapt-viewer` component library.

Changes

- `uv add openadapt-viewer`
- `openadapt_ml/training/viewer_components.py` with ML-specific wrappers:
  - `screenshot_with_predictions()` - Screenshot display with human/AI action overlays
  - `training_metrics()` - Training statistics metrics grid
  - `playback_controls()` - Playback UI controls
  - `correctness_badge()` - Pass/fail badge component
  - `generate_comparison_summary()` - Model comparison summary
- `tests/test_viewer_screenshots.py` for component validation
- `openadapt_ml/training/viewer_migration_example.py` demonstrating usage

Design Principles

- Existing `viewer.py` code remains unchanged

Test Plan

- `uv run pytest tests/test_viewer_screenshots.py -v`
- `uv run python -m openadapt_ml.training.viewer_migration_example`

Next Steps (Phase 2)

Once Phase 1 is validated, Phase 2 will:

- Migrate `viewer.py` functions to use these adapters

Related
🤖 Generated with Claude Code