Viewer consolidation, workflow segmentation, and VM monitoring#9
Merged
Phase 1 of viewer consolidation plan.

Foundation changes:
- Add openadapt-viewer as local file dependency in pyproject.toml
- Create openadapt_ml/training/viewer_components.py adapter module:
  - screenshot_with_predictions() - screenshot with human/AI overlays
  - training_metrics() - training stats metrics grid
  - playback_controls() - playback UI controls
  - correctness_badge() - pass/fail badge component
  - generate_comparison_summary() - model comparison summary
- Add tests/test_viewer_screenshots.py with component validation tests
- Add openadapt_ml/training/viewer_migration_example.py validation example

Design:
- Zero breaking changes to existing viewer.py code
- Adapter pattern wraps openadapt-viewer with ML-specific context
- Functions accept openadapt-ml data structures
- Can be incrementally adopted in future phases

Next steps (Phase 2):
- Gradually migrate viewer.py to use these adapters
- Replace inline HTML generation with component calls

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
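The adapter pattern above can be sketched as follows. This is a hypothetical illustration only: the function names come from the commit message, but the signatures, the `StepPrediction` type, and the return shapes are assumptions, not the actual `viewer_components.py` API.

```python
# Hypothetical sketch of the adapter idea: thin wrappers that accept
# openadapt-ml data structures and produce viewer-ready output.
# Signatures and data shapes are assumptions, not the real API.
from dataclasses import dataclass


@dataclass
class StepPrediction:
    """Minimal stand-in for an openadapt-ml per-step prediction record."""
    human_action: str
    ai_action: str


def correctness_badge(correct: bool) -> str:
    """Render a pass/fail badge as a small HTML snippet."""
    label, color = ("PASS", "#2da44e") if correct else ("FAIL", "#cf222e")
    return f'<span style="color:{color}">{label}</span>'


def generate_comparison_summary(steps: list[StepPrediction]) -> dict:
    """Summarize human-vs-AI agreement across steps."""
    matches = sum(s.human_action == s.ai_action for s in steps)
    return {
        "total": len(steps),
        "matches": matches,
        "accuracy": matches / len(steps) if steps else 0.0,
    }
```

Because the wrappers take plain openadapt-ml structures, existing `viewer.py` call sites can adopt them one at a time.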
Restored and enhanced the workflow segmentation system from commit dd9a393
with new integration for the openadapt-capture format.
## What's Added
### Core Segmentation Pipeline (4 stages):
1. **Stage 1 - Frame Description (VLM)**:
- Converts screenshots + actions into semantic descriptions
- Supports Gemini, Claude, GPT-4o backends
- Automatic caching for efficiency
- File: openadapt_ml/segmentation/frame_describer.py
2. **Stage 2 - Episode Extraction (LLM)**:
- Identifies coherent workflow boundaries
- Few-shot prompting for better quality
- Confidence-based filtering
- File: openadapt_ml/segmentation/segment_extractor.py
3. **Stage 3 - Deduplication (Embeddings)**:
- Finds similar workflows across recordings
- Agglomerative clustering with cosine similarity
- Supports OpenAI or local HuggingFace embeddings
- File: openadapt_ml/segmentation/deduplicator.py
4. **Stage 4 - Annotation (VLM Quality Control)**:
- Auto-annotates episodes for training data quality
- Detects failures, boundary issues, incompleteness
- Human-in-the-loop review workflow
- File: openadapt_ml/segmentation/annotator.py
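The similarity-threshold idea behind Stage 3 can be illustrated with a minimal sketch. This is not the `deduplicator.py` implementation (which uses agglomerative clustering); it is a greedy stand-in, in pure Python, that just shows how episode embeddings are grouped by cosine similarity against a threshold:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_episodes(embeddings: list[list[float]],
                     threshold: float = 0.85) -> list[int]:
    """Greedy single-pass clustering: assign each episode to the first
    cluster whose representative embedding is at least `threshold`
    similar, otherwise start a new cluster."""
    reps: list[list[float]] = []   # one representative embedding per cluster
    labels: list[int] = []
    for emb in embeddings:
        for i, rep in enumerate(reps):
            if cosine_similarity(emb, rep) >= threshold:
                labels.append(i)
                break
        else:
            reps.append(emb)
            labels.append(len(reps) - 1)
    return labels
```

Agglomerative clustering gives better-quality groups than this greedy pass, but the core decision (merge when cosine similarity clears a threshold) is the same.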
### Integration Features:
- **CaptureAdapter**: Loads recordings from openadapt-capture SQLite format
- File: openadapt_ml/segmentation/adapters/capture_adapter.py
- Automatically used when capture.db is detected
- Converts events to segmentation format
- **Unified Pipeline**: Run all stages with single API
- File: openadapt_ml/segmentation/pipeline.py
- Automatic intermediate result caching
- Resume support for interrupted runs
- **CLI Interface**: Full command-line interface for all stages
- File: openadapt_ml/segmentation/cli.py
- Commands: describe, extract, deduplicate, annotate, review, export-gold
- **Comprehensive Documentation**:
- File: openadapt_ml/segmentation/README.md
- 20+ code examples
- Complete API reference
- Integration guide
- Cost estimates and performance benchmarks
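The intermediate-result caching and resume support mentioned above can be sketched as follows. This is a hypothetical scheme, not the `pipeline.py` implementation: the function name, the JSON-file cache layout, and the `compute` callable are all assumptions made for illustration.

```python
import json
from pathlib import Path


def run_stage_cached(stage_name: str, inputs: dict,
                     cache_dir: Path, compute) -> dict:
    """Run a pipeline stage, reusing a cached JSON result if present.

    `compute` is a callable producing the stage's result dict. If a run
    is interrupted and restarted, stages that already wrote a cache file
    are skipped, which is the essence of resume support.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{stage_name}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = compute(inputs)
    cache_file.write_text(json.dumps(result))
    return result
```

With VLM/LLM calls dominating cost, skipping completed stages on restart is what keeps interrupted runs cheap.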
## Use Cases
1. **Training Data Curation**: Extract and filter high-quality demonstration episodes
2. **Demo Retrieval**: Build searchable libraries for demo-conditioned prompting
3. **Workflow Documentation**: Auto-generate step-by-step guides from recordings
## Data Schemas
All schemas use Pydantic for type safety (openadapt_ml/segmentation/schemas.py):
- ActionTranscript: Frame-by-frame semantic descriptions
- Episode: Coherent workflow segment with boundaries
- CanonicalEpisode: Deduplicated workflow definition
- EpisodeAnnotation: Quality assessment for training data
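The real schemas are Pydantic models in `schemas.py`; the sketch below only illustrates the kind of shapes involved, using stdlib dataclasses so it stays self-contained. The field names are assumptions inferred from the stage descriptions above, not the actual definitions.

```python
# Illustrative stand-ins for the Pydantic schemas; field names are
# assumptions, not the actual schemas.py definitions.
from dataclasses import dataclass, field


@dataclass
class Episode:
    """A coherent workflow segment with frame boundaries."""
    recording_id: str
    start_frame: int
    end_frame: int
    description: str
    confidence: float = 0.0

    def duration_frames(self) -> int:
        return self.end_frame - self.start_frame + 1


@dataclass
class CanonicalEpisode:
    """A deduplicated workflow definition with its member episodes."""
    name: str
    members: list = field(default_factory=list)
```

Pydantic adds validation and serialization on top of this (e.g. rejecting a negative `confidence` at construction time), which is why the pipeline uses it for data that crosses stage boundaries.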
## Example Usage
```python
from openadapt_ml.segmentation import SegmentationPipeline, PipelineConfig

config = PipelineConfig(
    vlm_model="gemini-2.0-flash",
    llm_model="gpt-4o",
    similarity_threshold=0.85,
)
pipeline = SegmentationPipeline(config)
result = pipeline.run(
    recordings=["/path/to/recording1", "/path/to/recording2"],
    output_dir="workflow_library",
)
print(f"Found {result.unique_episodes} unique workflows")
```
## Next Steps
See openadapt_ml/segmentation/README.md for:
- P0: Integration tests with real openadapt-capture recordings
- P0: Visualization generator for segment boundaries
- P1: Improved prompt engineering and cost optimization
- P2: Active learning and multi-modal features
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Features added:
- Azure ML job tracking: shows recent jobs from the last 7 days with status
- Cost tracking: real-time uptime, hourly rate, and cost estimation
- VM activity detection: identifies what the VM is currently doing
- Evaluation history: past benchmark runs and success rates (--details flag)
- Enhanced UI: structured dashboard with clear sections and icons

New utility functions in vm_monitor.py:
- fetch_azure_ml_jobs(): fetch recent Azure ML jobs with filtering
- calculate_vm_costs(): calculate VM costs with hourly/daily/weekly rates
- get_vm_uptime_hours(): get VM uptime from Azure activity logs
- detect_vm_activity(): detect current VM activity (idle, running, setup)
- get_evaluation_history(): load past evaluation runs from the results directory

CLI enhancements:
- Added --details flag for extended information
- Improved output formatting with sections and separators
- Better error handling and status icons
- Preserved existing SSH tunnel and dashboard functionality

Documentation:
- Updated CLAUDE.md with new features and usage examples
- Added detailed docstrings to all new functions

This consolidates VM monitoring into a single enhanced command rather than creating duplicate dashboards, following the viewer consolidation strategy.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
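The cost math behind `calculate_vm_costs()` can be sketched as below. This is an illustration only: the real function pulls uptime from Azure activity logs and the rate from the VM size, while this sketch takes both as arguments, and the returned keys are assumptions.

```python
def calculate_vm_costs(uptime_hours: float, hourly_rate: float) -> dict:
    """Estimate VM spend from uptime and an hourly rate.

    A sketch of the arithmetic only; the actual vm_monitor.py function
    derives its inputs from Azure rather than taking them directly.
    """
    return {
        "uptime_hours": round(uptime_hours, 2),
        "hourly_rate": hourly_rate,
        "total_cost": round(uptime_hours * hourly_rate, 2),
        "daily_rate": round(hourly_rate * 24, 2),
        "weekly_rate": round(hourly_rate * 24 * 7, 2),
    }
```

Surfacing daily and weekly rates alongside the running total makes it obvious when an idle VM is worth deallocating.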
Update CaptureAdapter to work with the actual openadapt-capture database format.

Key changes:
- Use screen.frame events instead of generic event types
- Pair action events (mouse.down + mouse.up → single click)
- Map frame events to screenshots via timestamp matching
- Update event type filtering to match the openadapt-capture schema
- Improve frame-to-action association logic

This enables the segmentation pipeline to process real capture recordings from openadapt-capture instead of requiring simulated data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
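The mouse.down + mouse.up pairing can be sketched as a single pass over the event stream. The event-dict shape used here is illustrative, not the openadapt-capture schema, and the function name is an assumption:

```python
def pair_click_events(events: list[dict]) -> list[dict]:
    """Collapse mouse.down / mouse.up pairs into single click events,
    keeping the down event's timestamp and position.

    Event dicts are a stand-in for illustration, not the actual
    openadapt-capture event schema.
    """
    paired = []
    pending_down = None
    for event in events:
        if event["type"] == "mouse.down":
            pending_down = event
        elif event["type"] == "mouse.up" and pending_down is not None:
            paired.append({
                "type": "click",
                "t": pending_down["t"],
                "x": pending_down["x"],
                "y": pending_down["y"],
            })
            pending_down = None
        else:
            paired.append(event)
    return paired
```

Collapsing the pair early means every downstream stage reasons about one semantic action ("click at (x, y)") instead of two raw input events.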
Enhance vm monitor command to provide complete VM usage tracking:
- Real-time VM status (size, IP, power state)
- Activity detection (idle, benchmark running, setup)
- Cost tracking (uptime hours, hourly rate, total cost)
- Azure ML jobs list (last 7 days with status)
- Evaluation history (with --details flag)
- Mock mode for testing without VM (--mock flag)
Add new API endpoints to local.py dashboard server:
- /api/benchmark/status - current job status with ETA
- /api/benchmark/costs - cost breakdown (Azure VM, API, GPU)
- /api/benchmark/metrics - performance metrics by domain
- /api/benchmark/workers - worker status and utilization
- /api/benchmark/runs - list all benchmark runs
- /api/benchmark/tasks/{run}/{task} - task execution details
Update README with VM monitor section including screenshots and
usage examples.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive test plan and results for the workflow segmentation pipeline:
- Test plan with 8 stages, from environment setup to documentation
- Test results documenting real capture processing outcomes
- Test files for CaptureAdapter and the segmentation pipeline

Add VM monitor screenshot generation scripts and documentation:
- Scripts for automated dashboard screenshot generation
- Implementation plan for the VM monitor screenshot feature
- Analysis of screenshot capture approaches

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Archive OpenAdapter (incomplete pre-refactor cloud deployment POC)
- Document key takeaways and lessons learned
- Reference modern cloud infrastructure in openadapt-ml
- Add guidelines for when to archive repositories

OpenAdapter was an incomplete proof-of-concept from October 2024 with only 165 lines of code and no ecosystem usage. Cloud deployment is now production-ready in openadapt_ml/cloud/ and benchmarks/azure.py.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add search bar to viewer controls with Ctrl+F / Cmd+F keyboard shortcut
- Implement advanced token-based search across step indices, action types, and text
- Search filters the step list in real time with a result count display
- Clear button and Escape key support for resetting search
- Consistent UI styling with existing viewer components
- Integrates with existing step list filtering

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove non-existent openadapt_ml.shared_ui import from viewer.py
- Skip the anthropic test when the anthropic package is not installed (optional dependency)
- Skip the viewer_components test when openadapt-viewer is not installed (optional dependency)

All tests now pass (334 passed, 6 skipped).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
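The optional-dependency skip pattern can be implemented with a stdlib availability check. This is a sketch of the general technique, not the actual test code:

```python
import importlib.util


def has_module(name: str) -> bool:
    """True if `name` is importable, without actually importing it.

    Useful for guarding tests on optional dependencies, e.g. with
    pytest: @pytest.mark.skipif(not has_module("anthropic"),
                                reason="optional dependency not installed")
    """
    return importlib.util.find_spec(name) is not None
```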
Summary
Clean consolidation of three major feature sets from PR #7, with redundant commits removed. This PR adds:

- `openadapt-viewer` component library integration (viewer consolidation, Phase 1)
- Workflow segmentation system
- VM monitoring enhancements

This is a rebased and cleaned version of PR #7 that removes commits already merged via PR #6 (unified baseline adapters, CI workflow, linting fixes).
1. Viewer Consolidation (Phase 1)
Changes
- `uv add openadapt-viewer`
- `openadapt_ml/training/viewer_components.py` with ML-specific wrappers:
  - `screenshot_with_predictions()` - screenshot display with human/AI action overlays
  - `training_metrics()` - training statistics metrics grid
  - `playback_controls()` - playback UI controls
  - `correctness_badge()` - pass/fail badge component
  - `generate_comparison_summary()` - model comparison summary
- `tests/test_viewer_screenshots.py` for component validation
- `openadapt_ml/training/viewer_migration_example.py`

Design Principles

- Existing `viewer.py` code remains unchanged

Next Steps (Phase 2)

Once validated, Phase 2 will gradually migrate `viewer.py` functions to use these adapters.

2. Workflow Segmentation System
A complete ML pipeline for extracting canonical workflows from screen recordings for training data curation and demo retrieval.
Core Pipeline (4 Stages)
Stage 1 - Frame Description (VLM): Converts screenshots + actions into semantic descriptions
Stage 2 - Episode Extraction (LLM): Identifies coherent workflow boundaries
Stage 3 - Deduplication (Embeddings): Finds similar workflows across recordings
Stage 4 - Annotation (VLM Quality Control): Auto-annotates episodes for training quality
Integration Features
Use Cases
Files Added
- `openadapt_ml/segmentation/` (9 new modules, 5,313 lines)
- `openadapt_ml/segmentation/README.md` (920 lines of documentation)
- `tests/test_segmentation_pipeline.py` + `tests/test_capture_adapter.py`
- `docs/SEGMENTATION_TEST_PLAN.md` + `docs/SEGMENTATION_TEST_RESULTS.md`

3. VM Monitoring Enhancements
Enhanced the `vm monitor` command to provide comprehensive Azure VM usage visibility.

Features

- Real-time VM status (size, IP, power state)
- Activity detection (idle, benchmark running, setup)
- Cost tracking (uptime hours, hourly rate, total cost)
- Azure ML jobs list (last 7 days with status)
- Evaluation history (with `--details` flag)
- Mock mode for testing without a VM (`--mock` flag)

Usage
Files Changed
- `openadapt_ml/benchmarks/vm_monitor.py` - new monitoring functions
- `openadapt_ml/benchmarks/cli.py` - enhanced `vm monitor` command
- `openadapt_ml/cloud/local.py` - dashboard API endpoints
- `README.md` - updated with VM monitor documentation

Documentation
New Documentation Files
- `docs/REPOSITORY_HISTORY.md` - archive record of the OpenAdapter repository
- `docs/SEGMENTATION_TEST_PLAN.md` - test plan for workflow segmentation
- `docs/SEGMENTATION_TEST_RESULTS.md` - test results from real captures
- `docs/VM_MONITOR_SCREENSHOT_IMPLEMENTATION.md` - VM screenshot feature plan
- `docs/vm_monitor_screenshot_analysis.md` - screenshot capture approaches
- `docs/screenshots/` - VM monitor dashboard screenshots

Test Plan
Relation to PR #7
This PR is a cleaned and rebased version of PR #7 that removes commits already merged via PR #6 (unified baseline adapters, CI workflow, linting fixes).
The original PR #7 will be closed in favor of this cleaner version.
Breaking Changes
None - all changes are additive.
🤖 Generated with Claude Code