Feature/surv v1 phase5 metrics #145
Merged
ryanmccann1024 merged 22 commits into feature/surv-v1-phase4-rl-integration on Nov 7, 2025
Implement comprehensive testing, documentation, and performance validation for survivability v1 features.
Testing:
- Add integration tests for end-to-end survivability pipeline
- Add performance benchmarks for all time/memory budgets
- Add regression tests for backward compatibility
Documentation:
- Update main README with survivability section
- Update reporting README with survivability features
- Add 4 example configurations with comprehensive guide
Example Configurations:
- Link failure with KSP-FF baseline
- Geographic failure with 1+1 protection
- RL policy evaluation with BC
- Dataset generation for training
All Phase 6 acceptance criteria met:
- Integration tests verify E2E workflow
- Performance tests validate all budgets (decision time ≤2ms, etc.)
- Comprehensive documentation and examples
- Backward compatibility preserved
Related: phase6-quality/50-testing.md, 51-documentation.md, 52-performance.md
This commit fixes all type annotation and linting errors in the survivability test suite to ensure code quality and type safety.
Changes:
- Fix KPathCache import from fusion.modules.routing.k_path_cache
- Update KSPFFPolicy instantiation (no constructor arguments)
- Fix select_path method calls to use correct signature (state, action_mask)
- Update get_path_features calls to match actual API signature
- Add network_spectrum dict creation in tests for path feature extraction
- Remove unused variable assignments flagged by ruff
- Fix line length violations (E501)
- Remove duplicate backup test files
All mypy type checks and ruff linting checks now pass successfully.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Integrating phase 5 metrics and reporting functionality into the phase 6 quality assurance branch.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added comprehensive survivability-related configuration sections across all config files and templates, including:
- Offline RL settings for policy configuration
- Dataset logging settings for training data collection
- Recovery timing parameters for failure simulation
- Protection settings for network resilience
Updated logging configuration to support dataset logging requirements.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implemented full integration of DatasetLogger into the simulation engine
to enable offline RL dataset collection during simulations.
Changes:
- Added DatasetLogger initialization in SimulationEngine.__init__ with
proper directory structure (data/training_data/{network}/{date}/{time}/{thread}/)
- Implemented _log_dataset_transition() to capture state-action-reward
transitions after each routing decision
- Ensured logger is properly closed on simulation completion
- Added all survivability configuration sections to schema.py:
* dataset_logging (log_offline_dataset, dataset_output_path, epsilon_mix)
* offline_rl_settings (policy_type, fallback_policy, device)
* recovery_timing (protection_switchover_ms, restoration_latency_ms, etc.)
* protection_settings (protection_mode)
* routing_settings (route_method, k_paths, path_ordering, precompute_paths)
* failure_settings (failure_type, geo settings, timing parameters)
* reporting (export_csv, csv_output_path)
- Updated .gitignore to exclude data/training_data directory
Dataset format:
Each transition includes state (src, dst, bandwidth, k_paths), action
(selected path index), reward (+1.0/-1.0), action_mask (path feasibility),
and metadata (request_id, arrival_time, decision_time_ms).
Related: fusion/configs/examples/dataset_generation.ini now functional
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
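The dataset format described in this commit can be illustrated with a minimal sketch. The field names come from the commit text; the class `JsonlTransitionLogger`, the helper `build_transition`, and the mapping of +1.0 to an accepted request and -1.0 to a blocked one are assumptions for illustration, not the actual DatasetLogger API.

```python
import json


def build_transition(src, dst, bandwidth, k_paths, action, accepted, action_mask, metadata):
    """Assemble one state-action-reward record (hypothetical helper)."""
    return {
        "state": {"src": src, "dst": dst, "bandwidth": bandwidth, "k_paths": k_paths},
        "action": action,  # index of the selected path
        # Assumption: +1.0 means the request was accepted, -1.0 means it was blocked.
        "reward": 1.0 if accepted else -1.0,
        "action_mask": action_mask,  # True where a candidate path is feasible
        "metadata": metadata,  # e.g. request_id, arrival_time, decision_time_ms
    }


class JsonlTransitionLogger:
    """Hypothetical stand-in for a dataset logger: one JSON object per line."""

    def __init__(self, path):
        # Write mode: each run starts a fresh file.
        self._handle = open(path, "w", encoding="utf-8")

    def log(self, transition):
        self._handle.write(json.dumps(transition) + "\n")

    def close(self):
        self._handle.close()
```

Writing one JSON object per line keeps each transition independently parseable, so a partially written file can still be read up to its last complete line.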
Changed sim_start format from '%m%d_%H_%M_%S_%f' to '%H_%M_%S_%f' and created separate self.date to avoid date duplication in paths.
Before: data/output/NSFNet/1027/1027_17_54_36_579394/s1/
After: data/output/NSFNet/1027/17_54_36_579394/s1/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
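The path change can be reproduced with a short sketch. The `sim_start` format string is taken from the commit; the `%m%d` date format is inferred from the `1027` path component, and the function `output_dir` and the year 2025 are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def output_dir(base, network, now, thread="s1"):
    """Build the output directory with the date and time in separate components."""
    date = now.strftime("%m%d")  # assumed date format, e.g. "1027"
    sim_start = now.strftime("%H_%M_%S_%f")  # time only, no date prefix
    return Path(base) / network / date / sim_start / thread


now = datetime(2025, 10, 27, 17, 54, 36, 579394)
print(output_dir("data/output", "NSFNet", now).as_posix())
# data/output/NSFNet/1027/17_54_36_579394/s1
```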
Fixed multiple critical bugs in simulation and dataset generation:
1. Erlang loop bug: BatchRunner was ignoring erlang_start/stop/step
parameters and defaulting to erlang=300. Now properly reads config
values and makes erlang_stop inclusive.
2. CLI default override bug: --max_iters had default=3 in CLI parser,
which was overriding config file values. Changed to default=None
to respect config files.
3. Last iteration save: Made explicit check to ensure last iteration
always saves statistics regardless of save_step value.
4. Dataset file naming: Added erlang value to dataset filename
(dataset_erlang_{erlang}.jsonl) so each traffic volume gets its
own file instead of overwriting.
5. Dataset metadata: Added erlang and iteration fields to each
transition in the dataset for better tracking.
Files changed:
- fusion/cli/parameters/traffic.py: Remove default=3 from max_iters
- fusion/sim/batch_runner.py: Fix erlang parameter reading
- fusion/sim/network_simulator.py: Make erlang_stop inclusive
- fusion/core/simulation.py: Fix save logic, dataset naming, metadata
- fusion/reporting/dataset_logger.py: Revert append mode to write mode
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
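Fixes 1 and 2 above can be sketched as follows. This is an illustration under stated assumptions, not the code in batch_runner.py or traffic.py: the functions `erlang_values` and `resolve_max_iters` are hypothetical, and Erlang loads are assumed to be integers.

```python
import argparse


def erlang_values(start, stop, step):
    """Return the traffic loads from start to stop, with stop itself included."""
    if step <= 0:
        raise ValueError("step must be positive")
    # range() excludes its upper bound, so extend it by one to make stop inclusive.
    return list(range(start, stop + 1, step))


def resolve_max_iters(cli_args, config):
    """Prefer an explicit CLI value; otherwise fall back to the config file value."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag not given" apart from a real value.
    parser.add_argument("--max_iters", type=int, default=None)
    args = parser.parse_args(cli_args)
    if args.max_iters is not None:
        return args.max_iters
    return config["max_iters"]


print(erlang_values(300, 500, 100))  # [300, 400, 500]
print(resolve_max_iters([], {"max_iters": 10}))  # 10
print(resolve_max_iters(["--max_iters", "2"], {"max_iters": 10}))  # 2
```

With a non-None parser default such as 3, the second call would return 3 and silently ignore the config file, which is the behavior the commit describes removing.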
Add complete CLI argument support for survivability experiments including failure injection, protection mechanisms, RL policies, and dataset logging.
- Create fusion/cli/parameters/survivability.py with all argument groups
- Register survivability arguments in CLI registry
- Add survivability args to run_sim command
- Enable CLI override of config file parameters
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implements Section 6 (Integration) from survivability-v1 specs, completing the missing integration between FailureManager and the simulation execution.
Changes:
- SimulationEngine: Add FailureManager initialization and scheduling
- SDNController: Add path feasibility checking for failed links
- Automatic type conversion for node IDs (handles string/int mismatch)
- Schedule failures using actual Poisson arrival times instead of indices
- Add repair checking in main simulation loop
- Update example config with valid link and debug logging
Integration flow:
1. FailureManager created after topology initialization
2. Failure scheduled in first iteration using real request times
3. SDNController checks path feasibility before allocation
4. Repairs processed during request handling loop
Fixes issue where failures were configured but never injected during simulation execution. All survivability phase 2-5 modules now fully integrated and functional.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
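The path feasibility check and the node ID type conversion mentioned above might look roughly like the sketch below. The function names `normalize_link` and `is_path_feasible` are hypothetical, and treating links as undirected and coercing node IDs to strings are assumptions; the commit does not specify either.

```python
def normalize_link(u, v):
    """Return an order-independent link key with node IDs coerced to strings."""
    a, b = str(u), str(v)
    return (a, b) if a <= b else (b, a)


def is_path_feasible(path, failed_links):
    """A path is feasible when none of its hops uses a currently failed link."""
    failed = {normalize_link(u, v) for u, v in failed_links}
    # zip(path, path[1:]) yields consecutive node pairs, i.e. the hops of the path.
    return all(normalize_link(u, v) not in failed for u, v in zip(path, path[1:]))


failed_links = [("1", "5")]  # string IDs, as they might arrive from a config file
print(is_path_feasible([0, 1, 5], failed_links))  # False
print(is_path_feasible([0, 2, 5], failed_links))  # True
```

Normalizing both the path hops and the failed links through the same function is what makes an integer path such as `[0, 1, 5]` comparable against a string-typed failure entry.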
…processing bugs
- Fix 7 ruff E501 line-too-long errors in sdn_controller.py and simulation.py
- Rename config sections to follow *_settings naming convention:
* dataset_logging -> dataset_logging_settings
* recovery_timing -> recovery_timing_settings
* reporting -> reporting_settings
- Fix test_run_generic_sim_multiple_erlangs_sequential expecting 3 runs
- Fix test_get_logger_with_new_name_calls_setup assertion signature
- Fix KeyError when processing missing optional config sections
- Fix TypeError in failure scheduling by not setting missing optional values to None
- Update config processing to skip missing optional options instead of setting to None
All ruff checks now pass and unit tests fixed.
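The "skip missing optional options" behavior can be illustrated with the standard-library configparser. This is a sketch only: the function `read_optional` is hypothetical and is not the project's config loader; the section and option names are borrowed from the commits above, and the value 50 is made up.

```python
import configparser


def read_optional(parser, section, option_names):
    """Collect only the optional options that are present in the file.

    Absent sections and options are skipped rather than stored as None, so
    downstream code never receives None where it expects a number.
    """
    values = {}
    if not parser.has_section(section):
        return values  # a missing optional section is not an error
    for name in option_names:
        if parser.has_option(section, name):
            values[name] = parser.get(section, name)
    return values


parser = configparser.ConfigParser()
parser.read_string("[recovery_timing_settings]\nprotection_switchover_ms = 50\n")
wanted = ["protection_switchover_ms", "restoration_latency_ms"]
print(read_optional(parser, "recovery_timing_settings", wanted))
# {'protection_switchover_ms': '50'}
print(read_optional(parser, "failure_settings", wanted))
# {}
```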
- Rename .github/issue_template to ISSUE_TEMPLATE (GitHub canonical format)
- Fix broken links in issue template config.yml (Architecture Plan, Publications)
- Add comprehensive ARCHITECTURE.md with system design, components, and data flow
- Enhance README Publications section with structured citation format
- Remove GitHub Discussions link from issue resources
- Add placeholder for community-contributed publications
All issue template resource links now point to existing documentation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Modernize all GitHub issue templates, PR templates, and commit message guide by removing emojis from section headers and titles. This creates a more professional appearance appropriate for a research simulator while maintaining all functionality and structure.
Files updated:
- Issue templates (bug report, feature request, config)
- PR templates (feature, hotfix, general)
- Commit message guide
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Update config validation error message to be path-agnostic since users can pass config files from any location via command line, not just ini/run_ini/.
Remove emojis from user-facing error messages in run_gui and run_train for cleaner output.
Update TODO entries to clarify that GUI and multi-processing features need full implementation.
Standardize docstring formatting across all CLI modules for consistency.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Corrected CLI invocation syntax throughout documentation by adding the missing 'run_sim' subcommand. The correct format is:
`python -m fusion.cli.run_sim run_sim --config_path ...`
Added comprehensive "Templates vs Examples" section to configs/README.md explaining the distinction between generic reusable templates and specific ready-to-run example configurations.
Changes include:
- Fix CLI command examples in cli/README.md and configs/examples/README.md
- Add "Templates vs Examples" section with comparison table and usage guidance
- Add TODO for YAML/JSON configuration file input support
- Add TODO for single entry point CLI architecture (fusion run_sim)
- Add TODO for schema system consolidation (schema.py vs schemas/*.json)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove emojis from all top-level markdown files for professional presentation while maintaining readability and structure.
Documentation improvements:
- Remove emojis from README.md and DEVELOPMENT_QUICKSTART.md
- Add comprehensive CLAUDE.md with project context for AI assistants
- Fix placeholder email in CODE_OF_CONDUCT.md enforcement section
- Streamline CONTRIBUTING.md with references to detailed standards
- Remove research planning files (new-paper-*.md)
Code quality improvements:
- Remove redundant default values in network_analysis.py
- Fix docstring formatting in cli_to_config.py
- Add ML support TODO item in core/TODO.md
- Remove verbose seeding comment block in simulation.py
…ture
Resolve configuration duplication issues by implementing a hybrid system that supports both nested sections and flat backward-compatible access patterns.
Changes:
- Update config loader to preserve non-general sections as nested dicts
- Add mirroring function to copy nested values to root for backward compat
- Move route_method and allocation_method from required to optional settings
- Reorganize routing and spectrum parameters into dedicated sections
- Add missing ml_settings parameters across all config files
- Add missing failure_settings parameters to survivability examples
This allows new code to access engine_props["routing_settings"]["k_paths"] while legacy code continues to work with engine_props["k_paths"]. All configuration files now have clean separation between general_settings and specialized sections (routing_settings, spectrum_settings, ml_settings).
Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
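A minimal sketch of the mirroring idea, assuming `engine_props` is a plain dict whose nested sections are themselves dicts. The function name `mirror_sections_to_root` and the choice to leave existing root-level keys untouched are assumptions; the commit does not say how name collisions are resolved.

```python
def mirror_sections_to_root(engine_props):
    """Copy values from nested sections to the root so flat lookups keep working."""
    mirrored = dict(engine_props)  # shallow copy; nested sections stay in place
    for values in engine_props.values():
        if not isinstance(values, dict):
            continue  # already a flat, root-level setting
        for key, value in values.items():
            # Assumption: a key already present at the root wins over the nested one.
            mirrored.setdefault(key, value)
    return mirrored


props = mirror_sections_to_root({"max_iters": 5, "routing_settings": {"k_paths": 3}})
print(props["routing_settings"]["k_paths"])  # 3 (new nested access)
print(props["k_paths"])  # 3 (legacy flat access)
```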
Fix test failures caused by recent routing architecture refactoring that introduced route_props for storing routing algorithm results. Also fix config tests to match hybrid nested/flat configuration architecture and remove emoji expectations per project guidelines.
Changes include:
- Add default values in network_analysis.get_link_usage_summary
- Update factory tests to mock route_props.paths_matrix
- Fix config_setup tests for nested optional options
- Update CLI tests to remove emoji expectations (GUI and train)
- Fix schema tests to match current required options structure
- Complete route_props integration in routing algorithms
Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
feat(survivability): implement phase 6 - quality assurance
fix(quality): resolve linting errors, unit test failures, and config processing bugs
Fix/survivability
Feature/surv v1 phase7 results
Feature/surv v1 phase6 quality
a69d2f4 into feature/surv-v1-phase4-rl-integration
6 checks passed
Quick merge.