Feature/surv v1 phase5 metrics by ryanmccann1024 · Pull Request #145 · SDNNetSim/FUSION

ryanmccann1024 · 2025-11-07T16:19:52Z

Quick merge.

Implement comprehensive testing, documentation, and performance validation for survivability v1 features.

Testing:
- Add integration tests for end-to-end survivability pipeline
- Add performance benchmarks for all time/memory budgets
- Add regression tests for backward compatibility

Documentation:
- Update main README with survivability section
- Update reporting README with survivability features
- Add 4 example configurations with comprehensive guide

Example Configurations:
- Link failure with KSP-FF baseline
- Geographic failure with 1+1 protection
- RL policy evaluation with BC
- Dataset generation for training

All Phase 6 acceptance criteria met:
- Integration tests verify E2E workflow
- Performance tests validate all budgets (decision time ≤2ms, etc.)
- Comprehensive documentation and examples
- Backward compatibility preserved

Related: phase6-quality/50-testing.md, 51-documentation.md, 52-performance.md

This commit fixes all type annotation and linting errors in the survivability test suite to ensure code quality and type safety.

Changes:
- Fix KPathCache import from fusion.modules.routing.k_path_cache
- Update KSPFFPolicy instantiation (no constructor arguments)
- Fix select_path method calls to use correct signature (state, action_mask)
- Update get_path_features calls to match actual API signature
- Add network_spectrum dict creation in tests for path feature extraction
- Remove unused variable assignments flagged by ruff
- Fix line length violations (E501)
- Remove duplicate backup test files

All mypy type checks and ruff linting checks now pass successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Integrating phase 5 metrics and reporting functionality into the phase 6 quality assurance branch. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Added comprehensive survivability-related configuration sections across all config files and templates including:
- Offline RL settings for policy configuration
- Dataset logging settings for training data collection
- Recovery timing parameters for failure simulation
- Protection settings for network resilience

Updated logging configuration to support dataset logging requirements.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Implemented full integration of DatasetLogger into the simulation engine to enable offline RL dataset collection during simulations.

Changes:
- Added DatasetLogger initialization in SimulationEngine.__init__ with proper directory structure (data/training_data/{network}/{date}/{time}/{thread}/)
- Implemented _log_dataset_transition() to capture state-action-reward transitions after each routing decision
- Ensured logger is properly closed on simulation completion
- Added all survivability configuration sections to schema.py:
  * dataset_logging (log_offline_dataset, dataset_output_path, epsilon_mix)
  * offline_rl_settings (policy_type, fallback_policy, device)
  * recovery_timing (protection_switchover_ms, restoration_latency_ms, etc.)
  * protection_settings (protection_mode)
  * routing_settings (route_method, k_paths, path_ordering, precompute_paths)
  * failure_settings (failure_type, geo settings, timing parameters)
  * reporting (export_csv, csv_output_path)
- Updated .gitignore to exclude data/training_data directory

Dataset format: Each transition includes state (src, dst, bandwidth, k_paths), action (selected path index), reward (+1.0/-1.0), action_mask (path feasibility), and metadata (request_id, arrival_time, decision_time_ms).

Related: fusion/configs/examples/dataset_generation.ini now functional

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
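The transition format named in this commit can be pictured with a minimal sketch. The field names come from the commit message; the helper names `make_transition` and `to_jsonl_line`, the nesting of the keys, the example values, and the mapping of reward to an `allocated` flag are all assumptions made for illustration and are not taken from the actual `DatasetLogger` code.

```python
import json


def make_transition(src, dst, bandwidth, k_paths, action, allocated,
                    action_mask, request_id, arrival_time, decision_time_ms):
    """Build one state-action-reward record (hypothetical layout)."""
    return {
        "state": {
            "src": src,
            "dst": dst,
            "bandwidth": bandwidth,
            "k_paths": k_paths,
        },
        "action": action,  # index of the selected path
        # Assumption: +1.0 for an allocated request, -1.0 otherwise.
        "reward": 1.0 if allocated else -1.0,
        "action_mask": action_mask,  # per-path feasibility flags
        "metadata": {
            "request_id": request_id,
            "arrival_time": arrival_time,
            "decision_time_ms": decision_time_ms,
        },
    }


def to_jsonl_line(transition):
    """Serialize one transition as a single JSON Lines row."""
    return json.dumps(transition) + "\n"


# Hypothetical request with two candidate paths, the second one infeasible.
example = make_transition(
    src=0, dst=5, bandwidth=100, k_paths=[[0, 1, 5], [0, 2, 4, 5]],
    action=0, allocated=True, action_mask=[True, False],
    request_id=1, arrival_time=0.25, decision_time_ms=0.4,
)
line = to_jsonl_line(example)
```

Writing one JSON object per line keeps each transition independently parseable, which suits streaming a dataset to disk as routing decisions are made.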

Changed sim_start format from '%m%d_%H_%M_%S_%f' to '%H_%M_%S_%f' and created separate self.date to avoid date duplication in paths.

Before: data/output/NSFNet/1027/1027_17_54_36_579394/s1/
After: data/output/NSFNet/1027/17_54_36_579394/s1/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
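The effect of the format change can be reproduced with `datetime.strftime`. This is a small sketch: the two `sim_start` format strings are the ones quoted in the commit message, while the fixed timestamp (including the year) and the `'%m%d'` format for the date folder are assumptions chosen to match the example paths.

```python
from datetime import datetime

# Assumed timestamp that reproduces the example paths in the commit message.
now = datetime(2025, 10, 27, 17, 54, 36, 579394)

date = now.strftime("%m%d")                        # assumed date folder format
old_sim_start = now.strftime("%m%d_%H_%M_%S_%f")   # date repeated in the run folder
new_sim_start = now.strftime("%H_%M_%S_%f")        # time only

old_path = f"data/output/NSFNet/{date}/{old_sim_start}/s1/"
new_path = f"data/output/NSFNet/{date}/{new_sim_start}/s1/"

print(old_path)
print(new_path)
```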

Fixed multiple critical bugs in simulation and dataset generation:

1. Erlang loop bug: BatchRunner was ignoring erlang_start/stop/step parameters and defaulting to erlang=300. Now properly reads config values and makes erlang_stop inclusive.
2. CLI default override bug: --max_iters had default=3 in CLI parser, which was overriding config file values. Changed to default=None to respect config files.
3. Last iteration save: Made explicit check to ensure last iteration always saves statistics regardless of save_step value.
4. Dataset file naming: Added erlang value to dataset filename (dataset_erlang_{erlang}.jsonl) so each traffic volume gets its own file instead of overwriting.
5. Dataset metadata: Added erlang and iteration fields to each transition in the dataset for better tracking.

Files changed:
- fusion/cli/parameters/traffic.py: Remove default=3 from max_iters
- fusion/sim/batch_runner.py: Fix erlang parameter reading
- fusion/sim/network_simulator.py: Make erlang_stop inclusive
- fusion/core/simulation.py: Fix save logic, dataset naming, metadata
- fusion/reporting/dataset_logger.py: Revert append mode to write mode

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
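Fixes 1 and 2 follow a common pattern that a short sketch can illustrate. The helpers `erlang_values` and `resolve_max_iters`, the flat `config` dict, and the integer Erlang values are hypothetical and do not reflect the actual BatchRunner or CLI code; only the `--max_iters` flag and the `default=None` choice come from the commit message.

```python
import argparse


def erlang_values(start, stop, step):
    """Return traffic loads from start to stop, with stop included."""
    if step <= 0:
        raise ValueError("step must be positive")
    # range() excludes its end, so extend it by one to make stop inclusive.
    return list(range(start, stop + 1, step))


def resolve_max_iters(cli_args, config):
    """Prefer an explicit CLI value; otherwise fall back to the config file."""
    if cli_args.max_iters is not None:
        return cli_args.max_iters
    return config["max_iters"]


parser = argparse.ArgumentParser()
# default=None lets the code tell "flag not given" apart from a real value.
parser.add_argument("--max_iters", type=int, default=None)

# Hypothetical config values standing in for a parsed config file.
config = {"max_iters": 10, "erlang_start": 300, "erlang_stop": 500, "erlang_step": 100}

from_config = resolve_max_iters(parser.parse_args([]), config)
from_cli = resolve_max_iters(parser.parse_args(["--max_iters", "2"]), config)
loads = erlang_values(config["erlang_start"], config["erlang_stop"], config["erlang_step"])
```

With a non-`None` parser default, the fallback branch would never run, which is the override behavior the commit describes.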

Add complete CLI argument support for survivability experiments including failure injection, protection mechanisms, RL policies, and dataset logging.

- Create fusion/cli/parameters/survivability.py with all argument groups
- Register survivability arguments in CLI registry
- Add survivability args to run_sim command
- Enable CLI override of config file parameters

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Implements Section 6 (Integration) from survivability-v1 specs, completing the missing integration between FailureManager and the simulation execution.

Changes:
- SimulationEngine: Add FailureManager initialization and scheduling
- SDNController: Add path feasibility checking for failed links
- Automatic type conversion for node IDs (handles string/int mismatch)
- Schedule failures using actual Poisson arrival times instead of indices
- Add repair checking in main simulation loop
- Update example config with valid link and debug logging

Integration flow:
1. FailureManager created after topology initialization
2. Failure scheduled in first iteration using real request times
3. SDNController checks path feasibility before allocation
4. Repairs processed during request handling loop

Fixes issue where failures were configured but never injected during simulation execution. All survivability phase 2-5 modules now fully integrated and functional.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
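The path feasibility check and the node ID conversion mentioned above might look roughly like the following. This is a sketch under stated assumptions: `normalize_link` and `is_path_feasible` are hypothetical names, links are treated as undirected, and node IDs are coerced to strings. None of this is taken from the actual SDNController or FailureManager code.

```python
def normalize_link(u, v):
    """Return an undirected link key with node IDs coerced to strings."""
    a, b = str(u), str(v)
    return (a, b) if a <= b else (b, a)


def is_path_feasible(path, failed_links):
    """A path is feasible if none of its hops uses a failed link."""
    failed = {normalize_link(u, v) for u, v in failed_links}
    hops = zip(path, path[1:])  # consecutive node pairs along the path
    return all(normalize_link(u, v) not in failed for u, v in hops)


failed_links = [("1", "2")]   # failure configured with string IDs
ok_path = [0, 3, 2]           # integer IDs, avoids the failed link
broken_path = [0, 1, 2]       # integer IDs, crosses link 1-2

feasible = is_path_feasible(ok_path, failed_links)
blocked = is_path_feasible(broken_path, failed_links)
```

Normalizing both sides to the same type before comparing is what makes the string/int mismatch harmless: without it, `(1, 2)` and `("1", "2")` would never compare equal and the failed link would be silently ignored.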

…processing bugs

- Fix 7 ruff E501 line-too-long errors in sdn_controller.py and simulation.py
- Rename config sections to follow *_settings naming convention:
  - dataset_logging -> dataset_logging_settings
  - recovery_timing -> recovery_timing_settings
  - reporting -> reporting_settings
- Fix test_run_generic_sim_multiple_erlangs_sequential expecting 3 runs
- Fix test_get_logger_with_new_name_calls_setup assertion signature
- Fix KeyError when processing missing optional config sections
- Fix TypeError in failure scheduling by not setting missing optional values to None
- Update config processing to skip missing optional options instead of setting to None

All ruff checks now pass and unit tests fixed.
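The "skip missing optional options" behavior can be sketched with the standard `configparser` module. The `OPTIONAL_OPTIONS` table, the `read_optional` helper, the converter types, and the sample value are hypothetical; only the section and option names are borrowed from the commit messages in this pull request.

```python
import configparser

# Hypothetical table of optional sections, their options, and converters.
OPTIONAL_OPTIONS = {
    "recovery_timing_settings": {
        "protection_switchover_ms": float,
        "restoration_latency_ms": float,
    },
    "reporting_settings": {"export_csv": str},
}


def read_optional(config):
    """Collect only the optional options that are actually present."""
    result = {}
    for section, options in OPTIONAL_OPTIONS.items():
        if not config.has_section(section):
            continue  # missing optional section: skip instead of raising KeyError
        values = {}
        for name, convert in options.items():
            if config.has_option(section, name):
                values[name] = convert(config.get(section, name))
            # A missing option is left out rather than stored as None.
        result[section] = values
    return result


config = configparser.ConfigParser()
config.read_string("""
[recovery_timing_settings]
protection_switchover_ms = 50
""")
parsed = read_optional(config)
```

Leaving absent keys out means downstream code can rely on its own defaults via `dict.get`, instead of tripping over `None` in arithmetic, which is the kind of TypeError the commit mentions.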

- Rename .github/issue_template to ISSUE_TEMPLATE (GitHub canonical format)
- Fix broken links in issue template config.yml (Architecture Plan, Publications)
- Add comprehensive ARCHITECTURE.md with system design, components, and data flow
- Enhance README Publications section with structured citation format
- Remove GitHub Discussions link from issue resources
- Add placeholder for community-contributed publications

All issue template resource links now point to existing documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Modernize all GitHub issue templates, PR templates, and commit message guide by removing emojis from section headers and titles. This creates a more professional appearance appropriate for a research simulator while maintaining all functionality and structure.

Files updated:
- Issue templates (bug report, feature request, config)
- PR templates (feature, hotfix, general)
- Commit message guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Update config validation error message to be path-agnostic since users can pass config files from any location via command line, not just ini/run_ini/.

Remove emojis from user-facing error messages in run_gui and run_train for cleaner output.

Update TODO entries to clarify that GUI and multi-processing features need full implementation.

Standardize docstring formatting across all CLI modules for consistency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Corrected CLI invocation syntax throughout documentation by adding the missing 'run_sim' subcommand. The correct format is: `python -m fusion.cli.run_sim run_sim --config_path ...`

Added comprehensive "Templates vs Examples" section to configs/README.md explaining the distinction between generic reusable templates and specific ready-to-run example configurations.

Changes include:
- Fix CLI command examples in cli/README.md and configs/examples/README.md
- Add "Templates vs Examples" section with comparison table and usage guidance
- Add TODO for YAML/JSON configuration file input support
- Add TODO for single entry point CLI architecture (fusion run_sim)
- Add TODO for schema system consolidation (schema.py vs schemas/*.json)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Remove emojis from all top-level markdown files for professional presentation while maintaining readability and structure.

Documentation improvements:
- Remove emojis from README.md and DEVELOPMENT_QUICKSTART.md
- Add comprehensive CLAUDE.md with project context for AI assistants
- Fix placeholder email in CODE_OF_CONDUCT.md enforcement section
- Streamline CONTRIBUTING.md with references to detailed standards
- Remove research planning files (new-paper-*.md)

Code quality improvements:
- Remove redundant default values in network_analysis.py
- Fix docstring formatting in cli_to_config.py
- Add ML support TODO item in core/TODO.md
- Remove verbose seeding comment block in simulation.py

…ture

Resolve configuration duplication issues by implementing a hybrid system that supports both nested sections and flat backward-compatible access patterns.

Changes:
- Update config loader to preserve non-general sections as nested dicts
- Add mirroring function to copy nested values to root for backward compat
- Move route_method and allocation_method from required to optional settings
- Reorganize routing and spectrum parameters into dedicated sections
- Add missing ml_settings parameters across all config files
- Add missing failure_settings parameters to survivability examples

This allows new code to access engine_props["routing_settings"]["k_paths"] while legacy code continues to work with engine_props["k_paths"]. All configuration files now have clean separation between general_settings and specialized sections (routing_settings, spectrum_settings, ml_settings).

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
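The hybrid nested/flat access described in this commit can be illustrated with a small mirroring sketch. The function name `mirror_nested_to_root`, the rule that existing root keys win over nested ones, and the example option values are assumptions for illustration; only the `engine_props` access patterns and section names come from the commit message.

```python
def mirror_nested_to_root(engine_props):
    """Copy values from nested section dicts to the root for legacy access.

    Existing root-level keys are left untouched (an assumed precedence rule).
    """
    mirrored = dict(engine_props)  # shallow copy; nested sections are kept
    for section, values in engine_props.items():
        if not isinstance(values, dict):
            continue  # plain root-level setting, nothing to mirror
        for key, value in values.items():
            mirrored.setdefault(key, value)
    return mirrored


# Hypothetical configuration with one flat setting and two nested sections.
engine_props = {
    "max_iters": 10,
    "routing_settings": {"k_paths": 3, "route_method": "k_shortest_path"},
    "spectrum_settings": {"allocation_method": "first_fit"},
}
props = mirror_nested_to_root(engine_props)

nested_value = props["routing_settings"]["k_paths"]  # new-style access
flat_value = props["k_paths"]                        # legacy access
```

One trade-off of mirroring is that two sections defining the same option name would collide at the root, so a real implementation would need a policy for that case.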

Fix test failures caused by recent routing architecture refactoring that introduced route_props for storing routing algorithm results. Also fix config tests to match hybrid nested/flat configuration architecture and remove emoji expectations per project guidelines.

Changes include:
- Add default values in network_analysis.get_link_usage_summary
- Update factory tests to mock route_props.paths_matrix
- Fix config_setup tests for nested optional options
- Update CLI tests to remove emoji expectations (GUI and train)
- Fix schema tests to match current required options structure
- Complete route_props integration in routing algorithms

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

ryanmccann1024 and others added 22 commits October 16, 2025 15:57

chore: merge feature/surv-v1-phase5-metrics into phase6-quality

6ace58b

Merge pull request #137 from SDNNetSim/feature/surv-v1-phase6-quality

116929d

feat(survivability): implement phase 6 - quality assurance

Merge pull request #139 from SDNNetSim/feature/surv-v1-phase7-results

5ae7b99

fix(quality): resolve linting errors, unit test failures, and config processing bugs

Merge pull request #142 from SDNNetSim/fix/survivability

e168663

Fix/survivability

Merge pull request #143 from SDNNetSim/feature/surv-v1-phase7-results

4878507

Feature/surv v1 phase7 results

Merge pull request #144 from SDNNetSim/feature/surv-v1-phase6-quality

9fe8bd7

Feature/surv v1 phase6 quality

ryanmccann1024 merged commit a69d2f4 into feature/surv-v1-phase4-rl-integration Nov 7, 2025
6 checks passed

ryanmccann1024 deleted the feature/surv-v1-phase5-metrics branch January 19, 2026 19:12
