【Hackathon 9th Sprint No.9】feat: implement ES(t) macro/micro cross-validation and refactor analysis utilities #363

Dayuxiaoshui · 2025-11-14T10:30:38Z

This commit implements the Error-aware Speedup Score (ES_t) metric from Section 3.2.2 of the technical report (arXiv:2510.24035), along with numerical cross-checks of the proofs in Appendices B and C, which establish the sample-level validity of both the S_t and ES_t metrics.

Key Features:

Appendix B Implementation - Sample-level proof for S_t:
- Micro-level calculation: geometric mean of rectified speedups for all samples
- Macro-level calculation: S_t = α^λ · β^(ληp) · b^(1-λ)
- Cross-validation: both methods produce matching results (within the 1e-6 tolerance), consistent with the Appendix B proof that S_t equals the geometric mean of sample-level rectified speedups
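
The micro/macro agreement for S_t can be sketched as below. This is a minimal illustration, not the PR's code: the per-sample rule (failures score `b`, correct slowdowns score `s**(p+1)`, other correct samples score `s`) and the definitions of α, β, λ, η are assumptions inferred from the macro formula above, correctness at tolerance t is taken as already decided, and all function names are hypothetical.

```python
import math

def gmean(values):
    """Geometric mean; returns 1.0 for an empty list so it drops out of products."""
    return math.exp(sum(math.log(v) for v in values) / len(values)) if values else 1.0

def rectified_speedup(correct, speedup, p, b):
    """Assumed per-sample rule: failures score b, slowdowns are penalised as s**(p+1)."""
    if not correct:
        return b
    return speedup if speedup >= 1.0 else speedup ** (p + 1)

def s_t_micro(samples, p, b):
    """Geometric mean of the rectified speedups of all samples."""
    return gmean([rectified_speedup(c, s, p, b) for c, s in samples])

def s_t_macro(samples, p, b):
    """S_t = alpha^lambda * beta^(lambda*eta*p) * b^(1-lambda) from aggregated statistics."""
    correct = [s for c, s in samples if c]
    slow = [s for s in correct if s < 1.0]
    lam = len(correct) / len(samples)                    # fraction of correct samples
    eta = len(slow) / len(correct) if correct else 0.0   # fraction of slowdowns among correct
    alpha = gmean(correct)                               # geometric mean of correct speedups
    beta = gmean(slow)                                   # geometric mean of slowdown speedups
    return alpha ** lam * beta ** (lam * eta * p) * b ** (1 - lam)

# (is_correct, speedup) pairs; the values are made up for illustration.
samples = [(True, 2.0), (True, 0.5), (True, 1.5), (False, 0.0)]
micro = s_t_micro(samples, p=1.0, b=0.1)
macro = s_t_macro(samples, p=1.0, b=0.1)
print(micro, macro)
```

With these inputs both paths reduce to `0.075 ** 0.25`, so they should differ only by floating-point rounding.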
Appendix C Implementation - Sample-level proof for ES_t:
- Micro-level calculation: geometric mean of error-aware rectified speedups
- Macro-level calculation: ES_t = α^λ · β^(ληp) · γ_t^(1-λ)
- Dynamic penalty factor: γ_t = b^(sum(π_c * indicator(t < c)))
- Cross-validation: validates that ES_t is the geometric mean of error-aware rectified speedups, where failure samples use type-specific dynamic penalties instead of fixed penalty b
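
A similar sketch for ES_t, again under stated assumptions rather than as the PR's implementation: each sample carries an integer errno (0 meaning correct, which is a convention chosen here), a failed sample scores `b` when `t < errno` and 1 when the error is tolerated, and π_c is the share of error type c among failed samples. Function names are hypothetical.

```python
import math
from collections import Counter

def gmean(values):
    """Geometric mean; returns 1.0 for an empty list so it drops out of products."""
    return math.exp(sum(math.log(v) for v in values) / len(values)) if values else 1.0

def es_t_micro(samples, t, p, b):
    """Geometric mean of error-aware rectified speedups; samples are (errno, speedup)."""
    rectified = []
    for errno, speedup in samples:
        if errno == 0:  # correct sample (assumed convention)
            rectified.append(speedup if speedup >= 1.0 else speedup ** (p + 1))
        else:           # failed sample: penalised only if its error is not tolerated
            rectified.append(b if t < errno else 1.0)
    return gmean(rectified)

def gamma_t(errno2count, t, b):
    """gamma_t = b ** sum(pi_c * indicator(t < c)), with pi_c the share of errno c."""
    total = sum(errno2count.values())
    if total == 0:
        return 1.0
    return b ** sum(n / total for c, n in errno2count.items() if t < c)

def es_t_macro(samples, t, p, b):
    """ES_t = alpha^lambda * beta^(lambda*eta*p) * gamma_t^(1-lambda)."""
    correct = [s for e, s in samples if e == 0]
    slow = [s for s in correct if s < 1.0]
    errno2count = Counter(e for e, _ in samples if e != 0)
    lam = len(correct) / len(samples)
    eta = len(slow) / len(correct) if correct else 0.0
    gamma = gamma_t(errno2count, t, b)
    return gmean(correct) ** lam * gmean(slow) ** (lam * eta * p) * gamma ** (1 - lam)

# Made-up data: two correct samples and four failures of types 1, 2, 3, 3.
samples = [(0, 2.0), (0, 0.5), (1, 0.0), (2, 0.0), (3, 0.0), (3, 0.0)]
for t in range(0, 4):
    print(t, es_t_micro(samples, t, 1.0, 0.1), es_t_macro(samples, t, 1.0, 0.1))
```

As t increases, more error types are tolerated, so the penalty exponent shrinks and the score rises.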
Error-aware design (Section 3.2.2):
- Error type classification: c=1 (accuracy), c=2 (runtime crash), c=3 (compile failure)
- Tiered tolerance rules: t≥1 tolerates accuracy errors, t≥2 tolerates runtime crashes, t≥3 tolerates all errors
- Dynamic penalty γ_t adapts based on error type distribution and tolerance level
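
As a worked example of how γ_t moves with the tolerance level, take hypothetical values b = 0.1 and an error distribution π_1 = 0.5, π_2 = 0.3, π_3 = 0.2 (these numbers are chosen for illustration and do not come from the benchmark data):

```latex
\begin{align*}
\gamma_t &= b^{\sum_{c} \pi_c \,\mathbb{1}[t < c]} \\
\gamma_0 &= 0.1^{\,0.5 + 0.3 + 0.2} = 0.1 \\
\gamma_1 &= 0.1^{\,0.3 + 0.2} = 0.1^{0.5} \approx 0.316 \\
\gamma_2 &= 0.1^{\,0.2} \approx 0.631 \\
\gamma_t &= 0.1^{\,0} = 1 \qquad (t \ge 3)
\end{align*}
```

Each tier that becomes tolerated removes its π_c from the exponent, so the penalty relaxes toward 1 as t grows.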
Independent verification script:
- verify_macro_params.py: calculates and prints all macro parameters (alpha, beta, gamma, lambda, eta, pi) independently
- Enables validation of plot_ESt results by computing each parameter separately
Mandatory validation mechanism:
- plot_ESt.py: enforces macro/micro result matching before adoption
- Rejects results if validation fails, ensuring calculation correctness
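
The "reject on mismatch" behaviour could look roughly like the following sketch. The function name, the dict-per-tolerance layout, and the choice of raising `ValueError` are assumptions for illustration; only the 1e-6 threshold is taken from this description.

```python
def adopt_if_consistent(macro, micro, atol=1e-6):
    """Return the macro results only if every tolerance level matches the micro result.

    macro, micro: dicts mapping tolerance level -> score (hypothetical layout).
    """
    if macro.keys() != micro.keys():
        raise ValueError("macro and micro results cover different tolerance levels")
    mismatches = {
        t: (macro[t], micro[t])
        for t in macro
        if abs(macro[t] - micro[t]) > atol
    }
    if mismatches:
        # Refuse to adopt any result when at least one level disagrees.
        raise ValueError(f"macro/micro mismatch at tolerance levels: {mismatches}")
    return dict(macro)

verified = adopt_if_consistent({1: 0.5, 2: 0.7}, {1: 0.5 + 1e-9, 2: 0.7})
print(verified)
```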
Code refactoring for maintainability:
- macro_statistics.py: dedicated module for macro parameter calculations
- Each parameter has its own function (alpha, beta, gamma, lambda, eta, pi)
- Reduced nesting levels in analysis_util.py by extracting helper functions
- Simplified scan_all_folders and added .txt file support
- Improved code organization following software engineering best practices

Technical Details:

- Micro calculation: processes each sample individually, applies rectified speedup rules, then computes geometric mean
- Macro calculation: uses aggregated statistics (correct count, speedup distributions, error type proportions) to compute expected values
- Validation: compares micro and macro results with tolerance threshold (1e-6)
- All calculations verified against real benchmark data (118 samples)
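
The reason the two calculations should agree can be written out as a short derivation. The notation here is an assumption inferred from the formulas in this description: N samples, C the set of correct samples with N_C = |C|, D ⊆ C the correct samples with s_i < 1 and N_D = |D|, α and β the geometric means of the speedups over C and D, λ = N_C / N, η = N_D / N_C, and a slowdown's rectified speedup taken as s_i^{p+1} = s_i · s_i^p.

```latex
\begin{align*}
S_t &= \Bigl(\prod_{i \in C} s_i \cdot \prod_{i \in D} s_i^{\,p} \cdot b^{\,N - N_C}\Bigr)^{1/N} \\
    &= \alpha^{N_C / N} \cdot \beta^{\,p N_D / N} \cdot b^{(N - N_C)/N} \\
    &= \alpha^{\lambda} \cdot \beta^{\lambda \eta p} \cdot b^{1 - \lambda}
\end{align*}
```

For ES_t, the fixed factor b^{N - N_C} is replaced by the product over error types, with n_c failures of type c and π_c = n_c / (N - N_C):

```latex
\begin{align*}
\Bigl(\prod_{c} b^{\,n_c \mathbb{1}[t < c]}\Bigr)^{1/N}
  = b^{(1 - \lambda) \sum_{c} \pi_c \mathbb{1}[t < c]}
  = \gamma_t^{\,1 - \lambda}
\end{align*}
```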

Files Changed:

- graph_net/analysis_util.py: refactored with helper functions, integrated macro_statistics module, reduced nesting, simplified scan_all_folders
- graph_net/macro_statistics.py: new module for macro parameter calculations
- graph_net/plot_ESt.py: added mandatory macro/micro validation
- graph_net/verify_macro_params.py: new independent verification script

All code passes pre-commit checks, compiles successfully, and has been validated with real benchmark data.


paddle-bot · 2025-11-14T10:30:45Z

Thanks for your contribution!

graph_net/macro_statistics.py

graph_net/plot_ESt.py

graph_net/verify_macro_params.py

This commit refactors the evaluation metrics calculation code with the following improvements:

1. Terminology refactoring: macro -> aggregated
   - Rename macro_statistics.py to samples_statistics.py
   - Rename verify_macro_params.py to verify_aggregated_params.py
   - Update all variable and function names accordingly
2. Code structure improvements
   - Extract verification logic in plot_ESt.py into separate functions
     * compare_single_tolerance_level (12 lines)
     * print_verification_result (1 line)
     * verify_aggregated_micro_consistency (28 lines, meets ≤30 line requirement)
   - Refactor verify_aggregated_params.py to use functional programming style
     * Replace structured loops with list comprehensions
     * Use Counter for error type counting
     * Reduce multiple traversals to single pass where possible
3. Reduce function parameter coupling
   - calculate_beta: derive slowdown_speedups internally from correct_speedups
   - calculate_lambda: derive correct_count internally from correct_speedups
   - calculate_eta: derive statistics internally from correct_speedups
4. Decouple error type handling
   - calculate_pi: accept error_type_counts (dict) instead of hardcoded types
   - calculate_gamma: accept generic parameters (tolerance, get_pi, errno_tolerances)
   - Support user-defined error codes instead of hardcoded error types
5. Code quality improvements
   - Use explicit len() checks instead of implicit boolean conversion
   - Use modern Python type hints (list/tuple instead of typing.List/Tuple)
   - Improve code readability and maintainability

All changes have been verified and pass pre-commit checks.

…regated_params.py

graph_net/samples_statistics.py

graph_net/analysis_util.py

- Replace error_type_counts (dict[str, int]) with errno2count (dict[int, int])
- Add get_errno_from_error_type() to map error type strings to errno (1, 2, 3)
- Add get_error_type_from_errno() for reverse mapping when error type strings are needed
- Update calculate_pi() to use errno2count and return dict[int, float]
- Update calculate_all_aggregated_parameters() to use errno2count and errno_tolerance_thresholds
- Update analysis_util.py and verify_aggregated_params.py to use errno2count
- Improve code maintainability by using integer errno for sorting and comparison
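
The errno-based bookkeeping described in this commit might be sketched as follows. The error-type strings, the lookup tables, and the function names below are hypothetical stand-ins (the commit's actual strings and signatures are not shown on this page); the sketch only illustrates counting with `Counter` and returning proportions keyed by integer errno.

```python
from collections import Counter

# Hypothetical error-type labels; only the errno values 1, 2, 3 come from the PR text.
ERROR_TYPE_TO_ERRNO = {"accuracy": 1, "runtime": 2, "compile": 3}
ERRNO_TO_ERROR_TYPE = {errno: name for name, errno in ERROR_TYPE_TO_ERRNO.items()}

def count_errnos(error_types: list[str]) -> dict[int, int]:
    """Single pass over the failed samples, counting by integer errno."""
    return dict(Counter(ERROR_TYPE_TO_ERRNO[name] for name in error_types))

def errno_proportions(errno2count: dict[int, int]) -> dict[int, float]:
    """Share of each errno among all failures, in ascending errno order."""
    total = sum(errno2count.values())
    if total == 0:
        return {}
    return {errno: count / total for errno, count in sorted(errno2count.items())}

errno2count = count_errnos(["compile", "accuracy", "runtime", "accuracy"])
pi = errno_proportions(errno2count)
print(errno2count, pi)
```

Integer keys sort naturally by severity tier, which is what makes the `t < c` comparison in γ_t a plain integer comparison.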

graph_net/samples_statistics.py

graph_net/verify_aggregated_params.py

graph_net/samples_statistics.py

graph_net/verify_aggregated_params.py

graph_net/plot_ESt.py

- Rename verify_es_match_at_tolerance to compare_aggregated_es_and_microscopic_es
- Replace tolerance_level with tolerance parameter
- Replace tolerance_threshold with atol/rtol to avoid confusion
- Rename verify_aggregated_microscopic_consistency to get_verified_aggregated_es_values
- Change return type to dict only (remove all_matched)
- Rename verified_scores to verified_es_values
- Replace micro with microscopic throughout
- Rename check_sample_correctness to get_sample_correctness
- Rename t1 variables to first_errno_tolerance
- Rename es_components to es_constructor_params
- Rename calculate_parameters_for_tolerance to calculate_es_constructor_params_for_tolerance
- Rename custom_map to errno_tolerance_overrides
- Rename errno_as_tolerances to errno2tolerance
- Add enable_aggregation_mode command line option

graph_net/plot_ESt.py

- Modified plot_ES_results to return fig, ax, all_x_coords for external plotting
- Added manual plotting of aggregated ES(t) curves in main function
- Both microscopic and aggregated curves are plotted on the same graph
- Aggregated curves use dashed lines with square markers for distinction
- All verification checks pass with floating-point precision differences (1.39e-17)

- Move ax.legend() outside the aggregation mode condition block
- Ensure legend is always displayed regardless of aggregation mode
- Fix issue where legend was missing when aggregation mode is disabled

Dayuxiaoshui · 2025-11-18T03:53:13Z

…lidation and refactor analysis utilities (PaddlePaddle#363)

* feat: implement ES(t) macro/micro cross-validation and refactor analysis utilities
* refactor: rename macro to aggregated and improve code quality
* style: apply black formatting to samples_statistics.py and verify_aggregated_params.py
* refactor: unify error type to errno mapping for better sorting
* refactor: split tolerance report generation
* refactor: improve naming and semantics for ES calculation
* feat: add aggregated ES(t) plotting and verification
* fix: move ax.legend outside aggregation condition block

paddle-bot bot added the contributor External developers label Nov 14, 2025

lixinqi reviewed Nov 14, 2025

luotao1 mentioned this pull request Nov 15, 2025

【Hackathon 9th】Open Source Contribution Individual Challenge Sprint PaddlePaddle/Paddle#76333

Open

Dayuxiaoshui added 2 commits November 16, 2025 18:53

style: apply black formatting to samples_statistics.py and verify_agg…

498f60d

…regated_params.py

lixinqi reviewed Nov 16, 2025

graph_net/samples_statistics.py (outdated, resolved)

graph_net/analysis_util.py (outdated, resolved)

lixinqi reviewed Nov 17, 2025

refactor: split tolerance report generation

a4aa31f

lixinqi reviewed Nov 17, 2025

lixinqi reviewed Nov 18, 2025

graph_net/plot_ESt.py (outdated, resolved)

Dayuxiaoshui added 2 commits November 18, 2025 11:17

fix: move ax.legend outside aggregation condition block

8654af5

JewelRoam approved these changes Nov 18, 2025

lixinqi merged commit d698d66 into PaddlePaddle:develop Nov 18, 2025
3 checks passed

luotao1 added the PaddlePaddle Hackathon label Nov 24, 2025

4 participants