Conversation

@Dayuxiaoshui (Contributor)

This commit implements the Error-aware Speedup Score (ES_t) metric from Section 3.2.2 of the technical report (arXiv:2510.24035), along with the mathematical proofs from Appendices B and C that establish the sample-level validity of both the S_t and ES_t metrics.

Key Features:

  1. Appendix B Implementation - Sample-level proof for S_t:

    • Micro-level calculation: geometric mean of rectified speedups for all samples
    • Macro-level calculation: S_t = α^λ · β^(ληp) · b^(1-λ)
    • Cross-validation: both methods produce matching results (within floating-point tolerance), confirming that S_t is equivalent to the geometric mean of sample-level rectified speedups
  2. Appendix C Implementation - Sample-level proof for ES_t:

    • Micro-level calculation: geometric mean of error-aware rectified speedups
    • Macro-level calculation: ES_t = α^λ · β^(ληp) · γ_t^(1-λ)
    • Dynamic penalty factor: γ_t = b^(Σ_c π_c · 1[t < c]), where π_c is the proportion of samples failing with error type c
    • Cross-validation: confirms that ES_t is the geometric mean of error-aware rectified speedups, in which failed samples receive type-specific dynamic penalties instead of the fixed penalty b
  3. Error-aware design (Section 3.2.2):

    • Error type classification: c=1 (accuracy), c=2 (runtime crash), c=3 (compile failure)
    • Tiered tolerance rules: t≥1 tolerates accuracy errors, t≥2 tolerates runtime crashes, t≥3 tolerates all errors
    • Dynamic penalty γ_t adapts based on error type distribution and tolerance level
  4. Independent verification script:

    • verify_macro_params.py: calculates and prints all macro parameters (alpha, beta, gamma, lambda, eta, pi) independently
    • Enables validation of plot_ESt results by computing each parameter separately
  5. Mandatory validation mechanism:

    • plot_ESt.py: enforces macro/micro result matching before adoption
    • Rejects results if validation fails, ensuring calculation correctness
  6. Code refactoring for maintainability:

    • macro_statistics.py: dedicated module for macro parameter calculations
    • Each parameter has independent function (alpha, beta, gamma, lambda, eta, pi)
    • Reduced nesting levels in analysis_util.py by extracting helper functions
    • Simplified scan_all_folders and added .txt file support
    • Improved code organization following software engineering best practices
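The macro-level formulas in items 1–3 above can be sketched directly in Python. This is a minimal illustration only: the function names and the way parameters are bundled here are hypothetical, and the PR's actual implementation exposes one function per parameter in its statistics module.

```python
def calculate_gamma(t, b, pi):
    """Dynamic penalty: gamma_t = b ** (sum of pi_c over error types c with t < c).

    pi maps errno c (1 = accuracy, 2 = runtime crash, 3 = compile failure)
    to the proportion of samples failing with that error type; an error of
    type c is tolerated (contributes no penalty) once t >= c.
    """
    return b ** sum(p_c for c, p_c in pi.items() if t < c)


def calculate_es(t, alpha, beta, b, lam, eta, p, pi):
    """Macro-level ES_t = alpha**lam * beta**(lam*eta*p) * gamma_t**(1-lam).

    Substituting the fixed penalty b for gamma_t recovers S_t.
    """
    gamma_t = calculate_gamma(t, b, pi)
    return alpha ** lam * beta ** (lam * eta * p) * gamma_t ** (1 - lam)
```

Note that at t ≥ 3 the exponent sum is empty, so γ_t = b^0 = 1 and no penalty is applied, matching the tiered tolerance rules.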

Technical Details:

  • Micro calculation: processes each sample individually, applies rectified speedup rules, then computes geometric mean
  • Macro calculation: uses aggregated statistics (correct count, speedup distributions, error type proportions) to compute expected values
  • Validation: compares micro and macro results with tolerance threshold (1e-6)
  • All calculations verified against real benchmark data (118 samples)
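The micro-side computation and the mandatory macro/micro comparison can be sketched as follows. This is a sketch under stated assumptions: the per-sample rectified speedups are taken as already computed, and the helper names are illustrative rather than the PR's actual API.

```python
import math


def geometric_mean(values):
    # Averaging in log space is numerically safer than multiplying
    # hundreds of per-sample speedups together.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def verify_macro_micro(macro_es, rectified_speedups, atol=1e-6):
    """Reject the result unless macro and micro ES agree within tolerance."""
    micro_es = geometric_mean(rectified_speedups)
    if not math.isclose(macro_es, micro_es, abs_tol=atol):
        raise ValueError(f"macro/micro mismatch: {macro_es} vs {micro_es}")
    return micro_es
```

Raising on mismatch (rather than warning) is what makes the validation mandatory: a result is only adopted after both computation paths agree.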

Files Changed:

  • graph_net/analysis_util.py: refactored with helper functions, integrated macro_statistics module, reduced nesting, simplified scan_all_folders
  • graph_net/macro_statistics.py: new module for macro parameter calculations
  • graph_net/plot_ESt.py: added mandatory macro/micro validation
  • graph_net/verify_macro_params.py: new independent verification script

All code passes pre-commit checks, compiles successfully, and has been validated with real benchmark data.

@paddle-bot

paddle-bot bot commented Nov 14, 2025

Thanks for your contribution!

paddle-bot added the "contributor (External developers)" label on Nov 14, 2025.

refactor: rename macro to aggregated and improve code quality

This commit refactors the evaluation metrics calculation code with the following improvements:

1. Terminology refactoring: macro -> aggregated
   - Rename macro_statistics.py to samples_statistics.py
   - Rename verify_macro_params.py to verify_aggregated_params.py
   - Update all variable and function names accordingly

2. Code structure improvements
   - Extract verification logic in plot_ESt.py into separate functions
     * compare_single_tolerance_level (12 lines)
     * print_verification_result (1 line)
     * verify_aggregated_micro_consistency (28 lines, meets ≤30 line requirement)
   - Refactor verify_aggregated_params.py to use functional programming style
     * Replace structured loops with list comprehensions
     * Use Counter for error type counting
     * Reduce multiple traversals to single pass where possible

3. Reduce function parameter coupling
   - calculate_beta: derive slowdown_speedups internally from correct_speedups
   - calculate_lambda: derive correct_count internally from correct_speedups
   - calculate_eta: derive statistics internally from correct_speedups

4. Decouple error type handling
   - calculate_pi: accept error_type_counts (dict) instead of hardcoded types
   - calculate_gamma: accept generic parameters (tolerance, get_pi, errno_tolerances)
   - Support user-defined error codes instead of hardcoded error types

5. Code quality improvements
   - Use explicit len() checks instead of implicit boolean conversion
   - Use modern Python type hints (list/tuple instead of typing.List/Tuple)
   - Improve code readability and maintainability

All changes have been verified and pass pre-commit checks.
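The reduced parameter coupling in item 3 can be sketched as below. This is only an illustration of the decoupled signatures: the exact definition of β comes from the report, the geometric mean here is a stand-in, and the real functions may differ in detail.

```python
import math


def calculate_lambda(correct_speedups: list[float], total_samples: int) -> float:
    # correct_count is derived internally from the speedup list
    # rather than being passed in as a separate parameter.
    return len(correct_speedups) / total_samples


def calculate_beta(correct_speedups: list[float]) -> float:
    # slowdown_speedups (speedup < 1) are likewise derived internally.
    slowdown_speedups = [s for s in correct_speedups if s < 1.0]
    if len(slowdown_speedups) == 0:  # explicit len() check, per item 5
        return 1.0
    return math.exp(
        sum(math.log(s) for s in slowdown_speedups) / len(slowdown_speedups)
    )
```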

refactor: unify error type to errno mapping for better sorting

- Replace error_type_counts (dict[str, int]) with errno2count (dict[int, int])
- Add get_errno_from_error_type() to map error type strings to errno (1, 2, 3)
- Add get_error_type_from_errno() for reverse mapping when error type strings are needed
- Update calculate_pi() to use errno2count and return dict[int, float]
- Update calculate_all_aggregated_parameters() to use errno2count and errno_tolerance_thresholds
- Update analysis_util.py and verify_aggregated_params.py to use errno2count
- Improve code maintainability by using integer errno for sorting and comparison
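A sketch of the errno mapping and the proportion calculation it feeds. The error-type strings used here are assumptions for illustration; the PR defines the authoritative mapping and signatures in its own module.

```python
from collections import Counter

# Hypothetical error-type strings; the errno values 1/2/3 follow the report
# (1 = accuracy error, 2 = runtime crash, 3 = compile failure).
_ERROR_TYPE_TO_ERRNO = {
    "accuracy_error": 1,
    "runtime_crash": 2,
    "compile_failure": 3,
}
_ERRNO_TO_ERROR_TYPE = {v: k for k, v in _ERROR_TYPE_TO_ERRNO.items()}


def get_errno_from_error_type(error_type: str) -> int:
    return _ERROR_TYPE_TO_ERRNO[error_type]


def get_error_type_from_errno(errno: int) -> str:
    return _ERRNO_TO_ERROR_TYPE[errno]


def calculate_pi(errno2count: dict[int, int], total_samples: int) -> dict[int, float]:
    # pi_c: proportion of all samples failing with errno c;
    # integer errno keys make sorting and comparison straightforward.
    return {c: n / total_samples for c, n in sorted(errno2count.items())}
```

Counting then becomes a one-liner, e.g. `errno2count = Counter(get_errno_from_error_type(e) for e in error_types)`.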

refactor: improve naming and semantics for ES calculation

- Rename verify_es_match_at_tolerance to compare_aggregated_es_and_microscopic_es
- Replace tolerance_level with tolerance parameter
- Replace tolerance_threshold with atol/rtol to avoid confusion
- Rename verify_aggregated_microscopic_consistency to get_verified_aggregated_es_values
- Change return type to dict only (remove all_matched)
- Rename verified_scores to verified_es_values
- Replace micro with microscopic throughout
- Rename check_sample_correctness to get_sample_correctness
- Rename t1 variables to first_errno_tolerance
- Rename es_components to es_constructor_params
- Rename calculate_parameters_for_tolerance to calculate_es_constructor_params_for_tolerance
- Rename custom_map to errno_tolerance_overrides
- Rename errno_as_tolerances to errno2tolerance
- Add enable_aggregation_mode command line option

feat: add aggregated ES(t) plotting and verification

- Modified plot_ES_results to return fig, ax, all_x_coords for external plotting
- Added manual plotting of aggregated ES(t) curves in main function
- Both microscopic and aggregated curves are plotted on the same graph
- Aggregated curves use dashed lines with square markers for distinction
- All verification checks pass with floating-point precision differences (1.39e-17)
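The plotting change can be sketched as follows. The function name and signature here are illustrative, not the actual plot_ES_results API; only the styling choices (dashed line, square markers, unconditional legend) come from the commit.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt


def plot_es_curves(x, microscopic_es, aggregated_es=None):
    fig, ax = plt.subplots()
    ax.plot(x, microscopic_es, marker="o", label="microscopic ES(t)")
    if aggregated_es is not None:
        # Dashed line with square markers distinguishes the aggregated curve.
        ax.plot(x, aggregated_es, linestyle="--", marker="s",
                label="aggregated ES(t)")
    ax.legend()  # outside the condition, so it appears in both modes
    return fig, ax
```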

fix: move ax.legend outside aggregation condition block

- Move ax.legend() outside the aggregation mode condition block
- Ensure legend is always displayed regardless of aggregation mode
- Fix issue where legend was missing when aggregation mode is disabled
@Dayuxiaoshui (Contributor, Author)

[image attachment]
@lixinqi merged commit d698d66 into PaddlePaddle:develop on Nov 18, 2025
3 checks passed
roll-away pushed a commit to roll-away/GraphNet that referenced this pull request Nov 19, 2025
feat: implement ES(t) macro/micro cross-validation and refactor analysis utilities (PaddlePaddle#363)

The squashed commit combines the following changes (full messages above):

* feat: implement ES(t) macro/micro cross-validation and refactor analysis utilities
* refactor: rename macro to aggregated and improve code quality
* style: apply black formatting to samples_statistics.py and verify_aggregated_params.py
* refactor: unify error type to errno mapping for better sorting
* refactor: split tolerance report generation
* refactor: improve naming and semantics for ES calculation
* feat: add aggregated ES(t) plotting and verification
* fix: move ax.legend outside aggregation condition block