@cliffburdick
Collaborator

Enhances MatX's observability and error-handling capabilities by adding extensive logging throughout the codebase and providing an option to disable exceptions.

Logging enhancements:

  • Added TRACE-level logging to all operator and generator constructors
    • Log operator name via str() method and relevant constructor parameters
    • Consolidated log.h include in base_operator.h to reduce duplication
  • Added DEBUG-level logging for cache operations
    • Log cache hits and misses in LookupAndExec with cache ID, device, and thread
    • Log transform-specific cache attempts with descriptive names (FFT, MatMul, SVD, QR, LU, Eigenvalue, Inverse, CUB, Einsum, Solve, Sparse conversions, Filter, Covariance)
  • Added DEBUG-level logging for kernel launches
    • Log kernel parameters in CUDA executor
  • Added DEBUG-level logging for memory operations
    • Log all tensor allocations and deallocations with pointer and size info
    • Log all make_tensor() calls with signature, shape, pointer, and memory kind
  • Converted all printf/fprintf calls in error.h to use MatX logger
    • Error messages now use MATX_LOG_ERROR/MATX_LOG_FATAL consistently
  • Changed default log level from OFF to ERROR
    • Ensures error messages are visible by default
    • Users can override via MATX_LOG_LEVEL environment variable
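
To illustrate the default-level and environment-override behavior described above, here is a minimal sketch. It is not the MatX implementation: `LogLevel`, `parse_level`, `level_from_env`, and `should_log` are hypothetical names, and the accepted spellings for `MATX_LOG_LEVEL` are assumed to be the upper-case level names used in this description.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical level enum, ordered from most to least verbose. Only the
// levels named in this PR description are listed; the real set may differ.
enum class LogLevel { Trace, Debug, Error, Fatal, Off };

// Map a level name to a LogLevel. A null pointer or an unrecognized name
// falls back to the supplied default. (Assumed spellings, see lead-in.)
inline LogLevel parse_level(const char* name, LogLevel fallback) {
  if (name == nullptr) return fallback;
  const std::string s(name);
  if (s == "TRACE") return LogLevel::Trace;
  if (s == "DEBUG") return LogLevel::Debug;
  if (s == "ERROR") return LogLevel::Error;
  if (s == "FATAL") return LogLevel::Fatal;
  if (s == "OFF")   return LogLevel::Off;
  return fallback;
}

// Read the threshold from MATX_LOG_LEVEL, defaulting to ERROR when the
// variable is unset, which matches the new default described above.
inline LogLevel level_from_env() {
  return parse_level(std::getenv("MATX_LOG_LEVEL"), LogLevel::Error);
}

// A message is emitted only if logging is not OFF and the message level is
// at least as severe as the threshold.
inline bool should_log(LogLevel msg, LogLevel threshold) {
  return threshold != LogLevel::Off && msg >= threshold;
}
```

With the ERROR default, `should_log(LogLevel::Debug, level_from_env())` is false unless the environment variable lowers the threshold, so the cache, kernel, and memory messages stay silent unless a user opts in.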

Exception handling improvements:

  • Added MATX_DISABLE_EXCEPTIONS CMake option
    • When enabled, MATX_THROW logs a fatal error and calls abort() instead of throwing
    • Provides exception-free operation for environments that don't support exceptions
    • All error handling macros automatically adapt to exception-disabled mode
  • Fixed macro parameter naming to avoid preprocessor conflicts
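
The log-then-abort behavior can be sketched as follows. This is an illustration under stated assumptions, not the actual MatX macro: `DEMO_DISABLE_EXCEPTIONS`, `DEMO_THROW`, and `checked_div` are hypothetical stand-ins, and plain `fprintf` is used where MatX would go through its own logger.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an exception-disabling build option. When the
// symbol is defined, errors are reported and the process aborts; otherwise
// a C++ exception is thrown.
#ifdef DEMO_DISABLE_EXCEPTIONS
  #define DEMO_THROW(msg_)                                          \
    do {                                                            \
      std::fprintf(stderr, "[FATAL] %s:%d %s\n", __FILE__,          \
                   __LINE__, (msg_));                               \
      std::abort();                                                 \
    } while (0)
#else
  #define DEMO_THROW(msg_)                                          \
    do {                                                            \
      throw std::runtime_error(std::string(msg_));                  \
    } while (0)
#endif

// Example call site: the same source compiles in both modes because the
// error path goes through the macro rather than a bare `throw`.
inline int checked_div(int a, int b) {
  if (b == 0) DEMO_THROW("division by zero");
  return a / b;
}
```

The trailing underscore on the macro parameter (`msg_`) is one common way to keep parameter names from colliding with identifiers at the call site; whether the naming fix in this PR took that form is not stated here.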

These changes enable detailed runtime diagnostics for debugging performance issues, cache behavior, and memory usage while maintaining zero overhead when logging is disabled.


@greptile-apps

greptile-apps bot commented Oct 30, 2025

Skipped: This PR changes more files than the configured file change limit: (143 files found, 100 file limit)

@cliffburdick
Collaborator Author

/build

@cliffburdick cliffburdick merged commit 0f53ca5 into main Oct 31, 2025
1 check passed
@cliffburdick cliffburdick deleted the more_logging branch October 31, 2025 14:06