
Add comprehensive logging with IRIS_LOG_LEVEL support#522

Merged

mawad-amd merged 4 commits into main from muhaawad/iris-logging on Apr 28, 2026

Conversation

@mawad-amd
Collaborator

Summary

  • Add IRIS_LOG_LEVEL env var (DEBUG/INFO/WARNING/ERROR) to control iris log verbosity
  • Add _log_rank() helper for rank-aware logging with lazy formatting
  • Instrument 17 files across CCL ops, symmetric heap, allocators, distributed helpers, kernel launches, and fused ops with two-tier logging (DEBUG for tracing, INFO for lifecycle events)
  • Show [module] bracket only for internal iris logs, not user-facing ctx.info() calls

At the default INFO level, only 4-5 lifecycle lines appear during init. At DEBUG, every CCL call, kernel launch, allocation, and barrier is traced in full.

Test plan

  • ruff check iris/ && ruff format --check iris/
  • IRIS_LOG_LEVEL=DEBUG torchrun --nproc_per_node=4 tests/run_tests_distributed.py tests/ccl/test_all_reduce.py -v — verify debug output with [Iris] prefix
  • Default INFO level — no visible logging noise, no test regressions

🤖 Generated with Claude Code

mawad-amd and others added 3 commits April 26, 2026 22:23
Instrument 17 files with ~30 log lines covering CCL ops, symmetric heap,
allocators, distributed helpers, HIP platform, kernel launches, and fused
ops. Two-tier logging: DEBUG for entry-point tracing, INFO for lifecycle
events (init, peer access, IPC). Enhanced log format with timestamp,
level, module name, and rank. Add IRIS_LOG_LEVEL env var to control level
at import time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_log_rank() now captures the caller's filename via sys._getframe so the
formatter can show [module]. User-facing ctx.info()/ctx.debug() calls go
through _log_with_rank(pathname="") so the module bracket is omitted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The distributed_allgather, distributed_broadcast_tensor, and
distributed_barrier functions were calling _log_rank without rank=
and num_ranks= kwargs, causing log lines to show ?/? instead of
the actual rank info.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 27, 2026 05:24
@mawad-amd mawad-amd requested review from BKP and neoblizz as code owners April 27, 2026 05:24
@github-actions github-actions Bot added in-progress We are working on it iris Iris project issue labels Apr 27, 2026
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds rank-aware, env-configurable logging across Iris distributed ops and runtime to improve debuggability while keeping default output low-noise.

Changes:

  • Introduces IRIS_LOG_LEVEL env override and _log_rank() helper for rank-aware internal logging.
  • Updates IrisFormatter to include timestamps/levels/rank and module information.
  • Instruments CCL ops, matmul collectives, allocators/symmetric heap, kernel launch tracing, and distributed helpers with DEBUG/INFO logs.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| iris/host/logging/logging.py | Adds IRIS_LOG_LEVEL, expands formatter output, and introduces the _log_rank() helper. |
| iris/host/tracing/kernel_artifacts.py | Adds DEBUG tracing for kernel launches via _log_rank(). |
| iris/host/platform/hip.py | Logs HIP/CUDA errors and DMA-BUF export at ERROR/DEBUG. |
| iris/host/memory/symmetric_heap.py | Adds INFO/DEBUG lifecycle tracing for heap init, allocations, and peer refresh. |
| iris/host/memory/allocators/vmem_allocator.py | Adds INFO init and DEBUG allocation tracing. |
| iris/host/memory/allocators/torch_allocator.py | Adds INFO init, DEBUG allocation tracing, and an ERROR OOM log. |
| iris/host/iris.py | Adds an INFO init log and a DEBUG barrier trace. |
| iris/host/distributed/helpers.py | Adds DEBUG tracing for allgather/broadcast/barrier helpers. |
| iris/host/distributed/fd_passing.py | Adds DEBUG tracing for FD infrastructure setup. |
| iris/ccl/all_reduce.py | Adds DEBUG tracing for the all-reduce preamble and main call. |
| iris/ccl/all_gather.py | Adds DEBUG tracing for all-gather entry. |
| iris/ccl/all_to_all.py | Adds DEBUG tracing for all-to-all entry. |
| iris/ccl/reduce_scatter.py | Adds DEBUG tracing for reduce-scatter entry. |
| iris/ops/matmul_all_reduce.py | Adds DEBUG tracing for matmul all-reduce entry. |
| iris/ops/matmul_all_gather.py | Adds DEBUG tracing for matmul all-gather entry. |
| iris/ops/matmul_reduce_scatter.py | Adds DEBUG tracing for matmul reduce-scatter entry. |
| iris/ops/all_gather_matmul.py | Adds DEBUG tracing for fused all-gather matmul entry. |

- Only show [module] for internal _log_rank() calls via iris_internal flag,
  not for user-facing ctx.info()/ctx.debug() logs
- Only set iris_num_ranks when num_ranks is not None (avoids "None" in output)
- Guard eager .item() calls behind logger.isEnabledFor(DEBUG)
- Lower refresh_peer_access() logs from INFO to DEBUG (too noisy)
- Guard f-string log in iris.py init behind isEnabledFor(INFO)
- Update test assertions to match new timestamp+level format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
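The `isEnabledFor(DEBUG)` guard mentioned above can be sketched like this; `debug_tensor_stats` is a hypothetical helper, not an iris function:

```python
import logging

logger = logging.getLogger("iris")


def debug_tensor_stats(tensor, rank):
    # Skip the eager .item() (a device sync for GPU tensors) entirely
    # unless DEBUG is enabled, so the default INFO level pays no cost.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("rank %d: tensor sum=%s", rank, tensor.sum().item())
```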
@mawad-amd mawad-amd merged commit c196eef into main Apr 28, 2026
74 of 76 checks passed
@mawad-amd mawad-amd deleted the muhaawad/iris-logging branch April 28, 2026 18:48
