Contributor

Copilot AI commented Oct 30, 2025

The test passed incorrect pointer offsets to iris.copy(), causing failures with certain cache modifier combinations (for example .cv-.cg and .cv-.cs).

Changes

  • Fixed destination pointer calculation: changed results + BLOCK_SIZE * target_rank to results + BLOCK_SIZE * cur_rank (line 31)
    • iris.copy() expects both pointers in the current rank's address space; dst_ptr is translated internally
  • Fixed test expectations: the test now verifies that each rank writes to its own slot on all targets (lines 100-106)
    • After the barrier, results[rank_id] should contain the data from rank rank_id: (rank_id + num_ranks) * (rank_id + 1)
    • Aligns with the PUT semantics in test_copy_put

The original code calculated destination offsets as if they were already in the target rank's address space, but iris.copy() performs this translation internally.
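The expected result after the fix can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions: it does not call iris or Triton, the helper names (rank_value, simulated_put, run_all_ranks) are hypothetical, and the "symmetric heap" is modelled as one plain list per rank in which the same offset refers to the same slot on every rank. That shared offset stands in for the internal translation that iris.copy() is described as performing.

```python
# Toy model of PUT-style copies on a symmetric heap. NOT the iris API:
# every name here is hypothetical and exists only for illustration.
BLOCK_SIZE = 16
NUM_RANKS = 4


def rank_value(rank, num_ranks):
    # Value each rank contributes, as stated in the PR description.
    return (rank + num_ranks) * (rank + 1)


def simulated_put(heaps, offset, src_block, to_rank):
    # The caller supplies an offset computed in its own address space.
    # Applying the same offset inside the target's buffer stands in for
    # the translation that the real copy is described as doing internally.
    heaps[to_rank][offset:offset + len(src_block)] = src_block


def run_all_ranks(num_ranks=NUM_RANKS, block_size=BLOCK_SIZE):
    # One "results" buffer per rank, with one block-sized slot per rank.
    heaps = [[0.0] * (num_ranks * block_size) for _ in range(num_ranks)]
    for cur_rank in range(num_ranks):
        src_block = [float(rank_value(cur_rank, num_ranks))] * block_size
        # Fixed offset: index by cur_rank, not by target_rank.
        dst_offset = block_size * cur_rank
        for target_rank in range(num_ranks):
            simulated_put(heaps, dst_offset, src_block, target_rank)
    return heaps


heaps = run_all_ranks()
```

In this model, once all ranks have finished (the stand-in for the barrier), slot rank_id on every target holds the value produced by rank rank_id, which is the condition the updated test expectations check.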

Original prompt

This section details the original issue you should resolve

<issue_title>Fix cache modifiers unittest error</issue_title>
<issue_description>2025-10-29T23:28:32.5127431Z FAILED tests/unittests/test_copy_cache_modifiers.py::test_copy_cache_modifiers[.cv-.cg] - AssertionError: Mismatch at rank 1, target 0 with load_cache_modifier=.cv, store_cache_modifier=.cg
2025-10-29T23:28:32.5127462Z assert False
2025-10-29T23:28:32.5127681Z + where False = <built-in method allclose of type object at 0x7f6b69a8a040>(tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n device='cuda:1'), tensor([10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.,\n 10., 10.], device='cuda:1'))
2025-10-29T23:28:32.5127770Z + where <built-in method allclose of type object at 0x7f6b69a8a040> = torch.allclose
2025-10-29T23:28:32.5127963Z + and tensor([10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.,\n 10., 10.], device='cuda:1') = <built-in method full of type object at 0x7f6b69a8a040>((16,), 10, dtype=torch.float32, device=device(type='cuda', index=1))
2025-10-29T23:28:32.5128039Z + where <built-in method full of type object at 0x7f6b69a8a040> = torch.full
2025-10-29T23:28:32.5128083Z + and torch.float32 = torch.float32
2025-10-29T23:28:32.5128270Z + and device(type='cuda', index=1) = tensor([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n 0., 0.],\n [10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.,\n 10., 10.],\n [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n 0., 0.],\n [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n 0., 0.]], device='cuda:1').device
2025-10-29T23:28:32.5128512Z FAILED tests/unittests/test_copy_cache_modifiers.py::test_copy_cache_modifiers[.cv-.cs] - AssertionError: Mismatch at rank 1, target 0 with load_cache_modifier=.cv, store_cache_modifier=.cs
2025-10-29T23:28:32.5128543Z assert False
2025-10-29T23:28:32.5128759Z + where False = <built-in method allclose of type object at 0x7f6b69a8a040>(tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n device='cuda:1'), tensor([10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.,\n 10., 10.], device='cuda:1'))
2025-10-29T23:28:32.5128873Z + where <built-in method allclose of type object at 0x7f6b69a8a040> = torch.allclose
2025-10-29T23</issue_description>

Comments on the Issue (you are @copilot in this section)



Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix cache modifiers unittest error" to "Fix pointer arithmetic in test_copy_cache_modifiers" Oct 30, 2025
Copilot finished work on behalf of mawad-amd October 30, 2025 01:02
Copilot AI requested a review from mawad-amd October 30, 2025 01:02
@mawad-amd marked this pull request as ready for review October 30, 2025 01:20
@mawad-amd merged commit 88970ee into muhaawad/cache-modifiers Oct 30, 2025
@mawad-amd deleted the copilot/fix-cache-modifiers-unittest-error branch October 30, 2025 01:20