
Ensure Consistency among Data Accesses in the same Thread Block #551


Open: wants to merge 6 commits into base branch feature/dsl

Conversation

caiomcbr (Contributor)
This PR ensures that if two or more operations touch the same memory location, a mechanism is inserted to synchronize the thread block between these operations, preventing data races. For instance:

copy chunk 0 -> chunk 1
put chunk 1 -> chunk 2

Here the copy operation and the put operation have chunk 1 in common, for writing and reading respectively. In this case the copy operation must finish before the put operation starts, so a synchronization mechanism is inserted between them to avoid the conflict.

In this PR, we insert synchronization mechanisms between operations under the following conditions:

An operation with a write access followed by an operation with a write access to the same memory location (write-after-write)
An operation with a read access followed by an operation with a write access to the same memory location (write-after-read)
An operation with a write access followed by an operation with a read access to the same memory location (read-after-write)
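The core of these three rules is an interval-overlap test where at least one side writes. A minimal sketch (the names `Access`, `AccessType`, and `needs_sync` are illustrative, not the PR's actual API):

```python
from dataclasses import dataclass
from enum import Enum

class AccessType(Enum):  # hypothetical stand-in for the PR's DataAccessType
    read = "read"
    write = "write"

@dataclass
class Access:  # hypothetical stand-in for the PR's DataAccess
    op_id: int
    start: int  # first chunk index touched
    end: int    # last chunk index touched (inclusive)
    kind: AccessType

def needs_sync(a: Access, b: Access) -> bool:
    """A barrier is required when the two chunk intervals overlap
    and at least one of the accesses is a write (WAW, WAR, or RAW)."""
    overlaps = a.start <= b.end and b.start <= a.end
    has_write = AccessType.write in (a.kind, b.kind)
    return overlaps and has_write
```

Two overlapping reads never require a barrier, which is why read-after-read is absent from the list above.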

@caiomcbr caiomcbr requested a review from Binyang2014 June 17, 2025 23:23
@Binyang2014 Binyang2014 requested a review from Copilot June 20, 2025 17:11

@Copilot Copilot AI left a comment


Pull Request Overview

This PR implements a conflict-detection pass to insert synchronization operations between thread-block operations that read/write the same buffer intervals. It adds data-access tracking (DataAccess), a buffer-access analyzer (BuffersAccess), and hooks into the existing pipeline to resolve dependencies after instruction fusion.

  • Introduced DataAccessType and DataAccess in dsl_types and built buffer_access.py to detect interval conflicts
  • Extended every operation with unique IDs and local_data_access methods to annotate reads/writes
  • Added resolve_data_dependency through ThreadBlock, Gpu, and MSCCLPPProgram.post_process_operations
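A dependency-resolution pass of this shape can be sketched as a single walk over the operation list, inserting a barrier whenever the next operation conflicts with anything recorded since the last barrier (`insert_syncs`, `accesses_of`, and `make_barrier` are hypothetical names, not the PR's code):

```python
def has_conflict(a, b):
    # Accesses are (op_id, start, end, kind) tuples; a barrier is needed
    # when the chunk intervals overlap and at least one access is a write.
    _, a_start, a_end, a_kind = a
    _, b_start, b_end, b_kind = b
    overlaps = a_start <= b_end and b_start <= a_end
    return overlaps and ("write" in (a_kind, b_kind))

def insert_syncs(ops, accesses_of, make_barrier):
    """Walk operations in program order; emit a barrier whenever the next
    operation conflicts with any access recorded since the last barrier."""
    result, pending = [], []
    for op in ops:
        acc = accesses_of(op)
        if any(has_conflict(p, a) for p in pending for a in acc):
            result.append(make_barrier())
            pending = []  # the barrier covers all earlier accesses
        result.append(op)
        pending.extend(acc)
    return result
```

Clearing `pending` after each barrier keeps the scan linear in the common case, since a barrier orders everything before it against everything after it.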

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

python/mscclpp/language/internal/dsl_types.py: Added DataAccessType and DataAccess, reordered SyncType
python/mscclpp/language/internal/buffer_access.py: New BuffersAccess class to scan and sync operations
python/mscclpp/language/internal/operations.py: Gave operations UUIDs and implemented local_data_access
python/mscclpp/language/internal/threadblock.py: Imported buffer_access and added resolve_data_dependency
python/mscclpp/language/internal/gpu.py: Propagated resolve_data_dependency to all threadblocks
python/mscclpp/language/program.py: Called resolve_data_dependency after adding_data_sync
python/mscclpp/language/internal/optmizer.py: Included relaxed_signal/relaxed_wait in sync filter
python/mscclpp/language/tests/*.py: Updated tests for new sync/DSL types and revised logic
python/mscclpp/language/collectives.py & channel.py: Migrated imports to dsl_types
Comments suppressed due to low confidence (1)

python/mscclpp/language/tests/allreduce.py:13

  • [nitpick] The variable num_tb is ambiguous—does it represent the chunk factor, number of thread blocks, or something else? Consider renaming to a more descriptive identifier like chunk_factor or num_thread_blocks.
    num_tb = 8

Comment on lines +373 to +377
for chunk in self.src_buff:
    data_access.append(
        DataAccess(self.id, chunk.index, chunk.index + chunk.size - 1, chunk.type, DataAccessType.read)
    )


Copilot AI Jun 20, 2025


In PutOperation.local_data_access, only read accesses (src_buff) are recorded. You should also append write accesses for each dst_buff chunk to correctly detect write conflicts.

Suggested change:

    # Record read accesses for src_buff
    for chunk in self.src_buff:
        data_access.append(
            DataAccess(self.id, chunk.index, chunk.index + chunk.size - 1, chunk.type, DataAccessType.read)
        )
    # Record write accesses for dst_buff
    for chunk in self.dst_buff:
        data_access.append(
            DataAccess(self.id, chunk.index, chunk.index + chunk.size - 1, chunk.type, DataAccessType.write)
        )
    return data_access


Comment on lines +164 to +165
return list(map_num.keys())[map_num[self] | map_num[other]]


Copilot AI Jun 20, 2025


The __or__ implementation may index past the end of map_num.keys() and is not a reliable reverse mapping. Consider using an explicit mapping from the bitwise OR result back to the correct DataAccessType enum member.

Suggested change:

    reverse_map = {1: DataAccessType.read, 2: DataAccessType.write, 3: DataAccessType.both}
    result = map_num[self] | map_num[other]
    return reverse_map.get(result, NotImplemented)
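A self-contained version of the suggested fix might look like the following sketch (the real enum lives in dsl_types; the bit values here are assumptions mirroring the reviewer's suggested mapping):

```python
from enum import Enum

class DataAccessType(Enum):  # sketch mirroring the PR's enum, not its actual code
    read = "read"
    write = "write"
    both = "both"

    def __or__(self, other):
        if not isinstance(other, DataAccessType):
            return NotImplemented
        # Explicit forward and reverse mappings avoid indexing past the
        # end of a keys() list when the OR result exceeds the enum size.
        bits = {DataAccessType.read: 1, DataAccessType.write: 2, DataAccessType.both: 3}
        reverse = {1: DataAccessType.read, 2: DataAccessType.write, 3: DataAccessType.both}
        return reverse[bits[self] | bits[other]]
```

With read = 1 and write = 2, every OR of two members lands on a defined key, so the reverse lookup is total.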



allreduce_example(args.name, args.num_gpus, args.num_threads_per_block, args.min_message_size, args.max_message_size)

""" allreduce_example("allreduce", 4, 1024, 0, 2**64 - 1) """

Copilot AI Jun 20, 2025


[nitpick] This triple-quoted invocation at the end isn’t executed and can be confusing; consider removing it or converting it to a proper comment or unit test.

Suggested change:

    # Example usage: allreduce_example("allreduce", 4, 1024, 0, 2**64 - 1)


Contributor


So we cannot use types as the module name?

Contributor Author


Yes, I think so. I actually tried different approaches, but I couldn't keep the same name.

@@ -0,0 +1,82 @@
from sortedcontainers import SortedDict
Contributor


Do we need to add this to our requirements.txt?

Contributor


Another concern: the latest sortedcontainers release is from 2021, and there seem to have been no updates since then.

Contributor Author


Interesting. If you want, I can try to find another library, or implement the data structure myself.
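If the dependency becomes a problem, the interval bookkeeping needed here is simple enough to sketch with the standard library's bisect module (illustrative only; `SortedIntervals` is a hypothetical name, not the PR's BuffersAccess implementation):

```python
import bisect

class SortedIntervals:
    """Minimal stdlib sketch of what a SortedDict is used for here:
    intervals kept sorted by start index, with overlap queries."""

    def __init__(self):
        self._starts = []  # start indices, kept sorted
        self._items = []   # (start, end, payload) tuples in the same order

    def add(self, start, end, payload=None):
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._items.insert(i, (start, end, payload))

    def overlapping(self, start, end):
        """Yield stored intervals that overlap [start, end] (inclusive)."""
        for s, e, p in self._items:
            if s > end:
                break  # sorted by start: nothing later can overlap
            if e >= start:
                yield (s, e, p)
```

Insertion is O(n) because of the list insert, which is usually acceptable for the per-threadblock operation counts involved; SortedDict would bring that down to O(log n) amortized.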
