# ACCL Communicators
ACCL send/receive primitives and collectives execute among a group of connected ranks called a communicator. A communicator is defined by the relevant properties of the ranks it contains, most importantly the association between each rank index and the Ethernet address of that rank. Multiple overlapping communicators can be defined over a group of ranks, so any one rank may be referred to by different indices in different communicators.
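
To make the idea of overlapping communicators concrete, here is a minimal plain-Python sketch. It does not use the ACCL API: a communicator is modelled as an ordered list of addresses, and the addresses and the helper `index_of` are made up for illustration.

```python
# Toy model, not the ACCL API: a communicator is an ordered list of
# addresses, and a rank's index is its position in that list.
# The addresses below are invented for this example.
world = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

# A second communicator that overlaps with the first one:
# it contains only the last three ranks.
sub = world[1:]


def index_of(comm, address):
    """Return the rank index that `address` has inside `comm`."""
    return comm.index(address)


addr = "10.0.0.3"
print(index_of(world, addr))  # index in the 4-rank communicator
print(index_of(sub, addr))    # index of the same rank in the 3-rank one
```

The same physical rank is found at a different index in each list, which is all that "different indices in different communicators" means here.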

In general, communicators allow the ACCL user to restrict a call to a specific subset of all available ranks. By default, ACCL creates and uses a global communicator comprising all ranks, but this can be split in user-defined ways, as we will see. Let's start an emulator session with 4 ACCL instances and explore this feature.

In [None]:
from pyaccl import accl

accl0 = accl(4, 0, sim_mode=True)
accl1 = accl(4, 1, sim_mode=True)
accl2 = accl(4, 2, sim_mode=True)
accl3 = accl(4, 3, sim_mode=True)

## The global communicator
ACCL maintains a list of communicator objects, with the global communicator always at index 0. The communicator contains static and dynamic fields, where the dynamic fields are updated by the FPGA logic without host intervention. We can inspect the global communicator by reading it back from the ACCL configuration memory in the FPGA, and printing it:

In [None]:
accl0.communicators[0].readback()
print(accl0.communicators[0])

Each entry in the communicator defines the properties of a remote rank from the perspective of the local rank:
* IP address and listening port on the remote rank
* ID of local TCP session connected to the remote IP and port, if applicable, otherwise zeros
* Maximum size of messages which we can send to the remote rank, in bytes
* Input and output sequence numbers, identifying how many messages we've received from or sent to the remote rank respectively
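
The fields listed above can be pictured as a small record per remote rank. The sketch below is only a mental model: the class names, field names, and example values are hypothetical and do not reflect the actual layout of the ACCL configuration memory or the `pyaccl` communicator class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RankEntry:
    """Hypothetical view of one communicator entry (one remote rank)."""
    ip: str                     # IP address of the remote rank
    port: int                   # listening port on the remote rank
    session_id: int = 0         # local TCP session ID, zero if not applicable
    max_message_size: int = 0   # largest message we can send, in bytes
    inbound_seq: int = 0        # messages received from the remote rank
    outbound_seq: int = 0       # messages sent to the remote rank


@dataclass
class ToyCommunicator:
    """Hypothetical container: the local rank index plus one entry per rank."""
    local_rank: int
    ranks: List[RankEntry] = field(default_factory=list)


# Example values are invented for illustration.
comm = ToyCommunicator(
    local_rank=0,
    ranks=[RankEntry(ip=f"10.0.0.{i + 1}", port=5500 + i) for i in range(4)],
)
print(comm.ranks[1])
```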

Of these, the session IDs and sequence numbers are updated by FPGA logic, while ports and message sizes are static. We can dump the communicator again after sending and receiving a few messages to observe the updated sequence numbers:

In [None]:
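# Rank 0 sends one 32-element buffer to each of ranks 1, 2 and 3
accl0.send(accl0.allocate((32,)), 32, 1, 0)
accl0.send(accl0.allocate((32,)), 32, 2, 0)
accl0.send(accl0.allocate((32,)), 32, 3, 0)
# Rank 3 sends one buffer to rank 0
accl3.send(accl3.allocate((32,)), 32, 0, 0)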

accl0.communicators[0].readback()
print(accl0.communicators[0])
accl1.communicators[0].readback()
print(accl1.communicators[0])
accl2.communicators[0].readback()
print(accl2.communicators[0])
accl3.communicators[0].readback()
print(accl3.communicators[0])
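
The readback-then-print pattern repeats for every instance, so it may be convenient to wrap it in a small helper. The function below is a hypothetical convenience, not part of `pyaccl`; it relies only on the `communicators` list and the `readback()` method already used in this notebook.

```python
def dump_communicators(instances, comm_index=0):
    """Refresh and print one communicator on each given ACCL instance.

    `instances` is any iterable of objects exposing a `communicators`
    list whose items have a `readback()` method, as used above.
    """
    for instance in instances:
        comm = instance.communicators[comm_index]
        comm.readback()  # refresh the host-side copy before printing
        print(comm)
```

With the four instances created earlier, the cell above would then reduce to `dump_communicators([accl0, accl1, accl2, accl3])`.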

The `send` primitives cause the output sequence numbers to increment on the ranks which perform the send. Although the data has already arrived at its destination, the input sequence numbers there aren't updated until a corresponding `recv` call is executed.

In [None]:
accl1.recv(accl1.allocate((32,)), 32, 0, 0)
accl2.recv(accl2.allocate((32,)), 32, 0, 0)
accl3.recv(accl3.allocate((32,)), 32, 0, 0)
accl0.recv(accl0.allocate((32,)), 32, 3, 0)

accl0.communicators[0].readback()
print(accl0.communicators[0])
accl1.communicators[0].readback()
print(accl1.communicators[0])
accl2.communicators[0].readback()
print(accl2.communicators[0])
accl3.communicators[0].readback()
print(accl3.communicators[0])
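
The sequence-number behaviour observed in the two cells above can be summarised with a toy model in plain Python. This is a sketch under simplifying assumptions: the `ToyRank` class is hypothetical, and it assumes each message advances a counter by exactly one, which may not match how the hardware counts.

```python
class ToyRank:
    """Hypothetical model of the per-peer sequence counters of one rank."""

    def __init__(self, index, world_size):
        self.index = index
        self.outbound_seq = [0] * world_size  # one counter per remote rank
        self.inbound_seq = [0] * world_size

    def send(self, dst):
        # Sending only touches the sender's outbound counter.
        self.outbound_seq[dst] += 1

    def recv(self, src):
        # The inbound counter moves only when recv is actually called.
        self.inbound_seq[src] += 1


ranks = [ToyRank(i, 4) for i in range(4)]

# Mirror the sends from the notebook: 0 -> 1, 2, 3 and 3 -> 0.
for dst in (1, 2, 3):
    ranks[0].send(dst)
ranks[3].send(0)
inbound_after_sends = [list(r.inbound_seq) for r in ranks]

# Mirror the matching receives.
for r in (1, 2, 3):
    ranks[r].recv(0)
ranks[0].recv(3)

print("rank 0 outbound:", ranks[0].outbound_seq)
print("rank 0 inbound: ", ranks[0].inbound_seq)
```

In this model every inbound counter is still zero after the sends alone, and each one advances only once the matching `recv` has run.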

## Splitting a communicator
We can split ranks off the global communicator to create a new communicator with new indices for the ranks. Note that for collectives and primitives to work with this new communicator, it must be created in an identical way on all ranks which will be part of the new communicator. Let's split off ranks 1, 2, and 3 into a new communicator.

In [None]:
accl1.split_communicator([1,2,3])
accl2.split_communicator([1,2,3])
accl3.split_communicator([1,2,3])

accl1.communicators[0].readback()
print(accl1.communicators[0])
accl1.communicators[1].readback()
print(accl1.communicators[1])

After the split, we have a new communicator with new rank indices attached to the same remote rank signatures (IP, port, session ID). Sequence numbers are also reset. We can use this new communicator in a primitive or collective by explicitly specifying the communicator index in the optional `comm_id` argument of the function. We'll illustrate this with a broadcast on communicator 1, rooted in rank 0. Note that rank 0 on communicator 1 is the same as rank 1 on the global communicator. The sequence numbers get incremented on communicator 1 instead of the global communicator.

In [None]:
accl1.bcast(accl1.allocate((32,)), 32, 0, comm_id=1)
accl2.bcast(accl2.allocate((32,)), 32, 0, comm_id=1)
accl3.bcast(accl3.allocate((32,)), 32, 0, comm_id=1)

accl1.communicators[1].readback()
print(accl1.communicators[1])
accl2.communicators[1].readback()
print(accl2.communicators[1])
accl3.communicators[1].readback()
print(accl3.communicators[1])

We can create any number of communicators (within the limits of the size of the configuration memory) by splitting up either the global communicator or a derived communicator.
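
The index renumbering that a split performs, and the way it composes when a derived communicator is split again, can be sketched in plain Python. The `split` and `local_index` functions here are hypothetical stand-ins, not `split_communicator`; in particular, the sketch assumes the indices passed to a split are relative to the communicator being split, which this notebook does not confirm for derived communicators.

```python
def split(parent, indices):
    """Toy split: pick entries of `parent` by index and renumber from 0.

    Each communicator is modelled as a list whose item at position i is
    the global rank that local index i refers to.
    """
    return [parent[i] for i in indices]


def local_index(comm, global_rank):
    """Return the index that `global_rank` has inside `comm`."""
    return comm.index(global_rank)


world = [0, 1, 2, 3]             # global communicator: index == global rank
comm1 = split(world, [1, 2, 3])  # same selection as in the notebook
comm2 = split(comm1, [0, 2])     # a further split of the derived communicator

print(comm1[0])                  # global rank behind index 0 of comm1
print(comm2)                     # global ranks contained in comm2
print(local_index(comm1, 3))     # index of global rank 3 inside comm1
```

Index 0 of `comm1` resolves to global rank 1, matching the broadcast root used earlier in this section.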

## De-Initialize ACCL instances
The `deinit()` function clears all internal data structures in the ACCL instance.

In [None]:
accl0.deinit()
accl1.deinit()
accl2.deinit()
accl3.deinit()