Barriers Required for Distributed execution. #923
To add to the design needs: in the spirit of moving towards concepts that will allow a batch to be executed as a job, we also need something that can tell us when all the batches or jobs that correspond to a kernel are done. So if most of the work is done by an executor, which runs on a separate thread (or threads), the kernel needs a way to know when it is done, so that it can mark its output cache as finished.
For keeping track of the batches that are done, I think we should add some member functions and variables to the kernel interface: counters for how many batches have been created and how many have completed, and a function to wait on them. This would be used in something like the sketch below.
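Something like the following, as a minimal sketch; add_batch, notify_batch_done, wait_all_batches_done, and the member variables are all made-up names, not the actual interface:

```cpp
// Hypothetical sketch only: batch tracking members added to the kernel interface.
#include <condition_variable>
#include <mutex>

class kernel {
public:
    // Increment when a batch/job is created for this kernel.
    void add_batch() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++total_batches_;
    }

    // Called by the executor thread when one batch finishes.
    void notify_batch_done() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++batches_completed_;
        done_cv_.notify_all();
    }

    // Blocks until every batch created for this kernel has completed.
    // Assumes all batches are registered via add_batch() before waiting.
    void wait_all_batches_done() {
        std::unique_lock<std::mutex> lock(mutex_);
        done_cv_.wait(lock, [this] { return batches_completed_ == total_batches_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable done_cv_;
    int total_batches_{0};
    int batches_completed_{0};
};
```

With members like these, the kernel's run function would create all of its batches, call wait_all_batches_done(), and only then mark its output cache as finished.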
For the patterns of distributing data, we could have a messenger class that is part of a kernel, has state, and enables this sort of functionality. This class would have a few features:

- In its constructor, it would receive information about the kernel id and the input and output ids, or anything else it needs for the metadata header that is part of the messages. It would also get a pointer to the outgoing message CacheMachine, and a list of all the nodes (or maybe the context, which has all of that).
- It would have functions for the common sending paradigms.
- It would have a function for sending metadata to all the nodes, such as the partition counts to expect.
- It would have a function for getting info from all nodes about how much it should expect to receive, similar to what was stated as needed above.

A sketch of the shape this could take is below.
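Something along these lines, purely as a sketch; MessageManager and all the method names below are made up, and the forward declarations stand in for the real project types:

```cpp
// Illustrative sketch of the messenger class; names and signatures are hypothetical.
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class CacheMachine;   // stand-ins for the real project types
class BlazingTable;

class MessageManager {
public:
    // Gets whatever it needs for the metadata header (kernel id, input/output
    // ids), a pointer to the outgoing message CacheMachine, and the node list
    // (or the context, which has all of that).
    MessageManager(std::string kernel_id,
                   std::shared_ptr<CacheMachine> outgoing_cache,
                   std::vector<std::string> node_ids)
        : kernel_id_(std::move(kernel_id)),
          outgoing_cache_(std::move(outgoing_cache)),
          node_ids_(std::move(node_ids)) {}

    // Common sending paradigm: send a table to every other node.
    void send_to_all(std::unique_ptr<BlazingTable> table);

    // Send metadata to all nodes, e.g. the partition counts they should expect.
    void broadcast_partition_counts(std::size_t count);

    // Ask every node how much we should expect to receive, and return the total.
    std::size_t collect_expected_counts();

private:
    std::string kernel_id_;
    std::shared_ptr<CacheMachine> outgoing_cache_;
    std::vector<std::string> node_ids_;
    std::size_t messages_sent_{0};  // the state discussed later in this thread
};
```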
@williamBlazing
we should just use an atomic here; there's no reason to block on a mutex for a simple counter.
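A minimal sketch of that idea, assuming C++20's std::atomic wait/notify is available (before C++20 the waiting side would still need a condition variable); the names are hypothetical:

```cpp
// Sketch of the atomic variant (hypothetical names). Increments need no mutex;
// blocking on the value uses C++20's std::atomic wait/notify.
#include <atomic>

std::atomic<int> batches_completed{0};

void notify_batch_done(int total_batches) {
    if (batches_completed.fetch_add(1) + 1 == total_batches) {
        batches_completed.notify_all();  // wake any waiter once the last batch lands
    }
}

void wait_all_batches_done(int total_batches) {
    int seen = batches_completed.load();
    while (seen < total_batches) {
        batches_completed.wait(seen);  // blocks while the value is still `seen`
        seen = batches_completed.load();
    }
}
```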
The Linux select function comes to mind (https://www.tutorialspoint.com/unix_system_calls/_newselect.htm), as does how Go works with channels in its select statement. The logic we are using here is kind of ugly: the executing thread says to some cache, "hey, I am going to wait for you to be done; I am going to stop doing things until you are." Two things can happen here that are hard to deal with.
Here wait_for_count is being used differently from how it was used before. Remember, this tells you how much to wait for depending not only on what comes from your own node, but also on what comes from OTHER nodes. So you have to collect this count from all the other nodes.
For your suggestion of using the Linux select function: I think we already have patterns in our code for waiting functions, and I don't think we should introduce a new way of doing things.
If we don't get the count, it's because something went wrong in the job. The job should throw an error, and we should handle that error.
As I mentioned before, this is for tracking batches run by a kernel. We will always know what the count is, because you increment the counter when you create the job or batch.
What about a situation where wait_for_count doesn't just depend on things that came from my node? For example, for my join to be ready to proceed past the partitionKernel, it needs to wait for all of its batches to be processed, but it ALSO must wait for all the other nodes to have sent over the batches that correspond to me, and for those to have been added to the output_cache.
The wait_for_count that I am talking about above is for tracking batches run by a kernel. That is something that would be used both by kernels that distribute and by those that don't. Given that CacheMachine already has a similar function with the same name, let's rename the kernel-side one. For kernels that distribute, waiting for all the data and messages you are expecting to receive would be accomplished by a separate wait on the message manager. So in the case of waiting for all the batches of the kernel to finish AND also waiting to receive all the expected messages, it would look something like this:
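Reconstructing with the hypothetical names from the sketches above:

```cpp
// Hypothetical combination of the two barriers discussed above.
wait_all_batches_done();  // every batch this kernel created has been executed

// Ask the other nodes how much data they sent us, then wait until the
// output cache actually holds that much.
std::size_t expected = message_manager->collect_expected_counts();
output_cache->wait_for_count(expected);  // CacheMachine's existing wait
```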
Note that this example is not necessarily for the join, because the join has two outputs, but the principle is the same. In a join kernel, you would basically have two messageManagers, one for each table.
could that be a MessageFactory? It really is a factory method, right?
Absolutely
No, it's not only a factory. It has sending and receiving functions, and it has state.
Maybe I am misunderstanding. I thought the message manager's purpose was to generate the Metadata class in CacheMachine.h. It sends by adding to the output cache, and the messages it receives end up in the appropriate cache. In the case of messages that need to be retrieved during a kernel's execution, they go into the general input cache, where the message can be pulled by name.
The class would be able to send and receive messages as you mention, but this requires it to also keep state about how many messages it has sent, and so on.
Right now, kernels handle distribution and ensure completeness themselves so that they can continue when they have to communicate. Here is an example of what that looks like in aggregation; a sketch of the whole flow follows the description below.
Below, we iterate through the batches that this kernel gets from its input cache, partition them, and send each node its corresponding partition. We store a count of how many partitions we sent to each node and how many we kept for ourselves. After this code executes, we send each node the count of how many partitions we sent it. Then we collect all of the partition counts from each worker node, sum them up, and wait for our output cache to have that many partitions before we can say this kernel is finished.
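A condensed, hypothetical reconstruction of that flow; partition_table, send_partition, broadcast_partition_counts, and collect_partition_counts are stand-in names, while pullFromCache, addToCache, and wait_for_count mirror the CacheMachine calls discussed in this thread:

```cpp
// Hypothetical reconstruction of the aggregation distribution pattern.
// All helper names are stand-ins for the real code.
void distribute_partitions(CacheMachine& input_cache, CacheMachine& output_cache,
                           int num_nodes, int self_node_index) {
    std::vector<std::size_t> sent_counts(num_nodes, 0);

    // 1. Partition each input batch and route each piece to its node.
    while (auto batch = input_cache.pullFromCache()) {
        auto partitions = partition_table(std::move(batch), num_nodes);
        for (int node = 0; node < num_nodes; ++node) {
            ++sent_counts[node];
            if (node == self_node_index) {
                output_cache.addToCache(std::move(partitions[node]));  // keep our own piece
            } else {
                send_partition(node, std::move(partitions[node]));
            }
        }
    }

    // 2. Tell every other node how many partitions we sent it.
    broadcast_partition_counts(sent_counts);

    // 3. Collect counts from every worker, add our own, and wait for the output
    //    cache to hold that many partitions before declaring the kernel finished.
    std::size_t expected = sent_counts[self_node_index];
    for (std::size_t remote : collect_partition_counts()) {
        expected += remote;
    }
    output_cache.wait_for_count(expected);
}
```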
We want to abstract away a few of the things that are happening here. We are often following this pattern of spreading data out, followed by a barrier before we can continue. We want to remove this code from the kernel's run function itself and have a more generic way of expressing it. As we discuss and implement the movement towards scheduling tasks to be run, we need primitives that can do things like the following:
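Judging from the rest of the thread, purely illustrative signatures for such primitives might look like:

```cpp
// Purely illustrative primitive signatures, not a proposed API.
#include <cstddef>
#include <memory>
#include <vector>

class CacheMachine;   // stand-ins for the real project types
class BlazingTable;

// Scatter a batch's partitions across all nodes, keeping our own share locally.
void scatter(std::unique_ptr<BlazingTable> batch, CacheMachine& local_output);

// Announce to every node how much data it should expect from us.
void broadcast_expected_counts(const std::vector<std::size_t>& counts);

// Barrier: block until all local tasks for this kernel are done AND all
// expected remote data has arrived in the given cache.
void barrier(CacheMachine& output_cache, std::size_t expected_count);
```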