Reduce amount of scanning needed to find notifications #500
Comments
Workers should send out requests for tablet scans asynchronously. This approach makes it likely that when a worker scans a tablet, it's doing it on behalf of many other workers. If a worker issued a scan request to another worker and blocked, then it's possible that other tablets it's interested in could be scanned while it waits.
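As a rough illustration (not Fluo code; tablet ids and notifications are simplified to strings), coalescing asynchronous scan requests could look like the sketch below, where a request for a tablet that is already being scanned simply piggybacks on the in-flight scan:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of coalescing asynchronous scan requests; tablet ids and the
// notification type are simplified to strings.
class AsyncScanRequester {

  private final Map<String, CompletableFuture<Set<String>>> pending =
      new ConcurrentHashMap<>();

  // Request a scan of a tablet without blocking. If a scan of the same
  // tablet is already in flight, piggyback on it, so one scan ends up
  // serving many interested workers.
  CompletableFuture<Set<String>> requestScan(String tabletId) {
    return pending.computeIfAbsent(tabletId,
        tid -> CompletableFuture.supplyAsync(() -> scanTablet(tid))
            .whenComplete((result, error) -> pending.remove(tid)));
  }

  private Set<String> scanTablet(String tabletId) {
    return Set.of(); // placeholder: scan the tablet for notifications
  }
}
```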
I have come up with another solution to this problem. The solution is to group workers into fixed-size groups and have each group scan a disjoint set of tablets. For example, if there are 23 workers, a minimum group size of 7, and 100 tablets, then the following groups would be created:

- Group 1: 8 workers, 34 tablets
- Group 2: 8 workers, 33 tablets
- Group 3: 7 workers, 33 tablets
Each worker in group 1 would have a unique id within the group, ranging from 0 to 7. A worker with id 5 in the group would scan all 34 of the group's tablets, looking for notifications where hash(notification) % 8 == 5. This solution allows the cost of scanning for notifications to stay fixed as the number of workers grows. The current notification-finding implementation in Fluo has a single group. So notification processing is very evenly spread among workers without having to worry about collisions. However, the cost of every worker scanning every tablet does not scale well as the number of workers grows.
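A minimal sketch of the grouping arithmetic in this example (illustrative names, not actual Fluo code):

```java
// Sketch of the grouping arithmetic in the example above; all names are
// illustrative, this is not actual Fluo code.
class GroupMath {
  public static void main(String[] args) {
    int workers = 23, minGroupSize = 7, tablets = 100;

    // As many groups of at least minGroupSize workers as possible.
    int numGroups = workers / minGroupSize; // 23 / 7 = 3

    for (int g = 0; g < numGroups; g++) {
      // Spread workers and tablets as evenly as possible across groups.
      int groupWorkers = workers / numGroups + (g < workers % numGroups ? 1 : 0);
      int groupTablets = tablets / numGroups + (g < tablets % numGroups ? 1 : 0);
      System.out.printf("Group %d : %d workers, %d tablets%n",
          g + 1, groupWorkers, groupTablets);
    }
    // Group 1 : 8 workers, 34 tablets
    // Group 2 : 8 workers, 33 tablets
    // Group 3 : 7 workers, 33 tablets

    // A worker with id 5 in an 8-worker group scans all of its group's
    // tablets, keeping only notifications where
    //   Math.floorMod(hash(notification), 8) == 5
  }
}
```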
How would tablets be assigned to groups?
Good question, I have spent a good bit of time thinking about this. This is easy to do IF all of the workers can agree on what the current set of tablets for a table is. However, the workers can possibly read different splits at different times. I was trying to figure out a fancy distributed way for all workers to agree on the same set of split points for a table (for a time period), but could not think of anything. So I think putting the splits in zookeeper is a good option.

I am currently thinking of taking a subset of table split points where the total size is less than 128K or 256K and putting that in zookeeper. For instance, read the splits and remove all odd splits while the total size is greater than 256K before storing in ZK. The worker with the lowest ID could manage the splits stored in ZK. All other workers can observe the splits node.

Once the workers all agree on a set of split points, it's smooth sailing. Using the info about workers in zookeeper, each worker can decide how many groups there are. Then it can shuffle the splits in a deterministic way and round-robin assign them to groups. This should lead to all workers making the same decisions about which splits are in a group. The shuffle + round-robin will result in a very even and random assignment of tablets to groups; I was thinking it would be nice to avoid assigning contiguous tablets to a group.

I have used the terms tablets and splits. In reality all of the workers just need to agree on some set of row ranges for the table that don't overlap and cover the table. It does not need to be tablet split points, that's just convenient.
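A sketch of the split handling described above, assuming the 256K cap and a shuffle seed shared by all workers (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of the split handling described above; all names are illustrative.
class SplitPartitioner {

  // Trim the table's split points until they fit under maxBytes (e.g. 256K),
  // dropping every other split point while the total size is too large.
  static List<byte[]> trimSplits(List<byte[]> splits, long maxBytes) {
    List<byte[]> trimmed = new ArrayList<>(splits);
    while (trimmed.size() > 1 && sizeOf(trimmed) > maxBytes) {
      List<byte[]> half = new ArrayList<>(trimmed.size() / 2 + 1);
      for (int i = 0; i < trimmed.size(); i += 2) { // keep even, drop odd
        half.add(trimmed.get(i));
      }
      trimmed = half;
    }
    return trimmed;
  }

  // Deterministically shuffle the agreed-on splits and round-robin them to
  // groups. Every worker uses the same seed, so all workers compute the same
  // assignment with no further coordination.
  static List<List<byte[]>> assignToGroups(List<byte[]> splits, int numGroups, long seed) {
    List<byte[]> shuffled = new ArrayList<>(splits);
    Collections.shuffle(shuffled, new Random(seed));
    List<List<byte[]>> groups = new ArrayList<>();
    for (int g = 0; g < numGroups; g++) {
      groups.add(new ArrayList<>());
    }
    for (int i = 0; i < shuffled.size(); i++) {
      groups.get(i % numGroups).add(shuffled.get(i));
    }
    return groups;
  }

  private static long sizeOf(List<byte[]> splits) {
    long total = 0;
    for (byte[] s : splits) {
      total += s.length;
    }
    return total;
  }
}
```

Since the seed and the split list in ZK are the same for every worker, all workers compute identical group assignments without talking to each other.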
How does recovery happen if any given process is killed?
All of the information used to partition workers, tablets, and notifications comes from zookeeper, and all of the workers watch this information. What I am currently doing in my branch is: when any of the information changes, workers stop processing notifications until the information in zookeeper has been stable for 60 seconds.
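The stability wait could be as simple as the following sketch (hypothetical, not the actual branch code): every watch event bumps a timestamp, and processing stays paused until 60 seconds pass with no change.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the stability wait described above; names are illustrative.
class StabilityGate {

  private static final long STABLE_MILLIS = 60_000;
  private final AtomicLong lastChange = new AtomicLong(System.currentTimeMillis());

  // Called from the ZooKeeper watcher whenever the partitioning info changes.
  void infoChanged() {
    lastChange.set(System.currentTimeMillis());
  }

  // Workers check this before processing notifications; processing stays
  // paused until no change has been observed for 60 seconds.
  boolean isStable() {
    return System.currentTimeMillis() - lastChange.get() >= STABLE_MILLIS;
  }
}
```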
In #282, the way Fluo finds notifications was switched to the following: each worker scans all tablets for notifications, and each notification is processed by the worker it hashes to.
This change was a vast improvement, but having each worker scan all notifications will only scale to a certain point.
A possible solution is to have only one worker scan each tablet for notifications. Workers would hash the notifications and send them to the other workers responsible for them. A worker would only scan a tablet when other workers request it, and it would only send notifications to workers that recently requested a scan. This avoids scanning unnecessarily and avoids one worker sending notifications to another worker whose queue is full.
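A hedged sketch of the hash routing (the worker list and hash function are assumptions, not Fluo code):

```java
import java.util.List;

// Sketch of hash-based notification routing; the worker list and hash
// function are assumptions, not Fluo code.
class NotificationRouter {

  private final List<String> workers; // live worker addresses, e.g. read from zookeeper

  NotificationRouter(List<String> workers) {
    this.workers = workers;
  }

  // Every worker hashes a notification's row the same way, so the scanning
  // worker and the processing worker agree on ownership with no coordination.
  String ownerOf(byte[] notificationRow) {
    return workers.get(Math.floorMod(hash(notificationRow), workers.size()));
  }

  private int hash(byte[] data) {
    // placeholder: any well-mixed hash shared by all workers would do
    int h = 17;
    for (byte b : data) {
      h = 31 * h + b;
    }
    return h;
  }
}
```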
This model is conceptually similar to the way MapReduce distributes data in a cluster. I think this is a really scalable way to find and distribute notifications in a cluster. I also like sticking with hashing (vs. the lock service described in the paper) because it evens out the work so well. When a worker receives notifications found by another worker, it should check this hash before queueing and again before executing, like was done in #458.
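The double check might look like the following sketch, where isMine is a placeholder for "this worker currently owns the notification's hash bucket" (hypothetical, not the code from #458):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Predicate;

// Sketch of the double hash check; isMine is a placeholder predicate for
// "this worker currently owns the notification's hash bucket".
class DoubleCheckQueue {

  private final Predicate<byte[]> isMine;
  private final Queue<byte[]> queue = new ConcurrentLinkedQueue<>();

  DoubleCheckQueue(Predicate<byte[]> isMine) {
    this.isMine = isMine;
  }

  // Check 1: before queueing a notification received from another worker.
  void offer(byte[] notification) {
    if (isMine.test(notification)) {
      queue.add(notification);
    }
  }

  // Check 2: re-verify just before executing, since worker membership (and
  // therefore hash ownership) may have changed while the notification was
  // queued.
  void executeNext() {
    byte[] notification = queue.poll();
    if (notification != null && isMine.test(notification)) {
      process(notification);
    }
  }

  private void process(byte[] notification) {
    // placeholder for running the observer transaction
  }
}
```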