Propose design for aggregated cluster view service #266
## Problem Statement
We identified a couple of use cases for accessing cross-datacenter information. [Ambry](https://github.com/linkedin/ambry) is one of them.
Can you expand on why Ambry needs this feature?
Sure (I will also update the design doc to cover this).
Ambry uses the Helix spectator in both its router (to retry GET requests remotely when they fail locally) and its storage nodes (for data replication). Given the number of clients that need global information, it would be more cost-effective to provide the aggregated information to them locally.
To provide an aggregated cluster view, the solution I'm proposing is to add a special type of cluster, i.e. a **View Cluster**.
The view cluster leverages current Helix semantics to store aggregated information from various **Source Clusters**.
A separate microservice (Helix View Aggregator) will fetch the information to be aggregated from the source clusters (likely in other data centers) and store it in the view cluster.
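As a rough illustration of the aggregation step described above, here is a minimal sketch in Python. It is not the proposed implementation and does not use the Helix API: the `ClusterView` shape (resource to partition to hosting instances), the `aggregate_views` function, and the cluster and host names are all hypothetical, chosen only to show how per-source-cluster data might be merged into a single view.

```python
from typing import Dict, List

# Hypothetical model of one cluster's view:
# resource name -> partition name -> instances hosting that partition.
ClusterView = Dict[str, Dict[str, List[str]]]


def aggregate_views(source_views: Dict[str, ClusterView]) -> ClusterView:
    """Merge the views of several source clusters into one aggregated view.

    Source clusters are visited in sorted name order so the result is
    deterministic. The input views are not modified.
    """
    merged: ClusterView = {}
    for cluster_name in sorted(source_views):
        for resource, partitions in source_views[cluster_name].items():
            merged_partitions = merged.setdefault(resource, {})
            for partition, instances in partitions.items():
                # Append this cluster's instances to the aggregated list.
                merged_partitions.setdefault(partition, []).extend(instances)
    return merged


# Example input: two source clusters (hypothetical names) hosting the same resource.
source_views = {
    "dc1": {"db": {"p0": ["host1"]}},
    "dc2": {"db": {"p0": ["host2"], "p1": ["host3"]}},
}
view_cluster_data = aggregate_views(source_views)
```

In this sketch the aggregator would write `view_cluster_data` to the view cluster, so that local clients read one merged structure instead of contacting each source cluster.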
Why can't we just set up ZooKeeper observers?
Setting up observers local to the clients could reduce cross-datacenter traffic, but it has a few drawbacks:
- All data changes are propagated immediately, so if the information is not needed that frequently, the traffic is wasted. Building a service makes it possible to customize the aggregation granularity.
- Using ZooKeeper observers leaves the aggregation logic to the client. Providing already-aggregated data makes it easier for users to consume.
- Building a service leaves room to customize the aggregated data in the future, e.g. if we want to aggregate the ideal state, we might not need to aggregate the preference list.
I will add these points to the design doc.
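To make the first and third points more concrete, here is a small sketch of what a configurable aggregator could look like. Everything in it is an assumption for illustration: `AggregationConfig`, `refresh_due`, `select_properties`, and the property labels `ideal_state` and `preference_list` are hypothetical names, not part of Helix or of the design doc.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class AggregationConfig:
    """Hypothetical settings for one aggregation job."""
    refresh_period_sec: float       # how often to pull from the source clusters
    property_types: FrozenSet[str]  # which kinds of data to aggregate


def refresh_due(last_refresh_sec: float, now_sec: float,
                config: AggregationConfig) -> bool:
    """Return True once at least one refresh period has elapsed."""
    return now_sec - last_refresh_sec >= config.refresh_period_sec


def select_properties(snapshot: Dict[str, dict],
                      config: AggregationConfig) -> Dict[str, dict]:
    """Keep only the property types the config asks to aggregate."""
    return {name: data for name, data in snapshot.items()
            if name in config.property_types}


# Example: aggregate only the ideal state, at most once every 30 seconds.
config = AggregationConfig(refresh_period_sec=30.0,
                           property_types=frozenset({"ideal_state"}))
snapshot = {
    "ideal_state": {"db_p0": ["host1", "host2"]},
    "preference_list": {"db_p0": ["host2", "host1"]},
}
selected = select_properties(snapshot, config)
```

With ZooKeeper observers, every change reaches the client as it happens. A service with a setting like `refresh_period_sec` could batch changes instead, and `property_types` would limit what is copied across data centers.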
This PR adds a design doc for aggregated cluster view service.