Propose design for aggregated cluster view service #266

Closed
wants to merge 1 commit into from

@zhan849 zhan849 commented Aug 21, 2018

This PR adds a design doc for an aggregated cluster view service.

@zhan849 zhan849 force-pushed the harry/view-aggregator-design branch 2 times, most recently from 318ae2f to 9c0e4dd Compare August 21, 2018 18:10


## Problem Statement
We identified a couple of use cases that need access to cross-datacenter information. [Ambry](https://github.com/linkedin/ambry) is one of them.
Member

Can you expand on why Ambry needs this feature?

Contributor Author

Sure (I will also update the design doc to cover it).

Ambry uses the Helix spectator in both its router (to retry GET requests remotely when they fail locally) and its storage nodes (for data replication). Given the number of clients that need global information, it would be more cost-effective for them if the aggregated information is provided locally.


To provide an aggregated cluster view, the solution I'm proposing is to add a special type of cluster, i.e. a **View Cluster**.
A view cluster leverages current Helix semantics to store aggregated information from various **Source Clusters**.
A separate microservice (Helix View Aggregator) will fetch information from the clusters to be aggregated (likely in other data centers) and store it in the view cluster.
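
The flow described above could be sketched roughly as follows. This is a minimal, Helix-independent illustration in Python, not the proposed implementation: `SourceCluster`, `fetch`, and `aggregate_view` are hypothetical names, and the snapshot contents are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot shape: property name -> property contents.
Snapshot = Dict[str, dict]


@dataclass
class SourceCluster:
    """A cluster whose state is copied into the view cluster (illustrative only)."""
    name: str
    fetch: Callable[[], Snapshot]  # stands in for a remote (cross-datacenter) read


def aggregate_view(sources: List[SourceCluster]) -> Dict[str, Snapshot]:
    """Fetch every source cluster once and return the combined view, keyed by cluster name."""
    return {source.name: source.fetch() for source in sources}
```

Local clients would then read the combined result from the view cluster instead of contacting each source cluster themselves.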
Member

Why can't we just set up ZooKeeper observers?

Contributor Author

Setting up observers local to the clients can potentially reduce cross-datacenter traffic, but it has a few drawbacks:

  1. All data changes are propagated immediately, so if the information is not needed frequently, traffic is wasted. Building a service makes it possible to customize the aggregation granularity.
  2. Using ZooKeeper observers leaves the aggregation logic to the client; providing aggregated data makes it easier for users to consume.
  3. Building a service leaves room to customize the aggregated data in the future, e.g. if we want to aggregate the ideal state, we might not need to aggregate the preference list.

I will add these points to the design doc.
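
To make points 1 and 3 concrete, here is a small sketch of what "customizable granularity" could look like. It assumes a buffer that coalesces changes between refreshes and a set of property types to keep; `PeriodicAggregator`, `on_change`, `refresh`, and the property names used with it are hypothetical and not part of Helix or ZooKeeper.

```python
from typing import Dict, Set, Tuple

# (property_type, key, value): a hypothetical change event from a source cluster.
Change = Tuple[str, str, str]


class PeriodicAggregator:
    """Buffers changes and publishes only the latest value per key on each refresh."""

    def __init__(self, wanted_properties: Set[str]) -> None:
        self.wanted_properties = wanted_properties
        self._pending: Dict[Tuple[str, str], str] = {}
        self.view: Dict[Tuple[str, str], str] = {}
        self.publish_count = 0

    def on_change(self, change: Change) -> None:
        property_type, key, value = change
        if property_type not in self.wanted_properties:
            return  # point 3: skip properties the view does not need
        # Point 1: a later change to the same key overwrites the buffered one.
        self._pending[(property_type, key)] = value

    def refresh(self) -> None:
        """Meant to be called once per refresh period (the timer is not shown)."""
        if not self._pending:
            return  # nothing changed, so nothing is written to the view
        self.view.update(self._pending)
        self._pending.clear()
        self.publish_count += 1
```

With an observer, every intermediate change reaches the client; in this sketch, several changes to the same key within one period result in a single write to the view.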

@zhan849 zhan849 force-pushed the harry/view-aggregator-design branch from 9c0e4dd to 1e07ec3 Compare November 2, 2018 22:38
@asfgit asfgit force-pushed the master branch 2 times, most recently from 5ebe967 to 9d89e93 Compare November 16, 2018 23:57
@junkaixue junkaixue closed this Jul 22, 2019
3 participants