Missed callbacks in CurrentStates based RoutingTableProvider. #457

@huizhilu

Description

Support separate callback handling in RoutingTableProvider so that it can deliver information to its listeners quickly.

CURRENT_STATE is the source of truth for both the router listener and Helix (which uses it to update the EV). If writes were clogging ZK, both the Helix EV update and the router listener would suffer the same latency, or the EV update would be worse because of the extra hop.
If instead too many participants were reading from ZK, the observer (router) would be stuck in the ZK callback queue, and we would see different routers with different refresh times.
For the CurrentStates based RoutingTableProvider, this is what happens, in order:

  1. The source of truth changes state.
  2. ZooKeeper notifies the Helix CurrentStates based RoutingTableProvider.
  3. RoutingTableProvider reads the instance changes, instance config changes, and current state changes from ZooKeeper.
  4. RoutingTableProvider calculates a snapshot and invokes the callback of the Espresso logic with that snapshot.

Solution:

  1. Update BasicClusterDataCache to refresh selectively. When a change happens, refresh the cache only for that change type (e.g. an instance config change), so we don't have to do a full refresh on every change of any type. This improves read performance.
  2. Improve RoutingTableProvider.queueEvent() and RoutingTableProvider.handleEvent(). Before the change, an instanceConfigs update could be stuck behind a currentStates refresh, so the callback waited a long time for the snapshot. After the change, the instanceConfigs snapshot is returned to the callback immediately, without waiting for the currentStates refresh to complete.
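
To illustrate the selective refresh idea in item 1, here is a minimal sketch. It is not the actual BasicClusterDataCache code: the class `SelectiveRefreshCache`, its `ChangeType` enum, and the methods `markChanged()`, `refresh()`, and `readCount()` are hypothetical names made up for this example, and the ZK reads are stood in for by plain `Supplier`s. The point is only that a change of one type re-reads that type and nothing else.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of a cache that refreshes only the changed data types.
// This is NOT the Helix BasicClusterDataCache API.
final class SelectiveRefreshCache {
  enum ChangeType { INSTANCE_CONFIG, LIVE_INSTANCE, CURRENT_STATE }

  // One reader per type; in a real system each would be a ZK read.
  private final Map<ChangeType, Supplier<String>> readers;
  private final Map<ChangeType, String> data = new EnumMap<>(ChangeType.class);
  private final Map<ChangeType, Integer> readCounts = new EnumMap<>(ChangeType.class);
  // Every type starts dirty so that the first refresh loads everything.
  private final Set<ChangeType> dirty = EnumSet.allOf(ChangeType.class);

  SelectiveRefreshCache(Map<ChangeType, Supplier<String>> readers) {
    this.readers = readers;
  }

  // Called from a change callback: remember which type needs re-reading.
  synchronized void markChanged(ChangeType type) {
    dirty.add(type);
  }

  // Re-read only the types marked dirty since the last refresh.
  synchronized void refresh() {
    for (ChangeType type : dirty) {
      data.put(type, readers.get(type).get());
      readCounts.merge(type, 1, Integer::sum);
    }
    dirty.clear();
  }

  synchronized String get(ChangeType type) {
    return data.get(type);
  }

  // How many times the given type has been read; used here to show the saving.
  synchronized int readCount(ChangeType type) {
    return readCounts.getOrDefault(type, 0);
  }
}
```

With this shape, an initial `refresh()` followed by `markChanged(ChangeType.INSTANCE_CONFIG)` and another `refresh()` reads the instance configs twice but the current states only once, which is the read saving item 1 is after. It also hints at item 2: an instance config snapshot can be produced without waiting on a current state read.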
