Aggregate metrics from other discovery nodes #1266
Conversation
Just did a first pass, will review again in more depth on Monday. This is so cool.
overall structure looks solid. just had a few questions and comments
def upgrade():
    op.create_table('daily_unique_users_metrics',
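The migration excerpt above only shows the table name. As a rough sketch of what the new aggregate table's shape could look like (the column set here is an assumption, not taken from the diff), expressed as plain DDL so it is easy to inspect:

```python
import sqlite3

# Hypothetical column set for the aggregate daily-unique-users table; the
# actual migration's columns are not visible in the excerpt above.
DDL = """
CREATE TABLE daily_unique_users_metrics (
    id INTEGER PRIMARY KEY,
    count INTEGER NOT NULL,
    timestamp DATE NOT NULL,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL
)
"""

# Create the table in an in-memory database and list its columns.
conn = sqlite3.connect(":memory:")
conn.execute(DDL)
cols = [row[1] for row in conn.execute("PRAGMA table_info(daily_unique_users_metrics)")]
```

In the real migration this would be written with `op.create_table(...)` and SQLAlchemy column types, as the quoted line suggests.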
We should try to distinguish the old per-node tables from the new all-discovery-node tables. If "aggregate" is the wording we're using for all the discovery nodes together, I'd suggest putting that in the table names too, because otherwise I'm sure we'll get confused between app_name_metrics and daily_app_name_metrics. Conversely, we could rename the existing app_name_metrics and route_metrics to app_name_metrics_single_node or something like that.
100%
start_time = int(start_time_obj.timestamp())
new_route_metrics, new_app_metrics = get_metrics(node, start_time)

logger.info(f"received route metrics: {new_route_metrics}")
this will probably be a massive object to print, do we want to output this?
good point
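One way to act on this feedback is to log a compact summary instead of the full payload. This is a hedged sketch, not the PR's code; it assumes (hypothetically) that `new_route_metrics` is a list of dicts each carrying a `count`:

```python
# Sketch: log a summary of the metrics payload rather than the whole object.
# The shape of new_route_metrics (list of dicts with a "count" key) is an
# assumption for illustration only.
def summarize_route_metrics(new_route_metrics):
    total = sum(m.get("count", 0) for m in new_route_metrics)
    return f"received {len(new_route_metrics)} route metrics (total count {total})"
```

Then `logger.info(summarize_route_metrics(new_route_metrics))` keeps the log line small no matter how many records come back.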
num_discovery_providers = sp_factory_inst.functions.getTotalServiceTypeProviders(discovery_node_service_type).call()
logger.info(f"number of discovery providers: {num_discovery_providers}")
service_infos = [sp_factory_inst.functions.getServiceEndpointInfo(discovery_node_service_type, i).call()
                 for i in range(1, num_discovery_providers + 1)]
just an optimization, don't need to add this, but if performance is/was an issue we can use threadpoolexecutors to parallelize this like https://github.com/AudiusProject/audius-protocol/blob/master/discovery-provider/src/tasks/index_network_peers.py#L66-L85
i think doing this synchronously is probably better
I like it!
update_app_metrics_count(monthly_app_metrics, historical_metrics['apps']['monthly'])

logger.info("synchronizing historical metrics")
logger.info(f"daily historical route metrics to update: {daily_route_metrics}")
same thing about printing large objects here, do we want this here?
These shouldn't be super large though, at least for the route metrics: daily should be about 30 records, each with a unique and total count, and monthly would be (12 * num_total_years) records. For apps it's harder to predict, because the number of apps used in a given day/month can vary.
But we can surely pull it for now until we see a real need for it.
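For context, the quoted `update_app_metrics_count` call merges historical per-day app counts into the freshly aggregated map. Its body is not shown in this excerpt, so the following is only a hedged sketch of what such a merge could look like, with an assumed `{day: {app_name: count}}` shape:

```python
# Hypothetical merge of historical app metrics into the aggregated map.
# The dict shapes here are assumptions; the PR's actual helper may differ.
def update_app_metrics_count(app_metrics, historical):
    # historical: {day: {app_name: count}}, e.g. historical_metrics['apps']['daily']
    for day, apps in historical.items():
        day_totals = app_metrics.setdefault(day, {})
        for app_name, count in apps.items():
            day_totals[app_name] = day_totals.get(app_name, 0) + count
```

This keeps the merge idempotent per input batch: each historical record simply adds its count into the matching day/app bucket.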
Great work @sddioulde! I love how all of this is net new too
all_other_nodes = []

# fetch all discovery nodes info in parallel
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
nice 😃
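The parallel-fetch pattern quoted above can be sketched in isolation. This is a minimal stand-alone version, with a placeholder `fetch_node_info` (hypothetical, not the PR's function) in place of the real chain call:

```python
import concurrent.futures

def fetch_node_info(node_id):
    # Placeholder for the real per-node lookup, e.g.
    # getServiceEndpointInfo(discovery_node_service_type, i).call()
    return {"id": node_id, "endpoint": f"https://node-{node_id}.example.com"}

def fetch_all_nodes(node_ids, max_workers=5):
    # executor.map preserves input order, so results line up with node_ids
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_node_info, node_ids))
```

Because the per-node calls are network-bound, a small thread pool like this gives near-linear speedup without the complexity of async code.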
mergeeeeeeeeeeeeeeeeeeeee!!!!!!
Merged :)
Description
We want each discovery node to give us aggregated metrics across all discovery nodes, so that obtaining the metrics does not depend on all nodes being up.
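The core aggregation step can be illustrated with a short sketch (not the PR's actual code): each node reports its own daily counts, and any single node can sum them into a network-wide view. The `{day: count}` shape is an assumption for illustration:

```python
from collections import Counter

def aggregate_daily_counts(per_node_metrics):
    # per_node_metrics: list of {day: count} dicts, one per discovery node
    # (hypothetical shape). Counter.update adds counts for matching days.
    total = Counter()
    for node_metrics in per_node_metrics:
        total.update(node_metrics)
    return dict(total)
```

With this, a client only needs to reach one healthy node to get metrics for the whole network, which is exactly the availability property the description calls for.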
Tests
... also working on adding unit tests
Clients for the metrics data (e.g. the dashboard) will need to update the endpoints they make requests to in order to use the new aggregated metrics.