Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reporting followups #1020

Merged
merged 22 commits into from May 13, 2022
Merged

Reporting followups #1020

merged 22 commits into from May 13, 2022

Conversation

BrynCooke
Copy link
Contributor

@BrynCooke BrynCooke commented May 11, 2022

This is WIP. I still need to add executable_schema_id, this will be the most difficult part.

Spaceport now just deals with reports and forwards them on. There is no aggregation taking place in spaceport.
The telemetry plugin now composes the complete report for studio.
Metrics are aggregated on Key which is now an enum of excluded or regular. Excluded has no information but allows us to get operation_count.
Report header is introduced so that we can eventually populate it with all the right info.

@glasser Be good if you could take a look to see if this going in the right direction.

@netlify
Copy link

netlify bot commented May 11, 2022

Deploy Preview for apollo-router-docs canceled.

Name Link
🔨 Latest commit 7f3ffaf
🔍 Latest deploy log https://app.netlify.com/sites/apollo-router-docs/deploys/627d2f0e4aa6230009ac8ec5

@github-actions
Copy link
Contributor

@BrynCooke your pull request is missing a changelog!

Spaceport now just deals with reports and forwards them on. There is no aggregation taking place in spaceport.
The telemetry plugin now composes the complete report for studio.
Metrics are aggregated on Key which is now an enum of excluded or regular. Excluded has no information.
Report header is introduced so that we can eventually populate it with all the right info.
@BrynCooke BrynCooke changed the title Bryn/reporting followups Reporting followups May 11, 2022
@BrynCooke BrynCooke linked an issue May 11, 2022 that may be closed by this pull request
Copy link
Member

@glasser glasser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking nice...

apollo-router/src/plugins/telemetry/metrics/apollo.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/telemetry/metrics/apollo.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/telemetry/metrics/apollo.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/telemetry/metrics/apollo.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/telemetry/metrics/apollo.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/telemetry/metrics/apollo.rs Outdated Show resolved Hide resolved
}
}
} else {
// Usage reporting was missing, so it counts as one operation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure i understand where this ends up but 🤷

Copy link
Contributor Author

@BrynCooke BrynCooke May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A malicious plugin could remove the usage reporting from context in order to try avoid op count increasing.

Copy link
Contributor

@garypen garypen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cleaner than the old code. Nice work!

Comment on lines 765 to 773
#[cfg(windows)]
assert_eq!(
schema.schema_id,
Some("C081A7FF570F630BDF22CB6CA73A2BB1D46657EF69FC66749F5A1F99332B5ECB".to_string())
);
#[cfg(not(windows))]
assert_eq!(
schema.schema_id,
Some("6881633761291F4A6E4F9A4A6BBE7AD69A80C63EBEBFC357464B7A6EF276EF0A".to_string())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: sha256 and windows vs not windows. I guess this is only likely to be an issue for local schemas. I mean if you get it from uplink (majority of prod use cases I guess) then the source text should match exactly…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zionts executable_schema_id sha is different on windows currently because of line endings. Would you prefer that this is normalized?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the line ending change get introduced? Is it as part of checking out the file with git, or reading it in with a Rust API? Seems like there must be a way to just not do that? Like, this isn't actually something that happens if you use Router with Uplink, just related to "there's a graphql file you're raeding from disk", right?

The normal Gateway/Uplink use case (and presumably the normal Router use case?) doesn't involve ever reading/writing the file from disk or doing anything that would do newline conversion...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Currently it is possible to use a local schema file in the router and also upload stats. Is this something we should disable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm sure it's fine, i'm just saying that i wouldn't recommend any normalization that takes the string returned from uplink and normalizes it before hashing it.

apollo-router/src/plugins/telemetry/apollo.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/telemetry/apollo.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/telemetry/metrics/apollo.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/telemetry/metrics/apollo.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/telemetry/mod.rs Outdated Show resolved Hide resolved
apollo-router/src/router_factory.rs Outdated Show resolved Hide resolved
apollo-spaceport/src/server.rs Show resolved Hide resolved
Copy link
Member

@glasser glasser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great (except for the SHA case thing).

The one thing I'm still somewhat confused about is why you have the two separate sets of Report and AggregatedReport types. Is there a reason you can't just use an AggregatedReport for everything? It seems like the only thing you can do with a Report that you can't do with an AggregatedReport is read back the operation's latency as a precise Duration (whereas it's bucketed when you put it in AggregatedReport). But... is that an operation you ever do? Seems like you could cut the types in half.

(And if you do want to keep both types, it would seem a bit easier to understand if the types you currently called AggregatedX were called X and the types you currently call X were called SingleOperationX or something — since that would align the types more directly to the protobuf names.)

That said there's nothing buggy with having the two separate sets of types — it just strikes me as unnecessary.


#[derive(Default, Debug, Serialize)]
pub(crate) struct AggregatedContextualizedStats {
context: StatsContext,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this struct live in a HashMap<StatsContext, AggregatedContextualizedStats> already? Seems like this field is redundant? Note that in protobufs, map keys can only be strings or integers, which is we we have a repeated ContextualizedStats with a context field on it instead of a map. But you're already using a map here (since rust apparently supports more general map keys) so this seems like it should be redundant?

I guess you'd need to change the definition of

impl From<AggregatedTracesAndStats> for apollo_spaceport::TracesAndStats {

to create stats_with_context from map entries rather than values.

Copy link
Contributor Author

@BrynCooke BrynCooke May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use only one set of types we would need to put the aggregated report in a mutex and swap it out on each send. This is an option that we could have used, but maybe it is safer not to do that in case of contention. We can revisit another time.

Let's do any renaming at the end once we are sure everything is working.

Copy link
Contributor Author

@BrynCooke BrynCooke May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this struct live in a HashMap<StatsContext, AggregatedContextualizedStats> already? Seems like this field is redundant? Note that in protobufs, map keys can only be strings or integers, which is we we have a repeated ContextualizedStats with a context field on it instead of a map. But you're already using a map here (since rust apparently supports more general map keys) so this seems like it should be redundant?

I guess you'd need to change the definition of

impl From<AggregatedTracesAndStats> for apollo_spaceport::TracesAndStats {

to create stats_with_context from map entries rather than values.

This won't work if I want to use:

impl From<AggregatedContextualizedStats> for apollo_spaceport::ContextualizedStats

as From does not have access to the map entry that it was stored in.

For consistency with the rest of the conversion logic lets keep it as is for now and look again if it is a perf problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renames in 9d71c25

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to trust you on this one! Maybe add a comment noting the redundancy if you think that's helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re

To use only one set of types we would need to put the aggregated report in a mutex and swap it out on each send. This is an option that we could have used, but maybe it is safer not to do that in case of contention. We can revisit another time.

I don't think I understand this one — it seems like the AggregatedReport type can be used anywhere the Report type can be used, except if you want to be able to put a duration into it and pull it out again without rounding. But my feedback is merely about clarity/simplicity/code size, and I very well may be missing something important. No need to convince me of this, happy to trust you're making the right call for today and quite possibly for tomorrow as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'd really like is that the generated rs files for the protobuf somehow can use the Histogram type. Didn't get time to investigate this though.

I think in the long run we'll make the changes to the datastructure to convert many of the u64s to bools as you suggested earlier, so maybe things will diverge in the long run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in JS I ended up trying to add a feature to to protobuf library that let you use other representations in-memory for repeated fields that would get converted to arrays at encoding time (and then forking the library when it was clearly unmaintained)

@BrynCooke BrynCooke requested a review from garypen May 12, 2022 15:56
Copy link
Contributor

@garypen garypen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to assess the impact of removing report aggregation from spaceport. Hard to determine that from code review.

@BrynCooke BrynCooke merged commit 53e49e5 into main May 13, 2022
@BrynCooke BrynCooke deleted the bryn/reporting-followups branch May 13, 2022 06:45
@BrynCooke BrynCooke self-assigned this May 13, 2022
@Geal Geal added this to the v0.9.0 milestone May 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Studio reporting followups
4 participants