New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reporting followups #1020
Reporting followups #1020
Conversation
✅ Deploy Preview for apollo-router-docs canceled.
|
@BrynCooke your pull request is missing a changelog! |
cfa89a5
to
d5b8aec
Compare
Spaceport now just deals with reports and forwards them on. There is no aggregation taking place in spaceport. The telemetry plugin now composes the complete report for studio. Metrics are aggregated on Key which is now an enum of excluded or regular. Excluded has no information. Report header is introduced so that we can eventually populate it with all the right info.
d5b8aec
to
48c4d9c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking nice...
} | ||
} | ||
} else { | ||
// Usage reporting was missing, so it counts as one operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure i understand where this ends up but 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A malicious plugin could remove the usage reporting from context in order to try avoid op count increasing.
5361cfd
to
9aedc1d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks cleaner than the old code. Nice work!
#[cfg(windows)] | ||
assert_eq!( | ||
schema.schema_id, | ||
Some("C081A7FF570F630BDF22CB6CA73A2BB1D46657EF69FC66749F5A1F99332B5ECB".to_string()) | ||
); | ||
#[cfg(not(windows))] | ||
assert_eq!( | ||
schema.schema_id, | ||
Some("6881633761291F4A6E4F9A4A6BBE7AD69A80C63EBEBFC357464B7A6EF276EF0A".to_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: sha256 and windows vs not windows. I guess this is only likely to be an issue for local schemas. I mean if you get it from uplink (majority of prod use cases I guess) then the source text should match exactly…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zionts executable_schema_id
sha is different on windows currently because of line endings. Would you prefer that this is normalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the line ending change get introduced? Is it as part of checking out the file with git, or reading it in with a Rust API? Seems like there must be a way to just not do that? Like, this isn't actually something that happens if you use Router with Uplink, just related to "there's a graphql file you're raeding from disk", right?
The normal Gateway/Uplink use case (and presumably the normal Router use case?) doesn't involve ever reading/writing the file from disk or doing anything that would do newline conversion...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Currently it is possible to use a local schema file in the router and also upload stats. Is this something we should disable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm sure it's fine, i'm just saying that i wouldn't recommend any normalization that takes the string returned from uplink and normalizes it before hashing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great (except for the SHA case thing).
The one thing I'm still somewhat confused about is why you have the two separate sets of Report and AggregatedReport types. Is there a reason you can't just use an AggregatedReport for everything? It seems like the only thing you can do with a Report that you can't do with an AggregatedReport is read back the operation's latency as a precise Duration (whereas it's bucketed when you put it in AggregatedReport). But... is that an operation you ever do? Seems like you could cut the types in half.
(And if you do want to keep both types, it would seem a bit easier to understand if the types you currently called AggregatedX were called X and the types you currently call X were called SingleOperationX or something — since that would align the types more directly to the protobuf names.)
That said there's nothing buggy with having the two separate sets of types — it just strikes me as unnecessary.
|
||
#[derive(Default, Debug, Serialize)] | ||
pub(crate) struct AggregatedContextualizedStats { | ||
context: StatsContext, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this struct live in a HashMap<StatsContext, AggregatedContextualizedStats>
already? Seems like this field is redundant? Note that in protobufs, map keys can only be strings or integers, which is we we have a repeated ContextualizedStats
with a context field on it instead of a map. But you're already using a map here (since rust apparently supports more general map keys) so this seems like it should be redundant?
I guess you'd need to change the definition of
impl From<AggregatedTracesAndStats> for apollo_spaceport::TracesAndStats {
to create stats_with_context from map entries rather than values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use only one set of types we would need to put the aggregated report in a mutex and swap it out on each send. This is an option that we could have used, but maybe it is safer not to do that in case of contention. We can revisit another time.
Let's do any renaming at the end once we are sure everything is working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this struct live in a
HashMap<StatsContext, AggregatedContextualizedStats>
already? Seems like this field is redundant? Note that in protobufs, map keys can only be strings or integers, which is we we have arepeated ContextualizedStats
with a context field on it instead of a map. But you're already using a map here (since rust apparently supports more general map keys) so this seems like it should be redundant?I guess you'd need to change the definition of
impl From<AggregatedTracesAndStats> for apollo_spaceport::TracesAndStats {
to create stats_with_context from map entries rather than values.
This won't work if I want to use:
impl From<AggregatedContextualizedStats> for apollo_spaceport::ContextualizedStats
as From does not have access to the map entry that it was stored in.
For consistency with the rest of the conversion logic lets keep it as is for now and look again if it is a perf problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renames in 9d71c25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to trust you on this one! Maybe add a comment noting the redundancy if you think that's helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re
To use only one set of types we would need to put the aggregated report in a mutex and swap it out on each send. This is an option that we could have used, but maybe it is safer not to do that in case of contention. We can revisit another time.
I don't think I understand this one — it seems like the AggregatedReport type can be used anywhere the Report type can be used, except if you want to be able to put a duration into it and pull it out again without rounding. But my feedback is merely about clarity/simplicity/code size, and I very well may be missing something important. No need to convince me of this, happy to trust you're making the right call for today and quite possibly for tomorrow as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'd really like is that the generated rs files for the protobuf somehow can use the Histogram type. Didn't get time to investigate this though.
I think in the long run we'll make the changes to the datastructure to convert many of the u64s to bools as you suggested earlier, so maybe things will diverge in the long run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in JS I ended up trying to add a feature to to protobuf library that let you use other representations in-memory for repeated fields that would get converted to arrays at encoding time (and then forking the library when it was clearly unmaintained)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to assess the impact of removing report aggregation from spaceport. Hard to determine that from code review.
This is WIP. I still need to add executable_schema_id, this will be the most difficult part.
Spaceport now just deals with reports and forwards them on. There is no aggregation taking place in spaceport.
The telemetry plugin now composes the complete report for studio.
Metrics are aggregated on Key which is now an enum of excluded or regular. Excluded has no information but allows us to get operation_count.
Report header is introduced so that we can eventually populate it with all the right info.
@glasser Be good if you could take a look to see if this going in the right direction.