-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bulk requests and grouping to the whisp count query #349
Comments
I agree we should rename it and then use it to decommission the old query later.
I would prefer to provide an array of criteria arguments to a single query and let whispr do the OR combination before executing the aggregate function
For a rate limiting option we should use the nestjs throttler https://docs.nestjs.com/security/rate-limiting to do a rate limiting based on the query. To identify how much request we want to allow it might be worth to implement a k6 load test to get this number. |
In a separate chat @DennisWue also mentioned that we do actually have some basic grouping capability already, which I was not aware of!
Would return:
It's pretty simple, but becomes complex when you want to group on more than one criteria (which we do in at least one of our apps). I can't see an elegant way of doing that - I think it would be necessary to separate the grouping by an underscore in the query name. Thoughts? |
Yes it's called 'Field alias' read more here https://spec.graphql.org/June2018/#sec-Field-Alias |
If it works then I like it! Can you give an example of what the query would need to look like using the grouping by thing1, thing2 example? |
Do you mean a combined grouping? query combinedCounts($filter: JSONObject) { with variable: which would return: |
Nope, I mean where the grouping is on 2 or more criteria (like the composite key in the original example) to return something equivalent to this:
Do you mean that the consuming application should manage its own mapping to manage that, either by building the composite key in the query name, or a separate hash to lookup the query name vs the composite key? |
Another thought (which was not the original aim of this issue) is that the new proposal would provide some enhanced functionality for grouping when you don't know the exact query filter - for example to return the count of all whisps in the database grouped by applicationID, without knowing all distinct applicationIDs. |
Okay this depends on what you want to achieve. Do you want to have the count the matching criteria1 AND criteria2 or do you want the criteria1 OR criteria2 counted. query combinedCounts() { For the OR case you would have to use the mongodb $or operator https://docs.mongodb.com/manual/reference/operator/query/or/
Yes this would be part of the requesting application |
I like the idea maybe we can even achieve something right now with a filter like |
I don't think I was clear enough in my initial question, yes it is an AND case, but the challenge was more around how we return it with a unique key (which I think you addressed with a lookup in the app with your comment below). Maybe it was a typo but that query gives this error because the query alias' are duplicated:
Ok - so that means that the app itself would generate a unique hash for the query name, which would solve the issue above. |
Sorry typo on my side, corrected the initial comment |
Yes I think a hash as the alias and a dynamic filter could be a solution for your usecase. |
@DennisWue I've just been testing your proposal with one of the queries from our apps and it does work well, even if the mapping within the consuming app adds a bit of complexity. I do still like the grouping idea for helping with cases where you want to group without knowing the filter, but that's a different use case which could be added a bit later. For this use case I think I'm now more in favour of @DennisWue's proposal and just doing a few documentation updates now to explain how to use the current functionality, and then consider the additional grouping option a bit later. @Babbafett @DennisWue @David-Wennemaring @schroedermaxi what are your thoughts? Is the need for the mapping in the consuming app an issue? |
I like the simplicity of the approach that we have. However, I do have a question. The current implementation still hits the database multiple times. What I mean is for each |
It's a good point - I think we need to do some testing of that part as I'm not certain how it works and at what level we can apply the rate limiting, and what level of visibility we have in the resolver of the number of alias' in the request. I'll take a look at that when I get some time. |
@nikola-kovacevic I haven't forgotten this point - I'm doing some other cleanup work this week and hope to test it at the same time. |
Ok so I did some work on this today. The good news is that we could still rate limit this type of request if we want to. I did a naive implementation of NestJS throttler, with a limit of 10 requests every 10 seconds. With this type of alias, we basically just hit the resolver x times for x aliased queries, so the throttling works in the same way as if we did x totally separate requests. Works fine with 10 aliased requests I then waited for the limit to reset (10 seconds and tried with more aliased requests ... Fails instantly with 11 aliased requests So that is good news, because when we implement a rate limiter (not in this task) it will work for this type of request. However, this steers me back towards the original proposal because we have a genuine case today where clients are hitting the count endpoint several hundred times over a few seconds. Combining the count of all clients this is actually thousands of times per second at peak times. I think that the original proposal is likely to work much better in this case because the aggregation is done by the db, and we're not launching lots of countDocuments operations which is an expensive op. I'm going to try a basic implementation and create some load tests to try to prove this. Thanks @nikola-kovacevic for catching this possible issue. |
I started working on the grouping feature, and created a draft PR for it: #420. Some more testing to do, but so far it looks promising. One thing that I'm unsure of right now is how we will manage subscriptions. I need to do a bit more investigation on that, and will possibly consider it as an additional PR depending on the scope. |
Simple load testing done, and the performance of the new query is a lot better, especially when the filter size is limited and we rely on grouping instead. I also realised that my point about subscriptions is not relevant for this issue as there's no current subscription for the which count query. It might be possible to implement a subscription for the count query once this is done: #425. |
Closed by #420, feature included in https://github.com/Sanofi-IADC/whispr/releases/tag/v2.4.0. |
Is your feature request related to a problem? Please describe
Currently the
whispsCount
query takes a single mongo format query, and returns the count of documents which match that query. In cases where an application needs to get the count of whisps in bulk, a separate request is required for each query, resulting in a high number of requests from the browser (e.g. one for every row of a table).Describe the solution you'd like
whispsCount
to allow bulk requests, and grouping by specific criteria in the mongo querycountDocuments()
function, and use db.collection.aggregate instead which will allow us to have a grouping functionality (explained more in detail below)Currently the query runs the mongo command
countDocuments()
(https://docs.mongodb.com/manual/reference/method/db.collection.countDocuments/).countDocuments
is actually just a wrapper for the following command:We can easily replace
countDocuments()
with that command to allow us to group, and therefore manage bulk requests.For example this query:
Would return something like this, which satisfies the requirements:
In that example the match criteria is very simple, but it could also be a more complex query with multiple things to match.
BREAKING CHANGE(S)
Describe alternatives you've considered
Consuming app provides an array of queries, with an ID for each query, whispr executes individual
documentCount
queries but returns a single responseUse a mongo aggregate function so that we can add grouping functionality
(this is the proposed solution described in detail above)
Additional context
Open points for discussion:
whispsCount
- should it be calledwhispCount
instead? - if it should then it allows us to introduce this feature without a breaking change (for now) as we could add the new query and decommission the old one laterThe text was updated successfully, but these errors were encountered: