[WIP] Very rough timeline / event streams optimizations for faster API requests with many tens of thousands of rows #22304
Conversation
For an API request with 10k event_streams, the requests before and after:
initial (cold, no prior requests): 48 => 11 seconds
http://localhost:3000/api/event_streams?limit=100&offset=0&expand=resources&attributes=group,group_level,group_name,id,event_type,message,ems_id,type,timestamp,created_on,host.name,source,ems_id,ext_management_system.name&filter[]=type=EmsEvent&filter[]=group=other&filter[]=group_level=detail
We don't need to classify and constantize a param's model name over and over again to see if it's valid. For this API request with 10k event_streams, the requests before and after:
initial (cold, no prior requests): ~11 => ~9 seconds
subsequent (previously requested): ~8 => ~5 seconds
http://localhost:3000/api/event_streams?limit=100&offset=0&expand=resources&attributes=group,group_level,group_name,id,event_type,message,ems_id,type,timestamp,created_on,host.name,source,ems_id,ext_management_system.name&filter[]=type=EmsEvent&filter[]=group=other&filter[]=group_level=detail
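The idea above can be sketched as a simple memo table. This is a hypothetical illustration, not the actual ManageIQ code: `ModelNameCache` and `valid_model?` are invented names, and `Object.const_get` stands in for the classify + constantize lookup.

```ruby
# Hypothetical sketch: memoize whether a param's model name is valid so
# repeated requests with the same name skip the expensive constant lookup.
class ModelNameCache
  @valid_models = {}

  def self.valid_model?(name)
    # Cache both hits and misses; key? lets us remember a cached `false`.
    return @valid_models[name] if @valid_models.key?(name)

    @valid_models[name] =
      begin
        Object.const_get(name) # stand-in for classify + constantize
        true
      rescue NameError
        false
      end
  end
end
```

On a warm cache, every subsequent check for the same name is a single hash lookup instead of an inflection plus constant resolution.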
For this API request with 10k event_streams, the requests before and after:
initial (cold, no prior requests): ~9 => ~7.5 seconds
subsequent (previously requested): ~5 => ~3.5 seconds
http://localhost:3000/api/event_streams?limit=100&offset=0&expand=resources&attributes=group,group_level,group_name,id,event_type,message,ems_id,type,timestamp,created_on,host.name,source,ems_id,ext_management_system.name&filter[]=type=EmsEvent&filter[]=group=other&filter[]=group_level=detail
9 seconds -> 6 seconds
Before: Some filtering can only be done in Ruby. For these pages, we download all records and run the filtering locally. We were filtering every record and then limiting the records for display. This generated a lot of extra work, since we only need to filter the records that can end up on the page.

After: In the case where we are returning counts, it is not possible to avoid filtering every record, but in the typical case (i.e. `skip_counts = true`) we only need to filter records up until the end of the appropriate page. So we introduced filtering with a lazy enumerator. In the count case, we use `to_a` to force the lazy enumeration once, which avoids running it multiple times.

Caveat: we've added yet another type that the targets variable can be. There is the potential that not every type will be covered by the test cases and that a future change won't take this into account. I tried to add some comments to show where I felt the issues could arise.

We are down to one caller that does not use skip_counts. It is the main view page in the UI, so it is popular by call volume. Removing it would remove a number of touch points in this search code.
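A minimal sketch of the lazy-enumerator change, using a plain Ruby range in place of the real record set and `even?` in place of the Ruby-only filter:

```ruby
records = (1..100_000)

# Before: filter all 100k records, then take a page of 10.
eager_page = records.select(&:even?).first(10)

# After: the lazy enumerator stops pulling records as soon as 10 matches
# are found, so most of the collection is never touched.
lazy_page = records.lazy.select(&:even?).first(10)

# Count case: force the lazy chain exactly once with to_a, so the same
# enumeration is not re-run for both the page and the count.
filtered = records.lazy.select(&:even?).to_a
count    = filtered.size
```

Both paths return the same page; the lazy one simply does far less work when counts are skipped.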
Cool. That groups_and_levels cache makes a ton of sense, but it needs a small change. What I think we should do is cache it as you did at the class level, but bust that cache if someone changes the settings. We already have hooks for this for things like blacklisted_events, so it shouldn't be a big change.
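A hedged sketch of that suggestion: a class-level cache that is busted when the settings change. The class and method names here are invented, and the real settings-change hook is simulated with a simple version counter.

```ruby
# Hypothetical settings-aware class-level cache: recompute only when the
# settings version has moved since the value was cached.
class EventGroups
  @cache = nil
  @cached_version = nil

  class << self
    attr_accessor :settings_version # stand-in for the real change hook

    def groups_and_levels(settings)
      if @cache.nil? || @cached_version != settings_version
        @cache = compute(settings) # the expensive work runs here only
        @cached_version = settings_version
      end
      @cache
    end

    # Placeholder for the real (expensive) computation.
    def compute(settings)
      settings.transform_values(&:upcase)
    end
  end
end
```

Bumping `settings_version` from the settings-change hook is what "busts" the cache; until then, every call is a cheap ivar read.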
Separately, I'm looking into making a scope for the groups and levels, so that it can be filtered in SQL. We know ahead of time which events are in which category, so it should be a relatively simple thing. That would probably eliminate the problem entirely by avoiding all of the rows to begin with. That being said, the group_and_level cache should stay also.
@jrafanie I'm reviewing this locally and I think you get some more bang for the buck if you cache the event_groups as well (or perhaps instead). That is the thing that's unlikely to change, and it's very expensive with the deep_merge. This brings a single call to group_and_level from 21ms to 0.5ms (and all subsequent calls will continue to be 0.5ms regardless of event_type). With the change in this PR only, each unique event will still separately be 21ms, though calling with the same event will drop to 0.005ms.
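As a hypothetical illustration of that suggestion (all class names, hashes, and events below are invented): caching the merged event_groups once at the class level means the deep merge runs a single time, and every later `group_and_level` call is just hash lookups.

```ruby
# Invented stand-in for the real class: memoize the deep-merged groups so
# the expensive merge happens once per process, not once per lookup.
class EventStream
  DEFAULTS  = { "other" => { :critical => ["EventA"], :detail => ["EventB"] } }
  OVERRIDES = { "other" => { :detail => ["EventC"] } }

  def self.event_groups
    @event_groups ||= deep_merge(DEFAULTS, OVERRIDES)
  end

  # Tiny deep-merge: recurse into nested hashes, union leaf arrays.
  def self.deep_merge(a, b)
    a.merge(b) do |_key, old, new|
      old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : Array(old) | Array(new)
    end
  end

  def self.group_and_level(event_type)
    event_groups.each do |group, levels|
      levels.each do |level, events|
        return [group, level] if events.include?(event_type)
      end
    end
    ["other", :detail] # assumed default bucket for unknown events
  end
end
```

The first `event_groups` call pays the merge cost; everything afterwards reuses the same frozen-in-place hash, which is the 21ms-to-0.5ms effect described above.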
Thanks, I had both originally and started eliminating things no longer needed once one thing was "fixed" and just picked this because that's what the svg said. I had meant to go back to it because yes, that deep merge is quite expensive. |
This reverts commit 13eb095. Note, the commit: "When caching _to_ruby, also cache valid?" mitigates the need for this commit
I pushed a revert commit... we don't need to cache the result of the classify+constantize if we correctly cache `valid?` alongside `_to_ruby`.
Checked commits jrafanie/manageiq@f521cc4~...123cbe4 with ruby 2.6.10, rubocop 1.28.2, haml-lint 0.35.0, and yamllint:
app/models/event_stream.rb
lib/miq_expression.rb
Have to say it again: the
@jrafanie can you pull out the
My change is still WIP due to the need to broadcast a cache bust. Feel free to add a commit to that PR if you have the cycles.
This pull request is not mergeable. Please rebase and repush. |
I think everything from this PR was either extracted or had replacements found, so I'll close it for now.
This was a temporary WIP PR to try a few things:
It included PRs that were either already open or were extracted from this one:
Additionally, this PR sparked the idea to redo how the event groups were calculated and cached. So, an alternative PR was created: