
[WIP] Very rough timeline / event streams optimizations for faster API requests with many tens of thousands of rows #22304

Closed
wants to merge 5 commits

Conversation

@jrafanie (Member) commented Jan 12, 2023

This was a temporary WIP PR to try a few things:

It included PRs that were already opened, as well as changes that were extracted into their own PRs.

Additionally, this PR sparked the idea to redo how the event groups were calculated and cached, so an alternative PR was created.

jrafanie and others added 4 commits January 12, 2023 12:00
For an API request with 10k event_streams, the request times before and after:

initial (cold, no prior requests): 48 => 11 seconds

http://localhost:3000/api/event_streams?limit=100&offset=0&expand=resources&attributes=group,group_level,group_name,id,event_type,message,ems_id,type,timestamp,created_on,host.name,source,ems_id,ext_management_system.name&filter[]=type=EmsEvent&filter[]=group=other&filter[]=group_level=detail
We don't need to classify and constantize a param's model name over
and over again just to see if it's valid (a sketch of the idea follows the timings below).

For this API request with 10k event_streams, the request times before and after:

initial (cold, no prior requests): ~11 => ~9 seconds
subsequent (previously requested): ~8 => ~5 seconds

http://localhost:3000/api/event_streams?limit=100&offset=0&expand=resources&attributes=group,group_level,group_name,id,event_type,message,ems_id,type,timestamp,created_on,host.name,source,ems_id,ext_management_system.name&filter[]=type=EmsEvent&filter[]=group=other&filter[]=group_level=detail
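A minimal sketch of that idea (hypothetical names; this commit was later reverted in favor of caching valid? directly, see below):

```ruby
# Hypothetical sketch: memoize the classify + constantize lookup so each
# param model name is resolved at most once per process, instead of once
# per row or per request. Assumes ActiveSupport's String#classify and
# String#constantize are available, as in a Rails app.
MODEL_NAME_CACHE = Hash.new do |cache, name|
  cache[name] = begin
                  name.classify.constantize
                rescue NameError
                  nil # remember failures too, so bad names aren't retried
                end
end

def valid_model_name?(name)
  !MODEL_NAME_CACHE[name].nil?
end
```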
For this API request with 10k event_streams, the request times before and after:

initial (cold, no prior requests): ~9 => ~7.5 seconds
subsequent (previously requested): ~5 => ~3.5 seconds

http://localhost:3000/api/event_streams?limit=100&offset=0&expand=resources&attributes=group,group_level,group_name,id,event_type,message,ems_id,type,timestamp,created_on,host.name,source,ems_id,ext_management_system.name&filter[]=type=EmsEvent&filter[]=group=other&filter[]=group_level=detail

9 seconds -> 6 seconds
Before:

Some filtering can only be done in Ruby.
For these pages, we download all records and run the filtering locally.
We were filtering every record and then limiting the records for display.
This generated a lot of extra work, since we only need to filter the
records that can end up on the page.

After:

When we are returning counts, it is not possible to avoid filtering
every record, but in the typical case (i.e., `skip_counts = true`)
we only need to filter records up to the end of the requested page.
So we introduced filtering with a lazy enumerator (see the sketch below).
In the count case, we use `to_a` to force the lazy enumeration once,
which avoids running it multiple times.
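A minimal sketch of the lazy-enumerator approach, with hypothetical names (`records`, `expression`, `skip_counts`, `offset`, and `limit` stand in for the real search internals):

```ruby
# Filter lazily so we stop evaluating records once the page is full.
filtered = records.lazy.select { |rec| expression.evaluate(rec) }

if skip_counts
  # Typical case: only enough records to fill the requested page are filtered.
  page = filtered.drop(offset).first(limit)
else
  # Count case: force the enumeration exactly once with to_a, then page
  # from the realized array so the lazy chain never runs twice.
  matches = filtered.to_a
  total   = matches.size
  page    = matches.drop(offset).first(limit)
end
```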

We've just added yet another type that the targets variable can be.
There is a risk that not every type will be covered by the test cases
and that a future change won't take this into account. I tried to add
some comments showing where I felt the issues could arise.

We are down to one caller that does not use skip_counts.
This is the main view page in the UI, so it is popular by call volume.
Removing it would remove a number of touch points in this search code.
@Fryguy (Member) commented Jan 12, 2023

Cool. That groups_and_levels cache makes a ton of sense, but it needs a small change. What I think we should do is cache it as you did at the class level, but bust that cache if someone changes the settings. We already have hooks for this for things like blacklisted_events, so it shouldn't be a big change.
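A hedged sketch of that pattern (method names are hypothetical; the actual settings-changed hook is wired elsewhere, as it is for blacklisted_events):

```ruby
class EventStream < ApplicationRecord
  # Class-level cache of group/level lookups, as in this PR.
  def self.group_and_level_cache
    @group_and_level_cache ||= {}
  end

  # Hypothetical bust hook: call this from the existing settings-changed
  # callback so editing the settings invalidates the cache.
  def self.clear_group_and_level_cache!
    @group_and_level_cache = nil
  end
end
```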

@Fryguy (Member) commented Jan 12, 2023

Separately, I'm looking into making a scope for the groups and levels, so that it can be filtered in SQL. We know ahead of time which events are in which category, so it should be a relatively simple thing. That would probably eliminate the problem entirely by avoiding all of the rows to begin with. That being said, the group_and_level cache should stay also.
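A rough sketch of what such a scope could look like (names hypothetical):

```ruby
class EventStream < ApplicationRecord
  # Since the group membership of each event type is known ahead of time,
  # filter by group in SQL instead of classifying rows in Ruby.
  # event_types_for_group is a hypothetical helper returning the
  # event_type strings configured for the given group.
  scope :in_group, ->(group) { where(:event_type => event_types_for_group(group)) }
end

# e.g. EventStream.in_group(:other).limit(100) never loads the other rows.
```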

@Fryguy (Member) commented Jan 12, 2023

@jrafanie I'm reviewing this locally and I think you get some more bang for the buck if you cache the event_groups as well (or perhaps instead). That is the thing that's unlikely to change and it's very expensive with the deep_merge. This brings a single call to group_and_level from 21ms to 0.5ms (and all subsequent calls will continue to be 0.5 ms regardless of event_type). With this change in this PR only, each unique event will still separately be 21ms, though calling with the same event will drop to 0.005ms.
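A minimal sketch of caching event_groups itself (hypothetical names; the real defaults and settings sources differ):

```ruby
# Memoize the merged group definitions once per process: the deep_merge of
# the defaults with the configured overrides is the expensive part, so pay
# it once rather than on every group_and_level call.
def self.event_groups
  @event_groups ||= default_event_groups.deep_merge(configured_event_groups)
end
```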

@jrafanie (Member, Author) commented:

> @jrafanie I'm reviewing this locally and I think you get some more bang for the buck if you cache the event_groups as well (or perhaps instead). That is the thing that's unlikely to change and it's very expensive with the deep_merge.

Thanks, I had both originally and started eliminating things that were no longer needed once one thing was "fixed"; I picked this one because that's what the profiling SVG showed. I had meant to go back to it because, yes, that deep_merge is quite expensive.

This reverts commit 13eb095.

Note: the commit "When caching _to_ruby, also cache valid?" removes the need
for this commit.
@jrafanie (Member, Author) commented:

I pushed a revert commit... we don't need to cache the result of the classify + constantize if we correctly cache valid? when we cache the result of _to_ruby (in other words, commit 3 removes the need for commit 2). A sketch of the idea follows.
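A hypothetical sketch of commit 3's idea, with made-up helper names:

```ruby
# Cache validity alongside the compiled ruby so repeated valid? calls
# stop re-running the expensive classify + constantize path.
def to_ruby
  @ruby ||= _to_ruby
end

def valid?
  return @valid unless @valid.nil? # memoizing a boolean, so test for nil
  @valid = compute_validity        # hypothetical helper; runs once
end
```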

@miq-bot (Member) commented Jan 12, 2023

Checked commits jrafanie/manageiq@f521cc4~...123cbe4 with ruby 2.6.10, rubocop 1.28.2, haml-lint 0.35.0, and yamllint
5 files checked, 4 offenses detected

app/models/event_stream.rb

lib/miq_expression.rb

@Fryguy (Member) commented Jan 12, 2023

@jrafanie I opened #22305 to do the caching a different way. Even with this PR, the sample code there took forever, but after that PR, the group* lookups are negligible.

@kbrock (Member) commented Jan 13, 2023

Have to say it again: the valid? optimization is my favorite here. Not the biggest win, but a great fix.

@Fryguy self-assigned this Jan 18, 2023
@kbrock (Member) commented Jan 19, 2023

@jrafanie can you pull out the valid? and other low hanging fruit so we can get a few of these merged?

@jrafanie (Member, Author) commented:

> @jrafanie can you pull out the valid? and other low hanging fruit so we can get a few of these merged?

Yeah, I have to revisit this, apply @Fryguy's change, and keep only the commits we need from this branch. I do think the valid? optimization is good.

@Fryguy (Member) commented Jan 19, 2023

My change is still WIP due to the need to broadcast a cache bust. Feel free to add a commit to that PR if you have the cycles.

@jrafanie (Member, Author) commented:

> @jrafanie can you pull out the valid? and other low hanging fruit so we can get a few of these merged?

@kbrock I opened #22318 for the valid? optimization... note my comment there about what I changed from what we wrote.

@miq-bot (Member) commented Jan 27, 2023

This pull request is not mergeable. Please rebase and repush.

@jrafanie (Member, Author) commented:

I think everything from this PR was either extracted or had a replacement found, so I'll close it for now.

@jrafanie closed this Jan 31, 2023