PoC: Index Result rows rather than to convert them into hashes #51744

casperisfine · 2024-05-06T08:56:00Z

Using the samebenchmark as #51726

A significant part of the memory footprint comes from Result#each:

Total allocated: 4.61 MB (43077 objects)
Total retained: 3.76 MB (27621 objects)

allocated memory by file
-----------------------------------
 391.52 kB  activerecord/lib/active_record/result.rb

retained memory by file
-----------------------------------
 374.40 kB  activerecord/lib/active_record/result.rb

Rows are initially stored as arrays, but Result#each convert them to hashes. Depending on how many elements they contain, hashes use between 2 and 5 times as much memory than arrays.

Length	Hash	Array	Diff
1	160B	40B	-120B
2	160B	40B	-120B
3	160B	40B	-120B
4	160B	80B	-80B
5	160B	80B	-80B
6	160B	80B	-80B
7	160B	80B	-80B
8	160B	80B	-80B
9	464B	160B	-304B
10	464B	160B	-304B
11	464B	160B	-304B
12	464B	160B	-304B
13	464B	160B	-304B
14	464B	160B	-304B
15	464B	160B	-304B
16	464B	160B	-304B
17	912B	160B	-752B
18	912B	160B	-752B
19	912B	320B	-592B
20	912B	320B	-592B
21	912B	320B	-592B
22	912B	320B	-592B
23	912B	320B	-592B
24	912B	320B	-592B
25	912B	320B	-592B
26	912B	320B	-592B
27	912B	320B	-592B
28	912B	320B	-592B
29	912B	320B	-592B
30	912B	320B	-592B
31	912B	320B	-592B
32	912B	320B	-592B
33	1744B	320B	-1424B
34	1744B	320B	-1424B
35	1744B	320B	-1424B
36	1744B	320B	-1424B
37	1744B	320B	-1424B
38	1744B	320B	-1424B
39	1744B	640B	-1104B
40	1744B	640B	-1104B
41	1744B	640B	-1104B
42	1744B	640B	-1104B
43	1744B	640B	-1104B
44	1744B	640B	-1104B
45	1744B	640B	-1104B
46	1744B	640B	-1104B
47	1744B	640B	-1104B
48	1744B	640B	-1104B
49	1744B	640B	-1104B
50	1744B	640B	-1104B

Rather than to convert rows into hashes, we can loopkup the column index into a single hash common to all rows. To not complexify the code too much, rather than to pass the row array and the column index, we wrap both into an IndexedRow object, which uses an extra 40B object, but that's still less memory even in the worst case.

After:

Total allocated: 4.32 MB (43079 objects)
Total retained: 3.65 MB (29725 objects)

allocated memory by file
-----------------------------------
 101.66 kB  activerecord/lib/active_record/result.rb

retained memory by file
-----------------------------------
  84.70 kB  activerecord/lib/active_record/result.rb

As for access speed, it's of course a bit slower, but not by much, it's between 1.5 and 2 times slower, but remains in the 10's of M iterations per second, so I think this overhead is negligible compared to all the work needed to access a model attribute.

Also the LazyAttributeSet class only access this once, after the attribute is casted, the resulting value is still stored in a regular Hash.

cc @tenderlove @jhawthorn I'd like your opinion on this idea. I think it's worth it. We could potentially only do this when there is more than 1 row and more than 8 columns if we think the gains aren't quite big enough for small models.

Also NB: Result#each is public API, I changed it because it's a proof of concept, but this should be an alternate private API.

Last big refactor was in rails#37614. Somewhat extracted from rails#51744. The concern about columns with the same name didn't make much sense to me because in both code paths, the last one wins, so we can simplify the whole methods. Additionally by implementing `columns_index`, we have a decent template always available.

casperisfine · 2024-05-07T10:06:50Z

I changed it because it's a proof of concept, but this should be an alternate private API.

Or maybe it doesn't need to? It's documented as returning a Hash, but it would really be for the better if we always returned IndexedRows. Perhaps we can handle most backward compat issues with a delegate_to_missing :to_hash?

Using the samebenchmark as rails#51726 A significant part of the memory footprint comes from `Result#each`: ``` Total allocated: 4.61 MB (43077 objects) Total retained: 3.76 MB (27621 objects) allocated memory by file ----------------------------------- 391.52 kB activerecord/lib/active_record/result.rb retained memory by file ----------------------------------- 374.40 kB activerecord/lib/active_record/result.rb ``` Rows are initially stored as arrays, but `Result#each` convert them to hashes. Depending on how many elements they contain, hashes use between 2 and 5 times as much memory than arrays. | Length | Hash | Array | Diff | | --------- | --------- | --------- | --------- | | 1 | 160B | 40B | -120B | | 2 | 160B | 40B | -120B | | 3 | 160B | 40B | -120B | | 4 | 160B | 80B | -80B | | 5 | 160B | 80B | -80B | | 6 | 160B | 80B | -80B | | 7 | 160B | 80B | -80B | | 8 | 160B | 80B | -80B | | 9 | 464B | 160B | -304B | | 10 | 464B | 160B | -304B | | 11 | 464B | 160B | -304B | | 12 | 464B | 160B | -304B | | 13 | 464B | 160B | -304B | | 14 | 464B | 160B | -304B | | 15 | 464B | 160B | -304B | | 16 | 464B | 160B | -304B | | 17 | 912B | 160B | -752B | | 18 | 912B | 160B | -752B | | 19 | 912B | 320B | -592B | | 20 | 912B | 320B | -592B | | 21 | 912B | 320B | -592B | | 22 | 912B | 320B | -592B | | 23 | 912B | 320B | -592B | | 24 | 912B | 320B | -592B | | 25 | 912B | 320B | -592B | | 26 | 912B | 320B | -592B | | 27 | 912B | 320B | -592B | | 28 | 912B | 320B | -592B | | 29 | 912B | 320B | -592B | | 30 | 912B | 320B | -592B | | 31 | 912B | 320B | -592B | | 32 | 912B | 320B | -592B | | 33 | 1744B | 320B | -1424B | | 34 | 1744B | 320B | -1424B | | 35 | 1744B | 320B | -1424B | | 36 | 1744B | 320B | -1424B | | 37 | 1744B | 320B | -1424B | | 38 | 1744B | 320B | -1424B | | 39 | 1744B | 640B | -1104B | | 40 | 1744B | 640B | -1104B | | 41 | 1744B | 640B | -1104B | | 42 | 1744B | 640B | -1104B | | 43 | 1744B | 640B | -1104B | | 44 | 1744B | 640B | -1104B | | 45 | 1744B | 640B | -1104B | | 46 | 1744B | 640B | -1104B | | 47 | 1744B | 640B | -1104B | | 48 | 1744B | 640B | -1104B | | 49 | 1744B | 640B | -1104B | | 50 | 1744B | 640B | -1104B | Rather than to convert rows into hashes, we can loopkup the column index into a single hash common to all rows. To not complexify the code too much, rather than to pass the row array and the column index, we wrap both into an `IndexedRow` object, which uses an extra `40B` object, but that's still less memory even in the worst case. After: ``` Total allocated: 4.32 MB (43079 objects) Total retained: 3.65 MB (29725 objects) allocated memory by file ----------------------------------- 101.66 kB activerecord/lib/active_record/result.rb retained memory by file ----------------------------------- 84.70 kB activerecord/lib/active_record/result.rb ``` As for access speed, it's of course a bit slower, but not by much, it's between `1.5` and `2` times slower, but remains in the 10's of M iterations per second, so I think this overhead is negligible compared to all the work needed to access a model attribute. Also the `LazyAttributeSet` class only access this once, after the attribute is casted, the resulting value is still stored in a regular `Hash`.

Last big refactor was in rails#37614. Somewhat extracted from rails#51744. The concern about columns with the same name didn't make much sense to me because in both code paths, the last one wins, so we can simplify the whole methods. Additionally by implementing `columns_index`, we have a decent template always available.

rails-bot bot added the activerecord label May 6, 2024

casperisfine mentioned this pull request May 7, 2024

Simplify ActiveRecord::Result#hash_rows #51752

Merged

casperisfine force-pushed the result-rows branch from 6562d05 to 971f1b2 Compare May 7, 2024 10:05

casperisfine force-pushed the result-rows branch 2 times, most recently from 054583c to 443e1b4 Compare May 7, 2024 10:09

casperisfine force-pushed the result-rows branch from 443e1b4 to a269936 Compare May 7, 2024 10:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PoC: Index Result rows rather than to convert them into hashes #51744

PoC: Index Result rows rather than to convert them into hashes #51744

casperisfine commented May 6, 2024

casperisfine commented May 7, 2024

PoC: Index Result rows rather than to convert them into hashes #51744

Are you sure you want to change the base?

PoC: Index Result rows rather than to convert them into hashes #51744

Conversation

casperisfine commented May 6, 2024

casperisfine commented May 7, 2024