Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC: Index Result rows rather than to convert them into hashes #51744

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

casperisfine
Copy link
Contributor

Using the samebenchmark as #51726

A significant part of the memory footprint comes from Result#each:

Total allocated: 4.61 MB (43077 objects)
Total retained: 3.76 MB (27621 objects)

allocated memory by file
-----------------------------------
 391.52 kB  activerecord/lib/active_record/result.rb

retained memory by file
-----------------------------------
 374.40 kB  activerecord/lib/active_record/result.rb

Rows are initially stored as arrays, but Result#each convert them to hashes. Depending on how many elements they contain, hashes use between 2 and 5 times as much memory than arrays.

Length Hash Array Diff
1 160B 40B -120B
2 160B 40B -120B
3 160B 40B -120B
4 160B 80B -80B
5 160B 80B -80B
6 160B 80B -80B
7 160B 80B -80B
8 160B 80B -80B
9 464B 160B -304B
10 464B 160B -304B
11 464B 160B -304B
12 464B 160B -304B
13 464B 160B -304B
14 464B 160B -304B
15 464B 160B -304B
16 464B 160B -304B
17 912B 160B -752B
18 912B 160B -752B
19 912B 320B -592B
20 912B 320B -592B
21 912B 320B -592B
22 912B 320B -592B
23 912B 320B -592B
24 912B 320B -592B
25 912B 320B -592B
26 912B 320B -592B
27 912B 320B -592B
28 912B 320B -592B
29 912B 320B -592B
30 912B 320B -592B
31 912B 320B -592B
32 912B 320B -592B
33 1744B 320B -1424B
34 1744B 320B -1424B
35 1744B 320B -1424B
36 1744B 320B -1424B
37 1744B 320B -1424B
38 1744B 320B -1424B
39 1744B 640B -1104B
40 1744B 640B -1104B
41 1744B 640B -1104B
42 1744B 640B -1104B
43 1744B 640B -1104B
44 1744B 640B -1104B
45 1744B 640B -1104B
46 1744B 640B -1104B
47 1744B 640B -1104B
48 1744B 640B -1104B
49 1744B 640B -1104B
50 1744B 640B -1104B

Rather than to convert rows into hashes, we can loopkup the column index into a single hash common to all rows. To not complexify the code too much, rather than to pass the row array and the column index, we wrap both into an IndexedRow object, which uses an extra 40B object, but that's still less memory even in the worst case.

After:

Total allocated: 4.32 MB (43079 objects)
Total retained: 3.65 MB (29725 objects)

allocated memory by file
-----------------------------------
 101.66 kB  activerecord/lib/active_record/result.rb

retained memory by file
-----------------------------------
  84.70 kB  activerecord/lib/active_record/result.rb

As for access speed, it's of course a bit slower, but not by much, it's between 1.5 and 2 times slower, but remains in the 10's of M iterations per second, so I think this overhead is negligible compared to all the work needed to access a model attribute.

Also the LazyAttributeSet class only access this once, after the attribute is casted, the resulting value is still stored in a regular Hash.

cc @tenderlove @jhawthorn I'd like your opinion on this idea. I think it's worth it. We could potentially only do this when there is more than 1 row and more than 8 columns if we think the gains aren't quite big enough for small models.

Also NB: Result#each is public API, I changed it because it's a proof of concept, but this should be an alternate private API.

casperisfine pushed a commit to Shopify/rails that referenced this pull request May 7, 2024
Last big refactor was in rails#37614.
Somewhat extracted from rails#51744.

The concern about columns with the same name didn't make much sense
to me because in both code paths, the last one wins, so we can simplify
the whole methods.

Additionally by implementing `columns_index`, we have a decent template
always available.
casperisfine pushed a commit to Shopify/rails that referenced this pull request May 7, 2024
Last big refactor was in rails#37614.
Somewhat extracted from rails#51744.

The concern about columns with the same name didn't make much sense
to me because in both code paths, the last one wins, so we can simplify
the whole methods.

Additionally by implementing `columns_index`, we have a decent template
always available.
@casperisfine
Copy link
Contributor Author

I changed it because it's a proof of concept, but this should be an alternate private API.

Or maybe it doesn't need to? It's documented as returning a Hash, but it would really be for the better if we always returned IndexedRows. Perhaps we can handle most backward compat issues with a delegate_to_missing :to_hash?

@casperisfine casperisfine force-pushed the result-rows branch 2 times, most recently from 054583c to 443e1b4 Compare May 7, 2024 10:09
Using the samebenchmark as rails#51726

A significant part of the memory footprint comes from `Result#each`:

```
Total allocated: 4.61 MB (43077 objects)
Total retained: 3.76 MB (27621 objects)

allocated memory by file
-----------------------------------
 391.52 kB  activerecord/lib/active_record/result.rb

retained memory by file
-----------------------------------
 374.40 kB  activerecord/lib/active_record/result.rb
```

Rows are initially stored as arrays, but `Result#each` convert them
to hashes. Depending on how many elements they contain, hashes use
between 2 and 5 times as much memory than arrays.

|    Length |      Hash |     Array |      Diff |
| --------- | --------- | --------- | --------- |
|         1 |       160B |        40B |      -120B |
|         2 |       160B |        40B |      -120B |
|         3 |       160B |        40B |      -120B |
|         4 |       160B |        80B |       -80B |
|         5 |       160B |        80B |       -80B |
|         6 |       160B |        80B |       -80B |
|         7 |       160B |        80B |       -80B |
|         8 |       160B |        80B |       -80B |
|         9 |       464B |       160B |      -304B |
|        10 |       464B |       160B |      -304B |
|        11 |       464B |       160B |      -304B |
|        12 |       464B |       160B |      -304B |
|        13 |       464B |       160B |      -304B |
|        14 |       464B |       160B |      -304B |
|        15 |       464B |       160B |      -304B |
|        16 |       464B |       160B |      -304B |
|        17 |       912B |       160B |      -752B |
|        18 |       912B |       160B |      -752B |
|        19 |       912B |       320B |      -592B |
|        20 |       912B |       320B |      -592B |
|        21 |       912B |       320B |      -592B |
|        22 |       912B |       320B |      -592B |
|        23 |       912B |       320B |      -592B |
|        24 |       912B |       320B |      -592B |
|        25 |       912B |       320B |      -592B |
|        26 |       912B |       320B |      -592B |
|        27 |       912B |       320B |      -592B |
|        28 |       912B |       320B |      -592B |
|        29 |       912B |       320B |      -592B |
|        30 |       912B |       320B |      -592B |
|        31 |       912B |       320B |      -592B |
|        32 |       912B |       320B |      -592B |
|        33 |      1744B |       320B |     -1424B |
|        34 |      1744B |       320B |     -1424B |
|        35 |      1744B |       320B |     -1424B |
|        36 |      1744B |       320B |     -1424B |
|        37 |      1744B |       320B |     -1424B |
|        38 |      1744B |       320B |     -1424B |
|        39 |      1744B |       640B |     -1104B |
|        40 |      1744B |       640B |     -1104B |
|        41 |      1744B |       640B |     -1104B |
|        42 |      1744B |       640B |     -1104B |
|        43 |      1744B |       640B |     -1104B |
|        44 |      1744B |       640B |     -1104B |
|        45 |      1744B |       640B |     -1104B |
|        46 |      1744B |       640B |     -1104B |
|        47 |      1744B |       640B |     -1104B |
|        48 |      1744B |       640B |     -1104B |
|        49 |      1744B |       640B |     -1104B |
|        50 |      1744B |       640B |     -1104B |

Rather than to convert rows into hashes, we can loopkup the column index
into a single hash common to all rows. To not complexify the code too much,
rather than to pass the row array and the column index, we wrap both into
an `IndexedRow` object, which uses an extra `40B` object, but that's still
less memory even in the worst case.

After:

```
Total allocated: 4.32 MB (43079 objects)
Total retained: 3.65 MB (29725 objects)

allocated memory by file
-----------------------------------
 101.66 kB  activerecord/lib/active_record/result.rb

retained memory by file
-----------------------------------
  84.70 kB  activerecord/lib/active_record/result.rb
```

As for access speed, it's of course a bit slower, but not by much,
it's between `1.5` and `2` times slower, but remains in the 10's of M
iterations per second, so I think this overhead is negligible compared
to all the work needed to access a model attribute.

Also the `LazyAttributeSet` class only access this once, after the
attribute is casted, the resulting value is still stored in a regular
`Hash`.
fractaledmind pushed a commit to fractaledmind/rails that referenced this pull request May 13, 2024
Last big refactor was in rails#37614.
Somewhat extracted from rails#51744.

The concern about columns with the same name didn't make much sense
to me because in both code paths, the last one wins, so we can simplify
the whole methods.

Additionally by implementing `columns_index`, we have a decent template
always available.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants