Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return early from allow list when possible #515

Merged
merged 5 commits into from
Jan 10, 2024
Merged

Conversation

camilo
Copy link
Contributor

@camilo camilo commented Dec 20, 2023

What is this?

Behaviour change:

Savepoints that don't follow the activerecord pattern active_record_savepoint_#{nesting_level} or that are nested more than two levels are not going to bypass the circuit anymore.

I think this is okay because in the end this is a best effort scenario on an overloaded database and this IS an active record adapter.

In the trilogy adapter.

We look at every single query to see wether or not it should be allowed to pass thru the circuit if it happens to be a transaction control statement, (COMMIT/ROLLBACK/RELEASE SAVEPOINT) because we want to try our hardest to make sure we don't pile open transactions onto MySQL even if the underlying database is overloaded.

Large queries, including queries with long comments (often the case when using marginalia to pass metadata around) will in aggregate use quite a bit of CPU just running regular expressions.

We can exploit the fact that active record trilogy uses a set pattern for savepoint names (`active_record_#{nesting_level}" ) and uses ROLLBACK and COMMIT without any extra characters in the end of the statement to simply return early in most queries by inspecting the last few characters.

FWIW

ruby 3.1.4p223 (2023-03-30 revision 2ec786f0d7) [x86_64-linux]
Warming up --------------------------------------
      end-with-case?   356.147k i/100ms
               regex    20.823k i/100ms
Calculating -------------------------------------
      end-with-case?      3.523M (± 1.9%) i/s -     17.807M in   5.056666s
               regex    206.105k (± 1.2%) i/s -      1.041M in   5.052247s

Comparison:
      end-with-case?:  3522827.5 i/s
               regex:   206105.0 i/s - 17.09x  slower

@camilo camilo force-pushed the micro-opt-allowlist branch 7 times, most recently from 542466d to 19bd6b9 Compare December 21, 2023 04:11
@camilo camilo marked this pull request as ready for review December 21, 2023 04:13
@camilo camilo force-pushed the micro-opt-allowlist branch 4 times, most recently from 167bd53 to 86b2878 Compare December 24, 2023 01:20
Copy link
Contributor

@adrianna-chang-shopify adrianna-chang-shopify left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, clever change 😄 I have a couple small nits, mostly on the comments. I'd also personally opt to drop the benchmark -- not sure it needs to get merged to the test suite, IMHO documenting it in this PR should and mentioning the optimization in an inline comment should suffice.

lib/semian/activerecord_trilogy_adapter.rb Outdated Show resolved Hide resolved
lib/semian/activerecord_trilogy_adapter.rb Outdated Show resolved Hide resolved
camilo and others added 3 commits January 5, 2024 10:28
Typos

Co-authored-by: Adrianna Chang <adrianna.chang@shopify.com>
Co-authored-by: Adrianna Chang <adrianna.chang@shopify.com>
@camilo
Copy link
Contributor Author

camilo commented Jan 8, 2024

I'd also personally opt to drop the benchmark -- not sure it needs to get merged to the test suite, IMHO documenting it in this PR should and mentioning the optimization in an inline comment should suffice.

Only reason to keep it is because I suspect an artificial case bench that is a very short query, say SELECT * FROM table_name will not show that much improvement and then we'll have to re-explain why this works the way it works.

As it stands now there is a realistic fixture of a real world query. I don't feel strongly tho. Something else would be to add a /bench directory so we don't have a random file that does not run at test time.


Another discussion point but that is larger and should probably live in its own issue is the actual real life effectiveness of the bypass, the need for it, and wether or not AR could just tell us (or is already telling us?) that a query is a COMMIT/ROLLBACK/SAVEPOINT RELEASE in that case we would not have to mess with the raw queries.

@camilo camilo merged commit 4d2f53c into main Jan 10, 2024
42 checks passed
@camilo camilo deleted the micro-opt-allowlist branch January 10, 2024 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants