Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache the computation of default columns during three-way merges. #7065

Merged
merged 4 commits into from
Dec 4, 2023

Conversation

nicktobey
Copy link
Contributor

This is a slow operation because it actually generates the CREATE TABLE string for the merged schema and uses yacc to parse it so we can correctly resolve default column expressions.

But we should only need to do this once, not once per row.

This also contains a rough performance regression test. Without this fix, the test should time out on GitHub's CI.

Copy link
Contributor

@fulghum fulghum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change for column defaults looks great. Thank you for fixing that one! 🙏

I like the larger perf test, but I'm still very skeptical about the timing issues. The CI envs have a LOT of variance over time in how quickly they run, and they are of course much slower than our dev laptops. It seems like trying to test perf regressions by a set time limit is going to be pretty fragile.

This is still a step in a good direction though! I just think we should keep pushing for a better way to test perf for specific operations. The two ideas that come to mind right now are...

  • I still like the idea of having some internal merge stats that tell us how much work was done as part of the merge. It could break things down into a lot of interesting dimensions. It's definitely less direct than just measuring time, and you're right that it involves us measuring for things we know will be expensive, but it still seems like it might be an interesting idea to explore.
  • This is likely going to be messy, but we could try to have the perf testing code take a baseline on the hardware it's currently running on, and use that to create some perf modifier that could set an expected range for how fast code should run. I've seen other projects that have used a similar approach, but it's definitely a bit messy.

Comment on lines +14 to +21
echo "insert into t(pk) values" > import.sql
for i in {1..100000}
do
echo " ($i)," >> import.sql
done
echo " (104857);" >> import.sql

dolt sql < import.sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) I don't think you need to check in the generated import.sql script. I know I had suggested checking in a sql file to create the table, but since you've got the create_repo function in the BATS, I think that's an even better way to get reproducibility, so I think you can safely remove the checked-in import.sql file since we can easily regenerate the test database from the BATS function you made. You could also trim down some other files, for example branch_control.db shouldn't be needed since we aren't using branch permissions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to check in the .sql script. Removed.

load $BATS_TEST_DIRNAME/helper/common.bash

BATS_TEST_TIMEOUT=50

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to document some context about what the intent of the performance.bats file is. It's obvious to you and me right now, but it likely won't be to future code readers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments.

@nicktobey nicktobey merged commit e723577 into main Dec 4, 2023
17 checks passed
@nicktobey nicktobey deleted the nicktobey/perf branch December 4, 2023 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants