-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Normalization: Decrease event buffer size #11267
Conversation
4f92c1c
to
f6568f5
Compare
/test connector=bases/base-normalization |
1 similar comment
/test connector=bases/base-normalization |
/test connector=bases/base-normalization
|
/test connector=bases/base-normalization |
/test connector=bases/base-normalization
|
/test connector=bases/base-normalization
|
# We don't run dbt 1.0.x on all destinations (because their plugins don't support it yet) | ||
# So we need to only pass `--event-buffer-size` if we have DBT 1.0.0 | ||
# For some reason, `dbt --version` outputs to stderr, so we need to redirect it to stdout | ||
dbt --version 2>&1 | grep -q 'installed version: 1.0.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we upgrade dbt to 42.2.1
?
WHy not just try to run a dummy dbt cli with --event-buffer-size
and see if that passes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
42.2.1
I'm not sure what that's referring to, isn't 1.0.x the highest version?
run a dummy dbt cli
I feel like this would take more setup for basically the same outcome? E.g. if it just runs dbt --event-buffer-size=10000 run --project-dir "/nonexistent/path"
, the script would still need to grep the output (to check whether it failed due to not recognizing --event-buffer-size
, or because it just couldn't find the project file)
Unless you're suggesting to try actually running the real command with --event-buffer-size
, and if it fails then rerun without it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what that's referring to, isn't 1.0.x the highest version?
Yes today it's the highest, I was just saying when in the future we upgrade dbt again, and the string "1.0.0" being looked for won't be in the output anymore
I feel like this would take more setup for basically the same outcome?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aaahhh gotcha. yeah this looks a lot nicer, will do!
/publish connector=bases/base-normalization
|
looks like the snowflake normalization secret got updated for #11093 - won't be able to publish this branch until that's merged |
Oooooh!! Could that also be why my tests are failing here? |
yeah, the |
/publish connector=bases/base-normalization
|
|
||
# We don't run dbt 1.0.x on all destinations (because their plugins don't support it yet) | ||
# So we need to only pass `--event-buffer-size` if it's supported by DBT. | ||
dbt --help | grep -E -- '--event-buffer-size' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgao: If the bundled DBT version in is not supporting the --event-buffer-size flag, the script will exit here because of the previous set -e
statement.
This is the case with the MySQL normalization container (normalization-mysql:0.1.74) at the moment, likely the reason why nornalization with MySQL destination currently fails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, thanks for pointing that out!
We should probably use this set +e
option before calling that line
set +e # allow script to continue running even if next commands fail to run properly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened #11578
What
In some (hard to systematically explain) circumstances, normalization actually runs out of memory. We've observed pods occupying as much as 1GB of RAM, which is pretty wild. Upgrading DBT to 1.0.0 (#11051) fixed some of these errors, but sometimes we do still OOM.
Current hypothesis is that this occurs when DBT's event buffer grows too large. It defaults to 100K event capacity; decrease that to 10K.
🚨 User Impact 🚨
This is probably an extreme edge case: If a normalization run generates over 10K event messages, DBT will now drop the earliest messages. I'm pretty sure this only even matters in cases where DBT fails and tries to dump logs to stdout.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described here