Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination Snowflake: Improve error handling in StagingClient #39135

Merged
merged 1 commit into from
Jun 5, 2024

Conversation

gisripa
Copy link
Contributor

@gisripa gisripa commented Jun 5, 2024

What

Use the returned results to determine success/failure for PUT and COPY INTO commands.

Review guide

User Impact

Can this PR be safely reverted and rolled back?

  • YES πŸ’š
  • NO ❌

Copy link

vercel bot commented Jun 5, 2024

The latest updates on your projects. Learn more about Vercel for Git β†—οΈŽ

Name Status Preview Comments Updated (UTC)
airbyte-docs βœ… Ready (Inspect) Visit Preview πŸ’¬ Add feedback Jun 5, 2024 8:39pm

Copy link
Contributor Author

gisripa commented Jun 5, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @gisripa and the rest of your teammates on Graphite Graphite

@gisripa gisripa changed the title snowflake-stagingclient-enh Destination Snowflake: Improve error handling in StagingClient Jun 5, 2024
@gisripa gisripa marked this pull request as ready for review June 5, 2024 17:49
@gisripa gisripa requested a review from a team as a code owner June 5, 2024 17:49
@gisripa gisripa force-pushed the gireesh/snowflake-stagingclient-enh branch from 393d5f1 to f94814f Compare June 5, 2024 19:03
@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Jun 5, 2024
@gisripa gisripa force-pushed the gireesh/snowflake-stagingclient-enh branch from f94814f to c754c40 Compare June 5, 2024 19:55
Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is SnowflakeStagingClientTest still useful, now that there's a proper integration test?

(two minor comments, neither blocking)

@@ -84,7 +111,8 @@ class SnowflakeStagingClient(private val database: JdbcDatabase) {
filePath,
stageName,
stagingPath,
Runtime.getRuntime().availableProcessors()
// max allowed param is 99, we don't need so many threads for a single file upload
minOf(Runtime.getRuntime().availableProcessors(), 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not minOf(..., 99)? wouldn't this restrict us to 4 threads in most cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in our cloud case it is always 1, our pod config is 1 cpu i believe based on my previous dev release run.

Copy link

@talnidam talnidam Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gisripa
The pod config could also run on 2 cpus by values.yaml definition ain't im wrong? (worker resources)
I couldn't understand why not use minOf(..., 99)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talnidam surely can use minOf(..,99) however there is no added benefit in the way we upload 1 file only. We use a 200M file and only 1 file per PUT and snowflake chunks it if > 64M which I'm guessing won't be more than 3 or 4 chunks. The advantage of this shows up if we do PUT with a directory/* with many files in the directory. Also the default is 4 in snowflake settings if not provided. so just used that as max.

private val datasource =
SnowflakeDatabaseUtils.createDataSource(config, OssCloudEnvVarConsts.AIRBYTE_OSS)
private val database: JdbcDatabase = SnowflakeDatabaseUtils.getDatabase(datasource)
// Intentionally not using actual columns, since the staging client should be agnostic of these
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random thought: do we eventually want to explicitly specify column names in our COPY command? πŸ˜…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so.. it requires COPY INTO .. select filename#colindex or something i think right ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like that 🀷 not sure about the exact syntax, presumably there's some way to specify "please load into exactly these columns in the target table"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which I guess is actually slightly different from what you have (i.e. "read these columns out of the file" vs "write to these columns in the table")

@gisripa
Copy link
Contributor Author

gisripa commented Jun 5, 2024

is SnowflakeStagingClientTest still useful, now that there's a proper integration test?

Yeah not useful for happy path, only useful part was if the correct exception caught and thrown as ConfigError.

@gisripa gisripa merged commit 3a9dabb into master Jun 5, 2024
34 checks passed
@gisripa gisripa deleted the gireesh/snowflake-stagingclient-enh branch June 5, 2024 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/destination/snowflake
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants