Snowflake copy destinations should execute COPY commands in parallel #9087
Labels: area/connectors, connectors/destination/snowflake, connectors/destinations-warehouse, type/enhancement
Blocked on #8820
Tell us about the problem you're trying to solve
Currently, destination-snowflake's copy modes generate multiple files on S3 or GCS and then execute one COPY command per file, serially. We should run those commands in parallel to cut the total time spent copying.
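A minimal sketch of the parallel approach, assuming each per-file COPY can be wrapped in a `Consumer<String>`. The class `ParallelCopy`, the method `copyAllInParallel`, and the `copyOneFile` parameter are hypothetical; `copyOneFile` stands in for `copyS3CsvFileIntoTable` / `copyGcsCsvFileIntoTable`, whose real signatures may differ. It submits one task per staged file to a fixed-size thread pool, waits for all of them, and rethrows the first failure as a `RuntimeException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class ParallelCopy {

  // Hypothetical helper: runs copyOneFile once per staged file on a pool of at most
  // maxConcurrency threads, waits for all tasks, and rethrows the first failure
  // (in submission order) as a RuntimeException.
  static void copyAllInParallel(List<String> stagedFiles, Consumer<String> copyOneFile, int maxConcurrency) {
    ExecutorService pool = Executors.newFixedThreadPool(maxConcurrency);
    try {
      List<Callable<Void>> tasks = new ArrayList<>();
      for (String file : stagedFiles) {
        tasks.add(() -> {
          copyOneFile.accept(file);
          return null;
        });
      }
      // invokeAll blocks until every task has completed, successfully or not.
      List<Future<Void>> results = pool.invokeAll(tasks);
      for (Future<Void> result : results) {
        result.get(); // throws ExecutionException if that task failed
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException("Interrupted while waiting for COPY tasks", e);
    } catch (ExecutionException e) {
      throw new RuntimeException("A COPY task failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

A bounded pool keeps the number of concurrent COPY statements against the warehouse under control; a good `maxConcurrency` is an assumption to tune, not a value from this issue. The copy function must also be safe to call from several threads at once, for example by not sharing one JDBC connection across tasks.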
Additionally, a single COPY command can accept up to 1,000 files (through its FILES list), and destination-snowflake should take advantage of that capability.
Describe the solution you’d like
- Have `S3StreamCopier` and `GcsStreamCopier` run `copyS3CsvFileIntoTable` / `copyGcsCsvFileIntoTable` in parallel.
- Have `SnowflakeS3StreamCopier` and `GcsStreamCopier` override `copyStagingFileToTemporaryTable` so that it processes 1,000 files at a time, and have `copyS3CsvFileIntoTable` / `copyGcsCsvFileIntoTable` throw a `RuntimeException` (see `RedshiftStreamCopier`, which is similar).
- For tests, see `RedshiftStreamCopierTest`'s usage of `verify(db).execute(...)`.
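A minimal sketch of the override pattern from the second bullet, written against hypothetical types (`SqlExecutor`, `BaseStreamCopier`, `BatchingSnowflakeCopier`, `RecordingExecutor`) rather than the connector's real classes. The subclass overrides the batch method to issue one `COPY INTO ... FILES = (...)` statement per group of at most 1,000 staged files, and its per-file method throws. The table name, stage name, and `FILE_FORMAT = (TYPE = CSV)` clause are placeholders. `RecordingExecutor` is a hand-written stand-in for the `verify(db).execute(...)` style of assertion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the database handle a copier receives.
interface SqlExecutor {
  void execute(String sql);
}

// Hypothetical base copier: by default it issues one COPY per staged file, serially.
abstract class BaseStreamCopier {
  protected final SqlExecutor db;
  protected final List<String> stagedFiles;

  BaseStreamCopier(SqlExecutor db, List<String> stagedFiles) {
    this.db = db;
    this.stagedFiles = stagedFiles;
  }

  void copyStagingFileToTemporaryTable() {
    for (String file : stagedFiles) {
      copyCsvFileIntoTable(file);
    }
  }

  abstract void copyCsvFileIntoTable(String file);
}

// Hypothetical Snowflake copier: batches files and disables the per-file path.
class BatchingSnowflakeCopier extends BaseStreamCopier {
  // Snowflake caps the FILES list of a single COPY INTO at 1,000 names.
  static final int MAX_FILES_PER_COPY = 1_000;
  private final String table;
  private final String stage;

  BatchingSnowflakeCopier(SqlExecutor db, List<String> stagedFiles, String table, String stage) {
    super(db, stagedFiles);
    this.table = table;
    this.stage = stage;
  }

  @Override
  void copyStagingFileToTemporaryTable() {
    for (int start = 0; start < stagedFiles.size(); start += MAX_FILES_PER_COPY) {
      int end = Math.min(start + MAX_FILES_PER_COPY, stagedFiles.size());
      // Quote each file name as a SQL string literal, doubling any embedded single quotes.
      String fileList = stagedFiles.subList(start, end).stream()
          .map(name -> "'" + name.replace("'", "''") + "'")
          .collect(Collectors.joining(", "));
      db.execute("COPY INTO " + table + " FROM @" + stage
          + " FILES = (" + fileList + ") FILE_FORMAT = (TYPE = CSV)");
    }
  }

  @Override
  void copyCsvFileIntoTable(String file) {
    // The batched override replaces this path, so calling it is a programming error.
    throw new RuntimeException("per-file COPY is not used; files are copied in batches");
  }
}

// Test double that records every statement, in the spirit of verify(db).execute(...).
class RecordingExecutor implements SqlExecutor {
  final List<String> executed = new ArrayList<>();

  @Override
  public void execute(String sql) {
    executed.add(sql);
  }
}
```

With 2,500 staged files, this sketch issues three COPY statements (1,000 + 1,000 + 500 files) instead of 2,500, and the batches could in turn be run in parallel as the first bullet proposes.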
Describe the alternative you’ve considered or used
Additional context
Are you willing to submit a PR?