-
Notifications
You must be signed in to change notification settings - Fork 36
Feature: implement snowflake export function #156 #163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
Codecov ReportPatch coverage:
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## develop #163 +/- ##
===========================================
+ Coverage 92.91% 92.94% +0.02%
===========================================
Files 291 291
Lines 4473 4506 +33
Branches 604 606 +2
===========================================
+ Hits 4156 4188 +32
- Misses 203 204 +1
Partials 114 114
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides some suggestions, others LGTM 👍
this.logger.debug(`Acquiring connection from ${profileName}`); | ||
const client = await pool.acquire(); | ||
this.logger.debug(`Acquired connection from ${profileName}`); | ||
const { connection, pool } = await this.getConnection(profileName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to refactor it ! 👍
|
||
// use protected for testing purpose | ||
protected getCopyToStageSQL(sql: string, stageFilePath: string) { | ||
return `COPY INTO ${stageFilePath} FROM (${sql}) FILE_FORMAT = (TYPE = 'parquet') HEADER=true INCLUDE_QUERY_ID=true MAX_FILE_SIZE=5368709120;`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to add the comment to explain why to set the MAX_FILE_SIZE
is 5368709120
not other values, and paste the document link to help other members know :D
const { profileName, sql } = exportOptions; | ||
|
||
const { connection, pool } = await this.getConnection(profileName); | ||
directory = /^\./.exec(directory) ? directory : `./${directory}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, our directory may be a relative path based on the project or an absolute path temp directory from the root if the user does not set folderPath
in the cache
settings of the vulcan.yaml
:
Relative path:
> `tmp/schem1/pg/cache_department`
Absolute path:
> `/var/folders/51/049hk_157kgc8ht18q61c1lm0000gn/T/vulcan/cache/schem1/pg/cache_department`
If the snowflake GET path needs the absolute path, then currently the checking directory has .
before the path may cause an issue, and need to remove it.
We could use path.resolve
to normalize the above two cases directly, before is the screenshot (seems you have written it in the following code):
Btw, suggest adding the comment to explain the reason we need to get the absolute path from the Snowflake GET method needed and paste the document link to help other members know it.
const copyStatement = await this.getStatementFromSQL( | ||
connection, | ||
builtSQL, | ||
[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could refactor getStatementFromSQL
methods to make the binding arguments could be []
array default, then we don't need always to pass []
when copying to stage, getting files, and removing stage.
public override async export(exportOptions: ExportOptions): Promise<void> { | ||
let { directory } = exportOptions; | ||
const { profileName, sql } = exportOptions; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest removing the newline.
// Assert | ||
const files = fs.readdirSync(directory); | ||
expect(files.length).toBe(1); | ||
expect(/parquet$/.exec(path.basename(files[0]))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the expect(/parquet$/.exec(path.basename(files[0])))
actually may not really assert the result.
The value in the expect(value)
is the actual value, we should call the method to do the assert.
Please use the expect method to assert the result, e.g:
expect(files[0]).toMatch('/.parquet$/');
const files = fs.readdirSync(directory); | ||
expect(files.length).toBe(4); | ||
// check each file in the files exists | ||
files.forEach((file) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same issue and suggestion line above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to enhance test cases, according to the export
method logistics, we may still need two test cases, first is checking the directory not exist and throw error, second is the checking the happened cached error case
const stage = '@~/vulcan_cache_stage'; | ||
const stageFilePath = `${stage}/`; | ||
const builtSQL = this.getCopyToStageSQL(sql, stageFilePath); | ||
// copy data to a named stage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to rich the comment to explain why we need to copy to the stage to help members know moving to the stage is necessary.
await this.waitStatement(getStatement); | ||
this.logger.debug(`Exported parquet files to ${directory}`); | ||
|
||
// remove parquet file from stage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest rich the comment to describe removing the stage parquet files because we keep them in the stage temporarily for downloading, so it need to remove 😄
588711c
to
ae0b2a0
Compare
ae0b2a0
to
dc72b97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing and update the commits, LGTM 👍 👍
Description
This pull request implemented the snowflake export function to export the selected data to the local dir in parquet format.
export flow:
Issue ticket number
issue #156
Additional Context
Result
Use parquet-cli to view the downloaded parquet file
