feat!: Move to Arrow implementation #120
Mostly updated now. We will just need to bring |
erezrokah left a comment
This looks good to me. It seems most of the logic is implemented in the Arrow module.
As for the breaking changes, if they are easy to fix we can fix them; otherwise we can make this a breaking change. Either way, I'm quite sure people will notice the change.
We might want to add a .gitattributes file with the following content:
*.csv diff
I think that should allow these to be shown in the diff UI
I tried various ways, but I don't think we can force it. GitHub is probably rendering the files as binary because they contain null bytes.
I think you need to change module github.com/cloudquery/filetypes to module github.com/cloudquery/filetypes/v2 in go.mod.
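For context, Go's semantic import versioning requires a module at major version 2 or higher to carry the major version as a suffix on its module path. A minimal sketch of the suggested go.mod change follows; only the module line is shown, and the rest of the file is assumed to stay as it is:

```go.mod
// Before (v1 module path):
//   module github.com/cloudquery/filetypes

// After (v2 module path, needed once the release is tagged v2.x.y):
module github.com/cloudquery/filetypes/v2
```

Consumers would then import the package using the same suffixed path, `github.com/cloudquery/filetypes/v2`.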
🤖 I have created a release *beep* *boop*

---

## [2.0.0](v1.6.2...v2.0.0) (2023-04-18)

### ⚠ BREAKING CHANGES

* Move to Arrow implementation ([#120](#120))

### Features

* Move to Arrow implementation ([#120](#120)) ([b4fb660](b4fb660))

### Bug Fixes

* **deps:** Update module github.com/cloudquery/plugin-sdk to v1.45.0 ([#123](#123)) ([a5deffa](a5deffa))
* **deps:** Update module golang.org/x/sys to v0.7.0 ([#113](#113)) ([bce7524](bce7524))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
In the Parquet writer, we convert extension columns to String, as the previous implementation did. One difference, however, is that UUID columns are now delimited with dashes, where before they were not. Timestamps also now use the default Arrow format rather than the one we used previously.
For these reasons, and because this is a fairly major internal change, we have decided to make this a breaking change.
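To make the UUID difference concrete, here is a small standalone sketch of the two string forms. It assumes the old output was plain lowercase hex with no separators; the function names and the sample value are hypothetical and are not taken from the filetypes or Arrow code.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// formatUUIDPlain renders the 16 raw bytes as 32 hex characters with no
// separators (assumed here to resemble the pre-v2 output).
func formatUUIDPlain(u [16]byte) string {
	return hex.EncodeToString(u[:])
}

// formatUUIDDashed renders the 16 raw bytes in the canonical 8-4-4-4-12
// dashed form, which is what the new output looks like.
func formatUUIDDashed(u [16]byte) string {
	h := hex.EncodeToString(u[:])
	return h[0:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:32]
}

func main() {
	// Made-up sample UUID bytes.
	u := [16]byte{
		0x12, 0x3e, 0x45, 0x67, 0xe8, 0x9b, 0x12, 0xd3,
		0xa4, 0x56, 0x42, 0x66, 0x14, 0x17, 0x40, 0x00,
	}
	fmt.Println(formatUUIDPlain(u))  // 123e4567e89b12d3a456426614174000
	fmt.Println(formatUUIDDashed(u)) // 123e4567-e89b-12d3-a456-426614174000
}
```

Anything downstream that compares or parses these strings, such as a join against previously written files, would need to account for the changed form.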
Depends on cloudquery/plugin-sdk#783