Skip to content

Conversation

@hermanschaaf
Copy link
Member

@hermanschaaf hermanschaaf commented Apr 4, 2023

  • JSON
  • CSV
  • Parquet

In the parquet writer, we are converting extension columns to String. This is how it was also implemented before. One difference, however, is that UUID columns are now delimited with dashes, where before they were not. Timestamps now also use the default Arrow format, not the one we were using previously.

For these reasons, and because this is a fairly major internal change, we have decided to make this a breaking change.

Depends on cloudquery/plugin-sdk#783

@github-actions github-actions bot added the feat label Apr 4, 2023
@github-actions github-actions bot added feat and removed feat labels Apr 11, 2023
@github-actions github-actions bot added feat and removed feat labels Apr 13, 2023
@hermanschaaf hermanschaaf marked this pull request as ready for review April 13, 2023 16:07
@github-actions github-actions bot added feat and removed feat labels Apr 13, 2023
@hermanschaaf
Copy link
Member Author

hermanschaaf commented Apr 13, 2023

I still need to do a little bit more clean-up to fix conflicts in the go.mod and also create PRs for other repos (notably plugin-sdk) with changes that were necessary to make this work.

Mostly updated now. We will just need to bring cqmain on our Arrow branch up-to-date with the main on upstream Arrow so that the parquet file writer changes are included.

@github-actions github-actions bot added feat and removed feat labels Apr 13, 2023
Copy link
Member

@erezrokah erezrokah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. It seems most of the logic is implemented in the Arrow module.

As for the breaking changes, if those are easy to fix we can do them, otherwise we can make this a breaking change. Though I'm quite sure people will notice the change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to add a . gitattributes file with the following content:

*.csv diff

I think that should allow these to be shown in the diff UI

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried various ways but I think we can't force it--it's probably rendering it as binary because the files contain null bytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to change module github.com/cloudquery/filetypes to module github.com/cloudquery/filetypes/v2

@hermanschaaf hermanschaaf merged commit b4fb660 into main Apr 18, 2023
@hermanschaaf hermanschaaf deleted the arrow branch April 18, 2023 11:09
kodiakhq bot pushed a commit that referenced this pull request Apr 18, 2023
🤖 I have created a release *beep* *boop*
---


## [2.0.0](v1.6.2...v2.0.0) (2023-04-18)


### ⚠ BREAKING CHANGES

* Move to Arrow implementation ([#120](#120))

### Features

* Move to Arrow implementation ([#120](#120)) ([b4fb660](b4fb660))


### Bug Fixes

* **deps:** Update module github.com/cloudquery/plugin-sdk to v1.45.0 ([#123](#123)) ([a5deffa](a5deffa))
* **deps:** Update module golang.org/x/sys to v0.7.0 ([#113](#113)) ([bce7524](bce7524))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants