fix(deps): Upgrade SDK to 3.6.3, encode unsupported types as strings #161
kodiakhq[bot] merged 26 commits into main
Looks like arrow/csv only supports encoding certain types: https://github.com/cloudquery/arrow/blob/52e5d8283320da444d653d3e858b00b148cedba4/go/arrow/csv/common.go#L229

@disq Yes, Parquet as well, IIRC. I think we'll have to skip the tests for those types for now, do the conversion ourselves, or implement it upstream.
I'm attempting a |
```go
switch dt := t.(type) {
case *arrow.DayTimeIntervalType, *arrow.DurationType, *arrow.MonthDayNanoIntervalType, *arrow.MonthIntervalType: // unsupported in pqarrow
	return true
case *arrow.LargeBinaryType, *arrow.LargeListType, *arrow.LargeStringType: // not yet implemented in arrow
```
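A minimal, self-contained sketch of the approach in the PR title: detect types the writer can't encode natively and fall back to a string column. The types here (`DataType`, `DurationType`, `LargeStringType`, etc.) are hypothetical stand-ins for the Arrow types, and `fmt.Sprint` is an assumed textual encoding, not necessarily what the SDK uses.

```go
package main

import "fmt"

// DataType and the structs below are hypothetical stand-ins for Arrow data
// types; the real code switches on implementations such as *arrow.DurationType.
type DataType interface{ Name() string }

type StringType struct{}
type Int64Type struct{}
type DurationType struct{}
type MonthIntervalType struct{}
type LargeStringType struct{}

func (StringType) Name() string        { return "utf8" }
func (Int64Type) Name() string         { return "int64" }
func (DurationType) Name() string      { return "duration" }
func (MonthIntervalType) Name() string { return "month_interval" }
func (LargeStringType) Name() string   { return "large_utf8" }

// isUnsupported mirrors the shape of the type switch in the diff: it reports
// whether the writer cannot encode dt natively.
func isUnsupported(dt DataType) bool {
	switch dt.(type) {
	case DurationType, MonthIntervalType: // unsupported by the writer
		return true
	case LargeStringType: // not yet implemented
		return true
	default:
		return false
	}
}

// writerType returns the column type actually written: unsupported types
// fall back to a plain string column.
func writerType(dt DataType) DataType {
	if isUnsupported(dt) {
		return StringType{}
	}
	return dt
}

// encodeValue renders v as a string when its column type is unsupported and
// passes it through unchanged otherwise.
func encodeValue(dt DataType, v any) any {
	if isUnsupported(dt) {
		return fmt.Sprint(v)
	}
	return v
}
```

The trade-off is that a reader gets strings back for those columns, so round-trip tests have to compare against the string form rather than the original type.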
Ideally we should fall back to the non-large types (zero-copying the contents?) in convertschema and other places, but this was easier and less messy for now.
```go
var pqTestOpts = schema.TestSourceOptions{
	// persisted as timestamp[ms]:
	SkipTimestamps: true,
```
These types are all persisted as timestamp[ms], so we skip the test/compare logic for them.
Hmm, can't we handle that? It's pretty important to check that we handle all timestamps correctly 🤔 Also, are they persisted as timestamp[ms] or timestamp[us]?
All timestamps are "persisted" as timestamp[ms, tz=UTC] in pqarrow for some reason. Maybe it's Parquet's native timestamp format; I need to check.
@hermanschaaf There's a timestamp coercion feature in pqarrow which defaults to false, but timestamps still seem to be persisted only as ms.
There's also a special case that always coerces second-precision timestamps to ms:
```go
// the user implicitly wants timestamp data to retain its original time units,
// however the arrow seconds time unit cannot be represented in parquet, so must
// be coerced to milliseconds
if typ.Unit == arrow.Second {
	logicalType = arrowTimestampToLogical(typ, arrow.Millisecond)
}
```
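The special case above amounts to a small unit mapping plus a value rescale. This sketch uses a hypothetical `TimeUnit` enum rather than `arrow.TimeUnit`, and it models only the quoted rule (seconds become milliseconds), not the ms-everywhere behaviour reported in the thread:

```go
package main

// TimeUnit is a hypothetical stand-in for arrow.TimeUnit.
type TimeUnit int

const (
	Second TimeUnit = iota
	Millisecond
	Microsecond
	Nanosecond
)

// perSecond returns how many ticks of unit u make up one second.
func perSecond(u TimeUnit) int64 {
	switch u {
	case Second:
		return 1
	case Millisecond:
		return 1_000
	case Microsecond:
		return 1_000_000
	case Nanosecond:
		return 1_000_000_000
	default:
		panic("unknown time unit")
	}
}

// parquetUnit models only the quoted special case: Parquet has no
// second-precision timestamp, so seconds are stored as milliseconds.
func parquetUnit(u TimeUnit) TimeUnit {
	if u == Second {
		return Millisecond
	}
	return u
}

// convert rescales a raw timestamp value between units. Going to a coarser
// unit uses integer division, which truncates toward zero; overflow when
// going to a finer unit is not checked.
func convert(v int64, from, to TimeUnit) int64 {
	f, t := perSecond(from), perSecond(to)
	if t >= f {
		return v * (t / f)
	}
	return v / (f / t)
}
```

Seconds to milliseconds is lossless, but any coercion to a coarser unit (e.g. µs to ms) truncates, so sub-millisecond values don't round-trip exactly.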
🤖 I have created a release *beep* *boop*

---

## [3.0.1](v3.0.0...v3.0.1) (2023-05-25)

### Bug Fixes

* **deps:** Update module github.com/cloudquery/plugin-pb-go to v1.0.6 ([#157](#157)) ([1dccb3a](1dccb3a))
* **deps:** Update module github.com/cloudquery/plugin-pb-go to v1.0.8 ([#160](#160)) ([ba7c364](ba7c364))
* **deps:** Upgrade SDK to 3.6.3, encode unsupported types as strings ([#161](#161)) ([6b6e305](6b6e305))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).