test(table): geospatial test fixtures #1094
laskoviymishka
left a comment
Fixtures look good to me — pinned to a parquet-testing SHA, ASLv2 on both sides, deterministic contents. I’m fine landing them before the parser.
One ask before merge: can we wire regeneration through //go:generate? puffin/gen_dv_fixture.go + puffin/dv_golden_test.go is the closest precedent. No need to rewrite the generator in Go, just //go:generate bash gen_fixtures.sh is fine. This just makes fixture refresh discoverable via go generate ./....
I left a couple small portability nits inline, plus one README typo: Geopatial → Geospatial.
Thanks again for picking this up!
} >> "$README"

echo "Finished fetching parquet fixtures with geo data from parquet-testing repo"
Added a little message here so it is clear what happened when a user runs go generate.
package table

// Run the following command to generate the parquet fixtures with
// geo data for testing purposes: go generate ./table/...
//go:generate bash testdata/geo/gen_fixtures.sh
I put this in the table package since there is not a geo package. Realistically, once the geo package is merged and stabilized, it may make more sense to put it there. Once that is merged, feel free to ping me and I am happy to switch. I just didn't want to create an empty package for no reason that could cause merge conflicts with other open PRs.
Yes, that totally makes sense as a follow-up later.
Should be all set for review again now, thanks for the feedback!
Add geospatial test fixtures that can be used for validating that geospatial logic performs as expected. All fixtures come from the parquet-testing repository. I was previously unaware this repo existed, but I think this is the right data to use since it saves us from needing to write custom pyarrow code to generate them.
Note that these fixtures all use the native parquet geometry type, not the geoarrow encoding. I am pretty sure this is what we want, but if I am misinterpreting something let me know. My understanding is that geoarrow is not used in the native parquet type (but from reading other PRs like it seems like geoarrow-go is being used for some of the parse logic, not the storage encoding?). There is more data here https://github.com/geoarrow/geoarrow-data if we need more, some of which is _native.parquet, but since it does include a lot of geoparquet files with the geoarrow encoding, I did not use any of these.
I have not added any parse tests for this since my understanding is that the initial geospatial PRs here have not been merged yet. Given these are standard parquet test files, it can be assumed they will need to be able to be parsed and conform to the spec.
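Until real parse tests exist, a lightweight smoke check could at least confirm the fetched files are intact. The following is only a sketch under assumptions: it assumes the fixtures sit under testdata/geo with a .parquet extension, hasParquetMagic is a hypothetical helper name, and it only checks the PAR1 magic bytes that an unencrypted Parquet file starts and ends with. It does not decode any geometry.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// parquetMagic is the 4-byte marker that opens and closes an unencrypted Parquet file.
var parquetMagic = []byte("PAR1")

// hasParquetMagic reports whether data starts and ends with the Parquet magic
// bytes. Hypothetical helper for illustration; it does not parse the footer.
func hasParquetMagic(data []byte) bool {
	// Smallest plausible layout: magic + 4-byte footer length + magic.
	if len(data) < 12 {
		return false
	}
	return bytes.HasPrefix(data, parquetMagic) && bytes.HasSuffix(data, parquetMagic)
}

func main() {
	// Assumed fixture location and extension; adjust to the real layout.
	paths, err := filepath.Glob(filepath.Join("testdata", "geo", "*.parquet"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad glob pattern:", err)
		os.Exit(1)
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, "read failed:", p, err)
			os.Exit(1)
		}
		fmt.Printf("%s: parquet magic ok = %v\n", p, hasParquetMagic(data))
	}
}
```

A check like this would catch a truncated or HTML-error download from the fetch script, but it says nothing about whether the geometry columns parse correctly; that still needs the parser.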
Closes #1090