GDAL read is empty? #350

Closed
JimShady opened this issue Apr 19, 2023 · 20 comments · Fixed by #353

Comments

@JimShady

image

@JimShady
Author

But it's fine when using GDAL directly.

image

@JimShady
Author

Similar using the raster_to_grid reader:

image

@milos-colic
Contributor

@JimShady is the tif you are using publicly accessible?
I would like to reproduce the issue on my side and produce a fix.

@JimShady
Author

It is not publicly available unfortunately. Do you have a suggestion of a sample tif I could use instead to test if the issue persists?

@JimShady
Author

How about this file:

GHS_BUILT_V_E2020_GLOBE_R2022A_54009_1000_V1_0.tif

It gives me the same issue.

Available here:

https://ghsl.jrc.ec.europa.eu/download.php?ds=builtV

image

@JimShady
Author

Showing same error:

image

@JimShady
Author

Actually, this is probably a bad example file because I've just realised it's not using CRS 4326. Let me find another.

@JimShady
Author

sample_flood_file.zip

Here you go.

@JimShady
Author

image

@JimShady
Author

I presume at this point that I have something incorrectly configured, but I'm unsure what.

@milos-colic
Contributor

@JimShady thanks for sharing these, I will work on a fix today.
One thing you just raised is important and I will factor it in. The reader currently assumes the CRS is 4326, which is not documented and is too big an assumption. I will add automatic reprojection for other CRSs, and if the CRS ID is available in the metadata we will use it to translate to 4326 for indexing purposes.

@milos-colic
Contributor

@JimShady I am not able to reproduce.
I have used the file you shared and I am able to read it both with vsizip and if I unzip it.

Screenshot 2023-04-20 at 10 08 42

Could you confirm which Databricks Runtime you are using?

@JimShady
Author

image

@JimShady
Author

image

image

@JimShady
Author

Let me know if you need anything else @milos-colic

@JimShady
Author

Changed my DBR to 11.3, but still getting the same result.

@milos-colic
Contributor

I have now tested with the file in dbfs:/tmp as well.
I tried both 11.3 and 12.2 LTS, and both were successful.
The only parameter I could not set on my cluster is spark.databricks.passthrough.enabled.
I wonder if that is causing some files not to be listed when the reader lists files, since the output matches a read on an empty directory.

Could you try to run spark.read.format("binaryFile").option("pathGlobFilter", "*sample_flood_file.zip").load("dbfs:/tmp")?
This will run a bit longer than a usual read since it needs to list all the files in tmp.
I just want to confirm it isn't your passthrough that masks files from you.
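
For anyone repeating this check in a notebook, the call above can be wrapped as a small helper. This is only a sketch: the function name `list_matching_files` is hypothetical, and it assumes a `spark` session object like the one Databricks notebooks provide. It selects only the `path` and `length` columns of the `binaryFile` source so the file bytes are not pulled into the result.

```python
def list_matching_files(spark, directory, glob_pattern):
    """Return path and size for each file under `directory` that Spark can
    list and whose name matches `glob_pattern`.

    `spark` is assumed to be an active SparkSession (hypothetical helper,
    not part of Mosaic).
    """
    return (
        spark.read.format("binaryFile")
        .option("pathGlobFilter", glob_pattern)
        .load(directory)
        # Keep only metadata columns; `content` would hold the raw bytes.
        .select("path", "length")
    )


# Usage in a notebook (requires a live Spark session):
# list_matching_files(spark, "dbfs:/tmp", "*sample_flood_file.zip").show(truncate=False)
```

If the file is visible to the cluster, the result has one row for it; an empty result would point at the listing step rather than at the raster reader.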

@JimShady
Author

image

@JimShady
Author

Seems that it can see the file.

@milos-colic
Contributor

The issue is caused by errors in the documentation: its instructions aren't correct.

driverName option has to respect GDAL short driver names. For .tif files the correct driver name is GTiff.
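
As an illustration of the driver-name point, here is a minimal sketch that derives the value for driverName from a file suffix. The lookup table and the function name `gdal_driver_for` are hypothetical and deliberately tiny; the short names themselves (GTiff, netCDF, PNG, JPEG) come from GDAL's raster driver list.

```python
from pathlib import PurePosixPath

# Non-exhaustive lookup from file suffix to GDAL short driver name.
# Illustrative only; extend it from GDAL's driver list as needed.
GDAL_SHORT_NAMES = {
    ".tif": "GTiff",
    ".tiff": "GTiff",
    ".nc": "netCDF",
    ".png": "PNG",
    ".jpg": "JPEG",
}


def gdal_driver_for(path):
    """Return the GDAL short driver name for `path`, based on its suffix."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in GDAL_SHORT_NAMES:
        raise ValueError(f"no GDAL short driver name known for {suffix!r}")
    return GDAL_SHORT_NAMES[suffix]


driver_name = gdal_driver_for("dbfs:/tmp/sample_flood_file.tif")  # "GTiff"
```

The returned string is what would be passed as the driverName option, rather than the bare extension.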

A second issue affects mos.read().format("raster_to_grid"): the fileExtension option requires a wildcard. For tif files, tif won't work, but *.tif will.
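
To see why a bare extension matches nothing, the sketch below uses Python's `fnmatch` as a stand-in for glob matching. That the fileExtension value is applied as a glob over file names is an assumption based on the behaviour described here, and `as_glob` is a hypothetical helper, not a Mosaic function.

```python
from fnmatch import fnmatchcase


def as_glob(file_extension):
    """Turn 'tif' or '.tif' into '*.tif'; leave an existing glob untouched."""
    if any(ch in file_extension for ch in "*?["):
        return file_extension
    return "*." + file_extension.lstrip(".")


name = "sample_flood_file.tif"

# A pattern without a wildcard must equal the whole file name to match.
bare_matches = fnmatchcase(name, "tif")           # False
glob_matches = fnmatchcase(name, as_glob("tif"))  # True, pattern is "*.tif"
```

Under that assumption, passing tif filters out every file, which is consistent with the empty result reported in this issue.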

I will update the documentation with the correct callouts for these gotchas.

@milos-colic linked a pull request on Apr 21, 2023 that will close this issue