Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add possibility to save as .tif with compression #220

Closed
glaroc opened this issue Feb 18, 2020 · 19 comments · Fixed by #229
Closed

Add possibility to save as .tif with compression #220

glaroc opened this issue Feb 18, 2020 · 19 comments · Fixed by #229

Comments

@glaroc
Copy link
Contributor

glaroc commented Feb 18, 2020

We are producing a very large number of Circuitscape outputs and the .asc format is really not a good format for storing large rasters. It seems like it would be quite straightforward to add the possibility to save the outputs as .tif with LZW lossless compression using GDAL.jl. This would create files that are much smaller and more optimized for GIS viewing.

@ViralBShah
Copy link
Member

Is GDAL now available through BB?

@ViralBShah
Copy link
Member

ViralBShah commented Feb 18, 2020

A PR would be much appreciated, since we otherwise have a long list of things to address in Circuitscape and may not get around to this before JuliaCon.

@ranjanan
Copy link
Member

GDAL is available as an artifact: https://github.com/JuliaBinaryWrappers/GDAL_jll.jl

@glaroc
Copy link
Contributor Author

glaroc commented Feb 18, 2020

I'll try to do a PR, but I can't guarantee I'll have time soon. I'll try!

@ranjanan
Copy link
Member

@glaroc I'm happy to help along the way. Feel free to ping me here or on Slack if you have any questions.

@vlandau
Copy link
Member

vlandau commented Feb 24, 2020

If this is figured out, might make it low-hanging fruit to also allow tif files as input, which would help make workflows more efficient. +1 from me!

@glaroc
Copy link
Contributor Author

glaroc commented Feb 25, 2020

I think the easiest method would be to use the GeoArrays package to load the tif files, then copy the CRS from the input to create GeoArrays for the outputs and save them as tifs again with GeoArrays.write(). This workflow should be quite straightforward.

@glaroc
Copy link
Contributor Author

glaroc commented Feb 25, 2020

Note that this should also speed up i/o quite a bit, especially for large files. I just did a test by importing the same file as asc and geotiff:

@time ga2=readdlm("test.asc",Float32,skipstart=6)
55.028136 seconds (491.58 M allocations: 13.272 GiB, 11.95% gc time)
9524×13197 Array{Float32,2}:

@time ga = GeoArrays.read("test.tif")
3.126899 seconds (96 allocations: 988.900 MiB)
13197×9524×1 GeoArray{Float32}:

@vlandau
Copy link
Member

vlandau commented Feb 25, 2020

Is there an easy way of converting ga to a 2D Float64 array? If so it would be as easy as doing the file read as you have it, then doing a conversion, and leaving everything else in Circuitscape as-is. All of the functions in Circuitscape (and likely those from many of its dependencies) don't have methods for GeoArrays.

@glaroc
Copy link
Contributor Author

glaroc commented Feb 25, 2020

I'm not a Julia expert, but it should be as simple as:

@time ga2=Array{Float32}(ga)[:,:,1]'
25.763421 seconds (628.45 M allocations: 12.174 GiB, 15.61% gc time)
9524×13197 LinearAlgebra.Adjoint{Float32,Array{Float32,2}}:

There might be a better, more efficient way of doing it...?

@ViralBShah
Copy link
Member

ViralBShah commented Feb 25, 2020

That seems to kill a lot of the speed. Any idea why the conversion is so slow? Also may need a copy or something, since this is still an adjoint. I wonder if permutedims is faster.

@glaroc
Copy link
Contributor Author

glaroc commented Feb 25, 2020

I'm not sure why this is so slow. I guess it's just going from a 2x2xN array to a 2x2 array?

A simple solution is just to extract the first band. permutedims seems to work.

@time ga2=permutedims(ga[:,:,1])
22.075344 seconds (491.36 M allocations: 8.266 GiB, 13.30% gc time)
9524×13197 Array{Float32,2}

@vlandau
Copy link
Member

vlandau commented Feb 25, 2020

I'm 100% for adding support for tifs, so this is just playing devil's advocate...

One potential downside to accommodating tifs is that it requires users to have GDAL, which is a rather large dependency. Does Package.add("GeoArrays") also add GDAL to the host, or does it need to be installed manually? A manual install might prove a challenge for some users (but it could be easy enough to just link to instructions from the Circuitscape docs).

Many Circuitscape users are already "spatial" people, so many would probably already have GDAL installed for other applications (such as GIS packages in R). The added dependency might not really be a big deal at all -- and users probably won't mind the slight hassle of installing GDAL and accommodating the space it takes up if it means they no longer need to worry about file format conversion.

@vlandau
Copy link
Member

vlandau commented Feb 25, 2020

A GDAL dep could also open up a whole world of additional possibilities for new Circuitscape features.

@glaroc
Copy link
Contributor Author

glaroc commented Feb 25, 2020

@vlandau Since GDAL is available as a Binary Build (see BB discussion above), GDAL doesn't need to be compiled and installed separately. Just need to run

Pkg.add("GDAL_jll")

@vlandau
Copy link
Member

vlandau commented Feb 25, 2020

Awesome. I wasn't sure what a BB was -- I'm still a bit of a programming newbie! :)

@ViralBShah
Copy link
Member

ViralBShah commented Feb 25, 2020

Yes, it will all come automatically. We definitely don't want to use a GDAL on people's computers. With BB (BinaryBuilder), we provide precompiled binaries for all architectures. See: https://julialang.org/blog/2019/11/artifacts/

In fact GDAL_jll will just be a Circuitscape depenency. So people will get it automatically.

@ViralBShah
Copy link
Member

Awesome. I wasn't sure what a BB was -- I'm still a bit of a programming newbie! :)

BB is a very Julia specific acronym and a very recent one that only package authors mostly know about! We're going to get a website and more information on it out eventually...

@glaroc
Copy link
Contributor Author

glaroc commented Mar 10, 2020

Just published a PR that implements this using GeoArrays. Seems to work well with my files, but I didn't do in-depth testing.

Edit: I just cleaned up the PR and resubmitted it.

#227

@vlandau vlandau linked a pull request May 5, 2020 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants