Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

very slow geoid calculation #75

Open
alex-s-gardner opened this issue Jan 7, 2023 · 16 comments
Open

very slow geoid calculation #75

alex-s-gardner opened this issue Jan 7, 2023 · 16 comments

Comments

@alex-s-gardner
Copy link
Contributor

alex-s-gardner commented Jan 7, 2023

I recently issued a pull request to add geoid function to Geodesy.jl because the geoid conversion in Proj.jl is very slow for large numbers of points [~>500] relative to reading the geoid from a local file. I'm not sure if Geodesy.jl is the right home for such a function or if we just need to improve Proj.jl.

Here is a figure and the corresponding code that demonstrates just how slow using Proj is for geoid to ellipsoid conversions:

geoid_benchmark

# retrieve EGM2008 geoid heights ["EPSG:3855"] above WGS84 ellipsoid ["EPSG:4979"] 
# for `n random points

using GeoArrays, Proj

# file can be downloaded here: 
# https://s3-eu-west-1.amazonaws.com/download.agisoft.com/gtg/us_nga_egm2008_1.tif
geoid_file = "/Users/gardnera/data/geoids/us_nga_egm2008_1.tif"
n = [2, 10, 100, 5000, 10000];
time_local = zeros(size(n));
time_proj = zeros(size(n));

height1 = [];
height2 = [];
for i = eachindex(n)
    lat = (rand(n[i]) .- 0.5) .* 180;
    lon = (rand(n[i]) .- 0.5) .* 360;

    ## Approach 1: read directly from 1/60 degree geotiff
    begin
        t = time();
        # read [lazzily] in EGM2008 geoid as a GeoArray with 1 minute resolution
        
        egm2008 = GeoArrays.read(geoid_file);

        # find indices of points into cropped dem 
        xy = hcat(lon,lat);
        xy = [copy(r) for r in eachrow(xy)];
        ind = indices.(Ref(egm2008), xy);
        ind = [CartesianIndex((ij[1], ij[2])) for ij in ind];

        # extract values
        height1 = egm2008[ind];
        time_local[i] = time()-t;
    end;

    ## Approach 2: use Proj.jl
    begin
        t = time();
        # enamble network for geoid conversions
        Proj.enable_network!();

        # build transformation 
        trans = Proj.Transformation("EPSG:4326", "EPSG:3855", always_xy=true);

        # project points
        data = trans.(lon,lat,0);

        height2 = -getindex.(data, 3);
        time_proj[i] = time()-t;
    end;
end


using CairoMakie
# Check that both return same resolution
f = Figure()
ax = Axis(f[1, 1],
    title = "validation",
    xlabel = "local",
    ylabel = "Proj.jl"
)
scatter!(ax, height1, height2)

# plot time between appraoches
ax = Axis(f[1, 2],
    title = "timing",
    xlabel = "number of points",
    ylabel = "time [s]"
)
lines!(ax, n, time_local, label = "local")
lines!(ax, n, time_proj, label = "Proj.jl")
axislegend(ax)
f
@alex-s-gardner
Copy link
Contributor Author

Do we have any maintainers for Proj?

@visr
Copy link
Member

visr commented Jan 11, 2023

Creating the Proj.Transformation is expensive compared to projecting a point. There is no need to reconstruct it every time.

@alex-s-gardner
Copy link
Contributor Author

Creating the Proj.Transformation is expensive compared to projecting a point. There is no need to reconstruct it every time.

I'm not sure I follow the comment, Proj.Transformation is only generated once in the above example

@visr
Copy link
Member

visr commented Jan 11, 2023

But it's in for i = eachindex(n) right?

@visr
Copy link
Member

visr commented Jan 11, 2023

Oh nevermond I misunderstood

@yeesian
Copy link
Member

yeesian commented Jan 12, 2023

I guess it isn't a fair comparison to read from a file of precomputed values versus transforming it from points in a different coordinate system haha. One possibility is to reach out to the developers of the underlying PROJ library to see if they might consider adding to https://proj.org/resource_files.html? That way it'll be useful to PROJ users beyond JuliaGeo/Julialang. Update: @visr has a much better response in #75 (comment) and #75 (comment)

@visr
Copy link
Member

visr commented Jan 12, 2023

This is the pipeline you are using, with the proj definition:

Transformation pipeline
    description: axis order change (2D) + WGS 84 to EGM2008 height (1)
    definition: proj=pipeline step proj=unitconvert xy_in=deg xy_out=rad step inv proj=vgridshift grids=us_nga_egm08_25.tif multiplier=1 step proj=unitconvert xy_in=rad xy_out=deg step proj=axisswap order=2,1
    direction: forward

Note the grids=us_nga_egm08_25.tif part. If I'm not mistaken this grid is accessed remotely over the PROJ CDN. This is probably what is slow. The docs have some functions related to caching that could be worth looking into. But indeed a better comparison would be to let Proj also use the same local grid that you downloaded, and adjust the pipeline to have it use that. There are also configuration settings and grid cache functions that may help here. The defaults in proj.ini are:

cache_enabled = on
cache_size_MB = 300
cache_ttl_sec = 86400

Also broadcasting trans.(x,y) is probably not ideal, since I imagine it will open and close the grid once per point. It would probably help a lot to use the proj_trans_generic or proj_trans_array for efficiency.

None of those functions are part of the high level interface, but all are available in libproj.jl and can be used. I put some references below. I've never really done this so I'm not sure what the best way to do this is. Though figuring it out and documenting would be very helpful.

network
https://proj.org/resource_files.html
https://proj.org/community/rfc/rfc-4.html
https://proj.org/development/reference/functions.html#network-related-functionality

array
https://proj.org/development/reference/functions.html#c.proj_trans_generic
https://proj.org/development/reference/functions.html#c.proj_trans_array

Proj.jl/test/libproj.jl

Lines 215 to 268 in 86b8b09

@testset "dense 4D coord vector transformation" begin
source_crs = Proj.proj_create("EPSG:4326")
target_crs = Proj.proj_create("EPSG:28992")
trans = Proj.Transformation(source_crs, target_crs, always_xy = true)
# This array is mutated in place. Note that this array needs to have 4D elements,
# with 2D elements it will only do every other one
A = [Proj.Coord(5.39, 52.16) for _ = 1:5]
err = Proj.proj_trans_array(trans.pj, Proj.PJ_FWD, length(A), A)
@test err == 0
B = [Proj.Coord(155191.3538124342, 463537.1362732911) for _ = 1:5]
@test all(is_approx(a, b) for (a, b) in zip(A, B))
# since A is not in target_crs, we will get an error on a second call
err = Proj.PROJError("push: Invalid latitude")
@test_throws err Proj.proj_trans_array(trans.pj, Proj.PJ_FWD, length(A), A)
# error number is reset during PROJError construction
@test Proj.proj_errno(trans.pj) == 0
# test triggering finalizer manually
finalize(trans)
@test trans.pj === C_NULL
end
@testset "generic array transformation" begin
source_crs = Proj.proj_create("EPSG:4326")
target_crs = Proj.proj_create("EPSG:28992")
trans = Proj.Transformation(source_crs, target_crs, always_xy = true)
# inplace transformation of vector of 2D coordinates
# using https://proj.org/development/reference/functions.html#c.proj_trans_generic
A = [SA[5.39, 52.16] for _ = 1:5]
st = sizeof(first(A))
ptr = pointer(A)
n = length(A)
# 8 is sizeof(Float64), so ptr + 8 points to first latitude element
n_done = Proj.proj_trans_generic(
trans.pj,
Proj.PJ_FWD,
ptr,
st,
n,
ptr + 8,
st,
n,
C_NULL,
C_NULL,
C_NULL,
C_NULL,
C_NULL,
C_NULL,
)
@test n_done == n
B = [SA[155191.3538124342, 463537.1362732911] for _ = 1:5]
@test A B
end

@visr
Copy link
Member

visr commented Jan 12, 2023

Switching from using remote to local grid makes it 30x faster for me. Exactly how much will also depend on your network.

using Proj, PROJ_jll

# use projsync to get a local copy of the grid we need
run(`$(projsync()) --file us_nga_egm08_25.tif --user-writable-directory`);
# add the directory to the PROJ search path
user_writable_directory = Proj.proj_context_get_user_writable_directory(false)
Proj.proj_context_set_search_paths(2, [Proj.PROJ_DATA[], user_writable_directory])

Proj.enable_network!()
trans = Proj.Transformation("EPSG:4326", "EPSG:3855", always_xy=true)

n = 2000
n = 10000
lat = (rand(n) .- 0.5) .* 180;
lon = (rand(n) .- 0.5) .* 360;

# ensure the z changes, i.e. it found the grid
@assert trans(lon[1], lat[1], 0)[3] != 0

@time trans.(lon, lat, 0);
# 3.535632 seconds; n = 2k
# 17.051302 seconds; n = 10k

I haven't looked at the PROJ code, but I'm wondering where the remaining difference with you GeoArrays code is coming from. I thought that maybe it would help to make fewer ccalls by using proj_trans_array, but this did not result in a speedup.

A = [Proj.Coord(x, y) for (x, y) in zip(lon, lat)]
# proj_trans_array mutates A
Proj.proj_trans_array(trans.pj, Proj.PJ_FWD, length(A), A);

@visr
Copy link
Member

visr commented Jan 12, 2023

# read [lazzily] in EGM2008 geoid as a GeoArray with 1 minute resolution
egm2008 = GeoArrays.read(geoid_file)

It's worth noting that this is not a lazy read, it read the entire file into memory. Probably PROJ doesn't do that, which is a memory use vs speed tradeoff.

@alex-s-gardner
Copy link
Contributor Author

alex-s-gardner commented Jan 17, 2023

Closing as it looks like

Switching from using remote to local grid makes it 30x faster

I wonder if @visr example above should be added to the ReadMe?

@visr
Copy link
Member

visr commented Jan 17, 2023

Yes I think it would be good to add it to the readme, it also took me some time to find the right steps. Would you want to add it?

In the future maybe we can add some convenience functions around it, and store the resource files in a scratch space, for which I opened #76.

@alex-s-gardner
Copy link
Contributor Author

No problem, I'll add this to the ReadMe

@alex-s-gardner
Copy link
Contributor Author

alex-s-gardner commented Jan 17, 2023

@visr it seems that on both my Mac and my linux server I have issues:

Downloading from https://cdn.proj.org into /home/gardnera/.local/share/proj
Cannot open https://cdn.proj.org/files.geojson: Cert verify failed: BADCERT_NOT_TRUSTED
Cannot download files.geojson

I tested access from my browser and I can access the file so this does not appear to be a firewall issue. I see that you dealt with a similar issue in GDAL. Any idea what needs to be done here?

@visr
Copy link
Member

visr commented Jan 17, 2023

Hmm that's odd. Something similar to GDAL is done on loading Proj.jl:
https://github.com/JuliaGeo/Proj.jl/blob/v1.4.0/src/Proj.jl#L129-L133

Did you load Proj first? Because if you only loaded PROJ_jll, this initialization code is not run.

@alex-s-gardner
Copy link
Contributor Author

alex-s-gardner commented Jan 17, 2023

Not too sure what the issue is. Both packages were loaded before run(). I'm using:

Proj v1.4.0
PROJ_jll v900.100.0+0

the run() call looks like this:

julia> `$(projsync()) --file us_nga_egm08_25.tif --user-writable-directory`
setenv(`/home/gardnera/.julia/artifacts/8a643038d2adde781829b4467a32afa307e23b51/bin/projsync --file us_nga_egm08_25.tif --user-writable-directory`,["GDAL_DATA=/home/gardnera/anaconda2/envs/GDAL/share/gdal", "PATH=/home/gardnera/.julia/artifacts/1da0aa3d7598c95b9d7b3a96367b6a8501b6eae4/bin:/home/gardnera/.julia/artifacts/8793267ae1f4b96f626caa27147aa0218389c30d/bin:/home/gardnera/.julia/artifacts/d22cde7583df1d5f71160a8e4676955a66a91f33/bin:/home/gardnera/.julia/artifacts/8a643038d2adde781829b4467a32afa307e23b51/bin:/home/gardnera/anaconda2/envs/GDAL/bin:/home/gardnera/anaconda2/envs/GDAL/bin/aws:/home/gardnera/anaconda2/bin:/u/bylot0/gardnera/bin:/net/devon/mnt/devon2-r1/devon0/gardnera/.vscode-server/bin/97dec172d3256f8ca4bfb2143f3f76b503ca0534/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "CONDA_PROMPT_MODIFIER=(GDAL) ", "PWD=/home/gardnera/Documents/GitHub/ItsLivePlayground", "XDG_SESSION_CLASS=user", "DISPLAY=localhost:10.0", "TERM_PROGRAM=vscode", "VSCODE_GIT_ASKPASS_NODE=/net/devon/mnt/devon2-r1/devon0/gardnera/.vscode-server/bin/97dec172d3256f8ca4bfb2143f3f76b503ca0534/node", "XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop", "CPL_ZIP_ENCODING=UTF-8"  …  "_=/usr/local/bin/julia", "CONDA_DEFAULT_ENV=GDAL", "USER=gardnera", "CONDA_SHLVL=1", "PROJ_LIB=/home/gardnera/anaconda2/envs/GDAL/share/proj", "CONDA_EXE=/home/gardnera/anaconda2/bin/conda", "HOME=/home/gardnera", "TERM=xterm-256color", "TERM_PROGRAM_VERSION=1.74.3", "COLORTERM=truecolor"])

@visr visr reopened this Jan 17, 2023
@visr
Copy link
Member

visr commented Jan 18, 2023

Oh wait setting the path to the certificates in libproj only affects libproj, not applications like projsync. For those you have to set the PROJ_CURL_CA_BUNDLE environment variable, which you can do in julia like this:

withenv("PROJ_NETWORK" => "ON", "PROJ_CURL_CA_BUNDLE" => ca_roots()) do

The PROJ_NETWORK doesn't need to be set, since projsync always uses the network.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants