Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why do we use cloneGeom for every output? #196

Closed
rafaqz opened this issue Jan 1, 2024 · 14 comments · Fixed by #198
Closed

Why do we use cloneGeom for every output? #196

rafaqz opened this issue Jan 1, 2024 · 14 comments · Fixed by #198

Comments

@rafaqz
Copy link
Member

rafaqz commented Jan 1, 2024

We often seem to do this multiple times. Clearly we can remove some of these double clones. But do we need to clone at all? I cant see why...

This one seems like an obvious unnecessary copy:

LinearRing(cloneGeom(result, context), context)

Because then we copy again here:

new(cloneGeom(obj, context), context)

Removing these copies makes everything much faster. This is our GeometryOps.jl area method` (with a few branches checked out) on LG polygon. We don't need LibGEOS, its just for testing. But its a good example:

JuliaGeo/GeometryOps.jl#32

With cloneGeom

julia> @benchmark GO.area($pLG)
BenchmarkTools.Trial: 10000 samples with 151 evaluations.
 Range (min  max):  623.709 ns   5.664 ms  ┊ GC (min  max): 0.00%  5.21%
 Time  (median):       1.048 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):     2.340 μs ± 68.149 μs  ┊ GC (mean ± σ):  2.31% ± 0.08%

     █ ▄▅    ▃                                                  
  ▃▂▄█▇███▅▆▇█▇▄▃▃▃▄▃▄▃▄▄▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▂▂▂▂▂▂ ▃
  624 ns          Histogram: frequency by time         3.59 μs <

Removing the first cloneGeom:

julia> @benchmark GO.area($pLG)
BenchmarkTools.Trial: 10000 samples with 175 evaluations.
 Range (min  max):  438.794 ns   4.472 ms  ┊ GC (min  max): 0.00%  6.64%
 Time  (median):     757.680 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     2.201 μs ± 70.767 μs  ┊ GC (mean ± σ):  3.27% ± 0.10%

  █▅ ▇▅   ▁    ▁                                                
  ██▄██▃▂▇█▇▆▂▃█▇▆▄▃▂▃▃▃▂▃▅▅▄▄▄▄▄▄▅▅▅▅▄▄▄▄▃▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁ ▃
  439 ns          Histogram: frequency by time         1.73 μs <

 Memory estimate: 80 bytes, allocs estimate: 3.

And the first and second cloneGeom

julia> @benchmark GO.area($pLG)
BenchmarkTools.Trial: 10000 samples with 189 evaluations.
 Range (min  max):  391.788 ns   3.552 ms  ┊ GC (min  max): 0.00%  5.34%
 Time  (median):     661.643 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.391 μs ± 46.849 μs  ┊ GC (mean ± σ):  2.62% ± 0.08%

  █▄         ▂          ▁                                       
  ██▂▂▁▂▄▃▂▂▂██▅▄▅▃▅▄▇█████▇▆▅▆▅▄▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  392 ns          Histogram: frequency by time         1.16 μs <

 Memory estimate: 48 bytes, allocs estimate: 2.

And finally if I remove the one one the line above when a LineString rewraps
another LineString:

julia> @benchmark GO.area($pLG)
BenchmarkTools.Trial: 10000 samples with 337 evaluations.
 Range (min  max):  256.193 ns   10.465 μs  ┊ GC (min  max): 0.00%  96.32%
 Time  (median):     260.564 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   277.483 ns ± 167.507 ns  ┊ GC (mean ± σ):  1.02% ±  1.67%

  ▅█▄▅▄▄▃▂▁  ▁        ▂▁ ▂  ▁▃  ▂▁                              ▂
  ████████████▇▇██▇█▇█████▇▆██████▇▇█▇▅▅▇▇▃▅▄▇▃▃▄▃▆▃▁▁▃▇▃▁▁▄▄▅▅ █
  256 ns        Histogram: log(frequency) by time        418 ns <

 Memory estimate: 48 bytes, allocs estimate: 2.

We shouldn't change this one, but it shouldn't be happening in getgeom anyway.

@evetion
Copy link
Member

evetion commented Jan 2, 2024

Yeah, this is something I encountered as well and partially fixed for points, just like you in the other PR.

There are double inits, as getX already is a X. Most constructors do clone if given the chance.

Clonegeom is only needed on init with another existing geometry.

@evetion
Copy link
Member

evetion commented Jan 2, 2024

However, we must be careful. Your exteriorring instance must be cloned per https://libgeos.org/doxygen/geos__c_8h.html#afc5d54a2905996833747c2d566da2af3.

@evetion
Copy link
Member

evetion commented Jan 2, 2024

I will make a PR

@rafaqz
Copy link
Member Author

rafaqz commented Jan 2, 2024

I wondered... instead of cloning could we skip the finaliser on polygon rings?

The important thing is not to delete them?

(I guess you could get a ring then delete its parent polygon and trigger a segfault)

@evetion
Copy link
Member

evetion commented Jan 2, 2024

Are you proposing we leak memory? ;) Everything we create in GEOS needs to be destroyed again.

@rafaqz
Copy link
Member Author

rafaqz commented Jan 2, 2024

We arent creating anything with polygon rings, and theyre attached to the polygons finaliser. Right? Why else should we never delete them?

@evetion
Copy link
Member

evetion commented Jan 2, 2024

Ah, I misunderstood. You meant like a view or similar? The problem is that Julia doesn't know the ring is linked to the Polygon, so it could garbage collect the Polygon, invalidating the ring.

We could make structs like prepared geometry, linking back to the Polygon object?

Anyway, not sure how much performance you gain by it.

@rafaqz
Copy link
Member Author

rafaqz commented Jan 2, 2024

I think its more than half the cost of loading a polygon. Probably we would need an GI.unsafe_getgeom method where users promise the objects dont escape and we GC.preserve the polygon around a closure.

One day when everything else is done 😅

@evetion
Copy link
Member

evetion commented Jan 2, 2024

Let's not burden GI with unsafe, but let's see what I can make with a GC.preserve, never used it extensively.

@evetion
Copy link
Member

evetion commented Jan 2, 2024

Well, with the following code (including a third argument to the constructor to run a finalizer or not), the tests run, but I can trigger segfaults. Any ideas?

GC.@preserve obj begin
    return LinearRing(result, context, false)
end

But this segfaults:

julia> import LibGEOS as LG

julia> ring = LG.exteriorRing(pLG)^C

julia> pLG = LG.Polygon([
           [
               [0.0, 5.0], [2.0, 2.0], [5.0, 2.0], [2.0, -2.0], [5.0, -5.0],
               [0.0, -2.0], [-5.0, -5.0], [-2.0, -2.0], [-5.0, 2.0], [-2.0, 2.0],
               [0.0, 5.0],
           ],
       ])
POLYGON ((0 5, 2 2, 5 2, 2 -2, 5 -5, 0 -2, -5 -5, -2 -2, -5 2, -2 2, 0 5))

julia> ring = LG.exteriorRing(pLG)
LINEARRING (0 5, 2 2, 5 2, 2 -2, 5 -5, 0 -2, -5 -5, -2 -2, -5 2, -2 2, 0 5)

julia> pLG = nothing

julia> GC.gc()

julia> ring
# Segfaults

@rafaqz
Copy link
Member Author

rafaqz commented Jan 2, 2024

Thats why we need a method with a promise the contents dont escape the closure! I dont think it can work otherwise

@evetion
Copy link
Member

evetion commented Jan 2, 2024

Or something like this, with AbstractLinearRing replacing LinearRing in most signatures.

abstract type AbstractLinearRing <: AbstractGeometry end

struct LinearRingView <: AbstractLinearRing
    ptr::GEOSGeom
    context::GEOSContext
    parent::AbstractGeometry
    function LinearRingView(
        obj::GEOSGeom,
        parent::AbstractGeometry,
        context::GEOSContext = get_global_context(),
    )
        new(obj, context, parent)
    end
end

mutable struct LinearRing <: AbstractLinearRing

@rafaqz
Copy link
Member Author

rafaqz commented Jan 2, 2024

Wont it still segfault if you delete the polygon?

@evetion
Copy link
Member

evetion commented Jan 2, 2024

Nope, because the polygon is linked in the parent field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants