Calling Tables.istable is too slow if it's not actually a table #322

Open
rafaqz opened this issue Jan 22, 2023 · 1 comment


rafaqz commented Jan 22, 2023

Calling Tables.istable ends up at hasmethod as a fallback (via TableTraits.jl and IteratorInterfaceExtensions.jl).

This is super slow. It can be seen as the big chunk in blue/green in this flame graph. The rest of the graph is rasterizing a whole vector of polygons, which in this case is not a table.

[Image: flame graph screenshot, 2023-01-22-224421_1920x1080]

Is there a way to check if something is a table without this fallback?

The use case here is that the object can be either a table with the target column (geometries) plus some extra columns we may use, or just some iterator of geometries. I don't know a way to separate these other than istable, but the iterators incur this overhead precisely because they are not tables.
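To make the distinction concrete, here is a minimal sketch in plain Julia of a type-level trait next to a method-existence check. All names (`GeomTable`, `GeomIterator`, `rows_of`, `is_table_static`, `is_table_dynamic`) are hypothetical and are not the Tables.jl implementation; the second function only mimics the general shape of a `hasmethod`-style fallback.

```julia
# Hypothetical types for illustration only (not from Tables.jl or Rasters.jl).
struct GeomTable
    geometry::Vector{Int}
end
struct GeomIterator
    geoms::Vector{Int}
end

# Type-level trait: answered purely by dispatch on the type,
# so the compiler can usually resolve it ahead of time.
is_table_static(::Type) = false
is_table_static(::Type{GeomTable}) = true
is_table_static(x) = is_table_static(typeof(x))

# Method-existence check: asks the method table at run time
# whether `rows_of` has a method applicable to this argument type.
rows_of(t::GeomTable) = t.geometry
is_table_dynamic(x) = hasmethod(rows_of, Tuple{typeof(x)})
```

Both functions give the same answer for these two types; the difference is only in how the answer is obtained, which is what matters when the check sits inside a hot loop.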

@rafaqz rafaqz changed the title from "Calling Tables.istable is too slow" to "Calling Tables.istable is too slow if it's not actually a table" Jan 22, 2023
@rafaqz rafaqz closed this as completed Jan 22, 2023
@rafaqz rafaqz reopened this Jan 22, 2023

rafaqz commented Jan 22, 2023

The problem here seems to be that I'm iterating over an object and each of its contents is passed to a method that again checks istable.

It benchmarks at 160 ns per call, which is not much at all if you just do it once. It's just too much to be used as a guard when iterating over lots of things. I had assumed it was a type-level check, so it would essentially be free.

Actually istable was taking 40 μs per call inside the function! I'm not sure why it benchmarks faster in the REPL or what the interaction is.

My solution is to use GeoInterface.jl traits to filter objects first, because they are compile-time traits. Some iterators will still be slow. If multiple packages had traits that check whether methods exist, this would start to be a problem.
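The guard ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the Rasters.jl code: `Polygon`, `SimpleTable`, `is_geometry`, `table_rows`, `slow_istable`, and `classify` are all hypothetical names, with `is_geometry` standing in for a compile-time trait and `slow_istable` standing in for a check that falls back to `hasmethod`. A counter records how often the slow path is taken.

```julia
# Hypothetical stand-ins for a geometry and a table type.
struct Polygon end
struct SimpleTable end

# Cheap, dispatch-based trait (stand-in for a compile-time geometry trait).
is_geometry(::Type) = false
is_geometry(::Type{Polygon}) = true
is_geometry(x) = is_geometry(typeof(x))

# Slow, run-time check (stand-in for a hasmethod-based fallback).
const SLOW_CALLS = Ref(0)
table_rows(::SimpleTable) = ()
function slow_istable(x)
    SLOW_CALLS[] += 1  # count how often the slow path runs
    return hasmethod(table_rows, Tuple{typeof(x)})
end

# Test the cheap trait first; the slow check runs only for non-geometries.
function classify(x)
    is_geometry(x) && return :geometry
    slow_istable(x) && return :table
    return :iterator
end
```

With this ordering, classifying a vector of `Polygon` values never reaches `slow_istable`, while a `SimpleTable` or a plain `Vector` still pays for it once per call.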

Some benchmarks (of rasterize in Rasters.jl). This run has 4 calls to istable, always false:

julia> @benchmark rasterize(sum, $polygons; res=5, fill=1, boundary=:center)
BenchmarkTools.Trial: 8682 samples with 1 evaluation.
 Range (min … max):  489.019 μs … 109.038 ms  ┊ GC (min … max): 0.00% … 69.94%
 Time  (median):     513.876 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   578.621 μs ±   2.509 ms  ┊ GC (mean ± σ):  7.34% ±  1.68%

                        ▁▆█▅▂                     ▂▄             
  ▁▁▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▁▁▁▁▃█████▅▄▃▄▅▅▄▃▃▂▃▃▃▃▃▃▂▂▂▂▅██▇▃▂▂▂▂▃▂▂▁▁ ▃
  489 μs           Histogram: frequency by time          541 μs <

 Memory estimate: 39.75 KiB, allocs estimate: 955.

Putting istable checks last so they aren't called in this case:

julia> @benchmark rasterize(sum, $polygons; res=5, fill=1, boundary=:center)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  149.697 μs … 120.447 ms  ┊ GC (min … max):  0.00% … 70.26%
 Time  (median):     156.504 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   193.400 μs ±   2.044 ms  ┊ GC (mean ± σ):  12.80% ±  1.21%

                  ▄▇██▆▃▁                                        
  ▁▁▁▁▂▂▁▁▁▁▁▁▂▃▄████████▇▅▄▃▃▂▂▂▂▃▃▃▃▄▄▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  150 μs           Histogram: frequency by time          170 μs <

 Memory estimate: 33.25 KiB, allocs estimate: 827.

Seems more like 80 μs per call.

This is on Julia 1.9.0-beta-2, with Tables.jl v1.10.0

Edit: there was still one istable call left in the run above. Now with none at all:

julia> @benchmark rasterize(sum, $polygons; res=5, fill=1, boundary=:center)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   86.809 μs … 76.460 ms  ┊ GC (min … max):  0.00% … 65.67%
 Time  (median):      91.758 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   119.805 μs ±  1.306 ms  ┊ GC (mean ± σ):  12.39% ±  1.14%

  ▂▇█▇▅▂▁▂▃▃▃▃▃▃▄▃▃▂▂▁▁▁▁▁▁▁▂▁   ▂▁  ▁▂▂ ▂▄▄▄▄▃▂▂▂▂▁▁          ▂
  █████████████████████████████▇████████████████████████▇▇▆▅▅▅ █
  86.8 μs       Histogram: log(frequency) by time       124 μs <

 Memory estimate: 33.25 KiB, allocs estimate: 827.
