
Use Toeplitz-dot-Hankel for very large dimensional transforms #209

Open
dlfivefifty opened this issue Mar 27, 2023 · 4 comments
Comments

@dlfivefifty
Member

The Toeplitz-dot-Hankel plans have better complexity, so they should be the default when arrays are large. E.g. for cheb2leg, even n = 1000 sees a significant speedup:

julia> n = 100; x = randn(n); @time cheb2leg(x); @time th_cheb2leg(x);
  0.000267 seconds (2 allocations: 944 bytes)
  0.000654 seconds (98 allocations: 288.531 KiB)

julia> n = 1000; x = randn(n); @time cheb2leg(x); @time th_cheb2leg(x);
  0.028686 seconds (2 allocations: 7.984 KiB)
  0.006464 seconds (99 allocations: 10.559 MiB)

julia> n = 1000; x = randn(n); @time cheb2leg(x); @time th_cheb2leg(x);
  0.028856 seconds (2 allocations: 7.984 KiB)
  0.011597 seconds (99 allocations: 10.559 MiB)

julia> n = 10_000; x = randn(n); @time cheb2leg(x); @time th_cheb2leg(x);
  0.778423 seconds (3 allocations: 78.219 KiB)
  0.103821 seconds (108 allocations: 799.524 MiB)
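A minimal sketch of the kind of size-based dispatch this suggests, using the cheb2leg and th_cheb2leg calls from the timings above; the crossover of 512 and the wrapper name cheb2leg_auto are placeholders, not part of FastTransforms.jl:

using FastTransforms

# Use the direct plan for small vectors and the Toeplitz-dot-Hankel route
# for large ones; 512 is an illustrative threshold, not a measured crossover.
cheb2leg_auto(x::AbstractVector) = length(x) < 512 ? cheb2leg(x) : th_cheb2leg(x)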

This default was changed prematurely once before but reverted because of some regressions. The number of allocations in th_* is also exorbitant, probably because it dates back to a port of Matlab code.

For matrices I'm not seeing much improvement from th_*, even for a 40k x 10 transform, which is suspicious... but profiling shows that all the time is in the FFTs.
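For reference, one way such a profile can be taken with standard Julia tooling (the 40k x 10 size follows the comment above):

using FastTransforms, Profile

X = randn(40_000, 10)
th_cheb2leg(X)                  # warm up / compile
@profile th_cheb2leg(X)
Profile.print(format = :flat, sortedby = :count)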

@MikaelSlevinsky
Member

It's true there has been a significant regression in the lib plans (there are a couple of reasons for this, not all of them justified).

@MikaelSlevinsky
Member

There's a library method size_t X(summary_size_tb_eigen_FMM)(X(tb_eigen_FMM) * F); to get the number of bytes of the plans. The ccall isn't set up yet, but it may be interesting for comparison's sake re: allocations.
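A rough sketch of what that ccall could look like, assuming the double-precision expansion of X(summary_size_tb_eigen_FMM) is the symbol ft_summary_size_tb_eigen_FMM and that the plan is held as a raw pointer; both assumptions should be checked against libfasttransforms before use:

using FastTransforms_jll: libfasttransforms

# Returns the number of bytes occupied by a tb_eigen_FMM plan.
plan_nbytes(F::Ptr{Cvoid}) =
    ccall((:ft_summary_size_tb_eigen_FMM, libfasttransforms), Csize_t, (Ptr{Cvoid},), F)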

@jishnub
Member

jishnub commented Mar 28, 2023

Most of the allocation seems to come from

C = Matrix{T}(undef, n, n)

which allocates an n x n matrix, most of whose columns are later discarded.

This may be re-thought: e.g. in the n = 10_000 example, only 39 columns are eventually retained.

@dlfivefifty
Member Author

Ah thanks, that should be an easy fix: we can just change it to 100 columns for now (we know the number of columns grows logarithmically, so this limit will probably never be reached).
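A sketch of that easy fix, reusing the n and T from the snippet above (the cap of 100 follows the comment; this is not the package's actual code):

ncols = min(n, 100)             # column count grows only logarithmically in n
C = Matrix{T}(undef, n, ncols)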
