Skip to content

Bubbler-4/cg263748

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A fast, accurate integer convolution

This is an entry for Code Golf challenge Compute convolution quickly and accurately.

The algorithm used is Number Theoretic Transform, which is a variation of FFT that does not suffer from floating-point inaccuracies. The modulo used is P = 71 * 2^57 + 1 which is slightly larger than 10^19, and I devised a specialized a * b % P function here, which gave ~4x speedup for the entire convolution compared to (a as u128 * b as u128 % P as u128) as u64.

The main entrypoint for the algorithm is solve_ntt.

Additionally, three variants of the same algorithm is implemented as solve_ntt_par, solve_ntt_par2, and solve_ntt_par3, which are some attempts at applying data-independent parallelism using rayon. (Trying to parallelize individual butterflying turns out to be harmful, apparently because it causes much more cache misses. Will it perform better if cache-line-sized chunks are used instead?)

Use cargo bench to run benchmarks locally.

I don't claim this to be the fastest code that supports up to 10^19 in the result of convolution. Even the core algorithm itself (Cooley-Tukey) can be substituted for different ones such as Stockham for significant improvement.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages