Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idea: Lazy/faster randperm from FPE #33067

Open
Keno opened this issue Aug 25, 2019 · 13 comments
Open

Idea: Lazy/faster randperm from FPE #33067

Keno opened this issue Aug 25, 2019 · 13 comments
Labels
domain:randomness Random number generation and the Random stdlib performance Must go faster status:help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@Keno
Copy link
Member

Keno commented Aug 25, 2019

I was using randperm, but I didn't want to store the whole permutation all the time since I only accessed it rarely and the permutation was fairly large. I did some research into whether it would be possible to obtain a random permutation, but generate the indices lazily (or alternatively have a compact encoding for the random permutation so that it can be randomly queried). Turns out there's good solutions to this problem from the crypto community under the headline of "Format-preserving encryption" (FPE). Basically what an FPE scheme provides is a pseudorandom-permutation on message S^n where S is the alphabet and n is the message length. Now, this doesn't map exactly to what we need, but it's fairly easy to do so using cycle walking. Take S={0,1}, n=ceil(log2(N)) (where N is the input to randperm). Then repeatedly apply the fpe until you get something <N. By construction the probability of that happening is >50%, so one should get there fairly quickly (fancier implementations might chose a better radix to improve the probability). NIST is in the process of standardizing FPE schemes [1] and both current schemes are AES based, so they should be quite speedy on modern hardware with AES intrinsics in the hardware. Since the existing randperm needs to do a lot of memory chases to generate its permutations, I wouldn't be surprised if an FPE-based randperm was faster than the current implementation for large n (as well as allowing lazy and sparse accesses).

I'm not sure whether this belongs in Random or in a separate package and I don't have much time to play with it at the moment, but I don't think it'd be super hard to do and thought it would be a neat performance experiment at the very least.

[1] https://csrc.nist.gov/publications/detail/sp/800-38g/rev-1/draft

@Keno Keno added performance Must go faster status:help wanted Indicates that a maintainer wants help on an issue or pull request domain:randomness Random number generation and the Random stdlib labels Aug 25, 2019
@ViralBShah
Copy link
Member

I suggest putting it in a package for now.

@Keno
Copy link
Member Author

Keno commented Aug 25, 2019

Yes, as a start, so we can benchmark it, definitely.

@narendrakpatel
Copy link
Contributor

Hi. I am interested in contributing to this. How can I help?

@Keno
Copy link
Member Author

Keno commented Aug 30, 2019

I think the first step is probably to get a good pure-julia implementation of AES. You may want to collaborate with @chethega on that one (cf #32954 - which could use a fast AES-based RNG).

@narendrakpatel
Copy link
Contributor

@kanav99 has already done a fast pure-julia implementation of AES. Maybe we can use that.

@kanav99
Copy link

kanav99 commented Aug 30, 2019

Hi all! I have a locally stored package and optimized version of AES with proper multiple levels of caching. If you work on making a new package, I can make patches along with @narendrakpatel

@ViralBShah
Copy link
Member

Publish it on github?

@kanav99
Copy link

kanav99 commented Aug 31, 2019

Yes sure, I will make the package in publishable condition and post the link here ASAP

@ParadaCarleton
Copy link
Contributor

Hi, was this ever implemented? We have a very good use case for this in QuasiMonteCarlo.jl!

@ParadaCarleton
Copy link
Contributor

It looks like cryptography isn't necessary for this--see here and here.

@Keno
Copy link
Member Author

Keno commented Jan 7, 2023

It looks like cryptography isn't necessary for this--see here and here.

It's the same technique but with a weak pseudorandom permuation. Kesler even cites FPE in the original paper. It'll be good to have options. The kesler one is probably faster but the AES one is probably less biased.

@ParadaCarleton
Copy link
Contributor

The kesler one is probably faster but the AES one is probably less biased.

I wouldn't be so sure--AES is very fast thanks to the AES-NI instructions. Actually, it happens to be about as fast as Xoshiro on my laptop (5 vs 6 ns). The only problem is FPE might be slow, since it would require using AES as a round function in a numeric (non-binary) Feistel cipher, and manipulating a vector of digits might be slow. (Assuming that's the most efficient way to represent a number in a base other than binary?)

@sunoru
Copy link
Contributor

sunoru commented Mar 5, 2023

I'm interested to work on this. Currently, I have created a package AESNI.jl to provide AES block ciphers by using AES-NI (which was internally used in Random123.jl)

And then another package FormatPreservingEncryption.jl is built on top of it.

I didn't use AES.jl because it doesn't use AES-NI intrinsics and seems a little difficult to extend to include this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:randomness Random number generation and the Random stdlib performance Must go faster status:help wanted Indicates that a maintainer wants help on an issue or pull request
Projects
None yet
Development

No branches or pull requests

6 participants