-
-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libafl_bolts: some improvements to the rands
module
#2086
Conversation
Seeding with splitmix64 is a good way to avoid starting with low-entropy PRNG states, and is explicitly recommended by the authors of both xoshiro256++ and Romu. While at it, give the xoshiro256++ PRNG its proper name.
SFC64 is a well-established and well-understood PRNG designed by Chris Doty-Humphrey, the author of PractRand. It has been tested quite a lot over the years, and to date has no known weaknesses. Compared to xoshiro256++, it is slightly faster and is likely to be a more future-proof design (xoshiro/xoroshiro family of generators come with quite long history of [flaws][1] found over the years). Compared to Romu, it is slightly slower, but guarantees absense of bias, minimum period of at least 2^64 for any seed, and non-overlapping streams for different seeds. [1]: https://tom-kaitchuck.medium.com/designing-a-new-prng-1c4ffd27124d
#[must_use] | ||
pub fn fast_bound(rand: u64, n: u64) -> u64 { | ||
debug_assert_ne!(n, 0); | ||
let mul = u128::from(rand).wrapping_mul(u128::from(n)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know where I left the original comment, but: would u128 be an issue on 32 bit systems?
Also, this method should maybe be #[inline]
(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler will inline it if it's good to do so, no worries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check if #[inline]
can help with cross-crate inlining without requiring LTO.
Regarding speed, how serious are we about 32-bit performance? Most of the PRNGs in the module already assume 64-bit system (and Lehmer64 even uses an 128x64 multiply).
What we can do is change the signature to fast_bound(rand: u64, n: usize)
, and on 32-bit system fall back to full-width 32x32 multiply (as suggested by Lemire), instead of 64x64 like we do on 64-bit. This will give us the maximum possible speed on 32-bit systems, at the expense of more bias. I'd argue that this amount of bias is not a problem; usually n
is much smaller than 2^32
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like a 32 bit fallback since people sometimes run things on arm or even x86 32 bit code
Looks good overall! Just a stupid question: aren't the canges to floats for probabilities potentially slower than integers everywhere? |
#[must_use] | ||
pub fn with_seed(seed: u64) -> Self { | ||
Self { val: seed } | ||
let mut rand = XkcdRand { val: 0 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually use Self
in this case. Also, I don't get the reason for this specific change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this change is to ensure that regardless of the seeding, the output of next_float()
or below()
is reasonable (and changes with changes to the seed). With direct seeding without splitmix
e.g. below(2^32)
will return 0 for all 32-bit bit seeds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't the output of next_float
and below
always be the same for XkcdRand anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is how the output varies with different seeds. With direct seeding:
seed 0 => below(100) = 0
seed 1 => below(100) = 0
seed ... => below(100) = 0
seed 2^32 => below(100) = 0
This makes it impossible to find a simple seed that gives the below(100)
value you are aiming for. With splitmix, conceptually:
seed 0 => below(100) = 37
seed 1 => below(100) = 5
seed ... => below(100) = 81
seed 2^32 => below(100) = 17
Without splitmix-ed XkcdRand
, 2e53c99 would have to change its test usage to another PRNG with more robust seeding. If that is preferrable, I can leave XkcdRand
as-is and change the test instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XkcdRand
is a joke type... https://xkcd.com/221/
let mut romu = RomuDuoJrRand::with_seed(1); | ||
let mut lehmer = Lehmer64Rand::with_seed(1); | ||
let mut romu_trio = RomuTrioRand::with_seed(1); | ||
|
||
c.bench_function("sfc64", |b| b.iter(|| black_box(sfc64.next()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random question, how does this guy do in the benchmark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sub-nanosecond u64
generation on modern machines (I am fighting with turbo boost on my laptop, so can't get stable numbers to report right now). Slower than Romu
(which in general is considered risky due to having no guarantees regarding bias or cycle length), but faster than xoshiro256++
(which is among the fastest widely-used general-purpose PRNGs today).
I'd say this is as fast as you can go without sacrificing something (like Romu
and Lehmer64
do).
If we are realistically targeting machines with no FPU, then yes. If that is the case, I can revert this back to integer percentages. My main reason to go with floats was that they are more natural to use and more convenient, especially when computing the probabilities. |
Think this looks good for now! if you have more things you want to change (like, 32 bit fallbacks or similar), feel free to open a new PR. Thx! :) |
First, I'd like to thank you for an excellent library!
I started playing around with libafl and reading the source code, and quickly noticed that there are some things that could be improved in the
rands
module oflibafl_bolts
. I've tried to make the commits in this PR relatively self-contained:Would gladly discuss and rework anything that raises concerns.