Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libafl_bolts: some improvements to the rands module #2086

Merged
merged 7 commits into from
Apr 23, 2024

Conversation

flyingmutant
Copy link
Contributor

First, I'd like to thank you for an excellent library!

I started playing around with libafl and reading the source code, and quickly noticed that there are some things that could be improved in the rands module of libafl_bolts. I've tried to make the commits in this PR relatively self-contained:

Would gladly discuss and rework anything that raises concerns.

Seeding with splitmix64 is a good way to avoid starting with
low-entropy PRNG states, and is explicitly recommended
by the authors of both xoshiro256++ and Romu.

While at it, give the xoshiro256++ PRNG its proper name.
SFC64 is a well-established and well-understood PRNG designed by
Chris Doty-Humphrey, the author of PractRand. It has been tested
quite a lot over the years, and to date has no known weaknesses.

Compared to xoshiro256++, it is slightly faster and is likely to
be a more future-proof design (xoshiro/xoroshiro family of generators
come with quite long history of [flaws][1] found over the years).

Compared to Romu, it is slightly slower, but guarantees absense
of bias, minimum period of at least 2^64 for any seed, and
non-overlapping streams for different seeds.

[1]: https://tom-kaitchuck.medium.com/designing-a-new-prng-1c4ffd27124d
#[must_use]
pub fn fast_bound(rand: u64, n: u64) -> u64 {
debug_assert_ne!(n, 0);
let mul = u128::from(rand).wrapping_mul(u128::from(n));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where I left the original comment, but: would u128 be an issue on 32 bit systems?
Also, this method should maybe be #[inline](?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler will inline it if it's good to do so, no worries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check if #[inline] can help with cross-crate inlining without requiring LTO.

Regarding speed, how serious are we about 32-bit performance? Most of the PRNGs in the module already assume 64-bit system (and Lehmer64 even uses an 128x64 multiply).

What we can do is change the signature to fast_bound(rand: u64, n: usize), and on 32-bit system fall back to full-width 32x32 multiply (as suggested by Lemire), instead of 64x64 like we do on 64-bit. This will give us the maximum possible speed on 32-bit systems, at the expense of more bias. I'd argue that this amount of bias is not a problem; usually n is much smaller than 2^32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like #[inline] should probably be added to everything non-generic performance-critical in the rands module, will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like a 32 bit fallback since people sometimes run things on arm or even x86 32 bit code

@domenukk
Copy link
Member

Looks good overall! Just a stupid question: aren't the canges to floats for probabilities potentially slower than integers everywhere?

#[must_use]
pub fn with_seed(seed: u64) -> Self {
Self { val: seed }
let mut rand = XkcdRand { val: 0 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use Self in this case. Also, I don't get the reason for this specific change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this change is to ensure that regardless of the seeding, the output of next_float() or below() is reasonable (and changes with changes to the seed). With direct seeding without splitmix e.g. below(2^32) will return 0 for all 32-bit bit seeds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't the output of next_float and below always be the same for XkcdRand anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is how the output varies with different seeds. With direct seeding:

seed 0    => below(100) = 0
seed 1    => below(100) = 0
seed ...  => below(100) = 0
seed 2^32 => below(100) = 0

This makes it impossible to find a simple seed that gives the below(100) value you are aiming for. With splitmix, conceptually:

seed 0    => below(100) = 37
seed 1    => below(100) = 5
seed ...  => below(100) = 81
seed 2^32 => below(100) = 17

Without splitmix-ed XkcdRand, 2e53c99 would have to change its test usage to another PRNG with more robust seeding. If that is preferrable, I can leave XkcdRand as-is and change the test instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XkcdRand is a joke type... https://xkcd.com/221/

let mut romu = RomuDuoJrRand::with_seed(1);
let mut lehmer = Lehmer64Rand::with_seed(1);
let mut romu_trio = RomuTrioRand::with_seed(1);

c.bench_function("sfc64", |b| b.iter(|| black_box(sfc64.next())));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random question, how does this guy do in the benchmark?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sub-nanosecond u64 generation on modern machines (I am fighting with turbo boost on my laptop, so can't get stable numbers to report right now). Slower than Romu (which in general is considered risky due to having no guarantees regarding bias or cycle length), but faster than xoshiro256++ (which is among the fastest widely-used general-purpose PRNGs today).

I'd say this is as fast as you can go without sacrificing something (like Romu and Lehmer64 do).

@flyingmutant
Copy link
Contributor Author

Looks good overall! Just a stupid question: aren't the canges to floats for probabilities potentially slower than integers everywhere?

If we are realistically targeting machines with no FPU, then yes. If that is the case, I can revert this back to integer percentages. My main reason to go with floats was that they are more natural to use and more convenient, especially when computing the probabilities.

@domenukk domenukk merged commit e1b8c9b into AFLplusplus:main Apr 23, 2024
97 of 98 checks passed
@domenukk
Copy link
Member

Think this looks good for now! if you have more things you want to change (like, 32 bit fallbacks or similar), feel free to open a new PR. Thx! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants