Different results in AFLGo and AFLFast papers #20

Closed
karl-fuzz-101 opened this issue Apr 25, 2018 · 5 comments

Comments

@karl-fuzz-101

I recently read your AFLFast and AFLGo papers and noticed that some of the experimental results on binutils differ between the two.

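Here are the relevant result tables from the two papers:
[Screenshot: binutils results table from the AFLFast paper]
[Screenshot: binutils results table from the AFLGo paper]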

For CVE-2016-4487, the AFLFast paper reports that AFL found the bug in 2.63h and AFLFast in 0.46h; however, the AFLGo paper reports that AFL found it in 4m and AFLGo in 2m.
(CVE-2016-4488, CVE-2016-4489, CVE-2016-4490, and CVE-2016-4492 show a similar pattern.)

If this is because the initial seeds in the AFLGo experiments are closer to the bugs, then why does AFLFast find the bugs faster than AFLGo for CVE-2016-4491 and CVE-2016-6131?

If, in each paper, AFL and the extension tool were given the same set of initial seeds, then why does the "factor" (second-to-last column) look better for AFLFast than for AFLGo?
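To make the comparison concrete, here is a minimal sketch that recomputes the speedup for CVE-2016-4487 from the times quoted above. It assumes the "factor" column is the ratio of AFL's time-to-exposure to the extension's time-to-exposure; the papers' exact definition may differ, and because the published times are rounded, the result may not match the printed factors exactly. The `factor` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, assuming factor = AFL time / extension time.
# $1 = AFL time-to-exposure, $2 = extension's time-to-exposure.
# Both arguments must be in the same unit (e.g. both hours or both minutes).
factor() {
  awk -v afl="$1" -v tool="$2" 'BEGIN {
    if (tool + 0 == 0) { print "error: zero time" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", afl / tool
  }'
}

factor 2.63 0.46   # AFLFast paper, CVE-2016-4487, hours: prints 5.72
factor 4 2         # AFLGo paper, CVE-2016-4487, minutes: prints 2.00
```

Under this assumption, AFLFast's speedup over AFL on this CVE is roughly 5.7x, while AFLGo's is 2x, which is the gap the question refers to.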

Why would the directed fuzzer find the bugs more slowly than the general-purpose fuzzer?

Could you open-source your initial seeds for binutils?

@karl-fuzz-101
Author

@mboehme I just noticed that in some other issues, people are also confused about the distance calculation for binutils.

@mboehme
Collaborator

mboehme commented Apr 25, 2018

Thanks for your interest in our greybox fuzzing research! You are right: the results for AFL in the AFLFast and AFLGo papers disagree. However, the experimental-setup sections of both papers should explain this difference.

First, we used two different versions of AFL (the most recent one in each case). The version of AFL used in the AFLGo paper (FidgetyAFL) already incorporates the explore schedule, which makes recent versions of AFL substantially faster than earlier versions (i.e., those from before AFLFast).

Second, in the AFLFast paper, we ran both AFL and AFLFast with the deterministic stage (i.e., without "-d"). However, in a discussion on AFLFast, Michal Zalewski suggested that future experiments be conducted with "-d". Hence, in the AFLGo paper, we ran both AFL and AFLGo without the deterministic stage (i.e., with "-d").

Third, we made sure that no fuzzer was at a disadvantage, either when comparing AFLGo with AFL or when comparing AFLFast with AFL. In both papers, all fuzzers were started on the same seed corpus, with the same command-line parameters, and given the same time budget. For AFL, AFLFast, and AFLGo on binutils, the "seed corpus" was the empty file:

```
mkdir in
echo "" > in/in
```

Fourth, to allow other researchers (including you) to reproduce our results, we have made our tools publicly available. The experimental infrastructure is described in the paper. I hope this answers your questions.

Since there is no issue with the tool per se, I am closing this issue.

mboehme closed this as completed on Apr 25, 2018
@fuseproj

@mboehme I tried AFL 1.94b (I think this was the version used in the AFLFast paper) on a version of c++filt from before your fix. CVE-2016-4487/CVE-2016-4488 took a fairly long time to be found, but AFL reported all 46 unique crashes within 1 hour. I checked the backtraces with gdb and Valgrind: some are related to CVE-2016-4492/CVE-2016-4493, some to CVE-2016-4490, and others to different crashes. However, I don't know how to use AFLGo to compute the distances for binutils; could you open-source the scripts? (I asked some questions in #19 and hope you can answer them.)

Like @karl-fuzznoob, I'm also interested in why AFLFast sometimes performs better than AFLGo.

@karl-fuzz-101
Author

@fuseproj Interesting observations. @mboehme Can you share the details?

@karl-fuzz-101
Author

Also, 4487/4488 and 4492/4493 are similar crashes, yet the times the fuzzers took to find them differ.
