-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add -dict option (like AFL's -x) to replace low-signal string literal list #315
base: master
Are you sure you want to change the base?
Conversation
Added a -dict option to go-fuzz to use a specific user-created list of tokens instead of the tokens collected by go-fuzz-build in the metadata file. tldr; -x from afl-fuzz
Looking at this again, I think this also addresses #174. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. It's a fine idea, and a simple patch. (I did some work a while ago on improving the quality of the auto-generated literals, but I don't have the bandwidth to dig them up and apply them any time soon.)
I'm torn on the format; I like how it's simple, but it's not hard to imagine newline characters being useful in literals. One sloppy option is to stay line-oriented, but apply strconv.Unquote if possible and if not, accept as-is. Then you can use a quoted string to get any literal you want in (including a literal that looks like a quoted string), while still having a simple form for everything else. What do you think?
if err != nil { | ||
log.Fatalf("cannot read dictionary file '%v'", *flagDict) | ||
} | ||
if dictStat.Size() == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to allow an empty dictionary? E.g. I can imagine someone using this to disable dictionaries. (Maybe it'd break something in the mutator code? If so, we could add an early exit when strLits is empty.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel if the user is manually specifying a dictionary, it would be strange if it was empty -- therefore probably a mistake. Happy to remove the Fatalf
call, but I still think it should be logged. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we could log.
@@ -57,6 +58,16 @@ func main() { | |||
log.Fatalf("both -http and -worker are specified") | |||
} | |||
|
|||
if *flagDict != "" { | |||
dictStat, err := os.Stat(*flagDict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining why we stat here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we pass *flagDict to workerMain? Or read here and pass contents to workerMain?
It looks like this split stat/size check + reading separately doubles amount of code.
log.Fatalf("could not read tokens from \"%v\": %v\n", fileName, err) | ||
} | ||
textData := string(dictData) | ||
splits := strings.Split(textData,"\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that it really matters, but let's not convert to string and then convert back to []byte. Use bytes.Split instead. Or use a bufio.Scanner, which'll be simpler yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -57,6 +58,16 @@ func main() { | |||
log.Fatalf("both -http and -worker are specified") | |||
} | |||
|
|||
if *flagDict != "" { | |||
dictStat, err := os.Stat(*flagDict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we pass *flagDict to workerMain? Or read here and pass contents to workerMain?
It looks like this split stat/size check + reading separately doubles amount of code.
go-fuzz/main.go
Outdated
if *flagDict != "" { | ||
dictStat, err := os.Stat(*flagDict) | ||
if err != nil { | ||
log.Fatalf("cannot read dictionary file '%v'", *flagDict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please log the error, otherwise it will be hard to understand why it failed.
} | ||
textData := string(dictData) | ||
splits := strings.Split(textData,"\n") | ||
for S := range splits { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be shorter with for _, token := range splits
.
Or even shorter with for _, token := range strings.Split(textData,"\n")
.
splits := strings.Split(textData,"\n") | ||
for S := range splits { | ||
token := splits[S] | ||
if token == "\n" || token == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it contain "\n" is we split by "\n"?
log.Fatalf("could not read tokens from \"%v\": %v\n", fileName, err) | ||
} | ||
textData := string(dictData) | ||
splits := strings.Split(textData,"\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -116,6 +118,24 @@ func newHub(metadata MetaData) *Hub { | |||
ro.intLits = append(ro.intLits, []byte(lit.Val)) | |||
} | |||
} | |||
if *flagDict != "" { | |||
ro.strLits = nil // Discard existing tokens | |||
fileName := *flagDict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This renaming looks excessive, especially if we remove the log.Printf. Just use *flagDict.
Overall I am a fan of scripting expert smartness and making it available to all users out-of-the box, rather then shifting the hard work onto every user. We could do better static analysis as you noted, intercept byte/string comparisons at runtime to build dynamic dictionary, etc. But as Josh noted, simplicity of this change bribes, so I guess I don't mind. |
Good point.
where I literally want foo with quotes, but they will be silently stripped with no feedback... I can think of using strconv.Unquote always (somewhat cumbersome for users), or supporting either current format, or json-encoded |
Thank you both for the feedback! I haven't forgotten about this PR - I'll find the time to work on this soon (hopefully within the next couple weeks). |
Take all the time you need. :) |
In recent testing I've found that the
ROData.strLits
list of literals can fill with useless noise; strings collected from places such as error messages, e.g.:This list of literals is used directly by
go-fuzz
in the mutation logic, i.e.:go-fuzz/go-fuzz/mutator.go
Lines 346 to 367 in 6a8e9d1
Having lots of noise in
strLits
can therefore result in some fairly useless test cases, particularly for syntax-aware programs.I propose this small change to add a
-dict
option, so that the user can manually supply a list of useful tokens togo-fuzz
. This replaces theROData.strLits
tokens (built from the list in themetadata
file) with a high-signal list that the user supplies.Other thoughts
The signal of the built-in token list could perhaps be improved by modifying the code to avoid messages passed to functions such as log.Fatal or fmt.Print, etc.
go-fuzz/go-fuzz-build/cover.go
Line 394 in 6a8e9d1