Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add -dict option (like AFL's -x) to replace low-signal string literal list #315

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

roddux
Copy link

@roddux roddux commented Mar 26, 2021

In recent testing I've found that the ROData.strLits list of literals can fill with useless noise; strings collected from places such as error messages, e.g.:

$ unzip metadata project-fuzz.zip
$ cat metadata | jq .Literals | rg 'invalid|error' | head -n5
    "Val": "crypto/aes: invalid key size ",
    "Val": "Error reading socket: %v",
    "Val": "http2: Transport conn %p received error from processing frame %v: %v",
    "Val": "invalid metric name",
    "Val": "RCodeNameError",
[...]

This list of literals is used directly by go-fuzz in the mutation logic, i.e.:

go-fuzz/go-fuzz/mutator.go

Lines 346 to 367 in 6a8e9d1

case 18:
// Insert a literal.
// TODO: encode int literals in big-endian, base-128, etc.
if len(ro.intLits) == 0 && len(ro.strLits) == 0 {
iter--
continue
}
var lit []byte
if len(ro.strLits) != 0 && m.r.Bool() {
lit = []byte(ro.strLits[m.rand(len(ro.strLits))])
} else {
lit = ro.intLits[m.rand(len(ro.intLits))]
if m.rand(3) == 0 {
lit = reverse(lit)
}
}
pos := m.rand(len(res) + 1)
for i := 0; i < len(lit); i++ {
res = append(res, 0)
}
copy(res[pos+len(lit):], res[pos:])
copy(res[pos:], lit)

Having lots of noise in strLits can therefore result in some fairly useless test cases, particularly for syntax-aware programs.

I propose this small change to add a -dict option, so that the user can manually supply a list of useful tokens to go-fuzz. This replaces the ROData.strLits tokens (built from the list in the metadata file) with a high-signal list that the user supplies.


Other thoughts

The signal of the built-in token list could perhaps be improved by modifying the code to avoid messages passed to functions such as log.Fatal or fmt.Print, etc.

func (lc *LiteralCollector) Visit(n ast.Node) (w ast.Visitor) {

Added a -dict option to go-fuzz to use a specific user-created
list of tokens instead of the tokens collected by go-fuzz-build
in the metadata file. tldr; -x from afl-fuzz
@roddux
Copy link
Author

roddux commented Apr 8, 2021

Looking at this again, I think this also addresses #174.

Copy link
Collaborator

@josharian josharian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. It's a fine idea, and a simple patch. (I did some work a while ago on improving the quality of the auto-generated literals, but I don't have the bandwidth to dig them up and apply them any time soon.)

I'm torn on the format; I like how it's simple, but it's not hard to imagine newline characters being useful in literals. One sloppy option is to stay line-oriented, but apply strconv.Unquote if possible and if not, accept as-is. Then you can use a quoted string to get any literal you want in (including a literal that looks like a quoted string), while still having a simple form for everything else. What do you think?

if err != nil {
log.Fatalf("cannot read dictionary file '%v'", *flagDict)
}
if dictStat.Size() == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to allow an empty dictionary? E.g. I can imagine someone using this to disable dictionaries. (Maybe it'd break something in the mutator code? If so, we could add an early exit when strLits is empty.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel if the user is manually specifying a dictionary, it would be strange if it was empty -- therefore probably a mistake. Happy to remove the Fatalf call, but I still think it should be logged. Thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we could log.

go-fuzz/hub.go Outdated Show resolved Hide resolved
@@ -57,6 +58,16 @@ func main() {
log.Fatalf("both -http and -worker are specified")
}

if *flagDict != "" {
dictStat, err := os.Stat(*flagDict)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining why we stat here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we pass *flagDict to workerMain? Or read here and pass contents to workerMain?
It looks like this split stat/size check + reading separately doubles amount of code.

go-fuzz/hub.go Outdated Show resolved Hide resolved
log.Fatalf("could not read tokens from \"%v\": %v\n", fileName, err)
}
textData := string(dictData)
splits := strings.Split(textData,"\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it really matters, but let's not convert to string and then convert back to []byte. Use bytes.Split instead. Or use a bufio.Scanner, which'll be simpler yet.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

go-fuzz/main.go Outdated Show resolved Hide resolved
@@ -57,6 +58,16 @@ func main() {
log.Fatalf("both -http and -worker are specified")
}

if *flagDict != "" {
dictStat, err := os.Stat(*flagDict)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we pass *flagDict to workerMain? Or read here and pass contents to workerMain?
It looks like this split stat/size check + reading separately doubles amount of code.

go-fuzz/main.go Outdated
if *flagDict != "" {
dictStat, err := os.Stat(*flagDict)
if err != nil {
log.Fatalf("cannot read dictionary file '%v'", *flagDict)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please log the error, otherwise it will be hard to understand why it failed.

go-fuzz/hub.go Outdated Show resolved Hide resolved
}
textData := string(dictData)
splits := strings.Split(textData,"\n")
for S := range splits {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be shorter with for _, token := range splits.
Or even shorter with for _, token := range strings.Split(textData,"\n").

splits := strings.Split(textData,"\n")
for S := range splits {
token := splits[S]
if token == "\n" || token == "" {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it contain "\n" is we split by "\n"?

log.Fatalf("could not read tokens from \"%v\": %v\n", fileName, err)
}
textData := string(dictData)
splits := strings.Split(textData,"\n")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -116,6 +118,24 @@ func newHub(metadata MetaData) *Hub {
ro.intLits = append(ro.intLits, []byte(lit.Val))
}
}
if *flagDict != "" {
ro.strLits = nil // Discard existing tokens
fileName := *flagDict
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This renaming looks excessive, especially if we remove the log.Printf. Just use *flagDict.

@dvyukov
Copy link
Owner

dvyukov commented Apr 14, 2021

Overall I am a fan of scripting expert smartness and making it available to all users out-of-the box, rather then shifting the hard work onto every user. We could do better static analysis as you noted, intercept byte/string comparisons at runtime to build dynamic dictionary, etc. But as Josh noted, simplicity of this change bribes, so I guess I don't mind.

@dvyukov
Copy link
Owner

dvyukov commented Apr 14, 2021

I'm torn on the format; I like how it's simple, but it's not hard to imagine newline characters being useful in literals. One sloppy option is to stay line-oriented, but apply strconv.Unquote if possible and if not, accept as-is. Then you can use a quoted string to get any literal you want in (including a literal that looks like a quoted string), while still having a simple form for everything else. What do you think?

Good point.
Strictly speaking, the input format may be binary and one may want to include some magic binary sequences.
Opportunistically trying strconv.Unquote may lead to some surprises for e.g.:

aaa
bbb
"foo"

where I literally want foo with quotes, but they will be silently stripped with no feedback...

I can think of using strconv.Unquote always (somewhat cumbersome for users), or supporting either current format, or json-encoded []string for better control. Is there any prior art in other fuzzers (AFL, LibFuzzer, hongfuzz)?

@roddux
Copy link
Author

roddux commented Apr 23, 2021

Thank you both for the feedback! I haven't forgotten about this PR - I'll find the time to work on this soon (hopefully within the next couple weeks).

@josharian
Copy link
Collaborator

Take all the time you need. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants