Add support for token dictionaries #336

Open
wants to merge 5 commits into
base: master
7 changes: 7 additions & 0 deletions README.md
@@ -148,6 +148,13 @@ Vendoring with modules is not yet supported. A `vendor` directory will be ignore
Note that while modules are used to prepare the build, the final instrumented build is still done in GOPATH mode.
For most modules, this should not matter.

## Fuzzing dictionaries

go-fuzz supports user-defined dictionaries containing tokens or other interesting byte sequences. When a dictionary is
provided, its high-signal token list replaces the low-signal token list that go-fuzz generates automatically.
Use `-dict DICTIONARY_FILE` to provide a dictionary. The dictionary syntax is the same as the one used by AFL and libFuzzer.
Append `@LEVEL` to the file name (for example `-dict DICTIONARY_FILE@2`) to also include tokens marked with a level of up
to `LEVEL`; by default only tokens at level 0 are used.
See [AFL Dictionaries](https://github.com/google/AFL/blob/master/dictionaries/README.dictionaries) for more information.

## libFuzzer support

go-fuzz-build can also generate an archive file
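To illustrate the file format the README section refers to, here is a minimal sketch of how an AFL-style dictionary might be split into entries. The helper `dictEntries` and the sample data in `sampleDict` are hypothetical and not part of this pull request; the sketch only assumes that comment lines start with `#`, that blank lines are ignored, and that each remaining line is `"token"`, `name="token"`, or `name@level="token"`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// dictEntries returns the entry lines of an AFL-style dictionary.
// Comment lines (starting with '#') and blank or whitespace-only lines are
// skipped, and surrounding whitespace is trimmed from every entry.
// Hypothetical helper for illustration; it is not part of go-fuzz.
func dictEntries(dict string) []string {
	var entries []string
	sc := bufio.NewScanner(strings.NewReader(dict))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		entries = append(entries, line)
	}
	return entries
}

// sampleDict is made-up sample data: a comment, a bare token, a named token
// on a line with a Windows line ending, a whitespace-only line, and a named
// token with level 1 that uses \x escapes.
const sampleDict = "# HTTP tokens (sample data)\n" +
	"\n" +
	"\"GET\"\n" +
	"header_host=\"Host: \"\r\n" +
	"  \n" +
	"crlf@1=\"\\x0d\\x0a\"\n"

func main() {
	for _, entry := range dictEntries(sampleDict) {
		fmt.Println(entry)
	}
}
```

Trimming each line before testing for emptiness means that whitespace-only lines and lines ending in `\r` are handled the same way as plain blank lines.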
129 changes: 129 additions & 0 deletions go-fuzz/hub.go
@@ -4,10 +4,14 @@
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"io/ioutil"
	"log"
	"net/rpc"
	"path/filepath"
	"strconv"
	"sync"
	"sync/atomic"
	"time"
@@ -73,6 +77,102 @@ type Stats struct {
	restarts uint64
}

func parseDictTokenLine(tokenLine *[]byte, tokenLineNo int) *[]byte {
Owner

Please move this function to somewhere at the bottom (after use). It's not the most important function of the file to be first.

Owner

Why return pointer to []byte?

	var err error
	metaDataMode := true
	token := make([]byte, 0, len(*tokenLine))
	tokenLevel := 0
	for index := 0; index < len(*tokenLine); index++ {
		switch (*tokenLine)[index] {
		case byte('"'):
			if !metaDataMode {
				// If we are parsing the token (metaDataMode=false) the first " we encounter marks the end of the token
				metaDataMode = !metaDataMode
			} else if index == 0 || (*tokenLine)[index-1] == byte('=') {
				// change the metaDataMode either directly or if a keyword is defined after an equal sign
				metaDataMode = !metaDataMode
			}
			break
		case byte('\\'):
			// Handle escape sequence
			if !metaDataMode {
				index++
				if index >= len(*tokenLine) {
					log.Printf("dictionary token in line %d has incorrect format", tokenLineNo)
					return nil
				}
				switch (*tokenLine)[index] {
				case byte('"'), byte('\\'):
					// Handle escaped quote (\") and escaped backslash (\\)
					token = append(token, (*tokenLine)[index])
					break

				case byte('x'):
Owner

Do these need to be casted to byte?

					// Handle hexadecimal values (e.g. \xFF)
					if index+2 >= len(*tokenLine) {
						log.Printf("dictionary token in line %d has incorrect format", tokenLineNo)
						return nil
					}

					hexBytes := make([]byte, 1)
					_, errDecode := hex.Decode(hexBytes, (*tokenLine)[index+1:index+3])
					if errDecode != nil {
						log.Printf("dictionary token in line %d has incorrect format", tokenLineNo)
						return nil
					}

					token = append(token, hexBytes[0])

					index = index + 2
					break

				case byte('n'):
					// Handle newline (\n)
					token = append(token, byte('\n'))
					break

				case byte('t'):
					// Handle tab (\t)
					token = append(token, byte('\t'))
					break
				}
			}
			break
		case byte('@'):
			// Handle token level if metaDataMode
			if metaDataMode && index+1 < len(*tokenLine) {
				num := ""
				for counter := 1; index+counter < len(*tokenLine); counter++ {
					value := int((*tokenLine)[index+counter])
					if 0x30 <= value && value <= 0x39 {
						num = num + string(rune(value))
					} else {
						break
					}
				}
				tokenLevel, err = strconv.Atoi(num)
				if err != nil {
					log.Printf("token level in dictionary line %d could not be parsed", tokenLineNo)
					return nil
				}
			}
			// Fallthrough if not metaDataMode to add the @ to the token
			fallthrough
		default:
			if !metaDataMode {
				token = append(token, (*tokenLine)[index])
			}
		}

	}

	// The token is added if its level does not exceed the global dictLevel; otherwise it is ignored
	if tokenLevel <= dictLevel {
		return &token
	}
	return nil
}

func newHub(metadata MetaData) *Hub {
	procs := *flagProcs
	hub := &Hub{
@@ -116,6 +216,35 @@ func newHub(metadata MetaData) *Hub {
			ro.intLits = append(ro.intLits, []byte(lit.Val))
		}
	}

	if dictPath != "" {
		/*
Owner

Please use C-style single-line // comments.

			Replaces the low-signal token list with a user-defined high-signal token list.
			Existing tokens that were obtained through token capture and are stored in ro.strLits are discarded.
			The intLits tokens are not discarded and will still be used. However, the user can also specify integers
			as byte arrays in the dictionary to use them as well.
		*/
		ro.strLits = nil // Discard existing tokens
		dictionary, err := ioutil.ReadFile(dictPath)
		if err != nil {
			log.Fatalf("could not read tokens from %q: %v", dictPath, err)
		}

		for tokenLineNo, tokenLine := range bytes.Split(dictionary, []byte("\n")) {
			// Ignore Comments
			if bytes.HasPrefix(bytes.TrimSpace(tokenLine), []byte("#")) || len(tokenLine) == 0 {
				continue
			}
			// tokenLineNo is 0-based; pass a 1-based line number so log messages match the file
			token := parseDictTokenLine(&tokenLine, tokenLineNo+1)
Owner

Why pass pointer to tokenLine?

			if token != nil {
				// add token to ro.strLits
Owner

This does not look useful. Remove.

				ro.strLits = append(ro.strLits, *token)
			}

		}

	}

	hub.ro.Store(ro)

	go hub.loop()
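The review questions on `parseDictTokenLine` (why take and return `*[]byte`, and whether the `byte(...)` conversions are needed) can be illustrated with a sketch. This is one possible shape under stated assumptions, not the code in this pull request: `parseDictLine` is a hypothetical name, it locates the token by the first and last double quote instead of the state machine used in the diff, and it returns an error for unknown escapes where the diff silently skips them. Slices are passed and returned by value, and untyped character constants such as `'x'` are used directly in a `switch` over a `byte`.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"errors"
	"fmt"
	"strconv"
)

// parseDictLine parses one dictionary entry of the form
//
//	"token"   or   name="token"   or   name@level="token"
//
// and returns the decoded token and its level (0 when none is given).
// Hypothetical helper: problems are reported as errors so the caller decides
// whether to log or abort.
func parseDictLine(line []byte) (token []byte, level int, err error) {
	line = bytes.TrimSpace(line)
	start := bytes.IndexByte(line, '"')
	end := bytes.LastIndexByte(line, '"')
	if start < 0 || end <= start {
		return nil, 0, errors.New("missing quoted token")
	}
	// Optional metadata before the opening quote: name= or name@level=.
	meta := bytes.TrimSuffix(line[:start], []byte("="))
	if at := bytes.LastIndexByte(meta, '@'); at >= 0 {
		level, err = strconv.Atoi(string(meta[at+1:]))
		if err != nil {
			return nil, 0, fmt.Errorf("bad token level: %v", err)
		}
	}
	body := line[start+1 : end]
	for i := 0; i < len(body); i++ {
		if body[i] != '\\' {
			token = append(token, body[i])
			continue
		}
		i++ // look at the character after the backslash
		if i >= len(body) {
			return nil, 0, errors.New("dangling backslash")
		}
		switch body[i] {
		case '"', '\\':
			token = append(token, body[i])
		case 'n':
			token = append(token, '\n')
		case 't':
			token = append(token, '\t')
		case 'x':
			if i+3 > len(body) {
				return nil, 0, errors.New("truncated \\x escape")
			}
			var b [1]byte
			if _, derr := hex.Decode(b[:], body[i+1:i+3]); derr != nil {
				return nil, 0, fmt.Errorf("bad \\x escape: %v", derr)
			}
			token = append(token, b[0])
			i += 2
		default:
			return nil, 0, fmt.Errorf("unknown escape \\%c", body[i])
		}
	}
	return token, level, nil
}

func main() {
	token, level, err := parseDictLine([]byte(`crlf@1="\x0d\x0a"`))
	fmt.Printf("token=%q level=%d err=%v\n", token, level, err)
}
```

With this shape the caller in `newHub` could compare the returned level against `dictLevel` itself and append the token directly, which keeps the parser free of global state.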
36 changes: 34 additions & 2 deletions go-fuzz/main.go
@@ -13,6 +13,8 @@ import (
	"path/filepath"
	"runtime"
	"runtime/debug"
	"strconv"
	"strings"
	"sync/atomic"
	"syscall"
	"time"
@@ -42,10 +44,14 @@ var (
	flagSonar = flag.Bool("sonar", true, "use sonar hints")
	flagV = flag.Int("v", 0, "verbosity level")
	flagHTTP = flag.String("http", "", "HTTP server listen address (coordinator mode only)")
	flagDict = flag.String("dict", "", "optional fuzzer dictionary (using AFL/libFuzzer format)")

	shutdown uint32
	shutdownC = make(chan struct{})
	shutdownCleanup []func()

	dictPath = ""
	dictLevel = 0
)

func main() {
@@ -57,6 +63,32 @@ func main() {
		log.Fatalf("both -http and -worker are specified")
	}

	if *flagDict != "" {
Owner

Please move this logic to a helper function. It's too low level for main.

		// Check if the provided path exists
		_, err := os.Stat(*flagDict)
		if err != nil {
			// If not it might be because a dictLevel was provided by appending @<num> to the dict path
			atIndex := strings.LastIndex(*flagDict, "@")
Owner

It seems that this logic can be simpler if we try to split by @ first. Otherwise we have too many branches and duplicated error handling. If there is no specific reason to try to open the file with "@" first, please split by @ before first stat.

			if atIndex != -1 {
				dictPath = (*flagDict)[:atIndex]
				_, errStat := os.Stat(dictPath)
				if errStat != nil {
					// Report errStat (for dictPath), not err from the first Stat on the full flag value
					log.Fatalf("cannot read dictionary file %q: %v", dictPath, errStat)
				}
				dictLevel, err = strconv.Atoi((*flagDict)[atIndex+1:])
				if err != nil {
					log.Printf("could not convert dict level; using dict level 0 instead")
Owner

Why not Fatalf? That's incorrect user input.

					dictLevel = 0
				}
			} else {
				// If no dictLevel is provided and the dictionary does not exist log error and exit
				log.Fatalf("cannot read dictionary file %q: %v", *flagDict, err)
			}
		} else {
			dictPath = *flagDict
		}
	}

	go func() {
		c := make(chan os.Signal, 1)
		signal.Notify(c, syscall.SIGINT)
@@ -100,8 +132,8 @@ func main() {
// Try the default. Best effort only.
var bin string
cfg := new(packages.Config)
// Note that we do not set GO111MODULE here in order to respect any GO111MODULE
// setting by the user as we are finding dependencies. See modules support
// Note that we do not set GO111MODULE here in order to respect any GO111MODULE
// setting by the user as we are finding dependencies. See modules support
// comments in go-fuzz-build/main.go for more details.
cfg.Env = os.Environ()
pkgs, err := packages.Load(cfg, ".")
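The review comments on the `-dict` handling in `main` (move it to a helper, split on `@` before the first stat, and treat a bad level as fatal user input) could be addressed along the following lines. This is a sketch under assumptions rather than the author's implementation: `splitDictFlag` is a hypothetical name, and it returns an error so that the caller can pass it to `log.Fatalf`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitDictFlag splits a -dict argument of the form PATH or PATH@LEVEL into
// its path and level. The level defaults to 0 when no "@" is present.
// Hypothetical helper; it does not touch the file system, so a missing file
// is reported later, when the dictionary is actually read.
func splitDictFlag(arg string) (path string, level int, err error) {
	at := strings.LastIndex(arg, "@")
	if at < 0 {
		return arg, 0, nil
	}
	level, err = strconv.Atoi(arg[at+1:])
	if err != nil {
		return "", 0, fmt.Errorf("invalid dictionary level in %q: %v", arg, err)
	}
	return arg[:at], level, nil
}

func main() {
	for _, arg := range []string{"http.dict", "http.dict@2", "http.dict@high"} {
		path, level, err := splitDictFlag(arg)
		fmt.Printf("arg=%q path=%q level=%d err=%v\n", arg, path, level, err)
	}
}
```

One trade-off of splitting first: a dictionary path that itself contains `@` but carries no level suffix is rejected, whereas the stat-first logic in the diff accepts it. Whether that case matters is a decision for the author and reviewer.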