go randgen

go version of mysql rangen for db fuzz test

Install with `go get` Command

go get -u  github.com/pingcap/go-randgen/cmd/go-randgen

try it:

go-randgen -h

How to Compile It

install go-bindata command

go get -u github.com/jteeuwen/go-bindata/...

compile go-randgen

make all

Features

It has a built-in zz file, you can generate sqls just with a yy file
Be able to generate sql without connection to db, so it is very fast if you just want sqls
yy syntax is mysql randgen compatible, as long as it doesn't contain embedded perl code
go-randgen supports embedded lua code blocks, just as mysql randgen supports perl code blocks
Implemented with pure Golang, it's design is very flexible, easy to hack
All exported functions are stateless other than cmd package, if needed, you can call them as a library

Quick Start

gentest

generate sqls to test window functions:

# -Y the path of yy file to use
# -Q sql num to generate by yy
# -B break ddls and generated sqls into two files
# if user doesn't specfied a zz file path, it will use built-in zz
./go-randgen gentest -Y examples/windows.yy -Q 10 -B

After execution, you will see a output.data.sql, which includes ddl (define table structrue) and dml (init data in tables), and a output.rand.sql, which includes sqls generated by specified yy file.

If you don't want to use built-in zz, you can write a zz by hand, and use -Z to specify path, the detail of zz syntax to see following Grammar Guide.

If you just want to generate some sqls by yy, without generating ddl, you can use --skip-zz to skip ddl generation, but yy file can not include key word related with table name or field name in this condition.

the detail of yy syntax to see following Grammar Guide.

gendata

generate table structrue and data in user specified dsns according to user specified or built-in zz file.

# generate table structure and data specified by zz in specified dsn
# dsns are split by comma
./go-randgen gendata --dsns "root:@tcp(127.0.0.1:3306)/randgen,root:@tcp(127.0.0.1:4000)/randgen"

gensql

parse yy to generate sqls by user specified dsn:

./go-randgen gensql -Y examples/functions.yy \  
             --dsn "root:@tcp(127.0.0.1:3306)/randgen" \ 
             -Q 100

Note that gensql assumes structures of tables in the dsn are the same (Because data genrated by randgen has this trait).

exec

User sepecifies two dsns, then exec command will execute generated sqls in two dsn and dump sqls with inconsistent execution result.

Example:

./go-randgen exec -Y examples/functions.yy \
             --dsn1 "root:@tcp(127.0.0.1:4000)/randgen" \
             --dsn2 "root:@tcp(127.0.0.1:3306)/randgen" \  
             -Q 100

In above command, go randgen will generate data in two dsns first, then execute 100 sqls generated according to functions.yy, if execution result of a sql is inconsistent, it will dump related info into dump directory. (you can change this directory by --dump)

Note that go randgen compare sql result without order in default. For example, go-randgen think that the following result is consistent:

Result1:

+------+------+
| p    | s    |
+------+------+
|    1 | aaa  |
|    2 | bbb  |
+------+------+

Result2:

+------+------+
| p    | s    |
+------+------+
|    2 | bbb  |
|    1 | aaa  |
+------+------+

If you want to compare in order byte by byte, you should add --order.

exec can also skip data generation by set --skip-zz, it will generate sqls just like gensql command.

Call It as a Library

Except packages under cmd directory, all exported functions is stateless, you can call them again and again safely.

The implementation under cmd directory is the best examples to use it as a library.

A simple example: get an iterator by yy and generate ten sqls

package main

import (
	"fmt"
	"github.com/pingcap/go-randgen/grammar"
	"github.com/pingcap/go-randgen/grammar/sql_generator"
	"log"
)

func main() {
	yy := `
{
i = 1
}

query: 
    create
    
create:
    CREATE TABLE 
    {print(string.format("table%d", i)); i = i+1}
    (a int)
`
	iterator, err := grammar.NewIter(yy, "query", 5, nil, false)
	if err != nil {
		log.Fatalf("get iter err %v\n", err)
	}

	iterator.Visit(sql_generator.FixedTimesVisitor(func(_ int, sql string) {
		fmt.Println(sql)
	}, 5))
}

Note that there is no problem to pass nil in the third param because of the absence of keyword in the example yy. But if you use yy keyword, you should use gendata.NewKeyfun() to create this param.

Print result：

CREATE TABLE table1 (a int)
CREATE TABLE table2 (a int)
CREATE TABLE table3 (a int)
CREATE TABLE table4 (a int)
CREATE TABLE table5 (a int)

Grammar Guide

zz Grammar

Quick Start

zz file is a lua script file, it will define the following three things:

tables to generate
fields in tables
data in fields

Use built-in zz as an example:

-- table related defination
tables = {
    -- record number
    rows = {10, 20, 30, 90},
    -- table character set
    charsets = {'utf8', 'latin1', 'binary'},
    -- table partition number, 'undef' represents no partition
    partitions = {4, 6, 'undef'},
}

-- field related defination
fields = {
    -- field data types
    types = {'bigint', 'float', 'double', 'decimal(40, 20)',
        'char(20)', 'varchar(20)'},
    -- generate signed and unsigned field with above type 
    sign = {'signed', 'unsigned'}
}

-- data init related defination
data = {
    -- optional value of number type
    numbers = {'null', 'tinyint', 'smallint',
        '12.991', '1.009', '-9.183',
        'decimal',
    },
    -- optional value of char type
    strings = {'null', 'letter', 'english'},
}

As you can see, there must be three lua Table variables in zz file, which is tables, fields and data.

tables defines attributes on table, like rows, charsets and partitions in above example, these attributes' cartesian product will be evaluated,
every tuple in which will generate a table. tables definition in above example will generate 4(rows)*3(charsets)*3(partitions)=36 tables in total.

fields defines what fields in tables, these infos' cartesian product also will be evaluated, every tuple in which will generate a field, but the fields generated by above example is less than 6(types)*2(sign)=12, because sign attribute can only apply on numberic type, go-randgen will automatically ignore it for non-numberic types, so example definition will only genrate 4(number)*2(sign)+2(char)=10 fields. Note that tables generated by randgen all have the same fields.

data defines data in tables, key represents field types (see following for detail), value is an array, which are optional values of this type. For each genrated record, it will random choose a value for fields of corresponding type. Optional values include "literal" and "generator".Take the definition of numbers above as an example, null,12.991, etc. are literals, they will be take directly as a value, but when tinyint (it is a generator) is taken, it will generate a random value in -128~127 (to know more generators, see following for detail).

tables

keys	mean	options	default
rows	record number in table	any positive number	[0, 1, 2, 10, 100]
charsets	table's character set	'utf8','utf8mb4','ascii','latin1','binary', 'undef' means not set charset explicitly	['undef']
partitions	partition number of table	any positive number or 'undef', 'undef' means no partition	['undef']

Related source code is tablesVars variable in gendata/tables.go

fields

keys	mean	options	default
types	field type	any valid MySQL type	['int', 'varchar', 'date', 'time', 'datetime']
keys	index or not	'key' means add index to the field, 'undef' means not	['undef', 'key']
sign	unsigned or not	'signed', 'unsigned'	['signed']

Related source code is fieldVars variable in gendata/fields.go.

data

data definition in go-randgen is enhanced compared with mysql randgen. mysql randgen only support five summary types, which is numbers, blobs, temporals, enum and strings. go-randgen not only support the five summary types, but also support fine-grained definition.

For example:

data = {
    numbers = {'null', 'tinyint', 'smallint',
        '12.991', '1.009', '-9.183',
        'decimal',
    },
    bigint = {100, 10, 3},
}

For above data definition, it will take randomly from 100, 10 and 3 when go-randgen generate bigint field, instead from numbers definition, which is more rough.

The map from concrete data types to summary types is summaryType variable in gendata/data.go.

Among above numbers definition, 'tinyint', 'smallint', 'decimal' are go-randgen built-in generators.

To know all generators in go-randgen, you can see init function in gendata/generators/register.go

yy Grammar

Quick Start

A simple example:

# one line comment
/*
multiline comment
*/

query:
    select
    | select1

select:
    SELECT fields FROM _table
    
fields:
    _field
    | _field_int

The generated result of this yy maybe:

select1
SELECT "a random field" FROM "a random table in this db"
SELECT "a random int field" FROM "a random table in this db"
select1

Comment

one line comment: #
multiline comment: /**/

Classifications of Identifier

Non-Terminal: be composed of lowercase, number of underline, but can not begin with number
Terminal: be composed of uppercase, special character or number, but can not begin with underline
Keyword: begin with underline

For Non-Terminal in the right of production, if the corresponding production can not be found, the Non-Terminal degrade to Terminal

Keyword

Keyword begin with underline.

The keywords to get table names and field names:

_table: randomly get a table name from generated tables
_field: randomly get a field name from generated fields
_field_int: randomly get a field from generated int fields
_field_char: randomly get a field from generated char and varchar fields
_field_list: get all field split by comma in the table
_field_int_list: get all int field split by comma in the table
_field_char_list: get all char field split by comma in the table

Some sugars to randomly generate data(it will generate double quotes automatically in character related data):

_digit: random number in 0-9
_letter: random char from 'a' to 'z'
_english: random English word
_int: radom int value
_date: generate randomly yyyy-MM-dd formatted date
_year: random year
_time: random hh:mm:ss formatted time
_datetime: random yyyy-MM-dd hh:mm:ss formatted time

I only write a part of keywords. If you want to know all keywords, You can see the NewKeyfun function in gendata/gendata.go

Embedded Lua Code Block

You can write lua code in "{}", call print function to splice content to sql

query:{a = 1}
    CREATE TABLE 
    {print(string.format("t%d", a))} (a INT)

Above example will always generate sql CREATE TABLE t1 (a INT).

You can call any funcation in lua standard library, for example:

# random number from 10-20
query:
    {print(math.random(10,20))}

Normal code block will be executed every time the corresponding branch is selected.

go-randgen support insert code in the head of yy file, this code will only be executed once, called head code block, the main application of head code block is the declaration of variables or functions:

# head code block
{
i = 1
a = 100
function add(num1, num2)
    return num1 + num2    
end
}

query:
   select

select:
   SELECT * FROM _table WHERE where_clause
   
where_clause:
   _field_int > {print(i)}
   | _field_char > {print(a)}
   | _field_int + _field_int > {print(add(i, a))}

You may be afraid that the brace may be conflict with lua Table definition.In fact, you are worried for no thing, you are free to use lua Table in code block:

{
f={a=1, b=3}
arr={0,2,3,4}
}

query:
  {print(arr[f.a])} | {print(arr[f.b])}

Above example will only genrate "0" or "3" (Note that index of lua array begin from 1).

Another important feature is that you can call yy key word in lua code block by _xxx(), for example:

query:{table = _table()}
    BEGIN ; update ; select ; END

update:
    UPDATE {print(table)} SET _field_int = 10

select:
    SELECT * FROM {print(table)}

Above yy will ensure that random table in update and select is the same.

Frequent Pattern

recursive subquery

query:
    select

select:
    SELECT * FROM
    (select)
    WHERE _field_int > 10
    | SELECT * FROM _table WHERE _field_char = _english

empty rule

 order:
     ASC
     |DESC
     |    # order can product empty
     
 #...

generate a sequence of sqls

For exampe, we want to test Prepared statement

query:
	SET @stmt = {print('"')} select {print('"')};
	PREPARE stmt FROM @stmt_create ; 
	EXECUTE stmt ;
	
select:
    SELECT * FROM _table

If you generate 3 sqls with above yy(namely set -Q with 3), it will generate sqls as follow:

SET @stmt = " SELECT * FROM _table ";
PREPARE stmt FROM @stmt_create; 
EXECUTE stmt;

If you generate 6 sqls, it will generate above sqls twice.

If you generate 2 sqls, result is as follow:

SET @stmt = " SELECT * FROM _table ";
PREPARE stmt FROM @stmt_create;

From above examples, we can see the mean of ;. The symbol extends from mysql randgen, represents generation of a sequence of sqls.

create unique table when test create statements

Program 1: insert lua script, apply head code block

# declare i to 1
{
i = 1
}

query: 
    create
    
create:
    CREATE TABLE 
    {print(string.format("table%d", i)); i = i+1}
    (a int)

Program 2: create table first, then drop it in next sql

query:
    create
    
create:
    CREATE TABLE t (a int); DROP TABLE t

Result:

CREATE TABLE t (a int);
DROP TABLE t;
CREATE TABLE t (a int);
DROP TABLE t;
...

How to Hack

Compared to mysql randgen, go-randgen is very easy to hack. If you want some feature it doesn't have yet, you can add it conveniently.

Hack zz data

If you think all built-in generators do not meet your demand, you can register new generator in init function in
gendata/generators/register.go

Assume that you add a aaa generator, you will not only can use "aaa" in zz data field, but alse can use _aaa key word in yy.

Hack yy Keyword

You can add your keyword in NewKeyfun function in gendata/gendata.go

Differences with mysql randgen

do not need to add ; at the end of bnf expression, which is the habit of mysql randgen. go-randgen do not depend on ; to recognize different bnf expression. Of course, there is no problem if you add ; at the end of bnf expression, because we are compatible with mysql randgen
go-randgen init zz data with insert statement, other than insert ignore. So if it meets unsigned field, it will try 10 times at most to generate positive value, if it fails in 10 times, 1 will be assigned to this field
data definition in zz can be more precise, other than the only four types in mysql randgen
support generate sqls without connection to db, which takes advantage of schema cache in ddls generation, it is very fast.

Name		Name	Last commit message	Last commit date
Latest commit History 115 Commits
cmd/go-randgen		cmd/go-randgen
compare		compare
examples		examples
gendata		gendata
grammar		grammar
resource		resource
view		view
.gitignore		.gitignore
Makefile		Makefile
README-ZH.MD		README-ZH.MD
README.MD		README.MD
go.mod		go.mod
go.sum		go.sum

aytrack/go-randgen

Folders and files

Latest commit

History

Repository files navigation

go randgen

Install with go get Command

How to Compile It

Features

Quick Start

gentest

gendata

gensql

exec

Call It as a Library

Grammar Guide

zz Grammar

Quick Start

tables

fields

data

yy Grammar

Quick Start

Comment

Classifications of Identifier

Keyword

Embedded Lua Code Block

Frequent Pattern

How to Hack

Hack zz data

Hack yy Keyword

Differences with mysql randgen

About

Resources

Stars

Watchers

Forks

Languages

Install with `go get` Command