C++ PLpgSQL XSLT Makefile M4
Clone or download
Failed to load latest commit information.
.gitignore gitignore: Add doxygen_sqlite3.db May 3, 2018
COPYING Add COPYING. May 1, 2015
Doxyfile.in doxygen: Enable JAVADOC_AUTOBRIEF. Jul 24, 2016
Makefile.am Makefile: Improve gitrev.h generation. May 3, 2018
README.org README: Update for 1.2 release. May 3, 2018
TODO.org TODO: Update coverage info. Jun 19, 2016
ast.png README: Add image of an AST. May 29, 2016
configure.ac configure: Bump version to 1.2.1 for bugfix release. May 3, 2018
dump.cc Factor RTTI type name pretty printing out of dump.cc. Jun 5, 2016
dump.hh Add some Doxygen @brief lines. Jun 2, 2016
dump.xsl Add AST dumper. Jun 7, 2015
dut.hh Log sqlstate. Nov 19, 2016
expr.cc Avoid deep, high-fanout productions in bool_expr::factory. May 3, 2018
expr.hh grammar: Add NULLIF(...). Jul 14, 2017
grammar.cc MERGE: Add impedance matching, reduce probability. Feb 3, 2018
grammar.hh MERGE: improve scope handling. Feb 3, 2018
impedance.cc impedance: Account for "failed to generate" errors as well. Jun 17, 2016
impedance.hh impedance: Account for "failed to generate" errors as well. Jun 17, 2016
known.txt postgres: Update error message filter tables. Apr 7, 2018
known_re.txt postgres: Update error message filter tables. Apr 7, 2018
log-v1.0-to-v1.2.sql logging: update migration script May 3, 2018
log.cc stderr logger: Truncate error messages at 80 chars. Jul 15, 2017
log.hh Add --dump-all-queries, printing queries as they are generated. Oct 21, 2017
log.sql log.sql: Use bigint for error id and impedance counters. Oct 28, 2017
logo.png Shrink logo.png a bit. Jun 12, 2016
monetdb.cc schema_monetdb: Be less verbose on startup. Oct 21, 2017
monetdb.hh Ported Bo Tang's MonetDB driver to the latest version. Oct 13, 2017
postgres.cc Maintain compatibility with older versions of the postgres catalog Mar 29, 2018
postgres.hh postgres: Fix #include directive for libpq-fe.h. Apr 13, 2017
prod.cc grammar: Don't hardcode the retry limit. Jun 19, 2016
prod.hh grammar: Reduce default retry limit from 200 to 100. Oct 28, 2017
random.cc Swap default RNG with 64-bit Mersenne Twister. Jul 27, 2015
random.hh Add some Doxygen @brief lines. Jun 2, 2016
relmodel.cc relmodel: Add method sqltype::consistent(). Jul 17, 2016
relmodel.hh Add comments. Mar 29, 2018
schema.cc Various changes to try to get the error rate down again. Jul 17, 2016
schema.hh Maintain compatibility with older versions of the postgres catalog Mar 29, 2018
sqlite.cc sqlite.cc: Trusty's gcc wants cstring included May 3, 2018
sqlite.hh Add an "exclude-catalog" option. Oct 13, 2016
sqlsmith.cc Add --dump-all-queries, printing queries as they are generated. Oct 21, 2017
util.hh Comment. Jun 5, 2016




<mba> "I love the smell of coredumps in the morning"


SQLsmith is a random SQL query generator. Its paragon is Csmith, which proved valuable for quality assurance in C compilers.

It currently supports generating queries for PostgreSQL, SQLite 3 and MonetDB. To add support for another RDBMS, you need to implement two classes providing schema information about and connectivity to the device under test.

Besides developers of the RDBMS products, users developing extensions might also be interested in exposing their code to SQLsmith’s random workload.

Since 2015, it found 118 bugs in alphas, betas and releases in the aforementioned products, including security vulnerabilities in released versions. Addtional bugs were squashed in extensions and libraries such as orafce and glibc.



  • C++11
  • libpqxx


  • boost::regex in case your std::regex is broken
  • SQLite3
  • monetdb_mapi

Building on Debian Jessie

apt-get install build-essential autoconf autoconf-archive libpqxx-dev libboost-regex-dev libsqlite3-dev
cd sqlsmith
autoreconf -i # Avoid when building from a release tarball

Building on OSX

In order to build on Mac OSX, assuming you use Homebrew, run the following

brew install libpqxx automake libtool autoconf autoconf-archive pkg-config
cd sqlsmith
autoreconf -i # Avoid when building from a release tarball


SQLsmith connects to the target database to retrieve the schema for query generation and to send the generated queries to. Currently, all generated statements are rolled back. Beware that SQLsmith does call functions that could possibly have side-effects (e.g. pg_terminate_backend). Use a suitably underprivileged user for its connection to avoid this.

Example invocations:

# testing Postgres
sqlsmith --verbose --target="host=/tmp port=65432 dbname=regression"
# testing SQLite
sqlsmith --verbose --sqlite="file:$HOME/.mozilla/firefox/places.sqlite?mode=ro"
# testing MonetDB
sqlsmith --verbose --monetdb="mapi:monetdb://localhost:50000/smith"

The following options are currently supported:

--target=connstrtarget postgres database (default: libpq defaults)
--sqlite=URItarget SQLite3 database
--monetdb=URItarget MonetDB database
--log-to=connstrpostgres db for logging errors into (default: don’t log)
--verboseemit progress output
--versionshow version information
--seed=intseed RNG with specified integer instead of PID
--dry-runprint queries instead of executing them
--max-queries=longterminate after generating this many queries
--exclude-catalogdon’t generate queries using catalog relations
--dump-all-queriesdump queries as they are generated
--dump-all-graphsdump generated ASTs for debugging
--rng-state=stringdeserialize dumped rng state

Sample output:

--verbose makes sqlsmith emit some progress indication to stderr. A symbol is output for each query sent to the server. Currently the following ones are generated:

.okQuery generated and executed with ok sqlstate
Ssyntax errorThese are bugs in sqlsmith - please report
ttimeoutSQLsmith sets a statement timeout of 1s
Cbroken connectionThese happen when a query crashes the server
eother error

When you test against a RDBMS that doesn’t support some of SQLsmith’s grammar, there will be a burst of syntax errors on startup. These should disappear after some time as SQLsmith blacklists productions that consistently lead to errors.

--verbose will also periodically emit error reports. In the following example, these are mostly caused by the primitive type system.

queries: 39000 (202.399 gen/s, 298.942 exec/s)
AST stats (avg): height = 5.599 nodes = 37.8489
82	ERROR:  invalid regular expression: quantifier operand invalid
70	ERROR:  canceling statement due to statement timeout
44	ERROR:  operator does not exist: point = point
27	ERROR:  operator does not exist: xml = xml
22	ERROR:  cannot compare arrays of different element types
11	ERROR:  could not determine which collation to use for string comparison
5	ERROR:  invalid regular expression: nfa has too many states
4	ERROR:  cache lookup failed for index 2619
4	ERROR:  invalid regular expression: brackets [] not balanced
3	ERROR:  operator does not exist: polygon = polygon
2	ERROR:  invalid regular expression: parentheses () not balanced
1	ERROR:  invalid regular expression: invalid character range
error rate: 0.00705128

The only one that looks interesting here is the cache lookup one. Taking a closer look at it reveals that it happens when you query a certain catalog view like this:

self=# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL;
FEHLER:  cache lookup failed for index 2619

This is because the planner then puts pg_get_indexdef(oid) in a context where it sees non-index-oids, which causes it to croak:

                                     QUERY PLAN                                     
 Hash Join  (cost=17.60..30.65 rows=9 width=4)
   Hash Cond: (i.oid = x.indexrelid)
   ->  Seq Scan on pg_class i  (cost=0.00..12.52 rows=114 width=8)
         Filter: ((pg_get_indexdef(oid) IS NOT NULL) AND (relkind = 'i'::"char"))
   ->  Hash  (cost=17.31..17.31 rows=23 width=4)
         ->  Hash Join  (cost=12.52..17.31 rows=23 width=4)
               Hash Cond: (x.indrelid = c.oid)
               ->  Seq Scan on pg_index x  (cost=0.00..4.13 rows=113 width=8)
               ->  Hash  (cost=11.76..11.76 rows=61 width=8)
                     ->  Seq Scan on pg_class c  (cost=0.00..11.76 rows=61 width=8)
                           Filter: (relkind = ANY ('{r,m}'::"char"[]))

Now this is more of a curiosity than a bug, but still illustrating how debugging with the help of SQLsmith might look like.

Large-scale testing

--log-to allows logging of hundreds of sqlsmith instances into a central PostgreSQL database. ./log.sql contains the schema sqlsmith expects and some additional views to generate reports on the logged contents.

It also contains a trigger to filter boring/known errors based on the contents of the tables known and known_re. I periodically COPY my filter tables for testing PostgreSQL into the files ./known_re.txt and ./known.txt to serve as a starting point.



SQLsmith is available under GPLv3. Use it at your own risk. It may damage your database (one of the purposes of this tool is to try and break things). See the file COPYING for details.


Andreas Seltenreich <seltenreich@gmx.de>

Bo Tang <tangloner@gmail.com>

Sjoerd Mullender <sjoerd@acm.org>