Populate all the tables of a MySQL database with a specified number of rows of junk data, and optionally jumble the foreign keys to create sufficient fake key relationships between tables for SQL joins.
Some tasks require populating databases for testing. Database dumps are huge, and the data is usually verboten without anonymisation (it typically contains Personally Identifiable Information). My old script fills individual database tables, which is useful for schema design but not for testing queries.
| database | support | notes |
|---|---|---|
| basketball | ✔️ | substantial intersection table |
| classicmodels | ✔️ | |
| employees (old) | ✔️ | |
| Joomla (old) | ✔️ | 60 tables |
| Northwind | ✔️ | |
| Sakila | ✖️ | |
| tpcc | ✖️ | partial with option fudges |
| world | ✔️ | |
| WordPress (old) | ✔️ | |
Sakila uses sophisticated spatial data types, which this package does not support. The schemas marked (old) are included for reference.
- An empty, truncated, or throwaway database schema already present in the MySQL server.
- Sufficient privileges granted for the connecting user.
- No support for MySQL spatial data types.
- Capped length of longer data types.
- Composite primary keys can be troublesome.
Import a database schema-only file into MySQL or use an existing 'throwaway' database already active within MySQL.
Do not use this package on a database that you care about: MySQL Filler will almost certainly trash it.
Ensure `SELECT`, `INSERT`, `UPDATE`, and `DROP` privileges for the connecting user (`DROP` is required for table truncation).
In `config.py`, edit the database credentials and set the required MySQL Filler options, then:

```
python3 main.py
```
For multiprocessing support (`PROCS = <num_cpu_cores>`) and a significant speed increase, copy the `config.py` imports and global variables into `src/mysql_filler.py` and run it as a standalone script. It's un-Pythonic and ugly, but it executes multiprocessing more reliably.
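For orientation, here is a minimal sketch of how a standalone multiprocessing run could split the work. The `connect()` and `fill_tables()` helpers are hypothetical stand-ins; the package's actual internals differ.

```python
# Minimal multiprocessing sketch; connect() and fill_tables() are
# hypothetical stand-ins for the package's own connection and insert logic.
from multiprocessing import Pool

NUM_ROWS = 200   # rows to add to every table, as in config.py
PROCS = 4        # worker processes

def worker(rows):
    conn = connect()          # hypothetical: one MySQL connection per process
    fill_tables(conn, rows)   # hypothetical: insert this process's share
    conn.close()

if __name__ == '__main__':
    # Split NUM_ROWS as evenly as possible across PROCS workers.
    base, rem = divmod(NUM_ROWS, PROCS)
    shares = [base + (1 if i < rem else 0) for i in range(PROCS)]
    with Pool(PROCS) as pool:
        pool.map(worker, shares)
```

One connection per process avoids sharing a MySQL connection across fork boundaries, which is unreliable.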
```python
NUM_ROWS = 10                   # number of rows to add to all database tables
PROCS = 1                       # number of processes to spawn
JUMBLE_FKS = True               # toggle random jumbling of foreign keys for joins
FK_PCT_REPLACE = 25             # percentage of NUM_ROWS of foreign keys to jumble
STRICT_INSERT = False           # toggle INSERT IGNORE for duplicate hits / bypass strict SQL mode (warnings versus errors)
PROCESS_INT_FKS = True          # process (True) or skip (False) integer foreign keys (TPCC schema with tinyint PKs)
COMPOSITE_PK_INCREMENT = False  # skip (False) or increment (True) composite primary keys (TPCC schema)
COMPLEX_JSON = False            # False for simple fixed JSON data; True to generate variable JSON data
BYTES_DECODE = 'utf-8'          # character set used for byte data type conversion
MAX_PACKET = False              # True maximises the packet size (root user only)
DEBUG = False                   # debug output toggle
EXTENDED_DEBUG = False          # verbose debug output toggle
TRUNCATE_TABLES = False         # toggle truncation of all database tables (instead of populating)
```
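To illustrate what `JUMBLE_FKS` and `FK_PCT_REPLACE` do conceptually, here is a hedged sketch; the `jumble_fks()` helper and its arguments are hypothetical, not the package's actual code. The idea is to overwrite a share of a child table's foreign key values with randomly chosen parent primary keys, so that joins start matching.

```python
import random

def jumble_fks(cursor, child, fk_col, parent, pk_col, num_rows, pct):
    """Hypothetical sketch: overwrite pct% of num_rows foreign key values in
    child.fk_col with random parent.pk_col values so that joins match."""
    cursor.execute(f"SELECT {pk_col} FROM {parent}")
    parent_keys = [row[0] for row in cursor.fetchall()]
    for _ in range(num_rows * pct // 100):
        # ORDER BY RAND() LIMIT 1 rewrites one arbitrary child row per pass
        cursor.execute(
            f"UPDATE {child} SET {fk_col} = %s ORDER BY RAND() LIMIT 1",
            (random.choice(parent_keys),),
        )
```

For the world schema used in the walkthrough below, a call could look like `jumble_fks(cur, 'city', 'CountryCode', 'country', 'Code', 200, 25)`.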
Using the simple MySQL world database, import the database from the compressed file:

```
$ tar -xzOf world-db.tar.gz | mysql -h localhost -u root -p
```
Create a user with the required privileges:
```
mysql> GRANT SELECT, INSERT, UPDATE, DROP ON world.* TO 'general'@'localhost' IDENTIFIED BY 'P@55w0rd';
mysql> FLUSH PRIVILEGES;
```

(MySQL 8.0 removed the `GRANT ... IDENTIFIED BY` form: on 8.0, create the user with `CREATE USER` first, then run the `GRANT` without `IDENTIFIED BY`.)
Edit `config.py` and wipe the data that the world database ships with by setting `TRUNCATE_TABLES` to `True`:

```python
TRUNCATE_TABLES = True
```
```
python3 main.py
Truncating all tables of `world` database ...
```
Change `TRUNCATE_TABLES = False` and execute `main.py` again:

```
python3 main.py
localhost
world
+10 rows ...
```
Now try a join between `city` and `country`:

```
mysql> USE world;
mysql> SELECT ID, CountryCode, country.Code, Code2 FROM city INNER JOIN country ON country.Code = city.CountryCode;
```
With only 10 rows added, the join is unlikely to return any results; for this database, expect perhaps one 'lucky' match per 100 rows added. Increase the row count and enable foreign key jumbling in `config.py`:
```python
NUM_ROWS = 200
...
JUMBLE_FKS = True
```
Execute `main.py`:

```
localhost
world
+200 rows
`city`
`country`
`countrylanguage`
foreign keys jumbled
```
A few results are now returned from the previous query, and a `CountryCode` value can be selected:
```
mysql> SELECT ID, CountryCode, country.Code FROM city INNER JOIN country ON country.Code = city.CountryCode WHERE CountryCode = 'ewu';
+----+-------------+------+
| ID | CountryCode | Code |
+----+-------------+------+
| 74 | ewu         | EWU  |
+----+-------------+------+
1 row in set (0.00 sec)
```
The data is meaningless, but the foreign keys are starting to gain relationships, so SQL joins between tables now return results. Adding more rows increases the number of matches.
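As a quick sanity check, the join matches can be counted programmatically. This is a sketch using mysql-connector-python with the `general` user created by the `GRANT` above; adjust the credentials to your setup.

```python
import mysql.connector

# Connect with the 'general' user created earlier in the walkthrough.
conn = mysql.connector.connect(host='localhost', user='general',
                               password='P@55w0rd', database='world')
cur = conn.cursor()
cur.execute(
    "SELECT COUNT(*) FROM city "
    "INNER JOIN country ON country.Code = city.CountryCode"
)
print(cur.fetchone()[0], "city rows now join to a country")
conn.close()
```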
Speed with multiprocessing (combining `config.py` into one script) is okay, but speed was never on the agenda. For serious speed, there is Percona's Go-based mysql_random_data_load. That tool currently fills one table at a time (fast, yet somewhat laborious for databases with many tables), whereas I wanted all of a database's tables populated with one command.
MariaDB support is limited: it imposes more restrictions on key constraints and is less forgiving than MySQL 5.7 or 8.0. Tested using MySQL 5.7 and 8.0, and MariaDB 10.4.
This package cannot hope to support all variations (good and bad) of MySQL schemas.
For example, adding 1,000 rows to the following real-world table is not going to run smoothly:

```
+----------------+------------+------+-----+---------+
| Field          | Type       | Null | Key | Default |
+----------------+------------+------+-----+---------+
| google_channel | char(3)    | YES  |     | NULL    |
| locale_id      | tinyint(3) | YES  | UNI | NULL    |
+----------------+------------+------+-----+---------+
```

A signed TINYINT holds only 256 distinct values, so the unique key on `locale_id` cannot accept 1,000 rows: the restricted TINYINT range, the unique key, and the row count together guarantee duplicate-key collisions.
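The arithmetic is easy to verify without a database (plain Python; this assumes the column is a signed TINYINT, which spans -128 to 127):

```python
import random

# A signed TINYINT holds only 256 distinct values.
capacity = 127 - (-128) + 1
print(capacity)  # 256

# 1,000 random tinyint values contain at most 256 unique ones; the other
# ~744 inserts would collide with the UNIQUE key on locale_id.
rows = [random.randint(-128, 127) for _ in range(1000)]
print(len(set(rows)))  # <= 256
```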
Composite primary keys can cause trouble; however, the `config.py` options allow overrides that enable at least basic table population, as sketched below.
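As a rough illustration of the `COMPOSITE_PK_INCREMENT = True` idea (the helper below is hypothetical; the package's internals differ), uniqueness of a composite key can be guaranteed by making one of its columns strictly increasing per generated row:

```python
from itertools import count
import random

def composite_pk_rows(n, warehouse_ids):
    """Hypothetical sketch: pair a random first key column with a strictly
    increasing counter so generated composite keys never collide."""
    seq = count(1)
    return [(random.choice(warehouse_ids), next(seq)) for _ in range(n)]

print(composite_pk_rows(5, [1, 2, 3]))  # e.g. [(2, 1), (1, 2), (3, 3), ...]
```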
MySQL Filler is released under the GPL v.3.