Load Generator
Garth Goodson edited this page Dec 16, 2025 · 1 revision
The load generator script is located at `<springtail>/python/performance/load_generator.py`. It reads its settings from a config file, `<springtail>/python/performance/load_config.yaml`, which looks like this:
```yaml
system_json_path: '../../system.json.test'
# Number of schemas to create
num_schemas: 10
# Table configuration
table_configuration:
  # Number of tables per schema
  num_tables: 10
  # Number of columns per table (range)
  min_columns: 3
  max_columns: 10
# Index configuration
index_configuration:
  # Number of indexes per table (range)
  min_indexes: 0
  max_indexes: 2
  # Number of columns per index (range)
  min_columns_per_index: 1
  max_columns_per_index: 2
# Load configuration
load_configuration:
  # Number of inserts per table
  num_inserts: 500
  # Number of updates per table
  num_updates: 250
  # Number of deletes per table
  num_deletes: 125
  # Whether to allow nulls in the data for inserts
  allow_nulls:
    rows: true
    cols: true
# Whether to batch inserts
batched_inserts: false
# Operations to perform
operations: ['create_table', 'create_index', 'insert_data', 'update_data', 'delete_data']
# Whether to use the previous configuration to load tables
use_existing_config: true
# Comparison threshold (%) for the final aggregates; a degradation larger than this percentage results in a failure
comparison_threshold: 15.0
file_configuration:
  base_dir: 'performance_test_output'
  # Directory for previous run files
  prev_run_dir: 'performance_test_output_prev_run'
  # Directory for output files
  output_files:
    dir: 'output_files'
    # Query info generated by the load_generator script
    query_info: 'query_info.csv'
    # Final report
    final_report: 'final_report.xlsx'
    # Final traces - Raw trace data
    final_traces: 'final_traces.csv'
    # Final aggregates
    final_aggregates: 'final_aggregates.csv'
  # Directory for meta files
  meta_files:
    dir: 'meta_files'
    # Run configuration
    run_config: 'run_config.csv'
    # SQL generated by the load_generator script
    load_sql: 'load.sql'
    # Table columns
    table_columns: 'table_columns.json'
    # Table columns (CSV)
    table_columns_csv: 'table_columns.csv'
  # Directory for temporary files
  temporary_files:
    dir: 'temporary_files'
    # XID traces
    xid_traces: 'xid_traces.csv'
    # PG-XID traces
    pgxid_traces: 'pgxid_traces.csv'
    # XID to PG-XID mapping
    xid_pgxid_mapping: 'xid_pgxid_mapping.csv'
    # Merged traces
    merged_traces: 'merged_traces.csv'
    # PG-XID summary
    pg_xid_summary: 'pg_xid_summary.csv'
use_s3: true
metrics:
  ingest_total_time:
    label: 'Ingest total time (ms)'
    type: 'negative'
  primary_total_time:
    label: 'Primary total time (ms)'
    type: 'display'
  ingest_outperform_primary_percentage:
    label: 'Percentage where the ingest is faster than the primary'
    type: 'positive'
  ingest_outperform_primary_count:
    label: 'Number of times the ingest is faster than the primary'
    type: 'positive'
  primary_outperform_ingest_count:
    label: 'Number of times the primary is faster than the ingest'
    type: 'display'
```

| Section | Config | Description |
|---|---|---|
| Config | `system_json_path` | Path to the system JSON config, which is used to read the connection details for the primary database. |
| DDL | `num_schemas` | Number of schemas to create. |
| | `table_configuration.num_tables` | Number of tables in each schema. |
| | `table_configuration.min_columns` / `max_columns` | Minimum and maximum number of columns per table. Every table always has the `id`, `created_at`, and `updated_at` columns; the remaining columns get random types from: TEXT, INT, BIGINT, FLOAT, DOUBLE PRECISION, BOOLEAN, DATE, TIME, VARCHAR(255), CHAR(10), NUMERIC(10,2). |
| | `index_configuration.min_indexes` / `max_indexes` | Minimum and maximum number of indexes per table. |
| | `index_configuration.min_columns_per_index` / `max_columns_per_index` | Minimum and maximum number of columns per index. |
| DML | `load_configuration.num_inserts` | Number of inserts per table. `batched_inserts` determines whether the inserts are done together or in batches. |
| | `load_configuration.num_updates` | Number of updates per table. Updates randomly select 1-3 columns to modify, choose rows using ORDER BY on a random column, and use `generate_values_list` for the new values. |
| | `load_configuration.num_deletes` | Number of rows to delete per table. The rows are chosen randomly, using ORDER BY on a random column. |
| | `load_configuration.allow_nulls.rows` | If enabled, inserts may produce rows that are NULL in every column except `id`, `created_at`, and `updated_at`. A random 10-15% of the total number of inserts are affected. |
| | `load_configuration.allow_nulls.cols` | If enabled, inserts may leave individual columns NULL (except `id`, `created_at`, and `updated_at`). A random 10-15% of the columns are affected. |
| Other | `batched_inserts` | Whether the inserts are done in a single transaction or in several batches. |
| | `operations` | List of operations to perform. Possible values: `create_table`, `create_index`, `insert_data`, `update_data`, `delete_data`. |
| | `use_existing_config` | If `true`, the script looks up the existing config file and recreates the same set of tables as before. |
| | `use_s3` | If `true`, the previous run's configuration is fetched from S3. Otherwise, any earlier local run is copied to a `_prev_run` folder and treated as the previous run; if no previous run folder exists, the run is treated as a fresh run. |
| | `comparison_threshold` | Percentage threshold above which performance is considered to have degraded. See "Metrics" for details. |
| Files | `base_dir` | Base output directory for the current run. |
| | `prev_run_dir` | Base output directory for the previous run. |
| | `output_files` | Contains the following properties:<br>• `dir`: main directory for the output files<br>• `query_info`: CSV file with details about the queries<br>• `final_report`: XLSX file with the final aggregated report data<br>• `final_traces`: CSV file with the raw traces generated by running the load generator script<br>• `final_aggregates`: CSV file with aggregate data, such as total time taken, used in `final_report` |
| | `meta_files` | Contains the following properties:<br>• `dir`: main directory for the meta files<br>• `run_config`: the load generator configuration for the current run<br>• `load_sql`: raw SQL that can be run to redo the steps of the current execution<br>• `table_columns`: JSON file describing the tables created in the current run, needed for `use_existing_config`<br>• `table_columns_csv`: CSV file with the same information as `table_columns` |
| | `temporary_files` | Contains the following properties:<br>• `dir`: main directory for the temporary files<br>• `xid_traces`: traces mapping XIDs to logs<br>• `pgxid_traces`: traces mapping PG-XIDs to logs<br>• `xid_pgxid_mapping`: mapping between XIDs and PG-XIDs<br>• `merged_traces`: traces with both XID and PG-XID mapped<br>• `pg_xid_summary`: summary file with times mapped from the other temporary files |
| Metrics | `metrics` | Configured metrics, each with a `label` and a `type`. There are currently three types:<br>1. Negative: if the value goes down from the previous run, it is considered good.<br>2. Positive: if the value goes up from the previous run, it is considered good.<br>3. Display: shown for information only; mostly used for primary-based metrics. |
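
The DML rows above describe updates and deletes only at a high level. Below is a minimal sketch, not the script's actual code, of how such statements could be shaped. It assumes PostgreSQL syntax and assumes each update or delete is a single statement touching `num_rows` rows chosen by ORDER BY on a random column. The `%s` placeholders stand in for the values the script produces with `generate_values_list`. `build_update`, `build_delete`, and `FIXED_COLUMNS` are hypothetical names.

```python
import random

# Columns every generated table has; this sketch never picks them as update targets.
FIXED_COLUMNS = ("id", "created_at", "updated_at")


def build_update(table: str, columns: list, num_rows: int, rng: random.Random) -> str:
    """Hypothetical: UPDATE 1-3 random columns on num_rows rows, ordered by a random column."""
    candidates = [c for c in columns if c not in FIXED_COLUMNS]
    if not candidates:
        raise ValueError(f"{table} has no non-fixed columns to update")
    # Pick 1 to 3 distinct columns, capped by how many are available.
    targets = rng.sample(candidates, rng.randint(1, min(3, len(candidates))))
    order_col = rng.choice(columns)
    # PostgreSQL has no UPDATE ... ORDER BY ... LIMIT, so the rows are chosen in a subquery.
    set_clause = ", ".join(f"{c} = %s" for c in targets)
    return (
        f"UPDATE {table} SET {set_clause} "
        f"WHERE id IN (SELECT id FROM {table} ORDER BY {order_col} LIMIT {num_rows});"
    )


def build_delete(table: str, columns: list, num_rows: int, rng: random.Random) -> str:
    """Hypothetical: DELETE num_rows rows, chosen by ORDER BY on a random column."""
    order_col = rng.choice(columns)
    return (
        f"DELETE FROM {table} "
        f"WHERE id IN (SELECT id FROM {table} ORDER BY {order_col} LIMIT {num_rows});"
    )


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the generated SQL is reproducible
    cols = ["id", "created_at", "updated_at", "col_1", "col_2", "col_3"]
    print(build_update("schema_1.table_1", cols, 250, rng))
    print(build_delete("schema_1.table_1", cols, 125, rng))
```

Seeding the `random.Random` instance is one way to make a generated workload repeatable, which fits the idea of recording `load_sql` so a run can be redone.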

To run the load generator:

```bash
cd <springtail>/python/performance
python3 load_generator.py -c load_config.yaml
```
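
After a run, the final aggregates are compared with the previous run using the `metrics` types and `comparison_threshold`. The exact rule is not documented here, so the following is only a sketch of one plausible reading, under these assumptions:

- The change is measured as a relative percentage of the previous run's value.
- A `negative` metric is flagged if it rises by more than the threshold.
- A `positive` metric is flagged if it falls by more than the threshold.
- `display` metrics are never compared.

`compare_metrics` and `MetricResult` are hypothetical names, and the numbers in the example are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MetricResult:
    name: str
    label: str
    previous: float
    current: float
    change_pct: Optional[float]  # None when the previous value is 0
    regressed: bool


def compare_metrics(
    metrics_cfg: Dict[str, dict],
    previous: Dict[str, float],
    current: Dict[str, float],
    threshold_pct: float,
) -> List[MetricResult]:
    """Hypothetical check of current vs. previous aggregates, driven by the `metrics` config."""
    results = []
    for name, spec in metrics_cfg.items():
        prev, curr = previous[name], current[name]
        # Relative change versus the previous run, in percent (assumption).
        change = None if prev == 0 else (curr - prev) / prev * 100.0
        kind = spec["type"]
        if kind == "display" or change is None:
            regressed = False  # display-only, or no baseline to compare against
        elif kind == "negative":
            regressed = change > threshold_pct  # lower is better: a large rise is bad
        elif kind == "positive":
            regressed = change < -threshold_pct  # higher is better: a large drop is bad
        else:
            raise ValueError(f"unknown metric type {kind!r} for {name}")
        results.append(MetricResult(name, spec["label"], prev, curr, change, regressed))
    return results


if __name__ == "__main__":
    metrics = {
        "ingest_total_time": {"label": "Ingest total time (ms)", "type": "negative"},
        "primary_total_time": {"label": "Primary total time (ms)", "type": "display"},
        "ingest_outperform_primary_count": {
            "label": "Number of times the ingest is faster than the primary",
            "type": "positive",
        },
    }
    # Made-up numbers, for illustration only.
    prev = {"ingest_total_time": 1000.0, "primary_total_time": 900.0, "ingest_outperform_primary_count": 40}
    curr = {"ingest_total_time": 1200.0, "primary_total_time": 950.0, "ingest_outperform_primary_count": 38}
    for r in compare_metrics(metrics, prev, curr, threshold_pct=15.0):
        change = "n/a" if r.change_pct is None else f"{r.change_pct:+.1f}%"
        print(f"{r.label}: {change} {'DEGRADED' if r.regressed else 'ok'}")
```

With these made-up numbers and `comparison_threshold: 15.0`:

- The 20% rise in ingest total time is flagged.
- The 5% drop in the positive count stays within the threshold.
- The display metric is never flagged.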