Skip to content

Load Generator

Garth Goodson edited this page Dec 16, 2025 · 1 revision

Load generator framework

The load generator script is available in <springtail>/python/performance/load_generator.py

It contains a config file <springtail>/python/performance/load_config.yaml

The config file looks like below

Configuration sample

system_json_path: '../../system.json.test'

# Number of schemas to create
num_schemas: 10

# Table configuration
table_configuration:
  # Number of tables per schema
  num_tables: 10
  # Number of columns per table (range)
  min_columns: 3
  max_columns: 10

# Index configuration
index_configuration:
  # Number of indexes per table (range)
  min_indexes: 0
  max_indexes: 2
  # Number of columns per index (range)
  min_columns_per_index: 1
  max_columns_per_index: 2

# Load configuration
load_configuration:
  # Number of inserts per table
  num_inserts: 500
  # Number of updates per table
  num_updates: 250
  # Number of deletes per table
  num_deletes: 125
  # Whether to allow nulls in the data for inserts
  allow_nulls:
    rows: true
    cols: true

# Whether to batch inserts
batched_inserts: false

# Operations to perform
operations: ['create_table', 'create_index', 'insert_data', 'update_data', 'delete_data']

# Whether to use the previous configuration to load tables
use_existing_config: true

# Comparison threshold for the final aggregates, increase of the percentage will result in failure
comparison_threshold: 15.0

file_configuration:
  base_dir: 'performance_test_output'
  # Directory for previous run files
  prev_run_dir: 'performance_test_output_prev_run'
  # Directory for output files
  output_files:
    dir: 'output_files'
    # Query info generated by the load_generator script
    query_info: 'query_info.csv'
    # Final report
    final_report: 'final_report.xlsx'
    # Final traces - Raw trace data
    final_traces: 'final_traces.csv'
    # Final aggregates
    final_aggregates: 'final_aggregates.csv'

  # Directory for meta files
  meta_files:
    dir: 'meta_files'
    # Run configuration
    run_config: 'run_config.csv'
    # SQL generated by the load_generator script
    load_sql: 'load.sql'
    # Table columns
    table_columns: 'table_columns.json'
    # Table columns
    table_columns_csv: 'table_columns.csv'

  # Directory for temporary files
  temporary_files:
    dir: 'temporary_files'
    # XID traces
    xid_traces: 'xid_traces.csv'
    # PG-XID traces
    pgxid_traces: 'pgxid_traces.csv'
    # XID to PG-XID mapping
    xid_pgxid_mapping: 'xid_pgxid_mapping.csv'
    # Merged traces
    merged_traces: 'merged_traces.csv'
    # PG-XID summary
    pg_xid_summary: 'pg_xid_summary.csv'

use_s3: true

metrics:
  ingest_total_time:
    label: 'Ingest total time (ms)'
    type: 'negative'
  primary_total_time:
    label: 'Primary total time (ms)'
    type: 'display'
  ingest_outperform_primary_percentage:
    label: 'Percentage where the ingest is faster than the primary'
    type: 'positive'
  ingest_outperform_primary_count:
    label: 'Number of times the ingest is faster than the primary'
    type: 'positive'
  primary_outperform_ingest_count:
    label: 'Number of times the primary is faster than the ingest'
    type: 'display'

Configuration details

Section Config Description
Config system_json_path Contains the path to the system JSON config.
This is used to read connection to the primary database
DDL num_schema Determines the number of schemas to be created
table_configuration.num_tables Number of tables under each schema
table_configuration.min_columns / max_columns Min and max number of columns per table.
There are random column types that are created.
The tables always have id, created_at and updated_at columns.
The random types are: TEXT, INT, BIGINT, FLOAT, DOUBLE PRECISION, BOOLEAN, DATE, TIME, VARCHAR(255), CHAR(10), NUMERIC(10,2)
index_configuration.min_indexes / max_indexes Min and max indexes per table
index_configuration.min_columns_per_index / max_columns_per_index Number of columns per index
DML load_configuration.num_inserts Number of inserts per table.
Follows batched_inserts config to determine if inserts are done together or in batches
load_configuration.num_updates Randomly selects 1-3 columns to update.
Uses ORDER BY with random column.
Uses generate_values_list for new values
load_configuration.num_deletes Randomly deletes the number of rows specified.
Uses ORDER BY with random column
load_configuration.allow_nulls.rows When inserts are done, this flag specifies if there can be NULL rows (except the id, created_at, updated_at columns).
Randomly selects 10-15% of the total number of INSERTS
load_configuration.allow_nulls.cols When inserts are done, this flag specifies if there can be NULL values for the columns (except the id, created_at, updated_at columns).
Randomly select 10-15% of the columns
Other batched_inserts Sets if the number of inserts are done in a single transaction or if there are batches of inserts
operations List of operations done.
Possible values: create_table, create_index, insert_data, update_data, delete_data
use_existing_config If set to true, it will look up the existing config file and recreate the same set of tables as before
use_s3 If set to true, the previous run configuration will be fetched from S3.
If not, any run that happened in local before will be copied over to a _prev_run folder and considered as the previous run.
If no previous run folder is present, it will be treated as a fresh run
comparison_threshold Threshold above which the performance is considered to have degraded.
See "Metrics" for more
Files base_dir Base output directory for the current run
prev_run_dir Base output directory for the previous run
output_files Contains the following properties:
• dir (Main directory for the output files)
• query_info (The CSV file containing the details about the queries)
• final_report (The XLSX file containing the final aggregated report data)
• final_traces (The CSV files containing the traces generated after running the load generator script)
• final_aggregates (The CSV files containing the aggregate data like total time taken to be used in the final_report)
meta_files Contains the following properties:
• dir (Main directory for the meta files)
• run_config (The load generator configuration for the current run)
• load_sql (A raw SQL that can be run to redo the steps in the current execution)
• table_columns (A JSON file containing the information about the tables created as part of the current run, needed for existing_config)
• table_columns_csv (CSV file with the information similar to table_columns JSON)
temporary_files Contains the following properties:
• dir (Main directory for the temporary files)
• xid_traces (Traces mapping XID to logs)
• pgxid_traces (Traces mapping PGXID to logs)
• xid_pgxid_mapping (Mapping between XID to PGXID)
• merged_traces (Trace file containing traces with both XID and PGXID mapped)
• pg_xid_summary (Summary file with time mapped from the other temporary files)
Metrics Metrics section containing the different type of configured metrics.
There are 3 types as of now:
1. Negative (If the value goes down from previous run, it's considered good)
2. Positive (If the value goes up from previous run, it's considered good)
3. Display (Mostly used for Primary based metrics which are only used as a display)

Run command

cd <springtail>/python/performance
python3 load_generator.py -c load_config.yaml

Clone this wiki locally