Skip to content

A Go client implementing some of TPC-H's benchmark queries for Postgres. Both raw queries and queries with incremental views (pg_ivm) are supported.

License

Notifications You must be signed in to change notification settings

AndreRijo/postgres-tpch-client

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Disclaimer: This client was designed with the sole purpose of comparing PostgreSQL's performance with PotionDB when executing a subset of TPC-H's queries. As such, it does not implement the complete set of TPC-H queries, and it may be difficult to use this client for purposes other than benchmarking with TPC-H. If you need a general purpose client for Postgres, please consider other alternatives.

PostgreSQL TPC-H Go Client

A Go client for PostgreSQL that implements a part of TPC-H's benchmark. It is assumed the reader is familiar with TPC-H's specification. Some of the code (namely table representations) is shared with PotionDB's TPC-H Client.

This client relies on the Bun library for communcation with PostgreSQL. The Pg_ivm library is used on both client and server side in order to support incrementally maintained materialized views.

Index

Features

  • Full support of TPC-H's dataset: all tables have internal representations as Go's structs;
  • Full support of dataset updates, as defined in the TPC-H's specification;
  • Support with and without materialized views for the following TPC-H queries: 3, 5, 11, 14, 15 and 18.
  • Support for loading TPC-H's dataset and views into PostgreSQL;
  • Customizable client: batching, raw/view queries, % of updates, etc.

TPC-H Dataset generation

Note: If you have already used PotionDB's TPC-H Client, you can re-use the TPC-H dataset generated for that client, and skip to the next section.

In order to benchmark PostgreSQL with TPC-H, it is first necessary to generate a dataset. This can be summarized into the following steps:

  1. Download TPC-H tools and generate the dataset;
  2. (Optionally) Modify the dataset to give locality to the data (For the motivation and explanation on this, please check here).

Note: For VLDB reviewers, step 2 is obligatory in order to reproduce the results reported in the paper.

TPC-H tool and dataset generation

First, download the TPC-H tools here. This will be necessary to generate the TPC-H dataset.

Afterwards, you will need to build the tool. Refer to the README file included with the tool for instructions, as well as the TPC-H specification.

For the parameters of the makefile, use the following:

DATABASE=ORACLE
WORKLOAD=TPCH

To generate the dataset and the set of updates to be used, run, respectively:

dbgen -s 1
dbgen -s 1 -u 1000

Note: If testing PostgreSQL with modest hardware, you may want to test with a lower -s (for example, 0.1). While the tool works for values under 1, said values are not officially supported by TPC-H.

Finally, organize the dataset and updates using the following folder structure:

tpch_data\
    tables\$sSF\
    upds\$sSF\
    headers\

Where $scorresponds to the value used for -s when generating the dataset. For example, for -s 1:

tpch_data\
    tables\1SF\
    upds\1SF\
    headers\

Put all generated tables and updates, respectively, under tables\$sSF and upds\$sSF. Fill the headers folder with the contents of the tpch_headers folder in this repository.

Modifying the dataset for locality

Use the TPC-H Locality tool that we have made, available here.

After the new dataset and updates are generated by the TPC-H Locality tool, replace the contents of the folders tables and upds with the new data. The new data will be inside a folder named _mod that is inside both tables and upds.

PostgreSQL setup

Installing PostgreSQL and pg_ivm extension

This section explains how to setup PostgreSQL with pg_ivm extension for running TPC-H benchmarks with this client. For other purposes, you can either refer to the official documentation for an installer, build from source or use the official PostgreSQL docker image.

This tutorial assumes a linux-based operating system. It also assumes that you have sudo access to the system. We cover two options: setting up PostgreSQL in a RAMDisk, and a standard instalation of PostgreSQL. Both options will include the installation of pg_ivm extension and setting up key settings for PostgreSQL.

VLDB reviewers must use the RAMDisk installation in order to be able to replicate the results published in our paper.

Option 1: Setting up PostgreSQL in a RAMDisk

First, prepare a folder to hold all the required files. We'll refer to such folder as the baseFolder.

Then, download PostgreSQL v16.2's source code here and pg_ivm 1.8 here. Extract them into folders named, respectively, Postgresql/postgresql-16.2 and pg_ivm-1.8. Afterwards, download the scripts in the scriptsPostgreSQL folder of this repository.

Run the following command to set up a RAMDisk and install Postgres on it:

scriptsPostgreSQL/postgres_install_ramdisk.sh $base_folder $ramdisk_folder $ramdisk_size $username

Where:

  • $base_folder is the folder above the Postgresql folder;
  • $ramdisk_folder is the desired mount point for the RAMDisk;
  • $ramdisk_size is the desired size for the RAMDisk, in the format of number+suffix. For example, 64G stands for 64GB;
  • $username stands for the name of the current user of the system. PostgreSQL will use this username to ensure it can write to the RAMDisk.

Example:

scriptsPostgreSQL/postgres_install_ramdisk.sh . /media/ramdisk 64G myuser

After running the command above, PostgreSQL will be running and listening to port 5432.

Note for VLDB reviewers: use 64G as $ramdisk_size in order to more accurately replicate our test environment.

Option 2: Setting up PostgreSQL in a physical disk.

Proceed as in Option 1, up until before running the script.

Then, run the following script to install PostgreSQL and pg_ivm:

scriptsPostgreSQL/postgres_install.sh $base_folder $destination_folder $username

Where:

  • $base_folder is the folder above the Postgresql folder;
  • $destination_folder is the desired instalation location for PostgreSQL (e.g., /usr/local). A new folder will be creased inside destination_folder;
  • $username stands for the name of the current user of the system. PostgreSQL will use this username to ensure it can write in $destination_folder.

Example:

scriptsPostgreSQL/postgres_install.sh . /usr/local myuser

After running the command above, PostgreSQL will be running and listening to port 5432.

Preparing PostgreSQL for TPC-H

To setup the initial data for the TPC-H benchmark, run the following command:

scriptsPostgreSQL/setupDB.sh $psql_location $script_folder $table_folder

Where:

  • $psql_location is the location of the psql executable inside your Postgres instalation. By default, this executable is in ramdisk_folder/pgsql/bin;
  • $script_folder is the location for the scriptsPostgreSQL folder. The folder should be included in the path;
  • $table_folder is the location of the TPC-H tables. By default, this should point to tpch_data/tables/1SF. Adjust the SF to the actual value used for SF during the TPC-H dataset generation.

The command above can also be used to reset PostgreSQL back to its initial state for TPC-H, as it drops any existing TPC-H tables, views and indexes before loading the base data.

Note for VLDB reviewers: use 1SF for SF and do not change any setting in postgres_install_ramdisk.sh, in order to more accurately replicate our test environment.

Extra helpful commands

PostgreSQL can be stopped with the following command:

/media/ramdisk/pgsql/bin/pg_ctl stop -D /media/ramdisk/pgsql/data

To erase the RAMDisk, run the command above followed by:

sudo umount /media/ramdisk

If you want to change some parameters of PostgreSQL, you can either use the psql tool included with PostgreSQL (at /media/ramdisk/pgsql/bin/psql) or edit the postgres_install_ramdisk.sh script.

PostgreSQL TPC-H Go Client setup

Option 1: Using docker and pre-compiled image (easiest solution)

Start by installing Docker: https://docs.docker.com/engine/install/

To obtain the pre-compiled Docker image of PostgreSQL TPC-H Go Client, do:

docker pull andrerj/postgresclient

Option 2: Using Docker and compile from source

First, prepare a folder to hold all the required files. We'll refer to such folder as the baseFolder.

Afterwards, go inside the baseFolder and clone the following repositories:

git clone https://github.com/AndreRijo/potionDB.git potionDB
git clone https://github.com/AndreRijo/tpch-data-processor.git tpch_data_processor
git clone https://github.com/AndreRijo/postgres-tpch-go-lib.git postgresTpchGoLib
git clone https://github.com/AndreRijo/postgres-tpch-client postgresTPCHClient

Make sure you have Docker installed in your system. If not, please check here for instructions on installing Docker.

Afterwards, from the baseFolder, build the Docker image:

docker build -f postgresTPCHClient/Dockerfile . -t postgresclient

In the Executing the client with Docker section, for all commands replace andrerj/postgresclient with postgresclient.

Option 3: Using Go and running from source, without Docker

Proceed as in Option 2 up to (exclusive) the instalation of Docker.

This client should work with any version of Go with support for modules and generics. However, we recommend using Go 1.20.4. or newer.

To check if you have Go installed and what is its version, run the following command:

go version

For installing or upgrading your Go instalation, please check here.

Afterwards, inside baseFolder/postgresTPCHClient/src, run:

go run ./main/main.go --data_folder=$path_to_data --config=$path_to_config --test_name=$path_to-results

Where:

  • $path_to_data is the path to the folder tpch_data
  • $path_to_config is the path to the folder with the configuration file intended to be used;
  • path_to_results is the path to the folder where you want to store the results of the benchmarking.

In the three parameters above, both relative and full paths are acceptable.

For details on how to configure the client, please check Client configuration section.

Executing the client with Docker

(For executing the client without Docker, please check the previous subsection)

Start the client with the following command:

docker run -it --name postgrestpch1 -v "$your_path_to_data/tpch_data/:/go/data" -v "$your_path_to_configs/:/go/configs/extern/" -v "your_path_to_results/:/go/results" -e CONFIG="/go/configs/extern/$YOUR_CONFIG_FOLDER/" andrerj/postgresclient

Where:

  • $your_path_to_data is the full path to the folder tpch_data created earlier;
  • your_path_to_configsis the path to the folder with your own/modified configuration files. If using one of the default ones, this -v can be removed and you should set CONFIG=/go/configs/path_to_the_choosen_config/;
  • your_config_folder the relative (using as base $your_path_to_configs) path to the folder with the intended configuration file. For example, if your config.cfg is at folder /home/yourname/myconfigs/mytpch, $your_path_to_data would be home/yourname/myconfigs and your_config_folderwould be mytpch;
  • your_path_to_results the full path to the folder where you want to store the statistics of executing the TPC-H benchmark.

Note: Depending on your operating system and Docker installation, you may need to set up your Docker to allow sharing of folders.

An example of usage with a custom configuration file:

docker run -it --name postgrestpch1 -v "home/myname/tpch_data:/go/data/" -v "$home/myname/myPostgresTpchClientConfigs/:/go/configs/extern/" -v "home/myname/postgresTpch_results/:/go/results/" -e CONFIG="/go/configs/extern/myconfig/" andrerj/postgresclient

Or with a pre-made configuration file:

docker run -it --name postgrestpch1 -v "home/myname/tpch_data:/go/data/" -v "home/myname/postgresTpch_results/:/go/results/" -e CONFIG="/go/configs/default/1SF/ALL/" andrerj/tpchclient

You can use the command above if you're testing PostgreSQL in your local machine. If PostgreSQL is running on a different machine from the client, you can adapt configs/cluster/1SF/ALL/config.cfg.

The client will start by setting up the views in PostgreSQL. After the views are ready, a pre-defined number of clients will start querying the server. Please check Client configuration on how to configure the client's options.

Client configuration

A full list of all options for configuration files is available on pgConfigs.go, method loadConfigsFile(). It is also possible to set most of those settings by using Docker's environment variables: check pgConfigs.go, method loadFlags() and start.sh

Always use one of the existing configuration files as a base: the configurations in configs/cluster and configs/default are good starting points. You will likely want to change the settings named ip, queryClients, queryWait, queryDuration, nReadsTxn and updRate.

Some of the most relevant, available settings:

  • ip: ip:port of PostgreSQL server;
  • user: user used to access PostgreSQL. Can be either the user specified in PostgreSQL installation or another user added to PostgreSQL;
  • scale: corresponds to the value used for -s when the dataset was generated with dbgen;
  • doDataLoad (true/false): defines if the client should do the initial loading of the database or not. It is preferable for PostgreSQL to load the data itself as loading through the client is very slow;
  • doViewload and doIndexload (true/false on both): defines if the client should, respectively, load the views or indexes to PostgreSQL. Note that queries are very slow without the views. When doViewload is true, the value of doIndexload is irrelevant (as then the indexes no longer help);
  • doQueries and doUpdates (true/false on both): defines if the client should do queries and/or updates. It is recommended to leave these both as true and control the update-query rate with updRate;
  • updRate (0-1): defines the percentage of operations that should be updates. Setting to 0 or 1 corresponds to only doing, respectively, queries or updates;
  • startUpdFile and finishUpdFile: if running multiple instances of the client, you will need to adjust this to ensure each instance has a non-overlaping subset of update files;
  • queryClients: number of connections to use to execute queries and/or updates. You will want to vary this in order to grasp how PostgreSQL scales;
  • queryWait (ms): time for the clients to wait before starting to execute the TPC-H benchmark. The client will subtract from the waiting time any time it took preparing itself and/or doing the initial loading of the database. This is useful for coordinating multiple instances of the client. This time will, unavoidably, depend on your system and scale value: try first running a single client instance only to measure the time it takes for the client and PostgreSQL to be ready. Remember that clients read TPC-H data in order to do updates;
  • queryDuration (ms): the time for which TPC-H's benchmark is run for, after queryWait ends;
  • id: the name of the client. When running multiple client instances, it is important to set a different id for each client, in order to avoid conflicts when statistics files are produced in a shared folder environment;
  • statsLocation: the location to save the statistics files produced by the client.
  • statisticsInterval (ms): time between which statistics are recorded. For most use cases it is fine to leave this unchanged (5000ms).
  • tpchAddRate (0-1): when executing updates, defines the percentage of updates that are new orders, compared to deletion of orders. TPC-H's specification suggests to set this to 0.5, however in a real-world scenario for an e-commerce application this is heavily skewed towards 1 (mostly new with very rare deletions).
  • nReadsTxn: number of operations (either read or updates) to execute per transaction. A higher value may lead to a higher throughput, at the sacrifice of possibly higher latency per transaction.

About

A Go client implementing some of TPC-H's benchmark queries for Postgres. Both raw queries and queries with incremental views (pg_ivm) are supported.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published