Postgres extension to sample statements or transactions to logs
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
debian
expected Fix missing utility statements and change formating Feb 2, 2019
sql
.gitignore Debian packaging Feb 10, 2019
CHANGELOG.md
LICENSE first commit Jan 27, 2019
META.json
Makefile
README.md
pg_sampletolog.c
pg_sampletolog.control

README.md

pg_sampletolog

pg_sampletolog is a PostgreSQL extension which allows to sample statements and/or transactions to logs. It add in PostgreSQL (from 9.4 to 11) same kind of statement sampling added in PostgreSQL 12 (currently not released).

pg_sampletolog allows to:

  • Log a sample of statements
  • Log a sample of transactions
  • Log before or after execution (in order to be compatible with pgreplay)
  • Log all DDL or MOD statements, same as log_statement
  • Log statement's queryid if pg_stat_statements is installed

As it is a recent release, do not use in production. But feedbacks are welcome ;)

Installation

Compiling

The module can be built using the standard PGXS infrastructure. For this to work, the pg_config program must be available in your $PATH. Instruction to install follows:

git clone https://github.com/anayrat/pg_sampletolog.git
cd pg_sampletolog
make
make install

Postgres setup

Extension must me loaded either:

  • In local session with LOAD 'pg_sampletolog'; order.
  • In session_preload_libraries. To be loaded at connection start.

New settings should be visible in pg_settings view:

select name from pg_settings where name like 'pg_sampletolog%';
                 name                 
----------------------------------------
 pg_sampletolog.disable_log_duration
 pg_sampletolog.log_before_execution
 pg_sampletolog.log_level
 pg_sampletolog.log_statement
 pg_sampletolog.statement_sample_rate
 pg_sampletolog.transaction_sample_rate
(6 rows)

Usage

pg_sampletolog is controlled by specifing GUC settings. So you can apply them in your postgresql.conf or change them inside the session with SET order.

By default, pg_sampletolog is disabled, you have to change theses settings (default values in parenthesis):

  • pg_sampletolog.log_level (LOG): Control log severity level. Refer to PostgreSQL documentation for their meaning : Message Severity Levels
  • pg_sampletolog.statement_sample_rate (0): Control fraction of statements to report in log.
  • pg_sampletolog.transaction_sample_rate (0): Control fraction of transactions to report in log.
  • pg_sampletolog.log_statement none): Allow to log all DDL and/or MOD statements.
  • pg_sampletolog.log_before_execution (false): Control to log statements before or after execution.
  • pg_sampletolog.disable_log_duration (false): Disable duration reporting (mainly used for tests).

Examples:

  • Log only 10% of statements: pg_sampletolog.statement_sample_rate = 0.1

pg_sampelog will log arround 10% of statements. It will do computation for each statement by calling random() function whose cost is negligible. Message output will look like (depending of your log_line_prefix) :

2019-01-27 12:50:39.361 CET [27047] LOG:  Sampled query duration: 0.014 ms - SELECT 1;
  • Log only 10 of transactions: pg_sampletolog.transaction_sample_rate = 0.1

Same as previous, with the difference that, when a transaction starts, pg_sampletolog will choose if the transaction will be logged. When a transaction is logged, all statements of the transaction are logged.

Message output will look like (depending of your log_line_prefix) :

Transaction : BEGIN; SELECT 1; SELECT 1; COMMIT;

2019-01-27 12:51:40.562 CET [27069] LOG:  Sampled transaction duration: 0.008 ms - SELECT 1;
2019-01-27 12:51:40.562 CET [27069] LOG:  Sampled transaction duration: 0.005 ms - SELECT 1;
  • Log all DDL statements: pg_sampletolog.log_statement = 'ddl':

pg_sampletolog will log all DDL statements, it is similar to log_statement = ddl parameter in Postgres. It could be useful if you want a sample of read queries but catch all ddl changes.

2019-01-27 12:53:47.564 CET [27103] LOG:  Sampled ddl CREATE TABLE t1(c1 int);
  • Log all data-modifying statements: pg_sampletolog.log_statement = 'mod':

pg_sampletolog will log all data-modifying statements (including ddl), it is similar to log_statement = mod parameter in Postgres. It could be useful if you want a sample of read queries but catch all write changes.

2019-01-27 12:59:54.043 CET [27160] LOG:  Sampled query duration: 0.246 ms - INSERT INTO t1 VALUES(1);
2019-01-27 13:00:16.468 CET [27160] LOG:  Sampled ddl CREATE INDEX ON t1(c1);
  • Log before execution: pg_sampletolog.log_before_execution = on

By default, pg_sampletolog, will log after query is executed. It could be useful if you want to use results with pgreplay.

pg_sampletolog works on primary, but also on standby. If you call an SQL function containing several queries, pg_sampletolog will only log function call.

With a simple test with pgbench and log_min_duration = 0:

pgbench -c8 -j2 -t 30000 -P5 bench
starting vacuum...end.
progress: 5.0 s, 10618.2 tps, lat 0.750 ms stddev 0.170
progress: 10.0 s, 10690.9 tps, lat 0.746 ms stddev 0.168
progress: 15.0 s, 10596.4 tps, lat 0.752 ms stddev 0.149
progress: 20.0 s, 9547.6 tps, lat 0.836 ms stddev 0.176
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 2
number of transactions per client: 30000
number of transactions actually processed: 240000/240000
latency average = 0.774 ms
latency stddev = 0.175 ms
tps = 9836.617690 (including connections establishing)
tps = 9837.334350 (excluding connections establishing)
pgbench -c8 -j2 -t 30000 -P5 bench  6.96s user 12.32s system 78% cpu 24.448 total


wc -l logfile
1680009

Same test with pg_sampletolog.statement_sample_rate = 0.1

wc -l logfile
119859

We got less than 10% lines of first test because pg_sampletolog don't log BEGIN or COMMIT orders.

Bonus if you have pg_stat_statements installed, pg_sampletolog will report queryid:

LOG:  Sampled query duration: 0.032 ms - queryid = 5892081150081336634 - UPDATE pgbench_tellers SET tbalance = tbalance + -3337 WHERE tid = 326;

It is useful to get query's parameters from a query you identified in pg_stat_statements' view.

Limitations

No supported:

  • PREPARE statements are not supported.

Testing

Tests are in sql/pg_sampletolog.sql and results in expected/pg_sampletolog.out. Use make installcheck to run them, you should see:

/tmp/pgbuild/lib/postgresql/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=./ --bindir='/tmp/pgbuild/bin'    --dbname=contrib_regression pg_sampletolog
(using postmaster on Unix socket, port 5432)
============== dropping database "contrib_regression" ==============
DROP DATABASE
============== creating database "contrib_regression" ==============
CREATE DATABASE
ALTER DATABASE
============== running regression test queries        ==============
test pg_sampletolog               ... ok

=====================
 All 1 tests passed. 
=====================

Misc

Regex to remove prefix added by pg_sampletolog and replace it by statement: (same prefix as log_statement):

sed -r 's/Sampled (query|ddl|transaction) - (Duration: [0-9]*\.[0-9]* ms)?( - )?(queryid = -?[0-9]*)?( - )?/statement: /g' inputlog.csv > output.log

TODO

  • Add extension to PGXN
  • Add debian packages
  • Add centos packges
  • Log statements by their queryid

Author

Adrien NAYRAT

Licence

pg_sampletolog is free software distributed under the PostgreSQL license.

Copyright (c) 2019, Adrien NAYRAT