Postgres partitioning as easy as pie
Ruby
Clone or download
Latest commit 2c53b7c Jul 7, 2018

README.md

pgslice

Postgres partitioning as easy as pie. Works great for both new and existing tables, with zero downtime and minimal app changes. Archive older data on a rolling basis to keep your database size under control.

🍊 Battle-tested at Instacart

Build Status

Install

pgslice is a command line tool. To install, run:

gem install pgslice

This will give you the pgslice command.

Steps

  1. Ensure the table you want to partition has been created. We’ll refer to this as <table>.

  2. Specify your database credentials

export PGSLICE_URL=postgres://localhost/myapp_development
  1. Create an intermediate table
pgslice prep <table> <column> <period>

Period can be day or month.

This creates a table named <table>_intermediate with the appropriate trigger for partitioning.

  1. Add partitions
pgslice add_partitions <table> --intermediate --past 3 --future 3

This creates child tables that inherit from the intermediate table.

Use the --past and --future options to control the number of partitions.

  1. Optional, for tables with data - Fill the partitions in batches with data from the original table
pgslice fill <table>

Use the --batch-size and --sleep options to control the speed.

To sync data across different databases, check out pgsync.

  1. Analyze tables
pgslice analyze <table>
  1. Swap the intermediate table with the original table
pgslice swap <table>

The original table is renamed <table>_retired and the intermediate table is renamed <table>.

  1. Fill the rest (rows inserted between the first fill and the swap)
pgslice fill <table> --swapped
  1. Back up the retired table with a tool like pg_dump and drop it
pg_dump -c -Fc -t <table>_retired $PGSLICE_URL > <table>_retired.dump
psql -c "DROP <table>_retired" $PGSLICE_URL

Sample Output

pgslice prints the SQL commands that were executed on the server. To print without executing, use the --dry-run option.

pgslice prep visits created_at month
-- Postgres 10

BEGIN;

CREATE TABLE visits_intermediate (LIKE visits INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING STORAGE INCLUDING COMMENTS) PARTITION BY RANGE (created_at);

COMMENT ON TABLE visits_intermediate is 'column:created_at,period:day';

COMMIT;

-- Postgres 9.6 and below

BEGIN;

CREATE TABLE visits_intermediate (LIKE visits INCLUDING ALL);

CREATE FUNCTION visits_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        RAISE EXCEPTION 'Create partitions first.';
    END;
    $$ LANGUAGE plpgsql;

CREATE TRIGGER visits_insert_trigger
    BEFORE INSERT ON visits_intermediate
    FOR EACH ROW EXECUTE PROCEDURE visits_insert_trigger();

COMMENT ON TRIGGER visits_insert_trigger ON visits_intermediate is 'column:created_at,period:month';

COMMIT;
pgslice add_partitions visits --intermediate --past 1 --future 1
-- Postgres 10

BEGIN;

CREATE TABLE visits_201608 PARTITION OF visits_intermediate FOR VALUES FROM ('2016-08-01') TO ('2016-09-01');

ALTER TABLE visits_201608 ADD PRIMARY KEY (id);

CREATE INDEX ON visits_201608 USING btree (user_id);

CREATE TABLE visits_201609 PARTITION OF visits_intermediate FOR VALUES FROM ('2016-09-01') TO ('2016-10-01');

ALTER TABLE visits_201609 ADD PRIMARY KEY (id);

CREATE INDEX ON visits_201609 USING btree (user_id);

CREATE TABLE visits_201610 PARTITION OF visits_intermediate FOR VALUES FROM ('2016-10-01') TO ('2016-11-01');

ALTER TABLE visits_201610 ADD PRIMARY KEY (id);

CREATE INDEX ON visits_201610 USING btree (user_id);

COMMIT;

-- Postgres 9.6 and below

BEGIN;

CREATE TABLE visits_201608
    (CHECK (created_at >= '2016-08-01'::date AND created_at < '2016-09-01'::date))
    INHERITS (visits_intermediate);

ALTER TABLE visits_201608 ADD PRIMARY KEY (id);

CREATE INDEX ON visits_201608 USING btree (user_id);

CREATE TABLE visits_201609
    (CHECK (created_at >= '2016-09-01'::date AND created_at < '2016-10-01'::date))
    INHERITS (visits_intermediate);

ALTER TABLE visits_201609 ADD PRIMARY KEY (id);

CREATE INDEX ON visits_201609 USING btree (user_id);

CREATE TABLE visits_201610
    (CHECK (created_at >= '2016-10-01'::date AND created_at < '2016-11-01'::date))
    INHERITS (visits_intermediate);

ALTER TABLE visits_201610 ADD PRIMARY KEY (id);

CREATE INDEX ON visits_201610 USING btree (user_id);

CREATE OR REPLACE FUNCTION visits_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        IF (NEW.created_at >= '2016-09-01'::date AND NEW.created_at < '2016-10-01'::date) THEN
            INSERT INTO visits_201609 VALUES (NEW.*);
        ELSIF (NEW.created_at >= '2016-10-01'::date AND NEW.created_at < '2016-11-01'::date) THEN
            INSERT INTO visits_201610 VALUES (NEW.*);
        ELSIF (NEW.created_at >= '2016-08-01'::date AND NEW.created_at < '2016-09-01'::date) THEN
            INSERT INTO visits_201608 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'Date out of range. Ensure partitions are created.';
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

COMMIT;
pgslice fill visits
/* 1 of 3 */
INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at")
    SELECT "id", "user_id", "ip", "created_at" FROM visits
    WHERE id > 0 AND id <= 10000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date

/* 2 of 3 */
INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at")
    SELECT "id", "user_id", "ip", "created_at" FROM visits
    WHERE id > 10000 AND id <= 20000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date

/* 3 of 3 */
INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at")
    SELECT "id", "user_id", "ip", "created_at" FROM visits
    WHERE id > 20000 AND id <= 30000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
pgslice analyze visits
ANALYZE VERBOSE visits_201608;

ANALYZE VERBOSE visits_201609;

ANALYZE VERBOSE visits_201610;

ANALYZE VERBOSE visits_intermediate;
pgslice swap visits
BEGIN;

SET LOCAL lock_timeout = '5s';

ALTER TABLE visits RENAME TO visits_retired;

ALTER TABLE visits_intermediate RENAME TO visits;

ALTER SEQUENCE visits_id_seq OWNED BY visits.id;

COMMIT;

Adding Partitions

To add partitions, use:

pgslice add_partitions <table> --future 3

Add this as a cron job to create a new partition each day or month.

# day
0 0 * * * pgslice add_partitions <table> --future 3 --url ...

# month
0 0 1 * * pgslice add_partitions <table> --future 3 --url ...

Add a monitor to ensure partitions are being created.

SELECT 1 FROM
    pg_catalog.pg_class c
INNER JOIN
    pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE
    c.relkind = 'r' AND
    n.nspname = 'public' AND
    c.relname = '<table>_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD')
    -- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM')

Archiving Partitions

Back up and drop older partitions each day or month.

pg_dump -c -Fc -t <table>_201609 $PGSLICE_URL > <table>_201609.dump
psql -c "DROP <table>_201609" $PGSLICE_URL

If you use Amazon S3 for backups, s3cmd is a nice tool.

s3cmd put <table>_201609.dump s3://<s3-bucket>/<table>_201609.dump

Additional Commands

To undo prep (which will delete partitions), use:

pgslice unprep <table>

To undo swap, use:

pgslice unswap <table>

App Considerations

This set up allows you to read and write with the original table name with no knowledge it’s partitioned. However, there are a few things to be aware of.

Writes

Before Postgres 10, if you use INSERT statements with a RETURNING clause (as frameworks like Rails do), you’ll no longer receive the id of the newly inserted record(s) back. If you need this, you can either:

  1. Insert directly into the partition
  2. Get value before the insert with SELECT nextval('sequence_name') (for multiple rows, append FROM generate_series(1, n))

Reads

When possible, queries should include the column you partition on to limit the number of partitions the database needs to check. For instance, if you partition on created_at, try to include it in queries:

SELECT * FROM
    visits
WHERE
    user_id = 123 AND
    -- for performance
    created_at >= '2016-09-01' AND created_at < '2016-09-02'

For this to be effective, ensure constraint_exclusion is set to partition (default value) or on.

SHOW constraint_exclusion;

One Off Tasks

You can also use pgslice to reduce the size of a table without partitioning by creating a new table, filling it with a subset of records, and swapping it in.

pgslice prep <table> --no-partition
pgslice fill <table> --where "id > 1000" # use any conditions
pgslice swap <table>

Schema Updates

Once a table is partitioned, here’s how to change the schema:

  • To add, remove, or modify a column, make the update on the master table only
  • To add or remove an index, make the update on the master table and all partitions (for Postgres 11, make the update on the master table only)

Declarative Partitioning

Postgres 10 introduces declarative partitioning. A major benefit is INSERT statements with a RETURNING clause work as expected. If you prefer to use trigger-based partitioning instead (not recommended), pass the --trigger-based option to the prep command.

Upgrading

Run:

gem install pgslice

To use master, run:

gem install specific_install
gem specific_install https://github.com/ankane/pgslice.git

Docker

docker build -t pgslice .
alias pgslice="docker run --rm -e PGSLICE_URL pgslice"

This will give you the pgslice command.

Reference

Related Projects

Also check out:

  • PgHero - A performance dashboard for Postgres
  • pgsync - Sync Postgres data to your local machine

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To run tests, do:

git clone https://github.com/ankane/pgslice.git
cd pgslice
bundle install
createdb pgslice_test
bundle exec rake

To test against different versions of Postgres with Docker, use:

docker run -p=8000:5432 postgres:10
TZ=Etc/UTC PGSLICE_URL=postgres://postgres@localhost:8000/postgres bundle exec rake

On Mac, you must use Docker for Mac for the port mapping to localhost to work.