Skip to content

aws-samples/amazon-rds-postgresql-bulk-loading-blog

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Optimized bulk loading in Amazon RDS for PostgreSQL

The artifacts in this repository support the published blog: Optimized bulk loading in Amazon RDS for PostgreSQL. Refer to the blog for detailed instructions on setup and configuration for benchmarking.

Setting up your testbench

To replicate the results presented in the blog you will need to complete the following steps:

  • Create an EC2 instance to serve as a bastion host from which to execute psql commands.
  • Create a multi-AZ Amazon RDS for PostgreSQL database instance with IAM authentication enabled.
  • Follow this guide to create an IAM policy, an IAM role, and attach the role to Amazon RDS for PostgreSQL instance.
  • Follow this guide to install the aws_s3 extension, create an IAM role and policy that provides Amazon S3 permissions to execute the s3import function.
  • Setup a pgpass file with the proper credentials to allow the bash scripts to execute against the database instance.
  • Create the imdb database, staging schema, and imdb tables using the provided DDL scripts.
  • Download and stage the IMDb public dataset in an Amazon S3 bucket.
  • Git clone this repo to the bastion host.

Optimized bulk load configuration

The folloiwng cli commands and json are provided in the cli folder.

1. Setup optimized database parameter group

First create a database parameter group and set parameters that are optimized for bulk loading.

aws rds create-db-parameter-group \
   --db-parameter-group-name rds-postgres14-bulkload \
   --db-parameter-group-family postgres14 \
   --description "Optimized database parameters for bulk loading into Amazon RDS for PostgreSQL"

Create json file rds-postgresql14-bulkload.json with optimized parameters.

{
    "DBParameterGroupName": "rds-postgresql14-bulkload",
    "Parameters": [
		{
			"ParameterName": "maintenance_work_mem",
			"ParameterValue": "1048576",
			"ApplyMethod": "immediate"
		},
		{
			"ParameterName": "max_wal_size",
			"ParameterValue": "4096",
			"ApplyMethod": "immediate"
		},
		{
			"ParameterName": "checkpoint_timeout",
			"ParameterValue": "1800",
			"ApplyMethod": "immediate"
		}
	]
}

Modify the json file with parameter values to the optimized database parameter group.

aws rds modify-db-parameter-group \
   --db-parameter-group-name rds-postgres14-bulkload \
   --cli-input-json file://rds-postgresql14-bulkload.json

2. Apply optimized configurations prior to bulk loading

Apply database parameter group optimized for bulk loading

aws rds modify-db-instance \
   --db-instance-identifier [DB_INSTANCE_IDENTIFIER] \
   --db-parameter-group rds-postgres14-bulkload

3. Return to normal configuration after bulk loading completes

Apply normal database parameter group

aws rds modify-db-instance \
   --db-instance-identifier [DB_INSTANCE_IDENTIFIER] \
   --db-parameter-group default.postgres14

Performing bulk loading

The bash scripts provided execute psql commands against the Amazon RDS for PostgreSQL database. The script named copy_bulk_sequential.sh executes each psql command after the previous one finishes. The script named copy_bulk_parallel.sh executes all commands in parallel without waiting for the previous commands to complete. There are also a "pre" and "post" bulk loading scripts. The pre_bulkloading.sh script disables autovacuum at the table level. The post_bulkloading.sh script executes a checkpoint command, runs vacuum analyze on each of the target tables, then reenables autovacuum on each table.

Make scripts executable

cd psql
chmod +x pre_bulkloading.sh
chmod +x post_bulkloading.sh
chmod +x copy_bulk_sequential.sh
chmod +x copy_bulk_parallel.sh

Executing bulk load sequentially and report loading time

./pre_bulkloading.sh
time ./copy_bulk_sequential.sh
./post_bulkloading.sh

Executing bulk load in parallel and report loading time

./pre_bulkloading.sh
time ./copy_bulk_parallel.sh
./post_bulkloading.sh

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

About

No description, website, or topics provided.

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published