Skip to content

Babelfish Dump and Restore

suprio-amzn edited this page Mar 3, 2024 · 2 revisions

Babelfish Dump and Restore

You can use bbf_dump or bbf_dumpall utilities to perform dump on a Babelfish for PostgreSQL database (babelfish_db) or single Babelfish T-SQL database. You can restore these dumped database(s) on a target instance.

Topics

Prerequisites

Ensure that you have the following prerequisites to perform dump and restore on a Babelfish instance from an EC2 Linux client:

  • Launch Amazon EC2 with an Amazon Linux 2023(AL2023) using the Amazon EC2 console. For more information, see Amazon Linux 2023.
  • Install psql and pg_restore clients for PostgreSQL. The following example shows you how to install PostgreSQL from Amazon Linux 2023:
sudo yum install -y postgresql

The following example query shows how to verify the installation:

psql --version
pg_restore --version

You need to verify the installation in order to restore a Babelfish database or an instance from the dump.

  • Install bbf_dump and bbf_dumpall client utilities for Babelfish using the following command:
sudo yum install -y BabelfishDump

The following example query shows how to verify the installation of the utilities:

bbf_dump --version
bbf_dumpall --version
  • The source database is where you perform the dump and the target database is where you restore the dumped database. Make sure that these databases are reachable from your Linux machine.

Limitations and requirements

  • Tables having SQL_VARIANT columns is always dumped with INSERT commands even if you use COPY option. This slows down the restoration process of such tables but will make sure that SQL_VARIANT column metadata is preserved after restore. You might lose some metadata with COPY option.
  • Cross version dump and restore isn't supported yet. For example, the dump created on a Babelfish database with a PostgreSQL version must only be restored to a database with the same PostgreSQL version. An explicit error will be thrown if you try to perform dump/restore on a cross version.
  • A logical database dump created from Babelfish database having single-db or multi-db migration mode can only be restored onto a Babelfish database having migration mode as single-db and multi-db respectively. An explicit error will be thrown if you try to perform it the other way round.
  • A Babelfish logical database when dumped can only be restored with the same name. In other words, a logical database dump can't be restored with a different name.
  • SSMS integration isn't supported with bbf_dump/restore.
  • You must use bbf_dumpall and bbf_dump commands together. Both the dumpfiles generated by these utilities are used for restore. This is because the Babelfish catalog data gets dumped with bbf_dump. The restore functionality is affected and might leave the database in an inconsistent state, when you use only the script generated either by bbf_dumpall or by bbf_dump.
  • To dump a single Babelfish T-SQL Database, you must use --bbf-database-name command-line option in both bbf_dumpall and bbf_dump commands. To dump the whole PostgreSQL database, --bbf-database-name option is not needed with any of the commands. Providing this option with one of the bbf_dumpall or bbf_dump command can generate incorrect dump of the database.
  • You must provide --single-transaction option to restore a Babelfish database using the utilities psql or pg_restore. Any error would result in complete rollback of the restore or the database can end up in an inconsistent state due to a failure.
  • Failing to comply with the following can result in restore getting rolled back, when --single-transaction option is used. Or, the database might become inconsistent when --single-transaction option is not used in case of a conflict.
    • When you restore the entire Babelfish PostgreSQL database, make sure that the target PostgreSQL database babelfish_db exist. This should be a new database with no existing user data or any user T-SQL databases.
    • When you restore a single Babelfish T-SQL database, make sure that there is no Babelfish T-SQL database with the same name on the target server.

Use the following bbf_dumpall command to dump the roles of the Babelfish database.

bbf_dumpall --database babelfish_db \
            --host= host --port= port \
            --username babelfish_user \
            --bbf-database-name= bbf_dbname \
            --roles-only \
            --quote-all-identifiers \
            --verbose \
            --no-role-passwords \
            -f bbf_dump_roles.sql

The usage of the command-line options is as follows:

  • --host=host or -h file – The host name of the machine on which the database server is running.
  • --port=port or -p port – The TCP port or local Unix domain socketfile extension on which the server is listening for connections.
  • --database=babelfish_db – The name of Babelfish PostgreSQL database babelfish_db to connect to for dumping roles.
  • (optional) --bbf-database-name=bbf_dbname – The name of the Babelfish T-SQL database to selectively dump T-SQL logins associated with that specific T-SQL database. If you don't specify this, the command will dump all the logins, users and roles of the entire Babelfish for PostgreSQL database.

Note

The database name under --database or --dbname command-line option is the PostgreSQL database that bbf_dump will connect to in order to dump the Babelfish PostgreSQL database specified using this flag.

  • --username – The master user name of the Babelfish for PostgreSQL instance with superuser (rds_superuser in case of Aurora PostgreSQL instance) privileges.
  • --roles-only – Dump only roles, no databases or tablespaces.
  • --quote-all-identifiers – This forces all identifiers to be quoted. We recommend using this option as bbf_dump by default quotes only identifiers that are reserved words in its own major version. This sometimes results in compatibility issues when dealing with servers of other versions that may have slightly different sets of reserved words.
  • (optional) --verbose – The verbose mode that causes bbf_dump to output detailed object comments and start or stop times to the dumpfile, and progress messages to standard error.
  • --no-role-passwords – This doesn't dump passwords for roles. When restored, roles will have a null password, and password authentication will always fail until the password is set.
  • -f file or --file=file – Sends output to the specifiedfile. If not specified, the standard output is used.

Use the following bbf_dump command to dump the contents of one or more Babelfish database(s).

bbf_dump --dbname=babelfish_db
        --host= host \
        --port= port \
        --username babelfish_user \
        --bbf-database-name= bbf_dbname \
        --quote-all-identifiers \
        --verbose --file= bbf_dump_file_name \
        --format=<format(plain/custom/directory/tar)> \

Note

You can independently choose bbf_dump_file_name

  • --dbname=dbname or -d dbname – The name of the PostgreSQL database babelfish_db to connect to.
  • (optional) --column-inserts – Dump data as INSERT commands with explicit column names (INSERT INTO table (column, ...) VALUES ...). Although this will make restoration very slow, but any error during restoring will cause only rows that are part of the problematic INSERT to be lost, rather than the entire table contents.
  • (optional) --format=format – The format of the output can be one of the following:
    • plain – Output a plain-text SQL scriptfile (the default).
    • custom – Output a custom-format archive suitable for input into pg_restore. This is most flexible output format in that it allows manual selection and reordering of archived items during restore.
    • directory – Output a directory-format archive suitable for input into pg_restore. This will create a directory with one file for each table and blob being dumped.
    • tar – Output a tar-format archive suitable for input into pg_restore. The tar format is compatible with the directory format.

Note

For more information on these formats, see pg_dump.

  • (optional) --bbf-database-name=bbf_dbname – The name of the Babelfish T-SQL database for which we require a dump. If not specified, the commands will dump whole Babelfish PostgreSQL database containing all the Babelfish T-SQL databases.

Make sure that the target instance is a Babelfish for PostgreSQL database.

psql --host= dest_host \
    --port= dest_port \
    -d babelfish_db \
    -U babelfish_user \
    --single-transaction \
    -f bbf_dump_roles.sql

Now depending upon the type of archive file (dump file) created by bbf_dump, we can restore the data dumped by bbf_dump as follows:

  • If the archive file format is plain-text SQL script, then restore using psql utility:
psql --host= host \
    --port= port \
    -d babelfish_db \
    -U babelfish_user \
    --single-transaction \
    -f bbf_dump_file_name
  • Otherwise, if the archive file format is custom, directory or tar, then restore using pg_restore utility. For more information on pg_restore and its options, see pgrestore.
pg_restore  --host= host \
            --port= port \
            -d babelfish_db \
            -U babelfish_user \
            --single-transaction \
            -f bbf_dump_file_name

bbf_dump_file_name is the dumpfile generated by bbf_dump command.

--format=format is an optional argument provided to pg_restore command to tell the utility about the format of the dumpfile. It is not necessary to specify the format, since pg_restore will determine the format automatically.

  • Use any of the following ways to restore process as a single transaction so that any error would result in complete rollback of the restore.

If you restore using psql utility you can provide --single-transaction option in the above command during restore. For more details, see psql.

If you restore using pg_restore utility you can provide --single-transaction or --exit-on-error option in the above command during restore. For more details, see pg_restore.

Post restore steps

You must perform the below steps before using the target server.

  • Logins are dumped without passwords so it is required to reset their passwords on the target server otherwise authentication will fail for those logins. To reset the passwords, connect to target server using TDS endpoint with master user and run below command for each login:
1> ALTER LOGIN login_name WITH PASSWORD = ' new_password '
2> go

The logins are global objects and are only dumped with the whole Babelfish PostgreSQL database dump. If an individual T-SQL database is dumped with --bbf-database-name then logins are not dumped. So, the above process is only required for the entire Babelfish database restore.

  • Logins are not dumped for an individual T-SQL database (Babelfish logical database) dump so database users won't be linked to any logins in the target server after restore, these are called orphaned users. Following command can be used, to link the orphaned users to logins on the target server:
1> ALTER USER user_name WITH LOGIN = login_name
2> go

The above process is not required for whole Babelfish PostgreSQL database dump.

The following is the dump and restore functionality differences between Babelfish and SQL Server:

  • The dump database command in MS SQLServer creates a database dumpfile which has physical copy of database pages. But bbf_dump creates an export script that contains the actual SQL statements. When these statements are executed against the target database where the dump is to be restored, it creates the data same as that in the source database. This can be identified as a logical database dump instead of physical database dump.
  • Timestamp/Rowversion columns are treated as PostgresSQL’s generated columns. The values of these columns do not get dumped instead it gets re-generated during the restore. These regenerated values will be different than values on the source database as expected.
  • If there is a bloat with multiple row versions on the source, the target will have only one version of rows immediately after restore. As a result, the size of target database might be much smaller than the source database as expected. This behavior is similar to the size reduction of the database that you might encounter after vacuum.
  • There is no way to get incremental dump. bbf_dump only supports dumping either the entire Babelfish PostgreSQL database or a single T-SQL database.
  • We recommend not to use --schema-only or --data-only options provided by utility as they aren't supported yet by bbf_dump.