@labkey-adam labkey-adam commented Jul 23, 2025

Rationale

We want to provide administrators the ability to easily migrate a SQL Server LabKey SDMS deployment to PostgreSQL. Changes in this PR provide an option to create empty PostgreSQL schemas and migrate all module & provisioned table data into them from a SQL Server deployment.

Specification: https://docs.google.com/document/d/1oY3Mhnusa17OsC3NDj3WR5gS5YlynUP5zzc2tBht_D4/edit?tab=t.0#heading=h.4rg4h5vuek0

This is a significant step forward, but it's still a work in progress. What's missing:

  • Testing with non-trivial SQL Server databases
  • Testing with large amounts of data

There should be no functional changes in the normal case. The new code is invoked only when the server is started with specific command-line arguments.

Changes

If a PostgreSQL database is being bootstrapped and a specific argument is added to the command line, the server will:

  • Bootstrap all PostgreSQL schemas without inserting any* data into them
  • Verify that the new schemas are empty: flag any PostgreSQL table that still contains rows and any sequence set to a value other than one, and run a few other sanity checks
  • Connect to the specified SQL Server database as an external data source and enumerate all module tables, sorted to accommodate foreign key constraints
  • Copy data from every table into the just-created empty PostgreSQL tables
  • Reset the PostgreSQL sequences to ensure that future keys produced by SERIAL columns don't conflict with the copied-in data
  • Create all the provisioned tables based on the rows in exp.DomainDescriptor
  • Repeat the data migration and sequence updates for the provisioned schemas/tables
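
To illustrate the sequence reset step above, here is a minimal sketch of the kind of statement that could be issued per SERIAL column after the copy. It is not taken from this PR's code: the class and method names are hypothetical, and the sketch assumes simple, unquoted, lowercase identifiers. It relies only on the standard PostgreSQL functions `setval` and `pg_get_serial_sequence`.

```java
// Hypothetical helper, not part of this PR: builds a PostgreSQL statement that
// moves a SERIAL column's sequence past the largest copied-in key.
final class SequenceResetSketch {
    // Assumes schema, table, and column are simple lowercase identifiers that
    // need no quoting or escaping (an assumption made for this sketch only).
    static String resetSql(String schema, String table, String column) {
        String qualified = schema + "." + table;
        // If the table has rows, the sequence is set to MAX(column) and marked
        // as called, so the next key is MAX + 1. If the table is empty, the
        // sequence is set to 1 and marked as not called, so the next key is 1.
        return "SELECT setval(pg_get_serial_sequence('" + qualified + "', '" + column + "'), "
            + "COALESCE(MAX(" + column + "), 1), MAX(" + column + ") IS NOT NULL) FROM " + qualified;
    }

    public static void main(String[] args) {
        // Placeholder names, chosen only for the example
        System.out.println(resetSql("myschema", "mytable", "rowid"));
    }
}
```

The three-argument form of `setval` keeps empty tables consistent with the "sequences set to one" check performed on the freshly bootstrapped schemas.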

*Some details:

  • The root and shared containers are actually created for bootstrap purposes (many code paths rely on their presence) and then deleted before populating the tables
  • The core Modules, SqlScripts, and UpgradeSteps tables are populated by the PostgreSQL bootstrap process and not cleared. The data in these SQL Server tables is not migrated.
  • The migrationDataSource argument is used to specify the SQL Server data source to migrate. For example: -DmigrationDataSource=ssDataSource
  • -DemptySchemas=true can be used to test the empty schema creation and verification without attempting an actual migration
  • While not typically needed, a few modules register schema-specific MigrationHandler implementations to customize behavior (e.g., accommodate cyclical FKs or adjust the tables to migrate)
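
Both arguments are passed with `-D`, so they arrive as JVM system properties. The sketch below shows one way a startup path could interpret them. The class, enum, and method names are hypothetical, and the precedence when both properties are set is an assumption made for the sketch, not something this PR description specifies.

```java
import java.util.Properties;

// Hypothetical sketch, not the PR's code: interprets the two -D arguments.
final class MigrationArgsSketch {
    enum StartupMode { NORMAL, EMPTY_SCHEMAS_ONLY, MIGRATE }

    static StartupMode resolve(Properties props) {
        // -DmigrationDataSource=<name> names the SQL Server data source to migrate
        String dataSource = props.getProperty("migrationDataSource");
        if (dataSource != null && !dataSource.isBlank())
            return StartupMode.MIGRATE; // assumption: this wins if both are set

        // -DemptySchemas=true creates and verifies empty schemas without migrating
        if (Boolean.parseBoolean(props.getProperty("emptySchemas")))
            return StartupMode.EMPTY_SCHEMAS_ONLY;

        // Neither argument present: normal startup, no behavior change
        return StartupMode.NORMAL;
    }

    public static void main(String[] args) {
        // e.g. java -DmigrationDataSource=ssDataSource MigrationArgsSketch
        System.out.println(resolve(System.getProperties()));
    }
}
```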

Tasks 📍

  • Option to bootstrap schemas without any data
  • Verify empty schemas and sequences
  • Copy data from SQL Server tables
  • Adjust sequences to accommodate copied data
  • Fix TableSorter handling for self-referencing foreign keys
  • Special case inserts for cyclical FKs in wiki (Pages/PageVersions), prot (Organisms/Identifiers), exp (ExperimentRun/ProtocolApplication)
  • Expand testing to all modules and ensure empty lookup tables throughout
  • Drop problematic, unused Deleted columns from prot schema
  • Update spec
  • Create provisioned tables and migrate data
  • Code review @labkey-jeckels
  • Manual Testing @labkey-danield 📍
  • Needs Automation - Not at this point. Eventually, we'll need some automated testing.
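
For reviewers unfamiliar with the foreign key ordering tasks above, the following sketch shows the general idea with a plain depth-first topological sort. It is not LabKey's TableSorter; all names are hypothetical. It skips self-referencing foreign keys, since they do not constrain table order, and fails on a true cycle between tables, which is the situation the special-case inserts address.

```java
import java.util.*;

// Hypothetical sketch, not LabKey's TableSorter: orders tables so that each
// table comes after the tables its foreign keys reference.
final class TableOrderSketch {
    // fkTargets maps a table name to the names of the tables it references.
    static List<String> sortByForeignKeys(Map<String, Set<String>> fkTargets) {
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        // TreeSet gives a deterministic visiting order
        for (String table : new TreeSet<>(fkTargets.keySet()))
            visit(table, fkTargets, done, inProgress, ordered);
        return ordered;
    }

    private static void visit(String table, Map<String, Set<String>> fkTargets,
                              Set<String> done, Set<String> inProgress, List<String> ordered) {
        if (done.contains(table))
            return;
        if (!inProgress.add(table))
            throw new IllegalStateException("Cyclical foreign keys involving " + table);
        for (String target : new TreeSet<>(fkTargets.getOrDefault(table, Set.of()))) {
            // A self-referencing FK does not affect ordering between tables
            if (!target.equals(table))
                visit(target, fkTargets, done, inProgress, ordered);
        }
        inProgress.remove(table);
        done.add(table);
        ordered.add(table); // every referenced table has already been added
    }

    public static void main(String[] args) {
        // "child" references "parent" and itself; names are placeholders
        Map<String, Set<String>> fks = Map.of(
            "child", Set.of("parent", "child"),
            "parent", Set.of());
        System.out.println(sortByForeignKeys(fks));
    }
}
```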

@labkey-adam labkey-adam merged commit c605950 into develop Aug 1, 2025
6 checks passed
@labkey-adam labkey-adam deleted the fb_migrate_ss_to_pg branch August 1, 2025 21:45
@labkey-adam labkey-adam changed the title from "Option to migrate SQL Server hard tables to PostgreSQL" to "Option to migrate SQL Server tables to PostgreSQL" Aug 3, 2025