
Conversation

@labkey-adam

@bbimber
Collaborator

bbimber commented Jul 27, 2025

Hey @labkey-adam, I don't quite get this (granted, I am reading on a phone). What's the big picture with all this?

Is there a clearer pathway for what it looks like to migrate an established SQL Server instance to PostgreSQL? Is this stuff about telling PostgreSQL to repeat the install scripts, even if the server exists with a SQL Server version of X?

I'm just talking off the cuff here, but if the idea is to get an established server to repeat the install scripts for PostgreSQL, should the server store/understand the database platform against which it ran each install script?

@labkey-adam
Author

> Hey @labkey-adam, I don't quite get this (granted, I am reading on a phone). What's the big picture with all this?
>
> Is there a clearer pathway for what it looks like to migrate an established SQL Server instance to PostgreSQL? Is this stuff about telling PostgreSQL to repeat the install scripts, even if the server exists with a SQL Server version of X?
>
> I'm just talking off the cuff here, but if the idea is to get an established server to repeat the install scripts for PostgreSQL, should the server store/understand the database platform against which it ran each install script?

@bbimber the summary at LabKey/platform#6867 should be helpful. I started it last week as a proof of concept, but it came together quickly and is likely the approach we'll use to migrate all SQL Server deployments (eventually). Short version:

1. Configure a new PostgreSQL database as labkeyDataSource.
2. Configure your SQL Server database as an external data source.
3. Provide a command line argument that specifies that SQL Server database as the migration data source.
4. Start the server. LabKey will create empty (no rows) schemas in the PostgreSQL database using the existing PostgreSQL SQL scripts, copy all data from the SQL Server tables into the PostgreSQL tables, adjust all the sequences to the correct values, and shut down.

Restarting the server should result in a PostgreSQL clone of the SQL Server deployment.
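To make the per-table work concrete, here is a rough sketch of the two statements involved for each hard table: copy the rows, then align the serial sequence. Everything here is illustrative, not LabKey's actual code: the table name, key column, and the `sqlserver_src` source reference are hypothetical (the real implementation presumably moves rows over JDBC rather than an in-database link).

```python
# Illustrative sketch only; names below are hypothetical, not LabKey's schema.
def migration_statements(table: str, pk: str) -> list[str]:
    """Generate PostgreSQL statements for one hard table: copy the rows,
    then point the serial sequence past the highest migrated key."""
    return [
        # Rows land in the empty schema created by the existing
        # PostgreSQL install scripts.
        f"INSERT INTO {table} SELECT * FROM sqlserver_src.{table};",
        # Align the sequence so future inserts don't collide with
        # the keys copied from SQL Server.
        f"SELECT setval(pg_get_serial_sequence('{table}', '{pk}'), "
        f"(SELECT COALESCE(MAX({pk}), 1) FROM {table}));",
    ]

for stmt in migration_statements("core.containers", "rowid"):
    print(stmt)
```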

As mentioned in the PR, there are some significant items still undone:

  • The code currently migrates all hard tables, but provisioned tables are not yet implemented
  • We haven't tested any of this extensively
  • Adding PostgreSQL support to PRC SQL Server modules will be necessary
  • I haven't conditionalized the inserts in BimberLabKeyModules or DiscvrLabKeyModules yet; this won't be hard, it just wasn't needed to prove the viability of the approach.

Might be easiest to discuss in a call if you have questions.

@bbimber
Collaborator

bbimber commented Jul 27, 2025

I see. Would just having the Java code issue a truncate at the end be easier? Doing the data inserts during setup is a waste, but not really that big a deal.

@labkey-adam
Author

> I see. Would just having the Java code issue a truncate at the end be easier? Doing the data inserts during setup is a waste, but not really that big a deal.

I don't think truncate would be any easier... or harder. They seem basically equivalent to me. The truncate approach would leave sequences with values other than one... they could be reset along with the truncate. Or maybe nobody cares. I agree there's no performance difference to worry about. I had a slight preference for embedding the skip logic in individual scripts instead of compiling lists of tables to truncate in code, but it's pretty much six of one.
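For reference, the truncate approach need not leave sequences at odd values: PostgreSQL's `TRUNCATE ... RESTART IDENTITY` resets the owned sequences back to one in the same statement. A minimal sketch of generating that statement (table names are hypothetical):

```python
def truncate_statement(tables: list[str]) -> str:
    # RESTART IDENTITY resets the tables' serial sequences back to 1,
    # covering the point above that a bare TRUNCATE would leave them
    # wherever the bootstrap inserts advanced them to. CASCADE also
    # truncates tables with FKs pointing at these.
    return "TRUNCATE " + ", ".join(tables) + " RESTART IDENTITY CASCADE;"

print(truncate_statement(["core.containers", "core.principals"]))
```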

@bbimber
Collaborator

bbimber commented Jul 27, 2025

I don't have that strong a preference - I'm sure you guys have considered this well.

Regarding sequences / serial fields: can you really rely on starting at one anyway? Because of FKs, deletes on existing databases, etc., don't you really want to take those fields as-is from the SQL Server db? I recall dealing with that problem (probably a decade ago), where we disabled the auto-incrementing field, inserted data with row ids as-is, and then re-enabled the auto-incrementing. I don't recall it being that hard in SQL, and there's probably an example buried somewhere in existing module SQL scripts. It's a little annoying to do, but checking for any field with a datatype of serial makes it somewhat easier.
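The disable/re-enable pattern recalled here is SQL Server's IDENTITY_INSERT toggle, which temporarily permits explicit values in an identity column. A rough sketch of wrapping an insert that way (names are hypothetical; note that on PostgreSQL explicit inserts into serial columns are allowed directly, so only a setval() afterwards would be needed):

```python
def with_identity_insert(table: str, insert_sql: str) -> list[str]:
    # SQL Server only: temporarily allow explicit values in the
    # IDENTITY column so row ids can be copied in as-is, then
    # restore normal auto-increment behavior.
    return [
        f"SET IDENTITY_INSERT {table} ON;",
        insert_sql,
        f"SET IDENTITY_INSERT {table} OFF;",
    ]

for stmt in with_identity_insert("dbo.mytable",
                                 "INSERT INTO dbo.mytable (rowid, name) VALUES (42, 'x');"):
    print(stmt)
```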

@labkey-adam
Author

labkey-adam commented Jul 27, 2025

> I don't have that strong a preference - I'm sure you guys have considered this well.
>
> Regarding sequences / serial fields: can you really rely on starting at one anyway? Because of FKs, deletes on existing databases, etc., don't you really want to take those fields as-is from the SQL Server db? I recall dealing with that problem (probably a decade ago), where we disabled the auto-incrementing field, inserted data with row ids as-is, and then re-enabled the auto-incrementing. I don't recall it being that hard in SQL, and there's probably an example buried somewhere in existing module SQL scripts. It's a little annoying to do, but checking for any field with a datatype of serial makes it somewhat easier.

Yes, we copy all those keys and references from SQL Server, and then call setval() on each sequence to match the last value from SQL Server. So the only time you'd see a difference with the truncate approach (vs. skipping inserts) is on tables that are empty in SQL Server.

@labkey-adam labkey-adam merged commit 2903c9e into develop Aug 1, 2025
7 of 9 checks passed
@labkey-adam labkey-adam deleted the fb_migrate_ss_to_pg branch August 1, 2025 21:48
