Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution a migration that calls scripts outside of the migration folder? #110

Open
kjn70 opened this issue Mar 18, 2022 · 8 comments
Open
Assignees
Labels
Workaround Issues that can be addressed via a workaround

Comments

@kjn70
Copy link

kjn70 commented Mar 18, 2022

We have multiple development teams on many projects. Database code is organized by project with run-time scripts (in SQL folder) and migration scripts (in a Deploy folder). The migration script is really a SQL script that uses SnowSQL using !Source commands to execute the actual obejcts scripts that are organized by schema and object type. This is all automated through GitLab using on-prem servers and and custom powershell routines. It actually works well.

However, we're currently looking at moving to all cloud based deployment and are looking at schemachange. I have some issues. One is that the version naming is unmanageable across all the folders we have that make up the multiple objects scripts we have (a project can have hundreds, if not thousands of database objects). The versioning has to be unique to the project and not the specific schema folders or object type folders. This is unmanageable. Also, if we have to reorder object execution due to deployment dependencies, the version number has to be updated for all objects manually. Again, this is unmanageable for large projects.

I can still see some benefit in schemachange though, but I would like it to replicate similar functionality as our current deploy process. I would like it to monitor only our deploy folder that has our migration script that reference the other object scripts we need to execution in the order of execution. I tested this but it does not work with SnowSQL !source commands. I currently have not found a way to do this.

If I had this structure:

\migrations
\deploy
\V2022.03.18.1__Initial set up of database.sql
\scripts
\ingest
\tables
\table1.sql
\table2.sql
\table3.sql
\views
\view1.sql
\view2.sql
\snowpipes
\pipe1.sql
\transform
\tables
\table1.sql
\table2.sql
\views
\publish
\tables
\views
\view1.sql
\view2.sql

Where the V2022.03.18.1__Initial set up of database.sql migration script called:

!source ingest\tables\table1.sql
!source ingest\tables\table3.sql
!source ingest\tables\table2.sql
!source ingest\views\view1.sql
!source ingest\views\view2.sql
!source transform\tables\table1.sql
!source transform\tables\table2.sql
!source publish\views\view2.sql
!source publish\views\view1.sql
!source ingest\snowpipes\pipe1.sql

schemachange would point to -f \migrations\deploy

This is actually a very simplified view of our process but it is easy to create these deploy scripts using VSCode. Migration would be performed in our TEST, STAGE and PROD environments through GitLab (depending on stage). We do not migrate to DEV as developers are working directly in that with DBeaver.

Is this type of script execution from within a script possible? If so we will further evaluation this as a potential solution. If not, we'll abandon schemachanges as a viable solution and explore other methods.

@sfc-gh-jhansen
Copy link
Collaborator

Hey there @kjn70, thanks for reaching out. Some of the challenges you've described are more or less fundamental to imperative-style database change management (DCM) tools like schemachange (and others). Other of the challenges you've described are the result of trying to use an imperative-style DCM tool like schemachange with your source code organized for a declarative-style tool. Please check out my DevOps Recommendations when Building on Snowflake recording for an overview of the tools/approaches available.

Happy to chat more if that would be helpful.

@kjn70
Copy link
Author

kjn70 commented Mar 23, 2022

Okay, I understand the difference between imperative and declarative styles. I would say that what we have at the moment is a hybrid of the two, in that we have the organization of declarative but with the imperative nature of the change control. It works well and I find faults with both ends of the imperative vs declarative spectrum.

If I move over to the imperative style, where deployments at based around agile stories for example, I still take issue with the manually driven naming of files that determines what is deployed and how that interacts with GIT (GitLab in my case).

Example.
A table (FOO) is needed to have one column. This is created in Story1 and deployed up to production.

create or replace table FOO (COL1 varchar);

This is saved as V1__create_F00. Schemachange picks this up as new an deploys it to dev, test, stage and prod - over our CICD process.

Later, Story2 is created which adds COL2 to the table.

alter table if exists FOO add (column col2 number);

This is saved as V2__add_COL2_to_FOO. Schemachange picks this up and deploys it but only to test due to project issues that delay deployment in the sprint. So far, so good.

Story 3 is also created which is doing other stuff, but also needs to modify FOO to add another column.

alter table if exists FOO add (column col3 variant);

This is saved as V3__add_COL3_to_FOO. Schemachange picks this up and deploys it and the agile project team advance ahead of story 2 and deploy this up to stage for UAT.

So now we have this situation:

DEV: FOO with columns COL1, COL2 and COL3.
TEST: FOO with columns COL1, COL2 and COL3.
STAGE: FOO with columns COL1 and COL3.
PROD: FOO with columns COL1.

While story 3 is still in UAT, story 2 is now ready to deploy. A merge request is submitted to push from test to stage. Schemachange picks up V2__add_COL2_to_FOO but ignores it as it states that V2 is behind V3 which is already deployed.

At this point it is impossible to proceed with story2. The development team will have to go back and manually rename the version (to V4?) to then force it to deploy, If say story 2 completes UAT before story 3, once story 2 is deployed into PROD, its version number wil now be above story3 so that will get ignored by schemachange. This is not an untypical CICD process for us where there can be multiple stories being worked on at the same time in the same database being managed by feature branches within GitLab.

The manual numbering of versions also causes issue with Gitlab itself. Basically, you are never working on the same file even though you may be working on the same object. So if we have two feature branches representing the two stories that happen to be working on the same object (FOO in this case), GitLab will never detect if there are potential conflicts between the different development groups when merging into our common pipeline (dev, test, stage, main branches). Also you're not relying on GIT for the actual source control of the files. It is manual versioning.

I'm trying to give schemachange a fair shot, but I am having difficulty seeing how this will work in a real-world multi-team global development environment.

@seediang
Copy link
Contributor

@kjn70 with regards to "script execution from within a script", your would need to use the jinja include statement.

Approach one

"V2022.03.18.1__Initial set up of database.sql" would become:

{% include 'scripts/ingest/tables/table1.sql' %}
{% include 'scripts/ingest/tables/table3.sql' %}
{% include 'scripts/ingest/tables/table2.sql' %}
{% include 'scripts/ingest/views/view1.sql' %}
{% include 'scripts/ingest/views/view2.sql' %}
{% include 'scripts/transform/tables/table1.sql' %}
{% include 'scripts/transform/tables/table2.sql' %}
{% include 'scripts/publish/views/view2.sql' %}
{% include 'scripts/publish/views/view1.sql' %}
{% include 'scripts/ingest/snowpipes/pipe1.sql' %}

then point schemachange to -f migrations

All of the SQL files under the scripts folder do not follow the Change Script naming approach and will be ignored. When you run in verbose mode you will see all the "Ignoring non-change file" messages in the output.

Approach two

The second option is to use the -m modules_folder functionality. "V2022.03.18.1__Initial set up of database.sql" would become:

{% include 'modules/ingest/tables/table1.sql' %}
{% include 'modules/ingest/tables/table3.sql' %}
{% include 'modules/ingest/tables/table2.sql' %}
{% include 'modules/ingest/views/view1.sql' %}
{% include 'modules/ingest/views/view2.sql' %}
{% include 'modules/transform/tables/table1.sql' %}
{% include 'modules/transform/tables/table2.sql' %}
{% include 'modules/publish/views/view2.sql' %}
{% include 'modules/publish/views/view1.sql' %}
{% include 'modules/ingest/snowpipes/pipe1.sql' %}

and schemachange would need to be executed with -f migrations\deploy -m migrations\scripts

The advantage of this approach is you will not get any "Ignoring non-change file" messages in the output.

@kjn70
Copy link
Author

kjn70 commented Mar 27, 2022

@saperry I like this approach, particularly the second. I can actually see giving our developers two options. I can configure our CICD process to call a templated schemachange project through a reference and then I can have two project templates: one for imperative projects with a certain code organization that works for that style of code deployment and one for declarative style for code organization that a majority of our developers will understand. I can then configure the projects with a variable switch that will call schemachange with one of the two option via shell script. When we set up a new project they can create from one of the two templates depending on project need while reference a code schemachange CICD process.

I love this. I will try it out on Monday.

@kjn70
Copy link
Author

kjn70 commented Apr 1, 2022

@saperry I can get option 1 to work but but option 2. Not sure what we are doing wrong? Can Jinga work across folders or does it have to be nested?

@seediang
Copy link
Contributor

@kjn70 the two folders do not need to be nested. The two locations are combined into a single jinja context. The -m argument is added as a 'virtual location' named modules. Not enough information to tell what is happening in your situation so I have mocked up an example of option2 in example.zip. Hope this helps.

Using the render comand, my example below was run on windows in powershell.

schemachange render 'migrations\deploy\V2022.03.18.1__Initial set up of database.sql'  -f migrations\deploy -m migrations\scripts

Should give you an output like the following:

schemachange version: 3.4.1
Using config file: .\schemachange-config.yml
Using root folder ......\migrations\deploy
Using Jinja modules folder ......\migrations\scripts
Using variables: {}
Checksum 483dde8b1e89ddc1b01470b6b2c17c967ecae1983e07d9ec987b3ba4
SELECT 'ingest/tables/table1.sql';
SELECT 'ingest/tables/table3.sql';
SELECT 'ingest/tables/table2.sql';
SELECT 'ingest/views/views1.sql';
SELECT 'ingest/views/views2.sql';
SELECT 'transform/tables/table1.sql';
SELECT 'transform/tables/table2.sql';
SELECT 'publish/views/views2.sql';
SELECT 'publish/views/views1.sql';
SELECT 'ingest/snowpipes/pipes1.sql'

@kjn70
Copy link
Author

kjn70 commented Apr 18, 2022

Thank you, this was helpful and I have it working now. What was not clear to me before was that the module reference inside the jinga command was a virtual mapping to what was specified in the -m switch in schemachange. I was thinking this was a typo before and had changed this for my needs incorrectly.

@sfc-gh-tmathew sfc-gh-tmathew self-assigned this Sep 26, 2023
@sfc-gh-tmathew sfc-gh-tmathew added the Workaround Issues that can be addressed via a workaround label Sep 26, 2023
@sfc-gh-tmathew
Copy link
Collaborator

Thank you @saperry for a creative use of the -f and -m options. Have you run into any SQL file size limits ? What was the largest sql file in terms of lines generated in the initialize file ?

Of course if we do hit limits, we can organize the initialize files into multiple files for a specific release and change the files that need to be executed in a certain order.

Glad to see @kjn70 was able to use the workaround and move forward.

Let me review the workaround and see how we can incorporate this to help a lot more folks who have found schemachange version numbering a deterrant to use schemachange.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Workaround Issues that can be addressed via a workaround
Projects
None yet
Development

No branches or pull requests

4 participants