Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Flyway from Application Startup #115

Closed
fdefalco opened this issue Jul 14, 2016 · 13 comments
Closed

Remove Flyway from Application Startup #115

fdefalco opened this issue Jul 14, 2016 · 13 comments

Comments

@fdefalco
Copy link
Contributor

New repository should be created.

@chrisknoll
Copy link
Collaborator

Flyway at startup ensures the database has the correct tables installed. If flyway doesn't do this, what will?

@fdefalco
Copy link
Contributor Author

Flyway will still be used to install the database schema however it will be setup as a command line utility. This will provide several improvements.

  1. The username/password configuration of a source connection string will no longer require administrative privileges in the application configuration to create/modify schema. This is viewed as a security concern.
  2. The same Flyway component will be able to be used in both environment setup and docker image configurations.
  3. Some people in the OHDSI community have expressed concerns about using updates of the WebAPI as it makes silent database changes which raises concerns especially in production environments. Switching to a user initiated, transparent process will alleviate these concerns.

This was discussed on today's architecture call - minutes are available here.

Kevin Cox from ConvergeHEALTH will be taking the lead on the initiative to refactor the use of Flyway in this fashion. We're anticipating incorporating this into the platform in a release after the OHDSI symposium.

@chrisknoll
Copy link
Collaborator

For 1, the username/password configuration of a source connection string never needed administrative privileges. Where did you get that information from?

For 2, this was always the case, if someone wanted to initiate the flyway migration at a command line, you can execute the flyway command line tool pointing at the resources folder of the WebAPI.

For 3: Who are these people? We should discuss those concerns on the forum. If someone wants transparency of the deployment, they simply need to have a staging environment where they can deploy a new WebAPI and confirm the database changes. No one should ever just blindly throw a new release out to their production environment without some verification. OR: Use a command line to deploy the database (per #2) , but leave the startup flyway enabled to verify the schema. This way you can have the flyway configuration in the WebAPI have a non-elevated privledge account (just so it can check the migration status of the database) and the webapi can report db schema problems during startup. This schema check can be done without elevated privileges and therefore should not raise any security concerns.

The notion of 'silent database changes' is interesting to me: all the database schema changes are announced in the log files. @jduke99 and @pbr6cornell both wanted a simple and easy setup of the ohdsi tools via OLYMPUS, but that automated far more than just database schema. I think @leeevans would attest that the automated database configuration simplifies a lot of his work in deploying webAPI from git.

So, Community, if there is concerns about the flyway migration in WebAPI, voice those on the forums. We want to be transparent with everything we're doing here, so let's address those concerns openly.

@chrisknoll
Copy link
Collaborator

For reference, if anyone would like to discuss the concerns with automated database migrations, here's the forum post that originally discussed the topic:
http://forums.ohdsi.org/t/managing-database-migrations/291

@fdefalco
Copy link
Contributor Author

The connection string provided in the configuration is used to connect to the database and run scripts. These scripts create tables and modify existing tables and require permissions to do so. It is these permissions that are undesirable. Let's take a sample script:

https://github.com/OHDSI/WebAPI/blob/master/src/main/resources/db/migration/sqlserver/V1.0.0.7.2__feasability_multihomed_support.sql.sql

This script runs CREATE, DROP, EXECUTE and ALTER statements. The discussion today focused on the desire to limit the permissions of the WebAPI to CRUD operations and restrict the ability of the WebAPI to directly modify the schema of the database.

I welcome an ongoing discussion in the forums. We would also welcome your participation in the bi-weekly architecture working group calls. We had a dozen or so people on today's call but there's always room for one more! :)

@chrisknoll
Copy link
Collaborator

Thanks for the offer, but the details of the discussion will probably be missed unless it's written down.

The script you described is (as you should know) is not executed against a cdm. You stated that "configuration of a source connection string", which, if I'm to read that at face value: a string that is configured in a source (table: SOURCE) to connect to a CDM. Flyway does not use that information. What you are describing is not a connection string at all. It's a datasource configuration for flyway. As I said in my earlier response, if there's a concern about storing an elevated permission in the WebAPI configuration, a site has the flexibility of providing a non-elevated account as the flyway account, and the only thing that will happen is the version of the webAPI schema will be checked. If if's not up to date the application will fail to start (the desired behavior). We don't need to disable flyway on startup to address the security concerns that people have about an elevated permission in the configuration. Removing flyway from startup will allow the application to run against an incomparable schema. Removing that functionality is taking a step backwards.

@leeevans
Copy link
Contributor

leeevans commented Jul 15, 2016

Thanks @chrisknoll for adding to the discussion.

Flyway is invaluable for automating database configuration in deploying the WebAPI within the CI process. On the Architecture call I made the point that the Flyway database migration scripts will still need to be kept up to date if the Flyway code is refactored out of the WebAPI application startup.

If Flyway was removed from app startup, I would need to add a call to the Flyway command line tool to perform the database migrations in the WebAPI CI build job, as the first step of the deployment process.

@chrisknoll I think you raise an important point that unfortunately wasn't discussed on the Architecture call - calling Flyway on application startup to verify the database has the correct tables installed.

The approach you are proposing seems to me like a very valid alternative to refactoring Flyway into a separate project. It also retains the important functionality of validating the database on startup while addressing the security & 'silent upgrade' concerns raised on the Architecture call:

  1. Keep Flyway as part of the WebAPI app startup but only use a non-elevated database account (only read access on the OHDSI schema) to verify that the database schema state matches the WebAPI database migration scripts. The database will never be automatically migrated during app startup and no elevated database account is needed to run the application.
  2. Run the Flyway command line (either manually or as part of a deploy script) with an elevated Flyway specific database account to perform the actual database migrations using the WebAPI Flyway database migration scripts.

The above approach also addresses the current problem with performing database migrations during app startup. On initial application deployment the source and source_daimon tables are not available until the application starts for the first time and then the application must be stopped and restarted to pick up the configuration info inserted into those tables. With the above approach those tables are created by running the Flyway command line and populated manually/in a deploy script prior to the first startup of the application.

@fdefalco do you see any other benefits of refactoring the Flyway code to another project versus following the above approach @chrisknoll is proposing?

I'll also send Kevin Cox a link to this issue via email to keep him in the loop. I don't know his GitHub account id.

@kwcox2
Copy link

kwcox2 commented Jul 15, 2016

I agree with the approach @chrisknoll raised as well. It seems reasonable to leave Flyway in the WebAPI project, but we may want to consider the following to clarify how Flyway should be used:

  • Update the Web API setup instructions - currently, it states "Note that the user should have read, write, and create privileges on the OHDSI schema." We should provide some guidance about the DB accounts configured.
  • Flyway does support a "validate only" option that wouldn't actually do a migration - maybe we could update the Web API config to have that be the default. It would be nice to have an explicit control over DB schema migration rather than relying on the permissions granted to the Flyway configured DB account.
  • Create a script and instructions for how to run the Flyway migration command manually using the schema in the WebAPI.

Look forward to further discussion around this topic.

@fdefalco
Copy link
Contributor Author

The "source" i was referring to was the "flyway.datasource" not a CDM. It would be easier if we all agreed to some basic vocabulary. OHDSI schema is the application schema that is currently configured by Flyway when the WebAPI starts up. CDM schema contains the person level data and RESULTS schema is a set of tables that store CDM specific data generated by various operations (cohort generation, heracles analyses, etc...)

I agree that @chrisknoll's approach is a valid alternative and unfortunately a perspective that wasn't represented on the architecture call. However no one currently uses this approach. We also do not have any documentation on how this would be done (another ding on us for poor documentation). We should also loop in @mark-velez and @t-abdul-basser as they were other contributors to the conversation.

@leeevans - We suffered that same pain with the WebAPI related to having to stop and restart the application to pickup source configuration changes so we actually added a new route (also not documented) that allows a refresh without a restart which you will find at /WebAPI/source/refresh. It is a nice time saver.

One last point - our Flyway implementation is inadequate. In our Janssen environment we have to manually run scripts for each CDM to create the required RESULTS schema. The current implementation creates RESULT schema tables in the OHDSI schema even though they aren't required there. The happy consequence of this has been people who run in a single CDM environment can launch the WebAPI and have everything created for them. But in that environment the configuration information used by the WebAPI does have administrative privileges to the OHDSI, CDM, and RESULTS schema.

I have personally been working on a solution to this issue which I've completed in the cohort-comparison branch. If you look at the modifications to the source service you will find that it will now also properly configure the RESULT schema for each configured results daimon. If we were to refactor the way in which we use Flyway (to command line and not part of application startup) then I would also need to refactor this change and likely run the command line Flyway installation for each RESULT schema individually (less convenient but also less required permissions for the connection string configured in the "source" table in the OHDSI schema.

@chrisknoll
Copy link
Collaborator

I think you're conflating the webapi schema managed by flyway and the cdm results schema which is (at this moment) not managed by flyway (except in your branch). While it's true that the webapi schema contains tables found in the results schema, this was a convenience for those single-CDM sites who would be willing to share the webapi schema with the results schema. Once your work is complete by having a separate set of flyway migraitons specifically for results schema, we can address how that should be hooked into webapi, but for this specific issue, I think we're all in agreement (waiting for @mark-velez and @t-abdul-basser input) that disabling flyway on startup doesn't solve any issues.

Re; the "our Flyway implementation is inadequate. In our Janssen environment we have to manually run scripts for each CDM to create the required RESULTS schema": this was your decision to proceed this way. The two options that we had to choose from was

  1. put the results tables in webapi and figure out a way to upload the results of an analysis query from the cdm source into the result table in webapi (and the results table would need a source_key column amended to each of the results tables) OR:
  2. leave the results tables in the CDM servers but have to deal with manual creation of the tables in sites that have multiple CDMS. You said you were fine supporting this approach, so it's strange that you're calling it 'inadequate' now.

Unfortunately this was just discussed verbally, so there's no notes detailing this decision, which is why I am collaborating this in a forum/issue tracking mode so that these details are captured.

If you want to create an issue to have results schema managed by an independent flyway migration, that's a fine new feature. But if what we're talking about here is disabling flyway on statup (which is the subject of this issue), and you're satisfied that disabling flyway is not the right approach, then let's close this issue.

PS: the flyway migration feature should not be co-mingled into a feature branch for cohort-comparison. If you can find a way t cherry pick the commits related to the flyway update separate from the cohort-comparison commits, you should do so and create two separate branches.

@chrisknoll
Copy link
Collaborator

@kwcox2 : I think there is a validate() method on the api (separate from migrate()):
https://flywaydb.org/documentation/api/javadoc.html

However, I'd still argue that we let flyway attempt to migrate in webapi because in those sites that do want to have automated migration, they give an account where it has permissions to do so. In the sites where they don't want automatic migrations, they give an account with read-only access, and executing the migrations will fail. I like the idea of leaving the choice of allowing automatic migrations up to the site owner instead of forcing them to run the command line.

Alternatively we could define a flag that would say what the flyway startup mode should do, and the code would look something like this:

if (flywayMode = "migrate")
  flyway.migrate();
else
  flyway.validate()

The 'flywayMode' would just be a maven managed application property which we could default to 'validate' and let those sites who want automatic migration specify 'migrate' if they choose.

@chrisknoll
Copy link
Collaborator

The issue related to implementing the flyway migrations for the results schema is out of scope of this issue. I've started a forum post on this here:
http://forums.ohdsi.org/t/flyway-managemetn-of-results-schema/1535

There is already an issue opened to remove the results schemas from the webapi flyway migration here:
#77

We can open up a new issue to track implementing flyway migrations of results schemas.

@kwcox2
Copy link

kwcox2 commented Jul 18, 2016

@chrisknoll Yes, I was referring to making the migrate more explicit via configuration. I've worked on Grails Flyway-enabled apps in the past, and that is how it's done there. Here is more info for the configuration option (autoMigrate) in that scenario: https://grails.org/plugin/gflyway2. My vote is to make this explicit instead of relying on obscure security settings for a particular environment. I also propose we update the docs and add scripts to run Flyway command line to close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants