
DS-2167: Add Flyway automatic database upgrades to DSpace #686

Merged
merged 42 commits into from
Oct 31, 2014


tdonohue
Member

@tdonohue tdonohue commented Oct 6, 2014

https://jira.duraspace.org/browse/DS-2167

Update: This PR is now ready to test for the 4.x -> 5.0 upgrade process. It's "functionally complete". I'm still working on backporting older migration scripts in order to auto-migrate from old versions of the DSpace DB.

This PR does the following (unchecked boxes have not been fully implemented yet):

  • Integrates dspace-api with Flyway (http://flywaydb.org/) for automatic database schema upgrades (upgrades occur when you start up DSpace, i.e. NO MORE RUNNING SQL SCRIPTS MANUALLY!)
  • Refactors our various "database_schema*.sql" scripts into Flyway compatible SQL scripts
  • Refactors the XMLWorkflow setup scripts into a Flyway-compatible "Java migration". It is kicked off by the new "org.dspace.storage.rdbms.migration.V5_0__XMLWorkflow_Migration" class, and ONLY runs when XMLWorkflow is enabled in the workflow.cfg file.
  • Reworks/simplifies Unit Tests to just use Flyway to initialize the Unit Testing Database
  • Create a Flyway "callback" to automatically reload metadata registry & format registry from config files each time a migration is done
  • Create a Flyway "callback" to automatically reindex your data in your search/browse engine of choice each time a migration is completed.
  • Remove all Ant related DB scripts, as they are now unnecessary
  • Fix Unit Test failure caused by the "Metadata 4 All" Oracle upgrade script, which is not compatible with H2 (it uses "DECLARE", which throws an error in H2)
  • Backport all older DB migration scripts to help auto-migrate folks from older versions of DSpace
  • Other minor cleanup of notes / documentation to no longer require any manual DB upgrades. Doc updates have begun at https://wiki.duraspace.org/display/DSDOC5x/Upgrading+to+5.x (more will be forthcoming as well)

I've done some basic testing of upgrades from 4.0 -> 5.0 database. It seems to work well.

However, fair warning for developers: testing this WILL REQUIRE you to start with a 4.0-compatible database. Flyway is not "smart" enough to know whether you have already run the "database_schema-4-5.sql" script manually, so it will attempt to run those database schema changes a second time.
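Both the SQL scripts and the `V5_0__XMLWorkflow_Migration` class rely on Flyway's default naming convention for versioned migrations: a `V` prefix, a version (underscores read as dots), a double-underscore separator, and a description (underscores read as spaces). The helper below is a hypothetical illustration of that convention, not Flyway's own resolver, which is configurable and handles more cases.

```java
/**
 * Hypothetical helper: splits a Flyway-style versioned migration name
 * (V<version>__<description>, optionally ending in ".sql") into its parts.
 * Illustration only; Flyway's own resolver is configurable and more thorough.
 */
final class MigrationName {
    final String version;
    final String description;

    private MigrationName(String version, String description) {
        this.version = version;
        this.description = description;
    }

    static MigrationName parse(String name) {
        // Strip the ".sql" suffix so SQL scripts and Java class names parse alike
        String base = name.endsWith(".sql") ? name.substring(0, name.length() - 4) : name;
        int separator = base.indexOf("__");
        if (!base.startsWith("V") || separator < 2) {
            throw new IllegalArgumentException("Not a versioned migration name: " + name);
        }
        // Underscores in the version act as dots: "5_0" -> "5.0"
        String version = base.substring(1, separator).replace('_', '.');
        // Underscores in the description act as spaces
        String description = base.substring(separator + 2).replace('_', ' ');
        return new MigrationName(version, description);
    }
}
```

For example, `V4.0__Upgrade_to_DSpace_4.x_schema.sql` yields version `4.0` and description `Upgrade to DSpace 4.x schema`, while `V5_0__XMLWorkflow_Migration` yields version `5.0`.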

@peterdietz
Member

You can keep chugging away on this to finish it up (assuming you've got the cycles). This could really help the upgrade process. Hopefully the remaining work can be finished in time for further testing/review and the feature freeze. Also, there are ~100 PRs to review, so we'll have our hands full.

@hardyoyo
Member

hardyoyo commented Oct 7, 2014

I agree with Peter: +1, extension granted. This looks really cool; it's exactly the sort of thing I like to see in every DSpace release. If there's anything we can do to help, just ask.

@tdonohue
Member Author

FYI - Just rebased on latest master. Also added a commit to fix Unit Tests.

The bad news is that H2 needs its own custom migrations directory ([src]/dspace/etc/migrations/h2/).

Unfortunately, the H2 "ALTER TABLE" syntax is annoyingly different from both PostgreSQL and Oracle...and obviously "ALTER TABLE" is used heavily in DB upgrade scripts. This means that in order to support Unit Testing of our upgrade process, we will need to write custom migrations for PostgreSQL, Oracle AND H2.
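With one migrations directory per database, something has to pick the right directory at runtime. A minimal sketch, assuming the JDBC product name drives the choice: only the `h2` directory is named above, so `postgres` and `oracle` are hypothetical directory names, and the class itself is illustrative rather than DSpace code.

```java
import java.util.Locale;

/**
 * Hypothetical mapping from the JDBC product name to a per-database
 * migrations directory (relative to [src]/dspace/). "h2" matches the
 * directory named in the comment; "postgres" and "oracle" are assumed names.
 */
final class MigrationDirs {
    static String forProduct(String productName) {
        String product = productName.toLowerCase(Locale.ROOT);
        if (product.contains("postgresql")) {
            return "etc/migrations/postgres";
        } else if (product.contains("oracle")) {
            return "etc/migrations/oracle";
        } else if (product.equals("h2")) {
            return "etc/migrations/h2";
        }
        throw new IllegalArgumentException("No migrations for database: " + productName);
    }
}
```

At runtime the argument would come from `connection.getMetaData().getDatabaseProductName()`, which typically reports `PostgreSQL`, `Oracle` and `H2` for those three drivers.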

@hardyoyo
Member

I wonder if we could actually script the PostgreSQL and Oracle upgrade script creation, and derive them both from the H2 upgrade script (don't know how exactly)? Mostly just wondering if this could be made a "good thing" instead of "yet another thing".

@tdonohue
Copy link
Member Author

I've pushed up some more enhancements which now automatically update the metadata/format registries during database upgrades, and also automatically re-indexes your content (as long as Solr is running) post-upgrade.

I've modified the original description. This is now functionally "complete" for the 4.x -> 5.0 upgrade process. I'm now working on backporting older migration scripts to Flyway, to hopefully support migrating 1.x versions all the way to 5.0. I suspect it shouldn't take too long to get those older migrations working (fingers crossed).

@helix84
Member

helix84 commented Oct 22, 2014

I just tried to test on Postgres. I manually upgraded a small 3.x DB to 4.x. Then I ran this PR and it errored out:

2014-10-22 11:35:00,916 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL getDataSource Error -
java.sql.SQLException: CANNOT AUTOUPGRADE DSPACE DATABASE, AS IT DOES NOT LOOK TO BE A VALID DSPACE 4.0 DATABASE. Please manually upgrade your database to DSpace 4.0 compatibility.

I checked the code, and it checks for the presence of the "Webapp" table. I verified that I have that table in my DB. Could the problem be case-sensitive table names?

@tdonohue
Copy link
Member Author

@helix84 - It looks like the DatabaseMetaData.getTables() method is case-sensitive. In this scenario, I'm looking for a "Webapp" table (capital W only). Is your table named "webapp" or "WEBAPP"? If so, I'm wondering how that happened, since the 3.x->4.x upgrade scripts have it as "Webapp".

I'm digging around for ideas for how to do this match in a case insensitive manner. All I've come up with is to load all your DB Tables into an ArrayList (or similar), lowercase them all, and then try to match against a lowercase name.
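The load-and-compare idea can be sketched with plain JDBC. This is an illustration only (class and method names are hypothetical), assuming `DatabaseMetaData.getTables()` with a `%` name pattern returns every table the connection can see.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: case-insensitive table lookup via JDBC metadata. */
final class TableLookup {
    /** Collects the names of all tables visible to this connection. */
    static List<String> listTables(Connection conn) throws SQLException {
        List<String> names = new ArrayList<>();
        DatabaseMetaData meta = conn.getMetaData();
        // null catalog and schema, "%" name pattern: every table; type filter "TABLE"
        try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }

    /** True if any name matches ignoring case: "Webapp", "webapp" and "WEBAPP" all match. */
    static boolean containsIgnoreCase(List<String> names, String wanted) {
        for (String name : names) {
            if (name.equalsIgnoreCase(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller would combine the two, e.g. `TableLookup.containsIgnoreCase(TableLookup.listTables(conn), "Webapp")`. The trade-off is that every table name is fetched just to test for one.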

@hardyoyo
Copy link
Member

I'm sure we're all aware, but I thought I'd just mention that Oracle "upcases" all (unquoted) table and column names.

@tdonohue
Copy link
Member Author

@helix84 - try it again. I just figured out how to tell if a database lowercases or uppercases all "identifiers" (i.e. table names, etc.). Just need to use DatabaseMetaData.storesLowerCaseIdentifiers() and DatabaseMetaData.storesUpperCaseIdentifiers().

Just pushed up a commit to check the DB defaults, and then properly lowercase or uppercase the table name search string, as necessary. I think it should work now.
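The approach described above can be sketched as follows. This is not the code in the commit; the class and method names are hypothetical, and only standard JDBC `DatabaseMetaData` calls are used. PostgreSQL stores unquoted identifiers in lower case and Oracle in upper case, which is why the search string has to be adjusted first.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;

/** Hypothetical helper: look up a table using the database's identifier case. */
final class TableCheck {
    /** Converts an unquoted identifier to the case the database stores it in. */
    static String storedCase(String name, boolean storesLower, boolean storesUpper) {
        if (storesLower) {
            return name.toLowerCase(Locale.ROOT);   // e.g. PostgreSQL
        }
        if (storesUpper) {
            return name.toUpperCase(Locale.ROOT);   // e.g. Oracle
        }
        return name;                                // mixed case: search as given
    }

    static boolean tableExists(Connection conn, String tableName) throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        String search = storedCase(tableName,
                meta.storesLowerCaseIdentifiers(), meta.storesUpperCaseIdentifiers());
        // The name argument is a LIKE pattern, so "_" matches any single character;
        // acceptable for an existence check like this one.
        try (ResultSet rs = meta.getTables(null, null, search, null)) {
            return rs.next();
        }
    }
}
```

Compared with listing every table, this issues a single filtered metadata query.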

@tdonohue
Member Author

UPDATE: I've just pushed up a commit which attempts to determine which version of DSpace you are using. It needs a lot of testing, but my basic tests seem to work thus far.

Unfortunately, DSpace 1.7 is difficult to detect, as its upgrade only involved a single sequence deletion. So, we may want to just update its SQL script to check whether the sequence exists before deleting it.

READY FOR TESTING ON ANY / ALL UPGRADES. Let's see if this works!

@helix84
Copy link
Member

helix84 commented Oct 22, 2014

I tested it, and detection of the DSpace 4 DB format now works. Upgrade to the DSpace 5 DB format also works. Congrats! There were a couple of errors; I sent them your way in an email.

Regarding 1.7, you can check for existence of a sequence in Postgres using the following SQL.
SELECT count(1) FROM pg_class WHERE relkind = 'S' AND relname = 'dctyperegistry_seq';
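From Java, the same check could be run through JDBC with the sequence name bound as a parameter. A minimal sketch with hypothetical names; note that `pg_class` spans all schemas, so a match in any schema counts.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper wrapping the pg_class query in a JDBC call. */
final class PostgresSequences {
    // The query above, with the sequence name as a bind parameter
    static final String SEQUENCE_EXISTS_SQL =
            "SELECT count(1) FROM pg_class WHERE relkind = 'S' AND relname = ?";

    static boolean sequenceExists(Connection conn, String sequenceName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SEQUENCE_EXISTS_SQL)) {
            ps.setString(1, sequenceName);
            try (ResultSet rs = ps.executeQuery()) {
                // count(1) always returns one row; a positive count means it exists
                return rs.next() && rs.getLong(1) > 0;
            }
        }
    }
}
```

The 1.7 case would call `sequenceExists(conn, "dctyperegistry_seq")` before dropping the sequence. PostgreSQL also accepts `DROP SEQUENCE IF EXISTS`, which may be simpler inside a pure SQL migration.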

@helix84
Copy link
Member

helix84 commented Oct 23, 2014

I did a couple more tests on two different databases (3->5, 1.8->5, 1.6->5) and apart from some errors, it looks good (the DBs have been upgraded to 5 and schema_version seems to contain correct information). I sent the logs to Tim.

One thing we need to document is that the first run of a new version of DSpace (the Flyway DB upgrade) may take a long time, depending on DB size, so the user should check the log to see whether it completed, failed or is still running. A DB of 30k items took ~16 minutes.

@helix84
Copy link
Member

helix84 commented Oct 23, 2014

This should help with detecting a sequence in Oracle:
http://stackoverflow.com/questions/21738117/how-can-i-get-all-sequences-in-an-oracle-database

I'll be available to test again on Monday (perhaps Sunday night).
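An Oracle counterpart to the PostgreSQL check might look like this. It is a sketch under two assumptions: that the sequence is owned by the connected user, so Oracle's `USER_SEQUENCES` data dictionary view lists it, and that it was created with an unquoted (hence upper-cased) name. Class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;

/** Hypothetical Oracle counterpart, assuming the USER_SEQUENCES dictionary view. */
final class OracleSequences {
    // USER_SEQUENCES lists the sequences owned by the connected user (schema)
    static final String SEQUENCE_EXISTS_SQL =
            "SELECT COUNT(1) FROM user_sequences WHERE sequence_name = ?";

    /** Oracle stores unquoted identifiers in upper case, so match on that form. */
    static String dictionaryName(String sequenceName) {
        return sequenceName.toUpperCase(Locale.ROOT);
    }

    static boolean sequenceExists(Connection conn, String sequenceName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SEQUENCE_EXISTS_SQL)) {
            ps.setString(1, dictionaryName(sequenceName));
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && rs.getLong(1) > 0;
            }
        }
    }
}
```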

@tdonohue
Copy link
Member Author

I've reviewed the logs from @helix84. From what I can tell each of the migrations (4->5, 3->5, 1.8->5, 1.6->5) were all SUCCESSFUL (which is awesome). The errors he reported were in the post-migration processing steps, and both were minor errors:

  • There was a minor error during the auto-reindexing process, as it looks like Solr must have been unavailable at that time. I may need to rethink this process, or figure out a way to test if Solr is actually there before attempting a reindex.
  • There was a minor error during the process to auto-update the bitstream format registry. It tried to add a "License" format that already existed in the database. Again, something we should be able to easily fix.

This still needs some testing on Oracle. But it sounds like the basics are working (with a few minor errors in post-processing that I'll work to clean up).
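One way to "test if Solr is actually there before attempting a reindex" is a short HTTP probe with tight timeouts. A minimal sketch, assuming the Solr core exposes Solr's ping handler at `<core>/admin/ping` (whether DSpace's Solr configuration enables it is an assumption); the class name is hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical pre-check: is the Solr core answering before we try to reindex? */
final class SolrProbe {
    static boolean isReachable(String coreUrl, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            // Assumes the core exposes Solr's ping handler at <core>/admin/ping
            conn = (HttpURLConnection) URI.create(coreUrl + "/admin/ping").toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException | IllegalArgumentException e) {
            // Refused connection, timeout or malformed URL: treat Solr as unavailable
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

A call such as `SolrProbe.isReachable("http://localhost:8080/solr/search", 2000)` could gate the post-migration reindex, skipping it (and logging a hint to reindex manually) when it returns false.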

@tdonohue
Copy link
Member Author

I've just pushed up fixes to the two minor errors that @helix84 encountered. For now I had to turn off the auto-reindexing, as the DB Migration process runs before anything else spins up...so Solr isn't yet started, etc.

@helix84
Copy link
Member

helix84 commented Oct 26, 2014

Hi Tim, great job: I tested the 3 -> 5 upgrade and the errors are now gone. However, DSpace didn't start up after the upgrade because Solr wasn't running. A manual index rebuild (-b) failed for the same reason.

ERROR org.dspace.discovery.SolrServiceImpl @ Server refused connection at: http://localhost:8080/solr/search

Then, after a Tomcat restart, DSpace started and shows the comm & coll structure, but the items aren't accessible due to the failed reindex (which can be fixed using -f).

@tdonohue
Copy link
Member Author

Just pushed an update to add better command-line tools for DB actions:

  • ./dspace database test => Tests connection (replaces 'test-database')
  • ./dspace database info => Displays basic DB info
  • ./dspace database migrate => Allows you to manually kick off any migrations prior to booting up DSpace
  • ./dspace database clean => Wipes clean your DB (after verifying it is in fact what you want to do)
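As an illustration of what a `database test` style check involves (not DSpace's actual implementation; the class name, JDBC URL and credentials are hypothetical), plain JDBC can open a connection and ask the driver whether it is usable via `Connection.isValid()`.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical connection check in the spirit of "./dspace database test". */
final class DbConnectionTest {
    static boolean canConnect(String jdbcUrl, String user, String password, int timeoutSeconds) {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password)) {
            // isValid() lets the driver run its own lightweight liveness check
            return conn.isValid(timeoutSeconds);
        } catch (SQLException e) {
            // No driver, bad credentials, unreachable host, ...
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }
}
```

Usage would look like `canConnect("jdbc:postgresql://localhost:5432/dspace", "dspace", "dspace", 5)`, provided the matching JDBC driver is on the classpath.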

@peterdietz
Copy link
Member

I'm +1, this recognized a 1.8 DB, and upgraded it to the latest.

The only surprise I had was that a page load to XMLUI stalled until a discovery reindex occurred. That's probably fine though. Maybe if possible, run the reindex in the background?
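Running the reindex in the background could be as simple as handing it to a dedicated worker thread so page loads don't wait on it. A minimal sketch: the `Runnable` stands in for whatever triggers the discovery reindex, and the class name is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical: run a post-migration reindex off the request path. */
final class BackgroundReindex {
    // One daemon worker: reindex jobs run one at a time and never block JVM shutdown
    private static final ExecutorService WORKER = Executors.newSingleThreadExecutor(task -> {
        Thread thread = new Thread(task, "post-migration-reindex");
        thread.setDaemon(true);
        return thread;
    });

    /** Queues the reindex and returns immediately; the Future reports completion. */
    static Future<?> submit(Runnable reindex) {
        return WORKER.submit(reindex);
    }
}
```

The trade-off: until the job finishes, search/browse results may be incomplete, and a daemon thread means a reindex still running at shutdown is simply abandoned, so a rebuild may be needed afterwards.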

@tdonohue
Copy link
Member Author

FYI for all...I've just rebased this PR on the latest master (it sounds like some folks were seeing minor errors because the branch was behind master). So, if you have this PR branch already checked out, you will need to delete that old branch and re-checkout for further testing.

@tdonohue tdonohue force-pushed the DS-2167_flyway branch 2 times, most recently from 43ec26a to 9ec3766 on October 29, 2014 21:39
@tdonohue
Copy link
Member Author

FYI, just rebased on "master" again as there were missing Spring configs in my branch. Again, you'll need to checkout a fresh copy of this branch.

@peterdietz
Copy link
Member

Hi Tim,

I've managed to set up the Oracle DB VirtualBox instance and cloned one of our Oracle clients' databases onto my local setup. Here are the errors I got after Tomcat restarted:

dspace.log https://gist.github.com/peterdietz/0e4bed9a79c50451f736
catalina.out https://gist.github.com/peterdietz/2f0f3f2fabd4c7622388

Essentially:

WARNING: Exception initializing DB pool
org.flywaydb.core.api.FlywayException: Unable to init metadata table "DSPACE"."schema_version" as it already contains migrations

@peterdietz
Copy link
Member

Here are the contents of the schema_version table:

| version_rank | installed_rank | version | description | type | script | checksum | installed_by | installed_on | execution_time | success |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3.0 | Initializing from DSpace 3.0 database schema | INIT | Initializing from DSpace 3.0 database schema | | DSPACE | 2014-10-29 23:03:33.076567 | 0 | 1 |
| 2 | 2 | 4.0 | Upgrade to DSpace 4.x schema | SQL | `V4.0__Upgrade_to_DSpace_4.x_schema.sql` | -496478056 | DSPACE | 2014-10-29 23:03:34.861229 | 40 | 0 |

Also, it's strange that schema_version and tempone are lowercase, and all the other tables are uppercase. You could use DatabaseManager.canonicalize(...) to make it match case.
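In the schema_version contents above, the 4.0 row has `success` = 0, i.e. Flyway recorded that migration as failed rather than applied. A minimal sketch (hypothetical names, Java 16+ for the record type) of picking out such rows once they have been read from the table; the sample rows in the test are copied from the contents above.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical view over rows read from Flyway's schema_version table. */
final class SchemaVersionRows {
    /** One row, keeping only the columns used here. */
    record Row(int installedRank, String version, String type, String script, boolean success) {}

    /** Versions whose migration Flyway recorded as unsuccessful. */
    static List<String> failedVersions(List<Row> rows) {
        return rows.stream()
                .filter(row -> !row.success())
                .map(Row::version)
                .collect(Collectors.toList());
    }
}
```

Fed the two rows above, `failedVersions` returns `[4.0]`.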

tdonohue and others added 24 commits October 31, 2014 14:17
…yet exist and Solr isn't running when DB migration runs.
…g test connections. Refactor DatabaseManager to move commandline tools to DatabaseUtils
…needs to be done once. Also use ConfigurationManager, since Kernel is sometimes null.
migrations, instead create them after DB is initialized
…te a post-upgrade script for sequence updates
… inline (after migrating from dctyperegistry to metadataschemaregistry)
@tdonohue
Copy link
Member Author

FYI, just performed a fresh "rebase" in order to prepare for merger of this code into "master"

mwoodiupui added a commit that referenced this pull request Oct 31, 2014
DS-2167: Add Flyway automatic database upgrades to DSpace
@mwoodiupui mwoodiupui merged commit 8f2fe3a into DSpace:master Oct 31, 2014
@tdonohue tdonohue deleted the DS-2167_flyway branch October 31, 2014 14:52
amirkamran added a commit to kosarko/DSpace that referenced this pull request Mar 21, 2017
5 participants