The Database Preservation Toolkit converts between database formats, including connections to live systems, for the purpose of digitally preserving databases. It can convert live or backed-up databases into preservation formats such as DBML, an XML format created for database preservation. It can also convert preservation formats back into live systems, restoring the full functionality of the database. For example, it supports a specialized export into MySQL, optimized for phpMyAdmin, so the database can be fully explored through a web interface.
This toolkit was part of the RODA project and has now been released as a project of its own due to increasing interest in this particular feature.
The toolkit is built as a platform of input and output modules. Each module supports reading from and/or writing to a particular database format or live system. New modules can easily be added by implementing a new interface and adding new drivers.
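The module design described above can be pictured with a minimal sketch. Every name here (ImportModule, ExportModule, InMemoryImport, LinesExport) is hypothetical and chosen for illustration only; these are not the toolkit's actual interfaces. The sketch assumes an import module pushes table structure and rows into whichever export module it is given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout: NOT the toolkit's real interfaces.

/** Receives database structure and rows from an import module. */
interface ExportModule {
    void handleTable(String tableName, List<String> columns);
    void handleRow(String tableName, List<String> values);
    void finish();
}

/** Reads a source (file format or live system) and pushes it to an export module. */
interface ImportModule {
    void readInto(ExportModule target);
}

/** Toy import module that "reads" one hard-coded table. */
final class InMemoryImport implements ImportModule {
    @Override
    public void readInto(ExportModule target) {
        target.handleTable("person", Arrays.asList("id", "name"));
        target.handleRow("person", Arrays.asList("1", "Ada"));
        target.handleRow("person", Arrays.asList("2", "Grace"));
        target.finish();
    }
}

/** Toy export module that renders everything as comma-separated lines. */
final class LinesExport implements ExportModule {
    final List<String> lines = new ArrayList<String>();

    @Override
    public void handleTable(String tableName, List<String> columns) {
        lines.add("# " + tableName + ": " + String.join(",", columns));
    }

    @Override
    public void handleRow(String tableName, List<String> values) {
        lines.add(String.join(",", values));
    }

    @Override
    public void finish() {
        lines.add("# done");
    }
}

final class ModuleSketch {
    public static void main(String[] args) {
        // Any import module can be paired with any export module.
        LinesExport out = new LinesExport();
        new InMemoryImport().readInto(out);
        for (String line : out.lines) {
            System.out.println(line);
        }
    }
}
```

Under this assumed design, supporting a new format means writing one new class that implements the relevant interface; existing modules stay unchanged.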
- Download the latest stable release.
- Unzip and open the folder in a command-line terminal.
- Download the Oracle Database 12.1.0.1 JDBC driver for Java 6 (ojdbc6.jar).
- Install the jar with the following command:
mvn install:install-file -DgroupId=com.oracle -DartifactId=ojdbc6 -Dversion=12.1.0.1 -Dpackaging=jar -Dfile=ojdbc6.jar -DgeneratePom=true
- Download the DB2 JDBC driver (db2jcc4.jar).
- Extract the db2jcc4.jar from the downloaded zip and install with the command:
mvn install:install-file -DgroupId=com.ibm -DartifactId=db2jcc4 -Dversion=4.16.53 -Dpackaging=jar -Dfile=db2jcc4.jar -DgeneratePom=true
- Build with Maven:
mvn clean package
Binaries will be in the target folder.
Binaries with all dependencies included:
To use the program, open a command line and run the following command:
$ java -jar db-preservation-toolkit-1.0.0-jar-with-dependencies.jar
Synopsys: java -jar roda-common-convert-db.jar -i IMPORT_MODULE [options...] -o EXPORT_MODULE [options...]
Available import modules:
SQLServerJDBC serverName [port|instance] database username password useIntegratedSecurity encrypt
PostgreSQLJDBC hostName [port] database username password encrypt
MySQLJDBC hostName [port] database username password
Oracle12cJDBC hostName port database username password
DB2JDBC hostname port database username password
SIARD dir
Available export modules:
SQLServerJDBC serverName [port|instance] database username password useIntegratedSecurity encrypt
PostgreSQLJDBC hostName [port] database username password encrypt
MySQLJDBC hostName [port] database username password
DB2JDBC hostname port database username password
SIARD dir
You must select an input module and an output module, and provide the configuration for each.
For example, to connect to a live MySQL database and export its content to SIARD format, you can use the following command:
$ java -jar db-preservation-toolkit-1.0.0-jar-with-dependencies.jar \
-i MySQLJDBC localhost example_db username p4ssw0rd \
-o SIARD example_db_siard_export
- Presentation "Database migration: CLI" by José Ramalho at "A Practical Approach to Database Archiving", Danish National Archives, Copenhagen, Denmark, 2012-02-07.
- Presentation "RODA: a service-oriented digital repository: database archiving" by José Ramalho at "A Practical Approach to Database Archiving", Danish National Archives, Copenhagen, Denmark, 2012-02-07.
- Presentation "RODA - Repository of Authentic Digital Objects" by Luis Faria at the International Workshop on Database Preservation, Edinburgh, 2007.
- José Carlos Ramalho, Relational database preservation through XML modelling, in proceedings of the International Workshop on Markup of Overlapping Structures (Extreme Markup 2007), Montréal, Canada, 2007.
- Marta Jacinto, Bidirectional conversion between XML documents and relational data bases, in proceedings of the International Conference on CSCW in Design, Rio de Janeiro, 2002.
- Ricardo Freitas, Significant properties in the preservation of relational databases, Springer, 2010.
Other related publications:
- Neal Fitzgerald, "Using data archiving tools to preserve archival records in business systems – a case study", in proceedings of iPRES 2013, Lisbon, 2013.
Getting exception "java.net.ConnectException: Connection refused"
Most databases are not configured by default to accept TCP/IP connections. Check that your database is configured to accept TCP/IP connections and that your IP address is allowed to connect. Also ensure that the user has permission to access the database from your IP address.
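Before changing the database configuration, it can help to confirm whether the port is reachable at all from the machine running the toolkit. The following is a minimal sketch using only the standard java.net classes; the class name PortCheck and the 2-second timeout are illustrative choices, not part of the toolkit.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortCheck {
    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions for illustration; 3306 is MySQL's default port.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 3306;
        boolean ok = canConnect(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

If the port is reported as not reachable, the problem lies in the database's network configuration or a firewall rather than in the toolkit. A reachable port with a failing import points instead to credentials or per-user access rules.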
Problems importing from Microsoft Access
To import from Microsoft Access you need to be on a Windows machine with Microsoft Access installed. This is because the current Microsoft Access import module is implemented using an ODBC connection: you need Windows to be able to use ODBC, and you need Microsoft Access installed so that its ODBC driver is available on your system.
Furthermore, to extract the database structure the toolkit needs access to the internal database table Msysrelationships. Granting this access requires changes to the DBMS that depend on the Access version. Please follow the instructions in Microsoft's white paper "Preparing a Microsoft Access Database for Migration", which explains how to do this for all Microsoft Access versions.
Got error "java.lang.OutOfMemoryError: Java heap space"
The toolkit might need more memory than is available by default (the default depends on the JVM and can be as low as 64 MB on older versions). To increase the available memory, use the -Xmx option. For example, the following command increases the maximum heap size to 3 GB.
$ java -Xmx3g -jar db-preservation-toolkit-1.0.0-jar-with-dependencies.jar ...
The toolkit needs enough memory to hold the table structure definitions (not the data) and to load each data row or row set. Depending on the database driver implementation, this might include holding some BLOBs completely in memory.
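To see how much heap a JVM on your machine actually gets, with or without -Xmx, a small check like the following may be useful. It is a sketch that relies only on the standard Runtime API; the class name HeapInfo is illustrative and not part of the toolkit.

```java
final class HeapInfo {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Running it once as `java HeapInfo` and once as `java -Xmx3g HeapInfo` shows the default limit and confirms that the option takes effect. The reported figure can be somewhat lower than the requested size, depending on the JVM and garbage collector.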
For more information or commercial support, contact KEEP SOLUTIONS.