# Example 2: Migrate MS Access databases

Migration of MS Access databases can rescue invaluable data from oblivion. The MS Access 97 format for instance cannot be opened anymore by the latest version of MS Access. Furthermore, if you have large quantities of databases in an older format it may become quite tedious to do a one-by-one migration by hand. [Axxess](https://github.com/DANS-repo/axxess) can be used to migrate complete directory structures with databases to the same or similar directory structures. The original databases will be migrated to the csv-format you prefer and/or newer versions of MS Access. At your choice, the csv files can also be zipped, with or without compression.

In this Notebook we will convert two databases that were saved in MS Access versions '97 and 2000 to csv files, zip and compress the csv files, and rebuild the databases in a newer version. For brevity we do it here for two databases; a batch conversion of a thousand databases would need the same amount of preparation (although it would obviously take more time to run). At the end of this Notebook we'll offer a solution for an error that sometimes occurs when (re)building a database.

### Prerequisites

- System requirement: Java 8 or higher

In [1]:
!java -version

java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
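
The requirement can also be checked from Python rather than by reading the banner. The following is a minimal sketch: `java_major_version` is a hypothetical helper, and it assumes the two common banner styles (`1.8.0_101` for Java 8 and earlier, `11.0.2` or `17` for later releases).

```python
import re

def java_major_version(banner):
    """Return the major Java version found in a `java -version` banner.

    Hypothetical helper. Assumes the version is the first quoted string,
    e.g. "1.8.0_101" (Java 8) or "11.0.2" (Java 11).
    """
    match = re.search(r'"(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version string found in banner")
    first, second = match.group(1), match.group(2)
    # Up to Java 8 the scheme was 1.<major>; since Java 9 it is <major>.
    if first == "1" and second is not None:
        return int(second)
    return int(first)

banner = 'java version "1.8.0_101"'
print(java_major_version(banner) >= 8)
```

In a real check the banner would come from `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`, because Java writes the version banner to standard error.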


### Table of contents

- [Preparation of the workspace](#pows)
    * [Example data](#pows_data)
    * [Download Axxess](#pows_axxess)
    * [Axxess help](#pows_help)  
     
     
- [Migrate databases](#migrate)
    * [Write a logback.xml](#logging)
    * [Write a properties file](#properties)
    * [Start Axxess](#start)
    * [Inspect output](#inspect)  
     
     
- [Errors (and what to do with them)](#errors)
    * [Constraint violation exception](#constraint)

# Preparation of the workspace <a id="pows"></a>

If you played the previous example Notebook [Example 1](https://github.com/DANS-repo/axxess-jupyter/blob/master/acc2csv.ipynb) then half of the workspace is already prepared. For newcomers we'll walk through the installation a second time.

Create directories where we can store source data files and the Axxess jar-file...

In [2]:
import os

os.makedirs("java", exist_ok=True)
os.makedirs("work/migrate", exist_ok=True)

A download routine...

In [3]:
import requests

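def download_file(url, filename):
    """Stream url to filename and print one dot per MiB written."""
    mb = 2**20
    bts = 0
    dots = 0
    print("Downloading", url)
    r = requests.get(url, stream=True)
    r.raise_for_status()  # fail early instead of saving an error page
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                bts += len(chunk)  # count the bytes actually received
                while dots < bts // mb:
                    print('.', end='', flush=True)
                    dots += 1
    return filename, bts, bts/mb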

### Example data <a id="pows_data"></a>

In [4]:
download_file("https://easy.dans.knaw.nl/ui/rest/datasets/61704/files/4917456/content", "work/migrate/avereest.mdb")

Downloading https://easy.dans.knaw.nl/ui/rest/datasets/61704/files/4917456/content
.....................................................................

('work/migrate/avereest.mdb', 72353792, 69.001953125)

In [5]:
download_file("https://easy.dans.knaw.nl/ui/rest/datasets/48968/files/2964358/content", "work/migrate/Boedelbestand Walcheren 1755-1855.MDB")

Downloading https://easy.dans.knaw.nl/ui/rest/datasets/48968/files/2964358/content
....

('work/migrate/Boedelbestand Walcheren 1755-1855.MDB', 4562944, 4.3515625)
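
Before a batch run it can help to list which files in the source tree look like databases. The sketch below walks a directory and collects candidates; `find_databases` and its extension list are assumptions made for illustration and do not describe how Axxess itself selects files.

```python
import os

def find_databases(root, extensions=(".mdb", ".accdb")):
    """Yield paths of files under root whose extension looks like MS Access.

    The extension list is an assumption of this sketch, not Axxess's own
    selection rule. Matching is case-insensitive (.MDB and .mdb both count).
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                yield os.path.join(dirpath, name)

# Print every candidate with its size in bytes.
for path in find_databases("work/migrate"):
    print(path, os.path.getsize(path))
```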

### Download Axxess jar-with-dependencies <a id="pows_axxess"></a>

In [6]:
url = "https://github.com/DANS-repo/axxess/releases/download/1.0.1/axxess-jar-with-dependencies.jar"
filename = "java/axxess-jar-with-dependencies.jar"
download_file(url, filename)

Downloading https://github.com/DANS-repo/axxess/releases/download/1.0.1/axxess-jar-with-dependencies.jar
.........................................................

('java/axxess-jar-with-dependencies.jar', 60484608, 57.6826171875)

### Axxess help <a id="pows_help"></a>

Axxess help will briefly inform you about its usage. Help is printed when Axxess is invoked with no arguments or with the arguments `-h` or `--help`.

In [7]:
!java -jar java/axxess-jar-with-dependencies.jar -h

Axxess is a tool for converting MS Access databases to and from csv files.
See also: https://github.com/DANS-repo/axxess

USAGE:

          java -jar axxess-jar-with-dependencies.jar [axxess.properties] [logback configuration]

axxess.properties      - configuration file.
                         See https://github.com/DANS-repo/axxess/blob/master/docker/cfg/axxess.properties
                         If no properties file given will look for cfg/axxess.properties

logback configuration  - logging configuration.
                         See https://logback.qos.ch/manual/configuration.html
                         If no logging configuration given will log to logs/axxess.log


# Migrate databases <a id="migrate"></a>

As the help showed us, logging goes to `logs/axxess.log` by default. Since we only have two databases and the log output won't be too prolific, it would be nice to have Axxess log to our Notebook. In that case logging must be directed to `System.out`.

### Write a logback.xml <a id="logging"></a>

In [8]:
%%writefile java/logback.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{10} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="info">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>

Overwriting java/logback.xml


Axxess can be configured and instructed with a properties file. An example properties file and documentation can be found [here](https://github.com/DANS-repo/axxess/blob/master/docker/cfg/axxess.properties). Most properties have a default value; the only ones you need to provide for Access to Access conversion are `axxess.mode` and `db.source.file`.

### Write a properties file <a id="properties"></a>

In [9]:
%%writefile java/axxess.properties
axxess.mode=aca
# access to csv properties
db.source.file=work/migrate
csv.target.directory=work/migrate-csv
create.zip=true
compress.zip=true
# csv to access properties
db.target.database.format=V2010
db.target.directory=work/migrate

Overwriting java/axxess.properties


Explanation of the properties. We set the mode to `aca`, which means convert **a**ccess to **c**sv and back to **a**ccess. We set `db.source.file` and `db.target.directory` both to `work/migrate`, so the migrated databases will be in the same directory (structure) as the source files. `csv.target.directory` has been set to `work/migrate-csv`, which instructs Axxess to put the csv output in that directory. We set `create.zip` and `compress.zip` both to `true`, which means a compressed zip will be created from the csv files. Finally we set `db.target.database.format` to `V2010`. That last statement was not strictly necessary, because the default format is already `V2010`.
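
When many runs need slightly different settings, the properties text can be generated from a Python dict instead of being typed by hand. This is a minimal sketch: `format_properties` is a hypothetical helper, and the property names are the ones used in the cell above.

```python
def format_properties(settings):
    """Render a dict as `key=value` lines, one property per line."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            # Booleans are written lowercase, as in the cell above.
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

settings = {
    "axxess.mode": "aca",
    "db.source.file": "work/migrate",
    "csv.target.directory": "work/migrate-csv",
    "create.zip": True,
    "compress.zip": True,
    "db.target.directory": "work/migrate",
}
print(format_properties(settings), end="")
```

Writing the returned text to `java/axxess.properties` should have the same effect as the `%%writefile` cell. The sketch does no escaping, so it is only suitable for simple values such as the relative paths used here.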

### Start Axxess <a id="start"></a>

In [10]:
!java -jar java/axxess-jar-with-dependencies.jar java/axxess.properties  java/logback.xml

15:08:55.824 [main] INFO  n.k.d.r.a.a.AxxessMain - 

           %#@      
          %#@ &!    
         %#@  &!*   
        %#@    &!*  
       %#@#####&!*  
      %#@^^^^^^^&!* 
     %#@        &!*~ 
   ___________________

15:08:55.826 [main] INFO  n.k.d.r.a.a.AxxessMain - Configuring Axxess run from /Users/ecco/git/axxess-jupyter/java/axxess.properties
15:08:55.828 [main] INFO  n.k.d.r.a.a.AxxessMain - axxess.mode=aca
15:08:55.828 [main] INFO  n.k.d.r.a.a.AxxessMain - db.source.file=work/migrate
15:08:55.828 [main] INFO  n.k.d.r.a.a.AxxessMain - csv.target.directory=work/migrate-csv
15:08:55.828 [main] INFO  n.k.d.r.a.a.AxxessMain - Absolute csv.target.directory=/Users/ecco/git/axxess-jupyter/work/migrate-csv
15:08:55.828 [main] INFO  n.k.d.r.a.a.AxxessMain - Absolute       db.source.file=/Users/ecco/git/axxess-jupyter/work/migrate
15:08:55.833 [main] INFO  n.k.d.r.a.a.AxxessMain - csv.target.include.metadata=true (Default)
15:08:55.833 [main] INFO  n.k.d.r.a.a.AxxessMain - csv.targ

As we can see from the log, Axxess, or rather the underlying library [Jackcess](http://jackcess.sourceforge.net/), had a little difficulty with the indexes on some tables in `avereest.mdb`. It made them read-only.

So where are the converted files?

### Inspect the output <a id="inspect"></a>

In [11]:
!ls -la work/migrate

total 156272
drwxr-xr-x  6 ecco  staff       192 Mar 17 15:09 .
drwxr-xr-x  9 ecco  staff       288 Mar 17 15:08 ..
-rw-r--r--  1 ecco  staff   4562944 Mar 17 15:08 Boedelbestand Walcheren 1755-1855.MDB
drwxr-xr-x  4 ecco  staff       128 Mar 17 15:09 Boedelbestand Walcheren 1755-1855_MDB
-rw-r--r--  1 ecco  staff  72353792 Mar 17 15:08 avereest.mdb
drwxr-xr-x  4 ecco  staff       128 Mar 17 15:10 avereest_mdb


In `work/migrate` we still have the original databases and two new directories with the same name as the old databases, only the dot (.) in the database name has been replaced with an underscore (\_) in the directory name. What's in this directory?

In [12]:
!ls -la work/migrate/Boedelbestand\ Walcheren\ 1755-1855_MDB

total 2056
drwxr-xr-x  4 ecco  staff      128 Mar 17 15:09 .
drwxr-xr-x  6 ecco  staff      192 Mar 17 15:09 ..
-rw-r--r--  1 ecco  staff  1048576 Mar 17 15:09 Boedelbestand Walcheren 1755-1855.MDB.accdb
-rw-r--r--  1 ecco  staff       85 Mar 17 15:09 manifest-sha1.txt


Exactly. The migrated database, now with a `.accdb` extension and a `manifest-sha1.txt` with the SHA1 checksum of the newly created database.
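
The manifest makes it possible to verify the migrated database later on. The sketch below recomputes the SHA1 checksum with Python's `hashlib` and compares it with the manifest. The helpers `sha1_of` and `verify_manifest` are hypothetical, and the manifest layout (one `<hex digest> <file name>` pair per line) is an assumption that should be checked against an actual `manifest-sha1.txt`.

```python
import hashlib
import os

def sha1_of(path, block_size=2**20):
    """Return the hex SHA1 digest of a file, read in blocks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_manifest(directory, manifest_name="manifest-sha1.txt"):
    """Map each file named in the manifest to True (match) or False.

    ASSUMPTION: each manifest line is '<hex digest><whitespace><file name>'.
    """
    results = {}
    with open(os.path.join(directory, manifest_name), encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            expected, name = line.strip().split(None, 1)
            actual = sha1_of(os.path.join(directory, name))
            results[name] = (actual == expected.lower())
    return results
```

Under that assumption, `verify_manifest("work/migrate/Boedelbestand Walcheren 1755-1855_MDB")` would return a dict with one entry for the `.accdb` file.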

The zip files are, along with the csv files, in `work/migrate-csv`:

In [13]:
!ls -la work/migrate-csv

total 18856
drwxr-xr-x   6 ecco  staff      192 Mar 17 15:09 .
drwxr-xr-x   9 ecco  staff      288 Mar 17 15:08 ..
-rw-r--r--   1 ecco  staff   140033 Mar 17 15:09 Boedelbestand Walcheren 1755-1855.MDB.csv.zip
drwxr-xr-x   7 ecco  staff      224 Mar 17 15:09 Boedelbestand Walcheren 1755-1855_MDB
-rw-r--r--   1 ecco  staff  9153018 Mar 17 15:09 avereest.mdb.csv.zip
drwxr-xr-x  68 ecco  staff     2176 Mar 17 15:09 avereest_mdb


In [14]:
!ls -la work/migrate-csv/Boedelbestand\ Walcheren\ 1755-1855_MDB

total 1040
drwxr-xr-x  7 ecco  staff     224 Mar 17 15:09 .
drwxr-xr-x  6 ecco  staff     192 Mar 17 15:09 ..
-rw-r--r--  1 ecco  staff  101153 Mar 17 15:09 Boedelbestand Walcheren 1755-1855.MDB.1e_periode_erflaters_Walcheren_(platteland).csv
-rw-r--r--  1 ecco  staff  132391 Mar 17 15:09 Boedelbestand Walcheren 1755-1855.MDB.2e_periode_erflaters_Walcheren_(stad_+_platteland).csv
-rw-r--r--  1 ecco  staff  113071 Mar 17 15:09 Boedelbestand Walcheren 1755-1855.MDB.3e_periode_erflaters_Walcheren_(stad_+_platteland).csv
-rw-r--r--  1 ecco  staff  175144 Mar 17 15:09 Boedelbestand Walcheren 1755-1855.MDB._metadata.csv
-rw-r--r--  1 ecco  staff     488 Mar 17 15:09 manifest-sha1.txt
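
The zip files can be inspected without unpacking them. The sketch below uses Python's standard `zipfile` module to list each member with its original and compressed size; `zip_summary` is a hypothetical helper name.

```python
import zipfile

def zip_summary(zip_path):
    """Return (name, size, compressed size) for every member of a zip file."""
    with zipfile.ZipFile(zip_path) as archive:
        return [(info.filename, info.file_size, info.compress_size)
                for info in archive.infolist()]
```

For example, `zip_summary("work/migrate-csv/avereest.mdb.csv.zip")` would list the csv files of `avereest.mdb`. With `compress.zip=true` the compressed size of a member should normally be smaller than its original size.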


# Errors (and what to do with them) <a id="errors"></a>

Sometimes a migration from csv back to Access ends with an exception. This does not (always) mean we cannot recover the database. Sometimes we need to restrict which database metadata we convert in order to rescue the data in the tables. Here is an example. First we restrict the log output to errors to get a clearer view.

In [15]:
%%writefile java/errorlogback.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{10} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="error">
    <appender-ref ref="STDERR"/>
  </root>
</configuration>

Overwriting java/errorlogback.xml


### Constraint violation exception <a id="constraint"></a>

A database cannot be rebuilt because table data does not satisfy index restrictions. Remedy: leave out the indexes.

In [16]:
%%writefile java/axxess.properties
axxess.mode=aca
# access to csv properties
db.source.file=work/src-data/Art17_MS_EU27_2015.mdb
csv.target.directory=work/error-csv
# csv to access properties
db.target.directory=work/error-db

Overwriting java/axxess.properties


In [17]:
!java -jar java/axxess-jar-with-dependencies.jar java/axxess.properties java/errorlogback.xml

15:10:55.294 [main] ERROR n.k.d.r.a.c.Csv2AxxessConverter - Context:  Format=V2000 [VERSION_4], Table=data_measures file: /Users/ecco/git/axxess-jupyter/work/error-csv/Art17_MS_EU27_2015_mdb/Art17_MS_EU27_2015.mdb._metadata.csv
com.healthmarketscience.jackcess.ConstraintViolationException: New row [3.1, 08a791256e1d92149d6d15a083e620a9, null, false, false, false, true, false, M, false, false, true, false, false, true, false, false, false] violates uniqueness constraint for index (Db=Art17_MS_EU27_2015.mdb.accdb;Table=data_measures;Index=unique)
	at com.healthmarketscience.jackcess.impl.IndexData.prepareAddEntry(IndexData.java:657)
	at com.healthmarketscience.jackcess.impl.IndexData.prepareAddRow(IndexData.java:626)
	at com.healthmarketscience.jackcess.impl.IndexData.prepareAddRow(IndexData.java:604)
	at com.healthmarketscience.jackcess.impl.TableImpl.addRows(TableImpl.java:2223)
	at com.healthmarketscience.jackcess.impl.TableImpl.addRow(TableImpl.java:2086)
	at nl.knaw.dans.repo.axxess
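
In a batch run it can be useful to collect which databases and tables triggered this exception. The sketch below extracts the `(Db=...;Table=...;Index=...)` context from log lines like the one above. The pattern is derived from this single log line only, so treat it as an assumption rather than a documented Jackcess message format; `violated_indexes` is a hypothetical helper.

```python
import re

# Context that appears at the end of the exception message in the log above:
# (Db=<database>;Table=<table>;Index=<index>)
CONTEXT = re.compile(
    r"\(Db=(?P<db>[^;]+);Table=(?P<table>[^;]+);Index=(?P<index>[^)]+)\)")

def violated_indexes(log_lines):
    """Return the set of (db, table, index) triples found in
    ConstraintViolationException lines."""
    found = set()
    for line in log_lines:
        if "ConstraintViolationException" not in line:
            continue
        match = CONTEXT.search(line)
        if match:
            found.add((match["db"], match["table"], match["index"]))
    return found

# Sample line with the row values abbreviated to [...].
sample = ("com.healthmarketscience.jackcess.ConstraintViolationException: "
          "New row [...] violates uniqueness constraint for index "
          "(Db=Art17_MS_EU27_2015.mdb.accdb;Table=data_measures;Index=unique)")
print(violated_indexes([sample]))
```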

#### Solution

Do not write indexes when building the new database. You can control this with the property `db.target.include.indexes = true | false`. Unfortunately this switches off indexes **and** relationships for the whole database. Here we go again: the csv files have already been extracted, so we just build in mode `ca` and leave out the indexes.

In [18]:
%%writefile java/axxess.properties
axxess.mode=ca
# csv to access properties
csv.source.file=work/error-csv
db.target.include.indexes=false
db.target.directory=work/error-db

Overwriting java/axxess.properties


In [19]:
!java -jar java/axxess-jar-with-dependencies.jar java/axxess.properties

No log configuration given. Using default. See: logs/axxess.log
See logs for details
