This script ensures that CESSDA's Data Catalogue can harvest metadata from AUSSDA's Dataverse. It makes sure the exported files follow the required DDI profile structure so that the data can be presented in CESSDA's Data Catalogue.
Dataverse exports its file metadata through OAI exports. The proxy checks whether elements (e.g. `nation`) and their attributes (e.g. `@abbr`) are present, and if they are not, it adds default entries.
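To make that behaviour concrete, here is a minimal sketch of the check-and-default idea. It is an illustration only, not the proxy's actual code: it assumes a namespace-free document for brevity (the real DDI exports are namespaced), and `ensure_default` is a hypothetical helper.

```python
# Minimal sketch of the check-and-default idea (illustration only).
# Assumes a namespace-free document; real DDI exports are namespaced.
from lxml import etree

def ensure_default(root, parent_xpath, tag, default):
    """Hypothetical helper: for each node matched by `parent_xpath`,
    append a `tag` child with `default` as its text if it is missing."""
    for parent in root.xpath(parent_xpath):
        if not parent.findall(tag):
            etree.SubElement(parent, tag).text = default

doc = etree.fromstring(
    "<codeBook><stdyDscr><stdyInfo><sumDscr/></stdyInfo></stdyDscr></codeBook>"
)
# Add a default <nation> element where it is missing.
ensure_default(doc, "/codeBook/stdyDscr/stdyInfo/sumDscr", "nation", "Austria")
print(etree.tostring(doc, pretty_print=True).decode())
```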
The proxy is configured through `assets/defaults.json`, where the targets are defined through XPaths. By default, the proxy is set up to ensure that the mandatory profile elements and attributes are present. You have to populate the defaults file with paths and default values; these values will be visible in the Data Catalogue. The proxy also puts the DOI link of each file into the correct element so that it is visible in the Data Catalogue.
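Judging from the `holdings/@URI` path in the defaults file below, the DOI link could plausibly end up in an element like the following. Both the target element and the DOI shown are placeholders for illustration, not something this README confirms:

```xml
<holdings URI="https://doi.org/10.xxxx/EXAMPLE"/>
```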
Be aware that setting specific metadata on a single dataset is not possible. If multiple datasets are missing the `abstract` element, the proxy will set the same default value for all of them. You cannot define abstract A for one datafile and abstract B for another; they will end up with the same abstract.
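For example, a populated `defaults.json` could look like the snippet below. The values are placeholders chosen for illustration, not recommendations, and every dataset missing one of these fields would receive the same value:

```json
{
    "/codeBook/stdyDscr/stdyInfo/abstract": "No abstract available.",
    "/codeBook/stdyDscr/stdyInfo/sumDscr/nation": "Austria",
    "/codeBook/stdyDscr/stdyInfo/sumDscr/nation/@abbr": "AT"
}
```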
We also provide a small script, `assets/gen_defaults.py`, that generates these files based on the DDI profile XML. Please see CESSDA's profile documentation on how to populate these values.
```
$ python3 assets/gen_defaults.py --help
usage: gen_defaults.py [-h] [-c CONSTRAINT] [-p PROFILE]

Creates a json file for each field/attribute per constraint level

optional arguments:
  -h, --help            show this help message and exit
  -c CONSTRAINT, --constraint CONSTRAINT
                        Mandatory, recommended, optional constraint level
  -p PROFILE, --profile PROFILE
                        The location of the file to parse
```
For example, to process the `cdc25_profile.xml` and pass the mandatory and recommended constraints, run this command:

```
$ python3 assets/gen_defaults.py -c Mandatory -c Recommended
```
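If the profile XML lives somewhere else, the `-p` flag from the help text above should let you point at it; something along these lines (the path is a placeholder):

```
$ python3 assets/gen_defaults.py -c Mandatory -p /path/to/cdc25_profile.xml
```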
Either way, you should now have a `defaults.json` file with empty values:
```
$ cat assets/defaults.json
{
    "/codeBook/@xml:lang": "",
    "/codeBook/@xsi:schemaLocation": "",
    "/codeBook/stdyDscr/citation/titlStmt/titl": "",
    "/codeBook/stdyDscr/citation/titlStmt/IDNo": "",
    "/codeBook/stdyDscr/citation/titlStmt/IDNo/@agency": "",
    "/codeBook/stdyDscr/citation/holdings/@URI": "",
    "/codeBook/stdyDscr/citation/rspStmt/AuthEnty": "",
    "/codeBook/stdyDscr/citation/distStmt/distrbtr": "",
    "/codeBook/stdyDscr/citation/distStmt/distDate/@date": "",
    "/codeBook/stdyDscr/stdyInfo/subject/keyword": "",
    "/codeBook/stdyDscr/stdyInfo/subject/keyword/@vocab": "",
    "/codeBook/stdyDscr/stdyInfo/subject/topcClas": "",
    "/codeBook/stdyDscr/stdyInfo/subject/topcClas/@vocab": "",
    "/codeBook/stdyDscr/stdyInfo/subject/topcClas/@vocabURI": "",
    "/codeBook/stdyDscr/stdyInfo/abstract": "",
    "/codeBook/stdyDscr/stdyInfo/sumDscr/collDate/@event": "",
    "/codeBook/stdyDscr/stdyInfo/sumDscr/collDate/@date": "",
    "/codeBook/stdyDscr/stdyInfo/sumDscr/nation": "",
    "/codeBook/stdyDscr/stdyInfo/sumDscr/nation/@abbr": "",
    "/codeBook/stdyDscr/stdyInfo/sumDscr/anlyUnit": "",
    "/codeBook/stdyDscr/stdyInfo/sumDscr/anlyUnit/concept": "",
    "/codeBook/stdyDscr/stdyInfo/sumDscr/anlyUnit/concept/@vocab": "",
    "/codeBook/stdyDscr/method/dataColl/timeMeth": "",
    "/codeBook/stdyDscr/method/dataColl/timeMeth/concept/@vocab": "",
    "/codeBook/stdyDscr/method/dataColl/sampProc/concept/@vocab": "",
    "/codeBook/stdyDscr/method/dataColl/collMode": "",
    "/codeBook/stdyDscr/method/dataColl/collMode/concept/@vocab": "",
    "/codeBook/stdyDscr/dataAccs/useStmt/restrctn": "",
    "/codeBook/fileDscr/fileTxt/fileName": ""
}
```
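To give a sense of how such a file can be consumed, here is a small sketch that loads the defaults and separates element paths from attribute paths (those whose last step starts with `@`). This is an assumption inferred from the keys above, not the proxy's actual loading code:

```python
# Sketch: load defaults.json and split element paths from attribute paths.
# Illustration only; not the proxy's actual loading code.
import json

with open("assets/defaults.json") as fh:
    defaults = json.load(fh)

for path, value in defaults.items():
    parent, _, last = path.rpartition("/")
    if last.startswith("@"):
        print(f"attribute {last[1:]!r} on {parent}: default {value!r}")
    else:
        print(f"element {path}: default {value!r}")
```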
We assume you have a running Dataverse 4.20 or later and that you have Python 3.8 or later installed.
- Clone the repository somewhere. We recommend something like `/etc/dataverse`:

```
mkdir /etc/dataverse
git clone https://github.com/aussda/proxy /etc/dataverse/proxy
```
- Install the requirements:

```
pip3 install -r /etc/dataverse/proxy/requirements.txt
```
- Create a cronjob that runs the script periodically:

```
sudo crontab -e

# Every day at 04:00 run the script.
0 4 * * * /usr/bin/su - dataverse -c 'python3 /etc/dataverse/proxy/app/main.py'
```
Note that Dataverse automatically regenerates metadata exports daily, so we need to run the script daily as well. If you would like to revert the changes, you will need to delete all existing exports and request a `reExportAll`.
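For reference, recent Dataverse versions expose this through the admin API; the call would look something like the one below, though you should check the admin guide for your Dataverse version, since endpoint details and access restrictions vary:

```
$ curl http://localhost:8080/api/admin/metadata/reExportAll
```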
You can create a simple, more user-friendly `.html` page that shows the proxy's configuration. Simply run:

```
python3 public/gen_report.py
```
We are happy to receive any pull requests! You can reach Archival Technologies at AUSSDA.