This repo holds scripts used to migrate content into InvenioRDM. These have generally been used for one-time migration activities, but may be useful in the future.
migrate_caltechdata.py
was used to move records from the TIND-managed
Invenio instance to InvenioRDM.
migrate_caltechthesis.py
was used to create some minimal test records in
InvenioRDM. It is not complete.
For large collections of data we sometimes need to move the data first, and then create InvenioRDM records. An S3 object store like the Open Storage Network is a great option. You can bulk move records efficiently with s5cmd and the management scripts.
Run python make_command.py
to generate a list of files to sync. You'll need
to set the following environment variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
S3_ENDPOINT_URL https://renc.osn.xsede.org
AWS_REGION us-east-1
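
Before starting a long transfer, it can help to confirm that all four variables are actually set. The following is a minimal sketch, not part of the repository's scripts: the function name `missing_variables` is hypothetical, and it only checks that each variable is present and non-empty, not that the credentials are valid.

```python
import os

# The four variable names listed in this README.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_ENDPOINT_URL",
    "AWS_REGION",
)


def missing_variables(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the current process environment; a plain dict
    can be passed instead, which makes the check easy to test.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_variables()` with no arguments inspects the current shell environment and returns an empty list when everything is in place.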
Then run the command with
nohup ./s5cmd -numworkers 100 run commands.txt >> & log2017.txt ; echo Done >> & log2017.txt &
.
You may be able to adjust the numworkers value depending on your operating system.
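
For reference, the file passed to s5cmd run is plain text with one s5cmd command per line. The sketch below illustrates how such lines could be built for a set of local files. It is an assumption-laden illustration, not the logic of make_command.py: the function name `build_copy_commands`, the bucket name, and the prefix are all hypothetical, and it assumes paths contain no whitespace.

```python
from pathlib import PurePosixPath


def build_copy_commands(local_files, bucket, prefix=""):
    """Return one s5cmd `cp` line per local file.

    Hypothetical helper: each file is copied to
    s3://<bucket>/<prefix>/<file name>. Paths are assumed to contain
    no whitespace, so no quoting is applied.
    """
    clean_prefix = prefix.strip("/")
    commands = []
    for local in local_files:
        name = PurePosixPath(local).name
        key = f"{clean_prefix}/{name}" if clean_prefix else name
        commands.append(f"cp {local} s3://{bucket}/{key}")
    return commands
```

Joining the returned list with newlines and writing it to commands.txt would give a file in the shape that the s5cmd run command above expects.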
Raise an issue on the issue tracker.
Software produced by the Caltech Library is Copyright (C) 2023, Caltech. This software is freely distributed under a BSD/MIT type license. Please see the LICENSE file for more information.
These scripts were written by Tom Morrell.
This work was funded by the California Institute of Technology Library.