Tools to help you work with the ORCID datadump
Tom Demeranville updated readme
Latest commit 61b7dbc Nov 10, 2017
Type Name Latest commit message Commit time
docker-python added scholix transform and docker support Nov 10, 2017
.gitignore first draft Oct 16, 2017
dump_to_scholix_v2api.js added scholix transform and docker support Nov 10, 2017
readme.MD updated readme Nov 10, 2017

ORCID datadump to Mongo

This Python 3 script takes the ORCID JSON datadump (a .tar.gz archive) and loads it directly into MongoDB. This is orders of magnitude quicker than first extracting the archive to the file system.
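The idea of loading the archive without unpacking it can be sketched as below. This is a minimal illustration, not the repository's importer.py: the helper names `iter_json_records` and `load_into`, the batch size, and the assumption that each record is a `.json` member of the archive are all hypothetical.

```python
import json
import tarfile


def iter_json_records(fileobj):
    """Yield one parsed JSON document per .json member of a gzipped tar stream."""
    # Mode "r|gz" reads the archive as a forward-only stream, so no member
    # is ever written to the file system.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            # Assumption: records are regular files whose names end in .json.
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            yield json.load(archive.extractfile(member))


def load_into(collection, records, batch_size=1000):
    """Insert records in batches via collection.insert_many; return the count."""
    batch, total = [], 0
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            collection.insert_many(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        collection.insert_many(batch)
        total += len(batch)
    return total


# Possible usage with pymongo (database name "orcid" is a placeholder):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["orcid"]["records"]
#   with open("filename.tar.gz", "rb") as dump:
#       load_into(collection, iter_json_records(dump))
```

Batching the inserts keeps the number of round trips to MongoDB low, and `load_into` only needs an object with an `insert_many` method, so it can be exercised without a running database.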

Dependencies

You need pymongo:

python3 -m pip install pymongo

You also need a MongoDB instance. Either install it locally or use Docker:

docker run --name orcid-mongo -v /Volumes/Transcend/docker/mongostorage/:/data/db -d mongo:3.5

Usage

With a MongoDB instance running, run:

python3 importer.py --file filename.tar.gz --collection target-collection-name
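For illustration, the two options in the command above could be parsed with `argparse` roughly as follows. This is a sketch only: the function name `parse_args`, the help texts, and the choice to make both options required are assumptions, not taken from importer.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the --file and --collection options shown in the usage line."""
    parser = argparse.ArgumentParser(
        description="Load an ORCID datadump into MongoDB"
    )
    # Assumption: both options are mandatory.
    parser.add_argument("--file", required=True, help="path to the .tar.gz dump")
    parser.add_argument("--collection", required=True, help="target collection name")
    return parser.parse_args(argv)
```

Calling `parse_args()` with no argument reads `sys.argv`, while passing a list makes the parser easy to try out interactively.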

Other scripts

dump_to_scholix_v2api.js: takes the dump (already loaded into MongoDB) and transforms it into a Scholix-like format

Docker

The importer also works with Docker.

To start mongo:

docker run --name orcid-mongo -d -p 27017:27017 -v /Volumes/Transcend/docker/mongostorage/:/data/db mongo:3.5
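Before starting a long import it can help to confirm that the container is reachable on the published port. A minimal check with pymongo might look like this; the helper names `mongo_uri` and `mongo_is_up` and the two-second timeout are hypothetical choices for this sketch.

```python
def mongo_uri(host="localhost", port=27017):
    """Build a connection string; 27017 is the port published by -p above."""
    return f"mongodb://{host}:{port}"


def mongo_is_up(uri, timeout_ms=2000):
    """Return True if a MongoDB server answers a ping at the given URI."""
    # Imported lazily so mongo_uri can be used without pymongo installed.
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
    finally:
        client.close()
```

From the host, `mongo_is_up(mongo_uri())` targets localhost. From a container linked with `--link orcid-mongo:orcid-mongo`, the host name would be `orcid-mongo` instead.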

To start the script:

cd docker-python
docker build -t orcid-mongo-import .
docker run --name importer --link orcid-mongo:orcid-mongo --rm orcid-mongo-import --file myFile.tar.gz --collection myCollection --resume 0 
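The readme does not say what `--resume` means. One plausible reading, assumed here purely for illustration, is a count of records already imported that should be skipped on restart. Under that assumption, skipping could be sketched with `itertools.islice` (the name `resume_from` is hypothetical):

```python
from itertools import islice


def resume_from(records, already_done):
    """Skip the first `already_done` items of an iterable of records.

    Assumption: the resume value is a record count; 0 means start from
    the beginning.
    """
    return islice(records, already_done, None)
```

Because `islice` is lazy, the skipped records are still read from the stream but never parsed further or inserted.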

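To run the script again (this only works if the container still exists, so start it without --rm, which deletes the container when it exits):

docker start importer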