An OAI-PMH to RDF translator.
This project provides an implementation of an OAI-PMH harvester to generate RDF documents.
v16.13.0
yarn install
The code requires a JSON-LD configuration file with for each OAI-PMH endpoint a configuration section.
{
"@context": [
"https://linkedsoftwaredependencies.org/bundles/npm/componentsjs/^5.0.0/components/context.jsonld",
"https://linkedsoftwaredependencies.org/bundles/npm/oai3rdf/^0.0.0/components/context.jsonld"
],
"@graph": [
{
"@id": "https://repository.ubn.ru.nl/oai/request",
"@type": "GetRecordResolver",
"baseUrl": "https://repository.ubn.ru.nl/oai/request",
"recordUrlPrefix": "https://repository.ubn.ru.nl/handle/",
"landingPagePrefix": "https://repository.ubn.ru.nl/handle/",
"fileUrlPrefix": "https://repository.ubn.ru.nl//bitstream/handle"
},
{
"@id": "https://pub.uni-bielefeld.de/oai",
"@type": "GetRecordResolver",
"baseUrl": "https://pub.uni-bielefeld.de/oai",
"recordUrlPrefix": "oai:pub.uni-bielefeld.de:" ,
"landingPagePrefix": "https://pub.uni-bielefeld.de/record/",
"fileUrlPrefix": "https://pub.uni-bielefeld.de/download/"
}
]
}
Fields:
@id
: the base url of the OAI-PMH endpoint@type
:GetRecordResolver
baseUrl
: the base url of the OAI-PMH endpointrecordUrlPrefix
: the prefix of the OAI-PMH record identifierslandingPagePrefix
: the prefix of the repository landing page (we assume that the landing page of a record can be found by removing therecordUrlPrefix
from a OAI-PMH record identifier and appending this truncated identifier to thelandingPagePrefix
)fileUrlPrefix
: the prefix to recognize a full-text link in a OAI-PMH record
Harvest the first 10 records that are node deleted from an OAI-PMH endpoint into the ./in
directory
oai2rdf --info --from '2023-01-01' --until '2023-01-31' --max-no-del 10 --setSpec 'col_2066_119644' --outdir in https://repository.ubn.ru.nl/oai/request
To fetch all not deleted records, the command above needs to be executed multiple times until no new records are produced.
@prefix as: <https://www.w3.org/ns/activitystreams#>.
@prefix ietf: <http://www.iana.org/assignments/relation/>.
@prefix sorg: <https://schema.org/>.
@prefix ore: <http://www.openarchives.org/ore/terms/>.
_:n3-5 a sorg:AboutPage;
ore:isDescribedBy <https://pub.uni-bielefeld.de/record/2982899>;
as:summary "Robust Feature Selection and Robust Training to Cope with Hyperspectral Sensor Shifts";
sorg:datePublished "2023";
as:url <https://pub.uni-bielefeld.de/download/2982899/2982900>.
<https://pub.uni-bielefeld.de/download/2982899/2982900> as:mediaType "application/pdf";
<http://purl.org/dc/terms/accessRights> "open";
a as:Article, sorg:ScholarlyArticle.
Output the version number.
Specify the JSON-LD config file (default ./config.jsonld
).
Specify the cache file containing all known record identifiers (default ./cache.db
).
Harvestable metadataPrefic (default ./oai_dc
)
Harvestable set.
Date offset.
Date offset.
Harvesting offset in number of days.
Output directory.
Don't produce any output.
Do not produce more than NUMBER number of records.
Do not produce more than NUMBER number of not deleted records.
Debugging flags.
This code is part of the Mellon Scholarly Communication project.