dsebaseapp-archeutils

utility module to ease the creation of ARCHE-RDF

This module tries to ease the curation effort needed to describe a dataset of XML/TEI documents managed by a dsebaseapp as ARCHE-RDF. Its main idea is to reuse as much existing metadata as possible and to avoid data duplication. The module consists of three main parts:

  • an XQuery module named archeutils.xql
  • several API endpoints for serialising ARCHE-RDF data
  • a single configuration file for project/resource specific data data/meta/arche_constants.rdf

Whereas the first two parts are generic and therefore provided as a reusable module, the configuration file needs to be customized for each dsebaseapp project and is therefore NOT included in this module.

install

  • add this repo as a submodule to your dsebaseapp project
    • git submodule add https://github.com/KONDE-AT/dsebaseapp-archeutils.git archeutils
  • create a document data/meta/arche_constants.rdf

archeutils.xql

The XQuery module named archeutils.xql exposes several variables needed to create ARCHE-RDF. Their values are fetched from

  • the application structure
  • data/meta/arche_constants.rdf

API-Endpoints

archeutils/ids.xql

The main entry point is the API endpoint archeutils/ids.xql, which returns JSON with the following structure:

{
    "arche_constants": "http://127.0.1.1:8080/exist/apps/thun/archeutils/dump-arche-cols.xql",
    "id_prefix": {
        "url": "https://id.acdh.oeaw.ac.at/thun"
    },
    "ids": [{
        "id": "https://id.acdh.oeaw.ac.at/thun/editions/ansichten-ueber-die-neuorganisation-der-volksschulen-od-a3-xxi-d650.xml",
        "filename": "ansichten-ueber-die-neuorganisation-der-volksschulen-od-a3-xxi-d650.xml",
        "html": "http://127.0.1.1:8080/exist/apps/thun/pages/show.html?document=ansichten-ueber-die-neuorganisation-der-volksschulen-od-a3-xxi-d650.xml&directory=editions",
        "md": "http://127.0.1.1:8080/exist/apps/thun/archeutils/md.xql?id=ansichten-ueber-die-neuorganisation-der-volksschulen-od-a3-xxi-d650.xml&collection=editions",
        "payload": "http://127.0.1.1:8080/exist/apps/thun/resolver/resolve-doc.xql?doc-name=ansichten-ueber-die-neuorganisation-der-volksschulen-od-a3-xxi-d650.xml&collection=editions",
        "mimetype": "application/xml"
    },
    {
        "id": "https://id.acdh.oeaw.ac.at/thun/editions/faller-an-thun-1859-01-31-a3-xxi-d494.xml",
        "filename": "faller-an-thun-1859-01-31-a3-xxi-d494.xml",
        "html": "http://127.0.1.1:8080/exist/apps/thun/pages/show.html?document=faller-an-thun-1859-01-31-a3-xxi-d494.xml&directory=editions",
        "md": "http://127.0.1.1:8080/exist/apps/thun/archeutils/md.xql?id=faller-an-thun-1859-01-31-a3-xxi-d494.xml&collection=editions",
        "payload": "http://127.0.1.1:8080/exist/apps/thun/resolver/resolve-doc.xql?doc-name=faller-an-thun-1859-01-31-a3-xxi-d494.xml&collection=editions",
        "mimetype": "application/xml"
        }
    ]
}
  • arche_constants points to the archeutils/dump-arche-cols.xql endpoint, which returns ARCHE-MD serialized as RDF/XML by calling archeutils:dump_collections($cols)
  • each object in the ids array represents an XML/TEI resource that should be ingested into ARCHE. The md key points to a resource-specific archeutils endpoint, archeutils/md.xql?id={id/doc-name of the resource to ingest}. The ARCHE-MD is generated by archeutils/md.xql, which basically calls archeutils:populate_tei_resource
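A client that consumes this response could look like the minimal Python sketch below. It assumes the JSON has exactly the keys shown in the sample above; the function name `ingest_plan` is made up for illustration, and the response is an inline copy of one sample entry, so no HTTP request is made.

```python
import json

# Trimmed copy of the sample response above (one resource only).
SAMPLE = """
{
    "arche_constants": "http://127.0.1.1:8080/exist/apps/thun/archeutils/dump-arche-cols.xql",
    "id_prefix": {"url": "https://id.acdh.oeaw.ac.at/thun"},
    "ids": [{
        "id": "https://id.acdh.oeaw.ac.at/thun/editions/faller-an-thun-1859-01-31-a3-xxi-d494.xml",
        "filename": "faller-an-thun-1859-01-31-a3-xxi-d494.xml",
        "html": "http://127.0.1.1:8080/exist/apps/thun/pages/show.html?document=faller-an-thun-1859-01-31-a3-xxi-d494.xml&directory=editions",
        "md": "http://127.0.1.1:8080/exist/apps/thun/archeutils/md.xql?id=faller-an-thun-1859-01-31-a3-xxi-d494.xml&collection=editions",
        "payload": "http://127.0.1.1:8080/exist/apps/thun/resolver/resolve-doc.xql?doc-name=faller-an-thun-1859-01-31-a3-xxi-d494.xml&collection=editions",
        "mimetype": "application/xml"
    }]
}
"""


def ingest_plan(response_text):
    """Return the collection-level metadata URL and one
    (id, md, payload, mimetype) tuple per resource."""
    data = json.loads(response_text)
    resources = [
        (item["id"], item["md"], item["payload"], item["mimetype"])
        for item in data["ids"]
    ]
    return data["arche_constants"], resources


cols_url, resources = ingest_plan(SAMPLE)
print("collections metadata:", cols_url)
for arche_id, md_url, payload_url, mimetype in resources:
    print(arche_id, md_url, payload_url, mimetype)
```

An ingest script would then fetch `cols_url` once and, for each resource, fetch the `md` URL for the metadata and the `payload` URL for the XML/TEI document itself.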

params

  • limit={random-string} lists only 10 items; useful for testing the response of the endpoint as well as the actual ingest
  • custom_parent=true use this if you have a custom collection structure (see "custom parents" below)

data/meta/arche_constants.rdf

(Ab)uses repo-schema to provide project specific data. E.g. thun-data/meta/arche_constants.rdf

In arche_constants.rdf you can basically set three types of MD:

  1. Hand-made or literal MD. This is needed for project-specific information such as project descriptions or project-related agents (e.g. PIs or funding bodies)
  2. Constants for either all collections/resources or dedicated collections/resources
  3. Dynamic MD properties derived from the actual XML/TEI documents. For this you'll need to provide a mapping using XPath.

XPath mapping

  • the TEI mapping is done per collection; the collection a mapping applies to is named in the @collection attribute of the <acdh:TeiLookUps collection='name-of-collection'> element

  • the element name matches an arche-schema property

  • the @type value can either be

    • literal -> the evaluated xpath expression becomes the text() of the element
    • literal_no_lang -> the evaluated xpath expression becomes the text() of the element but no default lang-attribute will be set
    • no_eval -> the text() will be copied into the arche-element (no need for this actually, as you can set constants on resource level anyway...)
    • date -> the element is typed as a date via rdf:datatype="http://www.w3.org/2001/XMLSchema#date"
    • resource -> the evaluated xpath expression is set as value for an @rdf:resource
    • resource_many -> in case the evaluated xpath expression returns a sequence, then for each item in the sequence a new element (i.e. rdf-triple) is created
  • to override the default language you can set a @lang parameter, e.g.

    • <acdh:hasTitle type="literal" lang="und">normalize-space($item/tei:persName[1]/tei:forename/text()||' '||$item/tei:persName[1]/tei:surname/text())</acdh:hasTitle>
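The effect of the different @type values can be illustrated with a small Python sketch. It is not the module's implementation (that lives in archeutils.xql); the function name `build_property`, the `default_lang` value and the handling of no_eval without a language attribute are assumptions made for this illustration. The input `values` stands for whatever the XPath expression evaluated to.

```python
import xml.etree.ElementTree as ET

ACDH = "https://vocabs.acdh.oeaw.ac.at/schema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML = "http://www.w3.org/XML/1998/namespace"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def build_property(name, type_, values, lang=None, default_lang="de"):
    """Build the element(s) for one mapping line.

    `values` is the list of strings the XPath evaluated to (or the raw
    text for no_eval). `default_lang` is a placeholder assumption; the
    real default is defined by the module, not by this sketch.
    """
    def element(text=None, attrib=None):
        el = ET.Element(f"{{{ACDH}}}{name}", attrib or {})
        el.text = text
        return el

    first = values[0] if values else ""
    if type_ == "literal":
        # text content plus a language tag (overridable via @lang)
        return [element(first, {f"{{{XML}}}lang": lang or default_lang})]
    if type_ in ("literal_no_lang", "no_eval"):
        # assumption: no_eval is serialized without a language tag
        return [element(first)]
    if type_ == "date":
        return [element(first, {f"{{{RDF}}}datatype": XSD_DATE})]
    if type_ == "resource":
        return [element(attrib={f"{{{RDF}}}resource": first})]
    if type_ == "resource_many":
        # one element, i.e. one triple, per item of the sequence
        return [element(attrib={f"{{{RDF}}}resource": v}) for v in values]
    raise ValueError(f"unknown type: {type_}")


for el in build_property("hasTitle", "literal", ["Giorgio de Abbondi"], lang="und"):
    print(ET.tostring(el, encoding="unicode"))
```

The only type that can yield more than one element is resource_many, which is why it is the type used for identifiers in the person mapping further down.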

custom parents

Sometimes the default eXist/dsebaseapp collection structure is not feasible for ARCHE. To circumvent this, you can pass a &custom-parent=true URL parameter to the ids.xql endpoint. The default isPartOf triple (the ARCHE-ID of the document's collection) is then no longer added to each XML/TEI document; the value defined in arche_constants.rdf is used instead. Be aware that you need to provide the ARCHE-MD for those custom collections yourself, and that you need to be able to generate the matching IDs through XPath (or custom XQuery functions) called in arche_constants.rdf, e.g. something like:

<acdh:isPartOf type="resource">concat($item/@xml:base, '/',  substring-before($item//tei:title[@type="iso-date"]/text(), '-'))</acdh:isPartOf>
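To make the result of this expression concrete, here is the same string logic as a short Python sketch. The function name `custom_parent_id` and the sample @xml:base value are assumptions for illustration only; the sketch mirrors XPath's substring-before, which returns an empty string when the separator is missing.

```python
def custom_parent_id(xml_base, iso_date):
    """Mimic concat(@xml:base, '/', substring-before(iso-date, '-'))."""
    head, sep, _ = iso_date.partition("-")
    # XPath's substring-before yields '' if '-' does not occur
    year = head if sep else ""
    return f"{xml_base}/{year}"


# hypothetical @xml:base value and date from one of the sample documents
print(custom_parent_id("https://id.acdh.oeaw.ac.at/thun/editions", "1859-01-31"))
```

With such a mapping, all documents of one year share the same parent ID, so the ARCHE-MD you provide yourself needs one collection per year using exactly these IDs.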

archeutils/dump-arche-persons.xql?start=0&length=100

  • serializes person-like entities (tei:person) and returns something like:
<rdf:RDF xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:acdh="https://vocabs.acdh.oeaw.ac.at/schema#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xml:base="https://id.acdh.oeaw.ac.at/">
    <acdh:Person>
        <acdh:hasIdentifier rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/abbondi-giorgio"/>
        <acdh:hasTitle xml:lang="und">Giorgio de Abbondi</acdh:hasTitle>
    </acdh:Person>
    <acdh:Person>
        <acdh:hasIdentifier rdf:resource="https://d-nb.info/gnd/118893106"/>
        <acdh:hasTitle xml:lang="und">Abdülmecid I. (auch Abdul Mecid)</acdh:hasTitle>
    </acdh:Person>
    <acdh:Person>
        <acdh:hasIdentifier rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/abraham-stefan"/>
        <acdh:hasTitle xml:lang="und">Stefan Abraham</acdh:hasTitle>
    </acdh:Person>
</rdf:RDF>
  • The actual output is derived from the mapping in arche_constants:
<acdh:PersonLookUps source="indices/listperson.xml">
  <acdh:hasIdentifier type="resource_many">archeutils:get_entity_id($item)</acdh:hasIdentifier>
  <acdh:hasTitle type="literal" lang="und">normalize-space($item/tei:persName[1]/tei:forename/text()||' '||$item/tei:persName[1]/tei:surname/text())</acdh:hasTitle>
</acdh:PersonLookUps>

The function archeutils:get_entity_id($item) checks whether there is a tei:idno whose text node contains 'd-nb.info', 'geonames' or 'viaf' and returns this text node as ARCHE-ID. If not, a generic ARCHE-ID is constructed from the element's @xml:id.
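The lookup described above can be sketched in Python as follows. This is an illustration of the described behaviour, not a port of the XQuery function: the name `get_entity_ids`, the `{prefix}/entity/{xml:id}` pattern (inferred from the sample output above), returning a list, and the second sample person's @xml:id are all assumptions.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
AUTHORITIES = ("d-nb.info", "geonames", "viaf")

# Two made-up entries modelled on the sample output above.
SAMPLE = """
<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="abbondi-giorgio">
    <persName><forename>Giorgio de</forename><surname>Abbondi</surname></persName>
  </person>
  <person xml:id="abdulmecid">
    <idno type="URL">https://d-nb.info/gnd/118893106</idno>
  </person>
</listPerson>
"""


def get_entity_ids(item, id_prefix="https://id.acdh.oeaw.ac.at/thun"):
    """Return authority URLs found in tei:idno, else one generic ID."""
    found = [
        idno.text.strip()
        for idno in item.iter(f"{{{TEI}}}idno")
        if idno.text and any(a in idno.text for a in AUTHORITIES)
    ]
    if found:
        return found
    # fallback: generic ID built from @xml:id (pattern is an assumption)
    return [f"{id_prefix}/entity/{item.get(XML_ID)}"]


for person in ET.fromstring(SAMPLE):
    print(get_entity_ids(person))
```

Preferring an authority URL over a project-internal ID lets the same person or place resolve to one entity across different projects.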

archeutils/dump-arche-places.xql?start=0&length=100

  • same as for persons

archeutils/dump-arche-all-mentions.xql

  • serializes the resources and their mentioned entities expressed in ARCHE-RDF
<rdf:RDF xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:acdh="https://vocabs.acdh.oeaw.ac.at/schema#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xml:base="https://id.acdh.oeaw.ac.at/">
    <acdh:Resource rdf:about="https://id.acdh.oeaw.ac.at/thun/editions/simor-an-thun-1854-12-01-a3-xxi-d296d.xml">
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/123271606"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/118757393"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/141265825"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/101780664"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/117619027"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/118594729"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/119459159"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/116016671"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/138333823"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/189010959"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/118787977"/>
        <acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/henriques-de-carvalho-guilherme"/>
        <acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/bonel-y-orbe-juan-jose"/>
        <acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/kunszt-jozef"/>
        <acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/scitovsky-jan"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/116106832"/>
        <acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/3169070/"/>
        <acdh:hasSpatialCoverage rdf:resource="https://d-nb.info/gnd/4018145-5"/>
        <acdh:hasSpatialCoverage rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/place_064b3fb95f9ed52eb2b1da3d5e807b17"/>
        <acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/2921044/"/>
        <acdh:hasSpatialCoverage rdf:resource="https://d-nb.info/gnd/4055964-6"/>
        <acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/719819/"/>
        <acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/3172395/"/>
        <acdh:hasSpatialCoverage rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/place_6f1d35d511be7a1f29234d7dda06e2dd"/>
        <acdh:hasSpatialCoverage rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/place_48e23d043764ef6b2d7d7acd9ac09860"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/4402265-7"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/1086824806"/>
    </acdh:Resource>
    <acdh:Resource rdf:about="https://id.acdh.oeaw.ac.at/thun/editions/memorandum-mikulas-neueinteilung-superintendenzen-1860-a3-xxi-d627.xml">
        <acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/mikulas-johann"/>
        <acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/719819/"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/1086824806"/>
    </acdh:Resource>
    <acdh:Resource rdf:about="https://id.acdh.oeaw.ac.at/thun/editions/thun-an-ficker-1854-05-09-ca179.xml">
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/118757393"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/118532863"/>
        <acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/scheffer-boichorst-auguste-amalia"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/119059312"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/118535013"/>
        <acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/2761367/"/>
        <acdh:hasSpatialCoverage rdf:resource="https://d-nb.info/gnd/4065781-4"/>
        <acdh:hasSpatialCoverage rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/place_064b3fb95f9ed52eb2b1da3d5e807b17"/>
        <acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/2775220/"/>
        <acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/2946447/"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/36150-1"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/36165-3"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/2024703-5"/>
    </acdh:Resource>
</rdf:RDF>
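A consumer of this endpoint might group the mentions per resource. The following Python sketch assumes the response has the shape shown above; the function name `mentions_by_resource` is made up, and the input is an inline copy of one sample resource rather than a live request.

```python
import xml.etree.ElementTree as ET

NS = {
    "acdh": "https://vocabs.acdh.oeaw.ac.at/schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
ABOUT = f"{{{NS['rdf']}}}about"
RESOURCE = f"{{{NS['rdf']}}}resource"

# One resource copied from the sample output above.
SAMPLE = """
<rdf:RDF xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:acdh="https://vocabs.acdh.oeaw.ac.at/schema#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xml:base="https://id.acdh.oeaw.ac.at/">
    <acdh:Resource rdf:about="https://id.acdh.oeaw.ac.at/thun/editions/memorandum-mikulas-neueinteilung-superintendenzen-1860-a3-xxi-d627.xml">
        <acdh:hasActor rdf:resource="https://id.acdh.oeaw.ac.at/thun/entity/mikulas-johann"/>
        <acdh:hasSpatialCoverage rdf:resource="https://sws.geonames.org/719819/"/>
        <acdh:hasActor rdf:resource="http://d-nb.info/gnd/1086824806"/>
    </acdh:Resource>
</rdf:RDF>
"""


def mentions_by_resource(rdf_xml):
    """Map each resource ID to its mentioned actors and places."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for res in root.findall("acdh:Resource", NS):
        result[res.get(ABOUT)] = {
            "actors": [e.get(RESOURCE) for e in res.findall("acdh:hasActor", NS)],
            "places": [e.get(RESOURCE) for e in res.findall("acdh:hasSpatialCoverage", NS)],
        }
    return result


for resource_id, mentions in mentions_by_resource(SAMPLE).items():
    print(resource_id, len(mentions["actors"]), len(mentions["places"]))
```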
