A set of scripts used for metadata curation during ARCHE ingestions:
- Merging metadata of a collection from inputs in various formats
- Validating the merged metadata
- Generating XLSX metadata templates based on the current ontology (see the horizontal metadata files in the metadata formats description)
- Install PHP 8 and composer
- Run:
composer require acdh-oeaw/arche-metadata-crawler
- Install Docker.
- Run the acdhch/repo-file-checker image, mounting your data directory into it:
docker run --rm -ti --entrypoint bash -u `id -u`:`id -g` \
    -v pathToYourDataDir:/data \
    acdhch/repo-file-checker
- Run the scripts, e.g.
/opt/vendor/bin/arche-create-metadata-template /data all
and
/opt/vendor/bin/arche-crawl-meta \
    /data/metadata \
    /data/merged.ttl \
    /ARCHE/staging/GlaserDiaries_16674/data \
    https://id.acdh.oeaw.ac.at/glaserdiaries
- if you need the file-checker,
it is available under
/opt/vendor/bin/arche-filechecker
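A bare directory name passed to Docker's `-v` option is interpreted as a named volume rather than a host directory, so it is safest to pass pathToYourDataDir as an absolute path. The following is a minimal sketch of one way to do that; `print_docker_cmd` is a hypothetical helper introduced here for illustration, and it only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the docker command for a given data directory.
# print_docker_cmd is a hypothetical helper, not part of the image or the scripts.
print_docker_cmd() {
    local data_dir
    # Resolve the argument to an absolute path; fails if the directory is missing.
    data_dir="$(cd "$1" && pwd)" || return 1
    printf 'docker run --rm -ti --entrypoint bash -u %s:%s -v %s:/data acdhch/repo-file-checker\n' \
        "$(id -u)" "$(id -g)" "$data_dir"
}
```

Calling `print_docker_cmd ./myCollection` prints a command line that can be reviewed and then pasted into the shell.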
Nothing to be done. It is installed there already.
(For a full walk-through using repo-ingestion@hephaistos and the Wollmilchsau test collection please look here)
First, get the arche-ingestion workload console by:
- Opening this link (if you are redirected to the login page, open the link once again after you log in)
- Clicking on the bluish button with three vertical dots in the top-right corner of the screen and choosing
> Execute Shell
Then:
- Generate and validate the metadata:
- Open a screen session (the shell disconnects after one minute of inactivity) with
screen
- If you need to reconnect to the screen session because it was disconnected, run
screen -rd
- Run the arche-crawl-meta script:
/ARCHE/vendor/bin/arche-crawl-meta \
    <pathToMetadataDirectory> \
    --filecheckerReportDir <pathToTheFileCheckerReportDirectory> \
    <outputTtlPath> \
    <basePathOfTheCollection> \
    <idPrefix> \
    2>&1 | tee <pathToLogFile>
e.g.
/ARCHE/vendor/bin/arche-crawl-meta \
    /ARCHE/staging/GustavMahlerArchiv_22334/metadata \
    --filecheckerReportDir /ARCHE/staging/GustavMahlerArchiv_22334/checkReports/2024_04_08_09_19_24 \
    /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.ttl \
    /ARCHE/staging/GustavMahlerArchiv_22334/data \
    https://id.acdh.oeaw.ac.at/GustavMahlerArchiv \
    2>&1 | tee /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.log
- If you want to skip the checks (which speeds up the process significantly), add the --noCheck parameter, e.g.
/ARCHE/vendor/bin/arche-crawl-meta \
    /ARCHE/staging/GustavMahlerArchiv_22334/metadata \
    --filecheckerReportDir /ARCHE/staging/GustavMahlerArchiv_22334/checkReports/2024_04_08_09_19_24 \
    /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.ttl \
    /ARCHE/staging/GustavMahlerArchiv_22334/data \
    https://id.acdh.oeaw.ac.at/GustavMahlerArchiv \
    --noCheck \
    2>&1 | tee /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.log
- Create metadata templates:
/ARCHE/vendor/bin/arche-create-metadata-template \
    <pathToDirectoryWhereTemplateShouldBeCreated> \
    all
e.g. to create templates in the current directory
/ARCHE/vendor/bin/arche-create-metadata-template . all
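The arche-crawl-meta example above takes all of its paths from one staging directory, so the call can be assembled from that directory and the ID prefix alone. The sketch below assumes the layout seen in the GustavMahlerArchiv_22334 example (metadata/, data/, checkReports/ and scriptFiles/ under the staging directory) and assumes that report directories are named by timestamp, so that the last one in sorted order is the newest. `latest_report_dir` and `print_crawl_cmd` are hypothetical helper names, and the command is only printed, not run.

```shell
#!/usr/bin/env bash
# Sketch only: latest_report_dir and print_crawl_cmd are hypothetical helpers.

# Pick the newest directory under <staging>/checkReports. Assumes the names are
# timestamps such as 2024_04_08_09_19_24, so sorted glob order is chronological.
latest_report_dir() {
    local newest="" d
    for d in "$1"/checkReports/*/; do
        [ -d "$d" ] || continue
        newest="${d%/}"
    done
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}

# Print (not run) the arche-crawl-meta call for a staging directory laid out
# like the example: metadata/, data/, checkReports/ and scriptFiles/.
print_crawl_cmd() {
    local staging="${1%/}" id_prefix="$2" reports
    reports="$(latest_report_dir "$staging")" || {
        echo "no check reports found in $staging/checkReports" >&2
        return 1
    }
    cat <<EOF
/ARCHE/vendor/bin/arche-crawl-meta \\
    $staging/metadata \\
    --filecheckerReportDir $reports \\
    $staging/scriptFiles/metadata.ttl \\
    $staging/data \\
    $id_prefix \\
    2>&1 | tee $staging/scriptFiles/metadata.log
EOF
}
```

For instance, `print_crawl_cmd /ARCHE/staging/GustavMahlerArchiv_22334 https://id.acdh.oeaw.ac.at/GustavMahlerArchiv` prints a command of the same shape as the example; it is worth checking the chosen report directory before pasting it into the screen session.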
- Generating and validating the metadata:
vendor/bin/arche-crawl-meta \
    --filecheckerOutput <pathTo_fileList.json_generatedBy_repo-filechecker> \
    <pathToCollectionData> \
    <pathToTargetMetadataFile>
e.g.
vendor/bin/arche-crawl-meta \
    metaDir \
    metadata.ttl `pwd`/data https://id.acdh.oeaw.ac.at/myCollection
- Creating metadata templates:
vendor/bin/arche-create-metadata-template \
    <pathToDirectoryWhereTemplateShouldBeCreated> \
    all
e.g. to create templates in the current directory
vendor/bin/arche-create-metadata-template . all
Remarks:
- To get a list of all available parameters run:
vendor/bin/arche-crawl-meta --help
vendor/bin/arche-create-metadata-template --help
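When a local run is logged with `2>&1 | tee <pathToLogFile>`, as in the arche-ingestion example, bash reports the exit status of tee rather than that of the script, so a failed run can look successful to a calling script. A minimal sketch of a wrapper that keeps the log and the original exit status follows. `run_logged` is a hypothetical helper name, and the sketch assumes the scripts signal failure through a non-zero exit code, which this README does not state.

```shell
#!/usr/bin/env bash
# Sketch: run a command, copy stdout and stderr to a log file, and return the
# command's own exit status. run_logged is a hypothetical helper name.
run_logged() {
    local log="$1"
    shift
    # The subshell keeps pipefail from leaking into the calling shell.
    ( set -o pipefail; "$@" 2>&1 | tee "$log" )
}

# Usage (commented out; the arguments are the example values from this README):
# run_logged metadata.log vendor/bin/arche-crawl-meta \
#     metaDir metadata.ttl "$(pwd)/data" https://id.acdh.oeaw.ac.at/myCollection
```

The subshell is a deliberate choice: it limits `pipefail` to this one pipeline instead of changing the behaviour of the rest of the session.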
- Creating metadata templates:
Run a container, mounting the directory where templates should be created under /mnt inside the container:
docker run \
    --rm -u `id -u`:`id -g` \
    -v <pathToDirectoryWhereTemplateShouldBeCreated>:/mnt \
    acdhch/repo-file-checker createTemplate all
e.g. to create the templates in the current directory
docker run \
    --rm -u `id -u`:`id -g` -v `pwd`:/mnt acdhch/repo-file-checker createTemplate all