pal-museum-metadata

Metadata validation and packaging tools for Merritt Ingest.

This code will be run from a Cloud9 environment into which the following resources have been loaded.

File structure

an inventory listing of existing tif files residing in S3
- /mrt/inventory/inventory.txt
mods files describing the tif files
- /mrt/mods
temp dir for pulling tif file samples
- /mrt/files
code directory
- /home/ec2-user/environment/code/pal-museum-metadata
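
Before running anything, it can help to confirm that these resources are actually present in the Cloud9 environment. The following is a minimal sketch, not part of the existing code: the function name `missing_resources` and the labels in `EXPECTED` are made up for illustration, and only the paths come from the list above.

```python
from pathlib import Path

# Paths taken from the file-structure list above; the labels are illustrative.
EXPECTED = {
    "inventory": "/mrt/inventory/inventory.txt",
    "mods": "/mrt/mods",
    "files": "/mrt/files",
    "code": "/home/ec2-user/environment/code/pal-museum-metadata",
}


def missing_resources(expected=EXPECTED):
    """Return the labels of expected paths that do not exist."""
    return sorted(
        label for label, path in expected.items() if not Path(path).exists()
    )


if __name__ == "__main__":
    missing = missing_resources()
    print("missing:", ", ".join(missing) if missing else "none")
```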

Saving Changes to GitHub

This Cloud9 environment is shared by members of the UC3 team.

Therefore, do not save your GitHub credentials in this working environment.

All of our code will live in a public repository, so it will be easy to pull code into this environment.

git pull origin main

When you want to save changes back to GitHub, you have a few options:

push the changes from Cloud9 to GitHub
- you will need to provide GitHub credentials each time you push
- because you have 2FA enabled (a good thing!), you will need to use a GitHub Personal Access Token to save your work
  - Create a Personal Access Token
  - Name it something like "Leading Fellows Token"
  - Set the expiration date for the end of the fellowship
  - Enable only "public_repo" for this token
  - Save the generated token in a safe place that will be easy to copy/paste
- when you are prompted for a username and password
  - use your GitHub username as the username
  - use your personal access token as the password
make the changes through the GitHub website
clone the repository to your PC and push the changes from there

git push origin main

Running the code

cd ~/environment
python code/pal-museum-metadata/src/scan.py

Goals

What becomes a Merritt Object
What identifier(s) will be used
- This will be used for any metadata updates
- What if we get access to the database
What metadata will be stored with the images
What percentage of objects have / do not have images and metadata
Create Merritt ingest manifest file(s) for each object
- Has identifier(s)
- Has erc descriptive metadata
- Has full file list
  - URL to the mods files
    - Terry will build a web service to make these accessible to the ingest service (done)
  - URL to the image files
    - Terry will build a web service to make these accessible to the ingest process (done)
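
The percentage goal above comes down to simple set arithmetic once each object has an identifier. Here is a rough sketch, assuming the identifiers have already been collected into two collections (one from the image inventory, one from the mods files). The function name `coverage` and the result keys are hypothetical and do not come from the existing code.

```python
def coverage(image_ids, mods_ids):
    """Return the percentage of objects in each category.

    Percentages are relative to every identifier seen in either source.
    """
    image_ids, mods_ids = set(image_ids), set(mods_ids)
    all_ids = image_ids | mods_ids
    if not all_ids:
        return {}  # nothing to report on

    def pct(ids):
        return round(100.0 * len(ids) / len(all_ids), 1)

    return {
        "images_and_metadata": pct(image_ids & mods_ids),
        "images_only": pct(image_ids - mods_ids),
        "metadata_only": pct(mods_ids - image_ids),
    }
```

The denominator here is the union of both identifier sets; a different baseline (for example, only objects with mods) would give different percentages.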

Tasks

Weeks 1-3 (does this include travel time?) - starting Aug 9

Analyze match between files in the inventory vs identifiers in mods
- List of matching image and metadata
- List images missing metadata
- List metadata missing images
Recommend local identifier(s) to use, likely some form of 0001.02.0001
Map mods fields to Merritt erc
Hand-generate a manifest file for a single Pal Museum object (URLs depend on where mods and images are served)
- Create ingest manifest for an object with one or more files; supply metadata through Merritt UI
Load the hand-generated manifest to Merritt stage
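
The inventory-versus-mods analysis in the first task could be sketched as below. This rests on assumptions: it supposes that both the tif file names and the mods file names contain an identifier shaped like 0001.02.0001, which is only the likely form mentioned above. If the identifier lives inside the mods XML instead, `extract_id` would need to parse the file. The names `ID_PATTERN`, `extract_id`, and `compare` are illustrative.

```python
import re
from pathlib import PurePosixPath

# Assumed identifier shape, based on the example 0001.02.0001 in the task list.
ID_PATTERN = re.compile(r"\d{4}\.\d{2}\.\d{4}")


def extract_id(path):
    """Return the first identifier-like token in a file name, or None."""
    match = ID_PATTERN.search(PurePosixPath(path).name)
    return match.group(0) if match else None


def compare(inventory_paths, mods_paths):
    """Build the three lists named in the task: matches and both kinds of gaps."""
    image_ids = {i for i in map(extract_id, inventory_paths) if i}
    mods_ids = {i for i in map(extract_id, mods_paths) if i}
    return {
        "matched": sorted(image_ids & mods_ids),
        "images_missing_metadata": sorted(image_ids - mods_ids),
        "metadata_missing_images": sorted(mods_ids - image_ids),
    }
```

Paths whose names contain no identifier are silently skipped in this sketch; a real run would probably want to report them as well.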

Next steps

Generate list of files per object identifier
Create ingest manifests for objects with one or more files; create a manifest of manifests to supply corresponding metadata
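
Generating the list of files per object identifier is a grouping step. The sketch below assumes, as before, that each file name carries an identifier of the form 0001.02.0001; that pattern and the function name `files_by_object` are assumptions for illustration, not the project's actual code. Files with no recognizable identifier are collected under the key `None` so they can be reviewed by hand.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed identifier shape; adjust once the LocalId format is decided.
ID_PATTERN = re.compile(r"\d{4}\.\d{2}\.\d{4}")


def files_by_object(paths):
    """Group file paths by the identifier found in each file name."""
    groups = defaultdict(list)
    for path in paths:
        match = ID_PATTERN.search(PurePosixPath(path).name)
        key = match.group(0) if match else None
        groups[key].append(path)
    # Sort each file list so the output is stable between runs.
    return {key: sorted(files) for key, files in groups.items()}
```

Each entry in the result corresponds to one candidate Merritt object, and its file list is what an ingest manifest for that object would enumerate.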

Questions to answer

Metadata
- What is the format of our LocalId?
- What mods metadata do we want to publish?
What objects to publish?
- Mods + Image - Yes
- Mods only - ?
- Image only - ? (if a valid id can be created)
- Who consults on this decision?
Additional data
- What is in the new batch of data?
  - Images only?
  - Images and mods?
  - Replacement files?
- Copy to S3
- Modify program to pull new resources
What is in the database?
- Is there unique data in the database that is not already in mods?
- Can we associate this data with an identifier?
Manifest Generation (technical questions)
- Where should the web server run for the image files?
  - images are in S3, served by docker01
  - mods are in S3, served by docker01
- How will ERC metadata be associated with objects?
  - manifest of manifests?
  - erc files?
  - Terry will discuss this with Mark
- How will a batch of individual manifests be published to a url for the ingest process?
  - presume we will copy these into S3. Additional rights will be needed to push to S3
  - Terry will discuss options
