# Generate XMLs - EDM and METS/MODS Examples

This Jupyter notebook demonstrates how to generate XML files compliant with two important metadata standards used in digital libraries and cultural heritage:

1. **Europeana Data Model (EDM)**: A flexible model for describing digital objects in the cultural heritage domain, used by Europeana.
2. **METS/MODS**: Metadata Encoding and Transmission Standard (METS) combined with Metadata Object Description Schema (MODS), commonly used for describing digital library objects.

We'll use two example templates, which can be modified or extended.

#### Objectives

- Embed reserved persistent links in the XML files
- Create valid EDM and METS/MODS XML files
- Validate the XML files against their respective schemas

In [2]:
from datetime import datetime, timezone
import os
from pathlib import Path
import re
from string import Template
# Switch to the repository root if the notebook was started inside "Tutorials"
if Path().absolute().name == "Tutorials":
    os.chdir(Path().absolute().parent)

from main_functions import create_record, publish_record, upload_files_into_deposition
from utilities import validate_edm_xml, validate_metsmods, validate_zenodo_metadata

# Define Template Paths
edm_template_path = Path("Templates/tutorial_template_edm.xml") # this is an intentionally faulty one
metsmods_template_path = Path("Templates/template_metsmods.xml")

# Load XMLs as Templates
edm_template = Template(edm_template_path.read_text(encoding="utf-8"))
metsmods_template = Template(metsmods_template_path.read_text(encoding="utf-8"))
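
Because `Template.substitute` raises a `KeyError` as soon as one placeholder has no matching key, it can help to list the placeholders of a template before mapping any data. The following is a minimal sketch, not part of the project code: `find_placeholders`, `missing_keys` and the one-line demo template are hypothetical names introduced for illustration. It relies only on the compiled `pattern` attribute of `string.Template`.

```python
from string import Template


def find_placeholders(template: Template) -> set[str]:
    """Return the names of all $name and ${name} placeholders in a Template."""
    names = set()
    for match in template.pattern.finditer(template.template):
        # "named" matches $name, "braced" matches ${name}
        name = match.group("named") or match.group("braced")
        if name:
            names.add(name)
    return names


def missing_keys(template: Template, data: dict) -> set[str]:
    """Return placeholder names that have no entry in the data dictionary."""
    return find_placeholders(template) - set(data)


# Illustrative one-line template, not the tutorial template itself
demo = Template("<dc:title>${dc_title}</dc:title><dc:creator>$dc_creator</dc:creator>")
print(missing_keys(demo, {"dc_title": "My test 3D Model"}))  # {'dc_creator'}
```

Running such a check against the loaded templates would show, before substitution, which keys the data dictionaries still need.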

#### (optional) Use other Placeholder Formats
If your template uses a different placeholder format, e.g. `%id%`, convert the placeholders with a regular expression so that they are compatible with Python's `string.Template` class:

In [10]:
# Replace custom Placeholders
edm_template_txt = re.sub(r"%(\w+)%", r"${\1}", edm_template_path.read_text(encoding="utf-8"))
metsmods_template_txt = re.sub(r"%(\w+)%", r"${\1}", metsmods_template_path.read_text(encoding="utf-8"))
# Load above variables as Templates here...
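
One pitfall of this conversion is that a literal `$` in the source text would afterwards be read as the start of a placeholder. A possible way around it, sketched here under the assumption that the template may contain literal dollar signs, is to escape them as `$$` before converting. The helper `to_template` and the sample snippet are hypothetical and only illustrate the idea.

```python
import re
from string import Template


def to_template(text: str) -> Template:
    """Convert %name% placeholders to ${name} and wrap the result in a Template."""
    # Escape literal dollar signs first, so Template does not treat them as placeholders
    escaped = text.replace("$", "$$")
    converted = re.sub(r"%(\w+)%", r"${\1}", escaped)
    return Template(converted)


# Illustrative snippet, not taken from the tutorial templates
snippet = "<dc:identifier>%id%</dc:identifier><dc:description>Price: $5</dc:description>"
print(to_template(snippet).substitute(id="12345"))
# <dc:identifier>12345</dc:identifier><dc:description>Price: $5</dc:description>
```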

### Reserve Zenodo Record with Test Data

To obtain persistent identifiers for our XMLs, we need to reserve a Zenodo record with a DOI and links to the files. We will use a 3D model and a thumbnail as test data. To reduce the number of requests, the upload itself happens at a later stage, once the XMLs have been generated:

In [4]:
# Define Zenodo Metadata
zenodo_metadata = {
    "metadata": {
        "title": "Test 3D Model",
        "description": "Test Model",
        "upload_type": "other",
        "publication_date": datetime.now().strftime("%Y-%m-%d"),
        "access_right": "open",
        "license": "cc-by",
        "version": "0.0.1",
        "keywords": ["3D model", "tutorial"],
        "creators": [{"name": "Doe, John", "affiliation": "Tutorial University",}]   
    }
}

assert not validate_zenodo_metadata(zenodo_metadata), "Metadata invalid."

# Reserve Zenodo Record
create_msg, create_data = create_record(zenodo_metadata)
assert create_msg["success"], f"Could not create Record: {create_msg['text']}"
concept_recid = create_data["conceptrecid"]
record_id = create_data["id"]

# Define Filepaths
glb_path = Path("Tutorials/3DModels/test_model.glb")
glb_filename = glb_path.name
thumbnail_path = Path("Tutorials/Thumbnails/test_model_perspective_4.png")
thumbnail_filename = thumbnail_path.name
filepaths = [glb_path, thumbnail_path]

# Construct Links before Upload
record_link = create_data["links"]["html"]
glb_link = f"https://sandbox.zenodo.org/api/records/{record_id}/files/{glb_filename}/content"
thumbnail_link = f"https://sandbox.zenodo.org/api/records/{record_id}/files/{thumbnail_filename}/content"
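
The file links above are built with plain f-strings, which works for the test filenames used here. If a filename contained spaces or other reserved characters, it would need percent-encoding. The sketch below wraps the same URL pattern in a hypothetical helper, `build_file_link`; the record ID `123456` is a made-up value for demonstration only.

```python
from urllib.parse import quote

# URL pattern as used in this notebook (Zenodo sandbox)
SANDBOX_API = "https://sandbox.zenodo.org/api/records"


def build_file_link(record_id, filename: str, base_url: str = SANDBOX_API) -> str:
    """Build the content link of a file in a record, percent-encoding the filename."""
    return f"{base_url}/{record_id}/files/{quote(filename, safe='')}/content"


print(build_file_link(123456, "test model.glb"))
# https://sandbox.zenodo.org/api/records/123456/files/test%20model.glb/content
```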

### EDM Mapping

Now, let's define the data that should be inserted into the template:

In [None]:
# Set current date timestamp
date_created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Define EDM data for substitution
edm_data = {
    'edm_providedCHO': '12345',
    'dc_creator': 'John Doe',
    'dc_description': 'This is a test 3D model.',
    'dc_format': 'glTF',
    'dc_identifier': '12345',
    'dc_language': 'English',
    'dc_type': '3D',
    'dc_title': 'My test 3D Model',
    'dc_subject': 'Architecture',
    'dc_isPartOf': 'Test Project',
    'dc_spatial': '12345/place',
    'edm_type': '3D',
    'edm_webresource': record_link,
    'dcterms_created': date_created,
    'edm_place': '12345/place',
    'wgs84_lat': '50.928788',
    'wgs84_lon': '11.584776',
    'skos_prefLabel': 'Jena',
    'ore_aggregation': '12345/ORE',
    'edm_aggregatedCHO': '12345',
    'edm_dataProvider': 'Test Data Provider',
    'edm_isShownBy': glb_link,
    'edm_object': thumbnail_link,
    'edm_provider': 'Test EDM Provider',
    'edm_rights': 'http://creativecommons.org/licenses/by/4.0/'
}

# (load XML again in case of template modifications)
edm_template = Template(edm_template_path.read_text(encoding="utf-8"))

# Substitute Variables by edm_data
edm_xml_string = edm_template.substitute(edm_data)

# Write XML file and add path to filepaths for upload
edm_xml_filepath = Path("Tutorials/XMLs/mapped_edm.xml")
edm_xml_filepath.write_text(edm_xml_string, encoding='utf-8')
if edm_xml_filepath not in filepaths:
    filepaths.append(edm_xml_filepath)
print(edm_xml_string)
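
`Template.substitute` inserts values verbatim, so a value containing `&`, `<` or `>` would yield XML that is not well-formed. The test data above contains no such characters, but real metadata often does. One way to guard against this, sketched here with a hypothetical helper `escape_values`, is to escape all values with the standard library before substitution.

```python
from string import Template
from xml.sax.saxutils import escape


def escape_values(data: dict) -> dict:
    """Return a copy of data with XML special characters escaped in every value."""
    # Double quotes are escaped as well, so values stay safe inside attributes
    return {key: escape(str(value), {'"': "&quot;"}) for key, value in data.items()}


# Illustrative template and value, not the tutorial data
demo_template = Template("<dc:title>${dc_title}</dc:title>")
demo_data = {"dc_title": "Castle & Tower <draft>"}
print(demo_template.substitute(escape_values(demo_data)))
# <dc:title>Castle &amp; Tower &lt;draft&gt;</dc:title>
```

With this approach the call would read `edm_template.substitute(escape_values(edm_data))`. Values that are already escaped must not be passed through it a second time.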

### Validate EDM XML

Now we should validate the generated EDM XML against the [latest EDM XML schema](https://github.com/europeana/metis-schema/tree/master/src/main/resources/schema_xsds); we can use the XML string or the file itself.
<br>**Attention!** There are some intentional errors in the template, which we will try to fix in order to learn more about the validator:

In [None]:
# Define path to EDM validation schema
edm_xsd_path = "Templates/EDM_Schemas/EDM.xsd"

# (load XML string again in case of template modifications)
edm_template = Template(edm_template_path.read_text(encoding="utf-8"))
edm_xml_string = edm_template.substitute(edm_data)

# Validate EDM XML string
validate_edm_xml(xsd_path=edm_xsd_path, xml_string=edm_xml_string)

Based on these validation errors, we are quite sure that our [template](../Templates/tutorial_template_edm.xml) is incorrect, so we would need to do the following:
1. In lines 42 and 43, remove " " from the placeholder.
2. In line 47, replace `rdf:resource` with `rdf:about`.
3. Run the above two cells again and enjoy the valid EDM XML!
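
When a template is edited by hand, as in the steps above, it is easy to introduce a syntax error such as an unclosed tag. Schema validation presupposes well-formed XML, so a quick well-formedness check can help to tell syntax errors apart from schema violations. The following sketch uses only the standard library; `check_well_formed` is a hypothetical helper, and whether `validate_edm_xml` already performs such a check internally is not shown in this notebook.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_string: str):
    """Return None if the XML is well-formed, otherwise a message with the error position."""
    try:
        # Parse bytes so that an XML declaration with an encoding is handled correctly
        ET.fromstring(xml_string.encode("utf-8"))
    except ET.ParseError as err:
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


print(check_well_formed("<a><b/></a>"))   # None
print(check_well_formed("<a><b></a>"))    # message describing the mismatched tag
```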

### METS/MODS Mapping

Now, let's apply the same logic for METS/MODS XML:

In [None]:
metsmods_data = {
    'mods_title': '3D Test Model',
    'mods_person_displayForm': 'Tutorial Project',
    'mods_role_personal': 'aut',
    'mods_corporate_displayForm': 'Tutorial Project',
    'mods_role_corporate': 'prv',
    'mods_physicalLocation': 'Jena',
    'mods_license': 'CC-BY',
    'mods_recordInfoNote': 'tutorial',
    'dv_owner': 'Tutorial Repository',
    'dv_ownerLogo': 'https://www.gw.uni-jena.de/phifakmedia/30480/bg-human-digital.png',
    'dv_ownerSiteURL': 'https://www.gw.uni-jena.de/en/8465/juniorprofessur-fuer-digital-humanities',
    'dv_ownerContact': 'https://link.to/contactPage',
    'mets_fileMimetype': 'model/gltf-binary',
    'mets_fileLink': glb_link,
    'mets_thumbMimetype': 'image/png',
    'mets_thumbLink': thumbnail_link
}

# (load XML again in case of template modifications)
metsmods_template = Template(metsmods_template_path.read_text(encoding="utf-8"))

# Substitute Variables by metsmods_data
metsmods_xml_string = metsmods_template.substitute(metsmods_data)

# Write XML file and add path to filepaths for upload
metsmods_xml_filepath = Path("Tutorials/XMLs/mapped_metsmods.xml")
metsmods_xml_filepath.write_text(metsmods_xml_string, encoding='utf-8')
if metsmods_xml_filepath not in filepaths:
    filepaths.append(metsmods_xml_filepath)
print(metsmods_xml_string)
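
To confirm that the reserved links actually ended up in the generated file, one option is to parse the XML and collect all `xlink:href` attributes. This is a sketch under the assumption that the template stores file links in `xlink:href` attributes, as the METS `FLocat` element does. The actual template is not shown here, and both `collect_hrefs` and the small METS fragment are illustrative only.

```python
import xml.etree.ElementTree as ET

# Fully qualified attribute name of xlink:href
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def collect_hrefs(xml_string: str) -> list[str]:
    """Return the values of all xlink:href attributes in document order."""
    root = ET.fromstring(xml_string.encode("utf-8"))
    return [el.attrib[XLINK_HREF] for el in root.iter() if XLINK_HREF in el.attrib]


# Minimal illustrative METS fragment, not the tutorial template
demo_mets = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="FILE_0001" MIMETYPE="model/gltf-binary">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/files/test_model.glb"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

print(collect_hrefs(demo_mets))
# ['https://example.org/files/test_model.glb']
```

Applied to the notebook, a check such as `glb_link in collect_hrefs(metsmods_xml_string)` would indicate whether the model link was mapped as intended.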

### Validate METS/MODS XML

Validate the generated XML (built from a template that is correct in this case) against the [METS](https://www.loc.gov/standards/mets/) and [MODS](https://www.loc.gov/standards/mods/) XML schemas, both provided by the Library of Congress (LoC).

In [None]:
# Define Paths to validation files
mets_xsd_path = "Templates/MetsMods_Schemas/mets.xsd"
mods_xsd_path = "Templates/MetsMods_Schemas/mods.xsd"

# Validate METS/MODS XML string
validate_metsmods(mets_xsd_path, mods_xsd_path, xml_string=metsmods_xml_string)

### Upload Files & Publish Record

Now that we have generated the XMLs, let's upload everything with the GLB file and thumbnail!

In [None]:
files_msg, files_data = upload_files_into_deposition(create_data, filepaths)
assert files_msg["success"], f"Error uploading files: {files_msg['text']}"

publish_msg, publish_data = publish_record(create_data)
assert publish_msg["success"], f"Error publishing Record: {publish_msg['text']}"

print(f"Zenodo Record published at: {publish_data['links']['html']}")