Reading METS files

metsrw supports reading METS files from disk, from strings, or from lxml _Element or _ElementTree objects.

.. testcode::

    import lxml.etree
    import metsrw

    # From a file on disk
    mets = metsrw.METSDocument.fromfile("../fixtures/complete_mets.xml")

    # From a bytes string
    mets_str = b"""<?xml version='1.0' encoding='ASCII'?>
    <mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.loc.gov/METS/" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version18/mets.xsd">
        <metsHdr CREATEDATE="2015-12-16T22:38:48"/>
        <structMap ID="structMap_1" LABEL="Archivematica default" TYPE="physical"/>
    </mets>"""
    mets = metsrw.METSDocument.fromstring(mets_str)

    # From an lxml object
    tree = lxml.etree.parse("../fixtures/complete_mets.xml")
    mets = metsrw.METSDocument.fromtree(tree)
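
To see what the minimal document in the bytes example actually contains, it can help to parse it without metsrw. The sketch below is not metsrw API: it uses only the standard library's ``xml.etree.ElementTree`` as a stand-in, and the names ``METS_BYTES`` and ``NS`` are introduced here for illustration. It shows that every element sits in the METS default namespace, so tag lookups have to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# The same minimal METS document used in the bytes example above.
METS_BYTES = b"""<?xml version='1.0' encoding='ASCII'?>
<mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.loc.gov/METS/" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version18/mets.xsd">
    <metsHdr CREATEDATE="2015-12-16T22:38:48"/>
    <structMap ID="structMap_1" LABEL="Archivematica default" TYPE="physical"/>
</mets>"""

# Prefix-to-URI mapping for lookups; "mets" is an arbitrary local prefix.
NS = {"mets": "http://www.loc.gov/METS/"}

root = ET.fromstring(METS_BYTES)
header = root.find("mets:metsHdr", NS)
struct_map = root.find("mets:structMap", NS)

print(root.tag)                  # {http://www.loc.gov/METS/}mets
print(header.get("CREATEDATE"))  # 2015-12-16T22:38:48
print(struct_map.get("TYPE"))    # physical
```

With metsrw you would normally not touch the raw tree at all; this only illustrates the input that ``fromstring`` receives.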


Accessing METS Data

To retrieve :class:`metsrw.FSEntry` objects, use :meth:`~metsrw.METSDocument.get_file` to look up a single entry or :meth:`~metsrw.METSDocument.all_files` to get the set of all entries in the document.

>>> import uuid
>>> import metsrw
>>> mets = metsrw.METSDocument()
>>> file_uuid = str(uuid.uuid4())
>>> file_1 = metsrw.FSEntry(
...     label="hello.pdf", path="test/hello.pdf", type="Item",
...     file_uuid=file_uuid)
>>> mets.append_file(file_1)

>>> mets.get_file(file_uuid=file_uuid)
FSEntry(type='Item', path='test/hello.pdf', use='original', ...)

>>> mets.all_files()
{FSEntry(type='Item', path='test/hello.pdf', use='original', ...)}

# Currently, filtering files can only be done via iteration
>>> [entry for entry in mets.all_files() if entry.use == "original"]
[FSEntry(type='Item', path='test/hello.pdf', use='original', ...)]
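
Since filtering is done by iteration, a small helper can keep repeated filters readable. The following is a minimal sketch, not part of metsrw: ``filter_entries`` and ``StubEntry`` are hypothetical names, and ``StubEntry`` is a stand-in for :class:`metsrw.FSEntry` so the example runs on its own. It assumes only that entries expose plain attributes such as ``type``, ``path`` and ``use``, as the reprs above suggest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubEntry:
    """Hypothetical stand-in for metsrw.FSEntry (assumption, not the real class)."""
    path: str
    type: str = "Item"
    use: str = "original"


def filter_entries(entries, **criteria):
    """Return entries whose attributes equal every keyword given in criteria."""
    return [
        entry
        for entry in entries
        if all(getattr(entry, name, None) == value
               for name, value in criteria.items())
    ]


# A set, mirroring the return type of all_files() shown above.
entries = {
    StubEntry(path="test/hello.pdf"),
    StubEntry(path="test/hello.txt", use="preservation"),
    StubEntry(path="test", type="Directory"),
}

originals = filter_entries(entries, type="Item", use="original")
print([entry.path for entry in originals])  # ['test/hello.pdf']
```

Under the same assumption, ``filter_entries(mets.all_files(), use="original")`` would be equivalent to the list comprehension above.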

amdSec and dmdSec data is accessible via the :attr:`~metsrw.FSEntry.amdsecs` and :attr:`~metsrw.FSEntry.dmdsecs` properties.

>>> mets = metsrw.METSDocument.fromfile('../fixtures/complete_mets.xml')
>>> fsentry = mets.get_file(file_uuid="ab5c67fc-8f80-4e46-9f20-8d5ae29c43f2")
>>> amdsec1 = fsentry.amdsecs[0]
>>> [section for section in amdsec1.subsections if section.subsection == 'techMD']
[<metsrw.metadata.SubSection ...>]
>>> fsentry.dmdsecs[0]
<metsrw.metadata.SubSection ...>

Note

In most cases, you'll want to access PREMIS data via the get_premis series of methods, rather than accessing the amdSec or dmdSec data directly. See Accessing PREMIS Data for more info.

Accessing PREMIS Data

To access PREMIS metadata associated with a file, use the following methods:

# Currently, filtering PREMIS objects can only be done via iteration
>>> ingestion_events = []
>>> mets = metsrw.METSDocument.fromfile('../fixtures/complete_mets.xml')
>>> for fsentry in mets.all_files():
...     for event in fsentry.get_premis_events():
...         if event.type == "ingestion":
...             ingestion_events.append(event)
>>> ingestion_events[0]
('event', ...)
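
When more than one event type is of interest, the nested loop above can be generalized to group events by type in a single pass. This is a sketch under stated assumptions rather than metsrw functionality: ``group_events_by_type`` is a hypothetical helper, and the ``SimpleNamespace`` objects are stand-ins that mimic only the two things the loop above relies on, a ``get_premis_events()`` method on each entry and a ``type`` attribute on each event.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_events_by_type(entries):
    """Map each event type to the list of events found across all entries."""
    grouped = defaultdict(list)
    for entry in entries:
        for event in entry.get_premis_events():
            grouped[event.type].append(event)
    return dict(grouped)


def stub_entry(*event_types):
    """Build a hypothetical entry whose events carry the given types."""
    events = [SimpleNamespace(type=event_type) for event_type in event_types]
    return SimpleNamespace(get_premis_events=lambda: events)


# Stand-in data; with metsrw this would be mets.all_files().
entries = [
    stub_entry("ingestion", "fixity check"),
    stub_entry("ingestion"),
]

grouped = group_events_by_type(entries)
print(sorted(grouped))            # ['fixity check', 'ingestion']
print(len(grouped["ingestion"]))  # 2
```

Grouping once avoids re-walking every file for each event type you want to inspect.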