Add transform for DSpace METS records
Why these changes are being introduced:
In order to migrate legacy sources into the new TIMDEX data model, we
need to be able to transform DSpace records in the METS XML format.

How this addresses that need:
* Adds a dspace_mets source to the source transforms, with
  transformations for all applicable fields into the TIMDEX record
  data model.
* Adds dspace source to cli source options and config.
* Adds a new helper function, generate_citation(), to generate a
  citation from other transformed fields if a citation field is not
  present in the source data.
* Refactors logging config a bit to set the root logger level based on
  verbosity input to cli (to ensure that all modules get logged
  appropriately, not just the cli module).
* Adds tests for all new functionality with dspace_mets fixtures
  representing various expected source record conditions.
* Cleans up cli tests a bit.
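
The logging change in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper name `configure_logger` and the boolean `verbose` flag are assumptions. The idea is to set the level on the root logger so that loggers created in other modules inherit it.

```python
import logging


def configure_logger(root_logger: logging.Logger, verbose: bool) -> str:
    """Set the root logger level from a verbosity flag (hypothetical helper)."""
    level = logging.DEBUG if verbose else logging.INFO
    # Setting the level on the root logger means module-level loggers,
    # which default to NOTSET, inherit it as their effective level.
    root_logger.setLevel(level)
    level_name = logging.getLevelName(root_logger.getEffectiveLevel())
    return f"Logger '{root_logger.name}' configured with level={level_name}"
```

A cli entry point could call `configure_logger(logging.getLogger(), verbose)` once at startup, before any transform modules emit log messages.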

Side effects of this change:
Other source transforms can be updated to use the generate_citation()
helper function.
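
As an illustration of the fallback idea behind generate_citation(), here is a hedged sketch. The function name `generate_citation_sketch`, its parameters, and the output format are all assumptions made for this example; the actual helper's signature and citation style are not shown in this commit message.

```python
from typing import Optional, Sequence


def generate_citation_sketch(
    title: str,
    authors: Optional[Sequence[str]] = None,
    date: Optional[str] = None,
    publisher: Optional[str] = None,
    url: Optional[str] = None,
) -> str:
    """Build a simple citation string from already-transformed fields.

    Hypothetical format: "Authors. (Date). Title. Publisher. URL",
    skipping any part that is missing.
    """
    parts = []
    if authors:
        # Strip a trailing period (e.g. a middle initial) to avoid "Q..".
        parts.append("; ".join(authors).rstrip(".") + ".")
    if date:
        parts.append(f"({date}).")
    parts.append(title.rstrip(".") + ".")
    if publisher:
        parts.append(publisher.rstrip(".") + ".")
    if url:
        parts.append(url)
    return " ".join(parts)
```

A transform would only call a helper like this when the source record has no citation of its own, so a record that already carries a citation keeps its original text.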

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/RDI-120
hakbailey committed Jun 13, 2022
1 parent 1784992 commit da1b8f7
Showing 16 changed files with 1,656 additions and 69 deletions.
5 changes: 4 additions & 1 deletion setup.cfg
@@ -14,4 +14,7 @@ ignore_missing_imports = True
ignore_missing_imports = True

[mypy-smart_open.*]
ignore_missing_imports = True

[tool:pytest]
log_level = DEBUG
165 changes: 165 additions & 0 deletions tests/fixtures/dspace/dspace_mets_record_all_fields.xml
@@ -0,0 +1,165 @@
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<header>
<identifier>oai:dspace.mit.edu:1721.1/142832</identifier>
<datestamp>2022-06-01T03:46:27Z</datestamp>
<setSpec>com_1721.1_7582</setSpec>
<setSpec>hdl_1721.1_7582</setSpec>
<setSpec>com_1721.1_7581</setSpec>
<setSpec>hdl_1721.1_7581</setSpec>
<setSpec>col_1721.1_131023</setSpec>
<setSpec>hdl_1721.1_131023</setSpec>
</header>
<metadata>
<mets xmlns="http://www.loc.gov/METS/"
xmlns:doc="http://www.lyncode.com/xoai"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xlink="http://www.w3.org/1999/xlink" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd" PROFILE="DSpace METS SIP Profile 1.0" TYPE="DSpace ITEM" ID="&#10;&#9;&#9;&#9;&#9;DSpace_ITEM_1721.1-142832" OBJID="&#10;&#9;&#9;&#9;&#9;hdl:1721.1/142832">
<metsHdr CREATEDATE="2022-06-06T14:02:17Z">
<agent TYPE="ORGANIZATION" ROLE="CUSTODIAN">
<name>mit-6</name>
</agent>
</metsHdr>
<dmdSec ID="DMD_1721.1_142832">
<mdWrap MDTYPE="MODS">
<xmlData xmlns:mods="http://www.loc.gov/mods/v3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
<mods:mods xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
<mods:name>
<mods:role>
<mods:roleTerm type="text">advisor</mods:roleTerm>
</mods:role>
<mods:namePart>Checkelsky, Joseph</mods:namePart>
</mods:name>
<mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>Tatsumi, Yuki</mods:namePart>
</mods:name>
<mods:name>
<mods:role>
<mods:roleTerm type="text">department</mods:roleTerm>
</mods:role>
<mods:namePart>Massachusetts Institute of Technology. Department of Physics</mods:namePart>
</mods:name>
<mods:name>
<mods:namePart>Smith, Susie Q.</mods:namePart>
</mods:name>
<mods:extension>
<mods:dateAccessioned encoding="iso8601">2022-05-31T13:31:20Z</mods:dateAccessioned>
</mods:extension>
<mods:extension>
<mods:dateAvailable encoding="iso8601">2022-05-31T13:31:20Z</mods:dateAvailable>
</mods:extension>
<mods:originInfo>
<mods:dateIssued encoding="iso8601">2021-09</mods:dateIssued>
</mods:originInfo>
<mods:identifier type="citation">Tatsumi, Yuki. "Magneto-thermal Transport and Machine Learning-assisted Investigation of Magnetic Materials." Massachusetts Institute of Technology © 2022.</mods:identifier>
<mods:identifier type="uri">https://hdl.handle.net/1721.1/142832</mods:identifier>
<mods:abstract>Heat is carried by different types quasiparticles in crystals, including phonons, charge carriers, and magnetic excitations. In most materials, thermal transport can be understood as the flow of phonons and charge carriers; magnetic heat flow is less well-studied and less well understood.&#13; &#13;Recently, the concept of the flat band, with a vanishing dispersion, has gained importance. Especially in electronic systems, many theories and experiments have proven that some structures such as kagome or honeycomb lattices hosts such flat bands with non-trivial topology. Even though a number of theories suggest that such dispersionless mode exist in magnonic bands under the framework of the Heisenberg spin model, few experiments indicate its existence. Not limited to these flat band effects, magnetic insulators can assume a variety of nontrivial topologies such as magnetic skyrmions. In this thesis, I investigate the highly frustrated magnetic system Y0.5Ca0.5BaCo4O7, where the kagome lattice could potentially lead to nontrivial thermal transport originated from its flat band. While we do not observe signatures of the flat band in thermal conductivity, the observed anomalous Hall effect in electrical transport and spin glass-like behavior suggest a complex magnetization-transport mechanism.&#13;&#13;Motivated by the rapid advancement of artificial inteligence, the application of machine learning into materials exploration is recently investigated. Using a graphical representation of crystallines orginally suggested in Crystal Graphical Convolutional Neural Network (CGCNN), we developed the ML-asssited method to explore magnetic compounds. Our machine learning model can, so far, distiguish ferromagnet or antiferromagnet systems with over 70% accuracy based only on structual/elemental information. Prospects of studying more complex magnets are described.</mods:abstract>
<mods:language>
<mods:languageTerm authority="rfc3066">en_US</mods:languageTerm>
</mods:language>
<mods:originInfo>
<mods:publisher>Massachusetts Institute of Technology</mods:publisher>
</mods:originInfo>
<mods:accessCondition type="useAndReproduction">In Copyright - Educational Use Permitted</mods:accessCondition>
<mods:titleInfo>
<mods:title>Magneto-thermal Transport and Machine Learning-assisted Investigation of Magnetic Materials</mods:title>
</mods:titleInfo>
<mods:titleInfo>
<mods:title type="alternative">A Slightly Different Title</mods:title>
</mods:titleInfo>
<mods:genre>Thesis</mods:genre>
<mods:subject>
<mods:topic>Metallurgy and Materials Science</mods:topic>
</mods:subject>
<mods:relatedItem type="series">MIT-CSAIL-TR-2018-016</mods:relatedItem>
<mods:relatedItem type="host">Nature Communications</mods:relatedItem>
</mods:mods>
</xmlData>
</mdWrap>
</dmdSec>
<amdSec ID="FO_1721.1_142832_1">
<techMD ID="TECH_O_1721.1_142832_1">
<mdWrap MDTYPE="PREMIS">
<xmlData xmlns:premis="http://www.loc.gov/standards/premis" xsi:schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
<premis:premis>
<premis:object>
<premis:objectIdentifier>
<premis:objectIdentifierType>URL</premis:objectIdentifierType>
<premis:objectIdentifierValue>https://dspace.mit.edu/bitstream/1721.1/142832/1/tatsumi-yukit-sm-physisc-2021-thesis.pdf</premis:objectIdentifierValue>
</premis:objectIdentifier>
<premis:objectCategory>File</premis:objectCategory>
<premis:objectCharacteristics>
<premis:fixity>
<premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
<premis:messageDigest>3a06f8863f4dfb2f1ab22607007c34e7</premis:messageDigest>
</premis:fixity>
<premis:size>12351827</premis:size>
<premis:format>
<premis:formatDesignation>
<premis:formatName>application/pdf</premis:formatName>
</premis:formatDesignation>
</premis:format>
</premis:objectCharacteristics>
<premis:originalName>tatsumi-yukit-sm-physisc-2021-thesis.pdf</premis:originalName>
</premis:object>
</premis:premis>
</xmlData>
</mdWrap>
</techMD>
</amdSec>
<amdSec ID="FT_1721.1_142832_2">
<techMD ID="TECH_T_1721.1_142832_2">
<mdWrap MDTYPE="PREMIS">
<xmlData xmlns:premis="http://www.loc.gov/standards/premis" xsi:schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
<premis:premis>
<premis:object>
<premis:objectIdentifier>
<premis:objectIdentifierType>URL</premis:objectIdentifierType>
<premis:objectIdentifierValue>https://dspace.mit.edu/bitstream/1721.1/142832/2/tatsumi-yukit-sm-physisc-2021-thesis.pdf.txt</premis:objectIdentifierValue>
</premis:objectIdentifier>
<premis:objectCategory>File</premis:objectCategory>
<premis:objectCharacteristics>
<premis:fixity>
<premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
<premis:messageDigest>588f139172959b8eee0a67e406ef4016</premis:messageDigest>
</premis:fixity>
<premis:size>76618</premis:size>
<premis:format>
<premis:formatDesignation>
<premis:formatName>text/plain</premis:formatName>
</premis:formatDesignation>
</premis:format>
</premis:objectCharacteristics>
<premis:originalName>tatsumi-yukit-sm-physisc-2021-thesis.pdf.txt</premis:originalName>
</premis:object>
</premis:premis>
</xmlData>
</mdWrap>
</techMD>
</amdSec>
<fileSec>
<fileGrp USE="ORIGINAL">
<file ID="BITSTREAM_ORIGINAL_1721.1_142832_1" MIMETYPE="application/pdf" SEQ="1" SIZE="12351827" CHECKSUM="3a06f8863f4dfb2f1ab22607007c34e7" CHECKSUMTYPE="MD5" ADMID="FO_1721.1_142832_1" GROUPID="GROUP_BITSTREAM_1721.1_142832_1">
<FLocat xlink:type="simple" LOCTYPE="URL" xlink:href="https://dspace.mit.edu/bitstream/1721.1/142832/1/tatsumi-yukit-sm-physisc-2021-thesis.pdf"/>
</file>
</fileGrp>
<fileGrp USE="TEXT">
<file ID="BITSTREAM_TEXT_1721.1_142832_2" MIMETYPE="text/plain" SEQ="2" SIZE="76618" CHECKSUM="588f139172959b8eee0a67e406ef4016" CHECKSUMTYPE="MD5" ADMID="FT_1721.1_142832_2" GROUPID="GROUP_BITSTREAM_1721.1_142832_2">
<FLocat xlink:type="simple" LOCTYPE="URL" xlink:href="https://dspace.mit.edu/bitstream/1721.1/142832/2/tatsumi-yukit-sm-physisc-2021-thesis.pdf.txt"/>
</file>
</fileGrp>
</fileSec>
<structMap TYPE="LOGICAL" LABEL="DSpace Object">
<div TYPE="DSpace Object Contents" ADMID="DMD_1721.1_142832">
<div TYPE="DSpace BITSTREAM">
<fptr FILEID="BITSTREAM_ORIGINAL_1721.1_142832_1"/>
</div>
</div>
</structMap>
</mets>
</metadata>
</record>
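
To show how a fixture like the one above might be read, here is a minimal sketch using only the standard library. It is not the transform added in this commit: the function name `extract_fields`, the output keys, and the trimmed `SAMPLE` record are assumptions for illustration. The namespace URIs are taken from the fixture itself.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the fixture; the prefixes are local choices.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Trimmed-down copy of the fixture, kept inline so the sketch is self-contained.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:dspace.mit.edu:1721.1/142832</identifier></header>
  <metadata>
    <mets xmlns="http://www.loc.gov/METS/">
      <dmdSec ID="DMD_1721.1_142832">
        <mdWrap MDTYPE="MODS">
          <xmlData xmlns:mods="http://www.loc.gov/mods/v3">
            <mods:mods>
              <mods:name>
                <mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role>
                <mods:namePart>Tatsumi, Yuki</mods:namePart>
              </mods:name>
              <mods:name><mods:namePart>Smith, Susie Q.</mods:namePart></mods:name>
              <mods:titleInfo><mods:title>Magneto-thermal Transport and Machine Learning-assisted Investigation of Magnetic Materials</mods:title></mods:titleInfo>
              <mods:titleInfo><mods:title type="alternative">A Slightly Different Title</mods:title></mods:titleInfo>
            </mods:mods>
          </xmlData>
        </mdWrap>
      </dmdSec>
    </mets>
  </metadata>
</record>"""


def extract_fields(record_xml: str) -> dict:
    """Pull a few fields out of an OAI-PMH record wrapping METS/MODS."""
    root = ET.fromstring(record_xml)
    identifier = root.findtext("oai:header/oai:identifier", namespaces=NAMESPACES)
    mods = root.find(".//mods:mods", NAMESPACES)
    if mods is None:
        return {"identifier": identifier, "title": None,
                "alternate_titles": [], "contributors": []}
    titles = mods.findall("mods:titleInfo/mods:title", NAMESPACES)
    # A title without a type attribute is treated as the main title here.
    main_titles = [t.text for t in titles if t.get("type") is None]
    contributors = [
        {
            "value": name.findtext("mods:namePart", namespaces=NAMESPACES),
            # Names without a role (like the fourth one in the fixture) get None.
            "kind": name.findtext("mods:role/mods:roleTerm", namespaces=NAMESPACES),
        }
        for name in mods.findall("mods:name", NAMESPACES)
    ]
    return {
        "identifier": identifier,
        "title": main_titles[0] if main_titles else None,
        "alternate_titles": [t.text for t in titles if t.get("type") == "alternative"],
        "contributors": contributors,
    }
```

Calling `extract_fields(SAMPLE)` returns a plain dict. The fixture's name entry without a role is the kind of edge case such a transform has to handle, which this sketch maps to `None`.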