premis:originalName value of directories should not be normalized #1051

Closed
lwo opened this issue Apr 9, 2018 · 4 comments
Labels
IISH International Institute of Social History Status: in progress Issue that is currently being worked on. Waffle label. Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result.

lwo commented Apr 9, 2018

The AIP METS file (located in the AIP package under /data/METS.{uuid}.xml) contains premis:originalName elements.

We noticed that, for directories, the premis:originalName element contains the normalized name rather than the original one:

<mets:dmdSec ID="dmdSec_9">
  <mets:mdWrap MDTYPE="PREMIS:OBJECT">
    <mets:xmlData>
      <premis:object xmlns:premis="http://www.loc.gov/premis/v3" xsi:type="premis:intellectualEntity" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
        <premis:objectIdentifier>
          <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
          <premis:objectIdentifierValue>907c4251-a731-4d63-929a-fbcf8e3506ef</premis:objectIdentifierValue>
        </premis:objectIdentifier>
        <premis:originalName>%SIPDirectory%objects/untitled_folder_name_with_more_than_so_many_characters/untitled_subfolder_name_with_more_than_so_many_characters/untitled_subsubfolder_name_with_more_than_so_many_characters/</premis:originalName>
      </premis:object>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

We expect the original name as it existed on the filesystem, not the normalized version. In the example above we expect spaces, not underscores. The premis:originalName value for files does keep its spaces.
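To check an AIP for this behaviour, the directory-level premis:originalName values can be pulled out of the METS file and compared with the names on the source filesystem. The following is a minimal sketch, assuming that directories are recorded as premis:object elements with xsi:type="premis:intellectualEntity" as in the snippet above. The function name and the SAMPLE document are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces taken from the METS snippet in this issue.
PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
NS = {"premis": PREMIS_NS}

# Hypothetical, shortened sample: one directory object and one file object.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:premis="http://www.loc.gov/premis/v3"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <premis:object xsi:type="premis:intellectualEntity" version="3.0">
    <premis:originalName>%SIPDirectory%objects/untitled_folder/</premis:originalName>
  </premis:object>
  <premis:object xsi:type="premis:file" version="3.0">
    <premis:originalName>%SIPDirectory%objects/untitled folder/a file.txt</premis:originalName>
  </premis:object>
</mets:mets>
"""


def directory_original_names(mets_root):
    """Return premis:originalName values of intellectualEntity objects.

    Assumption: the xsi:type attribute uses the literal prefix "premis",
    as it does in the snippet quoted in this issue.
    """
    names = []
    for obj in mets_root.iter("{%s}object" % PREMIS_NS):
        if obj.get(XSI_TYPE) != "premis:intellectualEntity":
            continue  # skip files and other object types
        name = obj.findtext("premis:originalName", namespaces=NS)
        if name is not None:
            names.append(name)
    return names


if __name__ == "__main__":
    for name in directory_original_names(ET.fromstring(SAMPLE)):
        print(name)
```

For a real AIP, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. An underscore in a returned value is not proof of normalization by itself, so the values still need to be compared with the names of the original directories.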
qubot pushed a commit that referenced this issue Apr 25, 2018
Modifies the archivematicaCreateMETS2.py client script to stop using the
sanitized (current) location of directories as the value of the
<premis:originalName> element in the METS file.

Fixes #1051
@jrwdunham jrwdunham added the Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result. label Apr 25, 2018
@jrwdunham jrwdunham added this to the 1.8.0 milestone Apr 25, 2018
jrwdunham (Contributor) commented

This should be fixed by #1063

@jhsimpson jhsimpson modified the milestones: 1.8.0, 1.7.2 May 3, 2018
@ross-spencer ross-spencer added the IISH International Institute of Social History label May 11, 2018
ross-spencer (Contributor) commented

Ref: PREMIS Data Dictionary v3.0, page 64:

http://www.loc.gov/standards/premis/v3/premis-3-0-datadictionary-only.pdf

ross-spencer (Contributor) commented

Given a transfer named iish_2, the information that lets us recover the original directory names is in: iish_2-8b62cae4-1561-44c9-a890-d83a933804fb/iish_2-8b62cae4-1561-44c9-a890-d83a933804fb/data/logs/transfers/iish_2-31761017-ad28-4718-a59e-31a224220373/logs/filenameCleanup.log

E.g.

Sanitized name: %transferDirectory%objects/folder with spaces/file with spaces.txt  ->  %transferDirectory%objects/folder_with_spaces/file_with_spaces.txt
Sanitized name: %transferDirectory%objects/folder with spaces/  ->  %transferDirectory%objects/folder_with_spaces/
sanitizeObjectNames.py: INFO      2018-05-15 16:16:03,327  archivematica.mcp.client.sanitizeObjectNames:sanitize_object_names:116:  Sanitized name: %transferDirectory%objects/folder with spaces/file with spaces.txt -> %transferDirectory%objects/folder_with_spaces/file_with_spaces.txt
sanitizeObjectNames.py: INFO      2018-05-15 16:16:03,508  archivematica.mcp.client.sanitizeObjectNames:sanitize_object_names:116:  Sanitized name: %transferDirectory%objects/folder with spaces/ -> %transferDirectory%objects/folder_with_spaces/
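The "Sanitized name: original -> sanitized" lines above are regular enough to turn into a lookup table from sanitized path to original path. What follows is a rough sketch, assuming every relevant line has that shape and that no path itself contains " -> ". The function name is hypothetical and this is not how Archivematica itself resolves original names.

```python
import re

# Matches both the bare lines and the lines carrying a logger prefix.
# The arrow is surrounded by one or more whitespace characters in the log.
SANITIZED_RE = re.compile(
    r"Sanitized name:\s*(?P<original>.+?)\s+->\s+(?P<sanitized>.+?)\s*$"
)


def original_names_from_log(lines):
    """Map each sanitized path to its original path.

    Lines that do not contain a "Sanitized name:" record are ignored.
    Duplicate records (the log above repeats each rename) collapse into
    a single dictionary entry.
    """
    mapping = {}
    for line in lines:
        match = SANITIZED_RE.search(line)
        if match:
            mapping[match.group("sanitized")] = match.group("original")
    return mapping


if __name__ == "__main__":
    # In practice the lines would come from filenameCleanup.log.
    example = [
        "Sanitized name: %transferDirectory%objects/folder with spaces/"
        "  ->  %transferDirectory%objects/folder_with_spaces/",
    ]
    for sanitized, original in original_names_from_log(example).items():
        print(sanitized, "<-", original)
```

Note that the log uses the %transferDirectory% placeholder while the METS snippet earlier in this issue uses %SIPDirectory%, so a lookup against METS values would have to account for the differing prefix.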

nickwilkinson commented

IISH
