identifyFileFormat: Write identification into Events #15

Merged
merged 3 commits into 1.0 from dev/issue-6161-identification-premis on Mar 13, 2014

mistydemeo
Contributor

This replaces the old archivematicaFITS event-writing code, which will allow us to remove archivematicaFITS and merge #6.

@mistydemeo
Contributor Author

This produces output in the METS that matches the old format:

<format>
  <formatDesignation>
    <formatName>MPEG 1/2 Audio Layer 3</formatName>
    <formatVersion></formatVersion>
  </formatDesignation>
  <formatRegistry>
    <formatRegistryName>PRONOM</formatRegistryName>
    <formatRegistryKey>fmt/134</formatRegistryKey>
  </formatRegistry>
</format>
<eventIdentifier>
  <eventIdentifierType>UUID</eventIdentifierType>
  <eventIdentifierValue>01fbf8da-6acf-48da-85a9-eeb65e2b6ec2</eventIdentifierValue>
</eventIdentifier>
<eventType>format identification</eventType>
<eventDateTime>2014-02-07T22:36:22</eventDateTime>
<eventDetail>program="Fido"; version="1"</eventDetail>
<eventOutcomeInformation>
  <eventOutcome>Positive</eventOutcome>
  <eventOutcomeDetail>
    <eventOutcomeDetailNote>fmt/134</eventOutcomeDetailNote>
  </eventOutcomeDetail>
</eventOutcomeInformation>
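
For readers who want to see how an event of this shape could be assembled, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the code from this pull request: the helper name `build_identification_event`, its parameters, and the enclosing `<event>` element are assumptions made for illustration. Only the child element names and values come from the snippet above.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime


def build_identification_event(tool, version, puid, when=None):
    """Build an event element shaped like the snippet above.

    Hypothetical helper: the <event> wrapper and the argument names are
    assumptions, not taken from the Archivematica source.
    """
    when = when or datetime.now()
    event = ET.Element("event")

    # Each event gets its own freshly generated UUID.
    identifier = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(identifier, "eventIdentifierType").text = "UUID"
    ET.SubElement(identifier, "eventIdentifierValue").text = str(uuid.uuid4())

    ET.SubElement(event, "eventType").text = "format identification"
    ET.SubElement(event, "eventDateTime").text = when.strftime("%Y-%m-%dT%H:%M:%S")
    ET.SubElement(event, "eventDetail").text = 'program="{}"; version="{}"'.format(
        tool, version
    )

    # The outcome note carries the identifier recorded for the file.
    outcome = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(outcome, "eventOutcome").text = "Positive"
    detail = ET.SubElement(outcome, "eventOutcomeDetail")
    ET.SubElement(detail, "eventOutcomeDetailNote").text = puid
    return event


event = build_identification_event(
    "Fido", "1", "fmt/134", datetime(2014, 2, 7, 22, 36, 22)
)
print(ET.tostring(event, encoding="unicode"))
```

Passing the timestamp explicitly keeps the example reproducible; a real caller would presumably record the time the identification actually ran.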

@Hwesta
Contributor

Hwesta commented Feb 11, 2014

Code looks good. :)

What if the format it's identified as doesn't have a PRONOM ID? E.g., identification was done by extension, to a FormatVersion that doesn't have a PRONOM ID.

Also, how does the METS get generated? It used to be from FilesIDs, but I don't see that getting inserted, or another related change.

@qubot qubot closed this Feb 12, 2014
@qubot qubot reopened this Feb 12, 2014
@mistydemeo
Contributor Author

> What if the format it's identified as doesn't have a PRONOM ID? E.g., identification was done by extension, to a FormatVersion that doesn't have a PRONOM ID.

We maintain an internal map for this in the FPR. At the time we identify formats, we associate the file with a FormatVersion, and FormatVersion has a pronom_id column. The tool output isn't what we're recording.
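
A rough sketch of that lookup, in plain Python rather than the actual FPR models: the `FormatVersion` dataclass, the `RULES` table, and the `identify` function are all hypothetical stand-ins, and the sample entries are illustrative data only. The point is that the recorded PUID comes from the matched FormatVersion, not from whatever string the tool printed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatVersion:
    """Stand-in for the FPR FormatVersion record (assumed shape)."""
    description: str
    pronom_id: Optional[str]  # None when the format has no PUID


# Hypothetical rules: what the tool reports -> the FormatVersion it maps to.
# A PUID-reporting tool and an extension-based tool can land on the same record.
RULES = {
    "fmt/134": FormatVersion("MPEG 1/2 Audio Layer 3", "fmt/134"),
    ".mp3": FormatVersion("MPEG 1/2 Audio Layer 3", "fmt/134"),
    ".bmp": FormatVersion("Generic Bitmap", None),
}


def identify(tool_output: str) -> Optional[FormatVersion]:
    """Return the FormatVersion associated with the tool's output, if any."""
    return RULES.get(tool_output)


for output in ("fmt/134", ".mp3", ".bmp"):
    match = identify(output)
    print(output, "->", match.description, match.pronom_id)
```

Under these assumptions, identification by extension still yields a PUID whenever the matched FormatVersion has one.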

> Also, how does the METS get generated? It used to be from FilesIDs, but I don't see that getting inserted, or another related change.

¯_(ツ)_/¯

This is still kind of a mystery to me but I'll look into it.

@mistydemeo
Contributor Author

Here's a good example of how the PUID is still written when file extension is used: https://gist.github.com/mistydemeo/a9f6b74a1856002be3dd (second section)

@mistydemeo
Contributor Author

Looks like FilesIDs wasn't being populated, so I've updated that. File IDs look the same for PRONOM file types, and for non-PRONOM file types they look like this:

<format>
  <formatDesignation>
    <formatName>Generic Bitmap</formatName>
    <formatVersion></formatVersion>
  </formatDesignation>
  <formatRegistry>
    <formatRegistryName>Archivematica Format Policy Registry</formatRegistryName>
    <formatRegistryKey>.bmp</formatRegistryKey>
  </formatRegistry>
</format>
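
The choice between the two registries shown in this thread can be sketched as follows. This is an illustrative reconstruction, not the merged code: `build_format` and its `fallback_key` parameter are hypothetical names, and the rule "use PRONOM when a PUID exists, otherwise the Archivematica Format Policy Registry" is inferred from the two sample outputs.

```python
import xml.etree.ElementTree as ET

PRONOM = "PRONOM"
FPR = "Archivematica Format Policy Registry"


def build_format(name, version="", pronom_id=None, fallback_key=None):
    """Build a <format> element like the samples in this thread.

    Hypothetical helper: fallback_key stands for whatever key is recorded
    when no PUID is available (".bmp" in the sample above).
    """
    fmt = ET.Element("format")
    designation = ET.SubElement(fmt, "formatDesignation")
    ET.SubElement(designation, "formatName").text = name
    ET.SubElement(designation, "formatVersion").text = version

    # Prefer PRONOM when the format has a PUID; otherwise fall back.
    if pronom_id:
        registry_name, registry_key = PRONOM, pronom_id
    else:
        registry_name, registry_key = FPR, fallback_key

    registry = ET.SubElement(fmt, "formatRegistry")
    ET.SubElement(registry, "formatRegistryName").text = registry_name
    ET.SubElement(registry, "formatRegistryKey").text = registry_key
    return fmt


mp3 = build_format("MPEG 1/2 Audio Layer 3", pronom_id="fmt/134")
bmp = build_format("Generic Bitmap", fallback_key=".bmp")
print(ET.tostring(mp3, encoding="unicode"))
print(ET.tostring(bmp, encoding="unicode"))
```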

@mistydemeo
Contributor Author

Tested, and removing the archivematicaFITS code has not caused obvious issues.

@Hwesta
Contributor

Hwesta commented Feb 27, 2014

archivematicaFITS is run on submission documentation as well (after normalization, MSCLs 33d7ac55-291c-43ae-bb42-f599ef428325 and 76d87f57-9718-4f68-82e6-91174674c49c). Is this change going to affect that? I don't know what information about the submission documentation is actually needed.

@mistydemeo
Contributor Author

Hm, that's a good question. I'll ask Evelyn about this.

@mistydemeo
Contributor Author

Evelyn suggests also running identifyFileFormat on submission documentation, and allowing a choice of tool there, the same as in the other two places it's run. I like that idea.

mistydemeo and others added 3 commits March 13, 2014 13:27
This uses identification information from the identification tool
used, instead of using the output of archivematicaFITS. This is more
accurate, since archivematicaFITS is no longer actually being used for
the identifications used for other tasks within Archivematica.

Fixes #6161

Now that archivematicaFITS is no longer the authority on file
identification, identifyFileFormat is responsible for writing these
events and IDs. This removes the events from archivematicaFITS.

refs #6161
@qubot qubot merged commit c5add6e into 1.0 Mar 13, 2014
@qubot qubot deleted the dev/issue-6161-identification-premis branch March 13, 2014 21:56
sevein added a commit that referenced this pull request Apr 29, 2024