identifyFileFormat: Write identification into Events #15
This produces output in the METS that matches the old format:

<format>
  <formatDesignation>
    <formatName>MPEG 1/2 Audio Layer 3</formatName>
    <formatVersion></formatVersion>
  </formatDesignation>
  <formatRegistry>
    <formatRegistryName>PRONOM</formatRegistryName>
    <formatRegistryKey>fmt/134</formatRegistryKey>
  </formatRegistry>
</format>

<eventIdentifier>
  <eventIdentifierType>UUID</eventIdentifierType>
  <eventIdentifierValue>01fbf8da-6acf-48da-85a9-eeb65e2b6ec2</eventIdentifierValue>
</eventIdentifier>
<eventType>format identification</eventType>
<eventDateTime>2014-02-07T22:36:22</eventDateTime>
<eventDetail>program="Fido"; version="1"</eventDetail>
<eventOutcomeInformation>
  <eventOutcome>Positive</eventOutcome>
  <eventOutcomeDetail>
    <eventOutcomeDetailNote>fmt/134</eventOutcomeDetailNote>
  </eventOutcomeDetail>
</eventOutcomeInformation>
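A minimal sketch of how the event fields quoted above could be assembled with Python's standard xml.etree.ElementTree. The helper name, its parameters and the wrapping <event> element are hypothetical; only the child element names and the example values come from the METS output in this PR. The PREMIS namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from uuid import uuid4


def format_identification_event(program, version, registry_key,
                                event_uuid=None, timestamp=None):
    """Assemble a format identification event (hypothetical helper).

    Child element names follow the METS output quoted in this PR; the
    wrapping <event> element is an assumption, and namespaces are omitted.
    """
    event = ET.Element("event")

    identifier = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(identifier, "eventIdentifierType").text = "UUID"
    ET.SubElement(identifier, "eventIdentifierValue").text = event_uuid or str(uuid4())

    ET.SubElement(event, "eventType").text = "format identification"
    # Default to the current local time, formatted like the example timestamp.
    ET.SubElement(event, "eventDateTime").text = (
        timestamp or datetime.now().strftime("%Y-%m-%dT%H:%M:%S"))
    # Record which identification tool produced the result.
    ET.SubElement(event, "eventDetail").text = f'program="{program}"; version="{version}"'

    outcome_info = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(outcome_info, "eventOutcome").text = "Positive"
    outcome_detail = ET.SubElement(outcome_info, "eventOutcomeDetail")
    # The note carries the registry key of the identified format (fmt/134 here).
    ET.SubElement(outcome_detail, "eventOutcomeDetailNote").text = registry_key
    return event


event = format_identification_event(
    "Fido", "1", "fmt/134",
    event_uuid="01fbf8da-6acf-48da-85a9-eeb65e2b6ec2",
    timestamp="2014-02-07T22:36:22",
)
print(ET.tostring(event, encoding="unicode"))
```

Building the tree with ElementTree rather than string formatting means values are escaped correctly if a tool name or version ever contains markup characters.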
Code looks good. :) What if the identified format doesn't have a PRONOM ID? E.g., identification was done by extension and matched a FormatVersion that has no PRONOM ID. Also, how does the METS get generated? It used to be built from FilesIDs, but I don't see anything being inserted there, or any other related change.
We maintain an internal map for this in the FPR. When we identify a format, we associate the file with a FormatVersion, and FormatVersion has a pronom_id column. The tool's raw output isn't what we're recording.
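A minimal sketch of that mapping, with the FPR reduced to an in-memory dict. FormatVersion, FPR_MAP, resolve_format and the tool names used as keys are hypothetical stand-ins, not the FPR's actual models or API; the point is only that the recorded PUID comes from FormatVersion.pronom_id rather than from whatever the tool printed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatVersion:
    """Hypothetical stand-in for an FPR FormatVersion record."""
    description: str
    pronom_id: Optional[str]  # None when the format has no PRONOM ID


# Hypothetical FPR map from (identification tool, raw tool output) to a
# FormatVersion. The tool names and entries are illustrative only.
FPR_MAP = {
    ("Fido", "fmt/134"): FormatVersion("MPEG 1/2 Audio Layer 3", "fmt/134"),
    ("File Extension", ".mp3"): FormatVersion("MPEG 1/2 Audio Layer 3", "fmt/134"),
    ("File Extension", ".bmp"): FormatVersion("Generic Bitmap", None),
}


def resolve_format(tool, tool_output):
    """Return the FormatVersion that gets recorded for a tool's output.

    The recorded PUID comes from FormatVersion.pronom_id, not from the raw
    tool output, so an extension match can still carry a PRONOM ID.
    """
    try:
        return FPR_MAP[(tool, tool_output)]
    except KeyError:
        raise LookupError(f"no FPR rule for {tool!r} output {tool_output!r}") from None


print(resolve_format("File Extension", ".mp3").pronom_id)  # fmt/134
print(resolve_format("File Extension", ".bmp").pronom_id)  # None
```

Because both the Fido rule and the extension rule resolve to the same FormatVersion, the METS ends up with the same PUID regardless of which tool made the match.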
¯\_(ツ)_/¯ This is still kind of a mystery to me, but I'll look into it.
Here's a good example of how the PUID is still written when identification by file extension is used: https://gist.github.com/mistydemeo/a9f6b74a1856002be3dd (second section)
Looks like FilesIDs wasn't being populated, so I updated that. File IDs look the same as before for PRONOM file types; for non-PRONOM file types they look like this:

<format>
  <formatDesignation>
    <formatName>Generic Bitmap</formatName>
    <formatVersion></formatVersion>
  </formatDesignation>
  <formatRegistry>
    <formatRegistryName>Archivematica Format Policy Registry</formatRegistryName>
    <formatRegistryKey>.bmp</formatRegistryKey>
  </formatRegistry>
</format>
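The registry choice described here can be sketched as follows, assuming (as in the Generic Bitmap example) that the fallback registry key is the file extension. The function name and its arguments are hypothetical and namespaces are omitted; passing short_empty_elements=False to ET.tostring keeps the empty <formatVersion></formatVersion> pair instead of serializing it as <formatVersion />.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def format_element(name: str, version: Optional[str],
                   pronom_id: Optional[str], extension: str) -> ET.Element:
    """Build a <format> element (hypothetical helper).

    Uses PRONOM when a PUID is known; otherwise falls back to the
    Archivematica Format Policy Registry keyed by file extension.
    """
    fmt = ET.Element("format")
    designation = ET.SubElement(fmt, "formatDesignation")
    ET.SubElement(designation, "formatName").text = name
    ET.SubElement(designation, "formatVersion").text = version or ""

    registry = ET.SubElement(fmt, "formatRegistry")
    if pronom_id:
        registry_name, registry_key = "PRONOM", pronom_id
    else:
        # Assumed fallback: the FPR, keyed by the file's extension.
        registry_name, registry_key = "Archivematica Format Policy Registry", extension
    ET.SubElement(registry, "formatRegistryName").text = registry_name
    ET.SubElement(registry, "formatRegistryKey").text = registry_key
    return fmt


bitmap = format_element("Generic Bitmap", "", None, ".bmp")
print(ET.tostring(bitmap, encoding="unicode", short_empty_elements=False))
```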
Tested: removing the archivematicaFITS code has not caused any obvious issues.
archivematicaFITS is also run on submission documentation (after normalization, MSCLs 33d7ac55-291c-43ae-bb42-f599ef428325 and 76d87f57-9718-4f68-82e6-91174674c49c). Will this change affect that? I don't know what information about the submission documentation is actually needed.
Hm, that's a good question. I'll ask Evelyn about this.
Evelyn suggests adding running
This uses the identification information from the identification tool that was actually run, instead of the output of archivematicaFITS. This is more accurate, since archivematicaFITS is no longer the source of the identifications used by other tasks within Archivematica. Fixes #6161
fixes #6161
Now that archivematicaFITS is no longer the authority on file identification, identifyFileFormat is responsible for writing these events and IDs. This removes the event-writing code from archivematicaFITS. refs #6161
This replaces the old archivematicaFITS event-writing code, which will allow us to remove archivematicaFITS and merge #6.