fmt/468 definition not matching valid files #82

Closed
ross-spencer opened this issue Jan 28, 2016 · 12 comments

@ross-spencer

DROID: 6.1.5
Sig: v84
Container: 20160112

I have a set of ISOs, created using different tools, that aren't being matched by the current PRONOM definition.

The internal structure suggests that they should be matching the signature definition. The output from Siegfried also shows that this should work:

  ---
  siegfried   : 1.4.4
  scandate    : 2016-01-28T19:22:48+13:00
  signature   : pronom.sig
  created     : 2016-01-28T19:22:32+13:00
  identifiers : 
    - name    : 'pronom'
      details : 'DROID_SignatureFile_V84.xml; container-signature-20160121.xml; built without reports'
  ---
  filename : 'isos\fmt-468-signature-id-730.iso'
  filesize : 40972
  modified : 2015-05-02T13:35:06+12:00
  errors   : 
  matches  :
    - id      : pronom
      puid    : fmt/468
      format  : 'ISO Disk Image File'
      version : 
      mime    : 
      basis   : 'extension match; byte match at [[[32769 5]] [[40966 6]]]'
      warning : 
  ---
  filename : 'isos\skeleton-001.iso'
  filesize : 36855
  modified : 2016-01-28T18:40:58+13:00
  errors   : 
  matches  :
    - id      : pronom
      puid    : fmt/468
      format  : 'ISO Disk Image File'
      version : 
      mime    : 
      basis   : 'extension match; byte match at [[[32769 5]] [[36759 6]]]'
      warning : 
  ---
  filename : 'isos\skeleton-030.iso'
  filesize : 37120
  modified : 2016-01-28T18:50:27+13:00
  errors   : 
  matches  :
    - id      : pronom
      puid    : fmt/468
      format  : 'ISO Disk Image File'
      version : 
      mime    : 
      basis   : 'extension match; byte match at [[[32769 5]] [[36864 6]]]'
      warning : 
  ---
  filename : 'isos\skeleton-050.iso'
  filesize : 36960
  modified : 2016-01-28T18:52:14+13:00
  errors   : 
  matches  :
    - id      : pronom
      puid    : fmt/468
      format  : 'ISO Disk Image File'
      version : 
      mime    : 
      basis   : 'extension match; byte match at [[[32769 5]] [[36864 6]]]'
      warning : 

The distance between both signatures should be 4095 in each of these instances which would fit into the PRONOM definition:

4344303031{1-16384}FF4344303031 
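
As a rough cross-check, the gap between the two matched sequences can be recomputed from the Siegfried `basis` lines above. This is only a sketch: it assumes each bracketed pair in `basis` is `[offset length]`, and the helper name `gap` is made up for illustration. Under that reading the gap differs from file to file, but every value falls inside the `{1-16384}` range of the signature.

```python
import re

# Gap range taken from the PRONOM sequence 4344303031{1-16384}FF4344303031.
MIN_GAP, MAX_GAP = 1, 16384

# 'basis' values copied from the Siegfried output in this report.
basis_lines = {
    "fmt-468-signature-id-730.iso": "extension match; byte match at [[[32769 5]] [[40966 6]]]",
    "skeleton-001.iso": "extension match; byte match at [[[32769 5]] [[36759 6]]]",
    "skeleton-030.iso": "extension match; byte match at [[[32769 5]] [[36864 6]]]",
    "skeleton-050.iso": "extension match; byte match at [[[32769 5]] [[36864 6]]]",
}


def gap(basis):
    """Bytes between the end of the first match and the start of the second.

    Assumption: each bracketed pair is [offset length].
    """
    pairs = [(int(off), int(length)) for off, length in re.findall(r"\[(\d+) (\d+)\]", basis)]
    (first_off, first_len), (second_off, _second_len) = pairs
    return second_off - (first_off + first_len)


gaps = {name: gap(basis) for name, basis in basis_lines.items()}
for name, g in gaps.items():
    print(name, g, MIN_GAP <= g <= MAX_GAP)
```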

Attached is a set of skeleton files created from the ISOs that I am looking at.

Also attached is my standard skeleton file. This unfortunately does not suffer from the same issue. Please note that the attached files may be included in any future test suite you create, depending on the resolution of this issue.

isos.zip

@Brian-O-TNA
Contributor

Thanks for raising this Ross, and apologies that it has taken some time to come back on this. The ISOs which are not identified as fmt/468 have an additional occurrence of the CD001 character sequence after the first occurrence at 32769 and before the ÿCD001 (in all the real-world examples I've looked at, this additional occurrence is at offset 34817). Overwriting this second occurrence with alternate bytes will result in the ISOs being identified in DROID. Conversely, editing an ISO which is identified correctly to add a further CD001 after the first one and before the ÿCD001 will result in the file no longer being identified as fmt/468 in DROID.

As a short-term fix, changing the signature's maximum sequence offset (i.e. signature ID 730, SubSeqMaxOffset="34817") will result in the offending ISOs being identified. However, it would of course also allow malformed ISOs to be identified as false positives, since it would not enforce the required occurrence at 32769 - so it is not a long-term solution.
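
The false-positive risk can be illustrated with a small sketch. It is a simplified model and not DROID's matching code: `matches`, `make_image`, `min_offset` and `max_offset` are hypothetical names, and the offset bounds are only a stand-in for how the signature's sequence offsets constrain where the first CD001 may start. A file with no CD001 at 32769 is rejected under the strict bound but accepted once the upper bound is widened to 34817.

```python
HEAD = b"CD001"          # 4344303031
TAIL = b"\xffCD001"      # FF4344303031
MIN_GAP, MAX_GAP = 1, 16384


def matches(data, min_offset, max_offset):
    """True if HEAD starts within [min_offset, max_offset] and TAIL follows
    after a gap of MIN_GAP..MAX_GAP bytes. Hypothetical model only."""
    pos = data.find(HEAD, min_offset)
    while pos != -1 and pos <= max_offset:
        start = pos + len(HEAD) + MIN_GAP
        end = pos + len(HEAD) + MAX_GAP + len(TAIL)
        if data.find(TAIL, start, end) != -1:
            return True
        pos = data.find(HEAD, pos + 1)
    return False


def make_image(head_offsets, tail_at=36864, size=37120):
    """Zero-filled test buffer with HEAD at each given offset and one TAIL."""
    buf = bytearray(size)
    for off in head_offsets:
        buf[off:off + len(HEAD)] = HEAD
    buf[tail_at:tail_at + len(TAIL)] = TAIL
    return bytes(buf)


valid = make_image([32769, 34817])   # CD001 where it is required, plus the extra one
malformed = make_image([34000])      # no CD001 at 32769 at all

print(matches(valid, 32769, 32769))      # strict bound, valid file
print(matches(malformed, 32769, 32769))  # strict bound, malformed file
print(matches(malformed, 32769, 34817))  # widened bound, malformed file
```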

Rewriting the signature more thoroughly is another option - however, the way DROID processes left (and right) fragments in relation to sequence offsets comes into play here and may be relevant in other edge cases we've seen. Investigation continues.
Regards, Brian

@Brian-O-TNA
Contributor

Brian-O-TNA commented Jul 13, 2016

This appears to be fixed in the build at:
https://github.com/snail1966/droid/tree/multiFragmentInstances
However, further research and testing is required and we would want to take on board possible similar edge cases before pushing to a release.
Regards, Brian

@ross-spencer
Author

Hi Brian,

I have issues building from source because of the various unit test failures I reported previously. If you have a way of providing a test release of this version of DROID, I could do some user testing, if you feel it might be valuable.

Cheers,

Ross

@Brian-O-TNA
Contributor

Hi Ross
Unfortunately the zip is too large to upload here, however I've sent the updated jar file to you via email. Alternatively, you should also be able to build it in Maven by disabling the tests (i.e. mvn clean install -Dmaven.test.skip=true).
Cheers, Brian

@ross-spencer
Author

Thanks for sending the JAR Brian. Just letting you know it was blocked by my work's mail server. If you've my gmail (Alex or David might) then please feel free to send it there.

What's your deadline for testing/release on this?

Also, does the fix take care of the right fragment issue reported for some signature definitions, such as x-fmt/263?

@Dclipsham

Dclipsham commented Jul 20, 2016

Hi Ross,

Having issues sending it directly to your Gmail too, so have sent an invite to my G-Drive to both your work and personal addresses. Let me know of any problems accessing it.

David

@ross-spencer
Author

Cheers David, have managed to access it, but will also download it at work in the morning.

Tried to download a new DROID on a recent system to test it tonight but experienced unusually slow download speeds... Something on your side? (30mbs in 20 mins)

Thanks for the help! Hope to be able to feed back to you next week.

@Dclipsham

Seems okay here right now (~30 secs, both internal and external, plus mobile connections). Is the slowness persisting? I'll flag it with our Web Engineers...

@Brian-O-TNA
Contributor

After all that, I've found this fix doesn't work in cases (unlike the present one) where there is more than one left fragment - so unfortunately the revised code will need some further reworking (and could be made more efficient anyway). We don't seem to have quite the same issue with right fragments; it is possibly a variation on a theme in such cases and needs further investigation...

@ross-spencer
Author

Thanks both! - I'll hold off testing for now. I guess it was a good exercise nonetheless!! (I hope not too bothersome :) )

David - download from my work was just over a minute so will check what's going on at home. Thank you nonetheless.

@Brian-O-TNA
Contributor

Brian-O-TNA commented Aug 4, 2016

As I suspected, this is indeed related to an issue we've found where intermediate left/right fragments with variable offsets occur more than once within their valid offset range - leading to subsequent fragments not matching as expected.
This is addressed in the fix at:
https://github.com/snail1966/droid/tree/multiFragmentInstances
However, it needs further in-depth testing and review before it can be made available.
Regards, Brian
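
One way to picture the behaviour described in this thread is the sketch below. It is a hypothetical model, not DROID's code or the code in the linked branch: `head_candidates`, `first_only`, `all_instances` and `make_image` are invented for illustration, and the assumption is that a matcher which commits to the nearest CD001 to the left of ÿCD001 never goes back to try an earlier instance. With a second CD001 at 34817, the nearest-instance strategy misses the required one at 32769, while checking every instance inside the `{1-16384}` gap range finds it.

```python
HEAD = b"CD001"          # left fragment, 4344303031
TAIL = b"\xffCD001"      # FF4344303031
ANCHOR = 32769           # offset at which the first CD001 is required
MIN_GAP, MAX_GAP = 1, 16384


def head_candidates(data, tail_pos):
    """Offsets of HEAD to the left of tail_pos whose gap to the tail is
    within MIN_GAP..MAX_GAP, nearest instance first."""
    found = []
    pos = data.rfind(HEAD, 0, tail_pos)
    while pos != -1:
        gap = tail_pos - (pos + len(HEAD))
        if gap > MAX_GAP:
            break
        if gap >= MIN_GAP:
            found.append(pos)
        # Continue with instances that start strictly before this one.
        pos = data.rfind(HEAD, 0, pos + len(HEAD) - 1)
    return found


def first_only(data):
    """Assumed failing strategy: commit to the nearest instance only."""
    tail_pos = data.find(TAIL, ANCHOR + len(HEAD))
    if tail_pos == -1:
        return False
    found = head_candidates(data, tail_pos)
    return bool(found) and found[0] == ANCHOR


def all_instances(data):
    """Alternative strategy: accept if any instance in range sits at ANCHOR."""
    tail_pos = data.find(TAIL, ANCHOR + len(HEAD))
    if tail_pos == -1:
        return False
    return ANCHOR in head_candidates(data, tail_pos)


def make_image(extra_at=None, tail_at=36864, size=37120):
    """Zero-filled buffer with CD001 at ANCHOR, an optional extra CD001,
    and ÿCD001 at tail_at."""
    buf = bytearray(size)
    buf[ANCHOR:ANCHOR + len(HEAD)] = HEAD
    if extra_at is not None:
        buf[extra_at:extra_at + len(HEAD)] = HEAD
    buf[tail_at:tail_at + len(TAIL)] = TAIL
    return bytes(buf)


clean = make_image()
doubled = make_image(extra_at=34817)

print(first_only(clean), all_instances(clean))
print(first_only(doubled), all_instances(doubled))
```

For brevity the sketch only looks at the first ÿCD001 after the anchor; a fuller version would also need to consider later instances of the main sequence.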

@paulyoung84

Fixed in 6.3
