How to handle two sets of bytes for matching improvements? #46

NebularNerd · 2024-01-31T17:19:49Z

Hi there,

I'm looking for a Python package to help identify weird and wonderful files inside various scripts. I had seen fleep, but that appears to be dead. Puremagic looks like it offers the same functionality for what I want it for.

One job is handling Amiga .iff files in an image conversion script. Having had a quick look, it's nice to see .iff getting some love:

puremagic/puremagic/magic_data.json, line 1084 in ff042db:

["464f524d", 0, "", "application/x-iff", "IFF file"],

But in Amiga land, that .iff FORM header is used for many things (see Wikipedia: List of file signatures).

Is there a way to help improve mapping and confidence by adding additional matching strings such as ILBM, ACBM, etc.? I'm happy to help with a PR if it can be done.
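To illustrate the ambiguity, here is a minimal sketch assuming a magic entry is a list of [hex signature, offset, extension, MIME type, name], the shape of the line quoted above. The `entry_matches` and `iff_header` helpers are hypothetical and are not puremagic's code; they just show that checking FORM at offset 0 accepts both an ILBM and an ACBM header.

```python
# Entry in the same shape as the magic_data.json line quoted above:
# [hex signature, offset, extension, MIME type, name]
IFF_ENTRY = ["464f524d", 0, "", "application/x-iff", "IFF file"]


def entry_matches(entry: list, data: bytes) -> bool:
    """Hypothetical helper: True if data holds the entry's bytes at its offset."""
    signature = bytes.fromhex(entry[0])
    offset = entry[1]
    return data[offset:offset + len(signature)] == signature


def iff_header(form_type: bytes) -> bytes:
    """Build a minimal IFF header: "FORM", a 4-byte big-endian size, the form type."""
    return b"FORM" + len(form_type).to_bytes(4, "big") + form_type


# An ILBM picture and an ACBM picture are indistinguishable to this entry.
print(entry_matches(IFF_ENTRY, iff_header(b"ILBM")))  # True
print(entry_matches(IFF_ENTRY, iff_header(b"ACBM")))  # True
```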


cdgriffith · 2024-02-06T15:20:58Z

What we could do there is, instead of matching FORM at offset 0, change to the offset where the more accurate info lives and match there instead.

We don't currently have a way to do wildcards, so we can't be as accurate as matching both FORM and ACBM.
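A rough sketch of the match-at-a-later-offset idea, assuming the standard IFF layout where the four-byte form type (ILBM, ACBM, ...) follows the `FORM` tag and a four-byte size field, so it sits at offset 8. The `FORM_TYPES` table and `identify_by_form_type` are hypothetical names for illustration. Checking offset 8 alone tells the image types apart, but it no longer confirms `FORM`, which is the accuracy trade-off without wildcards.

```python
from typing import Optional

# Hypothetical lookup of a few IFF form types, found at bytes 8..11.
FORM_TYPES = {
    b"ILBM": "IFF InterLeaved BitMap image",
    b"ACBM": "IFF Amiga Contiguous BitMap image",
    b"8SVX": "IFF 8-bit sampled sound",
}


def identify_by_form_type(data: bytes) -> Optional[str]:
    """Match only the four bytes at offset 8; the FORM tag at offset 0 is not checked."""
    return FORM_TYPES.get(data[8:12])


ilbm = b"FORM" + (4).to_bytes(4, "big") + b"ILBM"
print(identify_by_form_type(ilbm))  # IFF InterLeaved BitMap image

# Drawback: any data with "ILBM" at offset 8 matches, even without FORM.
print(identify_by_form_type(b"NOTIFF!!ILBM"))  # IFF InterLeaved BitMap image
```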

Thanks for the info; I can work on that when I have time. If you know of a source of sample files for that, please share!

NebularNerd · 2024-02-06T15:49:00Z

Instead of, or in addition to, wildcards, another option could be a dual match. Taking our .iff sample, we could look to do...

[["464f524d","494c424d"], [0,8], "", "application/x-iff", "IFF file"],

If your code sees a list instead of a string, it would check each hex string at the matching offset from the next list. If both match, we get pretty much 100% confidence it's what we think it is. The logic is a little weirder than wildcarding, but it's another possible way.
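The dual-match proposal can be sketched as follows. This implements the entry format suggested in this comment, not necessarily what puremagic ended up shipping; `part_matches` and `entry_matches` are hypothetical names. A plain string keeps the single-signature behavior, while a list requires every hex string to match at its paired offset.

```python
# Proposed dual-match entry: hex strings paired with offsets ("FORM" at 0, "ILBM" at 8).
ILBM_ENTRY = [["464f524d", "494c424d"], [0, 8], "", "application/x-iff", "IFF file"]


def part_matches(data: bytes, hex_sig: str, offset: int) -> bool:
    """True if the bytes for hex_sig appear in data at exactly this offset."""
    signature = bytes.fromhex(hex_sig)
    return data[offset:offset + len(signature)] == signature


def entry_matches(entry: list, data: bytes) -> bool:
    """Hypothetical matcher that accepts either a single or a multi-part entry."""
    signatures, offsets = entry[0], entry[1]
    if isinstance(signatures, str):
        # Plain entry: one signature at one offset, as in the existing data file.
        return part_matches(data, signatures, offsets)
    if len(signatures) != len(offsets):
        raise ValueError("each signature needs exactly one offset")
    # Multi-part entry: every signature must match at its paired offset.
    return all(part_matches(data, sig, off) for sig, off in zip(signatures, offsets))


ilbm = b"FORM" + (4).to_bytes(4, "big") + b"ILBM"
acbm = b"FORM" + (4).to_bytes(4, "big") + b"ACBM"
print(entry_matches(ILBM_ENTRY, ilbm))             # True
print(entry_matches(ILBM_ENTRY, acbm))             # False
print(entry_matches(ILBM_ENTRY, b"NOTIFF!!ILBM"))  # False
```

Requiring both parts means data with ILBM at offset 8 but no FORM tag at offset 0 is rejected, while an ACBM file no longer passes as ILBM.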

Aminet is pretty much the internet's oldest resource for all things Amiga; we should be able to find pretty much everything there.

Samples: iff

7-Zip will happily unpack most of the .lha and other formats you'll find there. If you get stuck on any, let me know and I'm sure I can unearth samples from somewhere.

cdgriffith · 2024-02-07T01:33:09Z

Thanks for the samples! Added a multi-part detect.

Should be working in 1.20 https://github.com/cdgriffith/puremagic/releases/tag/1.20

NebularNerd · 2024-02-07T08:40:33Z

Nice! I've just looked at the implementation, and that's a great way to handle it, much tidier than mine. I'll test it out later on a script I have for converting images between formats.

For retro uses this will be handy, as there are a lot of older formats, like file packers, that use a two-part fingerprint.

cdgriffith closed this as completed Feb 7, 2024
