Improve/Fix the get method to accept a Reader #104

@Aargonian

Description

Hello! I see a potential issue with the current interface for get that I think should be solvable. The get function that the crate is built around only accepts a &[u8]. I'm not yet as familiar with Rust at this level as I would like to be, so forgive me if I have misunderstood, but that implies I may need to read the entire file into memory before it can be matched. Judging by the implementation of get_from_file(), this is exactly what that function does behind the scenes.

#58 notes that all tests seem to pass when the test files are truncated to only 4 KB, even though the truncated files are no longer valid. DOC/PPT/XLS seem to be the exceptions, as they are parsed rather than read as binary.

Of course, reading only 4 KB is a much easier ask, but there is no guarantee this will hold in the future, and the crate interface does not make clear that this is all that may be needed to identify a file.
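For illustration, here is a minimal sketch of how a caller could cap the amount of data read today, using only the standard library. The helper name read_prefix and the 4096-byte limit in the usage note are my own assumptions, not part of the crate.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of the crate): read at most `limit`
/// bytes from any reader, so the whole file never has to be in memory.
fn read_prefix<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // `take` wraps the reader so that it reports EOF after `limit` bytes;
    // `read_to_end` then stops at that limit or at the real end of input,
    // whichever comes first.
    reader.take(limit).read_to_end(&mut buf)?;
    Ok(buf)
}
```

A caller could open a file, call read_prefix(file, 4096), and pass the resulting buffer to get as a &[u8]. Whether 4096 bytes is enough is exactly the part the crate does not currently promise.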

My thought is that it would be much better if we could pass an iterator or a Reader of some kind to get instead, which would let the caller control how much data is kept in memory. Even better, it may be possible (though perhaps not trivial; I cannot say with my current understanding) to attach information such as a "max required size" to each matcher, denoting that the matcher never needs more than X bytes to decide whether the file matches. Obviously this does not work for every file type (although, again, the previous pull request seems to imply it works for nearly all currently supported types), but that can be abstracted behind an enum or another interface for those who want to use the optimization. Perhaps the matcher could be a trait with an associated constant whose default of 0 denotes that the whole file is required?
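To make the idea concrete, here is a rough sketch of what I have in mind. Every name in it (Matcher, MAX_REQUIRED_SIZE, Png, TrailerTagged, matches_reader) is hypothetical and does not exist in the crate today; TrailerTagged is a made-up format used only to show the whole-file default.

```rust
use std::io::{self, Read};

/// Hypothetical trait, not the crate's current API.
trait Matcher {
    /// Upper bound on the bytes needed to decide a match.
    /// The default of 0 means "the whole input is required".
    const MAX_REQUIRED_SIZE: usize = 0;

    fn matches(buf: &[u8]) -> bool;
}

/// PNG files start with a fixed 8-byte signature.
struct Png;

impl Matcher for Png {
    const MAX_REQUIRED_SIZE: usize = 8;

    fn matches(buf: &[u8]) -> bool {
        buf.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A])
    }
}

/// Made-up format identified by a trailing marker, so it keeps the
/// default MAX_REQUIRED_SIZE of 0 and needs the whole input.
struct TrailerTagged;

impl Matcher for TrailerTagged {
    fn matches(buf: &[u8]) -> bool {
        buf.ends_with(b"END")
    }
}

/// Read only as much as the matcher declares it needs, then match.
fn matches_reader<M: Matcher, R: Read>(mut reader: R) -> io::Result<bool> {
    let mut buf = Vec::new();
    if M::MAX_REQUIRED_SIZE == 0 {
        // No bound declared: fall back to reading everything.
        reader.read_to_end(&mut buf)?;
    } else {
        // Bounded: stop after MAX_REQUIRED_SIZE bytes.
        reader
            .take(M::MAX_REQUIRED_SIZE as u64)
            .read_to_end(&mut buf)?;
    }
    Ok(M::matches(&buf))
}
```

With something like this, matches_reader::<Png, _>(file) would read at most 8 bytes, while a matcher that keeps the default would still read the whole input, so existing behaviour would be preserved for formats that need it.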
