Currently infer only seems to offer `infer::get` for a buffer of bytes and `infer::get_from_path` for a path. However, the crates I am working with generally accept an `R: Read` or `R: Read + Seek` type, so you can pass in a `std::io::Cursor`, a `std::fs::File`, or really anything that implements `Read` and perhaps `Seek`.
Another problem is that `infer::get` assumes the user either has the full byte buffer available or somehow knows the worst-case number of bytes required. The first option is not always feasible: in my particular use case, I am working with archives that can easily reach several hundred megabytes or even gigabytes.
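Even a thin adapter over the current API would already help here: read a bounded prefix from any `Read` and hand it to the existing `infer::get`. A minimal sketch of what I mean (the name `get_from_reader` and the 8192-byte cap are my own assumptions, not anything the crate provides; the cap just needs to cover the deepest magic-byte offset infer checks):

```rust
use std::io::Read;

/// Assumed bound on how many bytes any matcher could need.
const PREFIX_LIMIT: u64 = 8192;

/// Hypothetical adapter: reads at most `PREFIX_LIMIT` bytes from any
/// `Read` source and passes that prefix to the buffer-based API.
fn get_from_reader<R: Read>(reader: R) -> std::io::Result<Option<infer::Type>> {
    let mut prefix = Vec::new();
    // `take` caps the read, so a multi-gigabyte archive is never
    // pulled into memory; `read_to_end` stops at the cap or at EOF.
    reader.take(PREFIX_LIMIT).read_to_end(&mut prefix)?;
    Ok(infer::get(&prefix))
}

fn main() -> std::io::Result<()> {
    let file = std::fs::File::open("archive.zip")?; // path is illustrative
    println!("{:?}", get_from_reader(file)?);
    Ok(())
}
```

This still reads a fixed worst-case prefix, of course; it just hides that detail from the caller.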
I think it would be nice to have a function that accepts an `R: Read` and pulls in only as many bytes as it needs to figure out the file type. For this to work efficiently, it would probably be best to construct some sort of finite state machine from the known byte patterns that consumes bytes until it can decide: a ZIP archive would be identified after the first four bytes (`PK\x03\x04`), while a tarball needs 262 bytes to reach the `ustar` magic at offset 257. Alternatively, sorting the patterns by the number of bytes they require and trying them in ascending order achieves much the same effect, as in the sketch below.
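A rough sketch of that ascending-length fallback; the `detect` function, the matcher table, and the two matcher functions are illustrative stand-ins for infer's internals, with the magic numbers taken from the well-known ZIP and ustar signatures:

```rust
use std::io::Read;

type Matcher = fn(&[u8]) -> bool;

// The driver below guarantees `buf` is at least as long as a
// matcher's declared requirement before calling it.
fn is_zip(buf: &[u8]) -> bool {
    buf.starts_with(b"PK")
        && matches!(buf[2], 0x03 | 0x05 | 0x07)
        && matches!(buf[3], 0x04 | 0x06 | 0x08)
}

fn is_tar(buf: &[u8]) -> bool {
    &buf[257..262] == b"ustar"
}

fn detect<R: Read>(mut reader: R) -> std::io::Result<Option<&'static str>> {
    // (bytes required, label, matcher), sorted ascending by requirement,
    // so cheap checks run before the reader has to yield 262 bytes.
    let table: [(usize, &'static str, Matcher); 2] =
        [(4, "zip", is_zip), (262, "tar", is_tar)];
    let mut buf = Vec::new();
    for (need, label, matcher) in table {
        if buf.len() < need {
            // Grow the prefix only as far as the next matcher needs.
            let mut chunk = vec![0u8; need - buf.len()];
            if reader.read_exact(&mut chunk).is_err() {
                break; // EOF before `need` bytes: no later matcher can fire
            }
            buf.extend_from_slice(&chunk);
        }
        if matcher(&buf) {
            return Ok(Some(label));
        }
    }
    Ok(None)
}
```

With the table sorted ascending, a ZIP is recognized after reading four bytes and the reader is never asked for more than the tarball's 262. A real implementation would presumably derive the table from infer's registered matchers rather than hard-coding it.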