Improve file type detection using headers and ignoring URL arguments #2856
🚀 🚀 Pull Request
Impact
Description
Previously, we just looked at the end of the file string to determine the filetype in a
ds.tensor_name.append(deeplake.read(...))
call. For data read from URLs with query parameters, the end of the path string is not necessarily the extension. More generally, a URL path does not have to include an extension at all.
This PR strips any query parameters from the path before checking for a recognized extension, and also uses header data to detect more file types than just the images we handled previously.
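As a rough illustration of the two steps described above, here is a minimal sketch. It is not the code in this PR: the function names (`extension_from_path`, `format_from_header`, `detect_file_type`) and the two lookup tables are hypothetical, and "header data" is assumed here to mean the leading bytes of the file.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, deliberately small tables for illustration only;
# the lists actually used in the PR may differ.
KNOWN_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "wav", "mp4"}
MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def extension_from_path(path: str) -> Optional[str]:
    """Return a recognized extension, ignoring any URL query or fragment."""
    # urlsplit separates "?query" and "#fragment" from the path component.
    url_path = urlsplit(path).path
    suffix = PurePosixPath(url_path).suffix.lower().lstrip(".")
    return suffix if suffix in KNOWN_EXTENSIONS else None


def format_from_header(header: bytes) -> Optional[str]:
    """Guess the format from the first bytes of the file, if recognizable."""
    for prefix, name in MAGIC_PREFIXES:
        if header.startswith(prefix):
            return name
    # WAV files are RIFF containers with "WAVE" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return None


def detect_file_type(path: str, header: bytes = b"") -> Optional[str]:
    """Try the extension first, then fall back to the header bytes."""
    return extension_from_path(path) or format_from_header(header)
```

With this sketch, a URL such as `https://example.com/img/cat.png?token=abc` resolves by extension, while an extensionless URL such as `https://example.com/download?id=42` falls through to the header check.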
Things to be aware of
Not all file types can be detected from headers, but most now are.