Improve file type detection using headers and ignoring URL arguments #2856

Merged
merged 5 commits into from
May 15, 2024

nvoxland-al
Contributor

🚀 🚀 Pull Request

Impact

  • Bug fix (non-breaking change which fixes expected existing functionality)
  • Enhancement/New feature (adds functionality without impacting existing logic)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

Previously, we only looked at the end of the file path string to determine the file type in a ds.tensor_name.append(deeplake.read(...)) call.

For data read from URLs with query parameters, the end of the string was not necessarily the extension. More generally, a URL path does not have to include an extension at all.

This PR removes any query parameters from the path before checking for a recognized extension, and also inspects header data for more file types than just the images we checked previously.
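As a rough illustration of the two steps described above, here is a minimal sketch, not the code from this PR. The helper names (extension_from_path, type_from_header, detect_file_type) and the extension and signature tables are hypothetical and chosen only for the example; the actual lists in deeplake may differ. It assumes the first bytes of the file are already available as a bytes object.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical tables for illustration only; not deeplake's actual lists.
KNOWN_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "wav", "mp3", "flac", "pdf"}
MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"fLaC", "flac"),
    (b"ID3", "mp3"),  # MP3 files that start with an ID3v2 tag
    (b"%PDF-", "pdf"),
]


def extension_from_path(path: str) -> Optional[str]:
    """Return a recognized extension, ignoring any query string or fragment."""
    # urlsplit separates "?query" and "#fragment" from the path component.
    clean_path = urlsplit(path).path
    ext = posixpath.splitext(clean_path)[1].lstrip(".").lower()
    return ext if ext in KNOWN_EXTENSIONS else None


def type_from_header(header: bytes) -> Optional[str]:
    """Guess the file type from the leading bytes of the file, if possible."""
    for prefix, file_type in MAGIC_PREFIXES:
        if header.startswith(prefix):
            return file_type
    # WAV is a RIFF container: "RIFF", 4 size bytes, then "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return None


def detect_file_type(path: str, header: bytes = b"") -> Optional[str]:
    """Try the cleaned path first, then fall back to the header bytes."""
    return extension_from_path(path) or type_from_header(header)
```

With this sketch, a URL such as https://example.com/images/cat.jpg?size=large resolves through its path, while https://example.com/download?id=42 has no usable extension and falls back to the header bytes. One caveat of this approach: a local filename that itself contains ? or # would be truncated by urlsplit, so a real implementation may want to treat local paths separately.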

Things to be aware of

Not all file types can be detected from headers, but most now are.

sonarcloud bot commented May 15, 2024

@nvoxland-al nvoxland-al merged commit ef5fcd3 into main May 15, 2024
8 of 10 checks passed
@nvoxland-al nvoxland-al deleted the detect_headers branch May 15, 2024 22:11