Enhancement: improve detect_filetype warning to include filename #355

tomaarsen
Contributor

Closes #208

Hello!

Pull Request overview

  • Improved the detect_filetype warning to include the filename, if one is provided.

Details

See #208 for details on the desired improvement.

Reproduction

Consider the following script to reproduce the warning:

from unstructured.file_utils.filetype import detect_filetype

print(detect_filetype(filename="example-docs/fake-email-header.eml"))

I've used the already-available example-docs/fake-email-header.eml because libmagic reports a MIME type for it (message/rfc822) that detect_filetype does not map to a supported file type. This makes it a good target file for testing these changes.

On the main branch, the output of this script is:

MIME type was message/rfc822. This file type is not currently supported in unstructured.
FileType.UNK

After this PR, the output will be:

The MIME type of 'example-docs/fake-email-header.eml' is 'message/rfc822'. This file type is not currently supported in unstructured.
FileType.UNK
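
For illustration, the change in wording can be sketched as below. This is a minimal sketch, not the code merged in this PR: the helper name `build_unsupported_mime_message` is hypothetical, and the wording used when no filename is given is an assumption, since only the with-filename output is shown above.

```python
from typing import Optional


def build_unsupported_mime_message(
    mime_type: str, filename: Optional[str] = None
) -> str:
    """Hypothetical helper (not part of unstructured) that builds the warning text."""
    if filename:
        # Wording matches the "after this PR" output shown above.
        subject = f"The MIME type of '{filename}' is '{mime_type}'."
    else:
        # Assumed fallback wording when no filename is available.
        subject = f"The MIME type is '{mime_type}'."
    return f"{subject} This file type is not currently supported in unstructured."


if __name__ == "__main__":
    print(
        build_unsupported_mime_message(
            "message/rfc822", "example-docs/fake-email-header.eml"
        )
    )
```

Keeping the filename optional means a caller that only has a file-like object, and no path, still gets a usable warning.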

I'm open to feedback, as always. And feel free to let me know if I need to handle any merge conflicts.

- Tom Aarsen

Contributor

MthwRobinson left a comment

LGTM, thanks!!

MthwRobinson merged commit 3d21b40 into Unstructured-IO:main on Mar 10, 2023
tomaarsen deleted the enhancement/improve_detect_filetype_warning branch on March 10, 2023 17:32
Successfully merging this pull request may close these issues.

More meaningful warnings for unknown filetypes in detect_filetype