refactor: Introduce 'exactly_one' to simplify partitioning functions #343
qued
left a comment
Added some comments and suggestions. I did some digging into what it would take for the new function to do the type verification in a way mypy would accept. I didn't come up with anything conclusive, but I got the impression there might be a more Pythonic way of doing this using `TypeGuard`s or overloads. That said, I'm OK with the current implementation, given some changes.
To use **kwargs
In favor of unreachable Exceptions, as discussed in the review comments
…into refactor/partition_utils
@qued Thank you for the very detailed review! I love those. I've tackled each of the concerns in separate commits, so they can be verified more easily.
qued
left a comment
LGTM!
Hello!
Pull Request overview
Introduce the `exactly_one` function to simplify the partitioning functions.
Details
I noticed that many partitioning functions follow this outline:
unstructured/unstructured/partition/html.py
Lines 40 to 71 in 2979e17
I.e., they first check whether a non-zero number of arguments is used, then run through several very large conditionals ensuring that only one is used, and end with a final else-branch that raises virtually the same error as before, but now stating that at most one argument may be used.
I figured this could easily be simplified by introducing a function that verifies that exactly one of the arguments is non-None. This function is called once, and if it doesn't raise an error, the partitioning function can assume that only one of the arguments is used, which allows the conditionals to be simplified (e.g. no more `if file is not None and not filename and not text and not url`, but just `if file`).
Just like before, if a method is called incorrectly, an error is raised:
But the partitioning functions themselves are simplified a lot.
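For readers without the diff at hand, here is a minimal sketch of what such a helper could look like. It is illustrative only and not necessarily the code in this PR: the error wording, the `**kwargs` signature details, and the `partition_example` function are assumptions made for the example.

```python
from typing import Any, Optional


def exactly_one(**kwargs: Any) -> None:
    """Raise ValueError unless exactly one keyword argument is not None.

    Sketch only: the message format is an assumption, not the PR's wording.
    """
    provided = [name for name, value in kwargs.items() if value is not None]
    if len(provided) != 1:
        names = ", ".join(kwargs)
        raise ValueError(
            f"Exactly one of {names} must be specified, got {len(provided)}."
        )


def partition_example(
    filename: Optional[str] = None,
    file: Optional[Any] = None,
    text: Optional[str] = None,
) -> str:
    """Hypothetical partitioning function, not one from the repository."""
    # One up-front check replaces the long mutually exclusive conditionals.
    exactly_one(filename=filename, file=file, text=text)

    if filename is not None:
        return f"filename:{filename}"
    if file is not None:
        return "file"
    # Only `text` can be set here, because exactly_one() passed.
    return f"text:{text}"
```

Calling `partition_example(text="hi")` takes the last branch, while `partition_example()` or `partition_example(filename="a.html", text="hi")` raises `ValueError` before any branch is evaluated.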
Let me know if you need anything else from me at this point.