Updated setup.cfg mypy flags and resolved related errors. #703

Merged on Nov 9, 2022 (31 commits)

@Sanketh7 commented Nov 1, 2022

Added the following flags to setup.cfg. To test these changes locally, run pre-commit install and then pre-commit run -a.

[mypy]
warn_return_any = True
warn_unused_configs = True
ignore_missing_imports = True

All of the warnings that needed to be resolved were because of warn_return_any. Some of my solutions are more elegant than others so let me know if you find a better way.
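For context, here is a minimal sketch of the kind of code warn_return_any reports and two possible ways to resolve it. It uses json.loads, which the standard library stubs annotate as returning Any, as a stand-in for the calls touched in this PR. All three function names are made up for illustration and do not come from dataprofiler.

```python
import json
from typing import Dict, cast


def load_counts_flagged(text: str) -> Dict[str, int]:
    # json.loads is annotated as returning Any, so with warn_return_any = True
    # mypy warns that Any is returned from a function declared to return a dict.
    return json.loads(text)


def load_counts_cast(text: str) -> Dict[str, int]:
    # cast() only informs the type checker; it performs no runtime validation.
    return cast(Dict[str, int], json.loads(text))


def load_counts_checked(text: str) -> Dict[str, int]:
    # Alternative: validate and rebuild the value so no Any escapes.
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise TypeError("expected a JSON object")
    return {str(key): int(value) for key, value in raw.items()}
```

The cast version is the shortest change, while the checked version trades a few extra lines for a runtime guarantee.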

@taylorfturner added the Bug, static_typing, and High Priority labels Nov 1, 2022
@taylorfturner

@Sanketh7 update the branch on this to make sure the tests are still passing

@taylorfturner left a comment

Just questions. I'll also test on my end whether np.nan is seen as a float instead of Any.

I also need to review all the cast() statements more closely.

dataprofiler/data_readers/json_data.py (resolved)
dataprofiler/data_readers/json_data.py (resolved)
dataprofiler/data_readers/json_data.py (resolved)
dataprofiler/data_readers/structured_mixins.py (outdated, resolved)
dataprofiler/labelers/base_data_labeler.py (resolved)
dataprofiler/data_readers/avro_data.py (resolved)
dataprofiler/profilers/float_column_profile.py (outdated, resolved)
dataprofiler/profilers/histogram_utils.py (outdated, resolved)
taylorfturner commented Nov 2, 2022

@Sanketh7 update the branch; there may be a couple of new fixes needed too. My PSI PR (#688) made a bunch of additions to numerical_column_stats.py

@taylorfturner removed the Bug label Nov 2, 2022
@taylorfturner left a comment

comments

dataprofiler/data_readers/structured_mixins.py (outdated, resolved)
dataprofiler/profilers/float_column_profile.py (outdated, resolved)
dataprofiler/profilers/float_column_profile.py (outdated, resolved)
@taylorfturner enabled auto-merge (squash) November 2, 2022 22:03
auto-merge was automatically disabled November 2, 2022 22:08

Head branch was pushed to by a user without write access

@taylorfturner self-requested a review November 2, 2022 22:09
  )
  bin_counts_impose = bin_counts_impose_pos + bin_counts_impose_neg

  median_inds = np.abs(bin_counts_impose - 0.5) < 1e-10
  if np.sum(median_inds) > 1:
-     return np.mean(bin_edges_impose[median_inds])
+     return cast(float, np.mean(bin_edges_impose[median_inds]))
Collaborator

np.mean returns an np.float64 here, so this would be a float64 rather than a built-in float.
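As background for the cast() in this hunk: typing.cast changes only what the type checker sees and never converts the value. A small sketch, using fractions.Fraction as a stand-in for a non-float numeric scalar so that it runs without NumPy:

```python
from fractions import Fraction
from typing import cast

value = Fraction(1, 2)  # stand-in for a non-float numeric scalar

# cast() returns its argument unchanged; only the static type differs.
still_fraction = cast(float, value)

# float() performs an actual conversion to a built-in float.
converted = float(value)

print(type(still_fraction).__name__, type(converted).__name__)
```

Since np.float64 subclasses Python's float, the cast in the diff should be harmless at runtime. Wrapping the result in float(...) would be the option if a plain built-in float were required.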

Contributor Author

I'm getting a similar issue as before:

"median_absolute_deviation": utils.find_diff_of_numbers(
    self.median_abs_deviation, other_profile.median_abs_deviation
),

dataprofiler/profilers/numerical_column_stats.py:381: error: Value of type variable "T" of "find_diff_of_numbers" cannot be "object"

Looking into it more, I believe it's because we're passing a Union where a TypeVar with bound=Subtractable is expected, and the Union itself does not satisfy the Subtractable protocol. So far the only reasonable solution I can think of is:

"median_absolute_deviation": utils.find_diff_of_numbers(
    cast(float, self.median_abs_deviation), cast(float, other_profile.median_abs_deviation)
),

which isn't ideal. I'll look into mypy generics more to see if there's a workaround.
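To make the setup concrete, here is a rough sketch of a protocol-bound TypeVar together with the cast workaround described above. The body of find_diff_of_numbers is a simplified stand-in (the real helper may handle more cases), and Union[int, float] stands in for the Union arguments in the PR. Whether mypy rejects this exact stand-in without the casts has not been verified.

```python
from typing import Any, Protocol, TypeVar, Union, cast

T = TypeVar("T", bound="Subtractable")


class Subtractable(Protocol):
    """Structural type for anything that supports `a - b`."""

    def __sub__(self: T, other: T) -> Any:
        ...


def find_diff_of_numbers(stat1: T, stat2: T) -> Any:
    # Simplified stand-in: just subtract the two values.
    return stat1 - stat2


left: Union[int, float] = 3.5
right: Union[int, float] = 1.5

# Workaround from the comment: cast each Union argument to one concrete type
# so the checker can pick a single T for both parameters.
diff = find_diff_of_numbers(cast(float, left), cast(float, right))
```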

Contributor Author

Fixed. See #703 (comment)

      cls, clsname, bases, attrs
  )
- new_class._register_subclass()
+ new_class._register_subclass()  # type: ignore
Collaborator

I'm not 100% positive, but I think the bound above should be BaseModel, which would fix this.

We should really try to avoid band-aid fixes like # type: ignore unless absolutely necessary.

Contributor

@Sanketh7 can we get rid of this # type: ignore?

Contributor Author
@Sanketh7 Nov 8, 2022

Changing the TypeVar bound to BaseModel didn't end up fixing it, because mypy doesn't detect the relationship between BaseModel and AutoSubRegistrationMeta (this caused super(AutoSubRegistrationMeta, cls) to error).

Collaborator

This has some interesting suggestions:
https://stackoverflow.com/questions/66121127/calling-new-on-an-any

However, if we can't do this, we should ultimately cast the output to some type derived from BaseModel. Returning it as AutoSubRegistrationMeta is not truly what matters, since it is just a mixin.

Contributor Author

https://stackoverflow.com/questions/63054541/how-to-type-the-new-method-in-a-python-metaclass-so-that-mypy-is-happy

This answer looks similar to what we're doing but they still end up doing a cast at the end. They also have a return type of the metaclass and not the derived class.
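For illustration only, one possible shape of the cast-based approach discussed in this thread is sketched below. The class bodies are heavily simplified assumptions: the _registry attribute and CharacterModel subclass are hypothetical, and the real AutoSubRegistrationMeta and BaseModel in dataprofiler may differ. The sketch is intended to avoid the # type: ignore, but it has not been checked against the project's mypy configuration.

```python
from abc import ABCMeta
from typing import Any, Dict, Tuple, Type, cast


class AutoSubRegistrationMeta(ABCMeta):
    """Metaclass that registers every class it creates (simplified)."""

    def __new__(
        mcs, clsname: str, bases: Tuple[type, ...], attrs: Dict[str, Any]
    ) -> "AutoSubRegistrationMeta":
        new_class = super().__new__(mcs, clsname, bases, attrs)
        # Tell the checker the new class exposes BaseModel's API instead of
        # silencing the call with "# type: ignore".
        cast("Type[BaseModel]", new_class)._register_subclass()
        return new_class


class BaseModel(metaclass=AutoSubRegistrationMeta):
    # Hypothetical registry attribute, named for this sketch only.
    _registry: Dict[str, Type["BaseModel"]] = {}

    @classmethod
    def _register_subclass(cls) -> None:
        cls._registry[cls.__name__.lower()] = cls


class CharacterModel(BaseModel):
    """Example subclass; it is registered as soon as it is defined."""
```

As in the linked answer, the return annotation here is still the metaclass rather than the derived class, so this does not settle the question of what __new__ should be declared to return.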

Contributor

To be clear, I think that should go in a separate PR if we have to add anything to further validate. For now, I think the current implementation is fine for getting this PR merged.

Collaborator

Not that it should be BaseModel; ultimately, T should be identified as T.

Contributor

+1

Contributor

@Sanketh7, to get the rest of this merged, I'd recommend reverting this change for the AutoSubRegistration and getting it to pass the mypy checks. Then, in a follow-up PR, we can resolve this specific issue more fully.

Contributor Author

Reverted these changes so we can tackle it in another PR.

@@ -1842,7 +1857,7 @@ def is_int(x: str) -> bool:
      return a == b

  @staticmethod
- def np_type_to_type(val: Any) -> Union[int, float]:
+ def np_type_to_type(val: Any) -> Union[int, float, Any]:
Collaborator

If this can be Any, we technically don't need the Union, right?

Collaborator

However, will this then error because we are using Any?

Contributor Author

I guess the Union was redundant. Changing to Any actually doesn't cause issues with our mypy flag warn_return_any = True because that only complains when we return Any and the function return type is not Any.
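A small sketch of the distinction described above, with hypothetical function names that are not taken from dataprofiler. The first function is declared to return Any and so gives warn_return_any nothing to report; the second shows the flagged pattern; the third is one possible alternative that keeps a narrow return type by checking the value at runtime.

```python
from typing import Any, Union


def as_declared_any(val: Any) -> Any:
    # Return type is Any, so warn_return_any has nothing to report.
    return val


def as_narrow_flagged(val: Any) -> Union[int, float]:
    # Returns Any from a function declared to return Union[int, float];
    # this is the pattern warn_return_any reports.
    return val


def as_narrow_checked(val: Any) -> Union[int, float]:
    # isinstance narrows Any, so the returned value has a concrete type.
    if isinstance(val, (int, float)):
        return val
    raise TypeError(f"unsupported type: {type(val).__name__}")
```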

Contributor

nice

@@ -412,7 +413,19 @@ def __sub__(self: T, other: T) -> Any:
T = TypeVar("T", bound=Subtractable)


@overload
def find_diff_of_numbers(
Contributor Author

Ended up going with an overload here. The issue is that there are a couple cases where we pass in a Union[float, np.float64] to find_diff_of_numbers and Unions don't satisfy the typevar bound.
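A rough sketch of how such an overload pair might look is shown here. The signatures are assumptions rather than the merged code: Union[int, float] stands in for the Union[float, np.float64] case so the example runs without NumPy, and the implementation body is simplified to a plain subtraction.

```python
from datetime import timedelta
from typing import Any, Protocol, TypeVar, Union, overload

T = TypeVar("T", bound="Subtractable")


class Subtractable(Protocol):
    """Structural type for anything that supports `a - b`."""

    def __sub__(self: T, other: T) -> Any:
        ...


@overload
def find_diff_of_numbers(
    stat1: Union[int, float], stat2: Union[int, float]
) -> Any:
    ...


@overload
def find_diff_of_numbers(stat1: T, stat2: T) -> Any:
    ...


def find_diff_of_numbers(stat1: Any, stat2: Any) -> Any:
    # Single runtime implementation shared by both overloads (simplified).
    return stat1 - stat2


numeric_diff = find_diff_of_numbers(5, 2.5)
generic_diff = find_diff_of_numbers(timedelta(days=3), timedelta(days=1))
```

The first overload accepts the Union directly, so callers no longer need a cast, while the second keeps the generic path for other subtractable types such as timedelta.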

Contributor

nice -- I would recommend formatting slightly different based on the docs I was seeing here, but nice improvement

Contributor

actually, disregard this. I'm seeing it both ways

Contributor Author

I tried to format it like that but the pre-commit formatter would add the extra new lines.

Collaborator

Hmm, an object shouldn't be cast to Union[float, np.float64]; rather, a function has the option to return one or the other.

I wonder if this means we typed those returns wrong and should have used overloads on those functions, returning float in one overload and np.float64 in the other.

Contributor Author

The main reason we have the float case is because np.nan is float and some of our methods could return np.nan if certain conditions aren't met. From what I can see, a lot of these conditions can't be solved by typing overloads (for example, some are based on class fields and not method parameters).

At least for finding the diff between 2 Union[float, np.float64], we shouldn't run into issues because float gets promoted to float64 automatically.
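To illustrate why overloads do not help in the state-dependent case, here is a hypothetical sketch. NumericColumn is invented for this example and is not a dataprofiler class; math.nan stands in for np.nan, which is likewise a plain Python float. In the PR, the non-nan branch would return an np.float64, which is where the Union comes from.

```python
import math
import statistics
from typing import List


class NumericColumn:
    """Hypothetical profiler-like class; not part of dataprofiler."""

    def __init__(self, values: List[float]) -> None:
        self.values = values

    @property
    def median(self) -> float:
        # The nan branch depends on instance state, not on argument types,
        # so @overload signatures cannot distinguish the two outcomes.
        if not self.values:
            return math.nan
        return statistics.median(self.values)


empty_median = NumericColumn([]).median
filled_median = NumericColumn([1.0, 2.0, 4.0]).median
```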

dataprofiler/profilers/numerical_column_stats.py (outdated, resolved)
@taylorfturner enabled auto-merge (squash) November 9, 2022 16:03
@taylorfturner previously approved these changes Nov 9, 2022
@taylorfturner left a comment

LGTM -- let's create a follow-up issue for more fully resolving some of these comments in a subsequent, smaller PR

auto-merge was automatically disabled November 9, 2022 20:24

Head branch was pushed to by a user without write access

@taylorfturner enabled auto-merge (squash) November 9, 2022 20:27
@taylorfturner previously approved these changes Nov 9, 2022
@taylorfturner left a comment

follow-up PRs as discussed

@JGSweets JGSweets merged commit 5cf3784 into capitalone:main Nov 9, 2022
Labels: High Priority, static_typing
4 participants