Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential new check: compare_with_value() #69

Closed
emmaKts opened this issue Feb 14, 2020 · 6 comments
Closed

Potential new check: compare_with_value() #69

emmaKts opened this issue Feb 14, 2020 · 6 comments
Labels
question Further information is requested wontfix This will not be worked on

Comments

@emmaKts
Copy link
Contributor

emmaKts commented Feb 14, 2020

Maybe consider the following?

Check a column in a dataframe to make sure all values are greater/lower/equal/ge/le/ than a specific value(not a range).

import operator

def compare_with_value(df, columns, value, which_operator):
    if any(which_operator(column_value, value) for column_value in df[columns].values):       
        raise AssertionError
    return df

Used with a MultiCheck decorator:

@dc.MultiCheck(checks={ck.has_no_nans: {"columns": "test_column"},
                       compare_with_value: {"columns": "test_column",
                                            "which_operator": operator.le,
                                            "value": 10.0
                                            }
                       }
               )

Also, in case it helps anyone:
I tried has_vals_within_range() following this format (i.e. as has_no_nans()):

ck.has_vals_within_range: {"columns": "test_column"}

but it seems that it needs an items element instead (see below).

 ck.has_vals_within_range: {"items": {"test_column": (2, 50)}}

I am just mentioning here as it's minor, but maybe consider renaming "items" to "columns" just for consistency?

@ZaxR
Copy link
Owner

ZaxR commented Feb 14, 2020

Thanks for the issue!

In this case, I'd suggest using ck.has_vals_within_range with {"test_column": (-np.inf, 10)} to check that all values are less than or equal to 10, but I did have a similar idea to the check you're describing awhile back. At the time the feedback from other users was that it wasn't "important" enough functionality to include, since it's easy enough to write a custom_check for.

Re: the has_vals_within_range example, it would be awesome if you would be willing to contribute that example in the docstring, so it will be available in the documentation. Here's a link to where it could be added in the code: https://github.com/ZaxR/bulwark/blob/master/bulwark/checks.py#L366

Re: the argument name items, that is a legacy name from the library this one is meant to replace, but is consistent across checks that take mappings (as opposed to lists, like when the "columns" argument name is used). I also find the name a bit confusing, though, so open to ideas to rename those params in the next major version release.

@emmaKts
Copy link
Contributor Author

emmaKts commented Feb 17, 2020

Thanks for your quick reply!

With regards to your:

1st comment - Yes, I thought of doing that but I had another case where I wanted to check that the column values were less than 10.0 (not equal) so it would be something like {"test_column": (-np.inf, 9.99)}, but it is still possible and easy with a custom check, so not really that important.

2nd comment - I can do that, sure. The example above shows how to use has_vals_within_range() with a MultiCheck decorator, so just to make sure, is that the example you think would be useful to add in the docstring?

3rd comment - That makes sense and -again- not a big issue, I just thought to mention. If we add the examples then it might be even not needed.

@ZaxR
Copy link
Owner

ZaxR commented Feb 17, 2020

Re: 2: It's on the to-do list to have decorator versions of checks inherit the docstring of the check (Issue #3 ). The code is set up to only easily allow docstrings for checks as a result. I was thinking that writing a function example would be sufficient, and that the value of the examples is showing the parameter use as opposed to general decorator syntax, but I'd be interested in your feedback.

Re:3: I think it's a good call. I want to make the library as intuitive as possible. As nice as good documentation is, intuitive use is equally/more important IMO.

@emmaKts
Copy link
Contributor Author

emmaKts commented May 22, 2020

Apologies for the delay,

I've now created a PR for this. There's an issue with sphinx as described here which is causing test failures.

@ZaxR
Copy link
Owner

ZaxR commented May 30, 2020

Thanks! I'll see what's going on with the docs on python 3.8 and then review this PR.

@ZaxR
Copy link
Owner

ZaxR commented May 30, 2020

Looks like it's an issue with m2r re: Sphinx 3.x.x. There's a PR out to m2r, but seems unlikely it's going to get merged, so pinning this project's Sphinx to 2.x.x for now to avoid the issue. I just merged #79 so we can check tests for your PR now

@ZaxR ZaxR added question Further information is requested wontfix This will not be worked on labels May 30, 2020
@ZaxR ZaxR closed this as completed May 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

2 participants