Port over highly-null Data Check and define BasicDataChecks and DisableDataChecks classes #745
Let's update this description. It doesn't mention what the percentage applies to. In fact, I wonder if we should rename this parameter. Perhaps `pct_null_threshold`?
I like `pct_null_threshold`! I'm not sure what you mean by "doesn't mention what the percentage applies to"; is the input feature not clear? :o
@dsherry I just changed it to:
Is that better?
👍 yeah thanks!
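To make concrete what the percentage applies to, here is a minimal sketch, assuming the check computes each input feature's (column's) fraction of null values and flags columns at or above `pct_null_threshold`. The helper name `highly_null_columns`, the default of 0.95, the `>=` comparison, and the coercion of non-DataFrame input are all assumptions for illustration, not this PR's actual implementation.

```python
import pandas as pd


def highly_null_columns(X, pct_null_threshold=0.95):
    """Hypothetical helper: map each column whose fraction of null values is
    at or above pct_null_threshold to that fraction."""
    if not 0 <= pct_null_threshold <= 1:
        raise ValueError("pct_null_threshold must be between 0 and 1, inclusive")
    # Assumption: non-DataFrame inputs (lists, arrays, Series) are coerced first.
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)
    # The percentage applies per input feature: null rows in the column / total rows.
    pct_null = X.isnull().mean()
    return {col: float(pct) for col, pct in pct_null.items() if pct >= pct_null_threshold}


X = pd.DataFrame({"a": [None, None, None, 1], "b": [1, 2, 3, None], "c": [None] * 4})
print(highly_null_columns(X, pct_null_threshold=0.5))  # {'a': 0.75, 'c': 1.0}
```

Column `b` is only 25% null, so it falls below the 0.5 threshold and is not reported.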
Cool, I think that's the last comment to address on this PR, so I'll merge it when tests are green 👍
What does this output for inputs which aren't `pd.DataFrame`, or for dataframes which don't have column names set? If a dataframe's column names aren't set, `pd.DataFrame.to_dict` will use the dataframe index. I see we have coverage for this in `test_highly_null_data_check_input_formats`. I guess there's not much we can do about that, haha.

Relatedly, in the future we'll probably want each data check to have its own message type. For instance, if we had `HighlyNullColumnWarning`, we could have that add a `column_name` parameter as metadata. I don't think we should add that now, but I bet we'll need that later.
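A small sketch of the labeling behavior discussed here, assuming the check coerces its input with `pd.DataFrame(...)` and keys its results by column label. `null_fraction_by_column` is a hypothetical helper, not code from this PR; it only shows which keys pandas produces when column names are or aren't set.

```python
import pandas as pd


def null_fraction_by_column(data):
    """Hypothetical helper: fraction of nulls per column, keyed by whatever
    column labels pandas assigns."""
    # Assumption: the check coerces non-DataFrame input with pd.DataFrame(...).
    X = pd.DataFrame(data)
    return X.isnull().mean().to_dict()


# Named columns: keys are the column names.
named = null_fraction_by_column(pd.DataFrame({"feature": [None, 1]}))
# No column names: pandas falls back to default integer labels (0, 1, ...).
unnamed_2d = null_fraction_by_column([[None, 1], [None, 2]])
# An unnamed Series (or a 1D list) becomes a single column labeled 0.
unnamed_series = null_fraction_by_column(pd.Series([None, None, 3]))
```

For 2D data the integer keys still line up with column positions; for a Series or list, the single `0` key is the slightly odd case.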
Yeah, right now it'd use the dataframe index, which I don't think is too bad an idea? I think it's pretty nice for 2D data, but maybe a little weird for lists / Series.
And yeah, I think the idea of each data check having its own message type was something we'd talked about during the design phase, but it seemed excessive / unnecessary for now.
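If per-check message types are added later, one possible shape is a warning subclass that carries the flagged column as structured metadata. Everything below is hypothetical: the `DataCheckMessage` / `DataCheckWarning` stand-ins, the `HighlyNullColumnWarning` fields, the `HighlyNullDataCheck` name, and the message wording are illustrative assumptions, not the library's API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DataCheckMessage:
    """Hypothetical stand-in for a generic data check message."""
    message: str
    data_check_name: str


@dataclass(frozen=True)
class DataCheckWarning(DataCheckMessage):
    """Hypothetical generic warning type."""


@dataclass(frozen=True)
class HighlyNullColumnWarning(DataCheckWarning):
    """Hypothetical check-specific warning with the offending column as metadata."""
    column_name: Any = None
    pct_null: float = 0.0


def warnings_from_null_fractions(pct_null_by_column, pct_null_threshold):
    """Build one hypothetical warning per column at or above the threshold."""
    return [
        HighlyNullColumnWarning(
            message=f"Column '{col}' is {pct:.0%} null",
            data_check_name="HighlyNullDataCheck",  # assumed name, for illustration
            column_name=col,
            pct_null=pct,
        )
        for col, pct in pct_null_by_column.items()
        if pct >= pct_null_threshold
    ]
```

The benefit of the dedicated type is that downstream code can read `column_name` directly instead of parsing it out of the message string, while `isinstance(w, DataCheckWarning)` checks keep working.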