-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add integration tests for end to end flow for data checks --> data check actions #2883
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2883 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 309 310 +1
Lines 29399 29495 +96
=======================================
+ Hits 29308 29404 +96
Misses 91 91
Continue to review full report at Codecov.
|
…2815_dc_integration
@@ -381,16 +381,16 @@ def _make_component_list_from_actions(actions): | |||
cols_to_drop = [] | |||
for action in actions: | |||
if action.action_code == DataCheckActionCode.DROP_COL: | |||
cols_to_drop.append(action.metadata["column"]) | |||
cols_to_drop.extend(action.metadata["columns"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleanup after the standardization in #2869
fail-fast: false | ||
matrix: | ||
include: | ||
- python_version: "3.8" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to run just one python version for now, but curious if there are other opinions!
make installdeps | ||
make installdeps-test | ||
pip freeze | ||
- name: Erase Coverage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, surprised I can call coverage erase
here. I thought the two workflows would step on each others' toes and we'd end up with a subpar coverage score, but I guess not 🤷♀️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Angela, this looks good to me! This will lead to some difficult questions about what constitutes an integration tests and how we handle all of them, but I think this is a good starting point.
} | ||
) | ||
for component in action_components: | ||
X_t = component.fit_transform(X_t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test highlights how nicely DataCheck/Actions fits into the fit/transform syntax. Very nicely done. I like this, very intuitive.
X_t = pd.DataFrame(data=data) | ||
X_t.iloc[0, 3] = 1000 | ||
X_t.iloc[3, 25] = 1000 | ||
X_t.iloc[5, 55] = 10000 | ||
X_t.iloc[10, 72] = -1000 | ||
X_t.iloc[:, 90] = "string_values" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but can we not just use X? X isn't changing is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I think you're right--I had made a separate variable in case we were changing X somewhere / wanted to check the original value, but that was probably unnecessarily cautious :). Removing!
Closes #2815, closes #1998.
Along the way, I noticed that our data checks return dictionary forms of the data check messages / actions (since those are more easily serializable), but
_make_component_list_from_actions
takes in actual data check action objects. Added a helper method to go from one to another. I also noticed it's not super easy to use a list of components, so going to work on #2968 sooner rather than later to make transformations easier :)I think it's fine to just one python version for now and linux only, but curious if there are other opinions!