Add integration tests for end to end flow for data checks --> data check actions #2883

Merged
merged 36 commits into main from 2815_dc_integration on Nov 2, 2021

Conversation

angela97lin
Contributor

@angela97lin angela97lin commented Oct 7, 2021

Closes #2815, closes #1998.

Along the way, I noticed that our data checks return dictionary forms of the data check messages / actions (since those are more easily serializable), but _make_component_list_from_actions takes in actual data check action objects, so I added a helper method to convert from one to the other. I also noticed that a list of components isn't very easy to use, so I'm going to work on #2968 sooner rather than later to make transformations easier :)

I think it's fine to run just one Python version for now, and on Linux only, but I'm curious if there are other opinions!
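
The dictionary-to-object conversion described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the helper added in this PR: `ActionCode`, `Action`, and the `"code"` / `"metadata"` dictionary keys are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionCode(Enum):
    """Hypothetical stand-in for an action-code enum."""

    DROP_COL = "drop_col"
    DROP_ROWS = "drop_rows"


@dataclass
class Action:
    """Hypothetical stand-in for a data check action object."""

    action_code: ActionCode
    metadata: dict = field(default_factory=dict)

    def to_dict(self):
        # Serializable form: the enum member is stored by name.
        return {"code": self.action_code.name, "metadata": self.metadata}

    @staticmethod
    def from_dict(action_dict):
        # Look the enum member up by name and rebuild the object.
        # The key names here are assumptions made for this sketch.
        return Action(
            action_code=ActionCode[action_dict["code"]],
            metadata=dict(action_dict.get("metadata", {})),
        )


action_dict = {"code": "DROP_COL", "metadata": {"columns": ["id"]}}
action = Action.from_dict(action_dict)
```

Round-tripping through `to_dict()` and `from_dict()` should give back an equal object, which is a convenient property to assert in a test.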

@angela97lin angela97lin self-assigned this Oct 7, 2021

codecov bot commented Oct 7, 2021

Codecov Report

Merging #2883 (55a7474) into main (b7b64b9) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2883     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        309     310      +1     
  Lines      29399   29495     +96     
=======================================
+ Hits       29308   29404     +96     
  Misses        91      91             
Impacted Files                                           Coverage Δ
evalml/data_checks/invalid_targets_data_check.py         100.0% <ø> (ø)
...alml/data_checks/target_distribution_data_check.py    100.0% <ø> (ø)
evalml/data_checks/data_check_action.py                  100.0% <100.0%> (ø)
evalml/data_checks/data_check_action_code.py             100.0% <100.0%> (ø)
evalml/pipelines/utils.py                                99.5% <100.0%> (ø)
.../tests/data_checks_tests/test_data_check_action.py    100.0% <100.0%> (ø)
..._tests/test_data_checks_and_actions_integration.py    100.0% <100.0%> (ø)
evalml/tests/pipeline_tests/test_pipeline_utils.py       99.7% <100.0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -381,16 +381,16 @@ def _make_component_list_from_actions(actions):
     cols_to_drop = []
     for action in actions:
         if action.action_code == DataCheckActionCode.DROP_COL:
-            cols_to_drop.append(action.metadata["column"])
+            cols_to_drop.extend(action.metadata["columns"])
Contributor Author

Cleanup after the standardization in #2869
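
To illustrate why a list-valued "columns" entry pairs with extend rather than append, here is a small standalone sketch. It uses plain dictionaries rather than the library's action objects, and `collect_columns_to_drop` is a hypothetical name used only for this example.

```python
def collect_columns_to_drop(actions):
    """Flatten the "columns" lists of every DROP_COL action into one list."""
    cols_to_drop = []
    for action in actions:
        if action["code"] == "DROP_COL":
            # extend() adds each column name individually; append() would
            # nest the whole list as a single element.
            cols_to_drop.extend(action["metadata"]["columns"])
    return cols_to_drop


# Hypothetical actions, shaped only for this sketch.
actions = [
    {"code": "DROP_COL", "metadata": {"columns": ["a", "b"]}},
    {"code": "DROP_ROWS", "metadata": {"rows": [0, 3]}},
    {"code": "DROP_COL", "metadata": {"columns": ["c"]}},
]
flat = collect_columns_to_drop(actions)

# For contrast: append() with a list-valued entry produces a nested list.
nested = []
nested.append(actions[0]["metadata"]["columns"])
```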

      fail-fast: false
      matrix:
        include:
          - python_version: "3.8"
Contributor Author

I think it's fine to run just one python version for now, but curious if there are other opinions!

          make installdeps
          make installdeps-test
          pip freeze
      - name: Erase Coverage
Contributor Author

Honestly, I'm surprised I can call coverage erase here. I thought the two workflows would step on each other's toes and we'd end up with a subpar coverage score, but I guess not 🤷‍♀️

Contributor

@chukarsten chukarsten left a comment

Angela, this looks good to me! This will lead to some difficult questions about what constitutes an integration test and how we handle all of them, but I think this is a good starting point.

}
)
for component in action_components:
    X_t = component.fit_transform(X_t)
Contributor

This test highlights how nicely DataCheck/Actions fits into the fit/transform syntax. Very nicely done. I like this, very intuitive.
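
The chaining pattern praised here can be sketched in isolation as below. This is only an illustration under stated assumptions: `DropColumns` and `DropRows` are hypothetical stand-in components, and a plain dict of lists stands in for the DataFrame so the example runs without third-party packages.

```python
class DropColumns:
    """Hypothetical component that removes the named columns."""

    def __init__(self, columns):
        self.columns = set(columns)

    def fit(self, X):
        return self  # nothing to learn for this transformation

    def transform(self, X):
        # Build a new mapping instead of mutating the input.
        return {name: vals for name, vals in X.items() if name not in self.columns}

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class DropRows:
    """Hypothetical component that removes rows by positional index."""

    def __init__(self, indices):
        self.indices = set(indices)

    def fit(self, X):
        return self

    def transform(self, X):
        return {
            name: [v for i, v in enumerate(vals) if i not in self.indices]
            for name, vals in X.items()
        }

    def fit_transform(self, X):
        return self.fit(X).transform(X)


X = {"id": [1, 2, 3], "feature": [0.5, 1000.0, 0.7]}
action_components = [DropColumns(["id"]), DropRows([1])]

# Feed each component's output into the next one.
X_t = X
for component in action_components:
    X_t = component.fit_transform(X_t)
```

Because each stand-in returns a new object, `X` is left untouched in this sketch; whether real components behave the same way depends on their implementation.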

Comment on lines 117 to 122
X_t = pd.DataFrame(data=data)
X_t.iloc[0, 3] = 1000
X_t.iloc[3, 25] = 1000
X_t.iloc[5, 55] = 10000
X_t.iloc[10, 72] = -1000
X_t.iloc[:, 90] = "string_values"
Contributor

Maybe I'm missing something, but can we not just use X? X isn't changing is it?

Contributor Author

Yeah... I think you're right--I had made a separate variable in case we were changing X somewhere / wanted to check the original value, but that was probably unnecessarily cautious :). Removing!

@angela97lin angela97lin merged commit 4c9dba3 into main Nov 2, 2021
@angela97lin angela97lin deleted the 2815_dc_integration branch November 2, 2021 19:49
@chukarsten chukarsten mentioned this pull request Nov 9, 2021