Add integration tests for end to end flow for data checks --> data check actions #2883

angela97lin · 2021-10-07T18:29:04Z

Closes #2815, closes #1998.

Along the way, I noticed that our data checks return dictionary forms of the data check messages / actions (since those are more easily serializable), but `_make_component_list_from_actions` takes in actual data check action objects. I added a helper method to convert from one to the other. I also noticed it's not super easy to use a list of components, so I'm going to work on #2968 sooner rather than later to make transformations easier :)
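A minimal sketch of the dict-to-action round trip described above. It assumes a dictionary layout with `code` and `metadata` keys. `DataCheckActionCode`, `DataCheckAction`, `to_dict`, and `from_dict` here are simplified, hypothetical stand-ins, not evalml's actual classes or the helper added in this PR; the real key names and method names may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataCheckActionCode(Enum):
    """Hypothetical stand-in for evalml's action-code enum (two members only)."""
    DROP_COL = "drop_col"
    DROP_ROWS = "drop_rows"


@dataclass
class DataCheckAction:
    """Hypothetical stand-in for a data check action object."""
    action_code: DataCheckActionCode
    metadata: dict = field(default_factory=dict)

    def to_dict(self):
        # Serializable form: store the enum member by name so it survives JSON.
        return {"code": self.action_code.name, "metadata": dict(self.metadata)}

    @staticmethod
    def from_dict(action_dict):
        # Inverse of to_dict (hypothetical helper): look the enum member up by name.
        return DataCheckAction(
            DataCheckActionCode[action_dict["code"]],
            dict(action_dict.get("metadata", {})),
        )


action = DataCheckAction(DataCheckActionCode.DROP_COL, {"columns": ["all_null"]})
as_dict = action.to_dict()
restored = DataCheckAction.from_dict(as_dict)
print(as_dict)            # {'code': 'DROP_COL', 'metadata': {'columns': ['all_null']}}
print(restored == action)  # True
```

Converting back to objects at the boundary lets the component-building code keep working with typed actions while the data checks themselves keep returning serializable dicts.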

I think it's fine to run just one Python version for now, and Linux only, but curious if there are other opinions!

codecov · 2021-10-07T18:34:20Z

Codecov Report

Merging #2883 (55a7474) into main (b7b64b9) will increase coverage by 0.1%.
The diff coverage is 100.0%.

```diff
@@           Coverage Diff           @@
##            main   #2883     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        309     310      +1     
  Lines      29399   29495     +96     
=======================================
+ Hits       29308   29404     +96     
  Misses        91      91
```

| Impacted Files | Coverage Δ |
|---|---|
| evalml/data_checks/invalid_targets_data_check.py | `100.0% <ø> (ø)` |
| ...alml/data_checks/target_distribution_data_check.py | `100.0% <ø> (ø)` |
| evalml/data_checks/data_check_action.py | `100.0% <100.0%> (ø)` |
| evalml/data_checks/data_check_action_code.py | `100.0% <100.0%> (ø)` |
| evalml/pipelines/utils.py | `99.5% <100.0%> (ø)` |
| .../tests/data_checks_tests/test_data_check_action.py | `100.0% <100.0%> (ø)` |
| ..._tests/test_data_checks_and_actions_integration.py | `100.0% <100.0%> (ø)` |
| evalml/tests/pipeline_tests/test_pipeline_utils.py | `99.7% <100.0%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7b64b9...55a7474.


angela97lin · 2021-10-26T15:23:47Z

evalml/pipelines/utils.py

```diff
@@ -381,16 +381,16 @@ def _make_component_list_from_actions(actions):
     cols_to_drop = []
     for action in actions:
         if action.action_code == DataCheckActionCode.DROP_COL:
-            cols_to_drop.append(action.metadata["column"])
+            cols_to_drop.extend(action.metadata["columns"])
```


Cleanup after the standardization in #2869
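The diff above switches from a single `metadata["column"]` value to a `metadata["columns"]` list, which is why `append` becomes `extend`. A minimal sketch of that loop in isolation; `Action`, `collect_cols_to_drop`, and the enum below are hypothetical stand-ins that only mirror the shape of the code in the diff, not evalml's real classes.

```python
from enum import Enum


class DataCheckActionCode(Enum):
    """Hypothetical stand-in for evalml's action-code enum."""
    DROP_COL = "drop_col"
    DROP_ROWS = "drop_rows"


class Action:
    """Hypothetical minimal action: an action code plus a metadata dict."""

    def __init__(self, action_code, metadata):
        self.action_code = action_code
        self.metadata = metadata


def collect_cols_to_drop(actions):
    """Gather every column named by DROP_COL actions into one flat list."""
    cols_to_drop = []
    for action in actions:
        if action.action_code == DataCheckActionCode.DROP_COL:
            # metadata["columns"] is a list, so extend() keeps the result flat;
            # append() would insert the whole list as a single nested element.
            cols_to_drop.extend(action.metadata["columns"])
    return cols_to_drop


actions = [
    Action(DataCheckActionCode.DROP_COL, {"columns": ["id", "all_null"]}),
    Action(DataCheckActionCode.DROP_ROWS, {"rows": [3, 5]}),
    Action(DataCheckActionCode.DROP_COL, {"columns": ["constant"]}),
]
print(collect_cols_to_drop(actions))  # ['id', 'all_null', 'constant']
```

Keeping the collected names flat means a single drop-columns component can be built from all DROP_COL actions at once.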

angela97lin · 2021-10-27T15:39:43Z

.github/workflows/linux_integration_tests.yml

```diff
+      fail-fast: false
+      matrix:
+        include:
+          - python_version: "3.8"
```


I think it's fine to run just one Python version for now, but curious if there are other opinions!

angela97lin · 2021-10-27T15:41:44Z

.github/workflows/linux_integration_tests.yml

```diff
+          make installdeps
+          make installdeps-test
+          pip freeze
+      - name: Erase Coverage
```


Honestly, I'm surprised I can call `coverage erase` here. I thought the two workflows would step on each other's toes and we'd end up with a subpar coverage score, but I guess not 🤷‍♀️

chukarsten

Angela, this looks good to me! This will lead to some difficult questions about what constitutes an integration test and how we handle all of them, but I think this is a good starting point.

chukarsten · 2021-11-01T14:41:54Z

evalml/tests/integration_tests/test_data_checks_and_actions_integration.py

```diff
+        }
+    )
+    for component in action_components:
+        X_t = component.fit_transform(X_t)
```


This test highlights how nicely DataChecks/DataCheckActions fit into the fit/transform syntax. Very nicely done. I like this; it's very intuitive.
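The loop in the test snippet chains components the way a pipeline would: each `fit_transform` output becomes the next component's input. A minimal, stdlib-only sketch of that pattern, where `DropColumns` and `DropRows` are hypothetical stand-ins for the components built from drop-column and drop-row actions, and a dict of column lists stands in for a pandas DataFrame.

```python
class DropColumns:
    """Hypothetical transformer: removes the named columns."""

    def __init__(self, columns):
        self.columns = set(columns)

    def fit(self, X):
        return self  # nothing to learn

    def transform(self, X):
        return {name: values for name, values in X.items() if name not in self.columns}

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class DropRows:
    """Hypothetical transformer: removes rows by positional index."""

    def __init__(self, indices):
        self.indices = set(indices)

    def fit(self, X):
        return self  # nothing to learn

    def transform(self, X):
        return {
            name: [v for i, v in enumerate(values) if i not in self.indices]
            for name, values in X.items()
        }

    def fit_transform(self, X):
        return self.fit(X).transform(X)


# Columnar toy data standing in for a DataFrame.
X = {
    "a": [1, 2, 3, 1000],
    "b": [4, 5, 6, 7],
    "all_null": [None, None, None, None],
}
action_components = [DropColumns(["all_null"]), DropRows([3])]

X_t = X
for component in action_components:
    # Each component's output feeds the next component.
    X_t = component.fit_transform(X_t)

print(X_t)  # {'a': [1, 2, 3], 'b': [4, 5, 6]}
```

Because each `transform` in this sketch builds a new dict, the original `X` is left untouched by the loop.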

chukarsten · 2021-11-01T14:48:20Z

evalml/tests/integration_tests/test_data_checks_and_actions_integration.py

```diff
+    X_t = pd.DataFrame(data=data)
+    X_t.iloc[0, 3] = 1000
+    X_t.iloc[3, 25] = 1000
+    X_t.iloc[5, 55] = 10000
+    X_t.iloc[10, 72] = -1000
+    X_t.iloc[:, 90] = "string_values"
```


Maybe I'm missing something, but can we not just use X? X isn't changing, is it?

Yeah... I think you're right. I had made a separate variable in case we were changing X somewhere or wanted to check the original value, but that was probably unnecessarily cautious :). Removing!

init

8ac51dc

angela97lin self-assigned this Oct 7, 2021

angela97lin added 18 commits October 14, 2021 17:37

add empty

887d8b7

in the middle of return row removal test

2a3e898

add integration test command and workflow

7fba44d

ignore integration in other core tests

fb2c330

rename and call integration test command

2281c68

Merge branch 'main' into 2815_dc_integration

deb121d

merging

00fc198

Merge branch 'main' into 2815_dc_integration

0d40228

release notes

14d8af2

Merge branch 'main' into 2815_dc_integration

0e799c1

fix integration matrix

c11f33e

Merge branch '2815_dc_integration' of github.com:alteryx/evalml into 2815_dc_integration

8602e35

clean up row removal test and add new method to convert from dict back to action

fe1cc9e

add tests

38ac09e

add drop rows test

650f7ab

Merge branch 'main' into 2815_dc_integration

ff475f3

linting

b3b6e4b

cleaning up yaml

259c2bf

angela97lin commented Oct 26, 2021


angela97lin added 8 commits October 26, 2021 11:28

rename integration tests folder

a45edde

add empty init file

4ffb6f9

try to add coverage

c14ca5e

remove entirely

c491e7c

attempt to remove

c7483f7

move back

c43327c

fix yaml

8c4bd98

add coverage to integration yaml

434b08a

angela97lin added 2 commits October 27, 2021 01:43

makefile

d063aa4

try add erase coverage block

6e88b18

angela97lin commented Oct 27, 2021


angela97lin marked this pull request as ready for review October 27, 2021 15:43

angela97lin requested review from freddyaboulton, chukarsten, bchen1116, christopherbunn, dsherry and eccabay October 27, 2021 15:43

chukarsten approved these changes Nov 1, 2021


angela97lin added 7 commits November 1, 2021 17:06

merging main

515765e

clean up merging and outdated code

d463c6f

fix tests from merging

9001544

Merge branch 'main' into 2815_dc_integration

0fe1b8c

remove unnecessary assignment from tests:

e49756c

Merge branch 'main' into 2815_dc_integration

e5f9efc

Merge branch 'main' into 2815_dc_integration

55a7474

angela97lin merged commit 4c9dba3 into main Nov 2, 2021

angela97lin deleted the 2815_dc_integration branch November 2, 2021 19:49

chukarsten mentioned this pull request Nov 9, 2021

Release v0.37.0 #3029

Merged
