Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and update test fixtures #1382

Merged
merged 10 commits into from Apr 5, 2021
Merged

Refactor and update test fixtures #1382

merged 10 commits into from Apr 5, 2021

Conversation

thehomebrewnerd
Copy link
Contributor

Refactor and update test fixtures

Closes #1364

This PR refactors tests fixtures to avoid direct assignment of Entity.df, which will not be possible when a Woodwork dataframe is used to replace Entity.

During review of the test fixtures, two other fixtures were found to not be using Dask and/or Koalas entityset versions so these fixtures were also updated to include the Dask/Koalas versions.

@thehomebrewnerd
Copy link
Contributor Author

@rwedge Although the Woodwork integration work prompted the creation of this PR, nothing in this breaks the current implementation, and a couple of the fixture parameterization updates could be beneficial for the current code base. I am requesting a merge into main instead of the woodwork-integrations branch as a result.

@thehomebrewnerd thehomebrewnerd marked this pull request as draft April 1, 2021 21:23
@@ -161,7 +163,6 @@ def read_entity_data(description, path):
else:
error = 'must be one of the following formats: {}'
raise ValueError(error.format(', '.join(FORMATS)))
dtypes = description['loading_info']['properties']['dtypes']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the way the Dask entityset was created was causing a Dask error in test_to_csv. Dask was inferring the log entity zipcode column as int64, but it should be an object column based on the data it contains. The changes here were needed to tell Dask what the column types should be when reading the CSV, so it doesn't try to infer them. This is probably an overall safer way to read the file anyway since we know from serialization what the column dtypes should be.

I'm not exactly sure why changing the entityset construction highlighted this, but it did...

Also, I had to change the way the dtypes variable was created above because one of the tests for invalid input does not contain the properties entry in loading_info so using the old selection method resulted in a KeyError.

@thehomebrewnerd thehomebrewnerd marked this pull request as ready for review April 2, 2021 13:47
@gsheni gsheni requested a review from rwedge April 2, 2021 13:53
@codecov
Copy link

codecov bot commented Apr 2, 2021

Codecov Report

Merging #1382 (f23a5ad) into main (ca05bee) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1382   +/-   ##
=======================================
  Coverage   98.58%   98.58%           
=======================================
  Files         135      135           
  Lines       14546    14578   +32     
=======================================
+ Hits        14340    14372   +32     
  Misses        206      206           
Impacted Files Coverage Δ
featuretools/entityset/deserialize.py 100.00% <100.00%> (ø)
featuretools/tests/conftest.py 100.00% <100.00%> (ø)
featuretools/tests/entityset_tests/test_es.py 100.00% <100.00%> (ø)
...s/tests/primitive_tests/test_transform_features.py 98.80% <100.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca05bee...f23a5ad. Read the comment docs.

Comment on lines 203 to 204
# if ks and any(isinstance(e.df, ks.DataFrame) for e in simple_es.entities):
# pytest.xfail("Koalas does not support categorical dtype")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Removed.

Copy link
Contributor

@rwedge rwedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@thehomebrewnerd thehomebrewnerd merged commit 95a1f56 into main Apr 5, 2021
@thehomebrewnerd thehomebrewnerd deleted the ww-update-fixtures branch April 5, 2021 17:33
@rwedge rwedge mentioned this pull request Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update test fixtures to avoid direct assignment of Entity.df
2 participants