Integrate Woodwork #1277

Merged
merged 110 commits into from Sep 17, 2021

Contributor

thehomebrewnerd commented Dec 15, 2020

This PR is being opened in draft mode to help track when the woodwork-integration branch is out of date with main and needs to be updated.

Changes needed to support Woodwork integration in Featuretools should be merged into the woodwork-integration branch first. Once all significant changes are complete, the woodwork-integration branch can be merged into main after reviews are complete.

MOVED LIST OF CHANGES TO ISSUE #1257

* move add interesting values to EntitySet

* update release notes

* add test for verbose output

* update test for better coverage

* coverage update

* remove outdated comments

* rename entity to datatable

* fix release notes

* update logger in test

* fix merge conflicts

* rename datatable_id to entity_id
thehomebrewnerd marked this pull request as draft December 15, 2020 20:04
codecov bot commented Dec 15, 2020

Codecov Report

Merging #1277 (a4c3648) into main (1bd6568) will increase coverage by 0.04%.
The diff coverage is 99.60%.


@@            Coverage Diff             @@
##             main    #1277      +/-   ##
==========================================
+ Coverage   98.63%   98.67%   +0.04%     
==========================================
  Files         140      138       -2     
  Lines       14989    15336     +347     
==========================================
+ Hits        14784    15133     +349     
+ Misses        205      203       -2     
| Impacted Files | Coverage Δ |
|---|---|
| featuretools/entityset/api.py | 100.00% <ø> (ø) |
| ...ools/primitives/base/aggregation_primitive_base.py | 100.00% <ø> (ø) |
| featuretools/primitives/base/primitive_base.py | 96.34% <ø> (ø) |
| ...uretools/tests/computational_backend/test_utils.py | 100.00% <ø> (ø) |
| featuretools/tests/conftest.py | 100.00% <ø> (ø) |
| featuretools/tests/demo_tests/test_demo_data.py | 100.00% <ø> (ø) |
| featuretools/tests/entityset_tests/test_dask_es.py | 100.00% <ø> (ø) |
| featuretools/tests/entityset_tests/test_es.py | 100.00% <ø> (ø) |
| ...uretools/tests/entityset_tests/test_es_metadata.py | 100.00% <ø> (ø) |
| ...aturetools/tests/entityset_tests/test_koalas_es.py | 100.00% <ø> (ø) |
| ... and 125 more | |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1bd6568...a4c3648.

@thehomebrewnerd
Contributor Author

Due to Woodwork redesign work that will change the API, this work is being put on hold. This branch (woodwork-integration) contains changes that will likely be needed once the Woodwork integration work resumes. After the Woodwork redesign is complete, the integration work can be restarted from this branch.

@thehomebrewnerd
Contributor Author

Opening to restart work on integrating Woodwork.

thehomebrewnerd and others added 5 commits April 8, 2021 16:06
* update Relationship init

* refactor add_relationship

* update docstring

* update release notes

* revert files

* test coverage fix

* restrict smart-open

* update code terminology

* allow relationship object for adding relationships

* test error

* add breaking changes to release notes

* update relationship construction

* pr clean up

* update schema version to 6.0.0

* add code examples to release notes

* lots of renaming

* update docs
tamargrey and others added 3 commits May 24, 2021 14:49
* Create separate files for ww changes

* comment out unnecessary methods for now

* Allow initializing an entityset with woodwork dataframes

* Allow adding a dataframe with params

* Get getitem working

* test add dataframe directly

* update repr

* get relationship init working no real checks

* update relationship path methods

* start working on normalize dataframe

* Get secondary_time_index working

* Get normalize dataframe working

* cleanup df usage

* clean up time index usage

* cleanup comments

* Get update dataframe working

* Add comment

* Move changes to regular entityset file and comment out es tests

* start converting tests to use woodwork

* continue moving over tests

* more tests

* use logical types instead of vtypes

* Use string dtypes for default dtype values

* more test changes

* remove unnecessary files

* clean up comments

* fix rest of non behavior change tests

* get make_ecommerce_entityset and fixture working

* start using es tests - broken koalas

* have child and parent columns have woodwork info

* relationship tests

* Use woodwork typing for time type

* perform inference on column if necessary

* convert remaining possible tests

* fix koalas fixture to handle nans

* start working on last time indexes

* use ww syntax in last_time_indexes

* update names

* small fixes

* fix datetime conversion error

* use ww for test_last_time_index

* get some lti tests running

* Get last time index tests working apart from koalas make index

* cleanup comments

* Cleanup imports

* fix matching index and time index tests

* Only use first column as index if woodwork not initialized

* stop allowing non string column names

* warn if performing type inference on dask and koalas

* xfail koalas make index tests

* Update index reordering test to not care about reordering

* Change logical type of foreign key if it doesn't match the index's

* Continue replacing Entity (#1416)

* warn for extra parameters

* update es_metadata tests - raising a lot of warnings??

* update timedelta tests

* Update dask es tests

* Update koalas es tests

* test update dataframe better

* sort at update_dataframe if necessary

* update column dtype properly

* allow woodwork initialized dataframe at update dataframe

* update sizeof

* update demo functions

* update docstrings

* Fix warnings in tests

* update error messages

* use latlong test with dask and koalas

* clean up comments

* use relationship attrs instead of woodwork name

* use get_df_tags better in tests

* fix reordering of columns in update dataframe

* remove unnecessary latlong index setting

* start responding to PR comments

* use relationship attrs in entityset instead of woodwork attrs

* update foreign key usage in koalas and dask test

* More pr comments

* Keep original schema in update dataframe even if ww initialized

* create public and private set secondary time index methods

* fix update_dataframe docstring

* add test for external dataframe set secondary time index

* remove unnecessary tests

* Clean up replace Entity Woodwork integration (#1427)

* remove woodwork index tags on relationship cols comment

* remove unnecessary metadata setting

* cleanup conftest

* Add time type tests

* lint fix for testing

* clean up normalize dataframe

* clean up variable usage in tests

* Add time type test with double and integer

* reverse order of time type checks

* Add check that primary time index is set on a dataframe before adding secondary time index

* include column metadata and descriptions in normalization

* Store interesting values on column metadata (#1421)

* interesting values work

* update tests

* lint fix

* add test and lint fix

* fix test

* update docstring variable -> column

* update comment

* refactor finding where-able cols

* update comment

* lint fix again

* update docstring

* lint fix

* expand flight ordinal order

* expand docstring of set_secondary_time_index

* change _parent_dataframe_id to _parent_dataframe_name

* change _child_dataframe_id to _child_dataframe_name

* change _child_column_id to _child_column_name

* change _parent_column_id to _parent_column_name

* change dataframe_id to dataframe_name

* more id to name changes

* change remaining id mentions

* update docstrings in entityset

* consolidate copy and additional columns validation

* look at copy columns for make time index

* lint fix

* confirm column doesn't get removed in copy columns

* Add breaking changes and update release notes

* Revert "lint fix" because of incorrect linting

This reverts commit a585c1f.

* lint fix

* Change woodwork requirements

* remove duplicate error message in dask and koalas tests

* make dataframe_name an optional parameter and require it if woodwork not initialized

* Update demo and mock entitysets to have optional dataframe name

* update remaining tests to use optional dataframe name

* Add parameter check to ltype comparison warning

* raise error for conflicting df names but allow same df name

* Change conflicting name error msg

Co-authored-by: Nate Parsons <4307001+thehomebrewnerd@users.noreply.github.com>
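
The `dataframe_name` commits above describe a naming rule: the name is optional, it is required when Woodwork is not initialized, and a conflicting name raises an error while a matching name is allowed. A minimal sketch of that rule in plain Python follows. The helper `resolve_dataframe_name`, its parameters, and the error messages are hypothetical and are not taken from the Featuretools code base; `schema_name` stands in for whatever name a Woodwork-initialized dataframe already carries.

```python
from typing import Optional


def resolve_dataframe_name(
    dataframe_name: Optional[str], schema_name: Optional[str]
) -> str:
    """Hypothetical helper illustrating the rule from the commit messages.

    `schema_name` is None when Woodwork has not been initialized on the
    dataframe; otherwise it is the name already stored on the dataframe.
    """
    if schema_name is None:
        # Woodwork not initialized: the caller must supply a name.
        if dataframe_name is None:
            raise ValueError(
                "dataframe_name is required when Woodwork is not initialized"
            )
        return dataframe_name

    # Woodwork initialized: a different name conflicts, the same name is fine.
    if dataframe_name is not None and dataframe_name != schema_name:
        raise ValueError(
            f"dataframe_name '{dataframe_name}' conflicts with "
            f"existing name '{schema_name}'"
        )
    return schema_name
```

Under these assumptions, passing no name for an initialized dataframe falls back to the stored name, and passing the stored name again is accepted without error.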
* use latest woodwork version

* update woodwork requirement

* lint fix

* fix ltype parameter test

* Add release note

* fix release notes

* remove release note
* implement plot on entityset

* messy column schema for columns string

* simpler type string

* format column types better for plot

* Add release note

* Add note to docstring about woodwork typing
thehomebrewnerd and others added 20 commits September 1, 2021 12:14
Merge change from main - attempt 2
* initialize ww on feature matrix

* lint

* add origin attribute

* init ww on each partial feature matrix; update test answers

* update primitive return logical types

* fix some approximate tests

* fix for Koalas append issue

* bin cutoff times expects ww dataframe

* make helper and fix concat in parallel_calculate_chunks

* make a copy of semantic tags before modifying

* various test fixes + lint fix + revert to_pandas changes

* copy semantic tags

* fix test_boolean_multiply

* fix test_koalas_dfs.py

* make CumCount IntegerNullable

* add more to_pandas casts to test

* fix test_no_data_for_cutoff_time

* lint fix

* init ww on encoded feature matrix

* lint fix

* use None return type for NMostCommon

* fix test_init_and_name

* swap possible input order of PercentTrue to fix test

* make answer data a dataframe for easier comparison

* fix dask single table test

* fix test_transform_consistency

* fix test_dask_entityset_secondary_time_index

* fix test_approximate_features test

* fix test_features_only

* fix category dtype check

* new s3 urls for serialized objects

* update to Week ordinal to account for 53 week years

* add docstring and default args to get_ww_types

* update woodwork syntax in get_ww_types_from_features

* ordinal prims: fix order, specify order param

* calculate_chunk: single concat and then init ww

* update release notes

* fix range in hour primitive

* encode_features: use defaults in get_ww_types_from_features

* update label leakage example in docs faq

* make IsWeekend return type BooleanNullable

* enable dask test for test_concat_with_lti

* remove skip from koalas test

* include labels in test feature matrix

Co-authored-by: Nate Parsons <nate.parsons@alteryx.com>
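
The "Week ordinal" commit above mentions 53-week years. The standard-library check below shows why an ordinal order that stops at 52 would miss some dates. It assumes ISO 8601 week numbering, which the commit message does not confirm, and `WEEK_ORDER` and `iso_week` are hypothetical names used only for illustration.

```python
from datetime import date

# Hypothetical ordinal order covering weeks 1 through 53 inclusive.
WEEK_ORDER = list(range(1, 54))


def iso_week(d: date) -> int:
    """Return the ISO 8601 week number (1-53) for a date."""
    return d.isocalendar()[1]


# 2020 is a 53-week ISO year; 2021 is a 52-week ISO year.
last_day_2020 = iso_week(date(2020, 12, 31))
last_day_2021 = iso_week(date(2021, 12, 31))

# Both values must be members of the ordinal order to be valid categories.
all_covered = all(w in WEEK_ORDER for w in (last_day_2020, last_day_2021))
```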
* fix typos in transition guide

* Add release note
Merge in latest changes from main
* Add banner to all docs pages about upcoming 1.0 release (#1669)

* add banner about FT1.0

* update release notes

* Update docs/source/templates/layout.html

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* update banner message

* update transition guide link

* update wording

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* v0.27.1 (#1671)

* v0.27.1

* update setup.py

* fix foreign key tag bug

* update release notes

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>
* Automated Latest Dependency Updates (#1673)

* Remove old categorical code (#1677)

* remove old code

* update release notes

* spelling error

* fix merge issue

* reorg release notes

* lint fix?

* move future release

Co-authored-by: machineFL <49695056+machineFL@users.noreply.github.com>
Merge latest from main - attempt 2
* remove unused _dataframes_equal function

* remove unused camel_to_snake func

* update release notes

* lint fix
* bump woodwork requirement

* update release notes

* update other requirements files
* remove redundant coercion; woodwork init will cover this

* update release notes
* bump woodwork requirement

* don't build features on lti columns

* update release notes

* skip lti col in _add_identity_features instead
* initial comment clean up

* more clean up

* update release notes

* spelling fix

* fix values
* encode feats - concat once and skip drop if not inplace

* use existing ww schema to skip infer on unchanged columns

* update release notes

* request the columns dictionary once
* bump ww min version to 0.8.1

* update release notes

* fix koalas file
thehomebrewnerd changed the title Integrate Woodwork [DO NOT MERGE] Integrate Woodwork Sep 17, 2021
thehomebrewnerd marked this pull request as ready for review September 17, 2021 16:27
Review thread on README.md (outdated, resolved)
thehomebrewnerd merged commit 9cc5e24 into main Sep 17, 2021
thehomebrewnerd mentioned this pull request Sep 17, 2021
thehomebrewnerd mentioned this pull request Oct 12, 2021
8 participants