
Update DFS primitive matching to use ColumnSchema #1523

Merged
thehomebrewnerd merged 26 commits into woodwork-integration from ww-dfs-matching on Jul 14, 2021


thehomebrewnerd commented Jul 9, 2021

Update DFS primitive matching to use ColumnSchema

Closes #1259
Closes #1263
Closes #1264

This PR updates DFS to use Woodwork ColumnSchemas for primitive matching. In order to generate the same features as before, the logical types for some of the index and foreign key columns in the ecommerce entity set test fixture were updated. These updates break some of our EntitySet tests and can be reverted if we do not need to match the same feature output.
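As a rough illustration of what schema-based matching means here, the sketch below assumes a column satisfies a primitive input type when the input type's logical type (if one is set) equals the column's, and every semantic tag the input type requires is present on the column. SimpleSchema and column_matches are hypothetical stand-ins written for this example; they are not Woodwork's ColumnSchema or the actual DFS matching code.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SimpleSchema:
    """Hypothetical stand-in for a Woodwork ColumnSchema: an optional
    logical type name plus a set of semantic tags."""
    logical_type: Optional[str] = None
    semantic_tags: FrozenSet[str] = field(default_factory=frozenset)


def column_matches(column: SimpleSchema, input_type: SimpleSchema) -> bool:
    """Return True if a column with schema `column` can feed a primitive
    input slot described by `input_type` (assumed matching rule)."""
    # A logical type on the input type must match exactly; None means "any".
    if input_type.logical_type is not None and input_type.logical_type != column.logical_type:
        return False
    # Every tag the input type requires must be present on the column.
    return input_type.semantic_tags <= column.semantic_tags


amount = SimpleSchema("Double", frozenset({"numeric"}))
customer_id = SimpleSchema("Integer", frozenset({"numeric", "foreign_key"}))
status = SimpleSchema("Categorical", frozenset({"category"}))

numeric_input = SimpleSchema(semantic_tags=frozenset({"numeric"}))
print(column_matches(amount, numeric_input))       # True
print(column_matches(customer_id, numeric_input))  # True
print(column_matches(status, numeric_input))       # False
```

Under this rule an integer foreign key also matches a numeric input, which is the kind of case the foreign key filters discussed in the review comments below are meant to guard against.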

thehomebrewnerd marked this pull request as draft July 9, 2021 17:17

thehomebrewnerd left a comment

Just adding a few comments about the changes made in this PR so far.

@@ -50,7 +50,7 @@ class CumCount(TransformPrimitive):
     name = "cum_count"
     input_types = [[ColumnSchema(semantic_tags={'foreign_key'})],
                    [ColumnSchema(semantic_tags={'category'})]]
-    return_type = ColumnSchema(logical_type=ltypes.Integer)
+    return_type = ColumnSchema(logical_type=ltypes.Integer, semantic_tags={'numeric'})

If we don't include the numeric tag here, this primitive's output won't get picked up by other primitives that use numeric as an input type.
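To make the effect of the missing tag concrete, here is a minimal sketch under an assumed matching rule (the logical type must match if one is specified, and all required semantic tags must be present on the candidate). Schemas are modeled as plain (logical_type, tags) tuples purely for illustration; the names Schema and accepts are hypothetical.

```python
from typing import FrozenSet, Optional, Tuple

# Hypothetical schema model: (logical type name or None, semantic tags).
Schema = Tuple[Optional[str], FrozenSet[str]]


def accepts(input_type: Schema, candidate: Schema) -> bool:
    """Assumed rule: the logical type must match if specified, and all
    required semantic tags must be present on the candidate."""
    ltype, tags = input_type
    cand_ltype, cand_tags = candidate
    if ltype is not None and ltype != cand_ltype:
        return False
    return tags <= cand_tags


# Input slot of a primitive that takes any numeric column.
numeric_input: Schema = (None, frozenset({"numeric"}))

# CumCount return type before and after this change.
old_cum_count_return: Schema = ("Integer", frozenset())
new_cum_count_return: Schema = ("Integer", frozenset({"numeric"}))

print(accepts(numeric_input, old_cum_count_return))  # False: tag missing
print(accepts(numeric_input, new_cum_count_return))  # True
```

Matching on the tag rather than on a specific logical type lets any numeric output, whether Integer or Double, feed a numeric input.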

Comment on lines +263 to +266
# cutoff_time_dtype = cutoff_time['time'].dtype.name
# # TODO: refactor for woodwork columns, maybe use ww is_datetime and is_numeric?
# is_numeric = cutoff_time_dtype in PandasTypes._pandas_numerics
# is_datetime = cutoff_time_dtype in PandasTypes._pandas_datetimes

Just commented this out for now so the lint tests pass.
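The commented-out block classifies the cutoff time column's dtype as numeric or datetime. One possible replacement is sketched below using pandas' public dtype helpers instead of the private PandasTypes lists; the function name cutoff_time_kind is hypothetical, and whether Woodwork helpers would be a better fit is left open, as in the TODO.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_numeric_dtype,
)


def cutoff_time_kind(cutoff_time: pd.DataFrame) -> str:
    """Hypothetical helper: classify the 'time' column of a cutoff time
    DataFrame as 'datetime', 'numeric', or 'unknown'."""
    dtype = cutoff_time["time"].dtype
    if is_datetime64_any_dtype(dtype):  # naive or tz-aware datetime64
        return "datetime"
    # pandas counts bool as numeric, so exclude it explicitly.
    if is_numeric_dtype(dtype) and not is_bool_dtype(dtype):
        return "numeric"
    return "unknown"


cutoff_time = pd.DataFrame({
    "instance_id": [1, 2],
    "time": pd.to_datetime(["2021-07-01", "2021-07-02"]),
})
print(cutoff_time_kind(cutoff_time))  # datetime
```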

Comment on lines 604 to 607
# Don't create transform features for foreign key columns unless any column schema is valid for input
if any('foreign_key' in bf.column_schema.semantic_tags for bf in matching_input):
    if not any((input_type == ColumnSchema() or _schemas_equal(input_type, ColumnSchema(semantic_tags={'foreign_key'}))) for input_type in input_types):
        continue

Need to confirm, but I think we might be able to remove this messy logic if we always require foreign key columns to be Categorical.
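The filter in this hunk can be read as: if any candidate input column carries the foreign_key tag, skip the primitive unless one of its input types is either the empty schema (which accepts anything) or exactly the foreign_key schema. A standalone sketch of that predicate follows. Schema and should_skip_for_foreign_keys are hypothetical names, and schema equality is assumed to mean equal logical type and equal tag set, which may not match _schemas_equal exactly.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Schema:
    """Hypothetical schema: optional logical type plus semantic tags.
    Dataclass equality compares both fields."""
    logical_type: Optional[str] = None
    semantic_tags: FrozenSet[str] = field(default_factory=frozenset)


ANY = Schema()  # empty schema: valid for any column
FOREIGN_KEY = Schema(semantic_tags=frozenset({"foreign_key"}))


def should_skip_for_foreign_keys(input_columns: List[Schema],
                                 input_types: List[Schema]) -> bool:
    """Return True if a transform primitive should not be applied to these
    columns because a foreign key is involved and the primitive does not
    explicitly accept foreign keys (or arbitrary columns)."""
    uses_foreign_key = any("foreign_key" in col.semantic_tags for col in input_columns)
    if not uses_foreign_key:
        return False
    allows_foreign_key = any(t == ANY or t == FOREIGN_KEY for t in input_types)
    return not allows_foreign_key


customer_id = Schema("Integer", frozenset({"numeric", "foreign_key"}))
numeric_input = Schema(semantic_tags=frozenset({"numeric"}))
print(should_skip_for_foreign_keys([customer_id], [numeric_input]))  # True
print(should_skip_for_foreign_keys([customer_id], [FOREIGN_KEY]))    # False
```

Presumably, if foreign key columns were always Categorical (tagged category rather than numeric), numeric input types would never match them in the first place, which is the reasoning behind the suggestion to drop this check.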

Comment on lines 647 to 650
# Don't create groupby transform features for foreign key columns unless any column schema is valid for input
if any('foreign_key' in bf.column_schema.semantic_tags for bf in matching_input):
    if not any((input_type == ColumnSchema() or _schemas_equal(input_type, ColumnSchema(semantic_tags={'foreign_key'}))) for input_type in input_types):
        continue

We may be able to remove this if foreign key columns are always Categorical.

Comment on lines 731 to 734
if (any(_schemas_equal(bf.column_schema, ColumnSchema(semantic_tags={'foreign_key', 'numeric'})) for bf in matching_input) and
        not any((input_type == ColumnSchema() or _schemas_equal(input_type, ColumnSchema(semantic_tags={'foreign_key'}))) for input_type in input_types)):
    # Don't build agg features for numeric foreign key columns unless explicitly allowed
    continue

We may be able to remove this if foreign key columns are always Categorical.

Comment on lines 830 to 831
# TODO: Investigate whether this needs to be a set?
column_schemas=list(input_types))

ColumnSchema objects don't currently define a hash function, so they can't be placed in a set. Things seem to work fine without using a set here, but if we need or want a set we will need to add __hash__ to ColumnSchema.
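For reference, Python sets __hash__ to None on any class that defines __eq__ without also defining __hash__, which is why instances of such a class cannot be placed in a set. A minimal sketch of a hash that stays consistent with equality is shown below, using hypothetical TagsOnlySchema and HashableSchema classes rather than Woodwork's ColumnSchema; the fields a real ColumnSchema compares in __eq__ may differ.

```python
from typing import Iterable, Optional


class TagsOnlySchema:
    """Defines __eq__ but not __hash__, so Python sets __hash__ to None."""

    def __init__(self, semantic_tags: Iterable[str]):
        self.semantic_tags = set(semantic_tags)

    def __eq__(self, other):
        if not isinstance(other, TagsOnlySchema):
            return NotImplemented
        return self.semantic_tags == other.semantic_tags


class HashableSchema:
    """Hypothetical schema with a __hash__ consistent with __eq__."""

    def __init__(self, logical_type: Optional[str] = None,
                 semantic_tags: Optional[Iterable[str]] = None):
        self.logical_type = logical_type
        self.semantic_tags = set(semantic_tags or ())

    def __eq__(self, other):
        if not isinstance(other, HashableSchema):
            return NotImplemented
        return (self.logical_type == other.logical_type
                and self.semantic_tags == other.semantic_tags)

    def __hash__(self):
        # Hash exactly the fields __eq__ compares; freeze the tag set so it
        # can be hashed.
        return hash((self.logical_type, frozenset(self.semantic_tags)))


try:
    {TagsOnlySchema({"numeric"})}
except TypeError as err:
    print("unhashable:", err)

schemas = {
    HashableSchema("Integer", {"numeric"}),
    HashableSchema("Integer", ["numeric"]),  # equal to the first one
    HashableSchema(semantic_tags={"category"}),
}
print(len(schemas))  # 2
```

Because semantic_tags is a mutable set, mutating a schema after adding it to a set would leave the set inconsistent, so a hashable schema should be treated as immutable once created.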

gsheni requested a review from rwedge July 12, 2021 15:17
thehomebrewnerd marked this pull request as ready for review July 12, 2021 19:48
rwedge removed their request for review July 12, 2021 21:39
featuretools/primitives/options_utils.py (outdated, resolved)
featuretools/primitives/options_utils.py (outdated, resolved)
featuretools/synthesis/deep_feature_synthesis.py (outdated, resolved)
featuretools/synthesis/deep_feature_synthesis.py (outdated, resolved)
featuretools/synthesis/dfs.py (outdated, resolved)
rwedge commented Jul 14, 2021

Should we make an issue to review DFS behavior once CFM has been implemented?

thehomebrewnerd

> Should we make an issue to review DFS behavior once CFM has been implemented?

@rwedge Are you thinking specifically in terms of how we handle foreign key columns in generating features or something else?

rwedge commented Jul 14, 2021

> @rwedge Are you thinking specifically in terms of how we handle foreign key columns in generating features or something else?

Yes, and also confirming that other primitives behave the same and benchmarking how fast DFS runs with the new implementation.

thehomebrewnerd

> > @rwedge Are you thinking specifically in terms of how we handle foreign key columns in generating features or something else?
>
> Yes, and also confirming that other primitives behave the same and benchmarking how fast DFS runs with the new implementation.

Added #1530 to track items related to this.

rwedge left a comment

LGTM

thehomebrewnerd merged commit af195aa into woodwork-integration Jul 14, 2021
thehomebrewnerd deleted the ww-dfs-matching branch July 14, 2021 20:48
thehomebrewnerd added a commit that referenced this pull request Sep 17, 2021
* Remove add interesting values from Entity (#1269)

* move add interesting values to EntitySet

* update release notes

* add test for verbose output

* update test for better coverage

* coverage update

* remove outdated comments

* rename entity to datatable

* fix release notes

* update logger in test

* fix merge conflicts

* rename datatable_id to entity_id

* update release notes

* Move set_secondary_time_index to EntitySet (#1280)

* move set_secondary_time_index to entityset

* update release notes

* break long line

* merge fixes

* update docs

* fix release notes formatting

* Refactor Relationship Creation (#1370)

* update Relationship init

* refactor add_relationship

* update docstring

* update release notes

* revert files

* test coverage fix

* restrict smart-open

* update code terminology

* allow relationship object for adding relationships

* test error

* add breaking changes to release notes

* update relationship construction

* pr clean up

* update schema version to 6.0.0

* add code examples to release notes

* lots of renaming

* update docs

* Move update_data method from Entity to EntitySet (#1398)

* Move default variable description logic to generate_description (#1403)

* Move time type check (#1400)

* Replace Entity with Woodwork DataFrame (#1405)

* Create separate files for ww changes

* comment out unnecessary methods for now

* Allow initializing an entityset with woodwork dataframes

* Allow adding a dataframe with params

* Get getitem working

* test add dataframe directly

* update repr

* get relationship init working no real checks

* update relationship path methods

* start working on normalize dataframe

* Get secondary_time_index working

* Get normalize dataframe working

* cleanup df usage

* clean up time index usage

* cleanup comments

* Get update dataframe working

* Add comment

* Move changes to regular entityset file and comment out es tests

* start converting tests to use woodwork

* continue moving over tests

* more tests

* use logical types instead of vtypes

* Use string dtypes for default dtype values

* more test changes

* remove unnecessary files

* clean up comments

* fix rest of non behavior change tests

* get make_ecommerce_entityset and fixture working

* start using es tests - broken koalas

* have child and parent columns have woodwork info

* relationship tests

* Use woodwork typing for time type

* perform inference on column if necessary

* convert remaining possible tests

* fix koalas fixture to handle nans

* start working on last time indexes

* use ww syntax in last_time_indexes

* update names

* small fixes

* fix datetime conversion error

* use ww for test_last_time_index

* get some lti tests running

* Get last time index tests working apart from koalas make index

* cleanup comments

* Cleanup imports

* fix matching index and time index tests

* Only use first column as index if woodwork not initialized

* stop allowing non string column names

* warn if performing type inference on dask and koalas

* xfail koalas make index tests

* Update index reordering test to not care about reordering

* Change logical type of foreign key if it doesn't match the index's

* Continue replacing Entity (#1416)

* warn for extra parameters

* update es_metadata tests - raising a lot of warnings??

* update timedelta tests

* Update dask es tests

* Update koalas es tests

* test update dataframe better

* sort at update_dataframe if necessary

* update column dtype properly

* allow woodwork initialized dataframe at update dataframe

* update sizeof

* update demo functions

* update docstrings

* Fix warnings in tests

* update error messages

* use latlong test with dask and koalas

* clean up comments

* use relationship attrs instead of woodwork name

* use get_df_tags better in tests

* fix reordering of columns in update dataframe

* remove unnecessary latlong index setting

* start responding to PR comments

* use relationship attrs in entityset instead of woodwork attrs

* update foreign key usage in koalas and dask test

* More pr comments

* Keep original schema in update dataframe even if ww initialized

* create public and private set secondary time index methods

* fix update_dataframe docstring

* add test for external dataframe set secondary time index

* remove unnecessary tests

* Clean up replace Entity Woodwork integration (#1427)

* remove woodwork index tags on relationship cols comment

* remove unnecessary metadata setting

* cleanup conftest

* Add time type tests

* lint fix for testing

* clean up normalize dataframe

* clean up variable usage in tests

* Add time type test with double and integer

* reverse order of time type checks

* Add check that primary time index is set on a dataframe before adding secondary time index

* include column metadata and descriptions in normalization

* Store interesting values on column metadata (#1421)

* interesting values work

* update tests

* lint fix

* add test and lint fix

* fix test

* update docstring variable -> column

* update comment

* refactor finding where-able cols

* update comment

* lint fix again

* update docstring

* lint fix

* expand flight ordinal order

* expand docstring of set_secondary_time_index

* change _parent_dataframe_id to _parent_dataframe_name

* change _child_dataframe_id to _child_dataframe_name

* change _child_column_id to _child_column_name

* change _parent_column_id to _parent_column_name

* change dataframe_id to dataframe_name

* more id to name changes

* change remaining id mentions

* update docstrings in entityset

* consolidate copy and additional columns validation

* look at copy columns for make time index

* lint fix

* confirm column doesn't get removed in copy columns

* Add breaking changes and update release notes

* Revert "lint fix" because of incorrect linting

This reverts commit a585c1f.

* lint fix

* Change woodwork requirements

* remove duplicate error message in dask and koalas tests

* make dataframe_name an optional parameter and require it if woodwork not initialized

* Update demo and mock entitysets to have optional dataframe name

* update remaining tests to use optional dataframe name

* Add parameter check to ltype comparison warning

* raise error for conflicting df names but allow same df name

* Change conflicting name error msg

Co-authored-by: Nate Parsons <4307001+thehomebrewnerd@users.noreply.github.com>

* Use woodwork 0.4.0 (#1451)

* use latest woodwork version

* update woodwork requirement

* lint fix

* fix ltype parameter test

* Add release note

* fix release notes

* remove release note

* Update EntitySet.plot to use Woodwork (#1468)

* implement plot on entityset

* messy column schema for columns string

* simpler type string

* format column types better for plot

* Add release note

* Add note to docstring about woodwork typing

* Store last time index as column on DataFrame (#1456)

* set lti as a column at end

* convert last_time_index tests

* update cdata lti test - broken

* store last time indexes in dictionary

* small updates

* clean up comments

* cleanup tests

* Make sure to remove existing ltis for any dfs added to queue

* finish fixing tests

* lint fix

* add broken test

* Add release note

* Add tests and init with correct time type

* add merge to handle dask and koalas

* keep fixing dask and koalas merge

* fix index issues

* add time type consistency back in

* fix duplicate last time index popping

* clean up comments

* lint fix

* update sizeof

* clean up test

* fix dask and koalas tests and expand int_es to use dask and koalas

* explain why apply is needed for time type conversion

* remove lti column in update dataframe if not on new dataframe

* change lti column name and error if user placed

* Add comments

* make lti name a global variable

* expand recalculate lti tests

* pr comment

* Implement deep copy on EntitySet to retain Woodwork typing info (#1465)

* set lti as a column at end

* convert last_time_index tests

* update cdata lti test - broken

* store last time indexes in dictionary

* small updates

* clean up comments

* cleanup tests

* Make sure to remove existing ltis for any dfs added to queue

* finish fixing tests

* lint fix

* add broken test

* Add release note

* Add tests and init with correct time type

* add merge to handle dask and koalas

* keep fixing dask and koalas merge

* fix index issues

* add time type consistency back in

* fix duplicate last time index popping

* clean up comments

* lint fix

* update sizeof

* clean up test

* fix dask and koalas tests and expand int_es to use dask and koalas

* explain why apply is needed for time type conversion

* remove lti column in update dataframe if not on new dataframe

* change lti column name and error if user placed

* Add comments

* make lti name a global variable

* expand recalculate lti tests

* pr comment

* TEMP

* initial deepcopy implementation missing attrs

* fix entityset equality check

* implement deepcopy on entityset

* use deepcopy on fixture

* expand fixture for copy test

* move tests to test_es

* add release notes

* remove comments

* fix spelling error

* bump woodwork version (#1478)

* Replace list_variable_types with list_logical_types (#1477)

* Allow deep equality check on EntitySet (#1480)

* apply deep keyword to entityset equality check

* stick with woodwork equality

* Add release note

* pr comments

* Update query_by_values for Woodwork Integration (#1467)

* initial query_by_values update for ww

* update release notes

* revert accidental concat changes

* pr comments - update wording

* update warning and add test for warning

* dt -> schema in _handle_time

* update variable names

* lint-fix

* qbv test update

* update lti tests

* update test

* lint fix

* WW/FT Serialization Updates (#1452)

* initialize serialization updates

* serialization test updates

* remove commented code

* remove entity wording

* update tests

* koalas update

* remove comments

* pr comment updates

* recreate category dtypes

* update comment

* Merge latest changes from main (#1493)

* Add function to list semantic tags (#1486)

* Update EntitySet.concat to use Woodwork (#1490)

* update concat, broken

* get simple concat working

* uncomment long concat test

* start fixing test

* finish updating test

* start expanding tests

* test sorting entityset

* finish test coverage

* fix warning

* cleanup comments

* add release note

* lint fix

* clean up test

* fix sort index test

* use dataframe type

* lint fix

* implement deepcopy that works with koalas to use for all concat tests

* add checks to sort index tests

* clean up xfails for concat entityset test

* split up large test

* Replace entity_from_dataframe with add_dataframe (#1504)

* update conftest

* update test_feature_set_calculator

* use add_dataframe

* use add_dataframe

* use add_dataframe in docs

* use logical_types

* use dataframe_name

* lint fix

* use logical types param in docs

* replace variable_types with logical_types

* use logical types

* replace variable types with logical types

* replace variable types with logical types

* change to integer

* remove index semantic tag

* replace vtypes with ltypes

* replace vtypes with ltypes

* declare logical type for id

* Rename target entity to target dataframe (#1506)

* replace target_entity with target_dataframe

* fix glossary

* fix handle time parameters

* use references from woodwork

* use index from woodwork

* get index and time index from ww accessor

* fix docstring

* get index from ww

* revert compose changes

* update get_valid_primitives

* update glossary

* use dataframe_dict

* Primitives use Woodwork ColumnSchema for input types and return type (#1411)

* update input and return types for agg primitives

* update binary transform primitive variable types to use woodwork

* replace input and return types with ColumnSchema in cum_transform_feature.py

* update transform primitives to use column schema for input and return type

* replace input type and return types in test files

* replace _get_names_valid_inputs with _get_unique_input_types

* remove entity references in primitive tests

* lint

* add BooleanNullable to some primitives

* update MultiplyBoolean, And, Or

* update more input/return types

* fix add_dataframe argument order

* add ordinal order for datetime transformations

* lint

* update docstrings of make_x_primitive functions

* fix Not input_types

* remove unused Numeric import

* specify order for Weekday primitive return type

* Woodwork Integration - Features (#1501)

* update input and return types for agg primitives

* update binary transform primitive variable types to use woodwork

* replace input and return types with ColumnSchema in cum_transform_feature.py

* update transform primitives to use column schema for input and return type

* replace input type and return types in test files

* replace _get_names_valid_inputs with _get_unique_input_types

* remove entity references in primitive tests

* lint

* add BooleanNullable to some primitives

* update MultiplyBoolean, And, Or

* update more input/return types

* fix add_dataframe argument order

* add ordinal order for datetime transformations

* lint

* update docstrings of make_x_primitive functions

* remove entity.py and variables.py

* update FeatureBase to use dataframes

* update feature descriptions to use dataframes

* update aggregation primitive base to use dataframe terminology

* update generate_name in Count primitive

* update tests to use new feature parameters

* add feature_base/utils.py

* a couple more additions to FeatureBase

* ensure cohort_name is categorical

* add category tag to ordinal return types

* update feature visualizer to use dataframe terms

* fix Not input_types

* update primitive tests

* use set operations to simplify check for index columns

* simplify getting index name in get_aggregation_groupby

* fix category semantic tag check in variable_filter

* move replace_latlong_nan out of entity_utils

* fix comparison in test_copy

* check logical type in test_return_type_inference_index

* update var names in test_multi_output_features

* use ColumnSchemas in test_return_variable_types

* more specific TODO for _check_cutoff_time_type

* update __mul__ logic for boolean * boolean

* rename utils.entity_utils to utils.latlong_utils

* update _check_againt_time_column

* update _check_time_against_column

* correct schema access in _check_time_against_column

* Add make_index functionality to Featuretools (#1507)

* initial make index updates

* add make_index logic back to Featuretools

* lint fix

* undo accidental file deletion

* fix file

* update check for warning

* PR feedback updates

* use es.dataframe_type

* fix outdated info in release notes (#1522)

* Remove entity tests (#1521)

* remove check time type

* remove commented out tests

* remove variable ordering test

* remove variable tests

* remove test

* remove commented out imports

* remove file

* Revert "remove check time type"

This reverts commit 6b3c5d3.

* update _check_time_type

* update docstring

* Standardize imports for Woodwork in codebase (#1526)

* direct imports of woodwork in codebase

* sort imports

* use ww_type_system

* Update DFS primitive matching to use ColumnSchema (#1523)

* update dfs primitive matching

* work on test_deep_feature_synthesis tests

* work on dfs tests

* fix more tests

* exclude foreign key cols from transform feats

* fix Trend

* more test updates

* more test work

* remove files

* fix dfs to match old features

* lint fix

* remove old print statement

* more naming updates

* update handling of foreign key columns

* lots of naming updates

* fix test names

* even more naming updates

* more cleanup and test fixes

* rename return_variable_types to return_types

* fix broken entityset tests

* pr naming updates

* remove unnecessary primitive

* add new _schemas_equal conditions

* lint fix

* Update doc page on using entity sets with Woodwork (#1532)

* refactor to jupyter notebook

* use dataframes

* use dataframe in comments

* remove rst file

* use EntitySet

* add link to Woodwork

* use target_dataframe_name

* update comment on adding relationships

* Updates from featuretools v0.26.0 (#1539)

* Change to use GitHub Token rather than GitHub PAT (#1402)

* Update dependency_check.yml

* Update release_notes.rst

* Update dependency_check.yml

* Use builtin secret token with create pull request (#1407)

* Use builtin secret token with create pull request

* Update release_notes.rst

* Use repo scoped token again (#1409)

* Use repo scoped token again

* Update release_notes.rst

* Update latest_dependencies.txt (#1410)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Lower max depth to 1 if single entity (#1412)

* lower DFS max depth to 1 if single entity

* add test for including seed features with depth greater than max depth

* test max depth=0 doesn't create depth 1 features on single table

* update release notes

* change log entry to user warning and test for warning

* lint

* fix max depth warning in docs

* remove outdated comment

* rework single table assertions to be more readable

* use feature_with_name helper in seed_features test

* lint

* add max_depth=None and max_depth=-1 cases to single table test

* move helper function def out of loop; remove invalid max_depth=None case

* lint

* Drop Python 3.6 support (#1413)

* remove py36 from CI test matrix

* remove warning when importing featuretools about dropping 3.6 support

* remove python 3.6 from setup.py

* remove py36 from list of supported version in installation docs

* remove py36 constraint on dependency

* update release notes

* v0.24.0 (#1414)

* bump version number

* update release notes

* Update latest_dependencies.txt (#1415)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Separate workflows and unit tests (#1422)

* separate workflows

* update release notes

* fix incorrect word

* update numpy req

* separate link check

* remove dask separation

* copy from main

* release notes

* Add minimum dependency generator GitHub Action (#1428)

* add min deps checker

* update release notes

* fix filename

* generate auto PR

* update latest dep check

* file rename

* better release notes

* move to 1 folder

* fix fastparquet?

* Update minimum dependencies (#1431)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Bump pyyaml from 3.12 to 5.4 in /featuretools/tests/requirement_files (#1433)

* Bump pyyaml from 3.12 to 5.4 in /featuretools/tests/requirement_files

Bumps [pyyaml](https://github.com/yaml/pyyaml) from 3.12 to 5.4.
- [Release notes](https://github.com/yaml/pyyaml/releases)
- [Changelog](https://github.com/yaml/pyyaml/blob/master/CHANGES)
- [Commits](yaml/pyyaml@3.12...5.4)

Signed-off-by: dependabot[bot] <support@github.com>

* Update requirements.txt

* Update release_notes.rst

* Update release_notes.rst

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* Update nbsphinx version to resolve docs build issue (#1436)

* update release note for test

* update release notes

* pin markupsafe version

* update nbsphinx version and remove markupsafe

* update release notes

* Update latest dependencies (#1437)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update latest dependencies (#1439)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Bump psutil requirement (#1438)

* Bump psutil requirement

* Update release_notes.rst

* Update minimum dependencies (#1443)

* Add unit tests against minimum dependencies (#1432)

* Fix numpy installation for minimum unit tests (#1445)

* Update latest dependencies (#1446)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update latest dependencies (#1448)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* v0.24.1 (#1450)

* Update latest dependencies (#1454)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update latest dependencies (#1455)

* Bump urllib3 from 1.26.4 to 1.26.5 in /featuretools/tests/requirement_files (#1457)

* Bump urllib3 in /featuretools/tests/requirement_files

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.4 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.4...1.26.5)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update test-requirements.txt

* Update release_notes.rst

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* Update alteryx_open_src_update_checker to 2.0.0 (#1460)

* Update setup.py

* Update __init__.py

* Update release_notes.rst

* Update setup.py

* Update install_test.yml

* double for loop

* Update latest dependencies (#1464)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Add get_valid_primitives function (#1462)

* add function skeleton

* add tests

* add get_valid_primitives and update tests

* add test and fix typo

* update release notes

* add test for non-str invalid primitive

* remove unused code from custom primitives

* lint

* remove unused var names and avoid erroring due to compatibility

* rework compatibility check

* make ft.get_valid_primitives callable, add to API reference, add note to docstring

* make get_entityset_type private

* Bump minimum pip from 19.0.2 to 21.1.2 (#1475)

* Bump pip from 19.0.2 to 19.2 in /featuretools/tests/requirement_files

Bumps [pip](https://github.com/pypa/pip) from 19.0.2 to 19.2.
- [Release notes](https://github.com/pypa/pip/releases)
- [Changelog](https://github.com/pypa/pip/blob/main/NEWS.rst)
- [Commits](pypa/pip@19.0.2...19.2)

---
updated-dependencies:
- dependency-name: pip
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update test-requirements.txt

* Update test-requirements.txt

* Update minimum_test_requirements.txt

* Update release_notes.rst

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>

* Add dataframe_type property to EntitySet (#1473)

* add dataframe_type property

* remove _get_entityset_type

* update if not pandas entityset checks in tests

* add docstring to dataframe_type

* update release notes

* rework dataframe_type logic

* add test cases

* use dataframe_type in more tests

* remove some unused ks imports

* more test updates

* fix faulty comparison in tests

* v0.25.0 (#1485)

* bump version number

* update release notes

* Update latest dependencies (#1487)

* Update latest dependencies (#1499)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix docs to avoid logging demos (#1498)

* set testing header to prevent logging

* add library to url

* release notes

* release notes

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* Update latest dependencies (#1500)

* Update latest dependencies (#1502)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update latest dependencies (#1503)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Add replace_inf_values util function (#1505)

* add replace_inf_values util function

* update release notes

* fix release notes

* add optional columns parameter to function

* lint fix

* Test compatibility with upcoming pandas release 1.3.0 (#1492)

* update requirements

* comment at local error

* fix test_transform error

* fix boolean conversion error

* remove requirements change

* fix timezone warning

* fix astype warning and use view

* Add release note

* Update latest dependencies (#1520)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Add URL and Email Address primitives (#1508)

* Update latest dependencies (#1524)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Primitive options include entities overrides exclude entities (#1518)

* update ignore_entity for primitive

* update variable_filter to return True if entity in include_entities

* update release notes

* Update TLD list and add license for email file (#1531)

* add license to primitive data

* update TLD list

* update release notes

* typo

* update TLD list

* v0.26.0 (#1525)

* bump version

* update release notes

* make underline longer

* alphabetize contributors

* Update docs/source/release_notes.rst

* Update docs/source/release_notes.rst

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* Update latest dependencies (#1534)

* uncomment future release

* replace target_entity in a few tests

* delete test_entity.py again

* fix include_over_exclude test

* put Fixes section back in the changelog

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>
Co-authored-by: machineFL <49695056+machineFL@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nate Parsons <4307001+thehomebrewnerd@users.noreply.github.com>
Co-authored-by: Jeff Hernandez <12969559+jeff-hernandez@users.noreply.github.com>
Co-authored-by: Frances Hartwell <frances.hartwell@alteryx.com>
Co-authored-by: Tamar Grey <64278226+tamargrey@users.noreply.github.com>
Co-authored-by: Ethan Tu <34871276+tuethan1999@users.noreply.github.com>

* fix entityset tests (#1548)

* Update DFS page to use Woodwork  (#1557)

* update dfs doc page

* remove old rst file

* clear outputs and remove outdated entity reference

* Update Feature Primitives page to use woodwork (#1556)

* update primitives docs

* revert graphs

* update references

* add link to woodwork type and tag guide

* Refactor add_interesting_values to leverage Woodwork (#1550)

* refactor add_interesting_values

* update getting semantic tags

* update verbose msg

* move 0 check

* use append instead of concat

* Update calculate feature matrix to work with Woodwork (#1533)

* update dfs primitive matching

* work on test_deep_feature_synthesis tests

* work on dfs tests

* fix more tests

* exclude foreign key cols from transform feats

* fix Trend

* more test updates

* more test work

* remove files

* fix dfs to match old features

* lint fix

* remove old print statement

* more naming updates

* update handling of foreign key columns

* lots of naming updates

* fix test names

* even more naming updates

* more cleanup and test fixes

* rename return_variable_types to return_types

* initial exploration

* work on tests in test_feature_set_calculator

* fix test_feature_set.py

* several more test fixes

* fix more tests

* fix topn test and lint fix

* fix latlong tests

* fix int_es tests

* Revert "fix int_es tests"

This reverts commit ae4d5a6.

* update synthesis tests

* lint fix

* fix primitive tests

* lint fix

* replace usage of entity

* lint fix

* remove more references to entities

* update variable naming

* update nmostcommon related tests

* update cfm docstring

* pr updates

* sub_dataframe -> sub_dataframe_name

* update dask test to use null val

* add new test for n_most_common

* update verbose msg

* fix typo

* wording updates

* dataframe_df_trie -> dataframe_trie

* Update handling time guide to use Woodwork (#1552)

* add handling time guide notebook with woodwork

* clean up notebook to use woodwork

* fix cfm lines and images

* suppress cells

* rename notebook and remove old rst file

* use reST format for raw cells

* update func link

* PR comments

* hide cells with nbsphinx

* PR comments

* Synthesis test fixes (#1580)

* update entityset time_type value

* fix two edge cases with encode features

* Fix remaining broken primitive tests (#1568)

* fix test_direct_features

* initial work on test_agg_feats

* fix test_groupby_transform_primitives

* fix test_features_serializer.py

* some work to fix dask tests

* bump min reqs

* Preserve EntitySet Woodwork schemas on pickling (#1581)

* fix entityset pickle to preserve woodwork schema

* add tests

* lint

* add cluster fixture and use constant for schema key

* use update instead of resetting _dict_

* update fixture name

* Update advanced custom primitives guide to use Woodwork (#1587)

* update custom primitives guide

* use woodwork specific typing info

* add link to ww typing guide and reference ColumnSchema

* Update Deployment docs page (#1588)

* update deployment.rst to ipynb

* add function links

* update link to rst instead of html

* Update Improving Performance Guide (#1591)

* update performance guide to use jupyter notebook and woodwork

* remove entity references and replace entity set with entityset

* Update Using Dask EntitySets Guide (#1590)

* update using Dask EntitySets guide

* PR feedback updates

* Update Specifying Primitive Options guide for Woodwork (#1593)

* update specifying primitive options doc

* clear notebook outputs

* update indentation

* Add Woodwork Typing Guide (#1589)

* create ww typing guide and start writing

* flesh out semantic tags section

* clean up ww types guide

* move to getting started

* revamp guide

* shorten guide

* add links

* pare down and proofread

* replace table usage with DataFrame

* PR comments

* Add to getting started rst

* rework semantic tags section

* Add release note

* clean up wording

* pr comments

* fix typo

* Update api reference to match new api (#1600)

* Updates for WW 0.6.0 and fix other failing tests (#1597)

* update requirements

* fix selection tests

* fix test_ww_es.py

* fix logical type comparisons

* fix test_encode_features.py

* fix test_deep_feature_synthesis.py

* fix test_feature_set_calculator.py

* fix test_dask_es.py

* fix test in test_es.py

* fix selection test

* add test skips

* update requirements

* update requirements

* update dask reqs

* more requirements updates

* bump pandas min version

* bump koalas version

* eliminate _schemas_equal func

* fix post merge issues

* fix release notes

* Update index doc to use Woodwork (#1602)

* Create notebook and add contents from index.rst

* walk through cells and update to use markdown and woodwork

* hide and format raw cells

* remove index rst file

* PR comments

* Fix DFSTransformer Documentation (#1605)

* update featuretools-sklearn-transformer to install from branch

* update version to 1.0.0.dev0

* Update feature description guide to use Woodwork (#1603)

* create feature description notebook and move contents from rst file

* superficial updating of code and language

* make sure outputs are as expected and that language makes more sense for woodwork

* format links and headers

* remove comments and hide cell

* remove rst file

* PR comments

* reword warning about getitem usage

* Update Koalas Guide to use Woodwork (#1604)

* Add koalas guide notebook and add contents from rst file

* hide cell and show rst dropdown in metadata

* update to use woodwork language

* clean up

* remove rst file

* change variable_name to column_name

* PR Comments

* fix link to woodwork guide

* remove extra spaces

* Update Glossary with Woodwork terms (#1608)

* update glossary page

* add logical type and semantic tags to glossary

* small updates and add ColumnSchema

* remove old todos

* woodwork's column -> woodwork column's

* Update tuning dfs guide to use Woodwork (#1610)

* Move contents of rst file into jupyter notebook

* get code running

* update wording to use woodwork

* clean up

* Proof read

* remove rst file

* use lower case feature

* update nlp-primitives requirement (#1609)

* Remove more references to entity, entities, variable, var (#1612)

* update flight.py

* update retail.py

* update entity and entityset wording

* more entity updates

* rename variable to column

* lint fix

* rename var to col

* remove variable usage

* more wording updates

* remove graph variable types related code

* add comment back

* Fix small formatting issues around Woodwork docs (#1607)

* fix code in docstrings

* Fix dataframes dict formatting in docstrings

* fix links in handling time

* fix link in primitives doc

* fix link to ww guide in advanced custom primitives

* fix links from referencing rst files to notebooks

* fix add lti docstring

* use ref anchor instead of doc

* fix formatting

* fix typo

* lint fix

* remove variables doc and reference to variables (#1629)

* Remove categorical encoding library and CI test (#1632)

* remove categorical_encoding

* update release notes

* Remove autonormalize add-on library and CI test (#1636)

* Update install_test.yml

* Update install.rst

* Update setup.py

* Update release_notes.rst

* Update dev-requirements.txt

* remove faq autonormalize q

* Update release_notes.rst

* Remove tsfresh, nlp_primitives, sklearn_transformer add-on library and CI test (#1638)

* remove add ons

* Update dev-requirements.txt

* Update release_notes.rst

* remove docs DFStransformer

* Update api_reference.rst

* Use make index to re-create index on new DataFrame in EntitySet.replace_dataframe (#1630)

* Add ability to create index at update_dataframe

* change update_dataframe to replace_dataframe

* update docstrings

* expand docstring

* fix release notes

* dont raise warning if index is present and split test

* Revert changes to Equal and NotEqual primitives (#1640)

* update flight.py

* update retail.py

* revert changes to Equal and NotEqual primitives

* Update Feature Selection Page with Woodwork Dataframe (#1618)

* Changes to the feature selection doc

* cleared outputs

* Fixed woodwork initialization issues

* update docs

* clear notebook

* stop skipping correlated check

* test woodwork init in highly correlated

* update release notes

* PR comments

* add note about ww init to docstring

* change to rst cell

Co-authored-by: Tamar Grey <tamar.grey@alteryx.com>

* Merge in latest from main branch (#1643)

* Specify conda channel and Windows exe in graphviz installation instructions (#1611)

* Update latest dependencies (#1615)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update latest dependencies (#1616)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Remove GA from documentation layout html (#1622)

* Update layout.html

* Update release_notes.rst

* v0.26.2 (#1628)

* v0.26.2

* Update release_notes.rst

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* update release notes

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>
Co-authored-by: machineFL <49695056+machineFL@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Pranav Simha <pranav0521@gmail.com>

* Update to use Woodwork 0.7.1 (#1648)

* update requirements to 0.7.0

* Add comments and use full schema init

* switch back to partial schema init with unique index bug and test mismatched index bug fix

* remove redundant dataframe operations in replace_dataframe

* init in last time index with partial schema

* use AgeFractional primitive

* update requirements.txt

* update min requirements

* PR comments

* Update CumCount primitive (#1651)

* update CumCount primitive

* use IntegerNullable return type

* update release notes

* Create features from Woodwork columns (#1582)

* add reference

* track dataframe name

* refactor feature base

* fix refactor

* remove extra line

* update api for identity feature

* use api for identity feature

* refactor api for feature

* refactor api for feature class

* refactor api for feature class

* fix syntax

* add es ref to df

* fix feature base init arg

* fix syntax

* fix transform super init df

* refactor to private method

* refactor api for identity feature

* fix api for direct feature

* add references when updating df

* refactor api for feature

* store entityset reference keys in metadata

* refactor feature check

* check feature in groupby transform

* add references after last time index

* refactor api for feature

* update entityset ref

* refactor api for feature

* lint fix

* use global es ref

* use _validate_base_features

* update feature base docstring

* reference column and foreign key

* remove double call of feature

* use column reference

* update notebook cells

* update notebook cell

* add release notes entry (include missing)

* Update FAQ with Woodwork (#1649)

* update cells to new api

* use new api

* add question to faq

* use woodwork references

* update comments

* remove usage of entity and entities

* fix typo

* update faq question

* fix bullet points

* fix link

* fix grammar

* shorten sentence

* update answer for dask df

* update to numeric and boolean values

* remove then

* include count in agg and where primitives

* update sentence

* reorder cells

* fix grammar

* add comment about semantic tags

* add placeholder link

* update error message

* clarify sentence

* update comments

* fix grammar

* update comments

* add link

* fix link

* add release note entry

* Add missing release notes (#1663)

* Add pr number for breaking changes

* start adding missing release notes

* finish adding missing release notes

* PR Comments

* Merge updates from main (#1666)

* Specify conda channel and Windows exe in graphviz installation instructions (#1611)

* Update latest dependencies (#1615)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update latest dependencies (#1616)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Remove GA from documentation layout html (#1622)

* Update layout.html

* Update release_notes.rst

* v0.26.2 (#1628)

* v0.26.2

* Update release_notes.rst

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* Remove add on libraries from install docs, setup.py and CI tests (#1644)

* remove add ons

* update release notes

* fix dev req

* fix api ref

* add scikit-learn to dev reqs

* Update latest dependency checker with proper install comment (#1652)

* Update latest_dependency_checker.yml

* Update release_notes.rst

* Update release_notes.rst

* Update latest_dependency_checker.yml

* Isort 5 (#1654)

* Update isort requirement

* no more isort --recursive (deprecated)

* Update release_notes.rst

* Update latest dependencies (#1653)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* Update primitives loading to be more robust (#1662)

* Update primitives loading to be more robust

* Emit warning when primitives entry point throws error
* Prevent overwriting of names in primitive namespace

* Update release notes

* Add no cover to entry_point loop

* v0.27.0 (#1665)

* bump version

* update release notes

* add contributor

* fix release notes

* fix release notes

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>
Co-authored-by: machineFL <49695056+machineFL@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Pranav Simha <pranav0521@gmail.com>
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
Co-authored-by: David Sanders <david.sanders@alteryx.com>
Co-authored-by: Jeff Hernandez <12969559+jeff-hernandez@users.noreply.github.com>

* Add Version 1.0 Transition Guide (#1627)

* update flight.py

* update retail.py

* start working on guide

* move guide to resources

* more transition guide work on entitysets

* update primitives and other info sections

* update feature section

* update dfs and cfm section

* final draft updates

* fix various spelling and capitalization issues

* improve wording and hide cell

* more pr clean up

* additional context

* relationship wording

* various PR fixes and additions

* Update docs/source/resources/transition_to_ft_v1.0.ipynb

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* Update docs/source/resources/transition_to_ft_v1.0.ipynb

Co-authored-by: Jeff Hernandez <12969559+jeff-hernandez@users.noreply.github.com>

* update link

* move mapping and add link

* update release notes

* update why make these changes

* update what has changed section

* remove old comment

* add table of significant changes

* remove blank line

* remove links

* remove code formatting

* fix table

* uncomment code

* remove code formatting from tables

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>
Co-authored-by: Jeff Hernandez <12969559+jeff-hernandez@users.noreply.github.com>

* update release notes

* update release notes

* Calculate feature matrix returns woodwork dataframe (#1664)

* initialize ww on feature matrix

* lint

* add origin attribute

* init ww on each partial feature matrix; update test answers

* update primitive return logical types

* fix some approximate tests

* fix for Koalas append issue

* bin cutoff times expects ww dataframe

* make helper and fix concat in parallel_calculate_chunks

* make a copy of semantic tags before modifying

* various test fixes + lint fix + revert to_pandas changes

* copy semantic tags

* fix test_boolean_multiply

* fix test_koalas_dfs.py

* make CumCount IntegerNullable

* add more to_pandas casts to test

* fix test_no_data_for_cutoff_time

* lint fix

* init ww on encoded feature matrix

* lint fix

* use None return type for NMostCommon

* fix test_init_and_name

* swap possible input order of PercentTrue to fix test

* make answer data a dataframe for easier comparison

* fix dask single table test

* fix test_transform_consistency

* fix test_dask_entityset_secondary_time_index

* fix test_approximate_features test

* fix test_features_only

* fix category dtype check

* new s3 urls for serialized objects

* update to Week ordinal to account for 53 week years

* add docstring and default args to get_ww_types

* update woodwork syntax in get_ww_types_from_features

* ordinal prims: fix order, specify order param

* calculate_chunk: single concat and then init ww

* update release notes

* fix range in hour primitive

* encode_features: use defaults in get_ww_types_from_features

* update label leakage example in docs faq

* make IsWeekend return type BooleanNullable

* enable dask test for test_concat_with_lti

* remove skip from koalas test

* include labels in test feature matrix

Co-authored-by: Nate Parsons <nate.parsons@alteryx.com>

* Fix typos in transition guide (#1672)

* fix typos in transition guide

* Add release note

* Fix foreign_key tag bug (#1675)

* Add banner to all docs pages about upcoming 1.0 release (#1669)

* add banner about FT1.0

* update release notes

* Update docs/source/templates/layout.html

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* update banner message

* update transition guide link

* update wording

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* v0.27.1 (#1671)

* v0.27.1

* update setup.py

* fix foreign key tag bug

* update release notes

Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>

* Merge code from main removing old code (#1679)

* Automated Latest Dependency Updates (#1673)

* Remove old categorical code (#1677)

* remove old code

* update release notes

* spelling error

* fix merge issue

* reorg release notes

* lint fix?

* move future release

Co-authored-by: machineFL <49695056+machineFL@users.noreply.github.com>

* Remove unused utility functions (#1683)

* remove unused _dataframes_equal function

* remove unused camel_to_snake func

* update release notes

* lint fix

* Update WW version to 0.8.0 (#1689)

* bump woodwork requirement

* update release notes

* update other requirements files

* Encode features - remove typecasting loop now handled by ww (#1694)

* remove redundant coercion; woodwork init will cover this

* update release notes

* Update DFS to not build features on last time index columns (#1695)

* bump woodwork requirement

* don't build features on lti columns

* update release notes

* skip lti col in _add_identity_features instead

* Review comments and commented code and clean up (#1701)

* initial comment clean up

* more clean up

* update release notes

* spelling fix

* fix values

* Encode Features - prefer runtime over space if not inplace (#1699)

* encode feats - concat once and skip drop if not inplace

* use existing ww schema to skip infer on unchanged columns

* update release notes

* request the columns dictionary once

* Bump Woodwork min version to 0.8.1 (#1702)

* bump ww min version to 0.8.1

* update release notes

* fix koalas file

* combine release notes sections (#1703)

* fix README dfs param

Co-authored-by: Jeff Hernandez <12969559+jeff-hernandez@users.noreply.github.com>
Co-authored-by: Roy Wedge <roy.wedge@alteryx.com>
Co-authored-by: Tamar Grey <64278226+tamargrey@users.noreply.github.com>
Co-authored-by: Gaurav Sheni <gvsheni@gmail.com>
Co-authored-by: machineFL <49695056+machineFL@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Frances Hartwell <frances.hartwell@alteryx.com>
Co-authored-by: Ethan Tu <34871276+tuethan1999@users.noreply.github.com>
Co-authored-by: Pranav Simha <pranav0521@gmail.com>
Co-authored-by: Tamar Grey <tamar.grey@alteryx.com>
Co-authored-by: David Sanders <david.sanders@alteryx.com>
@thehomebrewnerd thehomebrewnerd mentioned this pull request Sep 17, 2021
@thehomebrewnerd thehomebrewnerd mentioned this pull request Oct 12, 2021