- Future Release
  - Enhancements
    - Support use of Koalas DataFrames in entitysets (1031)
    - Add feature selection functions for null, correlated, and single value features (1126)
  - Fixes
    - Fix encode_features converting excluded feature columns to a numeric dtype (1123)
    - Improve performance of unused primitive check in dfs (1140)
  - Changes
    - Remove the ability to stack transform primitives (1119)
    - Sort primitives passed to dfs to get consistent ordering of features* (1119)
  - Documentation Changes
    - Added return values to dfs and calculate_feature_matrix (1125)
  - Testing Changes
    - Better test case for normalizing from no time index to time index (1113)
  * When passing multiple instances of a primitive built with make_trans_primitive or make_agg_primitive, those instances must have the same relative order when passed to dfs to ensure a consistent ordering of features.
  Thanks to the following people for contributing to this release: tamargrey, gsheni, rwedge, frances-h, tuethan1999, thehomebrewnerd
  Breaking Changes
    - ft.dfs will no longer build features from Transform primitives where one of the inputs is a Transform feature, a GroupByTransform feature, or a Direct Feature of a Transform / GroupByTransform feature. This will make some features that would previously be generated by ft.dfs only possible if explicitly specified in seed_features.
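The footnote on primitive ordering in this release can be illustrated with a small sketch. It assumes, purely for illustration, that primitives are sorted by name with a stable sort; the `Primitive` tuple and `sort_primitives` helper are hypothetical stand-ins, not the Featuretools implementation. Under that assumption, same-named custom primitives come out in whatever relative order they were passed in, which is why that order has to be kept consistent.

```python
from collections import namedtuple

# Hypothetical stand-in for a primitive instance; only the name drives ordering.
Primitive = namedtuple("Primitive", ["name", "tag"])


def sort_primitives(primitives):
    """Sort by name. Python's sort is stable, so equal names keep input order."""
    return sorted(primitives, key=lambda p: p.name)


first_run = sort_primitives([
    Primitive("mean", "builtin"),
    Primitive("custom", "a"),
    Primitive("custom", "b"),
])
second_run = sort_primitives([
    Primitive("custom", "b"),
    Primitive("mean", "builtin"),
    Primitive("custom", "a"),
])

# Distinct names always end up in the same order, but the two same-named
# "custom" instances follow the order in which they were passed.
print([p.tag for p in first_run])
print([p.tag for p in second_run])
```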
- v0.18.1 Aug 12, 2020
  - Fixes
    - Fix EntitySet.plot() when given a dask entityset (1086)
  - Changes
    - Use nlp-primitives[complete] install for nlp_primitives extra in setup.py (1103)
  - Documentation Changes
    - Fix broken downloads badge in README.md (1107)
  - Testing Changes
    - Use CircleCI matrix jobs in config to trigger multiple runs of same job with different parameters (1105)
  Thanks to the following people for contributing to this release: gsheni, systemshift, thehomebrewnerd
- v0.18.0 July 31, 2020
  - Enhancements
    - Warn user if supplied primitives are not used during dfs (1073)
  - Fixes
    - Use more consistent and uniform warnings (1040)
    - Fix issue with missing instance ids and categorical entity index (1050)
    - Remove warnings.simplefilter in feature_set_calculator to un-silence warnings (1053)
    - Fix feature visualization for features with '>' or '<' in name (1055)
    - Fix boolean dtype mismatch between encode_features and dfs and calculate_feature_matrix (1082)
    - Update primitive options to check reversed inputs if primitive is commutative (1085)
    - Fix inconsistent ordering of features between kernel restarts (1088)
  - Changes
    - Make DFS match TimeSince primitive with all Datetime types (1048)
    - Change default branch to main (1038)
    - Raise TypeError if improper input is supplied to Entity.delete_variables() (1064)
    - Updates for compatibility with pandas 1.1.0 (1079, 1089)
    - Set pandas version to pandas>=0.24.1,<2.0.0. Filter pandas deprecation warning in Week primitive. (1094)
  - Documentation Changes
    - Remove benchmarks folder (1049)
    - Add custom variables types section to variables page (1066)
  - Testing Changes
    - Add fixture for ft.demo.load_mock_customer (1036)
    - Refactor Dask test units (1052)
    - Implement automated process for checking critical dependencies (1045, 1054, 1081)
    - Don't run changelog check for release PRs or automated dependency PRs (1057)
    - Fix non-deterministic behavior in Dask test causing codecov issues (1070)
  Thanks to the following people for contributing to this release: frances-h, gsheni, monti-python, rwedge, systemshift, tamargrey, thehomebrewnerd, wsankey
- v0.17.0 June 30, 2020
  - Enhancements
    - Add list_variable_types and graph_variable_types for Variable Types (1013)
    - Add graph_feature to generate a feature lineage graph for a given feature (1032)
  - Fixes
    - Improve warnings when using a Dask dataframe for cutoff times (1026)
    - Error if attempting to add entityset relationship where child variable is also child index (1034)
  - Changes
    - Remove Feature.get_names (1021)
    - Remove unnecessary pd.Series and pd.DatetimeIndex calls from primitives (1020, 1024)
    - Improve cutoff time handling when a single value or no value is passed (1028)
    - Moved find_variable_types to Variable utils (1013)
  - Documentation Changes
    - Add page on Variable Types to describe some Variable Types, and util functions (1013)
    - Remove featuretools enterprise from documentation (1022)
    - Add development install instructions to contributing.md (1030)
  - Testing Changes
    - Add required flag to CircleCI codecov upload command (1035)
  Thanks to the following people for contributing to this release: frances-h, gsheni, kmax12, rwedge, thehomebrewnerd, tuethan1999
  Breaking Changes
    - Removed Feature.get_names; Feature.get_feature_names should be used instead
- v0.16.0 June 5, 2020
  - Enhancements
    - Support use of Dask DataFrames in entitysets (783)
    - Add make_index when initializing an EntitySet by passing in an entities dictionary (1010)
    - Add ability to use primitive classes and instances as keys in primitive_options dictionary (993)
  - Fixes
    - Cleanly close tqdm instance (1018)
    - Resolve issue with NaN values in LatLong columns (1007)
  - Testing Changes
    - Update tests for numpy v1.19.0 compatibility (1016)
  Thanks to the following people for contributing to this release: Alex-Monahan, frances-h, gsheni, rwedge, thehomebrewnerd
- v0.15.0 May 29, 2020
  - Enhancements
    - Add get_default_aggregation_primitives and get_default_transform_primitives (945)
    - Allow cutoff time dataframe columns to be in any order (969, 995)
    - Add Age primitive, and make it a default transform primitive for DFS (987)
    - Add include_cutoff_time arg - control whether data at cutoff times are included in feature calculations (959)
    - Allow variables_types to be referenced by their type_string for the entity_from_dataframe function (988)
  - Fixes
    - Fix errors with Equals and NotEquals primitives when comparing categoricals or different dtypes (968)
    - Normalized type_strings of Variable classes so that the find_variable_types function produces a dictionary with a clear key to name transition (982, 996)
    - Remove pandas.datetime in test_calculate_feature_matrix due to deprecation (998)
  - Documentation Changes
    - Add python 3.8 support for docs (983)
    - Adds consistent Entityset Docstrings (986)
  - Testing Changes
    - Add automated tests for python 3.8 environment (847)
    - Update testing dependencies (976)
  Thanks to the following people for contributing to this release: ctduffy, frances-h, gsheni, jeff-hernandez, rightx2, rwedge, sebrahimi1988, thehomebrewnerd, tuethan1999
  Breaking Changes
    - Calls to featuretools.dfs or featuretools.calculate_feature_matrix that use a cutoff time dataframe, but do not label the time column with either the target entity time index variable name or as time, will now result in an AttributeError. Previously, the time column was selected to be the first column that was not the instance id column. With this update, the position of the column in the dataframe is no longer used to determine the time column. Now, both instance id columns and time columns in a cutoff time dataframe can be in any order as long as they are named properly.
    - The type_string attributes of all Variable subclasses are now a snake case conversion of their class names. This changes the type_string of the Unknown, IPAddress, EmailAddress, SubRegionCode, FilePath, LatLong, and ZIPcode classes. Old saved entitysets that used these variables may load incorrectly.
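The cutoff time rule in the first breaking change above can be sketched in a few lines. This is a minimal illustration of selecting the time column by name rather than by position; `resolve_time_column` is a hypothetical helper, and checking the target entity's time index name before `time` is an assumption, not something the release notes specify.

```python
def resolve_time_column(columns, target_time_index):
    """Pick the cutoff time column by name, never by position (hypothetical helper)."""
    # Assumption: the target entity's time index name is checked before "time".
    for candidate in (target_time_index, "time"):
        if candidate in columns:
            return candidate
    raise AttributeError(
        "Cutoff time dataframe must have a column named %r or 'time'" % target_time_index
    )


# Column order does not matter as long as the names are right.
print(resolve_time_column(["time", "customer_id"], "join_date"))
print(resolve_time_column(["join_date", "label", "customer_id"], "join_date"))

# A time column that could only be identified by its position is rejected.
try:
    resolve_time_column(["customer_id", "cutoff"], "join_date")
except AttributeError as err:
    print("AttributeError:", err)
```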
- v0.14.0 Apr 30, 2020
  - Enhancements
    - ft.encode_features - use less memory for one-hot encoded columns (876)
  - Fixes
    - Use logger.warning to fix deprecated logger.warn (871)
    - Add dtype to interesting_values to fix deprecated empty Series with no dtype (933)
    - Remove overlap in training windows (930)
    - Fix progress bar in notebook (932)
  - Changes
    - Change premium primitives CI test to Python 3.6 (916)
    - Remove Python 3.5 support (917)
  - Documentation Changes
    - Fix README links to docs (872)
    - Fix Github links with correct organizations (908)
    - Fix hyperlinks in docs and docstrings with updated address (910)
    - Remove unused script for uploading docs to AWS (911)
  Thanks to the following people for contributing to this release: frances-h, gsheni, jeff-hernandez, rwedge
  Breaking Changes
    - Using training windows in feature calculations can result in different values than previous versions. This was done to prevent consecutive training windows from overlapping by excluding data at the oldest point in time. For example, if we use a cutoff time at the first minute of the hour with a one hour training window, the first minute of the previous hour will no longer be included in the feature calculation.
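The training window change can be pictured as moving from a closed interval to a half-open one. The sketch below is an illustration only: `in_training_window` is a hypothetical helper, and it assumes that data stamped exactly at the cutoff time is included in the window.

```python
from datetime import datetime, timedelta


def in_training_window(timestamp, cutoff, window, include_oldest=False):
    """Return True if timestamp falls in the training window ending at cutoff.

    include_oldest=True mimics the earlier behavior, where the oldest point
    (cutoff - window) was also part of the window.
    """
    start = cutoff - window
    if include_oldest:
        return start <= timestamp <= cutoff
    return start < timestamp <= cutoff


cutoff = datetime(2020, 4, 30, 10, 0)
window = timedelta(hours=1)
previous_hour = datetime(2020, 4, 30, 9, 0)

# The first minute of the previous hour was included before, and is excluded now.
print(in_training_window(previous_hour, cutoff, window, include_oldest=True))
print(in_training_window(previous_hour, cutoff, window))

# A row stamped 10:00 belongs to the 10:00 window only, not also to the 11:00 one.
print(in_training_window(cutoff, cutoff, window))
print(in_training_window(cutoff, cutoff + window, window))
```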
- v0.13.4 Mar 27, 2020
  Warning: The next non-bugfix release of Featuretools will not support Python 3.5
  - Fixes
    - Fix ft.show_info() not displaying in Jupyter notebooks (863)
  - Changes
    - Added Plugin Warnings at Entry Point (850, 869)
  - Documentation Changes
    - Add links to primitives.featurelabs.com (860)
    - Add source code links to API reference (862)
    - Update links for testing Dask/Spark integrations (867)
    - Update release documentation for featuretools (868)
  - Testing Changes
    - Miscellaneous changes (861)
  Thanks to the following people for contributing to this release: frances-h, FreshLeaf8865, jeff-hernandez, rwedge, thehomebrewnerd
- v0.13.3 Feb 28, 2020
  - Fixes
    - Fix a connection closed error when using n_jobs (853)
  - Changes
    - Pin msgpack dependency for Python 3.5; remove dataframe from Dask dependency (851)
  - Documentation Changes
    - Update link to help documentation page in Github issue template (855)
  Thanks to the following people for contributing to this release: frances-h, rwedge
- v0.13.2 Jan 31, 2020
  - Enhancements
    - Support for Pandas 1.0.0 (844)
  - Changes
    - Remove dependency on s3fs library for anonymous downloads from S3 (825)
  - Testing Changes
    - Added GitHub Action to automatically run performance tests (840)
  Thanks to the following people for contributing to this release: frances-h, rwedge
- v0.13.1 Dec 28, 2019
  - Fixes
    - Raise error when given wrong input for ignore_variables (826)
    - Fix multi-output features not created when there is no child data (834)
    - Removing type casting in Equals and NotEquals primitives (504)
  - Changes
    - Replace pd.timedelta time units that were deprecated (822)
    - Move sklearn wrapper to separate library (835, 837)
  - Testing Changes
    - Run unit tests in windows environment (790)
    - Update boto3 version requirement for tests (838)
  Thanks to the following people for contributing to this release: jeffzi, kmax12, rwedge, systemshift
- v0.13.0 Nov 30, 2019
  - Enhancements
    - Added GitHub Action to auto upload releases to PyPI (816)
  - Fixes
    - Fix issue where some primitive options would not be applied (807)
    - Fix issue with converting to pickle or parquet after adding interesting features (798, 823)
    - Diff primitive now calculates using all available data (824)
    - Prevent DFS from creating Identity Features of globally ignored variables (819)
  - Changes
    - Remove python 2.7 support from serialize.py (812)
    - Make smart_open, boto3, and s3fs optional dependencies (827)
  - Documentation Changes
    - remove python 2.7 support and add 3.7 in install.rst (805)
    - Fix import error in docs (803)
    - Fix release title formatting in changelog (806)
  - Testing Changes
    - Use multiple CPUS to run tests on CI (811)
    - Refactor test entityset creation to avoid saving to disk (813, 821)
    - Remove get_values() from test_es.py to remove warnings (820)
  Thanks to the following people for contributing to this release: frances-h, jeff-hernandez, rwedge, systemshift
  Breaking Changes
    - The libraries used for downloading or uploading from S3 or URLs are now optional and will no longer be installed by default. To use this functionality they will need to be installed separately.
    - The fix to how the Diff primitive is calculated may slow down the overall calculation time of feature lists that use this primitive.
- v0.12.0 Oct 31, 2019
  - Enhancements
    - Added First primitive (770)
    - Added Entropy aggregation primitive (779)
    - Allow custom naming for multi-output primitives (780)
  - Fixes
    - Prevents user from removing base entity time index using additional_variables (768)
    - Fixes error when a multioutput primitive was supplied to dfs as a groupby trans primitive (786)
  - Changes
    - Drop Python 2 support (759)
    - Add unit parameter to AvgTimeBetween (771)
    - Require Pandas 0.24.1 or higher (787)
  - Documentation Changes
    - Update featuretools slack link (765)
    - Set up repo to use Read the Docs (776)
    - Add First primitive to API reference docs (782)
  - Testing Changes
    - CircleCI fixes (774)
    - Disable PIP progress bars (775)
  Thanks to the following people for contributing to this release: ablacke-ayx, BoopBoopBeepBoop, jeffzi, kmax12, rwedge, thehomebrewnerd, twdobson
- v0.11.0 Sep 30, 2019
  Warning: The next non-bugfix release of Featuretools will not support Python 2
  - Enhancements
    - Improve how files are copied and written (721)
    - Add number of rows to graph in entityset.plot (727)
    - Added support for pandas DateOffsets in DFS and Timedelta (732)
    - Enable feature-specific top_n value using a dictionary in encode_features (735)
    - Added progress_callback parameter to dfs() and calculate_feature_matrix() (739, 745)
    - Enable specifying primitives on a per column or per entity basis (748)
  - Fixes
    - Fixed entity set deserialization (720)
    - Added error message when DateTimeIndex is a variable but not set as the time_index (723)
    - Fixed CumCount and other group-by transform primitives that take ID as input (733, 754)
    - Fix progress bar undercounting (743)
    - Updated training_window error assertion to only check against observations (728)
    - Don't delete the whole destination folder while saving entityset (717)
  - Changes
    - Raise warning and not error on schema version mismatch (718)
    - Change feature calculation to return in order of instance ids provided (676)
    - Removed time remaining from displayed progress bar in dfs() and calculate_feature_matrix() (739)
    - Raise warning in normalize_entity() when time_index of base_entity has an invalid type (749)
    - Remove toolz as a direct dependency (755)
    - Allow boolean variable types to be used in the Multiply primitive (756)
  - Documentation Changes
    - Updated URL for Compose (716)
  - Testing Changes
    - Update dependencies (738, 741, 747)
  Thanks to the following people for contributing to this release: angela97lin, chidauri, christopherbunn, frances-h, jeff-hernandez, kmax12, MarcoGorelli, rwedge, thehomebrewnerd
  Breaking Changes
    - Feature calculations will return in the order of instance ids provided instead of the order of time points instances are calculated at.
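As a rough picture of this ordering change, the sketch below reorders calculated rows to follow the instance ids that were requested. `order_by_instance_ids` is a hypothetical helper, and the sketch assumes each instance id appears exactly once; it is not how Featuretools implements the change.

```python
def order_by_instance_ids(rows, instance_ids):
    """Return calculated rows in the order of instance_ids (hypothetical helper).

    rows is a list of (instance_id, cutoff_time, value) tuples in the order
    they were computed. Assumes each instance id appears exactly once.
    """
    by_id = {row[0]: row for row in rows}
    return [by_id[instance_id] for instance_id in instance_ids]


# Rows as they might come out when calculated in order of cutoff time.
calculated = [
    (3, "2019-01-01", 0.5),
    (1, "2019-02-01", 1.5),
    (2, "2019-03-01", 2.5),
]

# The result follows the requested ids, not the time points.
print(order_by_instance_ids(calculated, instance_ids=[1, 2, 3]))
```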
- v0.10.1 Aug 25, 2019
  - Fixes
    - Fix serialized LatLong data being loaded as strings (712)
  - Documentation Changes
    - Fixed FAQ cell output (710)
  Thanks to the following people for contributing to this release: gsheni, rwedge
- v0.10.0 Aug 19, 2019
  Warning: The next non-bugfix release of Featuretools will not support Python 2
  - Enhancements
    - Give more frequent progress bar updates and update chunk size behavior (631, 696)
    - Added drop_first as param in encode_features (647)
    - Added support for stacking multi-output primitives (679)
    - Generate transform features of direct features (623)
    - Added serializing and deserializing from S3 and deserializing from URLs (685)
    - Added nlp_primitives as an add-on library (704)
    - Added AutoNormalize to Featuretools plugins (699)
    - Added functionality for relative units (month/year) in Timedelta (692)
    - Added categorical-encoding as an add-on library (700)
  - Fixes
    - Fix performance regression in DFS (637)
    - Fix deserialization of feature relationship path (665)
    - Set index after adding ancestor relationship variables (668)
    - Fix user-supplied variable_types modification in Entity init (675)
    - Don't calculate dependencies of unnecessary features (667)
    - Prevent normalize entity's new entity having same index as base entity (681)
    - Update variable type inference to better check for string values (683)
  - Changes
    - Moved dask, distributed imports (634)
  - Documentation Changes
    - Miscellaneous changes (641, 658)
    - Modified doc_string of top_n in encoding (648)
    - Hyperlinked ComposeML (653)
    - Added FAQ (620, 677)
    - Fixed FAQ question with multiple question marks (673)
  - Testing Changes
    - Add master, and release tests for premium primitives (660, 669)
    - Miscellaneous changes (672, 674)
  Thanks to the following people for contributing to this release: alexjwang, allisonportis, ayushpatidar, CJStadler, ctduffy, gsheni, jeff-hernandez, jeremyliweishih, kmax12, rwedge, zhxt95
- v0.9.1 July 3, 2019
  - Enhancements
    - Speedup groupby transform calculations (609)
    - Generate features along all paths when there are multiple paths between entities (600, 608)
  - Fixes
    - Select columns of dataframe using a list (615)
    - Change type of features calculated on Index features to Categorical (602)
    - Filter dataframes through forward relationships (625)
    - Specify Dask version in requirements for python 2 (627)
    - Keep dataframe sorted by time during feature calculation (626)
    - Fix bug in encode_features that created duplicate columns of features with multiple outputs (622)
  - Changes
    - Remove unused variance_selection.py file (613)
    - Remove Timedelta data param (619)
    - Remove DaysSince primitive (628)
  - Documentation Changes
    - Add installation instructions for add-on libraries (617)
    - Clarification of Multi Output Feature Creation (638)
    - Miscellaneous changes (632, 639)
  - Testing Changes
    - Miscellaneous changes (595, 612)
  Thanks to the following people for contributing to this release: CJStadler, kmax12, rwedge, gsheni, kkleidal, ctduffy
- v0.9.0 June 19, 2019
  - Enhancements
    - Add unit parameter to timesince primitives (558)
    - Add ability to install optional add on libraries (551)
    - Load and save features from open files and strings (566)
    - Support custom variable types (571)
    - Support entitysets which have multiple paths between two entities (572, 544)
    - Added show_info function, more output information added to CLI featuretools info (525)
  - Fixes
    - Normalize_entity specifies error when 'make_time_index' is an invalid string (550)
    - Schema version added for entityset serialization (586)
    - Renamed features have names correctly serialized (585)
    - Improved error message for index/time_index being the same column in normalize_entity and entity_from_dataframe (583)
    - Removed all mentions of allow_where (587, 588)
    - Removed unused variable in normalize entity (589)
    - Change time since return type to numeric (606)
  - Changes
    - Refactor get_pandas_data_slice to take single entity (547)
    - Updates TimeSincePrevious and Diff Primitives (561)
    - Remove unnecessary time_last variable (546)
  - Documentation Changes
    - Add Featuretools Enterprise to documentation (563)
    - Miscellaneous changes (552, 573, 577, 599)
  - Testing Changes
    - Miscellaneous changes (559, 569, 570, 574, 584, 590)
  Thanks to the following people for contributing to this release: alexjwang, allisonportis, CJStadler, ctduffy, gsheni, kmax12, rwedge
- v0.8.0 May 17, 2019
  - Rename NUnique to NumUnique (510)
  - Serialize features as JSON (532)
  - Drop all variables at once in normalize_entity (533)
  - Remove unnecessary sorting from normalize_entity (535)
  - Features cache their names (536)
  - Only calculate features for instances before cutoff (523)
  - Remove all relative imports (530)
  - Added FullName Variable Type (506)
  - Add error message when target entity does not exist (520)
  - New demo links (542)
  - Remove duplicate features check in DFS (538)
  - featuretools_primitives entry point expects list of primitive classes (529)
  - Update ALL_VARIABLE_TYPES list (526)
  - More Informative N Jobs Prints and Warnings (511)
  - Update sklearn version requirements (541)
  - Update Makefile (519)
  - Remove unused parameter in Entity._handle_time (524)
  - Remove build_ext code from setup.py (513)
  - Documentation updates (512, 514, 515, 521, 522, 527, 545)
  - Testing updates (509, 516, 517, 539)
  Thanks to the following people for contributing to this release: bphi, CharlesBradshaw, CJStadler, glentennis, gsheni, kmax12, rwedge
  Breaking Changes
    - NUnique has been renamed to NumUnique.
      Previous behavior
          from featuretools.primitives import NUnique
      New behavior
          from featuretools.primitives import NumUnique
- v0.7.1 Apr 24, 2019
  - Automatically generate feature name for controllable primitives (481)
  - Primitive docstring updates (489, 492, 494, 495)
  - Change primitive functions that returned strings to return functions (499)
  - CLI customizable via entrypoints (493)
  - Improve calculation of aggregation features on grandchildren (479)
  - Refactor entrypoints to use decorator (483)
  - Include doctests in testing suite (491)
  - Documentation updates (490)
  - Update how standard primitives are imported internally (482)
  Thanks to the following people for contributing to this release: bukosabino, CharlesBradshaw, glentennis, gsheni, jeff-hernandez, kmax12, minkvsky, rwedge, thehomebrewnerd
- v0.7.0 Mar 29, 2019
  - Improve Entity Set Serialization (361)
  - Support calling a primitive instance's function directly (461, 468)
  - Support other libraries extending featuretools functionality via entrypoints (452)
  - Remove featuretools install command (475)
  - Add GroupByTransformFeature (455, 472, 476)
  - Update Haversine Primitive (435, 462)
  - Add commutative argument to SubtractNumeric and DivideNumeric primitives (457)
  - Add FilePath variable_type (470)
  - Add PhoneNumber, DateOfBirth, URL variable types (447)
  - Generalize infer_variable_type, convert_variable_data and convert_all_variable_data methods (423)
  - Documentation updates (438, 446, 458, 469)
  - Testing updates (440, 444, 445, 459)
  Thanks to the following people for contributing to this release: bukosabino, CharlesBradshaw, ColCarroll, glentennis, grayskripko, gsheni, jeff-hernandez, jrkinley, kmax12, RogerTangos, rwedge
  Breaking Changes
    - ft.dfs now has a groupby_trans_primitives parameter that DFS uses to automatically construct features that group by an ID column and then apply a transform primitive to each group. This change applies to the following primitives: CumSum, CumCount, CumMean, CumMin, and CumMax.
      Previous behavior
          ft.dfs(entityset=es, target_entity='customers', trans_primitives=["cum_mean"])
      New behavior
          ft.dfs(entityset=es, target_entity='customers', groupby_trans_primitives=["cum_mean"])
    - Related to the above change, cumulative transform features are now defined using a new feature class, GroupByTransformFeature.
      Previous behavior
          ft.Feature([base_feature, groupby_feature], primitive=CumulativePrimitive)
      New behavior
          ft.Feature(base_feature, groupby=groupby_feature, primitive=CumulativePrimitive)
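To make "group by an ID column and then apply a transform primitive" concrete, here is a plain-Python sketch of a cumulative mean computed per group. `cum_mean_by_group` is a hypothetical helper written for illustration; it does not use Featuretools and is not how the CumMean primitive is implemented.

```python
from collections import defaultdict


def cum_mean_by_group(values, group_ids):
    """Cumulative mean of values, tracked separately per group id (illustrative only)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    result = []
    for value, group in zip(values, group_ids):
        totals[group] += value
        counts[group] += 1
        result.append(totals[group] / counts[group])
    return result


amounts = [10.0, 20.0, 30.0, 40.0]
customer_ids = ["a", "b", "a", "b"]

# Grouped by customer: each customer's running mean ignores the other's rows.
print(cum_mean_by_group(amounts, customer_ids))
# A single shared group gives the plain, ungrouped cumulative mean.
print(cum_mean_by_group(amounts, ["all"] * 4))
```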
- v0.6.1 Feb 15, 2019
  - Cumulative primitives (410)
  - Entity.query_by_values now preserves row order of underlying data (428)
  - Implementing Country Code and Sub Region Codes as variable types (430)
  - Added IPAddress and EmailAddress variable types (426)
  - Install data and dependencies (403)
  - Add TimeSinceFirst, fix TimeSinceLast (388)
  - Allow user to pass in desired feature return types (372)
  - Add new configuration object (401)
  - Replace NUnique get_function (434)
  - _calculate_identity_features now only returns the features asked for, instead of the entire entity (429)
  - Primitive function name uniqueness (424)
  - Update NumCharacters and NumWords primitives (419)
  - Removed Variable.dtype (416, 433)
  - Change to zipcode rep, str for pandas (418)
  - Remove pandas version upper bound (408)
  - Make S3 dependencies optional (404)
  - Check that agg_primitives and trans_primitives are right primitive type (397)
  - Mean primitive changes (395)
  - Fix transform stacking on multi-output aggregation (394)
  - Fix list_primitives (391)
  - Handle graphviz dependency (389, 396, 398)
  - Testing updates (402, 417, 433)
  - Documentation updates (400, 409, 415, 417, 420, 421, 422, 431)
  Thanks to the following people for contributing to this release: CharlesBradshaw, csala, floscha, gsheni, jxwolstenholme, kmax12, RogerTangos, rwedge
- v0.6.0 Jan 30, 2019
  - Primitive refactor (364)
  - Mean ignore NaNs (379)
  - Plotting entitysets (382)
  - Add seed features later in DFS process (357)
  - Multiple output column features (376)
  - Add ZipCode Variable Type (367)
  - Add primitive.get_filepath and example of primitive loading data from external files (380)
  - Transform primitives take series as input (385)
  - Update dependency requirements (378, 383, 386)
  - Add modulo to override tests (384)
  - Update documentation (368, 377)
  - Update README.md (366, 373)
  - Update CI tests (359, 360, 375)
  Thanks to the following people for contributing to this release: floscha, gsheni, kmax12, RogerTangos, rwedge
- v0.5.1 Dec 17, 2018
  - Add missing dependencies (353)
  - Move comment to note in documentation (352)
- v0.5.0 Dec 17, 2018
  - Add specific error for duplicate additional/copy_variables in normalize_entity (348)
  - Removed EntitySet._import_from_dataframe (346)
  - Removed time_index_reduce parameter (344)
  - Allow installation of additional primitives (326)
  - Fix DatetimeIndex variable conversion (342)
  - Update Sklearn DFS Transformer (343)
  - Clean up entity creation logic (336)
  - remove casting to list in transform feature calculation (330)
  - Fix sklearn wrapper (335)
  - Add readme to pypi
  - Update conda docs after move to conda-forge (334)
  - Add wrapper for scikit-learn Pipelines (323)
  - Remove parse_date_cols parameter from EntitySet._import_from_dataframe (333)
  Thanks to the following people for contributing to this release: bukosabino, georgewambold, gsheni, jeff-hernandez, kmax12, and rwedge.
- v0.4.1 Nov 29, 2018
  - Resolve bug preventing using first column as index by default (308)
  - Handle return type when creating features from Id variables (318)
  - Make id an optional parameter of EntitySet constructor (324)
  - Handle primitives with same function being applied to same column (321)
  - Update requirements (328)
  - Clean up DFS arguments (319)
  - Clean up Pandas Backend (302)
  - Update properties of cumulative transform primitives (320)
  - Feature stability between versions documentation (316)
  - Add download count to GitHub readme (310)
  - Fixed #297 update tests to check error strings (303)
  - Remove usage of fixtures in agg primitive tests (325)
- v0.4.0 Oct 31, 2018
  - Remove ft.utils.gen_utils.getsize and make pympler a test requirement (299)
  - Update requirements.txt (298)
  - Refactor EntitySet.find_path(...) (295)
  - Clean up unused methods (293)
  - Remove unused parents property of Entity (283)
  - Removed relationships parameter (284)
  - Improve time index validation (285)
  - Encode features with "unknown" class in categorical (287)
  - Allow where clauses on direct features in Deep Feature Synthesis (279)
  - Change to fullargsspec (288)
  - Parallel verbose fixes (282)
  - Update tests for python 3.7 (277)
  - Check duplicate rows cutoff times (276)
  - Load retail demo data using compressed file (271)
- v0.3.1 Sept 28, 2018
- Handling time rewrite (245)
- Update deep_feature_synthesis.py (249)
- Handling return type when creating features from DatetimeTimeIndex (266)
- Update retail.py (259)
- Improve Consistency of Transform Primitives (236)
- Update demo docstrings (268)
- Handle non-string column names (255)
- Clean up merging of aggregation primitives (250)
- Add tests for Entity methods (262)
- Handle no child data when calculating aggregation features with multiple arguments (264)
- Add is_string utils function (260)
- Update python versions to match docker container (261)
- Handle where clause when no child data (258)
- No longer cache demo csvs, remove config file (257)
- Avoid stacking "expanding" primitives (238)
- Use randomly generated names in retail csv (233)
- Update README.md (243)
- v0.3.0 Aug 27, 2018
- Improve performance of all feature calculations (224)
- Update agg primitives to use more efficient functions (215)
- Optimize metadata calculation (229)
- More robust handling when no data at a cutoff time (234)
- Workaround categorical merge (231)
- Switch which CSV is associated with which variable (228)
- Remove unused kwargs from query_by_values, filter_and_sort (225)
- Remove convert_links_to_integers (219)
- Add conda install instructions (223, 227)
- Add example of using Dask to parallelize to docs (221)
- v0.2.2 Aug 20, 2018
- Remove unnecessary check no related instances call and refactor (209)
- Improve memory usage through support for pandas categorical types (196)
- Bump minimum pandas version from 0.20.3 to 0.23.0 (216)
- Better parallel memory warnings (208, 214)
- Update demo datasets (187, 201, 207)
- Make primitive lookup case insensitive (213)
- Use capital name (211)
- Set class name for Min (206)
- Remove `variable_types` from normalize entity (205)
- Handle parquet serialization with last time index (204)
- Reset index of cutoff times in calculate feature matrix (198)
- Check argument types for .normalize_entity (195)
- Type checking ignore entities. (193)
- v0.2.1 July 2, 2018
- Cpu count fix (176)
- Update flight (175)
- Move feature matrix calculation helper functions to separate file (177)
- v0.2.0 June 22, 2018
- Multiprocessing (170)
- Handle unicode encoding in repr throughout Featuretools (161)
- Clean up EntitySet class (145)
- Add support for building and uploading conda package (167)
- Parquet serialization (152)
- Remove variable stats (171)
- Make sure index variable comes first (168)
- No last time index update on normalize (169)
- Remove list of times as an option for cutoff_time in calculate_feature_matrix (165)
- Config does error checking to see if it can write to disk (162)
- v0.1.21 May 30, 2018
- Support Pandas 0.23.0 (153, 154, 155, 159)
- No EntitySet required in loading/saving features (141)
- Use s3 demo csv with better column names (139)
- more reasonable start parameter (149)
- add issue template (133)
- Improve tests (136, 137, 144, 147)
- Remove unused functions (140, 143, 146)
- Update documentation after recent changes / removals (157)
- Rename demo retail csv file (148)
- Add names for binary (142)
- EntitySet repr to use get_name rather than id (134)
- Ensure config dir is writable (135)
- v0.1.20 Apr 13, 2018
- Primitives as strings in DFS parameters (129)
- Integer time index bugfixes (128)
- Add make_temporal_cutoffs utility function (126)
- Show all entities, switch shape display to row/col (124)
- Improved chunking when calculating feature matrices (121)
- fixed num characters nan fix (118)
- modify ignore_variables docstring (117)
- v0.1.19 Mar 21, 2018
- More descriptive DFS progress bar (69)
- Convert text variable to string before NumWords (106)
- EntitySet.concat() reindexes relationships (96)
- Keep non-feature columns when encoding feature matrix (111)
- Uses full entity update for dependencies of uses_full_entity features (110)
- Update column names in retail demo (104)
- Handle Transform features that need access to all values of entity (91)
- v0.1.18 Feb 27, 2018
- fixes related instances bug (97)
- Adding non-feature columns to calculated feature matrix (78)
- Relax numpy version req (82)
- Remove entity_from_csv, tests, and lint (71)
- v0.1.17 Jan 18, 2018
- LatLong type (57)
- Last time index fixes (70)
- Make median agg primitives ignore nans by default (61)
- Remove Python 3.4 support (64)
- Change normalize_entity to update secondary_time_index (59)
- Unpin requirements (53)
- associative -> commutative (56)
- Add Words and Chars primitives (51)
- v0.1.16 Dec 19, 2017
- fix EntitySet.combine_variables and standardize encode_features (47)
- Python 3 compatibility (16)
- v0.1.15 Dec 18, 2017
- Fix variable type in demo data (37)
- Custom primitive kwarg fix (38)
- Changed order and text of arguments in make_trans_primitive docstring (42)
- v0.1.14 November 20, 2017
- Last time index (33)
- Update Scipy version to 1.0.0 (31)
- v0.1.13 November 1, 2017
- Add MANIFEST.in (26)
- v0.1.11 October 31, 2017
- Package linting (7)
- Custom primitive creation functions (13)
- Split requirements to separate files and pin to latest versions (15)
- Select low information features (18)
- Fix docs typos (19)
- Fixed Diff primitive for rare nan case (21)
- added some missing doc strings (23)
- Trend fix (22)
- Remove as_dir=False option from EntitySet.to_pickle() (20)
- Entity Normalization Preserves Types of Copy & Additional Variables (25)
- v0.1.10 October 12, 2017
- NumTrue primitive added and docstring of other primitives updated (11)
- fixed hash issue with same base features (8)
- Head fix (9)
- Fix training window (10)
- Add associative attribute to primitives (3)
- Add status badges, fix license in setup.py (1)
- fixed head printout and flight demo index (2)
- v0.1.9 September 8, 2017
- Documentation improvements
- New `featuretools.demo.load_mock_customer` function
- v0.1.8 September 1, 2017
- Bug fixes
- Added `Percentile` transform primitive
- v0.1.7 August 17, 2017
- Performance improvements for approximate in `calculate_feature_matrix` and `dfs`
- Added `Week` transform primitive
- v0.1.6 July 26, 2017
- Added `load_features` and `save_features` to persist and reload features
- Added save_progress argument to `calculate_feature_matrix`
- Added approximate parameter to `calculate_feature_matrix` and `dfs`
- Added `load_flight` to ft.demo
- v0.1.5 July 11, 2017
- Windows support
- v0.1.3 July 10, 2017
- Renamed feature submodule to primitives
- Renamed prediction_entity arguments to target_entity
- Added training_window parameter to `calculate_feature_matrix`
- v0.1.2 July 3rd, 2017
- Initial release