Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Allow dask dataframe during entity creation (#783)
* allow dask dataframe for entity creation * update create entity from dask df test * update test to specify variable types * add dask entityset relationships and dfs * multiple dask updates and add simple dask test * multiple updates for dask dataframes * update dask requirements * update dask test for test_hackathon * fixes after merging in changes from master * initial updates for aggregation with dask dataframes * update dask tests * updated dask tests * dask multipartition tests * update requirements to fix circleci featuretools installation * generalize path to hackathon dataset * add hackathon data to manifest * bump circleci resource size for unit tests * fix test for py35, try bumping circleci resources again * remove hackathon test from circleci, reset resources * use pd.testing.assert_frame_equal with check_like=True for different column orderings * additional test fixes * performance testing improvements * add profiling script * add checking for df types when creating entityset * add test for training_window and fix text for cutoff time df * update dask tests for consistency * add test for approximate (doesn't pass currently) * add test for adding last_time_index to dask entityset * add dask test for secondary_time_index * fix issue with TimeSince primitive with dask entityset * update hackathon test * remove some easy to remove computes * fix Pandas 1.0 issues * various updates for dask entities * lint fix plus missing_ids change * fix hackathon test * update requirements.txt * fixes for windows tests * dask dfs fixes * update aggregation primitives to use dask aggregation * add temp tests directory * update temporary tests * update agg test file * update encode_features for dask * featuretools/dask-tests-tmp/test_instacart.py * update dask tests * update entity creation code * lint and test updates * instacart test updates * lint fix * remove leftover head() call * fix encode features inplace test * fix some issues with dask aggregations * various dask updates * update instacart test files * instacart test updates * instacart test updates * cutoff time updates in cfm * entity updates for _handle_time * instacart test update * update add_last_time_index to use dask * instacart test updates * add dask test for time_window * update add_last_time_indeices * improve Entity.query_by_values() implementation for Dask * update dask tests * lint fix * revert entityset __repr__ code back to master code * Fix issue with make_index and Dask entities (#895) * start test for dask with make_index * update dask test for make_index * remove unnecessary code in _create_index() * update dask make index test * Update set_time_index code path for Dask dataframes and impacted test (#914) * set_time_index converts variable type in dask * remove uses_full_entity primitives * use groupby_trans_primitives for uses_full_entity primitives * remove groupby features (currently unsupported in dask) * simplify logic for getting time_type * Update dask tests (#920) * remove unsupported primitives, update tests * update test_aggregation to remove ambiguity * don't run windows tests in parallel to find failing * skip dask-tmp-tests * revert circleci config * skip tmp dask tests on circleci for windows * lint * move ignore dask-temp-tests to setup.cfg * Compose compatability for Dask (#909) * convert pass_through df to pandas for dask * add compose to test_instacart, lint * add test to check compose label_times accepted * lint and add composeml to test requirements * remove force to pandas, add dask compose test * use >=0.2.0 for composeml * test fixes * add tests * Refactor update_feature_columns (#924) * refactor update_feature_columns * update primitives to work with new update_feature_columns code * update dask tests to skip unsupported primitives * lint fix * fix test_aggregation * lint-fix * remove check for list input in NumWords and NumCharacters * remove check for list input from binary transform primitives * Dask DFS errors with unsupported primitives (#925) * add dask_compatible flag to primitives * add tests * remove unsupported from default, fix tests * lint * percentile needs full entity * update error message * Error if dask dataframe used for cutoff_time (#931) * error if dask dataframe used for cutoff_time * dfs compute with warning * split out test Co-authored-by: Roy Wedge <rwedge@featurelabs.com> * Error if no vtypes given for Dask entity (#929) * error if no vtypes supplied * lint Co-authored-by: Roy Wedge <rwedge@featurelabs.com> * Restore len() call for Pandas in EntitySets.add_relationships (#943) * restore len check for pandas and add test * add dtype check to test * error if feature_matrix is not Pandas df (#955) * error if approximate or training window used with dask (#954) * Revert changes in infer_variable_types (#957) * update infer variable types * remove unnecessary change * Updates for running home credit example with Dask (#953) * home credit tests * update home credit test * Improve column assignment for trans features * update home credit test * testing updates * home credit test updates * home credit notebook update * home credit test updates * home credit test updates * home credit test updates * home credit test updates * home credit updates * update feature_set_calculator.py * remove unnecessary repartitioning from notebook * update notebook text * update notebook to use os.path.join * Update list_primitives to indicate Dask compatibility (#963) * update ft.list_primitives to include dask_compatible column * fix merge mess * Add Dask support to EqualScalar and NotEqualScalar primitives (#967) * add dask support to EqualScalar and NotEqualScalar primitives * remove pd.Series cast * Add demo notebook for using Dask with Instacart dataset (#956) * add instacart with dask notebook example * update notebook text * remove %%time from notebook cells * Dask Test Updates (#973) * remove dask hackathon test * reorganize and remove unnecessary tests * remove dask worker files * remove dask-tests-tmp directory * remove dask_profiling.py * update Makefile and MANIFEST.in * Dask entityset serialization/deserialization (#981) * add deserialize support for dask entities * add tests * error if to_pickle, fix tests * add to_pickle errors test * bump schema version * fix merge issue * Support numeric time index for Dask entityset (#992) * support numeric time index for Dask entityset * remove unused test fixture * refactor pass through cols merging * fix dask test with cutoff times * Update docs for using Dask entitysets (#965) * initial doc updates for dask * update parallel computation guide * finish initial docs for using dask entitysets * update EntitySet styling in docs * various doc additions and improvements * wording updates * label dask entityset support as beta and add link for reporting issues * clear faq notebook output * remove dask_profiling.py * fix spelling errors * update Dask guide * Dask cleanup (#964) * initial clean + revert unused * lint * more reverts * remove unused check * Run unit tests on pandas and dask entitysets (#999) * rename es to pd_es and parameterize es * initial work on synthesis tests * more synthesis test updates * update utils_test * update mock_customer_es fixture * start primitive_tests updates * update primitive_tests * parameterize diamond_es * parameterize games_es fixture * update primitive_tests * fix dfs tests with compose * synthesis test updates * first pass, failing tests need investigation * lint and add missing dask tests to test_es.py * update tests in test_last_time_index.py * fix test_dask_primitives.py * use dd.to_numeric for dask type conversions * update test_es to xfail * xfail synthesis tests * update entityset_tests with xfail * xfail primitive_tests * xfail computational backend tests * fix failing dask test * lint * fix tests and clean up * cleanup and xfail plotting * small fixes Co-authored-by: Nate Parsons <nate.parsons@alteryx.com> * changelog * changelog * revert changes * Dask reverts for performance (#1008) * reverts for performance * update compose tests * remove unused code and update docs (#1012) * Uncomment Future Release section * fix docs build * Dask documentation improvements (#1015) * improve Dask docs * combine parallel computation and performance guides * more doc updates * fix note text * Update setup.cfg Co-authored-by: Frances Hartwell <frances.hartwell@alteryx.com> Co-authored-by: Roy Wedge <rwedge@featurelabs.com>
- Loading branch information