Fix fragmentation PerformanceWarning in feature_set_calculator.py #2424

Merged
merged 18 commits into from
Jan 4, 2023

Conversation

thehomebrewnerd
Contributor

@thehomebrewnerd thehomebrewnerd commented Dec 22, 2022

Fixes #2405

  • Fix fragmentation PerformanceWarning in feature_set_calculator.py
  • Fix warning related to deprecation of pandas.util
  • Fix warning related to calling astype with "datetime" without specifying units
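
As a rough illustration of the last item, here is a minimal sketch of casting to a datetime dtype with an explicit unit. The sample values and variable names are hypothetical, and the exact call changed in this PR is not shown on this page, so treat this as an assumption about the general pattern rather than the actual diff.

```python
import pandas as pd

# Hypothetical sample data; not taken from the featuretools test suite.
dates = pd.Series(["2022-12-22", "2023-01-04"])

# Spell out the unit ("ns") rather than passing a unitless datetime dtype.
converted = dates.astype("datetime64[ns]")
```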

@thehomebrewnerd thehomebrewnerd self-assigned this Dec 22, 2022
@thehomebrewnerd thehomebrewnerd marked this pull request as draft December 22, 2022 15:21
Comment on lines 944 to 951
    else:
        for name, col in new_cols.items():
            col.name = name
            if isinstance(data, dd.DataFrame):
                data = dd.concat([data, col], axis=1)
            else:
                data = ps.concat([data, col], axis=1)
    return data
Contributor Author

I don't really like this approach for Dask/Spark, but we can't create a new dataframe from the columns here like we can for pandas, so we have to concat the columns one at a time. Maybe there is a better way, but I didn't come up with one off the top of my head.
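
For context, a minimal sketch of the pandas-side pattern the comment contrasts with: collect the new columns in a dict, build one frame from them, and attach it with a single `pd.concat` instead of inserting columns one at a time. The helper name `add_columns` and the sample data are hypothetical and are not taken from feature_set_calculator.py.

```python
import warnings

import pandas as pd


def add_columns(data: pd.DataFrame, new_cols: dict) -> pd.DataFrame:
    """Attach all new columns with one concat (hypothetical helper)."""
    if not new_cols:
        return data
    # One frame built from the dict, concatenated once, avoids the repeated
    # single-column insertions that lead to a fragmented DataFrame.
    new_frame = pd.DataFrame(new_cols, index=data.index)
    return pd.concat([data, new_frame], axis=1)


# Hypothetical data: one base column plus 200 derived feature columns.
base = pd.DataFrame({"id": range(5)})
features = {f"f{i}": base["id"] * i for i in range(200)}

with warnings.catch_warnings():
    # Escalate PerformanceWarning to an error so fragmentation would be visible.
    warnings.simplefilter("error", pd.errors.PerformanceWarning)
    result = add_columns(base, features)
```

The Dask/Spark branch quoted above cannot use this shortcut, which is why it falls back to concatenating one column per iteration.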

@thehomebrewnerd thehomebrewnerd changed the title Fix fragmentation warning Fix fragmentation PerformanceWarning in feature_set_calculator.py Dec 22, 2022
@codecov

codecov bot commented Dec 22, 2022

Codecov Report

Merging #2424 (f4828d8) into main (89962ca) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2424   +/-   ##
=======================================
  Coverage   99.44%   99.44%           
=======================================
  Files         340      340           
  Lines       20893    20908   +15     
=======================================
+ Hits        20778    20793   +15     
  Misses        115      115           
Impacted Files Coverage Δ
featuretools/entityset/deserialize.py 100.00% <ø> (ø)
...ts/primitive_tests/test_datetoholiday_primitive.py 100.00% <ø> (ø)
...rimitive_tests/test_distancetoholiday_primitive.py 100.00% <ø> (ø)
...s/tests/primitive_tests/test_is_federal_holiday.py 100.00% <ø> (ø)
.../tests/primitive_tests/test_transform_primitive.py 100.00% <ø> (ø)
...s/computational_backends/feature_set_calculator.py 98.71% <100.00%> (+0.02%) ⬆️
...imitives/standard/transform/binary/greater_than.py 100.00% <100.00%> (ø)
...standard/transform/binary/greater_than_equal_to.py 100.00% <100.00%> (ø)
...d/transform/binary/greater_than_equal_to_scalar.py 100.00% <100.00%> (ø)
...s/standard/transform/binary/greater_than_scalar.py 100.00% <100.00%> (ø)
... and 6 more

@thehomebrewnerd thehomebrewnerd marked this pull request as ready for review December 22, 2022 19:36
@thehomebrewnerd thehomebrewnerd merged commit 0f2dc08 into main Jan 4, 2023
@thehomebrewnerd thehomebrewnerd deleted the feature-set-calc-fragmenting branch January 4, 2023 21:23
@thehomebrewnerd thehomebrewnerd mentioned this pull request Jan 5, 2023
Development

Successfully merging this pull request may close these issues.

[refactor] PerformanceWarning: DataFrame is highly fragmented during DFS
6 participants