@jeff-hernandez
Contributor

Closes #1864



- def replace_nan_with_flag(pdf, flag=-1):
+ def replace_nan_with_flag(pdf, flag=-1.):
Contributor Author

A Spark Series doesn't support arrays that mix floats and integers, so the default flag is now a float:

ps.from_pandas(pd.Series([[0.0, 0.0], [7.0, 3.0], [14.0, 6.0], [-1, -1], [-1, -1]]))
TypeError: element in array field 0: DoubleType can not accept object -1 in type <class 'int'>
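
To illustrate why the float default matters, here is a minimal pure-Python sketch. The helper names `replace_nan_with_flag_sketch` and `element_types` are hypothetical stand-ins, not the featuretools implementation; the sketch only shows that an integer flag introduces a second element type into otherwise all-float arrays, which is the condition the error above complains about.

```python
import math


def replace_nan_with_flag_sketch(pairs, flag=-1.0):
    """Replace NaN entries in each pair with `flag`.

    Hypothetical stand-in for illustration, not the featuretools helper.
    """
    return [[flag if math.isnan(v) else v for v in pair] for pair in pairs]


def element_types(pairs):
    """Collect the set of Python types present across all pairs."""
    return {type(v) for pair in pairs for v in pair}


latlongs = [[0.0, 0.0], [7.0, 3.0], [float("nan"), float("nan")]]

float_flagged = replace_nan_with_flag_sketch(latlongs)         # float flag (new default)
int_flagged = replace_nan_with_flag_sketch(latlongs, flag=-1)  # integer flag (old default)

print(element_types(float_flagged))  # a single element type
print(element_types(int_flagged))    # floats and ints mixed
```

With the float flag every element stays a `float`, so a strictly typed array column can accept the result.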

Comment on lines 1582 to 1584
  def _create_index(df, index):
-     if isinstance(df, dd.DataFrame):
+     if isinstance(df, (dd.DataFrame, ps.DataFrame)):
          df[index] = 1
          df[index] = df[index].cumsum() - 1
Contributor Author

Spark doesn't support assigning a range to a column, so Spark DataFrames now take the same cumsum path as Dask.
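
The cumulative-sum approach in the diff can be sketched without any DataFrame library. This is an illustrative pure-Python analogue with a hypothetical function name (`create_index_sketch`), not the code from the PR: fill a column with ones, take the running sum, and subtract one to get a zero-based index.

```python
from itertools import accumulate


def create_index_sketch(n_rows):
    """Build 0..n_rows-1 without assigning a range.

    Hypothetical illustration of the pattern in the diff.
    """
    ones = [1] * n_rows                                # mirrors df[index] = 1
    return [total - 1 for total in accumulate(ones)]   # mirrors df[index].cumsum() - 1


index = create_index_sketch(5)
print(index)
```

The result matches what `range(n_rows)` would produce, which is why the workaround can stand in for a direct range assignment.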

setup.cfg Outdated
  psutil >= 5.6.6
  click >= 7.0.0
- woodwork >= 0.8.1
+ woodwork @ git+https://github.com/alteryx/woodwork.git@migrate-to-pyspark-api#egg=woodwork
Contributor Author

We'll need to change this to the corresponding Woodwork version before merging.
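
One way to catch a leftover branch pin before merging is to scan the requirement lines for direct references of the form `name @ url`. The sketch below is a hypothetical check (the function `find_direct_references` is not part of featuretools or its CI), and it assumes the requirements are available as a plain list of strings.

```python
def find_direct_references(requirement_lines):
    """Return requirement lines that point at a URL instead of a version specifier.

    Hypothetical pre-merge check, shown for illustration only.
    """
    flagged = []
    for line in requirement_lines:
        # A direct reference looks like "name @ url"; plain specifiers have no "@".
        _name, sep, target = line.partition("@")
        if sep and "://" in target:
            flagged.append(line.strip())
    return flagged


install_requires = [
    "psutil >= 5.6.6",
    "click >= 7.0.0",
    "woodwork @ git+https://github.com/alteryx/woodwork.git@migrate-to-pyspark-api#egg=woodwork",
]

pinned_to_branch = find_direct_references(install_requires)
print(pinned_to_branch)
```

Here only the `woodwork` line is flagged; once it is replaced with a released version specifier, the check returns an empty list.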

@jeff-hernandez
Contributor Author

We'll need to switch the required unit tests from Koalas to Spark.

@codecov

codecov bot commented Mar 11, 2022

Codecov Report

Merging #1949 (db333c9) into main (55cb2be) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head db333c9 differs from pull request most recent head ed639ae. Consider uploading reports for the commit ed639ae to get more accurate results

@@            Coverage Diff             @@
##             main    #1949      +/-   ##
==========================================
- Coverage   98.99%   98.99%   -0.01%     
==========================================
  Files         146      146              
  Lines       16478    16437      -41     
==========================================
- Hits        16313    16271      -42     
- Misses        165      166       +1     
| Impacted Files | Coverage Δ |
|---|---|
| ...computational_backends/calculate_feature_matrix.py | 100.00% <100.00%> (ø) |
| ...s/computational_backends/feature_set_calculator.py | 98.69% <100.00%> (ø) |
| featuretools/computational_backends/utils.py | 96.44% <100.00%> (ø) |
| featuretools/entityset/entityset.py | 99.21% <100.00%> (-0.01%) ⬇️ |
| featuretools/entityset/serialize.py | 100.00% <100.00%> (ø) |
| ...ools/primitives/standard/aggregation_primitives.py | 96.60% <100.00%> (ø) |
| ...aturetools/primitives/standard/binary_transform.py | 100.00% <100.00%> (ø) |
| ...imitives/standard/datetime_transform_primitives.py | 100.00% <100.00%> (ø) |
| ...retools/primitives/standard/transform_primitive.py | 100.00% <100.00%> (ø) |
| featuretools/primitives/utils.py | 99.51% <100.00%> (ø) |
| ... and 23 more | |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 55cb2be...ed639ae.

@jeff-hernandez jeff-hernandez marked this pull request as ready for review March 11, 2022 18:26
@jeff-hernandez jeff-hernandez requested a review from a team March 11, 2022 19:02
@jeff-hernandez jeff-hernandez requested a review from rwedge March 14, 2022 13:08
Contributor

@thehomebrewnerd thehomebrewnerd left a comment

LGTM!

@jeff-hernandez
Contributor Author

I think we'll need to merge and release the changes in Woodwork first before doing the same in featuretools.

@thehomebrewnerd
Contributor

@jeff-hernandez Just a quick heads-up. The compatibility attribute for the primitives added by #1948 will need to be updated as well when you fix the release notes merge conflict.

Contributor

@thehomebrewnerd thehomebrewnerd left a comment

LGTM, assuming tests pass.

@jeff-hernandez jeff-hernandez enabled auto-merge (squash) March 15, 2022 17:55
@jeff-hernandez jeff-hernandez merged commit aa8e2e7 into main Mar 15, 2022
@thehomebrewnerd thehomebrewnerd mentioned this pull request Mar 15, 2022
@rwedge rwedge deleted the pyspark-api branch June 16, 2022 15:52

Development

Successfully merging this pull request may close these issues.

Update koalas code to pyspark pandas API instead in Featuretools

4 participants