Replace Koalas with pandas API on Spark #1949
-def replace_nan_with_flag(pdf, flag=-1):
+def replace_nan_with_flag(pdf, flag=-1.):
A pandas-on-Spark Series doesn't support an array that mixes floats and integers, so the flag has to be a float:
ps.from_pandas(pd.Series([[0.0, 0.0], [7.0, 3.0], [14.0, 6.0], [-1, -1], [-1, -1]]))
TypeError: element in array field 0: DoubleType can not accept object -1 in type <class 'int'>
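The error comes from the inner list elements having different Python types. Here is a minimal pure-Python sketch of why a float flag avoids it. The function `fill_missing_pairs` is a hypothetical stand-in for `replace_nan_with_flag`, whose body isn't shown here, and it assumes missing entries become `[flag, flag]`, as in the repro above.

```python
import math


def fill_missing_pairs(values, flag=-1.0):
    """Hypothetical stand-in: replace each NaN entry with [flag, flag]."""
    filled = []
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            filled.append([flag, flag])
        else:
            filled.append(value)
    return filled


def element_types(rows):
    """Set of Python types found inside the inner lists."""
    return {type(item) for row in rows for item in row}


data = [[0.0, 0.0], [7.0, 3.0], [14.0, 6.0], float("nan"), float("nan")]

# An int flag mixes int and float elements, which a DoubleType array field rejects.
print(sorted(t.__name__ for t in element_types(fill_missing_pairs(data, flag=-1))))   # ['float', 'int']
# A float flag keeps every element a float.
print(sorted(t.__name__ for t in element_types(fill_missing_pairs(data, flag=-1.))))  # ['float']
```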
featuretools/entityset/entityset.py
-if isinstance(df, dd.DataFrame):
+if isinstance(df, (dd.DataFrame, ps.DataFrame)):
     df[index] = 1
     df[index] = df[index].cumsum() - 1
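Spark doesn't support assigning a range to a column, so, as with Dask, the index is built by setting every row to 1 and taking the cumulative sum minus 1.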
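On a plain pandas DataFrame, the cumulative-sum trick produces the same values as `range(len(df))`. The sketch below demonstrates this. It is a minimal example: the helper name `add_sequential_index` is hypothetical, and it does not exercise how a distributed frame orders its rows.

```python
import pandas as pd


def add_sequential_index(df, index):
    """Hypothetical helper: build a 0-based index column without assigning a range."""
    df[index] = 1                       # every row starts at 1
    df[index] = df[index].cumsum() - 1  # running total 1, 2, 3, ... shifted to 0, 1, 2, ...
    return df


df = add_sequential_index(pd.DataFrame({"value": [10, 20, 30, 40]}), "id")
print(df["id"].tolist())  # [0, 1, 2, 3]
```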
setup.cfg
-woodwork >= 0.8.1
+woodwork @ git+https://github.com/alteryx/woodwork.git@migrate-to-pyspark-api#egg=woodwork
We'll need to change this back to the corresponding Woodwork release before merging.
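A quick pre-merge check could flag any dependency that is still pinned to a Git URL. The sketch below uses only the standard library. It assumes the requirements live under `[options] install_requires` in setup.cfg; the sample config and its `some-package` entry are hypothetical.

```python
import configparser

# Hypothetical sample; only the woodwork line comes from this PR.
SAMPLE_SETUP_CFG = """
[options]
install_requires =
    some-package >= 1.0
    woodwork @ git+https://github.com/alteryx/woodwork.git@migrate-to-pyspark-api#egg=woodwork
"""


def url_requirements(setup_cfg_text):
    """Return install_requires entries that use a direct URL instead of a version."""
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    raw = parser.get("options", "install_requires", fallback="")
    requirements = [line.strip() for line in raw.splitlines() if line.strip()]
    # PEP 508 direct references are written as "name @ url".
    return [req for req in requirements if " @ " in req]


for req in url_requirements(SAMPLE_SETUP_CFG):
    print("still pinned to a URL:", req)
```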
We'll need to switch the required unit tests from Koalas to Spark.
Codecov Report
@@            Coverage Diff             @@
##             main    #1949      +/-   ##
==========================================
- Coverage   98.99%   98.99%   -0.01%
==========================================
  Files         146      146
  Lines       16478    16437      -41
==========================================
- Hits        16313    16271      -42
- Misses        165      166       +1
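The report's figures are internally consistent: hits plus misses equals lines on both sides. The sketch below checks that arithmetic. It assumes coverage is simply hits divided by lines; Codecov's exact formula and rounding (for example, for partially covered lines) may differ.

```python
def coverage_pct(hits, lines):
    """Percentage of tracked lines that were executed by the tests."""
    return 100 * hits / lines


# Figures from the Codecov report: main vs. #1949.
base_hits, base_lines, base_misses = 16313, 16478, 165
pr_hits, pr_lines, pr_misses = 16271, 16437, 166

assert base_hits + base_misses == base_lines
assert pr_hits + pr_misses == pr_lines

delta = coverage_pct(pr_hits, pr_lines) - coverage_pct(base_hits, base_lines)
print(f"coverage change: {delta:+.2f}%")  # coverage change: -0.01%
```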
LGTM!
I think we'll need to merge and release the changes in Woodwork before doing the same in featuretools.
@jeff-hernandez Just a quick heads-up. The
LGTM, assuming tests pass.
Closes #1864