Replace Koalas with pandas API on Spark #1949
- def replace_nan_with_flag(pdf, flag=-1):
+ def replace_nan_with_flag(pdf, flag=-1.):
A pandas-on-Spark Series doesn't support an array column that mixes floats and integers, which is why the default flag changes from `-1` to `-1.`:

ps.from_pandas(pd.Series([[0.0, 0.0], [7.0, 3.0], [14.0, 6.0], [-1, -1], [-1, -1]]))
TypeError: element in array field 0: DoubleType can not accept object -1 in type <class 'int'>
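The body of `replace_nan_with_flag` isn't shown in this thread, so the sketch below is a hypothetical stand-in that only illustrates the type issue. It assumes missing rows arrive as a scalar NaN and get replaced by a list of `flag` values. With `flag=-1` the Series ends up holding both `int` and `float` elements, which is what Spark rejects once it has inferred `DoubleType` for the array; with `flag=-1.` every element is a float. It runs on plain pandas only, so no Spark session is needed.

```python
import math

import pandas as pd


def replace_nan_with_flag(series, flag=-1.0, width=2):
    """Hypothetical stand-in: swap missing rows for a list of `width` flags.

    Missing rows are assumed to arrive as a scalar NaN instead of a list.
    """
    def fill(value):
        if isinstance(value, list):
            return value
        return [flag] * width

    return pd.Series([fill(v) for v in series], index=series.index)


def element_types(series):
    """Return the set of Python types found across every list in the Series."""
    return {type(v) for row in series for v in row}


s = pd.Series([[0.0, 0.0], [7.0, 3.0], [14.0, 6.0], math.nan, math.nan])

# An int flag mixes int and float elements in the array column.
print(element_types(replace_nan_with_flag(s, flag=-1)))
# A float flag keeps every element a float, matching Spark's DoubleType.
print(element_types(replace_nan_with_flag(s, flag=-1.0)))
```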
  def _create_index(df, index):
-     if isinstance(df, dd.DataFrame):
+     if isinstance(df, (dd.DataFrame, ps.DataFrame)):
          df[index] = 1
          df[index] = df[index].cumsum() - 1
pandas API on Spark doesn't support assigning a `range` to a column, so, as with Dask, the index column is built from a cumulative sum of ones instead.
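The cumulative-sum trick produces the same 0-based positions that `range(len(df))` would. The sketch below checks that on plain pandas; the function name `create_index_via_cumsum` is hypothetical and only mirrors the Dask / pandas-on-Spark branch of `_create_index` from the diff. It illustrates the arithmetic, not Spark's behaviour.

```python
import pandas as pd


def create_index_via_cumsum(df, index):
    """Hypothetical mirror of the Dask / pandas-on-Spark branch of _create_index.

    Fill the new column with ones, then a cumulative sum minus one
    yields 0, 1, 2, ... without assigning a range object.
    """
    df[index] = 1
    df[index] = df[index].cumsum() - 1
    return df


df = pd.DataFrame({"value": [10, 20, 30, 40]})

# Plain pandas accepts a range for column assignment.
by_range = df.copy()
by_range["id"] = range(len(by_range))

# The cumsum version avoids the range assignment entirely.
by_cumsum = create_index_via_cumsum(df.copy(), "id")
print(by_cumsum["id"].tolist())
```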
setup.cfg
  psutil >= 5.6.6
  click >= 7.0.0
- woodwork >= 0.8.1
+ woodwork @ git+https://github.com/alteryx/woodwork.git@migrate-to-pyspark-api#egg=woodwork
We'll need to change this to the corresponding released Woodwork version before merging.
We'll need to switch the required unit tests from Koalas to Spark.
Codecov Report
@@            Coverage Diff             @@
##             main    #1949      +/-   ##
==========================================
- Coverage   98.99%   98.99%   -0.01%
==========================================
  Files         146      146
  Lines       16478    16437      -41
==========================================
- Hits        16313    16271      -42
- Misses        165      166       +1
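Both columns of the report display 98.99%, yet the diff shows -0.01%. The worked arithmetic below reproduces that from the hits and lines counts, assuming coverage is hits divided by tracked lines and that Codecov rounds down to two decimal places (its default precision setting, as far as we know; `round_down` is a hypothetical helper).

```python
import math


def coverage_percent(hits, lines):
    """Coverage as a percentage: hit lines divided by tracked lines."""
    return 100 * hits / lines


def round_down(value, places=2):
    """Floor to a fixed number of decimal places (assumed Codecov default)."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


base = coverage_percent(16313, 16478)  # main
head = coverage_percent(16271, 16437)  # #1949

# Hits plus misses account for every tracked line on both sides.
assert 16313 + 165 == 16478
assert 16271 + 166 == 16437

print(round_down(base), round_down(head), round_down(head - base))  # 98.99 98.99 -0.01
```

The raw values are about 98.9987% on `main` and 98.9901% on the PR, so the drop is under a hundredth of a percent. With ordinary rounding, `main` would display as 99.00%, which is why the sketch assumes rounding down.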
thehomebrewnerd left a comment:
LGTM!
I think we'll need to merge and release the changes in Woodwork before doing the same in Featuretools.
@jeff-hernandez Just a quick heads-up. The
thehomebrewnerd left a comment:
LGTM, assuming tests pass.
Closes #1864