This repository has been archived by the owner on Dec 4, 2019. It is now read-only.

Use generate sklearn UDT within gapply() [SPARK-16062 blocks this] #33

Open
vlad17 opened this issue Jun 28, 2016 · 3 comments

Comments

@vlad17
Contributor

vlad17 commented Jun 28, 2016

Currently, KeyedModel fitting in KeyedEstimator._fit is implemented by generating an array containing a single serialized estimator, which requires an additional pass over the resulting DataFrame to deserialize the UDT.

This is necessary because of a PySpark bug, and the circuitous implementation should be straightened out once the UDT issues are resolved (SPARK-16062).
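
For anyone skimming, here is a rough, Spark-free sketch of the shape of the workaround described above. It is not the spark-sklearn code: `fit_group`, `unwrap`, the dict standing in for a fitted estimator, and the use of `pickle` are all assumptions made for illustration. The first pass wraps each serialized model in a one-element list, and a second pass unwraps and deserializes it.

```python
import pickle

def fit_group(ys):
    # Hypothetical per-group "fit": a plain dict stands in for a fitted estimator.
    model = {"mean": sum(ys) / len(ys)}
    # Workaround shape: return a one-element list holding the serialized model
    # rather than the model object itself.
    return [pickle.dumps(model)]

def unwrap(cell):
    # Second pass: take the single element out of the list and deserialize it.
    (blob,) = cell
    return pickle.loads(blob)

# Plain dicts stand in for the grouped DataFrame (assumption for illustration).
groups = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0]}

# Pass 1: one wrapped, serialized model per key.
pass_one = {key: fit_group(ys) for key, ys in groups.items()}

# Pass 2: the extra pass that could be dropped once the per-group function
# can return the model type directly.
models = {key: unwrap(cell) for key, cell in pass_one.items()}
```

If the per-group function could return the model directly, `pass_one` would already hold the models and the second comprehension would go away.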

@vlad17 vlad17 changed the title Use generate sklearn UDT within gapply() Use generate sklearn UDT within gapply() [SPARK-16062 blocker] Jun 28, 2016
@vlad17 vlad17 changed the title Use generate sklearn UDT within gapply() [SPARK-16062 blocker] Use generate sklearn UDT within gapply() [SPARK-16062 blocks this] Jun 28, 2016
@vlad17
Contributor Author

vlad17 commented Jun 28, 2016

Also blocking this test from being unskipped:
test_gapply_python_only_udt_val (spark_sklearn.tests.test_gapply.GapplyTests) ... SKIP: python only UDTs can't be nested in arraytypes for now, see SPARK-15989

@thunterdb
Contributor

This has been fixed in Spark 2.0.1-SNAPSHOT. We can work on it once I cut the release of spark-sklearn.

@srowen
Collaborator

srowen commented Dec 9, 2018

Was this resolved? I see the test was un-skipped in 7583ee1
