Use generate sklearn UDT within gapply() [SPARK-16062 blocks this] #33

vlad17 · 2016-06-28T00:04:23Z

Currently, KeyedModel fitting in KeyedEstimator._fit is implemented by generating an array containing a single serialized estimator, which requires an additional pass over the resulting dataframe to deserialize the UDT.

This is necessary because of a PySpark bug, and the circuitous implementation should be straightened out once the UDT issues are resolved (SPARK-16062).
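
For illustration, the workaround described above has roughly the following shape. This is a minimal plain-Python sketch and not the spark-sklearn code: `fit_mean`, `fit_per_key`, and `unwrap_models` are hypothetical names, a list of tuples stands in for the dataframe, a dict of learned parameters stands in for the fitted estimator, and `pickle` stands in for the UDT serialization.

```python
import pickle
from collections import defaultdict


def fit_mean(ys):
    """Hypothetical stand-in for fitting an estimator: learns the mean of ys."""
    return {"mean": sum(ys) / len(ys)}


def fit_per_key(rows):
    """First pass: fit one model per key and emit it wrapped in a
    one-element array of serialized bytes (the workaround's output shape)."""
    groups = defaultdict(list)
    for key, y in rows:
        groups[key].append(y)
    return [(key, [pickle.dumps(fit_mean(ys))]) for key, ys in sorted(groups.items())]


def unwrap_models(wrapped_rows):
    """Second pass: pull the single element out of each array and deserialize it."""
    return {key: pickle.loads(blobs[0]) for key, blobs in wrapped_rows}


rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
wrapped = fit_per_key(rows)
models = unwrap_models(wrapped)
```

If the grouped step could return the model type directly, the second function and the array wrapper would presumably go away, which is the simplification this issue asks for.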

vlad17 · 2016-06-28T00:20:29Z

Also blocking this test from being unskipped:
test_gapply_python_only_udt_val (spark_sklearn.tests.test_gapply.GapplyTests) ... SKIP: python only UDTs can't be nested in arraytypes for now, see SPARK-15989

thunterdb · 2016-08-16T01:53:03Z

This has been fixed in Spark 2.0.1-SNAPSHOT. We can work on it once I cut the release of spark-sklearn.

srowen · 2018-12-09T23:46:45Z

Was this resolved? I see the test was un-skipped in 7583ee1

vlad17 changed the title ~~Use generate sklearn UDT within gapply()~~ Use generate sklearn UDT within gapply() [SPARK-16062 blocker] Jun 28, 2016

vlad17 changed the title ~~Use generate sklearn UDT within gapply() [SPARK-16062 blocker]~~ Use generate sklearn UDT within gapply() [SPARK-16062 blocks this] Jun 28, 2016

vlad17 mentioned this issue Aug 9, 2016

Upgrade to Spark 2.0 #42

Merged

srowen added the enhancement label Jan 29, 2019
