[ZEPPELIN-97][ZEPPELIN-134] pyspark issue with mllib api by Leemoonsoo · Pull Request #129 · apache/zeppelin

Leemoonsoo · 2015-06-29T20:26:53Z

[ZEPPELIN-97](https://issues.apache.org/jira/browse/ZEPPELIN-97) is an issue that appears with pyspark 1.4. The cause is that, starting with pyspark 1.4, the Java gateway is created with the `auto_convert = True` option. This PR fixes the problem.
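A minimal sketch of the idea, not the actual diff in this PR: choose the gateway options from the Spark version so that py4j converts Python collections to Java collections from 1.4 onward. The helper names `gateway_options` and `connect`, and the way the version string is parsed, are assumptions for illustration; `JavaGateway`, `GatewayClient`, and the `auto_convert` keyword are part of py4j.

```python
def gateway_options(spark_version):
    """Return keyword arguments for JavaGateway (hypothetical helper).

    Assumes a version string such as "1.4.0"; only major.minor is inspected.
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    # From pyspark 1.4 the gateway is expected to auto-convert Python
    # lists/dicts into Java collections when they are passed as arguments.
    if (major, minor) >= (1, 4):
        return {"auto_convert": True}
    return {}


def connect(port, spark_version):
    """Open a gateway to an already running JVM (requires py4j and a JVM)."""
    from py4j.java_gateway import JavaGateway, GatewayClient
    return JavaGateway(GatewayClient(port=port), **gateway_options(spark_version))


print(gateway_options("1.4.0"))
print(gateway_options("1.3.1"))
```

Without `auto_convert`, a plain Python `list` passed to a JVM method has no `_get_object_id`, which matches the `AttributeError` quoted in this description.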

This PR also handles [ZEPPELIN-134](https://issues.apache.org/jira/browse/ZEPPELIN-134) by injecting sqlContext.

Finally, it prints a more verbose stack trace on error. For example, the output changes

from

(<type 'exceptions.AttributeError'>, AttributeError("'list' object has no attribute '_get_object_id'",), <traceback object at 0x392b638>)

to

Traceback (most recent call last):
  File "/var/folders/zt/nd4j13y14jjg7_5pc4xgy7t80000gn/T//zeppelin_pyspark.py", line 110, in <module>
    eval(compiledCode)
  File "<string>", line 3, in <module>
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 1200, in withColumn
    return self.select('*', col.alias(colName))
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 738, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 630, in _jcols
    return self._jseq(cols, _to_java_column)
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 617, in _jseq
    return _to_seq(self.sql_ctx._sc, cols, converter)
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/column.py", line 60, in _to_seq
    return sc._jvm.PythonUtils.toSeq(cols)
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 529, in __call__
    [get_command_part(arg, self.pool) for arg in new_args])
  File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 265, in get_command_part
    command_part = REFERENCE_TYPE + parameter._get_object_id()
AttributeError: 'list' object has no attribute '_get_object_id'
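The difference between the two outputs can be reproduced with the standard library alone. The sketch below is an illustration, not the code in `zeppelin_pyspark.py`: `run_user_code` is a hypothetical helper that runs a snippet and returns both the terse `sys.exc_info()` form and the full text from `traceback.format_exc()`.

```python
import sys
import traceback


def run_user_code(source):
    """Run a snippet; return (terse, verbose) error text, or (None, None)."""
    try:
        compiled = compile(source, "<string>", "exec")
        exec(compiled, {})
    except Exception:
        # Terse form: the (type, value, traceback object) tuple as a string.
        terse = str(sys.exc_info())
        # Verbose form: the full formatted stack trace.
        verbose = traceback.format_exc()
        return terse, verbose
    return None, None


# A list has no _get_object_id, the same failure as in the trace above.
terse, verbose = run_user_code("[]._get_object_id()")
print(terse)
print(verbose)
```

The verbose form includes the file and line of every frame, which is what makes the failing py4j call visible.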

Leemoonsoo · 2015-06-29T21:49:59Z

Ready to merge.

felixcheung · 2015-06-30T08:17:17Z

LGTM

Author: Lee moon soo <moon@apache.org>

Closes #129 from Leemoonsoo/ZEPPELIN-97 and squashes the following commits:

1fa4bf6 [Lee moon soo] apply auto_convert for spark 1.4
bce3c1d [Lee moon soo] Print more stacktrace

(cherry picked from commit 6a894b0)
Signed-off-by: Lee moon soo <moon@apache.org>

Print more stacktrace

bce3c1d

Leemoonsoo force-pushed the ZEPPELIN-97 branch from 3f7927b to ead4144 on June 29, 2015 21:07

apply auto_convert for spark 1.4

1fa4bf6

Leemoonsoo force-pushed the ZEPPELIN-97 branch from ead4144 to 1fa4bf6 on June 29, 2015 21:07

asfgit closed this in 6a894b0 Jun 30, 2015

Leemoonsoo wants to merge 2 commits into apache:master from Leemoonsoo:ZEPPELIN-97
