ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in PySparkInterpreter #1839

zjffdu · 2017-01-04T09:27:50Z

What is this PR for?

I copied some code from PythonInterpreter into PySparkInterpreter so that pandas DataFrames can be displayed in PySparkInterpreter. Ideally, IMO, all the features in PythonInterpreter should be available in PySparkInterpreter; PySparkInterpreter should be an extension of PythonInterpreter. We can consider that once the refactoring of PythonInterpreter is done.

What type of PR is it?

[Improvement]

Todos

- Task

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1903

How should this be tested?

A unit test is added, and the change was also tested manually.

Screenshots (if appropriate)

Questions:

Do the license files need an update? No
Are there breaking changes for older versions? No
Does this need documentation? No

zjffdu · 2017-01-04T09:28:38Z

@bzz @Leemoonsoo @felixcheung Please help review

zjffdu · 2017-01-05T03:41:04Z

CI failed; working on it.

felixcheung · 2017-01-20T05:31:49Z

spark/src/main/resources/python/zeppelin_pyspark.py

    self._displayhook = lambda *args: None

  def show(self, obj):
    from pyspark.sql import DataFrame
    if isinstance(obj, DataFrame):
      print(gateway.jvm.org.apache.zeppelin.spark.ZeppelinContext.showDF(self.z, obj._jdf))
+    elif type(obj).__name__ == "DataFrame": # does not play well with sub-classes
+      # `isinstance(obj, DataFrame)` would req `import pandas.core.frame.DataFrame`


hmm, why is that the case, if pandas is not imported here?

OK, I think I figured out what you mean.
How about adding the check in a method, something like:

    def isPandas(obj):
      try:
        from pandas.core.frame import DataFrame
        return isinstance(obj, DataFrame)
      except ImportError:
        return False
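To make the trade-off in this thread concrete, here is a minimal sketch comparing the name-based check from the diff with an `isinstance` check behind a lazy import. The function names and the two stand-in classes are hypothetical and exist only for illustration; they are not part of the PR.

```python
def is_pandas_dataframe(obj):
    """Return True only if pandas is importable and obj is a pandas DataFrame."""
    try:
        # Imported lazily so that pandas stays an optional dependency.
        from pandas.core.frame import DataFrame
    except ImportError:
        return False
    return isinstance(obj, DataFrame)


def looks_like_dataframe_by_name(obj):
    """The name-based check from the diff: matches the exact class name only."""
    return type(obj).__name__ == "DataFrame"


# Hypothetical stand-ins, used only to show where the name check goes wrong.
class DataFrame(object):
    """Unrelated class that happens to share the name."""


class LabelledFrame(DataFrame):
    """Subclass whose own name is no longer 'DataFrame'."""


if __name__ == "__main__":
    print(looks_like_dataframe_by_name(DataFrame()))      # name matches
    print(looks_like_dataframe_by_name(LabelledFrame()))  # subclass is missed
    print(is_pandas_dataframe(DataFrame()))               # not a pandas object
```

The name check accepts any class called `DataFrame` and rejects subclasses with a different name, while the `isinstance` variant follows the real class hierarchy and simply reports `False` when pandas is not installed.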

zjffdu · 2017-04-28T07:37:45Z

Sorry for the late update. @felixcheung, please help review.

felixcheung

looks good

felixcheung · 2017-04-28T07:50:39Z

spark/src/main/resources/python/zeppelin_pyspark.py

@@ -44,15 +49,59 @@ def flush(self):
 class PyZeppelinContext(dict):
  def __init__(self, zc):
    self.z = zc
+    self.max_result = 1000


make this come from an interpreter property? I think there is a PR on something like this

This will be done in #2282; I will update this PR after #2282 is merged.

felixcheung · 2017-04-28T07:51:01Z

spark/src/main/resources/python/zeppelin_pyspark.py

+        body_buf.write(str(cell))
+      body_buf.write("\n")
+    body_buf.seek(0); header_buf.seek(0)
+    #TODO(bzz): fix it, so it shows red notice, as in Spark


what are we going to do with this?

Actually, all the code in the show_dataframe method is copied from the Python interpreter, so I don't have much context on this piece of code. Ideally we should have all the logic in the Python interpreter, and PySparkInterpreter should just reuse or extend it. But that would be a large follow-up ticket.

felixcheung · 2017-04-28T07:52:03Z

spark/src/main/resources/python/zeppelin_pyspark.py

    else:
      print(str(obj))

+  def show_dataframe(self, df, show_index=False):
+    """Pretty prints DF using Table Display System


How much of this overlaps with z.showData? Should we abstract out a method in the Zeppelin context to do the pretty printing?
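As a rough illustration of what such a shared pretty-print helper could look like, the sketch below renders column names and rows as tab-separated text with Zeppelin's `%table` prefix and caps the number of rows. The function name `to_table_text` and its signature are assumptions made for this example; this is not the code in the PR, and it does not reproduce the `show_index` handling.

```python
def to_table_text(columns, rows, max_result=1000):
    """Render column names and rows as Zeppelin %table text.

    columns: iterable of column labels.
    rows: iterable of row iterables, one cell per column.
    max_result: maximum number of data rows to emit (hypothetical default,
    mirroring the value hard-coded in the diff).
    """
    # Header line: labels separated by tabs.
    lines = ["\t".join(str(col) for col in columns)]
    for count, row in enumerate(rows):
        if count >= max_result:
            break  # silently truncate, as the copied code does for now
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_table_text(["name", "age"], [("ann", 31), ("bob", 27)]))
```

With pandas available, a caller could presumably pass `df.columns` and `df.itertuples(index=False)`; keeping the helper free of pandas imports is what would let both interpreters share it.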

felixcheung · 2017-04-28T07:52:37Z

spark/src/main/resources/python/zeppelin_pyspark.py

+        body_buf.write("\t")
+        body_buf.write(str(cell))
+      body_buf.write("\n")
+    body_buf.seek(0); header_buf.seek(0)


nit: style - new line, don't use ;

zjffdu force-pushed the ZEPPELIN-1903 branch from 2f82a7c to 7e8521a Compare January 5, 2017 03:23

zjffdu changed the title ~~ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in PySparkInterpreter~~ [WIP] ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in PySparkInterpreter Jan 5, 2017

zjffdu force-pushed the ZEPPELIN-1903 branch 2 times, most recently from 84ec048 to 1ee3c60 Compare January 6, 2017 03:27

zjffdu closed this Jan 6, 2017

zjffdu reopened this Jan 6, 2017

zjffdu changed the title ~~[WIP] ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in PySparkInterpreter~~ ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in PySparkInterpreter Jan 6, 2017

felixcheung reviewed Jan 20, 2017

zjffdu force-pushed the ZEPPELIN-1903 branch from 1ee3c60 to edfbf20 Compare April 28, 2017 03:41

zjffdu changed the title ~~ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in PySparkInterpreter~~ [WIP] ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in PySparkInterpreter Apr 28, 2017

ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in Py…

a16a1cf

…SparkInterpreter

zjffdu force-pushed the ZEPPELIN-1903 branch 2 times, most recently from 6a922e2 to f7977c0 Compare April 28, 2017 05:06

install pandas in travis

3abd77e

zjffdu force-pushed the ZEPPELIN-1903 branch from f7977c0 to 3abd77e Compare April 28, 2017 06:38

zjffdu changed the title ~~[WIP] ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in PySparkInterpreter~~ ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in PySparkInterpreter Apr 28, 2017

felixcheung reviewed Apr 28, 2017

asfgit force-pushed the master branch from b11d355 to 3712ce6 Compare May 9, 2018 05:45

zjffdu closed this May 29, 2018
