ZEPPELIN-1903. ZeppelinContext can not display pandas DataFrame in PySparkInterpreter #1839
Conversation
@bzz @Leemoonsoo @felixcheung Please help review.
CI failed, working on it.
84ec048 to 1ee3c60
self._displayhook = lambda *args: None

def show(self, obj):
    from pyspark.sql import DataFrame
    if isinstance(obj, DataFrame):
        print(gateway.jvm.org.apache.zeppelin.spark.ZeppelinContext.showDF(self.z, obj._jdf))
    elif type(obj).__name__ == "DataFrame":  # does not play well with sub-classes
        # `isinstance(obj, DataFrame)` would req `import pandas.core.frame.DataFrame`
hmm, why is that the case, if pandas is not imported here?
ok, I think I figured out what you mean.
How about adding the check in a method, something like

def isPandas(obj):
    try:
        from pandas.core.frame import DataFrame
        return isinstance(obj, DataFrame)
    except ImportError:
        return False
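A runnable sketch of that suggestion; note that the Python literal is `False` (capitalized), and that `isinstance`, unlike a type-name comparison, also matches DataFrame subclasses:

```python
def isPandas(obj):
    # Lazy import: the check degrades to False rather than failing
    # when pandas is not installed on the interpreter host.
    try:
        from pandas.core.frame import DataFrame
        return isinstance(obj, DataFrame)
    except ImportError:
        return False

print(isPandas(42))  # False
```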
6a922e2 to f7977c0
Sorry for the late update. @felixcheung Please help review.
looks good
@@ -44,15 +49,59 @@ def flush(self):
class PyZeppelinContext(dict):
    def __init__(self, zc):
        self.z = zc
        self.max_result = 1000
make this an interpreter property? I think there is a PR for something like this
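One way the suggestion could look, as a hedged sketch: the property name `zeppelin.python.maxResult`, the dict-style lookup, and the helper name are assumptions for illustration, not the actual Zeppelin interpreter-property API.

```python
def resolve_max_result(properties, default=1000):
    # Interpreter properties arrive as strings; fall back to the
    # default on missing or malformed values rather than failing
    # the paragraph.
    try:
        return int(properties.get("zeppelin.python.maxResult", default))
    except (TypeError, ValueError):
        return default

print(resolve_max_result({"zeppelin.python.maxResult": "500"}))  # 500
```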
body_buf.write(str(cell))
body_buf.write("\n")
body_buf.seek(0); header_buf.seek(0)
#TODO(bzz): fix it, so it shows red notice, as in Spark
what are we going to do with this?
Actually, all the code in the show_dataframe method is copied from the Python interpreter, so I don't have much context about this piece of code. Ideally we should have all the logic in the Python interpreter, and PySparkInterpreter should just reuse or extend the Python interpreter. But that would be a large follow-up ticket.
else:
    print(str(obj))

def show_dataframe(self, df, show_index=False):
    """Pretty prints DF using Table Display System
how much of this overlaps with z.showData? Should we abstract out a method in ZeppelinContext to do the pretty printing?
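For context, the overlapping logic in both interpreters boils down to emitting a header row and tab-separated data rows behind Zeppelin's %table display directive. A hypothetical standalone sketch of that shared core (the function name and list-of-rows input are illustrative, not the PR's actual code):

```python
def render_table(header, rows, max_result=1000):
    # Header row first, then at most max_result tab-separated data
    # rows, all prefixed with the %table directive that Zeppelin's
    # Table Display System consumes.
    lines = ["\t".join(str(c) for c in header)]
    for row in rows[:max_result]:
        lines.append("\t".join(str(c) for c in row))
    return "%table " + "\n".join(lines)

print(render_table(["name", "age"], [["alice", 30], ["bob", 25]]))
```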
body_buf.write("\t")
body_buf.write(str(cell))
body_buf.write("\n")
body_buf.seek(0); header_buf.seek(0)
nit: style - new line, don't use ;
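Applied to the lines above, the nit amounts to the following sketch (assuming the buffers are io.StringIO, which matches their seek/write usage):

```python
import io

header_buf = io.StringIO()
body_buf = io.StringIO()
header_buf.write("name\tage\n")
body_buf.write("alice\t30\n")

# One statement per line instead of `body_buf.seek(0); header_buf.seek(0)`:
# rewind both buffers before reading them back out.
body_buf.seek(0)
header_buf.seek(0)

print(header_buf.read() + body_buf.read())
```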
What is this PR for?
I copied some code from PythonInterpreter to PySparkInterpreter to enable displaying pandas DataFrames in PySparkInterpreter. Ideally, IMO, all the features in PythonInterpreter should be available in PySparkInterpreter: PySparkInterpreter should be an extension of PythonInterpreter. After the refactoring of PythonInterpreter is done, we can consider that.
What type of PR is it?
[Improvement]
Todos
What is the Jira issue?
How should this be tested?
Unit test is added and also manually tested.
Screenshots (if appropriate)
Questions: