
Add doc for exchanging data frames #1677

Closed
wants to merge 1 commit into from

Conversation

@m30m (Contributor) commented Nov 24, 2016

What is this PR for?

ZeppelinContext can be used to exchange DataFrames but there are some nasty tricks and typecasts.
It's good to provide some examples.

What type of PR is it?

Documentation

Questions:

  • Do the license files need an update? no
  • Are there breaking changes for older versions? no
  • Does this need documentation? no

@Leemoonsoo (Member)

@m30m Awesome!

LGTM. I'll merge to master if there are no more comments.

@zjffdu (Contributor) commented Nov 25, 2016

Should we do this implicitly for the user in ZeppelinContext? The syntax is hard to understand unless users know the internal implementation of pyspark, and I think we should not expose such internals to users.

z.put("myPythonDataFrame", postsDf._jdf)

@m30m (Contributor, Author) commented Nov 25, 2016

It's not possible to put the DataFrame directly because of this error:

  File "/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1124, in __call__
    args_command, temp_args = self._build_args(*args)

  File "/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1094, in _build_args
    [get_command_part(arg, self.pool) for arg in new_args])

  File "/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 289, in get_command_part
    command_part = REFERENCE_TYPE + parameter._get_object_id()

  File "/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 841, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))

AttributeError: 'DataFrame' object has no attribute '_get_object_id'
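For context on the failure above: py4j can only pass objects across the bridge if they expose `_get_object_id()`, which pyspark's `DataFrame` wrapper does not; only the Java handle stored in its `_jdf` attribute does. A minimal stand-in (stub classes, no Spark or py4j required; all names here are illustrative, not the real py4j API) sketching why `z.put(key, df)` fails while `z.put(key, df._jdf)` works:

```python
class JavaObject:
    """Stand-in for a py4j JavaObject: carries a gateway object id."""
    def __init__(self, object_id):
        self._object_id = object_id

    def _get_object_id(self):
        return self._object_id


class DataFrame:
    """Stand-in for pyspark's DataFrame: a thin wrapper around a Java handle."""
    def __init__(self, jdf):
        self._jdf = jdf  # the underlying Java DataFrame


def get_command_part(arg):
    """Mimics py4j's protocol step: only objects with _get_object_id() can cross."""
    return "r" + arg._get_object_id()


df = DataFrame(JavaObject("o123"))

print(get_command_part(df._jdf))  # works: the Java handle has an object id
try:
    get_command_part(df)          # fails, like the traceback above
except AttributeError as e:
    print(e)  # 'DataFrame' object has no attribute '_get_object_id'
```

This is why the documented examples pass `postsDf._jdf` rather than the DataFrame itself.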

@zjffdu (Contributor) commented Nov 25, 2016

I mean we can do this internally in PyZeppelinContext, as follows:

def __setitem__(self, key, item):
    if isinstance(item, DataFrame):
        self.z.put(key, item._jdf)
    else:
        self.z.put(key, item)

@m30m (Contributor, Author) commented Nov 25, 2016

Yes, that's a good idea. Shall I add a commit to this branch?

@zjffdu (Contributor) commented Nov 25, 2016

Yes, and you also need to update the __getitem__ method so that users don't have to construct the DataFrame themselves as below; z.get("myScalaDataFrame") should return a DataFrame directly.

myScalaDataFrame = DataFrame(z.get("myScalaDataFrame"), sqlContext)
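Taken together, the two suggestions amount to unwrapping on put and rewrapping on get inside PyZeppelinContext. A hedged sketch of that dispatch (the `JavaDataFrame`, `DataFrame`, and `PyZeppelinContext` classes here are stand-ins for the real pyspark/Zeppelin ones, with a plain dict in place of the JVM-side context):

```python
class JavaDataFrame:
    """Stand-in for the JVM-side DataFrame object."""
    pass


class DataFrame:
    """Stand-in for pyspark's DataFrame wrapper."""
    def __init__(self, jdf, sql_ctx=None):
        self._jdf = jdf
        self.sql_ctx = sql_ctx


class PyZeppelinContext:
    """Sketch of the proposed implicit conversion in __setitem__/__getitem__."""
    def __init__(self, sql_ctx=None):
        self._store = {}  # stands in for the JVM-side ZeppelinContext
        self.sql_ctx = sql_ctx

    def __setitem__(self, key, item):
        # Unwrap pyspark DataFrames to their Java handle before storing.
        if isinstance(item, DataFrame):
            self._store[key] = item._jdf
        else:
            self._store[key] = item

    def __getitem__(self, key):
        # Rewrap Java DataFrames so callers get a pyspark DataFrame back.
        value = self._store[key]
        if isinstance(value, JavaDataFrame):
            return DataFrame(value, self.sql_ctx)
        return value


z = PyZeppelinContext()
z["myPythonDataFrame"] = DataFrame(JavaDataFrame())  # stored as the bare Java handle
round_tripped = z["myPythonDataFrame"]               # comes back as a DataFrame
print(type(round_tripped).__name__)  # DataFrame
```

With this in place, users would write `z["myScalaDataFrame"]` and never see `_jdf` or the explicit `DataFrame(...)` constructor call.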

@felixcheung (Member)

Let's keep this as documentation only, and open a JIRA (and another PR) for the DataFrame support?

@zjffdu (Contributor) commented Nov 26, 2016

If we support the feature I mentioned above in another PR, the document here becomes obsolete because we'd have to update it later. So IMHO it would be better to do it in this PR.

@felixcheung (Member)

Well, it's a lot quicker to get a doc-only PR in :)
Besides, we should have a JIRA for changes like this. It's your call, @m30m

@m30m (Contributor, Author) commented Nov 26, 2016

I'm not sure it's a good idea to hide this complexity behind special-case handling, and I'd need to check whether these changes are backward compatible. So I think a doc-only PR, followed by a JIRA issue to handle the special Spark types, is the better solution.

@Leemoonsoo (Member)

Merging to master if there is no further discussion.

@asfgit asfgit closed this in 7d878f7 Dec 1, 2016