Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20360][PYTHON] reprs for interpreters #17662

Closed
wants to merge 4 commits into from
Closed

Conversation

rgbkrk
Copy link
Contributor

@rgbkrk rgbkrk commented Apr 17, 2017

What changes were proposed in this pull request?

Establishes a very minimal _repr_html_ for PySpark's SparkContext.

How was this patch tested?

nteract:

screen shot 2017-04-17 at 3 41 29 pm

Jupyter:

screen shot 2017-04-17 at 3 53 19 pm

Hydrogen:

screen shot 2017-04-17 at 3 49 55 pm

@holdenk
Copy link
Contributor

holdenk commented Apr 17, 2017

Thanks for working on this @rgbkrk, having better IJupyter/nteract support seems useful.
Jenkins, OK to test.

@rgbkrk
Copy link
Contributor Author

rgbkrk commented Apr 17, 2017

I'm tempted to switch this to <p> tags for the header, based on how the hydrogen styling comes out by default. It then looks like this:

screen shot 2017-04-17 at 2 26 51 pm

Should I be tearing the $ off the front of the version string that's handed back? What's canonical, what's good for the user?

@rgbkrk
Copy link
Contributor Author

rgbkrk commented Apr 17, 2017

Nevermind what I just said, I was context switching between working on javascript and python -- I introduced the $ in the format string. Silly me.

@rgbkrk rgbkrk force-pushed the repr branch 2 times, most recently from 4feb079 to 25f0ce6 Compare April 17, 2017 21:37
@felixcheung
Copy link
Member

cool, should we add something similar to SparkSession?

@rgbkrk
Copy link
Contributor Author

rgbkrk commented Apr 17, 2017

Happy to! What would you like to see?

@rgbkrk
Copy link
Contributor Author

rgbkrk commented Apr 17, 2017

Should it just defer to the session's spark context?

@felixcheung
Copy link
Member

for starter, version and spark UI like this
I'm sure we could add appname, session catalog current database, or conf like spark.sql.catalogImplementation

ok, maybe not too many things :)

</div>
""".format(
spark_version=self.version,
spark_ui_url=self.uiWebUrl,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this change @rgbkrk. I am kind of newbee to python, would like to understand the significance of "," at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're allowed to have trailing commas in Python. It's purely aesthetic -- the primary benefit is for the next coder. Every line has the same format and adding entries to the end is the same as adding them to the middle.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @rgbkrk. That make sense. The reason for asking is method repr which you have added don't have the "," at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the __repr__ declaration now has matching code style.

@rgbkrk
Copy link
Contributor Author

rgbkrk commented Apr 17, 2017

Went ahead and put master and appName, since those seemed sensible for knowing if you were on yarn, local[*], etc.

screen shot 2017-04-17 at 3 41 29 pm

@rgbkrk
Copy link
Contributor Author

rgbkrk commented Apr 17, 2017

After this PR I'll be more than happy to do some representations for RDDs, Spark DataFrames, and the whole lot. Keep encouraging me and you'll have a happy contributor. 😄

@@ -221,6 +221,9 @@ def __init__(self, sparkContext, jsparkSession=None):
or SparkSession._instantiatedSession._sc._jsc is None:
SparkSession._instantiatedSession = self

def _repr_html_(self):
return self.sparkContext._repr_html_()
Copy link
Contributor

@holdenk holdenk Apr 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @felixcheung suggested I think it might make sense to include some extra Spark SQL specific things.

I think the catalog implementation type here (can be very useful for understanding why hive UDFs are not working) (which you can get from the session config with the spark.sql.catalogImplementation key) and the current database (which you can get from the catalog object from the session). The catalog implementation is especially useful since we currently do a "fallback" from hive supported to non-hive supported and the user might not have noticed if they launched in Jupyter where the log messages are a bit more obscure -- something I've been meaning to work on in #17298 but I've gotten a bit distracted).

It might also make sense to return a different URL link (e.g. to the SQL page rather than the default page which takes people to the Jobs section) but this is minor and likely less useful than the other things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've set the catalogImplementation here now. This seems like something we could put in the SparkContext HTML repr as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also make sense to return a different URL link (e.g. to the SQL page rather than the default page which takes people to the Jobs section) but this is minor and likely less useful than the other things.

I don't know enough internals here to tell what URL(s) you'd want here. What properties or calls should I make for the SQL page?

@holdenk
Copy link
Contributor

holdenk commented Apr 17, 2017

Thanks for working on this @rgbkrk , I have a few minor suggestions for the SparkSession representation to be a bit clearer but it looks useful already :)

@holdenk
Copy link
Contributor

holdenk commented Apr 18, 2017

I like it, I'm going to leave this a for a bit and see if anyone has any comments overnight :)

@rgbkrk rgbkrk changed the title [SPARK-20360][PYTHON] Create _repr_html_ for SparkContext [SPARK-20360][PYTHON] reprs for interpreters Apr 18, 2017
@holdenk
Copy link
Contributor

holdenk commented Apr 18, 2017

This looks good to me, since we've just cut the 2.2 branch but not yet built an RC & this PR is already outstanding, I'm going to merge this into master & 2.2 so we can have some faster feedback on this (and maybe pique some other peoples interest).

asfgit pushed a commit that referenced this pull request Apr 18, 2017
## What changes were proposed in this pull request?

Establishes a very minimal `_repr_html_` for PySpark's `SparkContext`.

## How was this patch tested?

nteract:

![screen shot 2017-04-17 at 3 41 29 pm](https://cloud.githubusercontent.com/assets/836375/25107701/d57090ba-2385-11e7-8147-74bc2c50a41b.png)

Jupyter:

![screen shot 2017-04-17 at 3 53 19 pm](https://cloud.githubusercontent.com/assets/836375/25107725/05bf1fe8-2386-11e7-93e1-07a20c917dde.png)

Hydrogen:

![screen shot 2017-04-17 at 3 49 55 pm](https://cloud.githubusercontent.com/assets/836375/25107664/a75e1ddc-2385-11e7-8477-258661833007.png)

Author: Kyle Kelley <rgbkrk@gmail.com>

Closes #17662 from rgbkrk/repr.

(cherry picked from commit f654b39)
Signed-off-by: Holden Karau <holden@us.ibm.com>
@asfgit asfgit closed this in f654b39 Apr 18, 2017
@rgbkrk rgbkrk deleted the repr branch April 18, 2017 19:40
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
## What changes were proposed in this pull request?

Establishes a very minimal `_repr_html_` for PySpark's `SparkContext`.

## How was this patch tested?

nteract:

![screen shot 2017-04-17 at 3 41 29 pm](https://cloud.githubusercontent.com/assets/836375/25107701/d57090ba-2385-11e7-8147-74bc2c50a41b.png)

Jupyter:

![screen shot 2017-04-17 at 3 53 19 pm](https://cloud.githubusercontent.com/assets/836375/25107725/05bf1fe8-2386-11e7-93e1-07a20c917dde.png)

Hydrogen:

![screen shot 2017-04-17 at 3 49 55 pm](https://cloud.githubusercontent.com/assets/836375/25107664/a75e1ddc-2385-11e7-8477-258661833007.png)

Author: Kyle Kelley <rgbkrk@gmail.com>

Closes apache#17662 from rgbkrk/repr.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants