Updated the Spark SQL Programming guide with Custom object encoding for Dataset and unsupported operation error handling #16997

Closed
wants to merge 5 commits into from

HarshSharma8

What changes were proposed in this pull request?

Made some updates to the SQL programming guide to explain encoding with Kryo.

How was this patch tested?

Just updated the docs.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@AmplabJenkins

Can one of the admins verify this patch?

@@ -297,6 +297,9 @@ reflection and become the names of the columns. Case classes can also be nested
types such as `Seq`s or `Array`s. This RDD can be implicitly converted to a DataFrame and then be
registered as a table. Tables can be used in subsequent SQL statements.

Spark Encoders are used to convert a JVM object to Spark SQL representation. When we want to make a datase, Spark requires an encoder which takes the form Encoder[T] where T is the type we want to be encoded. When we try to create dataset with a custom type of object, then may result into <b>java.lang.UnsupportedOperationException: No Encoder found for Object-Name</b>.
Member

It's minor, but there are enough problems with the text to call it out. Please match the voice of the other text and avoid 'we'. Typos: "datase", "spark sql" and "kryo" for example. Use back-ticks to consistently format code if you're going to. What is Object-Name?

Author

Hello srowen,
I have updated the content to match the voice of the surrounding text; you can have another look at it.

@@ -297,6 +297,9 @@ reflection and become the names of the columns. Case classes can also be nested
types such as `Seq`s or `Array`s. This RDD can be implicitly converted to a DataFrame and then be
registered as a table. Tables can be used in subsequent SQL statements.

Spark Encoders are used to convert a JVM object to Spark SQL representation. To create dataset, spark requires an encoder which takes the form of <b>Encoder[T]</b> where <b>T</b> is the type which has to be encoded. Creation of a dataset with a custom type of object, may result into <b>java.lang.UnsupportedOperationException: No Encoder found for Object-Name</b>.
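For reference, here is a minimal Scala sketch of the technique the proposed text describes: supplying an explicit Kryo-based `Encoder[T]` for a type Spark cannot derive an encoder for. It assumes Spark 2.x with `spark-sql` on the classpath; `Point` and `KryoEncoderSketch` are hypothetical names chosen for illustration, not part of the guide.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical custom type: a plain class (not a case class), so Spark
// cannot derive an encoder for it from `spark.implicits._`.
class Point(val x: Int, val y: Int)

object KryoEncoderSketch {
  // Explicit encoder backed by Kryo; each Point is serialized into a
  // single binary column instead of one column per field.
  implicit val pointEncoder: Encoder[Point] = Encoders.kryo[Point]

  // createDataset picks up `pointEncoder` from implicit scope.
  def buildDataset(spark: SparkSession): Dataset[Point] =
    spark.createDataset(Seq(new Point(1, 2), new Point(3, 4)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kryo-encoder-sketch")
      .master("local[*]")
      .getOrCreate()
    try {
      val ds = buildDataset(spark)
      // Expected: a single `value` column of binary type.
      ds.printSchema()
      // collect() deserializes the binary values back into Point instances.
      val sums = ds.collect().map(p => p.x + p.y)
      println(sums.mkString(","))
    } finally {
      spark.stop()
    }
  }
}
```

Design note: a Kryo encoder stores each object as one opaque binary column, so the fields of `Point` cannot be queried with SQL. A case class with supported field types (or `Encoders.bean` for Java beans) keeps a columnar schema and is usually preferable when that matters.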
Member

It is trivial, but maybe spark -> Spark? I am not an expert in grammar, but to my knowledge, capitalizing a proper noun is correct.

Member

Yes, @HarshSharma8, this still doesn't address the comments. Use back-ticks for code, not bold, too. What is Object-Name?

@HyukjinKwon
Member

BTW, could we maybe make the title complete (not opera…)?

@HarshSharma8
Author

HarshSharma8 commented Feb 21, 2017 via email

@HarshSharma8
Author

HarshSharma8 commented Feb 21, 2017 via email

@srowen
Member

srowen commented Feb 21, 2017

You are still bold-facing code elements, and now back-ticked a string, which isn't code. There are still typos like "create dataset" instead of "create a Dataset". Do you mean to write something to indicate a class name will be in the message? Then write something like "[class name]". There is no object name here. Please review carefully before you ask for another review.

@HarshSharma8
Author

I updated the content with a demo object. I would appreciate it if someone could have a look at this.

@HyukjinKwon
Member

Could you fix the PR title too while you are online maybe? It might be nice to have a good title for both a commit log and those who like to track down the history.

HarshSharma8 changed the title from "Updated the SQL programming guide to explain about the Encoding opera…" to "Updated the Spark SQL Programming guide with Encoder class specifications and possible error handling" Feb 21, 2017
HarshSharma8 changed the title from "Updated the Spark SQL Programming guide with Encoder class specifications and possible error handling" to "Updated the Spark SQL Programming guide with Custom object encoding for Dataset and unsupported operation error handling" Feb 21, 2017
@HarshSharma8
Author

Hello HyukjinKwon,
I have updated the title; I hope you like it, as it reflects what is in the content. The commit has already been made.

@HarshSharma8
Author

Did anyone get a chance to verify it, or are there any changes I need to make?

@srowen
Member

srowen commented Mar 5, 2017

This still has formatting and text problems. I'm sorry, but I don't think I can go around again for this when it's not an important change, and I'd like to close this.

@HarshSharma8
Author

HarshSharma8 commented Mar 6, 2017 via email

srowen added a commit to srowen/spark that referenced this pull request Mar 22, 2017
srowen mentioned this pull request Mar 22, 2017
asfgit closed this in b70c03a Mar 23, 2017
4 participants