
[SPARK-20599][SS] ConsoleSink should work with (batch) #18347

Closed · 6 commits · 4 participants
@lubozhan (Contributor) commented Jun 19, 2017

What changes were proposed in this pull request?

Currently, if we read a batch and want to display it on the console sink, it leads to a runtime exception.

Changes:

  • In this PR, we add a match rule to check whether the provider is a ConsoleSinkProvider; if the console
    format is used, we display the Dataset.

How was this patch tested?

spark.read.schema().json(path).write.format("console").save

Lubo Zhang added some commits Jun 15, 2017

@@ -465,6 +465,8 @@ case class DataSource(
providingClass.newInstance() match {
case dataSource: CreatableRelationProvider =>
SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
case dataSource: ConsoleSinkProvider =>

@jaceklaskowski (Contributor) commented Jun 19, 2017

Underscore dataSource since it's not used.

@@ -465,6 +465,8 @@ case class DataSource(
providingClass.newInstance() match {
case dataSource: CreatableRelationProvider =>
SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
case dataSource: ConsoleSinkProvider =>
data.show(data.count().toInt, false)

@jaceklaskowski (Contributor) commented Jun 19, 2017

ConsoleSink has two options that could be used here -- numRows and truncate.

@lubozhan (Author, Contributor) commented Jun 19, 2017

Sorry for the late reply.
Yes, it is right to use an underscore since dataSource is not used. Since there is no need to create a new ConsoleSink and we have no access to its private variables, I will use caseInsensitiveOptions instead to extract numRows and truncate.
Thanks for your comment.
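For illustration, the option extraction discussed here could look like the following — a minimal sketch in plain Scala (the object name ConsoleOptions is hypothetical; the defaults of 20 rows and truncate = true follow the ConsoleSink behavior mentioned in this thread):

```scala
// Sketch: extract console sink options from an options map.
// Defaults (20 rows, truncate = true) mirror the defaults discussed above.
object ConsoleOptions {
  // Number of rows to display; falls back to 20 when the option is absent.
  def numRows(options: Map[String, String]): Int =
    options.get("numRows").map(_.toInt).getOrElse(20)

  // Whether to truncate long cell values; defaults to true.
  def isTruncated(options: Map[String, String]): Boolean =
    options.get("truncate").map(_.toBoolean).getOrElse(true)
}
```

For example, `ConsoleOptions.numRows(Map("numRows" -> "5"))` yields 5, while an empty map yields the default of 20.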

@zsxwing (Member) commented Jun 19, 2017

It's better to make ConsoleSinkProvider implement CreatableRelationProvider instead.

@lubozhan (Author, Contributor) commented Jun 21, 2017

Agreed with your comment. I will handle this case in ConsoleSinkProvider.
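The shape of that change might look roughly like this — a hedged sketch, not the exact merged code, assuming the Spark 2.x CreatableRelationProvider and BaseRelation interfaces (the streaming-side createSink required by StreamSinkProvider is unchanged and omitted here):

```scala
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types.StructType

// A trivial relation to return from createRelation; the console sink
// only displays the data, so the relation just carries the schema.
case class ConsoleRelation(override val sqlContext: SQLContext, data: DataFrame)
    extends BaseRelation {
  override def schema: StructType = data.schema
}

// Sketch: implementing CreatableRelationProvider lets DataSource's existing
// match on CreatableRelationProvider cover the batch write path.
class ConsoleSinkProvider extends StreamSinkProvider
    with DataSourceRegister
    with CreatableRelationProvider {

  override def shortName(): String = "console"

  override def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = {
    // Number of rows to display, default 20
    val numRowsToShow = parameters.get("numRows").map(_.toInt).getOrElse(20)
    // Truncate the displayed data if it is too long, by default it is true
    val isTruncated = parameters.get("truncate").map(_.toBoolean).getOrElse(true)
    data.show(numRowsToShow, isTruncated)
    ConsoleRelation(sqlContext, data)
  }

  // createSink(...) for the streaming path is omitted in this sketch.
}
```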

@@ -51,7 +53,15 @@ class ConsoleSink(options: Map[String, String]) extends Sink with Logging {
}
}

class ConsoleSinkProvider extends StreamSinkProvider with DataSourceRegister {
case class ConsoleRelation(Context: SQLContext, data: DataFrame) extends BaseRelation {

@zsxwing (Member) commented Jun 21, 2017

nit: you can use

case class ConsoleRelation(override val sqlContext: SQLContext, data: DataFrame) extends BaseRelation {
  override def schema: StructType = data.schema
}
// Truncate the displayed data if it is too long, by default it is true
val isTruncated = parameters.get("truncate").map(_.toBoolean).getOrElse(true)

data.sparkSession.createDataFrame(

@zsxwing (Member) commented Jun 21, 2017

You can just call data.showInternal(numRowsToShow, isTruncated). This is a hack in ConsoleSink to avoid using a wrong planner. That's not a problem in the batch DataFrames.

@zsxwing (Member) commented Jun 21, 2017

ok to test.

@zsxwing (Member) commented Jun 21, 2017

Looks pretty good. Left some comments.

@SparkQA commented Jun 21, 2017

Test build #78400 has finished for PR 18347 at commit 76d7923.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ConsoleRelation(Context: SQLContext, data: DataFrame) extends BaseRelation
@SparkQA commented Jun 22, 2017

Test build #78424 has started for PR 18347 at commit 5851d22.

@SparkQA commented Jun 22, 2017

Test build #78451 has finished for PR 18347 at commit 942433e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@zsxwing (Member) commented Jun 22, 2017

LGTM. Merging to master. Thanks!

@zsxwing (Member) commented Jun 22, 2017

@lubozhan do you have a JIRA account?

@asfgit asfgit closed this in e55a105 Jun 22, 2017

@lubozhan (Author, Contributor) commented Jun 23, 2017

@zsxwing Yes, my JIRA user name is Lubo Zhang

robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017


Author: Lubo Zhang <lubo.zhang@intel.com>
Author: lubozhan <lubo.zhang@intel.com>

Closes apache#18347 from lubozhan/dev.