[SPARK-23058][SQL] Show non printable field delim as unicode #20248

Closed

wangyum (Member) commented Jan 12, 2018

What changes were proposed in this pull request?

Create a table with non-printable delimiters, as below:

CREATE EXTERNAL TABLE t1(col1 bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '\177',
  'serialization.format' = '\003'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'file:/tmp/t1';
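
For reference, the octal escapes in the statement above map to single control characters. The following is a minimal sketch of that arithmetic only; `octalEscapeToChar` is a hypothetical helper written for illustration and is not part of Spark or Hive.

```scala
// Hypothetical helper (not a Spark or Hive API): decode a backslash-octal
// escape such as the four characters \177 into the character it denotes.
def octalEscapeToChar(escape: String): Char = {
  require(escape.startsWith("\\") && escape.length > 1, s"not an octal escape: $escape")
  // Drop the leading backslash and parse the remaining digits in base 8.
  Integer.parseInt(escape.drop(1), 8).toChar
}

// Octal 177 is decimal 127 (hex 7F), the DEL control character.
val fieldDelim: Char = octalEscapeToChar("\\177")
// Octal 003 is decimal 3 (hex 03), the ETX control character.
val serializationFormat: Char = octalEscapeToChar("\\003")

println(s"field.delim code point: ${fieldDelim.toInt}")
println(s"serialization.format code point: ${serializationFormat.toInt}")
```

Neither character has a visible glyph, which is why they are hard to read back from printed DDL.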

Running SHOW CREATE TABLE t1 before this PR prints:

CREATE EXTERNAL TABLE `t1`(`col1` bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '',
  'serialization.format' = ''
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'file:/tmp/t1'
TBLPROPERTIES (
  'transient_lastDdlTime' = '1515766958'
)

After this PR:

CREATE EXTERNAL TABLE `t1`(`col1` bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '\u007F',
  'serialization.format' = '\u0003'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'file:/tmp/t1'
TBLPROPERTIES (
  'transient_lastDdlTime' = '1516066260'
)

This PR makes SHOW CREATE TABLE ... render non-printable field delimiters as Unicode escapes, so its output can now be used to recreate the table.
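
The rendering can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `renderDelim` is a hypothetical name, the condition is copied from the diff line under review, and the four-digit uppercase hex format is inferred from the "After this PR" output.

```scala
// Hypothetical helper, not the PR's actual implementation.
// A single-character value outside printable ASCII (32..126) is rendered as a
// backslash-u escape with four uppercase hex digits; anything else is unchanged.
def renderDelim(value: String): String = {
  val isSingleNonPrintable =
    value.length == 1 && (value.head < 32 || value.head > 126)
  if (isSingleNonPrintable) "\\u%04X".format(value.head.toInt) else value
}

// DEL (code point 127) and ETX (code point 3) become visible escapes,
// while an ordinary comma delimiter passes through untouched.
println(renderDelim(127.toChar.toString))
println(renderDelim(3.toChar.toString))
println(renderDelim(","))
```

The real command also passes values through `escapeSingleQuotedString`, which this sketch leaves out.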

How was this patch tested?

unit tests

wangyum (Member Author) commented Jan 12, 2018

Non-printable characters:
[Image: non-printable]

@@ -1023,7 +1023,12 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman

       val serdeProps = metadata.storage.properties.map {
         case (key, value) =>
-          s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
+          val escapedValue = if (value.length == 1 && (value.head < 32 || value.head > 126)) {
Member

I think you want Character.isISOControl here. But this is a bit hacky, as you're hard-coding assumptions about the encoding. Why print non-printable chars, and why octal? What goes wrong?

Member Author

I need to copy an external table to another environment, but I lost the original CREATE TABLE statement. I want to recover it with SHOW CREATE TABLE ..., but that command can't show non-printable field delimiters.

Member

I see, so when the same properties are specified, they can be specified as an octal escape sequence. That makes sense to render it back that way. I'd still use isISOControl for better generality.
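
To make the difference between the two checks concrete, here is a small sketch comparing the hard-coded range test from the diff with `Character.isISOControl`. The function names `outsidePrintableAscii` and `isControl` and the sample characters are chosen for illustration only.

```scala
// The range test from the diff under review: anything outside printable ASCII.
def outsidePrintableAscii(c: Char): Boolean = c < 32 || c > 126

// The reviewer's suggestion: java.lang.Character.isISOControl, which is true
// only for the ISO control ranges U+0000..U+001F and U+007F..U+009F.
def isControl(c: Char): Boolean = Character.isISOControl(c)

// ETX, DEL, a comma, and U+00E9 (Latin small letter e with acute).
val samples: Seq[Char] = Seq(3.toChar, 127.toChar, ',', 0xE9.toChar)

samples.foreach { c =>
  println(s"code point ${c.toInt}: rangeCheck=${outsidePrintableAscii(c)}, isISOControl=${isControl(c)}")
}
```

Both checks agree on the two control characters and on the comma. They differ on U+00E9: the range test would escape it even though it is printable, while `isISOControl` leaves it alone.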

SparkQA commented Jan 12, 2018

Test build #86040 has finished for PR 20248 at commit d44f242.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

wangyum changed the title from "[SPARK-23058][SQL] Fix non printable field delim issue" to "[SPARK-23058][SQL] Show non printable field delim as unicode" on Jan 13, 2018

SparkQA commented Jan 13, 2018

Test build #86081 has finished for PR 20248 at commit edf5fa6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

wangyum (Member Author) commented Jan 13, 2018

retest this please

SparkQA commented Jan 13, 2018

Test build #86088 has finished for PR 20248 at commit edf5fa6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

wangyum (Member Author) commented Jan 13, 2018

retest this please

SparkQA commented Jan 13, 2018

Test build #86095 has finished for PR 20248 at commit edf5fa6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

wangyum closed this Jul 18, 2018