[SPARK-29505][SQL] Make DESC EXTENDED <table name> <column name> case insensitive #26927

PavithraRamachandran
Contributor

What changes were proposed in this pull request?

When querying with desc, if the column name is not entered exactly as it was given at table creation, the colstats shown are wrong. Fetching of col stats has been made case insensitive.
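The mismatch described above can be sketched in plain Scala without any Spark dependency. This is only an illustration under stated assumptions: `ColStat` is a hypothetical stand-in for Spark's per-column statistics, and the map key `CName` is assumed to differ in case from the name the user types. It is not the code that was merged.

```scala
import java.util.Locale

object ColStatsLookupSketch {
  // Hypothetical stand-in for per-column statistics; only the shape matters here.
  final case class ColStat(distinctCount: Long, nullCount: Long)

  // Exact-match lookup: a name typed with different case finds nothing.
  def exactLookup(stats: Map[String, ColStat], column: String): Option[ColStat] =
    stats.get(column)

  // Case-insensitive lookup: lower-case both the stored keys and the requested name.
  def insensitiveLookup(stats: Map[String, ColStat], column: String): Option[ColStat] =
    stats
      .map { case (key, value) => key.toLowerCase(Locale.ROOT) -> value }
      .get(column.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    val stats = Map("CName" -> ColStat(distinctCount = 3L, nullCount = 0L))
    println(exactLookup(stats, "cname"))       // None
    println(insensitiveLookup(stats, "cname")) // Some(ColStat(3,0))
  }
}
```

With the exact lookup, a missing entry would surface as empty statistics in the desc output rather than as an error, which matches the symptom reported here.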

Why are the changes needed?

Functions such as analyze support case-insensitive retrieval of column data.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

- val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
- val cs = colStats.get(field.name)
+ val colStats = catalogTable.stats.map(_.colStats.map{ case (key, value) => key.toLowerCase -> value}).getOrElse(Map.empty)
+ val cs = colStats.get(field.name.toLowerCase())
Member

Can you link this behaviour to SQLConf.caseSensitiveAnalysis?

Contributor Author

@maropu @dongjoon-hyun I have made the changes. Could you kindly review?

sql("insert into customer values(2,'Ana','trujilo','Adva de la','Maxico D.F.',05021,'Maxico')")
sql("insert into customer values(3,'Antonio','Antonio Moreno','Mataderos 2312','Maxico D.F.',05023,'Maxico')")
sql("analyze table customer compute statistics for columns cname")
val expectedData= Seq(
Member

nit: add a space before =

Member

maropu commented Dec 17, 2019

ok to test

SparkQA commented Dec 17, 2019

Test build #115451 has finished for PR 26927 at commit cfb4939.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -720,8 +720,8 @@ case class DescribeColumnCommand(
}

val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table)
- val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
- val cs = colStats.get(field.name)
+ val colStats = catalogTable.stats.map(_.colStats.map{ case (key, value) => key.toLowerCase -> value}).getOrElse(Map.empty)
Member

Hi, @PavithraRamachandran .
Could you run dev/scalastyle and fix the errors?


test("SPARK-29505: desc columnname - case insensitive search") {
withTable("customer") {
sql(s"create table customer(id int, name String, CName String, address String, city String, pin int, country String)")
Member

s" -> ".

Member

Also, please rewrite like CREATE TABLE customer(id INT, ...).

test("SPARK-29505: desc columnname - case insensitive search") {
withTable("customer") {
sql(s"create table customer(id int, name String, CName String, address String, city String, pin int, country String)")
sql("insert into customer values(1,'Alfred','Maria','Obere Str 57','Berlin',12209,'Germany')")
Member

INSERT INTO customer VALUES

sql("insert into customer values(1,'Alfred','Maria','Obere Str 57','Berlin',12209,'Germany')")
sql("insert into customer values(2,'Ana','trujilo','Adva de la','Maxico D.F.',05021,'Maxico')")
sql("insert into customer values(3,'Antonio','Antonio Moreno','Mataderos 2312','Maxico D.F.',05023,'Maxico')")
sql("analyze table customer compute statistics for columns cname")
Member

ANALYZE TABLE customer COMPUTE STATISTICS FOR COLUMNS cname

SparkQA commented Dec 18, 2019

Test build #115484 has finished for PR 26927 at commit 453e8df.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 18, 2019

Test build #115487 has finished for PR 26927 at commit 4b007e4.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 18, 2019

Test build #115488 has finished for PR 26927 at commit 9557a10.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 18, 2019

Test build #115492 has finished for PR 26927 at commit af0336f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 18, 2019

Test build #115499 has finished for PR 26927 at commit 84b64fd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 18, 2019

Test build #115501 has finished for PR 26927 at commit f3792d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun changed the title from "[SPARK-29505][SQL] desc extended <table name> <column name> to be case insensitive" to "[SPARK-29505][SQL] Make DESC EXTENDED <table name> <column name> case insensitive" on Dec 18, 2019
catalogTable.stats.map(_.colStats.map {
case (key, value) => key.toLowerCase(Locale.ROOT) -> value
}).getOrElse(Map.empty)

Member

Shall we remove this empty line?

} else {
fieldName.toLowerCase(Locale.ROOT)
}
}
Member

Shall we use the following format? (We use the following style in SessionCatalog.scala and Analyzer.scala.)

protected def formatColumnName(name: String): String = {
  if (conf.caseSensitiveAnalysis) name else name.toLowerCase(Locale.ROOT)
}
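A short sketch of why the helper passes Locale.ROOT rather than relying on the default locale: lower-casing is locale sensitive, and under a Turkish locale an upper-case "I" becomes a dotless "ı", which would no longer match an ASCII column name. The `caseSensitive` parameter below is an assumption made to keep the example self-contained; in Spark the value would come from conf.caseSensitiveAnalysis.

```scala
import java.util.Locale

object LocaleRootSketch {
  // Same shape as the suggested helper, with the flag passed in explicitly
  // (assumption: in Spark it would be read from conf.caseSensitiveAnalysis).
  def formatColumnName(name: String, caseSensitive: Boolean): String =
    if (caseSensitive) name else name.toLowerCase(Locale.ROOT)

  def main(args: Array[String]): Unit = {
    val turkish = Locale.forLanguageTag("tr")
    // Locale-sensitive lower-casing maps 'I' to dotless 'ı' (U+0131) in Turkish.
    println("ID".toLowerCase(turkish))
    // Locale.ROOT gives the same result regardless of the JVM default locale.
    println(formatColumnName("ID", caseSensitive = false)) // id
    println(formatColumnName("ID", caseSensitive = true))  // ID
  }
}
```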

sql("INSERT INTO customer VALUES(2,'Ana','trujilo','Adva de la'," +
"'Maxico D.F.',05021,'Maxico')")
sql("INSERT INTO customer VALUES(3,'Antonio','Antonio Moreno','Mataderos 2312'," +
"'Maxico D.F.',05023,'Maxico')")
Member

Do we need to populate all data? It seems that we only use CName and the others are ignored completely.

@@ -3336,6 +3336,35 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession {
checkAnswer(df5, Array.empty[Row])
}
}

test("SPARK-29505: desc columnname - case insensitive search") {
Member

Could you try to move this test to describe-table-column.sql? You can use SQL there. Please refer to SQLQueryTestSuite for how to use it.

Member

+1 for the move.

Contributor Author

@dongjoon-hyun and @maropu I have addressed the review comments. Could you kindly verify?

SparkQA commented Dec 19, 2019

Test build #115564 has finished for PR 26927 at commit 27d51c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 19, 2019

Test build #115567 has finished for PR 26927 at commit 7e85a34.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

- val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
- val cs = colStats.get(field.name)
+ val colStats = getColStats(catalogTable)
+ val cs = colStats.get(getColumnName(field.name))

Member

How about just simply writing it like this?


    val catalogStats = catalog.getTempViewOrPermanentTableMetadata(table).stats
    val cs = if (conf.caseSensitiveAnalysis) {
      catalogStats.flatMap { cs => cs.colStats.get(field.name) }
    } else {
      catalogStats.flatMap { cs =>
        cs.colStats.map { case (k, v) => (k.toLowerCase(Locale.ROOT), v) }
          .get(field.name.toLowerCase(Locale.ROOT))
      }
    }

Contributor Author

I have reworked the change as suggested. Could you review?

SparkQA commented Dec 20, 2019

Test build #115597 has finished for PR 26927 at commit afe659c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@PavithraRamachandran
Contributor Author

@maropu @dongjoon-hyun Could you kindly review the changes? I have addressed the comments.

case (key, value) => key.toLowerCase(Locale.ROOT) -> value
}).getOrElse(Map.empty)
}
val cs = colStats.get(getColumnName(field.name))
Member

Can we just do:

val colStatsMap = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
val colStats = if (conf.caseSensitiveAnalysis) colStatsMap else CaseInsensitiveMap(colStatsMap)

? I think this will fix all problems here.
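The idea behind this suggestion can be sketched without Spark on the classpath. The `LowerCasedKeys` class below is a hypothetical, much-reduced stand-in written only for illustration; Spark's real CaseInsensitiveMap (in org.apache.spark.sql.catalyst.util) has a richer interface, and the `caseSensitive` parameter is assumed to play the role of conf.caseSensitiveAnalysis.

```scala
import java.util.Locale

object CaseInsensitiveLookupSketch {
  // Hypothetical stand-in for a case-insensitive map: keys are lower-cased
  // once at construction, and the requested key is lower-cased on each lookup.
  final class LowerCasedKeys[V](original: Map[String, V]) {
    private val lowered: Map[String, V] =
      original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

    def get(key: String): Option[V] = lowered.get(key.toLowerCase(Locale.ROOT))
  }

  // Pick the lookup strategy once, based on the flag, and return it as a function.
  def lookupFor[V](colStatsMap: Map[String, V], caseSensitive: Boolean): String => Option[V] =
    if (caseSensitive) {
      name => colStatsMap.get(name)
    } else {
      val insensitive = new LowerCasedKeys(colStatsMap)
      name => insensitive.get(name)
    }

  def main(args: Array[String]): Unit = {
    val stats = Map("CName" -> 3L)
    println(lookupFor(stats, caseSensitive = true)("cname"))  // None
    println(lookupFor(stats, caseSensitive = false)("cname")) // Some(3)
  }
}
```

Compared with lower-casing at every call site, wrapping the map keeps the case handling in one place, so callers cannot forget to normalise the requested name.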

Member

Ur, I totally forgot CaseInsensitiveMap... nice suggestion.

SparkQA commented Dec 24, 2019

Test build #115734 has finished for PR 26927 at commit ac5b097.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…ive manner. Converting the column name, which is the key, to lower case and searching in lower case regardless of the input value.
Member

HyukjinKwon left a comment

Looks good if tests pass

SparkQA commented Dec 24, 2019

Test build #115736 has finished for PR 26927 at commit 096d2e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 24, 2019

Test build #115738 has finished for PR 26927 at commit 5ff17ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu closed this in 57ca952 on Dec 24, 2019
Member

maropu commented Dec 24, 2019

Thanks! Merged to master.

fqaiser94 pushed a commit to fqaiser94/spark that referenced this pull request Mar 30, 2020
… insensitive

### What changes were proposed in this pull request?
When querying with **desc**, if the column name is not entered exactly as it was given at table creation, the colstats shown are wrong. Fetching of col stats has been made case insensitive.

### Why are the changes needed?
Functions such as **analyze** support case-insensitive retrieval of column data.

### Does this PR introduce any user-facing change?
NO

### How was this patch tested?
Unit test has been rewritten and tested.

Closes apache#26927 from PavithraRamachandran/desc_caseinsensitive.

Authored-by: Pavithra Ramachandran <pavi.rams@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>