New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29505][SQL] Make DESC EXTENDED <table name> <column name> case insensitive #26927
[SPARK-29505][SQL] Make DESC EXTENDED <table name> <column name> case insensitive #26927
Conversation
val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty) | ||
val cs = colStats.get(field.name) | ||
val colStats = catalogTable.stats.map(_.colStats.map{ case (key, value) => key.toLowerCase -> value}).getOrElse(Map.empty) | ||
val cs = colStats.get(field.name.toLowerCase()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you link this behaivour to SQLConf.caseSensitiveAnalysis
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu @dongjoon-hyun i have done the changes. could you kindly review?
sql("insert into customer values(2,'Ana','trujilo','Adva de la','Maxico D.F.',05021,'Maxico')") | ||
sql("insert into customer values(3,'Antonio','Antonio Moreno','Mataderos 2312','Maxico D.F.',05023,'Maxico')") | ||
sql("analyze table customer compute statistics for columns cname") | ||
val expectedData= Seq( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add space befor =
ok to test |
Test build #115451 has finished for PR 26927 at commit
|
@@ -720,8 +720,8 @@ case class DescribeColumnCommand( | |||
} | |||
|
|||
val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table) | |||
val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty) | |||
val cs = colStats.get(field.name) | |||
val colStats = catalogTable.stats.map(_.colStats.map{ case (key, value) => key.toLowerCase -> value}).getOrElse(Map.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @PavithraRamachandran .
Could you run dev/scalastyle
and fix the errors?
|
||
test("SPARK-29505: desc columnname - case insensitive search") { | ||
withTable("customer") { | ||
sql(s"create table customer(id int, name String, CName String, address String, city String, pin int, country String)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s"
-> "
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please rewrite like CREATE TABLE customer(id INT, ...)
.
test("SPARK-29505: desc columnname - case insensitive search") { | ||
withTable("customer") { | ||
sql(s"create table customer(id int, name String, CName String, address String, city String, pin int, country String)") | ||
sql("insert into customer values(1,'Alfred','Maria','Obere Str 57','Berlin',12209,'Germany')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INSERT INTO customer VALUES
sql("insert into customer values(1,'Alfred','Maria','Obere Str 57','Berlin',12209,'Germany')") | ||
sql("insert into customer values(2,'Ana','trujilo','Adva de la','Maxico D.F.',05021,'Maxico')") | ||
sql("insert into customer values(3,'Antonio','Antonio Moreno','Mataderos 2312','Maxico D.F.',05023,'Maxico')") | ||
sql("analyze table customer compute statistics for columns cname") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ANALYZE TABLE customer COMPUTE STATISTICS FOR COLUMNS cname
Test build #115484 has finished for PR 26927 at commit
|
453e8df
to
4b007e4
Compare
Test build #115487 has finished for PR 26927 at commit
|
Test build #115488 has finished for PR 26927 at commit
|
Test build #115492 has finished for PR 26927 at commit
|
af0336f
to
84b64fd
Compare
Test build #115499 has finished for PR 26927 at commit
|
Test build #115501 has finished for PR 26927 at commit
|
catalogTable.stats.map(_.colStats.map { | ||
case (key, value) => key.toLowerCase(Locale.ROOT) -> value | ||
}).getOrElse(Map.empty) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we remove this empty line?
} else { | ||
fieldName.toLowerCase(Locale.ROOT) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we use the following format? (We use the following style in SessionCatalog.scala
and Analyzer.scala
.)
protected def formatColumnName(name: String): String = {
if (conf.caseSensitiveAnalysis) name else name.toLowerCase(Locale.ROOT)
}
sql("INSERT INTO customer VALUES(2,'Ana','trujilo','Adva de la'," + | ||
"'Maxico D.F.',05021,'Maxico')") | ||
sql("INSERT INTO customer VALUES(3,'Antonio','Antonio Moreno','Mataderos 2312'," + | ||
"'Maxico D.F.',05023,'Maxico')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to populate all data? It seems that we only use CName
and the other are ignored completely.
@@ -3336,6 +3336,35 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession { | |||
checkAnswer(df5, Array.empty[Row]) | |||
} | |||
} | |||
|
|||
test("SPARK-29505: desc columnname - case insensitive search") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you try to move this test to describe-table-column.sql
? You can use SQL there. Please refer SQLQueryTestSuite
for how to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun and @maropu i have addressed the review comments could you kindly verify?
6f7c873
to
27d51c3
Compare
Test build #115564 has finished for PR 26927 at commit
|
Test build #115567 has finished for PR 26927 at commit
|
val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty) | ||
val cs = colStats.get(field.name) | ||
val colStats = getColStats(catalogTable) | ||
val cs = colStats.get(getColumnName(field.name)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just simply writing it like this?
val catalogStats = catalog.getTempViewOrPermanentTableMetadata(table).stats
val cs = if (conf.caseSensitiveAnalysis) {
catalogStats.flatMap { cs => cs.colStats.get(field.name) }
} else {
catalogStats.flatMap { cs =>
cs.colStats.map { case (k, v) => (k.toLowerCase(Locale.ROOT), v) }
.get(field.name.toLowerCase(Locale.ROOT))
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reworked on the change suggested. Could you review?
Test build #115597 has finished for PR 26927 at commit
|
@maropu @dongjoon-hyun could you kindly review the changes? I have reworked on the comments |
case (key, value) => key.toLowerCase(Locale.ROOT) -> value | ||
}).getOrElse(Map.empty) | ||
} | ||
val cs = colStats.get(getColumnName(field.name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just do:
val colStatsMap = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
val colStats = if (conf.caseSensitiveAnalysis) colStatsMap else CaseInsensitiveMap(colStatsMap)
? I think this will fix all problems here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, I totally forgot CaseInsensitiveMap
... nice suggestion.
Test build #115734 has finished for PR 26927 at commit
|
ac5b097
to
096d2e3
Compare
…ive manner.Converting the column name which is the key to lower case and searching using lower case whatever may be the input value.
096d2e3
to
5ff17ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good if tests pass
Test build #115736 has finished for PR 26927 at commit
|
Test build #115738 has finished for PR 26927 at commit
|
Thanks! Merged to master. |
… insensitive ### What changes were proposed in this pull request? While querying using **desc** , if column name is not entered exactly as per the column name given during the table creation, the colstats are wrong. fetching of col stats has been made case insensitive. ### Why are the changes needed? functions like **analyze**, etc support case insensitive retrieval of column data. ### Does this PR introduce any user-facing change? NO ### How was this patch tested? <!-- Unit test has been rewritten and tested. Closes apache#26927 from PavithraRamachandran/desc_caseinsensitive. Authored-by: Pavithra Ramachandran <pavi.rams@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
What changes were proposed in this pull request?
While querying using desc , if column name is not entered exactly as per the column name given during the table creation, the colstats are wrong. fetching of col stats has been made case insensitive.
Why are the changes needed?
functions like analyze, etc support case insensitive retrieval of column data.
Does this PR introduce any user-facing change?
NO
How was this patch tested?