
[SPARK-12982][SQL] Add table name validation in temp table registration #10983

Conversation

jayadevanmurali
Contributor

Add the table name validation at the temp table creation
@AmplabJenkins

Can one of the admins verify this patch?

```diff
@@ -747,7 +747,7 @@ class SQLContext private[sql](
    * only during the lifetime of this instance of SQLContext.
    */
   private[sql] def registerDataFrameAsTable(df: DataFrame, tableName: String): Unit = {
-    catalog.registerTable(TableIdentifier(tableName), df.logicalPlan)
+    catalog.registerTable(SqlParser.parseTableIdentifier(tableName), df.logicalPlan)
```
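What the one-line change buys, sketched against the pre-2.0 API visible in the diff (`SqlParser` was later removed from master, as the review comment below notes):

```scala
import org.apache.spark.sql.catalyst.{SqlParser, TableIdentifier}

// Before: the raw string is wrapped unvalidated, so an illegal name
// like "t~" registers successfully and only fails much later.
val unchecked = TableIdentifier("t~")              // no parsing, no error

// After: the name goes through the SQL parser first, so "t~" is
// rejected at registration time with a parse error.
val checked = SqlParser.parseTableIdentifier("t~") // throws here
```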
Contributor
We have removed SqlParser recently. Could you merge/rebase to the most recent master, and use the sqlParser variable for this?

Contributor Author

Ok, I can see the variable definition at line 211 of SQLContext.scala:

```scala
@transient
protected[sql] val sqlParser = new SparkSQLParser(getSQLDialect().parse(_))
```

But this variable is not used anywhere. All the methods use SqlParser.parseTableIdentifier() instead, for example:

```scala
@Experimental
def createExternalTable(
    tableName: String,
    source: String,
    options: Map[String, String]): DataFrame = {
  val tableIdent = SqlParser.parseTableIdentifier(tableName)
  val cmd =
    CreateTableUsing(
      tableIdent,
      userSpecifiedSchema = None,
      source,
      temporary = false,
      options,
      allowExisting = false,
      managedIfNoPath = false)
  executePlan(cmd).toRdd
  table(tableIdent)
}
```

Moreover, SparkSQLParser returns a LogicalPlan, whereas I need to get a TableIdentifier, as in the method above. So how can I use it?
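For context, the type mismatch being described, sketched against the 1.x APIs named above (the `parse` signature is paraphrased and may differ on master):

```scala
import org.apache.spark.sql.catalyst.{SqlParser, TableIdentifier}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// sqlParser is a statement-level parser: feed it a full SQL string,
// get back a LogicalPlan for the whole query.
val plan: LogicalPlan = sqlParser.parse("SELECT * FROM t")

// What registerDataFrameAsTable needs is a TableIdentifier for a bare
// table name, which is exactly what SqlParser.parseTableIdentifier gives.
val ident: TableIdentifier = SqlParser.parseTableIdentifier("t")
```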

@hvanhovell
Contributor

@jayadevanmurali I am not sure if the problem described in the JIRA is still an issue in the current master. Could you check this?

@jayadevanmurali
Contributor Author

@hvanhovell
I was able to replicate this in Spark 2.0.0.

Steps:

```
jayadevan@Satellite-L640:~/spark$ ./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
16/01/30 14:19:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/30 14:19:22 WARN Utils: Your hostname, Satellite-L640 resolves to a loopback address: 127.0.1.1; using 100.86.225.72 instead (on interface ppp0)
16/01/30 14:19:22 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context available as sc (master = local[*], app id = local-1454143767817).
SQL context available as sqlContext.

scala> import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

scala> import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

scala> import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.{SparkContext, SparkConf}

scala> val rows = List(Row("foo"), Row("bar"));
rows: List[org.apache.spark.sql.Row] = List([foo], [bar])

scala> val schema = StructType(Seq(StructField("col", StringType)));
schema: org.apache.spark.sql.types.StructType = StructType(StructField(col,StringType,true))

scala> val rdd = sc.parallelize(rows);
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:32

scala> val df = sqlContext.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [col: string]

scala> df.registerTempTable("t~")

scala> df.sqlContext.dropTempTable("t~")
java.lang.RuntimeException: [1.2] failure: ``.'' expected but `~' found

t~
^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:836)
at org.apache.spark.sql.SQLContext.dropTempTable(SQLContext.scala:763)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
at $iwC$$iwC$$iwC.<init>(<console>:52)
at $iwC$$iwC.<init>(<console>:54)
at $iwC.<init>(<console>:56)
at <init>(<console>:58)
at .<init>(<console>:62)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

scala>
```
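Condensed, the session above shows the asymmetry: registerTempTable stores the raw name without parsing it, while dropTempTable resolves the name through table(), which calls parseTableIdentifier (visible in the stack trace). The same steps as a self-contained snippet for the shell:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val rows = List(Row("foo"), Row("bar"))
val schema = StructType(Seq(StructField("col", StringType)))
val df = sqlContext.createDataFrame(sc.parallelize(rows), schema)

df.registerTempTable("t~")        // accepted: the raw name is never parsed
df.sqlContext.dropTempTable("t~") // throws: dropTempTable -> table() -> parseTableIdentifier
```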

@hvanhovell
Contributor

You are using an older version of the master branch (last commit 25 days ago). Your version still has the org.apache.spark.sql.catalyst.SqlParser class. That has been removed since commit 7cd7f22.

Please update your master, and try again.

@jayadevanmurali
Contributor Author

Thanks @hvanhovell, got your point. I updated my code and repeated the steps, and was still able to replicate this. Please check the steps:

```
jayadevan@Satellite-L640:~/spark$ ./bin/spark-shell
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/01/31 09:27:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/31 09:27:57 WARN Utils: Your hostname, Satellite-L640 resolves to a loopback address: 127.0.1.1, but we couldn't find any external IP address!
16/01/31 09:27:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context available as sc (master = local[*], app id = local-1454212680541).
SQL context available as sqlContext.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

scala> import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

scala> import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.{SparkContext, SparkConf}

scala> val rows = List(Row("foo"), Row("bar"));
rows: List[org.apache.spark.sql.Row] = List([foo], [bar])

scala> val schema = StructType(Seq(StructField("col", StringType)));
schema: org.apache.spark.sql.types.StructType = StructType(StructField(col,StringType,true))

scala> val rdd = sc.parallelize(rows);
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:29

scala> val df = sqlContext.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [col: string]

scala> df.registerTempTable("t~")

scala> df.sqlContext.dropTempTable("t~")
org.apache.spark.sql.AnalysisException: NoViableAltException(327@[209:20: ( DOT id2= identifier )?])
; line 1 pos 1
at org.apache.spark.sql.catalyst.parser.ParseErrorReporter.throwError(ParseDriver.scala:158)
at org.apache.spark.sql.catalyst.parser.ParseErrorReporter.throwError(ParseDriver.scala:147)
at org.apache.spark.sql.catalyst.parser.ParseDriver$.parse(ParseDriver.scala:95)
at org.apache.spark.sql.catalyst.parser.ParseDriver$.parseTableName(ParseDriver.scala:42)
at org.apache.spark.sql.catalyst.CatalystQl.parseTableIdentifier(CatalystQl.scala:81)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:811)
at org.apache.spark.sql.SQLContext.dropTempTable(SQLContext.scala:738)
... 49 elided
```

So I will close this pull request and raise a new one.

@hvanhovell
Contributor

@jayadevanmurali please don't close the PR, your fix is valid. We really shouldn't use table names like t~ unless they are quoted using backticks. Please update your PR and add a test to SQLQuerySuite.

@jayadevanmurali
Contributor Author

Thanks @hvanhovell, working on the test case.

@hvanhovell
Contributor

@jayadevanmurali could you add the test to DataFrameSuite instead of SQLQuerySuite? Sorry about the confusion.
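For reference, a sketch of what such a DataFrameSuite test might look like (the test name, the reliance on the suite's implicit toDF, and the exception type are assumptions inferred from this thread, not the merged test):

```scala
test("SPARK-12982: Add table name validation in temp table registration") {
  val df = Seq("foo", "bar").toDF("col") // assumes the suite's testImplicits are in scope
  // An unparseable, unquoted name should now fail fast at registration...
  intercept[org.apache.spark.sql.AnalysisException] {
    df.registerTempTable("t~")
  }
  // ...while the same name is legal once backtick-quoted.
  df.registerTempTable("`t~`")
}
```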

@jayadevanmurali
Contributor Author

@hvanhovell yes sure, no problem

@jayadevanmurali
Contributor Author

@hvanhovell the current pull request is based on outdated code, so I would like to create a new pull request against the latest code. Hope that is fine.

@hvanhovell
Contributor

Sure it is. Why not just merge the current master?

Revert my changes.
@jayadevanmurali
Contributor Author

@hvanhovell, yeah, I tried to merge my branch (https://github.com/jayadevanmurali/spark/tree/branch-0.1-SPARK-SPARK-12982) with the Apache Spark master, but I can't see any option to merge via "create pull request". That's why I suggested opening a new PR.

@jayadevanmurali
Contributor Author

@hvanhovell I have created a new PR for this issue with the latest code: #11051. Could you please review it?

@hvanhovell
Contributor

@jayadevanmurali could you close this one?

@jayadevanmurali
Contributor Author

Created a new PR with the latest code for this issue.
