
[SPARK-12982][SQL] Add table name validation in temp table registration #10983

Conversation

jayadevanmurali
Contributor

Add the table name validation at the temp table creation
@AmplabJenkins

Can one of the admins verify this patch?

```diff
@@ -747,7 +747,7 @@ class SQLContext private[sql](
    * only during the lifetime of this instance of SQLContext.
    */
   private[sql] def registerDataFrameAsTable(df: DataFrame, tableName: String): Unit = {
-    catalog.registerTable(TableIdentifier(tableName), df.logicalPlan)
+    catalog.registerTable(SqlParser.parseTableIdentifier(tableName), df.logicalPlan)
```
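What the one-line change buys, sketched against the pre-2.0 API visible in the diff (`SqlParser` was later removed from master, as the review comment below notes):

```scala
import org.apache.spark.sql.catalyst.{SqlParser, TableIdentifier}

// Before: the raw string is wrapped unvalidated, so an illegal name
// like "t~" registers successfully and only fails much later.
val unchecked = TableIdentifier("t~")              // no parsing, no error

// After: the name goes through the SQL parser first, so "t~" is
// rejected at registration time with a parse error.
val checked = SqlParser.parseTableIdentifier("t~") // throws here
```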
Contributor
We have removed SqlParser recently. Could you merge/rebase to the most recent master, and use the sqlParser variable for this?

Contributor Author

Ok, I can see the variable definition at line 211 of SQLContext.scala:

```scala
@transient
protected[sql] val sqlParser = new SparkSQLParser(getSQLDialect().parse(_))
```

But this variable is not used anywhere. All the methods use SqlParser.parseTableIdentifier() instead, for example:

```scala
@Experimental
def createExternalTable(
    tableName: String,
    source: String,
    options: Map[String, String]): DataFrame = {
  val tableIdent = SqlParser.parseTableIdentifier(tableName)
  val cmd =
    CreateTableUsing(
      tableIdent,
      userSpecifiedSchema = None,
      source,
      temporary = false,
      options,
      allowExisting = false,
      managedIfNoPath = false)
  executePlan(cmd).toRdd
  table(tableIdent)
}
```

Moreover, SparkSQLParser returns a LogicalPlan, whereas I need to get a TableIdentifier, as in the method above. So how can I use it?
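For context, the type mismatch being described, sketched against the 1.x APIs named above (the `parse` signature is paraphrased and may differ on master):

```scala
import org.apache.spark.sql.catalyst.{SqlParser, TableIdentifier}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// sqlParser is a statement-level parser: feed it a full SQL string,
// get back a LogicalPlan for the whole query.
val plan: LogicalPlan = sqlParser.parse("SELECT * FROM t")

// What registerDataFrameAsTable needs is a TableIdentifier for a bare
// table name, which is exactly what SqlParser.parseTableIdentifier gives.
val ident: TableIdentifier = SqlParser.parseTableIdentifier("t")
```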

@hvanhovell
Contributor

@jayadevanmurali I am not sure if the problem described in the JIRA is still an issue in the current master. Could you check this?

@jayadevanmurali
Contributor Author

@hvanhovell
I was able to replicate this in Spark 2.0.0.

Steps:

```
jayadevan@Satellite-L640:~/spark$ ./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
16/01/30 14:19:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/30 14:19:22 WARN Utils: Your hostname, Satellite-L640 resolves to a loopback address: 127.0.1.1; using 100.86.225.72 instead (on interface ppp0)
16/01/30 14:19:22 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context available as sc (master = local[*], app id = local-1454143767817).
SQL context available as sqlContext.

scala> import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

scala> import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

scala> import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.{SparkContext, SparkConf}

scala> val rows = List(Row("foo"), Row("bar"));
rows: List[org.apache.spark.sql.Row] = List([foo], [bar])

scala> val schema = StructType(Seq(StructField("col", StringType)));
schema: org.apache.spark.sql.types.StructType = StructType(StructField(col,StringType,true))

scala> val rdd = sc.parallelize(rows);
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:32

scala> val df = sqlContext.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [col: string]

scala> df.registerTempTable("t~")

scala> df.sqlContext.dropTempTable("t~")
java.lang.RuntimeException: [1.2] failure: ``.'' expected but `~' found

t~
^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:836)
at org.apache.spark.sql.SQLContext.dropTempTable(SQLContext.scala:763)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
at $iwC$$iwC$$iwC.<init>(<console>:52)
at $iwC$$iwC.<init>(<console>:54)
at $iwC.<init>(<console>:56)
at <init>(<console>:58)
at .<init>(<console>:62)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

scala>
```
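Condensed, the session above shows the asymmetry: registerTempTable stores the raw name without parsing it, while dropTempTable resolves the name through table(), which calls parseTableIdentifier (visible in the stack trace). The same steps as a self-contained snippet for the shell:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val rows = List(Row("foo"), Row("bar"))
val schema = StructType(Seq(StructField("col", StringType)))
val df = sqlContext.createDataFrame(sc.parallelize(rows), schema)

df.registerTempTable("t~")        // accepted: the raw name is never parsed
df.sqlContext.dropTempTable("t~") // throws: dropTempTable -> table() -> parseTableIdentifier
```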

@hvanhovell
Contributor

You are using an older version of the master branch (last commit 25 days ago). Your version still has the org.apache.spark.sql.catalyst.SqlParser class. That has been removed since commit 7cd7f22.

Please update your master, and try again.

@jayadevanmurali
Contributor Author

Thanks @hvanhovell, got your point. I updated my code and repeated the steps, and was still able to replicate this. Please check the steps:

```
jayadevan@Satellite-L640:~/spark$ ./bin/spark-shell
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/01/31 09:27:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/31 09:27:57 WARN Utils: Your hostname, Satellite-L640 resolves to a loopback address: 127.0.1.1, but we couldn't find any external IP address!
16/01/31 09:27:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context available as sc (master = local[*], app id = local-1454212680541).
SQL context available as sqlContext.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

scala> import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

scala> import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.{SparkContext, SparkConf}

scala> val rows = List(Row("foo"), Row("bar"));
rows: List[org.apache.spark.sql.Row] = List([foo], [bar])

scala> val schema = StructType(Seq(StructField("col", StringType)));
schema: org.apache.spark.sql.types.StructType = StructType(StructField(col,StringType,true))

scala> val rdd = sc.parallelize(rows);
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:29

scala> val df = sqlContext.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [col: string]

scala> df.registerTempTable("t~")

scala> df.sqlContext.dropTempTable("t~")
org.apache.spark.sql.AnalysisException: NoViableAltException(327@[209:20: ( DOT id2= identifier )?])
; line 1 pos 1
at org.apache.spark.sql.catalyst.parser.ParseErrorReporter.throwError(ParseDriver.scala:158)
at org.apache.spark.sql.catalyst.parser.ParseErrorReporter.throwError(ParseDriver.scala:147)
at org.apache.spark.sql.catalyst.parser.ParseDriver$.parse(ParseDriver.scala:95)
at org.apache.spark.sql.catalyst.parser.ParseDriver$.parseTableName(ParseDriver.scala:42)
at org.apache.spark.sql.catalyst.CatalystQl.parseTableIdentifier(CatalystQl.scala:81)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:811)
at org.apache.spark.sql.SQLContext.dropTempTable(SQLContext.scala:738)
... 49 elided
```

So I will close this pull request and raise a new one.

@hvanhovell
Contributor

@jayadevanmurali please don't close the PR, your fix is valid. We really shouldn't use table names like t~ unless they are quoted using backticks. Please update your PR and add a test to SQLQuerySuite.

@jayadevanmurali
Contributor Author

Thanks @hvanhovell, working on the test case.

@hvanhovell
Contributor

@jayadevanmurali could you add the test to DataFrameSuite instead of SQLQuerySuite? Sorry about the confusion.
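For reference, a sketch of what such a DataFrameSuite test might look like (the test name, the reliance on the suite's implicit toDF, and the exception type are assumptions inferred from this thread, not the merged test):

```scala
test("SPARK-12982: Add table name validation in temp table registration") {
  val df = Seq("foo", "bar").toDF("col") // assumes the suite's testImplicits are in scope
  // An unparseable, unquoted name should now fail fast at registration...
  intercept[org.apache.spark.sql.AnalysisException] {
    df.registerTempTable("t~")
  }
  // ...while the same name is legal once backtick-quoted.
  df.registerTempTable("`t~`")
}
```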

@jayadevanmurali
Contributor Author

@hvanhovell yes sure, no problem

@jayadevanmurali
Contributor Author

@hvanhovell the current pull request is based on outdated code, so I would like to create a new pull request against the latest code. Hope that is fine.

@hvanhovell
Contributor

Sure it is. Why not just merge the current master?

Revert my changes.
@jayadevanmurali
Contributor Author

@hvanhovell, yeah, I tried to merge my branch (https://github.com/jayadevanmurali/spark/tree/branch-0.1-SPARK-SPARK-12982) with the Apache Spark master, but I can't see any option to merge via "create pull request". That's why I suggested opening a new PR.

@jayadevanmurali
Contributor Author

@hvanhovell I have created a new PR for this issue with the latest code: #11051. Could you please review it?

@hvanhovell
Contributor

@jayadevanmurali could you close this one?

@jayadevanmurali
Contributor Author

Created a new PR with the latest code for this issue.
