[SPARK-25061][SQL] Precedence for ThriftServer hiveconf commandline parameter #27041

Closed · wants to merge 5 commits

sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala

@@ -17,7 +17,7 @@

package org.apache.spark.sql.hive.thriftserver

-import java.io.{File, FilenameFilter}
+import java.io.{BufferedWriter, File, FilenameFilter, FileWriter}
import java.net.URL
import java.nio.charset.StandardCharsets
import java.sql.{Date, DriverManager, SQLException, Statement}
@@ -32,16 +32,20 @@ import scala.io.Source
import scala.util.{Random, Try}

import com.google.common.io.Files
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hive.conf.HiveConf.ConfVars
import org.apache.hive.jdbc.HiveDriver
import org.apache.hive.service.auth.PlainSaslHelper
import org.apache.hive.service.cli.{FetchOrientation, FetchType, GetInfoType, RowSet}
import org.apache.hive.service.cli.thrift.ThriftCLIServiceClient
import org.apache.hive.service.server.HiveServer2
import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.transport.TSocket
import org.scalatest.Assertions._
import org.scalatest.BeforeAndAfterAll

-import org.apache.spark.{SparkException, SparkFunSuite}
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.internal.Logging
import org.apache.spark.sql.hive.HiveUtils
import org.apache.spark.sql.hive.test.HiveTestJars
@@ -1213,3 +1217,42 @@ abstract class HiveThriftServer2Test extends SparkFunSuite with BeforeAndAfterAll
}
}
}

object SPARK_25061 {
  def main(args: Array[String]): Unit = {

    val propKey = "hive.metastore.server.max.threads"
    val propXmlValue = "1"
    val propCliValue = "2"

    // Prepare hive-site.xml
    val hiveSiteXmlContent =
      s"""
        |<configuration>
        |  <property>
        |    <name>$propKey</name>
        |    <value>$propXmlValue</value>
        |  </property>
        |</configuration>
      """.stripMargin
    val hiveSiteXml = new File(Utils.createTempDir().getCanonicalPath, "hive-site.xml")
    val bw = new BufferedWriter(new FileWriter(hiveSiteXml))
    bw.write(hiveSiteXmlContent)
    bw.close()

    // Pass --hiveconf to override the hive-site.xml property.
    // See org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main.
    val hiveConfArgs = Array("--hiveconf", s"""$propKey=$propCliValue""")
    val optionsProcessor = new HiveServer2.ServerOptionsProcessor("HiveThriftServer2")
    optionsProcessor.parse(hiveConfArgs)

    // Prepare a Hive client and assert on the property value.
    val sparkConf = new SparkConf()
    val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
    hadoopConf.addResource(new Path("file", null, hiveSiteXml.getCanonicalPath))
    val hiveConfValue = HiveUtils
      .newClientForMetadata(sparkConf, hadoopConf)
      .getConf(propKey, "FAIL")
    assert(hiveConfValue === propCliValue)
  }
}
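
Presumably this object is launched in a separate JVM by a companion test elsewhere in the suite, following the SPARK_XXXXX reproduction pattern used in Spark's Hive test suites; the assertion passes only when the --hiveconf value takes precedence over the hive-site.xml value.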

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala

@@ -176,9 +176,21 @@ private[hive] class HiveClientImpl(
// is not set to builtin. When spark.sql.hive.metastore.jars is builtin, the classpath
// has hive-site.xml. So, HiveConf will use that to override its default values.
// 2: we set all spark confs to this hiveConf.
-// 3: we set all entries in config to this hiveConf.
+// 3: we take the confs passed as --hiveconf, which are set as system properties
+//    by org.apache.hive.service.server.HiveServer2.ServerOptionsProcessor.parse in
+//    org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main.
+// 4: we set all entries in extraConfig to this hiveConf; these have the highest precedence.
+// To summarize, the order of precedence is:
+//   hadoopConf < sparkConf < overriddenHiveProps < extraConfig

// In order not to lose the command-line overridden properties, make a copy of them
// so that they can be re-applied at the end.
// HiveConf.getConfSystemProperties returns all the Hive-related system properties
// that were taken into account by the HiveConf constructor.
// See: org.apache.hadoop.hive.conf.HiveConf#applySystemProperties
val overriddenHiveProps = HiveConf.getConfSystemProperties.asScala
Contributor: So did we totally ignore the --hiveconf previously?

Contributor Author: The --hiveconf values were handled as part of the HiveConf constructor; see https://github.com/apache/hive/blob/rel/release-2.3.5/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L4079. That is, it first loads hive-site.xml and then applies the --hiveconf properties on top.

But in Spark we then add hadoopConf on top of that again, overwriting HiveConf's precedence order.
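
To make the described ordering concrete, here is a minimal, self-contained sketch (not part of the PR) that reuses the property and values from the SPARK_25061 reproduction above; plain Scala maps stand in for the actual HiveConf and hadoopConf objects:

object PrecedenceSketch {
  def main(args: Array[String]): Unit = {
    val fromHiveSite = Map("hive.metastore.server.max.threads" -> "1") // hive-site.xml
    val fromCli = Map("hive.metastore.server.max.threads" -> "2")      // --hiveconf

    // HiveConf constructor order: load hive-site.xml first, then apply
    // the --hiveconf system properties on top.
    val hiveConfView = fromHiveSite ++ fromCli
    assert(hiveConfView("hive.metastore.server.max.threads") == "2") // CLI wins, as expected

    // Pre-fix Spark merge: hadoopConf (which already contains the hive-site.xml
    // values) was re-applied without the --hiveconf layer, so the XML value won.
    val preFixView = hiveConfView ++ fromHiveSite
    assert(preFixView("hive.metastore.server.max.threads") == "1") // the reported bug
  }
}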

Contributor: Can you add more code comments to convince people that HiveConf.getConfSystemProperties contains only the --hiveconf values?

Contributor Author: @cloud-fan Updated. Is the comment adequate now?

val confMap = (hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue) ++
-sparkConf.getAll.toMap ++ extraConfig).toMap
+sparkConf.getAll.toMap ++ overriddenHiveProps ++ extraConfig).toMap
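
Since later operands of ++ win on duplicate keys in Scala maps, the rightmost map in this expression has the highest precedence. A tiny illustration with invented values:

val merged = Map("k" -> "hadoopConf") ++ Map("k" -> "sparkConf") ++
  Map("k" -> "overriddenHiveProps") ++ Map("k" -> "extraConfig")
assert(merged("k") == "extraConfig") // rightmost wins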
Contributor: It seems we should update https://github.com/apache/spark/pull/27041/files#diff-6fd847124f8eae45ba2de1cf7d6296feR170-R179 and also explain why extraConfig comes last.

Contributor Author: @yhuai Sure, I have updated the PR with reasonable pointers for the order. Does it suffice now?

Contributor: Thank you. Since getConfSystemProperties returns all of the Hive confs present in the system properties, it is possible that we will pull in a config that was not set by --hiveconf. It seems we are introducing a behavior change? Can you explain the impact of this change and why it is fine?

Contributor Author: Sure. Even without the changes in this PR, HiveConf already considers the Hive confs in system properties that were not set via --hiveconf, as part of the HiveConf constructor; see https://github.com/apache/hive/blob/rel/release-2.3.5/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L4079 (the behaviour is the same for Hive 1.2.1 as well). Hence this does not change any flow.
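
A rough sketch of that constructor behavior, under the stated assumption that it mirrors HiveConf#applySystemProperties (simplified: the real implementation validates keys against ConfVars rather than using a plain prefix check):

// Collect JVM system properties that look like Hive confs, regardless of
// whether they were set by --hiveconf or by other code in the same JVM.
val hiveSystemProps: Map[String, String] =
  sys.props.toMap.filter { case (key, _) => key.startsWith("hive.") }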

confMap.foreach { case (k, v) => hiveConf.set(k, v) }
SQLConf.get.redactOptions(confMap).foreach { case (k, v) =>
logDebug(