[SPARK-7961][SQL]Refactor SQLConf to display better error message #6747
|
A simple example of the `conf` command:
scala> sqlContext.sql("conf").collect().foreach(v => println(v.toSeq.mkString("\t")))
spark.sql.inMemoryColumnarStorage.batchSize 10000 Controls the size of batches for columnar caching. Larger batch sizes can improve memory utilization and compression, but risk OOMs when caching data.
spark.sql.hive.verifyPartitionPath true <TODO>
spark.sql.retainGroupColumns true <TODO>
spark.sql.codegen false When true, code will be dynamically generated at runtime for expression evaluation in a specific query. For some queries with complicated expression this option can lead to significant speed-ups. However, for simple queries this can actually slow down query execution.
spark.sql.planner.sortMergeJoin false <TODO>
spark.sql.shuffle.partitions 200 Configures the number of partitions to use when shuffling data for joins or aggregations.
spark.sql.planner.externalSort false When true, performs sorts spilling to disk as needed otherwise sort each partition in memory.
spark.sql.broadcastTimeout 300 <TODO>
spark.sql.parquet.filterPushdown false Turn on Parquet filter pushdown optimization. This feature is turned off by default because of a known bug in Paruet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>). However, if your table doesn't contain any nullable string or binary columns, it's still safe to turn this feature on.
spark.sql.parquet.cacheMetadata true Turns on caching of Parquet schema metadata. Can speed up querying of static data.
spark.sql.unsafe.enabled false <TDDO>
spark.sql.useSerializer2 true <TODO>
... |
1. Add `SQLConfEntry` to store the information about a configuration. For those configurations that cannot be found in `sql-programming-guide.md`, I left the doc as `<TODO>`.
2. Verify the value when setting a configuration if this is in SQLConf.
3. Add a command `conf` to display all public configurations.
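For readers skimming the thread, the three points above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ConfEntry`, `Conf`, `publicEntries`, and the key `spark.sql.example.internal` are hypothetical names made up for the example, and the error message text is an assumption.

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

// Hypothetical, simplified stand-in for the SQLConfEntry described above.
final case class ConfEntry[T](
    key: String,
    default: T,
    parse: String => T,        // converts the raw string, throwing on bad input
    doc: String = "<TODO>",
    isPublic: Boolean = true)  // only public entries are listed to users

final class Conf(entries: Seq[ConfEntry[_]]) {
  private val settings = mutable.Map.empty[String, String]

  // Validate the value only when the key belongs to a registered entry;
  // unknown keys are stored as plain strings without any check.
  def setConfString(key: String, value: String): Unit = {
    entries.find(_.key == key).foreach { entry =>
      try {
        entry.parse(value)
      } catch {
        case NonFatal(_) =>
          throw new IllegalArgumentException(s"$key has an invalid value: $value")
      }
    }
    settings(key) = value
  }

  // Fall back to the entry's default when the key has not been set.
  def getConf[T](entry: ConfEntry[T]): T =
    settings.get(entry.key).map(entry.parse).getOrElse(entry.default)

  // (key, default, doc) rows for public entries only.
  def publicEntries: Seq[(String, String, String)] =
    entries.filter(_.isPublic).map(e => (e.key, e.default.toString, e.doc))
}

object ConfDemo {
  val ShufflePartitions: ConfEntry[Int] = ConfEntry[Int](
    "spark.sql.shuffle.partitions", 200, _.toInt,
    "Configures the number of partitions to use when shuffling data for joins or aggregations.")

  // Hypothetical internal entry, used only to show the isPublic filter.
  val Hidden: ConfEntry[Boolean] = ConfEntry[Boolean](
    "spark.sql.example.internal", false, _.toBoolean, isPublic = false)
}
```

With this shape, setting `spark.sql.shuffle.partitions` to a non-numeric string fails at set time instead of surfacing later when the value is read.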
|
Test build #34606 has finished for PR 6747 at commit
|
|
Test build #34607 has finished for PR 6747 at commit
|
maybe remove the float version, and just use double version?
|
Test build #34678 has finished for PR 6747 at commit
|
|
Test build #34680 has finished for PR 6747 at commit
|
|
Test build #34685 has finished for PR 6747 at commit
|
|
Test build #34690 has finished for PR 6747 at commit
|
|
Addressed all comments |
Can we add scaladoc for the semantics of these options? In particular, isPublic isn't obvious to me.
Added scaladoc for SQLConfEntry
|
Thanks for working on this much needed cleanup! A few comments (some of which were preexisting):
|
Could we also have an entry type for configs like these that validates the option is in some list?
Added enumConf and used it for PARQUET_COMPRESSION
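As a rough illustration of the "value must be in some list" check discussed here: the sketch below is not the `enumConf` added in the PR, whose signature may differ. `EnumEntry`, `validate`, the error text, and the listed codec names and default are assumptions chosen for the example.

```scala
// Hypothetical helper illustrating a list-membership check for a config value.
final case class EnumEntry(key: String, validValues: Set[String], default: String) {
  // The default itself must be one of the accepted values.
  require(validValues.contains(default), s"default $default is not valid for $key")

  // Returns the value unchanged when accepted, otherwise fails with a
  // message that lists every accepted value.
  def validate(value: String): String = {
    if (!validValues.contains(value)) {
      val allowed = validValues.toSeq.sorted.mkString(", ")
      throw new IllegalArgumentException(
        s"The value of $key should be one of $allowed, but was $value")
    }
    value
  }
}

object EnumDemo {
  // Codec names and default are illustrative assumptions, not taken from the PR.
  val ParquetCompression: EnumEntry = EnumEntry(
    key = "spark.sql.parquet.compression.codec",
    validValues = Set("uncompressed", "snappy", "gzip", "lzo"),
    default = "gzip")
}
```

Listing the accepted values in the error message fits the PR's stated goal of a better error message, since the user does not have to look them up.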
|
Test build #34763 has finished for PR 6747 at commit
|
Added schemas for all set commands and split results into columns. For the configuration that has a default value, if it's not set ,
Sounds good. I will try it. |
One disadvantage is the size of a |
Adding a def unsetConf[T](entry: SQLConfEntry[T]) here can be convenient.
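A minimal sketch of what such an overload could look like, assuming a string-keyed `unsetConf` already exists next to it. `Entry` and `Settings` are hypothetical stand-ins; only the signature `unsetConf[T](entry: ...)` is taken from the comment.

```scala
import scala.collection.mutable

// Hypothetical minimal stand-in for SQLConfEntry: a typed key with a default.
final case class Entry[T](key: String, default: T)

final class Settings {
  private val values = mutable.Map.empty[String, String]

  def setConfString(key: String, value: String): Unit = {
    values(key) = value
  }

  def contains(key: String): Boolean = values.contains(key)

  // Assumed existing string-keyed variant.
  def unsetConf(key: String): Unit = {
    values.remove(key)
    ()
  }

  // Convenience overload: callers pass the entry itself instead of entry.key.
  def unsetConf[T](entry: Entry[T]): Unit = unsetConf(entry.key)
}
```

The overload only forwards to the string version, so call sites such as tests can reset a setting without referring to the raw key string.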
Nit: Seems that we can remove the newline here.
|
One of the things that I envy Hive is
Some follow-ups I can think of (but not necessarily to be done in this PR):
|
|
Test build #34767 has finished for PR 6747 at commit
|
|
Test build #34768 has finished for PR 6747 at commit
|
|
Another thought is that we probably also want to migrate data source options to this new API. @marmbrus What do you think? |
|
Test build #34775 has finished for PR 6747 at commit
|
|
Test build #34982 has finished for PR 6747 at commit
|
|
Test build #35047 has finished for PR 6747 at commit
|
|
Alright merging this. Thanks. |
1. Add `SQLConfEntry` to store the information about a configuration. For those configurations that cannot be found in `sql-programming-guide.md`, I left the doc as `<TODO>`.
2. Verify the value when setting a configuration if this is in SQLConf.
3. Use `SET -v` to display all public configurations.

Author: zsxwing <zsxwing@gmail.com>

Closes apache#6747 from zsxwing/sqlconf and squashes the following commits:

7d09bad [zsxwing] Use SQLConfEntry in HiveContext
49f6213 [zsxwing] Add getConf, setConf to SQLContext and HiveContext
e014f53 [zsxwing] Merge branch 'master' into sqlconf
93dad8e [zsxwing] Fix the unit tests
cf950c1 [zsxwing] Fix the code style and tests
3c5f03e [zsxwing] Add unsetConf(SQLConfEntry) and fix the code style
a2f4add [zsxwing] getConf will return the default value if a config is not set
037b1db [zsxwing] Add schema to SetCommand
0520c3c [zsxwing] Merge branch 'master' into sqlconf
7afb0ec [zsxwing] Fix the configurations about HiveThriftServer
7e728e3 [zsxwing] Add doc for SQLConfEntry and fix 'toString'
5e95b10 [zsxwing] Add enumConf
c6ba76d [zsxwing] setRawString => setConfString, getRawString => getConfString
4abd807 [zsxwing] Fix the test for 'set -v'
6e47e56 [zsxwing] Fix the compilation error
8973ced [zsxwing] Remove floatConf
1fc3a8b [zsxwing] Remove the 'conf' command and use 'set -v' instead
99c9c16 [zsxwing] Fix tests that use SQLConfEntry as a string
88a03cc [zsxwing] Add new lines between confs and return types
ce7c6c8 [zsxwing] Remove seqConf
f3c1b33 [zsxwing] Refactor SQLConf to display better error message