
Implement extensible configuration mechanism #138

@alamb


Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11059

We are getting to the point where there are multiple settings we could add to operators to fine-tune performance. Custom operators provided by crates that extend DataFusion may also need this capability.

I propose that we add support for key-value configuration options so that we don't need to plumb each new configuration setting through the code individually.
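A minimal sketch of what a key-value options container could look like, assuming settings are stored as strings under dotted keys and parsed on read. `ConfigOptions`, `set`, `get` and `get_or` are hypothetical names for illustration, not an existing DataFusion API.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical key-value configuration container. Every setting is stored
/// as a string under a dotted key, so adding a new setting does not require
/// a new struct field or function parameter.
#[derive(Debug, Clone, Default)]
pub struct ConfigOptions {
    values: HashMap<String, String>,
}

impl ConfigOptions {
    pub fn new() -> Self {
        Self::default()
    }

    /// Set (or overwrite) the raw value for `key`.
    pub fn set(&mut self, key: &str, value: &str) {
        self.values.insert(key.to_string(), value.to_string());
    }

    /// Return the raw string value for `key`, if present.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.values.get(key).map(|s| s.as_str())
    }

    /// Parse the value for `key` into `T`, falling back to `default` when
    /// the key is missing or the stored string does not parse as `T`.
    pub fn get_or<T: FromStr>(&self, key: &str, default: T) -> T {
        self.get(key)
            .and_then(|v| v.parse::<T>().ok())
            .unwrap_or(default)
    }
}
```

Silently falling back on a value that fails to parse keeps the sketch short; a real implementation would probably report the invalid value instead so that typos in configuration are not hidden.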

For example, I am about to start on a "coalesce batches" operator, and I would like a setting such as "coalesce.batch.size".
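As a sketch of how such an operator might consume the setting, assuming the options are passed around as a plain string map: `CoalesceBatches`, `from_config` and the default of 4096 rows are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

/// Key taken from the proposal above.
pub const COALESCE_BATCH_SIZE: &str = "coalesce.batch.size";
/// Hypothetical default used when the key is not set.
pub const DEFAULT_COALESCE_BATCH_SIZE: usize = 4096;

/// Hypothetical operator that keeps only the setting it needs.
pub struct CoalesceBatches {
    target_batch_size: usize,
}

impl CoalesceBatches {
    /// Look the setting up by key instead of receiving it as a dedicated
    /// constructor argument threaded through every layer of the planner.
    pub fn from_config(config: &HashMap<String, String>) -> Result<Self, String> {
        let target_batch_size = match config.get(COALESCE_BATCH_SIZE) {
            None => DEFAULT_COALESCE_BATCH_SIZE,
            Some(raw) => raw.parse::<usize>().map_err(|e| {
                format!("invalid value {:?} for {}: {}", raw, COALESCE_BATCH_SIZE, e)
            })?,
        };
        Ok(Self { target_batch_size })
    }

    pub fn target_batch_size(&self) -> usize {
        self.target_batch_size
    }
}
```

Only the operator that cares about `coalesce.batch.size` knows its key and type; everything between the user and the operator just forwards the map.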

For built-in settings like this, we can provide metadata such as a description and a default value, and generate user documentation from that metadata.
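One possible shape for this, assuming built-in settings are declared once in a static registry: `ConfigEntry`, `BUILTIN_CONFIGS`, `default_for` and `generate_docs` are hypothetical, and the description and default shown for `coalesce.batch.size` are illustrative placeholders.

```rust
/// Hypothetical metadata describing one built-in setting.
pub struct ConfigEntry {
    pub key: &'static str,
    pub description: &'static str,
    pub default: &'static str,
}

/// Hypothetical registry of built-in settings (values are illustrative).
pub const BUILTIN_CONFIGS: &[ConfigEntry] = &[ConfigEntry {
    key: "coalesce.batch.size",
    description: "Target number of rows per batch produced by the coalesce batches operator.",
    default: "4096",
}];

/// Look up the documented default for a key.
pub fn default_for(key: &str) -> Option<&'static str> {
    BUILTIN_CONFIGS
        .iter()
        .find(|e| e.key == key)
        .map(|e| e.default)
}

/// Render the registry as a Markdown table, so the user-facing docs are
/// generated from the same definitions the code reads defaults from.
pub fn generate_docs(entries: &[ConfigEntry]) -> String {
    let mut out = String::from("| Key | Default | Description |\n|---|---|---|\n");
    for e in entries {
        out.push_str(&format!("| {} | {} | {} |\n", e.key, e.default, e.description));
    }
    out
}
```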

For example, here is how Spark defines configs:
```scala
val PARQUET_VECTORIZED_READER_ENABLED =
  buildConf("spark.sql.parquet.enableVectorizedReader")
    .doc("Enables vectorized parquet decoding.")
    .version("2.0.0")
    .booleanConf
    .createWithDefault(true)
```
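A rough Rust analogue of the Spark builder above, sketched under the assumption that the value type is fixed by the default passed to `create_with_default` rather than by a separate call like Spark's `.booleanConf`. `build_conf`, `ConfBuilder`, `ConfEntry` and `read` are hypothetical names, not an existing DataFusion API.

```rust
use std::str::FromStr;

/// Hypothetical typed, documented config entry.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfEntry<T> {
    pub key: String,
    pub doc: String,
    pub version: String,
    pub default: T,
}

/// Hypothetical builder that collects metadata before the type is fixed.
pub struct ConfBuilder {
    key: String,
    doc: String,
    version: String,
}

pub fn build_conf(key: &str) -> ConfBuilder {
    ConfBuilder { key: key.to_string(), doc: String::new(), version: String::new() }
}

impl ConfBuilder {
    pub fn doc(mut self, doc: &str) -> Self {
        self.doc = doc.to_string();
        self
    }

    pub fn version(mut self, version: &str) -> Self {
        self.version = version.to_string();
        self
    }

    /// The type parameter `T` is inferred from `default`, playing the role
    /// of `.booleanConf.createWithDefault(true)` in the Spark example.
    pub fn create_with_default<T>(self, default: T) -> ConfEntry<T> {
        ConfEntry { key: self.key, doc: self.doc, version: self.version, default }
    }
}

impl<T: FromStr + Clone> ConfEntry<T> {
    /// Parse a raw string setting, falling back to the default when the
    /// setting is absent or does not parse as `T`.
    pub fn read(&self, raw: Option<&str>) -> T {
        raw.and_then(|v| v.parse::<T>().ok())
            .unwrap_or_else(|| self.default.clone())
    }
}
```

Keeping the key, documentation and default together in one value means the same definition can drive both lookups and generated documentation.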
