Break up contributing guide into smaller pages (#10533)
* Docs: split contributor guide into multiple pages
* Fix links
* Update docs/source/contributor-guide/howtos.md

Co-authored-by: Jonah Gao <jonahgao@msn.com>

---------

Co-authored-by: Jonah Gao <jonahgao@msn.com>
Showing 5 changed files with 325 additions and 269 deletions.
@@ -0,0 +1,87 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Getting Started

This section describes how to get started developing DataFusion.

## Windows setup

```shell
# Download a Windows 10 virtual machine image for VirtualBox
wget https://az792536.vo.msecnd.net/vms/VMBuild_20190311/VirtualBox/MSEdge/MSEdge.Win10.VirtualBox.zip
# Install Git, rustup, and the Visual C++ build tools with Chocolatey
choco install -y git rustup.install visualcpp-build-tools
# Open Git Bash, then build DataFusion
git-bash.exe
cargo build
```

## Protoc Installation

Compiling DataFusion from source requires an installed version of the protobuf compiler, `protoc`.

On most platforms, it can be installed from your system's package manager:

```shell
# Ubuntu
$ sudo apt install -y protobuf-compiler
# Fedora
$ dnf install -y protobuf-devel
# Arch Linux
$ pacman -S protobuf
# macOS
$ brew install protobuf
```

Verify that the installed version is `3.12` or greater. Version 3.12 introduced support for explicit [field presence](https://github.com/protocolbuffers/protobuf/blob/v3.12.0/docs/field_presence.md), and older versions may fail to compile.

```shell
$ protoc --version
libprotoc 3.12.4
```

Alternatively, a binary release can be downloaded from the [Release Page](https://github.com/protocolbuffers/protobuf/releases) or [built from source](https://github.com/protocolbuffers/protobuf/blob/main/src/README.md).

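The version check above can also be automated, for example from a build script. The following is a minimal sketch using only the Rust standard library. It is not part of DataFusion, and the names `MIN_PROTOC`, `parse_protoc_version`, `is_supported`, and `installed_protoc_is_supported` are hypothetical. It assumes `protoc --version` prints a line of the form shown above (`libprotoc 3.12.4`).

```rust
use std::process::Command;

/// Minimum (major, minor) protoc version, taken from the text above.
const MIN_PROTOC: (u32, u32) = (3, 12);

/// Parses output such as "libprotoc 3.12.4" into (major, minor).
/// Returns `None` if the text does not look like a protoc version line.
fn parse_protoc_version(output: &str) -> Option<(u32, u32)> {
    let version = output.trim().strip_prefix("libprotoc")?.trim();
    let mut parts = version.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True if the parsed version is at least `MIN_PROTOC`.
fn is_supported(version: (u32, u32)) -> bool {
    version >= MIN_PROTOC
}

/// Runs `protoc --version` and reports whether the installed version is new enough.
/// Returns `None` if `protoc` cannot be started or its output cannot be parsed.
fn installed_protoc_is_supported() -> Option<bool> {
    let out = Command::new("protoc").arg("--version").output().ok()?;
    let text = String::from_utf8(out.stdout).ok()?;
    parse_protoc_version(&text).map(is_supported)
}
```

Comparing `(major, minor)` tuples keeps the check correct for versions such as `3.6` (rejected) and `3.12` (accepted), which a plain string comparison would order incorrectly.
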
## Bootstrap environment

DataFusion is written in Rust and uses the standard Rust toolchain:

- `cargo build` to build
- `cargo fmt` to format the code
- `cargo test` to test
- etc.

Note that running `cargo test` requires significant memory, because cargo runs many tests in parallel by default. If you run into slow tests or system lockups, you can significantly reduce the memory required by running `cargo test -- --test-threads=1` instead. For more information, see [this issue](https://github.com/apache/datafusion/issues/5347).

Testing setup:

- `rustup update stable` (DataFusion uses the latest stable release of Rust)
- `git submodule init`
- `git submodule update`

Formatting instructions:

- [ci/scripts/rust_fmt.sh](../../../ci/scripts/rust_fmt.sh)
- [ci/scripts/rust_clippy.sh](../../../ci/scripts/rust_clippy.sh)
- [ci/scripts/rust_toml_fmt.sh](../../../ci/scripts/rust_toml_fmt.sh)

or run them all at once:

- [dev/rust_lint.sh](../../../dev/rust_lint.sh)

@@ -0,0 +1,129 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# HOWTOs

## How to add a new scalar function

Below is a checklist of what you need to do to add a new scalar function to DataFusion:

- Add the actual implementation of the function to a new module file within:
  - [here](https://github.com/apache/datafusion/tree/main/datafusion/functions-array) for array functions
  - [here](https://github.com/apache/datafusion/tree/main/datafusion/functions/src/crypto) for crypto functions
  - [here](https://github.com/apache/datafusion/tree/main/datafusion/functions/src/datetime) for datetime functions
  - [here](https://github.com/apache/datafusion/tree/main/datafusion/functions/src/encoding) for encoding functions
  - [here](https://github.com/apache/datafusion/tree/main/datafusion/functions/src/math) for math functions
  - [here](https://github.com/apache/datafusion/tree/main/datafusion/functions/src/regex) for regex functions
  - [here](https://github.com/apache/datafusion/tree/main/datafusion/functions/src/string) for string functions
  - [here](https://github.com/apache/datafusion/tree/main/datafusion/functions/src/unicode) for unicode functions
  - create a new module [here](https://github.com/apache/datafusion/tree/main/datafusion/functions/src/) for other functions.
- New function modules (for example, a `vector` module) should use a [Rust feature](https://doc.rust-lang.org/cargo/reference/features.html) (for example, `vector_expressions`) to allow DataFusion users to enable or disable the new module as desired.
- Implement the function by implementing the `ScalarUDFImpl` trait for the function struct.
  - See the [advanced_udf.rs] example for a sample implementation
  - Add tests for the new function
- To connect the implementation of the function, add the following to the `mod.rs` file:
  - a `mod xyz;` where `xyz` is the new module file
  - a call to `make_udf_function!(..);`
  - an item in `export_functions!(..);`
- In [sqllogictest/test_files], add new `sqllogictest` integration tests where the function is called through SQL against well-known data and returns the expected result.
  - Documentation for `sqllogictest` [here](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/README.md)
- Add SQL reference documentation [here](https://github.com/apache/datafusion/blob/main/docs/source/user-guide/sql/scalar_functions.md)

[advanced_udf.rs]: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/advanced_udf.rs
[sqllogictest/test_files]: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files

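The feature-gating item in the checklist above can be sketched as follows. This is a generic illustration of Cargo features and conditional compilation, not DataFusion's actual module layout: the `vector` module and the `vector_expressions` feature are the examples from the text, while `dot` and `enabled_modules` are hypothetical names. It assumes the crate's `Cargo.toml` declares `vector_expressions = []` under `[features]`.

```rust
// Assumed Cargo.toml entry (not shown in the text above):
//   [features]
//   vector_expressions = []

/// Compiled only when the `vector_expressions` feature is enabled.
#[cfg(feature = "vector_expressions")]
pub mod vector {
    /// Hypothetical function living in the gated module: a dot product.
    pub fn dot(a: &[f64], b: &[f64]) -> f64 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

/// Names of the optional modules compiled into this build.
pub fn enabled_modules() -> Vec<&'static str> {
    let mut modules = Vec::new();
    // `cfg!` evaluates to a compile-time boolean for the given feature.
    if cfg!(feature = "vector_expressions") {
        modules.push("vector");
    }
    modules
}
```

Built without the feature, the `vector` module is not compiled at all and `enabled_modules()` returns an empty list. Building with `--features vector_expressions` includes it.
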
## How to add a new aggregate function

Below is a checklist of what you need to do to add a new aggregate function to DataFusion:

- Add the actual implementation of an `Accumulator` and `AggregateExpr`.
- In [datafusion/expr/src](../../../datafusion/expr/src/aggregate_function.rs), add:
  - a new variant to `AggregateFunction`
  - a new entry to `FromStr` with the name of the function as called by SQL
  - a new line in `return_type` with the expected return type of the function, given an incoming type
  - a new line in `signature` with the signature of the function (number and types of its arguments)
  - a new line in `create_aggregate_expr` mapping the built-in to the implementation
  - tests for the function.
- In [sqllogictest/test_files], add new `sqllogictest` integration tests where the function is called through SQL against well-known data and returns the expected result.
  - Documentation for `sqllogictest` [here](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/README.md)
- Add SQL reference documentation [here](https://github.com/apache/datafusion/blob/main/docs/source/user-guide/sql/aggregate_functions.md)

## How to display plans graphically

The query plans represented by `LogicalPlan` nodes can be graphically
rendered using [Graphviz](https://www.graphviz.org/).

To do so, save the output of the `display_graphviz` function to a file:

```rust
use std::fs::File;
use std::io::Write;

// Create plan somehow...
let mut output = File::create("/tmp/plan.dot")?;
write!(output, "{}", plan.display_graphviz())?;
```

Then, use the `dot` command-line tool to render it into a file that
can be displayed. For example, the following command creates a
`/tmp/plan.pdf` file:

```bash
dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
```

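The rendering step can also be driven from Rust instead of the shell. The following is a minimal sketch using only the standard library, assuming the `dot` binary is on `PATH`. `dot_args` and `render_plan` are hypothetical helper names, not DataFusion APIs.

```rust
use std::io;
use std::process::Command;

/// Builds the argument list for `dot`, e.g. `dot -Tpdf plan.dot -o plan.pdf`.
/// `kind` is a Graphviz output format name such as "pdf", "svg", or "png".
fn dot_args(kind: &str, input: &str, output: &str) -> Vec<String> {
    vec![
        format!("-T{}", kind),
        input.to_string(),
        "-o".to_string(),
        output.to_string(),
    ]
}

/// Runs `dot` and returns whether it exited successfully.
/// Fails with an `io::Error` if the `dot` binary cannot be started.
fn render_plan(kind: &str, input: &str, output: &str) -> io::Result<bool> {
    let status = Command::new("dot")
        .args(dot_args(kind, input, output))
        .status()?;
    Ok(status.success())
}
```

For example, `render_plan("pdf", "/tmp/plan.dot", "/tmp/plan.pdf")` would run the equivalent of the `dot -Tpdf` command above, using `-o` instead of shell redirection.
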
## How to format `.md` documents

We use `prettier` to format `.md` files.

You can either use `npm i -g prettier` to install it globally or use `npx` to run it as a standalone binary. Using `npx` requires a working Node.js environment. Upgrading to the latest prettier is recommended (by adding `--upgrade` to the `npm` command).

```bash
$ prettier --version
2.3.0
```

After you've confirmed your prettier version, you can format all the `.md` files:

```bash
prettier -w {datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md
```

## How to format `.toml` files

We use `taplo` to format `.toml` files.

For Rust developers, you can install it via:

```sh
cargo install taplo-cli --locked
```

> Refer to the [Installation section][doc] for other ways to install it.
>
> [doc]: https://taplo.tamasfe.dev/cli/installation/binary.html

```bash
$ taplo --version
taplo 0.9.0
```

After you've confirmed your `taplo` version, you can format all the `.toml` files:

```bash
taplo fmt
```