ARROW-2299: [Go] Import Go arrow implementation from influxdata/arrow
**NOTE:** Some code generated files and assembler output from the LLVM compiler do not have the headers as they would be stripped each time code generation is rerun. These files are included so that the Go package is go-gettable without any additional build steps.

Author: Stuart Carnie <stuart.carnie@gmail.com>
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1739 from stuartcarnie/sgc-go-arrow and squashes the following commits:

95b9b42 <Wes McKinney> Add new ci/travis_release_audit.sh script
2320777 <Wes McKinney> Split Apache RAT check into separate script, always run. Update rat_exclude_files.txt
f00fb6f <Stuart Carnie> Rename title; add Apache copyright headers to markdown files
f31d8ca <Stuart Carnie> Add Apache copyright headers
3e17fe4 <Stuart Carnie> Initial commit, before copyright update
1 parent a50ef9f, commit 60848c0. Showing 141 changed files with 10,936 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
#!/usr/bin/env bash | ||
|
||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
set -e | ||
|
||
# Check licenses according to Apache policy | ||
git archive HEAD --prefix=apache-arrow/ --output=arrow-src.tar.gz | ||
./dev/release/run-rat.sh arrow-src.tar.gz |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
root = true | ||
|
||
[*.tmpl] | ||
indent_style = tab | ||
indent_size = 4 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
### Go template | ||
# Binaries for programs and plugins | ||
*.exe | ||
*.dll | ||
*.so | ||
*.dylib | ||
*.o | ||
|
||
# Test binary, build with `go test -c` | ||
*.test | ||
|
||
# Output of the go coverage tool, specifically when used with LiteIDE | ||
*.out | ||
|
||
# Project-local glide cache, RE: https://github.com/Masterminds/glide/issues/736 | ||
.glide/ | ||
|
||
bin/ | ||
vendor/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
[[constraint]] | ||
name = "github.com/stretchr/testify" | ||
version = "1.2.0" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
GO_BUILD=go build | ||
GO_GEN=go generate | ||
GO_TEST?=go test | ||
GOPATH=$(realpath ../../../../../..) | ||
|
||
GO_SOURCES := $(shell find . -path ./_lib -prune -o -name '*.go' -not -name '*_test.go') | ||
ALL_SOURCES := $(shell find . -path ./_lib -prune -o \( -name '*.go' -o -name '*.s' \) -not -name '*_test.go') | ||
SOURCES_NO_VENDOR := $(shell find . -path ./vendor -prune -o -name "*.go" -not -name '*_test.go' -print) | ||
|
||
.PHONY: test bench assembly generate | ||
|
||
assembly: | ||
@$(MAKE) -C memory assembly | ||
@$(MAKE) -C math assembly | ||
|
||
generate: bin/tmpl | ||
bin/tmpl -i -data=numeric.tmpldata type_traits_numeric.gen.go.tmpl array/numeric.gen.go.tmpl array/numericbuilder.gen.go.tmpl array/bufferbuilder_numeric.gen.go.tmpl | ||
bin/tmpl -i -data=datatype_numeric.gen.go.tmpldata datatype_numeric.gen.go.tmpl | ||
@$(MAKE) -C math generate | ||
|
||
fmt: $(SOURCES_NO_VENDOR) | ||
goimports -w $^ | ||
|
||
bench: $(GO_SOURCES) | assembly | ||
$(GO_TEST) $(GO_TEST_ARGS) -bench=. -run=- ./... | ||
|
||
bench-noasm: $(GO_SOURCES) | ||
$(GO_TEST) $(GO_TEST_ARGS) -tags='noasm' -bench=. -run=- ./... | ||
|
||
test: $(GO_SOURCES) | assembly | ||
$(GO_TEST) $(GO_TEST_ARGS) ./... | ||
|
||
test-noasm: $(GO_SOURCES) | ||
$(GO_TEST) $(GO_TEST_ARGS) -tags='noasm' ./... | ||
|
||
bin/tmpl: _tools/tmpl/main.go | ||
$(GO_BUILD) -o $@ ./_tools/tmpl | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,174 @@ | ||
<!--- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
Apache Arrow for Go | ||
=================== | ||
|
||
[Apache Arrow][arrow] is a cross-language development platform for in-memory data. It specifies a | ||
standardized language-independent columnar memory format for flat and hierarchical data, | ||
organized for efficient analytic operations on modern hardware. It also provides computational | ||
libraries and zero-copy streaming messaging and inter-process communication. | ||
|
||
|
||
Reference Counting | ||
------------------ | ||
|
||
arrow makes use of reference counting so that it can track when memory buffers are no longer used. This allows | ||
arrow to update resource accounting, pool memory, and track overall memory usage as objects are created | ||
and released. Types expose two methods to deal with this pattern. The `Retain` method will increase the | ||
reference count by 1 and the `Release` method will reduce the count by 1. Once the reference count of an object | ||
is zero, any associated object will be freed. `Retain` and `Release` are safe to call from multiple goroutines. | ||
|
||
### When to call `Retain` / `Release`? | ||
|
||
* If you are passed an object and wish to take ownership of it, you must call `Retain`. You must later pair this | ||
with a call to `Release` when you no longer need the object. "Taking ownership" typically means you | ||
wish to access the object outside the scope of the current function call. | ||
|
||
* You own any object you create via functions whose name begins with `New` or `Copy` or when receiving | ||
an object over a channel. Therefore you must call `Release` once you no longer need the object. | ||
|
||
* If you send an object over a channel, you must call `Retain` before sending it as the receiver is | ||
assumed to own the object and will later call `Release` when it no longer needs the object. | ||
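
The ownership rules above can be sketched with a small stand-in type. `Buffer`, `NewBuffer`, `Refs`, `consume` and `sendAndWait` are hypothetical names used only for illustration and are not part of the arrow package; the sketch assumes a plain atomic counter and omits the actual freeing of memory.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Buffer is a hypothetical reference-counted object standing in for an arrow type.
type Buffer struct {
	refCount int64
}

// NewBuffer follows the `New` naming rule: the caller owns the result (count starts at 1).
func NewBuffer() *Buffer { return &Buffer{refCount: 1} }

// Retain atomically increments the reference count.
func (b *Buffer) Retain() { atomic.AddInt64(&b.refCount, 1) }

// Release atomically decrements the reference count.
// A real implementation would return the memory to its allocator at zero.
func (b *Buffer) Release() { atomic.AddInt64(&b.refCount, -1) }

// Refs reports the current reference count.
func (b *Buffer) Refs() int64 { return atomic.LoadInt64(&b.refCount) }

// consume receives a Buffer over a channel, so it owns it and must Release it.
func consume(ch <-chan *Buffer, done chan<- struct{}) {
	b := <-ch
	b.Release()
	close(done)
}

// sendAndWait retains b before sending it, waits for the receiver to finish,
// and returns the remaining reference count.
func sendAndWait(b *Buffer) int64 {
	ch := make(chan *Buffer)
	done := make(chan struct{})
	go consume(ch, done)
	b.Retain() // the receiver is assumed to own the object it receives
	ch <- b
	<-done
	return b.Refs()
}

func main() {
	b := NewBuffer()  // we own b because we created it
	defer b.Release() // so we release it when we are done
	fmt.Println("refs after hand-off:", sendAndWait(b)) // prints "refs after hand-off: 1"
}
```

The sender's `Retain` and the receiver's `Release` cancel out, so the creator still holds exactly one reference after the hand-off.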
|
||
|
||
Performance | ||
----------- | ||
|
||
The arrow package makes extensive use of [c2goasm][] to leverage LLVM's advanced optimizer and generate Plan 9 | ||
assembly functions from C/C++ code. The arrow package can be compiled without these optimizations using the `noasm` | ||
build tag. Alternatively, by configuring an environment variable, it is possible to dynamically configure which | ||
architecture optimizations are used at runtime. | ||
See the `cpu` package [README](internal/cpu/README.md) for a description of this environment variable. | ||
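
As a rough illustration of runtime selection, the sketch below picks an implementation name from a disable list and the CPU's capabilities. It is not the `cpu` package's actual logic: `chooseSum` is a hypothetical helper, and the comma-separated, case-insensitive parsing of the `INTEL_DISABLE_EXT` value is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// chooseSum returns the name of the implementation that would be selected.
// disable is the value of the environment variable; hasAVX2 and hasSSE4
// describe what the CPU supports. The parsing rules here are assumptions.
func chooseSum(disable string, hasAVX2, hasSSE4 bool) string {
	for _, ext := range strings.Split(disable, ",") {
		switch strings.ToUpper(strings.TrimSpace(ext)) {
		case "ALL":
			return "go" // everything disabled: fall back to pure Go
		case "AVX2":
			hasAVX2 = false
		case "SSE4":
			hasSSE4 = false
		}
		// Unknown values such as NONE are ignored.
	}
	switch {
	case hasAVX2:
		return "avx2"
	case hasSSE4:
		return "sse4"
	default:
		return "go"
	}
}

func main() {
	for _, v := range []string{"NONE", "AVX2", "ALL"} {
		fmt.Printf("INTEL_DISABLE_EXT=%s -> %s\n", v, chooseSum(v, true, true))
	}
}
```

Selecting the function once, rather than checking the CPU on every call, keeps the dispatch cost out of the hot loop.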
|
||
### Example Usage | ||
|
||
The following benchmarks demonstrate summing an array of 8192 values using various optimizations. | ||
|
||
With no architecture optimizations disabled (thus using AVX2): | ||
|
||
```sh | ||
$ INTEL_DISABLE_EXT=NONE go test -bench=8192 -run=. ./math | ||
goos: darwin | ||
goarch: amd64 | ||
pkg: github.com/apache/arrow/go/arrow/math | ||
BenchmarkFloat64Funcs_Sum_8192-8 2000000 687 ns/op 95375.41 MB/s | ||
BenchmarkInt64Funcs_Sum_8192-8 2000000 719 ns/op 91061.06 MB/s | ||
BenchmarkUint64Funcs_Sum_8192-8 2000000 691 ns/op 94797.29 MB/s | ||
PASS | ||
ok github.com/apache/arrow/go/arrow/math 6.444s | ||
``` | ||
|
||
**NOTE:** `NONE` is simply ignored, so optimizations for both AVX2 and SSE4 remain enabled. | ||
|
||
---- | ||
|
||
Disable AVX2 architecture optimizations: | ||
|
||
```sh | ||
$ INTEL_DISABLE_EXT=AVX2 go test -bench=8192 -run=. ./math | ||
goos: darwin | ||
goarch: amd64 | ||
pkg: github.com/apache/arrow/go/arrow/math | ||
BenchmarkFloat64Funcs_Sum_8192-8 1000000 1912 ns/op 34263.63 MB/s | ||
BenchmarkInt64Funcs_Sum_8192-8 1000000 1392 ns/op 47065.57 MB/s | ||
BenchmarkUint64Funcs_Sum_8192-8 1000000 1405 ns/op 46636.41 MB/s | ||
PASS | ||
ok github.com/apache/arrow/go/arrow/math 4.786s | ||
``` | ||
|
||
---- | ||
|
||
Disable ALL architecture optimizations, thus using the pure Go implementation: | ||
|
||
```sh | ||
$ INTEL_DISABLE_EXT=ALL go test -bench=8192 -run=. ./math | ||
goos: darwin | ||
goarch: amd64 | ||
pkg: github.com/apache/arrow/go/arrow/math | ||
BenchmarkFloat64Funcs_Sum_8192-8 200000 10285 ns/op 6371.41 MB/s | ||
BenchmarkInt64Funcs_Sum_8192-8 500000 3892 ns/op 16837.37 MB/s | ||
BenchmarkUint64Funcs_Sum_8192-8 500000 3929 ns/op 16680.00 MB/s | ||
PASS | ||
ok github.com/apache/arrow/go/arrow/math 6.179s | ||
``` | ||
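
The figures in the three runs can be related with a little arithmetic. The sketch below assumes each benchmark processes 8192 values of 8 bytes (65,536 bytes per operation) and that MB/s means 10^6 bytes per second; `throughputMBps` and `speedup` are hypothetical helpers. Because the reported ns/op values are rounded, the computed throughput only approximates the MB/s column.

```go
package main

import "fmt"

// throughputMBps converts bytes processed per operation and ns/op into MB/s,
// taking 1 MB as 1e6 bytes. Bytes per nanosecond equals GB/s, hence the factor 1e3.
func throughputMBps(bytesPerOp, nsPerOp float64) float64 {
	return bytesPerOp / nsPerOp * 1e3
}

// speedup returns how many times faster the second timing is than the first.
func speedup(slowNs, fastNs float64) float64 {
	return slowNs / fastNs
}

func main() {
	const bytesPerOp = 8192 * 8 // assumption: 8192 values of 8 bytes each

	// Float64 sum timings (ns/op) taken from the three runs above.
	fmt.Printf("all optimizations: %.0f MB/s\n", throughputMBps(bytesPerOp, 687))
	fmt.Printf("AVX2 disabled:     %.0f MB/s\n", throughputMBps(bytesPerOp, 1912))
	fmt.Printf("pure Go:           %.0f MB/s\n", throughputMBps(bytesPerOp, 10285))
	fmt.Printf("speedup over AVX2-disabled: %.1fx\n", speedup(1912, 687))
	fmt.Printf("speedup over pure Go:       %.1fx\n", speedup(10285, 687))
}
```

On these numbers the fully optimized Float64 sum is roughly 15 times faster than the pure Go version and a little under 3 times faster than the run with AVX2 disabled.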
|
||
Status | ||
------ | ||
|
||
The first milestone was to implement the necessary Array types in order to use | ||
them internally in the [ifql][] execution engine and storage layers of [InfluxDB][]. | ||
|
||
|
||
### Memory Management | ||
|
||
- [x] Allocations are 64-byte aligned and padded to a multiple of 8 bytes | ||
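
One common way to get an aligned, padded allocation in Go is to over-allocate and slice from the first aligned address. The sketch below illustrates that technique only; it is not taken from the arrow `memory` package, and `allocAligned`, `roundUp` and `isAligned` are hypothetical names. It also assumes the Go runtime does not move the heap allocation after its address has been inspected.

```go
package main

import (
	"fmt"
	"unsafe"
)

const (
	alignment = 64 // bytes; the start address is a multiple of this
	padding   = 8  // bytes; the length is rounded up to a multiple of this
)

// roundUp rounds n up to the next multiple of m (m must be positive).
func roundUp(n, m int) int { return (n + m - 1) / m * m }

// allocAligned returns a zeroed slice whose length is size rounded up to a
// multiple of 8 and whose first byte sits on a 64-byte boundary.
func allocAligned(size int) []byte {
	padded := roundUp(size, padding)
	raw := make([]byte, padded+alignment) // over-allocate so an aligned start always exists
	addr := uintptr(unsafe.Pointer(&raw[0]))
	offset := int((alignment - addr%alignment) % alignment)
	return raw[offset : offset+padded : offset+padded]
}

// isAligned reports whether b starts on a 64-byte boundary.
// Empty slices are treated as aligned.
func isAligned(b []byte) bool {
	if len(b) == 0 {
		return true
	}
	return uintptr(unsafe.Pointer(&b[0]))%alignment == 0
}

func main() {
	b := allocAligned(10)
	fmt.Println(len(b), isAligned(b)) // prints "16 true"
}
```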
|
||
|
||
### Array and builder support | ||
|
||
**Primitive types** | ||
|
||
- [x] Signed and unsigned 8, 16, 32 and 64 bit integers | ||
- [x] 32 and 64 bit floats | ||
- [x] Packed LSB booleans | ||
- [x] Variable-length binary | ||
- [ ] String (valid UTF-8) | ||
- [ ] Half-float (16-bit) | ||
- [ ] Null (no physical storage) | ||
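
"Packed LSB booleans" means eight values share one byte, with value `i` stored in byte `i/8` at bit position `i%8`, counting from the least significant bit. The following is a minimal sketch of that layout; `setBit`, `bitIsSet` and `packBools` are hypothetical helpers, not the package's API.

```go
package main

import "fmt"

// setBit sets bit i in an LSB-ordered bitmap.
func setBit(bitmap []byte, i int) {
	bitmap[i/8] |= 1 << uint(i%8)
}

// bitIsSet reports whether bit i is set in an LSB-ordered bitmap.
func bitIsSet(bitmap []byte, i int) bool {
	return bitmap[i/8]&(1<<uint(i%8)) != 0
}

// packBools packs values into ceil(len(values)/8) bytes.
func packBools(values []bool) []byte {
	bitmap := make([]byte, (len(values)+7)/8)
	for i, v := range values {
		if v {
			setBit(bitmap, i)
		}
	}
	return bitmap
}

func main() {
	// Values 0, 2, 3 and 9 are true.
	bitmap := packBools([]bool{true, false, true, true, false, false, false, false, false, true})
	fmt.Printf("%08b %08b\n", bitmap[0], bitmap[1]) // prints "00001101 00000010"
}
```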
|
||
**Parametric types** | ||
|
||
- [x] Timestamp | ||
- [ ] Interval (year/month or day/time) | ||
- [ ] Date32 (days since UNIX epoch) | ||
- [ ] Date64 (milliseconds since UNIX epoch) | ||
- [ ] Time32 (seconds or milliseconds since midnight) | ||
- [ ] Time64 (microseconds or nanoseconds since midnight) | ||
- [ ] Decimal (128-bit) | ||
- [ ] Fixed-sized binary | ||
- [ ] List | ||
- [ ] Struct | ||
- [ ] Union | ||
- [ ] Dense | ||
- [ ] Sparse | ||
- [ ] Dictionary | ||
- [ ] Dictionary encoding | ||
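
For the two date representations listed above, the conversions from a UNIX timestamp are simple integer arithmetic. This sketch is illustrative only, since these types are not yet implemented here; `date32FromUnix` and `date64FromUnix` are hypothetical names, and the input is assumed to be whole seconds since the UNIX epoch in UTC.

```go
package main

import (
	"fmt"
	"time"
)

const secondsPerDay = 86400

// date32FromUnix converts seconds since the UNIX epoch to whole days since
// the epoch, rounding toward negative infinity so pre-1970 times map correctly.
func date32FromUnix(secs int64) int32 {
	days := secs / secondsPerDay
	if secs%secondsPerDay < 0 {
		days--
	}
	return int32(days)
}

// date64FromUnix converts seconds since the UNIX epoch to milliseconds since the epoch.
func date64FromUnix(secs int64) int64 {
	return secs * 1000
}

func main() {
	t := time.Date(2018, time.March, 22, 12, 0, 0, 0, time.UTC)
	fmt.Println("Date32:", date32FromUnix(t.Unix()))
	fmt.Println("Date64:", date64FromUnix(t.Unix()))
}
```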
|
||
### Type metadata | ||
|
||
- [x] Data types (implemented arrays) | ||
- [ ] Field | ||
- [ ] Schema | ||
|
||
|
||
### I/O | ||
|
||
Serialization is planned for a future iteration. | ||
|
||
- [ ] Flat buffers for serializing metadata | ||
- [ ] Record Batch | ||
- [ ] Table | ||
|
||
|
||
|
||
[arrow]: https://arrow.apache.org | ||
[ifql]: https://github.com/influxdata/ifql | ||
[InfluxDB]: https://github.com/influxdata/influxdb | ||
[c2goasm]: https://github.com/minio/c2goasm |