
[Rust] Add Reset method to buffer and cleanup comments on component. #6369

Closed
wants to merge 26 commits

Conversation

richardartoul
Contributor

The Reset method allows buffers to be reused so they don't have to be allocated over and over again.
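For illustration, here is a minimal, self-contained sketch of the reuse pattern this change is after, assuming a simple allocator-backed buffer; the `Allocator` interface, field names, and `goAllocator` type below are stand-ins invented for this example, not the arrow/go API (the actual diff context appears in the review further down).

```go
package main

import "fmt"

// Allocator is a stand-in for an allocator that hands out and reclaims
// byte slices (hypothetical, for this sketch only).
type Allocator interface {
	Allocate(size int) []byte
	Free(b []byte)
}

// goAllocator allocates from the Go heap and lets the GC reclaim memory.
type goAllocator struct{}

func (goAllocator) Allocate(size int) []byte { return make([]byte, size) }
func (goAllocator) Free(b []byte)            {}

// Buffer wraps a byte slice together with the allocator that owns it.
type Buffer struct {
	mem    Allocator
	buf    []byte
	length int
}

// Reset points the buffer at a new slice so the same Buffer value can be
// reused instead of being allocated over and over again.
func (b *Buffer) Reset(buf []byte) {
	b.buf = buf
	b.length = len(buf)
}

func main() {
	mem := goAllocator{}
	b := &Buffer{mem: mem}
	b.Reset(mem.Allocate(64))
	fmt.Println(b.length) // prints 64
}
```

Note that this naive `Reset` simply drops the old slice; the review discussion below points out that when the old slice was obtained from `b.mem`, it should be freed.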

pitrou and others added 26 commits January 31, 2020 07:07
We will focus on OSS-Fuzz, now that we have been accepted there.

Also add an OSS-Fuzz badge in the README.

Closes apache#6328 from pitrou/ARROW-7712-delete-fuzzit-jobs and squashes the following commits:

6c6d2c4 <Antoine Pitrou> Remove remaining fuzzit-related configuration
a6abaa9 <Antoine Pitrou> ARROW-7712:   Delete fuzzit jobs

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
There are some problems with the current union comparison logic. For example:
1. For the type check, we should not require the fields to be equal. It is possible for two vectors' value ranges to be equal even though their fields are different.
2. We should not compare the number of sub-vectors, as it is possible for two union vectors to have different numbers of sub-vectors but equal values in the range.

Closes apache#5544 from liyafan82/fly_0930_share and squashes the following commits:

d6ef3d2 <liyafan82>  Refine test case
c008289 <liyafan82>  Resolve test failure after rebasing
c515393 <liyafan82>  Rule out the change for union type comparison
bab7402 <liyafan82>  Compare fields for all vectors except union vectors
5b2225e <liyafan82>  Fix the bug with decimal vector
4d8b570 <liyafan82>  Fix problems with current union comparison logic

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
When checking statuses for equality, check to make sure that both have status detail objects before proceeding to compare those objects.

Closes apache#6332 from lidavidm/arrow-7734 and squashes the following commits:

9084bce <David Li> ARROW-7734:  check status details for nullptr in equality

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Make the initial capacity argument optional.

Closes apache#6327 from pitrou/ARROW-6724-bufferoutstream-simpler-ctor and squashes the following commits:

b563f46 <Antoine Pitrou> ARROW-6724:  Allow simpler BufferOutputStream creation

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
…tests

Related to [ARROW-6871](https://issues.apache.org/jira/browse/ARROW-6871).

TransferPair-related parameter checks in different classes have potential problems:

i. splitAndTransfer has no indices check in classes like VarcharVector.
ii. The splitAndTransfer indices check in classes like UnionVector is not correct (Preconditions.checkArgument(startIndex + length <= valueCount)); the parameters should be checked separately.
iii. More unit tests should be added to cover corner cases.

Closes apache#5645 from tianchen92/ARROW-6871 and squashes the following commits:

f3b897d <tianchen> fix style
0d3c7ea <tianchen> add benchmark
01f9a48 <tianchen> revert changes in copyFrom
a22d58a <tianchen> ARROW-6871:  Enhance TransferPair related parameters check and tests

Authored-by: tianchen <niki.lj@alibaba-inc.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
According to SQL convention, the SQL type DATE should correspond to a format of YYYY-MM-DD, without components for hour/minute/second/millis.

Therefore, JDBC type DATE should correspond to DateDayVector, with a type width of 4, instead of 8.

Closes apache#5944 from liyafan82/fly_1203_date and squashes the following commits:

a6de377 <liyafan82>  Remove division in time conversion
be73192 <liyafan82>  Resolve comments
eea8b79 <liyafan82>  Sql type DATE should correspond to DateDayVector

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
It makes binary verification more robust.

Closes apache#6335 from kou/release-verification-binaries-retry and squashes the following commits:

38944a9 <Sutou Kouhei>  Retry binary download on transient error

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Closes apache#6331 from kszucs/ARROW-7466 and squashes the following commits:

d50b8c8 <Krisztián Szűcs> don't install already installed python3
d956297 <Krisztián Szűcs> re-enable steps
421142d <Krisztián Szűcs> start all arguments at the same column...
2131836 <Krisztián Szűcs> travis multi line string
b50865e <Krisztián Szűcs> use travis_build_dir
6f1beb6 <Krisztián Szűcs> debug paths
7b054ec <Krisztián Szűcs> queue path
6929f3d <Krisztián Szűcs> fix crossbow path
2a2d7c3 <Krisztián Szűcs> deploy using crossbow

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: François Saint-Jacques <fsaintjacques@gmail.com>
This PR adds DataFusion examples for a Flight client and server where the client can send a SQL query to the server and then receive the results.

I have manually tested with a Java client as well to confirm that it works.

Closes apache#6308 from andygrove/datafusion-flight-example and squashes the following commits:

788feef <Andy Grove> code cleanup
9c47338 <Neville Dipale> Complete flight client's record batch reader
1337b98 <Andy Grove> parse recordbatch
459bef3 <Andy Grove> client parses schema from ipc batches
31c894b <Andy Grove> update release test script
efe05ae <Andy Grove> update release test script
5ecea83 <Andy Grove> formatting
8b419da <Andy Grove> update release test script
03d2c84 <Andy Grove> client streams results
0a39a51 <Andy Grove> client can stream batches
e72c605 <Andy Grove> add starting point for flight-client example
ab28da8 <Andy Grove> get schema from query plan instead of from first batch
0901a3f <Neville Dipale> Merge branch 'datafusion-flight-example' of https://github.com/andygrove/arrow into datafusion-flight-example
ad2e3b0 <Neville Dipale> send schema before batches
996f2a4 <Andy Grove> Use PARQUET_TEST_DATA env var
260f9ca <Neville Dipale> fix license violation
516b66d <Neville Dipale> add helpers to convert record batch to flight data proto message
6beb4ea <Andy Grove> WIP example Flight server for DataFusion

Lead-authored-by: Andy Grove <andygrove73@gmail.com>
Co-authored-by: Neville Dipale <nevilledips@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
…ration test

Additionally, test against dask's latest release, not just the master revision.

Closes apache#6326 from kszucs/dask-pandas-pin and squashes the following commits:

b5cb40e <Krisztián Szűcs> pin pandas depending on dask's version
083221c <Krisztián Szűcs> pin pandas in the dask integration test

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
…el verification

The wheel verification script fails for Python 3.5.
At the same time, the wheel works properly in Python 3.5 docker
images without a conda environment.

Conda-forge doesn't maintain packages for Python 3.5 anymore,
and something must have gotten mixed up with the numpy versions.

This change fixed the wheel verification locally for me.

Closes apache#6339 from kszucs/wheel-verification and squashes the following commits:

3e96949 <Krisztián Szűcs> remove pytest verbose flags
026e5fb <Krisztián Szűcs> use pip to install dependencies for wheel verification

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
The binaries are installed using Chocolatey, which takes a bit of time (it's a 2+GB install...), but less so than recompiling Boost from scratch during the CMake build.

[skip appveyor]

Closes apache#6325 from pitrou/ARROW-7726-download-boost-gha and squashes the following commits:

e877622 <Antoine Pitrou> Revert "Try a more flexible way of finding Boost"
eb5db8f <Antoine Pitrou> Try a more flexible way of finding Boost
d570649 <Antoine Pitrou> ARROW-7726:   Use boost binaries on Windows GHA build

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
We need a way to copy vector values in batch. Currently, we have copyFrom and copyFromSafe APIs. However, they are not enough, as copying values individually is not performant.

Closes apache#5916 from liyafan82/fly_1125_veccat and squashes the following commits:

94b407c <liyafan82>  Support dense union vector
ee49dc6 <liyafan82>  Add tests with null values
ad33e23 <liyafan82>  Rewrite tests with vector populator for result verification
c89211a <liyafan82>  Rewrite tests with vector populator and provide static utility
7c13ede <liyafan82>  Support concating vectors values in batch

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
…tartable

Executing the verification script can take quite some time, so creating a new environment from scratch whenever anything fails is time-consuming.
Let the script reuse the same build directory for source release verification.

You need to export the `TMPDIR` environment variable. @kou shall we use an argument instead?

Closes apache#6344 from kszucs/restartable-verification and squashes the following commits:

6d4723d <Krisztián Szűcs> Support for restarting the release verification script

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
This addition follows the pattern of `test_source_distribution` (as it is in apache#6344). There are also two error message patches to make them consistent with everywhere else that references this env var (though FWIW `testing/data` is still not correct in the verification script, it's `arrow-testing/data` 🤷‍♂).

Closes apache#6345 from nealrichardson/flight-testing-data and squashes the following commits:

a29e88a <Krisztián Szűcs> factor out testing repository cloning
df9ef25 <Neal Richardson> Move addition and fix lint
e165d54 <Neal Richardson> Make sure macOS wheel verification has test data

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
We're discussing whether to make those fields required in the schema definitions (which would make validation automatic via the Flatbuffers-generated verifier), but in the meantime we can check those fields manually.

This should fix a bunch of issues detected by OSS-Fuzz.

Closes apache#6293 from pitrou/ARROW-7691-check-fb-fields-not-null and squashes the following commits:

02478a6 <Antoine Pitrou> Use a function rather than a macro
e6d3d88 <Antoine Pitrou> ARROW-7691:  Check non-scalar Flatbuffers fields are not null

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…s to no longer be in miniconda, install miniconda unconditionally

This change was necessary for me to get the script to run to completion.

Closes apache#6348 from wesm/pip3-to-pip and squashes the following commits:

fcf3ae6 <Wes McKinney> conda environment must be activated for other steps to work if C++ is disabled. Fix selective disabling of integration test components in archery
26da759 <Krisztián Szűcs> always set up miniconda
490ceac <Wes McKinney> pip3 no longer in miniconda

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
…n verifying RC, remove Python 3.5 from wheel verification

This resolves the issues I was having as described in ARROW-6757. This does not fix the Python 3.8 wheel, though.

Closes apache#6350 from wesm/windows-rc-verify-fixes and squashes the following commits:

a9d4c66 <Wes McKinney> Fixes for Windows release verification scripts

Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
We're not testing the dataset feature in the verification scripts yet.

Closes apache#6346 from kszucs/dataset-verification and squashes the following commits:

b8530ea <Krisztián Szűcs> Test dataset during the verification

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…ong ABI tag

This is fixed in the latest release of `wheel`, but we were pinning to an old version.

Closes apache#6353 from nealrichardson/fix-win-38-wheels and squashes the following commits:

e3a865b <Neal Richardson> Remove wheel pin in requirements-wheel.txt

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
…arquetWriter

Closes apache#6352 from jorisvandenbossche/ARROW-7762 and squashes the following commits:

9353658 <Joris Van den Bossche> ARROW-7762:  Do not ignore exception for invalid version in ParquetWriter

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
The patch implements an encoder and a decoder for Parquet's
BYTE_STREAM_SPLIT encoding, and also adds tests for the new
encoding; a short sketch of the encoding idea follows this commit message.

Closes apache#6005 from martinradev/byte_stream_split_submit and squashes the following commits:

5a78f8b <Martin Radev> ARROW-5913:  Add BYTE_STREAM_SPLIT encoder and decoder

Authored-by: Martin Radev <martin.b.radev@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
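To illustrate the encoding described in the commit message above, here is a small Go sketch of the BYTE_STREAM_SPLIT idea (not the Arrow C++/Parquet implementation): for a fixed-width type of K bytes, byte k of every value is scattered into stream k, and the K streams are concatenated. The function names are made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeByteStreamSplit splits n float32 values (4 bytes each) into 4 byte
// streams laid out back to back: all byte 0s, then all byte 1s, and so on.
func encodeByteStreamSplit(values []float32) []byte {
	n := len(values)
	out := make([]byte, 4*n)
	for i, v := range values {
		var raw [4]byte
		binary.LittleEndian.PutUint32(raw[:], math.Float32bits(v))
		for k := 0; k < 4; k++ {
			out[k*n+i] = raw[k]
		}
	}
	return out
}

// decodeByteStreamSplit reverses the transform, gathering byte k of value i
// from position k*n+i.
func decodeByteStreamSplit(data []byte) []float32 {
	n := len(data) / 4
	values := make([]float32, n)
	for i := 0; i < n; i++ {
		var raw [4]byte
		for k := 0; k < 4; k++ {
			raw[k] = data[k*n+i]
		}
		values[i] = math.Float32frombits(binary.LittleEndian.Uint32(raw[:]))
	}
	return values
}

func main() {
	in := []float32{1.5, 2.25, -3.75}
	fmt.Println(decodeByteStreamSplit(encodeByteStreamSplit(in))) // [1.5 2.25 -3.75]
}
```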
This is to resolve [ARROW-4226](https://issues.apache.org/jira/browse/ARROW-4226).

Closes apache#5716 from rok/ARROW-4226 and squashes the following commits:

9ca93ab <Rok> Implementing review feedback.
1b922f6 <Rok> Implementing review feedback.
11b81bb <Rok> Factoring out index incrementing for dense to COO and CSF indices.
6f4f4a8 <Rok> Implementing feedback review.
28d38cb <Rok> Removing backslashes from comments.
3291abc <Rok> Marking indptrBuffers, indicesBuffers and axisOrder required.
d9ff47e <Rok> Further work and implementing review feedback.
24a831f <Rok> Style.
4f2bf00 <Rok> Work on CSF index tests.
6ceb406 <Rok> Implementing review feedback.
bd0d8c2 <Rok> Dense to sparse CSF conversion now in order of dimension size.
eb51947 <Rok> Switching SparseCSFIndex to '2D' data structure.
a322ff5 <Rok> Adding tests for multiple index value types for SparseCSFIndex.
f44d92c <Rok> Adding SparseCSFIndex::Make.
7d17995 <Rok> Adding Tensor to SparseCSFTensor conversion.
05a47a5 <Rok> Using axis_order in CSF.
6b938f7 <Rok> Documentation.
2d10104 <Rok> WIP

Authored-by: Rok <rok@mihevc.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
* Add an [Obsolete] attribute to `BinaryArray.GetValueOffset`
  * `ListArray.GetValueOffset` already has the [Obsolete] attribute, so it is not changed
* Avoid using `GetValueOffset` in the product source code

As a precaution, I added tests for `ValueOffsets` and left tests for `GetValueOffset`.

Closes apache#6333 from HashidaTKS/ARROW-7514_make_getvalueoffset_obsolete and squashes the following commits:

1dbaf39 <Takashi Hashida> ARROW-7514_make_getvalueoffset_obsolete
92b14c0 <Takashi Hashida> ARROW-7514_make_getvalueoffset_obsolete
07d106c <Takashi Hashida> ARROW-7514_make_getvalueoffset_obsolete

Authored-by: Takashi Hashida <t-hashida@amiya.co.jp>
Signed-off-by: Eric Erhardt <eric.erhardt@microsoft.com>
@github-actions

github-actions bot commented Feb 6, 2020

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename the pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@@ -60,15 +61,28 @@ func (b *Buffer) Release() {
}
}

// Reset resets the buffer for reuse.
func (b *Buffer) Reset(buf []byte) {
Contributor

This method leaks memory when the original buffer comes from b.mem.

Contributor Author

@fsaintjacques Should I call b.mem.Free() on the original buf?

Contributor Author

if buf != b.buf {
    b.mem.Free(b.buf)
}
b.buf = buf
b.length = len(buf)

The special case is there because one of the use cases is "resetting" it with the same buf.
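To make the suggestion above concrete, here is a hedged sketch of a leak-free `Reset`, reusing the illustrative `Buffer`/`Allocator` types from the sketch in the PR description rather than the real arrow/go types. Go slices cannot be compared with `!=` (only against `nil`), so the "same buf" special case is expressed as an identity check on the backing array; the `sameBacking` helper is made up for this example.

```go
// sameBacking reports whether two non-empty slices start at the same
// backing-array element, i.e. the caller passed the same buf back in.
func sameBacking(a, b []byte) bool {
	return len(a) > 0 && len(b) > 0 && &a[0] == &b[0]
}

// Reset reuses the Buffer for buf, freeing the previously owned slice
// through the allocator unless it is the same one, so it is not leaked.
func (b *Buffer) Reset(buf []byte) {
	if !sameBacking(buf, b.buf) {
		b.mem.Free(b.buf)
	}
	b.buf = buf
	b.length = len(buf)
}
```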

@wesm
Member

wesm commented Feb 13, 2020

The rebase here seems to have not gone well

@pitrou pitrou changed the title Add Reset method to buffer and cleanup comments on component. [Rust] Add Reset method to buffer and cleanup comments on component. Feb 13, 2020
@richardartoul
Contributor Author

Closed in favor of #6430
