Added trim function #7394

Closed
wants to merge 16 commits into from

sgnkc
Contributor

@sgnkc sgnkc commented Jun 10, 2020

No description provided.

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

wesm and others added 15 commits June 11, 2020 11:07
Closes apache#7390 from wesm/ARROW-9085

Authored-by: Wes McKinney <wesm@apache.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
This patch removes the "restore_trailing_bits" template parameter of the
TransferBitmap class. Trailing bits are now never clobbered, which does
no harm. It also refines trailing-bit processing to keep the performance
impact negligible. In addition, this patch replaces the "invert_bits"
boolean parameter with an enum to allow explicit naming.
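
To illustrate the two ideas in this message (an enum instead of a boolean flag, and a transfer that never clobbers trailing bits), here is a minimal Python sketch. It is not the C++ implementation: it works bit by bit rather than word by word, and the names `TransferMode` and `transfer_bitmap` are hypothetical.

```python
from enum import Enum


class TransferMode(Enum):
    """Hypothetical stand-in for the enum that replaces the boolean flag."""
    COPY = "copy"
    INVERT = "invert"


def get_bit(data, i):
    # Arrow bitmaps are LSB-first within each byte.
    return (data[i // 8] >> (i % 8)) & 1


def transfer_bitmap(mode, src, src_offset, length, dest, dest_offset=0):
    """Copy (or invert) `length` bits from src into dest, one bit at a time.

    Only the addressed bits of dest are written; every other bit in dest,
    including the trailing bits of the last touched byte, keeps its value.
    """
    for i in range(length):
        bit = get_bit(src, src_offset + i)
        if mode is TransferMode.INVERT:
            bit ^= 1
        pos = dest_offset + i
        mask = 1 << (pos % 8)
        if bit:
            dest[pos // 8] |= mask
        else:
            dest[pos // 8] &= ~mask & 0xFF
    return dest


src = bytes([0b10110101])
dest = bytearray([0b11111111])
# Invert the first 5 bits of src into dest; bits 5..7 of dest stay set.
transfer_bitmap(TransferMode.INVERT, src, 0, 5, dest)
```

A call site that reads `TransferMode.INVERT` states its intent, whereas a bare `True` in the same position does not, which is the "explicit naming" benefit mentioned above.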

Closes apache#7373 from cyb70289/bitmap-transfer

Authored-by: Yibo Cai <yibo.cai@arm.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…ion and writing of _metadata

In addition to https://issues.apache.org/jira/browse/ARROW-3154, this also closes https://issues.apache.org/jira/browse/ARROW-3275

And while going through the parquet docs, also clarified the `coerce_timestamps` default.

Closes apache#7348 from jorisvandenbossche/ARROW-3154-parquet-metadata-docs

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
The child array of a ListArray of nulls should have length 0.
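
As a rough picture of the invariant, the sketch below describes an all-null list array as plain Python data. The function name and the dict layout are hypothetical and only mirror the Arrow list layout (validity bitmap, `length + 1` offsets, child values); this is not the patched C++ code.

```python
def make_null_list_array(length):
    """Describe a list array of `length` null slots as plain Python data."""
    return {
        "length": length,
        "null_count": length,
        # One validity bit per slot, all cleared, padded to whole bytes.
        "validity": bytes((length + 7) // 8),
        # length + 1 offsets, all zero: every slot spans an empty range.
        "offsets": [0] * (length + 1),
        # No slot references any child value, so the child is empty.
        "child": [],
    }


arr = make_null_list_array(3)
```

The last offset equals the number of child values in use, so with all-zero offsets a child of length 0 is the consistent choice.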

Closes apache#7386 from brills/make_null_array_fix

Lead-authored-by: Zhuo Peng <1835738+brills@users.noreply.github.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
With this patch, Arrow can pass kerberos_ticket and extra_conf when connecting to HDFS.

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

Closes apache#7393 from zhouyuan/wip_hdfs_auth

Lead-authored-by: Yuan Zhou <yuan.zhou@intel.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
As the name of the class implies, only directories should be considered for partition values.

Closes apache#7377 from fsaintjacques/ARROW-8726-incorrect-part

Authored-by: François Saint-Jacques <fsaintjacques@gmail.com>
Signed-off-by: François Saint-Jacques <fsaintjacques@gmail.com>
…lders

This PR improves upon the functionality added in apache#7032 by replacing nested usage of `BooleanArray.Builder` with a dedicated class `ArrowBuffer.BitPackedBuilder` that works at buffer level.

Array builders then use an `ArrowBuffer.BitPackedBuilder` for their validity map, and `BooleanArray.Builder` uses two (one for values, one for validity).
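
The arrangement described above can be sketched in Python as follows. This is an illustration only, not the C# code from the PR: the class names are borrowed loosely, and the methods (`append`, `append_null`, `build`, `null_count`) are assumptions made for the example.

```python
class BitPackedBuilder:
    """Hypothetical Python analogue of a buffer-level bit-packed builder."""

    def __init__(self):
        self._bytes = bytearray()
        self.length = 0       # number of bits appended so far
        self.set_count = 0    # number of appended bits equal to 1

    def append(self, value):
        if self.length % 8 == 0:
            self._bytes.append(0)  # start a new byte when the last is full
        if value:
            # Bits are packed LSB-first within each byte.
            self._bytes[self.length // 8] |= 1 << (self.length % 8)
            self.set_count += 1
        self.length += 1
        return self

    def build(self):
        return bytes(self._bytes)


class BooleanArrayBuilder:
    """Uses two bit-packed builders: one for values, one for validity."""

    def __init__(self):
        self.values = BitPackedBuilder()
        self.validity = BitPackedBuilder()

    def append(self, value):
        self.values.append(bool(value))
        self.validity.append(True)
        return self

    def append_null(self):
        # A null still occupies a value slot; only the validity bit is 0.
        self.values.append(False)
        self.validity.append(False)
        return self

    @property
    def null_count(self):
        return self.validity.length - self.validity.set_count


builder = BooleanArrayBuilder().append(True).append_null().append(False).append(True)
```

Working at buffer level means the validity map is just packed bits plus two counters, with no nested array builder involved.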

Closes apache#7158 from mr-smidge/arrow-6603-bit-packed-builder-for-nulls

Authored-by: Adam Szmigin <adam.szmigin@jetstoneam.com>
Signed-off-by: Eric Erhardt <eric.erhardt@microsoft.com>
Was curious how easy this would be and what issues I'd run into. So far I've done `sum()` and `mean()`, and I hit some roadblocks in doing min/max.

Issues:

* See workarounds in [sum.Array](https://github.com/apache/arrow/compare/master...nealrichardson:r-sum?expand=1#diff-695287341c6d5011a12d7d9bd3ae07adR350-R357). (1) missing/null values are always dropped, while R provides an option for how to treat missingness. Interestingly, it looks like min and max do have this null option supported (in `MinMaxOptions`) but sum and mean do not (yet).([ARROW-9054](https://issues.apache.org/jira/browse/ARROW-9054))
* (2) There is no sum method for boolean type ([ARROW-9055](https://issues.apache.org/jira/browse/ARROW-9055))
* There is no sum implemented for Scalars, which may be fine (I would expect it to be no-op for numeric types, just thought logically it maybe shouldn't error) ([ARROW-9056](https://issues.apache.org/jira/browse/ARROW-9056))
* For min and max, the path is to call the minmax kernel, which returns a struct containing min and max fields, and then extract the appropriate field from the struct. However, StructScalar doesn't seem to have field accessing methods like StructArray has ([ARROW-8769](https://issues.apache.org/jira/browse/ARROW-8769))
* I also ran into the fact that MakeArrayFromScalar can't handle structs (previously ticketed as [ARROW-6604](https://issues.apache.org/jira/browse/ARROW-6604)), which means I can't convert a StructScalar to an R object (data.frame). That is not required for these kernel bindings, though, if ARROW-9070 happens.

Also:

* Summing integers seems to promote to return int64 if given int32 (I didn't try with smaller ints), even when overflow is not a danger (I was adding numbers 1 to 5). It would be nice if it returned the same type it got unless it has to go bigger to avoid overflow.
* Given how flexible it is to use CallFunction on a new function without having to write any additional C++ binding, it was surprising/frustrating that in order to call minmax, I had to first add boilerplate to instantiate default MinMaxOptions https://github.com/apache/arrow/pull/7308/files#diff-ffc5e6f7dfee7f9bed7733b7fb41e632R163-R167. I would expect that kernels would be able to instantiate their own default options if called with no options. ([ARROW-9091](https://issues.apache.org/jira/browse/ARROW-9091))

Closes apache#7308 from nealrichardson/r-sum

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…C enabled

Closes apache#7165 from kszucs/ARROW-8785

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
In the past, we ran integration tests from main methods; recently, we changed this to run them with the failsafe plugin.

This is a good change, but it also led to a significant performance degradation. Previously, it took about 10s to run ITTestLargeVector#testLargeDecimalVector; now it takes more than half an hour.

Our investigation shows that the problem was caused by calling HistoricalLog#recordEvent repeatedly. This method is called only when BaseAllocator#DEBUG is enabled. In a unit/integration test, the flag is enabled by default.

We solve the problem with the following steps:
1. We set a system property to disable the BaseAllocator#DEBUG flag.
2. We change the logic so that the system property takes precedence over the AssertionUtil#isAssertionsEnabled method.

This makes the integration tests as fast as before.
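
The precedence rule in step 2 can be sketched in Python as below. This is an analogue of the logic, not the Java change itself: `properties` is a plain dict standing in for system properties, and the key name `allocator.debug` is hypothetical.

```python
def resolve_debug_flag(properties, assertions_enabled):
    """Return the effective debug setting.

    An explicit property value wins; the assertion status is only a
    fallback when the property is absent.
    """
    raw = properties.get("allocator.debug")  # hypothetical key name
    if raw is not None:
        # An explicit setting takes precedence over the assertion status.
        return raw.strip().lower() == "true"
    # No explicit setting: fall back to whether assertions are enabled.
    return assertions_enabled
```

With this ordering, a test run that enables assertions can still switch the expensive debug bookkeeping off by setting the property to `false`.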

Closes apache#7271 from liyafan82/fly_0526_reg

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…ly exposed

This fixes a few things:
- Sending/receiving text headers in Java
- Iterating over all headers in Java (binary ones used to be filtered out)
- Receiving binary headers in Python

Closes apache#7224 from lidavidm/arrow-8858

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…ion tests

This addresses the integration test error where Java cannot process duplicate field names.

This extends StructVector and VectorSchemaRoot to support duplicate field names.

Closes apache#7289 from rymurr/ARROW-8948

Authored-by: Ryan Murray <rymurr@dremio.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
A Java service, on a failed unary-unary call, will not send separate headers and trailers, but will instead consolidate the headers into the trailers. C++ clients should therefore check both headers and trailers, or they may miss headers that the Java server intended for clients.
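
A minimal sketch of the client-side idea, in Python rather than C++: treat headers and trailers as one source of metadata. The function name and the sample key `x-middleware` are hypothetical; metadata is modelled as lists of `(key, value)` pairs.

```python
def collect_metadata(headers, trailers):
    """Merge header and trailer metadata into one multi-valued mapping.

    Both arguments are iterables of (key, value) pairs; either may be empty.
    """
    merged = {}
    for key, value in list(headers) + list(trailers):
        # Keys are compared case-insensitively; values keep arrival order.
        merged.setdefault(key.lower(), []).append(value)
    return merged


# On a failed call, everything may arrive in the trailers only.
failed_call = collect_metadata(
    headers=[],
    trailers=[("x-middleware", "expected value"), ("grpc-status", "2")],
)
```

Middleware that looks up `x-middleware` in the merged mapping finds it whether the server sent it as a header or folded it into the trailers.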

Closes apache#7174 from lidavidm/flight-integration-middleware

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Closes apache#7391 from kou/homebrew-enable-gandiva

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
… LLVM 9 until gandiva-decimal-test is fixed

This unblocks LLVM 9 users from running the test suite. Whoever merges this, please leave the JIRA issue open.

Closes apache#7396 from wesm/ARROW-9092-triage

Authored-by: Wes McKinney <wesm@apache.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@wesm
Member

wesm commented Jun 12, 2020

superseded by #7402

@wesm wesm closed this Jun 12, 2020
@sgnkc sgnkc deleted the trim_gandiva branch August 6, 2020 06:33