Added trim function #7394
Closed
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename the pull request title in the following format?
See also:
Closes apache#7390 from wesm/ARROW-9085 Authored-by: Wes McKinney <wesm@apache.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
This patch removes the "restore_trailing_bits" template parameter of the TransferBitmap class. Trailing bits are now never clobbered, which does no harm. It also refines trailing-bit processing to keep the performance impact negligible. In addition, this patch replaces the "invert_bits" boolean parameter with an enum to allow explicit naming. Closes apache#7373 from cyb70289/bitmap-transfer Authored-by: Yibo Cai <yibo.cai@arm.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
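The behaviour described in this commit message can be illustrated with a minimal Python sketch. It is not the Arrow C++ implementation: `TransferMode`, `get_bit`, and `transfer_bitmap` are hypothetical names, and the bit-by-bit loop stands in for whatever word-wise processing the real code uses. It only shows the two ideas named above: an enum instead of a boolean for inversion, and destination bits beyond the transferred range being left untouched.

```python
from enum import Enum


class TransferMode(Enum):
    # Hypothetical names; the commit only says a boolean became an enum.
    COPY = "copy"
    INVERT = "invert"


def get_bit(buf, i):
    """Return bit i of buf, least-significant bit first within each byte."""
    return (buf[i // 8] >> (i % 8)) & 1


def transfer_bitmap(mode, src, src_offset, length, dest, dest_offset=0):
    """Write `length` bits from src into dest, leaving all other dest bits intact."""
    for i in range(length):
        bit = get_bit(src, src_offset + i)
        if mode is TransferMode.INVERT:
            bit ^= 1
        pos = dest_offset + i
        mask = 1 << (pos % 8)
        if bit:
            dest[pos // 8] |= mask
        else:
            # Clear only the targeted bit; neighbouring (trailing) bits survive.
            dest[pos // 8] &= ~mask & 0xFF
    return dest
```

For example, copying five bits into a destination pre-filled with ones changes only bits 0 to 4 of the first byte; the three trailing bits of that byte and the whole second byte keep their previous values.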
…ion and writing of _metadata In addition to https://issues.apache.org/jira/browse/ARROW-3154, this also closes https://issues.apache.org/jira/browse/ARROW-3275 And while going through the parquet docs, also clarified the `coerce_timestamps` default. Closes apache#7348 from jorisvandenbossche/ARROW-3154-parquet-metadata-docs Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
The child array of a ListArray of nulls should have length 0. Closes apache#7386 from brills/make_null_array_fix Lead-authored-by: Zhuo Peng <1835738+brills@users.noreply.github.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
With this patch Arrow can pass kerberos_ticket and extra_conf when connecting to HDFS. Signed-off-by: Yuan Zhou <yuan.zhou@intel.com> Closes apache#7393 from zhouyuan/wip_hdfs_auth Lead-authored-by: Yuan Zhou <yuan.zhou@intel.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
As the name of the class implies, only directories should be considered for partition values. Closes apache#7377 from fsaintjacques/ARROW-8726-incorrect-part Authored-by: François Saint-Jacques <fsaintjacques@gmail.com> Signed-off-by: François Saint-Jacques <fsaintjacques@gmail.com>
…lders This PR improves upon the functionality added in apache#7032 by replacing nested usage of `BooleanArray.Builder` with a dedicated class `ArrowBuffer.BitPackedBuilder` that works at buffer level. Array builders then use an `ArrowBuffer.BitPackedBuilder` for their validity map, and `BooleanArray.Builder` uses two (one for values, one for validity). Closes apache#7158 from mr-smidge/arrow-6603-bit-packed-builder-for-nulls Authored-by: Adam Szmigin <adam.szmigin@jetstoneam.com> Signed-off-by: Eric Erhardt <eric.erhardt@microsoft.com>
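As a rough illustration of the design described in this commit message, the following Python sketch shows a buffer-level bit-packed builder and a boolean builder that owns two of them. The real classes are C#; the Python classes here only borrow the name `BitPackedBuilder`, and `BooleanBuilder` and all method names are assumptions made for the example.

```python
class BitPackedBuilder:
    """Packs appended booleans into bytes, least-significant bit first."""

    def __init__(self):
        self._bytes = bytearray()
        self.length = 0  # number of bits appended so far

    def append(self, bit):
        if self.length % 8 == 0:
            self._bytes.append(0)  # start a new byte every eight bits
        if bit:
            self._bytes[-1] |= 1 << (self.length % 8)
        self.length += 1
        return self

    def build(self):
        return bytes(self._bytes)


class BooleanBuilder:
    """Uses two bit-packed builders: one for values, one for validity."""

    def __init__(self):
        self.values = BitPackedBuilder()
        self.validity = BitPackedBuilder()

    def append(self, value):
        # A null (None) stores a False value bit and a cleared validity bit.
        self.values.append(bool(value))
        self.validity.append(value is not None)
        return self
```

Appending `True, None, False, True` yields a values byte with bits 0 and 3 set and a validity byte with bits 0, 2, and 3 set.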
Was curious how easy this would be and what issues I'd run into. So far I've done `sum()` and `mean()`, and I hit some roadblocks in doing min/max.

Issues:

* See workarounds in [sum.Array](https://github.com/apache/arrow/compare/master...nealrichardson:r-sum?expand=1#diff-695287341c6d5011a12d7d9bd3ae07adR350-R357). (1) missing/null values are always dropped, while R provides an option for how to treat missingness. Interestingly, it looks like min and max do have this null option supported (in `MinMaxOptions`) but sum and mean do not (yet). ([ARROW-9054](https://issues.apache.org/jira/browse/ARROW-9054))
* (2) There is no sum method for boolean type ([ARROW-9055](https://issues.apache.org/jira/browse/ARROW-9055))
* There is no sum implemented for Scalars, which may be fine (I would expect it to be no-op for numeric types, just thought logically it maybe shouldn't error) ([ARROW-9056](https://issues.apache.org/jira/browse/ARROW-9056))
* For min and max, the path is to call the minmax kernel, which returns a struct containing min and max fields, and then extract the appropriate field from the struct. However, StructScalar doesn't seem to have field accessing methods like StructArray has ([ARROW-8769](https://issues.apache.org/jira/browse/ARROW-8769))
* I also ran into the fact that MakeArrayFromScalar can't handle structs (previously ticketed as [ARROW-6604](https://issues.apache.org/jira/browse/ARROW-6604)), which means I can't convert a StructScalar to an R object (data.frame). Not required for these kernel bindings though if ARROW-9070 happens.

Also:

* Summing integers seems to promote to return int64 if given int32 (I didn't try with smaller ints), even when overflow is not a danger (I was adding numbers 1 to 5). It would be nice if it returned the same type it got unless it has to go bigger to avoid overflow.
* Given how flexible it is to use CallFunction on a new function without having to write any additional C++ binding, it was surprising/frustrating that in order to call minmax, I had to first add boilerplate to instantiate default MinMaxOptions https://github.com/apache/arrow/pull/7308/files#diff-ffc5e6f7dfee7f9bed7733b7fb41e632R163-R167. I would expect that kernels would be able to instantiate their own default options if called with no options. ([ARROW-9091](https://issues.apache.org/jira/browse/ARROW-9091))

Closes apache#7308 from nealrichardson/r-sum Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…C enabled Closes apache#7165 from kszucs/ARROW-8785 Lead-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
In the past, we ran integration tests from main methods; recently, we changed this to run them through the failsafe plugin. This is a good change, but it also leads to significant performance degradation. Previously, it took about 10s to run ITTestLargeVector#testLargeDecimalVector; now it takes more than half an hour. Our investigation shows that the problem is caused by calling HistoricalLog#recordEvent repeatedly. This method is called only when BaseAllocator#DEBUG is enabled. In a unit/integration test, the flag is enabled by default. We solve the problem with the following steps:

1. We set a system property to disable the BaseAllocator#DEBUG flag.
2. We change the logic so that the system property takes precedence over the AssertionUtil#isAssertionsEnabled method.

This makes the integration tests as fast as before.

Closes apache#7271 from liyafan82/fly_0526_reg Authored-by: liyafan82 <fan_li_ya@foxmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
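The precedence rule in step 2 can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the Java code: the function name `debug_enabled` and the property key `allocator.debug` are placeholders invented for the example, and a plain dictionary stands in for Java system properties.

```python
def debug_enabled(properties, assertions_enabled, key="allocator.debug"):
    """Decide whether debug bookkeeping is on.

    An explicitly set property always wins; the assertion flag is only
    consulted as a fallback when the property is absent.
    """
    value = properties.get(key)
    if value is not None:
        return value.strip().lower() == "true"
    return assertions_enabled
```

With this ordering, a test run that has assertions enabled can still switch the expensive debug path off by setting the property to `false`.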
…ly exposed This fixes a few things:

- Sending/receiving text headers in Java
- Iterating over all headers in Java (binary ones used to be filtered out)
- Receiving binary headers in Python

Closes apache#7224 from lidavidm/arrow-8858 Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…ion tests This addresses the integration test error where Java cannot process duplicate field names. This extends StructVector and VectorSchemaRoot to support duplicate field names. Closes apache#7289 from rymurr/ARROW-8948 Authored-by: Ryan Murray <rymurr@dremio.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
A Java service, on a failed unary-unary call, will not send separate headers and trailers, but will instead consolidate headers into the trailers. So C++ clients should check both headers and trailers, or they may miss headers that the Java server intended for clients. Closes apache#7174 from lidavidm/flight-integration-middleware Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
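The client-side rule above, looking in both places, might be sketched as follows. This is a language-neutral illustration in Python rather than the C++ Flight client code; `collect_metadata` and the `x-middleware` key are hypothetical, and metadata is modelled as plain lists of key/value pairs.

```python
def collect_metadata(headers, trailers):
    """Merge initial headers and trailers into one case-insensitive lookup.

    On a failed unary call the server may have folded its headers into the
    trailers, so both sequences are scanned.
    """
    merged = {}
    for key, value in list(headers) + list(trailers):
        merged.setdefault(key.lower(), []).append(value)
    return merged
```

A client that reads `collect_metadata(headers, trailers)` sees the same middleware value whether the server sent it as a header or consolidated it into the trailers.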
Closes apache#7391 from kou/homebrew-enable-gandiva Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
… LLVM 9 until gandiva-decimal-test is fixed This unblocks LLVM 9 users from running the test suite. Whoever merges this, please leave the JIRA issue open. Closes apache#7396 from wesm/ARROW-9092-triage Authored-by: Wes McKinney <wesm@apache.org> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Superseded by #7402.
No description provided.