
ARROW-8883: [Rust] [Integration] Enable more tests #8200

Closed

Conversation


nevi-me (Contributor) commented Sep 16, 2020

This enables more integration tests, and makes the necessary changes to make them pass.


Status

Primitive Arrays

These are passing, but I have questions on whether the way I'm comparing them is fine.
Specifically, when reading an array that has 3 values, the `Buffer` is returned aligned to 8 bytes, such that an `i8`'s buffer would have `{ data: [v1, v2, v3, 0, 0, 0, 0, 0] }`.

This causes problems, because when we reconstruct the above from JSON, we would have `{ data: [v1, v2, v3] }`.
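
To make the mismatch concrete, here's a minimal sketch (plain slices, not the real arrow types) of why a byte-for-byte comparison fails while a length-aware one succeeds:

```rust
fn main() {
    // An IPC-read buffer is padded to an 8-byte boundary...
    let from_ipc: &[i8] = &[1, 2, 3, 0, 0, 0, 0, 0];
    // ...while a JSON-reconstructed buffer holds exactly the 3 values.
    let from_json: &[i8] = &[1, 2, 3];

    let len = 3; // the logical length carried by the array
    assert_ne!(from_ipc, from_json); // raw contents differ because of padding
    assert_eq!(&from_ipc[..len], &from_json[..len]); // first `len` values match
}
```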

I have implemented `PartialEq` manually for `ArrayData` at 261c021. I'd appreciate some feedback on whether the approach is fine. I deliberately used `assert_eq!(a, b, "helpful message")` to make it a bit easier to track down the differences between objects being compared, as it's otherwise very difficult and tedious without some `log!()` implementation.
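
A hedged sketch of that assert-based style, with a simplified stand-in for `ArrayData` (the struct and its fields here are illustrative, not the real type):

```rust
#[derive(Debug)]
struct SimpleArrayData {
    len: usize,
    null_count: usize,
}

// Each assertion carries its own message, so a failure names the exact
// property that differs instead of dumping two opaque structs.
fn assert_data_eq(a: &SimpleArrayData, b: &SimpleArrayData) {
    assert_eq!(a.len, b.len, "lengths differ");
    assert_eq!(a.null_count, b.null_count, "null counts differ");
}

fn main() {
    let a = SimpleArrayData { len: 3, null_count: 0 };
    let b = SimpleArrayData { len: 3, null_count: 0 };
    assert_data_eq(&a, &b);
}
```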

@jhorstmann's work on bit slicing has been instrumental in allowing us to compare `ArrayData`, so I'm leveraging that for buffer comparisons. However, slicing buffers with offsets can leave two buffers with different capacities. _I don't think this is a bug_.

Due to the above, I've also had to compare `Buffer` by length, and not capacity, at 69f4ef3. @sunchao @jorgecarleitao PTAL and advise. I made the change so tests could pass, but perhaps we could agree on what constitutes equal buffers (whether we should compare capacity).
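
As a sketch of what length-based equality could look like, assuming a simplified `Buffer` (not arrow's actual type):

```rust
#[derive(Debug)]
struct SimpleBuffer {
    data: Vec<u8>, // data.len() plays the role of capacity here
    len: usize,    // logical length; may be smaller than data.len() due to padding
}

impl PartialEq for SimpleBuffer {
    fn eq(&self, other: &Self) -> bool {
        // Capacity is an allocation detail: buffers are equal when their
        // logical lengths and visible bytes match.
        self.len == other.len && self.data[..self.len] == other.data[..other.len]
    }
}

fn main() {
    let a = SimpleBuffer { data: vec![1, 2, 3, 0, 0, 0, 0, 0], len: 3 };
    let b = SimpleBuffer { data: vec![1, 2, 3], len: 3 };
    assert!(a == b); // equal despite different "capacities"
}
```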

I'd prefer to make the above change as a separate PR, so I can address offsets and also add tests for coverage.

Complex Types

Lists and structs currently fail because we can't carry custom names and nullability. For example, if a list's child is itself a list, that child could be named anything (I've seen `item` or `list`). We currently have no way of encoding this information, so the tests fail on schema equality.
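
A minimal sketch of the failure mode, using simplified stand-ins for `Field` and `DataType` (the real types differ):

```rust
#[derive(Debug, PartialEq)]
enum SimpleType {
    Int32,
    List(Box<SimpleField>), // the child field carries its own name/nullability
}

#[derive(Debug, PartialEq)]
struct SimpleField {
    name: String,
    data_type: SimpleType,
    nullable: bool,
}

fn list_of_int32(child_name: &str) -> SimpleField {
    SimpleField {
        name: "col".to_string(),
        data_type: SimpleType::List(Box::new(SimpleField {
            name: child_name.to_string(),
            data_type: SimpleType::Int32,
            nullable: true,
        })),
        nullable: true,
    }
}

fn main() {
    // Logically both are List<Int32>, but different implementations pick
    // different child names, so schema equality fails on the name alone.
    assert_ne!(list_of_int32("item"), list_of_int32("list"));
}
```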

I opened https://issues.apache.org/jira/browse/ARROW-10261 to address this. I'll open a PR on the branch that's soon to be renamed from `master` 🤭, as this will affect Parquet and DataFusion.

Dictionaries

We seem to have covered a fair bit of dictionary support, but because we can't write dictionaries yet, we can't complete the integration testing.
I've opened https://issues.apache.org/jira/browse/ARROW-10268 for any interested takers. I think it should be fairly trivial to implement, but I'm already trying to do too much.
@carols10cents this builds on the work that you did for dictionary support in `arrow::ipc::conversion`, so thanks for that 😃

Metadata

We fail on some metadata tests because `arrow::datatypes::Field` doesn't support metadata.
This one's a bit difficult to implement, as `HashMap<String, String>` doesn't support `#[derive(Hash)]` and other requirements.
I was thinking that maybe we could use `Vec<(String, String)>` and handle deduplication of entries, and JSON-compatible serde, internally.
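
A hedged sketch of that idea (illustrative types, not a committed design): keep the entries sorted and unique so the containing struct can still derive `Hash` and `Eq`:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct FieldMetadata {
    entries: Vec<(String, String)>, // sorted by key, keys unique
}

impl FieldMetadata {
    fn from_map(map: HashMap<String, String>) -> Self {
        let mut entries: Vec<_> = map.into_iter().collect();
        // A canonical order makes Hash/Eq stable across insertion orders.
        entries.sort();
        FieldMetadata { entries }
    }
}

fn main() {
    let mut m = HashMap::new();
    m.insert("origin".to_string(), "ipc".to_string());
    let meta = FieldMetadata::from_map(m);
    assert_eq!(meta.entries.len(), 1);
}
```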

I opened https://issues.apache.org/jira/browse/ARROW-10259 for the above.

Quicker Testing

One of the reasons I've taken long to work on this is that I struggled a bit with running the integration tests (at the least, I had to go into a few directories and build things). `docker-compose --file docker-compose.yml run --rm conda-integration` took long, and I was getting sporadic failures in Rust.

I've temporarily created a script called `integration-testing.sh`, so that whoever pitches in to help can use it to quickly build the C++, Java and Rust binaries for testing.

@pitrou @nealrichardson I'll remove the script and the `.dockerignore` entry before merging this.

Lastly, Antoine, seeing that we can't complete this in time for 2.0.0, I'm fine with leaving the implementation status fields for Rust blank for a few more months.

Apologies for the "CC the world" update, but I think this is important enough for us to get more eyes on it.


pitrou commented Sep 25, 2020

Remember to update the implementation status when done ;-)

@nevi-me nevi-me force-pushed the rust-integration-more-tests-sep2020 branch 2 times, most recently from 63c6e4c to 24d274a Compare October 10, 2020 02:59
jorgecarleitao (Member) commented

Thanks a lot for driving this and CCing. This is definitely important.

I have hit that capacity problem multiple times myself! Once when I was trying to simplify `equal.rs`, for the reason you wrote about (buffers compare as different when their capacities differ); the second time on #8401, since a buffer that receives data via the C data interface does not even know (or care) about its capacity.

As far as I can tell, we have two use cases for capacity at the moment:

  • deallocating the region when it is no longer used
  • computing the total size in bytes of arrays

Because arrays share buffers, the total size of an array is currently misleading. For example, when an array is computed from `is_not_null` of another array, both the null buffer (buffer 0) and the value buffer (buffer 1) share the same memory region, and thus IMO the total size computation based on capacity is incorrect. This is also true for complex structs in which buffers are shared within the same array.
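
A minimal sketch of that double counting, with `Rc` standing in for arrow's shared buffer handle:

```rust
use std::rc::Rc;

fn main() {
    // One 64-byte allocation...
    let region: Rc<Vec<u8>> = Rc::new(vec![0u8; 64]);
    // ...referenced by both the null buffer (buffer 0) and the value
    // buffer (buffer 1) of the same array.
    let null_buffer = Rc::clone(&region);
    let value_buffer = Rc::clone(&region);

    // Summing per-buffer sizes reports 128 bytes, but only 64 are allocated.
    let naive_total = null_buffer.len() + value_buffer.len();
    assert_eq!(naive_total, 128);
    assert_eq!(Rc::strong_count(&region), 3); // all three handles share one region
}
```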

Thus, IMO capacity's main use case is bookkeeping: how to deallocate the region. #8401 systematizes that idea: `BufferData` (renamed `Bytes` there) no longer has a capacity, but an enum describing how to deallocate itself.

Regardless, I would say that in the vast majority of cases where we compare buffers, we want to compare their contents, irrespective of capacity or how they should be deallocated. Therefore, I would be happy to have buffer comparison based on actual content (in bytes). We just need to be careful about bitmaps, where the comparison should be made in bits.
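
A small sketch of bit-level bitmap comparison: only the first `len` bits are significant, so trailing bits in the last byte must be ignored:

```rust
// Compare two validity bitmaps over their first `len_bits` bits only
// (LSB-first bit order, as in the Arrow format).
fn bitmaps_eq(a: &[u8], b: &[u8], len_bits: usize) -> bool {
    (0..len_bits).all(|i| {
        let bit = |buf: &[u8]| (buf[i / 8] >> (i % 8)) & 1;
        bit(a) == bit(b)
    })
}

fn main() {
    // The same 3 significant bits, different garbage beyond bit 2.
    assert!(bitmaps_eq(&[0b0000_0011], &[0b1111_0011], 3));
    // A genuine difference within the significant bits is caught.
    assert!(!bitmaps_eq(&[0b0000_0011], &[0b0000_0001], 3));
}
```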


nevi-me commented Oct 10, 2020

Thus, IMO capacity's main use case is bookkeeping: how to deallocate the region. #8401 systematizes that idea: `BufferData` (renamed `Bytes` there) no longer has a capacity, but an enum describing how to deallocate itself.

I agree with your thinking here, especially on equality and arrays with offsets. One of the things we don't test (extensively or at all) is whether we're able to write arrays with unaligned offsets to the IPC format. I'll put in time to work on this whenever I get to v5 of the Arrow format.

Regardless, I would say that in the vast majority of cases where we compare buffers, we want to compare their contents, irrespective of capacity or how they should be deallocated. Therefore, I would be happy to have buffer comparison based on actual content (in bytes). We just need to be careful about bitmaps, where the comparison should be made in bits.

I'd also prefer not to compare using buffer capacity, as I'm mainly interested in the buffer contents, especially when dealing with a frozen buffer where we won't be modifying contents.

On bitmaps, I'll have a look, but I think that the approach of using the bit iterators should cover this.

Apologies again for not having gotten to #8401 yet; I hope you're not in a rush to get it implemented (especially as we're now looking at 3.0 for all new work).

@nevi-me nevi-me force-pushed the rust-integration-more-tests-sep2020 branch from 832c187 to d103c3f Compare October 27, 2020 05:10
@nevi-me nevi-me marked this pull request as ready for review October 27, 2020 05:11
@jorgecarleitao (Member) left a comment

I did a quick run through and left some comments. I am still missing the large bin file.

So far, all changes make sense to me (apart from the comments).

Review threads (outdated, resolved) on:
rust/arrow/src/array/data.rs
rust/arrow/src/ipc/convert.rs
rust/arrow/src/util/integration_util.rs
@nevi-me nevi-me force-pushed the rust-integration-more-tests-sep2020 branch from d103c3f to e3b6e42 Compare October 29, 2020 06:20
@nevi-me nevi-me force-pushed the rust-integration-more-tests-sep2020 branch 2 times, most recently from 5ec1b21 to 4b5dc0d Compare November 7, 2020 12:37
nevi-me pushed a commit that referenced this pull request Nov 7, 2020
This is a major refactor of the `equal.rs` module.

The rationale for this change is manifold:

* Currently, array comparison requires downcasting the array ref to its concrete types. This is painful and not very ergonomic, as the user must "guess" what to downcast for comparison. We can see this in the hacks around the `sort`, `take` and `concatenate` kernels' tests, and some of the tests of the builders.
* The code in array comparison is difficult to follow, given the number of calls it performs around offsets.
* The implementation currently (indirectly) uses many of the `unsafe` APIs that we have (via pointer arithmetic), which makes it risky to operate on and mutate.
* Some code is repeated.

This PR:

1. Adds `impl PartialEq for dyn Array`, to allow `Array` comparison based on `Array::data` (main change)
2. Makes array equality depend only on `ArrayData`, i.e. it no longer depends on concrete array types (such as `PrimitiveArray` and related APIs) to perform comparisons.
3. Significantly reduces the risk of panics and UB when composite arrays are of different types, by checking the types on `range` comparison
4. Makes array equality statically dispatched, via `match datatype`.
5. DRYs the code around array equality
6. Fixes an error in equality of dictionaries with equal values
7. Adds tests for equalities that were not tested (fixed binary, some edge cases of dictionaries)
8. Splits `equal.rs` into smaller, more manageable files.
9. Removes `ArrayListOps`, since it is no longer needed
10. Moves JSON equality to its own module, for clarity.
11. Removes the need to have two functions per type to compare arrays.
12. Adds the number of buffers and their respective widths to datatypes from the specification. This was backported from #8401
13. Adds a benchmark for array equality
13. adds a benchmark for array equality

Note that this does not implement `PartialEq` for `ArrayData`, only `dyn Array`, as different data does not imply a different array (due to nullability). That implementation is being worked on in #8200.

IMO this PR significantly simplifies the code around array comparison, to the point where many implementations are 5 lines long.

This also improves performance by 10-40%.

<details>
 <summary>Benchmark results</summary>

```
Previous HEAD position was 3dd3c69 Added bench for equality.
Switched to branch 'equal'
Your branch is up to date with 'origin/equal'.
   Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 51.28s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/equal-176c3cb11360bd12
Gnuplot not found, using plotters backend
equal_512               time:   [36.861 ns 36.894 ns 36.934 ns]
                        change: [-43.752% -43.400% -43.005%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

equal_nulls_512         time:   [2.3271 us 2.3299 us 2.3331 us]
                        change: [-10.846% -9.0877% -7.7336%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

equal_string_512        time:   [49.219 ns 49.347 ns 49.517 ns]
                        change: [-30.789% -30.538% -30.235%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

equal_string_nulls_512  time:   [3.7873 us 3.7939 us 3.8013 us]
                        change: [-8.2944% -7.0636% -5.4266%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
```

</details>

All tests are there, plus new tests for some of the edge cases and untested arrays.

This change is backward incompatible: `array1.equals(&array2)` no longer works; use `array1 == array2` instead, which is the idiomatic way of comparing structs and trait objects in Rust.
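
As a hedged illustration of the `impl PartialEq for dyn Trait` pattern behind this (a toy trait, not arrow's `Array`):

```rust
trait Data {
    fn values(&self) -> &[i32];
}

// Equality is defined once, on the trait object, through the common data
// view, so callers never downcast to concrete types.
impl PartialEq for dyn Data {
    fn eq(&self, other: &Self) -> bool {
        self.values() == other.values()
    }
}

struct VecData(Vec<i32>);

impl Data for VecData {
    fn values(&self) -> &[i32] {
        &self.0
    }
}

fn main() {
    let a: Box<dyn Data> = Box::new(VecData(vec![1, 2, 3]));
    let b: Box<dyn Data> = Box::new(VecData(vec![1, 2, 3]));
    assert!(*a == *b); // idiomatic `==` on trait objects, no downcasting
}
```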

Closes #8541 from jorgecarleitao/equal

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
nevi-me and others added 10 commits November 10, 2020 08:09
Tested with Rust and Java, but there are Java failures

make ARROW_MAGIC and CONT_MARKER consts

remove unused import
We weren't able to read dictionaries because of legacy issues
expose Field's dictionary fields via methods

some progress on dictionary integration
The intention is not to commit this, but to share with other users who might be interested in helping out with the integration testing.
I found the docker-compose process too slow, and it was returning errors on Rust :(
This is to be more in line with integration testing behaviour
After slicing buffers, we can end up with 2 buffers having the same len and data, but not the same capacity.
Such buffers fail the equality test.

CC @sunchao @jorgecarleitao as you've worked on this.
Perhaps we can find a suitable definition of equality, as we're otherwise failing integration tests because of this.
We were only doing basic comparisons of array properties,
such as whether data types and lengths are the same.
This extends the comparison to the array data.

fix lint
remove my integration testing helper

rebase, fix failing test
(cherry picked from commit 03b8aa4)
@nevi-me nevi-me force-pushed the rust-integration-more-tests-sep2020 branch from 2e3763f to dde96c6 Compare November 10, 2020 06:26

nevi-me commented Nov 10, 2020

@jorgecarleitao @carols10cents I'm contemplating closing this and breaking it up into smaller PRs (if this round of changes doesn't pass integration). I'll work on it this week in my evenings (GMT+2) so I can get it out of the way and go back to working on Parquet.

jorgecarleitao (Member) commented

From what I have read in this PR, it looks good so far. Is there anything blocking it?


nevi-me commented Nov 10, 2020

From what I have read in this PR, it looks good so far. Is there anything blocking it?

@jorgecarleitao the actual integration tests not passing is the blocker 😅

There were 10 failures in the last run, all related to logical equality. It's actually odd, because in my local testing I get 120+ failures between Rust and the other languages (except C++, which passes).

I'll monitor https://github.com/apache/arrow/pull/8200/checks?check_run_id=1378263093 in the background; if it passes, we can merge this (if you don't have any outstanding queries/concerns).

I used your suggested implementation of logical equality (dde96c6)


nevi-me commented Nov 10, 2020

@jorgecarleitao we're down to 3 failures.

thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', arrow/src/array/equal/variable_size.rs:31:21

Looks like we're not comparing zero-length arrays correctly.
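
A hedged sketch of the shape of a fix (not the actual `variable_size.rs` code): short-circuit before touching buffers that may be empty:

```rust
// Compare variable-size array offsets for `len` elements; an empty array
// has nothing to index, so equality must not touch the buffers at all.
fn variable_size_eq(offsets_a: &[i32], offsets_b: &[i32], len: usize) -> bool {
    if len == 0 {
        return true; // indexing here would panic: "len is 0 but the index is 0"
    }
    offsets_a[..=len] == offsets_b[..=len]
}

fn main() {
    assert!(variable_size_eq(&[], &[], 0)); // panics without the guard
    assert!(variable_size_eq(&[0, 2, 5], &[0, 2, 5], 2));
}
```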


nevi-me commented Nov 10, 2020

@jorgecarleitao finally passed! There are still some tests that don't pass, but I've updated the status documentation with them (cc @andygrove, as you had opened a JIRA for this).

@nevi-me nevi-me closed this in 18b9281 Nov 10, 2020
yordan-pavlov pushed a commit to yordan-pavlov/arrow that referenced this pull request Nov 14, 2020
nevi-me pushed a commit that referenced this pull request Dec 3, 2020
…info

The purpose of this PR is refactoring `read_dictionary` to only need one kind of `Schema`, which lets us then remove the `find_dictionary_field` function and the `ipc_schema` field on `StreamReader` by adding a way to look up schema fields that use a particular dictionary by ID.

I'm also resubmitting a change to the `dict_id`/`dict_is_ordered` methods on `Field`; I had submitted this to @nevi-me to become part of #8200 but it looks like it got lost in a rebase or something? I think it's more correct to only return values if the fields have a dictionary as their datatype.
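
A hedged sketch (simplified types and names, not the PR's actual code) of looking up schema fields by dictionary ID:

```rust
#[derive(Debug)]
struct SimpleField {
    name: String,
    // Some(id) only when the field's data type is actually a dictionary.
    dict_id: Option<i64>,
}

// Find every schema field that sources its keys from dictionary `id`.
fn fields_with_dict_id(fields: &[SimpleField], id: i64) -> Vec<&SimpleField> {
    fields.iter().filter(|f| f.dict_id == Some(id)).collect()
}

fn main() {
    let schema = vec![
        SimpleField { name: "plain".to_string(), dict_id: None },
        SimpleField { name: "encoded".to_string(), dict_id: Some(42) },
    ];
    let hits = fields_with_dict_id(&schema, 42);
    assert_eq!(hits.len(), 1);
    assert_eq!(hits[0].name, "encoded");
}
```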

Closes #8820 from carols10cents/simplify-dictionary-field-lookup

Authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
@nevi-me nevi-me deleted the rust-integration-more-tests-sep2020 branch January 5, 2021 19:11
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021