[Release][Go] Verification tasks for the Release Candidate fail on Go parquet encryption tests #38345

Closed
raulcd opened this issue Oct 19, 2023 · 5 comments · Fixed by #38362 or #38367

@raulcd
Member

raulcd commented Oct 19, 2023

Describe the bug, including details regarding any error messages, version, and platform.

We have created RC 2 for the Apache Arrow 14.0.0 release, and release verification of the Go sources currently fails with the following error:

 /tmp/parquet-encryption-test-3157156869
--- FAIL: TestFileEncryptionDecryption (0.40s)
    --- FAIL: TestFileEncryptionDecryption/TestDecryption (0.34s)
        --- FAIL: TestFileEncryptionDecryption/TestDecryption/uniform_encryption.parquet.encrypted (0.07s)
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/uniform_encryption.parquet.encrypted/config_1 (0.00s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/uniform_encryption.parquet.encrypted/config_1
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/uniform_encryption.parquet.encrypted/config_3 (0.01s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/uniform_encryption.parquet.encrypted/config_3
        --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer.parquet.encrypted (0.06s)
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer.parquet.encrypted/config_1 (0.01s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer.parquet.encrypted/config_1
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer.parquet.encrypted/config_3 (0.01s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer.parquet.encrypted/config_3
        --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_plaintext_footer.parquet.encrypted (0.08s)
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_plaintext_footer.parquet.encrypted/config_1 (0.01s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_plaintext_footer.parquet.encrypted/config_1
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_plaintext_footer.parquet.encrypted/config_3 (0.00s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_plaintext_footer.parquet.encrypted/config_3
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_plaintext_footer.parquet.encrypted/config_4 (0.00s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_plaintext_footer.parquet.encrypted/config_4
        --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_aad.parquet.encrypted (0.06s)
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_aad.parquet.encrypted/config_1 (0.00s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_aad.parquet.encrypted/config_1
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_aad.parquet.encrypted/config_2 (0.00s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_aad.parquet.encrypted/config_2
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_aad.parquet.encrypted/config_3 (0.00s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_aad.parquet.encrypted/config_3
        --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted (0.02s)
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted/config_2 (0.00s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted/config_2
        --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_ctr.parquet.encrypted (0.05s)
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_ctr.parquet.encrypted/config_1 (0.00s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_ctr.parquet.encrypted/config_1
            --- FAIL: TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_ctr.parquet.encrypted/config_3 (0.01s)
                encryption_read_config_test.go:217: 
                    	Error Trace:	/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:217
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:406
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:430
                    	            				/tmp/arrow-14.0.0.Rm87k/apache-arrow-14.0.0/go/parquet/encryption_read_config_test.go:468
                    	            				/tmp/arrow-14.0.0.Rm87k/go/gopath/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
                    	Error:      	Not equal: 
                    	            	expected: int(0)
                    	            	actual  : int64(50)
                    	Test:       	TestFileEncryptionDecryption/TestDecryption/encrypt_columns_and_footer_ctr.parquet.encrypted/config_3
/tmp/parquet-encryption-test-3429813239
FAIL
FAIL	github.com/apache/arrow/go/v14/parquet	0.464s

This is reproducible locally with:

archery docker run  -e VERIFY_VERSION="14.0.0" -e VERIFY_RC="2" -e TEST_DEFAULT=0 -e TEST_GO=1 almalinux-verify-rc

We didn't notice this before because it doesn't fail when verifying non-official releases (local source):

archery docker run  -e VERIFY_VERSION="" -e VERIFY_RC="" -e TEST_DEFAULT=0 -e TEST_GO=1 almalinux-verify-rc

See error here: https://github.com/ursacomputing/crossbow/actions/runs/6572466559/job/17853633830
And workflow file: https://github.com/ursacomputing/crossbow/actions/runs/6572466559/workflow

Successful one on maintenance branch where we test from local: https://github.com/ursacomputing/crossbow/actions/runs/6572711355/job/17854324601
And workflow file: https://github.com/ursacomputing/crossbow/actions/runs/6572711355/workflow

Component(s)

Go, Release

@raulcd raulcd added the Type: bug and Priority: Blocker labels Oct 19, 2023
@raulcd raulcd added this to the 14.0.0 milestone Oct 19, 2023
@raulcd
Member Author

raulcd commented Oct 19, 2023

@zeroshade @kou I am investigating, but I am confused about why this fails only when we verify an official release candidate and passes when we run the verification tasks from a local checkout. As reported in the issue, this is reproducible locally with:
archery docker run -e VERIFY_VERSION="14.0.0" -e VERIFY_RC="2" -e TEST_DEFAULT=0 -e TEST_GO=1 almalinux-verify-rc
but it succeeds on the RC branch with:
archery docker run -e VERIFY_VERSION="" -e VERIFY_RC="" -e TEST_DEFAULT=0 -e TEST_GO=1 almalinux-verify-rc

@zeroshade
Member

I was able to reproduce the failure locally by manually and forcibly updating the parquet-testing submodule. It looks like a recent update there rewrote some (or all) of the files so that the boolean columns use RLE encoding instead of plain encoding, which the Go implementation doesn't seem to support at the moment. I'm going to take a quick look, because it shouldn't be too difficult to add and test, and then confirm that it works for the verification tests.
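
As an illustration of what the missing piece involves, here is a minimal Go sketch of decoding boolean values from Parquet's RLE/bit-packed hybrid format with a bit width of 1. It is not the Arrow Go implementation: `decodeRLEBools` is a hypothetical helper, and it assumes that any length prefix framing the encoded values in a data page has already been stripped.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeRLEBools (hypothetical helper, not part of Arrow) reads n booleans
// from an RLE/bit-packed hybrid stream whose bit width is 1.
func decodeRLEBools(buf []byte, n int) ([]bool, error) {
	out := make([]bool, 0, n)
	for len(out) < n {
		// Every run starts with a ULEB128 header; the low bit selects the run type.
		header, size := binary.Uvarint(buf)
		if size <= 0 {
			return nil, errors.New("truncated or invalid run header")
		}
		buf = buf[size:]

		if header&1 == 0 {
			// RLE run: header>>1 repetitions of a single value stored in one byte.
			count := header >> 1
			if count == 0 || len(buf) < 1 {
				return nil, errors.New("invalid RLE run")
			}
			v := buf[0]&1 == 1
			buf = buf[1:]
			for i := uint64(0); i < count && len(out) < n; i++ {
				out = append(out, v)
			}
		} else {
			// Bit-packed run: header>>1 groups of 8 values, one byte per group
			// at bit width 1, least significant bit first.
			groups := header >> 1
			if groups == 0 || groups > uint64(len(buf)) {
				return nil, errors.New("invalid bit-packed run")
			}
			for _, b := range buf[:groups] {
				for bit := 0; bit < 8 && len(out) < n; bit++ {
					out = append(out, (b>>uint(bit))&1 == 1)
				}
			}
			buf = buf[groups:]
		}
	}
	return out, nil
}

func main() {
	// 0x0A 0x01: RLE run of five true values.
	// 0x03 0x05: one bit-packed group holding the bits 1,0,1,0,0,0,0,0.
	vals, err := decodeRLEBools([]byte{0x0A, 0x01, 0x03, 0x05}, 8)
	if err != nil {
		panic(err)
	}
	fmt.Println(vals) // [true true true true true true false true]
}
```

A reader that only understands plain-encoded booleans would misinterpret these bytes, which is consistent with the test files failing only once the newer parquet-testing data is pulled in.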

@kou
Member

kou commented Oct 19, 2023

I also think that apache/parquet-testing#39 is related.

@kou
Member

kou commented Oct 19, 2023

How about this?

diff --git a/dev/release/verify-release-candidate.sh b/dev/release/verify-release-candidate.sh
index 0c6ac075b..287c557fb 100755
--- a/dev/release/verify-release-candidate.sh
+++ b/dev/release/verify-release-candidate.sh
@@ -959,12 +959,26 @@ ensure_source_directory() {
     fi
   fi
 
-  # Ensure that the testing repositories are cloned
-  if [ ! -d "${ARROW_SOURCE_DIR}/testing/data" ]; then
-    git clone https://github.com/apache/arrow-testing.git ${ARROW_SOURCE_DIR}/testing
+  # Ensure that the testing repositories are prepared
+  if [ ! -d ${ARROW_SOURCE_DIR}/testing/data ]; then
+    if [ -d ${SOURCE_DIR}/../../testing/data ]; then
+      cp -a ${SOURCE_DIR}/../../testing/ ${ARROW_SOURCE_DIR}/
+    else
+      git clone \
+        https://github.com/apache/arrow-testing.git \
+        ${ARROW_SOURCE_DIR}/testing
+    fi
   fi
-  if [ ! -d "${ARROW_SOURCE_DIR}/cpp/submodules/parquet-testing/data" ]; then
-    git clone https://github.com/apache/parquet-testing.git ${ARROW_SOURCE_DIR}/cpp/submodules/parquet-testing
+  if [ ! -d ${ARROW_SOURCE_DIR}/cpp/submodules/parquet-testing/data ]; then
+    if [ -d ${SOURCE_DIR}/../../cpp/submodules/parquet-testing/data ]; then
+      cp -a \
+         ${SOURCE_DIR}/../../cpp/submodules/parquet-testing/ \
+         ${ARROW_SOURCE_DIR}/cpp/submodules/
+    else
+      git clone \
+        https://github.com/apache/parquet-testing.git \
+        ${ARROW_SOURCE_DIR}/cpp/submodules/parquet-testing
+    fi
   fi
 
   export ARROW_TEST_DATA=$ARROW_SOURCE_DIR/testing/data

(I think that this is not a blocker.)

kou added a commit to kou/arrow that referenced this issue Oct 19, 2023
…sible

We have external test data repositories, apache/arrow-testing and
apache/parquet-testing. We use them as submodules. apache/arrow may not
use the latest test data repositories. But our verification script
always uses the latest test data repositories. It may cause test
failures.
@zeroshade
Member

That's definitely a viable solution. I also just put together a PR to implement the Boolean RLE encoding...
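
For reference, the write side of boolean RLE encoding can be sketched in Go as below. This is a simplified illustration, not the code from that PR: `encodeRLEBools` is a hypothetical name, it emits only RLE runs (a real encoder would normally also use bit-packed runs for data that alternates often), and it leaves out any length prefix that frames the values in a data page.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeRLEBools (hypothetical helper, not part of Arrow) writes booleans as
// a sequence of RLE runs in the RLE/bit-packed hybrid format, bit width 1.
func encodeRLEBools(vals []bool) []byte {
	var out []byte
	var tmp [binary.MaxVarintLen64]byte
	for i := 0; i < len(vals); {
		// Find the end of the run of identical values starting at i.
		j := i
		for j < len(vals) && vals[j] == vals[i] {
			j++
		}
		// Run header: the run length shifted left by one; a low bit of 0 marks RLE.
		n := binary.PutUvarint(tmp[:], uint64(j-i)<<1)
		out = append(out, tmp[:n]...)
		// The repeated value occupies one byte at bit width 1.
		var b byte
		if vals[i] {
			b = 1
		}
		out = append(out, b)
		i = j
	}
	return out
}

func main() {
	vals := []bool{true, true, true, true, true, false, false, false}
	fmt.Printf("% x\n", encodeRLEBools(vals)) // 0a 01 06 00
}
```

Long runs of identical values compress to a few bytes this way, while rapidly alternating data costs two bytes per value, which is why the format also provides bit-packed runs.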

raulcd pushed a commit that referenced this issue Oct 20, 2023
…38362)

### Rationale for this change

We have external test data repositories, apache/arrow-testing and apache/parquet-testing. We use them as submodules. apache/arrow may not use the latest test data repositories. But our verification script always uses the latest test data repositories. It may cause test failures.

### What changes are included in this PR?

Use local test data if they exist.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* Closes: #38345

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
@raulcd raulcd modified the milestones: 14.0.0, 15.0.0 Oct 20, 2023
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
…sible (apache#38362)

JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 25, 2023
…sible (apache#38362)

zeroshade added a commit that referenced this issue Oct 30, 2023
### Rationale for this change
Looks like the parquet-testing repo files have been updated and now include boolean columns which use the RLE encoding type. This causes the Go parquet lib to fail verification tests when it pulls the most recent commits for the parquet-testing repository. So a solution for this is to actually implement the RleBoolean encoder and decoder.

### What changes are included in this PR?
Adding `RleBooleanEncoder` and `RleBooleanDecoder` and updating the `parquet-testing` repo.

### Are these changes tested?
Unit tests are added, and this is also tested via the `parquet-testing` golden files.

* Closes: #38345
* Closes: #38462

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…sible (apache#38362)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…pache#38367)

@raulcd raulcd modified the milestones: 15.0.0, 14.0.2 Dec 5, 2023
raulcd pushed a commit that referenced this issue Dec 6, 2023
…38362)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…sible (apache#38362)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…pache#38367)
