[CI][Go] Travis build fail on go test -race for github.com/apache/arrow/go/v9/parquet/pqarrow #32016

Closed
asfimport opened this issue May 26, 2022 · 12 comments

@asfimport
Collaborator

Go builds on Travis for ARM seem to be failing consistently with:

+ for d in $(go list ./... | grep -v vendor)
+ go test -race -tags assert github.com/apache/arrow/go/v9/parquet/pqarrow
signal: killed
FAIL    github.com/apache/arrow/go/v9/parquet/pqarrow    60.206s
FAIL
ERROR: 1
Error: `docker-compose --file /home/travis/build/apache/arrow/docker-compose.yml run --rm --volume /home/travis/build/apache/arrow/build:/build debian-go` exited with a non-zero exit code 1, see the process log above.

See example of build failures:
https://app.travis-ci.com/github/apache/arrow/jobs/571479526

Reporter: Raúl Cumplido / @raulcd
Assignee: Matthew Topol / @zeroshade

PRs and other links:

Note: This issue was originally created as ARROW-16669. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Raúl Cumplido / @raulcd:
cc @zeroshade  

@asfimport
Collaborator Author

Yibo Cai / @cyb70289:
cc [~guyuqi]  

@asfimport
Collaborator Author

Matthew Topol / @zeroshade:
I'm guessing it's timing out? This is really odd. I don't have an ARM64 machine myself to test with. Does anyone have access to one, or know how I could run the Travis CI job in a mode that I could SSH into for testing?

@asfimport
Collaborator Author

Yibo Cai / @cyb70289:
On my local ARM machine (Neoverse N1 CPU, same as the Travis CI instance) this test takes 53s. The default timeout is 60s, which is probably too low if the CI host is busy.

@asfimport
Collaborator Author

Yibo Cai / @cyb70289:
But I see another test that runs for 96s and doesn't get killed:
https://app.travis-ci.com/github/apache/arrow/jobs/571479526#L772

@asfimport
Collaborator Author

Yuqi Gu / @guyuqi:
I reproduced it in my local Arm64 environment:

linux@entos-altra-01:~/yuqi/arrow/go/parquet$  go test -race -tags assert github.com/apache/arrow/go/v9/parquet/pqarrow
--- FAIL: TestArrowReaderAdHocReadDecimals (0.00s)
panic: please point PARQUET_TEST_DATA env var to the test data directory [recovered]
        panic: please point PARQUET_TEST_DATA env var to the test data directory

It seems the PARQUET_TEST_DATA env var needs to point to the test data directory.

@asfimport
Collaborator Author

Yibo Cai / @cyb70289:
For local testing, we need to clone the repo below and set PARQUET_TEST_DATA to its data subdirectory:
https://github.com/apache/parquet-testing/

@asfimport
Collaborator Author

Yuqi Gu / @guyuqi:
Thanks, @cyb70289.

The test passed after PARQUET_TEST_DATA was properly set:

linux@entos-altra-01:~/yuqi/arrow/go/parquet$ go test -race -tags assert github.com/apache/arrow/go/v9/parquet/pqarrow
ok      github.com/apache/arrow/go/v9/parquet/pqarrow   64.881s

In other CI runs, the tests also seem to pass:
https://app.travis-ci.com/github/apache/arrow/jobs/575910309#L814
https://app.travis-ci.com/github/apache/arrow/jobs/575852913#L802
.....

From: https://app.travis-ci.com/github/apache/arrow/builds

@asfimport
Collaborator Author

Yuqi Gu / @guyuqi:
Regarding these build failures:
Is it possible that the CI host was under high load and the system was unstable, which caused this failure?

From the build history (https://app.travis-ci.com/github/apache/arrow/builds), almost all of the Arm64 Go CI runs passed.

IMHO, how about closing this ticket? Meanwhile we can keep an eye on the Arm64 Go CI and re-open this ticket if the same failure occurs again.

WDYT? @zeroshade @cyb70289

@asfimport
Collaborator Author

Yibo Cai / @cyb70289:
The sporadic "signal: killed" failure is probably due to the test running out of memory.

+ go test -race -tags assert github.com/apache/arrow/go/v9/parquet/pqarrow
signal: killed

On a local host, I see the physical memory usage (RSS) of this test increase gradually to about 7G, but the Travis arm64-graviton2 instance has only 7.5G of memory [1].

[1] https://docs.travis-ci.com/user/reference/overview/#virtualisation-environment-vs-operating-system
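
One way to reproduce this kind of measurement is to run the command as a child process and read its peak RSS from the resource usage returned when it exits. The sketch below is an illustration only, not how the figure above was obtained: `peakRSSKiB` and `kibToGiB` are hypothetical names, it is Unix-only, and it assumes `ru_maxrss` is reported in KiB as on Linux. Whether the figure also covers grandchild processes, such as the test binary that `go test` spawns, is an assumption to verify on the target OS.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// kibToGiB converts kibibytes to gibibytes.
func kibToGiB(kib int64) float64 {
	return float64(kib) / (1024 * 1024)
}

// peakRSSKiB (hypothetical helper) runs the command to completion and returns
// the peak resident set size reported for the child. The value is assumed to
// be in KiB, which holds on Linux. The command's own error (for example
// "signal: killed") is returned alongside the measurement.
func peakRSSKiB(name string, args ...string) (int64, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	runErr := cmd.Run()
	if cmd.ProcessState == nil {
		return 0, runErr // the process never started
	}
	ru, ok := cmd.ProcessState.SysUsage().(*syscall.Rusage)
	if !ok {
		return 0, fmt.Errorf("resource usage not available on this platform")
	}
	return int64(ru.Maxrss), runErr
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: peakrss <command> [args...]")
		os.Exit(2)
	}
	peak, err := peakRSSKiB(os.Args[1], os.Args[2:]...)
	fmt.Fprintf(os.Stderr, "peak RSS: %.2f GiB\n", kibToGiB(peak))
	if err != nil {
		fmt.Fprintln(os.Stderr, "command failed:", err)
		os.Exit(1)
	}
}
```

Built as, say, `peakrss`, it could wrap the failing invocation: `peakrss go test -race -tags assert github.com/apache/arrow/go/v9/parquet/pqarrow`. The reported peak can then be compared against the instance's memory limit.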

@asfimport
Collaborator Author

Matthew Topol / @zeroshade:
@cyb70289 That would definitely do it. I don't remember what it's doing that makes it use 7G in those tests though. Might be worth taking a look to see if we can reduce that memory footprint for the tests.

@asfimport
Collaborator Author

Yibo Cai / @cyb70289:
Issue resolved by pull request 13628
#13628
