Test suite errors on arm64 architecture #2271

Open
tillea opened this issue Oct 19, 2023 · 27 comments

Comments

@tillea

tillea commented Oct 19, 2023

Hi,

the Debian-packaged version of canu seems to work nicely on several 64-bit architectures. Unfortunately, on arm64 it fails the CI test we wrote for canu, which runs the command

canu -p ecoli -d ecoli-pacbio genomeSize=4.8m maxThreads=4 -pacbio /tmp/autopkgtest-lxc.6mhavcm6/downtmp/autopkgtest_tmp/pacbio.fastq

The Debian infrastructure provides full logs, which include the installation of all prerequisites for the software. Please inspect the full log of the arm64 test (and scroll down to the end) to see the whole test result. If you want to compare the issue with other architectures, you can check our tracker, which provides green links named "PASS".
As you can read on this page the canu version is 2.2.

Kind regards, Andreas.

@skoren
Member

skoren commented Oct 20, 2023

Canu doesn't support arm architectures, I'm surprised it compiled there at all. You can see that there have been some recent changes (#2260) to support arm but I wouldn't expect v2.2 to work on it.

@mr-c
Contributor

mr-c commented Oct 20, 2023

@skoren We've been compiling for arm64 since version 2.0; only recently did we start running some extra tests, which is how we ran into the failure linked above.

Interestingly enough, the assembly succeeds on ppc64el, riscv64, and s390x! https://ci.debian.net/packages/c/canu/

@skoren
Member

skoren commented Oct 20, 2023

A more detailed error should be in correction/0-mercounts/meryl-count.000001.out. Are you able to capture the nested out files from your logs on failure?

@mr-c
Contributor

mr-c commented Oct 20, 2023

@skoren I've queued up a new build that grabs all *.out files and I'm trying another build on a Debian arm64 porterbox

@mr-c
Contributor

mr-c commented Oct 21, 2023

On a dedicated machine I wasn't able to reproduce the error; but on a new CI run I got the following:
artifacts(2).tar.gz

https://ci.debian.net/data/autopkgtest/testing/arm64/c/canu/39150058/log.gz

@skoren
Member

skoren commented Oct 21, 2023

Nothing very enlightening in the logs:

FINAL CONFIGURATION
-------------------

Estimated to require 311 MB memory out of 2048 MB allowed.
Estimated to require 2 batches.

Configured complex mode for 0.304 GB memory per batch, and up to 2 batches.

Start counting with THREADED method.
Used 0.268 GB / 1.969 GB to store      2093487 kmers; need 0.002 GB to sort        36114 kmers

Failed with 'Segmentation fault'; backtrace (libbacktrace):
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
Segmentation fault

but if it's working on a dedicated machine, is it possible that the CI is restricting memory so when it tries to allocate some it fails? Not sure if @brianwalenz has any other suggestions.

@brianwalenz
Member

Sadly, no suggestions.

Both HEAD and v2.2 run fine with the parameters expected to be used in the CI (meryl k=16 threads=4 memory=2 count segment=1/01 ../../ecoli.seqStore output out.meryl) on both amd64 and arm64 (via M2pro), amd64 ran cleanly in valgrind.

@tillea
Author

tillea commented Nov 2, 2023

memo

I've asked the admins of the CI infrastructure and the answer is: The memory for the arm64 workers isn't huge (8GB), but all i386 and some of the amd64 workers have the same.
Kind regards, Andreas.

@paulgevers

Hmm, I wonder about the "10 GB memory needed" in the output below. That's more than is available.

root@elbrus:/tmp/autopkgtest-lxc.3wrkiidv/downtmp/build.ej9/src# /usr/lib/canu/bin/meryl k=16 threads=4 memory=2 count segment=1/01 ../../autopkgtest_tmp/ecoli-pacbio/ecoli.seqStore output out.meryl

Found 1 command tree.

Counting 110 (estimated) million canonical 16-mers from 1 input file:
    canu-seqStore: ../../autopkgtest_tmp/ecoli-pacbio/ecoli.seqStore


SIMPLE MODE
-----------

  16-mers
    -> 4294967296 entries for counts up to 65535.
    -> 64 Gbits memory used

  115899341 input bases
    -> expected max count of 463597, needing 4 extra bits.
    -> 16 Gbits memory used

  10 GB memory needed


COMPLEX MODE
------------

prefix     # of   struct   kmers/    segs/      min     data    total
  bits   prefix   memory   prefix   prefix   memory   memory   memory
------  -------  -------  -------  -------  -------  -------  -------
     1     2  P   434 kB    27 MM    26 kS  8192  B   214 MB   214 MB
     2     4  P   427 kB    13 MM    12 kS    16 kB   207 MB   207 MB
     3     8  P   426 kB  7073 kM  6417  S    32 kB   200 MB   200 MB
     4    16  P   437 kB  3536 kM  3096  S    64 kB   193 MB   193 MB
     5    32  P   474 kB  1768 kM  1493  S   128 kB   186 MB   187 MB
     6    64  P   561 kB   884 kM   719  S   256 kB   179 MB   180 MB
     7   128  P   750 kB   442 kM   346  S   512 kB   173 MB   173 MB
     8   256  P  1140 kB   221 kM   166  S  1024 kB   166 MB   167 MB
     9   512  P  1936 kB   110 kM    80  S  2048 kB   160 MB   161 MB
    10  1024  P  3544 kB    55 kM    39  S  4096 kB   156 MB   159 MB  Best Value!
    11  2048  P  6768 kB    27 kM    19  S  8192 kB   152 MB   158 MB
    12  4096  P    12 MB    13 kM     9  S    16 MB   144 MB   156 MB
    13  8192  P    25 MB  7074  M     5  S    32 MB   160 MB   185 MB
    14    16 kP    50 MB  3537  M     2  S    64 MB   128 MB   178 MB
    15    32 kP   101 MB  1769  M     1  S   128 MB   128 MB   229 MB
    16    64 kP   202 MB   885  M     1  S   256 MB   256 MB   458 MB
    17   128 kP   405 MB   443  M     1  S   512 MB   512 MB   917 MB
    18   256 kP   810 MB   222  M     1  S  1024 MB  1024 MB  1834 MB
    19   512 kP  1620 MB   111  M     1  S  2048 MB  2048 MB  3668 MB
    20  1024 kP  3240 MB    56  M     1  S  4096 MB  4096 MB  7336 MB


FINAL CONFIGURATION
-------------------

Estimated to require 311 MB memory out of 2048 MB allowed.
Estimated to require 2 batches.

Configured complex mode for 0.304 GB memory per batch, and up to 2 batches.

Start counting with THREADED method.
Used 0.269 GB / 1.969 GB to store      2093487 kmers; need 0.002 GB to sort        39856 kmers

Failed with 'Segmentation fault'; backtrace (libbacktrace):
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
Segmentation fault

If you tell me what command to run, I guess I can run it in the container.

@paulgevers

(gdb) bt
#0  merylCountArray::add(unsigned __int128) (this=0xffffe76e2d98, suffix=<optimized out>) at meryl/src/meryl/merylCountArray.C:579
#1  0x0000aaaaaaaadbd8 in insertKmers (G=0xaaaaaab20690, T=<optimized out>, S=0xffffe0003ba0) at meryl/src/meryl/merylOp-countThreads.C:274
#2  0x0000aaaaaaac3c3c in sweatShop::worker (this=0xaaaaaab28430, workerData=0xaaaaaab28520) at utility/src/utility/sweatShop.C:305
#3  0x0000fffff7b91318 in start_thread (arg=0x0) at ./nptl/pthread_create.c:444
#4  0x0000fffff7bfb01c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

@mr-c
Contributor

mr-c commented Nov 4, 2023

@paulgevers

@mr-c as the maintainer of autopkgtest you can be assured I already read that file. Unfortunately that doesn't tell me what's of interest to run manually now.

@brianwalenz
Member

Oh, awesome, thanks for the stack trace! Do you happen to know the memory page size being used here?

Is there a downloadable VM I can use to reproduce this? Possibly just one with an installed OS would be sufficient...and (brief) instructions on how to run it would be invaluable, as I'm not terribly VM literate.

About the "10 GB needed": It is reporting estimated resources required for two algorithms, the 'simple' and the 'complex'. The simple claims to need 10 GB, and so isn't used, while the complex is estimating 0.3 GB.
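For reference, the SIMPLE MODE figures in the log can be reproduced with a little arithmetic. The sketch below is a reconstruction from the printed numbers, not meryl's actual code: it assumes one counter per possible k-mer (4^k entries), 16-bit base counters ("counts up to 65535"), the 4 extra bits taken directly from the log, and "G" meaning 2^30. The names `SimpleEstimate` and `estimateSimple` are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: memory needed by a direct-indexed counting table
// with one counter per possible k-mer. Reconstructed from the log output,
// not taken from the meryl sources.
struct SimpleEstimate {
  std::uint64_t entries;     // number of table entries, 4^k
  std::uint64_t baseGbits;   // memory for the base counters, in Gbits
  std::uint64_t extraGbits;  // memory for the extra counter bits, in Gbits
  std::uint64_t totalGB;     // total memory, in GB
};

SimpleEstimate estimateSimple(unsigned k, unsigned counterBits, unsigned extraBits) {
  const std::uint64_t giga    = std::uint64_t{1} << 30;        // assumed: G = 2^30
  const std::uint64_t entries = std::uint64_t{1} << (2 * k);   // 4^k = 2^(2k)
  const std::uint64_t base    = entries * counterBits;         // bits
  const std::uint64_t extra   = entries * extraBits;           // bits

  return { entries, base / giga, extra / giga, (base + extra) / 8 / giga };
}
```

Calling `estimateSimple(16, 16, 4)` gives 4294967296 entries, 64 Gbits plus 16 Gbits, and 10 GB in total, which matches the log and is why the simple algorithm is rejected under `memory=2`.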

@paulgevers

> Oh, awesome, thanks for the stack trace! Do you happen to know the memory page size being used here?

No, but if you teach me how to look it up, I can do that.

> Is there a downloadable VM I can use to reproduce this? Possibly just one with an installed OS would be sufficient...and (brief) instructions on how to run it would be invaluable, as I'm not terribly VM literate.

Neither am I. The Debian CI infrastructure doesn't work with VMs but with lxc containers that are generated with autopkgtest.

@brianwalenz
Member

pagesize or getconf PAGESIZE or getconf PAGE_SIZE.
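The same value can also be read from inside a program, which is handy when the shell tools aren't installed in the container. This is a minimal sketch using the POSIX `sysconf` call; the function name `pageSizeBytes` is made up for illustration and is not part of canu or meryl.

```cpp
#include <unistd.h>

// Returns the memory page size in bytes as reported by the OS,
// or -1 if the value cannot be determined.
long pageSizeBytes() {
  return sysconf(_SC_PAGESIZE);
}
```

Most amd64 systems report 4096 here; some arm64 kernels are configured with 16 KB or 64 KB pages instead, which is the reason for asking.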

I'll try to reproduce the failure here this week, failing that, I'll add a bunch of debugging to a branch and let you run another test.

@paulgevers

root@ci-worker-arm64-02:~# getconf PAGESIZE
4096

@brianwalenz
Member

Shucks, that theory went out the window.

@tillea
Author

tillea commented Nov 28, 2023

Do we have some alternative idea to track down the problem?

@brianwalenz
Member

Thanks for the ping. After a ferocious battle with QEMU I have reproduced the crash, literally just now.

Annoyingly, it does NOT crash when I build Canu with debugging support, nor does it fail when run under valgrind.

@tillea
Author

tillea commented Nov 29, 2023 via email

@brianwalenz
Member

Hopefully fixed. 'twas a good bug.

I had used too weak of a memory ordering requirement (https://en.cppreference.com/w/cpp/atomic/memory_order) that allowed aarch64 to reorder instructions such that a shared memory allocation was able to escape a critical section. The key quote from the linked page is

On strongly-ordered systems — x86, SPARC TSO, IBM mainframe, etc. — release-acquire ordering is automatic for the majority of operations. [...] On weakly-ordered systems (ARM, Itanium, PowerPC), special CPU load or memory fence instructions are used.

What I thought was implementing a critical section worked on amd64 more-or-less by default; on ARM the default wasn't strong enough.
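To illustrate the class of bug (this is a sketch under assumptions, not the actual meryl code or the actual fix): a hand-rolled lock built on `std::atomic` needs acquire ordering when taking the lock and release ordering when dropping it. With `memory_order_relaxed` in those two places the code tends to behave on amd64 anyway, but a weakly-ordered CPU such as aarch64 is allowed to move the protected accesses outside the section. `SpinLock`, `SharedArray` and `runWorkers` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical spin lock; not the lock used in meryl.
class SpinLock {
public:
  void lock() {
    // acquire: accesses after this point cannot be reordered before it.
    while (flag_.exchange(true, std::memory_order_acquire)) {
      std::this_thread::yield();
    }
  }

  void unlock() {
    // release: accesses before this point cannot be reordered after it.
    // Using memory_order_relaxed here and in lock() would let a
    // weakly-ordered CPU leak protected accesses out of the section.
    flag_.store(false, std::memory_order_release);
  }

private:
  std::atomic<bool> flag_{false};
};

// Shared state guarded by the lock. The growable vector stands in for a
// shared allocation that must only be touched inside the critical section.
struct SharedArray {
  SpinLock                   lock;
  std::vector<std::uint64_t> data;

  void add(std::uint64_t value) {
    lock.lock();
    data.push_back(value);   // may reallocate the shared storage
    lock.unlock();
  }
};

// Starts `threads` workers that each append `perThread` values and
// returns the final number of stored values.
std::size_t runWorkers(unsigned threads, std::size_t perThread) {
  SharedArray              shared;
  std::vector<std::thread> pool;

  for (unsigned t = 0; t < threads; ++t) {
    pool.emplace_back([&shared, perThread] {
      for (std::size_t i = 0; i < perThread; ++i)
        shared.add(i);
    });
  }

  for (auto &th : pool)
    th.join();

  return shared.data.size();
}
```

With the acquire/release pair in place, `runWorkers(4, 10000)` stores exactly 40000 values on any architecture. A broken relaxed variant may still pass on amd64 and only fail intermittently on ARM, which fits with the crash disappearing in debug builds and under valgrind.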

@tillea
Author

tillea commented Nov 30, 2023 via email

@tillea
Author

tillea commented Jan 5, 2024

Ping about a new release or a commit we can cherry-pick for the Debian package.

@brianwalenz
Member

Hi Andreas, I'm (finally) getting around to making a release. There have been a few build changes (the handling of externally defined CXXFLAGS) and packaging changes (installing perl modules into lib/perl5/site_perl/canu instead of lib/site_perl/canu). Is there anything on your side you'd like to see? I didn't find any Debian-specific patches in https://salsa.debian.org/med-team/canu but it is entirely likely I looked in the wrong place.

Apologies for the double-work of cherry-picking and then updating for a new release. The release kept getting preempted by other projects.

@tillea
Author

tillea commented Feb 9, 2024 via email

@mr-c
Contributor

mr-c commented Feb 9, 2024

> I didn't find any Debian-specific patches in https://salsa.debian.org/med-team/canu but it is entirely likely I looked in the wrong place.

Check out https://salsa.debian.org/med-team/canu/-/tree/master/debian/patches?ref_type=heads but I think all are merged upstream already.

Did you update your copy of parasail to grab jeffdaily/parasail#102 ?

@mr-c
Contributor

mr-c commented Feb 29, 2024

> Did you update your copy of parasail to grab jeffdaily/parasail#102 ?

Answering my own question: Yes, as of a55ecfa

@brianwalenz Looks like all is ready for a new release!


5 participants