Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Travis CI Android emulator-based tests for Android are broken #603

Closed
briansmith opened this issue Dec 24, 2017 · 22 comments · Fixed by #777
Closed

Travis CI Android emulator-based tests for Android are broken #603

briansmith opened this issue Dec 24, 2017 · 22 comments · Fixed by #777
Assignees

Comments

@briansmith
Copy link
Owner

It seems like the Android SDK changed in some non-backward-compatible way. If somebody has a fix, I'd take a PR for it. Otherwise I think I might try rewriting all the non-Windows CI stuff using Docker and then using the Docker-ified stuff, ETA in Feburary.

@briansmith
Copy link
Owner Author

See rust-lang/rust#45580 for some inspiration.

@andrewtj
Copy link
Contributor

I think a change in Rust might have inadvertently broken the test suite. This glob in travis.sh matches two files so adb push attempts to place them both in the directory /data/ring-test instead of just copying the test executable to that path. Patching that gets the run a little further to an Illegal instruction failure.

@andrewtj
Copy link
Contributor

And here's a tombstone.

@briansmith
Copy link
Owner Author

@andrewtj That's great news! It seems like there were several problems.

The problem I was experiencing previously was "adb: command not found" and "emulator: command not found" errors on Travis CI. Now it seems those errors aren't occurring! Maybe the problem was just that the Travis CI cache was in a corrupted state or something.

Because of those "command not found" errors, I wasn't able to notice the problem that your patch fixes, which apparently is due to a change in rustc. Thanks for providing an example fix.

I went back to tag 0.12.1 and applied that fix to the 0.12.1 tag and kicked off a Travis CI build. The illegal instruction error still occurs. However, back when I released 0.12.1, the build passed just fine; see https://travis-ci.org/briansmith/ring/builds/271520581. That indicates to me that there is probably a change in the emulator, or a change in rustc or some other part of the toolchain that makes it incompatible with the emulator.

@andrewtj
Copy link
Contributor

Okay I've run a few more builds and it looks like 1.21.0 works but 1.22.1 does not. So I guess something must have changed in Rust's jemalloc fork in 1.22?

@briansmith
Copy link
Owner Author

The Rust 1.22.0/1.22.1 release notes are at https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1220-2017-11-22

The most clearly relevant changes listed in the release notes are:

In Rust 1.21.0, jemalloc was upgraded to version 4.5.0, according to those release notes. However, you said 1.21.0 works fine.

See also:

@andrewtj
Copy link
Contributor

I've narrowed it down to something between nightly-2017-10-29 and nightly-2017-10-30:

#!/bin/sh
set -x
for REL in nightly-2017-10-29 nightly-2017-10-30; do
	rustup override set ${REL} 
	cargo build --target=armv7-linux-androideabi
	adb push target/armv7-linux-androideabi/debug/hello_world /data
	adb shell '/data/hello_world; echo $?'
done
$ ./test_run.sh 
+ rustup override set nightly-2017-10-29
info: using existing install for 'nightly-2017-10-29-x86_64-unknown-linux-gnu'
info: override toolchain for '/home/atj/hello_world' set to 'nightly-2017-10-29-x86_64-unknown-linux-gnu'

  nightly-2017-10-29-x86_64-unknown-linux-gnu unchanged - rustc 1.23.0-nightly (269cf5026 2017-10-28)

+ cargo build --target=armv7-linux-androideabi
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
+ adb push target/armv7-linux-androideabi/debug/hello_world /data
target/armv7-linux-androideabi/debug/hello_world: 1 file pushed. 10.6 MB/s (3949796 bytes in 0.356s)
+ adb shell /data/hello_world; echo $?
Hello, world!
0
+ rustup override set nightly-2017-10-30
info: using existing install for 'nightly-2017-10-30-x86_64-unknown-linux-gnu'
info: override toolchain for '/home/atj/hello_world' set to 'nightly-2017-10-30-x86_64-unknown-linux-gnu'

  nightly-2017-10-30-x86_64-unknown-linux-gnu unchanged - rustc 1.23.0-nightly (90ef3372e 2017-10-29)

+ cargo build --target=armv7-linux-androideabi
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
+ adb push target/armv7-linux-androideabi/debug/hello_world /data
target/armv7-linux-androideabi/debug/hello_world: 1 file pushed. 9.4 MB/s (3271284 bytes in 0.333s)
+ adb shell /data/hello_world; echo $?
Illegal instruction 
132

It looks like rust-lang/rust#45656 landed in that window. I'm not familiar with the Android toolchain so I'm going to leave it here for the minute.

@briansmith
Copy link
Owner Author

I'm guessing that the version of the Android SDK that were'e using on Travis CI, in particular its emulator, is simply incompatible with the one that rustc uses. The Android SDK does warn us that it is out of date during installation. I think we need to switch to using a sdkmanager-based installation. I'll take a look at doing that.

@briansmith briansmith self-assigned this Jan 23, 2018
@andrewtj
Copy link
Contributor

That sounds reasonable. It's noted in rust-lang/rust#45580 that the minimum supported version is 14 and the current script sets up a version 18 emulator, but emulator and device are probably different ball games.

briansmith added a commit that referenced this issue Apr 5, 2018
The Travis CI testing using the Android emulator was broken a while
back. See #603. Until that
is fixed, just don't run the tests on Android. This is unfortunate,
however it is necessary to make progress on improvements for higher-
priority platforms.
@briansmith briansmith changed the title Travis CI Android build/test is broken Travis CI Android emulator-based tests for Android are broken Apr 5, 2018
@briansmith
Copy link
Owner Author

Commit 9e2bcb6 disables the Android emulator tests on Travis CI pending a solution to this.

@briansmith
Copy link
Owner Author

I don't anticipate working on this any time soon since Android is the least interesting target to me right now and simultaneously one of the most difficult targets to support. If anybody wants to volunteer to resolve this issue, I would greatly appreciate it and I'd be happy to review a PR that adds the tests back. Alternatively, if somebody has a business interest in keeping Android as a fully supported platform for ring, ping me at brian@briansmith.org.

@pietro
Copy link
Contributor

pietro commented Jun 1, 2018

I tried using the latest SDK on both travis and locally and I get the same errors with both clang and gcc. If someone that knows more about debugging native libraries on android wants to help please ping me.

@pietro
Copy link
Contributor

pietro commented Jun 7, 2018

This is all I got:

(gdb) c
Continuing.
warning: Could not load shared library symbols for libc.so.
Do you need "set solib-search-path" or "set sysroot"?
warning: Could not load shared library symbols for libm.so.
Do you need "set solib-search-path" or "set sysroot"?

Program received signal SIGILL, Illegal instruction.
0xb6efead0 in arena_chunk_alloc ()
(gdb) bt
#0  0xb6efead0 in arena_chunk_alloc ()
#1  0xb6ef7ba0 in arena_run_alloc_small (tsdn=0x0, size=4096, binind=<optimized out>, arena=<optimized out>) at /checkout/src/liballoc_jemalloc/../jemalloc/src/arena.c:1220
#2  arena_bin_nonfull_run_get (arena=0xb6a00100, bin=<optimized out>, tsdn=<optimized out>) at /checkout/src/liballoc_jemalloc/../jemalloc/src/arena.c:2349
#3  arena_bin_malloc_hard (tsdn=0x0, arena=0xb6a00100, bin=0xb6a00a50) at /checkout/src/liballoc_jemalloc/../jemalloc/src/arena.c:2390
#4  0xb6ef8fac in arena_malloc_small (tsdn=0x0, binind=11, arena=<optimized out>, zero=<optimized out>) at /checkout/src/liballoc_jemalloc/../jemalloc/src/arena.c:2606
#5  je_arena_malloc_hard (tsdn=0x0, arena=<optimized out>, size=<optimized out>, ind=11, zero=<error reading variable: access outside bounds of object referenced via synthetic pointer>)
    at /checkout/src/liballoc_jemalloc/../jemalloc/src/arena.c:2720
#6  0xb6ee9ffc in je_arena_malloc (tsdn=<optimized out>, arena=0xb6a00100, size=128, slow_path=<error reading variable: access outside bounds of object referenced via synthetic pointer>, ind=<optimized out>, 
    zero=<optimized out>, tcache=<optimized out>) at /checkout/src/liballoc_jemalloc/../jemalloc/include/jemalloc/internal/arena.h:1357
#7  je_iallocztm (tsdn=<optimized out>, ind=11, arena=0xb6a00100, slow_path=<error reading variable: access outside bounds of object referenced via synthetic pointer>, size=<optimized out>, 
    zero=<optimized out>, tcache=<optimized out>, is_metadata=<optimized out>) at include/jemalloc/internal/jemalloc_internal.h:1067
#8  a0ialloc (size=128, zero=<error reading variable: access outside bounds of object referenced via synthetic pointer>, is_metadata=<optimized out>)
    at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:336
#9  0xb6f17b04 in je_malloc_tsd_boot1 ()
#10 0xb6ef2718 in malloc_init_hard () at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1518
#11 malloc_init () at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:317
#12 jemalloc_constructor () at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:2807
#13 0xb6e21906 in ?? () from target:/system/bin/linker
#14 0xb6e219de in ?? () from target:/system/bin/linker
#15 0xb6e21b30 in ?? () from target:/system/bin/linker
#16 0xb6e22054 in ?? () from target:/system/bin/linker
#17 0xb6e1fa3c in _start () from target:/system/bin/linker
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.

Program received signal SIGILL, Illegal instruction.
0xb6efead0 in arena_chunk_alloc ()

@pietro
Copy link
Contributor

pietro commented Jan 23, 2019

Now that stable doesn't ship jemalloc I decided to give this another try. Tests fail on different places depending if I'm doing a release build or not.

For debug builds I get a panic on the AEAD tests, on a very innocent looking line on gcm.c:

0xb6fb81a4 in gcm_init_4bit (Htable=0xb6edd6b0, H=0xb6edd650) at crypto/fipsmodule/modes/gcm.c:91
91	  Htable[1] = V;

The disassembled instruction => 0xb6f5a1a0 <+440>: vld1.64 {d16-d17}, [r3], r1 is a NEON instruction, but the emulator doesn't support it:

$ adb shell 'cat /proc/cpuinfo'
processor	: 0
model name	: ARMv7 Processor rev 1 (v7l)
BogoMIPS	: 125.00
Features	: swp half thumb fastmult vfp edsp vfpv3 tls vfpv4 idiva idivt vfpd32 evtstrm 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc0f
CPU revision	: 1

Hardware	: Dummy Virtual Machine
Revision	: 0000
Serial		: 0000000000000000

I got this grepping for neon in the macros defined by the compiler:

$ arm-linux-androideabi-clang -x c  -dM -E -I include crypto/fipsmodule/modes/gcm.c | grep -ie neon
#define ARMV7_NEON (1 << 0)
#define __ARM_NEON 1
#define __ARM_NEON_FP 0x4
#define __ARM_NEON__ 1

@pietro
Copy link
Contributor

pietro commented Jan 24, 2019

Using a system image with an API level > 23 works.

@briansmith if that's ok with I'll do PR re-adding the android tests using an API 24 sys image and ndk.

@briansmith
Copy link
Owner Author

@briansmith if that's ok with I'll do PR re-adding the android tests using an API 24 sys image and ndk.

Yes, please use Android 8.0 (API level 26). According to https://developer.android.com/distribute/best-practices/develop/target-sdk that's the minimum for the Google Play store as of now.

@briansmith
Copy link
Owner Author

Also, what do you think about switching to cross instead of maintaining all this infrastructure ourselves?

@pietro
Copy link
Contributor

pietro commented Jan 25, 2019

Cool, I'll do a PR for armv7 and once that's merged I'll do one for aarch64. I'll try to take a look at using cross in travis. I haven't had the need to use it because for all the little rust cross compiling I do, it always works by setting CC and passing the right target to cargo.

@briansmith
Copy link
Owner Author

After researching more, I think if we can get things working the current way, that's better than using cross, because cross's emulation of Android is quite limited. I think it makes sense to use cross for non-Android/non-iOS Linux cross-build/testing stuff though.

@pietro
Copy link
Contributor

pietro commented Jan 28, 2019

Darn, the latest available arm/aarch64 android system images are for API 22. I set up the NDK to use API 26 and tested with the 22 images and it worked.

@briansmith how do you want to proceed?

@briansmith
Copy link
Owner Author

Darn, the latest available arm/aarch64 android system images are for API 22. I set up the NDK to use API 26 and tested with the 22 images and it worked.

Let's just add a comment that we're using API 22 images because there are no API 26 images available.

@briansmith
Copy link
Owner Author

briansmith commented Jan 29, 2020

Re-opening this as it is happening again: "Illegal instruction" is reported in the ARM emulator. Perhaps it is an alignment issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants