This repository has been archived by the owner on Jul 15, 2019. It is now read-only.

Fix 3.18.1 build #2

Merged
merged 4 commits into from
Dec 11, 2018

Conversation

cpaelzer
Collaborator

A few issues with the tests were mentioned; this should fix them.
Also, all install files follow the LIBGPSSONAME rename, so let's do the same for libgps itself.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
…e system to be present)

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
@cpaelzer
Collaborator Author

The issue was mentioned in #1.

@cpaelzer
Collaborator Author

cpaelzer commented Nov 16, 2018

I never fully trusted the Travis builds, so I did a bunch of sbuild runs locally.
It seems that something has changed:

  • ubuntu-cosmic: works
  • ubuntu-disco: fails
  • debian-unstable: fails
  • debian-testing: works

Cosmic was forked only a few weeks ago, so something must have changed in the newer releases since then.
The failure looks like this:

dh_install
cp: cannot open 'debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24' for reading: Too many levels of symbolic links
dh_install: cp --reflink=auto -a debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24.0.0 debian/libgps24//usr/lib/x86_64-linux-gnu/ returned exit code 1

But when I enter the chroot, the links look sane:

# ls -laF debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so*  
lrwxrwxrwx 1 paelzer paelzer     16 Nov 16 09:48 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so -> libgps.so.24.0.0*
lrwxrwxrwx 1 paelzer paelzer     16 Nov 16 09:48 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24 -> libgps.so.24.0.0*
-rwxrwxr-x 1 paelzer paelzer 630408 Nov 16 09:48 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24.0.0*

In addition, running the very same cp command manually works:
cp --reflink=auto -a debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24.0.0 /tmp/

I suspect this is limited to chroots; I'll build in VMs later to test that theory.

@cpaelzer
Collaborator Author

I have built it on all architectures here. You can find the build logs there.

It works on all of them except ppc64el, but I believe this is a race with something on the filesystem not being ready yet.

I wonder whether cp always used --reflink=auto, or whether that might be related.

@cpaelzer
Collaborator Author

After re-running enough builds, to summarize:

  1. It only occurs on recent releases (disco/unstable), not on older ones (cosmic/testing).
  2. It is racy: sometimes it hits libgps.so.24.0.0 (libgps), sometimes libgps.so (-dev package), sometimes neither.

By accessing the files before dh_install I seem to have narrowed the race window, though I don't understand why.

@cpaelzer
Collaborator Author

I'm still puzzled by the symlink issue in dh_install.
I added a snippet like this:

# install everything
sync
for f in libgps.so libgps.so.24 libgps.so.24.0.0; do \
    namei debian/tmp/usr/lib/x86_64-linux-gnu/$$f; \
    ls -laF debian/tmp/usr/lib/x86_64-linux-gnu/$$f; \
done
dh_install

But it shows me this:

# install everything
sync
for f in libgps.so libgps.so.24 libgps.so.24.0.0; do \
    namei debian/tmp/usr/lib/x86_64-linux-gnu/$f; \
    ls -laF debian/tmp/usr/lib/x86_64-linux-gnu/$f; \
done
f: debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so
 d debian
 d tmp
 d usr
 d lib
 d x86_64-linux-gnu
 l libgps.so -> libgps.so.24.0.0
   - libgps.so.24.0.0
lrwxrwxrwx 1 root root 16 Nov 16 14:45 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so -> libgps.so.24.0.0*
f: debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24
 d debian
 d tmp
 d usr
 d lib
 d x86_64-linux-gnu
 - libgps.so.24
-rwxrwxr-x 1 root root 16 Nov 16 14:45 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24*
f: debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24.0.0
 d debian
 d tmp
 d usr
 d lib
 d x86_64-linux-gnu
 - libgps.so.24.0.0
-rwxrwxr-x 1 root root 630408 Nov 16 14:45 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24.0.0*
dh_install
cp: cannot open 'debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24' for reading: Too many levels of symbolic links
dh_install: cp --reflink=auto -a debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24.0.0 debian/libgps24//usr/lib/x86_64-linux-gnu/ returned exit code 1

Doesn't that prove there is no symlink loop, only for cp to then fail with exactly that error?

@cpaelzer
Collaborator Author

Interesting: today it works in Ubuntu Disco, but still not in Debian unstable.
I get the feeling we are facing something totally unrelated to our changes.

At least I can reproduce it at that point in d/rules.
I don't even need to call dh_install: calling cp with the same options triggers the bug.
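Since the failure is racy, rerunning that cp many times is a cheap way to measure it. Below is a minimal sketch, assuming GNU coreutils cp and that it runs inside the same fakeroot session as dh_install (i.e. from d/rules). `repro_cp` is a hypothetical helper made up for illustration; it only repeats the cp invocation quoted in the dh_install error and counts failures.

```shell
#!/bin/sh
# Hypothetical helper (not part of debhelper): rerun the cp invocation that
# dh_install reported, each time into a fresh destination, and count failures.
# Usage: repro_cp RUNS SOURCE...
repro_cp() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        dest=$(mktemp -d)
        # Same options as in the dh_install error message.
        if ! cp --reflink=auto -a "$@" "$dest/"; then
            failures=$((failures + 1))
        fi
        rm -rf "$dest"
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}
```

For example, `repro_cp 20 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24.0.0`. Inside d/rules it would have to be written as a single recipe line with `$$` escaping.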

And there I realized that the file failing with "Too many levels of symbolic links" is not even the symlink.
libgps.so is the symlink, but it fails on libgps.so.24:

f: debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24
 d debian
 d tmp
 d usr
 d lib
 d x86_64-linux-gnu
 - libgps.so.24
-rwxrwxr-x 1 root root 16 Nov 19 12:31 debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24*
cp: cannot open 'debian/tmp/usr/lib/x86_64-linux-gnu/libgps.so.24' for reading: Too many levels of symbolic links

I increasingly think cp misdetects some other error as -ELOOP and the error message is misleading.
Let's see whether it still reproduces with strace enabled...

@cpaelzer
Collaborator Author

Hmm, after about a day of experiments (when I added too much debugging it went away), it no longer seems reproducible on my local system. Maybe whatever fix was pushed to Ubuntu disco has now also appeared in Debian unstable?

It now worked 4 out of 4 times for me, whereas before it failed every time.
Would you mind trying it again in your chroot, perhaps after merging the few cleanups I provided here?

Next, I rebuilt it somewhere I can later run multi-arch autopkgtests, to be sure how it behaves now. But there, one architecture (ppc64el) still hits the same bug.

I'll keep trying different ideas whenever I have time, as this is very odd...

@cpaelzer
Collaborator Author

The ppc64el build seems to be the most exposed to the bug. Even with a lot of extra actions in place, it still hits the issue.

I was even desperate enough to try a silly sleep/sync in the build; it still fails (good for reproducing, bad for me).

Maybe I can at least get an strace of the cp on ppc64el, then.
So let's try strace again...

@bzed
Owner

bzed commented Nov 20, 2018

On Travis it builds fine; only the autopkgtests fail.

@cpaelzer
Collaborator Author

Yeah, it is definitely a race: any given build environment may or may not hit it. I haven't found a differentiator yet.

But I want to run autopkgtests on all architectures to be sure, so I need it to build.
Analyzing this bug is also worthwhile on its own; it is just too elusive for a quick fix.

But I'm lost: strace now tells me:
openat(AT_FDCWD, "debian/tmp/usr/lib/powerpc64le-linux-gnu/libgps.so.24", O_RDONLY|O_NOFOLLOW) = -1 ELOOP (Too many levels of symbolic links)
But at the same time I know from namei that there is no loop.
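One way to read that strace line: on Linux, `open`/`openat` with `O_NOFOLLOW` fails with `ELOOP` whenever the final path component is a symlink, whether or not any loop exists, and `ELOOP` is printed as "Too many levels of symbolic links". The sketch below reproduces that outside any build. It assumes Linux and GNU coreutils `dd`, whose `iflag=nofollow` opens the input with `O_NOFOLLOW`; the temporary directory and file contents are made up.

```shell
#!/bin/sh
# Reproduce "Too many levels of symbolic links" without any symlink loop.
# Assumes Linux and GNU coreutils dd (iflag=nofollow => O_NOFOLLOW).
demo=$(mktemp -d)
cd "$demo" || exit 1

# Same shape as the build tree: a regular file plus a one-hop symlink to it.
printf 'payload\n' > libgps.so.24.0.0
ln -s libgps.so.24.0.0 libgps.so.24

# Following the link works, so there is no loop.
cat libgps.so.24

# Opening the link itself with O_NOFOLLOW fails with ELOOP anyway.
if LC_ALL=C dd if=libgps.so.24 iflag=nofollow of=/dev/null 2>err.txt; then
    nofollow_result=opened
else
    nofollow_result=failed
fi
cat err.txt
```

If this reading is right, the ELOOP only says "this path is a symlink", and the real question is why cp opened it as if it were a regular file.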

@bzed, does it still fail in the Travis autopkgtest with the fixes I added here?
The test should now only run in at least a container (which implies a running system), so I had hoped it would work now.

I might just run the test in a local VM to be sure (giving up on waiting for the ppc64el build).
Yeah, you are right: maybe I should timebox this and focus on the test for now...

@cpaelzer
Collaborator Author

With this branch applied I ran the autopkgtest, and it looks good now.

$ sudo autopkgtest -o verify-gpsd-test --no-built-binaries --apt-upgrade --shell --setup-commands="add-apt-repository ppa:ci-train-ppa-service/3522; apt update; apt -y upgrade" gpsd_3.18.1-2~ppa16.dsc -- qemu --qemu-options='-cpu host' --ram-size=4096 --cpus 8 ~/work/autopkgtest-disco-amd64.img

I posted a full log of this here

If anything, I'm somewhat worried about this odd build issue, but you are right: if it builds fine for you, please feel free to go ahead. I don't have many more ideas on how to debug it; it is not a symlink loop, yet it fails as one :-/

What next steps would you suggest? Try uploading and see what happens? Or maybe test 3.18 on real devices first?

@cpaelzer
Collaborator Author

For reference, look at this build log.

Even when iterating (to give the race a chance to close), it repeatedly shows:

  • namei confirms full path has no link issues
  • cp fails with openat returning -ELOOP on exactly that path

@bzed
Owner

bzed commented Nov 20, 2018

My wild guess: fakeroot is broken...

@cpaelzer
Collaborator Author

I also ran dmesg in the build environment after the error, but it gave no new insight :-(

The tests are good, as shown a few comments earlier.
Would you consider merging these few fixes, then?

For the potential build issue I'm out of ideas for now.

@cpaelzer
Collaborator Author

Good case (s390x):

     0.000079 lstat("debian/tmp/usr/lib/s390x-linux-gnu/libgps.so.24", {st_mode=S_IFLNK|0777, st_size=16, ...}) = 0
     0.000079 getpid()                  = 17929
     0.000068 semop(65537, [{0, -1, SEM_UNDO}], 1) = 0
     0.000101 msgsnd(65536, {1, "\0\0\0\3\0\0F\t\0\0\0\2\0\0\7\321\0\0\t\305\0\0\0\0\0\v\347h\0\0\0\0"...}, 1088, 0) = 0
     0.000094 msgrcv(98305, {1, "\0\0\0\3\0\0F\t\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\v\347h\0\0\0\0"...}, 1088, 0, 0) = 1088
     0.000081 semop(65537, [{0, 1, SEM_UNDO}], 1) = 0
     0.000086 lstat("/tmp/libgps.so.24", 0x3ffcdc7e848) = -1 ENOENT (No such file or directory)
     0.000083 readlink("debian/tmp/usr/lib/s390x-linux-gnu/libgps.so.24", "libgps.so.24.0.0", 17) = 16
     0.000077 symlinkat("libgps.so.24.0.0", AT_FDCWD, "/tmp/libgps.so.24") = 0

Bad case (ppc64el):

     0.000245 lstat("debian/tmp/usr/lib/powerpc64le-linux-gnu/libgps.so.24", {st_mode=S_IFLNK|0777, st_size=16, ...}) = 0
     0.000282 getpid()                  = 15048
     0.000192 semop(65537, [{0, -1, SEM_UNDO}], 1) = 0
     0.000291 msgsnd(65536, {1, "\3\0\0\0\310:\0\0\2\0\0\0\321\7\0\0\305\t\0\0\370\341\v\0\0\0\0\0\21\375\0\0"...}, 1088, 0) = 0
     0.000160 msgrcv(98305, {1, "\3\0\0\0\310:\0\0\2\0\0\0\0\0\0\0\0\0\0\0\370\341\v\0\0\0\0\0\21\375\0\0"...}, 1088, 0, 0) = 1088
     0.000302 semop(65537, [{0, 1, SEM_UNDO}], 1) = 0
     0.000270 stat("/tmp/libgps.so.24", 0x3ffffd493fd0) = -1 ENOENT (No such file or directory)
     0.000274 openat(AT_FDCWD, "debian/tmp/usr/lib/powerpc64le-linux-gnu/libgps.so.24", O_RDONLY|O_NOFOLLOW) = -1 ELOOP (Too many levels of symbolic links)

Both lstat calls show that it is a link (S_IFLNK).
The semop/msgsnd/msgrcv/semop sequence is the actual stat call going through libfakeroot.

You can see that the good case detects a symlink and then uses readlink+symlinkat to "copy" it.
The bad case, however, assumes it is a regular file, uses openat, and fails.
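The two code paths can be sketched in shell, under the assumption that cp's choice depends only on the file type its stat reports. `copy_as` is a hypothetical helper whose first argument stands in for that reported type, so a wrong answer can be simulated; `dd iflag=nofollow` (GNU coreutils) stands in for cp's `openat(..., O_NOFOLLOW)`. This is an illustration, not cp's actual implementation.

```shell
#!/bin/sh
# Hypothetical sketch of the two paths seen in the traces. The first argument
# stands in for the file type that (fake)stat reported to cp.
# Usage: copy_as symlink|regular SOURCE DEST_DIR
copy_as() {
    reported_type=$1
    src=$2
    dst_dir=$3
    name=$(basename "$src")
    case $reported_type in
        symlink)
            # Good case (s390x): readlink + symlinkat recreate the link
            # (dangling here, since only the link itself is copied).
            ln -s "$(readlink "$src")" "$dst_dir/$name"
            ;;
        regular)
            # Bad case (ppc64el): open the source with O_NOFOLLOW and copy
            # its bytes; on an actual symlink the open fails with ELOOP.
            dd if="$src" iflag=nofollow of="$dst_dir/$name" 2>/dev/null
            ;;
        *)
            return 2
            ;;
    esac
}

work=$(mktemp -d)
mkdir "$work/src" "$work/good" "$work/bad"
printf 'payload\n' > "$work/src/libgps.so.24.0.0"
ln -s libgps.so.24.0.0 "$work/src/libgps.so.24"

# Truthful type: the link is recreated in good/.
if copy_as symlink "$work/src/libgps.so.24" "$work/good"; then good_rc=0; else good_rc=1; fi
# Wrong type for the very same path: fails the way the build did.
if copy_as regular "$work/src/libgps.so.24" "$work/bad"; then bad_rc=0; else bad_rc=1; fi
echo "good=$good_rc bad=$bad_rc"   # prints: good=0 bad=1
```

Feeding the wrong file type into the otherwise correct logic is enough to turn a plain symlink into an ELOOP error, which matches the bad trace.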

It seems to me that it is a symlink on disk, but not according to the fakeroot stat.
Because of that, the code takes the wrong path, which eventually breaks.
This feels even more like a broken fakeroot to me, but I'm not sure what to do about it.
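To test that theory directly, a hypothetical debug helper could compare the file type seen through libfakeroot with what the kernel reports. It assumes fakeroot is injected via `LD_PRELOAD` and that GNU `stat` and `env` are available; unsetting `LD_PRELOAD` for a single child process bypasses the interposition. `compare_types` is a made-up name.

```shell
#!/bin/sh
# Hypothetical debug helper: compare the file type reported through
# libfakeroot (when preloaded) with what the kernel reports directly.
# Intended to be called inside the fakeroot session, e.g. from debian/rules.
compare_types() {
    for f in "$@"; do
        # Goes through libfakeroot if LD_PRELOAD contains it.
        seen=$(LC_ALL=C stat -c %F "$f")
        # env -u drops the preload for this one child, bypassing libfakeroot.
        real=$(LC_ALL=C env -u LD_PRELOAD stat -c %F "$f")
        if [ "$seen" = "$real" ]; then
            echo "ok       $f: $real"
        else
            echo "MISMATCH $f: fakeroot says '$seen', kernel says '$real'"
        fi
    done
}
```

Calling it on the libgps.so* files right before dh_install would show whether libgps.so.24 is reported as a regular file only through fakeroot.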

There is a new fakeroot, released just recently, which has already landed in Debian and is in -proposed in Ubuntu.
I was already using that new one, so as an experiment I disabled it and built against the former 1.22-2 version of fakeroot; that makes no difference.

BTW, thanks @apw for helping me read most of that strace output!

Sadly I still don't know what to do about it now :-/

Build logs with more debug:
bad
good

@cpaelzer
Collaborator Author

I think we are good. The remaining build issue is odd and makes no sense; if anything, it looks like a fakeroot fix is needed on the Ubuntu builders.

@bzed, so what are the next steps now?
I'd assume:

  1. you merge my fixups for the tests in this MP
  2. you upload it to Debian experimental to check whether the build works there
  3. if OK, upload to unstable; otherwise, let's continue debugging

Or is there anything in this MP that I need to fix for you?

@cpaelzer
Collaborator Author

Actually, I think I came up with a quirk for the build issue.
Not "beautiful", but it does the job for now.
With every later upload we can try whether it works without the quirk.

I'll push the fix to this MP after another round of test builds (once I've cleaned up all my debug changes).

P.S. I also have a new theory about what causes all of this: the multi-scons build (one per Python version) ends up "make installing" files multiple times, which is uncommon and might be related.

…ME with a quirk to avoid issues by broken fakeroot

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
@cpaelzer
Collaborator Author

OK, the quirk works fine on all architectures for me; give it a check and let me know what you think.
I pushed it to the branch for this MP, so it should show up here any minute.

@cpaelzer
Collaborator Author

The test failure is a wget issue unrelated to this PR.
The build failure didn't happen when I submitted it; I rebuilt in sbuild today and it built fine.

So I wanted to ping you about merging and uploading this, if that is OK with you?

@bzed bzed merged commit d8942ba into master Dec 11, 2018
2 participants