Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[41] ARM kernel support #86

Merged
merged 8 commits into from
Apr 11, 2014
Merged

[41] ARM kernel support #86

merged 8 commits into from
Apr 11, 2014

Conversation

dkogan
Copy link
Contributor

@dkogan dkogan commented Apr 8, 2014

These patches allow sysdig to work on ARM.

… to do

The kernel knows its configuration, and will pick the correct unistd.h
This is a valid assumption for x86, but not for many other arches (per-EABI ARM,
MIPS, etc).
Previously we were looking at __x86_64__ to see if we had that system call. Now
I directly look at __NR_socketcall, which is more portable
This is described in syscall(2). Some syscalls take 64-bit arguments. On
arches that have 64-bit registers, these arguments are shipped in a register.
On 32-bit arches, however, these are split between two consecutive registers,
with some alignment requirements. Some require an odd/even pair while some
others require even/odd. For now I assume they all do what x86_32 does, and
we can handle the rest when we port those.
…scalls

There were some __x86_64__ checks whose purpose wasn't entirely clear. Mostly I
THINK they were meant to separate the syscalls that existed on each
architecture, but this wasn't done perfectly: there were syscalls that existed,
but were masked out by the #ifdef. This patch removes the check for x86_64, and
checks each syscall for existence individually.

It may be a good idea to do this across the board for ALL the syscalls. The C
preprocessor can't quite do that I don't think, but we can generate the header
we want.
The kernel driver was making sure that mmap( "/dev/sysdig0" ) was getting
userspace memory with the (vma->vm_flags & VM_EXEC) flag clear. For whatever
reason this is not true on ARM: VM_EXEC is set there. I can't tell why yet, but
things work fine after removing this check, and it seems benign enough.
The sysdig kernel driver is mostly useless without syscall tracepoints. On ARM
these were added relatively recently: in Linux 3.7. This patch explicitly checks
this support exists, and refuses to build if it isn't
@dkogan dkogan mentioned this pull request Apr 8, 2014
@dkogan
Copy link
Contributor Author

dkogan commented Apr 8, 2014

Bug: #41

@gianlucaborello
Copy link
Contributor

Wow, awesome! We'll review the code asap.

Meanwhile, can someone try this on a real Raspberry Pi? If it works (just a basic live capture smoke test) then we'll buy one in Draios and hook it up to Jenkins so we can have apt-get builds and run against our thorough test suite.

Thanks!

@kristopolous kristopolous changed the title ARM kernel support [41] ARM kernel support Apr 9, 2014
@ret2libc
Copy link
Contributor

ret2libc commented Apr 9, 2014

I have tried this on a real raspberry and it works. It compiles and i can do, without problems, a basic live capture

gianlucaborello added a commit that referenced this pull request Apr 11, 2014
[41] ARM kernel support
@gianlucaborello gianlucaborello merged commit c41f874 into draios:master Apr 11, 2014
@gianlucaborello
Copy link
Contributor

It looks good, thanks a lot for this!

I'll run this on our jenkins server overnight that tests a bunch of different kernels and see if something broke, but I doubt.

Hopefully we'll find the time to setup a raspberry where we can run the full build & test cycle on it at every commit.

@gianlucaborello
Copy link
Contributor

@dkogan found this, please let me know if you don't agree with the fix: ca9691b

@dkogan
Copy link
Contributor Author

dkogan commented Apr 12, 2014

Ah yes, that makes sense. I did make sure that x86_32 builds (which it does for me, actually), but haven't tried to run it.

@dkogan
Copy link
Contributor Author

dkogan commented Apr 13, 2014

@gianlucaborello actually this issue you've found would affect x86_32 AND arm. I don't see build issues for either (building the v3.14 tag), and the arm code does generally work. But your patch is 100% right, so I'm wondering if we haven't tested the syscalls that specifically would use the split 64-bit parameters. I did test it, but this bug indicates that I clearly didn't do it right. I'll try again in a bit. In the meantime, any particular methods yall used to test that?

@gianlucaborello
Copy link
Contributor

@dkogan Yeah, the driver compiles (wrong word in my previous comment), but, as you guessed, some system calls won't work.
In our test, this code is called from userspace:

pwrite(fd, "buffer", sizeof("buffer") - 1, 4);
pwrite(fd, "buffer", sizeof("buffer") - 1, 987654321987654);

And then, the test framework waits until the two events come back, and checks wether the arguments captured by the driver are the same as the ones passed.
Under x86, the second off_t doesn't match.

I totally understand that this kind of things are a pain to find, that's why we have this nice test framework that tests every system call on a bunch of different kernels, both x86 and x86_64. Unfortunately we can't release it yet (it will definitely happen asap), but in the meantime I'll promptly report if something breaks (all these tests are run nightly against the master branch).

@dkogan
Copy link
Contributor Author

dkogan commented Apr 13, 2014

Gianluca Borello notifications@github.com writes:

@dkogan Yeah, the driver compiles (wrong word in my previous comment), but, as you guessed, some system calls won't work.
In our test, this code is called from userspace:

pwrite(fd, "buffer", sizeof("buffer") - 1, 4);
pwrite(fd, "buffer", sizeof("buffer") - 1, 987654321987654);

I checked those, but clearly didn't do it right. Sorry about that.

I totally understand that this kind of things are a pain to find, that's why we have this nice test framework that tests every system call on a bunch of different kernels, both x86 and x86_64. Unfortunately we can't release it yet (it will definitely happen asap), but in the meantime I'll promptly report if something breaks (all these tests are run nightly against the master branch).

OK. Since you have a tests suite, I'll wait for you to report any
failures before doing any work on this. Thanks much.

@gianlucaborello
Copy link
Contributor

@dkogan I managed to run a Raspbian under QEMU (which was a big pain in the ass) and started running our regression tests on arm as well. There are a bunch of issues but I think the majority are in the tests themselves and not a porting issues, will go through them slowly every time I have a few spare minutes.

To start, I believe the order of registers in merge_64 must be swapped for arm otherwise off_t won't work in a case like:

pwrite(fd, "test", sizeof("test") - 1, 4);

Note that the code as it is now works fine on x86.

I was very short of time so I didn't dig into this too much, but since your knowledge seems far ahead of mine, let me know what you think.

Thanks!

@dkogan
Copy link
Contributor Author

dkogan commented Apr 15, 2014

Gianluca Borello notifications@github.com writes:

@dkogan I managed to run a Raspbian under QEMU (which was a big
pain in the ass) and started running our regression tests on arm as
well. There are a bunch of issues but I think the majority are in the
tests themselves and not a porting issues, will go through them slowly
every time I have a few spare minutes.

I wrote up basic qemu instructions some days ago:

http://notes.secretsauce.net/notes/2014/04/07_running-qemu-with-a-custom-kernel-on-arm.html

Probably should have shared those earlier; sorry.

To start, I believe the order of registers in merge_64 must be
swapped for arm otherwise off_t won't work in a case like:

pwrite(fd, "test", sizeof("test") - 1, 4);

Note that the code as it is now works fine on x86.

Right. I was going to test it to tell me if the order is backwards or
not. I thought I did, but I guess I didn't.

I was very short of time so I didn't dig into this too much, but since
your knowledge seems far ahead of mine, let me know what you think.

Yep. Give me a day or two.

dima

@gianlucaborello
Copy link
Contributor

On Mon, Apr 14, 2014 at 7:00 PM, Dima Kogan notifications@github.comwrote:

I wrote up basic qemu instructions some days ago:

http://notes.secretsauce.net/notes/2014/04/07_running-qemu-with-a-custom-kernel-on-arm.html

That's a good article: I pretty much did your same steps, but used the
"official" Raspbian image and the kernel from
https://github.com/raspberrypi/linux

@dkogan
Copy link
Contributor Author

dkogan commented Apr 15, 2014

Gianluca Borello notifications@github.com writes:

I was very short of time so I didn't dig into this too much, but since
your knowledge seems far ahead of mine, let me know what you think.

OK. I looked at it, and there's a patch in the same branch as before.
ARM EABI has alignment requirements, so we needed a register of padding.
This is mentioned in the syscall(2) manpage. This manpage lists the
syscalls that need the 64-bit split logic. Sysdig only implements pread
and pwrite of those and it'll probably be good to implement the others
at some point.

My test file:


#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/zero", O_RDONLY);
    off_t offset = 0x1122334455667788LL;
    char buf[4];
    pread(fd, buf, sizeof(buf), offset);
    ftruncate(fd, offset);
    return 0;
}

This needs to be compiled with -D_FILE_OFFSET_BITS=64 to have the 64-bit
offsets. With the patch, I see the full 64-bit offset in sysdig. Sysdig
sees the ftruncate() syscall, but does not interpret it in any way,
since there's no support for it. This was just my check

@gianlucaborello
Copy link
Contributor

Wow, thanks a lot for finding this, just merged the patch, I confirm it worked.
Next one failing is pwritev :)

pwritev(fd, wv, wv_count, 132456789012345LL);

offset returned from libscap seems to be 132452496472183.
I compiled with -D_FILE_OFFSET_BITS=64.

@dkogan
Copy link
Contributor Author

dkogan commented Apr 15, 2014

Gianluca Borello notifications@github.com writes:

Wow, thanks a lot for finding this, just merged the patch, I confirm it worked.
Next one failing is pwritev :)

pwritev(fd, wv, wv_count, 132456789012345LL);

offset returned from libscap seems to be 132452496472183.
I compiled with -D_FILE_OFFSET_BITS=64.

Thanks much for testing. I did what I should've done before, and wrote a
more full test program. Here it is:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>

int main(void)
{
    const off_t offset = 1234567890123456789LL;
    char buf[4];

    int fd_zero = open("/dev/zero", O_RDONLY);
    pread (fd_zero, buf, sizeof(buf), offset);
    preadv(fd_zero,
           &(struct iovec){ .iov_base = buf,
                   .iov_len = sizeof(buf)},
           1, offset );

    int fd_null = open("/dev/null", O_WRONLY);
    pwrite(fd_null, buf, sizeof(buf), offset);
    pwritev(fd_null,
            &(struct iovec){.iov_base = buf, .iov_len = sizeof(buf)},
            1, offset );

    return 0;
}

This tests pread, pwrite, preadv and pwritev. I also changed the patch
in that same tree to work correctly with this test program. Oddly, to
make it work I had to remove the register padding from the preadv and
pwritev parsers. So the ABI doesn't appear to be work the way I thought
it worked. This is alarming, and I'll look into it, but whatever the
kernel is actually doing is "right" by definition, so as far as sysdig
is concerned, the patch in my tree should be good.

dima

@dkogan
Copy link
Contributor Author

dkogan commented Apr 15, 2014

Oh, and strace apparently has this bug too. If I strace that test program, I see this:

pread(3, "\0\0\0\0", 4, 1234567890123456789) = 4
preadv(3, [{"\0\0\0\0", 4}], 1, 4582412532) = 4
open("/dev/null", O_WRONLY|O_LARGEFILE) = 4
pwrite(4, "\0\0\0\0", 4, 1234567890123456789) = 4
pwritev(4, [{"\0\0\0\0", 4}], 1, 4582412532) = 4

The 45824... values are what you get if you DO align the arguments. Something is odd.

@gianlucaborello
Copy link
Contributor

Awesome, that worked, just merged.

Now I see a bunch of errors with some esoteric clone() flags combination, and also some other stuff with sockets and rlimits, but will deal with those later, stay tuned for more crap, and thanks a lot for contributing!

@dkogan
Copy link
Contributor Author

dkogan commented Apr 16, 2014

Gianluca Borello notifications@github.com writes:

Wow, thanks a lot for finding this, just merged the patch, I confirm it worked.
Next one failing is pwritev :)

pwritev(fd, wv, wv_count, 132456789012345LL);

offset returned from libscap seems to be 132452496472183.
I compiled with -D_FILE_OFFSET_BITS=64.

I figured out why the alignment wasn't required for preadv/pwritev.
There's a comment-full patch in my tree that'd be good to merge. A
writeup:

http://notes.secretsauce.net/notes/2014/04/16_argument-alignment-in-linux-system-calls.html

@gianlucaborello
Copy link
Contributor

On Wed, Apr 16, 2014 at 2:38 AM, Dima Kogan notifications@github.comwrote:

I figured out why the alignment wasn't required for preadv/pwritev.
There's a comment-full patch in my tree that'd be good to merge. A
writeup:

http://notes.secretsauce.net/notes/2014/04/16_argument-alignment-in-linux-system-calls.html

That was a very interesting reading, thanks a lot for the troubleshooting,
just merged your comments in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants