Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getdents64() EFAULT - possibly struct linux_dirent alignment related #1769

Closed
therealkenc opened this issue Mar 10, 2017 · 12 comments
Closed

Comments

@therealkenc
Copy link
Collaborator

therealkenc commented Mar 10, 2017

Posting this only because it has been driving me batty for a while now. I think might be a bug in the code not WSL, but I can't identify the issue or I would finger Google. It causes a DCHECK in this line of code. The test case below works on Real Linux, fails on WSL. If you comment out the const int foo_ line in the DirReaderLinux struct it passes on WSL. Alignment maybe??? Nah... Stumped. 🤷

g++ test-getdents64.cc -o test-getdents64 && ./test-getdents64

/* test-getdents64.cc */
#include <sys/syscall.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <iostream>

using namespace std;

struct DirReaderLinux {
  DirReaderLinux() {
    cout << "DirReaderLinux constructor\n";
    memset(buf_, 0, sizeof(buf_));
  }

  const int foo_ = 0; // doesn't do anything but comment out and no fail
  unsigned char buf_[512]; // __attribute__ ((aligned (8)));
};

int main(int argc, const char* argv[])
{
  DirReaderLinux reader;
  int fd = open(".", O_RDONLY | O_DIRECTORY);
  const int ret = syscall(__NR_getdents64, fd, reader.buf_, sizeof(reader.buf_));
  char errbuf[128];
  cout << "ret: " << ret << " " << strerror_r(errno, errbuf, 128) << "\n";
  return 0;
}

strace like the title says.

write(1, "DirReaderLinux constructor\n", 27) = 27
open(".", O_RDONLY|O_DIRECTORY)         = 3
getdents64(3, 0x7ffffcf0d4a4, 512)      = -1 EFAULT (Bad address)
write(1, "ret: -1 Bad address\n", 20)   = 20
exit_group(0)                           = ?
+++ exited with 0 +++
@aseering
Copy link
Contributor

Just to verify -- this fails when the foo_ definition is there? Or when it's commented out?

Unless I'm mistaken, isn't the above getdents64() call asking the kernel to write a uint64_t at an offset of 4 mod 8 bytes? My x86 assembly is quite rusty, but as I recall, there are instructions that will let you do that and otherwise-equivalent (but maybe sometimes faster on some specific chips?) instructions that won't.

@aseering
Copy link
Contributor

Also, to test the alignment theory -- is it possible to hit this error if you declare your array as an array of linux_dirents, rather than an array of char? I believe the C standard places stronger alignment requirements on the former.

@therealkenc
Copy link
Collaborator Author

therealkenc commented Mar 10, 2017

If you comment it out it passes. I.e. it fails as given above. In the real code that 'foo_' is a file descriptor.

But yeah it looks like it is an alignment thing. I just updated the code with a (commented out) __attribute__ ((aligned (8))) which, if you use, will also cause it to pass on WSL. I thought x86 (and amd64) were more forgiving about such matters (at a performance cost), with the exception of some atomicity guarantees; but you can call me "rusty" too. Anyway it isn't an ABI requirement the pointer be 8 byte aligned, so this is a really really obscure WSL 'bug'. Canonical examples of getdents64() use a char[] buffer because you're iterating through a variable length structure (depending on the d_name length). Appreciate you giving it the glance.

@therealkenc therealkenc changed the title unexplained getdents64() EFAULT getdents64() EFAULT - possibly struct linux_dirent alignment related Mar 11, 2017
@benhillis benhillis added the bug label Mar 15, 2017
@benhillis
Copy link
Member

benhillis commented Mar 15, 2017

Thanks for reporting this issue @therealkenc! You're absolutely right about alignment, looks like we are enforcing too strict of an alignment requirement on the user buffer in the getdents syscalls. I suspect there's a handful of other places this is being done as well.

Was there a scenario this was causing an issue for or just something you noticed when playing around? Trying to gauge the priority of this fix. Thanks again!

@therealkenc
Copy link
Collaborator Author

The scenario is linked in the OP; it's in the chromium source. I've known about it since July (it is item one in "the list"), but didn't post because (a) "chrome" nuff said, (b) there is a trivial source work-around, and mostly (c) I hadn't done a test case to see what was going on. It turns out it isn't even a blocker for running Google branded Chrome releases, because it is just a debug check, and it turns out listing the directory in question (which is /proc/self/fd) isn't critical to operation in the first place. To that extent it isn't any kind of priority. I also kind of wondered whether something like this alignment problem could trip in other places, so thanks for sharing the thought on that.

@benhillis
Copy link
Member

@therealkenc - sorry missed the link in the original post. Good to hear this is only required for debug builds. I've filed a task to look at all of our alignment related checks when reading or writing user buffers.

To set expectations we're getting very close to Creator's Update release and this probably doesn't meet the bar, but we'll get this fixed for future builds.

@therealkenc
Copy link
Collaborator Author

therealkenc commented Mar 15, 2017

Thanks. Totally. Hence the pre-preemptive caveat "posting this only because..." in my OP. I've read the release strategy between the lines in other posts. #1692 is probably a one liner case statement fallthrough and still isn't worth doing for Creators. Being "that guy" who causes the BSOD after millions of people upgrade to Creators probably isn't great for career advancement. Plenty of time to break everything with a re-written memory manager and a win32-side CPL 0 filesystem driver starting in May. 🐱‍👓

@benhillis
Copy link
Member

A fix for this is now in code review.

@benhillis
Copy link
Member

I've checked in a fix for this. The fix generically fixes the lxcore driver from enforcing alignment requirements on user buffers that are not enforced in the native Linux kernel. Look for this fix in an Insider build once vNext begins flighting.

@therealkenc
Copy link
Collaborator Author

therealkenc commented Apr 4, 2017

I am thinking this was because of Astoria's roots, which basically means running on ARM, and following the Linux ARM EABI, eh?

So... Make sure the #ifdefs are still there for when WSL is ported to Win10 phones. 🤣

@jackchammons jackchammons removed the bug label Apr 11, 2017
@therealkenc
Copy link
Collaborator Author

therealkenc commented Apr 29, 2017

Relax alignment requirements in lxcore [GH 1794]

Confirmed fixed in 16184. Appreciated. For what it is worth, this brings chromium down to only two remaining blockers (notwithstanding udev, which is IMO Google's bug).

@benhillis
Copy link
Member

Glad to hear it! Maybe this will fix a couple other obscure bugs too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants