Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange error when running specific Node.js application #940

Open
miha-plesko opened this issue Jan 24, 2018 · 12 comments
Open

Strange error when running specific Node.js application #940

miha-plesko opened this issue Jan 24, 2018 · 12 comments

Comments

@miha-plesko
Copy link

I'm running a Node.js 6.10.2 application and unikernel crashes like this:

OSv v0.24-472-gf240a59
eth0: 192.168.122.15
Aborted

[backtrace]
0x000000000045237a <osv::generate_signal(siginfo&, exception_frame*)+58>
0x00000000004523ea <osv::handle_mmap_fault(unsigned long, int, exception_frame*)+26>
0x000000000033073a <mmu::vm_fault(unsigned long, exception_frame*)+330>
0x0000000000395bd7 <page_fault+119>
0x0000000000394a26 <???+3754534>
0x0000100001d5955f <icu_58::UnicodeSet::add(int, int)+127>
0x0000100001364bed <icu_58::CollationDataReader::read(icu_58::CollationTailoring const*, unsigned char const*, int, icu_58::CollationTailoring&, UErrorCode&)+2093>
0x00001000012977a9 <icu_58::CollationRoot::load(UErrorCode&)+233>
0x0000100001297897 <icu_58::CollationRoot::getRootCacheEntry(UErrorCode&)+87>
0x000010000136378e <icu_58::CollationLoader::loadTailoring(icu_58::Locale const&, UErrorCode&)+46>
0x000010000133d9b4 <icu_58::Collator::makeInstance(icu_58::Locale const&, UErrorCode&)+20>
0x000010000133da46 <icu_58::Collator::createInstance(icu_58::Locale const&, UErrorCode&)+38>
0x0000100001956c55 <???+26569813>
0x000010000195916d <v8::internal::Collator::InitializeCollator(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>, v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::JSObject>)+349>
0x0000100001af5753 <v8::internal::Runtime_CreateCollator(int, v8::internal::Object**, v8::internal::Isolate*)+1267>
0x0000342b2fa092a6 <???+799052454>
0x0000342b2fcdeeeb <???+802025195>
0x0000342b2fa400e2 <???+799277282>
0x0000342b2fa09894 <???+799053972>
0x0000342b2fa34455 <???+799229013>
0x0000342b2fd3f3af <???+802419631>
0x0000342b2fdc05d9 <???+802948569>
0x0000342b2fb491f3 <???+800362995>
0x0000342b2fcfc631 <???+802145841>
0x0000342b2fb3e451 <???+800318545>
0x0000342b2fb3df7c <???+800317308>
0x0000342b2fb34841 <???+800278593>
0x0000342b2fb8236b <???+800596843>
0x0000342b2fb49ea5 <???+800366245>
0x0000342b2fd25f16 <???+802316054>
0x0000342b2fb491f3 <???+800362995>
0x0000342b2fcfc631 <???+802145841>
0x0000342b2fb3e451 <???+800318545>
0x0000342b2fb3df7c <???+800317308>
0x0000342b2fb34841 <???+800278593>
0x0000342b2fb8236b <???+800596843>
0x0000342b2fb49ea5 <???+800366245>
0x0000342b2fd2506c <???+802312300>
0x0000342b2fb491f3 <???+800362995>
0x0000342b2fcfc631 <???+802145841>
0x0000342b2fb3e451 <???+800318545>
0x0000342b2fb3df7c <???+800317308>
0x0000342b2fb34841 <???+800278593>
0x0000342b2fb8236b <???+800596843>
0x0000342b2fb49ea5 <???+800366245>
0x0000342b2fb4d4e8 <???+800380136>
0x0000342b2fb491f3 <???+800362995>
0x0000342b2fb3f88a <???+800323722>
0x0000342b2fb3e451 <???+800318545>
0x0000342b2fb3df7c <???+800317308>
0x0000342b2fb34841 <???+800278593>
0x0000342b2fb4a172 <???+800366962>
0x0000342b2fb49ea5 <???+800366245>
0x0000342b2fb495e3 <???+800364003>
0x0000342b2fb491f3 <???+800362995>
0x0000342b2fb3f88a <???+800323722>
0x0000342b2fb3e451 <???+800318545>
0x0000342b2fb3df7c <???+800317308>
0x0000342b2fb34841 <???+800278593>
0x0000342b2fb34309 <???+800277257>
0x0000342b2fb3418f <???+800276879>
0x0000342b2fa44955 <???+799295829>
0x0000342b2fa42738 <???+799287096>
0x0000342b2fa3b7a2 <???+799258530>
0x0000342b2fa2508e <???+799166606>
0x00001000018bafc6 <v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*)+198>
0x0000100001609af3 <v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*)+307>
0x0000100001613980 <v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*)+64>
0x00001000011fc74d <node::LoadEnvironment(node::Environment*)+621>
0x00001000011ff10f <node::Start(int, char**)+1247>
0x000000000041b171 <osv::application::run_main(std::string, int, char**)+385>
0x000000000041d988 <osv::application::run_main()+568>
0x000000000020c5a6 <osv::application::main()+86>
0x000000000041dd18 <???+4316440>
0x000000000044d6a5 <???+4511397>
0x00000000003f5477 <thread_main_c+39>
0x00000000003959a5 <???+3758501>
0x00a7000ba2024dff <???+-1576907265>
0x00000000003f4b6f <???+4148079>
0xfb89485354415540 <???+1413567808>

I have no idea what the stacktrace is telling me, @nyh would you be willing to enlighten me? 😄 It feels like there should be something small, but I can't tell what...

@nyh
Copy link
Contributor

nyh commented Jan 25, 2018

Hi Miha, you can get a backtrace with less junk - and more information (function arguments, and so on) using gdb - see explanation in https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb.

It seems the ICU library code (ICU is the library which deals with unicode characters, and the likes) is crashing, but since I'm not at all familiar with it, I really have no idea if it's an ICU bug or some sort of OSv bug that for some reason manfests itself like this.

One thing this could be is a bug in OSv's locale code, which I'm guessing that icu_58::Collator::createInstance is using somehow. I thought I fixed all these bugs long ago (see for example #715, #314) but it's possible some remain (?).

@nyh
Copy link
Contributor

nyh commented Jan 25, 2018

@miha-plesko how can I reproduce this problem? I have zero knowledge on how to run nodejs code.

@wkozaczuk
Copy link
Collaborator

wkozaczuk commented Jan 25, 2018 via email

@miha-plesko
Copy link
Author

Thanks for quick response.

We're trying to run application with Capstan using node 6.10.2 package. However since I realize that it's difficult to debug via Capstan, I tried to prepare an OSv app example for you guys. I'm having troubles uploading symlink files from ROOTFS directory to the unikernel (I thought this issue was resolved long ago...), the scripts/build script fails for some reason like this:

...
Adding /wio/node_modules/fuse-box/node_modules/is-fullwidth-code-point/readme.md...
Adding /wio/node_modules/fuse-box/node_modules/is-fullwidth-code-point/package.json...

ERROR: file upload failed:
[Errno 2] No such file or directory: '/home/miha/git-repos/osv/apps/node-wio-example/ROOTFS/node_modules/react-native/ReactAndroid/src/main/jni/prebuilt/lib/armeabi-v7a/libreactnativejni-stripped.so'

on my machine, but I assume you know how to overcome this so I'm pasting the app anyway.

All you need to do is to include this directory http://x.k00.fr/cyuqg | password: 451848 in your osv/apps and then execute:

$ scripts/build -j6 image=node-wio-example

Finnally, run the unikernel and OSv should crash like described in this issue.

BTW, the original issue that was reported by Capstan user can be seen here: mikelangelo-project/capstan-packages#16

@nyh
Copy link
Contributor

nyh commented Jan 28, 2018

@miha-plesko I tried as you suggested above, and strangely, on my machine (Fedora 27 with gcc 7.2.1) I couldn't even get the "node" package to compile so I didn't get to see the symlink problem or the original bug:

../deps/v8/src/objects-body-descriptors.h:102:20: note:   candidate expects 2 arguments, 1 provided
     IterateBody(obj);
                    ^
make[3]: *** [deps/v8/tools/gyp/v8_base.target.mk:504: /home/nyh/osv.tmp2/osv/apps/node/node-6.10.3/out/Release/obj.target/v8_base/deps/v8/src/heap/scavenger.o] Error 1

Could it be that this older node.js cannot build with newer compilers?

@miha-plesko
Copy link
Author

Interesting. I can confirm that I have following gcc installed:

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609

so the problem may be in your newer version of gcc. I assume it wouldn't help you anything if I share my compiled stuff with you, right?

@nyh
Copy link
Contributor

nyh commented Jan 28, 2018

The compilation error with gcc 7 is indeed a known bug in node.js - see nodejs/node#13574 and the patch
nodejs/node@11c7e01
I'll check if this patch works, so I can get node.js to compile and see the other bugs :-)

@nyh
Copy link
Contributor

nyh commented Jan 28, 2018

@miha-plesko after fixing node's compilation, I can now scripts/build image=node-wio-example successfully (no error messages for symbolic links, I don't know what you saw...).
It didn't work:

 scripts/run.py
OSv v0.24-486-gedaa9dc9
eth0: 192.168.122.15
module.js:471
    throw err;
    ^

Error: Cannot find module '/fuse.web.js'
    at Function.Module._resolveFilename (module.js:469:15)
    at Function.Module._load (module.js:417:25)
    at Module.runMain (module.js:604:10)
    at run (bootstrap_node.js:390:7)
    at startup (bootstrap_node.js:150:9)
    at bootstrap_node.js:505:3

But this is just because the command line in module.py was wrong - I had to change it to "/libnode.so /wio/fuse.web.js".

Then it "works" and crashes exactly like you reported originally. Good (or rather, bad ;-)).

@nyh
Copy link
Contributor

nyh commented Jan 28, 2018

The gdb stack trace is not very different:

#7  0x0000000000329f5b in mmu::vm_fault (addr=addr@entry=35184374185984, 
    ef=ef@entry=0xffff8000049a1068) at core/mmu.cc:1330
#8  0x0000000000387ff9 in page_fault (ef=0xffff8000049a1068)
    at arch/x64/mmu.cc:38
#9  <signal handler called>
#10 0x0000100001c86ff3 in icu_58::UnicodeSet::add(int const*, int, signed char) ()
#11 0x0000100001c871e6 in icu_58::UnicodeSet::add(int, int) ()
#12 0x000010000126538b in icu_58::CollationDataReader::read(icu_58::CollationTailoring const*, unsigned char const*, int, icu_58::CollationTailoring&, UErrorCode&) ()
#13 0x000010000130c5ba in icu_58::CollationRoot::load(UErrorCode&) ()
#14 0x000010000130c6a8 in icu_58::CollationRoot::getRootCacheEntry(UErrorCode&) ()
#15 0x000010000125faf0 in icu_58::CollationLoader::loadTailoring(icu_58::Locale const&, UErrorCode&) ()
#16 0x00001000012bfd05 in icu_58::Collator::makeInstance(icu_58::Locale const&, UErrorCode&) ()
#17 0x00001000012bfd97 in icu_58::Collator::createInstance(icu_58::Locale const&, UErrorCode&) ()
#18 0x00001000018fbd36 in v8::internal::(anonymous namespace)::CreateICUCollator(v8::internal::Isolate*, icu_58::Locale const&, v8::internal::Handle<v8::internal::JSObject>) ()
#19 0x00001000018fdeb0 in v8::internal::Collator::InitializeCollator(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>, v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::JSObject>) ()
#20 0x0000100001a97279 in v8::internal::Runtime_CreateCollator(int, v8::internal::Object**, v8::internal::Isolate*) ()
...
#547 0x000024251a604381 in ?? ()

But one interesting thing is that the fault is at address 35184374185984, i.e., 0x200000200000. This is an mmapped address, and gdb's "osv mmap" shows what it is:

...
0x0000200000100000 0x0000200000101000 [4.0 kB]         flags=p        perm=none
0x0000200000101000 0x0000200000200000 [1020.0 kB]      flags=p        perm=rw  
0x0000200000200000 0x0000200000201000 [4.0 kB]         flags=p        perm=none
0x0000200000201000 0x0000200000204000 [12.0 kB]        flags=p        perm=rw  
...

As you can see 0x0000200000200000 is at a single page deliberately mapped without permissions, to cause this fault, so this is why we got it when this address was touched. It looks like a stack guard. I am guessing the 12 KB stack which follows it is what overflowed - this actually means a 16KB stack was requested, and a 4K guard page was taken off from it (this subtractive behavior is what both Linux and OSv do, despite Posix specifying the guard page should be added to the stack size).
I am guessing that this 12KB stack is simply not enough for OSv, especially when we consider that in OSv, when you call any "system call" the stack may go even deeper (the system call is just a normal function call) - although in this stack trace I don't see any OSv function call - it seems like a really deep call trace, over 500 calls long.

If my analysis is correct, the question becomes why does node.js use a 16 KB stack for this v8 initialization (?) code. Is this even possible? And I also don't know why this problem happens in OSv but not Linux, or why it didn't happen in other node.js applications.

@nyh
Copy link
Contributor

nyh commented Jan 28, 2018

I think I know where the problem is and how to fix this. In src/node.cc there is:

  // Don't shrink the thread's stack on FreeBSD.  Said platform decided to
  // follow the pthreads specification to the letter rather than in spirit:
  // https://lists.freebsd.org/pipermail/freebsd-current/2014-March/048885.html
#ifndef __FreeBSD__
  CHECK_EQ(0, pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN));
#endif  // __FreeBSD__

This PTHREAD_STACK_MIN is 16 KB, and apparently is not enough. When I changed this to PTHREAD_STACK_MIN*17, the crash appears to be gone!

I now get a different error, probably just a configuration error in the files you gave me (?):

Environment is:  devserver
fs.js:641
  return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
                 ^
Error: ENOENT: no such file or directory, open 'package.json'
    at Error (native)
    at Object.fs.openSync (fs.js:641:18)
    at Object.fs.readFileSync (fs.js:509:33)
    at BuildBundle (/wio/src/lib/fuse/fuse.web.js:48:35)
    at Object.<anonymous> (/wio/fuse.web.js:10:3)
    at Module._compile (module.js:570:32)
    at Object.Module._extensions..js (module.js:579:10)
    at Module.load (module.js:487:32)
    at tryModuleLoad (module.js:446:12)
    at Function.Module._load (module.js:438:3)

Anyway, I'm not sure how we'd want to fix this. We can easily add yet another patch to node.js to get it to work on OSv, but this is not pretty (it will not help other people who are using different applications).

I think a better solution is to modify pthread_attr_setstacksize to change a request to set a stack size lower than some OSv-defined minimum, e.g., 64 KB, to this minimum.
The pthread_attr_setstacksize() manual page says that it "determines the minimum size (in bytes) that will be allocated for threads". It sounds like it's fine to allocate MORE than this specified size, and this is what we'll do.

Opinions?

@nyh
Copy link
Contributor

nyh commented Jan 28, 2018

This problem may be more complex than I though... I changed in libc/pthread.cc as follows:

int pthread_attr_setstacksize(pthread_attr_t *attr, size_t stacksize)
{
    // Linux considers the minimum reasonable stack, PTHREAD_STACK_MIN,
    // to be 16K. However, in OSv such a tiny stack would probably not
    // be enough for proper function, considering that OSv's functions
    // use the application's stack as well, unlike Linux's system calls.
    stacksize = std::max(stacksize, (size_t)256*1024);
    from_libc(attr)->stack_size = stacksize;
    return 0;
}

This 256K should be more than enough (much bigger than I would have liked), and yet the code sometimes crashes (as before) and sometimes work, about 50%/50%. I checked with "osv mmap" that indeed the mapping after the crashing address grew to 256K-4K. So we may have another problem, not (or not just) a real stack overflow.

@miha-plesko do you have any idea what the problematic RegisterDebugSignalHandler() code does? I wonder if the stuff it does somehow disagrees with OSv.

@miha-plesko
Copy link
Author

Hi @nyh I've played with this a little further and it seems like app uses some kind of framework for bundling (uglyfying the code to run in production if I put it very simple) called fuse-box that is causing problems. The 50/50 behavior you've observed probably comes from multithreading nature of this framework.

The 50% that crashes with another error (saying it cannot find some files to copy) can be resolved by setting NODE_PATH=/node_modules environment variable in bootcmd. Doing this even those 50% cases will hit to original error, or at least similar one:

Size: 4 MB in 15m 11s 75ms
Aborted

[backtrace]
0x000000000045237a <osv::generate_signal(siginfo&, exception_frame*)+58>
0x00000000004523ea <osv::handle_mmap_fault(unsigned long, int, exception_frame*)+26>
0x000000000033073a <mmu::vm_fault(unsigned long, exception_frame*)+330>
0x0000000000395bd7 <page_fault+119>
0x0000000000394a26 <???+3754534>
0x0000100001d5955f <icu_58::UnicodeSet::add(int, int)+127>
0x0000100001364bed <icu_58::CollationDataReader::read(icu_58::CollationTailoring const*, unsigned char const*, int, icu_58::CollationTailoring&, UErrorCode&)+2093>
0x00001000012977a9 <icu_58::CollationRoot::load(UErrorCode&)+233>
0x0000100001297897 <icu_58::CollationRoot::getRootCacheEntry(UErrorCode&)+87>
0x000010000136378e <icu_58::CollationLoader::loadTailoring(icu_58::Locale const&, UErrorCode&)+46>
0x000010000133d9b4 <icu_58::Collator::makeInstance(icu_58::Locale const&, UErrorCode&)+20>
0x000010000133da46 <icu_58::Collator::createInstance(icu_58::Locale const&, UErrorCode&)+38>
0x0000100001956c55 <???+26569813>
0x000010000195916d <v8::internal::Collator::InitializeCollator(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>, v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::JSObject>)+349>
0x0000100001af5753 <v8::internal::Runtime_CreateCollator(int, v8::internal::Object**, v8::internal::Isolate*)+1267>
0x000005f7f76092a6 <???+-144665946>
0x000005f7f7c0c18b <???+-138362485>
0x000005f7f7c0b7c2 <???+-138364990>
0x000005f7f7609894 <???+-144664428>
0x000005f7f7634455 <???+-144489387>
0x000005f7f7b98a6f <???+-138835345>
0x000005f7f7c06699 <???+-138385767>
0x000005f7f7846873 <???+-142317453>
0x000005f7f7909b91 <???+-141517935>
0x000005f7f773ef31 <???+-143397071>
0x000005f7f773ea5c <???+-143398308>
0x000005f7f7aea6c6 <???+-139548986>
0x000005f7f779d18b <???+-143011445>
0x000005f7f7767245 <???+-143232443>
0x000005f7f7885934 <???+-142059212>
0x000005f7f79a25cd <???+-140892723>
0x000005f7f79965fb <???+-140941829>
0x000005f7f7609894 <???+-144664428>
0x000005f7f78e1549 <???+-141683383>
0x000005f7f798ea4f <???+-140973489>
0x000005f7f7967654 <???+-141134252>
0x000005f7f7967abc <???+-141133124>
0x000005f7f7944fd8 <???+-141275176>
0x000005f7f763458c <???+-144489076>
0x000005f7f7a9df39 <???+-139862215>
0x000005f7f78e10f3 <???+-141684493>
0x000005f7f7609894 <???+-144664428>
0x000005f7f7948aa5 <???+-141260123>
0x000005f7f7948716 <???+-141261034>
0x000005f7f763b7a2 <???+-144459870>
0x000005f7f762508e <???+-144551794>

Here are some links to Node.js source code:
Runtime_CreateCollator()
Collator::InitializeCollator()
but since I'm not familiar at all with locale handling and quite rusty with C++, looking at source code doesn't give me much ideas about what could be wrong... @nyh do you see anything "strange" or "OSv-unfriendly" there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants