Nodejs: web server sample stuck in a epoll_wait loop #512

Closed
rads-1996 opened this issue Jun 28, 2021 · 16 comments
Labels
area/kernel Area: Kernel enhancement New feature or request severity/moderate Severity: Moderate status/triaged Status: Triaged

Comments

@rads-1996
Contributor

Hello,

I am trying to build a confidential app using Nodejs and Mystikos. I am not able to run even a simple Nodejs app.
This is my Nodejs file:

const http = require('http');

const hostname = '127.0.0.1';
const port = 3000;

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end('Hello World');
});

server.listen(port, hostname, () => {
  console.log(`Server running at http://${hostname}:${port}/`);
});

and this is my Dockerfile:

FROM node:11-alpine

WORKDIR /app
COPY app.js /app/app.js

These are the instructions I am using to run it with Mystikos:

myst-appbuilder Dockerfile
myst mkcpio appdir rootfs
myst exec-linux rootfs /usr/local/bin/node /app/app.js

I see this error message:

Segmentation fault (core dumped)

Would appreciate any pointers on how to debug this further.

@paulcallen paulcallen added severity/moderate Severity: Moderate status/triaged Status: Triaged area/kernel Area: Kernel labels Jun 28, 2021
@paulcallen
Member

Try running with --strace and --etrace to see if this is due to some unsupported syscall.
We have not looked at Java or JavaScript yet. The current focus has been on C/C++, .NET, and Python. We will investigate more going forward.

@bodzhang bodzhang added the enhancement New feature or request label Jun 29, 2021
@rads-1996
Contributor Author

Thanks for getting back. I tried running with strace and got this error:

=== SYS_prlimit64(pid=0, resource=3, new_rlim=(nil), old_rlim=0x7f57376d6940): tid=101   
SYS_prlimit64(): return=-EINVAL(-22): tid=101
=== SYS_rt_sigprocmask(how=0 set=0x7f57377bb318 oldset=0x7f57376d6850): tid=101   
SYS_rt_sigprocmask(): return=0(0): tid=101
=== SYS_tkill(tid=101 sig=6): tid=101   
SYS_tkill(): return=0(0): tid=101
=== SYS_rt_sigprocmask(how=2 set=0x7f57376d6850 oldset=(nil)): tid=101   
SYS_rt_sigprocmask(): return=0(0): tid=101
Makefile:23: recipe for target 'run' failed
make: *** [run] Segmentation fault (core dumped)

Fetching prlimit64 for RLIMIT_STACK is throwing an error. Any ideas about what to do next?
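
For readers following the trace, the numeric arguments can be decoded by hand. The sketch below is an illustration only and is not part of Mystikos: the tables and the `describeResource` / `describeSignal` helpers are hypothetical names, and the values assume the standard Linux x86-64 numbering for `RLIMIT_*` resources and signals.

```javascript
// Linux resource numbers as used by getrlimit/prlimit64 (array index = number).
// Assumption: standard Linux x86-64 numbering.
const RLIMIT_NAMES = [
  'RLIMIT_CPU', 'RLIMIT_FSIZE', 'RLIMIT_DATA', 'RLIMIT_STACK',
  'RLIMIT_CORE', 'RLIMIT_RSS', 'RLIMIT_NPROC', 'RLIMIT_NOFILE',
  'RLIMIT_MEMLOCK', 'RLIMIT_AS', 'RLIMIT_LOCKS', 'RLIMIT_SIGPENDING',
  'RLIMIT_MSGQUEUE', 'RLIMIT_NICE', 'RLIMIT_RTPRIO', 'RLIMIT_RTTIME',
];

// A few Linux x86-64 signal numbers that commonly show up in crash traces.
const SIGNAL_NAMES = { 6: 'SIGABRT', 9: 'SIGKILL', 11: 'SIGSEGV', 15: 'SIGTERM' };

// Hypothetical helpers: map a number from the trace to a symbolic name.
function describeResource(n) {
  return RLIMIT_NAMES[n] || `unknown(${n})`;
}

function describeSignal(n) {
  return SIGNAL_NAMES[n] || `unknown(${n})`;
}

console.log(describeResource(3)); // resource=3 in the SYS_prlimit64 line
console.log(describeSignal(6));   // sig=6 in the SYS_tkill line
```

Read this way, the trace suggests that the runtime queried the stack limit, received -EINVAL, and then raised SIGABRT on its own thread through tkill.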

@paulcallen paulcallen changed the title Nodejs segmentation fault Nodejs: missing support in SYS_prlimit64 causes Nodejs to abort process Jun 30, 2021
@bodzhang
Collaborator

The crash looks like a bug in the abort signal (sig=6) handling. @mikbras is looking into it.

The specific parameter that the NodeJS runtime queries using SYS_prlimit64 is not yet supported by the Mystikos implementation. With the latest Node package (changing the Dockerfile to use "FROM node:alpine"), the NodeJS runtime does not treat the failed SYS_prlimit64 query as a fatal error, and I can see the STRACE progress further, but it then fails in a different way.

@rads-1996
Contributor Author

Thanks for looking into this. I have raised a PR to fix SYS_prlimit64 for RLIMIT_STACK. After making this change I have run into another issue. Here is the STRACE around the failure:

=== SYS_mmap(addr=2795d8059000 length=134217728(8000000) prot=0 flags=16418 fd=-1 offset=0): tid=101
=== SYS_epoll_pwait(edpf=9 events=0x7f244664f900 maxevents=1024 timeout=-1 sigmask=0): tid=102
    SYS_mmap(): return=-EINVAL(-22): tid=101
    SYS_epoll_pwait(): return=1(1): tid=102
=== SYS_mmap(addr=2795d8059000 length=134217728(8000000) prot=0 flags=16418 fd=-1 offset=0): tid=101
=== SYS_clock_gettime(clk_id=6 tp=0x7f244664f870): tid=102
    SYS_mmap(): return=-EINVAL(-22): tid=101
    SYS_clock_gettime(): return=0(0): tid=102
=== SYS_epoll_pwait(edpf=9 events=0x7f244664f900 maxevents=1024 timeout=-1 sigmask=0): tid=102
=== SYS_writev(fd=2 iov=0x7f2448a896d0 iovcnt=2): tid=101

#
# Fatal process OOM in CodeRange setup: allocate virtual memory
#

At this point mmap with a requested address is failing and the application hangs.
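
The `flags=16418` value in the trace can be decoded to see what kind of mapping was requested. This is a minimal sketch, assuming the Linux x86-64 flag values from `<sys/mman.h>`; the `MMAP_FLAGS` table and `decodeMmapFlags` helper are hypothetical names introduced for illustration and cover only a handful of flags.

```javascript
// Assumption: Linux x86-64 mmap flag bit values.
const MMAP_FLAGS = {
  MAP_SHARED: 0x01,
  MAP_PRIVATE: 0x02,
  MAP_FIXED: 0x10,
  MAP_ANONYMOUS: 0x20,
  MAP_NORESERVE: 0x4000,
};

// Return the names of all known flag bits that are set in `flags`.
function decodeMmapFlags(flags) {
  return Object.entries(MMAP_FLAGS)
    .filter(([, bit]) => (flags & bit) !== 0)
    .map(([name]) => name);
}

console.log(decodeMmapFlags(16418));
// 16418 === 0x4022 -> [ 'MAP_PRIVATE', 'MAP_ANONYMOUS', 'MAP_NORESERVE' ]
```

Under these assumptions the request carries no MAP_FIXED bit, so `addr` acts as a placement hint rather than a mandatory address. With `prot=0` (PROT_NONE) and a length of 128 MiB, the call looks like a reservation of virtual address space, which is consistent with the "CodeRange setup: allocate virtual memory" message.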

@bodzhang
Collaborator

bodzhang commented Jul 1, 2021

The bug in the abort signal handling flow was fixed by PR #530.

@bodzhang
Collaborator

bodzhang commented Jul 2, 2021

The SYS_mmap() error is due to a limitation in the current Mystikos mmap() implementation: it does not support requests for a specific address. With a modified mmap() implementation and several other changes in my fork, the sample test gets further, but it eventually fails with this NodeJS traceback:

tty.js:87
   throw new ERR_TTY_INIT_FAILED(ctx);
   ^

SystemError [ERR_TTY_INIT_FAILED]: TTY initialization failed: uv_tty_init returned ENOTSUP (operation not supported on socket)
   at new WriteStream (tty.js:87:11)
   at createWritableStdioStream (internal/process/stdio.js:164:16)
   at process.getStdout [as stdout] (internal/process/stdio.js:33:14)
   at Console.get (internal/console/constructor.js:153:38)
   at Console.(anonymous function) (internal/console/constructor.js:276:46)
   at Console.log (internal/console/constructor.js:287:61)
   at Object.<anonymous> (/app/app.js:1:9)
   at Module._compile (internal/modules/cjs/loader.js:816:30)
   at Object.Module._extensions..js (internal/modules/cjs/loader.js:827:10)
   at Module.load (internal/modules/cjs/loader.js:685:32)

@rads-1996
Contributor Author

I tried running Nodejs 11 with your modified mmap implementation and it still shows a failure (segmentation fault).

I then tried running it with Nodejs 9 instead. In this case the program is hanging. This is happening because poll() is failing for /dev/random. I made the following fix and the server is now able to start and process requests:

diff --git a/kernel/ramfs.c b/kernel/ramfs.c
index 83d237c5..6d2fc5f9 100644
--- a/kernel/ramfs.c
+++ b/kernel/ramfs.c
@@ -2292,7 +2292,9 @@ static int _fs_get_events(myst_fs_t* fs, myst_file_t* file)
     if (!_ramfs_valid(ramfs) || !_file_valid(file))
         ERAISE(-EINVAL);
 
-    ret = -ENOTSUP;
+    /* Regular files always poll TRUE for reads and writes */
+    ret |= POLLIN;
+    ret |= POLLOUT;
 
 done:
     return ret;

If this looks fine I can add it to the existing RLIMIT PR #526.

@bodzhang
Collaborator

bodzhang commented Jul 7, 2021

@rads-1996, I think it's better to have a separate PR for /dev/random poll support. BTW, the segmentation fault you encountered is due to the unsupported statx syscall triggering myst_panic(). After I changed the implementation to return ENOSYS, I got the NodeJS traceback with NodeJS 11. I had missed pushing that change to my fork; the ENOSYS change for the statx syscall is now in the fork. Sorry about that.

@rads-1996
Contributor Author

No worries, I will try the fix with Nodejs 11 again. I will submit a separate PR for /dev/random poll support as you suggested.
With your mmap fix and poll support, Nodejs 9 works. Should we add a regression test for this?

@bodzhang
Collaborator

bodzhang commented Jul 9, 2021

The mmap fix to support mapping requests with a preferred address is merged: PR #545.

@bodzhang
Collaborator

@rads-1996, can you review the suggested changes in PR #526?

@rads-1996
Contributor Author

@bodzhang I have updated the PR based on your suggestion. Kindly review.

@bodzhang
Collaborator

PR #526 merged

@rads-1996
Contributor Author

Thanks! I have opened PR #586 for the /dev/random poll issue.
There seems to have been a regression. After this fix the server starts up but is not able to accept requests. It looks like it is stuck in an epoll_wait loop.

@bodzhang bodzhang changed the title Nodejs: missing support in SYS_prlimit64 causes Nodejs to abort process Nodejs: web server sample stuck in a epoll_wait loop Jul 15, 2021
@bodzhang
Collaborator

@rads-1996, the epoll_wait loop issue was addressed by PR #615, and your PR #586 was also merged. Can you add a nodejs_webserver regression test under /solutions? You might want to take a look at /samples/pytorch/Makefile for a reliable mechanism to wait for the webserver to initialize.
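
One way a test could wait for the web server to come up is to poll it with a bounded number of retries. The sketch below is a generic illustration and makes no claim about what /samples/pytorch/Makefile actually does: `httpProbe` and `waitUntilReady` are hypothetical helper names, and the retry count and delay are arbitrary defaults.

```javascript
const http = require('http');

// Resolve true if one GET to `url` receives any HTTP response,
// false on a connection error or timeout.
function httpProbe(url, timeoutMs = 1000) {
  return new Promise((resolve) => {
    const req = http.get(url, { agent: false, timeout: timeoutMs }, (res) => {
      res.resume(); // drain the body so the socket can close
      resolve(true);
    });
    req.on('timeout', () => {
      req.destroy();
      resolve(false);
    });
    req.on('error', () => resolve(false));
  });
}

// Call `probe` until it reports ready or `attempts` is exhausted.
// Resolves with the number of attempts used; rejects if never ready.
async function waitUntilReady(probe, { attempts = 30, delayMs = 500 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    if (await probe()) return i;
    if (i < attempts) await new Promise((r) => setTimeout(r, delayMs));
  }
  throw new Error(`not ready after ${attempts} attempts`);
}

// Example use against the sample server from this issue (not run here):
//   waitUntilReady(() => httpProbe('http://127.0.0.1:3000/'))
//     .then((n) => console.log(`server ready after ${n} attempt(s)`));
```

Taking the probe as a parameter keeps the retry loop independent of HTTP, so the same loop can be exercised in a test with a fake probe. A bounded retry count also means a server stuck in a wait loop shows up as a test failure instead of a hang.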

@rads-1996
Contributor Author

Thank you for the issue resolution. I will submit a PR along with the test case.
Confirming that Nodejs 9 is working now.
