Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wedges system #9

Closed
davidw opened this issue May 14, 2013 · 14 comments
Closed

wedges system #9

davidw opened this issue May 14, 2013 · 14 comments

Comments

@davidw
Copy link

davidw commented May 14, 2013

I'm pretty sure I must be doing something wrong, but this code makes my system hang when I try and run it:

syslog:start(). 
{ok,<0.195.0>}
(xyz@10.0.0.44)2> 
(xyz@10.0.0.44)2> {ok,Log} = syslog:open("erlsyslog", [cons, perror, pid], user).  

At this point it just hangs. I tracked things down in the C code to where it does this:

        nd->port = driver_create_port(port, pid, DRV_NAME, data);

But I have no idea what it locks up there. I suspect I'm doing something wrong, but it'd be nice if I got a gentle message rather than a hung system.

@jlouis
Copy link

jlouis commented May 14, 2013

For the record, I am seeing the same behaviour, R16B OSX Mountain Lion.

@lemenkov
Copy link

Me too. R16B, Linux x86_64.

@jlouis
Copy link

jlouis commented May 14, 2013

I am willing to bet that the problem has to do with locking and how R16B handles ports differently (asynchronously) compared to R15B. I'll probably take a deeper look into the code tomorrow.

@jlouis
Copy link

jlouis commented May 14, 2013

Confirmed that it works in OTP_R15B02-17-g1a4e8f6 so it is something new which has made the problem appear. There are many port changes in R16B, so perhaps thats it.

@jlouis
Copy link

jlouis commented May 14, 2013

A debug version of the Erlang VM seems to have caught something:

3> {ok,Log} = syslog:open("erlsyslog", [cons, perror, pid], user).
Locking 'driver_lock:syslog_drv[mutex]' lock which is already locked by thread!
Currently these locks are locked by the scheduler 2 thread:
  'driver_lock:syslog_drv[mutex]'

Program received signal SIGABRT, Aborted.
[Switching to process 30647 thread 0x2103]
0x00007fff9226c212 in __pthread_kill ()

@Vagabond
Copy link
Owner

Paging doctor @vinoski.

@vinoski
Copy link
Contributor

vinoski commented May 14, 2013

Looking into it.

@jlouis
Copy link

jlouis commented May 15, 2013

Ok, a backtrace shows the problem occurs in io.c:827 where we call create_port. Now this call will try to take the driver_lock in io.c:344, but this is trouble! The erl_driver already runs in the DRIVER LOCK mode (opposed to PORT LOCK mode). This means that then our syslogdrv_control executes, we are holding that driver lock already. This leads to a double-lock which hangs the VM.

The inet_drv uses the same style where a port driver is created from its control callback, but that driver uses PORT LEVEL LOCKING and thus it seems to avoid it since the port has the lock instead and that is not a problem to grab. The question is if it is safe to enable PORT LOCKING for this driver as well.

So the question is really if it is safe to have multiple callers executing inside the syslog_drv.c code.

lemenkov added a commit to lemenkov/erlsyslog that referenced this issue May 16, 2013
See this thread for further details (as well as nice explanation from @jlouis):

Vagabond/erlang-syslog#9

Signed-off-by: Peter Lemenkov <lemenkov@gmail.com>
@jlouis
Copy link

jlouis commented May 16, 2013

I have been looking tinto this some more. There are concerns with the fix @lemenkov proposed I think. The concern is that openlog() and closelog() can be called with a race I think. They are not re-entrant, and I am not sure we have access to the reentrant versions, openlog_r() and closelog_r(). So I think we may open up a potential race between multiple schedulers trying to execute in the code section.

One possible solution is to protect the necessary syslog calls by a mutex, so only one scheduler is able to operate on the call itself at a time. This would mend the race window, but the change is a bit more involved then to fix.

@vinoski
Copy link
Contributor

vinoski commented May 16, 2013

I have a completely different fix that refactors the code to move the port creation out of the C code up into Erlang instead. I think it simplifies everything.

vinoski added a commit to vinoski/erlang-syslog that referenced this issue May 16, 2013
With R16B changes related to locking within the area of port drivers seemed
to result in the syslog driver hanging on the driver_create_port call. To
avoid this, move port creation and closing from C into Erlang. Rework the
driver control interface to simplify the open call to just the setting of
the logopt and facility on a new port. Remove the opening and closing of a
port in the gen_server init and terminate as that port is no longer needed
to communicate with the driver. Move the open call out of the gen_server
and do it in the caller's process instead.
@jlouis
Copy link

jlouis commented May 16, 2013

I like your fix better than mine. It is a move in a direction I like more than introducing more mutexes.

@davidw
Copy link
Author

davidw commented May 21, 2013

Just thought I'd mention here as well that @vinoski 's code appears to work for me, too, although now I'm a bit gunshy of using a compiled in port...

@jlouis
Copy link

jlouis commented May 21, 2013

I am running it in production as well and it looks like it works fine.

Vagabond added a commit that referenced this issue Jun 17, 2013
move port creation to Erlang to fix issue #9
@davidw
Copy link
Author

davidw commented Dec 17, 2014

Thanks to @vinoski this has been fixed, it seems.

@davidw davidw closed this as completed Dec 17, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants