Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout doesn't work with pathological child process--possible deadlock? #27

Closed
acruise opened this issue Oct 20, 2019 · 3 comments
Closed

Comments

@acruise
Copy link

acruise commented Oct 20, 2019

Hi there, I'm using os-lib_2.13:0.3.0 in an automated student assignment evaluation system... We need to compile their C code and run the binaries to check the output, and sometimes they're... bad.

I have a particular student program that hangs early in its execution--I can fix the program, but that's not the point--students' programs will be doing all kinds of damage, and I need os.proc to kill them if they hang. :)

The last two lines of the strace of this student program are:

fstat(1, {st_dev=makedev(0, 0x18), st_ino=3, st_mode=S_IFCHR|0600, st_nlink=1, st_uid=1000, st_gid=5, st_blksize=1024, st_blocks=0, st_rdev=makedev(0x88, 0), st_atime=1571542425 /* 2019-10-19T20:33:45.542742043-0700 */, st_atime_nsec=542742043, st_mtime=1571542425 /* 2019-10-19T20:33:45.542742043-0700 */, st_mtime_nsec=542742043, st_ctime=1571530834 /* 2019-10-19T17:20:34.582741621-0700 */, st_ctime_nsec=582741621}) = 0
futex(0xa01490, FUTEX_WAIT_PRIVATE, 2, NULL

When I run this program as a subprocess in os.proc, there are three threads that are locked forever, here's the top of their stack traces:

main thread:

"main" #1 prio=5 os_prio=0 tid=0x00007ff7dc04e800 nid=0x678c in Object.wait() [0x00007ff7e2876000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x000000071f35bc08> (a java.lang.UNIXProcess)
	at java.lang.Object.wait(Object.java:502)
	at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
	- locked <0x000000071f35bc08> (a java.lang.UNIXProcess)
	at os.SubProcess.waitFor(SubProcess.scala:56)
	at os.proc.run$1(ProcessOps.scala:182)
	at os.proc.stream(ProcessOps.scala:186)
	at os.proc.call(ProcessOps.scala:70)

one of the I/O threads:

"Thread-12" #24 prio=5 os_prio=0 tid=0x00007ff7dd417000 nid=0x67c8 runnable [0x00007ff7bdffb000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	- locked <0x000000071f35df28> (a java.lang.UNIXProcess$ProcessPipeInputStream)
	at os.SubProcess$OutputStream.read(SubProcess.scala:177)
	- locked <0x000000071f362260> (a os.SubProcess$OutputStream)
	at os.SubProcess$OutputStream.read(SubProcess.scala:165)
	- locked <0x000000071f362260> (a os.SubProcess$OutputStream)
	at os.Internals$.transfer0(Internals.scala:17)
	at os.proc$$anon$2.run(ProcessOps.scala:132)
	at java.lang.Thread.run(Thread.java:748)

the other I/O thread:

"Thread-11" #23 prio=5 os_prio=0 tid=0x00007ff7dd415000 nid=0x67c7 runnable [0x00007ff7bdefa000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	- locked <0x000000071f35be68> (a java.lang.UNIXProcess$ProcessPipeInputStream)
	at os.SubProcess$OutputStream.read(SubProcess.scala:177)
	- locked <0x000000071f3600e0> (a os.SubProcess$OutputStream)
	at os.SubProcess$OutputStream.read(SubProcess.scala:165)
	- locked <0x000000071f3600e0> (a os.SubProcess$OutputStream)
	at os.Internals$.transfer0(Internals.scala:17)
	at os.proc$$anon$1.run(ProcessOps.scala:122)
	at java.lang.Thread.run(Thread.java:748)

this is the bit of my code that's running the subprocess:

  private def run_!(command: List[String], cwd: Option[File]): (Int, ByteString, ByteString) = {
    val proc = os.proc(command)

    val result = proc.call(
      cwd = cwd.map(os.Path(_)).orNull,
      timeout=5000L,
      check=false
    )

    (result.exitCode, ByteString(result.out.bytes), ByteString(result.err.bytes))
  }
@acruise
Copy link
Author

acruise commented Oct 20, 2019

Much as it pains me to have to say this: commons-exec works fine here; it kills even the most stubborn subprocesses after the timeout elapses.

  private def run_!(command: List[String], cwd: Option[File]): (Int, ByteString, ByteString) = {
    val out = new ByteArrayOutputStream()
    val err = new ByteArrayOutputStream()

    val exec = new DefaultExecutor()
    cwd.foreach(exec.setWorkingDirectory)
    exec.setWatchdog(new ExecuteWatchdog(5000L))
    exec.setStreamHandler(new PumpStreamHandler(out, err))
    exec.setExitValues(null)

    val cmd = new CommandLine(command.head)
    cmd.addArguments(command.tail.toArray)
    val exit = exec.execute(cmd)
    val stdout = out.toByteArray
    val stderr = err.toByteArray

    (exit, ByteString(stdout), ByteString(stderr))
  }

@lihaoyi-databricks
Copy link
Collaborator

Wouldn't be surprised if there was some misbehavior in there. As a workaround, you could probably use os.proc.spawn, Thread.sleep, and .destroy

@lihaoyi-databricks
Copy link
Collaborator

should be fixed in master I think

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants