ClojureScript tests regularly hang/time out #578

Closed
5 tasks
plexus opened this issue Dec 27, 2018 · 11 comments

Contributor

plexus commented Dec 27, 2018

This is a spin-off of #569. I've managed to find and fix all the "real" test failures. That leaves one more issue: regularly (about one time out of two) a ClojureScript build stops in the middle of a test and hangs until the build times out after ten minutes.

Examples: (I'll add more links as I come across them)

This has been observed on different Java versions (8, 11) and ClojureScript versions (1.9, 1.10), so it seems to be a general problem, probably some kind of race condition. My hunch is that a message is being sent to the ClojureScript environment before it is ready to receive it, so a reply never arrives.

This is a pain, especially since I'm not sure whether it will be reproducible locally (or consistently). Still, here are some ideas that could help pinpoint the issue.

  • Add a flag to generate output similar to *nrepl-messages*, so you can see which messages go back and forth during testing
  • Find all places where exceptions are being swallowed, and make sure that at least during testing they get printed out (I came across one or two of these already)
  • Start a background shell in the Makefile that waits five minutes and then calls jstack, so we can see where it's blocking
  • Print test var names as they run, not just namespaces, so we can see which tests in particular suffer from this
  • Review the code path of these tests: where are we waiting for an async response? What can cause it not to arrive?
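
For the stack-dump idea, an in-process variant is also possible on the JVM. The sketch below is not the jstack approach from the list: it swaps in Thread.getAllStackTraces() called from a daemon thread. The class name HangWatchdog, its method names, and the timeout values are made up for illustration.

```java
import java.util.Map;

final class HangWatchdog {
    /** Formats the name, state, and stack of every live thread in this JVM. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            out.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    /**
     * Starts a daemon thread that prints a thread dump to stderr once
     * timeoutMillis has elapsed. Interrupt the returned thread to disarm it.
     */
    static Thread arm(long timeoutMillis) {
        Thread watchdog = new Thread(() -> {
            try {
                Thread.sleep(timeoutMillis);
                System.err.println("Still running after " + timeoutMillis + " ms; thread dump:");
                System.err.println(dumpAllThreads());
            } catch (InterruptedException e) {
                // Disarmed: the work finished in time, so print nothing.
            }
        }, "hang-watchdog");
        watchdog.setDaemon(true); // never keeps the JVM alive on its own
        watchdog.start();
        return watchdog;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread watchdog = arm(100); // placeholder timeout
        Thread.sleep(300);          // stands in for a test run that hangs
        watchdog.interrupt();       // no effect here, the dump was already printed
    }
}
```

A test runner would call arm(...) before the suite starts and interrupt the returned thread when the suite finishes. Unlike jstack, this only works if the hanging JVM is the one running the watchdog.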
Contributor Author

plexus commented Dec 27, 2018

Relatedly: is there an opportunity to simplify the ClojureScript test setup as @arichiardi hinted at in #555?

Member

bbatsov commented Dec 27, 2018

> Relatedly: is there an opportunity to simplify the ClojureScript test setup as @arichiardi hinted at in #555?

Isn't he asking for ClojureScript support in cider-test?

Member

bbatsov commented Dec 27, 2018

> This has been observed on different Java versions (8, 11) and ClojureScript versions (1.9, 1.10), so it seems it's a general problem, some kind of race condition. My hunch is that a message is being sent to the ClojureScript environment before it's properly able to receive it, and so a reply is never received.

It would be interesting to see whether those race conditions still happen with nREPL 0.6-snapshot, as @cgrand recently simplified the code evaluation logic a lot (see nrepl/nrepl#98).

> Add a flag to generate output similar to *nrepl-messages*, so you can see which messages go back and forth during testing

We're working on something like this in nrepl/nrepl#87.

> Find all places where exceptions are being swallowed, and make sure that at least during testing they get printed out (I came across one or two of these already)

That's a good idea and probably shouldn't be hard to do.

> Start a background shell in the Makefile that waits five minutes and then calls jstack, so we can see where it's blocking

Same here.

> Print test var names as they run, not just namespaces, so we can see which tests in particular suffer from this

Is there an easy way to do so?

Contributor Author

plexus commented Dec 30, 2018

As a first step, I added the jstack timeout thing in #583.

Member

bbatsov commented Dec 30, 2018

@plexus Btw, when are we doing the switchover to Circle CI? I do hope that it will at least reduce the timeout issues.

Contributor Author

plexus commented Dec 30, 2018

> Btw, when are we doing the switchover to Circle CI?

Nothing is stopping us. I've got the configuration sitting here and can make a PR. Probably not today though.

Member

bbatsov commented Dec 30, 2018

👍

Contributor Author

plexus commented Dec 31, 2018

Here is a build log that includes stack traces taken after the tests hang. This could be a good starting point for diagnosing the issue.

https://travis-ci.org/clojure-emacs/cider-nrepl/jobs/473545815

Member

bbatsov commented Dec 31, 2018

Looks pretty good! Excellent work!

Contributor Author

plexus commented Jan 1, 2019

Looking at the stack traces, this is where it blocks: in cljs.repl.node, while trying to initialize the connection to Node.

https://github.com/clojure/clojurescript/blob/r1.8.51/src/main/clojure/cljs/repl/node.clj#L128

My guess is that the connection to Node keeps failing, the exception gets swallowed, and the loop runs again, over and over.

The only logic in there that is likely to fail is creating the Socket (the Socket. constructor call), which can throw an IOException if the connection fails. We should try to get some insight into what's happening on the Node side. Does the Node process start? Does it crash? Does it fail to start the REPL server?
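
To make the suspected failure mode concrete, here is a minimal Java sketch of the opposite behavior: a connection loop that prints every failure and gives up after a fixed number of attempts instead of retrying silently forever. It is not the code from the ClojureScript Node REPL. BoundedRetry, RetriesExhausted, the host, the port, and the retry counts are hypothetical names and placeholder values.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.Callable;

final class BoundedRetry {
    /** Thrown when every attempt failed; the last failure is kept as the cause. */
    static final class RetriesExhausted extends RuntimeException {
        RetriesExhausted(int attempts, Throwable cause) {
            super("gave up after " + attempts + " attempts", cause);
        }
    }

    /** Runs action up to maxAttempts times, logging each failure to stderr. */
    static <T> T retry(Callable<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                // The failure is reported rather than swallowed.
                System.err.println("attempt " + attempt + "/" + maxAttempts + " failed: " + e);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RetriesExhausted(attempt, e);
                }
            }
        }
        throw new RetriesExhausted(maxAttempts, last);
    }

    public static void main(String[] args) throws IOException {
        String host = "localhost"; // placeholder
        int port = 5001;           // placeholder
        try (Socket socket = retry(() -> {
            Socket s = new Socket();
            try {
                s.connect(new InetSocketAddress(host, port), 1000);
                return s;
            } catch (IOException e) {
                s.close(); // do not leak the half-open socket
                throw e;
            }
        }, 3, 200)) {
            System.out.println("connected to " + socket.getRemoteSocketAddress());
        } catch (RetriesExhausted e) {
            System.err.println(e.getMessage() + ", last error: " + e.getCause());
        }
    }
}
```

With a loop shaped like this, a Node process that never comes up would show as a few logged IOExceptions followed by a clear failure, rather than as a build that hangs until the CI timeout.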

Member

bbatsov commented Jan 1, 2019

Interesting insight! I can't imagine why something so basic would be failing and I wonder whether it's somehow related to the container's environment, given the randomness of the failures.

I also wonder whether one way to fix this would be to replace the Node REPL with a Nashorn REPL. Thinking a bit about the current usage of Node in the tests, I recalled that back in the day we couldn't use Rhino, as its implementation was weird in many ways, and we couldn't use Nashorn, because we still supported Java 6 and 7. On the other hand, Oracle surprisingly announced plans to pull the plug on Nashorn, so I'm not sure how good an idea it would be to rely on it.
