This repository has been archived by the owner on Mar 4, 2021. It is now read-only.

BasicChaosMonkey.doMonkeyBusiness() method exit without finishing its job #274

Open
yufengJ opened this issue Sep 9, 2016 · 3 comments

Comments

@yufengJ

yufengJ commented Sep 9, 2016

Hi all,

I've observed that BasicChaosMonkey.doMonkeyBusiness() suddenly returns without finishing the rest of its happy path. There is no exception and no error message.

The jettyRun output is as follows:

2016-09-08 16:31:16.328 - INFO  BasicChaosInstanceSelector - [BasicChaosInstanceSelector.java:65] Randomly selecting 1 from 3 instances, excluding null
2016-09-08 16:31:16.563 - INFO  Monkey - [Monkey.java:138] Reporting what I did...

I set up the debugger to trace this. The code ends up in org.jclouds.ContextBuilder.
The stack dump is:

"pool-1-thread-1@9515" prio=5 tid=0x1d nid=NA runnable
  java.lang.Thread.State: RUNNABLE
    at org.jclouds.ContextBuilder.buildView(ContextBuilder.java:588)
    at com.netflix.simianarmy.client.aws.AWSClient.getJcloudsComputeService(AWSClient.java:818)
    - locked <0x2989> (a com.netflix.simianarmy.client.aws.AWSClient)
    at com.netflix.simianarmy.client.aws.AWSClient.connectSsh(AWSClient.java:834)
    at com.netflix.simianarmy.chaos.ChaosInstance.connectSsh(ChaosInstance.java:123)
    at com.netflix.simianarmy.chaos.ChaosInstance.canConnectSsh(ChaosInstance.java:101)
    at com.netflix.simianarmy.chaos.ScriptChaosType.canApply(ScriptChaosType.java:60)
    at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.pickChaosType(BasicChaosMonkey.java:141)
    at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.doMonkeyBusiness(BasicChaosMonkey.java:121)
    at com.netflix.simianarmy.Monkey.run(Monkey.java:134)
    at com.netflix.simianarmy.Monkey$1.run(Monkey.java:155)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
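
One possible explanation for the silent exit (an assumption, not something confirmed in this thread): the stack dump shows the monkey running under FutureTask.runAndReset, and a Throwable thrown by a periodic task on a ScheduledThreadPoolExecutor is stored in the task's Future rather than logged. The sketch below reproduces that behaviour in isolation. The class name SilentFailureDemo and the NoSuchMethodError message are made up for illustration and are not part of SimianArmy.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class SilentFailureDemo {

    /** Runs a failing periodic task and returns the Throwable captured by its Future. */
    static Throwable captureFailure() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // Hypothetical stand-in for a task that hits a binary-incompatible dependency.
            Runnable task = () -> {
                throw new NoSuchMethodError("hypothetical.Missing.method()");
            };
            ScheduledFuture<?> future =
                    scheduler.scheduleWithFixedDelay(task, 0, 1, TimeUnit.HOURS);
            try {
                // Nothing has been printed so far; the failure only surfaces here.
                future.get();
                return null;
            } catch (ExecutionException e) {
                return e.getCause();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return e;
            }
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("Captured: " + captureFailure());
    }
}
```

If this is what happens here, wrapping the suspect call in a temporary catch (Throwable t) block that logs t, or inspecting the Future returned by the scheduler, should reveal the hidden error.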

I've observed the issue on the master branch and on tag v2.5.1.
Tag v2.5.0 works fine, and I had been using it without problems, so I suspect a dependency change between the two is causing this. However, a diff of build.gradle between master and v2.5.0 shows that jclouds was not upgraded in between, so I am not sure where to look next.

$ diff master_branch/build.gradle tag_v2.5.0/build.gradle
1,6d0
< buildscript {
<     repositories {
<         jcenter()
<     }
< }
<
8c2
<     id 'nebula.netflixoss' version '3.2.3'
---
>     id 'nebula.netflixoss' version '2.2.9'
18c12
< repositories {
---
> repositories {
26,28d19
< sourceCompatibility = 1.7
< targetCompatibility = 1.7
<
36c27,28
<     compile 'com.sun.jersey:jersey-servlet:1.19'
---
>     compile 'com.sun.jersey:jersey-core:1.11'
>     compile 'com.sun.jersey:jersey-servlet:1.11'
40c32,34
<     compile 'com.netflix.eureka:eureka-client:1.4.1'
---
>     compile('com.netflix.eureka:eureka-client:1.1.22') {
>         exclude group: 'com.sun.jersey', module: 'jersey-bundle'
>     }
49a44
>     compile 'ch.qos.logback:logback-classic:1.0.13'
51,52d45
<     compile 'org.springframework:spring-jdbc:4.2.5.RELEASE'
<     compile 'com.zaxxer:HikariCP:2.4.7'

I may dig deeper into this. Has anyone run into this issue before?

@ebukoski
Contributor

ebukoski commented Sep 9, 2016

I ran into something very similar with Janitor: it died in an AWS API call
with no log message. To trace it, I created a one-off JSP that invoked the
same API call so I could get a full stack trace. In my case it was a
version mismatch between the AWS client library in open source SimianArmy
and a different AWS client jar that was being pulled in by our non-open
source version.

I upgraded the AWS client to 1.11.9 and it resolved the issue for me. I
have an open PR to introduce this to the main code line.
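
A quick way to check for this kind of version mismatch is to ask the JVM which jar a class was actually loaded from. The following is a minimal sketch using only standard JDK calls. The class name WhichJar is hypothetical, and the class names mentioned in the comments are just examples of what one might pass in.

```java
import java.security.CodeSource;

public class WhichJar {

    /** Returns the location a class was loaded from, or a note for bootstrap classes. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader (e.g. java.lang.String) have no CodeSource.
        return source == null ? "(bootstrap class path)" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names on the command line, for example
        // com.amazonaws.services.ec2.AmazonEC2Client or com.google.inject.Injector.
        for (String name : args) {
            System.out.println(name + " -> " + locationOf(Class.forName(name)));
        }
    }
}
```

Run with the same classpath as the failing application, this prints the jar path for each class, which makes it easy to spot an unexpected version being picked up.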


@yufengJ
Author

yufengJ commented Sep 11, 2016

Thanks for the suggestions!
It turned out to be the same issue as #259.

The problem was fixed by excluding the com.google.inject group from the eureka-client dependency:

compile('com.netflix.eureka:eureka-client:1.4.1') {
    exclude group: 'com.google.inject'
}

@pwhitham

Nice! I ran into this just recently, and the dependency exclusion also solved the issue for me.

Thanks!
