Exception in the Supervisor #127

Closed
acaproni opened this issue Sep 27, 2018 · 5 comments

@acaproni
Member

The Weather Supervisor started at 2018-09-27T21:13:53.700 and threw the following exception:

2018-09-27T21:26:07.810 | ERROR [KafkaSubscriber.scala:64] [org.eso.ias.kafkautils.SimpleStringConsumer-Thread] Subscriber of [Supervisor-Weather] got an error processing event [IASValue: id=[Array-AntennasToPads], runningID=(array:MONITORED_SOFTWARE_SYSTEM)@(ANTPAD:PLUGIN)@(ConverterID:CONVERTER)@(Array-AntennasToPads:IASIO), read from Monitored System at 2018-09-27T21:26:07.598, produced by plugin at 2018-09-27T21:26:07.598, sent by plugin to converter at 2018-09-27T21:26:07.598, received by converter at 2018-09-27T21:26:07.601, processed by converter at 2018-09-27T21:26:07.601, sent to BSDB at 2018-09-27T21:26:07.601, mode=OPERATIONAL, type=STRING, value=DV01:A045,DV02:A104,DV03:A049,DV04:A066,DV05:A007,DV06:A027,DV07:A001,DV08:A042,DV09:A074,DV10:A062,DV11:A044,DV12:A092,DV13:A111,DV14:A087,DV15:A047,DV16:A069,DV17:A060,DV18:A075,DV19:A093,DV20:A072,DV21:A011,DV22:A083,DV23:A022,DV24:A088,DV25:A086,DA41:A023,DA42:A008,DA43:A058,DA44:A016,DA45:A134,DA46:A096,DA47:A035,DA48:A070,DA49:A024,DA50:A105,DA51:A101,DA52:A082,DA53:A033,DA54:A073,DA55:A108,DA56:A091,DA57:A089,DA58:A090,DA59:A076,DA60:A043,DA61:A094,DA62:A135,DA63:A085,DA64:A015,DA65:A068,CM01:N602,CM02:J502,CM03:J503,CM04:N605,CM05:J506,CM06:N606,CM07:N601,CM08:J505,CM09:N603,CM10:J501,CM11:N604,CM12:J504,PM01:T703,PM02:T701,PM03:T702,PM04:T704]
java.lang.IllegalArgumentException: requirement failed: Cannot calc the validity if there is no output
        at scala.Predef$.require(Predef.scala:277)
        at org.eso.ias.dasu.DasuImpl.calcOutputValidity(DasuImpl.scala:368)
        at org.eso.ias.dasu.DasuImpl.$anonfun$updateAndPublishOutput$1(DasuImpl.scala:430)
        at org.eso.ias.dasu.DasuImpl.$anonfun$updateAndPublishOutput$1$adapted(DasuImpl.scala:422)
        at scala.Option.foreach(Option.scala:257)
        at org.eso.ias.dasu.DasuImpl.org$eso$ias$dasu$DasuImpl$$updateAndPublishOutput(DasuImpl.scala:422)
        at org.eso.ias.dasu.DasuImpl.inputsReceived(DasuImpl.scala:331)
        at org.eso.ias.supervisor.Supervisor.$anonfun$inputsReceived$2(Supervisor.scala:271)
        at org.eso.ias.supervisor.Supervisor.$anonfun$inputsReceived$2$adapted(Supervisor.scala:266)
        at scala.collection.Iterator.foreach(Iterator.scala:944)
        at scala.collection.Iterator.foreach$(Iterator.scala:944)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1432)
        at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:210)
        at org.eso.ias.supervisor.Supervisor.inputsReceived(Supervisor.scala:266)
        at org.eso.ias.dasu.subscriber.KafkaSubscriber.$anonfun$iasioReceived$1(KafkaSubscriber.scala:61)
        at org.eso.ias.dasu.subscriber.KafkaSubscriber.$anonfun$iasioReceived$1$adapted(KafkaSubscriber.scala:61)
        at scala.Option.foreach(Option.scala:257)
        at org.eso.ias.dasu.subscriber.KafkaSubscriber.iasioReceived(KafkaSubscriber.scala:61)
        at org.eso.ias.kafkautils.KafkaIasiosConsumer.stringEventReceived(KafkaIasiosConsumer.java:152)
        at org.eso.ias.kafkautils.SimpleStringConsumer.notifyListener(SimpleStringConsumer.java:430)
        at org.eso.ias.kafkautils.SimpleStringConsumer.run(SimpleStringConsumer.java:412)
        at java.lang.Thread.run(Thread.java:748)

The Supervisor continued to work for about 5 minutes, then stopped processing events. The heartbeat (HB) was still being published.

@acaproni acaproni added the Bug label Sep 27, 2018
@acaproni
Member Author

acaproni commented Sep 27, 2018

The error seems to be reproducible.

Apparently the DASU expects 3 inputs that arrive at different times. When all of them have arrived, the DASU is supposed to run the TF and produce the output, but this is not what happens: the TF is actually executed 4 seconds later.
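To make the expected behaviour concrete, here is a minimal sketch of a gate that holds back the output until every expected input has a value. It is not the DasuImpl code: `InputGate`, `inputReceived` and the string-joining stand-in for the TF are hypothetical names invented for illustration. The point is only that "no output yet" is represented as an empty `Optional` instead of triggering a failed requirement.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names come from the IAS code base. */
final class InputGate {
    private final Set<String> expectedIds;
    private final Map<String, String> latest = new HashMap<>();

    InputGate(Set<String> expectedIds) {
        this.expectedIds = new HashSet<>(expectedIds);
    }

    /**
     * Stores the latest value of an expected input.
     * Returns an output only once every expected input has a value;
     * until then the result is empty instead of an exception.
     */
    Optional<String> inputReceived(String id, String value) {
        if (expectedIds.contains(id)) {
            latest.put(id, value);
        }
        if (!latest.keySet().containsAll(expectedIds)) {
            return Optional.empty(); // no output yet, so nothing to validate
        }
        // Stand-in for the transfer function: join the values in ID order.
        return Optional.of(String.join(",", new TreeMap<>(latest).values()));
    }
}
```

A caller would then compute the validity only when the returned `Optional` is present, which avoids the "no output" case seen in the stack trace.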

One more issue is that the SimpleStringConsumer is coupled with the DASU, so that an error at DASU level triggers a failure in the thread that polls the Kafka topic for new inputs, and the entire Supervisor stops working.
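One way to loosen that coupling, sketched under assumptions, is to catch whatever the listener throws inside the dispatch loop so the polling thread survives a failing DASU. The class and method names below (`IsolatedDispatch`, `dispatchAll`) are hypothetical and are not taken from SimpleStringConsumer. `AssertionError` is caught explicitly because a failed `scala.Predef.assert` throws an `Error`, not a `RuntimeException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these names come from the IAS code base. */
final class IsolatedDispatch {

    /**
     * Delivers each event to the listener and catches what the listener throws,
     * so the calling (polling) loop keeps running.
     *
     * @return the events whose processing failed
     */
    static List<String> dispatchAll(List<String> events, Consumer<String> listener) {
        List<String> failed = new ArrayList<>();
        for (String event : events) {
            try {
                listener.accept(event);
            } catch (RuntimeException | AssertionError e) {
                // Record the failure and move on to the next event.
                failed.add(event);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> processed = new ArrayList<>();
        Consumer<String> listener = ev -> {
            if (ev.equals("bad")) {
                throw new IllegalArgumentException("requirement failed");
            }
            processed.add(ev);
        };
        List<String> failed = dispatchAll(Arrays.asList("a", "bad", "b"), listener);
        System.out.println("processed=" + processed + " failed=" + failed);
    }
}
```

Whether failed events should be logged, retried or dropped is a separate design decision; the sketch only shows that the loop itself does not need to die.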

@acaproni
Member Author

acaproni commented Oct 1, 2018

Supervisor-Antennas reports a huge number of threads (810):

2018-10-01T19:53:46.178 | INFO  [SupervisorStatistics.scala:86] [pool-397-thread-1] Stats: 24060 used heap 73887Kb; alive thrreads 810; IASIOs processed so far 0 (2406/min); input in the last interval 24047

@acaproni
Member Author

acaproni commented Oct 1, 2018

There is also an exception logged in another Supervisor (Supervisor-Weather):

2018-10-01T18:55:28.404 | ERROR [SimpleStringConsumer.java:415] [org.eso.ias.kafkautils.SimpleStringConsumer-Thread] Consumer [Supervisor-WeatherConsumer] got an exception got processing events: records lost!
java.lang.AssertionError: assertion failed: DASU [Dasu-WS-W-WindSpeed] Cannot calc the validity if there is no output
        at scala.Predef$.assert(Predef.scala:219)
        at org.eso.ias.dasu.DasuImpl.calcOutputValidity(DasuImpl.scala:368)
        at org.eso.ias.dasu.DasuImpl.$anonfun$updateAndPublishOutput$1(DasuImpl.scala:430)
        at org.eso.ias.dasu.DasuImpl.$anonfun$updateAndPublishOutput$1$adapted(DasuImpl.scala:422)
        at scala.Option.foreach(Option.scala:257)
        at org.eso.ias.dasu.DasuImpl.org$eso$ias$dasu$DasuImpl$$updateAndPublishOutput(DasuImpl.scala:422)
        at org.eso.ias.dasu.DasuImpl.inputsReceived(DasuImpl.scala:331)
        at org.eso.ias.supervisor.Supervisor.$anonfun$inputsReceived$2(Supervisor.scala:271)
        at org.eso.ias.supervisor.Supervisor.$anonfun$inputsReceived$2$adapted(Supervisor.scala:266)
        at scala.collection.Iterator.foreach(Iterator.scala:944)
        at scala.collection.Iterator.foreach$(Iterator.scala:944)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1432)
        at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:210)
        at org.eso.ias.supervisor.Supervisor.inputsReceived(Supervisor.scala:266)
        at org.eso.ias.dasu.subscriber.KafkaSubscriber.$anonfun$iasioReceived$1(KafkaSubscriber.scala:61)
        at org.eso.ias.dasu.subscriber.KafkaSubscriber.$anonfun$iasioReceived$1$adapted(KafkaSubscriber.scala:61)
        at scala.Option.foreach(Option.scala:257)
        at org.eso.ias.dasu.subscriber.KafkaSubscriber.iasioReceived(KafkaSubscriber.scala:61)
        at org.eso.ias.kafkautils.KafkaIasiosConsumer.stringEventReceived(KafkaIasiosConsumer.java:152)
        at org.eso.ias.kafkautils.SimpleStringConsumer.notifyListener(SimpleStringConsumer.java:430)
        at org.eso.ias.kafkautils.SimpleStringConsumer.run(SimpleStringConsumer.java:412)
        at java.lang.Thread.run(Thread.java:748)

@acaproni
Member Author

acaproni commented Oct 1, 2018

The server load is also quite high, around 9. Two processes have very high CPU usage, the Supervisor-Antennas and the web server sender:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
22776 root      20   0 3545728 200420   2700 S  87.0  5.2  88:42.82 java
23395 root      20   0 2759732 194440   4324 S  43.9  5.0  45:43.78 java

@acaproni
Member Author

acaproni commented Oct 3, 2018

The number of threads in the antennas Supervisor (~800) is not that surprising. There are 396 DASUs running in that container, and each DASU has

  • one optional thread for throttling
  • one thread to periodically publish the output

This is consistent with the number of threads (~72) of the weather Supervisor, where 31 DASUs run.

We must also account for the threads that poll data from Kafka and publish the heartbeat.
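The arithmetic behind this estimate can be checked with a small sketch. It assumes both per-DASU threads are running (the throttling thread is optional, so this is an upper bound); `ThreadEstimate` and its method names are hypothetical. The reported counts (810 and ~72) come from the comments above.

```java
/** Hypothetical helper for the back-of-the-envelope thread count. */
final class ThreadEstimate {

    /** Threads owned by the DASUs, assuming threadsPerDasu threads each. */
    static int dasuThreads(int dasus, int threadsPerDasu) {
        return dasus * threadsPerDasu;
    }

    /** Threads left over for Kafka polling, heartbeat and the JVM itself. */
    static int remaining(int reportedThreads, int dasus, int threadsPerDasu) {
        return reportedThreads - dasuThreads(dasus, threadsPerDasu);
    }

    public static void main(String[] args) {
        // Antennas Supervisor: 396 DASUs, 810 threads reported
        System.out.println(dasuThreads(396, 2));    // 792
        System.out.println(remaining(810, 396, 2)); // 18
        // Weather Supervisor: 31 DASUs, about 72 threads reported
        System.out.println(dasuThreads(31, 2));     // 62
        System.out.println(remaining(72, 31, 2));   // 10
    }
}
```

In both containers the DASU threads account for most of the total, leaving a small remainder for the Kafka polling and heartbeat threads.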

@acaproni acaproni closed this as completed Oct 3, 2018