
Singularity aborts when sending overdue task emails if smtp not configured #1001


Description

@whitedr

I encountered a situation where I had some scheduled requests stuck in a PENDING state. I'm not entirely sure how they got into that state, but I think it was related to pausing/unpausing them repeatedly. Once these pending requests existed, I began seeing Singularity abort periodically. I tracked it down to this in the log:

2016-04-19T11:45:52.54317 ERROR [2016-04-19 11:45:52,537] com.hubspot.singularity.scheduler.SingularityLeaderOnlyPoller: Caught an exception while running SingularityScheduledJobPoller
2016-04-19T11:45:52.54320 ! java.lang.IllegalStateException: Optional.get() cannot be called on an absent value
2016-04-19T11:45:52.54321 ! at com.google.common.base.Absent.get(Absent.java:47) ~[SingularityService-shaded.jar:0.5.0]
2016-04-19T11:45:52.54321 ! at com.hubspot.singularity.smtp.SingularityMailer.getDestination(SingularityMailer.java:317) ~[SingularityService-shaded.jar:0.5.0]
2016-04-19T11:45:52.54322 ! at com.hubspot.singularity.smtp.SingularityMailer.prepareTaskMail(SingularityMailer.java:275) ~[SingularityService-shaded.jar:0.5.0]
2016-04-19T11:45:52.54324 ! at com.hubspot.singularity.smtp.SingularityMailer.sendTaskOverdueMail(SingularityMailer.java:249) ~[SingularityService-shaded.jar:0.5.0]
2016-04-19T11:45:52.54325 ! at com.hubspot.singularity.scheduler.SingularityScheduledJobPoller.runActionOnPoll(SingularityScheduledJobPoller.java:94) ~[SingularityService-shaded.jar:0.5.0]
2016-04-19T11:45:52.54325 ! at com.hubspot.singularity.scheduler.SingularityLeaderOnlyPoller.runActionIfLeaderAndMesosIsRunning(SingularityLeaderOnlyPoller.java:108) [SingularityService-shaded.jar:0.5.0]
2016-04-19T11:45:52.54326 ! at com.hubspot.singularity.scheduler.SingularityLeaderOnlyPoller.access$000(SingularityLeaderOnlyPoller.java:24) [SingularityService-shaded.jar:0.5.0]
2016-04-19T11:45:52.54327 ! at com.hubspot.singularity.scheduler.SingularityLeaderOnlyPoller$1.run(SingularityLeaderOnlyPoller.java:83) [SingularityService-shaded.jar:0.5.0]
2016-04-19T11:45:52.54327 ! at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
2016-04-19T11:45:52.54328 ! at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_45]
2016-04-19T11:45:52.54329 ! at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_45]
2016-04-19T11:45:52.54330 ! at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_45]
2016-04-19T11:45:52.54331 ! at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
2016-04-19T11:45:52.54331 ! at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
2016-04-19T11:45:52.54332 ! at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
2016-04-19T11:45:52.54332 ERROR [2016-04-19 11:45:52,539] com.hubspot.singularity.SingularityAbort: Singularity on prd-useast-mesos-platform-master-02.prd.yb0t.cc is aborting due to UNRECOVERABLE_ERROR
2016-04-19T11:45:52.54333 WARN  [2016-04-19 11:45:52,539] com.hubspot.singularity.SingularityAbort: Couldn't send abort mail because no SMTP configuration is present
2016-04-19T11:45:52.54333 INFO  [2016-04-19 11:45:52,540] com.hubspot.singularity.SingularityAbort: Attempting to flush logs and wait 00:00.100 ...
2016-04-19T11:45:52.65734 I0419 07:45:52.657297  9032 sched.cpp:1805] Asked to abort the driver
2016-04-19T11:45:52.74077 I0419 07:45:52.740716  9003 sched.cpp:1070] Aborting framework 'Singularity'

It looks like it was trying to send notification emails saying that tasks for these pending requests were overdue. On this particular cluster, though, I haven't configured any SMTP settings, yet Singularity died while trying to build the email it would have sent.
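Stripped to its essentials, the failure mode in the trace is an unguarded `Optional.get()` on a value that is absent when no SMTP configuration exists. Below is a minimal sketch with hypothetical names (`smtpConfiguration`, `getDestinationUnguarded`, etc. are illustrative, not the actual Singularity code); the real code uses Guava's `com.google.common.base.Optional`, while this sketch uses `java.util.Optional`, which throws the analogous `NoSuchElementException` from `get()`:

```java
import java.util.Optional;

public class MailerSketch {
    // Stands in for the SMTP configuration block; absent when SMTP is not configured.
    static Optional<String> smtpConfiguration = Optional.empty();

    // The shape of the crash: get() throws when the Optional is absent,
    // and the uncaught exception in the poller triggers SingularityAbort.
    static String getDestinationUnguarded() {
        return smtpConfiguration.get(); // throws NoSuchElementException when SMTP is unconfigured
    }

    // A guarded alternative: return absent so the caller can log and skip
    // sending the overdue-task mail instead of aborting the whole service.
    static Optional<String> getDestinationGuarded() {
        if (!smtpConfiguration.isPresent()) {
            return Optional.empty();
        }
        return smtpConfiguration;
    }

    public static void main(String[] args) {
        boolean threw = false;
        try {
            getDestinationUnguarded();
        } catch (java.util.NoSuchElementException e) {
            threw = true;
        }
        System.out.println("unguarded threw: " + threw);
        System.out.println("guarded present: " + getDestinationGuarded().isPresent());
    }
}
```

The guarded version matches what the rest of the service already does elsewhere: note the `WARN ... Couldn't send abort mail because no SMTP configuration is present` line in the log, which shows the abort path checks for missing SMTP configuration while the overdue-task path apparently does not.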

Regarding the odd state of the pending scheduled requests: I was able to clear it by going into ZooKeeper and manually running `rmr /<zkNamespace>/requests/pending`. I'd also like to better understand how those pending requests could have gotten into that state in the first place.
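For reference, a sketch of that manual cleanup, assuming the stock ZooKeeper CLI (`zkCli.sh`) is available; the server address is an assumption, and `<zkNamespace>` stands for whatever ZooKeeper namespace your Singularity configuration uses:

```shell
# Hypothetical session; adjust host/port and <zkNamespace> for your cluster.
# `rmr` recursively removes a znode and its children (newer ZooKeeper CLIs
# name this command `deleteall`). Inspect the node before deleting it.
zkCli.sh -server localhost:2181 <<'EOF'
ls /<zkNamespace>/requests/pending
rmr /<zkNamespace>/requests/pending
EOF
```

Note this only clears the symptom; it does not explain how the requests became stuck.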
