You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Duplicate entry '802364-push-tokens-daily-hive-1-0' for key 'PRIMARY'
Query: INSERT INTO execution_logs
Also as a result, job logs were missing for attempt #0.
I figured that the only way that this could happen is the initial attempt (0) logs were incorrectly written to the DB with with attempt=1.
I was able to verify how this happens by running FlowRunnerTest.execRetries with debugger, some manual edits & a suitable breakpoint:
All related lines from the flow log:
14-11-2017 01:00:40 UTC push-tokens-daily-flow INFO - Submitting job 'push-tokens-daily-hive' to run.
14-11-2017 01:00:40 UTC push-tokens-daily-flow INFO - Created file appender for job push-tokens-daily-hive
14-11-2017 01:00:40 UTC push-tokens-daily-flow INFO - Attached file appender for job push-tokens-daily-hive
14-11-2017 01:00:40 UTC push-tokens-daily-flow INFO - Job Started: push-tokens-daily-hive
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - Job push-tokens-daily-hive finished with status FAILED in 10928 seconds
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - Job 'push-tokens-daily-hive' will be retried. Attempt 0 of 3
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - Submitting job 'push-tokens-daily-hive' to run.
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - Created file appender for job push-tokens-daily-hive
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - Attached file appender for job push-tokens-daily-hive
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - No attachment file for job push-tokens-daily-hive written.
14-11-2017 04:12:49 UTC push-tokens-daily-flow INFO - Job Started: push-tokens-daily-hive
14-11-2017 06:17:14 UTC push-tokens-daily-flow INFO - Job push-tokens-daily-hive finished with status SUCCEEDED in 7465 seconds
14-11-2017 06:17:14 UTC push-tokens-daily-flow ERROR - Error writing out logs for job push-tokens-daily-hive
azkaban.executor.ExecutorManagerException: uploadLogFile failed.
at azkaban.executor.ExecutionLogsDao.uploadLogFile(ExecutionLogsDao.java:81)
at azkaban.executor.JdbcExecutorLoader.uploadLogFile(JdbcExecutorLoader.java:251)
at azkaban.execapp.JobRunner.finalizeLogFile(JobRunner.java:588)
at azkaban.execapp.JobRunner.doRun(JobRunner.java:688)
at azkaban.execapp.JobRunner.run(JobRunner.java:622)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Error writing log part
at azkaban.executor.ExecutionLogsDao.uploadLogFile(ExecutionLogsDao.java:131)
at azkaban.executor.ExecutionLogsDao.lambda$uploadLogFile$0(ExecutionLogsDao.java:73)
at azkaban.db.DatabaseOperator.transaction(DatabaseOperator.java:89)
at azkaban.executor.ExecutionLogsDao.uploadLogFile(ExecutionLogsDao.java:78)
... 9 more
Caused by: java.sql.SQLException: Duplicate entry '802364-push-tokens-daily-hive-1-0' for key 'PRIMARY' Query: INSERT INTO execution_logs (exec_id, name, attempt, enc_type, start_byte, end_byte, log, upload_time) VALUES (?,?,?,?,?,?,?,?) Parameters: [802364, push-tokens-daily-hive, 1, 2, 0, 23252, [(the log bytes as an array))], 1510640234724]
at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:363)
at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:490)
at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:403)
at azkaban.db.DatabaseTransOperator.update(DatabaseTransOperator.java:101)
at azkaban.executor.ExecutionLogsDao.uploadLogPart(ExecutionLogsDao.java:168)
at azkaban.executor.ExecutionLogsDao.uploadLogFile(ExecutionLogsDao.java:126)
... 12 more
It seems that finalizeAttachmentFile and writeStatus also suffer from the same problem, because they use the attempt from node.
One way to fix this would be to make fireEvent of JOB_FINISHED be the last thing that the JobRunner does. Maybe it wrap it with try-finally to make sure that the event is always sent though. It would be even better if the attempt++ would be done outside of JobRunner's scope, but that fix might be a bit more involved.
The text was updated successfully, but these errors were encountered:
juhoautio
added a commit
to juhoautio/azkaban
that referenced
this issue
Nov 14, 2017
Got this error on the 1st retry, ie. attempt=1:
Also as a result, job logs were missing for attempt #0.
I figured that the only way that this could happen is the initial attempt (0) logs were incorrectly written to the DB with with attempt=1.
I was able to verify how this happens by running
FlowRunnerTest.execRetries
with debugger, some manual edits & a suitable breakpoint:All related lines from the flow log:
It seems that
finalizeAttachmentFile
andwriteStatus
also suffer from the same problem, because they use the attempt from node.One way to fix this would be to make
fireEvent
ofJOB_FINISHED
be the last thing that theJobRunner
does. Maybe it wrap it with try-finally to make sure that the event is always sent though. It would be even better if theattempt++
would be done outside ofJobRunner
's scope, but that fix might be a bit more involved.The text was updated successfully, but these errors were encountered: