Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error writing out logs for job on a retry attempt #1559

Closed
juhoautio opened this issue Nov 14, 2017 · 0 comments
Closed

Error writing out logs for job on a retry attempt #1559

juhoautio opened this issue Nov 14, 2017 · 0 comments

Comments

@juhoautio
Copy link
Collaborator

Got this error on the 1st retry, ie. attempt=1:

Duplicate entry '802364-push-tokens-daily-hive-1-0' for key 'PRIMARY'
Query: INSERT INTO execution_logs

Also as a result, job logs were missing for attempt #0.

I figured that the only way that this could happen is the initial attempt (0) logs were incorrectly written to the DB with with attempt=1.

I was able to verify how this happens by running FlowRunnerTest.execRetries with debugger, some manual edits & a suitable breakpoint:

nayttokuva 2017-11-14 kello 10 16 39

All related lines from the flow log:

14-11-2017 01:00:40 UTC push-tokens-daily-flow INFO - Submitting job 'push-tokens-daily-hive' to run.
14-11-2017 01:00:40 UTC push-tokens-daily-flow INFO - Created file appender for job push-tokens-daily-hive
14-11-2017 01:00:40 UTC push-tokens-daily-flow INFO - Attached file appender for job push-tokens-daily-hive

14-11-2017 01:00:40 UTC push-tokens-daily-flow INFO - Job Started: push-tokens-daily-hive
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - Job push-tokens-daily-hive finished with status FAILED in 10928 seconds
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - Job 'push-tokens-daily-hive' will be retried. Attempt 0 of 3

14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - Submitting job 'push-tokens-daily-hive' to run.
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - Created file appender for job push-tokens-daily-hive
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - Attached file appender for job push-tokens-daily-hive
14-11-2017 04:02:49 UTC push-tokens-daily-flow INFO - No attachment file for job push-tokens-daily-hive written.
14-11-2017 04:12:49 UTC push-tokens-daily-flow INFO - Job Started: push-tokens-daily-hive
14-11-2017 06:17:14 UTC push-tokens-daily-flow INFO - Job push-tokens-daily-hive finished with status SUCCEEDED in 7465 seconds

14-11-2017 06:17:14 UTC push-tokens-daily-flow ERROR - Error writing out logs for job push-tokens-daily-hive
azkaban.executor.ExecutorManagerException: uploadLogFile failed.
	at azkaban.executor.ExecutionLogsDao.uploadLogFile(ExecutionLogsDao.java:81)
	at azkaban.executor.JdbcExecutorLoader.uploadLogFile(JdbcExecutorLoader.java:251)
	at azkaban.execapp.JobRunner.finalizeLogFile(JobRunner.java:588)
	at azkaban.execapp.JobRunner.doRun(JobRunner.java:688)
	at azkaban.execapp.JobRunner.run(JobRunner.java:622)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Error writing log part
	at azkaban.executor.ExecutionLogsDao.uploadLogFile(ExecutionLogsDao.java:131)
	at azkaban.executor.ExecutionLogsDao.lambda$uploadLogFile$0(ExecutionLogsDao.java:73)
	at azkaban.db.DatabaseOperator.transaction(DatabaseOperator.java:89)
	at azkaban.executor.ExecutionLogsDao.uploadLogFile(ExecutionLogsDao.java:78)
	... 9 more
Caused by: java.sql.SQLException: Duplicate entry '802364-push-tokens-daily-hive-1-0' for key 'PRIMARY' Query: INSERT INTO execution_logs (exec_id, name, attempt, enc_type, start_byte, end_byte, log, upload_time) VALUES (?,?,?,?,?,?,?,?) Parameters: [802364, push-tokens-daily-hive, 1, 2, 0, 23252, [(the log bytes as an array))], 1510640234724]
	at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:363)
	at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:490)
	at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:403)
	at azkaban.db.DatabaseTransOperator.update(DatabaseTransOperator.java:101)
	at azkaban.executor.ExecutionLogsDao.uploadLogPart(ExecutionLogsDao.java:168)
	at azkaban.executor.ExecutionLogsDao.uploadLogFile(ExecutionLogsDao.java:126)
	... 12 more

It seems that finalizeAttachmentFile and writeStatus also suffer from the same problem, because they use the attempt from node.

One way to fix this would be to make fireEvent of JOB_FINISHED be the last thing that the JobRunner does. Maybe it wrap it with try-finally to make sure that the event is always sent though. It would be even better if the attempt++ would be done outside of JobRunner's scope, but that fix might be a bit more involved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant