druid.indexer.task.restoreTasksOnRestart does not work by default for Docker-based deployments on Kubernetes
Affected Version
25.0.0, but the issue still exists in the latest version
Description
Hi Druid experts. Our team runs Druid on Kubernetes and ingests data from Kafka. We set druid.indexer.task.restoreTasksOnRestart=true and expected ingestion tasks to be restored and to resume even when the MiddleManager is restarted. This is the current behavior:
1. The MiddleManager is shut down. Because druid.indexer.task.restoreTasksOnRestart=true, restore.json is created.
2. The MiddleManager starts up, but with a different IP because we are running on Kubernetes. The task is restored and continues running.
3. When the peon reports its status to the Overlord, the Overlord logs that the task is not among its known task IDs and proceeds to shut the task down. This happens because the MiddleManager IP has changed.
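To illustrate the failure mode described in the steps above, here is a toy model in Java. It is not Druid's code: the class `WorkerIdentitySketch`, its methods, and the host and task values are all hypothetical. It only assumes that the Overlord associates tasks with the identity the worker was announced under, so a task reported from a new IP is not recognized.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical, not Druid's implementation):
// tasks are tracked under the identity the worker was announced with.
class WorkerIdentitySketch {
    // worker identity (host:port) -> task IDs assigned to that worker
    private final Map<String, Set<String>> tasksByWorker = new HashMap<>();

    // Record that a task was assigned to the worker with this identity.
    void assign(String workerHost, String taskId) {
        tasksByWorker.computeIfAbsent(workerHost, k -> new HashSet<>()).add(taskId);
    }

    // A status report is accepted only if the task is known for that identity.
    boolean isKnown(String workerHost, String taskId) {
        return tasksByWorker.getOrDefault(workerHost, Collections.emptySet()).contains(taskId);
    }

    public static void main(String[] args) {
        WorkerIdentitySketch overlord = new WorkerIdentitySketch();
        // Example values; the IPs and task ID are made up for illustration.
        overlord.assign("10.0.0.5:8091", "index_kafka_example");

        // Before the restart: same identity, so the task is known.
        System.out.println(overlord.isKnown("10.0.0.5:8091", "index_kafka_example"));
        // After the restart: the pod has a new IP, so the lookup misses.
        System.out.println(overlord.isKnown("10.0.0.9:8091", "index_kafka_example"));
    }
}
```

In this model the restored task itself is fine; only the identity it is reported under has changed, which matches the behavior we see.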
The fix is to let druid.host fall back to its default value, InetAddress.getLocalHost().getCanonicalHostName(); task restoration works after that. However, falling back to the default requires setting the environment variable DRUID_SET_HOST to 0. I am wondering what the original reasoning was for using the IP instead of the canonical host name, and whether we should change the default behavior, given that using the IP breaks task restoration.
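For reference, the two candidate identities can be compared with plain JDK calls. This is a minimal sketch, not how Druid resolves druid.host internally: the class name `HostIdentitySketch` and the "unknown" fallback are my own additions. Whether the canonical host name really stays the same across restarts depends on how the pod is deployed (for example, a stable pod hostname), which I am assuming here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for comparing the two identities; not part of Druid.
class HostIdentitySketch {
    // IP-based identity: changes whenever the pod is rescheduled with a new IP.
    static String ipIdentity() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "unknown"; // fallback added for this sketch only
        }
    }

    // Canonical host name: the same call the druid.host default is described as using.
    static String canonicalIdentity() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "unknown"; // fallback added for this sketch only
        }
    }

    public static void main(String[] args) {
        System.out.println("IP identity:        " + ipIdentity());
        System.out.println("Canonical identity: " + canonicalIdentity());
    }
}
```

Running this inside the MiddleManager pod before and after a restart should show which of the two values is stable in a given cluster.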