v0.10.0 introduced the new `DataHubUpgradeHistory_v1` topic to coordinate between the system update job and GMS. Specifically, GMS won't start until it can read at least one message from the `DataHubUpgradeHistory_v1` topic.
The intention was to configure the topic with infinite retention. However, there was a bug in the kafka command in the `kafka-setup.sh` script (commit 39920bb): the leading `--` was missing from the `--config` argument. As a result, the infinite retention setting didn't take effect, and the topic was created with the default 7-day retention period.
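For illustration, a sketch of the difference (this is not the exact line from `kafka-setup.sh`; the topic options and the `$KAFKA_BOOTSTRAP_SERVER` variable are assumptions):

```shell
# Buggy shape (v0.10.0): "config" lacks its leading "--", so
# retention.ms=-1 is not applied as a topic config.
kafka-topics.sh --create --if-not-exists \
  --bootstrap-server "$KAFKA_BOOTSTRAP_SERVER" \
  --topic DataHubUpgradeHistory_v1 \
  --partitions 1 config retention.ms=-1

# Corrected shape (v0.10.1): "--config retention.ms=-1" actually
# sets infinite retention when the topic is created.
kafka-topics.sh --create --if-not-exists \
  --bootstrap-server "$KAFKA_BOOTSTRAP_SERVER" \
  --topic DataHubUpgradeHistory_v1 \
  --partitions 1 --config retention.ms=-1
```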
This bug has been fixed in v0.10.1 (commit b4b3a39). However, for those who have already deployed v0.10.0, `retention.ms` will stay stuck at 7 days even after upgrading to v0.10.1, because the `kafka-setup.sh` script won't recreate the topic if it already exists.
What this means is that if you have deployed v0.10.0, your GMS won't be able to start if it is restarted more than seven days after the last run of the system update job, because the messages will have expired. This can happen, for example, when your K8s provider performs a maintenance update of your nodes, which is what happened in our case.
If you have upgraded from v0.9.x to v0.10.1 directly, then you won't be affected by this bug.
There are two temporary workarounds for this issue:

1. Run `helm upgrade` again. This reruns the system update job, so your GMS will survive a restart for another 7 days.
2. Log in to your Kafka pod, then run the following commands to set the `DataHubUpgradeHistory_v1` topic's retention to infinite. This fixes the issue for good. Note that the commands below also read the topic's `retention.ms` config before and after the change, so you can make sure it was updated properly.
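A sketch of such commands (assuming `kafka-configs.sh` is on the pod's PATH and the broker is reachable at `localhost:9092`; adjust `--bootstrap-server` for your deployment):

```shell
# Show the topic's current dynamic configs. If retention.ms is still at
# the broker default (7 days), it may not be listed here yet.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name DataHubUpgradeHistory_v1 --describe

# Set retention to infinite (-1).
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name DataHubUpgradeHistory_v1 \
  --alter --add-config retention.ms=-1

# Confirm the override: the output should now include retention.ms=-1.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name DataHubUpgradeHistory_v1 --describe
```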