GMS cannot start due to misconfigured retention.ms in kafka topic DataHubUpgradeHistory_v1 #7882

jinlintt · 2023-04-21T16:31:29Z

v0.10.0 introduced the new DataHubUpgradeHistory_v1 topic to coordinate between the system update job and GMS. Specifically, GMS won't be able to start until it can read some message from the DataHubUpgradeHistory_v1 topic.

The intention was to configure the topic with infinite retention. However, there was a bug in the kafka command in the kafka-setup.sh script: the leading -- was missing from the --config argument. As a result, the infinite retention setting never took effect, and the topic got the default 7-day retention period instead.

39920bb#diff-49b80548c7d96c9546170eefe1ef5340ef1d1a7e3dd67c4cd9f0655736156526

This bug has been fixed in v0.10.1 in the following commit. However, for those who already deployed v0.10.0, retention.ms will stay stuck at 7 days even after upgrading to v0.10.1, because the kafka-setup.sh script won't recreate the topic if it already exists.

b4b3a39#diff-49b80548c7d96c9546170eefe1ef5340ef1d1a7e3dd67c4cd9f0655736156526R119

This means that if you have deployed v0.10.0, your GMS won't be able to start if it is restarted more than seven days after the last run of the system update job, because the messages will have expired. This can happen if your K8S provider performs a maintenance update of your nodes, which is what happened in our case.
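To make the seven-day window concrete, here is a rough sketch of the arithmetic. The 604800000 figure matches the retention.ms value that kafka-configs.sh reports for the affected topic; the message_expired helper and the example timestamps are made up for illustration. Kafka deletes expired data per log segment, so the real cutoff is only approximate.

```shell
#!/bin/sh
# Seven days expressed in milliseconds (Kafka's default topic retention).
RETENTION_MS=$((7 * 24 * 60 * 60 * 1000))

# Hypothetical helper: succeeds (exit 0) when a message written at
# $1 (epoch seconds) is older than the retention period at $2 (epoch seconds).
message_expired() {
  age_ms=$(( ($2 - $1) * 1000 ))
  [ "$age_ms" -gt "$RETENTION_MS" ]
}

echo "$RETENTION_MS"   # prints 604800000

# Assumed scenario: the system update job ran at t=0 and GMS restarts 8 days later.
if message_expired 0 $((8 * 24 * 60 * 60)); then
  echo "message expired: GMS would not find it on restart"
fi
```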

If you have upgraded from v0.9.x to v0.10.1 directly, then you won't be affected by this bug.

There are two workarounds for this issue, the first temporary and the second permanent:

1. Run helm upgrade again. This will run the system update job again, and your GMS will survive a restart for another 7 days.
2. Log in to your kafka pod, then run the following commands to update the DataHubUpgradeHistory_v1 topic's retention to infinite. This will fix the issue for good. Note that the commands below also read the retention.ms config of the topic before and after, so you can make sure it was updated properly.

$ /opt/bitnami/kafka/bin/kafka-configs.sh --entity-type topics --entity-name DataHubUpgradeHistory_v1 --bootstrap-server localhost:9092 --describe --all | grep "retention.ms"
  retention.ms=604800000 sensitive=false synonyms={}
  delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}

$ /opt/bitnami/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name DataHubUpgradeHistory_v1 --add-config retention.ms=-1
Completed updating config for topic DataHubUpgradeHistory_v1.

$ /opt/bitnami/kafka/bin/kafka-configs.sh --entity-type topics --entity-name DataHubUpgradeHistory_v1 --bootstrap-server localhost:9092 --describe --all | grep "retention.ms"
  retention.ms=-1 sensitive=false synonyms={DYNAMIC_TOPIC_CONFIG:retention.ms=-1}
  delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
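If you would rather script the before/after check than eyeball the grep output, something like the following could work. This is only a sketch: topic_retention_ms is a hypothetical helper, and the sample text is copied from the output above instead of being read from a live broker.

```shell
#!/bin/sh
# Hypothetical helper: reads `kafka-configs.sh --describe --all` output on
# stdin and prints the topic's retention.ms value. Anchoring at the start of
# the line keeps it from matching delete.retention.ms.
topic_retention_ms() {
  sed -n 's/^[[:space:]]*retention\.ms=\([-0-9]*\).*/\1/p' | head -n 1
}

# Sample output copied from the commands above (before the fix).
before='  retention.ms=604800000 sensitive=false synonyms={}
  delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}'

value=$(printf '%s\n' "$before" | topic_retention_ms)
if [ "$value" = "-1" ]; then
  echo "retention is infinite"
else
  echo "retention.ms is $value, topic still needs the --alter command"
fi
```

Against a live broker, you would pipe the --describe command into the helper in place of the sample text.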


github-actions · 2023-06-08T02:07:51Z

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions · 2023-07-08T02:10:40Z

This issue was closed because it has been inactive for 30 days since being marked as stale.

jinlintt added the bug label Apr 21, 2023

jinlintt mentioned this issue Apr 21, 2023

bug(7882): run kafka-configs.sh on DataHubUpgradeHistory_v1 to make sure the retention.ms is set to infinite #7883

Merged

github-actions bot added the stale label Jun 8, 2023

github-actions bot closed this as not planned Jul 8, 2023

sgomezvillamor mentioned this issue Sep 28, 2023

doc: DataHubUpgradeHistory_v1 #8918

Merged

