
@matthewrossi

Restarting existing services using the docker-compose.yaml causes the datanode to crash after a few seconds.

How to reproduce:

$ docker-compose up -d # everything starts ok
$ docker-compose stop  # stop services without removing containers
$ docker-compose up -d # everything starts, but datanode crashes after a few seconds

The log produced by the datanode suggests the issue is due to a mismatch in the clusterIDs of the namenode and the datanode:

datanode_1         | 2023-12-28 11:17:15 WARN  Storage:420 - Failed to add storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data
datanode_1         | java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-hadoop/dfs/data: namenode clusterID = CID-250bae07-6a8a-45ce-84bb-8828b37b10b7; datanode clusterID = CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701 
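
To confirm the mismatch directly, the clusterID recorded on each side can be compared. This is a minimal sketch. It assumes the standard HDFS storage layout, in which each storage directory keeps a `current/VERSION` properties file with a `clusterID=...` line. The helper names `read_cluster_id` and `check_cluster_ids` are hypothetical.

```shell
# Hypothetical helpers for comparing the clusterIDs stored by the namenode and the datanode.
# Assumption: each storage directory contains current/VERSION with a "clusterID=..." line.

read_cluster_id() {
  # Print the value of the clusterID property from <storage_dir>/current/VERSION.
  sed -n 's/^clusterID=//p' "$1/current/VERSION"
}

check_cluster_ids() {
  # $1 = namenode name directory, $2 = datanode data directory.
  # Prints the result and returns non-zero when the IDs differ.
  nn_id=$(read_cluster_id "$1")
  dn_id=$(read_cluster_id "$2")
  if [ "$nn_id" = "$dn_id" ]; then
    echo "match: $nn_id"
  else
    echo "mismatch: namenode=$nn_id datanode=$dn_id"
    return 1
  fi
}
```

In the compose setup, the two directories live in separate containers. Each value therefore has to be read inside its own container, for example with `docker-compose exec`. This assumes the services are named `namenode` and `datanode` and that the directories are /tmp/hadoop-hadoop/dfs/name and /tmp/hadoop-hadoop/dfs/data.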

After some troubleshooting, I found that the namenode does not reuse the clusterID from the previous run because it cannot find it in the directory set by ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name. This is due to a change in the namenode's default user, which is now "hadoop", so the namenode actually writes this information to /tmp/hadoop-hadoop/dfs/name.
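
The failure described above can be modeled with a simplified sketch. This is an illustration, not the image's actual startup script. It assumes that the namenode container formats a fresh namenode whenever the directory named by ENSURE_NAMENODE_DIR does not exist, which generates a new clusterID. `needs_format` is a hypothetical helper.

```shell
# Hypothetical, simplified model of the startup check (not the image's real script).
# Assumption: a missing ENSURE_NAMENODE_DIR triggers a namenode format,
# which writes a brand-new clusterID.
needs_format() {
  # $1 = value of ENSURE_NAMENODE_DIR
  if [ -d "$1" ]; then
    echo "skip format: $1 exists, existing clusterID is kept"
    return 1
  fi
  echo "format: $1 is missing, a new clusterID would be generated"
  return 0
}

# The namenode runs as the "hadoop" user, so its metadata ends up here...
actual_dir="/tmp/hadoop-hadoop/dfs/name"
# ...while docker-compose.yaml points the check at a directory that is never created.
configured_dir="/tmp/hadoop-root/dfs/name"
```

Under this assumption, `needs_format "$configured_dir"` reports that a format is needed on every start, including restarts. On a restarted namenode, `needs_format "$actual_dir"` would instead keep the existing metadata. Pointing ENSURE_NAMENODE_DIR at /tmp/hadoop-hadoop/dfs/name is the change this reasoning points to.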

See https://issues.apache.org/jira/browse/HDFS-17307

@matthewrossi
Author

Following this change, it would be nice if the documentation at https://hub.docker.com/r/apache/hadoop were updated in the same way.

@ayushtkn
Member

Do you know which commit broke this?

@matthewrossi
Author

This is what I found by digging into the project history:

  • docker-compose.yaml has always been configured with ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"
  • the namenode base image has always specified the hadoop user (so my initial assumption that the root user was used previously was wrong)
  • the default Hadoop and HDFS configurations are what determine the use of the /tmp/hadoop-${user.name}/dfs/name directory, and they predate the creation of the docker-compose.yaml

So, it looks like the issue has always been there.
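
The path derivation in the last bullet can be sketched as follows. This assumes the stock defaults `hadoop.tmp.dir = /tmp/hadoop-${user.name}` (core-default.xml), `dfs.namenode.name.dir = file://${hadoop.tmp.dir}/dfs/name`, and `dfs.datanode.data.dir = file://${hadoop.tmp.dir}/dfs/data` (hdfs-default.xml). `default_dirs` is a hypothetical helper that mimics the substitution for a given user name.

```shell
# Sketch of how the default Hadoop properties resolve for a given user.
# Defaults assumed from core-default.xml / hdfs-default.xml:
#   hadoop.tmp.dir        = /tmp/hadoop-${user.name}
#   dfs.namenode.name.dir = file://${hadoop.tmp.dir}/dfs/name
#   dfs.datanode.data.dir = file://${hadoop.tmp.dir}/dfs/data
default_dirs() {
  user_name="$1"
  hadoop_tmp_dir="/tmp/hadoop-${user_name}"
  echo "name: file://${hadoop_tmp_dir}/dfs/name"
  echo "data: file://${hadoop_tmp_dir}/dfs/data"
}
```

`default_dirs hadoop` resolves to /tmp/hadoop-hadoop/dfs/name and /tmp/hadoop-hadoop/dfs/data, which matches the datanode directory in the log. `default_dirs root` yields /tmp/hadoop-root/dfs/name, the value hard-coded in ENSURE_NAMENODE_DIR.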

@github-actions
Contributor

github-actions bot commented Oct 5, 2025

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Oct 5, 2025
@github-actions github-actions bot closed this Oct 6, 2025