[Bug] Docker containers never restart when Java process crashes — broken cron monitor + tail -f /dev/null supervision #3043

## [Bug] Docker containers never restart when Java process crashes — broken cron monitor + `tail -f /dev/null` supervision

### Before submit
- [x] I have confirmed and searched that there are no similar problems in the historical issue and documents

---

### Environment

- HugeGraph version: 1.7.0+
- Deployment: Docker / docker-compose
- Related PR: #3025

---

### Expected & Actual behavior

**Expected:**
When the HugeGraph Java process crashes inside a Docker container, the container should exit and Docker's restart policy (`restart: unless-stopped`) should automatically bring it back up.

**Actual:**
The container stays in the `Up` state permanently after Java crashes, so `docker ps` reports it as running while clients get `Connection refused`. The container never restarts on its own, and manual intervention is required every time.

```
# Java crashes at T=0:30
$ docker ps
CONTAINER ID   IMAGE              STATUS               PORTS
abc123         hugegraph/server   Up 2 hours           0.0.0.0:8080->8080/tcp

# Container looks healthy but:
$ curl http://localhost:8080/versions
curl: (7) Failed to connect to localhost port 8080: Connection refused
```

---

### Root Cause Analysis

There are five compounding problems:

**1. `crond` is never started — the watchdog is completely dead**

`cron` is installed in all four Dockerfiles, but `dumb-init` only launches `docker-entrypoint.sh`, and nothing starts `crond`. Even if `start-hugegraph.sh -m true` were called, `start-monitor.sh` would register the crontab job, but with no `crond` running, `monitor-hugegraph.sh` never fires. The entire watchdog silently does nothing in containers.

```
What happens on a VM:          What happens in Docker:
  crond reads crontab every      crond is NOT running
  minute                         monitor-hugegraph.sh NEVER fires
  monitor-hugegraph.sh fires     HugeGraph stays dead forever
  HugeGraph gets restarted
```
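One option raised in the #3025 discussion is to start the daemon from `docker-entrypoint.sh`. A minimal sketch, assuming the image's cron binary is named `cron` (Debian/Ubuntu) or `crond` (BusyBox/cronie) and that the entrypoint has enough privileges to launch it; `start_cron_daemon` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper for docker-entrypoint.sh: start a cron daemon so the
# crontab entry registered by start-monitor.sh can actually fire.
# Assumes the entrypoint is allowed to launch the daemon.
start_cron_daemon() {
    if command -v cron >/dev/null 2>&1; then
        cron              # Debian/Ubuntu daemon; forks into the background
    elif command -v crond >/dev/null 2>&1; then
        crond             # BusyBox/cronie daemon; also backgrounds itself
    else
        echo "WARN: no cron daemon found; monitor-hugegraph.sh will never run" >&2
        return 1
    fi
}
```

Calling `start_cron_daemon` before `./bin/start-hugegraph.sh -m true ...` would revive the watchdog, but on its own it does not address problem 2: the container still never exits while `tail -f /dev/null` is running.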

**2. `tail -f /dev/null` means zero supervision**

All three `docker-entrypoint.sh` files start the Java process in the background and then block forever:

```bash
# hugegraph-server entrypoint (current)
./bin/start-hugegraph.sh -j "${JAVA_OPTS:-}" -t 120
# ... post-startup checks ...
tail -f /dev/null   # ← keeps container alive with NO watchdog
```

When Java crashes, `tail -f /dev/null` keeps running, so the container never exits. Docker's `restart: unless-stopped` policy only triggers when a container exits; because this one never does, the restart policy never fires. With the compose health check in place, the container stays `Up (unhealthy)` forever.
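Because `start-hugegraph.sh` disowns Java, the entrypoint cannot simply `wait` on it. A minimal sketch of a replacement for `tail -f /dev/null` that polls the PID from the pid file instead, assuming the pid file path is known to the entrypoint; `wait_for_pid_exit` and `PID_FILE` are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical replacement for `tail -f /dev/null` in docker-entrypoint.sh.
# Java is disowned by start-hugegraph.sh, so it is not a waitable child here;
# poll the PID from the pid file and return non-zero once it is gone.
wait_for_pid_exit() {
    local pid_file="$1" interval="${2:-5}" pid
    pid="$(cat "$pid_file" 2>/dev/null)"
    if [ -z "$pid" ]; then
        echo "ERROR: no PID in $pid_file" >&2
        return 1
    fi
    # kill -0 sends no signal; it only checks that the process still exists
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
    done
    echo "process $pid exited; stopping container so the restart policy can fire" >&2
    return 1
}
```

Ending the entrypoint with `wait_for_pid_exit "$PID_FILE" 5; exit $?` would let `dumb-init`, and therefore the container, exit shortly after Java dies. This relies on `dumb-init` as PID 1 reaping the orphaned JVM so that `kill -0` stops succeeding, and it always reports status 1 rather than Java's own exit code, a limitation that a working foreground mode (problem 4) avoids.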

**3. Health checks only exist in `docker-compose.yml`, not as `HEALTHCHECK` instructions in the Dockerfiles**

Health checks are defined per service in `docker-compose.yml`, but none of the four Dockerfiles contains a `HEALTHCHECK` instruction. As a result, a plain `docker run` (without compose) gets no health reporting at all. `depends_on: condition: service_healthy` only works because compose applies the check at runtime; the check is not baked into the image.
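A minimal sketch of a probe that could be baked into an image, assuming `curl` is available in the image and the server answers on `/versions` as in the reproduction above; `hugegraph_healthcheck` and the script path in the comment are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical probe an image could ship, e.g. wrapped in a script that a
# Dockerfile references with a line such as:
#   HEALTHCHECK --interval=30s --timeout=10s CMD /path/to/healthcheck.sh
# Docker treats exit status 0 as healthy and 1 as unhealthy.
hugegraph_healthcheck() {
    local url="${1:-http://127.0.0.1:8080/versions}"
    # -f: fail on HTTP errors; -sS: silent except for errors;
    # --max-time: do not hang on a wedged JVM
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
        return 0
    fi
    return 1
}
```

Note that a health check only reports status; on its own it does not make Docker restart the container, which is why problem 2 still needs a fix.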

**4. Foreground mode is broken in `start-hugegraph.sh`**

`start-hugegraph.sh` accepts `-d false` to run in the foreground, but that path is broken. `$!`, the pid file write, `trap`, `wait_for_startup`, `disown`, and `OPEN_MONITOR` all run unconditionally after the daemon/foreground if/else block. In foreground mode they therefore execute only after Java has already exited, with empty or stale values. Java's exit code is lost, and the script always exits 0.
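One way to restructure the launch step is to keep all daemon-only bookkeeping inside the daemon branch and `exec` the JVM in foreground mode. A minimal sketch under that assumption; `launch_server` and its argument order are hypothetical, not the script's current code:

```bash
#!/usr/bin/env bash
# Hypothetical restructuring of the launch step in start-hugegraph.sh.
# Daemon-only bookkeeping ($!, pid file, disown, startup wait) stays inside the
# daemon branch; foreground mode uses `exec`, so the JVM replaces the shell and
# its exit code becomes the script's exit code.
launch_server() {
    local daemon="$1" pid_file="$2"
    shift 2                       # remaining args: the java command line
    if [ "$daemon" = "true" ]; then
        "$@" &
        local pid=$!
        echo "$pid" > "$pid_file"
        disown "$pid"
        # wait_for_startup and the OPEN_MONITOR step would run here, daemon mode only
        return 0
    fi
    exec "$@"                     # foreground: does not return if exec succeeds
}
```

With this shape, an entrypoint could end with `exec ./bin/start-hugegraph.sh -d false ...`, giving a direct `dumb-init` to `java` chain in which Java's exit ends the container with Java's own exit code. The existing post-startup checks would then need to run somewhere other than after the launch, since nothing after `exec` executes.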

**5. No foreground mode exists at all in `start-hugegraph-pd.sh` and `start-hugegraph-store.sh`**

Both scripts unconditionally background Java with `exec java ... &`. There is no `-d` flag and no foreground code path.
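Adding the missing flag could mirror the `-d` option that `start-hugegraph.sh` already accepts. A minimal sketch that handles only `-d`, assuming the real scripts would merge it into whatever option parsing they already do; `parse_daemon_flag` and `DAEMON` are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical -d parsing for start-hugegraph-pd.sh / start-hugegraph-store.sh,
# mirroring the -d flag of start-hugegraph.sh. Only -d is handled here.
parse_daemon_flag() {
    local OPTIND=1 opt
    DAEMON="true"                 # default keeps today's behavior (background)
    while getopts ":d:" opt; do
        case "$opt" in
            d) DAEMON="$OPTARG" ;;
            :) echo "option -$OPTARG requires a value" >&2; return 2 ;;
            *) echo "unknown option: -$OPTARG" >&2; return 2 ;;
        esac
    done
    case "$DAEMON" in
        true|false) return 0 ;;
        *) echo "-d expects true or false, got '$DAEMON'" >&2; return 2 ;;
    esac
}
```

Defaulting to `true` keeps existing deployments unchanged; only the Docker entrypoints would pass `-d false` and then take a foreground path like the one sketched for `start-hugegraph.sh`.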

---

### Impact

```
CURRENT — Java crashes inside container:
  T=0:30  Java crashes (OOM, segfault, deadlock, etc.)
  T=0:30  tail -f /dev/null keeps running
  T=0:30  Container stays "Up" — Docker sees nothing wrong
  T=1:00  HEALTHCHECK marks container "unhealthy" (compose only)
  T=∞     Container stays unhealthy forever, never restarts
          docker ps shows: Up 2 hours (unhealthy)
          Users get: Connection refused

AFTER FIX — Java crashes inside container:
  T=0:30  Java crashes
  T=0:30  Entrypoint exits → dumb-init exits → container exits
  T=0:30  Docker restart policy fires immediately
  T=0:31  Docker restarts the same container
  T=1:41  docker ps shows: Up 1 min (healthy)
```

---

### Additional bug found during investigation

The shipped default `conf/rest-server.properties` has:
```
restserver.url=127.0.0.1:8080
```
The value has no `http://` scheme. On macOS, `curl` fails immediately with "Protocol not supported", so every probe in `wait_for_startup` fails until it times out, and `start-hugegraph.sh` exits 1 even though the server started fine. Every other config in the repo uses `http://` explicitly: the raft CI configs, the Dockerfile `sed` patch, the cluster test templates, and the Java `ServerOptions` default. The shipped default is inconsistent and breaks local development on macOS.
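The direct fix is to ship `restserver.url=http://127.0.0.1:8080`. As an extra safeguard, the startup scripts could normalize the URL before probing it. A minimal sketch, assuming a simple `key=value` line with no spaces around `=`; `read_rest_url` and `normalize_rest_url` are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Hypothetical guard: read restserver.url from a properties file and prepend
# http:// when no scheme is present, so curl-based probes such as
# wait_for_startup do not fail on a bare host:port.
read_rest_url() {
    local conf="$1"
    # print the value of the first restserver.url= line only
    sed -n 's/^restserver\.url=//p' "$conf" | head -n 1
}

normalize_rest_url() {
    case "$1" in
        http://*|https://*) printf '%s\n' "$1" ;;
        *) printf 'http://%s\n' "$1" ;;
    esac
}
```

For example, `normalize_rest_url "$(read_rest_url conf/rest-server.properties)"` would turn the shipped `127.0.0.1:8080` into `http://127.0.0.1:8080` while leaving URLs that already have a scheme untouched.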

---

### Related

- Follow-up to #3025
- Deferred from the [#3025](https://github.com/apache/hugegraph/pull/3025#discussion_r3241982215) review discussion, where @imbajin commented: *"Either drop cron or start it in docker-entrypoint.sh"*. The decision in https://github.com/apache/hugegraph/pull/3025#issuecomment-4469365009 was to keep cron and fix it properly in a follow-up.
