Bug Description
The start() method creates the etcd client without attempting to verify connectivity or providing any retry logic. If etcd is temporarily unavailable at startup, the registry starts successfully but all operations will fail until manually restarted.
Location
jplatform-registry-etcd/src/main/java/org/flossware/jplatform/registry/etcd/EtcdServiceRegistry.java:76-86
Problematic Code
public void start() {
if (client == null) {
client = Client.builder()
.endpoints(config.getEndpoints().toArray(new String[0]))
.build(); // No connection verification!
}
leaseRenewalExecutor = Executors.newSingleThreadScheduledExecutor();
leaseRenewalExecutor.scheduleAtFixedRate(this::renewLeases,
config.getLeaseTtl() / 2, config.getLeaseTtl() / 2, TimeUnit.SECONDS);
}
Impact
- Silent failure: start() succeeds even if etcd is down
- Deferred errors: Operations fail later with connection errors
- No retry: Once client created, no reconnection attempts
- Service unavailable: Applications can't register services
- Misleading state: Registry appears started but doesn't work
- Poor resilience: Can't handle temporary etcd outages during startup
Example
// etcd cluster is down for maintenance
EtcdServiceRegistry registry = new EtcdServiceRegistry(config);
registry.start();
// Client created successfully
// No error thrown
// Method returns normally
// User thinks registry is ready
registry.registerService(MyService.class, impl);
// First operation attempts to connect
// Connection refused or timeout
// Exception thrown
// Service not registered
// User must manually restart application
// Or catch exception and retry
// No automatic recovery
// OR etcd comes back online 1 minute later
registry.registerService(MyService.class, impl);
// Still fails - client never reconnects
// Must call start() again or recreate registry
Proposed Fix
Option 1 - Verify connectivity on start with retry:
public void start() {
synchronized (startLock) {
if (started) {
logger.warn("EtcdServiceRegistry already started");
return;
}
if (client == null) {
// Validate endpoints
List<String> endpoints = config.getEndpoints();
if (endpoints == null || endpoints.isEmpty()) {
throw new IllegalStateException("No etcd endpoints configured");
}
client = Client.builder()
.endpoints(endpoints.toArray(new String[0]))
.build();
logger.info("Created etcd client with endpoints: {}", endpoints);
}
// Verify connectivity with retry
int maxRetries = 5;
int retryDelay = 2000; // 2 seconds
boolean connected = false;
for (int attempt = 1; attempt <= maxRetries; attempt++) {
try {
// Simple connectivity check - list keys
client.getKVClient()
.get(ByteSequence.from("/health", StandardCharsets.UTF_8))
.get(5, TimeUnit.SECONDS);
logger.info("Successfully connected to etcd on attempt {}", attempt);
connected = true;
break;
} catch (Exception e) {
logger.warn("Failed to connect to etcd (attempt {}/{}): {}",
attempt, maxRetries, e.getMessage());
if (attempt < maxRetries) {
try {
Thread.sleep(retryDelay);
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
throw new IllegalStateException("Interrupted while connecting to etcd", ie);
}
} else {
throw new IllegalStateException(
"Failed to connect to etcd after " + maxRetries + " attempts", e);
}
}
}
if (!connected) {
throw new IllegalStateException("Could not connect to etcd");
}
// Start lease renewal
leaseRenewalExecutor = Executors.newSingleThreadScheduledExecutor(r -> {
Thread t = new Thread(r, "etcd-lease-renewal");
t.setDaemon(true);
return t;
});
long renewalPeriod = config.getLeaseTtl() / 2;
leaseRenewalExecutor.scheduleAtFixedRate(() -> {
try {
renewLeases();
} catch (Exception e) {
logger.error("Exception during lease renewal", e);
}
}, renewalPeriod, renewalPeriod, TimeUnit.SECONDS);
started = true;
logger.info("EtcdServiceRegistry started successfully");
}
}
Option 2 - Make connectivity check optional via configuration:
public class EtcdRegistryConfig {
private final boolean verifyConnectivity;
private final int connectionRetries;
private final int connectionRetryDelayMs;
public static class Builder {
private boolean verifyConnectivity = true;
private int connectionRetries = 5;
private int connectionRetryDelayMs = 2000;
public Builder verifyConnectivity(boolean verify) {
this.verifyConnectivity = verify;
return this;
}
public Builder connectionRetries(int retries) {
this.connectionRetries = retries;
return this;
}
public Builder connectionRetryDelay(int delayMs) {
this.connectionRetryDelayMs = delayMs;
return this;
}
}
}
// In start():
if (config.isVerifyConnectivity()) {
verifyConnectivity(config.getConnectionRetries(), config.getConnectionRetryDelayMs());
}
Bug Description
The
start()method creates the etcd client without attempting to verify connectivity or providing any retry logic. If etcd is temporarily unavailable at startup, the registry starts successfully but all operations will fail until manually restarted.Location
jplatform-registry-etcd/src/main/java/org/flossware/jplatform/registry/etcd/EtcdServiceRegistry.java:76-86Problematic Code
Impact
Example
Proposed Fix
Option 1 - Verify connectivity on start with retry:
Option 2 - Make connectivity check optional via configuration: