
Conversation

@tgrunnagle (Collaborator) commented Nov 6, 2025

Speed up server startup by moving ingestion to background

  • Removed blocking registry and workload ingestion from server startup in cli.py
  • Server now starts immediately after configuration validation and database migration
  • Ingestion now happens asynchronously in the polling manager background thread
  • Eliminated artificial startup delay in polling_manager.py that was waiting for minimum polling interval
  • Background thread starts polling after 3-second grace period (to check for server shutdown) instead of waiting for initial ingestion to complete
  • Server is ready to accept connections in seconds instead of minutes
  • Ingestion failures no longer prevent server startup - polling will retry automatically

This change transforms the startup from a synchronous blocking model to asynchronous background processing, dramatically reducing time-to-ready while maintaining the same functionality through the existing polling mechanism.
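
As a rough sketch of the startup shape described above (the `PollingManager` body, the FastAPI `app`, and the host/port values are illustrative placeholders, not the actual code in `cli.py` or `polling_manager.py`):

```python
import threading
import time

import uvicorn
from fastapi import FastAPI

app = FastAPI()  # placeholder application object


class PollingManager:
    """Illustrative stand-in; the real ingestion logic lives in polling_manager.py."""

    def run(self) -> None:
        while True:
            # Registry and workload ingestion would happen here; a failure is
            # simply retried on the next polling cycle.
            time.sleep(300)


def main() -> None:
    # 1. Validate configuration and run database migrations (omitted here).
    # 2. Start ingestion in a background thread instead of awaiting it.
    threading.Thread(target=PollingManager().run, daemon=True).start()
    # 3. uvicorn binds immediately, so the server is ready in seconds and an
    #    ingestion failure can no longer block startup.
    uvicorn.run(app, host="127.0.0.1", port=8000)


if __name__ == "__main__":
    main()
```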

claude bot commented Nov 6, 2025

PR Review

Summary

This PR successfully transforms server startup from synchronous blocking to asynchronous background processing, dramatically reducing startup time. The changes are clean and well-targeted.

✅ Strengths

  • Clean removal: Eliminates 56 lines of blocking ingestion code from CLI startup
  • Preserves functionality: Ingestion still happens via existing polling loops (_poll_workloads/_poll_registry)
  • Improves resilience: Server starts even if initial ToolHive connection fails
  • Smart delay removal: Eliminates artificial startup delay in polling_manager.py:304-305

🔍 Observations

Critical - Grace Period Verification
The PR description mentions "3-second grace period" but I don't see this in the code. Line 78 in polling_manager.py shows 3 seconds total (30 iterations × 0.1s), but this is for checking shutdown signals during startup, not an ingestion delay.
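
For reference, a standalone sketch of a check with that shape (the function and event names here are hypothetical; the real method is `_wait_for_startup` in `polling_manager.py`):

```python
import threading
import time


def wait_for_startup(shutdown_event: threading.Event,
                     iterations: int = 30, step: float = 0.1) -> bool:
    """Sleep ~3 seconds total (30 x 0.1s), returning early if shutdown is requested."""
    for _ in range(iterations):
        if shutdown_event.is_set():
            return False  # shutting down; skip the polling loops entirely
        time.sleep(step)
    return True  # proceed to start polling
```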

Question: Is the grace period implemented elsewhere, or should the PR description be updated to reflect the actual 3-second startup check?

Startup Flow

  1. Server validates config → runs migrations → starts uvicorn
  2. Polling thread starts after 3-second shutdown check (_wait_for_startup)
  3. Polling loops begin immediately (no artificial delay)
  4. First ingestion happens at interval start (workload_polling_interval/registry_polling_interval)

Potential Issue - First Poll Timing
The polling loops (lines 355-401) start immediately but then sleep for the full interval AFTER the first poll. This means:

  • If intervals are 300s, first ingestion happens immediately
  • If intervals are 30s, startup advantage is minimal

This is fine, but worth noting that the "seconds to ready" claim depends heavily on polling intervals.
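
A simplified sketch of that poll-then-sleep shape (`poll_once` and the event-based wait are placeholders standing in for the real `_poll_workloads`/`_poll_registry` loops):

```python
import threading
from typing import Callable


def polling_loop(poll_once: Callable[[], None],
                 interval_seconds: float,
                 shutdown_event: threading.Event) -> None:
    """Poll immediately, then wait the full interval between polls."""
    while not shutdown_event.is_set():
        try:
            poll_once()  # the first ingestion runs right away
        except Exception:
            pass  # a failed poll is retried on the next cycle
        # Sleeping after the poll means a 300s interval still ingests at startup,
        # while a 30s interval gains little from the faster server start.
        shutdown_event.wait(interval_seconds)
```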

🎯 Recommendations

1. Update PR description (if applicable)
Clarify the 3-second timing: it is not a "grace period" but a shutdown-responsive startup check.

2. Consider initial delay (optional enhancement, not blocking)
To prevent immediate ingestion on every restart, consider adding a small configurable initial delay in the polling loops for the first cycle.
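
One possible shape for that, purely as an illustration (the `INITIAL_POLL_DELAY_SECONDS` setting is hypothetical and does not exist in the project):

```python
import os
import threading

# Hypothetical setting; not an existing configuration option in the project.
INITIAL_POLL_DELAY_SECONDS = float(os.getenv("INITIAL_POLL_DELAY_SECONDS", "0"))


def delay_first_cycle(shutdown_event: threading.Event) -> None:
    """Hold back the very first poll so restarts do not trigger ingestion immediately."""
    if INITIAL_POLL_DELAY_SECONDS > 0:
        shutdown_event.wait(INITIAL_POLL_DELAY_SECONDS)
```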

✅ Code Quality

  • Follows project conventions (logger usage, structlog)
  • Maintains database-agnostic approach
  • No unused imports introduced
  • Error handling preserved in polling loops

✅ Security & Performance

  • No security concerns
  • Performance improved (non-blocking startup)
  • No regressions expected - ingestion logic unchanged

Verdict

LGTM with minor documentation clarification

The code changes are solid. Just verify the grace period description matches implementation.

@aponcedeleonch (Member) commented:

Still seeing the error in the logs

$ thv logs mcp-optimizer --proxy -f
6:03PM	INFO	Loaded configuration from state for mcp-optimizer
6:03PM	INFO	Starting tooling server mcp-optimizer...
6:03PM	INFO	OIDC validation disabled, using local user authentication
6:03PM	INFO	Using local user authentication for user: aponcedeleonch
6:03PM	INFO	Setting up streamable-http transport...
6:03PM	INFO	Deploying workload mcp-optimizer from image mcp-optimizer:latest...
6:03PM	INFO	Container created: mcp-optimizer
6:03PM	INFO	Starting streamable-http transport for mcp-optimizer...
6:03PM	INFO	Setting up transparent proxy to forward from host port 38035 to http://127.0.0.1:45878
6:03PM	INFO	Applied middleware: mcp-parser
6:03PM	INFO	Applied middleware: auth
6:03PM	INFO	No auth info handler provided; skipping /.well-known/ endpoint
6:03PM	INFO	HTTP transport started for mcp-optimizer on port 38035
6:03PM	INFO	MCP server mcp-optimizer started successfully
6:03PM	INFO	Server mcp-optimizer belongs to group optim, updating 0 registered client(s)
6:03PM	INFO	No target clients found for server mcp-optimizer
6:03PM	INFO	Running as detached process (PID: 49983)
6:03PM	INFO	MCP server not initialized yet, skipping health check for mcp-optimizer
6:03PM	INFO	MCP server not initialized yet, skipping health check for mcp-optimizer
6:04PM	INFO	MCP server not initialized yet, skipping health check for mcp-optimizer

Maybe we could also add a background process that checks for ToolHive? Or at least make that check non-blocking?
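
A minimal sketch of the kind of non-blocking check being suggested (the `/health` path, the `httpx` client, and the polling cadence are assumptions, not existing ToolHive behavior):

```python
import threading

import httpx


def watch_toolhive(base_url: str, shutdown_event: threading.Event,
                   interval_seconds: float = 5.0) -> None:
    """Background readiness probe that logs state changes instead of blocking startup."""
    available = False
    while not shutdown_event.is_set():
        try:
            reachable = httpx.get(f"{base_url}/health", timeout=2.0).status_code < 500
        except httpx.HTTPError:
            reachable = False
        if reachable != available:
            available = reachable
            print(f"ToolHive reachable: {available}")  # real code would use structlog
        shutdown_event.wait(interval_seconds)
```

Run as a daemon thread, a probe like this would surface connectivity problems in the logs without ever delaying startup.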

@aponcedeleonch (Member) commented:

Sorry, I hit the close button by mistake; it's right next to the Comment button.

@therealnb (Collaborator) commented:

See my comment in #48 (comment).

This could be an improvement or not, depending on how you look at it. It doesn't explain why the server never connects.

@tgrunnagle (Collaborator, Author) commented:

If the thv proxy error logs are being caused by something else, do we still want to make this change? I think it makes sense to move ingestion outside of server startup regardless.

@therealnb (Collaborator) commented:

This is probably a good change, but I don't think it will fix all the problems. See the comment here. It looks like the initialise endpoint needs to be available in less than 0.1s after the process is started.

This may not be the best place to do this.

@tgrunnagle tgrunnagle merged commit 0c6d598 into main Nov 7, 2025
10 checks passed
@tgrunnagle tgrunnagle deleted the fix_startup_2025-11-06 branch November 7, 2025 21:14

Development

Successfully merging this pull request may close these issues.

Optimizer initialization takes long time causing warning and error messages in thv proxy logs
