
Conversation

@tgrunnagle (Collaborator) commented Nov 6, 2025

Speed up server startup by moving ingestion to background

  • Removed blocking registry and workload ingestion from server startup in cli.py
  • Server now starts immediately after configuration validation and database migration
  • Ingestion now happens asynchronously in the polling manager background thread
  • Eliminated artificial startup delay in polling_manager.py that was waiting for minimum polling interval
  • Background thread starts polling after 3-second grace period (to check for server shutdown) instead of waiting for initial ingestion to complete
  • Server is ready to accept connections in seconds instead of minutes
  • Ingestion failures no longer prevent server startup - polling will retry automatically

This change transforms the startup from a synchronous blocking model to asynchronous background processing, dramatically reducing time-to-ready while maintaining the same functionality through the existing polling mechanism.
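
As a rough sketch of the startup shape described above (the `PollingManager` body, the FastAPI `app`, and the host/port values are illustrative placeholders, not the actual code in `cli.py` or `polling_manager.py`):

```python
import threading
import time

import uvicorn
from fastapi import FastAPI

app = FastAPI()  # placeholder application object


class PollingManager:
    """Illustrative stand-in; the real ingestion logic lives in polling_manager.py."""

    def run(self) -> None:
        while True:
            # Registry and workload ingestion would happen here; a failure is
            # simply retried on the next polling cycle.
            time.sleep(300)


def main() -> None:
    # 1. Validate configuration and run database migrations (omitted here).
    # 2. Start ingestion in a background thread instead of awaiting it.
    threading.Thread(target=PollingManager().run, daemon=True).start()
    # 3. uvicorn binds immediately, so the server is ready in seconds and an
    #    ingestion failure can no longer block startup.
    uvicorn.run(app, host="127.0.0.1", port=8000)


if __name__ == "__main__":
    main()
```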

claude bot commented Nov 6, 2025

PR Review

Summary

This PR successfully transforms server startup from synchronous blocking to asynchronous background processing, dramatically reducing startup time. The changes are clean and well-targeted.

✅ Strengths

  • Clean removal: Eliminates 56 lines of blocking ingestion code from CLI startup
  • Preserves functionality: Ingestion still happens via existing polling loops (_poll_workloads/_poll_registry)
  • Improves resilience: Server starts even if initial ToolHive connection fails
  • Smart delay removal: Eliminates artificial startup delay in polling_manager.py:304-305

🔍 Observations

Critical - Grace Period Verification
The PR description mentions "3-second grace period" but I don't see this in the code. Line 78 in polling_manager.py shows 3 seconds total (30 iterations × 0.1s), but this is for checking shutdown signals during startup, not an ingestion delay.
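
For reference, a standalone sketch of a check with that shape (the function and event names here are hypothetical; the real method is `_wait_for_startup` in `polling_manager.py`):

```python
import threading
import time


def wait_for_startup(shutdown_event: threading.Event,
                     iterations: int = 30, step: float = 0.1) -> bool:
    """Sleep ~3 seconds total (30 x 0.1s), returning early if shutdown is requested."""
    for _ in range(iterations):
        if shutdown_event.is_set():
            return False  # shutting down; skip the polling loops entirely
        time.sleep(step)
    return True  # proceed to start polling
```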

Question: Is the grace period implemented elsewhere, or should the PR description be updated to reflect the actual 3-second startup check?

Startup Flow

  1. Server validates config → runs migrations → starts uvicorn
  2. Polling thread starts after 3-second shutdown check (_wait_for_startup)
  3. Polling loops begin immediately (no artificial delay)
  4. First ingestion happens at interval start (workload_polling_interval/registry_polling_interval)

Potential Issue - First Poll Timing
The polling loops (lines 355-401) start immediately but then sleep for the full interval AFTER the first poll. This means:

  • If intervals are 300s, first ingestion happens immediately
  • If intervals are 30s, startup advantage is minimal

This is fine, but worth noting that the "seconds to ready" claim depends heavily on polling intervals.
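
A simplified sketch of that poll-then-sleep shape (`poll_once` and the event-based wait are placeholders standing in for the real `_poll_workloads`/`_poll_registry` loops):

```python
import threading
from typing import Callable


def polling_loop(poll_once: Callable[[], None],
                 interval_seconds: float,
                 shutdown_event: threading.Event) -> None:
    """Poll immediately, then wait the full interval between polls."""
    while not shutdown_event.is_set():
        try:
            poll_once()  # the first ingestion runs right away
        except Exception:
            pass  # a failed poll is retried on the next cycle
        # Sleeping after the poll means a 300s interval still ingests at startup,
        # while a 30s interval gains little from the faster server start.
        shutdown_event.wait(interval_seconds)
```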

🎯 Recommendations

1. Update PR description (if applicable)
Clarify the 3-second timing: it is not a "grace period" but a shutdown-responsive startup check.

2. Consider initial delay (optional enhancement, not blocking)
To prevent immediate ingestion on every restart, consider adding a small configurable initial delay in the polling loops for the first cycle.
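
One possible shape for that, purely as an illustration (the `INITIAL_POLL_DELAY_SECONDS` setting is hypothetical and does not exist in the project):

```python
import os
import threading

# Hypothetical setting; not an existing configuration option in the project.
INITIAL_POLL_DELAY_SECONDS = float(os.getenv("INITIAL_POLL_DELAY_SECONDS", "0"))


def delay_first_cycle(shutdown_event: threading.Event) -> None:
    """Hold back the very first poll so restarts do not trigger ingestion immediately."""
    if INITIAL_POLL_DELAY_SECONDS > 0:
        shutdown_event.wait(INITIAL_POLL_DELAY_SECONDS)
```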

✅ Code Quality

  • Follows project conventions (logger usage, structlog)
  • Maintains database-agnostic approach
  • No unused imports introduced
  • Error handling preserved in polling loops

✅ Security & Performance

  • No security concerns
  • Performance improved (non-blocking startup)
  • No regressions expected - ingestion logic unchanged

Verdict

LGTM with minor documentation clarification

The code changes are solid. Just verify the grace period description matches implementation.

@aponcedeleonch (Member) commented:

Still seeing the error in the logs

$ thv logs mcp-optimizer --proxy -f
6:03PM	INFO	Loaded configuration from state for mcp-optimizer
6:03PM	INFO	Starting tooling server mcp-optimizer...
6:03PM	INFO	OIDC validation disabled, using local user authentication
6:03PM	INFO	Using local user authentication for user: aponcedeleonch
6:03PM	INFO	Setting up streamable-http transport...
6:03PM	INFO	Deploying workload mcp-optimizer from image mcp-optimizer:latest...
6:03PM	INFO	Container created: mcp-optimizer
6:03PM	INFO	Starting streamable-http transport for mcp-optimizer...
6:03PM	INFO	Setting up transparent proxy to forward from host port 38035 to http://127.0.0.1:45878
6:03PM	INFO	Applied middleware: mcp-parser
6:03PM	INFO	Applied middleware: auth
6:03PM	INFO	No auth info handler provided; skipping /.well-known/ endpoint
6:03PM	INFO	HTTP transport started for mcp-optimizer on port 38035
6:03PM	INFO	MCP server mcp-optimizer started successfully
6:03PM	INFO	Server mcp-optimizer belongs to group optim, updating 0 registered client(s)
6:03PM	INFO	No target clients found for server mcp-optimizer
6:03PM	INFO	Running as detached process (PID: 49983)
6:03PM	INFO	MCP server not initialized yet, skipping health check for mcp-optimizer
6:03PM	INFO	MCP server not initialized yet, skipping health check for mcp-optimizer
6:04PM	INFO	MCP server not initialized yet, skipping health check for mcp-optimizer

Maybe we could also add a background process that checks for ToolHive? Or at least make that check non-blocking?
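
A minimal sketch of the kind of non-blocking check being suggested (the `/health` path, the `httpx` client, and the polling cadence are assumptions, not existing ToolHive behavior):

```python
import threading

import httpx


def watch_toolhive(base_url: str, shutdown_event: threading.Event,
                   interval_seconds: float = 5.0) -> None:
    """Background readiness probe that logs state changes instead of blocking startup."""
    available = False
    while not shutdown_event.is_set():
        try:
            reachable = httpx.get(f"{base_url}/health", timeout=2.0).status_code < 500
        except httpx.HTTPError:
            reachable = False
        if reachable != available:
            available = reachable
            print(f"ToolHive reachable: {available}")  # real code would use structlog
        shutdown_event.wait(interval_seconds)
```

Run as a daemon thread, a probe like this would surface connectivity problems in the logs without ever delaying startup.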

@aponcedeleonch (Member) commented:

Sorry, I hit the close button by mistake; it's right next to the Comment button.

@therealnb (Collaborator) commented:

See my comment in #48 (comment).

This could be an improvement or not, depending on how you look at it. It doesn't explain why the server never connects.

@tgrunnagle (Collaborator, Author) commented:

If the thv proxy error logs are being caused by something else, do we still want to make this change? I think it makes sense to move ingestion outside of server startup regardless.

@therealnb (Collaborator) commented:

This is probably a good change, but I don't think it will fix all the problems. See the comment here. It looks like the initialise endpoint needs to be available in less than 0.1s after the process is started.

This may not be the best place to do this.

@tgrunnagle tgrunnagle merged commit 0c6d598 into main Nov 7, 2025
10 checks passed
@tgrunnagle tgrunnagle deleted the fix_startup_2025-11-06 branch November 7, 2025 21:14

Development

Successfully merging this pull request may close these issues.

Optimizer initialization takes long time causing warning and error messages in thv proxy logs
