fix(crewai-crews): harden against blocking LLM calls at import time #3974
Merged
…ings

CrewAI's ChatWithCrewFlow.__init__ (called by ag_ui_crewai's add_crewai_crew_fastapi_endpoint at module import) makes blocking, synchronous LLM calls via generate_input_description_with_ai and generate_crew_description_with_ai in crewai/cli/crew_chat.py. Any LLM hiccup (aimock regression, OpenAI outage, network blip) crashes the Python process before uvicorn can bind, which fails the Railway healthcheck and rolls the deploy back.

Patch both functions to return static strings before ag_ui_crewai is imported. The AI-generated descriptions are only used by the CrewAI chat UI (not the CopilotKit runtime), so static defaults are functionally equivalent for our showcase.

Verified via docker build: the unhardened image crashes on import with APIError at crew_chat.py:481; the hardened image starts cleanly with an invalid OPENAI_BASE_URL and responds on /api/health.

The upstream fix (deferred construction) has landed on ag-ui main but has not shipped in an ag-ui-crewai release yet (0.1.5 and earlier are affected). Remove the shim once a release > 0.1.5 includes it.

Upstream issue: crewAIInc/crewAI#5510
…ai ceiling

Addresses the CR R1 CRITICAL finding on PR #3974: setattr() on a Python module always succeeds, whether or not the attribute already exists. Without a guard, an upstream rename of generate_input_description_with_ai or generate_crew_description_with_ai in a future crewai release would silently turn the patch into a no-op, leaving the real functions in place. The pre-bind LLM crash would quietly reappear in production behind a green PR.

Changes:
- Add a hasattr() guard in both agent_server.py files that raises RuntimeError with an actionable drift message if either symbol disappears upstream.
- Add a post-assignment assert to defend against import-order surprises or module re-imports shadowing the reference.
- Add an info-level log line so operators can see the shim is active and know to remove it once the upstream fix is adopted.
- Add the upstream issue link (crewAIInc/crewAI#5510) and the explicit ag-ui-crewai release status to the comment block.
- Pin the ag-ui-crewai upper bound to <0.1.6 in both requirements.txt files so pip enforces the shim's applicability window: upgrading past 0.1.5 forces the engineer to confront the version mismatch and remove the shim.

Applied to both showcase/packages/crewai-crews/src/agent_server.py and showcase/starters/crewai-crews/agent_server.py to keep the demo and starter trees in sync.

Verified locally:
- Docker build succeeds (showcase/packages/crewai-crews).
- The hardened container starts cleanly with OPENAI_BASE_URL set to an unreachable host; /health returns 200 on both the agent (8000) and Next.js (10000) ports.
- Negative case: deleting generate_input_description_with_ai from the installed crewai module inside the container and re-importing agent_server raises RuntimeError with the upstream-drift message, as expected.
Summary
CrewAI's ChatWithCrewFlow.__init__ (invoked from ag_ui_crewai.endpoint.add_crewai_crew_fastapi_endpoint at module import in ag-ui-crewai <= 0.1.5) makes synchronous, blocking LLM calls via crewai.cli.crew_chat.generate_input_description_with_ai and generate_crew_description_with_ai. ANY LLM hiccup (aimock regression, OpenAI outage, network blip, DNS failure) crashes the Python process BEFORE uvicorn can bind its port, so Railway/Kubernetes health checks fail and deploys roll back.

This was the direct cause of the crewai-crews Railway crash fixed server-side in #3971. That fix patched the aimock response schema, but the underlying fragility in upstream CrewAI / ag-ui-crewai remained: a future blip would crash us again.
This PR adds a defensive monkey-patch in agent_server.py that replaces both generator functions with static-string returns BEFORE ag_ui_crewai is imported. The AI-generated descriptions are only surfaced in the CrewAI chat UI (which the CopilotKit runtime does not use), so static defaults are functionally equivalent for our showcase.

Upstream issue filed: crewAIInc/crewAI#5510
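Why the patch must run before ag_ui_crewai is imported: rebinding a module attribute only affects code that looks the name up after the patch. A minimal sketch, assuming ag_ui_crewai binds the generators by name at its own import time (for example via a from-import of crewai.cli.crew_chat); the PR states the ordering requirement but not this mechanism, and the module below is a types.ModuleType stand-in.

```python
import types

lib = types.ModuleType("crew_chat_standin")  # stand-in for crewai.cli.crew_chat


def _real_generate(*args, **kwargs):
    return "real LLM call"


lib.generate_crew_description_with_ai = _real_generate

# Simulates a consumer module doing
#   from crewai.cli.crew_chat import generate_crew_description_with_ai
# BEFORE the shim runs: it binds the original function object.
imported_too_early = lib.generate_crew_description_with_ai

# The shim rebinds the attribute on the module object.
lib.generate_crew_description_with_ai = lambda *args, **kwargs: "static"

# The same from-import performed AFTER the shim sees the stub.
imported_after_patch = lib.generate_crew_description_with_ai

print(imported_too_early())    # still the original function
print(imported_after_patch())  # the static stub
```

A reference captured before the patch keeps calling the original function, which is why the shim has to sit above the ag_ui_crewai import in agent_server.py.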
The long-term fix is deferred construction in ag-ui-crewai, which has landed on ag-ui main but is not yet released. Remove this shim once ag-ui-crewai > 0.1.5 ships.

Why a monkey-patch and not lazy-init
add_crewai_crew_fastapi_endpoint is the entry point and internally constructs ChatWithCrewFlow(crew) synchronously in ag-ui-crewai <= 0.1.5. Deferring that call would require either vendoring the endpoint function or reimplementing it. The monkey-patch is two lines and removes cleanly when the upstream fix ships.

Test plan
Verified locally via Docker build + run with an intentionally broken LLM endpoint (OPENAI_BASE_URL=http://invalid-host/v1):

Unhardened (negative control):
Container exits with code 1 and never binds a port.
Hardened (this PR):
curl http://localhost:PORT/api/health -> {"status":"ok","integration":"crewai-crews","agent":"ok","timestamp":"..."} (HTTP 200).

Checklist
/api/health
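The curl health check from the test plan can also be scripted, for example as a post-deploy smoke test. A minimal standard-library sketch; it assumes the endpoint returns the JSON shape shown in the hardened result and treats any non-200 response, network error, or status other than "ok" as unhealthy. The function names is_healthy and probe are hypothetical.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True if a /api/health response body reports status "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(url: str, timeout: float = 5.0) -> bool:
    """GET url and require both HTTP 200 and a healthy JSON body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError.
        return False
```

Usage: probe("http://localhost:PORT/api/health") with PORT replaced by the port under test; a False result corresponds to the unhardened container that never binds.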