fix(crewai-crews): harden against LLM blocking calls at import time #3974

Merged: jpr5 merged 3 commits into main from fix/crewai-import-time-hardening on Apr 16, 2026

jpr5 commented Apr 16, 2026

Summary

CrewAI's ChatWithCrewFlow.__init__ (invoked from ag_ui_crewai.endpoint.add_crewai_crew_fastapi_endpoint at module import in ag-ui-crewai <= 0.1.5) makes synchronous blocking LLM calls via crewai.cli.crew_chat.generate_input_description_with_ai and generate_crew_description_with_ai. ANY LLM hiccup — aimock regression, OpenAI outage, network blip, DNS failure — crashes the Python process BEFORE uvicorn can bind its port, causing Railway/Kubernetes health checks to fail and deploys to roll back.

This was the direct cause of the crewai-crews Railway crash fixed server-side in #3971. That fix patched the aimock response schema, but the underlying fragility in upstream CrewAI / ag-ui-crewai remained — a future blip would crash us again.

This PR adds a defensive monkey-patch in agent_server.py that replaces both generator functions with static-string returns BEFORE ag_ui_crewai is imported. The AI-generated descriptions are only surfaced in the CrewAI chat UI (which the CopilotKit runtime does not use), so static defaults are functionally equivalent for our showcase.
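
The patching technique described above can be sketched as follows. This is a minimal illustration, not the code from this PR: it uses a stand-in module built with types.ModuleType so it runs without CrewAI installed. The two generator names come from the traceback and summary in this PR; the caller describe_crew, the static strings, the helper name apply_static_description_shim, and the signature of generate_crew_description_with_ai are assumptions made for the example.

```python
import types

# Stand-in for crewai.cli.crew_chat so the sketch runs without CrewAI.
# describe_crew is a hypothetical caller, not a real CrewAI function.
STANDIN_SOURCE = '''
def generate_input_description_with_ai(input_name, crew, chat_llm):
    raise ConnectionError("LLM endpoint unreachable")


def generate_crew_description_with_ai(crew, chat_llm):
    raise ConnectionError("LLM endpoint unreachable")


def describe_crew(crew, chat_llm):
    # Resolves both helpers by name in this module's globals at call time.
    return (
        generate_crew_description_with_ai(crew, chat_llm),
        generate_input_description_with_ai("topic", crew, chat_llm),
    )
'''
crew_chat = types.ModuleType("standin_crew_chat")
exec(STANDIN_SOURCE, crew_chat.__dict__)


def apply_static_description_shim(module):
    """Replace both generators with static-string returns (no LLM call)."""
    module.generate_input_description_with_ai = (
        lambda *args, **kwargs: "Input for the crew."
    )
    module.generate_crew_description_with_ai = (
        lambda *args, **kwargs: "A CrewAI crew."
    )


# Negative control: without the shim, the simulated LLM failure propagates.
try:
    crew_chat.describe_crew(crew=None, chat_llm=None)
    crashed_before_patch = False
except ConnectionError:
    crashed_before_patch = True

# With the shim applied, the same call returns the static defaults.
apply_static_description_shim(crew_chat)
descriptions = crew_chat.describe_crew(crew=None, chat_llm=None)
```

In agent_server.py the equivalent assignments would presumably target crewai.cli.crew_chat itself and sit above the ag_ui_crewai import, so that they are in effect before add_crewai_crew_fastapi_endpoint constructs the flow.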

Upstream issue filed: crewAIInc/crewAI#5510

The long-term fix is deferred construction in ag-ui-crewai, which has landed on ag-ui main but is not yet released. Remove this shim once ag-ui-crewai > 0.1.5 ships.

Why a monkey-patch and not lazy-init

add_crewai_crew_fastapi_endpoint is the entry point and internally constructs ChatWithCrewFlow(crew) synchronously in ag-ui-crewai <= 0.1.5. Deferring that call would require either vendoring the endpoint function or reimplementing it. The monkey-patch is two lines and removes cleanly when the upstream fix ships.

Test plan

Verified locally via Docker build + run with an intentionally broken LLM endpoint (OPENAI_BASE_URL=http://invalid-host/v1):

Unhardened (negative control):

  File "/app/agent_server.py", line 27, in <module>
    add_crewai_crew_fastapi_endpoint(app, LatestAiDevelopment(), "/")
  File ".../ag_ui_crewai/endpoint.py", line 250, in add_crewai_crew_fastapi_endpoint
    add_crewai_flow_fastapi_endpoint(app, ChatWithCrewFlow(crew=crew), path)
  File ".../ag_ui_crewai/crews.py", line 56, in __init__
    self.crew_chat_inputs = crew_chat_generate_crew_chat_inputs(...)
  File ".../crewai/cli/crew_chat.py", line 387, in generate_crew_chat_inputs
    description = generate_input_description_with_ai(input_name, crew, chat_llm)
  File ".../crewai/cli/crew_chat.py", line 481, in generate_input_description_with_ai
    response = chat_llm.call(...)
APIError

Container exits with code 1, never binds a port.

Hardened (this PR):

INFO:     Started server process [7]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

curl http://localhost:PORT/api/health -> {"status":"ok","integration":"crewai-crews","agent":"ok","timestamp":"..."} (HTTP 200).

Checklist

  • Docker build succeeds locally
  • Unhardened build crashes on import with broken LLM endpoint (negative control)
  • Hardened build starts cleanly with broken LLM endpoint and responds 200 on /api/health
  • Upstream issue filed on crewAIInc/crewAI

…ings

CrewAI's ChatWithCrewFlow.__init__ (called by ag_ui_crewai's
add_crewai_crew_fastapi_endpoint at module import) makes blocking
synchronous LLM calls via generate_input_description_with_ai and
generate_crew_description_with_ai in crewai/cli/crew_chat.py. Any LLM
hiccup (aimock regression, OpenAI outage, network blip) crashes the
Python process before uvicorn can bind, causing Railway healthcheck
failure and deploy rollback.

Patch both functions to return static strings before ag_ui_crewai is
imported. The AI-generated descriptions are only used by the CrewAI
chat UI (not the CopilotKit runtime), so static defaults are
functionally equivalent for our showcase.

Verified via docker build: unhardened image crashes on import with
APIError at crew_chat.py:481; hardened image starts cleanly with an
invalid OPENAI_BASE_URL and responds on /api/health.

Upstream fix (deferred construction) has landed on ag-ui main but has
not yet shipped in a release after ag-ui-crewai 0.1.5. Remove the shim
once it is released.

Upstream issue: crewAIInc/crewAI#5510

…ai ceiling

Addresses CR R1 CRITICAL finding on PR #3974.

setattr() on a Python module always succeeds regardless of prior
attribute existence. Without a guard, an upstream rename of
generate_input_description_with_ai or generate_crew_description_with_ai
in a future crewai release would silently no-op the patch, leaving the
real functions in place. The pre-bind LLM crash bug would quietly
reappear in production with a green PR.

Changes:

- Add hasattr() guard in both agent_server.py files that raises
  RuntimeError with an actionable drift message if either symbol
  disappears upstream.
- Add post-assignment assert to defend against import-order weirdness
  or module re-imports shadowing the reference.
- Add an info-level log line so operators can see the shim is active
  and know to remove it after adoption.
- Add upstream issue link (crewAIInc/crewAI#5510) and explicit ag-ui-crewai
  release status to the comment block.
- Pin ag-ui-crewai upper bound to <0.1.6 in both requirements.txt files
  so the shim's applicability window is enforced by pip — upgrading past
  0.1.5 forces the engineer to confront the version mismatch and remove
  the shim.

Applied to both showcase/packages/crewai-crews/src/agent_server.py and
showcase/starters/crewai-crews/agent_server.py to keep the demo and
starter trees in sync.
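
A rough sketch of the guard, post-assignment assert, and log line listed above, assuming a stand-in module in place of crewai.cli.crew_chat so it runs without CrewAI installed. The helper install_shim, the logger name, the static return value, and the exact message wording are hypothetical; only the two patched symbol names come from this PR.

```python
import logging
import types

logger = logging.getLogger("agent_server")  # hypothetical logger name

PATCHED_NAMES = (
    "generate_input_description_with_ai",
    "generate_crew_description_with_ai",
)


def _static_description(*args, **kwargs):
    # Static default returned instead of making a blocking LLM call.
    return "Static description (import-time shim)."


def install_shim(module):
    for name in PATCHED_NAMES:
        # setattr() succeeds even if the attribute never existed, so check
        # first: an upstream rename must fail loudly, not silently no-op.
        if not hasattr(module, name):
            raise RuntimeError(
                f"{module.__name__}.{name} not found; upstream may have "
                "renamed it. Re-check or remove the import-time shim."
            )
        setattr(module, name, _static_description)
        # Confirm the module now exposes the replacement.
        assert getattr(module, name) is _static_description
    logger.info("import-time LLM shim active for %s", module.__name__)


# Stand-in for the module as it exists today: both symbols present.
current = types.ModuleType("standin_crew_chat")
current.generate_input_description_with_ai = lambda *a, **k: "from LLM"
current.generate_crew_description_with_ai = lambda *a, **k: "from LLM"
install_shim(current)

# Stand-in for a future release in which one symbol was renamed.
drifted = types.ModuleType("drifted_crew_chat")
drifted.generate_crew_description_with_ai = lambda *a, **k: "from LLM"
try:
    install_shim(drifted)
    drift_error = None
except RuntimeError as exc:
    drift_error = str(exc)
```

Raising at import time turns upstream drift into a failed startup rather than a silently reintroduced crash path. Note that assert statements are skipped when Python runs with -O, so the hasattr() check is the part that holds in every mode.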

Verified locally:

- Docker build succeeds (showcase/packages/crewai-crews).
- Hardened container starts cleanly with OPENAI_BASE_URL set to an
  unreachable host; /health returns 200 on both agent (8000) and
  Next.js (10000) ports.
- Negative case: deleting generate_input_description_with_ai from the
  installed crewai module inside the container and re-importing
  agent_server raises RuntimeError with the upstream-drift message, as
  expected.