What happened?
After updating a queue-bound flow's code (via UI "Save Changes" or PUT /api/flows/{id}),
queue workers continue executing the old stale code on every subsequent job.
The old code contained a publish step that called an unreachable internal host
(http://runloop-engine:8080). This causes Node.js http.request() to hang
indefinitely — because req.setTimeout() does NOT fire during TCP connection
establishment (only on idle connected sockets).
The queue's visibility timeout (300s) eventually kills the hung job,
producing the misleading error: "Execution cancelled".
Run Now (manual trigger) always uses fresh code from DB → completes in < 10s ✅
Queue-triggered jobs use cached old code → hang for exactly 5m 0s ❌
The DB confirmed the new code was saved correctly:
- GET /runloop/rl/api/flows/{id} returns the updated code
- updatedAt: 2026-05-07T09:49:16.693Z matches the last save
Disabling and re-enabling the queue (Workers=0 → Workers=1) does NOT fix it.
New goroutines still load from cache, not fresh from DB.
Steps to reproduce
- Create a queue bound to a flow containing Node.js code that calls an unreachable host (e.g. http://runloop-engine:8080)
- Enqueue a job → observe it hang for exactly visibility seconds → fails with "Execution cancelled"
- Update the flow code via UI (SAVE CHANGES) to remove the broken call
- Confirm via GET /api/flows/{id} that DB has the new code
- Click Run Now on the flow → ✅ completes in < 10 seconds (uses new DB code)
- Enqueue a new queue job → ❌ still hangs 5 minutes (still running old code)
- Edit queue → disable (Enabled=false) → Save → re-enable → Save
- Enqueue again → ❌ still hangs 5 minutes (cache survives worker restart)
Only confirmed workaround:
Create a brand-new flow via POST /api/flows with the same code,
then rebind with PATCH /api/queues/{name} using the new flowId.
Fresh workers have no cache → load code correctly from DB ✓
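The workaround above can be scripted. The following is a minimal sketch, not an official client: the helper name rebindQueueToFreshFlow, the base URL, the shape of the flow body, the `id` field in the POST response, and the `{ flowId }` PATCH body are all assumptions made for illustration. Only the two endpoints themselves come from this report.

```javascript
// Hypothetical helper: create a fresh flow, then rebind the queue to it.
// Request and response shapes are assumptions, not documented RunLoop API.
async function rebindQueueToFreshFlow({ baseUrl, queueName, flowBody, fetchImpl = fetch }) {
  // Step 1: POST /api/flows with the same (already fixed) code.
  const created = await fetchImpl(`${baseUrl}/api/flows`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(flowBody),
  });
  if (!created.ok) {
    throw new Error(`POST /api/flows failed with status ${created.status}`);
  }
  // Assumption: the response JSON carries the new flow's identifier as `id`.
  const { id: flowId } = await created.json();

  // Step 2: PATCH /api/queues/{name} so the queue points at the new flow.
  const patched = await fetchImpl(
    `${baseUrl}/api/queues/${encodeURIComponent(queueName)}`,
    {
      method: 'PATCH',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ flowId }),
    }
  );
  if (!patched.ok) {
    throw new Error(`PATCH /api/queues/${queueName} failed with status ${patched.status}`);
  }
  return flowId;
}

// Example call (hypothetical base URL and flow body):
// rebindQueueToFreshFlow({
//   baseUrl: 'http://localhost:3000',
//   queueName: 'pipeline-tasks',
//   flowBody: { name: 'Pipeline Executor v2' },
// }).then((flowId) => console.log('queue rebound to', flowId));
```

The fetchImpl parameter exists only so the helper can be exercised against a stub without a running engine. Authentication is omitted because the report does not describe it.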
Version / commit
- RunLoop version: v0.1.0 BETA
- Engine: ONLINE (observed at community.oneweb.tech)
- Node.js runtime inside executor: v24.14.1
- Queue backend: PostgreSQL
- Concurrency: 1 | max_attempts: 3 | visibility: 300s
How are you running RunLoop?
Local dev (npm run dev)
Logs / errors
Anything else?
Summary
Queue workers cache the bound flow's code when they start up.
After updating the flow code via the UI ("Save Changes") or the PUT /api/flows/{id} API,
queue workers continue executing the old (stale) code until the engine process is restarted.
"Run Now" (manual trigger) always uses the latest code from DB — working correctly.
Only queue-triggered executions are affected.
Environment
- RunLoop version: v0.1.0 BETA
- Queue backend: PostgreSQL
- Affected queue: pipeline-tasks
- Flow: Pipeline Executor (o6m9d5kp6yxgn0pqalg2qxhhx)
Steps to Reproduce
- Create a queue bound to flow FlowA. FlowA has Node.js code that calls http://internal-host:8080 (unreachable)
- Enqueue a job → it hangs for exactly visibility seconds → fails with "Execution cancelled"
- Update FlowA code via UI (Save Changes) to remove the broken call
- Confirm via GET /api/flows/{id} that DB has new code (updatedAt is recent)
- Click "Run Now" → ✅ completes in < 10 seconds (uses new code from DB)
- Enqueue another queue job → ❌ still hangs for 5 minutes (still using old cached code)
- Disable queue (set Enabled=false), re-enable (Enabled=true) → ❌ still same behavior
- Only fix found: create a brand new flow with POST /api/flows, then PATCH /api/queues/{name} with new flowId
Expected Behavior
When flow code is updated (via UI or API), queue workers should use the new code
on the next job pickup — without requiring engine restart or queue recreation.
The PATCH /api/queues/{name} API docs state:
"Changes apply on next worker pickup"
This should also apply to flow code changes, not just config parameters.
Actual Behavior
Queue workers continue running stale cached code indefinitely.
The only workarounds are:
- Restart the RunLoop engine process
- Create a new flow + rebind queue via PATCH /api/queues/{name} with new flowId
Root Cause Analysis
The RunLoop engine appears to load the flow definition (including Node.js code)
into the worker goroutine's memory at queue startup or first job pickup,
then caches it for all subsequent jobs.
Evidence:
- GET /runloop/rl/api/flows/{id} confirms DB has correct code
- updatedAt: 2026-05-07T09:49:16.693Z on flow matches our last save
- "Run Now" executions use fresh code (no caching)
- Queue executions use old code (with stale publish call to unreachable host)
- Last error on job after queue disable: "load flow: context cancelled". This confirms the worker loads flow code at pickup time, but the context was cancelled due to the disable
Impact
- All queue jobs fail at exactly visibility seconds (300s default)
- Error appears as "Execution cancelled", which is misleading; the actual cause is the stale code hanging
- System appears to work (no startup error) but silently degrades
Suggested Fix
Option 1 (Preferred): Hot-reload on flow update
When PUT /api/flows/{id} or PATCH /api/queues/{name} is called,
signal active workers to reload the flow definition from DB on next job pickup.
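To illustrate the invalidation idea behind Option 1, here is a minimal, language-neutral sketch written in JavaScript. It is not the engine's implementation (the report suggests the engine uses goroutines, so the real code is presumably Go). The FlowCache class, its methods, and the loadFlow callback are hypothetical names introduced only for this example.

```javascript
// Hypothetical in-memory flow cache with explicit invalidation.
class FlowCache {
  constructor(loadFlow) {
    // loadFlow(flowId) returns the current flow definition from storage.
    // It is synchronous here only to keep the sketch short.
    this.loadFlow = loadFlow;
    this.entries = new Map();
  }

  // Called by a worker at job pickup: load once, then serve from memory.
  get(flowId) {
    if (!this.entries.has(flowId)) {
      this.entries.set(flowId, this.loadFlow(flowId));
    }
    return this.entries.get(flowId);
  }

  // Called by the flow-update and queue-update handlers: drop the cached
  // entry so the next pickup reloads from storage.
  invalidate(flowId) {
    this.entries.delete(flowId);
  }
}

// Usage with a stand-in "database":
const fakeDb = { flowA: 'old code' };
const flowCache = new FlowCache((id) => fakeDb[id]);

flowCache.get('flowA');        // loads 'old code' and caches it
fakeDb.flowA = 'new code';     // flow saved, cache not told about it
flowCache.get('flowA');        // still 'old code' (the behavior reported here)
flowCache.invalidate('flowA'); // what the update handler would trigger
flowCache.get('flowA');        // reloads and returns 'new code'
```

An alternative with the same effect would be to compare the flow's updatedAt against the cached copy on every pickup, trading one cheap query per job for not needing a signal path between the API and the workers.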
Option 2: Force-reload button
Add a "Reload Workers" button on the Queue detail page that signals workers to reload flow code.
Option 3 (Workaround, already works):
PATCH /api/queues/{name} with a new flowId — forces workers to use updated definition.
Document this as the official workaround until hot-reload is implemented.
Additional: req.setTimeout() Warning
The affected flow used Node.js req.setTimeout(30000, ...) expecting it to protect against hangs.
However, req.setTimeout() is a socket idle timeout: it is applied only once a socket has been assigned to the request and is connected, so it does NOT fire during TCP connection establishment.
If a target host drops TCP SYN packets (unreachable, firewall, wrong hostname),
req.setTimeout() will never fire, and the request hangs until something else cancels it (here, the queue's visibility timeout).
Recommendation: Add a documentation note or example in the Node.js executor
showing proper TCP connect timeout using the socket event:
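req.on('socket', function (socket) {
  // connectTimeoutMs: caller-chosen limit in milliseconds, e.g. 5000
  socket.setTimeout(connectTimeoutMs);
  // Note: this timer keeps acting as an idle timeout after the connection is up.
  socket.on('timeout', function () {
    req.destroy(new Error('Connect timeout'));
  });
});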
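For a fuller picture, the sketch below wraps http.request in a promise and separates the connect deadline from the idle timeout. It is a variation on the snippet above rather than a copy of it: the connect phase is guarded by a plain setTimeout that is cleared on the socket's 'connect' event, so it also covers the time before a socket is assigned. The helper name requestWithConnectTimeout and its option names are hypothetical.

```javascript
const http = require('node:http');

// Hypothetical helper: http.request with a connect-phase deadline plus a
// separate idle timeout. Resolves with { status, body } or rejects on error.
function requestWithConnectTimeout(url, { connectTimeoutMs = 5000, idleTimeoutMs = 30000 } = {}) {
  return new Promise((resolve, reject) => {
    const req = http.request(url, (res) => {
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => {
        resolve({ status: res.statusCode, body: Buffer.concat(chunks).toString() });
      });
      res.on('error', reject);
    });

    // Wall-clock timer armed immediately, so it runs while the socket is
    // still being assigned and connected.
    const connectTimer = setTimeout(() => {
      req.destroy(new Error(`Connect timeout after ${connectTimeoutMs} ms`));
    }, connectTimeoutMs);

    req.on('socket', (socket) => {
      if (!socket.connecting) {
        // Reused keep-alive socket: already connected, nothing to wait for.
        clearTimeout(connectTimer);
        return;
      }
      socket.once('connect', () => clearTimeout(connectTimer));
    });

    // Idle timeout: only effective once the socket is connected.
    req.setTimeout(idleTimeoutMs, () => {
      req.destroy(new Error(`Idle timeout after ${idleTimeoutMs} ms`));
    });

    req.on('error', (err) => {
      clearTimeout(connectTimer);
      reject(err);
    });

    req.end();
  });
}

// Example (host taken from this report; expected to reject when unreachable):
// requestWithConnectTimeout('http://runloop-engine:8080/', { connectTimeoutMs: 3000 })
//   .catch((err) => console.error(err.message));
```

Keeping the two timers separate means a slow but healthy response is not cut off by the short connect deadline, while an unreachable host fails quickly instead of consuming the whole visibility window.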