
Intensive functions can bring down infrastructure #899

Open
MikeStall opened this issue Nov 4, 2016 · 3 comments

@MikeStall (Contributor)

We need to give customers a clear pattern for writing intensive (I/O- or CPU-bound) functions. If it's an intermittent network glitch, then a retry will solve it and that's great. But if the user function is genuinely starving the host (which we're seeing happen, at least in my case), retries will lead to infinite looping and denial of service.

The canonical example is:

public async Task Drown([QueueTrigger("bigjobs")] Payload x)  // queue name is a placeholder
{
    // Read 1 million rows from Azure Tables
}

This is a general problem because many kinds of functions need some sort of "keep-alive" making network calls while the function runs. These heartbeats are what tell other workers that this node is still alive (as opposed to orphaned).
For example, QueueTrigger needs to keep renewing the queue message's visibility timeout, Service Bus needs long polling, [Singleton] needs to hold its lease, and the Event Hubs EventProcessorHost needs to own a lease.
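
To make this concrete, here is a rough sketch of a heartbeat-friendlier version of the function above. The queue name, table name, and Payload type are placeholders I'm assuming for illustration, not anything prescribed by the SDK; it relies on the WebJobs Table binding and the Azure Storage segmented-query API. Reading the table one segment at a time keeps each await short, so the host's background renewals get a chance to run:

using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.WindowsAzure.Storage.Table;

public class Payload { }   // placeholder message type

public class Functions
{
    public static async Task DrainInBatches(
        [QueueTrigger("bigjobs")] Payload x,      // "bigjobs" is a placeholder queue name
        [Table("BigTable")] CloudTable table,     // "BigTable" is a placeholder table name
        CancellationToken token)                  // lets the host stop the function cleanly
    {
        var query = new TableQuery<DynamicTableEntity>();
        TableContinuationToken continuation = null;
        do
        {
            token.ThrowIfCancellationRequested();

            // Each segment returns at most 1,000 rows; the await hands the
            // thread back to the pool so host heartbeats (visibility-timeout
            // renewal, lease renewal) are not starved.
            var segment = await table.ExecuteQuerySegmentedAsync(query, continuation);
            continuation = segment.ContinuationToken;

            foreach (var row in segment.Results)
            {
                // Process one row; keep per-row work short and non-blocking.
            }
        } while (continuation != null);
    }
}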

@lindydonna (Contributor)

@brettsam will investigate this.

@brettsam (Member) commented Apr 4, 2017

The underlying issue with this and #822 is that we do not control the code being run in the function. Because the function and the host run in the same process, the function can monopolize CPU time, causing important host operations to fail. We've analyzed quite a few cases and have been able to fix the issue by having the customer improve their function code, but the root cause is hard to see when all you get is sporadic host errors.

The approach we'll move toward in the near future is a "canary" timer running in the host. If that timer starts firing late, we know there's a problem somewhere, and we'll log an explicit message that can guide the user toward a solution. Right now we have no great place to log it -- the message would get buried somewhere in the host logs. We hope the Application Insights work will give us a good place to surface warnings like this.
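
To illustrate the idea (a rough sketch, not the actual host code): a System.Threading.Timer tick is dispatched on a thread-pool thread, so if function code is starving the CPU or the pool, the tick arrives late, and the lateness itself is the signal:

using System;
using System.Diagnostics;
using System.Threading;

public sealed class CanaryTimer : IDisposable
{
    private static readonly TimeSpan Interval = TimeSpan.FromSeconds(1);
    private static readonly TimeSpan Tolerance = TimeSpan.FromSeconds(5);

    private readonly Timer _timer;
    private readonly Stopwatch _sinceLastTick = Stopwatch.StartNew();

    public CanaryTimer()
    {
        // Timer callbacks run on thread-pool threads, so a starved pool
        // delays the tick -- exactly the condition we want to detect.
        _timer = new Timer(OnTick, null, Interval, Interval);
    }

    private void OnTick(object state)
    {
        TimeSpan elapsed = _sinceLastTick.Elapsed;
        _sinceLastTick.Restart();

        if (elapsed > Interval + Tolerance)
        {
            // In the real host this would go to a user-visible log
            // (e.g. Application Insights) rather than the console.
            Console.Error.WriteLine(
                $"Canary fired {(elapsed - Interval).TotalSeconds:F1}s late; " +
                "function code may be monopolizing the CPU or thread pool.");
        }
    }

    public void Dispose() => _timer.Dispose();
}

The interval and tolerance values here are made up; the point is only that late ticks correlate with the same starvation that delays lease renewals and other host heartbeats.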

@paulbatum paulbatum modified the milestones: April 2017, May 2017 May 2, 2017
@paulbatum paulbatum modified the milestones: May 2017, June 2017 Jun 20, 2017
@brettsam brettsam modified the milestones: Next, June 2017 Jun 29, 2017
@pratap284

We have a similar issue with the Azure WebJobs SDK: sometimes the lock is not released, and because of that the function goes into a "Never finished" status. The problem can only be resolved by restarting the web job. Is there any update on a fix, or any recommendation for solving this problem?
