Understanding shared hosts for multiple functions in a function app #529

Closed
peterjuras opened this Issue Jul 28, 2016 · 5 comments

@peterjuras
peterjuras commented Jul 28, 2016 edited

Hi!

TL;DR
This post got quite long, so here is the gist :) Do multiple functions within a function app reuse the same host? I tried to keep the host warm with a small timer-triggered function, but that did not improve the response time of another HTTP-triggered function in the same function app.
/TL;DR

First of all, I really like the promise of Azure Functions. I truly hope that they evolve and one day give us the opportunity to write truly serverless code.

I'm currently evaluating Azure Functions for API use cases (Where a function would fully replace a hosted API in a Web app or something similar) and wanted to explore the cold start issues. This is my setup:

The code can be found here

1 function app that triggers HTTP requests (Timer: 1 minute, trigger-function.js)
4 function apps that are HTTP endpoints (HTTP trigger, api-function.js)

All function apps are in isolated resource groups but in the same region, all are on the dynamic plan.

The idea of my test was to find out when an API end point is hit by the cold start issue. For that, the four http functions are called in different intervals: every 5 minutes, every 10, 15 and 30 minutes. You can read the full logs here*, but let's look at the relevant part:

2016-07-28T08:30:00.144 https://<my function api host for5 minutes>/api/HttpTriggerNodeJS1 needed 141ms to complete the request
2016-07-28T08:30:00.175 https://<my function api host for5 minutes>/api/HttpTriggerNodeJS1 needed 172ms to complete the request
2016-07-28T08:30:03.314 https://<my function api host for30 minutes>/api/HttpTriggerNodeJS1 needed 3311ms to complete the request
2016-07-28T08:30:03.393 https://<my function api host for30 minutes>/api/HttpTriggerNodeJS1 needed 3390ms to complete the request
2016-07-28T08:30:03.610 https://<my function api host for15 minutes>/api/HttpTriggerNodeJS1 needed 3607ms to complete the request
2016-07-28T08:30:03.642 https://<my function api host for15 minutes>/api/HttpTriggerNodeJS1 needed 3639ms to complete the request
2016-07-28T08:30:07.234 https://<my function api host for10 minutes>/api/HttpTriggerNodeJS1 needed 7230ms to complete the request
2016-07-28T08:30:07.234 Function completed (Success, Id=4015d7f4-8eed-4d93-8184-252ef02713ef)
2016-07-28T08:30:07.265 https://<my function api host for10 minutes>/api/HttpTriggerNodeJS1 needed 7262ms to complete the request

As you can see, only the API that is called every 5 minutes has a sane response time; every other endpoint has brutal response times above 3 seconds.

I then tried adding a timer-triggered function to the 30 minute API function app and having it run every 5 minutes. I thought that this would keep the host "warm", but it had no effect and the 30 minute API function still took far too long to respond. Is each function hosted in its own host, even though they are part of the same function app?

*The logs are from the functions portal/web interface and seem to be missing a lot of invocations. I think that it should be mostly fine however, since the trigger function lists all API calls correctly.

Side remarks from development:

  • Right now, my timer-triggered function is always invoked twice. I'm not sure why this is happening, but I hope it will not happen after GA.
  • The console showed that Node version 4.3.1 was installed, but the docs say it is 5.9.1.
  • Some Node code that is valid JavaScript and runs fine on my local machine behaves strangely as an Azure function and runs forever. Looking at this gist, version B sometimes produced execution times > 5 minutes (!!!) while version A usually executes in less than 100ms.
  • Measured execution times for empty functions (only context.done()) are too high and range between 50 and 700 (!!!) ms. Both the wide range and the high execution times would make billing quite unfair and unpredictable compared to AWS Lambda.
  • One function app broke completely and I couldn't recover it. Neither restarting nor anything else (git deployments, etc.) helped. I couldn't test the function and the trigger didn't work either; I got some 503 errors. I think this happened after I manually changed the function's enabled status in the web interface even though I had deployed the functions via source control (I haven't retried it, though).

Keep up the good work! :)

@davidebbo
Contributor

Multiple functions do reuse the same host, but that host can be scaled to multiple VMs (each potentially running several of the functions in the app).

Hosts that are unused for 5 minutes get idled out. So having your timer fire every 5 minutes puts you right on the edge.

In terms of testing host reuse, you may want to try incrementing a static variable to tell a new host apart from a reused one. This will be more reliable than clock time.
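A minimal sketch of that idea for a Node.js function, assuming the classic `module.exports = function (context, req)` model and assuming that module-level state lives for as long as the host keeps the module loaded. The names `invocationCount` and `hostStartedAt` are made up for illustration.

```javascript
'use strict';

// Module-level state: initialized once when this module is loaded,
// then shared by every invocation served from the same loaded module.
let invocationCount = 0;
const hostStartedAt = new Date().toISOString();

function run(context, req) {
    invocationCount += 1;

    // A count of 1 means the module was freshly loaded (likely a cold start);
    // anything higher means an already-loaded module was reused.
    context.res = {
        status: 200,
        body: {
            invocationCount: invocationCount,
            hostStartedAt: hostStartedAt,
            reused: invocationCount > 1
        }
    };
    context.done();
}

module.exports = run;
```

Because the app can be scaled out to multiple VMs, each instance would keep its own counter, so returning `hostStartedAt` alongside the count helps tell instances apart.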

I know I didn't address everything in your issue, and it may be better to open separate issues for unrelated things, like the code not running correctly.

@davidebbo
Contributor

I should add that we are working on optimizing the cold start scenario.

@christopheranderson christopheranderson self-assigned this Aug 1, 2016
@christopheranderson christopheranderson added this to the rc milestone Aug 8, 2016
@mathewc
Contributor
mathewc commented Aug 23, 2016

I'll respond quickly to a few of the points you brought up, but I agree with David that for any specific issues you should open separate bugs. Thanks for the feedback!

  • Last week we released a fix for the duplicate timer trigger issue. You shouldn't be seeing that anymore if you move to the latest 0.4 runtime version. See release notes here: https://github.com/Azure/azure-webjobs-sdk-script/releases/tag/v1.0.0-beta1-10355
  • Regarding the Node version: if you move to the latest 0.4 runtime version, you'll see that the version is 5.9.1.
  • Billing will only take into consideration the time your function code actually runs (including any input/output bindings), and won't include any of our pipeline code before/after the invocation.
@mathewc mathewc closed this Aug 23, 2016
@vance
vance commented Sep 5, 2016

In the 0.5 version, I see response times of 1-5 MINUTES on a dynamic plan. This is with a keep-alive hitting my endpoint every 4 minutes. Going nuts here.

@davidebbo
Contributor

Being discussed in #298.
