Cold start of Azure Function with large npm module trees is slow in Consumption plan #298
Comments
Currently, there is a short idle timeout on Function apps, so if they don't get requests for 5 minutes the next request causes a cold start, which takes longer. We are looking into increasing this timeout, which will make the situation much less common.
I've seen delays of up to 2 minutes, or even timeouts, with an Azure Function app and Node.js on a dynamic hosting plan. This is terrible! Running Functions runtime version 0.5.
Since upgrading to 0.5, delays have increased from 15 seconds to as much as 50 seconds on my side.
I'm currently mitigating the cold-start issue with an additional pre-warming Azure Function that periodically calls the other function. It is invoked every 4 minutes through a timer trigger, along the lines of the sketch below.
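The commenter's actual snippet isn't reproduced in this thread; a minimal sketch of such a pre-warmer, assuming a standard timer binding and a placeholder target URL, might look like this:

```json
{
  "bindings": [
    {
      "name": "keepWarmTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 */4 * * * *"
    }
  ]
}
```

```js
// index.js: ping an HTTP-triggered function so the app never sits idle long
// enough to be unloaded. The URL is a placeholder, not the commenter's.
const https = require('https');

module.exports = function (context, keepWarmTimer) {
    https.get('https://myapp.azurewebsites.net/api/myfunction', (res) => {
        context.log('Keep-warm ping returned HTTP ' + res.statusCode);
        context.done();
    }).on('error', (err) => {
        context.log('Keep-warm ping failed: ' + err.message);
        context.done();
    });
};
```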
So, my Classic plan app is running Functions v0.4, and it seems that the debugging messages may work better there: I get errors showing up that I don't get in the dynamic app (the dynamic app just hangs for long periods). Could faults in one function affect the performance of the whole app? Sometimes I even get "BAD GATEWAY".
This is strange. To make sure that this is indeed related to 0.5 and not some other factor, can you try going back to 0.4 (instructions here)?
For me personally, it is not only about the cold start/idle issue; as I wrote in #529, I think it would be good to have some transparency about the scenarios that are targeted by Azure Functions. If I understand things correctly, the startup cost is not only incurred on a cold start of a single container, but also affects new instances when the dynamic plan wants to scale out. This means that if I have an HTTP-triggered API function and one instance of that function is already running (let's say an execution takes 100 ms), a new request could wait until the current execution finishes and then be served, leading to a maximum execution time of 200 ms (which is kind of fine). But instead, the dynamic plan could decide to scale out, and now we have the cold-start penalty again. This means that instead of a maximum execution time of 200 ms, this request will go above and beyond 5 seconds. Could you explain how you are handling these cases?
@peterjuras let's keep this issue focused on the potential cold-start regression that is causing cold starts of over 1 minute for some users. We can use a separate thread for a more general discussion of how the system works. For users running into these extreme cold starts, there are various questions whose answers would help the investigation.
So I did a long-running test again where I ping every 3 minutes; the response times are all under 1,000 ms unless one of the scheduled jobs fails to fire. Then, after the six-minute lapse, the response time goes to a whopping 150,000-280,000 ms. I will try reverting to 0.4 and see what happens.
@vance so you mean you are measuring HTTP functions, but the app happens to also have a scheduled function? What's the function's schedule? Where does the 'six minute lapse' come from?
I mean I have a timer-triggered function making an HTTP request to my other function. It is set to every 3 minutes. However, sometimes it does not fire, and six minutes elapse. The next time it runs, the execution time is extreme. Sometimes it's so slow that it creates a ripple effect, going from 280,000 ms to 140,000 ms to 30,000 ms or so before it finally returns to under 1,000 ms for the next 10-20 calls.
That's very odd behavior. If you can share your Function App name (either directly or indirectly), and the UTC time when you saw this, we can hopefully analyse the situation.
Sure, here's a dummy app in the same region: fscape-dynamic3. Any function endpoint shows the extreme time in the logs, I think. The screenshot is dated 05 Sep 2016 19:12:35 GMT, so it should be around that time.
So I reverted to v0.4, with similar results. Also, I don't know if it's related, but sometimes the "pinged" services even time out, or even issue a CORS error, while other requests work. I tried just having a local script run a query every 10 minutes; see the result in the second image. No server changes happened at this time.
I'm seeing similar symptoms. Although this might be an unrelated issue, in my case most of the time seems to be spent loading the azure-storage library. Any ideas whether this is normal, or how it can be fixed?

```js
context.log('requiring azure');
const azure = require('azure-storage');
context.log('required azure');
```
Actually, this might explain a lot of the delay. I think it may be multiplied if you hit a bunch of functions simultaneously when coming out of sleep; maybe it pegs the I/O for the VM. I do know that this dependency seems rather large... I have no idea how this virtualization is engineered, but I wonder if there's some way to snapshot these static resources as globals in a shared memory location that's immutable. Just a thought; I don't know if that's even possible.
Yeah, keeping it in memory might not be possible in some scenarios (at least not within the host process, which might restart, etc.). I think the main question, though, is why even fresh initialization takes so long. azure-storage indeed has tons of dependencies (I was shocked when I first saw it; way too many for a library that is effectively just a simple REST API wrapper). But when I run the same code locally (in a new node process), it executes instantly without any issues, which makes me think it could be an Azure runtime issue...
I wonder if it's the network or the storage used. Maybe when these instances come back from idle, the data must be loaded over the network, since they are not bound to any particular hardware (or VM)? I did not see these long startup times when hitting a Classic service plan app (with Always On turned off).
Good point @vance. Deleting an "average" node_modules folder on a dynamic plan takes well over 5 minutes in many cases. IOPS seems to be very low...
Is there any way to have the container manager, or whatever the app pool manager is, cache the resources on the container disk, so they at least come back to life without going over the network? ...at least for the first instance.
Okay, I'm having some issues with the Webpack technique and I'm wondering what other people are doing. I've mentioned this before, but I created a build script (roughly sketched after this comment) that goes through each function's index.js and extracts every require. These then go into a single, central JavaScript file, like @1N50MN14 mentioned earlier. The final file is webpacked and uglified, each index.js is set to include this file, and the references to each library are updated to point at the appropriate require. The problem I'm running into is that the final JavaScript file, while only 700 KB, is large enough that if I add it to a VERY simple test function, the cold start takes longer than 5 minutes. Since I'm on a Consumption plan, you can't set the timeout above 5 minutes. For example, something like the following takes over 5 minutes to run:
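The original snippet was not preserved; a hypothetical reconstruction of the kind of trivial test function described (the bundle path is assumed) would be:

```js
// A do-nothing HTTP function; the only unusual thing about it is the single
// require of the webpacked dependency bundle (path is hypothetical)
var deps = require('../lib/deps.bundle.js');

module.exports = function (context, req) {
    context.res = { body: 'hello' };
    context.done();
};
```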
Now, in the above example I don't even need the require, so it can be taken out, and if I drop the require it of course runs really quickly. I'm sure that if the function ever got cold-started, the warm function would have no issue with the 5 minutes. However, with the 5-minute timeout, it just dies before it can ever get warm. What I don't understand is why, since I don't think my dependency list is terribly long.
As a result, it seems like there should be some way to get this to work; in fact, if anyone is getting anything to run, I feel like I should be able to get my code to run.
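For illustration, a rough sketch of the kind of require-extracting build script described above (not the actual script; the folder layout and file names are assumptions):

```js
// Scan each function folder's index.js, collect the npm modules it requires,
// and emit one central deps file that can then be webpacked/uglified.
const fs = require('fs');
const path = require('path');

const root = __dirname;
// Matches require('module-name') but skips relative requires like './util'
const requireRe = /require\(\s*['"]([^'".][^'"]*)['"]\s*\)/g;

const modules = new Set();
for (const dir of fs.readdirSync(root)) {
    const indexPath = path.join(root, dir, 'index.js');
    if (!fs.existsSync(indexPath)) continue;
    const src = fs.readFileSync(indexPath, 'utf8');
    for (const match of src.matchAll(requireRe)) modules.add(match[1]);
}

const entries = [...modules].map(
    (name) => `  ${JSON.stringify(name)}: require(${JSON.stringify(name)}),`
);
fs.writeFileSync(
    path.join(root, 'deps.src.js'),
    `module.exports = {\n${entries.join('\n')}\n};\n`
);
console.log(`Collected ${modules.size} modules into deps.src.js`);
```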
@securityvoid 700 KB of uglified code sounds like a lot; are you sure you're doing it right? My entry file (index.src.js, as used in the build script below) looks like this:

```js
global.deps = {
    Promise: require('bluebird'),
    jwt: require('jsonwebtoken'),
    uuid: require('uuid')
}
```

Then I have a shell script (that would be a .bat on Windows, I guess?) that I use to bundle and minify, which looks like this:

```sh
rm index.bundle.min.js
rm index.bundle.js
npm install
browserify --node index.src.js > index.bundle.js
uglify -s index.bundle.js -o index.bundle.min.js
rm -rf ./node_modules
```

And in my function code I use the deps like so:

```js
require('../SDK/index.bundle.min')
var Promise = global.deps.Promise;
var uuid = global.deps.uuid;

module.exports = function(context)
{
    //code
}
```

Hope some of this helps ;)
@1N50MN14 I'll go back and double-check all the dependencies to be sure, but I think I have it all set up right. I can tell you that I'm gitignoring node_modules and that the code runs fine locally, so it should be right. Since I'm doing this as a deploy script, my code actually has the "normal" requires before the script runs and makes all the updates. Only the optimized code ever makes it into wwwroot (no node_modules).
@1N50MN14 I went back and printed out the JSON stats from Webpack to see everything that was being packed up. It looks like the ~700 KB is correct; a good 3/4 of it is associated with the jsonwebtoken package. That package does do a lot, so it sort of makes sense. @davidebbo Are there plans for how to handle this? I can effectively undo all the webpacking, but then I'm still back to the slow response times.
With my technique, I have around a 10-second cold start (which is equivalent to the average C# cold start, from my tests). Some of my files are 3 MB, which is huge, but I don't run uglify and it doesn't seem to matter. (It seems the cold-start issue with Azure Functions has to do with latency, not bandwidth: the number of file dependencies is the problem, and large files don't slow it down much.) Some observations:
Folder structure:
Process:
@ricklove Thanks for sharing. If you have 3 MB files, the 700 KB definitely isn't why my stuff is so slow. I'm going to try without uglifying and see if there is a difference. BTW, are you using the standard Consumption plan?
@securityvoid Yes, I'm using the Consumption plan. For your case, I suspect there is something in your deployments folder with many files that is getting loaded during cold start. Also, I just extended my cold-start timing tests so I can directly compare a JavaScript function with no dependencies against a 3 MB function that uses azure-storage. My test runs every 12 minutes, alternating which function it calls (from an external function app). So it's a good test from the perspective of a client: it measures the actual turnaround time from initial call to client receipt of the response (the monitor page of the function itself only shows ~2 seconds, while this test shows ~10 seconds). I'll give a report tomorrow.
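A client-side probe along these lines is easy to reproduce; a minimal sketch (the endpoint URLs are placeholders, not the commenter's):

```js
// Measure end-to-end response time of function endpoints from outside Azure,
// alternating between two endpoints on a fixed interval
const https = require('https');

const urls = [
    'https://myapp.azurewebsites.net/api/no-deps',
    'https://myapp.azurewebsites.net/api/with-azure-storage'
];
let i = 0;

function probe() {
    const url = urls[i++ % urls.length];
    const start = Date.now();
    https.get(url, (res) => {
        res.resume(); // drain the body so the socket is released
        res.on('end', () =>
            console.log(`${url} -> ${res.statusCode} in ${Date.now() - start} ms`));
    }).on('error', (err) => console.log(`${url} failed: ${err.message}`));
}

probe();
setInterval(probe, 12 * 60 * 1000); // every 12 minutes
```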
@ricklove I would think that if it were something generic in my deployments folder, the function would behave the same whether I had that require or not. The version with the require times out every time, while the version without it runs every time.
It doesn't make any sense for me to have issues with a 700 KB file when you have 3 MB files, but it seems like the file itself is somehow the issue. I suppose the alternative is that the file is just the last little bit that pushes it over the 5-minute mark. In any case, removing uglify did not really help.
@ricklove Argh! I think I just noticed my own typo. Notice that the file path in the require contains "repository"; that is the non-webpacked version of the file. The webpacked version is in wwwroot. Since the file existed, it didn't error out, and in fact it would work... given enough time. I'm redeploying now to confirm that this solves things.
@davidebbo @1N50MN14 I've made the Browserify technique work without the use of global. The pieces are a shared/library module, the function code file that consumes it, and the build/Browserify commands, roughly as sketched below.
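The original three snippets did not survive in this thread; a sketch of what a global-free Browserify setup along these lines could look like, with assumed file names:

```js
// SharedLib/index.src.js: export dependencies directly via module.exports
// instead of stashing them on `global`
module.exports = {
    Promise: require('bluebird'),
    uuid: require('uuid')
};
```

```sh
# --standalone wraps the bundle in UMD, so the bundle itself has a usable
# module.exports when required from a function
browserify --node --standalone deps SharedLib/index.src.js -o SharedLib/index.bundle.js
```

```js
// MyFunction/index.js: consume the bundle like any other module
var deps = require('../SharedLib/index.bundle.js');
var uuid = deps.uuid;

module.exports = function (context, req) {
    context.res = { body: { id: uuid.v4() } };
    context.done();
};
```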
@phillipharding thanks for sharing! As an aside, note that we've added support for automatically running funcpack on Kudu git deployments. To enable it, simply set the SCM_USE_FUNCPACK app setting to 1.
@phillipharding Awesome, thanks for sharing this! It seems that browserify's …
I've been running into this issue for some time now, and it's very frustrating, as I'm seeing 60-second startup times on a regular basis. I'll give azure-functions-pack a shot at some point, and I think the azure-functions-pack tool is a decent workaround for the short term, but is there anything on the roadmap to address the root cause (i.e. the underlying file system being prohibitively slow)? Specifically, I'm using the Slack Events API, and if Slack determines that a request to my API takes more than 5 seconds to complete (which happens regularly, whenever whatever orchestrates Azure Functions decides to spin down our API), the request is automatically retried over and over. This behavior, in combination with Functions' filesystem issues, means I can't really work with Slack's API without a hack that ignores "retry" messages based on a certain request header value, which isn't ideal for various reasons. Thanks
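For reference, the header-based hack mentioned is typically built on Slack's documented X-Slack-Retry-Num header; a minimal sketch of that workaround:

```js
// Acknowledge Slack retries immediately instead of re-running the handler.
// Slack marks redelivered events with the X-Slack-Retry-Num header; Azure
// Functions exposes request headers with lowercased names.
module.exports = function (context, req) {
    if (req.headers['x-slack-retry-num']) {
        // The original delivery is (or was) being handled; just ack this one
        context.res = { status: 200, body: '' };
        return context.done();
    }
    // ... normal event handling would go here ...
    context.res = { status: 200, body: '' };
    context.done();
};
```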
I'm having the exact same issue with the Shopify webhook API. It will retry if the request doesn't finish in 5 seconds. I usually get about 3-4 requests before one returns in time for it to stop trying.
Microsoft needs to fix this ASAP. Node.js runs fine on AWS Lambda. It's disappointing when you see articles like this: seven reasons why so many Node.js devs are excited about Microsoft Azure.
Forget the webpack flag; a huge number of modules won't compress properly with it. So basically there's still no solution, and it's a pretty useless product if you want serverless Node.
@tony-gutierrez agree - this is something I would have expected Azure Functions to launch with, not something that still isn't fixed a full year after the original issue was filed. 🤔
We started with Functions trying to use the Consumption plan; it wasn't even close to working. Then we ran the App Service plan for a while, and at least it worked. Then we realized we basically had none of the benefits of a serverless host, plus all the horrid things function apps incur (no access to console.log, super slow storage, not easy to develop locally, etc.). We did a quick conversion to a normal Node app and deployed it to a VM and to App Service. The perf increase is pretty major on both.
We used Webpack to bundle the npm module dependencies into a single file to reduce the cold-start time; see this Medium article.
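A minimal webpack configuration for this kind of bundling might look like the following (the entry and output names are assumptions, not taken from the article):

```js
// webpack.config.js: bundle a function and its node_modules into one file,
// trading many small file reads for a single larger one at cold start
const path = require('path');

module.exports = {
    target: 'node',                    // bundle for Node, keep built-ins external
    entry: './MyFunction/index.src.js',
    output: {
        path: path.resolve(__dirname, 'MyFunction'),
        filename: 'index.js',
        libraryTarget: 'commonjs2'     // preserve module.exports for the Functions host
    }
};
```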
See also the newly announced Run-From-Zip feature, which also helps with cold start.
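Enabling it amounts to a single app setting; for example, via the Azure CLI (the app and resource-group names are placeholders, and the setting was later renamed WEBSITE_RUN_FROM_PACKAGE):

```sh
# Serve the site content directly from an uploaded zip instead of thousands
# of individual files on the slow shared file system
az functionapp config appsettings set \
    --name my-function-app --resource-group my-group \
    --settings WEBSITE_RUN_FROM_ZIP=1
```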
Closing this issue, as Run-From-Zip is working well enough for most people (and some others are using funcpack). That doesn't mean there can't be more optimizations, but we can always track specific work separately; this issue is too open-ended.
Hi,
I've noticed that the initial call to an Azure Function (i.e. a call made a couple of minutes after the last one) takes a significant amount of time (up to 15 seconds). Is there any way to pre-warm a function other than calling it up front? Will the startup performance be improved in the near future? I'm using it on the Dynamic plan. Thanks!
Best regards,
--Oliver.