Puppeteer slow execution on Cloud Functions #3120

Open
lpellegr opened this Issue Aug 22, 2018 · 47 comments

lpellegr commented Aug 22, 2018

I am experimenting with Puppeteer on Cloud Functions.

After a few tests, I noticed that taking a page screenshot of https://google.com takes about 5 seconds on average when deployed on Google Cloud Functions infrastructure, while the same function tested locally (using firebase serve) takes only 2 seconds.

At first I suspected a classic cold-start issue. Unfortunately, after several consecutive calls, the results remain the same.

Is Puppeteer (and, transitively, headless Chrome) so CPU-intensive that the best '2GB' Cloud Functions class is not powerful enough to match the performance of a mid-range desktop?

Could something else explain the results I am getting? Are there any options that could help to get an execution time that is close to the local test?

Here is the code I use:

import * as functions from 'firebase-functions';
import * as puppeteer from 'puppeteer';

export const capture =
    functions.runWith({memory: '2GB', timeoutSeconds: 60})
        .https.onRequest(async (req, res) => {

    const url = req.query.url;

    // Validate the input first, so no browser is launched (and leaked) for a bad request.
    if (!url) {
        res.status(400).send(
            'Please provide a URL. Example: ?url=https://example.com');
        return;
    }

    const browser = await puppeteer.launch({
        args: ['--no-sandbox']
    });

    try {
        const page = await browser.newPage();
        await page.goto(url, {waitUntil: 'networkidle2'});
        const buffer = await page.screenshot({fullPage: true});
        await browser.close();
        res.type('image/png').send(buffer);
    } catch (e) {
        await browser.close();
        res.status(500).send(e.toString());
    }
});

Deployed with Firebase Functions using NodeJS 8.

lpellegr commented Aug 22, 2018

I have added some probes to measure the time of each operation with console.time.
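
The probe code itself is not shown in the thread. As a minimal sketch of how such probes could be written, the hypothetical `timed` helper below (an assumption, not the reporter's actual code) wraps one async step in console.time / console.timeEnd. The step in `demo` is a stand-in for calls such as puppeteer.launch() or page.goto().

```javascript
// Hypothetical helper: times a single async step under the given label.
async function timed(label, step) {
  console.time(label);
  try {
    return await step();
  } finally {
    // Runs on success and on failure, so a timing is always printed.
    console.timeEnd(label);
  }
}

// Stand-in step; in a real function this would be a Puppeteer call.
async function demo() {
  return timed('fake-step', () =>
    new Promise((resolve) => setTimeout(() => resolve('done'), 10)));
}

demo().then((value) => console.log('step returned:', value));
```

Using one label per stage keeps the log lines directly comparable between a local run and a deployed run.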

Here are the results for a local invocation (served by firebase serve):

info: User function triggered, starting execution
info: puppeteer-launch: 87.526ms
info: puppeteer-newpage: 16.353ms
info: puppeteer-page-goto: 1646.293ms
info: puppeteer-page-screenshot: 82.034ms
info: send-buffer: 0.282ms
info: Execution took 1835 ms, user function completed successfully
info: puppeteer-close: 5.214ms

The same for an invocation on Cloud Functions:

Function execution started
puppeteer-launch: 868.091ms
puppeteer-newpage: 1113.722ms
puppeteer-page-goto: 3079.503ms
puppeteer-page-screenshot: 353.134ms
Function execution took 5427 ms, finished with status code: 200
puppeteer-close: 61.146ms
send-buffer: 63.057ms

If I compare both:

  • puppeteer-launch is 10 times slower on Cloud Functions.
  • puppeteer-newpage is about 70 times slower!
  • puppeteer-page-goto takes almost twice as long.
  • puppeteer-page-screenshot is 4 times slower on Cloud Functions.
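
For reference, the ratios can be recomputed from the two logs. This is only a small sketch; the object and function names are made up for illustration, and the numbers are copied from the logs above.

```javascript
// Timings in milliseconds, copied from the local and Cloud Functions logs.
const local = { launch: 87.526, newPage: 16.353, goto: 1646.293, screenshot: 82.034 };
const cloud = { launch: 868.091, newPage: 1113.722, goto: 3079.503, screenshot: 353.134 };

// Ratio of Cloud Functions time to local time for each stage.
function slowdown(localMs, cloudMs) {
  const ratios = {};
  for (const stage of Object.keys(localMs)) {
    ratios[stage] = cloudMs[stage] / localMs[stage];
  }
  return ratios;
}

const ratios = slowdown(local, cloud);
for (const [stage, ratio] of Object.entries(ratios)) {
  console.log(`${stage}: ${ratio.toFixed(1)}x slower`);
}
```

This prints roughly 9.9x for launch, 68.1x for newPage, 1.9x for goto and 4.3x for screenshot.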

I can understand why the launch is slower on Cloud Functions, even after multiple runs, since the hardware is quite different from a mid-range desktop computer. However, what about the time differences for newPage and goto?

lpellegr changed the title from "Puppeteer slow execution on Cloud Function" to "Puppeteer slow execution on Cloud Functions" on Aug 22, 2018

lpellegr commented Aug 22, 2018

@ebidel I saw that you recently wrote some experiments with Puppeteer on Cloud Functions. Did you experience the same behaviour? Do you have an idea of what could explain such a difference?

I noticed that your nice try Puppeteer example, deployed using a custom Docker environment, does not suffer from this issue. Taking a screenshot there requires only about 2 seconds, as in my local environment.

eknkc commented Aug 22, 2018

I have similar results on GCF. For me, most of the slowdown seems to happen on screenshot and pdf calls. Similar code, using a Chrome build modified for the Lambda environment, runs fine on AWS Lambda with the same resources, so it's not related to memory.

BTW, things I tried:

  • setting pipe: true - no effect
  • disabling shm - no effect
  • using the base64 encoded output directly (maybe decoding is slow somehow?) - no effect
  • different regions - nah
  • older puppeteer versions - nope

Maybe the GCF CPU allocation really is this bad. Confirming that would require benchmarking other workloads.

lpellegr commented Aug 22, 2018

@eknkc Thanks for sharing your experiments.

Here are the options I tried as well. None of them help:

const browser = await puppeteer.launch({
    headless: true,
    args: [
        '--disable-gpu',
        '--disable-setuid-sandbox',
        '--no-sandbox',
        '--proxy-server="direct://"',
        '--proxy-bypass-list=*'
    ]
});

As a quick test, I switched the function memory allocation to 1GB from 2GB. Based on the pricing documentation, this moves the CPU allocation to 1.4 GHz from 2.4 GHz.

With a 1GB function, taking a simple screenshot on Cloud Functions takes about 8s! The increase in time seems to track the CPU allocation directly :x

Maybe there is a magic option to get better timings and make Puppeteer really usable in production on Cloud Functions?

Member

ebidel commented Aug 22, 2018

Thanks for the report. I've passed this info off to the Cloud team since it's really their bug.

There's a known bug with GCF atm where the first few requests always hit cold starts. That could be causing a lot of the slowdown. But generally, GCF does not have the same performance characteristics as something like App Engine Standard or Flex (used by my try puppeteer demo). Since you can only change the memory class, that also limits headless Chrome.

Another optimization is to launch chrome once and reuse it across requests. See the snippet from the blog post: https://cloud.google.com/blog/products/gcp/introducing-headless-chrome-support-in-cloud-functions-and-app-engine
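
The blog post's snippet is not reproduced in this thread. As a rough sketch of the general idea only (not the blog's code), a launch promise can be cached at module scope so that warm invocations share one browser. `createReusable` is a hypothetical helper, and the launcher here is a stand-in; with Puppeteer it would be something like `() => puppeteer.launch({args: ['--no-sandbox']})`.

```javascript
// Lazy singleton around any async launcher function.
function createReusable(launch) {
  let pending = null; // Cached promise, shared by concurrent callers.
  return function get() {
    if (pending === null) {
      pending = launch().catch((err) => {
        pending = null; // Forget a failed launch so the next call retries.
        throw err;
      });
    }
    return pending;
  };
}

// Stand-in launcher that counts how often it is invoked.
let launches = 0;
const getBrowser = createReusable(async () => {
  launches += 1;
  return { id: launches };
});

async function handleTwoRequests() {
  const [first, second] = await Promise.all([getBrowser(), getBrowser()]);
  return { sameInstance: first === second, launches };
}

handleTwoRequests().then((result) => console.log(result));
```

This only helps when the platform reuses a warm instance, which is not guaranteed, and a cached browser that has crashed or been disconnected would still need to be detected and relaunched.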

lpellegr commented Aug 22, 2018

@ebidel Thanks. Is there a public link for the issue so that I can track the progress/discussion?

Member

ebidel commented Aug 22, 2018

Unfortunately, not one I'm aware of. Will post updates here when I hear something.

lpellegr commented Aug 23, 2018

OK. Thanks :)

Member

ebidel commented Aug 23, 2018

Currently, there's a read-only filesystem in place that's hurting performance, based on our tests. The Cloud team is working on optimizations to make things faster.

Another thing to try is to bundle your code so that the cost of loading large dependencies is reduced, e.g. so that require('puppeteer') gets inlined.

eknkc commented Aug 23, 2018

Thanks @ebidel for the investigation.

Speaking for my case, though, the performance issue is not related to startup but rather happens at runtime, so I assume inlining would not change that?

It seems like the chrome instance that is already running struggles with large viewports or simply capturing the page. That operation happens to use a lot of shared memory which might be causing the issue.

Anyway, hope we can have it resolved. Thanks again.

Member

ebidel commented Aug 23, 2018

That's been my experience as well.

Capturing full page screenshots, on large viewports, at DPR > 1 is intensive. It appears to be especially bad on Linux: #736

wiliam-paradox commented Aug 23, 2018

This combination improves the speed a little:

    const browser = await puppeteer.launch({args: [
        '--disable-gpu',
        '--disable-dev-shm-usage',
        '--disable-setuid-sandbox',
        '--no-first-run',
        '--no-sandbox',
        '--no-zygote',
        '--single-process', // <- this one doesn't work on Windows
    ]});

I'm getting loading times of 3 seconds locally and 13 seconds on GCF.

bogacg commented Sep 5, 2018

I guess some improvements have been made: I no longer see long waiting times. I did use @wiliam-paradox's options, though.

samginn commented Sep 12, 2018

Experiencing this slowness as well. Does anyone have suggestions on how to boost the speed on GCF? Locally it runs in under 500ms, while on GCF it takes 8-12 seconds.

Contributor

Kikobeats commented Sep 12, 2018

I'm experiencing the same on AWS Lambda, where requests hit the timeout while the same requests from my local machine are fine and finish within the expected time.

cirdes commented Sep 13, 2018

Are you guys running puppeteer in HEADFUL mode on Cloud Functions? Running in headless mode is working fine but I need to run headful to be able to download PDF files. =/

Error: function execution failed. Details:
Failed to launch chrome!
[12:12:0913/012114.601900:ERROR:browser_main_loop.cc(596)] Failed to put Xlib into threaded mode.

(chrome:12): Gtk-WARNING **: 01:21:14.702: cannot open display: 

cirdes commented Sep 13, 2018

@ebidel wrote: "Another optimization is to launch chrome once and reuse it across requests."

I'm trying to launch Chrome just once, exactly as in the snippet, but on the second run I get: Function execution took 54 ms, finished with status: 'connection error'. Also, when I run my tests with Jest, the process doesn't exit. Closing and reopening the browser between requests works fine.

lpellegr commented Sep 14, 2018

@joelgriffith The reported issue is not about Chrome startup time but the full execution time. So sad to write promotional messages without even reading the issue purpose.

DimaFromCanada commented Sep 27, 2018

Any update on this? GCF is executing any given Puppeteer action at perhaps 25% ~ 50% of my local desktop speed.

Member

ebidel commented Sep 27, 2018

@DimaFromCanada none that I've seen. To be clear, are you talking about total time (cold start + execution) or just running your handler code?

DimaFromCanada commented Sep 27, 2018

Member

ebidel commented Sep 27, 2018

Any URL you can share? I can pass that along to the Cloud team.

exAspArk commented Oct 9, 2018

Response times in seconds for the same Puppeteer code running on AWS Lambda vs. GCP Functions with twice as much memory:

(image not preserved)

The code uses one goto(), which consumes most of the time, to get some HTML / JS / CSS files from GCP Storage and one evaluate() to get the rendered DOM.

Contributor

alixaxel commented Oct 13, 2018

@lpellegr Very nice to see this brought up.

I've been facing the same pain for a while but always thought it would be closed as "won't fix".

I have a quite extensive puppeteer setup on AWS Lambda and I've been playing around with running puppeteer on Firebase/Google Cloud Functions for a while, even before support for Node 8.10 was announced. You can check the hack I did back then here (unmaintained).

I run a proxied authentication service (a user logs in to my website, which in turn uses puppeteer to check whether they can authenticate with the same credentials on a third-party website), where the execution speed of puppeteer directly affects the user experience. Nothing fancy like screenshots or PDFs, just a login flow.

Most of my architecture lives on Firebase, so it would be very convenient for me to run everything there, puppeteer included - this would help with the spaghetti-like fan-out architecture I'm forced to adopt due to Lambda limitations. However, the performance of GCF/FCF is so inferior compared to AWS Lambda that I cannot bring myself to make the switch.

Even after support for specifying closer regions and Node 8.10 was released on FCF, a 2GB Cloud Function will still be less performant than a 1GB Lambda: ~4s vs 10+ seconds! And Lambda even has the handicap of having to decompress the chromium binary (0.7 seconds, see chrome-aws-lambda).

And from my extensive testing I can tell this is not due to cold-starts.

I suspect the problem is more related to differences between AWS and Google in how CPU shares and bandwidth are allocated in proportion to the amount of RAM defined. I can't be sure, obviously, but I read a blog post a few months ago (I can no longer find it) with very comprehensive tests on the big three (AWS, Google, Azure) that seemed to support this suspicion - AWS is more "generous" in allocation.

Obviously, this doesn't seem to be a problem with puppeteer itself, but since Google is trying hard to scale up its serverless game (and still playing catch-up, it seems), it would be awesome if you could nudge some colleague at Google to look into this @ebidel - my current AWS infrastructure relies on hundreds of lines of Ansible and Terraform code as well as a couple of Makefiles to keep everything together.

Switching to the no-frills approach of just writing triggers for Cloud Functions and listing dependencies (amazing work on this BTW) would make my life a lot easier. If only the performance was (a lot) better...

Contributor

steren commented Oct 13, 2018

Google Cloud PM here.

Part of the slowness comes from the fact that the filesystem on Cloud Functions is read-only.
We noticed that Chrome tries to write in many different places, and the failures to do so result in slowness.
We confirmed that enabling a writable filesystem improves performance. However, at this time, we are not planning to enable a writable filesystem on GCF apart from /tmp.

We asked the Chromium team for help to better understand how we could configure it not to write outside of /tmp. As of now, we are waiting for guidance.

Contributor

alixaxel commented Oct 13, 2018

@steren AWS has the same limitation: you only get a fixed 500MB on /tmp, regardless of how much memory you allocate to the Lambda.

On the other hand, /tmp on GCF/FCF is memory-backed:

This is a local disk mount point known as a "tmpfs" volume in which data written to the volume is stored in memory. Note that it will consume memory resources provisioned for the function.

So even if GCF was running on HDDs and Lambda on SSDs, it still wouldn't explain the huge discrepancies in performance we are seeing.

Contributor

alixaxel commented Oct 13, 2018

@steren @ebidel

So I just cooked up the simplest possible benchmark to test only the CPU (no disk I/O or networking).

Here's what I came up with:

const sieveOfEratosthenes = require('sieve-of-eratosthenes');

console.time('sieve');
// Logs true when the sieve reports 2063689 primes for this limit.
console.log(sieveOfEratosthenes(33554432).length === 2063689);
console.timeEnd('sieve');

I deployed this function on both AWS Lambda, and Firebase Cloud Functions (both using Node 8.10).

Then I serially called the Lambda/Cloud Function and noted down the times. No warm-up was done.

Times in milliseconds (as printed by console.time):

Run      FCF 2GB   AWS 2GB   FCF 1GB   AWS 1GB
1        5089      2519      6402      4036
2        5089      2693      ERROR     4278
3        5089      2753      4283      4525
4        4236      2554      ERROR     4430
5        3954      2671      4379      4417
6        ERROR     2717      ERROR     4409
7        3931      2726      4331      4447
8        ERROR     2725      ERROR     4393
9        4132      2714      4015      4456
10       ERROR     2723      ERROR     4405
11       3771      2730      4123      4389
12       ERROR     2722      ERROR     4431
13       4235      2725      4397      4445
14       4051      2732      ERROR     4418
15       4427      2707      4681      4452
16       4006      2715      ERROR     4442
17       ERROR     2732      4422      4289
18       3685      2725      ERROR     4401
19       ERROR     2718      4585      4379
20       3890      2719      ERROR     4402
21       ERROR     2797      4220      4415
22       4073      2795      ERROR     4452
MEDIAN   4073      2722.5    4379      4416
AVERAGE  4243.867  2709.636  4530.727  4395.955
STDEVP   458.620   61.097    618.645   93.646
STDEVPA  2012.616  61.097    2307.213  93.646

The 1GB Lambda is on par with the 2GB FCF - although with much more consistent timings and no errors.

Weirdly enough, the errors reported on 1GB FCF were:

Error: memory limit exceeded. Function invocation was interrupted.

Not sure why that happens intermittently for a deterministic function. As for the 2GB FCF, the errors were:

finished with status: 'connection error'

Similar results are reported in papers such as these (there are quite a few!):

  • Benchmarking Heterogeneous Cloud Functions
  • Performance Evaluation of Parallel Cloud Functions

PS: Sorry if this is unrelated to PPTR itself, I'm just trying to suggest that CPU performance could be an important factor that explains why puppeteer performs so badly under GCF/FCF.
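
For anyone who wants to rerun a CPU-only benchmark like the one above without the npm dependency, here is a minimal, dependency-free sketch. It is an assumption-laden stand-in, not the implementation used by the sieve-of-eratosthenes package: it only counts primes, and it uses a smaller limit so that it finishes quickly.

```javascript
// Counts the primes up to and including `limit` with a byte-array sieve.
function countPrimes(limit) {
  if (limit < 2) return 0;
  const composite = new Uint8Array(limit + 1); // 1 marks a composite number
  let count = 0;
  for (let n = 2; n <= limit; n++) {
    if (composite[n]) continue;
    count++;
    // Every multiple of n starting at n*n is composite.
    for (let m = n * n; m <= limit; m += n) composite[m] = 1;
  }
  return count;
}

console.time('sieve');
console.log(countPrimes(1000000)); // prints 78498
console.timeEnd('sieve');
```

Raising the limit towards the 33554432 used above makes the run long enough for differences between platforms to show up in the timings.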

lpellegr commented Oct 13, 2018

@alixaxel For sure, CPU plays an important part. However, as the Google team members said, CPU is not the cause of the issue here. If you look at /proc/cpuinfo on a 2GB Firebase function and on a 2GB Lambda, you will see that Google allocates 4 CPUs whereas Amazon allocates only 2. Even if the CPU frequency is a bit higher on Amazon, it does not explain the time difference. I would even expect better timings on GCP, since more CPUs allow better parallelism, which Chrome seems to rely on heavily (correct me if I am wrong).

To convince myself, I also ran a test some weeks ago: I created a Docker image with a read/write filesystem and the puppeteer NPM dependency pre-installed, and ran it on GCP Kubernetes with nodes that have a CPU allocation similar to a 2GB function. The results show acceptable times.

I hope we get guidance soon on how to configure headless Chrome to write only to /tmp on Cloud Functions.

Another solution would be access to the alpha container-as-a-service feature on Cloud Functions. In that case, a simple solution could be a Docker image similar to the one I used with Kubernetes. For now it's a dream; I hope it becomes a reality.

Contributor

steren commented Oct 13, 2018

As I mentioned, we observed that the slowness with headless Chrome is different from what traditional CPU/memory benchmarks show.

I would be glad to invite you to the Alpha of serverless containers on the Cloud Function infrastructure so that you could perform more testing. Please fill in this form http://g.co/serverlesscontainers and mention "Headless Chrome" in the "use case" field. I should be able to invite you next week.

Contributor

alixaxel commented Oct 13, 2018

@steren Thanks a lot for the invite, looking forward to it.

@lpellegr I was referring to CPU shares, as in --cpu-shares Docker flag:

Set this flag to a value greater or less than the default of 1024 to increase or reduce the container’s weight, and give it access to a greater or lesser proportion of the host machine’s CPU cycles. This is only enforced when CPU cycles are constrained. When plenty of CPU cycles are available, all containers use as much CPU as they need. In that way, this is a soft limit. --cpu-shares does not prevent containers from being scheduled in swarm mode. It prioritizes container CPU resources for the available CPU cycles. It does not guarantee or reserve any specific CPU access.

Obviously it's just a guess, since FaaS platforms are a black box to us mere mortals. But given that the read-only FS is equally present on Lambda, it seems like a weird explanation for the discrepancy we see.

Contributor

alixaxel commented Oct 13, 2018

@ebidel I was under the impression that if used with --disable-dev-shm-usage and userDataDir was pointing to /tmp (as PPTR does), then Chromium wouldn't create any additional files outside of /tmp?

Member

ebidel commented Oct 16, 2018

That I'm not sure of. I would assume since the chromium distro/deps that's installed on the GCF machines lives outside of /tmp, Chrome is probably writing other stuff there unrelated to user data.
So just because you write user data to /tmp doesn't mean that's the only thing that's being stored there.

It would be nice if someone could confirm where writes are happening.

Contributor

alixaxel commented Oct 16, 2018

@ebidel So if I ran the standalone headless binary I compile for Lambda (it's compatible with GCF) from /tmp, should I expect it to write only to /tmp and therefore avoid the I/O performance penalty?

Contributor

alixaxel commented Oct 16, 2018

Or actually... I can just install puppeteer locally and watch which files it touches with strace or something, right?

Member

ebidel commented Oct 17, 2018

When you use puppeteer on GCF, it installs a local version of Chromium into ./node_modules, so it is probably touching files there. @steren would that fall under the I/O perf issues?

I would try this:

  1. Test on Linux (GCF's env). You could try running things inside a Docker container to replicate. Example Dockerfile here.
  2. Call puppeteer.launch({userDataDir: '/tmp', args: ['--disable-dev-shm-usage']}). Hopefully that writes anything and everything to /tmp.
  3. Set up a file watcher to see which directories are being written to...

Contributor

steren commented Oct 17, 2018

Executing the following on GCF:

exports.dir = (req, res) => {
  res.status(200).send(__dirname);
};

Gives me: /srv

So puppeteer and its downloaded Chromium live in /srv/node_modules, and this is not a writable location.

+1 to investigating exactly where Chrome is trying to write.
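
As a first step in that direction, a small probe could report which directories are writable from inside the function. This is only a sketch using Node's fs module: it does not observe Chrome's actual writes, and the list of candidate directories is an assumption chosen for illustration.

```javascript
const fs = require('fs');
const os = require('os');

// Returns true when the current process may write to `dir`.
function isWritable(dir) {
  try {
    fs.accessSync(dir, fs.constants.W_OK);
    return true;
  } catch (err) {
    return false; // Missing, not permitted, or on a read-only filesystem.
  }
}

// Illustrative candidates: the temp directory, the working directory, the home directory.
const candidates = [os.tmpdir(), process.cwd(), os.homedir()];
for (const dir of candidates) {
  console.log(`${dir}: ${isWritable(dir) ? 'writable' : 'not writable'}`);
}
```

Finding the paths Chrome really attempts to write to would still need a tracing tool such as strace on a comparable Linux environment.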

TimotheeJeannin commented Oct 29, 2018

Any news on this? It would be nice if GCF could run puppeteer correctly. I tried launching Chrome with the userDataDir: '/tmp' option, but it doesn't seem to have any effect on performance.

TimotheeJeannin commented Oct 29, 2018

I tried to copy the whole chrome folder to the /tmp directory and launch chrome from there.

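const fs = require('fs-extra');
const path = require('path');
const puppeteer = require('puppeteer');

let browser; // Reused across invocations of a warm instance.

async function getBrowser() {
    // Copy the bundled Chromium to the writable /tmp volume once.
    if (!fs.existsSync('/tmp/chrome-linux')) {
        const chromiumRoot = path.join(__dirname, 'node_modules/puppeteer/.local-chromium');
        const chromeFolder = fs.readdirSync(chromiumRoot)[0];
        fs.copySync(path.join(chromiumRoot, chromeFolder, 'chrome-linux'), '/tmp/chrome-linux');
    }

    if (!browser) {
        console.warn('Launching browser.');
        browser = await puppeteer.launch({
            userDataDir: '/tmp/testing',
            executablePath: '/tmp/chrome-linux/chrome',
            args: ['--no-sandbox']
        });
    }
    return browser;
}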

From the performance measurements I made, this changes nothing: GCF is still at least two times slower than AWS Lambda.

Contributor

alixaxel commented Oct 29, 2018

@TimotheeJeannin I also ran the Chromium build I compile for AWS with the exact same approach, paths and everything. All things being equal, GCF is way slower. I don't know why the Google devs are trying to dismiss this as a disk I/O issue; if that were the case, there would be no justification for the Sieve of Eratosthenes benchmark I shared before being so slow as well.

ncruces commented Nov 19, 2018

@lpellegr /proc/cpuinfo always shows 4 CPUs on GCF, and os.cpus() always shows the 8 hyperthreads, regardless of "instance size".

A bit annoying actually since some apps will use this to decide how many threads they'll create for a CPU intensive job, and a 128 MB function for sure won't be allowed to tax all 8 of the host's hyperthreads.
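
One way to work around this in your own code is to size worker pools from explicit configuration rather than from the reported CPU count. The sketch below assumes a hypothetical WORKER_COUNT environment variable; it is not something GCF sets, and the function name is made up for illustration.

```javascript
const os = require('os');

// Picks a worker count from configuration, falling back to a conservative default.
// `env` is an object such as process.env; `reportedCpus` is what the OS claims.
function pickWorkerCount(env, reportedCpus) {
  const requested = Number.parseInt(env.WORKER_COUNT, 10);
  if (Number.isInteger(requested) && requested > 0) {
    // Never exceed what the host reports, even if more was requested.
    return Math.min(requested, reportedCpus);
  }
  return 1; // Do not trust the reported count on a throttled instance.
}

console.log(pickWorkerCount(process.env, os.cpus().length));
```

The right value for a given memory class would have to be found by measurement, since the platform does not expose the actual CPU share.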

In situations where I launch a CPU/memory intensive sub-process, I've got to a point where I can't even kill the sub-process. Then my function eventually times out, container is "suspended", and when another request comes in, container is "reused", old process is still running, and I still can't kill it.

bahattincinic commented Nov 30, 2018

@ebidel Any updates on this problem?

We want to move our project from AWS Lambda to Google Cloud Functions. We have actually completed the migration, but we are waiting for this issue to be resolved.

TimotheeJeannin commented Nov 30, 2018

Same here. We wanted to migrate from AWS Lambda to GCF because the underlying Linux distribution used by AWS Lambda is a pain to work with. We did quite extensive stress tests on GCF and experienced extremely slow functions compared to AWS Lambda. It's so much slower that it's currently not possible for us to migrate, even though we would prefer to work with the underlying Linux distribution GCF uses.

aslushnikov added the chromium label on Dec 6, 2018

baratrion commented Dec 13, 2018

@steren I assume you were the one who marketed this back in August with this blog post: https://cloud.google.com/blog/products/gcp/introducing-headless-chrome-support-in-cloud-functions-and-app-engine

Isn't it a bit awkward to push a product to the masses without actually testing its performance, especially for a product (Cloud Functions) that people want to use at scale?

Contributor

steren commented Dec 15, 2018

Many customers are successfully using puppeteer on Cloud Functions or App Engine.

We tested headless Chrome performance and were aware of it before publishing the blog post. To sum up: this is part of the current tradeoff of using our pay-for-usage, fast-scaling managed compute products (Cloud Functions and the App Engine standard environment).

If performance is what you are optimizing for, Google Cloud Platform has many other compute options that let you run puppeteer with better performance: take a look at the App Engine flexible environment, Google Kubernetes Engine or just a Compute Engine VM.

Contributor

alixaxel commented Dec 30, 2018

I ran some benchmarks again with chrome-aws-lambda and I noticed some improvements on Firebase.

The average timings I got with multiple URLs and warmed up functions were:

  • puppeteer (2684 ms on Firebase 1GB)
  • chrome-aws-lambda (1675 ms on Firebase 1GB)
  • chrome-aws-lambda (1154 ms on AWS Lambda 1GB)
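
To put these averages in relative terms, the percentage slowdown against the Lambda baseline can be computed as follows. This is a small sketch; the variable and function names are made up for illustration, and the numbers are the averages listed above.

```javascript
// Average timings in milliseconds, taken from the list above.
const averagesMs = {
  puppeteerOnFirebase: 2684,
  chromeAwsLambdaOnFirebase: 1675,
  chromeAwsLambdaOnLambda: 1154,
};

// How much slower `candidateMs` is than `baselineMs`, in percent.
function percentSlower(candidateMs, baselineMs) {
  return (candidateMs / baselineMs - 1) * 100;
}

const baseline = averagesMs.chromeAwsLambdaOnLambda;
console.log(percentSlower(averagesMs.chromeAwsLambdaOnFirebase, baseline).toFixed(1)); // 45.1
console.log(percentSlower(averagesMs.puppeteerOnFirebase, baseline).toFixed(1)); // 132.6
```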

With chrome-aws-lambda, FCFs are "only" 45% slower than Lambdas (compared to 130%+ when using puppeteer). In light of this, I've added support for GCFs to my package, if anyone wants to try it out:

npm i chrome-aws-lambda iltorb puppeteer-core

Sample code (you need Node 8 runtime for it):

const chromium = require('chrome-aws-lambda');
const puppeteer = require('puppeteer-core');
const functions = require('firebase-functions');

const options = {
  memory: '2GB',
  timeoutSeconds: 300,
};

exports.chrome = functions.runWith(options).https.onRequest(async (request, response) => {
  let result = null;
  let browser = null;

  try {
    browser = await puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless,
    });

    let page = await browser.newPage();

    await page.goto(request.query.url || 'https://example.com');

    result = await page.title();
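  } catch (error) {
    // Report the failure instead of rethrowing, so the request still gets a response.
    response.status(500);
    result = error.toString();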
  } finally {
    if (browser !== null) {
      await browser.close();
    }
  }

  return response.send(result);
});

jineshshah36 commented Jan 18, 2019

I can also confirm that using chrome-aws-lambda with puppeteer-core on firebase functions yields a significant speedup

kylewill commented Jan 18, 2019

I can confirm significant improvements on Firebase Functions / GCF, enough that I have been using it in several mission-critical production workflows for several weeks now.

@steren if helpful for future launches: I’m grateful for the announcement with the known issues and for the follow-up improvements. This allowed me to build based on the documentation and deploy based on the project requirements as improvements have been made (still some to go :).

I don’t think you need to defend the state at launch, especially given the open approach the team has taken to acknowledgement and improvements.
