IgnoreHTTPSErrors only sometimes works? #2485

Closed
byt3bl33d3r opened this issue Jun 23, 2022 · 4 comments

byt3bl33d3r commented Jun 23, 2022

Describe the bug

Browserless sometimes appears to launch Chrome without the ignoreHTTPSErrors flag, even though it is specified in docker-compose.yml and/or passed as a URL parameter.

To Reproduce

In my docker-compose.yml:

  chrome:
    image: browserless/chrome:latest
    security_opt:
      - "seccomp=../chromium.json"
    deploy:
      replicas: 2
    expose:
      - 3000
    environment:
     # https://docs.browserless.io/docs/docker.html
     - DEFAULT_IGNORE_HTTPS_ERRORS=true
     - ENABLE_DEBUGGER=false
     - DEFAULT_IGNORE_DEFAULT_ARGS=["--no-sandbox"]
     - DEFAULT_STEALTH=true
     - FUNCTION_ENABLE_INCOGNITO_MODE=true
     - KEEP_ALIVE=true
     - PREBOOT_CHROME=true
     - EXIT_ON_HEALTH_FAILURE=true

Python script using Playwright:

import asyncio
import logging
from urllib.parse import urlparse
from playwright.async_api import async_playwright
from playwright.async_api import Error as PlaywrightError

URLS = [
    'https://bot.sannysoft.com/',
    'https://arh.antoinevastel.com/bots/areyouheadless',
    'https://200.70.58.134:8443/',
    'https://200.55.247.6:3000/',
    'https://190.227.183.117:8443/',
    'https://190.221.139.82:8443/',
    'https://181.30.162.226:4433/'
]

queue = asyncio.Queue()

log = logging.getLogger('play')
log.addHandler(logging.StreamHandler())
log.setLevel(logging.DEBUG)

async def screenshot(url):
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp('ws://chrome:3000/?ignoreHTTPSErrors=true&ignoreDefaultArgs=--no-sandbox')
        try:
            log.debug(url)
            parsed_url = urlparse(url) 
            page = await browser.new_page()

            try:
                await page.goto(url, wait_until="networkidle")
            except PlaywrightError as e:
                if 'CERT' not in str(e):
                    raise

                log.debug('Caught certificate error')
                await page.goto(url, wait_until="networkidle")

            await page.screenshot(path=f'./play/screenshot_{parsed_url.scheme}_{parsed_url.netloc}_{parsed_url.port}.png', full_page=True)
        finally:
            await browser.close()

async def producer():
    for url in URLS:
        queue.put_nowait(url)

async def consumer():
    url = await queue.get()
    try:
        await screenshot(url)
    except Exception as e:
        log.error(f'{e}')
    finally:
        queue.task_done()

async def main():
    n_threads = 8
    # Fill the queue up front instead of polling for an unreferenced producer task
    await producer()

    while queue.qsize() > 0:
        # Run up to n_threads consumers concurrently per batch
        tasks = [consumer() for _ in range(min(n_threads, queue.qsize()))]
        await asyncio.gather(*tasks)

asyncio.run(main())
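The WebSocket endpoint in the script is assembled by hand as a string. If more options get added, a small helper built on urllib.parse.urlencode keeps the query string correctly encoded. This is only a sketch: browserless_ws_url is a hypothetical helper, and it passes parameter names through verbatim, so it relies on the names already used in this thread (ignoreHTTPSErrors, ignoreDefaultArgs).

```python
from typing import Mapping, Optional
from urllib.parse import urlencode


def browserless_ws_url(host: str, port: int = 3000,
                       params: Optional[Mapping[str, str]] = None) -> str:
    """Build a ws:// endpoint with URL-encoded query parameters.

    Hypothetical helper: parameter names are passed through unchanged,
    so they must match whatever browserless itself expects.
    """
    base = f"ws://{host}:{port}/"
    # urlencode keeps dict insertion order and escapes reserved characters
    query = urlencode(params or {})
    return f"{base}?{query}" if query else base
```

It could be used as connect_over_cdp(browserless_ws_url("chrome", 3000, {"ignoreHTTPSErrors": "true"})). It does not change how browserless interprets the parameters, so it is not a fix for the intermittent failure itself.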

Expected behavior
Browserless should instruct Chrome to ignore HTTPS certificate errors whenever DEFAULT_IGNORE_HTTPS_ERRORS=true is set in the Docker environment or ignoreHTTPSErrors=true is passed in the connection URL.

Screenshots
On the first run of the Python script, everything is fine:
(screenshot)

On the third run, a certificate error is thrown when calling page.goto():
(screenshot)

Additional context

I'm currently working around this by catching any error whose message contains CERT and calling page.goto() again (lines 35-40 in the Python script; I have no idea why this works). Obviously this isn't ideal: Chrome should ignore certificate errors every time it starts if the correct knobs are turned.
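The workaround described above can be packaged as a reusable helper. This is a minimal sketch, assuming a Playwright-style async goto(url, **kwargs) method such as page.goto; goto_with_cert_retry is a hypothetical name. It matches "ERR_CERT" (Chromium's net::ERR_CERT_* codes, e.g. net::ERR_CERT_AUTHORITY_INVALID) rather than any "CERT" substring. Like the original workaround, it only masks the symptom and does not make Chrome ignore certificate errors.

```python
from typing import Any, Awaitable, Callable


async def goto_with_cert_retry(
    goto: Callable[..., Awaitable[Any]],
    url: str,
    retries: int = 1,
    **goto_kwargs: Any,
) -> Any:
    """Await goto(url, **goto_kwargs), retrying only on certificate errors.

    Errors whose message does not contain "ERR_CERT" are re-raised at once.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return await goto(url, **goto_kwargs)
        except Exception as exc:  # Playwright raises playwright.async_api.Error
            # Give up on non-certificate errors, or once retries are exhausted
            if "ERR_CERT" not in str(exc) or attempt == retries:
                raise
```

In the script it would replace the inner try/except as await goto_with_cert_retry(page.goto, url, wait_until="networkidle").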


dgtlmoon commented Dec 16, 2022

Using the following script, I'm able to reproduce it: you can see it ran fine 3 times, then on the 4th...

Using browserless/chrome:1.53-chrome-stable in two different places:

  • I can reproduce it on a fairly busy server.
  • I cannot reproduce it on my laptop.

Tested with Playwright 1.28.0 and 1.27.1: same outcome.

#!/usr/bin/python3

from playwright.sync_api import sync_playwright
import playwright._impl._api_types

# pip3 install playwright
# docker run -d --name browserless --rm  -p 3000:3000  --shm-size="2g"  browserless/chrome:1.53-chrome-stable

def letsgo():
    print("Trying...")
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp('ws://127.0.0.1:3000/?ignoreHTTPSErrors=true&stealth=1&--disable-web-security=true', timeout=10000)

        context = browser.new_context(
            bypass_csp=True,
            service_workers='block',
            accept_downloads=False
        )

        page = context.new_page()
        page.on("console", lambda msg: print(f"Playwright console: Watch URL: {msg.type}: {msg.text} {msg.args}"))
        page.goto("https://untrusted-root.badssl.com/", wait_until='commit')
        page.wait_for_timeout(1 * 1000)
        context.close()
        browser.close()


if __name__ == '__main__':
    while True:
        letsgo()

Here is the output:

# ./test.py 
Trying...
Trying...
Trying...
Traceback (most recent call last):
  File "/root/./test.py", line 30, in <module>
    letsgo()
  File "/root/./test.py", line 22, in letsgo
    page.goto("https://untrusted-root.badssl.com/", wait_until='commit')
  File "/usr/local/lib/python3.10/dist-packages/playwright/sync_api/_generated.py", line 8200, in goto
    self._sync(
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_sync_base.py", line 104, in _sync
    return task.result()
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_page.py", line 491, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_frame.py", line 147, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_connection.py", line 44, in send
    return await self._connection.wrap_api_call(
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_connection.py", line 419, in wrap_api_call
    return await cb()
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_connection.py", line 79, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: net::ERR_CERT_AUTHORITY_INVALID at https://untrusted-root.badssl.com/
=========================== logs ===========================
navigating to "https://untrusted-root.badssl.com/", waiting until "commit"
============================================================
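The Playwright error above carries Chromium's net error code (net::ERR_CERT_AUTHORITY_INVALID). Rather than searching the whole message for "CERT", the code can be pulled out and classified. A sketch with hypothetical helper names, assuming the "net::ERR_... at URL" message format shown in this log:

```python
import re
from typing import Optional

# Matches e.g. "net::ERR_CERT_AUTHORITY_INVALID" and captures "ERR_CERT_AUTHORITY_INVALID"
_NET_ERROR = re.compile(r"net::(ERR_[A-Z0-9_]+)")


def net_error_code(message: str) -> Optional[str]:
    """Return the first Chromium net error code found in message, or None."""
    match = _NET_ERROR.search(message)
    return match.group(1) if match else None


def is_certificate_error(message: str) -> bool:
    """True only for ERR_CERT_* codes, the errors ignoreHTTPSErrors is expected to cover."""
    code = net_error_code(message)
    return code is not None and code.startswith("ERR_CERT_")
```

With this split, net::ERR_SSL_PROTOCOL_ERROR is reported as a net error but not as a certificate error, since it is not an ERR_CERT_* code.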

andyMrtnzP (Collaborator) commented:

Sorry for the delay on these few. I ran some tests with both browserless and Puppeteer.launch(), and the results are the same: ignoring HTTPS errors works for expired, revoked, and MitM certificates (tested against https://badssl.com/), but not for SSL protocol errors (ERR_SSL_PROTOCOL_ERROR). It seems that Chromium treats SSL protocol errors as distinct from HTTPS certificate errors. You could try a custom function that falls back to loading the page over HTTP when HTTPS fails:

const newPage = async (browser, url) => {
  const page = await browser.newPage();
  try {
    await page.goto(url);
  } catch (err) {
    // ignoreHTTPSErrors does not cover SSL protocol errors, so retry over plain HTTP
    if (!err.message.includes("ERR_SSL_PROTOCOL_ERROR")) throw err;
    const urlObj = new URL(url);
    urlObj.protocol = "http";
    await page.goto(urlObj.href);
  }
  // Return the page itself (page.goto() resolves to a Response, not a Page)
  return page;
};

const page = await newPage(browser, "https://200.55.247.6:3000/");
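For the Playwright scripts earlier in this thread, the same HTTP fallback might look like the following Python sketch. goto_with_http_fallback and http_fallback_url are hypothetical names, and page is assumed to be a Playwright Page (or anything with an async goto()). The fallback sends the request unencrypted and only helps when the server also answers plain HTTP on the same port.

```python
from typing import Any
from urllib.parse import urlparse


def http_fallback_url(url: str) -> str:
    """Return url with its scheme set to http; host, port, path and query are kept."""
    return urlparse(url)._replace(scheme="http").geturl()


async def goto_with_http_fallback(page: Any, url: str, **goto_kwargs: Any) -> Any:
    """Navigate to url; on ERR_SSL_PROTOCOL_ERROR, retry the same address over HTTP."""
    try:
        return await page.goto(url, **goto_kwargs)
    except Exception as exc:  # Playwright raises playwright.async_api.Error
        # Any other failure (including certificate errors) is re-raised unchanged
        if "ERR_SSL_PROTOCOL_ERROR" not in str(exc):
            raise
        return await page.goto(http_fallback_url(url), **goto_kwargs)
```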

andyMrtnzP (Collaborator) commented:

Closing this one for lack of activity. Let us know if we should reopen it!
