check isClosed before issuing retries for a failed job #325

Open
wants to merge 10 commits into master

Conversation

deldrid1

These changes prevent puppeteer-cluster from interpreting a deliberate cluster.close() as a browser crash (surfacing as Protocol error (Runtime.callFunctionOn): Target closed. or Navigation failed because browser has disconnected!) and issuing retries for the jobs that were in flight. Since close() should only be used after cluster.idle(), I think this is much more reasonable behavior (at least it fits my use case below!).
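
Conceptually, the fix guards the retry decision on the cluster's closed state. Here is a minimal sketch of the idea (the names ClusterSketch, maxRetries, and handleTaskError are illustrative assumptions for this sketch, not the actual puppeteer-cluster internals):

  interface Job {
      data: unknown;
      tries: number;
  }

  class ClusterSketch {
      private isClosed = false;        // flipped by close()
      private readonly maxRetries = 3; // retryLimit in puppeteer-cluster terms
      private readonly queue: Job[] = [];

      close(): void {
          this.isClosed = true;        // from here on, no retries are scheduled
      }

      handleTaskError(job: Job, err: Error): void {
          // The core of the change: only issue a retry while the cluster is open.
          const willRetry = !this.isClosed && job.tries < this.maxRetries;
          if (willRetry) {
              job.tries += 1;
              this.queue.push(job);    // re-queue the job for another attempt
          }
          console.error(err.message, { willRetry });
      }
  }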

The added test case demonstrates this well. Before the change, the new test fails in this manner:

$ npx jest -t 'retries stop after close called'                                                                                                                                                                                                                                                                                                   
 FAIL  test/Cluster.test.ts (45.014s)
  ● options › concurrency: 1 › retries stop after close called

    expect.assertions(1)

    Expected one assertion to be called but received four assertion calls.

      161 | 
      162 |             test('retries stop after close called', async () => {
    > 163 |                 expect.assertions(1); // 3 retries will be 4 times called (just like test above), unless we actually close the cluster and cancel retries
          |                        ^
      164 | 
      165 |                 const cluster = await Cluster.launch({
      166 |                     concurrency,

      at test/Cluster.test.ts:163:24
      at test/Cluster.test.ts:8:71
      at Object.<anonymous>.__awaiter (test/Cluster.test.ts:4:12)
      at Object.test (test/Cluster.test.ts:162:64)

  ● options › concurrency: 2 › retries stop after close called

    expect.assertions(1)

    Expected one assertion to be called but received four assertion calls.

      161 | 
      162 |             test('retries stop after close called', async () => {
    > 163 |                 expect.assertions(1); // 3 retries will be 4 times called (just like test above), unless we actually close the cluster and cancel retries
          |                        ^
      164 | 
      165 |                 const cluster = await Cluster.launch({
      166 |                     concurrency,

      at test/Cluster.test.ts:163:24
      at test/Cluster.test.ts:8:71
      at Object.<anonymous>.__awaiter (test/Cluster.test.ts:4:12)
      at Object.test (test/Cluster.test.ts:162:64)

  ● options › concurrency: 3 › retries stop after close called

    expect.assertions(1)

    Expected one assertion to be called but received four assertion calls.

      161 | 
      162 |             test('retries stop after close called', async () => {
    > 163 |                 expect.assertions(1); // 3 retries will be 4 times called (just like test above), unless we actually close the cluster and cancel retries
          |                        ^
      164 | 
      165 |                 const cluster = await Cluster.launch({
      166 |                     concurrency,

      at test/Cluster.test.ts:163:24
      at test/Cluster.test.ts:8:71
      at Object.<anonymous>.__awaiter (test/Cluster.test.ts:4:12)
      at Object.test (test/Cluster.test.ts:162:64)

Test Suites: 1 failed, 3 skipped, 1 of 4 total
Tests:       3 failed, 80 skipped, 83 total
Snapshots:   0 total
Time:        46.225s, estimated 57s
Ran all test suites with tests matching "retries stop after close called".

After the change, all tests pass!
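
For reference, the new test is shaped roughly like this (a paraphrase reconstructed from the jest output above; the exact launch options and close timing are assumptions):

  test('retries stop after close called', async () => {
      expect.assertions(1); // 3 retries would mean 4 executions, unless close cancels them

      const cluster = await Cluster.launch({
          concurrency,
          maxConcurrency: 1,
          retryLimit: 3,
      });

      cluster.queue(async () => {
          expect(true).toBe(true);    // counted once per attempt
          cluster.close();            // close while the job is still running
          throw new Error('failure'); // without the fix, this job is retried 3 times
      });

      await cluster.idle();
  });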

The use case for this is aborting long-running scrape jobs. My particular issue is that my cloud environment may kill an HTTP request and issue its own retries (kicking off duplicate scrapes), so my long-running scrape job needs to abort. With an http.IncomingMessage from Node.js this is pretty straightforward:

  cluster.on('taskerror', (err, job, willRetry) => {
      if (willRetry) {
        console.log(`Encountered an error while crawling.  This job will be retried.`, err.message)
      } else {
        console.error(`Failed to crawl`, err.message)
      }
  });


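  // `request` is the Node.js http.IncomingMessage for the in-flight request;
  // `elapsedTime`, `requestStart`, `log`, and `logError` are application helpers.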
  request.on("aborted", () => {
    console.log(`HTTP Request ended unexpectedly after ${elapsedTime(requestStart)}s.  Signaling scraping to shut down`)
    
    return cluster.close()
      .then(() => {
        log('Cluster IS CLOSED!')
      })
      .catch((error) => {
        logError("UNABLE TO CLOSE CLUSTER", error)
      })
  })

Unfortunately, puppeteer-cluster wasn't aborting the way I wanted it to! Instead, it detected the closed browser as a crash and kept issuing retries:

HTTP Request ended unexpectedly after 2.368s.  Signaling scraping to shut down
Cluster IS CLOSED!
Encountered an error while crawling.  This job will be retried. Protocol error (Runtime.callFunctionOn): Target closed.
Encountered an error while crawling.  This job will be retried. Navigation failed because browser has disconnected!
Encountered an error while crawling.  This job will be retried. Navigation failed because browser has disconnected!

Note that with this change the taskerror event is still emitted (now with willRetry set to false) even after isClosed has resolved to true. I don't have a strong preference on whether this is right (after cluster.close() one could argue that no events should be emitted at all), but it's worth calling out as how the implementation currently works.
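
If that matters for a consumer, the extra events are easy to filter; for example (a sketch building on the taskerror handler shown earlier):

  cluster.on('taskerror', (err, data, willRetry) => {
      // After cluster.close(), failures from in-flight jobs still arrive here,
      // but with willRetry === false, so they are reported once and never re-queued.
      if (!willRetry) {
          console.error('Job failed, no retry scheduled:', err.message);
      }
  });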


DemGiran commented Aug 26, 2020

+1 for it to be merged
