check isClosed before issuing retries for a failed job #325

Open
wants to merge 10 commits into master

Conversation

deldrid1

These changes prevent puppeteer-cluster from interpreting a deliberate cluster.close() as a browser crash (surfacing as Protocol error (Runtime.callFunctionOn): Target closed. or Navigation failed because browser has disconnected!) and issuing retries for the jobs that were in flight. Since close() should only be used after cluster.idle(), I think this is much more reasonable behavior (at least it fits my use case below!).
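
Conceptually, the fix guards the retry decision on the cluster's closed state. Here is a minimal sketch of the idea (the names ClusterSketch, maxRetries, and handleTaskError are illustrative assumptions for this sketch, not the actual puppeteer-cluster internals):

  interface Job {
      data: unknown;
      tries: number;
  }

  class ClusterSketch {
      private isClosed = false;        // flipped by close()
      private readonly maxRetries = 3; // retryLimit in puppeteer-cluster terms
      private readonly queue: Job[] = [];

      close(): void {
          this.isClosed = true;        // from here on, no retries are scheduled
      }

      handleTaskError(job: Job, err: Error): void {
          // The core of the change: only issue a retry while the cluster is open.
          const willRetry = !this.isClosed && job.tries < this.maxRetries;
          if (willRetry) {
              job.tries += 1;
              this.queue.push(job);    // re-queue the job for another attempt
          }
          console.error(err.message, { willRetry });
      }
  }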

The added test case demonstrates this well. Before the change, the new test fails in this manner:

$ npx jest -t 'retries stop after close called'                                                                                                                                                                                                                                                                                                   
 FAIL  test/Cluster.test.ts (45.014s)
  ● options › concurrency: 1 › retries stop after close called

    expect.assertions(1)

    Expected one assertion to be called but received four assertion calls.

      161 | 
      162 |             test('retries stop after close called', async () => {
    > 163 |                 expect.assertions(1); // 3 retries will be 4 times called (just like test above), unless we actually close the cluster and cancel retries
          |                        ^
      164 | 
      165 |                 const cluster = await Cluster.launch({
      166 |                     concurrency,

      at test/Cluster.test.ts:163:24
      at test/Cluster.test.ts:8:71
      at Object.<anonymous>.__awaiter (test/Cluster.test.ts:4:12)
      at Object.test (test/Cluster.test.ts:162:64)

  ● options › concurrency: 2 › retries stop after close called

    expect.assertions(1)

    Expected one assertion to be called but received four assertion calls.

      161 | 
      162 |             test('retries stop after close called', async () => {
    > 163 |                 expect.assertions(1); // 3 retries will be 4 times called (just like test above), unless we actually close the cluster and cancel retries
          |                        ^
      164 | 
      165 |                 const cluster = await Cluster.launch({
      166 |                     concurrency,

      at test/Cluster.test.ts:163:24
      at test/Cluster.test.ts:8:71
      at Object.<anonymous>.__awaiter (test/Cluster.test.ts:4:12)
      at Object.test (test/Cluster.test.ts:162:64)

  ● options › concurrency: 3 › retries stop after close called

    expect.assertions(1)

    Expected one assertion to be called but received four assertion calls.

      161 | 
      162 |             test('retries stop after close called', async () => {
    > 163 |                 expect.assertions(1); // 3 retries will be 4 times called (just like test above), unless we actually close the cluster and cancel retries
          |                        ^
      164 | 
      165 |                 const cluster = await Cluster.launch({
      166 |                     concurrency,

      at test/Cluster.test.ts:163:24
      at test/Cluster.test.ts:8:71
      at Object.<anonymous>.__awaiter (test/Cluster.test.ts:4:12)
      at Object.test (test/Cluster.test.ts:162:64)

Test Suites: 1 failed, 3 skipped, 1 of 4 total
Tests:       3 failed, 80 skipped, 83 total
Snapshots:   0 total
Time:        46.225s, estimated 57s
Ran all test suites with tests matching "retries stop after close called".

After the change, all tests pass!
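
For reference, the new test is shaped roughly like this (a paraphrase reconstructed from the jest output above; the exact launch options and close timing are assumptions):

  test('retries stop after close called', async () => {
      expect.assertions(1); // 3 retries would mean 4 executions, unless close cancels them

      const cluster = await Cluster.launch({
          concurrency,
          maxConcurrency: 1,
          retryLimit: 3,
      });

      cluster.queue(async () => {
          expect(true).toBe(true);    // counted once per attempt
          cluster.close();            // close while the job is still running
          throw new Error('failure'); // without the fix, this job is retried 3 times
      });

      await cluster.idle();
  });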

The use case for this is aborting long-running scrape jobs. My particular issue is that my cloud environment may kill an HTTP request and issue its own retries (kicking off duplicate scrapes), so my long-running scrape job needs to abort. With an http.IncomingMessage from Node.js this is pretty straightforward:

  cluster.on('taskerror', (err, job, willRetry) => {
      if (willRetry) {
        console.log(`Encountered an error while crawling.  This job will be retried.`, err.message)
      } else {
        console.error(`Failed to crawl`, err.message)
      }
  });


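  // `request` is the Node.js http.IncomingMessage for the in-flight request;
  // `elapsedTime`, `requestStart`, `log`, and `logError` are application helpers.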
  request.on("aborted", () => {
    console.log(`HTTP Request ended unexpectedly after ${elapsedTime(requestStart)}s.  Signaling scraping to shut down`)
    
    return cluster.close()
      .then(() => {
        log('Cluster IS CLOSED!')
      })
      .catch((error) => {
        logError("UNABLE TO CLOSE CLUSTER", error)
      })
  })

Unfortunately, puppeteer-cluster wasn't aborting the way I wanted it to! Instead, it detected the closed browser as a crash and kept issuing retries:

HTTP Request ended unexpectedly after 2.368s.  Signaling scraping to shut down
Cluster IS CLOSED!
Encountered an error while crawling.  This job will be retried. Protocol error (Runtime.callFunctionOn): Target closed.
Encountered an error while crawling.  This job will be retried. Navigation failed because browser has disconnected!
Encountered an error while crawling.  This job will be retried. Navigation failed because browser has disconnected!

Note that with this change the taskerror event is still emitted (now with willRetry set to false) even after isClosed has resolved to true. I don't have a strong preference on whether this is right (after cluster.close() one could argue that no events should be emitted at all), but it's worth calling out as how the implementation currently works.
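
If that matters for a consumer, the extra events are easy to filter; for example (a sketch building on the taskerror handler shown earlier):

  cluster.on('taskerror', (err, data, willRetry) => {
      // After cluster.close(), failures from in-flight jobs still arrive here,
      // but with willRetry === false, so they are reported once and never re-queued.
      if (!willRetry) {
          console.error('Job failed, no retry scheduled:', err.message);
      }
  });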


DemGiran commented Aug 26, 2020

+1 for it to be merged
