
Cannot mark request as handled, because it is not in progress! #502

Closed
cspeer opened this issue Nov 6, 2019 · 8 comments · Fixed by #507

cspeer commented Nov 6, 2019

Version: 0.16.2-beta.0

ERROR: BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.4komma5.de/Warenkorb","retryCount":1,"id":"igZ1MMPZ6srUZfb"}
  Error: Cannot mark request igZ1MMPZ6srUZfb as handled, because it is not in progress!
    at RequestQueueLocal.markRequestHandled (crawler/node_modules/apify/build/request_queue.js:962:13)
ERROR: BasicCrawler: runTaskFunction error handler threw an exception. This places the crawler and its underlying storages into an unknown state and crawling will be terminated. This may have happened due to an internal error of Apify's API or due to a misconfigured crawler. If you are sure that there is no error in your code, selecting "Restart on error" in the actor's settings will make sure that the run continues where it left off, if programmed to handle restarts correctly.
  Error: Cannot reclaim request igZ1MMPZ6srUZfb, because it is not in progress!
    at RequestQueueLocal.reclaimRequest (crawler/node_modules/apify/build/request_queue.js:999:13)
ERROR: AutoscaledPool: runTaskFunction failed.
  Error: Cannot reclaim request igZ1MMPZ6srUZfb, because it is not in progress!
    at RequestQueueLocal.reclaimRequest (crawler/node_modules/apify/build/request_queue.js:999:13)
INFO: Crawler final request statistics: {"avgDurationMillis":369,"perMinute":150,"finished":11,"failed":0,"retryHistogram":[11]}
ERROR: The function passed to Apify.main() threw an exception:
  Error: Cannot reclaim request igZ1MMPZ6srUZfb, because it is not in progress!
    at RequestQueueLocal.reclaimRequest (crawler/node_modules/apify/build/request_queue.js:999:13)

Code to reproduce:

const Apify = require('apify');

Apify.main(async () => {
  const requestQueue = await Apify.openRequestQueue();
  await requestQueue.addRequest({ url: 'https://www.4komma5.de/' });
  const crawler = new Apify.CheerioCrawler({
    requestQueue,
    maxRequestsPerCrawl: 100,
    handlePageFunction: async ({ request, $ }) => {
      console.log(`Processing ${request.url}...`);

      await Apify.utils.enqueueLinks({
        $,
        selector: 'a',
        requestQueue,
        pseudoUrls: [new Apify.PseudoUrl('https://www.4komma5.de/[.*]')],
        baseUrl: 'https://www.4komma5.de/'
      });
    },
    minConcurrency: 1,
    maxConcurrency: 1
  });

  await crawler.run();

  console.log('Crawler finished.');
});

cspeer commented Nov 6, 2019

This also happens when using https://news.ycombinator.com. I chose 4komma5.de because it happens faster there.


cspeer commented Nov 6, 2019

So, this only happens when using Apify.utils.enqueueLinks. When we create the same queue with Apify.openRequestQueue but add about 8k URLs directly with requestQueue.addRequest, the issue does not occur.
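
For illustration, a minimal sketch of that variant, assuming the same Apify 0.x API as the reproduction code above: the queue is filled up front with requestQueue.addRequest and handlePageFunction never calls enqueueLinks. The URL list is only a placeholder for the roughly 8k pre-collected URLs.

const Apify = require('apify');

Apify.main(async () => {
  const requestQueue = await Apify.openRequestQueue();

  // Placeholder for the ~8k pre-collected URLs mentioned above.
  const urls = [
    'https://www.4komma5.de/',
    'https://www.4komma5.de/Warenkorb'
  ];
  for (const url of urls) {
    await requestQueue.addRequest({ url });
  }

  const crawler = new Apify.CheerioCrawler({
    requestQueue,
    handlePageFunction: async ({ request }) => {
      console.log(`Processing ${request.url}...`);
      // No Apify.utils.enqueueLinks() call here; with the queue filled
      // up front, the "not in progress" error reportedly does not occur.
    },
    minConcurrency: 1,
    maxConcurrency: 1
  });

  await crawler.run();
});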


cspeer commented Nov 6, 2019

Interesting. The URL it is complaining about in the example above (https://www.4komma5.de/Warenkorb) is not redirecting, however. Maybe it redirected to that URL.

// edit: the comment this is in response to was deleted.


cspeer commented Nov 11, 2019

Any updates?


cspeer commented Nov 11, 2019

Thanks a lot @mnmkng


surfshore commented Feb 6, 2020

Hi,
I have the same problem (apify@0.19.1).
Is this also a problem with RequestQueueLocal? #505

ERROR: BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://xxx/xxx","retryCount":1,"id":"IyKBUk3SwBOrQ3H"}
  Error: Cannot mark request IyKBUk3SwBOrQ3H as handled, because it is not in progress!
    at RequestQueueLocal.markRequestHandled (/app/node_modules/apify/build/request_queue.js:967:13)
    at async BasicCrawler._runTaskFunction (/app/node_modules/apify/build/crawlers/basic_crawler.js:406:7)
    at async AutoscaledPool._maybeRunTask (/app/node_modules/apify/build/autoscaling/autoscaled_pool.js:463:7)
ERROR: BasicCrawler: runTaskFunction error handler threw an exception. This places the crawler and its underlying storages into an unknown state and crawling will be terminated. This may have happened due to an internal error of Apify's API or due to a misconfigured crawler. If you are sure that there is no error in your code, selecting "Restart on error" in the actor's settings will make sure that the run continues where it left off, if programmed to handle restarts correctly.
  Error: Cannot reclaim request IyKBUk3SwBOrQ3H, because it is not in progress!
    at RequestQueueLocal.reclaimRequest (/app/node_modules/apify/build/request_queue.js:1004:13)
ERROR: AutoscaledPool: runTaskFunction failed.
  Error: Cannot reclaim request IyKBUk3SwBOrQ3H, because it is not in progress!
    at RequestQueueLocal.reclaimRequest (/app/node_modules/apify/build/request_queue.js:1004:13)
INFO: Crawler final request statistics: {"avgDurationMillis":7890,"perMinute":207,"finished":9719,"failed":0,"retryHistogram":[9719]}
ERROR: The function passed to Apify.main() threw an exception:
  Error: Cannot reclaim request IyKBUk3SwBOrQ3H, because it is not in progress!
    at RequestQueueLocal.reclaimRequest (/app/node_modules/apify/build/request_queue.js:1004:13)
error Command failed with exit code 91.

@mdbetancourt

Hi, same issue (apify@0.19.1)

ERROR: BasicCrawler: runTaskFunction error handler threw an exception. This places the crawler and its underlying storages into an unknown state and crawling will be terminated. This may have happened due to an internal error of Apify's API or due to a misconfigured crawler. If you are sure that there is no error in your code, selecting "Restart on error" in the actor's settings will make sure that the run continues where it left off, if programmed to handle restarts correctly.
  Error: Cannot reclaim request X5rx6Uz1u9vTMYN, because it is not in progress!
    at RequestQueueLocal.reclaimRequest (/var/www/web-app/packages/crawler/node_modules/apify/build/request_queue.js:1004:13)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
ERROR: AutoscaledPool: runTaskFunction failed.
  Error: Cannot reclaim request X5rx6Uz1u9vTMYN, because it is not in progress!
    at RequestQueueLocal.reclaimRequest (/var/www/web-app/packages/crawler/node_modules/apify/build/request_queue.js:1004:13)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
INFO: Crawler final request statistics: {"avgDurationMillis":563,"perMinute":441,"finished":47709,"failed":0,"retryHistogram":[47708,1]}
ERROR: The function passed to Apify.main() threw an exception:
  Error: Cannot reclaim request X5rx6Uz1u9vTMYN, because it is not in progress!
    at RequestQueueLocal.reclaimRequest (/var/www/web-app/packages/crawler/node_modules/apify/build/request_queue.js:1004:13)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)


mnmkng commented Apr 16, 2020

@mdbetancourt Please use apify@dev or 0.20.4-dev.0. Also see #505
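
For anyone following along, switching to the suggested prerelease would look roughly like this, assuming npm is used (yarn has an equivalent add command):

# Install the dev dist-tag of the apify package, as suggested above
npm install apify@dev

# or pin the exact prerelease version mentioned in the comment
npm install apify@0.20.4-dev.0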
