
PhantomCrawler: Unhandled exception - Cannot mark request as handled, because it is not in progress! #8

Closed
MSIH opened this issue Mar 22, 2020 · 1 comment

MSIH commented Mar 22, 2020

The crawler crashes, causing the task to fail. The problem appears to be related to a network timeout and the PageManager.

2020-03-22T10:27:24.364Z [S0000005] ERROR: RemoteRequestManager.webPage.onResourceTimeout(): {
2020-03-22T10:27:24.366Z              "errorCode": 408,
2020-03-22T10:27:24.369Z              "errorString": "Network timeout on resource.",
2020-03-22T10:27:24.371Z              "headers": [
2020-03-22T10:27:24.374Z                {
2020-03-22T10:27:24.376Z                  "name": "Accept",
2020-03-22T10:27:24.378Z                  "value": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
2020-03-22T10:27:24.380Z                },
2020-03-22T10:27:24.383Z                {
2020-03-22T10:27:24.386Z                  "name": "Origin",
2020-03-22T10:27:24.388Z                  "value": "null"
2020-03-22T10:27:24.390Z                },
2020-03-22T10:27:24.394Z                {
2020-03-22T10:27:24.396Z                  "name": "User-Agent",
2020-03-22T10:27:24.399Z                  "value": "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1s-apifier Safari/538.1"
2020-03-22T10:27:24.401Z                },
2020-03-22T10:27:24.403Z                {
2020-03-22T10:27:24.406Z                  "name": "Content-Type",
2020-03-22T10:27:24.408Z                  "value": "application/json"
2020-03-22T10:27:24.410Z                },
2020-03-22T10:27:24.413Z                {
2020-03-22T10:27:24.415Z                  "name": "Content-Length",
2020-03-22T10:27:24.417Z                  "value": "54"
2020-03-22T10:27:24.419Z                }
2020-03-22T10:27:24.421Z              ],
2020-03-22T10:27:24.424Z              "id": 19,
2020-03-22T10:27:24.426Z              "method": "POST",
2020-03-22T10:27:24.428Z              "postData": "{\"messageType\":\"dummy\",\"piggybackBufferedRequests\":[]}",
2020-03-22T10:27:24.431Z              "time": "2020-03-22T10:26:34.395Z",
2020-03-22T10:27:24.433Z              "url": "http://localhost:34519/slave/5"
2020-03-22T10:27:24.437Z            }
2020-03-22T10:27:24.439Z [S0000005] ERROR: RemoteRequestManager.webPage.onResourceError(): {
2020-03-22T10:27:24.441Z              "errorCode": 5,
2020-03-22T10:27:24.443Z              "errorString": "Operation canceled",
2020-03-22T10:27:24.445Z              "id": 19,
2020-03-22T10:27:24.448Z              "status": null,
2020-03-22T10:27:24.450Z              "statusText": null,
2020-03-22T10:27:24.452Z              "url": "http://localhost:34519/slave/5"
2020-03-22T10:27:24.454Z            }
2020-03-22T10:27:24.457Z [S0000005] ERROR: Couldn't send message to control server at 'http://localhost:34519/slave/5' (messageType=dummy, status: 'fail'):
2020-03-22T10:27:24.460Z [S0000005] A fatal error occurred, shutting down...
2020-03-22T10:27:24.463Z INFO: Slave exited {"slaveId":5,"pid":36,"code":232,"signal":null}
2020-03-22T10:27:24.465Z INFO: Reclaiming request to queue, it will be retried again {"requestId":"bzqn4DDibYTEBp2","slaveId":5}
2020-03-22T10:27:25.047Z [S0000018] Loading crawler configuration from: /tmp/tmp-6M1q6QBgl1of3/config.json
2020-03-22T10:27:25.049Z [S0000018] WARNING: No 'crawlPurls' specified in the configuration!
2020-03-22T10:27:25.051Z [S0000018] Starting crawler using RemoteRequestManager (URL: http://localhost:34519/slave/18, bootstrap: undefined)...
2020-03-22T10:27:43.100Z INFO: Reclaiming request to queue, it will be retried again {"requestId":"ywWZUDAYrQ3e7qq"}
2020-03-22T10:27:43.189Z ERROR: PhantomCrawler: Unhandled exception
2020-03-22T10:27:43.192Z   Error: Cannot mark request Fy4QLsrdYTqEB1R as handled, because it is not in progress!
2020-03-22T10:27:43.194Z     at RequestQueue.markRequestHandled (/home/myuser/node_modules/apify/build/request_queue.js:431:13)
2020-03-22T10:27:43.196Z     at PageManager.markRequestHandled (/home/myuser/src/page_manager.js:290:52)
2020-03-22T10:27:43.200Z     at runMicrotasks (<anonymous>)
2020-03-22T10:27:43.202Z     at processTicksAndRejections (internal/process/task_queues.js:97:5)
2020-03-22T10:27:43.204Z     at async PhantomCrawler._handleNextTaskFromSlave (/home/myuser/src/phantom_crawler.js:735:21)
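For context on the stack trace: the Apify SDK's RequestQueue only lets you mark a request as handled while that request is in progress; once a request is reclaimed back to the queue (as happens here after the slave exits with code 232), a late markRequestHandled() call throws. Below is a minimal in-memory sketch of that invariant, illustrating the race visible in the log. This is a simplified hypothetical model, not the actual apify/build/request_queue.js implementation, and SimpleRequestQueue is an invented name.

```javascript
// Simplified model of the RequestQueue invariant behind
// "Cannot mark request ... as handled, because it is not in progress!".
// Illustrative sketch only, not the real Apify SDK code.
class SimpleRequestQueue {
  constructor() {
    this.queue = [];             // requests waiting to be processed
    this.inProgress = new Set(); // IDs of requests currently being processed
  }

  addRequest(request) {
    this.queue.push(request);
  }

  fetchNextRequest() {
    const request = this.queue.shift();
    if (request) this.inProgress.add(request.id);
    return request || null;
  }

  // Called when a slave dies: the request is returned to the queue
  // and is no longer considered in progress.
  reclaimRequest(request) {
    this.inProgress.delete(request.id);
    this.queue.push(request);
  }

  markRequestHandled(request) {
    if (!this.inProgress.has(request.id)) {
      throw new Error(
        `Cannot mark request ${request.id} as handled, because it is not in progress!`
      );
    }
    this.inProgress.delete(request.id);
  }
}

// Reproduce the race seen in the log: the request is reclaimed after the
// slave exits, then an in-flight markRequestHandled() call arrives late.
const queue = new SimpleRequestQueue();
queue.addRequest({ id: 'Fy4QLsrdYTqEB1R' });
const request = queue.fetchNextRequest();
queue.reclaimRequest(request);       // slave crashed, request requeued
try {
  queue.markRequestHandled(request); // late call from the page manager
} catch (err) {
  console.log(err.message);
  // → Cannot mark request Fy4QLsrdYTqEB1R as handled, because it is not in progress!
}
```

The key point is that reclaiming and handling are mutually exclusive states, so any code path that can reclaim a request while another path still holds a reference to it must tolerate (or avoid) this error.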

jancurn commented Mar 23, 2020

This might be caused by apify/crawlee#502. We've updated to the latest Apify SDK and rebuilt the actor; hopefully that fixes the problem.

Feel free to reopen the issue if the problem persists.

@jancurn jancurn closed this as completed Mar 23, 2020