Labels: feature, t-tooling
Which package is the feature request for? If unsure which one to select, leave blank
None
Feature
We now have error statistics that count the number of errors per error message (with some unification system). In some crawlers, we use an automatic screenshotter that saves a snapshot for the first error of each type (HTML only for Cheerio, HTML plus an optional screenshot for browsers). The snapshot URL is also passed to the error reporter, so it can be linked to the error statistics.
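To make the "first error of each type" idea concrete, here is a minimal sketch. It is not the existing implementation and not a Crawlee API: the class name `FirstErrorSnapshotTracker`, its methods, and the normalization rule (collapsing URLs and numbers) are all assumptions made for illustration. The real unification system may group messages differently.

```typescript
// Hypothetical helper, not part of Crawlee: decides whether an error
// deserves a snapshot, based on a normalized error signature.
class FirstErrorSnapshotTracker {
    private readonly counts = new Map<string, number>();

    // Assumed normalization: collapse URLs and numbers so that messages
    // differing only in those details share one signature.
    static signature(error: Error): string {
        const message = error.message
            .replace(/https?:\/\/\S+/g, '<url>')
            .replace(/\d+/g, '<n>');
        return `${error.name}: ${message}`;
    }

    // Records the error and returns true only for the first error of its type.
    shouldSnapshot(error: Error): boolean {
        const key = FirstErrorSnapshotTracker.signature(error);
        const seen = this.counts.get(key) ?? 0;
        this.counts.set(key, seen + 1);
        return seen === 0;
    }

    // Number of errors recorded so far for the signature of the given error.
    countFor(error: Error): number {
        return this.counts.get(FirstErrorSnapshotTracker.signature(error)) ?? 0;
    }
}

const tracker = new FirstErrorSnapshotTracker();
const errors = [
    new Error('Timeout after 30000 ms on https://example.com/a'),
    new Error('Timeout after 45000 ms on https://example.com/b'),
    new Error('Selector .price not found'),
];
for (const error of errors) {
    if (tracker.shouldSnapshot(error)) {
        // A real crawler would save the HTML (and optionally a screenshot) here.
        console.log(`snapshot: ${FirstErrorSnapshotTracker.signature(error)}`);
    }
}
```

In this sketch the two timeout errors share one signature, so only the first of them triggers a snapshot, while the count per signature keeps growing for the statistics.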
Motivation
It significantly improves debugging, both for first-time users (who are not used to snapshotting) and for scheduled scrapers, where you can quickly see what happened on the page (it was redirected, blocked, the layout changed, content was lazy loaded, etc.).
I would especially make it the default on generic scrapers in DEVELOPMENT mode, with some nice message. It would also reduce the amount of support needed, since people will figure things out on their own.
Ideal solution or implementation, and any additional constraints
Old implementation is here: https://github.com/apify-projects/apify-extra-library/blob/master/src-js/error-handling.js#L26
It just provides a wrapper over an arbitrary function and also works if you nest it. I think we don't need this feature; we can just bake it into Crawlee as a default action inside errorHandler, based on the error statistics error parsing.
Alternative solutions or implementations
No response
Other context
No response