Enhance error statistics with snapshot storing to KV #1771

Closed
metalwarrior665 opened this issue Feb 6, 2023 · 1 comment
Labels
feature Issues that represent new features or improvements to existing features. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@metalwarrior665
Member

Which package is the feature request for? If unsure which one to select, leave blank

None

Feature

We now have error statistics that count the number of errors per error message (with some unification of similar messages). In some crawlers, we use an automatic screenshotter that saves a snapshot on the first error of each type: HTML only for Cheerio, and HTML plus (optionally) a screenshot for browsers. The snapshot URL is also passed to the error reporter, so we can link it to the error statistics.
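
To make the idea concrete, here is a minimal sketch of the "snapshot only the first error of each type" logic. Everything in it is an assumption for illustration: `normalizeErrorMessage`, `ErrorSnapshotTracker`, and the `ERROR_SNAPSHOT_` key prefix are hypothetical names, the grouping rule is much cruder than the real error statistics, and a plain in-memory `Map` stands in for the key-value store (a real store would be asynchronous).

```typescript
// Hypothetical sketch: none of these names come from Crawlee.

/** Collapse variable parts of a message so similar errors share one group. */
function normalizeErrorMessage(message: string): string {
    return message
        .replace(/https?:\/\/\S+/g, '<url>') // URLs differ per request
        .replace(/\d+/g, '<n>') // numbers such as timeouts or ids
        .trim();
}

interface ErrorGroup {
    count: number;
    snapshotKey?: string; // set once, when the first error of this type is seen
}

class ErrorSnapshotTracker {
    private readonly groups = new Map<string, ErrorGroup>();
    private readonly store: Map<string, string>;

    constructor(store: Map<string, string>) {
        this.store = store; // stand-in for a key-value store
    }

    /** Counts the error and stores a snapshot only for the first of its type. */
    report(message: string, html: string): ErrorGroup {
        const groupName = normalizeErrorMessage(message);
        let group = this.groups.get(groupName);
        if (!group) {
            group = { count: 0 };
            this.groups.set(groupName, group);
        }
        group.count += 1;
        if (group.snapshotKey === undefined) {
            const key = `ERROR_SNAPSHOT_${this.groups.size}`;
            this.store.set(key, html);
            group.snapshotKey = key;
        }
        return group;
    }
}
```

The returned group carries both the count and the snapshot key, which is what would let the statistics output point at the stored snapshot.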

Motivation

It significantly improves debugging both for first-time users (who are not used to snapshotting) and for scheduled scrapers, where you can quickly see what happened on the page (it was redirected or blocked, the layout changed, lazy loading, etc.).

I would especially make it the default for generic scrapers in DEVELOPMENT mode, with a helpful message. It would also reduce the amount of support, since people will figure things out on their own.

Ideal solution or implementation, and any additional constraints

The old implementation is here: https://github.com/apify-projects/apify-extra-library/blob/master/src-js/error-handling.js#L26

It just provides a wrapper around an arbitrary function and also works when nested. I don't think we need that feature; we can just bake the snapshotting into Crawlee as a default action inside errorHandler, based on the error parsing that the error statistics already do.
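
As a rough sketch of what "a default action inside errorHandler" could look like, the snippet below runs a snapshot step before any user-supplied handler. The shapes here (`PageContext`, `SnapshotOptions`, `withDefaultSnapshot`) are hypothetical and only loosely modeled on a `(context, error)` handler signature; they are not Crawlee's actual types. The code is kept synchronous for brevity and groups by raw message, where a real implementation would reuse the error statistics grouping.

```typescript
// Hypothetical shapes for illustration; not Crawlee's actual types.
interface PageContext {
    url: string;
    getHtml: () => string;
}

type ErrorHandler = (context: PageContext, error: Error) => void;

interface SnapshotOptions {
    enabled: boolean; // e.g. could default to true in development mode
    saveSnapshot: (key: string, html: string) => void;
}

/** Builds a handler that snapshots the first error of each message, then calls the user handler. */
function withDefaultSnapshot(options: SnapshotOptions, userHandler?: ErrorHandler): ErrorHandler {
    const seenMessages = new Set<string>();
    return (context, error) => {
        if (options.enabled && !seenMessages.has(error.message)) {
            seenMessages.add(error.message);
            options.saveSnapshot(`ERROR_SNAPSHOT_${seenMessages.size}`, context.getHtml());
        }
        // The user's own handler still runs for every error.
        userHandler?.(context, error);
    };
}
```

Because the snapshot step lives in the handler the crawler already calls, users would not need to wrap their own functions to get it.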

Alternative solutions or implementations

No response

Other context

No response

@metalwarrior665 metalwarrior665 added the feature Issues that represent new features or improvements to existing features. label Feb 6, 2023
@mtrunkat mtrunkat added the t-tooling Issues with this label are in the ownership of the tooling team. label Jul 18, 2023
@metalwarrior665
Member Author

Closing as duplicate
