feat(basic-crawler): allow configuring the automatic status message #2001

Merged
B4nan merged 2 commits into master from bc-status-message-callback on Jul 20, 2023

@B4nan B4nan commented Jul 20, 2023

Adds a statusMessageCallback option to all crawler classes so the status message call can be customized.

Allows overriding the default status message. The callback needs to call crawler.setStatusMessage() explicitly.
The default status message is provided in the parameters.

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    statusMessageCallback: async (ctx) => {
        return ctx.crawler.setStatusMessage(`this is status message from ${new Date().toISOString()}`, { level: 'INFO' }); // log level defaults to 'DEBUG'
    },
    statusMessageLoggingInterval: 1, // defaults to 10s
    async requestHandler({ $, enqueueLinks, request, log }) {
        // ...
    },
});
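
As a rough illustration of "the default status message is provided in the parameters", the sketch below reuses that default instead of replacing it. It runs without Crawlee installed: `RecordingCrawler`, `StatusParams`, `prefixedStatus` and `createRecordingCrawler` are hypothetical stand-ins, and the parameter name `message` for the default text is an assumption, not something confirmed in this PR.

```typescript
// Hypothetical stand-ins so the sketch is self-contained; these are not Crawlee types.
type StatusLevel = 'DEBUG' | 'INFO' | 'WARNING' | 'ERROR';

interface RecordingCrawler {
    setStatusMessage(message: string, options?: { level?: StatusLevel }): Promise<void>;
}

interface StatusParams {
    crawler: RecordingCrawler;
    message: string; // assumed name of the default message handed to the callback
}

// Callback body: keep the default text, add a prefix, and log it at INFO.
async function prefixedStatus({ crawler, message }: StatusParams): Promise<void> {
    await crawler.setStatusMessage(`[my-actor] ${message}`, { level: 'INFO' });
}

// A fake crawler that records what the callback would have sent.
function createRecordingCrawler() {
    const calls: Array<{ message: string; level?: StatusLevel }> = [];
    const crawler: RecordingCrawler = {
        async setStatusMessage(message, options) {
            calls.push({ message, level: options?.level });
        },
    };
    return { crawler, calls };
}
```

In a real crawler, a function shaped like `prefixedStatus` would be passed as `statusMessageCallback`, assuming the callback parameters expose the default message under that name.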

/**
* Allows overriding the default status message. When the callback returns `null` or `undefined`, the default message will be used as a fallback.
*/
statusMessageLogLevel?: LogLevel.DEBUG | LogLevel.INFO | LogLevel.WARNING | LogLevel.ERROR;
Member

I wonder if these are really needed. If you need to customize it, you can set your own setInterval and disable the Crawlee messages (set a huge interval?). Most likely, if you want to do this well, you would want the log level to depend on the state of the stats (which is what Crawlee does with the high error rate), so setting one at the crawler level is not that useful.
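
The point about making the log level depend on the stats could look roughly like the sketch below. `StatsSnapshot`, `pickStatusLevel` and the 0.5 threshold are hypothetical names and values chosen for illustration; the field names `requestsFinished` and `requestsFailed` are assumed here, and this is not Crawlee's actual high-error-rate logic.

```typescript
type MessageLevel = 'DEBUG' | 'INFO' | 'WARNING' | 'ERROR';

// Hypothetical snapshot of crawler statistics (field names are assumptions).
interface StatsSnapshot {
    requestsFinished: number;
    requestsFailed: number;
}

// Pick a log level from the failure rate instead of fixing one per crawler.
function pickStatusLevel(stats: StatsSnapshot, failureThreshold = 0.5): MessageLevel {
    const total = stats.requestsFinished + stats.requestsFailed;
    if (total === 0) return 'DEBUG'; // nothing processed yet, stay quiet
    const failureRate = stats.requestsFailed / total;
    return failureRate >= failureThreshold ? 'WARNING' : 'INFO';
}
```

A callback could then pass the result as the `level` option of `setStatusMessage()`, which is why a single crawler-level log level option adds little.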

Member Author

True, what about changing the callback so that it calls setStatusMessage() itself?

Member Author

That way you can also disable the periodic status messages by not calling it inside the callback.
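
A minimal sketch of that idea, assuming the callback is the only place where the periodic message gets set: when the callback returns without calling `setStatusMessage()`, nothing is emitted for that interval. `MessageSink` and `makeStatusCallback` are hypothetical helpers, not part of Crawlee.

```typescript
// Hypothetical minimal view of the crawler: only the method the callback needs.
interface MessageSink {
    setStatusMessage(message: string): Promise<void>;
}

// Build a callback that forwards the default message only while `enabled()` is true.
function makeStatusCallback(enabled: () => boolean) {
    return async ({ crawler, message }: { crawler: MessageSink; message: string }): Promise<void> => {
        if (!enabled()) return; // not calling setStatusMessage() skips this message
        await crawler.setStatusMessage(message);
    };
}
```

Passing `makeStatusCallback(() => false)` as the callback would, under this assumption, silence the periodic messages entirely.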

Adds two new options to all the crawler classes:
- `statusMessageCallback`
- `statusMessageLogLevel`
@B4nan B4nan merged commit 3eb4e4c into master Jul 20, 2023
7 checks passed
@B4nan B4nan deleted the bc-status-message-callback branch July 20, 2023 15:43