Add capability to Crawlers to archive pages to WARC files #1262

Open
@Pijukatel

Description

This new feature would allow crawlers to dynamically archive specific crawled pages as WARC files.

WARC files are used for archiving pages together with all their resources, and they can serve various purposes. For example: testing crawlers in development without having to hit the real site, regression testing of crawlers, or simply archiving a page for future reference.
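To make the format concrete, here is a minimal sketch of how a single WARC/1.0 `response` record is laid out, using only the standard library. The helper name `build_warc_response_record` and the sample response are made up for illustration. A real implementation would more likely rely on an existing WARC library or a recording proxy, and would also write `warcinfo` and `request` records.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Serialize one WARC/1.0 'response' record (hypothetical helper, illustration only)."""
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {warc_date}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        # Content-Length counts the bytes of the record block (the raw HTTP response).
        f"Content-Length: {len(http_response)}",
    ]
    # A blank line separates the WARC headers from the record block.
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLFs after the block.
    return header_block + http_response + b"\r\n\r\n"


# Sample raw HTTP response (status line, headers, body) to archive.
raw_http = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)
record = build_warc_response_record("https://example.com/", raw_http)
```

Records like this are simply concatenated (often each one gzip-compressed) to form a `.warc` or `.warc.gz` file, which is what makes it practical to replay a page and its resources later.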

Most likely this could be turned on in a pre-navigation hook for each page that the user wants to record. The hook would redirect the request through a proxy that handles the recording, for example wayback in proxy recording mode, and the recorded WARC file would then be stored in the KVS (key-value store).
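A rough sketch of the per-page opt-in could look like the following. Everything here is an assumption made for illustration: `NavigationContext` is a plain stand-in and not the crawler's real context type, the `archive` flag in `user_data`, the `RECORDING_PROXY_URL` address, and the `kvs_key_for` naming scheme are all hypothetical. How the proxy would actually be wired into each crawler type is still an open design question.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical address of a locally running recording proxy.
RECORDING_PROXY_URL = "http://localhost:8080"


@dataclass
class NavigationContext:
    """Stand-in for the context passed to a pre-navigation hook (not a real crawler type)."""

    url: str
    user_data: dict = field(default_factory=dict)
    proxy_url: Optional[str] = None


def archive_hook(context: NavigationContext) -> None:
    """Route the request through the recording proxy only if the page opted in."""
    if context.user_data.get("archive"):
        context.proxy_url = RECORDING_PROXY_URL


def kvs_key_for(url: str) -> str:
    """Derive a stable key under which the recorded WARC could be stored (hypothetical scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return f"warc-{digest}"


# One page opts in to recording, the other does not.
recorded = NavigationContext(url="https://example.com/a", user_data={"archive": True})
plain = NavigationContext(url="https://example.com/b")
archive_hook(recorded)
archive_hook(plain)
```

Keeping the decision in a per-request flag means only the pages the user explicitly marks go through the proxy, so the rest of the crawl is not slowed down by recording.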

Metadata

Labels

enhancement: New feature or request.
t-tooling: Issues with this label are in the ownership of the tooling team.
