Use record filtering from JWARC#19

Merged
lfoppiano merged 3 commits into main from feature/use-filter-from-jwarc
Apr 20, 2026

@lfoppiano
Collaborator

Description

In iipc/jwarc#104 we pushed more flexible record filtering upstream to JWARC. We can therefore remove the custom code from this repository and update the whirlwind tour to use jwarc for all CDXJ operations.

This PR:

  • updates the README.md to use jwarc instead of the custom code
  • removes the custom code from this repository, since it was pushed upstream to JWARC
  • fixes an incorrect make command for downloading jwarc

@sebastian-nagel sebastian-nagel left a comment

Looks good. Thanks, @lfoppiano!

WAT and WET files are properly indexed.

@lfoppiano lfoppiano merged commit 48228f1 into main Apr 20, 2026
1 check passed
@lfoppiano lfoppiano deleted the feature/use-filter-from-jwarc branch April 20, 2026 16:23