How to exclude PDF files #528

Answered by ldko
oschihin asked this question in Q&A

Hi @oschihin,
I think you are on the right track. You should be able to reject MIME types in the scope used by the warcWriter bean. This works for me to reject image/jpeg content:

 <!-- Define WARC scope at top-level, to enable logging -->
 <bean id="warcWriterScope" class="org.archive.modules.deciderules.DecideRuleSequence">
       <property name="logToFile" value="true" />
       <property name="rules">
         <list>
           <bean class="org.archive.modules.deciderules.AcceptDecideRule">
           </bean>
           <bean class="org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule">
             <property name="decision" value="REJECT"/>
             <property name="regex" value="^im…
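
Since the excerpt above is cut off, here is a minimal sketch of how the same approach might look for PDF files. It is not the original answer's text: the regex value `^application/pdf.*` and the way the scope is attached to the writer through the `shouldProcessRule` property are assumptions for illustration, and the pattern assumes the rule has to match the whole Content-Type string.

```xml
<!-- Sketch only: the regex value and the warcWriter wiring are assumptions -->
<bean id="warcWriterScope" class="org.archive.modules.deciderules.DecideRuleSequence">
  <property name="logToFile" value="true" />
  <property name="rules">
    <list>
      <!-- Start by accepting everything -->
      <bean class="org.archive.modules.deciderules.AcceptDecideRule" />
      <!-- Then reject responses whose Content-Type looks like PDF (assumed pattern) -->
      <bean class="org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule">
        <property name="decision" value="REJECT" />
        <property name="regex" value="^application/pdf.*" />
      </bean>
    </list>
  </property>
</bean>

<!-- Assumed wiring: only write records that pass warcWriterScope -->
<bean id="warcWriter" class="org.archive.modules.writer.WARCWriterProcessor">
  <property name="shouldProcessRule">
    <ref bean="warcWriterScope" />
  </property>
</bean>
```

Note that a rule placed here would only keep matching responses out of the WARC files; the crawler would presumably still fetch them, so excluding PDFs from the crawl itself would need a rule in the crawl scope instead.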

Replies: 8 comments

Answer selected by ato
5 participants

This discussion was converted from issue #453 on September 30, 2022 00:49.