FeedParserBolt: metadata added by parse filters not passed forward in topology #541

Closed
sebastian-nagel opened this Issue Mar 5, 2018 · 2 comments

@sebastian-nagel
Collaborator

sebastian-nagel commented Mar 5, 2018

FeedParserBolt does not pass the metadata added by parse filters forward in the topology. AdaptiveFetchSchedule no longer recognizes whether a feed has changed by signature comparison (cc/news-crawl#19). f344f4b changed the way metadata is passed to the parse filters in FeedParserBolt and SiteMapParserBolt, while the Tika and jsoup parsers still work on the Metadata object directly (set by parseData.setMetadata(metadata)).

@jnioche

Member

jnioche commented Mar 6, 2018

Thanks, @sebastian-nagel. The shortcut parse.put(url, metadata) creates a new instance of Metadata under the bonnet and copies the values from the original metadata, which indeed causes a problem as we use and return the latter further down the code. Will fix this now.
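
For readers following along, here is a minimal sketch of the copy-on-put pitfall described above. It does not use the StormCrawler classes: `SimpleMetadata`, `SimpleParseResult`, `applyFilter`, `emitOriginal` and `emitFromParseResult` are hypothetical stand-ins, and the `signature` key and its value are made up for illustration. The second method shows one possible way to avoid the problem; the thread does not show what 5b64f4e actually changed.

```java
import java.util.HashMap;
import java.util.Map;

class MetadataCopyDemo {

    /** Hypothetical stand-in for a metadata container (not the StormCrawler class). */
    static class SimpleMetadata {
        private final Map<String, String> values = new HashMap<>();

        void set(String key, String value) { values.put(key, value); }

        String get(String key) { return values.get(key); }

        /** Returns an independent copy holding the same key/value pairs. */
        SimpleMetadata copy() {
            SimpleMetadata c = new SimpleMetadata();
            c.values.putAll(values);
            return c;
        }
    }

    /** Hypothetical stand-in for a parse result keyed by URL. */
    static class SimpleParseResult {
        private final Map<String, SimpleMetadata> byUrl = new HashMap<>();

        /** Assumption: put() stores a copy, mirroring the behaviour described in the comment. */
        void put(String url, SimpleMetadata md) { byUrl.put(url, md.copy()); }

        SimpleMetadata get(String url) { return byUrl.get(url); }
    }

    /** Hypothetical parse filter: writes a signature into the parse result's metadata. */
    static void applyFilter(SimpleParseResult parse, String url) {
        parse.get(url).set("signature", "abc123");
    }

    /** Problematic pattern: the filter updates the copy, but the original is passed on. */
    static SimpleMetadata emitOriginal(String url) {
        SimpleMetadata original = new SimpleMetadata();
        SimpleParseResult parse = new SimpleParseResult();
        parse.put(url, original);
        applyFilter(parse, url);
        return original; // the filter's addition is not in this instance
    }

    /** One possible remedy: pass on the instance the filters actually worked on. */
    static SimpleMetadata emitFromParseResult(String url) {
        SimpleMetadata original = new SimpleMetadata();
        SimpleParseResult parse = new SimpleParseResult();
        parse.put(url, original);
        applyFilter(parse, url);
        return parse.get(url);
    }

    public static void main(String[] args) {
        String url = "http://example.com/feed.xml";
        System.out.println(emitOriginal(url).get("signature"));        // null
        System.out.println(emitFromParseResult(url).get("signature")); // abc123
    }
}
```

In the first method the signature written by the filter never reaches the returned object, which matches the symptom reported in this issue: a downstream component comparing signatures would not see one.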

@jnioche jnioche added bug core labels Mar 6, 2018

@jnioche jnioche added this to the 1.8 milestone Mar 6, 2018

@jnioche jnioche self-assigned this Mar 6, 2018

@jnioche jnioche closed this in 5b64f4e Mar 6, 2018

@sebastian-nagel

Collaborator

sebastian-nagel commented Mar 6, 2018

Thanks, @jnioche!
