Possible improvement to TerminatingBlocksFinder #12

Closed
GoogleCodeExporter opened this issue May 6, 2015 · 1 comment

Comments

@GoogleCodeExporter

The following block of code:
final String text = tb.getText().trim();
if (text.startsWith("Comments")
  || N_COMMENTS.matcher(text).find()
  || text.contains("What you think...")
  || text.contains("add your comment")
  || text.contains("Add your comment")
  || text.contains("Add Your Comment")
  || text.contains("Add Comment")
  || text.contains("Reader views")
  || text.contains("Have your say")
  || text.contains("Have Your Say")
  || text.contains("Reader Comments")
  || text.equals("Thanks for your comments - this feedback is now closed")
  || text.startsWith("© Reuters")
  || text.startsWith("Please rate this")

Might be rewritten as:
final String text = tb.getText().trim().toLowerCase();
if (text.startsWith("comments")
  || N_COMMENTS.matcher(text).find()
  || text.contains("what you think...")
  || text.contains("add your comment")
  || text.contains("add comment")
  || text.contains("reader views")
  || text.contains("have your say")
  || text.contains("reader comments")
  || text.equals("thanks for your comments - this feedback is now closed")
  || text.startsWith("© reuters")
  || text.startsWith("please rate this")


It would catch more cases this way and be easier to maintain.

Also, I saw that the Washington Post uses "Post a Comment", so it would be 
worth adding that phrase as well.
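
As a rough sketch of how the lowercase comparison could stay easy to maintain, the phrases can live in small tables that are checked in a loop. The class name `TerminatingPhrases`, the method `isTerminating`, and the array names are hypothetical and not part of the existing code; the `N_COMMENTS` check is left out because its pattern is not shown here.

```java
import java.util.Locale;

// Hypothetical helper, not the actual TerminatingBlocksFinder code.
class TerminatingPhrases {
    // All phrases are stored in lowercase so they can be compared
    // against lowercased block text.
    private static final String[] STARTS_WITH = {
        "comments", "© reuters", "please rate this"
    };
    private static final String[] CONTAINS = {
        "what you think...", "add your comment", "add comment",
        "reader views", "have your say", "reader comments",
        "post a comment"
    };
    private static final String[] EQUALS = {
        "thanks for your comments - this feedback is now closed"
    };

    // Returns true if the block text matches any of the phrases above,
    // ignoring case and surrounding whitespace.
    static boolean isTerminating(String rawText) {
        // Locale.ROOT keeps lowercasing independent of the JVM's default locale.
        String text = rawText.trim().toLowerCase(Locale.ROOT);
        for (String p : STARTS_WITH) {
            if (text.startsWith(p)) return true;
        }
        for (String p : CONTAINS) {
            if (text.contains(p)) return true;
        }
        for (String p : EQUALS) {
            if (text.equals(p)) return true;
        }
        return false;
    }
}
```

With this layout, adding a new phrase is a one-line change to the relevant array. Passing `Locale.ROOT` to `toLowerCase` avoids surprises under locales such as Turkish, where an uppercase "I" does not lowercase to "i".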

Original issue reported on code.google.com by benjamin...@gmail.com on 21 Nov 2010 at 8:15

@GoogleCodeExporter

Hi Benjamin,

Thanks for your suggestion.

I have evaluated the proposed changes on the L3S-GN1 dataset and can confirm 
that they slightly improve precision at the cost of a minimal reduction in 
recall (a net improvement in F1), without slowing down processing.

I have also added "Post a comment", and moreover replaced the Pattern 
matcher in the original code with a string-based comparison, which saves a 
few more nanoseconds ;)
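
To illustrate the kind of string-based comparison meant here, the sketch below checks for a leading number followed by a fixed phrase without using a regular expression. It assumes the pattern matches text such as "12 comments"; the real `N_COMMENTS` pattern is not shown in this thread, and the names `LeadingCount` and `startsWithNumberThen` are hypothetical, not the code that was committed.

```java
// Hypothetical regex-free check, assuming the pattern looks for
// "<digits><phrase>" at the start of the (already lowercased) text.
class LeadingCount {
    // Returns true if text begins with one or more ASCII digits that are
    // immediately followed by any of the given phrases.
    static boolean startsWithNumberThen(String text, String... phrases) {
        int i = 0;
        // Skip over the leading run of ASCII digits.
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == 0) {
            return false; // no leading number at all
        }
        for (String phrase : phrases) {
            // startsWith(prefix, offset) compares in place, without allocating a substring.
            if (text.startsWith(phrase, i)) {
                return true;
            }
        }
        return false;
    }
}
```

A call such as `LeadingCount.startsWithNumberThen(text, " comments")` would then take the place of the `matcher(text).find()` call. This sketch only inspects the start of the text, so it is not a drop-in replacement for a pattern that can match elsewhere in the block.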

The changes are in SVN trunk and will be included in the next release.

Thanks!
Christian

Original comment by ckkohl79 on 21 Nov 2010 at 1:40

  • Changed state: Fixed
  • Added labels: OpSys-All, Performance, Type-Enhancement
  • Removed labels: Type-Defect
