[RevisionMachine] Missing revision histories for articles with colon in title #33

Closed
GoogleCodeExporter opened this issue Mar 26, 2015 · 5 comments

Comments

@GoogleCodeExporter

Currently, the RevisionMachine filters out all articles with a prefix in the
title (like User:) unless the prefix appears in a whitelist.
This way, the db contains only "normal" articles PLUS everything you
specifically define in the whitelist.
At the moment, a page is identified as having a prefix by looking for a colon
in the title. There are, however, a few pages (<0.20%) that have a colon in
the title without using it for prefix demarcation. These pages are currently
lost.
We should therefore adjust the filter and maybe go back to a (language-
dependent) blacklist filter.
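
To illustrate the problem, here is a minimal sketch of the colon heuristic described above. It is not the RevisionMachine code; the function names, the sample titles, and the whitelist contents are made up for illustration.

```python
def has_prefix_by_colon(title):
    """Heuristic from the description: any colon is taken as a prefix marker."""
    return ":" in title


def keep_page(title, whitelist):
    """Keep pages without a colon, plus pages whose prefix is whitelisted."""
    if not has_prefix_by_colon(title):
        return True
    prefix = title.split(":", 1)[0]
    return prefix in whitelist


# Hypothetical sample titles; the last one is a normal article with a colon.
titles = [
    "Berlin",
    "User:Example",
    "Category:Cities",
    "Star Wars: The Clone Wars",
]

kept = [t for t in titles if keep_page(t, whitelist={"Category"})]
print(kept)
```

With these sample titles, "Star Wars: The Clone Wars" is dropped together with "User:Example", although it is a normal article.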

Original issue reported on code.google.com by oliver.ferschke on 7 Jul 2011 at 10:28

@GoogleCodeExporter
Author

Original comment by oliver.ferschke on 12 Jul 2011 at 9:00


@GoogleCodeExporter
Author

The WikipediaXMLReader should read in the namespace mappings from the dump, 
which should then be used by the article name checker for filtering.
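
A rough sketch of what reading the namespace mappings could look like, assuming the usual layout of a MediaWiki XML dump, where `<siteinfo>` contains a `<namespaces>` list of `<namespace key="...">Name</namespace>` elements. This is not the WikipediaXMLReader code: `read_namespaces` and the inline sample are hypothetical, and a real dump would be streamed rather than parsed from a string.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the head of a dump file (assumed layout, shortened).
SAMPLE = """<mediawiki>
  <siteinfo>
    <namespaces>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
      <namespace key="2">User</namespace>
      <namespace key="14">Category</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""


def read_namespaces(xml_text):
    """Return {namespace key: namespace name} from the siteinfo section."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for elem in root.iter():
        # Real dumps declare an XML namespace on the root element,
        # so strip a possible "{uri}" part from the tag name.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "namespace":
            # The main namespace (key 0) has an empty name.
            mapping[int(elem.get("key"))] = elem.text or ""
    return mapping


namespaces = read_namespaces(SAMPLE)
print(namespaces)
```

Since the names come from the dump itself, the mapping is language dependent without maintaining a separate list per language.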

Original comment by oliver.ferschke on 13 Jul 2011 at 1:17


@GoogleCodeExporter
Author

Reimplemented ArticleFilter. It now uses the namespace information from the
siteinfo section of the XML dump.
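
The idea can be sketched as follows. This is only an illustration, not the actual ArticleFilter: the function names, the namespace set, and the sample titles are assumptions, and the sketch ignores details such as case-insensitive namespace matching.

```python
def namespace_of(title, namespace_names):
    """Return the namespace prefix of a title, or None for a normal article.

    A colon only counts as a prefix marker if the text before it is one of
    the namespace names known for this wiki.
    """
    prefix, sep, _rest = title.partition(":")
    if sep and prefix in namespace_names:
        return prefix
    return None


def keep_page(title, namespace_names, whitelist):
    """Keep normal articles, plus pages in whitelisted namespaces."""
    ns = namespace_of(title, namespace_names)
    return ns is None or ns in whitelist


# Hypothetical namespace names, as they might come from the siteinfo section.
namespace_names = {"Talk", "User", "Category"}

titles = [
    "Berlin",
    "User:Example",
    "Category:Cities",
    "Star Wars: The Clone Wars",
]

kept = [t for t in titles if keep_page(t, namespace_names, whitelist={"Category"})]
print(kept)
```

Here "Star Wars: The Clone Wars" is kept, because "Star Wars" is not a known namespace name, while "User:Example" is still filtered.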

Original comment by oliver.ferschke on 22 Jul 2011 at 3:35


@GoogleCodeExporter
Author

Original comment by oliver.ferschke on 22 Jul 2011 at 3:35

  • Changed state: Fixed

@GoogleCodeExporter
Author

Original comment by oliver.ferschke on 16 Feb 2012 at 1:24

  • Added labels: Milestone-Before-0.8.0
