PHP8 wikipedia bot for bibliographic references

Dispositif/Wikibot

Wikibot

PHP CLI application for a Wikipedia bot.

See https://fr.wikipedia.org/wiki/Utilisateur:CodexBot for the live bot.

  • Correction and completion of bibliographic references on the French Wikipedia, using my legacy code and importing open data (GoogleBooks, OpenLibrary, Bibliothèque nationale de France, Wikidata...) based on the ISBN or other book identifiers. Lots of data cleaning and post-processing, because bibliographic data is sooo serious but inconsistent.

  • Completion of "external links" from the World Wide Web: the bot acts like a web crawler and transforms raw links (http://...) into detailed references with the page's title, author, site name, date, etc. It uses metadata from OpenGraph, JSON-LD, Dublin Core, and Twitter Cards, or falls back to naive prediction from the HTML. Not much data post-processing, because web data is cool but rather consistent (SEO).

  • Detection of dead links (404, DNS failure, etc.) and replacement with archived copies (provided by Wikiwix or the Internet Archive).
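
As a rough illustration of the "external links" step described above, here is a minimal sketch of reading OpenGraph tags from a page and falling back to the HTML title. It is not taken from Wikibot's code: the function name extractLinkMetadata, the returned keys, and the sample HTML are all hypothetical, and the real bot also handles JSON-LD, Dublin Core, and Twitter Cards, which this sketch ignores.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Wikibot): extract a few reference fields
 * from a page's HTML, preferring OpenGraph and falling back to <title>.
 *
 * @return array{title: ?string, site: ?string, url: ?string}
 */
function extractLinkMetadata(string $html): array
{
    $empty = ['title' => null, 'site' => null, 'url' => null];
    if (trim($html) === '') {
        return $empty; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect every non-empty <meta property="og:..."> value.
    $og = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        $content = trim($meta->getAttribute('content'));
        if (str_starts_with($property, 'og:') && $content !== '') {
            $og[$property] = $content;
        }
    }

    // Naive fallback: the text of the first <title> element, if any.
    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $fallbackTitle = $titleNode !== null ? trim($titleNode->textContent) : null;
    if ($fallbackTitle === '') {
        $fallbackTitle = null;
    }

    return [
        'title' => $og['og:title'] ?? $fallbackTitle,
        'site'  => $og['og:site_name'] ?? null,
        'url'   => $og['og:url'] ?? null,
    ];
}

// Usage with an inline sample page (ASCII only, to sidestep charset handling).
$html = '<html><head><title>Fallback title</title>'
    . '<meta property="og:title" content="Example article">'
    . '<meta property="og:site_name" content="Example Site">'
    . '</head><body></body></html>';

$meta = extractLinkMetadata($html);
// $meta['title'] is "Example article", $meta['site'] is "Example Site",
// and $meta['url'] is null because the sample has no og:url tag.
```

Preferring structured metadata and only then guessing from the raw HTML keeps the fallback path small, which fits the observation above that web metadata needs little post-processing.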

Please do not play with this package. These programs can actually modify the live Wikipedias, and proper wiki-etiquette should be followed before running them on any wiki. See https://en.wikipedia.org/wiki/WP:Bot for rules and authorization requests.

Tech stack: PHP >=8.1 (version 1.1 runs on PHP 7.2), RabbitMQ or MySQL, Tor, Composer libraries (Symfony components, addwiki/mediawiki-api, etc.), hexagonal architecture…

Run make at the repository root to list some of the available commands.

[Figure: schemas of workers]

Special thanks to

  • addshore (wiki API)
  • biblys (ISBN formatting)
  • cloudamqp.com (AMQP server)
  • many frwiki users for quality control

Memo :
