LST-Guard watches recent changes in Wikimedia projects and catches edits that may break section transclusions in other pages. This is done by two processes: lst_poller catches and saves all changed section labels, and lst_worker checks whether those changes result in broken transclusions and, if so, updates the section labels in the pages where they are transcluded.
- Files
- Architecture
- Supported languages
- Usage
- Install requirements
- Further development
- Contribute
- License
This repository contains:
- config.ini - contains options for running the program, as well as the credentials of a Wikimedia user (bot) used to edit pages (see details below).
- lst_manager.py - manage and monitor the program.
- app.py - runs lst_poller and lst_worker in the background.
- lst_poller.py - detects changed section labels and stores them in Redis.
- lst_worker.py - checks stored labels and corrects transclusions if necessary.
- localizations.py - syntax details and other language-specific data used to extract label names.
- requirements.txt - list of dependencies necessary to run this program.
LST-Guard consists of two background processes. The first, lst_poller, constantly watches recent changes in a Wikimedia project by reading the EventStreams feed, detects changed section labels and stores them to be checked later. It selects only edits in the configured project (usually Wikisource), in the configured languages (both defined in config.ini) and in namespace 104 (Page:). It then checks whether section labels have been changed in these edits. If so, the old and new labels, together with page and edit information, are stored in the Redis database.
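The filtering step can be sketched as follows. This is a minimal, network-free sketch, not the code in lst_poller.py: it assumes each EventStreams recentchange message has already been decoded from JSON into a dict, and it relies on the `type`, `namespace`, `server_name` and `revision` fields of that feed. The constants and the function name `is_relevant` are hypothetical stand-ins for values that LST-Guard reads from config.ini.

```python
import json

# Hypothetical settings; LST-Guard takes these from the [run on] section of config.ini.
PROJECT = "wikisource"
LANGUAGES = {"en", "de"}
PAGE_NAMESPACE = 104  # the Page: namespace on Wikisource wikis


def is_relevant(event):
    """Return True for edits to Page: pages of a watched language of the project."""
    if event.get("type") != "edit" or event.get("namespace") != PAGE_NAMESPACE:
        return False
    # server_name looks like "en.wikisource.org": language code, then project domain.
    lang, _, domain = event.get("server_name", "").partition(".")
    return lang in LANGUAGES and domain == PROJECT + ".org"


# Example: one decoded recentchange message.
event = json.loads('{"type": "edit", "namespace": 104, "server_name": "en.wikisource.org", '
                   '"title": "Page:Example.djvu/5", "revision": {"old": 100, "new": 101}}')
if is_relevant(event):
    # These revision IDs identify the old and new texts to compare.
    old_id, new_id = event["revision"]["old"], event["revision"]["new"]
```

For a matching edit, `revision["old"]` and `revision["new"]` are the revision IDs whose texts would then be fetched and compared for changed labels.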
The second process, lst_worker, runs at intervals (every 5 minutes by default). If there is new data in Redis, it checks whether any sections of the edited page are transcluded in content pages (main namespace, 0) and, if those transclusions have not already been updated manually, replaces the old labels with the new ones.
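The interval-based batch processing can be pictured with the sketch below. It is an illustration under assumptions, not lst_worker.py itself: a `collections.deque` stands in for the data lst_poller writes to Redis, and `run_worker` and its `handle` callback are hypothetical names. The `cycles` and `sleep` parameters exist only to make the loop testable.

```python
import time
from collections import deque


def run_worker(queue, handle, interval=300.0, cycles=None, sleep=time.sleep):
    """Process all pending label changes, then wait `interval` seconds; repeat.

    `queue` stands in for the Redis data written by lst_poller; `cycles`
    limits the number of rounds (None means run forever).
    """
    rounds = 0
    while cycles is None or rounds < cycles:
        # Drain everything that was stored since the last round.
        while queue:
            handle(queue.popleft())
        rounds += 1
        # Skip the final sleep when a fixed number of rounds was requested.
        if cycles is None or rounds < cycles:
            sleep(interval)


# Example: one round over a single queued label change.
pending = deque([{"page": "Page:Example.djvu/5", "old": "s1", "new": "Jordan, Dorothea"}])
run_worker(pending, print, cycles=1)
```

Batching on a timer rather than reacting to every event gives editors a window to fix transclusions by hand before the bot steps in.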
Both processes are started by app.py. It is preferable not to run this module directly but to use lst_manager.py instead.
Every time a page is edited, lst_poller compares the old and new revision texts of the page and extracts section labels, assuming the following syntax is used (including localizations and minor syntactic variations; see localizations.py):
<section begin="Some Label" />
When the number of section labels in the old and new versions is the same, it assumes that they correspond to each other.
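A minimal sketch of this extraction and pairing follows. It assumes the English tag name only (the localized variants from localizations.py are not covered) and pairs labels by position; the function names are hypothetical and the regular expression is an approximation, not the one used by lst_poller.

```python
import re

# <section begin="Label" />, with double, single or no quotes around the label.
# English tag name only; localized variants are not handled in this sketch.
SECTION_BEGIN = re.compile(
    r"""<section\s+begin\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s/>]+))\s*/?>""",
    re.IGNORECASE,
)


def section_labels(wikitext):
    """Return the labels of all section-begin tags, in order of appearance."""
    labels = []
    for match in SECTION_BEGIN.finditer(wikitext):
        # Exactly one of the three alternatives captured the label.
        labels.append(next(g for g in match.groups() if g is not None))
    return labels


def changed_labels(old_text, new_text):
    """Pair labels by position if the counts match; return the (old, new) pairs that differ."""
    old, new = section_labels(old_text), section_labels(new_text)
    if len(old) != len(new):
        return []  # no safe one-to-one correspondence
    return [(o, n) for o, n in zip(old, new) if o != n]
```

Positional pairing is exactly where this heuristic can go wrong (for example, when one label is removed and another added in the same edit), which is why the count check returns nothing rather than guessing.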
When correcting these labels in transclusions, lst_worker has to recognize three different syntaxes (again, including localizations and minor syntactic variations):
- HTML syntax:
<pages index="Original Page" fromsection="Section Label" tosection="Section Label" />
- Mediawiki syntax:
{{#lst:Original Page|Section Label}}
- Template:
{{page|Original Page|num=Page number|section=Section Label}}
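Replacing a label in these three forms could look like the sketch below. It assumes the English spellings shown above and double-quoted attributes in the `<pages>` tag; `fix_transclusions` is a hypothetical name, not a function from lst_worker.py. Replacement functions (rather than replacement strings) are used so that characters such as backslashes in the new label are inserted literally.

```python
import re


def fix_transclusions(text, page, old, new):
    """Rename section label `old` to `new` wherever `page` is transcluded in `text`.

    Covers the English forms of the three syntaxes only.
    """
    p, o = re.escape(page), re.escape(old)

    # 1. <pages index="page" ... fromsection="old" tosection="old" />
    def fix_pages_tag(tag):
        return re.sub(r'((?:from|to)section\s*=\s*")' + o + '"',
                      lambda m: m.group(1) + new + '"', tag.group(0))
    text = re.sub(r'<pages\b[^>]*\bindex\s*=\s*"' + p + r'"[^>]*>', fix_pages_tag, text)

    # 2. {{#lst:page|old}}
    text = re.sub(r'(\{\{\s*#lst\s*:\s*' + p + r'\s*\|\s*)' + o + r'(\s*\}\})',
                  lambda m: m.group(1) + new + m.group(2), text)

    # 3. {{page|page|num=...|section=old}}
    def fix_page_template(tpl):
        return re.sub(r'(\|\s*section\s*=\s*)' + o + r'(\s*(?:\||\}\}))',
                      lambda m: m.group(1) + new + m.group(2), tpl.group(0))
    text = re.sub(r'\{\{\s*[Pp]age\s*\|\s*' + p + r'\s*\|[^{}]*\}\}', fix_page_template, text)

    return text


before = "{{#lst:Page:Example.djvu/5|s1}}"
after = fix_transclusions(before, "Page:Example.djvu/5", "s1", "Jordan, Dorothea")
# after → {{#lst:Page:Example.djvu/5|Jordan, Dorothea}}
```

Because the old label must be followed by a delimiter (`"`, `|` or `}}`), a label such as `s10` is left alone when `s1` is renamed. A transclusion that an editor has already updated by hand no longer contains the old label, so it is not touched.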
Here is an example of how editing a section label causes a broken transclusion and how LST-Guard handles it:
- in a Wikisource page, the section label s1 is changed to Jordan, Dorothea by an editor;
- the article where this section was transcluded loses its content;
- lst_poller detects the change of the label name in the original page;
- lst_worker corrects the label in the transcluding article.
The current version supports 9 languages:
- English
- German
- Spanish
- Armenian
- Portuguese
- French
- Italian
- Polish
- Russian
To use LST-Guard, Python 3 is required. redis-server must also be installed. On a Debian/Ubuntu machine, install it with this command:
$ apt-get install redis-server
or use your preferred package manager.
Furthermore, the Python libraries redis, configparser, sseclient and requests are required. Install all of them with pip:
$ pip3 install -r requirements.txt
Finally, LST-Guard uses nohup to run in the background; this utility is present on most Linux machines.
Before starting LST-Guard, a Redis server should be running. This can be done with the following command (note that & runs it in the background):
$ redis-server --port 7777 &
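lst_manager.py checks the database before starting, as its start-up output shows. A similar check can be sketched with the standard library alone, relying on Redis accepting plain-text inline commands and answering `PING` with `+PONG`. This is an illustrative assumption, not how lst_manager.py performs its check; `redis_is_running` is a hypothetical name.

```python
import socket


def redis_is_running(host="localhost", port=7777, timeout=2.0):
    """Return True if a Redis server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts "inline" commands: a plain line of text ending in CRLF.
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    print("Redis running:", redis_is_running())
```

If the server requires a password, Redis replies with an error instead of `+PONG`, so this sketch reports False in that case as well.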
Start LST-Guard using lst_manager:
$ ./lst_manager.py -start
If everything is okay, your terminal should print this:
$ ./lst_manager.py -start
Check: Redis DB running OK (host: localhost, port: 7777, db: 0)
Check: config file [config.ini]: Success.
Flushing Redis database.
Starting LST-guard: lst_poller & lst_worker initiated.
$
The status of the processes can be queried with the following command:
$ python3 lst_manager.py -status
An example output could be:
LST-Guard processes:
lst_poller: RUNNING
lst_worker: RUNNING
Run lst_manager without any options to see its full functionality: you can check the status of the two processes, restart or stop them, and check the Redis database and export its contents.
The config.ini file contains the following sections:
- [run on] - contains the project and the language(s) that LST-Guard will watch when running.
- [supported on] - contains the projects and languages that are supported. Change this section only with great caution and at your own risk.
- [credentials] - should contain the username and password of your Wikimedia bot account. Note that these have to be obtained from Special:BotPasswords.
- [redis database] - contains the hostname, port and db number of the Redis database.
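A sketch of reading such a file with configparser follows. Only the section names and the `hostname`, `port` and `db` keys come from this README; the keys shown under [run on] and [credentials] are hypothetical placeholders, and `load_settings` is not part of LST-Guard. In practice the file would be read with `config.read("config.ini")` instead of `read_string`.

```python
import configparser

# Hypothetical example file. Section names and the [redis database] keys come from
# this README; the keys under [run on] and [credentials] are assumptions.
EXAMPLE = """
[run on]
project = wikisource
languages = en, de

[credentials]
username = ExampleBot@lst-guard
password = not-a-real-password

[redis database]
hostname = localhost
port = 7777
db = 0
"""


def load_settings(text):
    """Parse the config text and return the values the two processes would need."""
    config = configparser.ConfigParser()
    config.read_string(text)
    redis_cfg = config["redis database"]
    return {
        "project": config["run on"]["project"],
        "languages": [lang.strip() for lang in config["run on"]["languages"].split(",")],
        # getint converts the string values read from the file.
        "redis": (redis_cfg["hostname"], redis_cfg.getint("port"), redis_cfg.getint("db")),
    }


settings = load_settings(EXAMPLE)
```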
log.txt is the main log and contains all detected changed labels and corrections.
stdout.txt is where the terminal output of the last/current run is dumped and contains all pages that were checked by the bot.
Will be updated
The next stage is to extend lst_poller with some basic NLP to make sure that corresponding labels are correctly identified (as mentioned, the current version assumes that if the number of labels in the old and new versions is the same, they are corresponding labels).
Secondly, we want to add more languages, especially those with many transclusions on Wikisource. (Done.)
Help us add new languages, test the code and find bugs.
We will define the license of LST-Guard soon. Meanwhile, feel free to use, share, copy and modify it however you want, as it is free software.