Bot-Jagwar is a side project which aims to automate editing on the Malagasy Wiktionary as much as possible.
- Linux-like environment (if you're on Windows, WSL will do)
- Python 3.10 or later (pip is used for requirements). It may work on earlier versions of Python 3, but compatibility with those versions is no longer tested.
- PostgreSQL 11 server or later, used with psycopg2
- HAProxy, used for the NLLB inference server
- Supervisor, to manage the services
- PostgREST, used for entry_translator
- Pywikibot, to access Wiktionary through its API
- Screen, to run the services in the background
- Nginx, used for the frontend application
- VueJS, used for the frontend application
- ctranslate2, used for the NLLB inference server, which is used by entry_translator
In the project directory, run install.sh. The Python virtual environment, the scripts and the required configuration will be deployed on the target machine at /opt/botjagwar. To uninstall them, remove that install folder.
If you intend to use the bot for editing on Wiktionary, you need to set up your Pywikibot instance. See Pywikibot's installation manual for details on how to do that.
To confirm that you have a working installation, run test.sh. All tests should pass. However, some of them may fail on the Raspberry Pi because files are not deleted after teardowns.
- Go to /opt/botjagwar
- Run python3 wiktionary_irc.py in a screen instance
- Run python3 dictionary_service.py in a screen instance, serves on 8001
- Run python3 entry_translator.py in a screen instance, serves on 8000
Alternatively, the supervisor script supervisor-botjagwar.conf can help you configure the programs above as services.
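Once the services are started, a quick way to see whether they are reachable is to probe their ports. The following is a minimal sketch, not part of the project: the ports come from the steps above, while the host (localhost) and the helper name is_listening are assumptions made for illustration.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Ports taken from the steps above; running on localhost is an assumption.
SERVICES = {
    "entry_translator.py": 8000,
    "dictionary_service.py": 8001,
}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if is_listening("localhost", port) else "down"
        print(f"{name} (port {port}): {state}")
```

This only shows that something accepts connections on the port; it does not verify that the service behind it is the expected one.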
Connects to the real-time recent changes feed of the French and English Wiktionaries on irc.wikimedia.org and attempts to translate every entry that is created.
This is an IRC client; it connects to the entry_translator.py REST API for translations.
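Messages on the recent changes feed carry mIRC colour codes around the page title, so a client has to strip them before it can read the title. The sketch below illustrates that step only; it is not the parser used by wiktionary_irc.py. The helper names are hypothetical, and the sample line in the usage note is an abbreviated, hand-written illustration rather than captured feed output.

```python
import re

# mIRC formatting: \x03 followed by optional colour digits (fg[,bg]),
# plus the single-byte bold/reset/reverse/italic/underline markers.
_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")
# The page title is wrapped in double square brackets.
_TITLE = re.compile(r"\[\[(.+?)\]\]")


def strip_formatting(message: str) -> str:
    """Remove mIRC colour and style codes from an IRC message."""
    return _FORMATTING.sub("", message)


def extract_title(message: str):
    """Return the first [[title]] found in the message, or None."""
    match = _TITLE.search(strip_formatting(message))
    return match.group(1) if match else None
```

For instance, calling extract_title on "\x0314[[\x0307chat\x0314]]\x034 N\x0310" yields the title chat; a message without a bracketed title yields None.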
Word storage engine. REST API required by wiktionary_irc.py to store and get translations.
The default engine is SQLite; to change it, edit database_uri in conf/config.ini. SQLAlchemy uses this value to connect to the database backend.
This API has been tested and used on MySQL (manual test), SQLite (automatic test) and PostgreSQL (manual test) databases, thanks to SQLAlchemy. For the best performance, and especially if you want to use the frontend application, please use PostgreSQL.
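As an illustration of what a database_uri value looks like, the sketch below reads one from an INI file and reports which backend it selects. It is only a sketch under assumptions: the section name global, the credentials and the database name are hypothetical, and the real layout of conf/config.ini may differ. The URI shapes themselves (sqlite:///path, postgresql://user:password@host:port/dbname, with an optional +driver suffix on the scheme) follow SQLAlchemy's URL format.

```python
import configparser
from urllib.parse import urlsplit

# Hypothetical excerpt; the section name and credentials are assumptions.
SAMPLE_CONFIG = """
[global]
database_uri = postgresql://user:password@localhost:5432/botjagwar
"""


def read_database_uri(text: str, section: str = "global") -> str:
    """Return the database_uri option from INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser[section]["database_uri"]


def backend_of(database_uri: str) -> str:
    """Return the backend name, dropping any '+driver' suffix."""
    scheme = urlsplit(database_uri).scheme
    return scheme.split("+", 1)[0]


if __name__ == "__main__":
    uri = read_database_uri(SAMPLE_CONFIG)
    print(backend_of(uri))
```

A SQLite value would look like sqlite:///path/to/file.db, for which backend_of returns sqlite.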
You might also be interested in the associated dictionary frontend, which provides a more user-friendly interface for managing the dictionary. It lets you edit, link and delete words and definitions, and gives access to a per-language dictionary.
The frontend application makes use of VueJS and Nginx. With a PostgreSQL backend, PostgREST is used to lessen the load on dictionary_service for read operations.
Nginx acts as a proxy which redirects requests to either dictionary_service or the PostgREST API.
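The split between the two upstreams can be pictured as a simple routing decision. The project's actual Nginx rules are not reproduced here, so the sketch below assumes the split is made on the HTTP method, with read-only methods going to PostgREST and everything else to dictionary_service. The upstream addresses are also assumptions: 8001 is the dictionary_service port mentioned earlier, and 3000 is PostgREST's default port.

```python
# Assumed upstream addresses, for illustration only.
POSTGREST = "http://localhost:3000"
DICTIONARY_SERVICE = "http://localhost:8001"

# Methods that never modify data are treated as read operations.
READ_METHODS = {"GET", "HEAD"}


def choose_upstream(method: str) -> str:
    """Pick the upstream for a request, based on its HTTP method."""
    if method.upper() in READ_METHODS:
        return POSTGREST
    return DICTIONARY_SERVICE


if __name__ == "__main__":
    for method in ("GET", "POST", "PUT", "DELETE"):
        print(method, "->", choose_upstream(method))
```

A real configuration may instead route on the request path, or combine both criteria.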
Wiki page handling that also uses translation and page rendering APIs.
Side effects are page updates and creations to the target wiki.
REST service required by wiktionary_irc.py.
The requirements for this script are:
- The NLLB inference server (ctranslate.py), installed and running (see below for more information). You can install it using the install-ctranslate.sh script.
- A running instance of the dictionary_service.py script.
This script uses the NLLB 3.3B model. It has its own requirements, which have minimal impact on the rest of the project.
However, for your convenience, it must be installed separately, as the whole deployment environment requires more than 30 gigabytes of storage. You can install it on a separate machine if you wish. If you do, remember to change the HAProxy settings (translator_<x> backends) in conf/haproxy.conf, so as not to break entry_translator.py.
Independent script to translate non-lemma entries on the English Wiktionary into Malagasy.
Independent script that updates the statistics table for each Wiktionary, Wikipedia and Wikibooks and stores it on the user's subpage on the Malagasy Wiktionary.
Independent script, which, in detail:
- fetches the words created in the last 30 days;
- checks for missing language templates, and translates the language names into Malagasy with a basic phonetic transcription algorithm;
- if the language name could be translated: creates the templates and categories for the missing language; or
- stores a list of untranslated language names in a table, stored on the Malagasy Wiktionary at Mpikambana:<USERNAME>/Lisitry ny kaodim-piteny tsy voafaritra
© 2018 Rado A. (Terakasorotany) -- MIT Licence.