This Go script updates a wiki page with the number of Transkribus credits remaining for the Wikimedia account. It fetches the remaining credits from the Transkribus dashboard and appends a new row to the table on the Data:Wikimedia_OCR,_Transkribus_quota.tab page.
- Have Go installed and working on your system. The official Go documentation covers installation.
- Have an instance of MediaWiki core running on your system, so that you have a local wiki to test with and do not need to modify pages on hosted wikis.
- Since the bot deals with a `.tab` page, the `JsonConfig` extension must be installed as well. Read more about `.tab` pages here.
You can clone the repository by running the following command:
git clone https://github.com/parthiv-m/tr-stat-update
If you wish to fork and then clone the repository, you are welcome to do so!
The environment variables required to run the script are listed in the `.example.env` file.
This script is strictly for the Transkribus account managed by Wikimedia. However, it can be generalised for any Transkribus account.
- Navigate to `Special:BotPasswords` on your local wiki.
- You will be prompted to enter details such as the bot name and to select the grants the bot requires. This bot only needs permission to edit existing pages.
- The next page shows a bot username of the form `username@bot_name` and a password. Enter both in the `.env` file.
The script has no major dependencies apart from the `godotenv` package, which handles the `.env` file. Even so, install the packages listed in the `go.mod` file by running `go get .` in the repository directory.
Once this is done, you are all set to run the script!
In general, the command to run a Go script is `go run <filename>.go`. In our case, this becomes `go run main.go`.
When run without any arguments, the script runs in `development` mode. This is indicated by the logging statement:
Running in development...
Warning
This will modify the publicly available page. Only run if you are sure of what you are doing!
To update the actual wiki page on Commons, run the script as follows:
go run main.go production
This will produce a logging statement that says:
Running in production...
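The mode switch described above can be sketched as follows. This is an illustrative sketch rather than the script's actual code: the function name `modeFromArgs` is hypothetical, and treating any unrecognised argument as development mode is an assumption chosen so that a typo never touches the public page.

```go
package main

import (
	"fmt"
	"os"
)

// modeFromArgs returns "production" only when the first argument is exactly
// "production"; anything else, including no argument at all, falls back to
// "development".
func modeFromArgs(args []string) string {
	if len(args) > 0 && args[0] == "production" {
		return "production"
	}
	return "development"
}

func main() {
	// os.Args[0] is the program name, so the real arguments start at index 1.
	mode := modeFromArgs(os.Args[1:])
	fmt.Printf("Running in %s...\n", mode)
}
```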
If you are not a developer and are not interested in tinkering around with the script, but still would like to run the script from time to time, it is best to download a binary of the script from the releases section.
Note
Currently, binaries are available only for Linux.
Extracting the downloaded `.tar.gz` file with `tar -xvf <file_name>` should produce a `tr-stat-update` file, which is the final executable. You still need to set the appropriate environment variables in the same directory as the downloaded file.
To run the executable, use:
./tr-stat-update production
All logs for the script are stored in a `debug.log` file in the same directory as the script. If you run into any trouble, you might want to check the logs!
The script follows a linear workflow as outlined below:
- First, it authenticates itself to the Transkribus API using the login credentials provided by the user
- Next, it makes a request to fetch the total credits left in the user's Transkribus dashboard
- It then fetches the Data:Wikimedia_OCR,_Transkribus_quota.tab page using the MediaWiki Action API
- Once the contents of the page are available, the script authenticates itself using the credentials of the bot generated by the user
- After the bot has logged in successfully, the script requests a CSRF token for the bot so that it can safely make edits to the wiki page
- Finally, the script adds a new row to the page, along with an appropriate edit summary containing the date and time of the update
- Transkribus is a platform for the text recognition, image analysis, and structure recognition of historical documents. By means of its web interface and a desktop client, it provides users access to a rich set of features to transcribe texts and train custom handwritten text recognition models.
- Wikimedia OCR is a web service and interface for providing OCR text from images hosted on MediaWiki wikis. Transkribus is the newest addition to the set of OCR engines available on the tool. Try it out now!