# Wikidata Bots

> Welcome!
> 
> Please download this notebook and load it into your [PAWS instance](https://hub-paws.wmcloud.org/)

This notebook provides an overview of working with Wikidata via bots using Python frameworks.

Check out the [Wikidata:Bots documentation page](https://www.wikidata.org/wiki/Wikidata:Bots) and [Wikidata:Creating a bot](https://www.wikidata.org/wiki/Wikidata:Creating_a_bot) for a full overview of the process.

**Note**: This notebook uses the [Wikidata item sandbox](https://www.wikidata.org/wiki/Q4115189). If you want to test more extensive changes, please consider using [Test Wikidata](https://test.wikidata.org) instead.

**Note**: Reload the entity each time you want to add new claims. Operations on an object that is still loaded can compound: a statement may be added twice if you start a new operation without reloading, and a broken statement that you have since corrected may still be attached to the object you upload.
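
To make the reload habit concrete, here is a minimal sketch of a fetch-modify-write helper. The name `edit_fresh` and its three callables are hypothetical and not part of any library used in this notebook, and the toy `store` dictionary merely stands in for Wikidata.

```python
from typing import Callable, TypeVar

Entity = TypeVar("Entity")


def edit_fresh(
    fetch: Callable[[], Entity],
    modify: Callable[[Entity], None],
    write: Callable[[Entity], None],
) -> Entity:
    """Fetch a new copy of an entity, apply one change and write it back."""
    entity = fetch()  # Always start from the current remote state.
    modify(entity)
    write(entity)
    return entity


# Toy stand-in for a remote item: a dict of property -> list of values.
store = {"P31": ["Q21281405"]}


def fetch_copy() -> dict:
    return {prop: list(values) for prop, values in store.items()}


def save(entity: dict) -> None:
    store.clear()
    store.update(entity)


def add_used_by(entity: dict) -> None:
    entity.setdefault("P1535", []).append("Q191865")


result = edit_fresh(fetch_copy, add_used_by, save)
```

With Wikibase Integrator, `fetch` could wrap `wbi.item.get(...)` and `write` could call the entity's `write()` method, both of which appear later in this notebook.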

## Introduction

Source: [Wikidata:Bots](https://www.wikidata.org/wiki/Wikidata:Bots)

Wikidata is meant to be edited by both humans and machines, so learning how to use bots is an invaluable step in improving your ability to have an impact on data ingestion and quality. Bots can add [interwiki links](https://www.wikidata.org/wiki/Help:Sitelinks), [labels](https://www.wikidata.org/wiki/Help:Label), [descriptions](https://www.wikidata.org/wiki/Help:Description), [statements](https://www.wikidata.org/wiki/Help:Statements), [sources](https://www.wikidata.org/wiki/Help:Sources), and can even create items, among other things.

**Note**: Wikidata has a strict bot policy to ensure that bots are used in a way that is beneficial to the community and the project as a whole.

> In the case of any damage caused by a bot, the bot operator is asked to stop the bot. Depending on the scale of the damage, an administrator may block the bot. The bot operator is responsible for cleaning up any damage caused by the bot.

A summary of this policy, along with some suggestions:

- Please [create a separate account](https://www.wikidata.org/w/index.php?title=Special:CreateAccount&returnto=Wikidata%3ABots) for your bot activity
  - Name your bot after your normal username and add in the word `bot`
  - The above helps identify bot activity
    - Bot flags in our Data Lake are based on usernames and activity
  - Create a user page for the bot with the [bot template](https://www.wikidata.org/wiki/Template:Bot)
    - Just put `{{Bot|USER_NAME}}` in the user page text
- Add an [assertion](https://www.mediawiki.org/wiki/API:Assert) to your activity to make sure the bot is logged in
  - The Python packages below will do this for you
- Add the [bots page on Wikidata](https://www.wikidata.org/wiki/Wikidata:Bots) to your watched pages as discussions related to bots happen there
- Bots are expected to respect [maxlag](https://www.mediawiki.org/wiki/Manual:Maxlag_parameter) and follow the [API etiquette](https://www.mediawiki.org/wiki/API:Etiquette)
- Request permission for a bot flag at [Wikidata:Requests_for_permissions/Bot](https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot)
  - You're expected to run a test of 50-250 edits
    - Start the permission process before this
    - Or edit the test item for your 50 edits (see below)
  - You should outline the purpose of the bot, and if it expands too much, consider making a new bot flag request
  - If accepted, an admin/bureaucrat will close the request and a bureaucrat will add the flag
- No permission is needed for the following pages
  - The operator/bot's userspace
  - The [Wikidata sandbox](https://www.wikidata.org/wiki/Wikidata:Sandbox)
  - The [Wikidata item sandbox](https://www.wikidata.org/wiki/Q4115189) (what we'll edit!)
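
As a rough illustration of the `assert` and `maxlag` points above, the sketch below adds both parameters to an Action API request and backs off when the API reports a `maxlag` error. The function names, the injected `send` callable and the wait times are assumptions made for this example; the libraries covered in this notebook handle this for you.

```python
import time
from typing import Callable


def with_bot_params(
    params: dict, assert_as: str = "user", maxlag_seconds: int = 5
) -> dict:
    """Return a copy of ``params`` with ``assert`` and ``maxlag`` added."""
    return {
        **params,
        "format": "json",
        "assert": assert_as,  # "user" before the bot flag is granted, "bot" after.
        "maxlag": str(maxlag_seconds),
    }


def call_with_maxlag_retry(
    send: Callable[[dict], dict],
    params: dict,
    max_attempts: int = 5,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call ``send`` and retry with a growing pause while the API reports maxlag."""
    for attempt in range(1, max_attempts + 1):
        response = send(with_bot_params(params))
        if response.get("error", {}).get("code") != "maxlag":
            return response
        if attempt < max_attempts:
            sleep(wait_seconds * attempt)
    raise RuntimeError(f"Servers still lagged after {max_attempts} attempts")
```

Here `send` is whatever function actually posts the parameters to the API and returns the decoded JSON response; injecting it keeps the retry logic easy to try out without touching Wikidata.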

There are also admin bots! In addition to requesting permission for the bot, a request for [admin permissions](https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Administrator) is also necessary.

See [bot requirements](https://www.wikidata.org/wiki/Wikidata:Bots#Bot_requirements) for a full list of what is expected once your bot is up and running.

## Creating a Bot

Source: [Wikidata:Creating a bot](https://www.wikidata.org/wiki/Wikidata:Creating_a_bot)

**Note**: the suggested tool is [Wikibase Integrator](https://github.com/LeMyst/WikibaseIntegrator)

There are three Python-based bot libraries of note:
- [Pywikibot](https://github.com/wikimedia/pywikibot)
  - See: [Wikidata:Creating_a_bot#Pywikibot](https://www.wikidata.org/wiki/Wikidata:Creating_a_bot#Pywikibot)
  - For interacting with the MediaWiki API, so not just Wikidata
  - Note that at time of writing it has incomplete support for lexemes (language data)
- [Wikidata Integrator](https://github.com/SuLab/WikidataIntegrator)
  - See: [Wikidata:Creating_a_bot#Wikidata_Integrator](https://www.wikidata.org/wiki/Wikidata:Creating_a_bot#Wikidata_Integrator)
  - Has high integration with the Wikidata SPARQL endpoint
  - Note that adding references was found to be difficult and is thus not included
- [Wikibase Integrator](https://github.com/LeMyst/WikibaseIntegrator)
  - See: [Wikidata:Creating_a_bot#WikibaseIntegrator](https://www.wikidata.org/wiki/Wikidata:Creating_a_bot#WikibaseIntegrator)
  - The [usage examples](https://github.com/LeMyst/WikibaseIntegrator/tree/master/notebooks)/[docs](https://wikibaseintegrator.readthedocs.io/en/stable/) are expansive
  - Most of what you need is in the [readme](https://github.com/LeMyst/WikibaseIntegrator?tab=readme-ov-file#wikibase-integrator)
  
We'll include a basic example of interacting with the [Wikidata item sandbox](https://www.wikidata.org/wiki/Q4115189) for each of the packages above.

Once our bot is ready, we could host it on [Toolforge](https://wikitech.wikimedia.org/wiki/Help:Toolforge), on a Raspberry Pi with a cron job running the script, or in any other environment that can run Python code at an interval to carry out the bot's assigned tasks.
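
For hosts without cron, the "run at an interval" idea can be sketched as a plain loop. The `run_at_interval` helper, its parameters and the `bot_task` placeholder are hypothetical; with cron or a Toolforge job you would instead run the task once per invocation and let the scheduler handle timing.

```python
import time
from typing import Callable


def run_at_interval(
    task: Callable[[], None],
    interval_seconds: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run ``task`` ``max_runs`` times, pausing between runs.

    Returns the number of completed runs.
    """
    completed = 0
    for run in range(max_runs):
        task()
        completed += 1
        if run < max_runs - 1:  # No need to wait after the final run.
            sleep(interval_seconds)
    return completed


def bot_task() -> None:
    # Placeholder for the bot's real work, such as the edits shown in this notebook.
    print("Bot run finished.")
```

For example, `run_at_interval(bot_task, interval_seconds=3600, max_runs=24)` would run the placeholder task once an hour for a day.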

## Imports

The install of `plantuml` below is for Wikidata Integrator.

In [44]:
%%capture
pip install pywikibot

In [None]:
%%capture
pip install plantuml

In [25]:
%%capture
pip install wikidataintegrator

In [3]:
%%capture
pip install wikibaseintegrator

In [2]:
%load_ext jupyter_black

In [1]:
from datetime import datetime
from getpass import getpass
import os

import pywikibot
from pywikibot.login import ClientLoginManager
from wikibaseintegrator import WikibaseIntegrator, datatypes, wbi_login
from wikibaseintegrator.models import References, Reference
from wikibaseintegrator.wbi_config import config
from wikidataintegrator import wdi_core, wdi_login

For these examples, we'll be using [Andrew McAllister Bot (WMDE)](https://www.wikidata.org/wiki/User:Andrew_McAllister_Bot_(WMDE)).

In [3]:
# Login information for all the tools.
WD_USER = "Andrew McAllister Bot (WMDE)"
WD_PASS = getpass(prompt="Enter your bot account password: ")

Enter your bot account password: ········


In [4]:
# Sandbox item ID that we can experiment with.
WD_ITEM_SANDBOX_QID = "Q4115189"
WD_ITEM_URL_PREFIX = "https://www.wikidata.org/wiki/"

print(f"{WD_ITEM_URL_PREFIX}{WD_ITEM_SANDBOX_QID}")

https://www.wikidata.org/wiki/Q4115189


In [5]:
# For references later. Read the clock once so the three values stay consistent.
today = datetime.now()
current_year = today.year
current_month = today.month
current_day = today.day

## Pywikibot

- [See Pywikibot on GitHub](https://github.com/wikimedia/pywikibot)

Comment: Has a lot of great features, but setup and usage are a bit cumbersome.

Note that when using Pywikibot the following files will be made in your local directory:

- apicache
- pywikibot.lwp
- pywikibot-BOT_NAME.lwp
- throttle.ctrl

Remember to include these files in `.gitignore`.

### Create and Register Configuration File

Before proceeding, create a file `user-config.py` with the following information:

```
family = "wikidata"
mylang = "wikidata"
usernames["wikidata"]["wikidata"] = "YOUR_BOT_OR_USER_NAME"  # noqa
```

For the following process, `Andrew McAllister Bot (WMDE)` was substituted for the user name above. The `# noqa` comment suppresses the linting error that `usernames` is undefined.

**Note**: if you're using [Test Wikidata](https://test.wikidata.org), `mylang` above is `"test"`.
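
If you'd rather generate the file from the notebook, a small helper like the one below could write it. This is a sketch: `write_user_config` and its parameters are made up for this example, and it simply reproduces the three lines shown above. Using `"test"` as the second `usernames` key for Test Wikidata is an assumption that mirrors the `mylang` value.

```python
from pathlib import Path


def write_user_config(
    directory: str, username: str, use_test_wikidata: bool = False
) -> Path:
    """Write a minimal Pywikibot ``user-config.py`` into ``directory``."""
    # Assumption: the site code used for ``mylang`` is also the ``usernames`` key.
    mylang = "test" if use_test_wikidata else "wikidata"
    lines = [
        'family = "wikidata"',
        f'mylang = "{mylang}"',
        f'usernames["wikidata"]["{mylang}"] = {username!r}  # noqa',
    ]
    config_path = Path(directory) / "user-config.py"
    config_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return config_path
```

Calling `write_user_config(os.getcwd(), WD_USER)` before the registration cell below would place the file where this notebook expects it.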

In [8]:
pywikibot.config.register_families_folder(os.getcwd())
os.environ["PYWIKIBOT_DIR"] = os.getcwd()

### Login and Initialize Repo

In [11]:
site = pywikibot.Site("wikidata", "wikidata")

lm = ClientLoginManager(
    site=site,
    user=WD_USER,
    password=WD_PASS,
)
lm.login()

repo = site.data_repository()

Logging in to wikidata:wikidata as Andrew McAllister Bot (WMDE)


In [12]:
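# Note: this looks up the name of status code 0 rather than querying the session.
# `site.logged_in()` reports whether the current session is actually logged in.
pywikibot.login.LoginStatus(0).name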

'AS_USER'

### Access Basic Item Information

In [13]:
item = pywikibot.ItemPage(repo, WD_ITEM_SANDBOX_QID)

In [14]:
item.labels.get("en")

'Wikidata Sandbox'

In [15]:
item.descriptions.get("en")

'This is a sandbox for testing changes to items. Please be gentle with it. Feel free to change anything on this page! For testing links, try adding ones to userpages.'

In [16]:
sandbox_entity_instance_of_qid = "Q" + str(
    item.claims.get("P31")[0].toJSON()["mainsnak"]["datavalue"]["value"]["numeric-id"]
)

print(f"{WD_ITEM_URL_PREFIX}{sandbox_entity_instance_of_qid}")

https://www.wikidata.org/wiki/Q21281405


### Editing the Sandbox Item

In [17]:
# Add the "used by" (P1535) statement referencing "Internet bot" (Q191865).
claim = pywikibot.Claim(repo, "P1535")
target = pywikibot.ItemPage(repo, "Q191865")
claim.setTarget(target)
item.addClaim(claim, summary="Adding claim used by Internet bot")

In [18]:
# Add the "statement is subject of" (P805) qualifier referencing "Pywikibot" (Q15169668).
qualifier = pywikibot.Claim(repo, "P805")
target = pywikibot.ItemPage(repo, "Q15169668")
qualifier.setTarget(target)
claim.addQualifier(
    qualifier, summary="Adding qualifier for statement being subject of Pywikibot."
)

In [22]:
# Add the "stated in" (P248) source referencing "software documentation" (Q181702).
statedin = pywikibot.Claim(repo, "P248")
source = pywikibot.ItemPage(repo, "Q181702")
statedin.setTarget(source)

# Add the "retrieved" (P813) reference for today's date.
retrieved = pywikibot.Claim(repo, "P813")
date = pywikibot.WbTime(year=current_year, month=current_month, day=current_day)
retrieved.setTarget(date)

claim.addSources([statedin, retrieved], summary="Adding sources.")
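
Because re-running the cells above would add the same statement again, a guard like the following could check for an existing claim first. This is a sketch under assumptions: `add_claim_if_missing` is a hypothetical helper, it relies only on the `claims`, `getID()`, `getTarget()` and `addClaim()` members of Pywikibot's item and claim objects, and it ignores qualifiers and references when comparing.

```python
def add_claim_if_missing(item, claim, summary: str) -> bool:
    """Add ``claim`` to ``item`` unless an equal property/target pair exists.

    ``item`` is expected to behave like ``pywikibot.ItemPage`` and ``claim``
    like ``pywikibot.Claim``. Returns True if the claim was added.
    """
    existing_claims = item.claims.get(claim.getID(), [])
    for existing in existing_claims:
        if existing.getTarget() == claim.getTarget():
            return False  # Same property and value are already on the item.

    item.addClaim(claim, summary=summary)
    return True
```

In the editing cell above, `item.addClaim(claim, summary=...)` could then be swapped for `add_claim_if_missing(item, claim, summary=...)`.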

## Wikidata Integrator

- [See Wikidata Integrator on GitHub](https://github.com/SuLab/WikidataIntegrator)

Comment: Was the easiest to get started and work with, but adding references proved too difficult to demonstrate.

### Login and Create Object

In [30]:
login_instance = wdi_login.WDLogin(user=WD_USER, pwd=WD_PASS)

### Access Basic Item Information

In [31]:
item = wdi_core.WDItemEngine(wd_item_id=WD_ITEM_SANDBOX_QID)

In [32]:
item_dict = item.get_wd_json_representation()

In [33]:
item_dict["labels"]["en"]["value"]

'Wikidata Sandbox'

In [34]:
item_dict["descriptions"]["en"]["value"]

'This is a sandbox for testing changes to items. Please be gentle with it. Feel free to change anything on this page! For testing links, try adding ones to userpages.'

In [37]:
sandbox_entity_instance_of_qid = item_dict["claims"]["P31"][0]["mainsnak"]["datavalue"][
    "value"
]["id"]

print(f"https://www.wikidata.org/wiki/{sandbox_entity_instance_of_qid}")

https://www.wikidata.org/wiki/Q21281405
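
The nested lookup above raises a `KeyError` or `IndexError` if the property is missing or the statement has no value. A more defensive version, sketched here as a hypothetical `first_item_qid` helper, works on the same JSON layout and also accepts the `numeric-id` form seen in the Pywikibot section. The sample dictionary is hand-written for illustration.

```python
from typing import Optional


def first_item_qid(entity_json: dict, prop: str) -> Optional[str]:
    """Return the QID of the first item-valued statement for ``prop``, if any."""
    for statement in entity_json.get("claims", {}).get(prop, []):
        datavalue = statement.get("mainsnak", {}).get("datavalue")
        if not datavalue:  # "somevalue" / "novalue" snaks carry no datavalue.
            continue
        value = datavalue.get("value")
        if not isinstance(value, dict):  # Strings, for example, are not item values.
            continue
        if "id" in value:
            return value["id"]
        if "numeric-id" in value:
            return f"Q{value['numeric-id']}"
    return None


sample = {
    "claims": {
        "P31": [
            {
                "mainsnak": {
                    "datavalue": {
                        "value": {"entity-type": "item", "numeric-id": 21281405},
                        "type": "wikibase-entityid",
                    }
                }
            }
        ]
    }
}

print(first_item_qid(sample, "P31"))  # Q21281405
print(first_item_qid(sample, "P279"))  # None
```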


### Editing the Sandbox Item

In [132]:
# Add the "different from" (P1889) statement referencing "sandpit" (Q213454).
statements = [wdi_core.WDItemID(value="Q213454", prop_nr="P1889")]

In [133]:
wd_item = wdi_core.WDItemEngine(wd_item_id=WD_ITEM_SANDBOX_QID, data=statements)

In [134]:
wd_item.write(login_instance)

'Q4115189'

## Wikibase Integrator

- [See Wikibase Integrator on GitHub](https://github.com/LeMyst/WikibaseIntegrator)

Comment: Seems to be the best option for Wikidata. Setup effort falls between that of the other two tools, and the library enforces best practices. Editing Wikidata and including references is quite simple.

### Login and Create Object

In [6]:
# Make a user agent string that clearly states the bot account.
# The user page URL uses underscores so that it contains no spaces.
USER_AGENT_STRING = "Andrew McAllister Bot (WMDE)/1.0 (https://www.wikidata.org/wiki/User:Andrew_McAllister_Bot_(WMDE))"

In [7]:
config["USER_AGENT"] = USER_AGENT_STRING
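
A helper such as the following could assemble the user agent from its parts, so that the bot name, version and contact page stay consistent. `build_user_agent` is a hypothetical function for this sketch; it follows the `name/version (contact)` pattern used above and replaces spaces with underscores in the user page URL.

```python
def build_user_agent(bot_name: str, version: str = "1.0") -> str:
    """Build a user agent string that points to the bot's Wikidata user page."""
    user_page = "https://www.wikidata.org/wiki/User:" + bot_name.replace(" ", "_")
    return f"{bot_name}/{version} ({user_page})"


print(build_user_agent("Andrew McAllister Bot (WMDE)"))
```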

In [8]:
WD_ACTION_API_ENDPOINT = "https://www.wikidata.org/w/api.php"

In [9]:
login_wikidata = wbi_login.Login(
    user=WD_USER,
    password=WD_PASS,
    mediawiki_api_url=WD_ACTION_API_ENDPOINT,
)



In [10]:
# Add login credentials to your WikibaseIntegrator object so they're applied to all requests.
wbi = WikibaseIntegrator(login=login_wikidata)

### Access Basic Item Information

In [43]:
sandbox_entity = wbi.item.get(
    WD_ITEM_SANDBOX_QID,
    mediawiki_api_url=WD_ACTION_API_ENDPOINT,
)

In [44]:
sandbox_entity.labels.get("en").value

'Wikidata Sandbox'

In [45]:
sandbox_entity.descriptions.get("en").value

'This is a sandbox for testing changes to items. Please be gentle with it. Feel free to change anything on this page! For testing links, try adding ones to userpages.'

In [46]:
sandbox_entity_instance_of_qid = sandbox_entity.claims.get("P31")[0].mainsnak.datavalue[
    "value"
]["id"]

print(f"https://www.wikidata.org/wiki/{sandbox_entity_instance_of_qid}")

https://www.wikidata.org/wiki/Q21281405


### Editing the Sandbox Item

In [47]:
# Add the "part of" (P361) reference for Wikidata (Q2013).
# Add the "retrieved" (P813) reference for today's date.
# Add the "Sandbox-External identifier" (P2536) reference for '2013'.
references = References()

reference_item = Reference()
reference_time = Reference()
reference_external_id = Reference()

reference_item.add(datatypes.Item(value="Q2013", prop_nr="P361"))

reference_time.add(
    datatypes.Time(
        prop_nr="P813", time="+" + datetime.now().strftime("%Y-%m-%d") + "T00:00:00Z"
    )
)

reference_external_id.add(datatypes.ExternalID(value="2013", prop_nr="P2536"))

references.add(reference_item)
references.add(reference_time)
references.add(reference_external_id)

<References @bed510 _References__references=[<Reference @1a2190 _Reference__hash=None _Reference__snaks=<Snaks @254110 snaks={'P361': [<Snak @cd0950 _Snak__snaktype=<WikibaseSnakType.KNOWN_VALUE: 'value'> _Snak__property_number='P361' _Snak__hash=None _Snak__datavalue={'value': {'entity-type': 'item', 'numeric-id': 2013, 'id': 'Q2013'}, 'type': 'wikibase-entityid'} _Snak__datatype='wikibase-item'>]}> _Reference__snaks_order=[]>, <Reference @255090 _Reference__hash=None _Reference__snaks=<Snaks @5ee990 snaks={'P813': [<Snak @447590 _Snak__snaktype=<WikibaseSnakType.KNOWN_VALUE: 'value'> _Snak__property_number='P813' _Snak__hash=None _Snak__datavalue={'value': {'time': '+2024-04-15T00:00:00Z', 'before': 0, 'after': 0, 'precision': 11, 'timezone': 0, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}, 'type': 'time'} _Snak__datatype='time'>]}> _Reference__snaks_order=[]>, <Reference @4f75d0 _Reference__hash=None _Reference__snaks=<Snaks @447690 snaks={'P2536': [<Snak @3cba90 _Sna
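
The time value above is built by string concatenation. A small helper makes the expected format explicit; `to_wikibase_day` is a hypothetical name, and the sketch only covers day-precision dates in the common era, matching the `+YYYY-MM-DDT00:00:00Z` shape visible in the output above.

```python
from datetime import date


def to_wikibase_day(day: date) -> str:
    """Format a date as a day-precision Wikibase time string."""
    return f"+{day.year:04d}-{day.month:02d}-{day.day:02d}T00:00:00Z"


print(to_wikibase_day(date(2024, 4, 15)))  # +2024-04-15T00:00:00Z
```

The result could be passed as the `time` argument of `datatypes.Time`, as in the cell above.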

In [48]:
# Add the "maintained by" (P126) statement referencing the "Tester" family name (Q37528637).
# The claim is an item-valued statement, so the variable is named accordingly.
claim_item = datatypes.Item(prop_nr="P126", value="Q37528637", references=references)
sandbox_entity.claims.add(claim_item)

<Claims @ba8d50 _Claims__claims={'P31': [<Item @ba8310 _Claim__mainsnak=<Snak @ba9d10 _Snak__snaktype=<WikibaseSnakType.KNOWN_VALUE: 'value'> _Snak__property_number='P31' _Snak__hash='d5a7ba441737489875ed734038d09d2bd5e3e0f0' _Snak__datavalue={'value': {'entity-type': 'item', 'numeric-id': 21281405, 'id': 'Q21281405'}, 'type': 'wikibase-entityid'} _Snak__datatype='wikibase-item'> _Claim__type='statement' _Claim__qualifiers=<Qualifiers @baa4d0 _Qualifiers__qualifiers={}> _Claim__qualifiers_order=[] _Claim__id='Q4115189$c99d6ad2-4c86-8b3a-5399-3ab1ac6c369b' _Claim__rank=<WikibaseRank.NORMAL: 'normal'> _Claim__removed=False _Claim__references=<References @ba8750 _References__references=[<Reference @baa450 _Reference__hash='67f594d1feef711b0d459ff142a881f70c90365a' _Reference__snaks=<Snaks @bab250 snaks={'P854': [<Snak @ba9b10 _Snak__snaktype=<WikibaseSnakType.KNOWN_VALUE: 'value'> _Snak__property_number='P854' _Snak__hash='a824a848663de576ada9225a891673a56d7f93c0' _Snak__datavalue={'value

In [49]:
sandbox_entity.write()

<ItemEntity @905710 _BaseEntity__api=<wikibaseintegrator.wikibaseintegrator.WikibaseIntegrator object at 0x71174a5b5990>
	 _BaseEntity__title='Q4115189'
	 _BaseEntity__pageid=4246474
	 _BaseEntity__lastrevid=2129943157
	 _BaseEntity__type='item'
	 _BaseEntity__id='Q4115189'
	 _BaseEntity__claims=<Claims @5d6d90 _Claims__claims={'P31': [<Item @31ad10 _Claim__mainsnak=<Snak @206b50 _Snak__snaktype=<WikibaseSnakType.KNOWN_VALUE: 'value'> _Snak__property_number='P31' _Snak__hash='d5a7ba441737489875ed734038d09d2bd5e3e0f0' _Snak__datavalue={'value': {'entity-type': 'item', 'numeric-id': 21281405, 'id': 'Q21281405'}, 'type': 'wikibase-entityid'} _Snak__datatype='wikibase-item'> _Claim__type='statement' _Claim__qualifiers=<Qualifiers @5dd6d0 _Qualifiers__qualifiers={}> _Claim__qualifiers_order=[] _Claim__id='Q4115189$c99d6ad2-4c86-8b3a-5399-3ab1ac6c369b' _Claim__rank=<WikibaseRank.NORMAL: 'normal'> _Claim__removed=False _Claim__references=<References @75bf50 _References__references=[<Reference

### Creating a New Item

Feel free to try out the [item creation example](https://github.com/LeMyst/WikibaseIntegrator/blob/master/notebooks/item_create_new.ipynb) on [Test Wikidata](https://test.wikidata.org/wiki/Wikidata:Main_Page)!