# Wikibase Updater Bot
This notebook contains a simple implementation for downloading data from Wikidata, modifying it, and uploading it again either to Wikidata or to a local Wikibase instance. It uses WikibaseIntegrator as its main tool for downloading and uploading entities (and, in a few cells, the older WikidataIntegrator library). See https://github.com/LeMyst/WikibaseIntegrator and https://wikibaseintegrator.readthedocs.io/en/latest/ for more info.

Some parts of this notebook also require a running local Wikibase. To create one, follow the instructions at https://www.mediawiki.org/wiki/Wikibase/Docker.

## Using WikibaseIntegrator


### Installing and configuring WikibaseIntegrator

In [52]:
!pip install wikidataintegrator
!pip install wikibaseintegrator



### Editing Wikidata with WikibaseIntegrator

This code is a brief example of how to link a scientific article and a software item in Wikidata. We will add an article-to-software link and a software-to-article link, using Apache SystemDS as an example. Keep in mind that WikibaseIntegrator also supports SPARQL queries through wbi_helpers.execute_sparql_query(), which can be used to obtain Wikidata Qnodes.
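A minimal sketch of the SPARQL route mentioned above, assuming WikibaseIntegrator is installed and its configuration points to Wikidata (as in the retrieval cell below). `execute_sparql_query` returns the standard SPARQL JSON result; the helper names `qids_from_sparql_results` and `find_items` are hypothetical and only illustrate how to turn the returned entity URIs into QIDs.

```python
def qids_from_sparql_results(results, var="item"):
    """Return the QIDs bound to ?var in a SPARQL JSON result (e.g. 'Q42' from '.../entity/Q42')."""
    return [
        binding[var]["value"].rsplit("/", 1)[-1]
        for binding in results["results"]["bindings"]
        if var in binding  # skip rows where ?var is unbound
    ]


def find_items(query, var="item"):
    """Run a SPARQL query through WikibaseIntegrator and return the matching QIDs."""
    # Imported here so the parsing helper above also works without the library installed
    from wikibaseintegrator.wbi_helpers import execute_sparql_query

    return qids_from_sparql_results(execute_sparql_query(query), var)


# Example: one scholarly article (Q13442814) from Wikidata
# print(find_items("SELECT ?item WHERE { ?item wdt:P31 wd:Q13442814 . } LIMIT 1"))
```

Keeping the parsing separate from the network call makes it easy to reuse the same helper with results coming from the local wikibase's SPARQL endpoint.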

#### Retrieve entities from Wikidata

In [54]:
from wikibaseintegrator import WikibaseIntegrator
from wikibaseintegrator.wbi_config import config as wbi_config

# Default to the Wikidata configuration, in case the target graph was changed earlier
wbi_config['MEDIAWIKI_API_URL'] = 'https://www.wikidata.org/w/api.php'
wbi_config['SPARQL_ENDPOINT_URL'] = 'https://query.wikidata.org/sparql'
wbi_config['WIKIBASE_URL'] = 'https://www.wikidata.org'

wbi = WikibaseIntegrator()

scientific_article='Q59517993'
software='Q22681943'

article_wikidata_item = wbi.item.get(entity_id=scientific_article)

software_wikidata_item = wbi.item.get(entity_id=software)

# To check that the installation and the retrieval worked, print the JSON representation of both items
print('ARTICLE\n', article_wikidata_item.get_json(), '\n')
#print('SOFTWARE\n', software_wikidata_item.get_json())

# Optional: save the JSON to a file for better visualization
# import json
# with open("art.json", "w", encoding="utf-8") as f:
#     json.dump(article_wikidata_item.get_json(), f, ensure_ascii=False, indent=2)

ARTICLE
 {'labels': {'en': {'language': 'en', 'value': 'WIDOCO: A Wizard for Documenting Ontologies'}, 'nl': {'language': 'nl', 'value': 'WIDOCO: A Wizard for Documenting Ontologies'}}, 'descriptions': {'nl': {'language': 'nl', 'value': 'wetenschappelijk artikel'}, 'uk': {'language': 'uk', 'value': 'наукова стаття, опублікована у 2017'}, 'ast': {'language': 'ast', 'value': 'artículu científicu'}}, 'aliases': {}, 'type': 'item', 'claims': {'P356': [{'mainsnak': {'snaktype': 'value', 'property': 'P356', 'datatype': 'external-id', 'datavalue': {'value': '10.1007/978-3-319-68204-4_9', 'type': 'string'}}, 'type': 'statement', 'id': 'Q59517993$4E38D98F-F087-49CC-B786-0EE7C5B8F593', 'rank': 'normal'}], 'P31': [{'mainsnak': {'snaktype': 'value', 'property': 'P31', 'datatype': 'wikibase-item', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 13442814, 'id': 'Q13442814'}, 'type': 'wikibase-entityid'}}, 'type': 'statement', 'id': 'Q59517993$C42AF837-9F37-4366-98F8-80F67F3B8229', 'rank


#### Login
To edit items in Wikidata, we must first log in. These lines let the script 'impersonate' us when making edits. Ideally, we should use a bot account and log in with OAuth.

In [55]:
from wikibaseintegrator.wbi_config import config as wbi_config
from wikibaseintegrator import wbi_login


# Change these to a valid Wikidata username and password
wikidata_user="<your-wikidata-username>"
wikidata_pwd="<your-wikidata-password>"

# Set the USER_AGENT config parameter, which is sent as the User-Agent header. See https://www.wikidata.org/wiki/Wikidata:Data_access for more info
wbi_config['USER_AGENT'] = 'MyWikibaseBot/1.0 (https://www.wikidata.org/wiki/User:TrialAndError2)'

# Log in with the Wikidata account data defined above
login_instance = wbi_login.Clientlogin(user=wikidata_user, password=wikidata_pwd)
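The cell above logs in with the main account password. A hedged sketch of the bot-account route suggested in the text: create a bot password at Special:BotPasswords (its username has the form `Name@BotName`), keep the credentials out of the notebook in environment variables, and log in with `wbi_login.Login`, which WikibaseIntegrator provides for bot passwords. The environment-variable names and both helper functions are hypothetical.

```python
import os


def read_credentials(user_var="WIKIBASE_BOT_USER", password_var="WIKIBASE_BOT_PASSWORD"):
    """Read a username/password pair from environment variables instead of hard-coding them."""
    user = os.environ.get(user_var)
    password = os.environ.get(password_var)
    if not user or not password:
        raise RuntimeError(f"Set the {user_var} and {password_var} environment variables first")
    return user, password


def bot_password_login(user_var="WIKIBASE_BOT_USER", password_var="WIKIBASE_BOT_PASSWORD"):
    """Log in with a bot password; the API URL is taken from the current wbi_config."""
    from wikibaseintegrator import wbi_login

    user, password = read_credentials(user_var, password_var)
    return wbi_login.Login(user=user, password=password)


# login_instance = bot_password_login()
```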

#### Upload changes to wikidata
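This section does not contain code yet. A hedged sketch of what it could do, assuming the `login_instance` created in the login cell and the Wikidata configuration set at the top of the notebook. The property choices are assumptions to check against Wikidata's conventions before writing: P921 ("main subject") for the article-to-software link and P1343 ("described by source") for the software-to-article link. `LINKS` and `write_links` are hypothetical names.

```python
ARTICLE_QID = "Q59517993"
SOFTWARE_QID = "Q22681943"

# (subject, property, object) links to add; the property choices are assumptions
LINKS = [
    (ARTICLE_QID, "P921", SOFTWARE_QID),   # article -> main subject -> software
    (SOFTWARE_QID, "P1343", ARTICLE_QID),  # software -> described by source -> article
]


def write_links(links, login):
    """Add each link as an item-valued statement on its subject and save the subject."""
    from wikibaseintegrator import WikibaseIntegrator
    from wikibaseintegrator.datatypes import Item

    wbi = WikibaseIntegrator(login=login)
    for subject, prop, obj in links:
        entity = wbi.item.get(entity_id=subject)
        entity.claims.add([Item(value=obj, prop_nr=prop)])
        entity.write(summary=f"Link {subject} to {obj} via {prop}")


# Run this while wbi_config still points to Wikidata:
# write_links(LINKS, login_instance)
```

Both edits go to live Wikidata, so it is worth trying the same code on the local wikibase (or on test.wikidata.org) first.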

### Uploading Wikidata entities to your local wikibase


#### Getting entities from wikidata

In this case, we will get the entities from Wikidata with a SPARQL query. Reading data this way does not require being logged in to Wikidata.

In [56]:
from wikidataintegrator import wdi_core

# SPARQL query to fetch entities
sparql_query = """
SELECT DISTINCT ?item WHERE {
  ?item wdt:P31 wd:Q13442814.
}
LIMIT 1
"""

# Execute the query
results = wdi_core.WDItemEngine.execute_sparql_query(sparql_query)

# The query only returns the entity URIs of the selected items. To access their statements, we need to retrieve each item with WDItemEngine
for result in results["results"]["bindings"]:
    entity_id = result["item"]["value"].split("/")[-1]

    # Retrieve the item by its ID
    entity = wdi_core.WDItemEngine(wd_item_id=entity_id)

    # Print the English label of the returned item
    print(entity.get_wd_json_representation()['labels']['en']['value'])
    

1178 Irmela


#### Changing the configuration to target the local wikibase
After retrieving the desired items, we need to point WikibaseIntegrator to our local wikibase. The following code switches the configuration to the default settings of a local wikibase deployed on Docker using https://github.com/wmde/wikibase-release-pipeline/tree/main/example. It then retrieves one entity to check that the connection works.

In [57]:
from wikibaseintegrator import WikibaseIntegrator
from wikibaseintegrator.wbi_config import config as wbi_config

# Change the wbi configuration to target the local wikibase
# CHANGE THIS TO YOUR DEPLOYMENT'S SPECIFIC URLs
wbi_config['MEDIAWIKI_API_URL'] = 'http://localhost:80/api.php'
wbi_config['SPARQL_ENDPOINT_URL'] = 'http://localhost:8834/proxy/wdqs/bigdata/namespace/wdq/sparql'
wbi_config['WIKIBASE_URL'] = 'http://localhost:80'

# Get Q1 from the local wikibase.
# Q1 must be a valid entity in your local wikibase
entity = 'Q1'

wbi = WikibaseIntegrator()

item1 = wbi.item.get(entity_id=entity)

print('Q1: ', item1.get_json())




Q1:  {'labels': {'en': {'language': 'en', 'value': 'asd'}}, 'descriptions': {'en': {'language': 'en', 'value': 'desde'}}, 'aliases': {}, 'type': 'item', 'claims': {}, 'id': 'Q1'}
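The notebook switches `wbi_config` between Wikidata and the local wikibase by hand in several cells. A minimal sketch of keeping both targets as presets and switching with one call; `WIKIDATA`, `LOCAL_WIKIBASE` and `use_target` are hypothetical names, and the local URLs are simply the ones from the cell above. Since a login object may keep the API URL it was created with, it is safest to log in again after switching targets.

```python
# Hypothetical presets; adjust the local URLs to your deployment
WIKIDATA = {
    "MEDIAWIKI_API_URL": "https://www.wikidata.org/w/api.php",
    "SPARQL_ENDPOINT_URL": "https://query.wikidata.org/sparql",
    "WIKIBASE_URL": "https://www.wikidata.org",
}

LOCAL_WIKIBASE = {
    "MEDIAWIKI_API_URL": "http://localhost:80/api.php",
    "SPARQL_ENDPOINT_URL": "http://localhost:8834/proxy/wdqs/bigdata/namespace/wdq/sparql",
    "WIKIBASE_URL": "http://localhost:80",
}


def use_target(config, target):
    """Overwrite the endpoint URLs in a config dict (such as wbi_config) with those of one target."""
    config.update(target)  # other keys, e.g. USER_AGENT, are left untouched
    return config


# Usage, assuming WikibaseIntegrator is installed:
# from wikibaseintegrator.wbi_config import config as wbi_config
# use_target(wbi_config, LOCAL_WIKIBASE)
```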


#### Logging in to your local wikibase

The process is the same as when logging in to Wikidata. Change the username and password accordingly, as well as the user page URL in the USER_AGENT config parameter.

In [58]:
from wikibaseintegrator.wbi_config import config as wbi_config
from wikibaseintegrator import wbi_login

#Change these to a valid username and password on your local wikibase
wikibase_user="<your-wikibase-username>"
wikibase_pwd="<your-wikibase-password>"

#change the USER_AGENT config parameter to edit the User-Agent header. See https://www.wikidata.org/wiki/Wikidata:Data_access for more info
wbi_config['USER_AGENT'] = 'MyWikibaseBot/1.0 (http://localhost/wiki/User:<your-wikibase-username>)'

# Log in to the local wikibase with the account data defined above
login_instance = wbi_login.Clientlogin(user=wikibase_user, password=wikibase_pwd)

#### Creating entities in your local wikibase

One limitation of the WikibaseIntegrator approach is that it cannot import entities directly. Instead, we have to create a new entity and recreate the statements it has on Wikidata. The following code creates a new entity in the local wikibase, assigns it the label of the corresponding Wikidata entity, and adds an "instance of scholarly article" statement. For this to work, the item "scholarly article" and the property "instance of" must already exist in the local wikibase; the code assumes their local IDs are Q2 and P1 respectively.
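A hedged sketch of creating those two prerequisites with WikibaseIntegrator, assuming a `WikibaseIntegrator` instance logged in to the local wikibase. `create_schema` is a hypothetical helper; it should be run only once, because Wikibase rejects a second property with the same label, and the IDs it returns are the ones to use instead of P1 and Q2 if they differ.

```python
def create_schema(wbi):
    """Create the 'instance of' property and the 'scholarly article' item; return their new IDs."""
    # Labels mirror Wikidata's P31 and Q13442814; the local IDs are assigned by the wikibase
    prop = wbi.property.new(datatype="wikibase-item")
    prop.labels.set(language="en", value="instance of")
    prop = prop.write()

    article_class = wbi.item.new()
    article_class.labels.set(language="en", value="scholarly article")
    article_class = article_class.write()

    return prop.id, article_class.id


# instance_of_pid, scholarly_article_qid = create_schema(WikibaseIntegrator(login=login_instance))
```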

In [59]:

from wikibaseintegrator import WikibaseIntegrator
from wikibaseintegrator.datatypes import Item

# Use the login to the local wikibase from the previous cell
wbi = WikibaseIntegrator(login=login_instance)

for result in results["results"]["bindings"]:
    # Parse the results from the SPARQL query
    entity_id = result["item"]["value"].split("/")[-1]

    # Get the entity from Wikidata
    entity = wdi_core.WDItemEngine(wd_item_id=entity_id)
    ent_json = entity.get_wd_json_representation()

    # Create the new entity
    item = wbi.item.new()

    # Assign the same English label it has on Wikidata
    item.labels.set(language='en', value=ent_json['labels']['en']['value'])

    # Add the "instance of scholarly article" statement
    # (P1 and Q2 are the local IDs of "instance of" and "scholarly article"; adjust them to your wikibase)
    instance = Item(value='Q2', prop_nr='P1')

    # Statements go into a list, since multiple statements can be added at once
    data = [instance]
    item.claims.add(data)
    item.write()



Notice that, since a new item is created each time we import an entity, importing the same Wikidata entity several times creates duplicate items on the local wikibase.
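The import cell copies only the English label, and it raises a `KeyError` if the Wikidata item has no English label. A minimal sketch of copying every label and description instead, assuming the entity JSON has the usual `{language: {"language": ..., "value": ...}}` layout shown in the outputs above. `terms_from_entity_json` and `copy_terms` are hypothetical helpers; depending on its configuration, the local wikibase may not accept every language code that Wikidata uses.

```python
def terms_from_entity_json(entity_json, kind="labels"):
    """Map language code -> text for one term type ('labels' or 'descriptions')."""
    return {lang: term["value"] for lang, term in entity_json.get(kind, {}).items()}


def copy_terms(source_json, target_item):
    """Set every label and description from source_json on an item that has not been written yet."""
    for lang, text in terms_from_entity_json(source_json, "labels").items():
        target_item.labels.set(language=lang, value=text)
    for lang, text in terms_from_entity_json(source_json, "descriptions").items():
        target_item.descriptions.set(language=lang, value=text)
    return target_item


# Usage inside the import loop, replacing the English-only label line:
# item = copy_terms(ent_json, wbi.item.new())
```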

#### Using mappings to avoid multiple entity creation

The following code uses a JSON file as a mapping that links each Wikidata QID to the QID of the corresponding item in the local wikibase, so that entities that were already imported are not created again.

In [101]:

import json
import os

filename = 'mapping.json'

if not os.path.exists(filename):
    # If the file doesn't exist, create an empty mapping
    with open(filename, 'w') as f:
        json.dump({}, f)

# Load the Wikidata QID -> local wikibase QID mapping
with open(filename, 'r') as f:
    mapping = json.load(f)

print(mapping)



{'Q136796': 'Q53'}


In [102]:
for result in results["results"]["bindings"]:
    # Parse the results from the SPARQL query
    entity_id = result["item"]["value"].split("/")[-1]

    if entity_id not in mapping:
        # Get the entity from Wikidata
        entity = wdi_core.WDItemEngine(wd_item_id=entity_id)
        ent_json = entity.get_wd_json_representation()

        # Create the new entity
        item = wbi.item.new()

        # Assign the same English label it has on Wikidata
        item.labels.set(language='en', value=ent_json['labels']['en']['value'])

        # Add the "instance of scholarly article" statement (local IDs P1 and Q2)
        instance = Item(value='Q2', prop_nr='P1')

        # Statements go into a list, since multiple statements can be added at once
        data = [instance]
        item.claims.add(data)
        item = item.write()

        # Record which local item corresponds to this Wikidata item
        mapping[entity_id] = item.id
    else:
        print(entity_id, ' is already imported as ', mapping[entity_id])
        # Retrieve the existing local item instead of creating a new one
        item = wbi.item.get(entity_id=mapping[entity_id])

# Save the updated mapping
with open(filename, 'w') as f:
    json.dump(mapping, f)


Q136796  is already imported as  Q53
{'Q136796': 'Q53'}
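The cell above saves the mapping only after the whole loop finishes, so an error halfway through loses the IDs of items that were already created. A minimal sketch of a "get or create" helper that saves after every creation and writes the file atomically (to a temporary file, then `os.replace`). `load_mapping`, `save_mapping` and `get_or_create` are hypothetical names, and `create` stands for any function that creates the local item and returns its new QID, for example a wrapper around the creation branch of the cell above.

```python
import json
import os
import tempfile


def load_mapping(path):
    """Load the Wikidata QID -> local QID mapping, or an empty one if the file does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_mapping(mapping, path):
    """Write the mapping to a temporary file in the same folder, then atomically replace the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2, sort_keys=True)
    os.replace(tmp_path, path)


def get_or_create(wikidata_qid, mapping, path, create):
    """Return the local QID for wikidata_qid, calling create(wikidata_qid) only if it is unmapped."""
    if wikidata_qid not in mapping:
        mapping[wikidata_qid] = create(wikidata_qid)
        save_mapping(mapping, path)  # persist immediately so a later failure does not lose it
    return mapping[wikidata_qid]
```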


### Additional testing
These are other approaches to importing entities from Wikidata without using WikidataIntegrator or WikibaseIntegrator. However, this is still a work in progress, and none of them has been shown to work yet.

#### Using requests and the API

In [None]:
import json
import requests

# Replace these values with your Wikidata and Wikibase installation details
WIKIDATA_ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{}.json"
WIKIBASE_API_ENDPOINT = "http://localhost:80/w/api.php"  # adjust to your deployment
HEADERS = {"User-Agent": "MyWikibaseBot/1.0 (http://localhost/wiki/User:<your-wikibase-username>)"}

# Reading from Wikidata needs no login; only the local Wikibase session is authenticated
local_session = requests.Session()
local_session.headers.update(HEADERS)

# Log in to the local Wikibase with action=clientlogin
login_token = local_session.get(WIKIBASE_API_ENDPOINT, params={
    "action": "query", "meta": "tokens", "type": "login", "format": "json"
}).json()["query"]["tokens"]["logintoken"]
login_response = local_session.post(WIKIBASE_API_ENDPOINT, data={
    "action": "clientlogin",
    "username": wikibase_user,
    "password": wikibase_pwd,
    "logintoken": login_token,
    "loginreturnurl": "http://localhost/",
    "format": "json",
})
if login_response.json()["clientlogin"]["status"] != "PASS":
    raise ValueError("Unable to authenticate with the local Wikibase")

# CSRF token required for edits
csrf_token = local_session.get(WIKIBASE_API_ENDPOINT, params={
    "action": "query", "meta": "tokens", "format": "json"
}).json()["query"]["tokens"]["csrftoken"]


# Define function to download and upload an entity
def download_and_upload_entity(entity_id, languages=("en",)):
    """Copy the labels, descriptions and aliases of a Wikidata item into a new local item."""
    # Download the entity JSON from Wikidata
    response = requests.get(WIKIDATA_ENTITY_URL.format(entity_id), headers=HEADERS)
    response.raise_for_status()
    entity = response.json()["entities"][entity_id]

    # Claims are left out: they point to Wikidata property and item IDs that do not exist locally.
    # Only the selected languages are copied, since the local Wikibase may not accept every language code.
    data = {
        key: {lang: value for lang, value in entity.get(key, {}).items() if lang in languages}
        for key in ("labels", "descriptions", "aliases")
    }

    # Create a new item on the local Wikibase (its ID will differ from entity_id)
    response = local_session.post(WIKIBASE_API_ENDPOINT, data={
        "action": "wbeditentity",
        "new": "item",
        "data": json.dumps(data),
        "token": csrf_token,
        "format": "json",
    })
    result = response.json()
    if result.get("success") != 1:
        raise ValueError(f"Error uploading entity {entity_id} to Wikibase: {result.get('error')}")

    new_id = result["entity"]["id"]
    print(f"Entity {entity_id} successfully uploaded to Wikibase as {new_id}")
    return new_id


# Example usage: download and upload Q42 (Douglas Adams)
download_and_upload_entity("Q42")


#### Using RaiseWikibase

#### Using Pywikibot