# Populate the portal using the Wikibase API
Goal of this notebook: import the data prepared in `filter_papers_by_software.ipynb` into the data structure set up in `import_wikidata_properties.ipynb`.

The documentation of the `wbeditentity` API module used throughout this notebook is [here](https://www.wikidata.org/w/api.php?action=help&modules=wbeditentity).

In [2]:
# import common definitions and functions
%run WB_common.ipynb

## Login to the Wikibase

### Network settings
* Make sure the wikibase is running, e.g. using [MaRDI4NFDI/portal-compose](https://github.com/MaRDI4NFDI/portal-compose)
* Make sure this Jupyter notebook is in the same network as the wiki. This is done in the docker-compose configuration:

```
networks:
  default:
    external: true
    name: portal-compose_default
```

Networks can be listed using `docker network ls`. Here, "portal-compose_default" is the name of the network started by portal-compose.
* Verify that this notebook is in the correct network using `docker network inspect portal-compose_default`

The wiki is then accessible from the notebook container at `http://mardi-wikibase`.

In [3]:
import requests
import json 
import configparser

# url of the API endpoint
WIKIBASE_API = 'http://mardi-wikibase/w/api.php?format=json'

def login(username, botpwd):
    """
    Starts a new session and logs in using a bot account.
    @username, @botpwd string: credentials of an existing bot user
    @returns requests.sessions.Session object
    """
    # create a new session
    session = requests.Session()

    # get login token
    r1 = session.get(WIKIBASE_API, params={
        'format': 'json',
        'action': 'query',
        'meta': 'tokens',
        'type': 'login'
    })
    # login with bot account
    r2 = session.post(WIKIBASE_API, data={
        'format': 'json',
        'action': 'login',
        'lgname': username,
        'lgpassword': botpwd,
        'lgtoken': r1.json()['query']['tokens']['logintoken'],
    })
    # raise when login failed
    if r2.json()['login']['result'] != 'Success':
        raise WBAPIException(r2.json()['login'])
        
    return session

### Bot user
* Login to the wiki as admin
* Go to Special:BotPasswords, create a bot user, call it "import", grant it "High-volume editing", "Edit existing pages", "Create, edit, and move pages"
* Copy `data/credentials.tpl` to `data/credentials.ini`. Replace the username and password by those of the newly created bot user (make sure not to commit this file)
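
The template `data/credentials.tpl` is not shown in this notebook. As a minimal sketch, assuming the file has a `[default]` section with the `username` and `password` keys that the next cell reads, the resulting `data/credentials.ini` could look like the string below. The values are placeholders. MediaWiki bot passwords use a login name of the form `<wiki user>@<bot name>`, so a bot called "import" created by a user `Admin` (an assumed user name) would log in as `Admin@import`.

```python
import configparser

# Hypothetical file contents; replace the placeholder values with your own.
EXAMPLE_INI = """
[default]
username = Admin@import
password = replace-with-generated-bot-password
"""

config = configparser.ConfigParser()
# read_string parses the same format that config.read() loads from disk
config.read_string(EXAMPLE_INI)

username = config['default']['username']
botpwd = config['default']['password']
print(username)
```

Note that `[default]` in lower case is an ordinary section here; only the upper-case `[DEFAULT]` section has a special meaning in `configparser`.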

In [77]:
# read bot username and password from data/credentials.ini
config = configparser.ConfigParser()
config.read('data/credentials.ini')
username = config['default']['username']
botpwd = config['default']['password']

session = login(username, botpwd)

## Create a wikibase property
A function that creates a new Wikibase property and returns the new id.

If a property with the same label already exists in the wiki, it is not overwritten; an error is raised instead.

In [6]:
def get_csrf_token(session):
    """Gets a security (CSRF) token."""
    params1 = {
        "action": "query",
        "meta": "tokens",
        "type": "csrf"
    }
    r1 = session.get(WIKIBASE_API, params=params1)
    token = r1.json()['query']['tokens']['csrftoken']

    return token
    

def create_property(session, data):
    """
    Creates a wikibase property.
    @session requests.sessions.Session: session obtained from login 
    @data python dict: creation parameters of the property
    @returns string: id of the new property
    """
    token = get_csrf_token(session)
    
    params = {
        "action": "wbeditentity",
        "format": "json",
        'new': 'property',
        'data': json.dumps(data),
        'token': token
    }
    r1 = session.post(WIKIBASE_API, data=params)
    # keep the parsed body in its own variable instead of overwriting r1.json
    result = r1.json()
    
    # raise when edit failed
    if 'error' in result:
        raise WBAPIException(result['error'])

    return result['entity']['id']

For example, creating a property with these parameters returns an id string of the form 'Px' (where x is a number).

The property can be seen in the wiki under `http://localhost:8080/wiki/Property:Px`

In [5]:
data = {"labels":{"en":{"language":"en","value":"Propertylabel9"}},"descriptions":{"en":{"language":"en","value":"Propertydescription"}},"datatype":"string"}
create_property(session, data)

'P1'
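
The `data` dict above is written by hand as a single line. A minimal sketch of a helper that builds the same structure from its three variable parts is shown here; `make_property_data` is a hypothetical name and not part of this notebook or of the Wikibase API.

```python
import json

def make_property_data(label, description, datatype, language='en'):
    """Build the `data` dict for a new property (hypothetical helper)."""
    return {
        'labels': {language: {'language': language, 'value': label}},
        'descriptions': {language: {'language': language, 'value': description}},
        'datatype': datatype,
    }

# Same parameters as in the example cell above
data = make_property_data('Propertylabel9', 'Propertydescription', 'string')
print(json.dumps(data, indent=2))
```

The result can be passed to `create_property(session, data)` unchanged. Other Wikibase datatypes such as `wikibase-item` or `external-id` would go in the `datatype` argument.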

## Create a wikibase entity
A function that creates a new Wikibase entity (item) and returns the new id.

If an item with the same label already exists in the wiki, it is not overwritten; a new item with the same label is created.

In [7]:
def create_entity(session, data):
    """
    Creates a wikibase entity.
    @session requests.sessions.Session: session obtained from login 
    @data python dict: creation parameters of the entity
    @returns string: id of the new entity
    """
    token = get_csrf_token(session)
    
    params = {
        'action': 'wbeditentity',
        'format': 'json',
        'new': 'item',
        'data': json.dumps(data),
        'token': token
    }
    r1 = session.post(WIKIBASE_API, data=params)
    # keep the parsed body in its own variable instead of overwriting r1.json
    result = r1.json()
    
    # raise when edit failed
    if 'error' in result:
        raise WBAPIException(result['error'])

    return result['entity']['id']
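
`create_property` and `create_entity` differ only in the value of the `new` parameter. The following is a sketch of how the shared request-and-check logic could be factored out, not a replacement used elsewhere in this notebook. `post_new_entity`, `FakeSession` and `FakeResponse` are hypothetical names; the fake classes only imitate the two `requests` methods used here so the sketch runs without a wiki. `WBAPIException` is defined in `WB_common.ipynb`, which is not shown, so a plain `Exception` subclass stands in for it (drop the stand-in when running inside this notebook).

```python
import json

class WBAPIException(Exception):
    """Stand-in for the exception from WB_common.ipynb (assumed to be a plain Exception)."""

def post_new_entity(session, api_url, entity_type, data, token):
    """Create an entity of `entity_type` ('item' or 'property') and return its id."""
    response = session.post(api_url, data={
        'action': 'wbeditentity',
        'format': 'json',
        'new': entity_type,
        'data': json.dumps(data),
        'token': token,
    })
    result = response.json()
    # the API reports failures in an 'error' key of the JSON body
    if 'error' in result:
        raise WBAPIException(result['error'])
    return result['entity']['id']

class FakeResponse:
    """Imitates requests.Response.json() with a canned payload."""
    def __init__(self, payload):
        self._payload = payload
    def json(self):
        return self._payload

class FakeSession:
    """Imitates requests.Session.post() and records the posted form data."""
    def __init__(self, payload):
        self.payload = payload
        self.posted = []
    def post(self, url, data=None):
        self.posted.append(data)
        return FakeResponse(self.payload)

API_URL = 'http://example.invalid/w/api.php'  # placeholder, never contacted
fake = FakeSession({'entity': {'id': 'Q1'}})
new_id = post_new_entity(fake, API_URL, 'item', {'labels': {}}, 'dummy-token')
print(new_id, fake.posted[0]['new'])
```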

For example, creating an item with these parameters returns an id string of the form 'Qx' (where x is a number).

The item can be seen in the wiki under `http://localhost:8080/wiki/Item:Qx`

In [7]:
data={"labels":{"de":{"language":"de","value":"de-value"},"en":{"language":"en","value":"en-value"}}}
create_entity(session, data)

'Q320'
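
Because `create_entity` creates a new item even when the label already exists, running an import twice produces duplicates. One way to guard against this is to search for the label first with the `wbsearchentities` API module. The sketch below assumes that approach; `find_items_by_label` is a hypothetical helper, and `FakeSession`/`FakeResponse` only imitate `requests` so the example runs without a wiki. `wbsearchentities` is a prefix search and returns a limited number of hits, so this check is a heuristic rather than a guarantee.

```python
def find_items_by_label(session, api_url, label, language='en'):
    """Return the ids of items whose label equals `label` exactly."""
    response = session.get(api_url, params={
        'action': 'wbsearchentities',
        'format': 'json',
        'search': label,
        'language': language,
        'type': 'item',
    })
    hits = response.json().get('search', [])
    # keep exact label matches only; the search itself matches prefixes
    return [hit['id'] for hit in hits if hit.get('label') == label]

class FakeResponse:
    """Imitates requests.Response.json() with a canned payload."""
    def __init__(self, payload):
        self._payload = payload
    def json(self):
        return self._payload

class FakeSession:
    """Imitates requests.Session.get() with a canned search result."""
    def __init__(self, payload):
        self.payload = payload
    def get(self, url, params=None):
        return FakeResponse(self.payload)

canned = {'search': [
    {'id': 'Q320', 'label': 'en-value'},
    {'id': 'Q999', 'label': 'en-value-longer'},
]}
existing = find_items_by_label(FakeSession(canned), 'http://example.invalid/w/api.php', 'en-value')
print(existing)
```

With a real session, an import loop could call `create_entity` only when the returned list is empty.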

## Import the authors list
Before importing anything, make sure the corresponding items and properties have been imported from Wikidata. See notebook `WB_wikidata_properties.ipynb`.

A subsample of the authors list was created in notebook `filter_papers_by_software.ipynb`. This list contains the authors of the papers related to the first 1000 software entries in the software list (`data/swMATH-software-list.csv`). The list of authors is in the file `data/all_authors.csv.zip`.

In [23]:
# load the list of authors
import pandas as pd

# load the list of zbMath authors
authors_df = pd.read_csv('data/all_authors.csv.zip') 
authors_df.head()

Unnamed: 0,author_id,author_name
0,aidun.cyrus-k,"Aidun, Cyrus K."
1,babuska.ivo-m,"Babuška, I."
2,banerjee.uday,"Banerjee, U."
3,bartels.alexander,"Bartels, Alexander"
4,basko.mikhail-m,"Basko, M. M."


Use the create_entity function to import the authors into the wiki.
The Q-id returned by the wikibase is appended to the pandas dataframe of authors.

In [35]:
for i,current in authors_df.iterrows():
    author_name = current['author_name'].strip()
    author_id = current['author_id'].strip()
    data = {
        'labels':{'en':{'language':'en','value':author_name}},
        'claims': [
            # instance of 'human'
            {'mainsnak':{
                'snaktype':'value', 'property':'P31', 'datavalue':{'type':'wikibase-entityid', 'value': {'entity-type':'item','id':'Q5'}}},
            'type': 'statement', 'rank': 'normal'},
            # zbMath author id
            {'mainsnak':{
                'snaktype':'value', 'property':'P1556', 'datavalue': {'type':'string', 'value': author_id}},
            'type': 'statement', 'rank': 'normal'}
            ]
    }
    # import into wikibase, save Qid
    authors_df.loc[i, 'qid'] = create_entity(session, data)
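
The claim dicts in the loop above follow two fixed shapes: one for a value that is another item and one for a plain string. A minimal sketch of helpers that build them is given below; `item_claim`, `string_claim` and `author_data` are hypothetical names introduced for illustration, and the property and item ids are the ones used in the loop.

```python
def item_claim(property_id, item_id):
    """Statement whose value is another item, e.g. P31 (instance of) -> Q5 (human)."""
    return {
        'mainsnak': {
            'snaktype': 'value',
            'property': property_id,
            'datavalue': {
                'type': 'wikibase-entityid',
                'value': {'entity-type': 'item', 'id': item_id},
            },
        },
        'type': 'statement',
        'rank': 'normal',
    }

def string_claim(property_id, text):
    """Statement whose value is a plain string, e.g. an external identifier."""
    return {
        'mainsnak': {
            'snaktype': 'value',
            'property': property_id,
            'datavalue': {'type': 'string', 'value': text},
        },
        'type': 'statement',
        'rank': 'normal',
    }

def author_data(author_name, author_id):
    """Build the same `data` dict as the author import loop."""
    return {
        'labels': {'en': {'language': 'en', 'value': author_name}},
        'claims': [item_claim('P31', 'Q5'), string_claim('P1556', author_id)],
    }

example = author_data('Aidun, Cyrus K.', 'aidun.cyrus-k')
print(example['claims'][1]['mainsnak']['datavalue'])
```

The same two helpers would also cover the software and paper imports, which only differ in the property and item ids.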

In [16]:
authors_df.head()

Unnamed: 0,author_id,author_name,qid
0,aardal.karen-i,"Aardal, Karen",Q321
1,aarts.gert,"Aarts, Gert",Q322
2,abad.alberto-j,"Abad, Alberto",Q323
3,abada.asmaa,"Abada, Asmaa",Q324
4,abanades.miguel-angel,"Abánades, Miguel A.",Q325


## Import the software list
All software entries have already been imported into the MaRDI portal.
Here I will import the first 10 (out of 40000) software entries into the local wiki for testing.

In [37]:
MAX_ENTRIES = 10 # number of software entries to process

# load the list of swMath software
software_df = pd.read_csv('data/swMATH-software-list.csv')
software_df = software_df[:MAX_ENTRIES]
software_df.head()

Unnamed: 0,qid,P13,Len,#
0,,'0',swMATH,initial csv import 2021-12-17
1,,'1',FORTRAN,initial csv import 2021-12-17
2,,'2',SuperLU-DIST,initial csv import 2021-12-17
3,,'3',WHISPAR,initial csv import 2021-12-17
4,,'4',MULTI2D,initial csv import 2021-12-17


Use the create_entity function to import the software into the wiki.
The Q-id returned by the wikibase is appended to the pandas dataframe of software.

In [41]:
for i,current in software_df[:MAX_ENTRIES].iterrows():
    software_name = current['Len'].strip()
    software_id = current['P13'].strip().replace("'", '')

    data = {
        'labels':{'en':{'language':'en','value':software_name}},
        'claims': [
            # instance of 'software'
            {'mainsnak':{
                'snaktype':'value', 'property':'P31', 'datavalue':{'type':'wikibase-entityid', 'value': {'entity-type':'item','id':'Q7397'}}},
            'type': 'statement', 'rank': 'normal'},
            # swMath work id
            {'mainsnak':{
                'snaktype':'value', 'property':'P6830', 'datavalue': {'type':'string', 'value': software_id}},
            'type': 'statement', 'rank': 'normal'}
            ]
    }
    # import into wikibase, save Qid
    software_df.loc[i, 'qid'] = create_entity(session, data)
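
The software table already has a `qid` column, which is empty before the import. Since re-running the loop would create every item a second time, one option is to skip rows that already carry a Q-id. The sketch below shows such a check on plain Python values so it runs without pandas; `needs_import` is a hypothetical helper, and the sample rows are illustrative only.

```python
import math

def needs_import(qid):
    """True when `qid` is missing: None, NaN (pandas' empty cell) or an empty string."""
    if qid is None:
        return True
    if isinstance(qid, float) and math.isnan(qid):
        return True
    return str(qid).strip() == ''

# Illustrative rows: one already imported, two still without a Q-id
rows = [
    {'Len': 'swMATH', 'qid': 'Q1034'},
    {'Len': 'FORTRAN', 'qid': float('nan')},
    {'Len': 'SuperLU-DIST', 'qid': None},
]
pending = [row['Len'] for row in rows if needs_import(row['qid'])]
print(pending)
```

Inside the import loop this would amount to `if not needs_import(current['qid']): continue` before building `data`.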

In [42]:
software_df.head()

Unnamed: 0,qid,P13,Len,#
0,Q1034,'0',swMATH,initial csv import 2021-12-17
1,Q1035,'1',FORTRAN,initial csv import 2021-12-17
2,Q1036,'2',SuperLU-DIST,initial csv import 2021-12-17
3,Q1037,'3',WHISPAR,initial csv import 2021-12-17
4,Q1038,'4',MULTI2D,initial csv import 2021-12-17


## Import the papers list
A subsample of the papers list was created in notebook `filter_papers_by_software.ipynb`. This list contains the papers related to the first 10 software entries in the software list (`data/swMATH-software-list.csv`). The list of papers is in the file `data/all_papers.csv.zip`.

In [43]:
# load the list of zbMath papers
papers_df = pd.read_csv('data/all_papers.csv.zip')
papers_df.head(3)

Unnamed: 0,id,author,author_ids,document_title,source,classifications,language,links,keywords,publication_year,doi
0,oai:zbmath.org:5181224,"Zlatev, Zahari; Dimov, Ivan",zlatev.zahari;dimov.ivan-todor,Computational and numerical challenges in envi...,Studies in Computational Mathematics 13. Amste...,65M20;65-02;65Y05;35Kxx,English,http://www.sciencedirect.com/science/book/9780...,textbook;parallel computation;semidiscretizati...,2006,
1,oai:zbmath.org:5187737,"Buttari, Alfredo; D'Ambra, Pasqua; di Serafino...",buttari.alfredo;dambra.pasqua;di-serafino.dani...,2LEV-D2P4: a package of high-performance preco...,"Appl. Algebra Eng. Commun. Comput. 18, No. 3, ...",65F35;65F10;65Y15;65Y05,English,,parallel numerical software;algebraic two-leve...,2007,10.1007/s00200-007-0035-z
2,oai:zbmath.org:5187739,"Gupta, Anshul",gupta.anshul,A shared- and distributed-memory parallel gene...,"Appl. Algebra Eng. Commun. Comput. 18, No. 3, ...",65F05;65F50;68W30;65Y05,English,,sparse matrix factorization;parallel sparse so...,2007,10.1007/s00200-007-0037-x


In [47]:
for i,current in papers_df.iterrows():
    document_title = current['document_title'].strip()
    document_id = current['id'].strip()

    data = {
        'labels':{'en':{'language':'en','value':document_title}},
        'claims': [
            # instance of 'scholarly article'
            {'mainsnak':{
                'snaktype':'value', 'property':'P31', 'datavalue':{'type':'wikibase-entityid', 'value': {'entity-type':'item','id':'Q13442814'}}},
            'type': 'statement', 'rank': 'normal'},
            # zbMath work id
            {'mainsnak':{
                'snaktype':'value', 'property':'P894', 'datavalue': {'type':'string', 'value': document_id}},
            'type': 'statement', 'rank': 'normal'}
            ]
    }
    # import into wikibase, save qid
    papers_df.loc[i, 'qid'] = create_entity(session, data)

In [48]:
papers_df.head(3)

Unnamed: 0,id,author,author_ids,document_title,source,classifications,language,links,keywords,publication_year,doi,qid
0,oai:zbmath.org:5181224,"Zlatev, Zahari; Dimov, Ivan",zlatev.zahari;dimov.ivan-todor,Computational and numerical challenges in envi...,Studies in Computational Mathematics 13. Amste...,65M20;65-02;65Y05;35Kxx,English,http://www.sciencedirect.com/science/book/9780...,textbook;parallel computation;semidiscretizati...,2006,,Q1044
1,oai:zbmath.org:5187737,"Buttari, Alfredo; D'Ambra, Pasqua; di Serafino...",buttari.alfredo;dambra.pasqua;di-serafino.dani...,2LEV-D2P4: a package of high-performance preco...,"Appl. Algebra Eng. Commun. Comput. 18, No. 3, ...",65F35;65F10;65Y15;65Y05,English,,parallel numerical software;algebraic two-leve...,2007,10.1007/s00200-007-0035-z,Q1045
2,oai:zbmath.org:5187739,"Gupta, Anshul",gupta.anshul,A shared- and distributed-memory parallel gene...,"Appl. Algebra Eng. Commun. Comput. 18, No. 3, ...",65F05;65F50;68W30;65Y05,English,,sparse matrix factorization;parallel sparse so...,2007,10.1007/s00200-007-0037-x,Q1046


### Append the paper-to-software relations
**A paper may use several software packages**. The relations between papers and software are stored in an additional file, `data/all_papers_software.csv.zip`.

In [69]:
# load the list of paper-to-software relations
papers_software_df = pd.read_csv('data/all_papers_software.csv.zip')
papers_software_df.head(3)

Unnamed: 0.1,Unnamed: 0,id,software
0,0,oai:zbmath.org:5181224,SuperLU-DIST
1,1,oai:zbmath.org:5187737,SuperLU-DIST
2,3,oai:zbmath.org:5187739,SuperLU-DIST


In [78]:
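def edit_entity(session, qid, data):
    """
    Edits an existing wikibase entity.
    @session requests.sessions.Session: session obtained from login 
    @qid string: id of the entity to edit
    @data python dict: edit parameters, e.g. claims to add
    @returns string: id of the edited entity
    """
    token = get_csrf_token(session)
    
    params = {
        'id': qid,
        'action': 'wbeditentity',
        'format': 'json',
        'data': json.dumps(data),
        'token': token
    }
    r1 = session.post(WIKIBASE_API, data=params)
    # keep the parsed body in its own variable instead of overwriting r1.json
    result = r1.json()
    
    # raise when edit failed
    if 'error' in result:
        raise WBAPIException(result['error'])

    return result['entity']['id']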

groups = papers_software_df.groupby('id')
for paper_ref,group in groups:
    paper_id = papers_df[papers_df['id'] == paper_ref]['qid'].values[0]
    for software_name in group['software']:
        software_id = software_df[software_df['Len'] == software_name]['qid']
        software_id = software_id.values[0]
        data = {
            'claims': [
                # 'describes a project that uses' (P4510)
                {'mainsnak':{
                    'snaktype':'value', 'property':'P4510', 'datavalue':{'type':'wikibase-entityid', 'value': {'entity-type':'item','id':software_id}}},
                'type': 'statement', 'rank': 'normal'}
                ]
        }
        edit_entity(session, paper_id, data)
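
The loop above sends one `wbeditentity` request per paper-software pair and fails with an `IndexError` when a software name has no row in `software_df`. A possible alternative, sketched here under the assumption that all claims of a paper can be sent in a single edit, is to collect the claims per paper first and to skip software that was not imported. `build_uses_claims` is a hypothetical helper; the paper references, software names and Q-ids in the example are made up.

```python
from collections import defaultdict

def build_uses_claims(relations, software_qids, property_id='P4510'):
    """Group (paper_ref, software_name) pairs into one list of claims per paper."""
    claims_by_paper = defaultdict(list)
    for paper_ref, software_name in relations:
        software_qid = software_qids.get(software_name)
        if software_qid is None:
            continue  # software was not imported into the wiki, skip the relation
        claims_by_paper[paper_ref].append({
            'mainsnak': {
                'snaktype': 'value',
                'property': property_id,
                'datavalue': {
                    'type': 'wikibase-entityid',
                    'value': {'entity-type': 'item', 'id': software_qid},
                },
            },
            'type': 'statement',
            'rank': 'normal',
        })
    return dict(claims_by_paper)

# Made-up example data
relations = [
    ('paper-1', 'ToolA'),
    ('paper-1', 'ToolB'),
    ('paper-2', 'ToolA'),
    ('paper-2', 'NotImported'),
]
software_qids = {'ToolA': 'Q1', 'ToolB': 'Q2'}

claims = build_uses_claims(relations, software_qids)
for paper_ref, paper_claims in claims.items():
    print(paper_ref, [c['mainsnak']['datavalue']['value']['id'] for c in paper_claims])
```

In this notebook, `relations` could be `zip(papers_software_df['id'], papers_software_df['software'])` and `software_qids` could be `dict(zip(software_df['Len'], software_df['qid']))`; each paper would then need a single call `edit_entity(session, paper_qid, {'claims': paper_claims})`.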

In [68]:
for paper_ref,group in groups:
    print(paper_ref)

oai:zbmath.org:1000050
oai:zbmath.org:1001349
oai:zbmath.org:1004450
oai:zbmath.org:1005351
oai:zbmath.org:1011107
oai:zbmath.org:1013332
oai:zbmath.org:1014252
oai:zbmath.org:1014266
oai:zbmath.org:1016859
oai:zbmath.org:1016860
oai:zbmath.org:1019270
oai:zbmath.org:1021684
oai:zbmath.org:1023842
oai:zbmath.org:1031825
oai:zbmath.org:1036497
oai:zbmath.org:1039213
oai:zbmath.org:1040371
oai:zbmath.org:1040458
oai:zbmath.org:1045524
oai:zbmath.org:1046171
oai:zbmath.org:1046174
oai:zbmath.org:1051591
oai:zbmath.org:1057411
oai:zbmath.org:1058996
oai:zbmath.org:1059001
oai:zbmath.org:1059002
oai:zbmath.org:1059004
oai:zbmath.org:1059009
oai:zbmath.org:1059012
oai:zbmath.org:1059013
oai:zbmath.org:1059018
oai:zbmath.org:1059468
oai:zbmath.org:1059473
oai:zbmath.org:1059486
oai:zbmath.org:1059500
oai:zbmath.org:1062219
oai:zbmath.org:1067231
oai:zbmath.org:1067946
oai:zbmath.org:1067947
oai:zbmath.org:1072313
oai:zbmath.org:1076156
oai:zbmath.org:1081904
oai:zbmath.org:1085338
oai:zbmath.

In [75]:
paper_ref = 'oai:zbmath.org:7073637'
papers_df[papers_df['id'] == paper_ref]['qid'].values[0]

'Q521'

In [71]:
idx = papers_df['qid'].isna()
papers_df[~idx]

Unnamed: 0,id,author,author_ids,document_title,source,classifications,language,links,keywords,doi,publication_year,qid
0,oai:zbmath.org:7073637,"Fang, Jun; Qian, Jianliang; Zepeda-Núñez, Leon...",fang.jun;qian.jianliang;zepeda-nunez.leonardo;...,A hybrid approach to solve the high-frequency ...,"J. Comput. Phys. 371, 261-279 (2018).",65N30;35J05;65N15,English,https://arxiv.org/abs/1710.02307,Helmholtz equation;Babich's expansion;ray-FEM;...,10.1016/j.jcp.2018.03.011,2018,Q521
1,oai:zbmath.org:5181224,"Zlatev, Zahari; Dimov, Ivan",zlatev.zahari;dimov.ivan-todor,Computational and numerical challenges in envi...,Studies in Computational Mathematics 13. Amste...,65M20;65-02;65Y05;35Kxx,English,http://www.sciencedirect.com/science/book/9780...,textbook;parallel computation;semidiscretizati...,,2006,Q522
2,oai:zbmath.org:5969837,"Manguoglu, Murat",manguoglu.murat,A domain-decomposing parallel sparse linear sy...,"J. Comput. Appl. Math. 236, No. 3, 319-325 (20...",65F10;65F05;65Y05,English,,sparse linear systems;parallel solvers;direct ...,10.1016/j.cam.2011.07.017,2011,Q523
3,oai:zbmath.org:5982908,"Bientinesi, Paolo; Eijkhout, Victor; Kim, Kyun...",bientinesi.paolo;eijkhout.victor-l;kim.kyungjo...,Sparse direct factorizations through unassembl...,"Comput. Methods Appl. Mech. Eng. 199, No. 9-12...",65N30;65F50,English,http://citeseerx.ist.psu.edu/viewdoc/summary?d...,factorizations;Gaussian elimination;sparse mat...,10.1016/j.cma.2009.07.012,2010,Q524
4,oai:zbmath.org:5187737,"Buttari, Alfredo; D'Ambra, Pasqua; di Serafino...",buttari.alfredo;dambra.pasqua;di-serafino.dani...,2LEV-D2P4: a package of high-performance preco...,"Appl. Algebra Eng. Commun. Comput. 18, No. 3, ...",65F35;65F10;65Y15;65Y05,English,,parallel numerical software;algebraic two-leve...,10.1007/s00200-007-0035-z,2007,Q525
...,...,...,...,...,...,...,...,...,...,...,...,...
95,oai:zbmath.org:1088829,"Gladwell, I.; Bouas-Dockery, K.; Brankin, R. W.",gladwell.ian;bouas-dockery.k;brankin.r-w,A Fortran 90 separable Hamiltonian system solver,"Appl. Numer. Math. 25, No. 2-3, 207-217 (1997).",65L05;34-04;37J99;65L50;70H05;65L70,English,,Hamiltonian system;initial value problem;steps...,10.1016/S0168-9274(97)00060-3,1997,Q616
96,oai:zbmath.org:5604874,"Papadimitriou, Dimitrios I.; Giannakoglou, Kyr...",papadimitriou.dimitrios-i;giannakoglou.kyriakos-c,Aerodynamic shape optimization using first and...,"Arch. Comput. Methods Eng. 15, No. 4, 447-488 ...",76N25;76M25;76-02,English,,,10.1007/s11831-008-9025-y,2008,Q617
97,oai:zbmath.org:6980134,"Marco, Onofre; Ródenas, Juan José; Fuenmayor, ...",marco.onofre;rodenas.juan-jose;fuenmayor.franc...,An extension of shape sensitivity analysis to ...,"Comput. Mech. 62, No. 4, 701-723 (2018).",74S05;74P10;65D07;65N30,English,http://hdl.handle.net/10251/133375,Cartesian grid-FEM;sensitivity analysis;veloci...,10.1007/s00466-017-1522-0,2018,Q618
98,oai:zbmath.org:1384296,"Marí Beffa, Gloria; Olver, Peter J.",mari-beffa.gloria;olver.peter-j,Differential invariants for parametrized proje...,"Commun. Anal. Geom. 7, No. 4, 807-839 (1999).",53A55;35Q53;53A20,English,,KdV equation;KP equation;\(n\)th order moving ...,10.4310/CAG.1999.v7.n4.a6,1999,Q619
