# Import data into the MaRDI Portal

Used [Protege online](https://webprotege.stanford.edu/) to model the required classes and properties.

The resulting model was saved in OWL/XML format (`data/mardi-portal.owl`).

Used [WebVOWL](http://vowl.visualdataweb.org/webvowl.html) to visualize the model.

![title](data/mardi-portal.owl.svg)

1. Start a local wikibase, e.g. using [MaRDI4NFDI/portal-compose](https://github.com/MaRDI4NFDI/portal-compose)
2. Increase the memory limit by setting `ini_set( 'memory_limit', '1024M' );` in `LocalSettings.d/LocalSettings.override.php`
3. ~Enable federated properties from Wikidata by setting `$wgWBRepoSettings['federatedPropertiesEnabled'] = true;` in `LocalSettings.d/LocalSettings.override.php`~
4. Import the items file `data/Wikidata_items.xml` into the local wikibase. This file contains item and property pages that have been exported from Wikidata.

## Import the authors list
A subsample of the authors list was created in the notebook `filter_papers_by_software.ipynb`. It contains the authors of the papers related to the first 1000 software entries in the software list (`data/swMATH-software-list.csv`). The authors are stored in `data/all_authors.csv.zip`.

In [1]:
# load the list of authors
import pandas as pd

# load the list of zbMath authors
authors_df = pd.read_csv('data/all_authors.csv.zip')
authors_df.head()

| | author_id | author_name |
|---|---|---|
| 0 | aangenent.w-h-t-m | Aangenent, W. H. T. M. |
| 1 | aardal.karen-i | Aardal, Karen |
| 2 | aarts.gert | Aarts, Gert |
| 3 | aavatsmark.ivar | Aavatsmark, Ivar |
| 4 | abad.alberto-j | Abad, Alberto |
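
Each author row is later imported as a new item, so it can be worth checking that the author IDs are unique before building the import file. The following is a minimal sketch using only the standard library. The helper name `find_duplicate_ids` is hypothetical, and the sample list reuses IDs from the table above with one artificial duplicate added at the end.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return the IDs that occur more than once, in first-seen order."""
    counts = Counter(ids)
    return [i for i in counts if counts[i] > 1]


# Sample IDs from the table above; the last entry is a made-up duplicate.
sample_ids = ["aangenent.w-h-t-m", "aardal.karen-i", "aarts.gert", "aardal.karen-i"]
duplicates = find_duplicate_ids(sample_ids)
print(duplicates)
```

On the full data frame, `authors_df['author_id'].duplicated().any()` gives the same yes/no answer in a single pandas call.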


Map the columns to MaRDI Portal properties and reformat the data according to [the CSV file syntax expected by QuickStatements](https://www.wikidata.org/wiki/Help:QuickStatements#CSV_file_syntax).

In [13]:
from datetime import datetime

import_authors_df = pd.DataFrame()
import_authors_df['qid'] = len(authors_df) * [''] # leave empty to create new item
import_authors_df['Len'] = authors_df['author_name']
import_authors_df['P31'] = len(authors_df) * ['Q5'] # instance of 'human'
import_authors_df['P1556'] = authors_df['author_id'] # zbMath author id
import_authors_df['#'] = len(authors_df) * ['{}: imported from zbMath Open API'.format(datetime.now())]
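
Before handing the file to QuickStatements, the header row can be checked against the column conventions used in this notebook. This is a loose sketch, not a full validator: it only knows the column types that appear here (`qid`, label columns such as `Len`, property columns such as `P31`, and the `#` comment column), whereas QuickStatements accepts further column types that are not covered. The function name `check_qs_header` is hypothetical.

```python
import re

# Column names used in this notebook: labels (L + language code),
# properties (P + number) and the '#' comment column.
ALLOWED = re.compile(r"^(L[a-z-]+|P\d+|#)$")


def check_qs_header(columns):
    """Return a list of problems found in a QuickStatements CSV header."""
    problems = []
    if not columns or columns[0] != "qid":
        problems.append("first column must be 'qid'")
    for name in columns[1:]:
        if not ALLOWED.match(name):
            problems.append("unexpected column: {!r}".format(name))
    return problems


# The header built in this notebook passes the check.
print(check_qs_header(["qid", "Len", "P31", "P1556", "#"]))
# An unmapped frame is flagged.
print(check_qs_header(["Len", "qid", "author_name"]))
```

With a pandas frame, the call would be `check_qs_header(list(import_authors_df.columns))`.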

In [14]:
import_authors_df.head()

| | qid | Len | P31 | # | P1556 |
|---|---|---|---|---|---|
| 0 | | Aangenent, W. H. T. M. | Q5 | 2022-01-26 14:36:29.973332: imported from zbMa... | aangenent.w-h-t-m |
| 1 | | Aardal, Karen | Q5 | 2022-01-26 14:36:29.973332: imported from zbMa... | aardal.karen-i |
| 2 | | Aarts, Gert | Q5 | 2022-01-26 14:36:29.973332: imported from zbMa... | aarts.gert |
| 3 | | Aavatsmark, Ivar | Q5 | 2022-01-26 14:36:29.973332: imported from zbMa... | aavatsmark.ivar |
| 4 | | Abad, Alberto | Q5 | 2022-01-26 14:36:29.973332: imported from zbMa... | abad.alberto-j |


In [15]:
# save as csv
import_authors_df.to_csv('data/qs_import_authors.csv', index=False) # suppress index to make valid CSV for import

Copy and paste the contents of the CSV file into QuickStatements. If you started the local wikibase using [MaRDI4NFDI/portal-compose](https://github.com/MaRDI4NFDI/portal-compose), QuickStatements is available at http://localhost:8840.
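
If the full list is too large to paste in one go, the CSV can be split into smaller pieces that each repeat the header row. The sketch below uses only the standard library; the helper name `split_csv` and the chunk size are hypothetical choices, and the sample text stands in for the contents of `data/qs_import_authors.csv`.

```python
import csv
import io


def split_csv(text, rows_per_chunk):
    """Split CSV text into chunks that each start with the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(body), rows_per_chunk):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(body[start:start + rows_per_chunk])
        chunks.append(out.getvalue())
    return chunks


# Three sample rows, split into chunks of at most two rows each.
sample = 'qid,Len,P31\n,"Aardal, Karen",Q5\n,"Aarts, Gert",Q5\n,"Abad, Alberto",Q5\n'
for chunk in split_csv(sample, rows_per_chunk=2):
    print(chunk)
```

Going through the `csv` module rather than splitting on newlines keeps quoted labels such as `"Aardal, Karen"` intact.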

## Import the software list
All software entries have already been imported into the MaRDI portal.
Here I will import the first 1000 (out of 40000) software entries into the local wiki for testing.

In [18]:
# load the list of swMATH software entries
software_df = pd.read_csv('data/swMATH-software-list.csv')
software_df = software_df[:1000]
software_df.head()

| | qid | P13 | Len | # |
|---|---|---|---|---|
| 0 | | '0' | swMATH | initial csv import 2021-12-17 |
| 1 | | '1' | FORTRAN | initial csv import 2021-12-17 |
| 2 | | '2' | SuperLU-DIST | initial csv import 2021-12-17 |
| 3 | | '3' | WHISPAR | initial csv import 2021-12-17 |
| 4 | | '4' | MULTI2D | initial csv import 2021-12-17 |


Map the columns to MaRDI Portal properties and reformat the data according to [the CSV file syntax expected by QuickStatements](https://www.wikidata.org/wiki/Help:QuickStatements#CSV_file_syntax). The zbMath software id

In [None]:
import_software_df = pd.DataFrame()
import_software_df['qid'] = len(software_df) * [''] # leave empty to create new item
import_software_df['Len'] = software_df['Len']
# assumption: 'software' has the same ID as on Wikidata (Q7397) in the local wikibase
import_software_df['P31'] = len(software_df) * ['Q7397'] # instance of 'software'
import_software_df['P13'] = software_df['P13'] # id column, property ID kept as in the source CSV
import_software_df['#'] = software_df['#'] # keep the import comment from the source CSV

## Import the articles list
A subsample of the articles list was created in the notebook `filter_papers_by_software.ipynb`. It contains the papers related to the first 1000 software entries in the software list (`data/swMATH-software-list.csv`). The papers are stored in `data/all_papers.csv.zip`.

## Example queries
Some example SPARQL queries that *should* be possible with the model and can be used to test it:
* List all papers that use a certain software
* List all papers by one author, sort by date
* List all papers published by a certain journal, sort by author
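
As an illustration of the first query, the sketch below builds the SPARQL text in Python. Both IDs passed in are placeholders: the property that links a paper to the software it uses depends on the model in the local wikibase, and `P999` and `Q123` are made up for the example. It also assumes that the `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes are predefined by the query service. The function name `papers_using_software_query` is hypothetical.

```python
def papers_using_software_query(uses_property, software_qid, limit=100):
    """Build a SPARQL query listing the items linked to a software item.

    Both IDs are placeholders that must be replaced with the property and
    item IDs of the local wikibase.
    """
    return (
        "SELECT ?paper ?paperLabel WHERE {{\n"
        "  ?paper wdt:{prop} wd:{qid} .\n"
        '  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}\n'
        "}}\n"
        "LIMIT {limit}"
    ).format(prop=uses_property, qid=software_qid, limit=limit)


# P999 and Q123 are made-up IDs for illustration only.
print(papers_using_software_query("P999", "Q123"))
```

The other two queries would follow the same pattern with an additional `ORDER BY` clause on the date or the author.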
