The goal of this notebook is to use Wikidata's co-authorship information to compute centrality metrics for the members of Fraunhofer SCAI.BIO. Keep in mind that Wikidata is not necessarily complete, so these calculations reflect only the information that is currently recorded there.

In [1]:
import sys
import time

import networkx as nx
import pandas as pd
import requests
import SPARQLWrapper as sw

In [2]:
print(sys.version)

3.7.3 (default, Mar 27 2019, 09:23:15) 
[Clang 10.0.1 (clang-1001.0.46.3)]


In [3]:
print(time.asctime())

Sat May 11 14:10:10 2019


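This SPARQL query is borrowed from Scholia. It returns every pair of distinct authors who are employed by (P108), affiliated with (P1416), or members of (P463) the organization wd:Q1451981 or any unit that is part of it (P361), and who are both authors (P50) of at least one work of the listed publication types.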

In [4]:
query = """
SELECT
  ?author1 ?author1Label ?image1 ?rgb
  ?author2 ?author2Label ?image2 
WITH {
  SELECT
    ?author1 (SAMPLE(?image1_) AS ?image1)
    ?author2 (SAMPLE(?image2_) AS ?image2)
    (SAMPLE(?rgb_) AS ?rgb)
  WHERE {
    wd:Q1451981 ^wdt:P361* / ^( wdt:P108 | wdt:P1416 | wdt:P463 ) ?author1 , ?author2 . 
    ?work wdt:P50 ?author1 , ?author2 .

    # Only display co-authorship for certain types of documents
    # Journal and conference articles, books, not (yet?) software
    VALUES ?publication_type { wd:Q13442814 wd:Q571 wd:Q26973022 wd:Q17928402 wd:Q947859 }
    FILTER EXISTS { ?work wdt:P31 ?publication_type . }

    # No self-links.
    FILTER (?author1 != ?author2)
    
    # Images
    OPTIONAL { ?author1 wdt:P18 ?image1_ }
    OPTIONAL { ?author2 wdt:P18 ?image2_ }

    # Coloring of the nodes
    BIND("FFFFFF" AS ?rgb_)
  }
  GROUP BY ?author1 ?author2
} AS %result
WHERE {
  INCLUDE %result
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,da,de,es,fr,ja,sv,ru,zh".
  }
}
"""
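The least obvious part of the query is the property path `^wdt:P361* / ^( wdt:P108 | wdt:P1416 | wdt:P463 )`. As read here, it first collects the root organization plus everything transitively part of it, then everyone linked to one of those by employer, affiliation, or membership. The sketch below emulates that path in plain Python over a small set of made-up triples; the organization and person names are hypothetical, and it only illustrates the path semantics, not how the Wikidata Query Service evaluates it.

```python
# Hypothetical toy triples (subject, property, object); not real Wikidata data.
TRIPLES = {
    ("dept_a", "P361", "root_org"),  # dept_a is part of root_org
    ("group_b", "P361", "dept_a"),   # group_b is part of dept_a
    ("alice", "P108", "root_org"),   # alice: employer root_org
    ("bob", "P1416", "group_b"),     # bob: affiliation group_b
    ("carol", "P463", "dept_a"),     # carol: member of dept_a
    ("dave", "P108", "other_org"),   # dave: unrelated employer
}

AFFILIATION_PROPS = {"P108", "P1416", "P463"}


def parts_of(root, triples):
    """Emulate `root ^wdt:P361*`: root plus everything transitively part of it."""
    found = {root}
    frontier = [root]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "P361" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def affiliated_people(root, triples):
    """Emulate `root ^wdt:P361* / ^(wdt:P108 | wdt:P1416 | wdt:P463) ?author`."""
    orgs = parts_of(root, triples)
    return {s for s, p, o in triples if p in AFFILIATION_PROPS and o in orgs}


print(sorted(affiliated_people("root_org", TRIPLES)))  # ['alice', 'bob', 'carol']
```

The `*` in `P361*` also matches zero steps, which is why people attached directly to the root organization (here `alice`) are included.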

Run the query against the Wikidata Query Service with SPARQLWrapper. Alternatively, the same results could be fetched directly with `requests`:

```python
import requests

url = 'https://query.wikidata.org/bigdata/namespace/wdq/sparql'
results = requests.get(url, params={'query': query, 'format': 'json'}).json()
```

In [5]:
sparql = sw.SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery(query)
sparql.setReturnFormat(sw.JSON)
results = sparql.query().convert()
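`results` follows the SPARQL 1.1 Query Results JSON Format: `results['head']['vars']` lists the projected variables, and each entry of `results['results']['bindings']` maps a variable name to a dict with `type` and `value` keys. Variables from an `OPTIONAL` block that did not match (for example `?image1` for an author without a picture) are simply absent from that binding, so they should not be indexed unconditionally. A minimal sketch of flattening the bindings, using a hand-made result with placeholder URIs and labels rather than real query output (`binding_rows` is a hypothetical helper):

```python
# Hand-made result in the SPARQL 1.1 JSON results shape; values are placeholders.
sample_results = {
    "head": {"vars": ["author1", "author1Label", "image1",
                      "author2", "author2Label", "image2"]},
    "results": {
        "bindings": [
            {
                "author1": {"type": "uri", "value": "http://example.org/author-a"},
                "author1Label": {"type": "literal", "xml:lang": "en", "value": "Author A"},
                "author2": {"type": "uri", "value": "http://example.org/author-b"},
                "author2Label": {"type": "literal", "xml:lang": "en", "value": "Author B"},
                # No "image1"/"image2" keys: the OPTIONAL patterns did not match.
            },
        ]
    },
}


def binding_rows(results):
    """Flatten bindings into dicts of plain strings, using None for unbound variables."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in results["results"]["bindings"]
    ]


rows = binding_rows(sample_results)
print(rows[0]["author1Label"], rows[0]["image1"])  # Author A None
```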

Aggregate the results into a NetworkX graph.

In [6]:
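# Each binding is one co-author pair; authors are identified by their label,
# so two people sharing the same label would be merged into one node.
graph = nx.Graph([
    (result['author1Label']['value'], result['author2Label']['value'])
    for result in results['results']['bindings']
])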
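Because `?author1` and `?author2` are bound symmetrically (the query only filters out self-links), each co-author pair comes back twice, once in each order. An undirected `nx.Graph` stores both orderings as a single edge. The plain-Python sketch below mimics that collapse with `frozenset`; the function name and the sample pairs are hypothetical.

```python
def undirected_edges(pairs):
    """Collapse (a, b) and (b, a) into one edge and drop self-links."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


# Hypothetical rows as (author1Label, author2Label) tuples.
pairs = [
    ("Author A", "Author B"),
    ("Author B", "Author A"),  # same pair, reversed order
    ("Author A", "Author C"),
    ("Author C", "Author A"),
]
edges = undirected_edges(pairs)
print(len(pairs), len(edges))  # 4 2
```

If label collisions are a concern, keying nodes on the `author1`/`author2` URIs and storing labels as node attributes would keep distinct people apart.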

Calculate the degree, degree centrality, and betweenness centrality of each author and collect them in a dataframe sorted by betweenness centrality.

In [7]:
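# Number of distinct co-authors per author
degrees = dict(nx.degree(graph))
# Degree divided by (n - 1), the number of other authors in the graph
degree_centralities = nx.degree_centrality(graph)
# Average, over pairs of other authors, of the share of their shortest paths through each author
betweenness_centralities = nx.betweenness_centrality(graph)

df = pd.DataFrame(
    [
        (
            name,
            round(100 * betweenness_centralities[name], 2),
            round(100 * degree_centralities[name], 2),
            degrees[name],
        )
        # Highest betweenness centrality first
        for name in sorted(betweenness_centralities, key=betweenness_centralities.get, reverse=True)
    ],
    columns=[
        'Name',
        'Betweenness Centrality (%)',
        'Degree Centrality (%)',
        'Degree',
    ],
).set_index('Name')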
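For reference, `nx.degree_centrality` divides each node's degree by n - 1, the number of other nodes in the graph. In the output of In [8], Daniel Domingo-Fernández's 25 co-authors give 25 / 35 = 71.43%, which implies the graph contains 36 authors. A dependency-free sketch of the same normalization, assuming the graph is given as a list of undirected edges (a hypothetical input format chosen for illustration):

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree divided by n - 1, the normalization nx.degree_centrality uses."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n <= 1:
        return {node: 1.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical toy graph: "hub" has co-authored with everyone else.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
centralities = degree_centrality(toy_edges)
print({name: round(100 * value, 2) for name, value in sorted(centralities.items())})
# {'a': 66.67, 'b': 66.67, 'c': 33.33, 'hub': 100.0}

# Same arithmetic as the table row with degree 25 in a 36-author graph.
print(round(100 * 25 / (36 - 1), 2))  # 71.43
```

An author connected to every other author always reaches 100%, which is what the table shows for the top row.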


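The betweenness centrality of an author measures how often they lie on the shortest co-authorship paths between other pairs of authors, so a high value marks someone who bridges groups that would otherwise be only loosely connected. The degree of an author is their number of distinct co-authors in this graph, and the degree centrality divides that count by n - 1, the number of other authors in the graph. Martin Hofmann-Apitius has co-authored with all 35 other authors, which is why his degree centrality is 100%.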
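To make the betweenness definition concrete, here is a sketch of Brandes' algorithm for unweighted, undirected graphs in plain Python. It assumes the graph is an adjacency dict (a hypothetical input, not the notebook's data structure) and rescales the result by 1 / ((n - 1)(n - 2)), which is intended to match the default `normalized=True` behavior of `nx.betweenness_centrality` on undirected graphs. The toy graph is made up: two pairs of authors connected only through one bridging author `x`.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for unweighted, undirected graphs, normalized like networkx."""
    nodes = list(adjacency)
    scores = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search from source, counting shortest paths (sigma).
        stack = []
        predecessors = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[source] = 1
        distance = dict.fromkeys(nodes, -1)
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    n = len(nodes)
    if n > 2:
        # Every unordered pair was counted twice (once from each endpoint),
        # so this equals the pair-based score times 2 / ((n - 1)(n - 2)).
        scale = 1 / ((n - 1) * (n - 2))
        scores = {v: s * scale for v, s in scores.items()}
    return scores


# Hypothetical toy graph: x is the only bridge between {a1, a2} and {b1, b2}.
toy_graph = {
    "a1": {"a2", "x"},
    "a2": {"a1", "x"},
    "x": {"a1", "a2", "b1", "b2"},
    "b1": {"b2", "x"},
    "b2": {"b1", "x"},
}
scores = betweenness_centrality(toy_graph)
print(round(scores["x"], 4), scores["a1"])  # 0.6667 0.0
```

Here `x` sits on the only shortest path for 4 of the 6 pairs of other authors, giving 4 / 6 = 0.6667, while the non-bridging authors score 0. On the real co-authorship graph, `nx.betweenness_centrality(graph)` from In [7] remains the practical choice; this version only spells out what it computes.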

In [8]:
df

| Name | Betweenness Centrality (%) | Degree Centrality (%) | Degree |
|---|---:|---:|---:|
| Martin Hofmann-Apitius | 44.93 | 100.0 | 35 |
| Daniel Domingo-Fernández | 11.71 | 71.43 | 25 |
| Charles Tapley Hoyt | 5.43 | 54.29 | 19 |
| Christian Ebeling | 4.39 | 54.29 | 19 |
| Alpha Tom Kodamullil | 3.0 | 45.71 | 16 |
| Anandhi Iyappan | 1.38 | 37.14 | 13 |
| Juliane Fluck | 0.94 | 22.86 | 8 |
| Erfan Younesi | 0.87 | 28.57 | 10 |
| Stephan Springstubbe | 0.73 | 34.29 | 12 |
| Reagon Karki | 0.65 | 34.29 | 12 |
