# A Knowledge Graph for the SEEKCommons Project

The Resource Hub is a *[knowledge graph](https://en.wikipedia.org/wiki/Knowledge_graph)* curated by the __[SEEKCommons Project](https://seekcommons.org/)__ that connects the various resources and entities related to the SEEKCommons Project. The graph is built using the [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page) knowledge base and is accessible via the [SPARQL](https://en.wikipedia.org/wiki/SPARQL) query language.

Quite a mouthful, right? Let's break it down!

## What is a Knowledge Graph?

A (lame) example:

<img src="img/Knowledge_graph_entity_alignment.png" width="800">

Why is that a good idea?

> ...combine data catalogs and virtualization to create a so-called semantic data fabric. This means that the data stays where it is and is accessed via the semantic layer, with the data catalog pointing to the underlying data storage systems.

<img src="img/Data_Silos.png" width="600">

> ...a solid foundation for the creation of such high quality data graphs can only be established if sufficient time is invested in the creation and maintenance of curated taxonomies and ontologies.

## Where to start?

Begin at the beginning? There is no beginning. There is no end. It's a circle. It's a cycle. It's a graph.

The graph is centered around the concept of a *resource*. A resource can be a researcher, a research organization, a software tool, a dataset, or a publication. Each resource is connected to other resources through various relationships. For example, a dataset can be connected to the researcher who created it, a project that funded it, a publication that used it, or a software tool that processed it.
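As a toy illustration of resources joined by typed relationships, the sketch below stores a few made-up resources as (subject, relation, object) triples. A dataset is linked to the researcher who created it, the project that funded it, a publication that used it and a software tool that processed it. Each edge is indexed under both endpoints so that everything connected to a resource can be listed regardless of edge direction. The identifiers, relation names and the `build_index` helper are hypothetical; they are not taken from the SEEKCommons graph or from Wikidata.

```python
from collections import defaultdict

# Made-up resources linked by typed relationships: (subject, relation, object).
EDGES = [
    ("dataset:ocean-temps", "created_by", "researcher:ada"),
    ("dataset:ocean-temps", "funded_by", "project:blue-seas"),
    ("publication:warming-study", "uses", "dataset:ocean-temps"),
    ("software:cleaner", "processes", "dataset:ocean-temps"),
    ("researcher:ada", "affiliated_with", "org:coastal-lab"),
]


def build_index(edges):
    """Index each edge under both endpoints so links can be followed in either direction."""
    index = defaultdict(list)
    for subject, relation, obj in edges:
        index[subject].append((relation, obj, "out"))
        index[obj].append((relation, subject, "in"))
    return index


index = build_index(EDGES)

# Everything directly connected to the dataset, with the direction of each edge.
for relation, other, direction in index["dataset:ocean-temps"]:
    print(direction, relation, other)
```

This prints four lines: two outgoing links (creator and funder) and two incoming ones (the publication and the software tool). Starting from `researcher:ada` instead would reach both the dataset and the organization, which is the kind of hop-by-hop navigation a knowledge graph is built for.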

Semantic applications can be built on top of the graph to provide a variety of services. For the time being, the graph is accessible via the SPARQL query language.

<img src="img/SPARQL.png" width="800">

```sparql
SELECT ?column1 ?column2 ... ?columnN
WHERE
{
  ?subject predicate1 object1 .
  ?subject predicate2 object2 .
  ...
}
```
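The `WHERE` template above is a list of triple patterns that must all hold at once. The toy Python sketch below shows the idea on a handful of made-up triples: a term starting with `?` is a variable, and a variable must take the same value in every pattern it appears in, which is what lets patterns join. This only illustrates the matching semantics, assuming plain string triples; it is not how Wikidata's query service evaluates queries, and the function names are hypothetical.

```python
# Made-up triples standing in for a tiny slice of a knowledge graph.
TRIPLES = [
    ("alice", "instance_of", "human"),
    ("alice", "occupation", "researcher"),
    ("bob", "instance_of", "human"),
    ("bob", "occupation", "chef"),
    ("paper1", "author", "alice"),
]


def is_variable(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None if it cannot."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_variable(term):
            if term in extended and extended[term] != value:
                return None  # variable already bound to something else
            extended[term] = value
        elif term != value:
            return None  # fixed term does not match
    return extended


def match_patterns(patterns, triples):
    """Return every binding of variables that satisfies all patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Like: ?r instance_of human . ?r occupation researcher .
researchers = match_patterns(
    [("?r", "instance_of", "human"), ("?r", "occupation", "researcher")], TRIPLES
)
print(researchers)       # [{'?r': 'alice'}]
print(len(researchers))  # 1, the equivalent of COUNT(?r)
```

Bob satisfies the first pattern but not the second, so only Alice survives the join. Counting the surviving bindings plays the role of `COUNT` in a SPARQL `SELECT`.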

### Researchers

What is a researcher? A researcher is a person involved in producing knowledge, whether by creating new knowledge, disseminating existing knowledge, or applying knowledge to solve problems. Researchers can be affiliated with research organizations and funded through research projects, and they can be connected to publications, datasets, and software tools.

How many researchers are there in the Wikidata graph? Let's find out!

```sparql
SELECT (COUNT(?researcher) AS ?count)
WHERE
{
  # instance of (https://www.wikidata.org/wiki/Property:P31) = human (https://www.wikidata.org/wiki/Q5)
  ?researcher wdt:P31 wd:Q5 .
  # occupation (https://www.wikidata.org/wiki/Property:P106) = researcher (https://www.wikidata.org/wiki/Q1650915)
  ?researcher wdt:P106 wd:Q1650915 .
}
```

```python
import python.data_extraction as DEX

query = '''
SELECT (COUNT(?researcher) AS ?count)
WHERE
{
  ?researcher wdt:P31 wd:Q5 .
  ?researcher wdt:P106 wd:Q1650915 .
}
'''

# Run the query against Wikidata and load the result into a DataFrame
data_extractor = DEX.WikiDataQueryResults(query)
df = data_extractor.load_as_dataframe()
df['count'][0]
```

Output:

```
'1950959'
```

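At the time the query was run, there were slightly under 2 million (1,950,959) researchers in the Wikidata graph. Note that this only counts items whose occupation (P106) is exactly *researcher* (Q1650915); people listed only with a more specific occupation, such as physicist or historian, are not included.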
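The `python.data_extraction` module is not shown in this notebook. As a rough idea of what a helper like `WikiDataQueryResults` might do, here is a minimal sketch using only the standard library, assuming the public Wikidata Query Service endpoint `https://query.wikidata.org/sparql` and the standard SPARQL 1.1 JSON results format. The names `run_sparql` and `bindings_to_rows` are hypothetical and are not the module's actual API.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (an assumption about what the
# notebook's own helper talks to).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def run_sparql(query, user_agent="seekcommons-notebook-example/0.1"):
    """Send `query` to WDQS and return the decoded SPARQL JSON results (hypothetical helper)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks API clients to identify themselves.
            "User-Agent": user_agent,
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def bindings_to_rows(result):
    """Turn SPARQL JSON results into one dict per row, keeping only each value's string."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # A variable left unbound in a row is simply absent from that binding.
        rows.append({var: binding[var]["value"] if var in binding else None for var in variables})
    return rows


# Usage (needs network access):
# rows = bindings_to_rows(run_sparql("SELECT ... WHERE { ... }"))
# count = int(rows[0]["count"])
```

In the JSON results format every value arrives as a string, with its datatype reported separately, which may be why the count above is shown as `'1950959'` rather than as an integer. The list of row dicts can be passed to `pandas.DataFrame` if a table is preferred.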

How many researchers are there with an ORCID iD?

```sparql
SELECT (COUNT(?researcher) AS ?count)
WHERE
{
  # instance of (https://www.wikidata.org/wiki/Property:P31) = human (https://www.wikidata.org/wiki/Q5)
  ?researcher wdt:P31 wd:Q5 .
  # occupation (https://www.wikidata.org/wiki/Property:P106) = researcher (https://www.wikidata.org/wiki/Q1650915)
  ?researcher wdt:P106 wd:Q1650915 .
  # ORCID iD (https://www.wikidata.org/wiki/Property:P496)
  ?researcher wdt:P496 ?orcid .
}
```

```python
# Same query as before, plus the requirement that an ORCID iD (P496) exists
query = '''
SELECT (COUNT(?researcher) AS ?count)
WHERE
{
  ?researcher wdt:P31  wd:Q5 .
  ?researcher wdt:P106 wd:Q1650915 .
  ?researcher wdt:P496 ?orcid .
}
'''

data_extractor = DEX.WikiDataQueryResults(query)
df = data_extractor.load_as_dataframe()
df['count'][0]
```

Output:

```
'1695800'
```

About 87% of these researchers (1,695,800 of 1,950,959) have an ORCID iD. Good! The ORCID iD is a persistent identifier for researchers, so we can use it to link them to other resources, such as their publications.

Let's look at 10 of these researchers! (The query has no `ORDER BY` clause, so which 10 come back is up to the query service and may change between runs.)

```sparql
SELECT ?researcher ?researcherLabel ?orcid
WHERE
{
  # instance of (https://www.wikidata.org/wiki/Property:P31) = human (https://www.wikidata.org/wiki/Q5)
  ?researcher wdt:P31 wd:Q5 .
  # occupation (https://www.wikidata.org/wiki/Property:P106) = researcher (https://www.wikidata.org/wiki/Q1650915)
  ?researcher wdt:P106 wd:Q1650915 .
  # ORCID iD (https://www.wikidata.org/wiki/Property:P496)
  ?researcher wdt:P496 ?orcid .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10
```

```python
# Fetch 10 researchers with their label and ORCID iD
query = '''
SELECT ?researcher ?researcherLabel ?orcid
WHERE
{
  ?researcher wdt:P31  wd:Q5 .
  ?researcher wdt:P106 wd:Q1650915 .
  ?researcher wdt:P496 ?orcid .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10
'''

data_extractor = DEX.WikiDataQueryResults(query)
df = data_extractor.load_as_dataframe()
df
```

Output:

| | researcher | orcid | researcherLabel |
|---|---|---|---|
| 0 | http://www.wikidata.org/entity/Q450572 | 0000-0003-2634-789X | Jacques Laskar |
| 1 | http://www.wikidata.org/entity/Q654415 | 0000-0002-6340-9247 | David Healy |
| 2 | http://www.wikidata.org/entity/Q783942 | 0000-0003-1319-6914 | Paul Stamets |
| 3 | http://www.wikidata.org/entity/Q942873 | 0000-0003-1963-8840 | Salvador Macip i Maresma |
| 4 | http://www.wikidata.org/entity/Q1098841 | 0000-0001-7471-9817 | Claus Tieber |
| 5 | http://www.wikidata.org/entity/Q98881001 | 0000-0002-6991-2214 | Armin Wolf |
| 6 | http://www.wikidata.org/entity/Q1252182 | 0000-0003-3354-1738 | József Rácz |
| 7 | http://www.wikidata.org/entity/Q1265475 | 0000-0001-5005-4961 | Duncan J. Watts |
| 8 | http://www.wikidata.org/entity/Q1333579 | 0000-0003-1331-0318 | Robert L. Byer |
| 9 | http://www.wikidata.org/entity/Q1388715 | 0000-0001-5442-1669 | Henry de Lumley |
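Since ORCID iDs are meant to serve as join keys, it can be worth sanity-checking them before linking. ORCID documents that the last character of an iD is a check character computed with ISO 7064 MOD 11-2 over the first 15 digits, which is why some iDs, such as `0000-0003-2634-789X` in the table above, end in `X`. A minimal validator sketch follows; the function names are our own, hypothetical choices.

```python
import re

# Four blocks of four characters; only the very last one may be 'X'.
ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if `orcid` is well formed and its last character matches the checksum."""
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:-1]) == digits[-1]


for orcid in ["0000-0003-2634-789X", "0000-0002-6340-9247", "0000-0002-6340-9248"]:
    print(orcid, is_valid_orcid(orcid))
```

The first two iDs come from the table and pass; the third is the second one with its last digit altered, so the checksum catches it. A valid checksum only shows that an iD is well formed, not that it belongs to the person it is attached to.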


Let's explore a few of them in the Wikidata graph!

How do we connect researchers to a particular field of research?

Candidates:
- Affiliation(s)
- Publications (We have the ORCID iD!)
- Research projects
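For the *Publications* route, one possible sketch builds a query that starts from a researcher's ORCID iD and counts the main subjects of the works they authored. It assumes that works in Wikidata link to their authors via *author* (P50) and to their topic via *main subject* (P921); the helper name and this choice of properties are assumptions, and since many works carry no P921 statement, the result would be at best a partial picture of a researcher's field.

```python
import re

# Assumed path from a researcher to research fields (a hypothetical choice):
#   researcher --ORCID iD (P496)--> "<orcid>"
#   work --author (P50)--> researcher
#   work --main subject (P921)--> field
QUERY_TEMPLATE = """
SELECT ?field ?fieldLabel (COUNT(DISTINCT ?work) AS ?works)
WHERE
{{
  ?researcher wdt:P496 "{orcid}" .
  ?work wdt:P50 ?researcher .
  ?work wdt:P921 ?field .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
GROUP BY ?field ?fieldLabel
ORDER BY DESC(?works)
LIMIT {limit}
"""


def fields_by_publications_query(orcid, limit=10):
    """Build a SPARQL query listing the main subjects of works authored by one researcher."""
    # Only interpolate well-formed ORCID iDs, so the value cannot break out of the quotes.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]", orcid):
        raise ValueError(f"not an ORCID iD: {orcid!r}")
    return QUERY_TEMPLATE.format(orcid=orcid, limit=int(limit))


print(fields_by_publications_query("0000-0001-5005-4961"))
```

The returned string could then be passed to `DEX.WikiDataQueryResults` like the queries above. Doubled braces in the template are how `str.format` emits literal `{` and `}` for the SPARQL syntax.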

### To be continued...