# Working with data from the EBRAINS Knowledge Graph: an advanced tutorial

This notebook contains a tutorial for working with the [EBRAINS Knowledge Graph](https://docs.kg.ebrains.eu),
which is the metadata management system of the [EBRAINS Research Infrastructure](https://www.ebrains.eu).

There are many different software tools for working with the Knowledge Graph (KG).
In this tutorial, we will demonstrate how to work with the [fairgraph](https://fairgraph.readthedocs.io) library for Python,
which is based on 
- the [ebrains-kg-core](https://pypi.org/project/ebrains-kg-core/) Python library
- the [openMINDS](https://openminds-documentation.readthedocs.io/) metadata schemas 

A tutorial on working with the KG in JavaScript is available [here](https://github.com/apdavison/ebrains-kg-tutorial-javascript).

## 1. Getting started

If you are running this notebook in the [EBRAINS Lab](https://lab.ebrains.eu/) using the latest kernel from the EBRAINS Software Distribution, fairgraph is already installed.

If you are running this elsewhere, install fairgraph using:

```
pip install fairgraph
```

### Authentication / authorization

Use of the [KG API](https://core.kg.ebrains.eu/swagger-ui.html) is restricted to EBRAINS users who have agreed to the [Terms of Use](https://docs.kg.ebrains.eu/search-terms-of-use.html?v=2.2) (please read them now!).
When making a request to the API, the user must provide an authorization token.

#### Option A. If you are running this notebook in the EBRAINS Lab

In this case, fairgraph knows how to obtain a token from your environment (since you've already logged into the Lab using your EBRAINS account). 
You can therefore go ahead and create a client for communicating with the KG:

In [None]:
from fairgraph import KGClient

kg_client = KGClient(host="core.kg.ebrains.eu")

#### Option B. If you are running the notebook locally on your own machine

You will need to obtain a token from somewhere else.
Here are three possibilities:

1. Log in to the [KG Editor](https://editor.kg.ebrains.eu) app, click on the Account icon (top right), click on "Copy token to clipboard", then paste the token into the code cell below as the variable `auth_token`.
Note that this token is only valid for about 10 minutes, so you will need to repeat this process and re-execute the cell from time to time as you work through the tutorial.

![Screenshot of "Copy token to clipboard" button in KG Editor app](images/get-token.png)

2. In a Jupyter notebook in the [EBRAINS Lab](https://lab.ebrains.eu/), run the following code:

```
from clb_nb_utils.oauth import get_token
get_token()
```

and then copy/paste the token as above. Tokens obtained from the Lab are valid for longer.

In [7]:
from fairgraph import KGClient

auth_token = "eyJh..."
kg_client = KGClient(token=auth_token, host="core.kg.ebrains.eu")

3. Create a KG client without a token, then run any code that needs to access the KG. This will give you a link that you can click to log in via a web browser; you can then return to this notebook.

In [None]:
from fairgraph import KGClient

kg_client = KGClient(host="core.kg.ebrains.eu")
kg_client.user_info()
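
Since tokens copied from the KG Editor last only about 10 minutes, it can be handy to check how long a token has left before starting a long-running cell. The sketch below assumes the token is a JSON Web Token (JWT) whose payload carries the standard `exp` claim (expiry time in Unix seconds); that is an assumption about the token format, not something this tutorial or fairgraph guarantees. `seconds_until_expiry` and `make_demo_token` are hypothetical helpers, and the signature is not verified.

```python
import base64
import json
import time


def seconds_until_expiry(token, now=None):
    """Return the seconds left before a JWT-style token expires.

    Assumes three dot-separated parts, the middle one being base64url-encoded
    JSON with an "exp" claim. The signature is NOT verified.
    """
    payload_b64 = token.split(".")[1]
    # base64url payloads omit padding, so restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    if now is None:
        now = time.time()
    return payload["exp"] - now


def make_demo_token(exp):
    """Build an unsigned, fake token for demonstration only."""
    def encode(obj):
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return ".".join([encode({"alg": "none"}), encode({"exp": exp}), "signature"])


demo_token = make_demo_token(exp=1_000_600)
remaining = seconds_until_expiry(demo_token, now=1_000_000)
print(f"Token expires in {remaining:.0f} s")
```

With a real token you would call `seconds_until_expiry(auth_token)` and fetch a new token if the result is close to zero.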

## 2. Key concepts

The core of the EBRAINS Knowledge Graph is a [graph database](https://en.wikipedia.org/wiki/Graph_database) containing detailed metadata about neuroscience datasets, models, software, and other research products.
The actual data/code of the datasets, models, etc. are stored outside the KG, but the KG stores links to the data/code locations.
The EBRAINS KG is implemented using an open-source software system called MarmotGraph (developed in the Human Brain Project and EBRAINS projects).

The primary user interface of the KG is a web API, whose end-points are [documented interactively here](https://core.kg.ebrains.eu/swagger-ui/index.html).
In this tutorial we will use the fairgraph Python library to send requests to this API and handle the responses,
but the API can be accessed using many different programming languages,
and EBRAINS also provides libraries for [Java](https://central.sonatype.com/artifact/eu.ebrains.kg/kg-core-sdk) and [JavaScript/TypeScript](https://www.npmjs.com/package/@ebrains/kg-core) to simplify working with the API.

The primary document type used by the KG API is [JSON-LD](https://json-ld.org).
This builds on the widely-used JSON format by adding features to support linked data,
i.e., data that follows a graph structure.

While JSON-LD specifies the form of the data/metadata, it doesn't specify the content.
[openMINDS](https://openminds-documentation.readthedocs.io/) is a project to develop metadata schemas and libraries for neuroscience and related fields.
It specifies what types of object we can store metadata about, and what properties each type should have, 
e.g., a [`Person`](https://openminds-documentation.readthedocs.io/en/latest/schema_specifications/core/actors/person.html) should have a `givenName` and a `familyName`.
All metadata in the KG follows the openMINDS schemas.
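
To make the split between form (JSON-LD) and content (a schema) more concrete, here is a small sketch of a linked-data document written as a Python dictionary. `@context`, `@id` and `@type` are standard JSON-LD keywords; the `example.org` IRIs are placeholders invented for illustration and do not correspond to real KG nodes or to the actual openMINDS vocabulary IRIs. `linked_ids` is a hypothetical helper.

```python
import json

# "@context" maps short property names to full IRIs; "@id" and "@type"
# are standard JSON-LD keywords. The example.org IRIs are placeholders.
person_doc = {
    "@context": {"@vocab": "https://example.org/vocab/"},
    "@id": "https://example.org/instances/person-1",
    "@type": "https://example.org/types/Person",
    "givenName": "Stan",
    "familyName": "Laurel",
    # A link to another node is expressed as an object holding only "@id"
    "affiliation": {"@id": "https://example.org/instances/org-1"},
}


def linked_ids(doc):
    """Return the "@id" of every node that this document links to."""
    links = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as "@id" and "@context"
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict) and "@id" in item:
                links.append(item["@id"])
    return links


print(json.dumps(person_doc, indent=2))
print(linked_ids(person_doc))
```

The property values are plain JSON, while the `{"@id": ...}` object is what turns a tree-shaped document into an edge in a graph.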

fairgraph provides a Python class for each openMINDS type, and lets you create or access KG nodes as Python objects. 
Behind the scenes, fairgraph takes care of:
- converting the Python objects into/from JSON-LD documents
- communicating with the KG.
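
The first of these tasks, converting between Python objects and documents, can be pictured with a toy round trip. This is only a sketch of the idea: `ToyPerson`, `to_jsonld`, `from_jsonld` and `to_camel` are hypothetical names, the type IRI is a placeholder, and fairgraph's real conversion code is more involved.

```python
from dataclasses import dataclass, fields


def to_camel(name):
    """Convert a snake_case Python name to a camelCase property name."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


@dataclass
class ToyPerson:
    """A hypothetical stand-in for a schema class; not fairgraph's Person."""
    given_name: str
    family_name: str

    def to_jsonld(self):
        doc = {"@type": "https://example.org/types/Person"}  # placeholder IRI
        for field in fields(self):
            doc[to_camel(field.name)] = getattr(self, field.name)
        return doc

    @classmethod
    def from_jsonld(cls, doc):
        kwargs = {f.name: doc[to_camel(f.name)] for f in fields(cls)}
        return cls(**kwargs)


original = ToyPerson(given_name="Stan", family_name="Laurel")
document = original.to_jsonld()
restored = ToyPerson.from_jsonld(document)
print(document)
print(restored == original)
```

The point of the sketch is the naming convention: Python attributes use snake_case (`given_name`), while the stored document uses the schema's camelCase (`givenName`).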

fairgraph also provides a simplified query interface, while still allowing low-level access to the underlying query language if necessary. 

## 3. Retrieving a metadata node based on its ID

As noted above, each node in the KG can be represented by a Python object. 
If you know the hexadecimal identifier of the node and its type, the object can be retrieved using the `from_id()` method of the appropriate class. Here, we retrieve a specific dataset:

In [2]:
import fairgraph.openminds.core as omcore
omcore.set_error_handling(None)

dataset_id = "bd5f91ff-e829-4b85-92eb-fc56991541f1"
dataset_version = omcore.DatasetVersion.from_id(dataset_id, kg_client)

In [3]:
dataset_version.show()

id                       https://kg.ebrains.eu/api/instances/bd5f91ff-e829-4b85-92eb-fc56991541f1
space                    dataset
type                     https://openminds.ebrains.eu/core/DatasetVersion
accessibility            KGProxy([<class 'fairgraph.openminds.controlled_terms.product_accessibility.ProductAccessibility'>], 'https://kg.ebrains.eu/api/instances/b2ff7a47-b349-48d7-8ce4-cf51868675f1')
data_types               KGProxy([<class 'fairgraph.openminds.controlled_terms.semantic_data_type.SemanticDataType'>], 'https://kg.ebrains.eu/api/instances/f468ee45-37a6-4e71-8b70-0cbe66d367db')
description
digital_identifier       KGProxy((<class 'fairgraph.openminds.core.digital_identifier.doi.DOI'>, <class 'fairgraph.openminds.core.digital_identifier.identifiers_dot_org_id.IdentifiersDotOrgID'>), 'https://kg.ebrains.eu/api/instances/df93b012-3bb6-4565-bce0-3bd2d3aed63b')
ethics_assessment        KGProxy([<class 'fairgraph.openminds.controlled_terms.ethics_assessment.EthicsAssessment'


You can view the same dataset in the KG Search UI by clicking on this link: https://search.kg.ebrains.eu/instances/bd5f91ff-e829-4b85-92eb-fc56991541f1.
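
The Search UI link above reuses the identifier at the end of the node's `id`. A minimal sketch of building such a link from a node URI follows; `uuid_from_uri` and `search_ui_url` are hypothetical helpers, and the URL pattern is inferred from the single link shown above, so it may not hold for every node type.

```python
from uuid import UUID

SEARCH_UI_BASE = "https://search.kg.ebrains.eu/instances/"


def uuid_from_uri(uri):
    """Return the trailing UUID of a KG instance URI, validating its format."""
    candidate = uri.rstrip("/").rsplit("/", 1)[-1]
    return str(UUID(candidate))  # raises ValueError if not a valid UUID


def search_ui_url(uri):
    """Build a Search UI link for a node, following the pattern shown above."""
    return SEARCH_UI_BASE + uuid_from_uri(uri)


node_uri = "https://kg.ebrains.eu/api/instances/bd5f91ff-e829-4b85-92eb-fc56991541f1"
print(search_ui_url(node_uri))
```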

You will notice in the output from `.show()` that:

1. for many of the properties, the value is a `KGProxy()` object, or a list of such objects. These represent links to other nodes in the graph. To see what these nodes contain, we need to follow them.
2. the "description" property is empty, even though the dataset card in the Search UI contains a description. We'll come back to this in Section 5, below.

## 4. Following links in the graph

To follow links in the graph, we use the `resolve()` method:

In [4]:
accessibility = dataset_version.accessibility.resolve(kg_client)
accessibility.show()

id          https://kg.ebrains.eu/api/instances/b2ff7a47-b349-48d7-8ce4-cf51868675f1
space       controlled
type        https://openminds.ebrains.eu/controlledTerms/ProductAccessibility
definition  With 'free access' selected, data and metadata are both released and become immediately available without any access restrictions.
name        free access


In [5]:
doi = dataset_version.digital_identifier.resolve(kg_client)
doi.show()

id          https://kg.ebrains.eu/api/instances/df93b012-3bb6-4565-bce0-3bd2d3aed63b
space       dataset
type        https://openminds.ebrains.eu/core/DOI
identifier  https://doi.org/10.25493/YJFW-HPY


In [6]:
techniques = [tech.resolve(kg_client) for tech in dataset_version.techniques]
for tech in techniques:
    print(tech.name)

current clamp
whole cell patch clamp
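
Conceptually, a proxy holds only the identifier of its target, and resolving it means fetching the target by that identifier. The toy sketch below illustrates this with an in-memory dictionary in place of the remote database. `ToyProxy` and `toy_store` are hypothetical and are not how fairgraph's `KGProxy` is implemented.

```python
class ToyProxy:
    """A hypothetical stand-in for a link to another node (not fairgraph's KGProxy)."""

    def __init__(self, uri):
        self.uri = uri

    def resolve(self, store):
        # Following the link means looking the target up by its identifier.
        return store[self.uri]

    def __repr__(self):
        return f"ToyProxy({self.uri!r})"


# An in-memory dictionary stands in for the remote graph database.
toy_store = {
    "node:accessibility-1": {"name": "free access"},
    "node:dataset-1": {
        "name": "example dataset",
        "accessibility": ToyProxy("node:accessibility-1"),
    },
}

toy_dataset = toy_store["node:dataset-1"]
print(toy_dataset["accessibility"])  # only the link is held so far
target = toy_dataset["accessibility"].resolve(toy_store)
print(target["name"])
```

In the real KG each lookup is a network request, which is why resolving links one by one can be slow.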


### Reverse links

We can also follow *reverse* links, i.e. where the property of another object points *to* the dataset version.
Reverse properties are not shown by default by `obj.show()`, but you can see them by passing the `include_empty_properties` option:

In [7]:
dataset_version.show(include_empty_properties=True)

id                         https://kg.ebrains.eu/api/instances/bd5f91ff-e829-4b85-92eb-fc56991541f1
space                      dataset
type                       https://openminds.ebrains.eu/core/DatasetVersion
accessibility              KGProxy([<class 'fairgraph.openminds.controlled_terms.product_accessibility.ProductAccessibility'>], 'https://kg.ebrains.eu/api/instances/b2ff7a47-b349-48d7-8ce4-cf51868675f1')
authors                    None
behavioral_protocols       None
copyright                  None
custodians                 None
data_types                 KGProxy([<class 'fairgraph.openminds.controlled_terms.semantic_data_type.SemanticDataType'>], 'https://kg.ebrains.eu/api/instances/f468ee45-37a6-4e71-8b70-0cbe66d367db')
description
digital_identifier         KGProxy((<class 'fairgraph.openminds.core.digital_identifier.doi.DOI'>, <class 'fairgraph.openminds.core.digital_identifier.identifiers_dot_org_id.IdentifiersDotOrgID'>), 'https://kg.ebrains.eu/api/instances/df93b012-3bb6

Reverse properties are represented by a `KGQuery` object. These can also be resolved:

In [8]:
dataset = dataset_version.is_version_of.resolve(kg_client)
dataset.show()

id            https://kg.ebrains.eu/api/instances/65b1b9a3-af50-4c9c-a4e6-7556bc700da0
space         dataset
type          https://openminds.ebrains.eu/core/Dataset
authors       [KGProxy([<class 'fairgraph.openminds.core.actors.person.Person'>], 'https://kg.ebrains.eu/api/instances/5dff4ef9-bd56-4aee-979a-6ed15f65d235'), KGProxy([<class 'fairgraph.openminds.core.actors.person.Person'>], 'https://kg.ebrains.eu/api/instances/2843990a-69dd-468b-a1d3-ff9589b485ae')]
custodians    KGProxy([<class 'fairgraph.openminds.core.actors.person.Person'>], 'https://kg.ebrains.eu/api/instances/2843990a-69dd-468b-a1d3-ff9589b485ae')
description   In this study we analyzed the intrinsic electrophysiological properties of CA1 excitatory hippocampal neurons in a mouse model of Alzheimer’s Disease (AD) at two age points: a presymptomatic age (3-4 months) and a symptomatic age: (9-10 months). At this latter age, this APPPS1 model harbors amyloid plaques and hippocampus-dependent cognitive alterations. Litt

### Following multiple links at once

Following links one at a time can be rather slow, as it requires many network requests.

If you know in advance the paths you wish to follow through the graph, you can specify them in the `follow_links` argument to the `resolve()` method.
For example:

In [9]:
print("# Before")
print(f"Digital identifier: {type(dataset_version.digital_identifier)}")
print(f"Ethics assessment:  {type(dataset_version.ethics_assessment)}")
print(f"Study targets:      {type(dataset_version.study_targets)}")
print(f"Is version of:      {type(dataset_version.is_version_of)}")

print("\n# After")
dataset_version.resolve(
    kg_client,
    follow_links={
        "digital_identifier": {},
        "ethics_assessment": {},
        "study_targets": {},
        "is_version_of": {
            "authors": {}
        }
    })
print(f"Digital identifier: {type(dataset_version.digital_identifier)}")
print(f"Ethics assessment:  {type(dataset_version.ethics_assessment)}")
print(f"Study targets:      {type(dataset_version.study_targets)}")
print(f"Is version of:      {type(dataset_version.is_version_of)}")

print("Authors:")
for author in dataset_version.is_version_of[0].authors:
    print(f"  {author.given_name} {author.family_name}")

# Before
Digital identifier: <class 'fairgraph.kgproxy.KGProxy'>
Ethics assessment:  <class 'fairgraph.kgproxy.KGProxy'>
Study targets:      <class 'fairgraph.kgproxy.KGProxy'>
Is version of:      <class 'fairgraph.kgquery.KGQuery'>

# After
Digital identifier: <class 'fairgraph.openminds.core.digital_identifier.doi.DOI'>
Ethics assessment:  <class 'fairgraph.openminds.controlled_terms.ethics_assessment.EthicsAssessment'>
Study targets:      <class 'fairgraph.openminds.controlled_terms.disease.Disease'>
Is version of:      <class 'list'>
Authors:
  Ana Rita Salgueiro-Pereira
  Hélène Marie


Note that this request includes following a "reverse" link ("is_version_of") followed by a "forward" link ("authors").
Any combination of forward and reverse links can be followed.
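
The nested dictionary passed as `follow_links` describes a set of paths through the graph: each key is a property to follow, and its value lists the properties to follow from the node reached. As a sketch of how to read such a dictionary, the hypothetical helper `link_paths` below lists every path it contains. It is plain Python and does not contact the KG.

```python
def link_paths(follow_links, prefix=()):
    """Yield every path through a nested follow_links-style dictionary."""
    for name, children in follow_links.items():
        path = prefix + (name,)
        yield path
        # An empty dict means "stop here"; a non-empty one continues the path.
        yield from link_paths(children, path)


follow_links = {
    "digital_identifier": {},
    "ethics_assessment": {},
    "study_targets": {},
    "is_version_of": {
        "authors": {}
    },
}

paths = [".".join(path) for path in link_paths(follow_links)]
for path in paths:
    print(path)
```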

Following links can even be done at the same time as retrieving the base node, reducing the number of network requests to just one:

In [10]:
dataset_version = omcore.DatasetVersion.from_id(
    dataset_id,
    kg_client,
    follow_links={
        "digital_identifier": {},
        "ethics_assessment": {},
        "is_version_of": {
            "authors": {}
        }
    }
)
dataset_version.show()

id                       https://kg.ebrains.eu/api/instances/bd5f91ff-e829-4b85-92eb-fc56991541f1
space                    dataset
type                     https://openminds.ebrains.eu/core/DatasetVersion
accessibility            KGProxy([<class 'fairgraph.openminds.controlled_terms.product_accessibility.ProductAccessibility'>], 'https://kg.ebrains.eu/api/instances/b2ff7a47-b349-48d7-8ce4-cf51868675f1')
data_types               KGProxy([<class 'fairgraph.openminds.controlled_terms.semantic_data_type.SemanticDataType'>], 'https://kg.ebrains.eu/api/instances/f468ee45-37a6-4e71-8b70-0cbe66d367db')
description
digital_identifier       DOI(identifier='https://doi.org/10.25493/YJFW-HPY', space=None, id=https://kg.ebrains.eu/api/instances/df93b012-3bb6-4565-bce0-3bd2d3aed63b)
ethics_assessment        EthicsAssessment(definition="'EU compliant, non sensitive' data should be able to provide an ethics approval as part of the metadata. An EBRAINS ethics compliance check is not required.", descrip

Note: due to underlying restrictions on the length of queries to the KG, trying to follow too many links at once can fail with a 500 error. This is being worked on.
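
One possible way to stay under this limit is to split the top-level links into smaller groups and resolve each group in a separate request. This is a sketch, not a documented fairgraph feature: `chunk_links` is a hypothetical helper, and whether several smaller requests avoid the error in practice is an assumption.

```python
def chunk_links(follow_links, size):
    """Split the top-level entries of a follow_links dict into smaller dicts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    items = list(follow_links.items())
    return [dict(items[i:i + size]) for i in range(0, len(items), size)]


all_links = {
    "digital_identifier": {},
    "ethics_assessment": {},
    "study_targets": {},
    "is_version_of": {"authors": {}},
}

for chunk in chunk_links(all_links, size=2):
    # With a real client one might call, for each chunk:
    #     dataset_version.resolve(kg_client, follow_links=chunk)
    print(sorted(chunk))
```

The trade-off is more network requests in exchange for shorter queries.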

## 5. Searching the Knowledge Graph

So far in this tutorial, all the metadata we've retrieved has been based on knowing the hexadecimal identifier of the node.
Very often, however, we don't know the identifier.
We may know the title of the dataset, its DOI, or we may wish to retrieve a group of nodes that share some property.

For all such queries, we use the `list()` method:

### Retrieving a dataset based on its name

In [11]:
dataset_versions = omcore.DatasetVersion.list(kg_client, full_name="Excitability profile of CA1 pyramidal neurons in APPPS1 Alzheimer disease mice and control littermates")
print(f"Found {len(dataset_versions)} dataset(s)")

Found 1 dataset(s)


Note that you don't need the complete name: fairgraph will also search for fragments of the name, e.g.:

In [12]:
dataset_versions = omcore.DatasetVersion.list(kg_client, full_name="APPPS1 Alzheimer")
print(f"Found {len(dataset_versions)} dataset(s)")
dataset_versions[0].full_name


Found 1 dataset(s)


'Excitability profile of CA1 pyramidal neurons in APPPS1 Alzheimer disease mice and control littermates'

### Retrieving a dataset based on its DOI

Since the "digital_identifier" property links to a node of type `DOI`, we need to search on the "identifier" property of the `DOI` node, which contains the actual text string.

To do this, we join the terms with a double underscore, i.e. "digital_identifier__identifier".

In [13]:
dataset_versions = omcore.DatasetVersion.list(
    kg_client,
    digital_identifier__identifier="10.25493/YJFW-HPY",
    follow_links={"digital_identifier": {}}  # this argument is optional
)
print(f"Found {len(dataset_versions)} dataset(s)")
dataset_versions[0].full_name
#dataset_versions[0].digital_identifier.identifier


Found 1 dataset(s)


'Excitability profile of CA1 pyramidal neurons in APPPS1 Alzheimer disease mice and control littermates'
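
The double-underscore convention can be read as a path: each segment names a property to follow from the current node. The toy sketch below mimics that reading with nested dictionaries. `split_filter` and `lookup` are hypothetical helpers, and how fairgraph turns such keywords into KG queries internally is not shown here.

```python
def split_filter(keyword):
    """Split a filter keyword such as "a__b" into its path components."""
    return keyword.split("__")


def lookup(node, keyword):
    """Follow the path named by a filter keyword through nested dictionaries."""
    value = node
    for step in split_filter(keyword):
        value = value[step]
    return value


# Nested dictionaries stand in for a dataset version and its linked DOI node.
toy_version = {
    "full_name": "example dataset",
    "digital_identifier": {"identifier": "https://doi.org/10.25493/YJFW-HPY"},
}

print(split_filter("digital_identifier__identifier"))
print("10.25493/YJFW-HPY" in lookup(toy_version, "digital_identifier__identifier"))
```

Single underscores inside a property name, as in `digital_identifier`, are left untouched because only the double underscore acts as a separator.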



### Finding all datasets with "Alzheimer" in their name


In [14]:
alzheimers_datasets = omcore.DatasetVersion.list(kg_client, full_name="Alzheimer")
print(f"Found {len(alzheimers_datasets)} dataset(s)")

Found 1 dataset(s)


It may surprise you that we get only a single result for "Alzheimer" with this query,
but [multiple responses in the KG Search UI](https://search.kg.ebrains.eu/?category=Dataset&q=Alzheimer).

One reason for this is that the Search UI searches both the "full_name" and "description" properties.
Another is that sometimes the full_name property comes from the parent dataset record, and is not set on an individual dataset version.

openMINDS distinguishes between a `Dataset` and a `DatasetVersion`.
Each dataset can have multiple versions.
Both `Dataset` and `DatasetVersion` have "name" and "description" properties.
The convention in the EBRAINS Knowledge Graph [Search UI](https://search.kg.ebrains.eu/) is that if the "name" property of a `DatasetVersion` is empty, it should inherit it from the parent `Dataset`.
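
This inheritance convention amounts to a simple fallback rule, sketched below with toy classes. `ToyDataset`, `ToyDatasetVersion` and `display_name` are hypothetical and stand in for the real openMINDS types only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyDataset:
    full_name: str


@dataclass
class ToyDatasetVersion:
    full_name: Optional[str]
    is_version_of: ToyDataset


def display_name(version):
    """Use the version's own name if set, else inherit the parent's."""
    return version.full_name or version.is_version_of.full_name


parent = ToyDataset(full_name="Parent dataset about Alzheimer's disease")
v1 = ToyDatasetVersion(full_name=None, is_version_of=parent)
v2 = ToyDatasetVersion(full_name="Version with its own name", is_version_of=parent)

print(display_name(v1))
print(display_name(v2))
```

A search restricted to the version's own name would miss `v1`, which is why the parent's properties need to be searched as well.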

Fortunately, we can do the same thing in our query:

In [15]:
alzheimers_datasets = omcore.DatasetVersion.list(kg_client, is_version_of__full_name="Alzheimer")
print(f"Found {len(alzheimers_datasets)} dataset(s)")

Found 5 dataset(s)


Now we find 5 datasets. If we also search in the description of the parent dataset, we find even more:

In [16]:
alzheimers_datasets = omcore.DatasetVersion.list(
    kg_client,
    is_version_of__full_name="Alzheimer",
    is_version_of__description="Alzheimer"
)
print(f"Found {len(alzheimers_datasets)} dataset(s)")

Found 11 dataset(s)


### Finding all datasets with "Alzheimer's disease" as a study target

Many properties in openMINDS, like "study_targets", use controlled vocabularies, which means that the property is a link to a node, which in turn contains both the term itself and other information about the term, such as a definition.

To help us build our query, we can look at the possible node types that can be linked from the property "study_targets":

In [17]:
omcore.DatasetVersion._property_lookup["study_targets"].types

(fairgraph.openminds.controlled_terms.auditory_stimulus_type.AuditoryStimulusType,
 fairgraph.openminds.controlled_terms.biological_order.BiologicalOrder,
 fairgraph.openminds.controlled_terms.biological_sex.BiologicalSex,
 fairgraph.openminds.controlled_terms.breeding_type.BreedingType,
 fairgraph.openminds.controlled_terms.cell_culture_type.CellCultureType,
 fairgraph.openminds.controlled_terms.cell_type.CellType,
 fairgraph.openminds.controlled_terms.disease.Disease,
 fairgraph.openminds.controlled_terms.disease_model.DiseaseModel,
 fairgraph.openminds.controlled_terms.electrical_stimulus_type.ElectricalStimulusType,
 fairgraph.openminds.controlled_terms.genetic_strain_type.GeneticStrainType,
 fairgraph.openminds.controlled_terms.gustatory_stimulus_type.GustatoryStimulusType,
 fairgraph.openminds.controlled_terms.handedness.Handedness,
 fairgraph.openminds.controlled_terms.molecular_entity.MolecularEntity,
 fairgraph.openminds.controlled_terms.olfactory_stimulus_type.OlfactoryStimul

"Alzheimer's disease" is probably a controlled term of type "Disease", so let's look at the properties of the `Disease` class:

In [18]:
import fairgraph.openminds.controlled_terms as terms

terms.Disease.property_names

['definition',
 'description',
 'interlex_identifier',
 'knowledge_space_link',
 'name',
 'preferred_ontology_identifier',
 'synonyms',
 'describes',
 'is_modeled_by',
 'is_used_to_group',
 'specimen_state',
 'studied_in']

Let's first see if we can find a term with that name:

In [19]:
possible_terms = terms.Disease.list(kg_client, name="Alzheimer's disease")
print(f"Found {len(possible_terms)} term(s)")

alzheimers = possible_terms[0]
alzheimers.show()

Found 1 term(s)
id                             https://kg.ebrains.eu/api/instances/161baa02-4e08-4cf2-a641-81cf323cc15d
space                          controlled
type                           https://openminds.ebrains.eu/controlledTerms/Disease
name                           Alzheimer's disease
preferred_ontology_identifier  http://purl.obolibrary.org/obo/DOID_10652
specimen_state                 [KGProxy([<class 'fairgraph.openminds.core.research.subject_state.SubjectState'>], 'https://kg.ebrains.eu/api/instances/6f0d62b4-7ed0-4580-9db8-c2302b87a0af'), KGProxy([<class 'fairgraph.openminds.core.research.subject_state.SubjectState'>], 'https://kg.ebrains.eu/api/instances/9d3ed4e7-1c12-44ca-a053-3483e33607a3'), KGProxy([<class 'fairgraph.openminds.core.research.subject_state.SubjectState'>], 'https://kg.ebrains.eu/api/instances/948981a3-ab75-4420-87e5-5a6f5c05cfd9'), KGProxy([<class 'fairgraph.openminds.core.research.subject_state.SubjectState'>], 'https://kg.ebrains.eu/api/instances/ec

Now that we have the graph node for that term, there are two ways we can find dataset versions that link to it:

1. Starting from `DatasetVersion`

In [20]:
alzheimers_datasets1 = omcore.DatasetVersion.list(kg_client, study_targets=alzheimers)
print(f"Found {len(alzheimers_datasets1)} dataset(s)")

Found 11 dataset(s)


2. Starting from the controlled term node:

In fact, the query for the term already gave us links to all the dataset versions that link to that term, using the property "studied_in", which is the reverse of "study_targets".

In [21]:
alzheimers_datasets2 = [link.resolve(kg_client) for link in alzheimers.studied_in]
print(f"Found {len(alzheimers_datasets2)} dataset(s)")

Found 12 dataset(s)


You may ask why the first query gave 11 results and the second query gave 12?!

The answer is that the KG contains far more than just datasets. Let's look at the types of node that the Alzheimer node links to:

In [22]:
[link.cls for link in alzheimers.studied_in]

[fairgraph.openminds.core.products.dataset_version.DatasetVersion,
 fairgraph.openminds.core.products.dataset_version.DatasetVersion,
 fairgraph.openminds.core.products.dataset_version.DatasetVersion,
 fairgraph.openminds.core.products.dataset_version.DatasetVersion,
 fairgraph.openminds.core.products.dataset_version.DatasetVersion,
 fairgraph.openminds.core.products.dataset_version.DatasetVersion,
 fairgraph.openminds.core.products.model.Model,
 fairgraph.openminds.core.products.dataset_version.DatasetVersion,
 fairgraph.openminds.core.products.dataset_version.DatasetVersion,
 fairgraph.openminds.core.products.dataset_version.DatasetVersion,
 fairgraph.openminds.core.products.dataset_version.DatasetVersion,
 fairgraph.openminds.core.products.dataset_version.DatasetVersion]

The mystery is explained! One of the 12 linked nodes is a model, not a dataset version.
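
When the list of links is longer, counting the types is easier than reading them. The sketch below uses `collections.Counter` on empty stand-in classes so that it runs offline. With the real data, the same idea would be `Counter(link.cls.__name__ for link in alzheimers.studied_in)`.

```python
from collections import Counter


class DatasetVersion:
    """Empty stand-in class, used only so that the example runs offline."""


class Model:
    """Empty stand-in class, used only so that the example runs offline."""


# Mirrors the output above: eleven dataset versions and one model.
linked_classes = [DatasetVersion] * 6 + [Model] + [DatasetVersion] * 5

counts = Counter(cls.__name__ for cls in linked_classes)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```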

In [23]:
alzheimers_model = [item for item in alzheimers_datasets2 if isinstance(item, omcore.Model)][0]
alzheimers_model.show()

id                 https://kg.ebrains.eu/api/instances/2b600072-be7f-423f-a979-1e35283acaca
space              model
type               https://openminds.ebrains.eu/core/Model
abstraction_level  KGProxy([<class 'fairgraph.openminds.controlled_terms.model_abstraction_level.ModelAbstractionLevel'>], 'https://kg.ebrains.eu/api/instances/75d660a3-ca3a-4bcc-933b-3bd8b6fea162')
custodians         KGProxy((<class 'fairgraph.openminds.core.actors.consortium.Consortium'>, <class 'fairgraph.openminds.core.actors.organization.Organization'>, <class 'fairgraph.openminds.core.actors.person.Person'>), 'https://kg.ebrains.eu/api/instances/9a729478-8ad1-485c-b9ad-9a8a3d767f3d')
description        Age-dependent accumulation of amyloid-b, provoking increasing brain amyloidopathy, triggers abnormal patterns of neuron activity and circuit synchronization in Alzheimer’s disease (AD) as observed in human AD patients and AD mouse models. Recent studies on AD mouse models, mimicking this age-dependent amyloid

## 6. Writing to the Knowledge Graph

When sharing scientific data and the associated metadata,
especially sensitive or pre-publication data,
it is important to consider who has access to the data,
and who has permissions to create or modify metadata.

The EBRAINS Knowledge Graph is divided into different spaces,
each with its own access control permissions.
By default, the core API searches across all spaces for which a user has "read" permission,
although it is possible to restrict searches to specific spaces.

Public metadata in the EBRAINS KG are stored in curated spaces,
to which only nominated curators have write access.
This ensures high standards of quality control.

However, each user has their own private space in the KG, called "myspace",
and users may also create shared private spaces,
associated with a collab workspace in the [EBRAINS Collaboratory](https://wiki.ebrains.eu/).

In this part of the tutorial, we will create our own openMINDS nodes and save them to our "myspace".
Let's try adding some people. To see what properties the `Person` type has, we can examine the `property_names` attribute:

In [27]:
omcore.Person.property_names

['affiliations',
 'alternate_names',
 'associated_accounts',
 'contact_information',
 'digital_identifiers',
 'family_name',
 'given_name',
 'activities',
 'comments',
 'coordinated_projects',
 'developed',
 'funded',
 'is_custodian_of',
 'is_owner_of',
 'is_provider_of',
 'manufactured',
 'published',
 'started']

To see more information, like what Python type each property expects, and whether a property can contain multiple items in a list, we can use the `properties` attribute:

In [29]:
omcore.Person.properties

[Property(name='affiliations', types=(<class 'fairgraph.openminds.core.actors.affiliation.Affiliation'>,), path='vocab:affiliation', required=False, multiple=True),
 Property(name='alternate_names', types=(<class 'str'>,), path='vocab:alternateName', required=False, multiple=True),
 Property(name='associated_accounts', types=(<class 'fairgraph.openminds.core.actors.account_information.AccountInformation'>,), path='vocab:associatedAccount', required=False, multiple=True),
 Property(name='contact_information', types=(<class 'fairgraph.openminds.core.actors.contact_information.ContactInformation'>,), path='vocab:contactInformation', required=False, multiple=False),
 Property(name='digital_identifiers', types=(<class 'fairgraph.openminds.core.digital_identifier.orcid.ORCID'>,), path='vocab:digitalIdentifier', required=False, multiple=True),
 Property(name='family_name', types=(<class 'str'>,), path='vocab:familyName', required=False, multiple=False),
 Property(name='given_name', types=(<cl
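
The `types`, `required` and `multiple` fields shown in this output are enough to sanity-check a value before assigning it. The sketch below shows one way such a check might look. `ToyProperty` mirrors only the field names printed above, and `check_value` is a hypothetical helper; neither is part of fairgraph.

```python
from typing import NamedTuple, Tuple


class ToyProperty(NamedTuple):
    """Mirrors the fields printed above; not fairgraph's Property class."""
    name: str
    types: Tuple[type, ...]
    required: bool
    multiple: bool


def check_value(prop, value):
    """Return a list of problems found when assigning value to prop."""
    if value is None:
        return [f"{prop.name} is required"] if prop.required else []
    items = value if isinstance(value, (list, tuple)) else [value]
    problems = []
    if len(items) > 1 and not prop.multiple:
        problems.append(f"{prop.name} accepts a single value")
    for item in items:
        if not isinstance(item, prop.types):
            problems.append(f"{prop.name}: unexpected type {type(item).__name__}")
    return problems


family_name = ToyProperty("family_name", (str,), required=False, multiple=False)
alternate_names = ToyProperty("alternate_names", (str,), required=False, multiple=True)

print(check_value(family_name, "Laurel"))
print(check_value(family_name, ["Laurel", "Hardy"]))
print(check_value(alternate_names, ["Ollie", 42]))
```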

We'll stick to something simple: setting the "given_name", "family_name" and "affiliations" properties.

In [31]:
mgm = omcore.Organization(full_name="Metro-Goldwyn-Mayer", short_name="MGM")
stan = omcore.Person(given_name="Stan", family_name="Laurel", affiliations=omcore.Affiliation(member_of=mgm))
ollie = omcore.Person(given_name="Oliver", family_name="Hardy", alternate_names=["Ollie"], affiliations=omcore.Affiliation(member_of=mgm))

Now we can save these as nodes in the KG. 

Note that we use the "recursive" option, so that the "mgm" node will also be saved, as it is a child of the "stan" and "ollie" nodes. Alternatively, we could have saved "mgm" explicitly, and then saved "stan" and "ollie" with `recursive=False`.
In general, it is safer and faster to use `recursive=False` and to manage saving all nodes explicitly, but it does take more effort.

In [None]:
stan.save(kg_client, space="myspace", recursive=True)
ollie.save(kg_client, space="myspace", recursive=True)

Let's check this worked:

In [34]:
for person in omcore.Person.list(kg_client, scope="in progress", space="myspace"):
    print(f"- {person.full_name} ({person.uuid})")

- Bilbo Baggins (39a51d78-a181-402b-a6ee-cee868a3f864)
- Oliver Hardy (b359d7f3-8e7a-440b-a4d4-9c393f428508)
- Stan Laurel (c187464f-4529-417c-9c1b-84496922e8b5)


One more important thing to note: when querying the KG to get a list of people,
we pass the option `scope="in progress"`.

When first creating a new node in the KG, it is set to the status "In progress".
Only after quality control checks, and when we wish to publish the metadata,
is the node set to "Released" status.

What happens if we create a new Stan Laurel node and save it?

In [35]:
new_stan = omcore.Person(given_name="Stan", family_name="Laurel", affiliations=omcore.Affiliation(member_of=mgm))
new_stan.save(kg_client, space="myspace", recursive=True)

for person in omcore.Person.list(kg_client, scope="in progress", space="myspace"):
    print(f"- {person.full_name} ({person.uuid})")

- Bilbo Baggins (39a51d78-a181-402b-a6ee-cee868a3f864)
- Oliver Hardy (b359d7f3-8e7a-440b-a4d4-9c393f428508)
- Stan Laurel (c187464f-4529-417c-9c1b-84496922e8b5)


We have the same list, and our local Python objects `stan` and `new_stan` have the same IDs.

In [38]:
print(stan.uuid)
print(new_stan.uuid)

c187464f-4529-417c-9c1b-84496922e8b5
c187464f-4529-417c-9c1b-84496922e8b5


This happens because fairgraph checks whether a node already exists in the KG before creating a new one. To do this, it uses the attribute `existence_query_properties`:

In [39]:
omcore.Person.existence_query_properties

('given_name', 'family_name')

so any `Person` object with the same "given_name" and "family_name" will be considered to represent the same person, even if other properties, such as "affiliations", differ between the two objects.
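
The effect of matching on a fixed set of properties can be reproduced with a toy in-memory store. This is a sketch of the idea only: `ToyStore` is hypothetical, and fairgraph's real existence check queries the KG rather than a local dictionary.

```python
from uuid import uuid4


class ToyStore:
    """In-memory stand-in for the KG that deduplicates on chosen properties."""

    def __init__(self, existence_query_properties):
        self.key_properties = existence_query_properties
        self.nodes = {}  # maps key tuple -> stored record

    def save(self, record):
        key = tuple(record[name] for name in self.key_properties)
        if key in self.nodes:
            # A matching node exists: reuse its ID instead of creating a new node.
            record["uuid"] = self.nodes[key]["uuid"]
        else:
            record["uuid"] = str(uuid4())
        self.nodes[key] = record
        return record["uuid"]


store = ToyStore(("given_name", "family_name"))
first = store.save({"given_name": "Stan", "family_name": "Laurel", "alternate_names": []})
second = store.save({"given_name": "Stan", "family_name": "Laurel", "alternate_names": ["Stanley"]})
print(first == second, len(store.nodes))
```

Both saves return the same identifier even though "alternate_names" differs, because only the key properties take part in the match.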

If you really need to add a new object with the same "existence query" properties, you could create the object with slightly modified attributes, then change it back to the original value and save the modifications.

In [41]:
new_stan = omcore.Person(given_name="Stan_CHANGETHIS", family_name="Laurel")
new_stan.save(kg_client, space="myspace")
new_stan.given_name = "Stan"
new_stan.save(kg_client)

print(new_stan.uuid)

for person in omcore.Person.list(kg_client, scope="in progress", space="myspace"):
    print(f"- {person.full_name} ({person.uuid})")


dae57942-ea76-4283-9439-b368b5e26fb6
- Bilbo Baggins (39a51d78-a181-402b-a6ee-cee868a3f864)
- Oliver Hardy (b359d7f3-8e7a-440b-a4d4-9c393f428508)
- Stan Laurel (c187464f-4529-417c-9c1b-84496922e8b5)
- Stan Laurel (dae57942-ea76-4283-9439-b368b5e26fb6)


(Note that when saving modifications to an existing node, you don't need to specify the space, since fairgraph keeps track of this).

The alternative approach is to generate the ID locally:

In [42]:
from uuid import uuid4

another_new_stan = omcore.Person(
    given_name="Stan",
    family_name="Laurel",
    id=kg_client.uri_from_uuid(uuid4())
)
another_new_stan.save(kg_client, space="myspace")

for person in omcore.Person.list(kg_client, scope="in progress", space="myspace"):
    print(f"- {person.full_name} ({person.uuid})")


- Bilbo Baggins (39a51d78-a181-402b-a6ee-cee868a3f864)
- Oliver Hardy (b359d7f3-8e7a-440b-a4d4-9c393f428508)
- Stan Laurel (c187464f-4529-417c-9c1b-84496922e8b5)
- Stan Laurel (dae57942-ea76-4283-9439-b368b5e26fb6)
- Stan Laurel (e0f45937-1948-44ac-accd-d6045c2494b9)


But there was only one Stan Laurel, so let's delete the extra ones!

In [43]:
new_stan.delete(kg_client)
another_new_stan.delete(kg_client)

for person in omcore.Person.list(kg_client, scope="in progress", space="myspace"):
    print(f"- {person.full_name} ({person.uuid})")

- Bilbo Baggins (39a51d78-a181-402b-a6ee-cee868a3f864)
- Oliver Hardy (b359d7f3-8e7a-440b-a4d4-9c393f428508)
- Stan Laurel (c187464f-4529-417c-9c1b-84496922e8b5)


![Stan Laurel (1920) from Wikimedia Commons. This work is in the Public Domain.](https://upload.wikimedia.org/wikipedia/commons/thumb/3/34/Stan_Laurel_c1920.jpg/183px-Stan_Laurel_c1920.jpg)

## 7. Conclusions

This is the end of the tutorial.
The aim was to give you an understanding of some of the key concepts, and to give you experience with querying, retrieving, and creating metadata nodes in the EBRAINS Knowledge Graph, using fairgraph.

You may also be interested in our tutorial on [working with the EBRAINS Knowledge Graph using JavaScript](https://github.com/apdavison/ebrains-kg-tutorial-javascript).