Version: 12.12.2018

## Description

This notebook showcases a **recommended workflow** for preparing the modeling parameters annotated with NeuroCurator for integration into the **Blue Brain Knowledge Graph**, which is backed by [Blue Brain Nexus](https://bluebrain.github.io/nexus/).

The four main parts of the workflow are:
1. Prepare schemas from Neuroshapes. Publish them in Nexus.
2. Transform source data into JSON-LD / RDF. Register them in Nexus. Recover from errors. Profile invalid data.
3. Create test data for schema validation in Neuroshapes.
4. Clean, _if needed_, created data in Nexus.

For the features of the underlying toolkit, please refer to the [README](https://github.com/BlueBrain/nat/blob/master/kg/README.md) in the `kg` directory of the `nat` repository.

## Getting Started

```
mkdir repos
cd repos

git clone -b prov_literature_annotation --depth=1 https://github.com/INCF/neuroshapes.git
git clone -b master --depth=1 https://github.com/BlueBrain/corpus-thalamus.git
git clone -b master --depth=1 https://github.com/BlueBrain/nat.git

conda create -yn kg_env python=3.7 jupyter requests
conda activate kg_env
pip install --extra-index-url https://testpypi.python.org/pypi pyxus==0.5.1

jupyter notebook ./nat/kg/"Literature Annotation - Knowledge Graph.ipynb"
```

In [1]:
cd ../..

/Users/fonta/repos


## User variables

In [102]:
with open("TOKEN", "r") as f:
    TOKEN = f.readline().rstrip()

In [3]:
SCHEMA_VERSION = "v0.2.0"

In [4]:
DEPLOYMENT = "https://bbp-nexus.epfl.ch/staging/v0"

In [5]:
AGENT_IRI_BASE = f"{DEPLOYMENT}/data/thalamusproject/contributor/agent/v0.1.0"

In [6]:
NEUROSHAPES_DIR = "./neuroshapes"

In [7]:
ANNOTATION_DIR = "./corpus-thalamus"

In [8]:
UTILS_DIR = "./nat/kg"

## Helpers

In [9]:
%load_ext autoreload

In [10]:
import sys

In [11]:
sys.path.append(UTILS_DIR)

In [12]:
%autoreload 1

In [13]:
%aimport nexus_utils
%aimport kg_utils
%aimport parameters
%aimport annotations

In [14]:
from nexus_utils import *
from kg_utils import *
from parameters import *
from annotations import *

## Nexus client

In [15]:
client = init_client(TOKEN, DEPLOYMENT)

## Configuration

### Commons

In [16]:
commons_pipeline = PipelineConfiguration(NEUROSHAPES_DIR, client, "neurosciencegraph", "commons")

### Core

In [17]:
core_pipeline = PipelineConfiguration(NEUROSHAPES_DIR, client, "neurosciencegraph", "core")

### Literature Annotation

In [18]:
organization = "literatureannotation"
organization_desc = "Data extracted from the scientific literature."

In [19]:
domain = "modelingparameter"
domain_desc = "Modeling parameters extracted from the scientific literature."

In [20]:
pipeline = PipelineConfiguration(NEUROSHAPES_DIR, client, organization, domain, organization_desc, domain_desc)

In [21]:
%%time
created_organization = pipeline.create_organization()

<already created> literatureannotation
CPU times: user 17.3 ms, sys: 4.74 ms, total: 22.1 ms
Wall time: 314 ms


In [22]:
%%time
created_domain = pipeline.create_domain()

<already created> modelingparameter
CPU times: user 16.5 ms, sys: 2.47 ms, total: 19 ms
Wall time: 78 ms


**For demonstration:**

In [23]:
pipeline.is_organization_created()

True

In [24]:
created_organization

<pyxus.resources.entity.Organization at 0x106bf4da0>

In [25]:
pipeline.is_domain_created()

True

In [26]:
created_domain

<pyxus.resources.entity.Domain at 0x106c097f0>

## Contexts

In [27]:
contexts = [
    ("schema", "v0.2.0"),
    ("data", "v1.0.6"),
]

In [28]:
prepared_contexts = core_pipeline.prepare_contexts(contexts)

<prepared> schema/v0.2.0
<prepared> data/v1.0.6


In [29]:
%%time
created_contexts = core_pipeline.create_contexts(prepared_contexts, publish=True)

<already created> schema/v0.2.0
<already created> data/v1.0.6
CPU times: user 35.5 ms, sys: 4.89 ms, total: 40.3 ms
Wall time: 188 ms


In [30]:
data_context_iri = created_contexts["data/v1.0.6"].get_self_link()

**For demonstration:**

In [31]:
data_context_iri

'https://bbp-nexus.epfl.ch/staging/v0/contexts/neurosciencegraph/core/data/v1.0.6'

In [32]:
created_contexts["data/v1.0.6"]

<pyxus.resources.entity.Context at 0x106c2b080>

In [33]:
core_pipeline.is_context_created("data", "v1.0.6")

True

In [34]:
prettify(core_pipeline.are_contexts_created(contexts))

{
  "schema/v0.2.0": true,
  "data/v1.0.6": true
}


In [35]:
prepared_data_context = core_pipeline.prepare_context("data", "v1.0.6")

<prepared> data/v1.0.6


In [36]:
core_pipeline.create_context("data", "v1.0.6", prepared_data_context, publish=True)

<already created> data/v1.0.6


<pyxus.resources.entity.Context at 0x106c2bac8>

## Schemas

### Commons

In [37]:
commons_schemas = [
    ("parameter", SCHEMA_VERSION),
    ("variables", SCHEMA_VERSION),
    ("selectors", SCHEMA_VERSION),
    ("annotation", SCHEMA_VERSION),
]

In [38]:
prepared_commons_schemas = commons_pipeline.prepare_schemas(commons_schemas)

<prepared> parameter/v0.2.0
<prepared> variables/v0.2.0
<prepared> selectors/v0.2.0
<prepared> annotation/v0.2.0


In [39]:
%%time
created_commons_schemas = commons_pipeline.create_schemas(prepared_commons_schemas, publish=True)

<already created> parameter/v0.2.0
<already created> variables/v0.2.0
<already created> selectors/v0.2.0
<already created> annotation/v0.2.0
CPU times: user 79.2 ms, sys: 8.48 ms, total: 87.7 ms
Wall time: 323 ms


**For demonstration:**

In [40]:
created_commons_schemas["parameter/v0.2.0"]

<pyxus.resources.entity.Schema at 0x106c6ec50>

In [41]:
commons_pipeline.is_schema_created("parameter", "v0.2.0")

True

In [42]:
prettify(commons_pipeline.are_schemas_created(commons_schemas))

{
  "parameter/v0.2.0": true,
  "variables/v0.2.0": true,
  "selectors/v0.2.0": true,
  "annotation/v0.2.0": true
}


In [43]:
prepared_parameter_schema = core_pipeline.prepare_schema("annotation", "v0.2.0")

<prepared> annotation/v0.2.0


In [44]:
core_pipeline.create_schema("parameter", "v0.2.0", prepared_parameter_schema, publish=True)

<already created> parameter/v0.2.0


<pyxus.resources.entity.Schema at 0x106c47278>

### Core

In [45]:
core_schemas = [
    ("annotation", SCHEMA_VERSION),
    ("selectors", SCHEMA_VERSION),
    ("variables", SCHEMA_VERSION),
]

In [46]:
prepared_core_schemas = core_pipeline.prepare_schemas(core_schemas)

<prepared> annotation/v0.2.0
<prepared> selectors/v0.2.0
<prepared> variables/v0.2.0


In [47]:
%%time
created_core_schemas = core_pipeline.create_schemas(prepared_core_schemas, publish=True)

<already created> annotation/v0.2.0
<already created> selectors/v0.2.0
<already created> variables/v0.2.0
CPU times: user 54.8 ms, sys: 6.1 ms, total: 60.9 ms
Wall time: 257 ms


### Literature Annotation

In [48]:
schemas = [
    ("parameterannotation", SCHEMA_VERSION),
    ("pointvalueparameter", SCHEMA_VERSION),
    ("numericaltraceparameter", SCHEMA_VERSION),
    ("functionparameter", SCHEMA_VERSION),
]

In [49]:
prepared_schemas = pipeline.prepare_schemas(schemas, is_user_domain=True)

<prepared> parameterannotation/v0.2.0
<prepared> pointvalueparameter/v0.2.0
<prepared> numericaltraceparameter/v0.2.0
<prepared> functionparameter/v0.2.0


In [50]:
%%time
created_schemas = pipeline.create_schemas(prepared_schemas, publish=True)

<already created> parameterannotation/v0.2.0
<already created> pointvalueparameter/v0.2.0
<already created> numericaltraceparameter/v0.2.0
<already created> functionparameter/v0.2.0
CPU times: user 78.1 ms, sys: 8.21 ms, total: 86.3 ms
Wall time: 334 ms


## Raw annotations

In [51]:
import json
from pathlib import Path
from itertools import chain

In [52]:
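# TODO To be done with NAT in the future.
def raw_annotations_from_dir(repo_dir):
    """Yield every annotation found in the .pcr files at the top level of repo_dir."""
    path = Path(repo_dir)
    files = path.glob("*.pcr")
    for x in files:
        # Skip empty files: json.load() would fail on them.
        if x.stat().st_size > 0:
            with x.open("r", encoding="utf-8") as f:
                # Each file holds a JSON array of annotations.
                json_obj = json.load(f)
                for y in json_obj:
                    yield y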

In [53]:
# TODO To be done with NAT in the future.
def raw_annotations_stats(raw_annotations):
    annotated_paper_count = len(set(x["pubId"] for x in raw_annotations))
    print(annotated_paper_count, "papers")  # >= 113

    annotation_count = len(raw_annotations)
    print(annotation_count, "annotations")  # >= 582

    parameter_count = len(list(chain.from_iterable(x["parameters"] for x in raw_annotations)))
    print(parameter_count, "parameters")  # >= 485

In [54]:
unfiltered_raw_annotations = list(raw_annotations_from_dir(ANNOTATION_DIR))

In [55]:
raw_annotations_stats(unfiltered_raw_annotations)
# On 24.09.18:
# 113 papers
# 582 annotations
# 485 parameters

113 papers
582 annotations
485 parameters
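
To try the loading and counting logic without cloning `corpus-thalamus`, the sketch below replays it on a temporary directory. The file names and record contents are made up for illustration; only the `pubId` and `parameters` keys come from the code above.

```python
import json
import tempfile
from itertools import chain
from pathlib import Path

def annotations_from_dir(repo_dir):
    """Yield the annotations stored in the non-empty .pcr files of repo_dir."""
    for path in sorted(Path(repo_dir).glob("*.pcr")):
        if path.stat().st_size > 0:
            with path.open("r", encoding="utf-8") as f:
                yield from json.load(f)

# Hypothetical content: two annotations on the same paper, one without parameters.
fake_annotations = [
    {"pubId": "paper-1", "parameters": [{"id": "p1"}, {"id": "p2"}]},
    {"pubId": "paper-1", "parameters": []},
]

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "paper-1.pcr").write_text(json.dumps(fake_annotations), encoding="utf-8")
    (Path(tmp) / "empty.pcr").write_text("", encoding="utf-8")  # skipped by the size check
    loaded = list(annotations_from_dir(tmp))

paper_count = len({x["pubId"] for x in loaded})
parameter_count = len(list(chain.from_iterable(x["parameters"] for x in loaded)))
print(paper_count, "papers,", len(loaded), "annotations,", parameter_count, "parameters")
```

Papers are counted by distinct `pubId`, so several annotations on the same publication count as one paper.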


### Filter raw annotations - Work In Progress

The following are excluded from the raw annotations:

**Annotations of facts** (i.e. without parameters), because they are not yet modeled in the SHACL shapes in Neuroshapes.

Annotations with at least one **parameter of type function**, because this type of parameter refers to other parameters and handling their IDs is not straightforward with Nexus v0.

Annotations with a **DOI containing a `:`**, because Nexus v0 expects a URI and such a DOI is not a valid one.

In [56]:
def is_selected_raw_annotation(x):
    return (x["parameters"]
            and "function" not in {p["description"]["type"] for p in x["parameters"]}
            and ":" not in x["pubId"])

In [57]:
raw_annotations = [x for x in unfiltered_raw_annotations if is_selected_raw_annotation(x)]

In [58]:
raw_annotations_stats(raw_annotations)
# On 24.09.18:
# 78 papers
# 315 annotations
# 478 parameters

78 papers
315 annotations
478 parameters
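
The effect of the three exclusion rules can be checked on a handful of toy records. This is only a sketch: the `pubId` values and the `pointValue` type name are invented, and real annotations carry many more keys.

```python
def is_selected(annotation):
    """Keep annotations that have parameters, no function parameter, and no ':' in pubId."""
    parameters = annotation["parameters"]
    types = {p["description"]["type"] for p in parameters}
    return bool(parameters) and "function" not in types and ":" not in annotation["pubId"]

# Hypothetical records, one per exclusion rule plus one that passes.
samples = [
    {"pubId": "10.1000/example.1", "parameters": []},  # a fact: no parameters
    {"pubId": "10.1000/example.2", "parameters": [{"description": {"type": "function"}}]},
    {"pubId": "10.1000/a:b", "parameters": [{"description": {"type": "pointValue"}}]},
    {"pubId": "10.1000/example.3", "parameters": [{"description": {"type": "pointValue"}}]},
]

kept = [x["pubId"] for x in samples if is_selected(x)]
print(kept)
```

Only the last record satisfies all three conditions.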


## Parameters

### Variable type labels

In [59]:
import csv
import requests

In [60]:
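# TODO To be done through an Ontology Service serving these terms in the future.
def variable_type_labels():
    """Yield (type, label) pairs from the NAT modeling dictionary."""
    response = requests.get(
        "https://raw.githubusercontent.com/BlueBrain/nat/master/nat/data/modelingDictionary.csv")
    # Fail early on an HTTP error instead of parsing an error page as CSV.
    response.raise_for_status()
    content = response.content.decode("utf-8")
    reader = csv.reader(content.splitlines(), delimiter=";")
    for x in reader:
        # Skip empty lines and comment lines.
        if x and not x[0].startswith("#"):
            yield x[0], x[2]  # type, label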

In [61]:
variable_labels_mapping = dict(variable_type_labels())

**For demonstration:**

In [62]:
variable_labels_mapping["BBP-121012"]

'interbouton_distance'
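
The parsing step can be exercised offline on an inline excerpt. The rows below are hypothetical: the identifiers, the labels, and the meaning of the middle column are assumptions, and only the `;` delimiter, the `#` comment convention, and the column positions come from the function above.

```python
import csv

# Hypothetical excerpt: identifier in column 0, label in column 2.
sample = """\
# id;other;label
BBP-000001;;example_label_a
BBP-000002;BBP-000001;example_label_b

"""

def type_labels(text):
    """Yield (type, label) pairs, skipping empty and comment lines."""
    reader = csv.reader(text.splitlines(), delimiter=";")
    for row in reader:
        if row and not row[0].startswith("#"):
            yield row[0], row[2]

mapping = dict(type_labels(sample))
print(mapping["BBP-000002"])
```

`csv.reader` returns an empty list for a blank line, which is why the `if row` test comes before indexing.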

### Transformation

In [63]:
parameters = list(transform_parameters(raw_annotations, data_context_iri, variable_labels_mapping))

### Registration

#### Selection

In [64]:
pv_parameters = list(select_by_type("nsg:PointValueParameter", parameters))
len(pv_parameters)
# On 24.09.18:
# 427

427

In [65]:
nt_parameters = list(select_by_type("nsg:NumericalTraceParameter", parameters))
len(nt_parameters)
# On 22.10.18:
# 51

51

In [66]:
# TODO Upcoming. See filtering section.
# f_parameters = list(select_by_type("nsg:FunctionParameter", parameters))
# len(f_parameters)

#### Push

Invalid parameters - Work In Progress

In [67]:
def condition_1(x):
    return "nsg:series" not in x["nsg:dependentVariable"]

def condition_2(x):
    return ("nsg:SimpleNumericalVariable" in x["nsg:dependentVariable"]["@type"]
            and "schema:unitCode" not in x["nsg:dependentVariable"]["nsg:series"])

In [68]:
invalid_pv_parameter_idxs = profile(pv_parameters, [condition_1, condition_2], flatten=True)
# On 24.09.18:
# <count> condition_1 15
# <count> condition_2 1

<count> condition_1 15
<count> condition_2 1
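
The implementation of `profile()` lives in the toolkit and is not shown in this notebook. As a rough mental model only, a function with a similar interface could look like the sketch below; `profile_sketch`, the toy conditions, and the toy items are all hypothetical.

```python
def profile_sketch(items, conditions, flatten=False):
    """Report, for each condition, the indexes of the items that match it."""
    matches = {c.__name__: [i for i, x in enumerate(items) if c(x)] for c in conditions}
    for name, idxs in matches.items():
        print("<count>", name, len(idxs))
    if flatten:
        # Merge the per-condition indexes into one sorted list without duplicates.
        return sorted(set().union(*matches.values()))
    return matches

def missing_series(x):
    return "series" not in x

def missing_unit(x):
    return "series" in x and "unit" not in x["series"]

# Hypothetical items: the second lacks a series, the third lacks a unit.
items = [
    {"series": {"value": 1, "unit": "MM"}},
    {},
    {"series": {"value": 2}},
]

invalid_idxs = profile_sketch(items, [missing_series, missing_unit], flatten=True)
print(invalid_idxs)
```

Returning indexes rather than the items themselves makes it easy to skip the invalid entries later without modifying the original list.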


Point value parameters

In [69]:
len(pv_parameters) - len(invalid_pv_parameter_idxs)
# On 24.09.18:
# 411

411

In [70]:
%%time
pipeline.create_instances("pointvalueparameter", SCHEMA_VERSION, pv_parameters,
                        exclude_idxs=invalid_pv_parameter_idxs)

<count> 411
CPU times: user 15 s, sys: 1.05 s, total: 16 s
Wall time: 2min 35s


Numerical trace parameters

In [71]:
len(nt_parameters)
# On 22.10.18:
# 51

51

In [72]:
%%time
pipeline.create_instances("numericaltraceparameter", SCHEMA_VERSION, nt_parameters)

<count> 51
CPU times: user 1.74 s, sys: 120 ms, total: 1.86 s
Wall time: 19.6 s


Function parameters

In [73]:
# TODO Upcoming. See filtering section.
# len(f_parameters)

In [74]:
# TODO Upcoming. See filtering section.
# %%time
# pipeline.create_instances("functionparameter", SCHEMA_VERSION, f_parameters)

**For demonstration:**

In [75]:
invalid_idx = invalid_pv_parameter_idxs[0]

In [76]:
invalid_idx
# On 24.09.18:
# 64

64

In [77]:
pipeline.create_instance("pointvalueparameter", SCHEMA_VERSION, pv_parameters[invalid_idx])

<error>
{
  "violations": [
    "Error: Violation Error(<http://www.w3.org/ns/shacl#ShapesFailed>). Node(_:ba19c2446eab6c2e395d991a3b478824) Failed property shapes Node: _:ba19c2446eab6c2e395d991a3b478824, Constraint: _:c6219e9851a259d11e5b571b3d3e1357, path: PredicatePath(<>)",
    "Error: Violation Error(<http://www.w3.org/ns/shacl#minCountError>). Node(_:ba19c2446eab6c2e395d991a3b478824) MinCount violation. Expected 2, obtained: 0 Node: _:ba19c2446eab6c2e395d991a3b478824, Constraint: _:ccd251a10818a3c994f633af5ec577a1, path: PredicatePath(<https://bbp-nexus.epfl.ch/vocabs/bbp/neurosciencegraph/core/v0.1.0/series>)"
  ],
  "code": "ShapeConstraintViolations",
  "@context": "https://bbp-nexus.epfl.ch/staging/v0/contexts/nexus/core/error/v0.1.0"
}
<data>
{
  "@context": "https://bbp-nexus.epfl.ch/staging/v0/contexts/neurosciencegraph/core/data/v1.0.6",
  "@type": [
    "nsg:PointValueParameter"
  ],
  "schema:name": "p-connectivity-274c8102",
  "nsg:providerId": "274c8102-ffde-11e5-a94

In [78]:
pipeline.create_instances("pointvalueparameter", SCHEMA_VERSION, pv_parameters, start_idx=invalid_idx)

<error>
{
  "violations": [
    "Error: Violation Error(<http://www.w3.org/ns/shacl#ShapesFailed>). Node(_:9a265e08fbe210860ee20ec75423a482) Failed property shapes Node: _:9a265e08fbe210860ee20ec75423a482, Constraint: _:296ddc0c293f5e5646ee61a4918e1a05, path: PredicatePath(<>)",
    "Error: Violation Error(<http://www.w3.org/ns/shacl#minCountError>). Node(_:9a265e08fbe210860ee20ec75423a482) MinCount violation. Expected 2, obtained: 0 Node: _:9a265e08fbe210860ee20ec75423a482, Constraint: _:5320f2ec34f2424ccd74a28548459d57, path: PredicatePath(<https://bbp-nexus.epfl.ch/vocabs/bbp/neurosciencegraph/core/v0.1.0/series>)"
  ],
  "code": "ShapeConstraintViolations",
  "@context": "https://bbp-nexus.epfl.ch/staging/v0/contexts/nexus/core/error/v0.1.0"
}
<data>
{
  "@context": "https://bbp-nexus.epfl.ch/staging/v0/contexts/neurosciencegraph/core/data/v1.0.6",
  "@type": [
    "nsg:PointValueParameter"
  ],
  "schema:name": "p-connectivity-274c8102",
  "nsg:providerId": "274c8102-ffde-11e5-a94

## [Nexus v0] Parameter IDs mapping

In [79]:
len(parameters) - len(invalid_pv_parameter_idxs)
# On 24.09.18:
# 462

462

In [87]:
# Note: Wait a moment after pushing, so that Nexus has made the instances available for retrieve_all_results().
created_parameters_search = pipeline.instances_of_domain(resolve=True)

<count> 462


In [88]:
created_parameters = pipeline.retrieve_all_results(created_parameters_search)

In [89]:
parameter_uuid_mapping = dict(uuid_iri_mapping(created_parameters))

**For demonstration:**

In [90]:
created_parameters_search

<pyxus.resources.entity.SearchResultList at 0x107beed30>

In [91]:
created_parameters[0]

<pyxus.resources.entity.Instance at 0x107dc2438>

In [92]:
parameter_uuid_mapping["90c530b4-6e05-11e6-873d-64006a4c56ef"]

'https://bbp-nexus.epfl.ch/staging/v0/data/literatureannotation/modelingparameter/numericaltraceparameter/v0.2.0/f647de10-ce98-4530-99c9-47b5a299b4a1'
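
Conceptually, the mapping pairs the UUID assigned by the annotation tool with the IRI assigned by Nexus. The real `uuid_iri_mapping()` receives pyxus objects whose structure is not shown here, so the sketch below uses plain dictionaries instead; the `@id` key, the UUIDs, and the IRIs are assumptions made for illustration.

```python
def uuid_iri_pairs(instances):
    """Yield (provider UUID, Nexus IRI) pairs from dictionary-like instances."""
    for x in instances:
        yield x["nsg:providerId"], x["@id"]

# Hypothetical instances as they might look once retrieved.
instances = [
    {"@id": "{base}/pointvalueparameter/v0.2.0/nexus-id-1", "nsg:providerId": "uuid-1"},
    {"@id": "{base}/numericaltraceparameter/v0.2.0/nexus-id-2", "nsg:providerId": "uuid-2"},
]

mapping = dict(uuid_iri_pairs(instances))
print(mapping["uuid-2"])
```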

In [93]:
pipeline.instances_by_schema("pointvalueparameter", SCHEMA_VERSION).results[0]

<count> 411


<pyxus.resources.entity.SearchResult at 0x106ce2c88>

In [94]:
pipeline.instances_by_schema("pointvalueparameter", SCHEMA_VERSION, resolve=True).results[0]

<count> 411


<pyxus.resources.entity.Instance at 0x10715f048>

## Annotations

### Transformation

Invalid annotations because of invalid parameters - Work In Progress

In [95]:
invalid_pv_parameter_uuids = {pv_parameters[i]["nsg:providerId"] for i in invalid_pv_parameter_idxs}

In [96]:
valid_raw_annotations = [x for x in raw_annotations
                         if not {y["id"] for y in x["parameters"]} & invalid_pv_parameter_uuids]

In [97]:
len(raw_annotations) - len(valid_raw_annotations)
# On 24.09.18:
# 10

10
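
The set intersection drops an annotation as soon as one of its parameters is invalid, even if the others are fine. A toy run with invented identifiers:

```python
# Hypothetical data: parameter "p2" was profiled as invalid.
invalid_uuids = {"p2"}
raw = [
    {"id": "a1", "parameters": [{"id": "p1"}]},
    {"id": "a2", "parameters": [{"id": "p2"}, {"id": "p3"}]},
    {"id": "a3", "parameters": [{"id": "p4"}]},
]

# An empty intersection means that none of the parameters is invalid.
valid = [x for x in raw if not {p["id"] for p in x["parameters"]} & invalid_uuids]
print([x["id"] for x in valid])
```

Here `a2` is removed although `p3` is valid, because the annotation would otherwise reference a parameter that was never registered.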

Normal flow

In [98]:
annotations = list(transform_annotations(valid_raw_annotations, data_context_iri, AGENT_IRI_BASE,
                                         parameter_uuid_mapping))

### Registration

#### Push

In [99]:
len(annotations)
# On 22.10.18:
# 305

305

In [100]:
%%time
pipeline.create_instances("parameterannotation", SCHEMA_VERSION, annotations)

<count> 305
CPU times: user 11.8 s, sys: 870 ms, total: 12.7 s
Wall time: 2min 16s


**For demonstration:**

In [101]:
pipeline.create_instances("parameterannotation", "v0.0.0", annotations)

<error> Schema does not exist!
<index> 0
<count> 0
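
The outputs above suggest that a batch push stops at the first error and reports where it stopped, which is what makes `start_idx` and `exclude_idxs` useful for recovery. The sketch below models that behaviour under this assumption; `push_all` and `fake_push` are hypothetical stand-ins, not the toolkit's code.

```python
def push_all(items, push_one, start_idx=0, exclude_idxs=()):
    """Push items in order, stop at the first failure, and return the number pushed."""
    excluded = set(exclude_idxs)
    count = 0
    for i in range(start_idx, len(items)):
        if i in excluded:
            continue
        try:
            push_one(items[i])
        except ValueError as e:
            print("<error>", e)
            print("<index>", i)
            break
        count += 1
    print("<count>", count)
    return count

def fake_push(item):
    """Stand-in for a registration call: rejects items marked as bad."""
    if item == "bad":
        raise ValueError("validation failed")

items = ["a", "bad", "c"]
first_try = push_all(items, fake_push)                       # stops at index 1
skipping = push_all(items, fake_push, exclude_idxs=[1])      # skips the known bad item
resuming = push_all(items, fake_push, start_idx=2)           # resumes after the failure
```

Excluding known invalid indexes up front avoids the stop, while resuming from an index avoids pushing the same data twice.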


---

## Neuroshapes test data generation

This part of the workflow is intended to be done **before pushing the schemas and the data**.

In [95]:
cleaning_rules = [
    (DEPLOYMENT, "{{base}}"),
    ("thalamusproject", "neurosciencegraph"),
    (f"{organization}/{domain}", f"neurosciencegraph/{organization}"),
]

In [96]:
neuroshapes_helper = TestDataConfiguration(NEUROSHAPES_DIR, organization, cleaning_rules)
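
How `TestDataConfiguration` applies the cleaning rules is not shown here. Assuming they are ordered string replacements, their effect on an instance IRI would be as sketched below; `apply_rules` and the trailing `some-uuid` are hypothetical.

```python
def apply_rules(text, rules):
    """Apply (old, new) replacements one after the other."""
    for old, new in rules:
        text = text.replace(old, new)
    return text

deployment = "https://bbp-nexus.epfl.ch/staging/v0"
rules = [
    (deployment, "{{base}}"),
    ("thalamusproject", "neurosciencegraph"),
    ("literatureannotation/modelingparameter", "neurosciencegraph/literatureannotation"),
]

iri = deployment + "/data/literatureannotation/modelingparameter/pointvalueparameter/v0.2.0/some-uuid"
cleaned = apply_rules(iri, rules)
print(cleaned)
```

The result no longer depends on the deployment and uses the organization and domain layout expected on the Neuroshapes side.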

### Valid data

#### ParameterAnnotation

In [97]:
pa_schema_name = "parameterannotation"
pa_schema_version = SCHEMA_VERSION
pa_optional_keys = ["nsg:keyword", "nsg:comment", "nsg:experimentalProperty"]

TextPositionTarget

In [98]:
neuroshapes_helper.write(
    find("e6914b74-ed2a-11e5-b291-3417ebb8f5ca", annotations),
    pa_schema_name, pa_schema_version,
    pa_optional_keys,
    suffix="text")

<written> parameterannotation/v0.2.0/auto-all-fields-text.json
<written> parameterannotation/v0.2.0/auto-min-fields-text.json
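
Each call writes two files: one with all fields and one with only the mandatory ones. Assuming the optional keys are removed at the top level of the payload, the minimal variant could be derived as in this sketch; `min_fields` and the sample payload are hypothetical.

```python
def min_fields(data, optional_keys):
    """Return a copy of data without the optional top-level keys."""
    return {k: v for k, v in data.items() if k not in optional_keys}

# Hypothetical payload with one mandatory and two optional keys.
all_fields = {
    "schema:name": "example",
    "nsg:keyword": ["thalamus"],
    "nsg:comment": "free text",
}
optional_keys = ["nsg:keyword", "nsg:comment", "nsg:experimentalProperty"]

minimal = min_fields(all_fields, optional_keys)
print(minimal)
```

Listing an optional key that is absent from the payload, such as `nsg:experimentalProperty` here, is harmless.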


FigureTarget

In [99]:
neuroshapes_helper.write(
    find("1132613e-31b1-11e8-b594-64006a67e5d0", annotations),
    pa_schema_name, pa_schema_version,
    pa_optional_keys,
    suffix="figure")

<written> parameterannotation/v0.2.0/auto-all-fields-figure.json
<written> parameterannotation/v0.2.0/auto-min-fields-figure.json


EquationTarget

In [100]:
neuroshapes_helper.write(
    find("c3af1a82-c5a7-11e5-b8a7-3417ebb8f5ca", annotations),
    pa_schema_name, pa_schema_version,
    pa_optional_keys,
    suffix="equation")

<written> parameterannotation/v0.2.0/auto-all-fields-equation.json
<written> parameterannotation/v0.2.0/auto-min-fields-equation.json


TableTarget

In [101]:
neuroshapes_helper.write(
    find("5ffe9b02-ecf3-11e5-b708-64006a4c56ef", annotations),
    pa_schema_name, pa_schema_version,
    pa_optional_keys,
    suffix="table")

<written> parameterannotation/v0.2.0/auto-all-fields-table.json
<written> parameterannotation/v0.2.0/auto-min-fields-table.json


AreaTarget

In [102]:
neuroshapes_helper.write(
    find("52afbf40-04bc-11e6-b795-64006a4c56ef", annotations),
    pa_schema_name, pa_schema_version,
    pa_optional_keys,
    suffix="area")

<written> parameterannotation/v0.2.0/auto-all-fields-area.json
<written> parameterannotation/v0.2.0/auto-min-fields-area.json


#### PointValueParameter

In [103]:
pvp_schema_name = "pointvalueparameter"
pvp_schema_version = SCHEMA_VERSION
pvp_optional_keys = ["nsg:requiredTag"]

SimpleNumericalVariable

In [104]:
neuroshapes_helper.write(
    find("870f0288-061b-11e6-9d1c-c869cd917532", parameters),
    pvp_schema_name, pvp_schema_version,
    pvp_optional_keys,
    suffix="simple")

<written> pointvalueparameter/v0.2.0/auto-all-fields-simple.json
<written> pointvalueparameter/v0.2.0/auto-min-fields-simple.json


CompoundNumericalVariable


In [105]:
neuroshapes_helper.write(
    find("7516d66e-0bb4-11e6-843d-64006a4c56ef", parameters),
    pvp_schema_name, pvp_schema_version,
    pvp_optional_keys,
    suffix="compound")

<written> pointvalueparameter/v0.2.0/auto-all-fields-compound.json
<written> pointvalueparameter/v0.2.0/auto-min-fields-compound.json


#### NumericalTraceParameter

In [106]:
nt_schema_name = "numericaltraceparameter"
nt_schema_version = SCHEMA_VERSION
nt_optional_keys = ["nsg:requiredTag"]

SimpleNumericalVariable - SimpleNumericalVariable

In [107]:
neuroshapes_helper.write(
    find("0ebdc338-6d38-11e6-b432-64006a4c56ef", parameters),
    nt_schema_name, nt_schema_version,
    nt_optional_keys,
    suffix="simple-simple")

<written> numericaltraceparameter/v0.2.0/auto-all-fields-simple-simple.json
<written> numericaltraceparameter/v0.2.0/auto-min-fields-simple-simple.json


SimpleNumericalVariable - CompoundNumericalVariable

In [108]:
len([x for x in parameters
     if "NumericalTraceParameter" in x["@type"]
     and "SimpleNumericalVariable" in x["nsg:dependentVariable"]["@type"]
     and "CompoundNumericalVariable" in {y["@type"] for y in x["nsg:independentVariable"]}])
# On 24.09.18:
# 0, no data

0

CompoundNumericalVariable - SimpleNumericalVariable

In [109]:
neuroshapes_helper.write(
    find("db0e39fe-9a9c-11e6-974a-64006a4c56ef", parameters),
    nt_schema_name, nt_schema_version,
    nt_optional_keys,
    suffix="compound-simple")

<written> numericaltraceparameter/v0.2.0/auto-all-fields-compound-simple.json
<written> numericaltraceparameter/v0.2.0/auto-min-fields-compound-simple.json


CompoundNumericalVariable - CompoundNumericalVariable

In [110]:
len([x for x in parameters
     if "NumericalTraceParameter" in x["@type"]
     and "CompoundNumericalVariable" in x["nsg:dependentVariable"]["@type"]
     and "CompoundNumericalVariable" in {y["@type"] for y in x["nsg:independentVariable"]}])
# On 24.09.18:
# 0, no data

0

#### FunctionParameter

In [111]:
f_schema_name = "functionparameter"
f_schema_version = SCHEMA_VERSION
f_optional_keys = ["nsg:equationParameter", "nsg:requiredTag"]

In [113]:
# TODO Upcoming. See filtering section.

# Meanwhile, ad hoc and temporary code to generate the test data for Neuroshapes.

fp = find("71233092-eb80-11e5-a9b7-64006a4c56ef",
          chain.from_iterable(x["parameters"] for x in unfiltered_raw_annotations),
          "id")

tfp = transform_parameter(fp, data_context_iri, variable_labels_mapping)

for i, x in enumerate(tfp["nsg:equationParameter"]):
    puuid = x["@id"]
    pbase = "{{base}}/data/neurosciencegraph/literatureannotation/pointvalueparameter/" + SCHEMA_VERSION
    tfp["nsg:equationParameter"][i]["@id"] =  f"{pbase}/{puuid}"

neuroshapes_helper.write(
    tfp,  # find("71233092-eb80-11e5-a9b7-64006a4c56ef", parameters),
    f_schema_name, f_schema_version,
    f_optional_keys)

<written> functionparameter/v0.2.0/auto-all-fields.json
<written> functionparameter/v0.2.0/auto-min-fields.json


### Invalid data

#### ParameterAnnotation

TextPositionTarget

In [114]:
neuroshapes_helper.write_missing(
    pa_schema_name, pa_schema_version, "auto-min-fields-text",
    [
        ("schema:name", "entityshape-name"),
        ("nsg:contribution", "contribution"),
        ("nsg:contribution.prov:agent", "contributionshape-agent"),
        ("oa:hasTarget", "hastarget"),
        ("oa:hasBody", "hasbody"),
        ("oa:hasTarget.oa:hasSource", "selectortargetshape-hassource"),
        # hasTarget.hasSelector: Same for the other specialized TargetShape.
        ("oa:hasTarget.oa:hasSelector", "textpositiontargetshape-hasselector"),
        ("oa:hasTarget.oa:hasSelector.oa:start", "textpositionselectorshape-start"),
        ("oa:hasTarget.oa:hasSelector.oa:end", "textpositionselectorshape-end"),
    ])

<written> parameterannotation/v0.2.0/auto-missing-entityshape-name.json
<written> parameterannotation/v0.2.0/auto-missing-contribution.json
<written> parameterannotation/v0.2.0/auto-missing-contributionshape-agent.json
<written> parameterannotation/v0.2.0/auto-missing-hastarget.json
<written> parameterannotation/v0.2.0/auto-missing-hasbody.json
<written> parameterannotation/v0.2.0/auto-missing-selectortargetshape-hassource.json
<written> parameterannotation/v0.2.0/auto-missing-textpositiontargetshape-hasselector.json
<written> parameterannotation/v0.2.0/auto-missing-textpositionselectorshape-start.json
<written> parameterannotation/v0.2.0/auto-missing-textpositionselectorshape-end.json
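
Each pair gives a dotted path to remove from the minimal valid example and a suffix for the resulting file name. Removing a nested key by such a path could be done as in the sketch below; `without_path` and the sample payload are hypothetical and do not reproduce the helper's code.

```python
import copy

def without_path(data, dotted_path):
    """Return a deep copy of data with the key at dotted_path removed."""
    result = copy.deepcopy(data)
    *parents, last = dotted_path.split(".")
    node = result
    for key in parents:
        node = node[key]
    del node[last]
    return result

# Hypothetical minimal payload with a nested selector.
minimal = {
    "schema:name": "example",
    "oa:hasTarget": {"oa:hasSelector": {"oa:start": 10, "oa:end": 20}},
}

broken = without_path(minimal, "oa:hasTarget.oa:hasSelector.oa:start")
print(broken["oa:hasTarget"]["oa:hasSelector"])
```

Working on a deep copy keeps the valid example intact, so each invalid variant differs from it by exactly one missing property.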


FigureTarget

In [115]:
neuroshapes_helper.write_missing(
    pa_schema_name, pa_schema_version, "auto-min-fields-figure",
    [
        # hasTarget.hasSelector.index: Same for the other specialized IndexSelectorShape (Equation and Table).
        ("oa:hasTarget.oa:hasSelector.nsg:index", "indexselectorshape-index"),
    ])

<written> parameterannotation/v0.2.0/auto-missing-indexselectorshape-index.json


AreaTarget

In [116]:
neuroshapes_helper.write_missing(
    pa_schema_name, pa_schema_version, "auto-min-fields-area",
    [
        ("oa:hasTarget.oa:hasSelector.rdf:value", "fragmentselectorshape-value"),
        ("oa:hasTarget.oa:hasSelector.dcterms:conformsTo", "fragmentselectorshape-conformsto"),
    ])

<written> parameterannotation/v0.2.0/auto-missing-fragmentselectorshape-value.json
<written> parameterannotation/v0.2.0/auto-missing-fragmentselectorshape-conformsto.json


#### PointValueParameter

In [117]:
neuroshapes_helper.write_missing(
    pvp_schema_name, pvp_schema_version, "auto-min-fields-simple",
    [
        ("nsg:dependentVariable.nsg:series", "series-simple"),
        ("nsg:dependentVariable.nsg:quantityType", "variableshape-quantitytype-simple"),
        ("nsg:dependentVariable.nsg:series.schema:value", "valueshape-value-simple"),
        ("nsg:dependentVariable.nsg:series.schema:unitCode", "algebraicvalueshape-unitcode-simple"),
    ])

<written> pointvalueparameter/v0.2.0/auto-missing-series-simple.json
<written> pointvalueparameter/v0.2.0/auto-missing-variableshape-quantitytype-simple.json
<written> pointvalueparameter/v0.2.0/auto-missing-valueshape-value-simple.json
<written> pointvalueparameter/v0.2.0/auto-missing-algebraicvalueshape-unitcode-simple.json


In [118]:
neuroshapes_helper.write_missing(
    pvp_schema_name, pvp_schema_version, "auto-min-fields-compound",
    [
        ("nsg:dependentVariable.nsg:series", "series-compound"),
        ("nsg:dependentVariable.nsg:quantityType", "variableshape-quantitytype-compound"),
    ])

<written> pointvalueparameter/v0.2.0/auto-missing-series-compound.json
<written> pointvalueparameter/v0.2.0/auto-missing-variableshape-quantitytype-compound.json


#### NumericalTraceParameter

In [119]:
neuroshapes_helper.write_missing(
    nt_schema_name, nt_schema_version, "auto-min-fields-simple-simple",
    [
        ("nsg:independentVariable", "indpendentvariable-simple"),
    ])

<written> numericaltraceparameter/v0.2.0/auto-missing-indpendentvariable-simple.json


#### FunctionParameter

In [120]:
neuroshapes_helper.write_missing(
    f_schema_name, f_schema_version, "auto-min-fields",
    [
        ("nsg:equation", "equation"),
    ])

<written> functionparameter/v0.2.0/auto-missing-equation.json


---

## Cleaning - KNOW WHAT YOU ARE DOING

In [21]:
# pipeline.domain

'modelingparameter'

In [22]:
# %%time
# pipeline.clean(organization=False, domain=False)

<deprecated> 1534 instances
CPU times: user 1min 27s, sys: 6.24 s, total: 1min 33s
Wall time: 6min 46s
