# Example Usage

To use `bento-mdf` in a project, start by installing the latest version with `pip install bento-mdf` and importing it into your project.

In [40]:
import bento_mdf
from pathlib import Path # for file paths
from importlib.metadata import version # check package version

version("bento_mdf")

'0.10.0'

## Loading the Model from MDF(s)

The `bento-mdf` package provides functionality for loading, validating, and manipulating MDF file content in Python.

The `MDF` class is the main interface to the package. It is initialized with the relevant MDF file(s), file path(s), or URL(s) pointing to them.

In [41]:
from bento_mdf.mdf import MDF

### Loading from File(s)

First, we specify the paths to the MDF files we want to load. Then we pass these to the `MDF` class to initialize the model. This loads the content of the files into their corresponding `bento-meta` Python object representations, which we can access via the `Model` object found at `MDF.model`.

(Note: if a top-level model `Handle` object is not present in the MDFs, it needs to be provided to the MDF class's `handle` argument.)

In [46]:
mdf_dir = Path.cwd().parent / "tests" / "samples"
ctdc_model = mdf_dir / "ctdc_model_file.yaml"
ctdc_props = mdf_dir / "ctdc_model_properties_file.yaml"

mdf_from_file = MDF(ctdc_model, ctdc_props, handle="CTDC")
mdf_from_file.model

No instance yaml(s) specified


<bento_meta.model.Model at 0x23880923b00>

### Loading from URL(s)

Similarly, we can instantiate an MDF from URL(s) pointing to the model file(s):

In [47]:
model_url = "https://cbiit.github.io/icdc-model-tool/model-desc/icdc-model.yml"
props_url = "https://cbiit.github.io/icdc-model-tool/model-desc/icdc-model-props.yml"

mdf = MDF(model_url, props_url, handle="ICDC")
mdf.model

No instance yaml(s) specified


<bento_meta.model.Model at 0x238808e91c0>

## Exploring the Model

Once we've loaded the model, we can start looking at its constituent parts, such as Nodes, Relationships, Properties, and Terms. These are conveniently stored in the `Model` object.

(Note: This example uses the model loaded from a URL in the previous section.)

### Nodes

Model nodes are stored in `Model.nodes`. This is a dictionary where the keys are node handles and the values are `bento-meta` `Node` objects.

In [65]:
nodes = mdf.model.nodes

len(nodes)

33

In [75]:
list(nodes.keys())[:3]

['program', 'study', 'study_site']

In [74]:
list(nodes.values())[:3]

[<bento_meta.objects.Node at 0x238809fbbc0>,
 <bento_meta.objects.Node at 0x238806b7140>,
 <bento_meta.objects.Node at 0x23880664a10>]

In [70]:
nodes["study"]

<bento_meta.objects.Node at 0x238806b7140>

The `get_attr_dict()` method is a convenient way to get a dictionary of the attributes that are set on a `bento-meta` `Entity`. The attribute values are returned as strings, which can be useful for exploring the entity or for providing parameters to Neo4j Cypher queries.

(Note: this only includes "simple" attributes and not other bento-meta Entities or collections of Entities. All attributes can be accessed via methods matching their names.)

In [71]:
nodes["diagnosis"].get_attr_dict()

{'handle': 'diagnosis',
 'model': 'ICDC',
 'desc': 'The Diagnosis node contains numerous properties which fully characterize the type of cancer with which any given patient/subject/donor was diagnosed, inclusive of stage. This node also contains properties pertaining to comorbidities, and the availability of pathology reports, treatment data and follow-up data.'}
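
Because the attribute dictionary holds plain strings keyed by attribute name, it can be passed as a parameter map alongside a parameterized Cypher query. The following is a minimal sketch rather than part of `bento-mdf`: `build_match_query` is a hypothetical helper, the `node` label is an assumption about the target graph's schema, and `attrs` is a hand-written stand-in for the output above with `desc` omitted.

```python
# Stand-in for nodes["diagnosis"].get_attr_dict() (desc omitted for brevity)
attrs = {"handle": "diagnosis", "model": "ICDC"}


def build_match_query(label: str, attrs: dict) -> str:
    """Hypothetical helper: build a parameterized Cypher MATCH statement.

    Each attribute name becomes a `$name` placeholder, so the values
    are supplied separately as query parameters instead of being
    interpolated into the query string.
    """
    conditions = ", ".join(f"{key}: ${key}" for key in attrs)
    return f"MATCH (n:{label} {{{conditions}}}) RETURN n"


# The "node" label is an assumption, not something defined by the MDF
query = build_match_query("node", attrs)
print(query)
```

The resulting string and `attrs` would then be handed together to whatever Neo4j client is in use, which keeps the attribute values out of the query text.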

### Relationships

Similarly, model relationships are stored in `Model.edges`. This is a dictionary where the keys are (edge.handle, src.handle, dst.handle) tuples. The values are `Edge` objects.

In [76]:
edges = mdf.model.edges

len(edges)

49

In [77]:
list(edges.keys())[:3]

[('member_of', 'case', 'cohort'),
 ('member_of', 'cohort', 'study_arm'),
 ('member_of', 'study_arm', 'study')]

In [79]:
list(edges.values())[:3]

[<bento_meta.objects.Edge at 0x238ff57d340>,
 <bento_meta.objects.Edge at 0x238ff57cbf0>,
 <bento_meta.objects.Edge at 0x238ff57deb0>]

In [86]:
edges[("of_case", "diagnosis", "case")].get_attr_dict()

{'handle': 'of_case', 'model': 'ICDC', 'multiplicity': 'many_to_one'}

In [90]:
edge = edges[("of_case", "diagnosis", "case")]
print(edge.handle, edge.src.handle, edge.dst.handle, sep=", ")


# TIP: here's a convenient attribute to get the 3-tuple of an edge
print(edge.triplet)

of_case, diagnosis, case
('of_case', 'diagnosis', 'case')


An `Edge`'s `src` and `dst` attributes are `Node` objects.

In [89]:
print(edge.src)

print(edge.src.handle)

<bento_meta.objects.Node object at 0x000002388065E870>
diagnosis


The `Model` object also has some useful methods to work with relationships/edges including:
  * `edges_by_src(node)` - get all edges that have a given node as their src attribute
  * `edges_by_dst(node)` - get all edges that have a given node as their dst attribute
  * `edges_by_type(edge_handle)` - get all edges that have a given edge type (i.e., handle)

In [98]:
[e.triplet for e in mdf.model.edges_by_dst(mdf.model.nodes["case"])]

[('of_case', 'enrollment', 'case'),
 ('of_case', 'demographic', 'case'),
 ('of_case', 'diagnosis', 'case'),
 ('of_case', 'cycle', 'case'),
 ('of_case', 'follow_up', 'case'),
 ('of_case', 'sample', 'case'),
 ('of_case', 'file', 'case'),
 ('of_case', 'visit', 'case'),
 ('of_case', 'adverse_event', 'case'),
 ('of_case', 'registration', 'case')]

In [106]:
[e.triplet for e in mdf.model.edges_by_type("of_study")]

[('of_study', 'study_site', 'study'),
 ('of_study', 'principal_investigator', 'study'),
 ('of_study', 'file', 'study'),
 ('of_study', 'image_collection', 'study'),
 ('of_study', 'publication', 'study')]
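
Since every edge key is a (handle, src, dst) tuple, the same kind of filtering can also be expressed directly over the keys. The sketch below uses a small list of keys copied from the outputs above in place of `mdf.model.edges`. The helper names `keys_by_dst` and `keys_by_type` are hypothetical, and this is not necessarily how `bento-meta` implements `edges_by_dst` or `edges_by_type`.

```python
# A few edge keys copied from the outputs above; in the real model these
# are the keys of mdf.model.edges and the values are Edge objects.
edge_keys = [
    ("member_of", "case", "cohort"),
    ("of_case", "diagnosis", "case"),
    ("of_case", "sample", "case"),
    ("of_study", "file", "study"),
]


def keys_by_dst(keys, dst_handle):
    """Hypothetical helper: keep keys whose destination handle matches."""
    return [key for key in keys if key[2] == dst_handle]


def keys_by_type(keys, edge_handle):
    """Hypothetical helper: keep keys whose edge handle matches."""
    return [key for key in keys if key[0] == edge_handle]


print(keys_by_dst(edge_keys, "case"))
print(keys_by_type(edge_keys, "of_study"))
```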

### Properties

Model properties are stored in `Model.props`. This is a dictionary where the keys are ({edge|node}.handle, prop.handle) tuples. The values are `Property` objects.

In [117]:
props = mdf.model.props
len(props)

240

In [118]:
list(props.keys())[:3]

[('program', 'program_name'),
 ('program', 'program_acronym'),
 ('program', 'program_short_description')]

In [119]:
list(props.values())[:3]

[<bento_meta.objects.Property at 0x238806da900>,
 <bento_meta.objects.Property at 0x238806d9760>,
 <bento_meta.objects.Property at 0x238806d8ad0>]

In [121]:
primary_disease_site = props[("diagnosis", "primary_disease_site")]
primary_disease_site.get_attr_dict()

{'handle': 'primary_disease_site',
 'model': 'ICDC',
 'is_required': 'Yes',
 'desc': 'The anatomical location at which the primary disease originated, recorded in relatively general terms at the subject level; the anatomical locations from which tumor samples subject to downstream analysis were acquired is recorded in more detailed terms at the sample level.',
 'value_domain': 'value_set',
 'is_strict': 'True'}

#### Properties with Value Sets

Properties with the `value_domain` "value_set" have a `value_set` attribute (a `bento-meta` `ValueSet`), which in turn has a `terms` attribute (a dictionary of `bento-meta` `Term` objects).

In [122]:
primary_disease_site.value_set

<bento_meta.objects.ValueSet at 0x238804abd70>

In [124]:
primary_disease_site.value_set.terms

{'Bladder': <bento_meta.objects.Term object at 0x00000238804AB7A0>, 'Bladder, Prostate': <bento_meta.objects.Term object at 0x00000238804A9250>, 'Bladder, Urethra': <bento_meta.objects.Term object at 0x00000238804AB5F0>, 'Bladder, Urethra, Prostate': <bento_meta.objects.Term object at 0x00000238804AAD20>, 'Bladder, Urethra, Vagina': <bento_meta.objects.Term object at 0x00000238804AAB10>, 'Bone': <bento_meta.objects.Term object at 0x00000238804A8440>, 'Bone (Appendicular)': <bento_meta.objects.Term object at 0x00000238804A9C10>, 'Bone (Axial)': <bento_meta.objects.Term object at 0x00000238804AA030>, 'Bone Marrow': <bento_meta.objects.Term object at 0x00000238804A8140>, 'Brain': <bento_meta.objects.Term object at 0x00000238804AA8A0>, 'Carpus': <bento_meta.objects.Term object at 0x00000238FF56AD20>, 'Chest Wall': <bento_meta.objects.Term object at 0x00000238FF56B8C0>, 'Distal Urethra': <bento_meta.objects.Term object at 0x00000238FF56A2D0>, 'Kidney': <bento_meta.objects.Term object at 0x0

`Property` objects with value sets provide two shortcuts to those terms and their values:
  * `.terms` returns the dictionary of `Term` objects from the property's value set, keyed by term value
  * `.values` returns a list of the term values from the property's value set

In [128]:
print(primary_disease_site.terms)

# TIP: this is the same object found at the ValueSet's `terms` attribute
print(primary_disease_site.terms is primary_disease_site.value_set.terms)

{'Bladder': <bento_meta.objects.Term object at 0x00000238804AB7A0>, 'Bladder, Prostate': <bento_meta.objects.Term object at 0x00000238804A9250>, 'Bladder, Urethra': <bento_meta.objects.Term object at 0x00000238804AB5F0>, 'Bladder, Urethra, Prostate': <bento_meta.objects.Term object at 0x00000238804AAD20>, 'Bladder, Urethra, Vagina': <bento_meta.objects.Term object at 0x00000238804AAB10>, 'Bone': <bento_meta.objects.Term object at 0x00000238804A8440>, 'Bone (Appendicular)': <bento_meta.objects.Term object at 0x00000238804A9C10>, 'Bone (Axial)': <bento_meta.objects.Term object at 0x00000238804AA030>, 'Bone Marrow': <bento_meta.objects.Term object at 0x00000238804A8140>, 'Brain': <bento_meta.objects.Term object at 0x00000238804AA8A0>, 'Carpus': <bento_meta.objects.Term object at 0x00000238FF56AD20>, 'Chest Wall': <bento_meta.objects.Term object at 0x00000238FF56B8C0>, 'Distal Urethra': <bento_meta.objects.Term object at 0x00000238FF56A2D0>, 'Kidney': <bento_meta.objects.Term object at 0x0

In [135]:
print(primary_disease_site.values[20])

print(len(primary_disease_site.values))

print(primary_disease_site.values == list(primary_disease_site.terms.keys()))

Shoulder
29
True
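
One common use of a property's list of values is checking incoming data against the permitted values. The sketch below assumes a short hand-written list taken from the outputs above in place of `primary_disease_site.values`. The helper `invalid_values` and the candidate value "Elbow" are hypothetical, and "Elbow" is only known to be absent from this stand-in list, not from the real value set.

```python
# Stand-in for primary_disease_site.values, using values shown above
allowed_values = ["Bladder", "Bone", "Brain", "Kidney", "Shoulder"]


def invalid_values(candidates, allowed):
    """Hypothetical helper: return the candidates not found in `allowed`."""
    allowed_set = set(allowed)  # set membership avoids rescanning the list
    return [value for value in candidates if value not in allowed_set]


# "Elbow" is a made-up input used to trigger a validation failure
print(invalid_values(["Brain", "Elbow", "Shoulder"], allowed_values))
```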


#### Properties via Parent

Model properties can also be accessed via the `props` attribute of their parent node or edge, which is a dictionary of that parent's properties keyed by property handle.

In [111]:
diagnosis_props = nodes["diagnosis"].props
len(diagnosis_props)

14

In [112]:
list(diagnosis_props.keys())[:3]

['diagnosis_id', 'disease_term', 'primary_disease_site']

In [113]:
list(diagnosis_props.values())[:3]

[<bento_meta.objects.Property at 0x238ff578590>,
 <bento_meta.objects.Property at 0x238ff578500>,
 <bento_meta.objects.Property at 0x238804abdd0>]

Properties accessed via their parents are the same `Property` objects found in `Model.props`.

In [116]:
diagnosis_props["primary_disease_site"] is props[("diagnosis", "primary_disease_site")]

True
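
A practical consequence of this shared identity is that a change made through one dictionary is visible through the other. The sketch below illustrates this with a hypothetical stand-in class, `FakeProperty`, rather than real `bento-meta` objects. The two dictionaries mimic the key shapes of `Model.props` and a node's `props`.

```python
class FakeProperty:
    """Hypothetical stand-in for a bento-meta Property."""

    def __init__(self, handle):
        self.handle = handle
        self.desc = None


shared = FakeProperty("primary_disease_site")

# Both dictionaries hold a reference to the same object
model_props = {("diagnosis", "primary_disease_site"): shared}
node_props = {"primary_disease_site": shared}

# Update the description through the parent-level dictionary...
node_props["primary_disease_site"].desc = "updated description"

# ...and read the change back through the model-level dictionary
print(model_props[("diagnosis", "primary_disease_site")].desc)
```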

### Terms

Model terms are stored in `Model.terms` as a dictionary of `Term` objects. The keys are the term handles, and the values are the `Term` objects.

#### Term Annotations