# Common use cases

This notebook contains a collection of common use cases. Each section is self-contained.

In [1]:
import mammos_entity as me
import pandas as pd

## Combining two `EntityCollections`

### Appending to a collection

We first create two different collections:

In [2]:
ec_1 = me.EntityCollection(x=[0, 5, 10] * me.units.mm, Tc=me.Tc([50, 53, 56]))
ec_1

EntityCollection(
    description='',
    x=<Quantity [ 0.,  5., 10.] mm>,
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 53., 56.]), unit='K'),
)

In [3]:
ec_2 = me.EntityCollection(x=[0, 10, 20] * me.units.mm, Ms=me.Ms([4e5, 6e5, 7e5]))
ec_2

EntityCollection(
    description='',
    x=<Quantity [ 0., 10., 20.] mm>,
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000.]), unit='A / m'),
)

An entity from one collection can be added to another by assigning it as an attribute. The values are paired by position: the `x` values of `ec_2` are not taken into account.

In [4]:
ec_1.Ms = ec_2.Ms

ec_1

EntityCollection(
    description='',
    x=<Quantity [ 0.,  5., 10.] mm>,
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 53., 56.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000.]), unit='A / m'),
)

More complex operations on entity collections often require a temporary conversion to a pandas dataframe. As an example, we show how to combine two entity collections using different `pandas` functions.

### Combining with pandas – index-based

First we create two collections:

In [5]:
ec_1 = me.EntityCollection(x=[0, 5, 10] * me.units.mm, Tc=me.Tc([50, 53, 56]))
ec_1

EntityCollection(
    description='',
    x=<Quantity [ 0.,  5., 10.] mm>,
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 53., 56.]), unit='K'),
)

In [6]:
ec_2 = me.EntityCollection(Ms=me.Ms([4e5, 6e5, 7e5]))
ec_2

EntityCollection(
    description='',
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000.]), unit='A / m'),
)

We can convert the collections to pandas dataframes and in addition extract the metadata:

In [7]:
data_1 = ec_1.to_dataframe(include_units=False)
metadata_1 = ec_1.metadata()

In [8]:
data_1

      x    Tc
0   0.0  50.0
1   5.0  53.0
2  10.0  56.0


In [9]:
metadata_1

{'description': '',
 'x': {'unit': 'mm'},
 'Tc': {'ontology_label': 'CurieTemperature', 'unit': 'K', 'description': ''}}

In [10]:
data_2 = ec_2.to_dataframe(include_units=False)
metadata_2 = ec_2.metadata()

In [11]:
data_2

         Ms
0  400000.0
1  600000.0
2  700000.0


In [12]:
metadata_2

{'description': '',
 'Ms': {'ontology_label': 'SpontaneousMagnetization',
  'unit': 'A / m',
  'description': ''}}

We can now combine the columns. Pandas offers several ways to do this. The three shown here all align rows on the dataframe index:

In [13]:
data_1.join(data_2)

      x    Tc        Ms
0   0.0  50.0  400000.0
1   5.0  53.0  600000.0
2  10.0  56.0  700000.0


In [14]:
pd.concat((data_1, data_2), axis=1)

      x    Tc        Ms
0   0.0  50.0  400000.0
1   5.0  53.0  600000.0
2  10.0  56.0  700000.0


In [15]:
data_combined = pd.merge(data_1, data_2, left_index=True, right_index=True)
data_combined

      x    Tc        Ms
0   0.0  50.0  400000.0
1   5.0  53.0  600000.0
2  10.0  56.0  700000.0
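Index-based combination pairs rows by index label, not by row position. The following sketch uses two small hypothetical dataframes (`left` and `right`, which are not taken from the collections above) to illustrate what happens when the index labels differ, and one way to restore positional pairing with `reset_index(drop=True)`.

```python
import pandas as pd

# Hypothetical data: same number of rows, but different index labels.
left = pd.DataFrame({"x": [0.0, 5.0, 10.0]})
right = pd.DataFrame({"Ms": [4e5, 6e5, 7e5]}, index=[1, 2, 3])

# join aligns on index labels: label 0 exists only in `left`,
# so its Ms value is missing, and label 3 of `right` is dropped.
misaligned = left.join(right)
print(misaligned)

# Dropping the old index gives `right` the default 0..n-1 labels,
# so rows are paired by position again.
aligned = left.join(right.reset_index(drop=True))
print(aligned)
```

Dataframes created with `to_dataframe` in this notebook all carry the default index, so this only becomes relevant after filtering, sorting, or otherwise re-indexing one of them.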


To convert `data_combined` back to an entity collection (e.g. to subsequently write it to a file with ontology information), we also need to merge the metadata. One option is to add the missing `Ms` key to `metadata_1`:

In [16]:
metadata_1["Ms"] = metadata_2["Ms"]

We can then create a new collection from the combined dataframe and the updated metadata:

In [17]:
me.io.EntityCollection.from_dataframe(data_combined, metadata_1)

EntityCollection(
    description='',
    x=<Quantity [ 0.,  5., 10.] mm>,
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 53., 56.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000.]), unit='A / m'),
)
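When more than one key is missing, copying entries by hand becomes tedious. The sketch below shows one possible way to merge two metadata dictionaries while refusing to overwrite a shared key with a different entry (for example `x` recorded in different units). `merge_metadata` is a hypothetical helper written for this example, not part of `mammos_entity`. The dictionaries are copied from the outputs shown above, assuming `metadata()` returns plain dictionaries of that shape.

```python
def merge_metadata(meta_a, meta_b):
    """Merge two metadata dicts; raise if a shared key has conflicting entries."""
    merged = dict(meta_a)
    for key, value in meta_b.items():
        if key == "description":
            # Keep the collection description of the first dictionary.
            continue
        if key in merged and merged[key] != value:
            raise ValueError(
                f"Conflicting metadata for {key!r}: {merged[key]} != {value}"
            )
        merged[key] = value
    return merged


# Dictionaries copied from the metadata_1 and metadata_2 outputs above.
metadata_1 = {
    "description": "",
    "x": {"unit": "mm"},
    "Tc": {"ontology_label": "CurieTemperature", "unit": "K", "description": ""},
}
metadata_2 = {
    "description": "",
    "Ms": {
        "ontology_label": "SpontaneousMagnetization",
        "unit": "A / m",
        "description": "",
    },
}

metadata_combined = merge_metadata(metadata_1, metadata_2)
print(list(metadata_combined))
```

Raising on conflicts is a deliberate choice here: with `include_units=False` the dataframe no longer carries units, so the metadata is the only place where a unit mismatch can still be detected.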

### Combining with pandas – merge based on specific column(s)

This example shows how to merge two `EntityCollections` that share an entity. We use the shared entity as the key to merge on.

First, we create two collections:

In [18]:
ec_1 = me.EntityCollection(x=[0, 5, 10] * me.units.mm, Tc=me.Tc([50, 53, 56]))
ec_1

EntityCollection(
    description='',
    x=<Quantity [ 0.,  5., 10.] mm>,
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 53., 56.]), unit='K'),
)

In [19]:
ec_2 = me.EntityCollection(x=[0, 10, 20] * me.units.mm, Ms=me.Ms([4e5, 6e5, 7e5]))
ec_2

EntityCollection(
    description='',
    x=<Quantity [ 0., 10., 20.] mm>,
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000.]), unit='A / m'),
)

Both collections contain the entity-like `x`. Only some of its values (0 mm and 10 mm) appear in both collections.

We convert the collections to pandas dataframes and in addition extract the metadata:

In [20]:
data_1 = ec_1.to_dataframe(include_units=False)
metadata_1 = ec_1.metadata()

In [21]:
data_1

      x    Tc
0   0.0  50.0
1   5.0  53.0
2  10.0  56.0


In [22]:
metadata_1

{'description': '',
 'x': {'unit': 'mm'},
 'Tc': {'ontology_label': 'CurieTemperature', 'unit': 'K', 'description': ''}}

In [23]:
data_2 = ec_2.to_dataframe(include_units=False)
metadata_2 = ec_2.metadata()

In [24]:
data_2

      x        Ms
0   0.0  400000.0
1  10.0  600000.0
2  20.0  700000.0


In [25]:
metadata_2

{'description': '',
 'x': {'unit': 'mm'},
 'Ms': {'ontology_label': 'SpontaneousMagnetization',
  'unit': 'A / m',
  'description': ''}}

We can now merge the two dataframes. By default, `pandas` merges on all columns with identical names (here `x`) and keeps only rows whose key is present in both dataframes (an inner join):

In [26]:
data_combined = pd.merge(data_1, data_2)
data_combined

      x    Tc        Ms
0   0.0  50.0  400000.0
1  10.0  56.0  600000.0
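The defaults used in the merge can also be written out explicitly, which makes the intent clearer and guards against accidentally merging on additional columns that happen to share a name. The sketch below rebuilds the two dataframes from the values shown above and additionally passes `validate="one_to_one"`, so that pandas raises an error if a key value occurs more than once on either side.

```python
import pandas as pd

# Values copied from the data_1 and data_2 outputs above.
data_1 = pd.DataFrame({"x": [0.0, 5.0, 10.0], "Tc": [50.0, 53.0, 56.0]})
data_2 = pd.DataFrame({"x": [0.0, 10.0, 20.0], "Ms": [4e5, 6e5, 7e5]})

# Name the key column and the join type explicitly, and let pandas
# check that each x value is unique in both dataframes.
data_combined = pd.merge(
    data_1, data_2, on="x", how="inner", validate="one_to_one"
)
print(data_combined)
```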


To convert `data_combined` back to an entity collection (e.g. to subsequently write it to a file with ontology information), we also need to merge the metadata. One option is to add the missing `Ms` key to `metadata_1`:

In [27]:
metadata_1["Ms"] = metadata_2["Ms"]

We can now create a new collection using the new dataframe and the updated metadata:

In [28]:
ec_combined = me.io.EntityCollection.from_dataframe(data_combined, metadata_1)
ec_combined

EntityCollection(
    description='',
    x=<Quantity [ 0., 10.] mm>,
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 56.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000.]), unit='A / m'),
)
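Merging on a column of floating-point values relies on exact equality of the keys. The values used in this example are exactly representable, but keys that result from arithmetic may differ in the last digits. The following sketch, using hypothetical data, shows the effect and one possible workaround: rounding the key in both dataframes before merging. The number of decimals (9 here) is an arbitrary choice that has to suit the data.

```python
import pandas as pd

# Hypothetical data: 0.1 + 0.2 is not exactly equal to 0.3 in floating point.
left = pd.DataFrame({"x": [0.1 + 0.2], "Tc": [50.0]})
right = pd.DataFrame({"x": [0.3], "Ms": [4e5]})

# The keys differ in the last digit, so the inner merge finds no match.
exact = pd.merge(left, right, on="x")
print(len(exact))

# Rounding the key in both dataframes makes the two values identical.
rounded = pd.merge(
    left.assign(x=left["x"].round(9)),
    right.assign(x=right["x"].round(9)),
    on="x",
)
print(rounded)
```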

Pandas' [`merge`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html) function is very powerful. In the following, we show how to keep all rows present in the first dataframe. For more details, refer to the pandas documentation.

We can use `how="left"` to let pandas keep all rows present in the left dataframe. Rows without a match in the right dataframe get `NaN` in the columns that come from it:

In [29]:
data_combined = pd.merge(data_1, data_2, how="left")
data_combined

      x    Tc        Ms
0   0.0  50.0  400000.0
1   5.0  53.0       NaN
2  10.0  56.0  600000.0


In [30]:
me.io.EntityCollection.from_dataframe(data_combined, metadata_1)

EntityCollection(
    description='',
    x=<Quantity [ 0.,  5., 10.] mm>,
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 53., 56.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000.,     nan, 600000.]), unit='A / m'),
)
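To keep the rows of both dataframes, `how="outer"` can be used in the same way. The sketch below rebuilds the two dataframes from the values shown above and also passes `indicator=True`, which adds a `_merge` column stating whether a row came from the left dataframe, the right dataframe, or both. That helper column has no entry in the metadata, so it is dropped again before any conversion back to a collection.

```python
import pandas as pd

# Values copied from the data_1 and data_2 outputs above.
data_1 = pd.DataFrame({"x": [0.0, 5.0, 10.0], "Tc": [50.0, 53.0, 56.0]})
data_2 = pd.DataFrame({"x": [0.0, 10.0, 20.0], "Ms": [4e5, 6e5, 7e5]})

# how="outer" keeps rows from both dataframes; indicator=True adds a
# "_merge" column with the values "left_only", "right_only" or "both".
data_outer = pd.merge(data_1, data_2, how="outer", indicator=True)
print(data_outer)

# Remove the helper column so that only entity columns remain.
data_clean = data_outer.drop(columns="_merge")
print(list(data_clean.columns))
```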

## Appending rows to an `EntityCollection` (values to all individual entities)

In this example we show how to add an additional value to each entity in a collection. All entities in our collection have the same length and are one-dimensional, so we can understand this as adding a row to our entity table.

First, we create two collections with the same entities:

In [31]:
ec_1 = me.io.EntityCollection(x=[0, 5, 10], Tc=me.Tc([50, 51, 52]), Ms=me.Ms([4e5, 6e5, 7e5]))
ec_1

EntityCollection(
    description='',
    x=[0, 5, 10],
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 51., 52.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000.]), unit='A / m'),
)

In [32]:
ec_2 = me.io.EntityCollection(x=[15], Tc=me.Tc([53]), Ms=me.Ms([8e5]))
ec_2

EntityCollection(
    description='',
    x=[15],
    Tc=Entity(ontology_label='CurieTemperature', value=array([53.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([800000.]), unit='A / m'),
)

We convert both collections to dataframes and extract the metadata (which is identical for both collections so we only need it once):

In [33]:
data_1 = ec_1.to_dataframe()
data_2 = ec_2.to_dataframe()
metadata = ec_1.metadata()

In [34]:
data_1

    x    Tc        Ms
0   0  50.0  400000.0
1   5  51.0  600000.0
2  10  52.0  700000.0


In [35]:
data_2

    x    Tc        Ms
0  15  53.0  800000.0


We can now combine the tables using:

In [36]:
combined = pd.concat((data_1, data_2), ignore_index=True)
combined

    x    Tc        Ms
0   0  50.0  400000.0
1   5  51.0  600000.0
2  10  52.0  700000.0
3  15  53.0  800000.0


The columns have not changed, so the metadata can stay unchanged and we can convert the result back into an `EntityCollection`:

In [37]:
me.io.EntityCollection.from_dataframe(combined, metadata)

EntityCollection(
    description='',
    x=array([ 0,  5, 10, 15]),
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 51., 52., 53.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000., 800000.]), unit='A / m'),
)
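Appending rows this way assumes that both dataframes have the same columns. If they do not, `pd.concat` does not fail but fills the missing cells with `NaN`. The sketch below wraps the concatenation in a hypothetical helper, `append_rows` (not part of `mammos_entity` or pandas), that checks the columns first. The dataframes are rebuilt from the values shown above.

```python
import pandas as pd


def append_rows(data_a, data_b):
    """Stack data_b below data_a after checking that the columns match."""
    if list(data_a.columns) != list(data_b.columns):
        raise ValueError(
            f"Column mismatch: {list(data_a.columns)} vs {list(data_b.columns)}"
        )
    # ignore_index=True renumbers the rows 0..n-1 in the result.
    return pd.concat((data_a, data_b), ignore_index=True)


# Values copied from the data_1 and data_2 outputs above.
data_1 = pd.DataFrame(
    {"x": [0, 5, 10], "Tc": [50.0, 51.0, 52.0], "Ms": [4e5, 6e5, 7e5]}
)
data_2 = pd.DataFrame({"x": [15], "Tc": [53.0], "Ms": [8e5]})

combined = append_rows(data_1, data_2)
print(combined)
```

The check compares column names and their order only. Whether the units in the two collections agree still has to be verified via the metadata.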