# Entity collections

In [1]:
import mammos_entity as me
import pandas as pd

To group multiple entities together, e.g. when reading a file with `mammos_entity.io.entities_from_file`, `mammos_entity` provides the class `EntityCollection`. It is currently part of the `io` module.

Entities can be passed to it as keyword arguments when creating the collection:

In [2]:
collection = me.io.EntityCollection(
    Tc=me.Tc([10, 100], "K"),
    Ms=me.Ms([50, 60]).q,
)
collection

EntityCollection(
    description='',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=<Quantity [50., 60.] A / m>,
)

Additional elements can be added at any later point by assigning a new attribute to the collection:

In [3]:
collection.A = me.A([8e-12, 9e-12]).value
collection

EntityCollection(
    description='',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=<Quantity [50., 60.] A / m>,
    A=array([8.e-12, 9.e-12]),
)

Similarly, elements of the collection can be accessed by dot notation:

In [4]:
collection.Tc

In [5]:
collection.Ms

<Quantity [50., 60.] A / m>

In [6]:
collection.A

array([8.e-12, 9.e-12])

## Conversion to and from dataframe

If all entities in the collection are one-dimensional and of the same length, the collection can be converted to a pandas dataframe:

In [7]:
collection.to_dataframe()

Unnamed: 0,Tc (K),Ms (A / m),A
0,10.0,50.0,8e-12
1,100.0,60.0,9e-12


By default, units are included in the column headers. This can be turned off:

In [8]:
data = collection.to_dataframe(include_units=False)
data

Unnamed: 0,Tc,Ms,A
0,10.0,50.0,8e-12
1,100.0,60.0,9e-12


It is also possible to convert a dataframe back to an `EntityCollection`. The dataframe does not carry enough metadata (ontology information is missing, units are not always present). Therefore, the additional metadata has to be provided as a dictionary. When starting from an `EntityCollection`, the metadata dictionary can be created as follows:

In [9]:
metadata = collection.entity_metadata()
metadata

{'Tc': {'ontology_label': 'CurieTemperature', 'unit': 'K', 'description': ''},
 'Ms': {'ontology_label': None, 'unit': Unit("A / m"), 'description': None},
 'A': {'ontology_label': None, 'unit': None, 'description': None}}

We can now create a new entity collection using the dataframe `data` and the `metadata` dictionary:

In [10]:
me.io.EntityCollection.from_dataframe(data, metadata)

EntityCollection(
    description='',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=<Quantity [50., 60.] A / m>,
    A=array([8.e-12, 9.e-12]),
)

Metadata lookup is done by column name, so only dataframes whose column headers contain no units are supported. If the headers of your dataframe do not match the metadata keys, first rename the columns or the keys so that they agree.
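
One way to align headers that carry a unit suffix, such as those produced by `to_dataframe()` with its default settings, is to strip the suffix with `DataFrame.rename`. The following is a minimal sketch using pandas only. The helper `strip_unit` is hypothetical (it is not part of `mammos_entity`) and assumes every unit is written as a trailing ` (unit)` in parentheses:

```python
import re

import pandas as pd

# Stand-in for a dataframe whose headers still contain units.
df = pd.DataFrame(
    {"Tc (K)": [10.0, 100.0], "Ms (A / m)": [50.0, 60.0], "A": [8e-12, 9e-12]}
)


def strip_unit(header: str) -> str:
    """Drop a trailing ' (unit)' suffix, e.g. 'Ms (A / m)' -> 'Ms'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", header)


# rename accepts a callable that is applied to every column label.
aligned = df.rename(columns=strip_unit)
print(list(aligned.columns))
```

The renamed dataframe then has headers that can serve directly as metadata keys. Headers without a unit suffix, like `A`, pass through unchanged.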

We can modify the dataframe and/or metadata dictionary before creating the `EntityCollection`. As an example, we add the missing ontology information to the metadata and scale data for the `A` column by 2:

In [11]:
data["A"] *= 2

In [12]:
metadata["Ms"]["ontology_label"] = "SpontaneousMagnetization"
metadata["Ms"]["description"] = ""

In [13]:
me.io.EntityCollection.from_dataframe(data, metadata)

EntityCollection(
    description='',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([50., 60.]), unit='A / m'),
    A=array([1.6e-11, 1.8e-11]),
)

The examples above are somewhat academic. A more useful example is given in the following section.

## Merging two collections

More complex operations on entity collections often require a temporary conversion to a pandas dataframe. As an example, we show how to combine two entity collections using `pandas.merge`.

We first create two different collections:

In [14]:
ec_1 = me.io.EntityCollection(x=[0, 5, 10], Tc=me.Tc([50, 53, 56]))
ec_2 = me.io.EntityCollection(x=[0, 10, 20], Ms=me.Ms([4e5, 6e5, 7e5]))

In [15]:
data_1 = ec_1.to_dataframe(include_units=False)
metadata_1 = ec_1.entity_metadata()

data_2 = ec_2.to_dataframe(include_units=False)
metadata_2 = ec_2.entity_metadata()

In [16]:
data_1

Unnamed: 0,x,Tc
0,0,50.0
1,5,53.0
2,10,56.0


In [17]:
data_2

Unnamed: 0,x,Ms
0,0,400000.0
1,10,600000.0
2,20,700000.0


### Merging by name

We can now merge the two tables. By default, pandas uses the shared column `x` to find matching rows and keeps only rows whose `x` value appears in both dataframes:

In [18]:
data_combined = pd.merge(data_1, data_2)
data_combined

Unnamed: 0,x,Tc,Ms
0,0,50.0,400000.0
1,10,56.0,600000.0


To convert this back to an entity collection, e.g. to subsequently write it to a file with ontology information, we also need to merge the metadata. One option is to add the missing `Ms` key to `metadata_1`:

In [19]:
metadata_1["Ms"] = metadata_2["Ms"]

In [20]:
ec_combined = me.io.EntityCollection.from_dataframe(data_combined, metadata_1)
ec_combined

EntityCollection(
    description='',
    x=array([ 0, 10]),
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 56.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000.]), unit='A / m'),
)
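
When both collections contribute several columns, copying keys one by one becomes tedious. The following is a minimal sketch of a helper that merges two metadata dictionaries and refuses to silently overwrite conflicting entries. The function `merge_metadata` is hypothetical and not part of `mammos_entity`, and the dictionaries are hand-written stand-ins that only mimic the structure returned by `entity_metadata()`:

```python
def merge_metadata(left: dict, right: dict) -> dict:
    """Combine two metadata dictionaries, raising on conflicting entries."""
    merged = dict(left)
    for name, meta in right.items():
        if name in merged and merged[name] != meta:
            raise ValueError(f"Conflicting metadata for column {name!r}")
        merged[name] = meta
    return merged


# Hand-written stand-ins with the same keys as the metadata shown above.
plain = {"ontology_label": None, "unit": None, "description": None}
meta_left = {
    "x": dict(plain),
    "Tc": {"ontology_label": "CurieTemperature", "unit": "K", "description": ""},
}
meta_right = {
    "x": dict(plain),
    "Ms": {
        "ontology_label": "SpontaneousMagnetization",
        "unit": "A / m",
        "description": "",
    },
}

merged = merge_metadata(meta_left, meta_right)
print(list(merged))
```

The shared key `x` is accepted because both sides describe it identically. If the two dictionaries disagreed, for example on the unit, the helper would raise instead of picking one side.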

Pandas' [`merge`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html) function is very powerful. In the following, we show two more common use cases. For more details, check the pandas documentation.

First, we can select a different mechanism to pick rows in the table. E.g., we can use `how="left"` to let pandas keep all rows present in the left dataframe. Missing data in the right dataframe will be filled with `NaN`s:

In [21]:
data_combined = pd.merge(data_1, data_2, how="left")
data_combined

Unnamed: 0,x,Tc,Ms
0,0,50.0,400000.0
1,5,53.0,
2,10,56.0,600000.0


In [22]:
me.io.EntityCollection.from_dataframe(data_combined, metadata_1)

EntityCollection(
    description='',
    x=array([ 0,  5, 10]),
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 53., 56.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000.,     nan, 600000.]), unit='A / m'),
)
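
To see which rows were matched and which were filled in, `pandas.merge` accepts `indicator=True`, which adds a `_merge` column with the values `both`, `left_only` or `right_only`. The sketch below uses plain dataframes with the same values as `data_1` and `data_2`, and `how="outer"` to keep the rows of both tables:

```python
import pandas as pd

# Plain stand-ins with the same values as data_1 and data_2.
left = pd.DataFrame({"x": [0, 5, 10], "Tc": [50.0, 53.0, 56.0]})
right = pd.DataFrame({"x": [0, 10, 20], "Ms": [4e5, 6e5, 7e5]})

# Keep all rows from both tables and record where each row came from.
outer = pd.merge(left, right, how="outer", indicator=True)
print(outer)

# Keep only rows found in both tables and drop the helper column again.
complete = outer[outer["_merge"] == "both"].drop(columns="_merge")
print(complete)
```

The `_merge` column has no entry in the metadata dictionary, so it is probably simplest to drop it before converting the dataframe back to an `EntityCollection`.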

### Merging by index

Second, we show how to merge tables with no matching keys, where the merge is done purely on position. For that we first delete the element `x` from the second collection:

In [23]:
del ec_2.x
ec_2

EntityCollection(
    description='',
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000.]), unit='A / m'),
)

In [24]:
data_2 = ec_2.to_dataframe(include_units=False)
data_2

Unnamed: 0,Ms
0,400000.0
1,600000.0
2,700000.0


In [25]:
data_combined = pd.merge(data_1, data_2, left_index=True, right_index=True)
data_combined

Unnamed: 0,x,Tc,Ms
0,0,50.0,400000.0
1,5,53.0,600000.0
2,10,56.0,700000.0


In [26]:
me.io.EntityCollection.from_dataframe(data_combined, metadata_1)

EntityCollection(
    description='',
    x=array([ 0,  5, 10]),
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 53., 56.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000.]), unit='A / m'),
)

Pandas offers multiple ways of joining by index. You could also use:

In [27]:
data_1.join(data_2)

Unnamed: 0,x,Tc,Ms
0,0,50.0,400000.0
1,5,53.0,600000.0
2,10,56.0,700000.0


or

In [28]:
pd.concat((data_1, data_2), axis=1)

Unnamed: 0,x,Tc,Ms
0,0,50.0,400000.0
1,5,53.0,600000.0
2,10,56.0,700000.0
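
Note that these index-based operations match rows by index *label*. They coincide with a positional merge here only because both dataframes carry the default index `0, 1, 2`. The following sketch, with made-up index labels for the second dataframe, shows what happens when the labels differ and how `reset_index(drop=True)` restores positional behaviour:

```python
import pandas as pd

left = pd.DataFrame({"x": [0, 5, 10], "Tc": [50.0, 53.0, 56.0]})
# Same values as before, but with index labels 2, 3, 4 (e.g. after filtering).
right = pd.DataFrame({"Ms": [4e5, 6e5, 7e5]}, index=[2, 3, 4])

# Labels are matched: only label 2 exists in both, the other rows get NaN.
misaligned = left.join(right)
print(misaligned)

# Resetting the index makes the join purely positional again.
by_position = left.join(right.reset_index(drop=True))
print(by_position)
```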


### Appending to a collection

If you know that all entities are of the same length (and do not need the safety of pandas re-checking that) *and* you want to merge by index (i.e. need no reordering) you can also directly add entities from one collection to another:

In [29]:
ec_1

EntityCollection(
    description='',
    x=[0, 5, 10],
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 53., 56.]), unit='K'),
)

In [30]:
ec_1.Ms = ec_2.Ms

In [31]:
ec_1

EntityCollection(
    description='',
    x=[0, 5, 10],
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 53., 56.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000.]), unit='A / m'),
)
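
Without pandas, the length check has to be done by hand. A minimal sketch of such a check is given below. The function `check_same_length` is hypothetical and not part of `mammos_entity`, and plain lists and NumPy arrays stand in for the values (for an entity one would pass its `.value`):

```python
import numpy as np


def check_same_length(**columns) -> int:
    """Return the common length of all 1-D inputs or raise ValueError."""
    if not columns:
        raise ValueError("No columns given")
    lengths = {}
    for name, values in columns.items():
        array = np.asarray(values)
        if array.ndim != 1:
            raise ValueError(f"{name!r} is not one-dimensional (ndim={array.ndim})")
        lengths[name] = array.shape[0]
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Length mismatch: {lengths}")
    return next(iter(lengths.values()))


n_rows = check_same_length(
    x=[0, 5, 10],
    Tc=np.array([50.0, 53.0, 56.0]),
    Ms=np.array([4e5, 6e5, 7e5]),
)
print(n_rows)
```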

## Appending rows (values to individual entities)

You can also use pandas to append data to the individual entities by appending new rows (possibly also combining two dataframes). First, we create a new collection:

In [32]:
ec_1 = me.io.EntityCollection(x=[0, 5, 10], Tc=me.Tc([50, 51, 52]), Ms=me.Ms([4e5, 6e5, 7e5]))
ec_1

EntityCollection(
    description='',
    x=[0, 5, 10],
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 51., 52.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000.]), unit='A / m'),
)

In [33]:
ec_2 = me.io.EntityCollection(x=[15], Tc=me.Tc([53]), Ms=me.Ms([8e5]))
ec_2

EntityCollection(
    description='',
    x=[15],
    Tc=Entity(ontology_label='CurieTemperature', value=array([53.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([800000.]), unit='A / m'),
)

In [34]:
data_1 = ec_1.to_dataframe(include_units=False)
data_2 = ec_2.to_dataframe(include_units=False)
metadata = ec_1.entity_metadata()
data_1

Unnamed: 0,x,Tc,Ms
0,0,50.0,400000.0
1,5,51.0,600000.0
2,10,52.0,700000.0


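We can append the rows of `data_2` to `data_1` with `pd.concat`. The older [`DataFrame.append`](https://pandas.pydata.org/pandas-docs/version/1.3/reference/api/pandas.DataFrame.append.html) method was deprecated in pandas 1.4 and removed in pandas 2.0: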

In [35]:
combined = pd.concat((data_1, data_2), ignore_index=True)
combined

Unnamed: 0,x,Tc,Ms
0,0,50.0,400000.0
1,5,51.0,600000.0
2,10,52.0,700000.0
3,15,53.0,800000.0


In [36]:
me.io.EntityCollection.from_dataframe(combined, metadata)

EntityCollection(
    description='',
    x=array([ 0,  5, 10, 15]),
    Tc=Entity(ontology_label='CurieTemperature', value=array([50., 51., 52., 53.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400000., 600000., 700000., 800000.]), unit='A / m'),
)
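
The argument `ignore_index=True` matters when appending rows: without it, `pd.concat` keeps the original index labels of both dataframes, which leads to duplicate labels. The sketch below illustrates the difference with plain dataframes holding a subset of the columns used above:

```python
import pandas as pd

first = pd.DataFrame({"x": [0, 5, 10], "Tc": [50.0, 51.0, 52.0]})
second = pd.DataFrame({"x": [15], "Tc": [53.0]})

# Original labels are kept, so label 0 appears twice.
kept = pd.concat((first, second))
print(list(kept.index))  # [0, 1, 2, 0]

# A fresh 0..n-1 index is generated for the combined rows.
renumbered = pd.concat((first, second), ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```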