# Entity collections

To group multiple entities together, e.g. when reading a file with `mammos_entity.io.entities_from_file`, `mammos_entity` provides the class `EntityCollection`.


In [1]:
import mammos_entity as me

## `EntityCollection` basics

Entities can be passed as keyword arguments when creating the collection. In addition, the collection can have a description:

In [2]:
collection = me.io.EntityCollection(
    description="Some random test data.\n\nDescriptions can have multiple lines.",
    Tc=me.Tc([10, 100], "K"),
    Ms=me.Ms([50, 60]),
)
collection

EntityCollection(
    description='Some random test data.\n\nDescriptions can have multiple lines.',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([50., 60.]), unit='A / m'),
)

`EntityCollection` is not limited to storing `Entity` objects. It also accepts `astropy.units.Quantity` objects (the object returned from `Entity.quantity` or `Entity.q`) or other data (list/tuple/numpy array/etc.). We refer to all these objects as *entity-like*. Note that the collection performs no checks when an entity-like is passed in. Later operations may therefore fail or produce surprising results if unsuitable objects have been stored.

Use `Entity` objects whenever possible. This is not always an option, however, because the ontology does not cover everything.

### Accessing elements

We can access the entities in the collection in two different ways. First, we can access an entity by name using attribute access:

In [3]:
collection.Tc

Attribute access is limited to entity names that are valid Python identifiers. Furthermore, `EntityCollection` has a number of methods of its own. If an entity has the same name as one of these methods, you cannot reach it via attribute access (you would get the method instead). Therefore, we can also access entities by name using item access:

In [4]:
collection["Ms"]
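
The shadowing described above is ordinary Python behaviour rather than something specific to `mammos_entity`. The following minimal sketch uses a hypothetical stand-in class `MiniCollection` (it is not how `EntityCollection` is implemented, and its `to_file` method is only a placeholder) to show why an entity named like a method is reachable only via item access:

```python
class MiniCollection:
    """Hypothetical stand-in used for illustration, NOT the real EntityCollection."""

    def __init__(self, **entities):
        # Keep all entity-likes in one private dictionary.
        self._entities = dict(entities)

    def to_file(self, filename):
        """Placeholder method; it shadows any entity that is also called 'to_file'."""
        return f"would write to {filename}"

    def __getattr__(self, name):
        # Python calls __getattr__ only if normal lookup fails,
        # so it is never reached for names of existing methods.
        entities = self.__dict__.get("_entities", {})
        try:
            return entities[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        # Item access always looks into the dictionary of entity-likes.
        return self._entities[name]


mini = MiniCollection(Tc=[10, 100], to_file="entity-like named like a method")

print(mini.Tc)                 # attribute access finds the stored element
print(callable(mini.to_file))  # attribute access returns the method
print(mini["to_file"])         # item access returns the stored element
```

In this sketch `mini.to_file` is the bound method, while `mini["to_file"]` is the stored string.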

In addition, the collection carries a description. We print it here because it spans multiple lines:

In [5]:
print(collection.description)

Some random test data.

Descriptions can have multiple lines.


Defining an entity with the name `description` is not allowed.

### Adding or overwriting elements

Additional entities can be added at any later point by assigning a new attribute on the collection:

In [6]:
collection.A = [8e-12, 9e-12]
collection

EntityCollection(
    description='Some random test data.\n\nDescriptions can have multiple lines.',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([50., 60.]), unit='A / m'),
    A=[8e-12, 9e-12],
)

Likewise, we can add entities using:

In [7]:
collection["B_ext"] = me.B(1, "T")

Both methods are generally equivalent. If you need an entity with the same name as one of the methods of `EntityCollection`, only the latter way (item assignment) works. You should avoid reusing method names if you can.

Our collection now carries the following elements:

In [8]:
collection

EntityCollection(
    description='Some random test data.\n\nDescriptions can have multiple lines.',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([50., 60.]), unit='A / m'),
    A=[8e-12, 9e-12],
    B_ext=Entity(ontology_label='MagneticFluxDensity', value=np.float64(1.0), unit='T'),
)

If an entity with the given name already exists, it is overwritten. Both attribute assignment and item assignment can be used for this.

First, we replace the entity `Ms` with a quantity `Ms`:

In [9]:
collection.Ms = me.Ms([400, 500], "kA/m").q
collection

EntityCollection(
    description='Some random test data.\n\nDescriptions can have multiple lines.',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=<Quantity [400., 500.] kA / m>,
    A=[8e-12, 9e-12],
    B_ext=Entity(ontology_label='MagneticFluxDensity', value=np.float64(1.0), unit='T'),
)

Second, we overwrite `B_ext` with a new entity:

In [10]:
collection["B_ext"] = me.B([1, 1.2], "T")
collection

EntityCollection(
    description='Some random test data.\n\nDescriptions can have multiple lines.',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=<Quantity [400., 500.] kA / m>,
    A=[8e-12, 9e-12],
    B_ext=Entity(ontology_label='MagneticFluxDensity', value=array([1. , 1.2]), unit='T'),
)

:::{note}
- Entities are immutable. To change an entity in a collection, replace it with a new one as shown above.
- You can use the same mechanism to add an entity from one collection to another.
:::

### Checking if an element is in the collection

To check if an entity-like with a given name exists in the collection use:

In [11]:
"Ms" in collection

True

In [12]:
"Js" in collection

False

### Iterating over all entities in the collection

We can iterate over all entity-likes in the collection. Each iteration step yields a tuple `(name, entity_like)`.

In the following example we print `name`, `entity_like` and type of `entity_like` for each element in the collection:

In [13]:
for name, entity_like in collection:
    print(f"{name}\t{entity_like} of type '{type(entity_like).__name__}'")

Tc	CurieTemperature(value=[ 10. 100.], unit=K) of type 'Entity'
Ms	[400. 500.] kA / m of type 'Quantity'
A	[8e-12, 9e-12] of type 'list'
B_ext	MagneticFluxDensity(value=[1.  1.2], unit=T) of type 'Entity'


### Removing elements

We can remove elements from the collection using `del`:

In [14]:
del collection.B_ext  # equivalent alternative: del collection["B_ext"]
collection

EntityCollection(
    description='Some random test data.\n\nDescriptions can have multiple lines.',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=<Quantity [400., 500.] kA / m>,
    A=[8e-12, 9e-12],
)

## Saving to file

`EntityCollection` objects can be stored as YAML files:

In [15]:
collection.to_file("example.yaml")

More details are provided in [the io documentation](io.ipynb).

## Conversion to and from dataframe

If all entities in the collection are one-dimensional and have the same length, the collection can be converted to a pandas dataframe:

In [16]:
data = collection.to_dataframe()
data

|   | Tc | Ms | A |
|---|---|---|---|
| 0 | 10.0 | 400.0 | 8e-12 |
| 1 | 100.0 | 500.0 | 9e-12 |


By default, only the name of the entity in the collection is used as header.

Units can optionally be included in the column headers. The ontology information, however, is always lost in the dataframe:

In [17]:
collection.to_dataframe(include_units=True)

|   | Tc (K) | Ms (kA / m) | A |
|---|---|---|---|
| 0 | 10.0 | 400.0 | 8e-12 |
| 1 | 100.0 | 500.0 | 9e-12 |


It is also possible to convert a dataframe back to an `EntityCollection`. The dataframe alone does not carry enough metadata: ontology information is missing and units are not always present. Therefore, the additional metadata has to be provided as a dictionary. When starting from an `EntityCollection`, the metadata dictionary can be created as follows:

In [18]:
metadata = collection.metadata()
metadata

{'description': 'Some random test data.\n\nDescriptions can have multiple lines.',
 'Tc': {'ontology_label': 'CurieTemperature', 'unit': 'K', 'description': ''},
 'Ms': {'unit': 'kA / m'},
 'A': {}}

Ignoring the special key `description`, each key corresponds to one entity-like in the collection. Each value is a dictionary whose keys depend on the type of the entity-like:
- for an entity it has the keys `ontology_label`, `unit` and `description`
- for a quantity it has the key `unit`
- otherwise it is empty
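
The three cases in the list above can be told apart from the keys alone. The following sketch uses plain dictionaries and a hypothetical helper `kind_of` (not part of `mammos_entity`). It mirrors the structure shown in the output above and makes no claim about how the library itself inspects the metadata:

```python
def kind_of(entry):
    """Classify one per-element metadata entry by the keys it carries."""
    if "ontology_label" in entry:
        return "entity"
    if "unit" in entry:
        return "quantity"
    return "plain data"


# Hand-written dictionary with the same layout as the output of collection.metadata().
example_metadata = {
    "description": "Some random test data.",
    "Tc": {"ontology_label": "CurieTemperature", "unit": "K", "description": ""},
    "Ms": {"unit": "kA / m"},
    "A": {},
}

# Skip the special top-level key 'description', which belongs to the collection itself.
kinds = {
    name: kind_of(entry)
    for name, entry in example_metadata.items()
    if name != "description"
}
print(kinds)
# {'Tc': 'entity', 'Ms': 'quantity', 'A': 'plain data'}
```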

We can now create a new entity collection using the dataframe `data` and the `metadata` dictionary:

In [19]:
me.io.EntityCollection.from_dataframe(data, metadata)

EntityCollection(
    description='Some random test data.\n\nDescriptions can have multiple lines.',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=<Quantity [400., 500.] kA / m>,
    A=array([8.e-12, 9.e-12]),
)

Metadata lookup is done by column name. Therefore, only dataframes without units in the column headers are supported (because keys in the metadata dictionary do not contain units). If you have a dataframe with incompatible headers, e.g. with units, you first need to align column names and metadata keys.
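
One possible way to align headers that carry units is to strip the trailing ` (unit)` part before the conversion. The sketch below works on plain strings. The helper `strip_unit` is hypothetical and assumes the `name (unit)` header layout shown earlier, with no nested parentheses in the unit:

```python
import re


def strip_unit(header):
    """Return the header without a trailing ' (unit)' suffix, if there is one."""
    return re.sub(r"\s*\([^()]*\)\s*$", "", header)


# Column headers as produced with include_units=True in the example above.
headers = ["Tc (K)", "Ms (kA / m)", "A"]

# Mapping from old header to metadata-compatible name.
rename_map = {header: strip_unit(header) for header in headers}
print(rename_map)
# {'Tc (K)': 'Tc', 'Ms (kA / m)': 'Ms', 'A': 'A'}
```

Such a mapping can be passed to pandas via `DataFrame.rename(columns=rename_map)` to obtain a dataframe whose column names match the metadata keys.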

We can modify the dataframe and/or metadata dictionary before creating the `EntityCollection`. As an example, we add the missing ontology information for `Ms` to the metadata and scale data for the `A` column by 2:

In [20]:
data["A"] *= 2

In [21]:
metadata["Ms"]["ontology_label"] = "SpontaneousMagnetization"

In [22]:
me.io.EntityCollection.from_dataframe(data, metadata)

EntityCollection(
    description='Some random test data.\n\nDescriptions can have multiple lines.',
    Tc=Entity(ontology_label='CurieTemperature', value=array([ 10., 100.]), unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([400., 500.]), unit='kA / m'),
    A=array([1.6e-11, 1.8e-11]),
)

Conversion to a dataframe can be useful, for example, to combine two `EntityCollection` objects in advanced ways. More details are given in [this tutorial](useful_operations.ipynb).