# NOTE: this should become part of the io notebook, which however requires some cleanup first. Therefore, I have created a separate notebook for now. I will combine the two once I have finished reviewing the io notebook.

In [1]:
import mammos_entity as me

## Writing and reading HDF5

`mammos_entity` provides support for writing entities and entity collections to HDF5 and reading them back in.

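HDF5 provides many options for creating and opening files, so it is the user's responsibility to create the file. `mammos_entity` then writes into the open file or group that it is given.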

In [2]:
import h5py
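
Because creating the file is left to the user, it helps to know the `h5py` file modes used in this notebook. The following is a minimal sketch using plain `h5py` only; the scratch path `modes.hdf5` and the dataset names `first` and `second` are made up for this illustration.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "modes.hdf5")

with h5py.File(path, "w") as f:  # "w": create the file, truncating it if it exists
    f.create_dataset("first", data=[1.0, 2.0])

with h5py.File(path, "a") as f:  # "a": read/write, create the file if it is missing
    f.create_dataset("second", data=[3.0])

with h5py.File(path, "r") as f:  # "r": read-only
    names = sorted(f.keys())

print(names)
```

Opening a file without a mode, as done in the reading cells of this notebook, gives read-only access in recent `h5py` versions.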

### Single entity

As a first example, we create a single entity with four values and save it to disk:

In [3]:
T = me.T([10, 20, 50, 100], "K")
T

To save it, we pass an open HDF5 file (or a group inside an open file) and a name for the newly created dataset.

In [4]:
with h5py.File("test.hdf5", "w") as f:
    T.to_hdf5(f, "temperature")
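
A file and a group can be passed in the same place because, in `h5py`, a `File` object is itself a `Group` (the root group). The sketch below illustrates this with plain `h5py`; the in-memory file name and the names `geometry`, `a`, and `b` are assumptions for the example, and nothing here uses `mammos_entity`.

```python
import h5py

# In-memory file: nothing is written to disk (core driver without backing store).
with h5py.File("in_memory.hdf5", "w", driver="core", backing_store=False) as f:
    group = f.create_group("geometry")

    file_is_group = isinstance(f, h5py.Group)
    group_is_group = isinstance(group, h5py.Group)

    # Both objects offer the same interface for creating children.
    f.create_dataset("a", data=1.0)
    group.create_dataset("b", data=2.0)

    paths = [f["a"].name, group["b"].name]

print(file_is_group, group_is_group, paths)
```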

We can inspect the content of the file using [`h5glance`](https://pypi.org/project/h5glance/):

In [5]:
!h5glance --attrs test.hdf5

test.hdf5
└temperature	[float64: 4]
  └4 attributes:
    ├description: ''
    ├ontology_iri: 'https://w3id.org/em...2_86c6_69e26182a17f'
    ├ontology_label: 'ThermodynamicTemperature'
    └unit: 'K'



We can see that the file contains a single dataset `temperature` with data of type `float64` and four elements (the actual values are not shown). Furthermore, the `temperature` dataset carries metadata attributes for the description, the ontology information, and the unit.
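
The values and attributes are ordinary HDF5 content, so they can also be read with plain `h5py`, without `mammos_entity`. The sketch below builds a stand-in dataset by hand that mimics the listing above (this is an assumption about the layout, not the library's writer) and reads it back. The `ontology_iri` attribute is left out because the listing abbreviates its value.

```python
import h5py
import numpy as np

with h5py.File("layout.hdf5", "w", driver="core", backing_store=False) as f:
    # Hand-built stand-in for the dataset shown above; not written by mammos_entity.
    dset = f.create_dataset("temperature", data=np.array([10.0, 20.0, 50.0, 100.0]))
    dset.attrs["description"] = ""
    dset.attrs["ontology_label"] = "ThermodynamicTemperature"
    dset.attrs["unit"] = "K"

    values = dset[()]         # read all values as a NumPy array
    attrs = dict(dset.attrs)  # copy the attributes into a plain dict

print(values)
print(attrs)
```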

We can read the dataset and get an entity back:

In [6]:
with h5py.File("test.hdf5") as f:
    print(me.io.from_hdf5(f["/temperature"]))

ThermodynamicTemperature(value=[ 10.  20.  50. 100.], unit=K)


### Entity collection

To group together multiple entities we use `EntityCollection`s. For HDF5 `mammos_entity` maps the collection to an HDF5 group.

In [7]:
collection = me.EntityCollection(
    description="intrinsic properties",
    Tc=me.Tc(800, "K"),
    Ms=me.Ms(600, "kA/m"),
)
collection

EntityCollection(
    description='intrinsic properties',
    Tc=Entity(ontology_label='CurieTemperature', value=800.0, unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=600.0, unit='kA / m'),
)

We write it as before, this time passing a name for the newly created group. Each entity of the collection will be stored as a dataset inside the newly created group. The names of these datasets will be the names of the entities in the collection.

In [8]:
with h5py.File("test.hdf5", "a") as f:  # append to the file created before
    collection.to_hdf5(f, "/properties")

In [9]:
!h5glance --attrs test.hdf5

test.hdf5
├properties
│ ├1 attributes:
│ │ └description: 'intrinsic properties'
│ ├Tc	[float64: scalar]
│ │ └4 attributes:
│ │   ├description: ''
│ │   ├ontology_iri: 'https://w3id.org/em...3_a1d6_54c9f778343d'
│ │   ├ontology_label: 'CurieTemperature'
│ │   └unit: 'K'
│ └Ms	[float64: scalar]
│   └4 attributes:
│     ├description: ''
│     ├ontology_iri: 'https://w3id.org/em...b-9c9d-6dafaa17ef25'
│     ├ontology_label: 'SpontaneousMagnetization'
│     └unit: 'kA / m'
└temperature	[float64: 4]
  └4 attributes:
    ├description: ''
    ├ontology_iri: 'https://w3id.org/em...2_86c6_69e26182a17f'
    ├ontology_label: 'ThermodynamicTemperature'
    └unit: 'K'



We can see that our HDF5 file now has two top-level elements:
- the dataset `temperature` created in the first step
- the group `properties` with two datasets `Tc` and `Ms` created from the collection. The group attributes contain the collection description.
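
The distinction between groups and datasets can also be checked programmatically. The following sketch uses plain `h5py` on a hand-built stand-in for the file above (values only, unit and ontology attributes omitted) and walks it with `visititems`; the helper `record` is a hypothetical name introduced for this example.

```python
import h5py

entries = []


def record(name, obj):
    """Collect (path, kind) for every object below the root."""
    kind = "group" if isinstance(obj, h5py.Group) else "dataset"
    entries.append((name, kind))


with h5py.File("layout.hdf5", "w", driver="core", backing_store=False) as f:
    # Hand-built stand-in for the file above; not written by mammos_entity.
    f.create_dataset("temperature", data=[10.0, 20.0, 50.0, 100.0])
    properties = f.create_group("properties")
    properties.attrs["description"] = "intrinsic properties"
    properties.create_dataset("Tc", data=800.0)
    properties.create_dataset("Ms", data=600.0)

    f.visititems(record)  # visits every group and dataset recursively
    description = properties.attrs["description"]

print(sorted(entries))
print(description)
```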

We can read the group and get a new EntityCollection:

In [10]:
with h5py.File("test.hdf5") as f:
    print(me.io.from_hdf5(f["/properties"]))

EntityCollection(
    description='intrinsic properties',
    Tc=Entity(ontology_label='CurieTemperature', value=800.0, unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=600.0, unit='kA / m'),
)


We can also read the whole file. We get a top-level collection that contains the nested `properties` collection and the `temperature` entity:

In [11]:
with h5py.File("test.hdf5") as f:
    print(me.io.from_hdf5(f))

EntityCollection(
    description='',
    properties=EntityCollection(
        description='intrinsic properties',
        Tc=Entity(ontology_label='CurieTemperature', value=800.0, unit='K'),
        Ms=Entity(ontology_label='SpontaneousMagnetization', value=600.0, unit='kA / m'),
    ),
    temperature=Entity(ontology_label='ThermodynamicTemperature', value=array([ 10.,  20.,  50., 100.]), unit='K'),
)


### Multiple writes

We can add as many datasets and groups as we like, anywhere in the HDF5 file. We are also not limited to data written with `mammos_entity`; other data can be stored in the same file.

Apart from the methods shown before, there is also a function `me.io.to_hdf5` that can write any entity-like object or entity collection to HDF5. The function automatically stores the additional metadata for EntityCollection/Entity/Quantity that is required for reading the file with `mammos_entity`. Its first argument is the data to be written.

The function and the method can be used interchangeably.

In [12]:
with h5py.File("test.hdf5", "w") as f:  # overwrite the file created before
    Hc = me.Hc(300, "kA/m")
    me.io.to_hdf5(Hc, f, "/Hc")

    # implicitly create a new group
    me.io.to_hdf5(me.Entity("Length", 5, "nm", description="edge length x"), f, "/geometry/x")

    # pass a group instead of a file
    me.Entity("Length", 10, "nm", description="edge length y").to_hdf5(f["/geometry"], "y")

    me.io.to_hdf5(collection, f, "intrinsic properties")

    # a quantity
    me.io.to_hdf5(5 * me.units.mm**2, f, "/intrinsic properties/q")

    # additional data, making use of other options available in create_dataset
    f.create_dataset("raw data", data=[0.1, 0.2, 0.3, 0.5, 0.9], dtype="float32")
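
Two `h5py` behaviours are relevant to the cell above: intermediate groups in a dataset path are created on demand, and `create_dataset` accepts further options such as compression. The sketch below shows both with plain `h5py` on an in-memory file. Whether `mammos_entity` relies on the same mechanism internally is an assumption; the generated values for `raw data` are made up for the example.

```python
import h5py
import numpy as np

with h5py.File("options.hdf5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups in the path are created on demand.
    f.create_dataset("/geometry/x", data=5.0)
    geometry_is_group = isinstance(f["geometry"], h5py.Group)

    # Further create_dataset options: explicit dtype and gzip compression.
    raw = f.create_dataset(
        "raw data",
        data=np.linspace(0.0, 1.0, 1000),
        dtype="float32",
        compression="gzip",
    )
    raw_info = (raw.dtype.name, raw.shape, raw.compression)

print(geometry_is_group, raw_info)
```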

In [13]:
!h5glance test.hdf5

test.hdf5
├Hc	[float64: scalar] (4 attributes)
├geometry
│ ├x	[float64: scalar] (4 attributes)
│ └y	[float64: scalar] (4 attributes)
├intrinsic properties (1 attributes)
│ ├Tc	[float64: scalar] (4 attributes)
│ ├Ms	[float64: scalar] (4 attributes)
│ └q	[float64: scalar] (1 attributes)
└raw data	[float32: 5]



We can read the whole file as a single nested entity collection. It will read all groups/datasets and choose the most appropriate type:

In [14]:
with h5py.File("test.hdf5") as f:
    content = me.io.from_hdf5(f)

content

EntityCollection(
    description='',
    Hc=Entity(ontology_label='CoercivityHcExternal', value=300.0, unit='kA / m'),
    geometry=EntityCollection(
        description='',
        x=Entity(ontology_label='Length', value=5.0, unit='nm', description='edge length x'),
        y=Entity(ontology_label='Length', value=10.0, unit='nm', description='edge length y'),
    ),
    intrinsic properties=EntityCollection(
        description='intrinsic properties',
        Tc=Entity(ontology_label='CurieTemperature', value=800.0, unit='K'),
        Ms=Entity(ontology_label='SpontaneousMagnetization', value=600.0, unit='kA / m'),
        q=<Quantity 5. mm2>,
    ),
    raw data=array([0.1, 0.2, 0.3, 0.5, 0.9], dtype=float32),
)

We can access individual elements of the nested structure:

In [15]:
content.Hc

In [16]:
content.geometry.x

Some of the names are not valid Python identifiers, so we have to use the dict interface of `EntityCollection`:

In [17]:
content["intrinsic properties"].Tc

In [18]:
content["intrinsic properties"].q

<Quantity 5. mm2>

In [19]:
content["raw data"]

array([0.1, 0.2, 0.3, 0.5, 0.9], dtype=float32)
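
Which names need the dict interface can be checked with the standard library. The sketch below tests the top-level names of the file written above. The helper `attribute_accessible` is a hypothetical name, and it only checks Python syntax rules; it assumes that `EntityCollection` allows attribute access for every syntactically valid name.

```python
import keyword

# Top-level names taken from the file written above.
names = ["Hc", "geometry", "intrinsic properties", "raw data"]


def attribute_accessible(name):
    """True if `name` could be written after a dot in Python source."""
    return name.isidentifier() and not keyword.iskeyword(name)


needs_item_access = [name for name in names if not attribute_accessible(name)]
print(needs_item_access)
```

Names containing spaces, such as `intrinsic properties` and `raw data`, end up in `needs_item_access` and have to be accessed with square brackets.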