# `mammos_entity.io`: reading and writing entities

In [1]:
import mammos_entity as me
import mammos_units as u

## Supported file format

`mammos_entity.io` can read and write csv files containing entity-like objects in tabular format.

The file is structured as follows:
- The first line is commented and contains the preferred ontology label.
- The second line is commented and contains the ontology IRI.
- The third line is commented and contains units.
- The fourth line contains short labels used to refer to individual columns when working with the data, e.g. in a `pandas` dataframe. Omitting spaces in this string is advisable.  
  Ideally this string is the short ontology label.
- All remaining lines contain data.
- Comment lines start with `#` as the first character. Inline comments are not supported.

If a column has no ontology entry, lines 1 and 2 are empty for this column.

If a column has no units (with or without ontology entry), line 3 has no entry for this
column.

Here is an example with four columns:
- an index with no units or ontology label
- spontaneous magnetization from the ontology
- a made-up quantity alpha with a unit but no ontology
- demagnetizing factor with an ontology entry but no unit.

To keep this example short the actual IRIs are omitted:

```
#,SpontaneousMagnetization,,DemagnetizingFactor
#,https://w3id.org/emmo/...,,https://w3id.org/emmo/...
#,kA/m,s^2,
index,Ms,alpha,DemagnetizingFactor
0,1e5,1.2,1
1,1e5,3.4,0.5
2,1e5,5.6,0.5
```
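To make the layout concrete, here is a minimal sketch that parses the four header lines of the example above using only the Python standard library. It is not how `mammos_entity.io` reads files; the helper name `parse_header` and the returned dictionary layout are assumptions made for illustration.

```python
import csv

# Example file from above (IRIs elided in the same way).
text = """\
#,SpontaneousMagnetization,,DemagnetizingFactor
#,https://w3id.org/emmo/...,,https://w3id.org/emmo/...
#,kA/m,s^2,
index,Ms,alpha,DemagnetizingFactor
0,1e5,1.2,1
1,1e5,3.4,0.5
2,1e5,5.6,0.5
"""


def parse_header(text):
    """Map each column name to its ontology label, IRI and unit (hypothetical helper)."""
    lines = text.splitlines()
    # Lines 1-3 are comments: drop the leading '#' and split on commas.
    labels, iris, units = (next(csv.reader([line[1:]])) for line in lines[:3])
    # Line 4 holds the short column names.
    names = next(csv.reader([lines[3]]))
    return {
        name: {"ontology_label": label, "iri": iri, "unit": unit}
        for name, label, iri, unit in zip(names, labels, iris, units)
    }


columns = parse_header(text)
for name, meta in columns.items():
    print(name, meta)
```

Empty strings mark missing entries: `index` has neither ontology label nor unit, `alpha` has only a unit, and `DemagnetizingFactor` has an ontology entry but no unit.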

## Example

We create some artificial data to write to a csv file.

In [2]:
Ms = me.Ms([1e6, 2e6, 3e6])
T = me.T([1, 2, 3])
theta_angle = [0, 0.5, 0.7] * u.rad
demag_factor = me.Entity("DemagnetizingFactor", [1 / 3, 1 / 3, 1 / 3])
comments = ["Some comment", "Some other comment", "A third comment"]

### Writing
We can write them to a csv file as shown in the following cell. Names of the keyword arguments determine column names in the file.

In [3]:
me.io.entities_to_csv("example.csv", Ms=Ms, T=T, angle=theta_angle, demag_factor=demag_factor, comment=comments)

This has produced the following file:

In [4]:
print(open("example.csv").read())  # noqa: SIM115

#SpontaneousMagnetization,ThermodynamicTemperature,,DemagnetizingFactor,
#https://w3id.org/emmo/domain/magnetic_material#EMMO_032731f8-874d-5efb-9c9d-6dafaa17ef25,https://w3id.org/emmo#EMMO_affe07e4_e9bc_4852_86c6_69e26182a17f,,https://w3id.org/emmo/domain/magnetic_material#EMMO_0f2b5cc9-d00a-5030-8448-99ba6b7dfd1e,
#A / m,K,rad,,
Ms,T,angle,demag_factor,comment
1000000.0,1.0,0.0,0.3333333333333333,Some comment
2000000.0,2.0,0.5,0.3333333333333333,Some other comment
3000000.0,3.0,0.7,0.3333333333333333,A third comment



### Reading
We can read it back in and get a container object (called `EntityCollection`) containing all columns:

In [5]:
content = me.io.entities_from_csv("example.csv")
content

EntityCollection(
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=array([1000000., 2000000., 3000000.]), unit='A / m'),
    T=Entity(ontology_label='ThermodynamicTemperature', value=array([1., 2., 3.]), unit='K'),
    angle=<Quantity [0. , 0.5, 0.7] rad>,
    demag_factor=Entity(ontology_label='DemagnetizingFactor', value=array([0.33333333, 0.33333333, 0.33333333])),
    comment=array(['Some comment', 'Some other comment', 'A third comment'],
      dtype=object),
)

We can access the individual elements:

In [6]:
content.Ms

In [7]:
content.T

In [8]:
content.angle

<Quantity [0. , 0.5, 0.7] rad>

In [9]:
content.demag_factor

In [10]:
content.comment

array(['Some comment', 'Some other comment', 'A third comment'],
      dtype=object)

We can also get a `pandas` dataframe of the data we have read:

In [11]:
content.to_dataframe()

Unnamed: 0,Ms,T,angle,demag_factor,comment
0,1000000.0,1.0,0.0,0.333333,Some comment
1,2000000.0,2.0,0.5,0.333333,Some other comment
2,3000000.0,3.0,0.7,0.333333,A third comment


### Reading with `pandas`
If we only need the numerical data but not the entity information, we can also read the csv file with `pandas`, passing `comment="#"` to skip the three metadata lines:

In [12]:
import pandas as pd

pd.read_csv("example.csv", comment="#")

Unnamed: 0,Ms,T,angle,demag_factor,comment
0,1000000.0,1.0,0.0,0.333333,Some comment
1,2000000.0,2.0,0.5,0.333333,Some other comment
2,3000000.0,3.0,0.7,0.333333,A third comment


### Check that data has not changed
We can compare with the original data:

In [13]:
Ms == content.Ms

True

In [14]:
T == content.T

True

In [15]:
theta_angle == content.angle

array([ True,  True,  True])

In [16]:
demag_factor == content.demag_factor

True

In [17]:
comments == content.comment

array([ True,  True,  True])
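The exact equality above relies on floating-point values surviving the trip through text. The following sketch illustrates this with plain Python, assuming the values are written in their shortest round-trip representation. That assumption is consistent with the file contents shown earlier (for example `0.3333333333333333`), but it is not confirmed to be how `entities_to_csv` formats numbers.

```python
# Values similar to the ones used in the example above.
original = [1 / 3, 1e6, 0.7]

# repr() gives the shortest string that converts back to the same float.
written = [repr(x) for x in original]

# Parsing the strings restores the values bit for bit.
restored = [float(s) for s in written]

print(written)
print(restored == original)
```

If a file were written with fewer digits (for example `0.333333`), the comparison with the original data would no longer be exact.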

# FAQ: Converting unformatted csv files and tables

The syntax of the csv files that `mammos_entity.io` can read is very strict. Reading csv or other tabular files written in a different format requires some extra steps.

## If the file can be read using `pandas`

Let's assume that we want to read a hysteresis loop written in a `dat` file with this structure:
```dat
1 10.0 1.6083568305976572 -16778187.088808443
1 9.0 1.6083393931987826 -15498304.121589921
1 8.0 1.6083184361075116 -14218436.37373519
1 7.0 1.608292941666901 -12938587.029585946
1 6.0 1.6082614950059932 -11658760.230932372
1 5.0 1.608222081883206 -10378961.467100028
1 4.0 1.6081717550468129 -9099198.173237378
1 3.0 1.6081060612831548 -7819480.6795090465
1 2.0 1.6080180098704495 -6539823.778270881
1 1.0 1.6078961072484168 -5260249.438047437
1 0.0 1.6077203434898446 -3980791.785582987
1 -1.0 1.6074532481818082 -2701506.9443550394
1 -2.0 1.6070175560418265 -1422494.3948380197
1 -3.0 1.6062307607212265 -143949.74970698575
1 -4.0 1.604561953357357 1133677.3737576925
1 -5.0 1.5997253356028722 2409024.642323177
2 -6.0 -1.608261495186274 -11658760.230930354
```
where the user knows that the first column is the configuration type, the second column is the value of $\mu_0 H_{\mathsf{ext}}$ in Tesla, the third column is the magnetic polarisation in Tesla and the last column is the energy density in J/m$^3$.

In particular, the file has no header line and uses a space rather than a comma as separator.

In [18]:
import pandas as pd

We first write this data to a file and then read it with `pandas`:

In [19]:
with open("example.dat", "w") as f:
    f.write("""
1 10.0 1.6083568305976572 -16778187.088808443
1 9.0 1.6083393931987826 -15498304.121589921
1 8.0 1.6083184361075116 -14218436.37373519
1 7.0 1.608292941666901 -12938587.029585946
1 6.0 1.6082614950059932 -11658760.230932372
1 5.0 1.608222081883206 -10378961.467100028
1 4.0 1.6081717550468129 -9099198.173237378
1 3.0 1.6081060612831548 -7819480.6795090465
1 2.0 1.6080180098704495 -6539823.778270881
1 1.0 1.6078961072484168 -5260249.438047437
1 0.0 1.6077203434898446 -3980791.785582987
1 -1.0 1.6074532481818082 -2701506.9443550394
1 -2.0 1.6070175560418265 -1422494.3948380197
1 -3.0 1.6062307607212265 -143949.74970698575
1 -4.0 1.604561953357357 1133677.3737576925
1 -5.0 1.5997253356028722 2409024.642323177
2 -6.0 -1.608261495186274 -11658760.230930354
""")
df = pd.read_csv("example.dat", sep=" ", names=["configuration_type", "mu0_Hext", "Js", "energy_density"])
df

Unnamed: 0,configuration_type,mu0_Hext,Js,energy_density
0,1,10.0,1.608357,-16778190.0
1,1,9.0,1.608339,-15498300.0
2,1,8.0,1.608318,-14218440.0
3,1,7.0,1.608293,-12938590.0
4,1,6.0,1.608261,-11658760.0
5,1,5.0,1.608222,-10378960.0
6,1,4.0,1.608172,-9099198.0
7,1,3.0,1.608106,-7819481.0
8,1,2.0,1.608018,-6539824.0
9,1,1.0,1.607896,-5260249.0


To rewrite this file in the correct format, we can build the entities by hand (converting to the right unit when necessary) and then use `entities_to_csv`.
When doing so, consider the following questions for each column:
- Is it just a column of numbers (like `configuration_type` above)? Then convert the `pandas.Series` to a NumPy array with the `to_numpy` method (see the cell below).
- What does this column represent? Look for the right ontology label in EMMO or in the MaMMoS ontology and use it in the field `ontology_label`.
- If this quantity is in the ontology, is the value expressed in a unit that is compatible with the ontology unit? For example, `mu0_Hext` in this example is in Tesla, but we want to use `ExternalMagneticField`, which is expressed in Ampere per metre. So we convert to the right unit first and then use the result as `value`.
- What unit is your data written in? This matters if you are working, for example, with `kA/m` instead of `A/m`.
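The Tesla to Ampere per metre conversion mentioned above amounts to dividing by the vacuum permeability, $H = \mu_0 H / \mu_0$. Here is a minimal sketch in plain Python to check the arithmetic by hand. The constant `MU0` (CODATA 2018 value) and the helper name `tesla_to_ampere_per_metre` are assumptions made for illustration; in practice the unit library handles this conversion.

```python
# Vacuum permeability in T m / A (CODATA 2018 value, assumed here).
MU0 = 1.25663706212e-6


def tesla_to_ampere_per_metre(mu0_h):
    """Convert a flux density mu0*H in Tesla to a field strength H in A/m."""
    return mu0_h / MU0


# First and last mu0*H values of the first configuration in the example file.
for mu0_h in (10.0, -5.0):
    print(mu0_h, "T ->", tesla_to_ampere_per_metre(mu0_h), "A/m")
```

A field of 1 T corresponds to roughly 7.96e5 A/m, so the values in the `mu0_Hext` column become numbers of the order of 1e6 A/m after conversion.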

In [20]:
configuration_type = df["configuration_type"].to_numpy()
H = me.Entity(
    ontology_label="ExternalMagneticField",
    value=(df["mu0_Hext"].to_numpy() * u.T).to(u.A / u.m, equivalencies=u.magnetic_flux_field()),
    unit=u.A / u.m,
)
Js = me.Entity(
    ontology_label="MagneticPolarisation",
    value=df["Js"],
    unit=u.T,
)
energy_density = me.Entity(ontology_label="EnergyDensity", value=df["energy_density"], unit=u.J / u.m**3)

In [21]:
me.io.entities_to_csv("example.csv", configuration_type=configuration_type, H=H, Js=Js, energy_density=energy_density)

In [22]:
print(open("example.csv").read())  # noqa: SIM115

#,ExternalMagneticField,MagneticPolarisation,EnergyDensity
#,https://w3id.org/emmo/domain/magnetic_material#EMMO_da08f0d3-fe19-58bc-8fb6-ecc8992d5eb3,https://w3id.org/emmo#EMMO_74a096dd_cc83_4c7e_b704_0541620ff18d,https://w3id.org/emmo/domain/magnetic_material#EMMO_56258d3a-f2ee-554e-af99-499dd8620457
#,A / m,T,J / m3
configuration_type,H,Js,energy_density
1,7957747.150262763,1.6083568305976572,-16778187.088808443
1,7161972.435236487,1.6083393931987826,-15498304.12158992
1,6366197.72021021,1.6083184361075116,-14218436.37373519
1,5570423.005183934,1.608292941666901,-12938587.029585946
1,4774648.290157658,1.6082614950059932,-11658760.230932372
1,3978873.5751313814,1.608222081883206,-10378961.467100028
1,3183098.860105105,1.6081717550468129,-9099198.173237378
1,2387324.145078829,1.6081060612831548,-7819480.6795090465
1,1591549.4300525526,1.6080180098704495,-6539823.778270881
1,795774.7150262763,1.6078961072484168,-5260249.438047437
1,0.0,1.6077203434898446,-3980791.785582987
1,-795774.715