# How to handle WeldX files
In this notebook we will demonstrate how to create, read, and update ASDF files created by WeldX. All the needed functionality is contained in a single class named `WeldxFile`. We are going to show different modes of operation, like working with physical files on your hard drive and with in-memory files, in both read-only and read-write mode.

## Imports
The `WeldxFile` class is imported from the top level of the `weldx` package.

In [None]:
from weldx import WeldxFile
import numpy as np
from datetime import datetime

## Basic operations
Now we create our first file by invoking the `WeldxFile` constructor without any additional arguments. By doing so, we create an in-memory file. This means that your changes will be temporary until you write them to an actual file on your hard drive. The `file_handle` attribute points to the actual underlying file. In this case it is the in-memory file or buffer, as shown below.

In [None]:
file = WeldxFile()
file.file_handle
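
The difference between an in-memory buffer and a physical file can be illustrated with the standard library alone. The following is a minimal sketch that assumes `io.BytesIO` as the buffer type purely for illustration; it makes no claim about which buffer `WeldxFile` uses internally, and the file name `scratch.bin` is hypothetical.

```python
import io
import tempfile
from pathlib import Path

# An in-memory buffer behaves like a binary file but never touches the disk.
buffer = io.BytesIO()
buffer.write(b"temporary content")

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "scratch.bin"  # hypothetical file name
    exists_before = target.exists()  # nothing has been written to disk yet

    # Persisting the data requires an explicit write to a physical file.
    target.write_bytes(buffer.getvalue())
    exists_after = target.exists()
    persisted = target.read_bytes()

print(exists_before, exists_after, persisted)
```

The buffer content only survives the Python session if it is written out explicitly, which mirrors the temporary nature of an in-memory file described above.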

Next we assign some dictionary-like data to the file by storing it under a key enclosed in square brackets.
Then we look at the representation of the file header or contents. This will depend on the execution environment.
In JupyterLab you will see an interactive tree-like structure, which can be expanded and searched.
The root of the tree is denoted as "root", followed by the children "asdf_library" and "history", which are created by the ASDF library. We attached the additional child "some_data" with our assignment.

Note that here we are using some very common types, namely a NumPy array and a timestamp. For specialized weldx types like the coordinate system manager, (welding) measurements etc., the weldx package provides ASDF extensions that handle those types automatically when loading and saving ASDF data, so you do not need to worry about them. If you try to save types that cannot be handled by ASDF, you will trigger an error.

In [None]:
file["some_data"] = {"data_set": {"data": np.random.random(100),
                                  "time": datetime.now()}}
file

We could also have created a similar structure in one step:

In [None]:
file = WeldxFile(tree={"data_sets": {"first": np.random.random(100),
                                     "time": "now"}},
                 mode="rw")
file

You might have noticed that displaying the first file in Jupyter produced a warning about the in-memory operation.
This time we passed the additional argument `mode="rw"`, which indicates that we want to perform write operations, either just in memory
or, if a file name is passed, on the physical file. Therefore the warning went away.

We can use all dictionary operations on the data we like, e.g. update, assign, and delete items.

In [None]:
file["data_sets"]["second"] = {"data": np.random.random(100),
                               "time": datetime.now()}

# delete the first data set again:
del file["data_sets"]["first"]
file

We can also iterate over all keys and values as usual. Have a look at the documentation of the built-in type `dict` for a complete overview of its features.

In [None]:
for key, value in file.items():
    print(key, value)
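
As a quick reminder of the dictionary operations mentioned above, here is a short sketch that uses a plain `dict` as a stand-in for the file tree. The keys and values are made up for illustration, and the sketch relies only on the statement above that the usual dictionary operations are available.

```python
# A plain dict standing in for the tree of a file (illustrative data only).
tree = {"data_sets": {"first": [1.0, 2.0, 3.0]}}

# assign a new nested item
tree["data_sets"]["second"] = [4.0, 5.0]

# update several top-level items at once
tree.update({"operator": "unknown", "updated": False})

# delete an item and get its value back
removed = tree.pop("operator")

# membership test and safe lookup with a default value
has_first = "first" in tree["data_sets"]
comment = tree.get("comment", "no comment stored")

# iterate over the remaining top-level keys and values
for key, value in tree.items():
    print(key, value)
```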

In order to make your changes persistent, we are going to save the memory-backed file to disk by invoking `WeldxFile.write_to`.

In [None]:
file.write_to("example.asdf")

This newly created file can be opened again in read-write mode by passing the appropriate arguments.

In [None]:
example = WeldxFile("example.asdf", mode="rw")
example["updated"] = True
example.close()

Note that we closed the file here explicitly. Before closing, we wrote a simple item to the tree. Let's see what happens if we open the file once again.

In [None]:
example = WeldxFile("example.asdf", mode='rw')
example

As you can see, the `updated` state has been written, because we closed the file properly. If we had omitted closing the file,
our changes would have been lost when the object goes out of scope or Python terminates.

## Handling updates within a context manager
To ensure you do not forget to update your file after making changes,
we can enclose our file-changing operations within a context manager.
This ensures that all operations done in this context (the `with` block) are written to the file once the context is left.
Note that the underlying file is also closed after the context ends. This is useful when you have to update lots of files, as the number of files an operating system can keep open at once is limited.

In [None]:
# FIXME: this logic is broken!
with WeldxFile("example.asdf", mode='rw') as example:
    example["updated"] = True
    # now the context ends, and the file is being saved to disk again.

# If you access the file now to obtain a key, you will trigger an error.
try:
    print(example["updated"])
except Exception as e:
    print(e)

Let us inspect the file once again, to see whether our `updated` item has been correctly written. 

In [None]:
WeldxFile("example.asdf")

If an error is triggered inside the context (e.g. an exception is raised), we can check the state of the underlying file by opening it again afterwards.

In [None]:
with WeldxFile("example.asdf", mode="rw") as file:
    file["updated"] = False
    raise Exception("oh no")

In [None]:
WeldxFile("example.asdf")
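
The general write-on-exit pattern behind such a context manager can be sketched with the standard library. The class `JsonBackedDict` below is hypothetical and stores its content as JSON. It is not how `WeldxFile` is implemented, and its choice to write even after an exception is a design decision of this sketch only.

```python
import json
import tempfile
from pathlib import Path


class JsonBackedDict(dict):
    """Hypothetical dict that writes itself to a JSON file when the context ends."""

    def __init__(self, path):
        self.path = Path(path)
        # load existing content if the file is already there
        initial = json.loads(self.path.read_text()) if self.path.exists() else {}
        super().__init__(initial)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Design choice of this sketch: always write, even after an exception.
        self.path.write_text(json.dumps(self))
        return False  # do not swallow exceptions


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.json"  # hypothetical file name

    with JsonBackedDict(path) as data:
        data["updated"] = True
    # leaving the with block has triggered the write

    reloaded = json.loads(path.read_text())

print(reloaded)
```

Returning `False` from `__exit__` lets exceptions propagate to the caller, so an error inside the block is never silently hidden.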

## Handling of custom schemas
An important aspect of WeldX or ASDF files is that you can validate them against a defined schema. A schema defines the required and optional attributes that a tree structure has to provide to pass the schema validation. Furthermore, the types of these attributes can be defined, e.g. the data attribute should be a NumPy array, or a timestamp should be of type `pandas.Timestamp`.
There are several schemas provided by WeldX, which can be used by passing them to the `custom_schema` argument. It is expected to be a path-like type, so a string (`str`) or `pathlib.Path` is accepted. The provided utility function `get_schema_path` returns the path to the named schema, so its output can be passed directly as `WeldxFile(custom_schema=...)`.

In [None]:
from weldx.asdf.util import get_schema_path

In [None]:
schema = get_schema_path("single_pass_weld*")
schema

This schema defines a complete experimental setup with measurement data. For example, it requires the following attributes to be defined in our tree:
  - workpiece
  - TCP
  - welding_current
  - welding_voltage
  - measurements
  - equipment
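
The idea of required attributes can be sketched in plain Python. The helper `missing_keys` below is hypothetical and only checks for the presence of the top-level names listed above. The actual validation is performed by the ASDF library against the schema file and covers much more, such as types and nested structures.

```python
# Top-level attribute names taken from the list above.
REQUIRED_KEYS = (
    "workpiece",
    "TCP",
    "welding_current",
    "welding_voltage",
    "measurements",
    "equipment",
)


def missing_keys(tree, required=REQUIRED_KEYS):
    """Return the required top-level keys that are absent from ``tree``."""
    return [key for key in required if key not in tree]


# Placeholder values; a real tree would hold the actual objects.
tree = {key: None for key in REQUIRED_KEYS}
complete_result = missing_keys(tree)

# Simulate a forgotten attribute.
del tree["workpiece"]
incomplete_result = missing_keys(tree)

print(complete_result, incomplete_result)
```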

We use a testing function to provide this data now, and validate it against the schema.

In [None]:
from weldx.asdf.cli.welding_schema import single_pass_weld_example
_, single_pass_weld_data = single_pass_weld_example(out_file=None)
display(single_pass_weld_data)

That is a lot of data, containing complex data structures and objects describing the whole experiment including measurement data.
We can now create a new `WeldxFile` and validate the data against the schema.

In [None]:
WeldxFile(tree=single_pass_weld_data, custom_schema=schema, mode='rw')

But what would happen if we forgot an important attribute? Let's have a closer look...

In [None]:
# simulate we forgot something important, so we delete the workpiece:
del single_pass_weld_data["workpiece"]

# now create the file again, and see what happens:
try:
    WeldxFile(tree=single_pass_weld_data, custom_schema=schema, mode='rw')
except Exception as e:
    display(e)

We receive a `ValidationError` from the ASDF library, which tells us exactly what information is missing. The same will happen if we accidentally pass the wrong type.

In [None]:
# simulate a wrong type by changing it to a NumPy array.
single_pass_weld_data["welding_current"] = np.zeros(10)

# now create the file again, and see what happens:
try:
    WeldxFile(tree=single_pass_weld_data, custom_schema=schema, mode='rw')
except Exception as e:
    display(e)

Here we see that a "signal" tag is expected, but an "asdf/core/ndarray-1.0.0" was received.
The ASDF library assigns tags to certain types to handle their storage in the file format.
As shown, the "signal" tag is contained in the "weldx/measurement" container, provided by "weldx.bam.de". The tags and schemas