MSL-IO follows the data model used by HDF5 to read and write data files -- where there is a ~msl.io.base.Root, msl-io-groups and msl-io-datasets and these objects each have msl-io-metadata associated with them.
The tree structure is similar to the file-system structure used by operating systems. msl-io-groups are analogous to the directories (where ~msl.io.base.Root is the root msl-io-group) and msl-io-datasets are analogous to the files.
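The directory analogy can be sketched in plain Python. The following is only a minimal illustration and not how MSL-IO is implemented: a nested dict stands in for a msl-io-group, a list stands in for a msl-io-dataset, and walk is a hypothetical helper that builds file-system-like paths.

```python
# Plain-Python illustration of the hierarchy; this is NOT MSL-IO's implementation.
# A dict stands in for a Group and a list stands in for a Dataset.
root = {
    'dataset1': [1, 2, 3, 4],
    'my_group': {
        'dataset2': [[1, 2], [3, 4]],
    },
}


def walk(group, prefix=''):
    """Yield (path, kind) for every item below `group`, depth first."""
    for name, item in group.items():
        path = prefix + '/' + name
        if isinstance(item, dict):
            yield path, 'Group'
            # Descend into the sub-group, like entering a directory.
            for child in walk(item, prefix=path):
                yield child
        else:
            yield path, 'Dataset'


paths = list(walk(root))
for path, kind in paths:
    print('{} {}'.format(kind, path))
```

Each path is built the same way a file path is built from directory names, which is why every item in the tree has a unique name.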
The data files that can be read or created are not restricted to HDF5 files: any file format that has a Reader <io-readers> implemented can be read, and data files can be created using any of the Writers <io-writers>.
Suppose you want to create a new HDF5 file. We first create an instance of ~msl.io.writers.hdf5.HDF5Writer
>>> from msl.io import HDF5Writer
>>> h5 = HDF5Writer()
then we can add ~msl.io.metadata.Metadata to the ~msl.io.base.Root,
>>> h5.add_metadata(one=1, two=2)
create a ~msl.io.dataset.Dataset in the ~msl.io.base.Root,
>>> dataset1 = h5.create_dataset('dataset1', data=[1, 2, 3, 4])
create a ~msl.io.group.Group in the ~msl.io.base.Root,
>>> my_group = h5.create_group('my_group')
and create a ~msl.io.dataset.Dataset in my_group
>>> dataset2 = my_group.create_dataset('dataset2', data=[[1, 2], [3, 4]], three=3)
Finally, we write the file
>>> h5.write(file='my_file.h5')
Note
The file is not created until you call the ~msl.io.base.Writer.write or ~msl.io.base.Writer.save method.
The ~msl.io.read function is available to read a file. Provided that a Reader <io-readers> exists to read the file, a ~msl.io.base.Root object is returned. We will read the file that we created above.
>>> from msl.io import read
>>> root = read('my_file.h5')
You can print a representation of all ~msl.io.group.Group and ~msl.io.dataset.Dataset objects in the ~msl.io.base.Root by calling the ~msl.io.base.Root.tree method
>>> print(root.tree())
<HDF5Reader 'my_file.h5' (1 groups, 2 datasets, 2 metadata)>
  <Dataset '/dataset1' shape=(4,) dtype='<f8' (0 metadata)>
  <Group '/my_group' (0 groups, 1 datasets, 0 metadata)>
    <Dataset '/my_group/dataset2' shape=(2, 2) dtype='<f8' (1 metadata)>
Since the root object is a msl-io-group (which operates like a Python dict) you can iterate over the items that are in the file using
>>> for name, value in root.items():
...     print('{!r} -- {!r}'.format(name, value))
'/dataset1' -- <Dataset '/dataset1' shape=(4,) dtype='<f8' (0 metadata)>
'/my_group' -- <Group '/my_group' (0 groups, 1 datasets, 0 metadata)>
'/my_group/dataset2' -- <Dataset '/my_group/dataset2' shape=(2, 2) dtype='<f8' (1 metadata)>
where value will either be a ~msl.io.group.Group or a ~msl.io.dataset.Dataset.
You can iterate over the msl-io-groups that are in the file
>>> for group in root.groups():
...     print(group)
<Group '/my_group' (0 groups, 1 datasets, 0 metadata)>
or iterate over the msl-io-datasets
>>> for dataset in root.datasets():
...     print(repr(dataset))
<Dataset '/dataset1' shape=(4,) dtype='<f8' (0 metadata)>
<Dataset '/my_group/dataset2' shape=(2, 2) dtype='<f8' (1 metadata)>
You can access the msl-io-metadata of any object through the ~msl.io.vertex.Vertex.metadata attribute
>>> root.metadata
<Metadata '/' {'one': 1, 'two': 2}>
You can access values of the msl-io-metadata as attributes
>>> root.metadata.one
1
>>> dataset2.metadata.three
3
or as keys
>>> root.metadata['two']
2
>>> dataset2.metadata['three']
3
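The dual attribute and key access shown above is a common Python pattern. The following is a minimal sketch of that pattern only; AttrDict is a hypothetical class written for illustration and makes no claim about how ~msl.io.metadata.Metadata is implemented.

```python
class AttrDict(dict):
    """Hypothetical dict subclass whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .items() and .keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


meta = AttrDict(one=1, two=2)
print(meta.one)       # attribute access
print(meta['two'])    # key access

# A key that is not a valid Python identifier can only be reached as a key.
meta['my key'] = 3
print(meta['my key'])
```

In a design like this, key access is the more general of the two forms, since attribute access only works for names that are valid Python identifiers and that do not clash with an existing method.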
When root is returned it is accessed in read-only mode
>>> root.read_only
True
>>> for name, value in root.items():
...     print('is {!r} in read-only mode? {}'.format(name, value.read_only))
is '/dataset1' in read-only mode? True
is '/my_group' in read-only mode? True
is '/my_group/dataset2' in read-only mode? True
If you want to edit the ~msl.io.metadata.Metadata for root, or modify any ~msl.io.group.Group or ~msl.io.dataset.Dataset objects in root, then you must first set the object to be editable. Setting the read-only mode of root propagates that mode to all items within root. For example,
>>> root.read_only = False
makes root and all msl-io-groups and msl-io-datasets within root editable
>>> root.read_only
False
>>> for name, value in root.items():
...     print('is {!r} in read-only mode? {}'.format(name, value.read_only))
is '/dataset1' in read-only mode? False
is '/my_group' in read-only mode? False
is '/my_group/dataset2' in read-only mode? False
You can also change the mode of only a specific object (and its descendants). To put my_group and dataset2 back in read-only mode, use the following (recall that root behaves like a Python dict)
>>> root['my_group'].read_only = True
and this will keep root and dataset1 in editable mode, but change my_group and dataset2 to be in read-only mode
>>> root.read_only
False
>>> for name, value in root.items():
...     print('is {!r} in read-only mode? {}'.format(name, value.read_only))
is '/dataset1' in read-only mode? False
is '/my_group' in read-only mode? True
is '/my_group/dataset2' in read-only mode? True
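The propagation rule (a mode set on an object is pushed down to its descendants but never up to its parent) can be sketched with a small tree. This is a minimal illustration under that assumption; Node and its add method are hypothetical names and are not part of MSL-IO.

```python
class Node(object):
    """Hypothetical tree node with a read-only flag that propagates downwards."""

    def __init__(self, name, read_only=True):
        self.name = name
        self.children = []
        self._read_only = read_only

    def add(self, name):
        # A new child starts in the same mode as its parent.
        child = Node(name, read_only=self._read_only)
        self.children.append(child)
        return child

    @property
    def read_only(self):
        return self._read_only

    @read_only.setter
    def read_only(self, value):
        self._read_only = bool(value)
        # Push the new mode to every descendant, never to the parent.
        for child in self.children:
            child.read_only = value


root = Node('/')
dataset1 = root.add('dataset1')
my_group = root.add('my_group')
dataset2 = my_group.add('dataset2')

root.read_only = False      # everything becomes editable
my_group.read_only = True   # only my_group and dataset2 are locked again

for node in (root, dataset1, my_group, dataset2):
    print('{} read-only? {}'.format(node.name, node.read_only))
```

The order of the two assignments matters: setting the mode on the root afterwards would overwrite the mode of my_group and dataset2 again.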
You can access the msl-io-groups and msl-io-datasets as keys or as class attributes
>>> root['my_group']['dataset2'].shape
(2, 2)
>>> root.my_group.dataset2.shape
(2, 2)
See attribute-key-limitations for more information.
You can convert between file formats using any of the Writers <io-writers>. Suppose you had an HDF5 file and you wanted to convert it to the JSON format
>>> from msl.io import JSONWriter
>>> h5 = read('my_file.h5')
>>> writer = JSONWriter('my_file.json')
>>> writer.write(root=h5)
The ~msl.io.read_table function is available to read a table from a file.
A table has the following properties:
- The first row is a header.
- All rows have the same number of columns.
- All data values in a column have the same data type.
The returned object is a ~msl.io.dataset.Dataset with the header provided as metadata.
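The three table properties can be checked with the standard library alone. The sketch below is not how ~msl.io.read_table works internally; parse_table is a hypothetical helper, the text is made-up sample data, and for simplicity every column is assumed to hold floating-point values.

```python
import csv
import io

# Hypothetical file content that satisfies the three table properties.
text = 'x,y,z\n1,2,3\n4,5,6\n7,8,9\n'


def parse_table(stream):
    """Return (header, rows); raise ValueError if a row has the wrong width."""
    reader = csv.reader(stream)
    header = next(reader)                      # the first row is the header
    rows = []
    for line in reader:
        if len(line) != len(header):           # every row needs the same width
            raise ValueError(
                'expected {} columns, got {}'.format(len(header), len(line)))
        # Simplification: treat every column as float, so each column
        # trivially has a single data type.
        rows.append([float(value) for value in line])
    return header, rows


header, rows = parse_table(io.StringIO(text))
print(header)
print(rows)
```

Keeping the header separate from the numeric rows mirrors the description above, where the data forms the dataset and the header is stored alongside it as metadata.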
Suppose a file called my_table.csv contains the following information
x, y, z
1, 2, 3
4, 5, 6
7, 8, 9
You can read this file and interact with the data using the following
>>> from msl.io import read_table
>>> csv = read_table('my_table.csv')
>>> csv
<Dataset 'my_table.csv' shape=(3, 3) dtype='<f8' (1 metadata)>
>>> csv.metadata
<Metadata 'my_table.csv' {'header': array(['x', 'y', 'z'], dtype='<U1')}>
>>> csv.data
array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])
>>> csv.max()
9.0
You can read a table from a text-based file or from an Excel spreadsheet.