MSL-IO

MSL-IO follows the data model used by HDF5 to read and write data files: a Root contains Groups and Datasets, and each of these objects has Metadata associated with it.


The tree structure is similar to the file-system structure used by operating systems: Groups are analogous to directories (where Root is the root Group) and Datasets are analogous to files.

Data files are not restricted to HDF5: any file format that has a Reader implemented can be read, and data files can be created using any of the Writers.

Getting Started

  • Write a file
  • Read a file
  • Convert a file
  • Read data in a table

Write a file

Suppose you want to create a new HDF5 file. We first create an instance of HDF5Writer

>>> from msl.io import HDF5Writer
>>> h5 = HDF5Writer()

then we can add Metadata to the Root,

>>> h5.add_metadata(one=1, two=2)

create a Dataset in the Root,

>>> dataset1 = h5.create_dataset('dataset1', data=[1, 2, 3, 4])

create a Group in the Root,

>>> my_group = h5.create_group('my_group')

and create a Dataset in my_group (the keyword argument three=3 is stored as Metadata of the Dataset)

>>> dataset2 = my_group.create_dataset('dataset2', data=[[1, 2], [3, 4]], three=3)

Finally, we write the file

>>> h5.write(file='my_file.h5')

Note

The file is not created until you call the write() or save() method.
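The Note implies that a writer keeps everything in memory until the file is explicitly written. The sketch below illustrates that deferred-write pattern with a hypothetical BufferedWriter class and the standard json module. It is not how HDF5Writer is implemented, and the JSON layout is invented purely for the example.

```python
import json
import os
import tempfile


class BufferedWriter:
    """Hypothetical writer: collects metadata and datasets in memory and
    only touches the file system when write() is called."""

    def __init__(self):
        self.metadata = {}
        self.datasets = {}

    def add_metadata(self, **kwargs):
        self.metadata.update(kwargs)

    def create_dataset(self, name, data):
        self.datasets[name] = data

    def write(self, file):
        # The file is created here and nowhere earlier.
        with open(file, 'w') as fp:
            json.dump({'metadata': self.metadata, 'datasets': self.datasets}, fp)


path = os.path.join(tempfile.mkdtemp(), 'my_file.json')
writer = BufferedWriter()
writer.add_metadata(one=1, two=2)
writer.create_dataset('dataset1', [1, 2, 3, 4])
exists_before = os.path.exists(path)  # nothing has been written yet
writer.write(path)
exists_after = os.path.exists(path)   # the file now exists
```

Buffering in memory lets you add, rename or restructure objects freely before committing anything to disk.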

Read a file

The read() function is available to read a file. Provided that a Reader exists for the file format, a Root object is returned. We will read the file that we created above.

>>> from msl.io import read
>>> root = read('my_file.h5')

You can print a representation of all Groups and Datasets in the Root by calling the tree() method

>>> print(root.tree())
<HDF5Reader 'my_file.h5' (1 groups, 2 datasets, 2 metadata)>
  <Dataset '/dataset1' shape=(4,) dtype='<f8' (0 metadata)>
  <Group '/my_group' (0 groups, 1 datasets, 0 metadata)>
    <Dataset '/my_group/dataset2' shape=(2, 2) dtype='<f8' (1 metadata)>
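In the tree() output, each line is indented by two spaces per level of nesting below the root. A minimal sketch of that layout, assuming only the full path names are available (tree_lines is a hypothetical helper, not part of msl-io):

```python
def tree_lines(paths):
    """Return one indented line per path, two spaces per level below the root.

    `paths` is an iterable of absolute, '/'-separated names such as
    '/my_group/dataset2'.
    """
    lines = []
    for path in paths:
        # '/dataset1' is one level deep, '/my_group/dataset2' is two levels deep
        depth = path.strip('/').count('/') + 1
        lines.append('  ' * depth + path)
    return lines


for line in tree_lines(['/dataset1', '/my_group', '/my_group/dataset2']):
    print(line)
```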

Since the root object is a Group (which behaves like a Python dict), you can iterate over the items that are in the file using

>>> for name, value in root.items():
...     print('{!r} -- {!r}'.format(name, value))
'/dataset1' -- <Dataset '/dataset1' shape=(4,) dtype='<f8' (0 metadata)>
'/my_group' -- <Group '/my_group' (0 groups, 1 datasets, 0 metadata)>
'/my_group/dataset2' -- <Dataset '/my_group/dataset2' shape=(2, 2) dtype='<f8' (1 metadata)>

where value is either a Group or a Dataset.

You can iterate over the Groups that are in the file

>>> for group in root.groups():
...     print(group)
<Group '/my_group' (0 groups, 1 datasets, 0 metadata)>

or iterate over the Datasets

>>> for dataset in root.datasets():
...     print(repr(dataset))
<Dataset '/dataset1' shape=(4,) dtype='<f8' (0 metadata)>
<Dataset '/my_group/dataset2' shape=(2, 2) dtype='<f8' (1 metadata)>
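The iteration order shown in these examples is consistent with a depth-first traversal, where each item is followed by its descendants. A minimal sketch of how items(), groups() and datasets() can all be derived from one recursive traversal, using hypothetical Group and Dataset classes (msl-io's own classes also carry metadata and a read-only flag, which are omitted here):

```python
class Dataset:
    """Hypothetical leaf node that holds data."""

    def __init__(self, name, data):
        self.name = name
        self.data = data


class Group:
    """Hypothetical container node; children are kept in insertion order."""

    def __init__(self, name='/'):
        self.name = name
        self._children = {}

    def _path(self, name):
        # Join names the way a file system joins directories and files.
        return self.name.rstrip('/') + '/' + name

    def create_group(self, name):
        group = Group(self._path(name))
        self._children[name] = group
        return group

    def create_dataset(self, name, data):
        dataset = Dataset(self._path(name), data)
        self._children[name] = dataset
        return dataset

    def items(self):
        # Depth-first: yield each child, then (for a Group) all of its descendants.
        for child in self._children.values():
            yield child.name, child
            if isinstance(child, Group):
                yield from child.items()

    def groups(self):
        return (obj for _, obj in self.items() if isinstance(obj, Group))

    def datasets(self):
        return (obj for _, obj in self.items() if isinstance(obj, Dataset))


root = Group()
root.create_dataset('dataset1', [1, 2, 3, 4])
my_group = root.create_group('my_group')
my_group.create_dataset('dataset2', [[1, 2], [3, 4]])

for name, value in root.items():
    print(name, type(value).__name__)
```

Filtering a single traversal keeps groups() and datasets() consistent with items() by construction.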

You can access the Metadata of any object through its metadata attribute

>>> root.metadata
<Metadata '/' {'one': 1, 'two': 2}>

You can access values of the Metadata as attributes

>>> root.metadata.one
1
>>> dataset2.metadata.three
3

or as keys

>>> root.metadata['two']
2
>>> dataset2.metadata['three']
3

The returned root, and every object within it, is in read-only mode

>>> root.read_only
True
>>> for name, value in root.items():
...     print('is {!r} in read-only mode? {}'.format(name, value.read_only))
is '/dataset1' in read-only mode? True
is '/my_group' in read-only mode? True
is '/my_group/dataset2' in read-only mode? True

If you want to edit the Metadata of root, or modify any Groups or Datasets in root, you must first make the object editable. Setting the read-only mode of root propagates that mode to all items within root. For example,

>>> root.read_only = False

will make root, and all Groups and Datasets within root, editable

>>> root.read_only
False
>>> for name, value in root.items():
...     print('is {!r} in read-only mode? {}'.format(name, value.read_only))
is '/dataset1' in read-only mode? False
is '/my_group' in read-only mode? False
is '/my_group/dataset2' in read-only mode? False

You can also change the mode of only a specific object (and its descendants). For example, you can put my_group and dataset2 back into read-only mode by the following (recall that root behaves like a Python dict)

>>> root['my_group'].read_only = True

This keeps root and dataset1 editable, but puts my_group and dataset2 in read-only mode

>>> root.read_only
False
>>> for name, value in root.items():
...     print('is {!r} in read-only mode? {}'.format(name, value.read_only))
is '/dataset1' in read-only mode? False
is '/my_group' in read-only mode? True
is '/my_group/dataset2' in read-only mode? True
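The propagation rule described in this section (a mode set on an object applies to that object and all of its descendants, but never to its parent or siblings) can be sketched with a recursive property setter. Node is a hypothetical class used only for illustration; it is not msl-io's implementation.

```python
class Node:
    """Hypothetical tree node whose read-only flag propagates downwards."""

    def __init__(self, name, read_only=True):
        self.name = name
        self.children = []
        self._read_only = read_only

    def add(self, child):
        self.children.append(child)
        return child

    @property
    def read_only(self):
        return self._read_only

    @read_only.setter
    def read_only(self, value):
        # Apply the mode to this node, then recurse into every descendant.
        self._read_only = bool(value)
        for child in self.children:
            child.read_only = value


root = Node('/')
dataset1 = root.add(Node('/dataset1'))
my_group = root.add(Node('/my_group'))
dataset2 = my_group.add(Node('/my_group/dataset2'))

root.read_only = False     # everything becomes editable
my_group.read_only = True  # only my_group and dataset2 become read-only
print([(n.name, n.read_only) for n in (root, dataset1, my_group, dataset2)])
```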

You can access the Groups and Datasets as keys or as attributes

>>> root['my_group']['dataset2'].shape
(2, 2)
>>> root.my_group.dataset2.shape
(2, 2)

See the section on attribute-key limitations for more information.
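Limitations of this kind usually come from how attribute access is layered on top of key access in Python. The sketch below assumes a dict subclass that forwards unknown attribute names to its keys (AttrDict is a hypothetical name, and msl-io's actual rules may differ). It shows two general cases where only key access works: a key that collides with an existing method, and a key that is not a valid Python identifier.

```python
class AttrDict(dict):
    """Hypothetical dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails,
        # so real attributes and methods (such as items) always take priority.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


group = AttrDict({'dataset2': 'data', 'items': 'shadowed', 'my data': 'spaces'})
print(group.dataset2)         # an ordinary key: attribute access works
print(group['items'])         # must use the key ...
print(callable(group.items))  # ... because group.items is the dict method
print(group['my data'])       # not a valid identifier, so key access only
```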

Convert a file

You can convert between file formats using any of the Writers. Suppose you have an HDF5 file and want to convert it to the JSON format

>>> from msl.io import JSONWriter
>>> h5 = read('my_file.h5')
>>> writer = JSONWriter('my_file.json')
>>> writer.write(root=h5)

Read data in a table

The read_table() function is available to read a table from a file.

A table has the following properties:

  1. The first row is a header.
  2. All rows have the same number of columns.
  3. All data values in a column have the same data type.

The returned object is a Dataset with the header provided as metadata.

Suppose a file called my_table.csv contains the following information

x, y, z
1, 2, 3
4, 5, 6
7, 8, 9

You can read this file and interact with the data using the following

>>> from msl.io import read_table
>>> csv = read_table('my_table.csv')
>>> csv
<Dataset 'my_table.csv' shape=(3, 3) dtype='<f8' (1 metadata)>
>>> csv.metadata
<Metadata 'my_table.csv' {'header': array(['x', 'y', 'z'], dtype='<U1')}>
>>> csv.data
array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])
>>> csv.max()
9.0

You can read a table from a text-based file or from an Excel spreadsheet.
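For a simple comma-separated file, the three table properties listed above can be checked with Python's standard csv module. This is a minimal sketch under stated assumptions (comma delimiter, every data value convertible to float); read_simple_table is a hypothetical helper, whereas read_table itself returns a Dataset and supports more formats.

```python
import csv
import io

TEXT = """x, y, z
1, 2, 3
4, 5, 6
7, 8, 9
"""


def read_simple_table(fp):
    """Return (header, rows) from a comma-separated table."""
    reader = csv.reader(fp, skipinitialspace=True)
    header = next(reader)  # property 1: the first row is the header
    rows = []
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(header):  # property 2: same number of columns
            raise ValueError(f'line {line_number}: expected {len(header)} columns, got {len(row)}')
        rows.append([float(value) for value in row])  # property 3: one data type
    return header, rows


header, rows = read_simple_table(io.StringIO(TEXT))
print(header)
print(rows)
```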

Contents

  • Install
  • Group
  • Dataset
  • Metadata
  • readers
  • writers
  • API
  • attribute_access
  • License
  • Authors
  • Release Notes

Index

  • Module Index