Documentation updates for recent API changes (#174)

Add more documentation on using the new TdmsFile.open method and prefer use of channel[:] to access data rather than channel.data.
adamreeve · Mar 31, 2020 · 06bec71 · 06bec71
1 parent a957c07
commit 06bec71
Show file tree

Hide file tree

Showing 7 changed files with 208 additions and 84 deletions.
diff --git a/README.rst b/README.rst
@@ -14,31 +14,47 @@ npTDMS
 
 npTDMS is a cross-platform Python package for reading and writing TDMS files as produced by LabVIEW,
 and is built on top of the `numpy <http://www.numpy.org/>`__ package.
-Data read from a TDMS file is stored in numpy arrays,
-and numpy arrays are also used when writing TDMS file.
+Data is read from TDMS files as numpy arrays,
+and npTDMS also allows writing numpy arrays to TDMS files.
+
+TDMS files are structured in a hierarchy of groups and channels.
+A TDMS file can contain multiple groups, which may each contain multiple channels.
+A file, group and channel may all have properties associated with them,
+but only channels have array data.
 
-TDMS files can contain multiple data channels organised intro groups.
 Typical usage when reading a TDMS file might look like::
 
     from nptdms import TdmsFile
 
     tdms_file = TdmsFile.read("path_to_file.tdms")
-    channel = tdms_file['Group']['Channel1']
-    data = channel.data
-    time = channel.time_track()
-    # do stuff with data
+    group = tdms_file['group name']
+    channel = group['channel name']
+    channel_data = channel[:]
+    channel_properties = channel.properties
+
+The ``TdmsFile.read`` method reads all data into memory immediately.
+When you are working with large TDMS files or don't need to read all channel data,
+you can instead use ``TdmsFile.open``. This is more memory efficient but
+accessing data can be slower::
+
+    with TdmsFile.open("path_to_file.tdms"):
+        group = tdms_file['group name']
+        channel = group['channel name']
+        channel_data = channel[:]
 
-And to write a file::
+npTDMS also has rudimentary support for writing TDMS files.
+Using npTDMS to write a TDMS file looks like::
 
     from nptdms import TdmsWriter, ChannelObject
     import numpy
 
     with TdmsWriter("path_to_file.tdms") as tdms_writer:
         data_array = numpy.linspace(0, 1, 10)
-        channel = ChannelObject('Group', 'Channel1', data_array)
+        channel = ChannelObject('group name', 'channel name', data_array)
         tdms_writer.write_segment([channel])
 
-For more information, see the `npTDMS documentation <http://nptdms.readthedocs.io>`__.
+For more detailed documentation on reading and writing TDMS files,
+see the `npTDMS documentation <http://nptdms.readthedocs.io>`__.
 
 Installation
 ------------
@@ -72,7 +88,7 @@ Limitations
 -----------
 
 This module doesn't support TDMS files with XML headers or with
-extended floating point data.
+extended precision floating point data.
 
 TDMS files support timestamps with a resolution of 2^-64 seconds but these
 are read as numpy datetime64 values with microsecond resolution.

diff --git a/docs/index.rst b/docs/index.rst
@@ -3,8 +3,8 @@ Welcome to npTDMS's documentation
 
 npTDMS is a cross-platform Python package for reading and writing TDMS files as produced by LabVIEW,
 and is built on top of the `numpy <http://www.numpy.org/>`__ package.
-Data read from a TDMS file is stored in numpy arrays,
-and numpy arrays are also used when writing TDMS file.
+Data is read from TDMS files as numpy arrays,
+and npTDMS also allows writing numpy arrays to TDMS files.
 
 Contents
 ========

diff --git a/docs/limitations.rst b/docs/limitations.rst
@@ -1,5 +1,5 @@
 Limitations
 ===========
 
-npTDMS currently doesn't support reading TDMS files with XML headers,
-or files with extended floating point data.
+npTDMS currently doesn't support reading TDMS files with XML headers (TDM files),
+or files with extended precision floating point data.
diff --git a/docs/quickstart.rst b/docs/quickstart.rst
@@ -28,14 +28,26 @@ Typical usage when reading a TDMS file might look like::
             # Access dictionary of properties:
             properties = channel.properties
             # Access numpy array of data for channel:
-            data = channel.data
-            # do stuff with data and properties...
+            data = channel[:]
+            # Access a subset of data
+            data_subset = channel[100:200]
 
 Or to access a channel by group name and channel name directly::
 
-    channel = tdms_file[group_name][channel_name]
+    group = tdms_file[group_name]
+    channel = group[channel_name]
 
-And to write a TDMS file::
+The ``TdmsFile.read`` method reads all data into memory immediately.
+When you are working with large TDMS files or don't need to read all channel data,
+you can instead use ``TdmsFile.open``. This is more memory efficient but
+accessing data can be slower::
+
+    with TdmsFile.open("path_to_file.tdms"):
+        channel = tdms_file[group_name][channel_name]
+        channel_data = channel[:]
+
+npTDMS also has rudimentary support for writing TDMS files.
+Using npTDMS to write a TDMS file looks like::
 
     from nptdms import TdmsWriter, ChannelObject
     import numpy

diff --git a/docs/reading.rst b/docs/reading.rst
@@ -2,30 +2,45 @@ Reading TDMS files
 ==================
 
 To read a TDMS file, create an instance of the :py:class:`~nptdms.TdmsFile`
-class using the static :py:meth:`~nptdms.TdmsFile.read` method, passing the path to the file, or an already opened file::
+class using one of the static :py:meth:`nptdms.TdmsFile.read` or :py:meth:`nptdms.TdmsFile.open` methods,
+passing the path to the file, or an already opened file.
+The :py:meth:`~nptdms.TdmsFile.read` method will read all channel data immediately::
 
     tdms_file = TdmsFile.read("my_file.tdms")
 
-This will read all of the contents of the TDMS file, then groups within the file
-can be accessed using the
-:py:meth:`~nptdms.TdmsFile.groups` method, or by indexing into the file with a group name::
+If using the :py:meth:`~nptdms.TdmsFile.open` method, only the file metadata will be read initially,
+and the returned :py:class:`~nptdms.TdmsFile` object should be used as a context manager to keep
+the file open and allow channel data to be read on demand::
+
+    with TdmsFile.open("my_file.tdms") as tdms_file:
+        # Use tdms_file
+        ...
+
+Using an instance of :py:class:`~nptdms.TdmsFile`, groups within the file
+can be accessed by indexing into the file with a group name, or all groups
+can be retrieved as a list with the :py:meth:`~nptdms.TdmsFile.groups` method::
 
-    all_groups = tdms_file.groups()
     group = tdms_file["group name"]
+    all_groups = tdms_file.groups()
 
 A group is an instance of the :py:class:`~nptdms.TdmsGroup` class,
-and can contain multiple channels of data. You can access channels in a group with the
-:py:meth:`~nptdms.TdmsGroup.channels` method or by indexing into the group with a channel name::
+and can contain multiple channels of data.
+You can access channels in a group by indexing into the group with a channel name
+or retrieve all channels as a list with the :py:meth:`~nptdms.TdmsGroup.channels` method::
 
-    all_group_channels = group.channels()
     channel = group["channel name"]
+    all_group_channels = group.channels()
 
 Channels are instances of the :py:class:`~nptdms.TdmsChannel` class
-and have a ``data`` attribute for accessing the channel data as a numpy array::
+and act like arrays. They can be indexed with an integer index to retrieve
+a single value or with a slice to retrieve all data or a subset of data
+as a numpy array::
 
-    data = channel.data
+    all_channel_data = channel[:]
+    data_subset = channel[100:200]
+    first_channel_value = channel[0]
 
-If the array is waveform data and has the ``wf_start_offset`` and ``wf_increment``
+If the channel contains waveform data and has the ``wf_start_offset`` and ``wf_increment``
 properties, you can get an array of relative time values for the data using the
 :py:meth:`~nptdms.TdmsChannel.time_track` method::
 
@@ -36,13 +51,13 @@ you can pass ``absolute_time=True`` to get an array of absolute times in UTC.
 
 A TDMS file, group and channel can all have properties associated with them, so each of the
 :py:class:`~nptdms.TdmsFile`, :py:class:`~nptdms.TdmsGroup` and :py:class:`~nptdms.TdmsChannel`
-classes provide access to these properties as a dictionary using their ``properties`` property::
+classes provide access to these properties as a dictionary using their ``properties`` attribute::
 
     # Iterate over all items in the file properties and print them
     for name, value in tdms_file.properties.items():
         print("{0}: {1}".format(name, value))
 
-    # Get a single property value
+    # Get a single property value from the file
     property_value = tdms_file.property("my_property_name")
 
     # Get a group property
@@ -55,34 +70,21 @@ Reading large files
 -------------------
 
 TDMS files are often too large to easily fit in memory so npTDMS offers a few ways to deal with this.
-
-If you want to work with all file data as if it was in memory,
-you can pass the ``memmap_dir`` argument when reading a file.
-This will read data into memory mapped numpy arrays on disk,
-and your operating system will then page data in and out of memory as required.
-
-If you have a large file with multiple channels but only need to read them individually,
-you can open a TDMS file for reading without reading all the data immediately
+A TDMS file can be opened for reading without reading all the data immediately
 using the static :py:meth:`~nptdms.TdmsFile.open` method,
-then read channel data as required::
+then channel data is read as required::
 
     with TdmsFile.open(tdms_file_path) as tdms_file:
         channel = tdms_file[group_name][channel_name]
-        channel_data = channel.read_data()
-
-You also have the option to read only a subset of the data.
-For example, to read 200 data points, beginning at offset 1,000::
-
-    with TdmsFile.open(tdms_file_path) as tdms_file:
-        channel = tdms_file[group_name][channel_name]
-        offset = 1000
-        length = 200
-        channel_data = channel.read_data(offset, length)
-
-Alternatively, you may have an application where you wish to stream all data chunk by chunk.
-:py:meth:`~nptdms.TdmsFile.data_chunks` is a generator that produces instances of
-:py:class:`~nptdms.DataChunk`, which can be used after opening a TDMS file with
-:py:meth:`~nptdms.TdmsFile.open`.
+        all_channel_data = channel[:]
+        data_subset = channel[100:200]
+
+TDMS files are written in multiple segments, where each segment can in turn have
+multiple chunks of data.
+When accessing a value or a slice of data in a channel, npTDMS will read whole chunks at a time.
+npTDMS also allows streaming data from a file chunk by chunk using
+:py:meth:`nptdms.TdmsFile.data_chunks`. This is a generator that produces instances of
+:py:class:`~nptdms.DataChunk`.
 For example, to compute the mean of a channel::
 
     channel_sum = 0.0
@@ -94,19 +96,27 @@ For example, to compute the mean of a channel::
             channel_sum += channel_chunk[:].sum()
     channel_mean = channel_sum / channel_length
 
-This approach can also be useful to stream TDMS data to another format on disk or into a data store.
-It's also possible to stream data chunks for a single channel using :py:meth:`~nptdms.TdmsChannel.data_chunks`::
+This approach can be useful to stream TDMS data to another format on disk or into a data store.
+It's also possible to stream data chunks for a single channel using :py:meth:`nptdms.TdmsChannel.data_chunks`::
 
     with TdmsFile.open(tdms_file_path) as tdms_file:
         channel = tdms_file[group_name][channel_name]
         for chunk in channel.data_chunks():
             channel_chunk_data = chunk[:]
 
-In cases where you don't need to read the file data and only need to read metadata, you can
+If you don't need to read the channel data at all and only need to read metadata, you can
 also use the static :py:meth:`~nptdms.TdmsFile.read_metadata` method::
 
     tdms_file = TdmsFile.read_metadata(tdms_file_path)
 
+In cases where you need to work with large arrays of channel data as if all data was in memory,
+you can also pass the ``memmap_dir`` argument when reading a file.
+This will read data into memory mapped numpy arrays on disk,
+and your operating system will then page data in and out of memory as required::
+
+    with tempfile.TemporaryDirectory() as temp_memmap_dir:
+        tdms_file = TdmsFile.read(tdms_file_path, memmap_dir=temp_memmap_dir)
+
 Timestamps
 ----------
 
@@ -115,7 +125,7 @@ Note that TDMS files are capable of storing times with a precision of 2 :sup:`-6
 so some precision is lost when reading them in npTDMS.
 
 Timestamps in TDMS files are stored in UTC time and npTDMS does not do any timezone conversions.
-If you would like to convert a time from a TDMS file to your local timezone,
+If timestamps need to be converted to the local timezone,
 the arrow package is recommended. For example::
 
     import datetime
@@ -132,12 +142,22 @@ Scaled data
 -----------
 
 The TDMS format supports different ways of scaling data, and DAQmx raw data in particular is usually scaled.
-The :py:attr:`~nptdms.TdmsChannel.data` property of the channel returns this scaled data.
-You can additionally use the :py:attr:`~nptdms.TdmsChannel.raw_data` property to access the unscaled data.
+The data retrieved from a :py:attr:`~nptdms.TdmsChannel` has scaling applied.
+If you have opened a TDMS file with :py:meth:`~nptdms.TdmsFile.read`,
+you can access the raw unscaled data with the :py:attr:`~nptdms.TdmsChannel.raw_data` property of a channel.
 Note that DAQmx channels may have multiple raw scalers rather than a single raw data channel,
 in which case you need to use the :py:attr:`~nptdms.TdmsChannel.raw_scaler_data`
 property to access the raw data as a dictionary of scaler id to raw data array.
 
+When you've opened a TDMS file with :py:meth:`~nptdms.TdmsFile.open`, you instead need to use
+:py:attr:`~nptdms.TdmsChannel.read_data`, passing ``scaled=False``::
+
+    with TdmsFile.open(tdms_file_path) as tdms_file:
+        channel = tdms_file[group_name][channel_name]
+        unscaled_data = channel.read_data(scaled=False)
+
+This will return an array of raw data, or a dictionary of scaler id to raw scaler data for DAQmx data.
+
 Conversion to other formats
 ---------------------------
 

diff --git a/docs/writing.rst b/docs/writing.rst
@@ -31,15 +31,15 @@ instance of one of:
 - :py:class:`nptdms.TdmsGroup` or :py:class:`nptdms.TdmsChannel`.
   A TDMS object that was read from a TDMS file using :py:class:`nptdms.TdmsFile`.
 
-Each of ``RootObject``, ``GroupObject`` and ``ChannelObject``
+Each of :py:class:`~nptdms.RootObject`, :py:class:`~nptdms.GroupObject` and :py:class:`~nptdms.ChannelObject`
 may optionally have properties associated with them, which
 are passed into the ``__init__`` method as a dictionary.
 The data types supported as property values are:
 
 - Integers
 - Floating point values
 - Strings
-- datetime objects
+- datetime or numpy datetime64 objects
 - Boolean values
 
 For more control over the data type used to represent a property value, for example
@@ -63,10 +63,15 @@ is given below::
     channel_object = ChannelObject("group_1", "channel_1", data, properties={})
 
     with TdmsWriter("my_file.tdms") as tdms_writer:
+        # Write first segment
         tdms_writer.write_segment([
             root_object,
             group_object,
             channel_object])
+        # Write another segment with more data for the same channel
+        more_data = np.array([6.0, 7.0, 8.0, 9.0, 10.0])
+        channel_object = ChannelObject("group_1", "channel_1", more_data, properties={})
+        tdms_writer.write_segment([channel_object])
 
 You could also read a TDMS file and then re-write it by passing
 :py:class:`~nptdms.TdmsGroup` and :py:class:`~nptdms.TdmsChannel`