Update documentations for data module (#117)
Update docs & docstrings for RecordData
huzecong committed Jul 23, 2019
1 parent bf2d655 commit ddfd225
Showing 5 changed files with 237 additions and 171 deletions.
82 changes: 59 additions & 23 deletions docs/code/data.rst
@@ -9,13 +9,11 @@ Vocabulary

:hidden:`SpecialTokens`
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.SpecialTokens
:members:

:hidden:`Vocab`
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.Vocab
:members:

@@ -31,7 +29,6 @@ Embedding

:hidden:`Embedding`
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.Embedding
:members:

@@ -41,79 +38,118 @@ Embedding

:hidden:`load_glove`
~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: texar.data.load_glove

Data
==========

Data Sources
==============

:hidden:`DataSource`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.data.DataSource
:members:

:hidden:`SequenceDataSource`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.data.SequenceDataSource
:members:

:hidden:`IterDataSource`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.data.IterDataSource
:members:

:hidden:`ZipDataSource`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.data.ZipDataSource
:members:

:hidden:`FilterDataSource`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.data.FilterDataSource
:members:

:hidden:`RecordDataSource`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.data.RecordDataSource
:members:

:hidden:`TextLineDataSource`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.data.TextLineDataSource
:members:

:hidden:`PickleDataSource`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.data.PickleDataSource
:members:



Data Loaders
=============

:hidden:`DataBase`
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.DataBase
:members:

.. automethod:: process

.. automethod:: collate

:hidden:`MonoTextData`
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.MonoTextData
:members:
:inherited-members:
- :exclude-members: make_vocab,make_embedding
+ :exclude-members: make_vocab,make_embedding,process,collate

:hidden:`PairedTextData`
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.PairedTextData
:members:
:inherited-members:
- :exclude-members: make_vocab,make_embedding
+ :exclude-members: make_vocab,make_embedding,process,collate

:hidden:`ScalarData`
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.ScalarData
:members:
:inherited-members:
:exclude-members: process,collate

:hidden:`MultiAlignedData`
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.MultiAlignedData
:members:
:inherited-members:
:exclude-members: make_vocab,make_embedding,process,collate,to

:hidden:`RecordData`
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: texar.data.RecordData
:members:
:exclude-members: process,collate

Data Iterators
===============

:hidden:`DataIterator`
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.DataIterator
:members:

:hidden:`TrainTestDataIterator`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.TrainTestDataIterator
:members:

:hidden:`BatchingStrategy`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.BatchingStrategy
:members:

:hidden:`TokenCountBatchingStrategy`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: texar.data.TokenCountBatchingStrategy
:members:

:exclude-members: reset_batch,add_example


Data Utilities
49 changes: 43 additions & 6 deletions texar/data/data/data_base.py
@@ -46,7 +46,7 @@
class DataSource(Generic[RawExample], ABC):
r"""Base class for all datasets. Unlike PyTorch
:class:`~torch.utils.data.Dataset`, subclasses of this class are not
- required to implement `__getitem__` (default implementation raises
+ required to implement :meth:`__getitem__` (default implementation raises
`TypeError`), which is beneficial for certain sources that only support
iteration (reading from text files, reading Python iterators, etc.)
"""
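The contract described in this docstring can be illustrated with a plain-Python stand-in. `LineSource` below is hypothetical and is not part of Texar; it only mimics the documented behaviour, where iteration works and the default `__getitem__` raises `TypeError`.

```python
from typing import Iterator


class LineSource:
    """Hypothetical iteration-only source (an assumption, not Texar code)."""

    def __init__(self, lines):
        self._lines = list(lines)

    def __iter__(self) -> Iterator[str]:
        return iter(self._lines)

    def __getitem__(self, index: int) -> str:
        # Mirrors the documented default: indexing is not supported.
        raise TypeError("This data source does not support indexing")


source = LineSource(["first line", "second line"])
collected = list(source)  # iteration works

try:
    source[0]
    indexing_supported = True
except TypeError:
    indexing_supported = False
```

Callers that need random access can probe for it this way and fall back to sequential reads when `TypeError` is raised.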
@@ -63,6 +63,12 @@ def __len__(self) -> int:

class SequenceDataSource(DataSource[RawExample]):
r"""Data source for reading from Python sequences.
+ This data source supports indexing.
+ Args:
+ sequence: The Python sequence to read from. Note that a sequence must
+ be iterable and support `len`.
"""

def __init__(self, sequence: Sequence[RawExample]):
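One way to check whether an object qualifies as a sequence in the sense above is the standard `collections.abc.Sequence` ABC. This is a plain-Python sketch; it is an assumption that such a check is useful to callers, and it is not confirmed to be what the data source does internally.

```python
from collections.abc import Sequence

# Candidate inputs; only the first three support both len() and indexing.
candidates = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "range": range(3),
    "generator": (x for x in range(3)),
}

is_sequence = {name: isinstance(obj, Sequence) for name, obj in candidates.items()}
```

Objects that fail this check, such as generators, are a better fit for an iteration-only data source.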
@@ -82,6 +88,11 @@ class IterDataSource(DataSource[RawExample]):
r"""Data source for reading from Python iterables. Please note: if passed
an *iterator* and caching strategy is set to 'none', then the data source
can only be iterated over once.
+ This data source does not support indexing.
+ Args:
+ iterable: The Python iterable to read from.
"""

def __init__(self, iterable: Iterable[RawExample]):
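The warning about iterators reflects ordinary Python semantics, which the following sketch demonstrates without using Texar at all. `read_once` is a hypothetical helper standing in for a single pass over a data source with no caching.

```python
def read_once(iterable):
    """Hypothetical helper: consume the input fully, as one pass would."""
    return [item for item in iterable]


numbers = [1, 2, 3]

# A list is an iterable: every pass starts from the beginning.
list_first = read_once(numbers)
list_second = read_once(numbers)

# An iterator is exhausted after the first pass.
iterator = iter(numbers)
iter_first = read_once(iterator)
iter_second = read_once(iterator)
```

Passing a re-iterable object (a list, or an object whose `__iter__` creates a fresh iterator) avoids the single-pass limitation.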
@@ -92,7 +103,15 @@ def __iter__(self) -> Iterator[RawExample]:


class ZipDataSource(DataSource[Tuple[RawExample, ...]]):
- r"""Data source by combining multiple sources.
+ r"""Data source that combines multiple sources. The raw examples returned
+ from this data source are tuples, with elements being raw examples from each
+ of the constituent data sources.
+ This data source supports indexing if all the constituent data sources
+ support indexing.
+ Args:
+ sources: The list of data sources to combine.
"""

def __init__(self, *sources: DataSource[RawExample]):
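The tuple structure and the indexing condition described above can be sketched in plain Python. `ZipSketch` is a hypothetical stand-in written for illustration; it is not the implementation in `data_base.py`.

```python
class ZipSketch:
    """Hypothetical stand-in for a zipped data source (not Texar's code)."""

    def __init__(self, *sources):
        self._sources = sources

    def __iter__(self):
        # Each raw example is a tuple with one element per constituent source.
        return zip(*self._sources)

    def __getitem__(self, index):
        # Works only if every constituent source supports indexing.
        return tuple(source[index] for source in self._sources)


tokens = ["a b", "c d"]
labels = [0, 1]

zipped = ZipSketch(tokens, labels)
examples = list(zipped)
second = zipped[1]

# A generator supports iteration but not indexing, so indexing the
# combined source fails once one constituent cannot be indexed.
partly_indexable = ZipSketch(tokens, (x for x in labels))
try:
    partly_indexable[0]
    indexable = True
except TypeError:
    indexable = False
```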
@@ -109,9 +128,18 @@ def __len__(self) -> int:


class FilterDataSource(DataSource[RawExample]):
- r"""Data source for filtering raw example with user-specified filter
- function. Only those examples for which the filter functions returns
- `True` are returned.
+ r"""Data source for filtering raw examples with a user-specified filter
+ function. Only examples for which the filter function returns `True` are
+ returned.
+ This data source supports indexing if the wrapped data source supports
+ indexing.
+ Args:
+ source: The data source to filter.
+ filter_fn: A callable taking a raw example as argument and returning a
+ boolean value, indicating whether the raw example should be
+ **kept**.
"""

def __init__(self, source: DataSource[RawExample],
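The "keep when `True`" convention for `filter_fn` is easy to get backwards, so here is a minimal sketch of it. `filter_examples` and `at_most_three_tokens` are hypothetical names used only for illustration, and the generator below is an assumption about the semantics rather than the actual implementation.

```python
def filter_examples(source, filter_fn):
    """Hypothetical helper: yield only examples for which filter_fn is True."""
    for example in source:
        if filter_fn(example):
            yield example


def at_most_three_tokens(sentence: str) -> bool:
    # Returning True means "keep this example".
    return len(sentence.split()) <= 3


sentences = ["a short one", "this sentence is considerably longer", "ok"]
kept = list(filter_examples(sentences, at_most_three_tokens))
```

A predicate written to mark examples for removal would need to be negated before being passed in.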
@@ -126,7 +154,16 @@ def __iter__(self) -> Iterator[RawExample]:


class RecordDataSource(DataSource[Dict[str, RawExample]]):
- r"""Data source by structuring multiple source.
+ r"""Data source that structures multiple sources. The raw examples returned
+ from this data source are dictionaries, with values being raw examples from
+ each of the constituent data sources.
+ This data source supports indexing if all the constituent data sources
+ support indexing.
+ Args:
+ sources: A dictionary mapping names to data sources, containing the
+ data sources to combine.
"""

def __init__(self, sources: Dict[str, DataSource[RawExample]]):
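The dictionary-shaped examples described above can be sketched as follows. `iter_records` is a hypothetical helper, and plain lists stand in for the constituent data sources; the real class may differ in details such as length handling.

```python
def iter_records(sources):
    """Hypothetical helper: combine named sources into dict-shaped examples."""
    names = list(sources)
    # Advance all constituent sources in lockstep, one example at a time.
    for values in zip(*(sources[name] for name in names)):
        yield dict(zip(names, values))


records = list(iter_records({
    "text": ["a b", "c d"],
    "label": [0, 1],
}))
```

Compared with the tuple-based combination, named fields let downstream code refer to `example["text"]` instead of a positional index.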
4 changes: 2 additions & 2 deletions texar/data/data/multi_aligned_data.py
@@ -103,8 +103,8 @@ class MultiAlignedData(
the "datasets" list of :attr:`hparams`, and result in a Dataset whose
element is a python `dict` containing data fields from each of the
specified datasets. Fields from a text dataset or Record dataset have
- names prefixed by its "data_name". Fields from a scalar dataset are
- specified by its "data_name".
+ names prefixed by its :attr:`"data_name"`. Fields from a scalar dataset are
+ specified by its :attr:`"data_name"`.
Example: