RangeSet: support ranges with zero padding of mixed lengths (#293) (#473)

This patch adds support for zero-padded ranges of mixed lengths by
always storing strings instead of integers in the inner set. There is no
longer any need to track a padding length per set, as this information is
self-contained in the strings.

Old behavior (single padding length supported):

    $ cluset -f bar002,bar01,bar0
    bar[000-002]

New behavior (mixed padding lengths supported):

    $ cluset -f bar002,bar01,bar0
    bar[0,01,002]
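
The ordering in the new output follows from the indexes being kept as strings. Below is a minimal sketch of one sort key that reproduces it, assuming non-negative indexes. The helper name `index_sort_key` is hypothetical and the code is not taken from the ClusterShell source.

```python
def index_sort_key(index):
    """Order index strings by length first, then by numeric value."""
    return (len(index), int(index))


# Indexes are kept as strings, so "0", "01" and "002" keep their own width.
indexes = {"002", "01", "0"}
ordered = sorted(indexes, key=index_sort_key)
print(",".join(ordered))  # 0,01,002
```

Converting the same indexes to integers would keep the values 0, 1 and 2 but lose each index's own width, which is why the old output shows a single padding length.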

RangeSet.padding is now available as a property. In case of zero padding
with mixed lengths, it returns the maximum padding length. It can also
still be used to force a fixed-length padding on the set.

Example:

    >>> r = RangeSet("0,01,002")
    >>> r
    0,01,002
    >>> r.padding
    3
    >>> r.padding = 4
    >>> r
    0000-0002
    >>> r.padding = None
    >>> r
    0-2

Older versions of RangeSet are automatically converted when unpickled.

Add a check when parsing bogus ranges like 01-010. This is stricter but
should help avoid mistakes and syntax errors.
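
The commit message does not spell out the rule behind this check. The sketch below assumes one plausible rule: when either bound is zero-padded, both bounds must have the same width. `is_padded` and `check_bounds` are hypothetical helpers, and the sketch raises a plain `ValueError` rather than the library's own exception.

```python
def is_padded(bound):
    """Assumption: a bound is zero-padded if it has a leading 0 and 2+ digits."""
    return len(bound) > 1 and bound.startswith("0")


def check_bounds(begin, end):
    """Reject a range whose zero-padded bounds differ in width (assumed rule)."""
    if (is_padded(begin) or is_padded(end)) and len(begin) != len(end):
        raise ValueError("padding length mismatch in range %s-%s" % (begin, end))
    return begin, end


check_bounds("08", "11")  # accepted: both bounds are two characters wide
try:
    check_bounds("01", "010")
except ValueError as exc:
    print(exc)  # padding length mismatch in range 01-010
```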

Closes #293 #442
thiell committed Aug 8, 2022
1 parent 314767d commit 5a41bc0
Showing 8 changed files with 674 additions and 405 deletions.
36 changes: 23 additions & 13 deletions doc/sphinx/guide/rangesets.rst
@@ -14,23 +14,30 @@ RangeSet class
--------------

The :class:`.RangeSet` class implements a mutable, ordered set of cluster node
indexes (one dimension) featuring a fast range-based API. This class is used
by the :class:`.NodeSet` class (see :ref:`class-NodeSet`). Since version 1.6,
:class:`.RangeSet` really derives from standard Python set class (`Python
sets`_), and thus provides methods like :meth:`.RangeSet.union`,
indexes (over a single dimension) featuring a fast range-based API. This class
is used by the :class:`.NodeSet` class (see :ref:`class-NodeSet`). Since
version 1.6, :class:`.RangeSet` actually derives from the standard Python set
class (`Python sets`_), and thus provides methods like :meth:`.RangeSet.union`,
:meth:`.RangeSet.intersection`, :meth:`.RangeSet.difference`,
:meth:`.RangeSet.symmetric_difference` and their in-place versions
:meth:`.RangeSet.update`, :meth:`.RangeSet.intersection_update`,
:meth:`.RangeSet.difference_update()` and
:meth:`.RangeSet.symmetric_difference_update`.

Since v1.6, padding of ranges (eg. *003-009*) can be managed through a public
:class:`.RangeSet` instance variable named padding. It may be changed at any
time. Padding is a simple display feature per RangeSet object, thus current
padding value is not taken into account when computing set operations. Also
since v1.6, :class:`.RangeSet` is itself an iterator over its items as
integers (instead of strings). To iterate over string items as before (with
optional padding), you can now use the :meth:`.RangeSet.striter()` method.
In v1.9, the implementation of zero-based padding of indexes (e.g. `001`) has
been improved. The inner set contains indexes as strings with the padding
included, which allows the use of mixed length zero-padded indexes (e.g. using
both `01` and `001` is valid and supported in the same object). Prior to v1.9,
zero-padding was a simple display feature of fixed length per
:class:`.RangeSet` object, and indexes were stored as integers in the inner
set.

To iterate over indexes as strings with zero-padding included, you can now
iterate over the :class:`.RangeSet` object (:meth:`.RangeSet.__iter__()`),
or still use the :meth:`.RangeSet.striter()` method which has not changed.
To iterate over the set's indexes as integers, you may use the new method
:meth:`.RangeSet.intiter()`, which is the equivalent of iterating over the
:class:`.RangeSet` object before v1.9.

.. _class-RangeSetND:

@@ -47,6 +54,8 @@ tuples, for instance::

>>> from ClusterShell.RangeSet import RangeSet, RangeSetND
>>> r1 = RangeSet("1-5/2")
>>> list(r1)
['1', '3', '5']
>>> r2 = RangeSet("10-12")
>>> r3 = RangeSet("0-4/2")
>>> r4 = RangeSet("10-12")
@@ -57,7 +66,8 @@ tuples, for instance::
0-5; 10-12

>>> print list(rnd)
[(0, 10), (0, 11), (0, 12), (1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (2, 12), (3, 10), (3, 11), (3, 12), (4, 10), (4, 11), (4, 12), (5, 10), (5, 11), (5, 12)]
[('0', '10'), ('0', '11'), ('0', '12'), ('1', '10'), ('1', '11'), ('1', '12'), ('2', '10'), ('2', '11'), ('2', '12'), ('3', '10'), ('3', '11'), ('3', '12'), ('4', '10'), ('4', '11'), ('4', '12'), ('5', '10'), ('5', '11'), ('5', '12')]

>>> r1 = RangeSetND([(0, 4), (0, 5), (1, 4), (1, 5)])
>>> len(r1)
4
@@ -70,7 +80,7 @@ tuples, for instance::
>>> str(r)
'1; 4-5\n'
>>> list(r)
[(1, 4), (1, 5)]
[('1', '4'), ('1', '5')]


.. _Python sets: http://docs.python.org/library/sets.html
35 changes: 19 additions & 16 deletions doc/sphinx/tools/nodeset.rst
@@ -268,7 +268,7 @@ are automatically padded with zeros as well. For example::

$ nodeset -e node[08-11]
node08 node09 node10 node11

$ nodeset -f node001 node002 node003 node005
node[001-003,005]

@@ -278,23 +278,20 @@ also supported, for example::
$ nodeset -e node[000-012/4]
node000 node004 node008 node012

Nevertheless, care should be taken when dealing with padding, as a zero-padded
node name has priority over a normal one, for example::
Since v1.9, mixed length padding is allowed, for example::

$ nodeset -f node1 node02
node[01-02]
$ nodeset -f node2 node01 node001
node[2,01,001]

To clarify, *nodeset* will always try to coalesce node names by their
numerical index first (without taking care of any zero-padding), and then will
use the first zero-padding rule encountered. In the following example, the
first zero-padding rule found is *node01*'s one::
When mixed length zero-padding is encountered, indexes with smaller padding
length are returned first, as you can see in the example above (``2`` comes
before ``01``).

$ nodeset -f node01 node002
node[01-02]
Since v1.9, when using node sets with multiple dimensions, each dimension (or
axis) may also use mixed length zero-padding::

That said, you can see it is not possible to mix *node01* and *node001* in the
same node set (not supported by the :class:`.NodeSet` class), but that would
be a tricky case anyway!
$ nodeset -f foo1bar1 foo1bar00 foo1bar01 foo004bar1 foo004bar00 foo004bar01
foo[1,004]bar[1,00-01]


Leading and trailing digits
@@ -325,7 +322,13 @@ Examples with both bracket leading and trailing digits::
$ nodeset --autostep=auto -f node-00[1-6]0
node-[0010-0060/10]

Still, using this syntax can be error-prone especially if used with node sets
Example with leading digit and mixed length zero padding (supported since
v1.9)::

$ nodeset -f node1[00-02,000-032/8]
node[100-102,1000,1008,1016,1024,1032]

Using this syntax can be error-prone especially if used with node sets
without 0-padding or with the */step* syntax and also requires additional
processing by the parser. In general, we recommend writing the whole rangeset
inside the brackets.
@@ -350,7 +353,7 @@ the union operation will be computed, for example::

$ nodeset -f node[1-3] node[4-7]
node[1-7]

$ nodeset -f node[1-3] node[2-7] node[5-8]
node[1-8]

47 changes: 23 additions & 24 deletions lib/ClusterShell/NodeSet.py
@@ -165,44 +165,38 @@ def set_autostep(self, val):

def _iter(self):
"""Iterator on internal item tuples
(pattern, indexes, padding, autostep)."""
(pattern, indexes, autostep)."""
for pat, rset in sorted(self._patterns.items()):
if rset:
autostep = rset.autostep
if rset.dim() == 1:
assert isinstance(rset, RangeSet)
padding = rset.padding
for idx in rset:
yield pat, (idx,), (padding,), autostep
yield pat, (idx,), autostep
else:
for args, padding in rset.iter_padding():
yield pat, args, padding, autostep
for rvec in rset:
yield pat, rvec, autostep
else:
yield pat, None, None, None
yield pat, None, None

def _iterbase(self):
"""Iterator on single, one-item NodeSetBase objects."""
for pat, ivec, pad, autostep in self._iter():
for pat, ivec, autostep in self._iter():
rset = None # 'no node index' by default
if ivec is not None:
assert len(ivec) > 0
if len(ivec) == 1:
rset = RangeSet.fromone(ivec[0], pad[0] or 0, autostep)
rset = RangeSet.fromone(ivec[0], autostep)
else:
rset = RangeSetND([ivec], pad, autostep)
rset = RangeSetND([ivec], autostep)
yield NodeSetBase(pat, rset)

def __iter__(self):
"""Iterator on single nodes as string."""
# Does not call self._iterbase() + str() for better performance.
for pat, ivec, pads, _ in self._iter():
for pat, ivec, _ in self._iter():
if ivec is not None:
# For performance reasons, add a special case for 1D RangeSet
if len(ivec) == 1:
yield pat % ("%0*d" % (pads[0] or 0, ivec[0]))
else:
yield pat % tuple(["%0*d" % (pad or 0, i) \
for pad, i in zip(pads, ivec)])
yield pat % ivec
else:
yield pat % ()

@@ -214,14 +208,13 @@ def __iter__(self):

def nsiter(self):
"""Object-based NodeSet iterator on single nodes."""
for pat, ivec, pads, autostep in self._iter():
for pat, ivec, autostep in self._iter():
nodeset = self.__class__()
if ivec is not None:
if len(ivec) == 1:
pad = pads[0] or 0
nodeset._add_new(pat, RangeSet.fromone(ivec[0], pad))
nodeset._add_new(pat, RangeSet.fromone(ivec[0]))
else:
nodeset._add_new(pat, RangeSetND([ivec], pads, autostep))
nodeset._add_new(pat, RangeSetND([ivec], autostep))
else:
nodeset._add_new(pat, None)
yield nodeset
@@ -1045,12 +1038,18 @@ def _scan_string(self, nsstr, autostep):
pfxlen, sfxlen = len(pfx), len(sfx)

if sfxlen > 0:
# amending trailing digits generates /steps
sfx, rng = self._amend_trailing_digits(sfx, rng)
try:
# amending trailing digits generates /steps
sfx, rng = self._amend_trailing_digits(sfx, rng)
except RangeSetParseError as ex:
raise NodeSetParseRangeError(ex)

if pfxlen > 0:
# this method supports /steps
pfx, rng = self._amend_leading_digits(pfx, rng)
try:
# this method supports /steps
pfx, rng = self._amend_leading_digits(pfx, rng)
except RangeSetParseError as ex:
raise NodeSetParseRangeError(ex)
if pfx:
# scan any nonempty pfx as a single node (no bracket)
pfx, pfxrvec = self._scan_string_single(pfx, autostep)