
strange behaviour when factorizing (new '0' issue with nodeset) #293

Closed
nicolasdeferrieres opened this issue Mar 3, 2016 · 10 comments · Fixed by #473

Comments

@nicolasdeferrieres

Hi,

Recently I had an issue with factorization (#288, which was a duplicate of #286) that I fixed by installing the latest version (from the master branch). That fixed my issue.

I now have a new one (after eating my '0', it is now adding an extra '0'):

$ cat liste_failed_for_clush
firstgroupa001
firstgroupa002
secondgroup001
secondgroup002
secondgroup003
secondgroup005
secondgroup006
secondgroup008
namewithissue007
namewithissue008
namewithissue009
namewithissue01
namewithissue02
namewithissue03
namewithissue04
namewithissue10
$ paste -sd, liste_failed_for_clush | nodeset -f
firstgroupa[001-002],namewithissue[001-004,007-010],secondgroup[001-003,005-006,008]

It is adding a 0 to the last five servers!
namewithissue[001-004,007-010] should be namewithissue[01-04,10],namewithissue[007-009]

If the servers are not in the same order (starting with the 2-digit server names instead of the 3-digit ones):

$ cat liste_failed_for_clush2
namewithissue01
namewithissue02
namewithissue03
namewithissue04
namewithissue10
firstgroupa001
firstgroupa002
secondgroup001
secondgroup002
secondgroup003
secondgroup005
secondgroup006
secondgroup008
namewithissue007
namewithissue008
namewithissue009
$ paste -sd, liste_failed_for_clush2 | nodeset -f
firstgroupa[001-002],namewithissue[01-04,07-10],secondgroup[001-003,005-006,008]

This time, it's eating a '0'.

As I saw that the latest version had been released (which should be the same as the one I installed 10 days before), I installed it instead, and the behaviour is the same:

$ rpm -qa | grep clustershell
clustershell-1.7.1-1.el6.noarch
$ /usr/bin/nodeset --version
nodeset 1.7.1

I even tried rolling back to my initial version (1.7-1), and it is the same.

Is this a new bug?

Thanks

@thiell
Collaborator

thiell commented Mar 3, 2016

Hi @nicolasdeferrieres!

Thanks for reporting. I would say it's a known limitation. The current behavior is even documented:

http://clustershell.readthedocs.org/en/latest/tools/nodeset.html#zero-padding

nodeset will always try to coalesce node names by their numerical index first (without taking care of any zero-padding), and then will use the first zero-padding rule encountered.

Indeed, internally nodeset will recognize "namewithissue10" or "namewithissue008" as the same pattern "namewithissue%s" and will apply the largest padding value, here used for 008, to all indexes.
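
For reference, the same folding can be reproduced from the Python API (a minimal sketch; NodeSet is the public class behind the nodeset tool, and the output shown is what the 1.7.x padding rule described above produces):

    >>> from ClusterShell.NodeSet import NodeSet
    >>> # both names match the pattern "namewithissue%s"; the zero-padding of
    >>> # the first padded index encountered (3 digits) is applied to all indexes
    >>> print(NodeSet("namewithissue008,namewithissue10"))
    namewithissue[008,010]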

That is clearly a limitation we can work on. You're the first to report it, but this behavior has been there for a long time now. For me (I'm a cluster admin), this is a tricky case; it will lead to errors (human errors and/or weird behavior from other tools).

Let us know what you think. I'm open to hearing arguments :)

@nicolasdeferrieres
Author

Hi @thiell,

Thanks for the answer.

Indeed, this behavior is documented. I tried with an older version (1.6), and the behavior is the same.

However, this behavior troubles me a lot.

I'm giving a list of servers, and nodeset will output a different one!

In my case, I was fortunate, because the names altered by nodeset don't exist in my infrastructure (I know it is dangerous to have a server named bar01 and another one named bar001, but such names can still exist, for example during a naming convention migration).

Furthermore, you can see that when the first server name has 2 digits, the behavior is different than when the first server name has 3 digits:

$ nodeset -f bar01,bar002
bar[01-02]
$ nodeset -f bar002,bar01
bar[001-002]

For me (in an ideal world!), nodeset should be there to make groups of servers (from an input list of servers), not to change the meaning of this input.

I'm expecting nodeset to behave like:

$ nodeset -f bar01,bar002
bar0[1,02]
$ nodeset -f bar002,bar01
bar0[1,02]

Or even

$ nodeset -f bar01,bar002
bar01,bar002
$ nodeset -f bar002,bar01
bar01,bar002

And in rare cases:

$ nodeset -f bar01,bar001
bar01,bar001 (or bar0[1,01])

Either way, I need to be sure that the servers on which my commands will be launched are the ones I gave!

From talking to other users who encounter this strange behavior, I know they usually work around it by making two (or more) lists when it happens.

But when working on a list of several thousand servers, it is possible that a human will not see that the servers are not the ones given as input (actions on unwanted servers, and servers forgotten).

Here is part of my argument! :-) I'm open to hearing arguments too.

@degremont
Collaborator

Hi @nicolasdeferrieres

From talking to other users who encounter this strange behavior, I know they usually work around it by making two (or more) lists when it happens.

Could you give us more details about that? Do you know a lot of people doing this kind of thing? :)

@nicolasdeferrieres
Author

Hi @degremont

It's just a simple manual workaround:
When they encounter this bad factorization, they split the list in order to avoid the "limitation":
list1:

firstgroupa001
firstgroupa002
secondgroup001
secondgroup002
secondgroup003
secondgroup005
secondgroup006
secondgroup008
namewithissue007
namewithissue008
namewithissue009

list2:

namewithissue01
namewithissue02
namewithissue03
namewithissue04
namewithissue10

And they launch the commands separately:

clush -bw $(paste -sd, list1) "command"
clush -bw $(paste -sd, list2) "command"

@degremont
Collaborator

We will probably have to support padding per pattern rather than per rangeset. This is a non-trivial change that could go into 1.8 or 1.9...

@thiell thiell added this to the 1.8 milestone May 7, 2016
@thiell thiell modified the milestones: 1.8, 1.9 Sep 17, 2017
@thiell thiell removed the WIP label Sep 17, 2017
@sg184

sg184 commented Mar 19, 2019

Could you please tell me the release date for ClusterShell 1.9?
We are waiting for this bug to be fixed.

@degremont
Collaborator

I'm sorry, but we do not have a planned release date for 1.9.
This bug needs a deep change in the NodeSet code and needs to be carefully implemented.

@sg184

sg184 commented Mar 26, 2019 via email

@btravouillon

btravouillon commented May 16, 2019

Hi,

Similar issue here with nodeset:

$ nodeset -f --split=2 r14c[11-13],r02c[03-04]
r14c[11-13]
r2c[3-4]

Interestingly, depending on the number of elements in each nodeset, ClusterShell can process the nodesets in a different order and the issue won't show.

e.g. 3 nodes in r14c[11-13], 4 in r02c[03-06]:

$ nodeset -f --split=2 r14c[11-13],r02c[03-06]
r02c[03-06]
r14c[11-13]

With a small debug patch, I observed that the padding is not correct:

$ nodeset -f --split=2 r14c[11-13],r02c[03-04]
DBG RangeSetND.__init__ rgvec: (14, 11), pads: [None, None]
DBG RangeSetND.__init__ rgvec: (14, 12), pads: [None, None]
DBG RangeSetND.__init__ rgvec: (14, 13), pads: [None, None]
r14c[11-13]
DBG RangeSetND.__init__ rgvec: (2, 3), pads: [None, None]
DBG RangeSetND.__init__ rgvec: (2, 4), pads: [None, None]
r2c[3-4]

This is due to RangeSetND.__getitem__() looping over index 0 of self._veclist to set the padding.
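
The same thing can be seen from the Python API (a minimal sketch, assuming nodeset --split maps onto NodeSet.split(); the outputs below are the ones observed above):

    >>> from ClusterShell.NodeSet import NodeSet
    >>> # the second subset comes back without its zero-padding,
    >>> # matching the r2c[3-4] output of the command line above
    >>> for ns in NodeSet("r14c[11-13],r02c[03-04]").split(2):
    ...     print(ns)
    ...
    r14c[11-13]
    r2c[3-4]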

@andrewbogott

I am also encountering this issue. I have two hosts in my cluster: deployment-logstash2 and deployment-logstash03. ClusterShell tries to access the nonexistent host deployment-logstash02 and ignores deployment-logstash2.

thiell added a commit to thiell/clustershell that referenced this issue Jun 28, 2022

This patch adds support for mixed-length zero-padding ranges by
using sets of strings instead of integers. There is no need to keep
track of padding length anymore, as this information is self-contained
in the strings.

  $ cluset -f bar002,bar01
  bar[01,002]

Deprecated RangeSet.padding, which is now a no-op.

Closes cea-hpc#293 cea-hpc#442
thiell added a commit to thiell/clustershell that referenced this issue Jul 1, 2022

This patch adds support for mixed-length zero-padding ranges by
always using strings instead of integers in the inner set. There is no
need to keep track of padding length per set anymore, as this
information is self-contained in the strings.

Old behavior (single padding length supported):

    $ cluset -f bar002,bar01,bar0
    bar[000-002]

New behavior (mixed padding lengths supported):

    $ cluset -f bar002,bar01,bar0
    bar[0,01,002]

RangeSet.padding is now available as a property. In case of zero padding
with mixed lengths, it returns the maximum padding length. It can also
still be used to force a fixed-length padding on the set.

Example:

    >>> r = RangeSet("0,01,002")
    >>> r
    0,01,002
    >>> r.padding
    3
    >>> r.padding = 4
    >>> r
    0000-0002
    >>> r.padding = None
    >>> r
    0-2

Older versions of RangeSet are automatically converted when unpickled.

Closes cea-hpc#293 cea-hpc#442
thiell added a commit that referenced this issue Aug 8, 2022

This patch adds support for mixed-length zero-padding ranges by
always using strings instead of integers in the inner set. There is no
need to keep track of padding length per set anymore, as this
information is self-contained in the strings.

Old behavior (single padding length supported):

    $ cluset -f bar002,bar01,bar0
    bar[000-002]

New behavior (mixed padding lengths supported):

    $ cluset -f bar002,bar01,bar0
    bar[0,01,002]

RangeSet.padding is now available as a property. In case of zero padding
with mixed lengths, it returns the maximum padding length. It can also
still be used to force a fixed-length padding on the set.

Example:

    >>> r = RangeSet("0,01,002")
    >>> r
    0,01,002
    >>> r.padding
    3
    >>> r.padding = 4
    >>> r
    0000-0002
    >>> r.padding = None
    >>> r
    0-2

Older versions of RangeSet are automatically converted when unpickled.

Add a check when parsing bogus ranges like 01-010. This is stricter but
should help avoid mistakes and syntax errors.

Closes #293 #442
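
As a quick check of the new behavior (a minimal sketch based on the examples in the commit message above; with mixed-padding support, differently padded indexes stay distinct):

    >>> from ClusterShell.NodeSet import NodeSet
    >>> # mixed padding lengths are now preserved instead of being unified
    >>> print(NodeSet("bar002,bar01"))
    bar[01,002]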