dataset pending failures #9901

Open · abtink opened this issue Mar 7, 2024 · 9 comments

abtink (Member) commented Mar 7, 2024

This is a follow-up from #9871 (comment). Filing it as its own issue so it can be discussed.

Copying the comment:

@abtink All of the log lines are actually present! I think I found the issue. On node 1 (Leader), the pending dataset is used to update the security policy's Rotation Time to 2. This works only partly: it does set the KeyManager's mRotationTime to the new value 2. However, after the pending dataset is applied, the value in the active dataset stays at its old value 672. Other nodes receiving the active dataset from the Leader will therefore use the value 672 instead of 2, and so after the first tKSC change of +1, such a node won't go along with the tKSC increase if the difference is only +1, because it comes too early.

To reproduce on a node:

dataset init active
dataset securitypolicy 2 onrc
dataset delay 500
dataset commit pending

Now wait 0.5 sec and type dataset active to show the new active dataset: it will still show Security Policy: 672 onrc 0. Also, the wrong/old value 672 will be sent to a new Child in the Child ID Response. For changing the channel there's the same issue: the new channel will be used, but it doesn't show up in the dataset active output.

abtink (Member, Author) commented Mar 7, 2024

@EskoDijk I think we need to change/increase the active timestamp before dataset commit pending.

Adding dataset activetimestamp 2:

dataset init active
dataset securitypolicy 2 onrc
dataset delay 500
dataset activetimestamp 2
dataset commit pending

With this it works for me:

> dataset active
Active Timestamp: 1
Channel: 24
Channel Mask: 0x07fff800
Ext PAN ID: a5fbc584460bd282
Mesh Local Prefix: fdd3:47f2:d755:1d1a::/64
Network Key: a941c9a0206348f9cf73396ea6ba8503
Network Name: OpenThread-8d86
PAN ID: 0x8d86
PSKc: 0728ba6e96250fee05b5597b65285905
Security Policy: 672 onrc 0
Done

> dataset init active
Done
> dataset securitypolicy 2 onrc
Done
> dataset delay 500
Done
> dataset activetimestamp 2
Done

 
> dataset
Active Timestamp: 2
Channel: 24
Channel Mask: 0x07fff800
Delay: 500
Ext PAN ID: a5fbc584460bd282
Mesh Local Prefix: fdd3:47f2:d755:1d1a::/64
Network Key: a941c9a0206348f9cf73396ea6ba8503
Network Name: OpenThread-8d86
PAN ID: 0x8d86
PSKc: 0728ba6e96250fee05b5597b65285905
Security Policy: 2 onrc 0
Done

> dataset commit pending
Done

# Wait for 0.5 seconds
 
> dataset active
Active Timestamp: 2
Channel: 24
Channel Mask: 0x07fff800
Ext PAN ID: a5fbc584460bd282
Mesh Local Prefix: fdd3:47f2:d755:1d1a::/64
Network Key: a941c9a0206348f9cf73396ea6ba8503
Network Name: OpenThread-8d86
PAN ID: 0x8d86
PSKc: 0728ba6e96250fee05b5597b65285905
Security Policy: 2 onrc 0
Done
> 

EskoDijk (Contributor) commented Mar 7, 2024

@abtink thanks for spinning out this issue and looking into it! I confirm that adding an active timestamp with a higher value works. If the active timestamp is lower than, or equal to, the current active timestamp value then the expected behavior is that when the pending dataset is applied, it does not lead to an update of the active dataset.

So the current spec allows creating pending datasets that effectively "do nothing" once applied. (Section 8.4.3.4).
The current code seems to go along with that: it checks in Error DatasetManager::Save(const Dataset &aDataset) for the conditions as specified. However, the problem I see there is that aDataset.ApplyConfiguration is called even when compare <= 0. That shouldn't be the case, as the "old" active dataset is still the valid one.
Btw, this function is called from void PendingDatasetManager::HandleDelayTimer(void), and the Save function probably thinks that the pending-dataset-converted-into-an-active-dataset is an active dataset, which it isn't just yet.

I may be wrong here but this is a quick guess of what happens based on my log output.
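
In code terms, the suggested fix would be roughly the following. This is a minimal standalone sketch, not the actual OpenThread implementation: the stand-in types and the timestamp comparison are simplified, and only the control flow (reject before persisting or applying anything) is the point.

```cpp
#include <cstdint>

// Stand-in types so this sketch compiles on its own; in OpenThread these
// correspond to ot::Error, ot::MeshCoP::Dataset, etc.
enum Error { kErrorNone, kErrorRejected };

struct Dataset
{
    uint64_t mActiveTimestamp;
    bool     mIsNetworkKeyUpdated;

    Error ApplyConfiguration() const
    {
        // Pushes channel, network key, security policy, etc. into the
        // live configuration (KeyManager and friends).
        return kErrorNone;
    }
};

// Suggested control flow for DatasetManager::Save(): run the spec 8.4.3.4
// acceptance check *before* persisting or applying, so a dataset with a
// stale active timestamp can no longer half-apply.
Error Save(const Dataset &aDataset, uint64_t aCurrentActiveTimestamp)
{
    int compare = (aDataset.mActiveTimestamp > aCurrentActiveTimestamp)   ? 1
                  : (aDataset.mActiveTimestamp < aCurrentActiveTimestamp) ? -1
                                                                          : 0;

    if (!(aDataset.mIsNetworkKeyUpdated || compare > 0))
    {
        return kErrorRejected; // keep the old active dataset; apply nothing
    }

    // Persist first (omitted here), then apply, so storage and the live
    // configuration cannot diverge.
    return aDataset.ApplyConfiguration();
}
```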

EskoDijk (Contributor) commented Mar 7, 2024

Another thing worth noting: it's possible to construct a pending dataset without the Pending Timestamp. This can notably happen when copying an active dataset and modifying some items, as we did in our test CLI commands in this issue. By default there is no Pending Timestamp, and it has to be set explicitly.

A missing Pending Timestamp has the effect that the pending dataset will be applied on the node/Leader that created it, but it won't propagate to other nodes in the mesh. I'm not sure if the spec allows such a Pending Dataset without the Pending Timestamp. Could we let the command dataset commit pending give an error if it's not provided? (A sketch of such a check follows the background list below.)

Background:

  • Spec 4.4.24 says: "The Pending Timestamp is not included and MUST be sent as a separate Pending Timestamp TLV".
  • Spec 4.5.1 says: "The sender MUST include its Active Timestamp and MUST include its Pending Timestamp if it has a valid Pending Operational Dataset and maintains a valid Delay Timer".
  • Spec 5.15.2 has, for the case that the sender stores a valid Pending Dataset: "Otherwise, the MLE Data Response MUST contain an MLE Pending Timestamp TLV with the value of the local Pending Timestamp". So logically, if no Pending Timestamp TLV is included in the MLE message, then the sender doesn't store a valid Pending Dataset.
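
For illustration, such a check could look roughly like this. This is a hedged sketch: ValidatePendingDataset is a hypothetical function and the CLI plumbing around it is omitted; only the mComponents.mIsPendingTimestampPresent flag comes from the public otOperationalDataset API.

```cpp
#include <openthread/dataset.h>

// Hypothetical validation for `dataset commit pending`: reject (or warn
// about) a pending dataset that has no Pending Timestamp, since such a
// dataset would be applied locally but never propagate through the mesh.
otError ValidatePendingDataset(const otOperationalDataset &aDataset)
{
    otError error = OT_ERROR_NONE;

    if (!aDataset.mComponents.mIsPendingTimestampPresent)
    {
        // A hard error here; a Thread reference device could instead emit
        // a warning or honor a "force" parameter, as discussed below.
        error = OT_ERROR_INVALID_ARGS;
    }

    return error;
}
```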

abtink (Member, Author) commented Mar 7, 2024

Yes. The dataset commit [pending/active] commands are raw and flexible, but I guess they may also let the user mess up the network. It may be good to add more safety checks for them. We can update/fix these later.

@EskoDijk, do you know about otDatasetUpdaterRequestUpdate()? It should be easier to use and does more checks.

It is supported in the CLI as dataset updater, but its documentation is missing. Will add the docs later.
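
For illustration, a minimal sketch of using it, assuming the dataset-updater API from openthread/dataset_updater.h and the dataset-updater feature being enabled in the build; RequestRotationTimeUpdate and HandleUpdateDone are made-up names for the example.

```cpp
#include <string.h>

#include <openthread/dataset.h>
#include <openthread/dataset_updater.h>

// Called when the updater finishes; aError is OT_ERROR_NONE on success.
static void HandleUpdateDone(otError aError, void *aContext)
{
    (void)aError;
    (void)aContext;
}

// Change only the security policy rotation time (mirroring the CLI repro
// above) via the Dataset Updater, which takes care of the timestamps and
// of the pending -> active transition.
otError RequestRotationTimeUpdate(otInstance *aInstance)
{
    otError              error;
    otOperationalDataset active;
    otOperationalDataset update;

    memset(&update, 0, sizeof(update));

    // Read the current security policy so only the rotation time changes.
    error = otDatasetGetActive(aInstance, &active);

    if (error == OT_ERROR_NONE)
    {
        update.mSecurityPolicy               = active.mSecurityPolicy; // keep existing flags
        update.mSecurityPolicy.mRotationTime = 2;                      // hours
        update.mComponents.mIsSecurityPolicyPresent = true;

        error = otDatasetUpdaterRequestUpdate(aInstance, &update, HandleUpdateDone, /* aContext */ NULL);
    }

    return error;
}
```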

EskoDijk (Contributor) commented Mar 8, 2024

No, I haven't used the dataset-updater until now - thanks for the hint.
I also agree with the user's right to mess up the network (we need that flexibility also during certification ;-) ). For the missing Pending Timestamp topic: dataset commit pending could still allow a missing Pending Timestamp on a Thread reference device, and show an error/warning otherwise. (In the case of a warning, a user could still use a "force" parameter to go ahead.)

For the topic of aDataset.ApplyConfiguration being called even when the pending dataset is not accepted: this has higher priority to be fixed, I think. It's not in line with the specification, and it can lead to network disruption in normal, non-CLI use cases in particular (e.g., one Commissioner makes a pending dataset change, and later a Commissioner makes an active dataset change that gets applied before the delay timer of the first change runs out).

abtink (Member, Author) commented Mar 8, 2024

For the topic of aDataset.ApplyConfiguration being called even when the pending dataset is not accepted

I agree this seems like a bug (sort of a bad one). Will investigate it later.

abtink (Member, Author) commented Mar 12, 2024

@EskoDijk I dug a bit more into the code.

I see there are a bunch of validation checks done in DatasetManager::HandleSet() (which processes a MGMT_SET request):

Error DatasetManager::HandleSet(Coap::Message &aMessage, const Ip6::MessageInfo &aMessageInfo)

This includes active and pending timestamp checks (based on dataset type), channel checks, etc. This all seems fine to me. And I believe the MGMT_SET commands are the main way we intend a commissioner (or other device) to register new datasets.
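
For reference, a sketch of that path from the public C API. This is hedged: SendPendingSet and HandleMgmtSetDone are made-up names, the field layout follows otOperationalDataset as declared in openthread/dataset.h as I read it, and the values mirror the CLI repro above.

```cpp
#include <string.h>

#include <openthread/dataset.h>

// Completion callback: aResult reflects the Leader's accept/reject decision
// made in DatasetManager::HandleSet().
static void HandleMgmtSetDone(otError aResult, void *aContext)
{
    (void)aResult;
    (void)aContext;
}

// Register a pending dataset via MGMT_PENDING_SET, which goes through the
// HandleSet() validation checks (unlike `dataset commit pending`).
otError SendPendingSet(otInstance *aInstance)
{
    otOperationalDataset dataset;

    memset(&dataset, 0, sizeof(dataset));

    dataset.mPendingTimestamp.mSeconds             = 2;
    dataset.mComponents.mIsPendingTimestampPresent = true;

    dataset.mActiveTimestamp.mSeconds             = 2;
    dataset.mComponents.mIsActiveTimestampPresent = true;

    dataset.mDelay                      = 500; // delay timer, in ms
    dataset.mComponents.mIsDelayPresent = true;

    return otDatasetSendMgmtPendingSet(aInstance, &dataset, /* aTlvs */ NULL, /* aLength */ 0,
                                       HandleMgmtSetDone, /* aContext */ NULL);
}
```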

The dataset commit pending command (which maps to a Dataset::Save() call) does not do any checks, but I guess this is okay, as I think it's intended mainly for testing (its flexibility might be prioritized over strict checks).

Thoughts?

EskoDijk (Contributor) commented
@abtink Good that checks are done at the Leader when taking in a new pending dataset. I'm also okay with having the CLI commands be more flexible and allow (for testing) the use of non-compliant datasets. What I'd like to do, linked to this issue, is to create a CLI documentation PR that adds some warnings about this: that the dataset CLI is not the intended way to update datasets, and that Thread devices in production MUST NOT autonomously update datasets (only a Commissioner can do this).

Still, for the ApplyConfiguration issue: the checks we currently have are required by spec 8.4.3.4, and they are specified regardless of the origin of the pending dataset. Although doing these checks may be less useful when the pending dataset always comes through a MGMT_PENDING_SET command, it seems easier to update the code than to change the spec for this. Since there are already some limited checks done (e.g. if (isNetworkKeyUpdated || compare > 0)), it may be worth just getting this right.

That said, I don't fully understand the current code of DatasetManager::Save(const Dataset &aDataset): it explicitly allows a new configuration to be applied while the related dataset is not saved into memory. If there's a use case for that, I feel it should at least be commented in the code why/how.

EskoDijk (Contributor) commented
Added PR #10026 for the CLI documentation issue mentioned in my last comment.

Still pending for this issue are the two problems mentioned in the last two paragraphs of my last comment.

jwhui pushed a commit that referenced this issue Apr 16, 2024
…ing datasets (#10026)

Based on discussion in #9901, this clarifies in the documentation
that the CLI `dataset` commands can only be used in specific
situations (like testing), and that these commands allow setting
invalid (combinations of) parameters also. It points to the
Commissioner as the correct way of doing dataset updates in
production.

It also adds "Overview" sections that were missing in two README
files.