dataset pending failures #9901

Open · abtink opened this issue Mar 7, 2024 · 9 comments

abtink (Member) commented Mar 7, 2024

This is a follow-up from #9871 (comment). Filing it as its own issue so it can be discussed.

Copying the comment:

@abtink All of the log lines are actually present! I think I found the issue. On node 1 (Leader), the pending dataset is used to update the security policy's Rotation Time to 2. This works only partly: it does set the KeyManager's mRotationTime to the new value 2. However, after the pending dataset is applied, the value in the active dataset stays at its old value 672. Other nodes receiving the active dataset from the Leader will therefore use the value 672 instead of 2, and so after the first tKSC change of +1, such a node won't go along with the tKSC increase if the difference is only +1, because it comes too early.

To reproduce on a node:

dataset init active
dataset securitypolicy 2 onrc
dataset delay 500
dataset commit pending

Now wait 0.5 sec and type dataset active to show the new active dataset: it will still show Security Policy: 672 onrc 0. Also, the wrong/old value 672 will be sent to a new Child in the Child ID Response. For changing the channel there's the same issue: the new channel will be used, but it doesn't show up in the dataset active output.

abtink (Member, Author) commented Mar 7, 2024

@EskoDijk I think we need to change/increase the active timestamp before dataset commit pending.

Adding dataset activetimestamp 2:

dataset init active
dataset securitypolicy 2 onrc
dataset delay 500
dataset activetimestamp 2
dataset commit pending

With this it works for me:

> dataset active
Active Timestamp: 1
Channel: 24
Channel Mask: 0x07fff800
Ext PAN ID: a5fbc584460bd282
Mesh Local Prefix: fdd3:47f2:d755:1d1a::/64
Network Key: a941c9a0206348f9cf73396ea6ba8503
Network Name: OpenThread-8d86
PAN ID: 0x8d86
PSKc: 0728ba6e96250fee05b5597b65285905
Security Policy: 672 onrc 0
Done

> dataset init active
Done
> dataset securitypolicy 2 onrc
Done
> dataset delay 500
Done
> dataset activetimestamp 2
Done

 
> dataset
Active Timestamp: 2
Channel: 24
Channel Mask: 0x07fff800
Delay: 500
Ext PAN ID: a5fbc584460bd282
Mesh Local Prefix: fdd3:47f2:d755:1d1a::/64
Network Key: a941c9a0206348f9cf73396ea6ba8503
Network Name: OpenThread-8d86
PAN ID: 0x8d86
PSKc: 0728ba6e96250fee05b5597b65285905
Security Policy: 2 onrc 0
Done

> dataset commit pending
Done

# Wait for 0.5 seconds
 
> dataset active
Active Timestamp: 2
Channel: 24
Channel Mask: 0x07fff800
Ext PAN ID: a5fbc584460bd282
Mesh Local Prefix: fdd3:47f2:d755:1d1a::/64
Network Key: a941c9a0206348f9cf73396ea6ba8503
Network Name: OpenThread-8d86
PAN ID: 0x8d86
PSKc: 0728ba6e96250fee05b5597b65285905
Security Policy: 2 onrc 0
Done
> 

EskoDijk (Contributor) commented Mar 7, 2024

@abtink thanks for spinning out this issue and looking into it! I confirm that adding an active timestamp with a higher value works. If the active timestamp is lower than, or equal to, the current active timestamp value then the expected behavior is that when the pending dataset is applied, it does not lead to an update of the active dataset.

So the current spec allows creating pending datasets that effectively "do nothing" once applied. (Section 8.4.3.4).
The current code seems to go along with that: it checks in Error DatasetManager::Save(const Dataset &aDataset) for the conditions as specified. However, the problem I see there is that aDataset.ApplyConfiguration is called even when compare <= 0. That shouldn't be the case, as the "old" active dataset is still the valid one.
Btw, this function is called from void PendingDatasetManager::HandleDelayTimer(void), and the Save function probably thinks that the pending-dataset-converted-into-an-active-dataset is an active dataset, which it isn't just yet.

I may be wrong here but this is a quick guess of what happens based on my log output.
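
In code terms, the suggested fix would be roughly the following. This is a minimal standalone sketch, not the actual OpenThread implementation: the stand-in types and the timestamp comparison are simplified, and only the control flow (reject before persisting or applying anything) is the point.

```cpp
#include <cstdint>

// Stand-in types so this sketch compiles on its own; in OpenThread these
// correspond to ot::Error, ot::MeshCoP::Dataset, etc.
enum Error { kErrorNone, kErrorRejected };

struct Dataset
{
    uint64_t mActiveTimestamp;
    bool     mIsNetworkKeyUpdated;

    Error ApplyConfiguration() const
    {
        // Pushes channel, network key, security policy, etc. into the
        // live configuration (KeyManager and friends).
        return kErrorNone;
    }
};

// Suggested control flow for DatasetManager::Save(): run the spec 8.4.3.4
// acceptance check *before* persisting or applying, so a dataset with a
// stale active timestamp can no longer half-apply.
Error Save(const Dataset &aDataset, uint64_t aCurrentActiveTimestamp)
{
    int compare = (aDataset.mActiveTimestamp > aCurrentActiveTimestamp)   ? 1
                  : (aDataset.mActiveTimestamp < aCurrentActiveTimestamp) ? -1
                                                                          : 0;

    if (!(aDataset.mIsNetworkKeyUpdated || compare > 0))
    {
        return kErrorRejected; // keep the old active dataset; apply nothing
    }

    // Persist first (omitted here), then apply, so storage and the live
    // configuration cannot diverge.
    return aDataset.ApplyConfiguration();
}
```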

EskoDijk (Contributor) commented Mar 7, 2024

Another thing worth noting: it's possible to construct a pending dataset without the Pending Timestamp. This can notably happen when copying an active dataset and modifying some items, as we did in our test CLI commands in this issue. By default there is no Pending Timestamp, and it has to be set explicitly.

A missing Pending Timestamp has the effect that the pending dataset will be applied on the node/Leader that created it, but it won't propagate to other nodes in the mesh. I'm not sure if the spec allows such a Pending Dataset without the Pending Timestamp. Could we let the command dataset commit pending give an error if it's not provided? (A sketch of such a check follows the background list below.)

Background:

  • Spec 4.4.24 says: "The Pending Timestamp is not included and MUST be sent as a separate Pending Timestamp TLV".
  • Spec 4.5.1 says: "The sender MUST include its Active Timestamp and MUST include its Pending Timestamp if it has a valid Pending Operational Dataset and maintains a valid Delay Timer".
  • Spec 5.15.2 has, for the case that the sender stores a valid Pending Dataset: "Otherwise, the MLE Data Response MUST contain an MLE Pending Timestamp TLV with the value of the local Pending Timestamp". So logically, if no Pending Timestamp TLV is included in the MLE message, then the sender doesn't store a valid Pending Dataset.
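
For illustration, such a check could look roughly like this. This is a hedged sketch: ValidatePendingDataset is a hypothetical function and the CLI plumbing around it is omitted; only the mComponents.mIsPendingTimestampPresent flag comes from the public otOperationalDataset API.

```cpp
#include <openthread/dataset.h>

// Hypothetical validation for `dataset commit pending`: reject (or warn
// about) a pending dataset that has no Pending Timestamp, since such a
// dataset would be applied locally but never propagate through the mesh.
otError ValidatePendingDataset(const otOperationalDataset &aDataset)
{
    otError error = OT_ERROR_NONE;

    if (!aDataset.mComponents.mIsPendingTimestampPresent)
    {
        // A hard error here; a Thread reference device could instead emit
        // a warning or honor a "force" parameter, as discussed below.
        error = OT_ERROR_INVALID_ARGS;
    }

    return error;
}
```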

abtink (Member, Author) commented Mar 7, 2024

Yes. The dataset commit [pending/active] commands are raw and flexible, but I guess they may also let the user mess up the network. It may be good to add more safety checks for them. We can update/fix these later.

@EskoDijk, do you know about otDatasetUpdaterRequestUpdate()? It should be easier to use and does more checks.

It is supported in the CLI as dataset updater, but its documentation is missing. Will add the docs later.
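
For illustration, a minimal sketch of using it, assuming the dataset-updater API from openthread/dataset_updater.h and the dataset-updater feature being enabled in the build; RequestRotationTimeUpdate and HandleUpdateDone are made-up names for the example.

```cpp
#include <string.h>

#include <openthread/dataset.h>
#include <openthread/dataset_updater.h>

// Called when the updater finishes; aError is OT_ERROR_NONE on success.
static void HandleUpdateDone(otError aError, void *aContext)
{
    (void)aError;
    (void)aContext;
}

// Change only the security policy rotation time (mirroring the CLI repro
// above) via the Dataset Updater, which takes care of the timestamps and
// of the pending -> active transition.
otError RequestRotationTimeUpdate(otInstance *aInstance)
{
    otError              error;
    otOperationalDataset active;
    otOperationalDataset update;

    memset(&update, 0, sizeof(update));

    // Read the current security policy so only the rotation time changes.
    error = otDatasetGetActive(aInstance, &active);

    if (error == OT_ERROR_NONE)
    {
        update.mSecurityPolicy               = active.mSecurityPolicy; // keep existing flags
        update.mSecurityPolicy.mRotationTime = 2;                      // hours
        update.mComponents.mIsSecurityPolicyPresent = true;

        error = otDatasetUpdaterRequestUpdate(aInstance, &update, HandleUpdateDone, /* aContext */ NULL);
    }

    return error;
}
```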

EskoDijk (Contributor) commented Mar 8, 2024

No, I haven't used the dataset-updater until now - thanks for the hint.
I also agree with the user's right to mess up the network (we need that flexibility also during certification ;-) ). For the missing Pending Timestamp topic: dataset commit pending could still allow a missing Pending Timestamp on a Thread reference device, and show an error/warning otherwise. (In the case of a warning, a user could still use a "force" parameter to go ahead.)

For the topic of aDataset.ApplyConfiguration being called even when the pending dataset is not accepted: this has higher priority to be fixed, I think. It's not in line with the specification, and it can lead to network disruption in normal, non-CLI use cases in particular (e.g., one Commissioner makes a pending dataset change, and later a Commissioner makes an active dataset change that gets applied before the delay timer of the first change runs out).

abtink (Member, Author) commented Mar 8, 2024

For the topic of aDataset.ApplyConfiguration being called even when the pending dataset is not accepted

I agree this seems like a bug (sort of a bad one). Will investigate it later.

abtink (Member, Author) commented Mar 12, 2024

@EskoDijk I dug a bit more into the code.

I see there are a bunch of validation checks done in DatasetManager::HandleSet() (which processes a MGMT_SET request):

Error DatasetManager::HandleSet(Coap::Message &aMessage, const Ip6::MessageInfo &aMessageInfo)

This includes active and pending timestamp checks (based on dataset type), channel checks, etc. This all seems fine to me. And I believe the MGMT_SET commands are the main way we intend a commissioner (or other device) to register new datasets.
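
For reference, a sketch of that path from the public C API. This is hedged: SendPendingSet and HandleMgmtSetDone are made-up names, the field layout follows otOperationalDataset as declared in openthread/dataset.h as I read it, and the values mirror the CLI repro above.

```cpp
#include <string.h>

#include <openthread/dataset.h>

// Completion callback: aResult reflects the Leader's accept/reject decision
// made in DatasetManager::HandleSet().
static void HandleMgmtSetDone(otError aResult, void *aContext)
{
    (void)aResult;
    (void)aContext;
}

// Register a pending dataset via MGMT_PENDING_SET, which goes through the
// HandleSet() validation checks (unlike `dataset commit pending`).
otError SendPendingSet(otInstance *aInstance)
{
    otOperationalDataset dataset;

    memset(&dataset, 0, sizeof(dataset));

    dataset.mPendingTimestamp.mSeconds             = 2;
    dataset.mComponents.mIsPendingTimestampPresent = true;

    dataset.mActiveTimestamp.mSeconds             = 2;
    dataset.mComponents.mIsActiveTimestampPresent = true;

    dataset.mDelay                      = 500; // delay timer, in ms
    dataset.mComponents.mIsDelayPresent = true;

    return otDatasetSendMgmtPendingSet(aInstance, &dataset, /* aTlvs */ NULL, /* aLength */ 0,
                                       HandleMgmtSetDone, /* aContext */ NULL);
}
```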

The dataset commit pending command (which maps to a Dataset::Save() call) does not do any checks, but I guess this is okay, as I think it's intended mainly for testing (its flexibility might be prioritized over strict checks).

Thoughts?

EskoDijk (Contributor) commented
@abtink Good that checks are done at the Leader when taking in a new pending dataset. I'm also okay with having the CLI commands be more flexible and allow (for testing) the use of non-compliant datasets. What I'd like to do, linked to this issue, is to create a CLI documentation PR that adds some warnings about this: that the dataset CLI is not the intended way to update datasets, and that Thread devices in production MUST NOT autonomously update datasets (only a Commissioner can do this).

Still, for the ApplyConfiguration issue: the checks we currently have are required by spec 8.4.3.4, and they are specified regardless of the origin of the pending dataset. Although doing these checks may be less useful when the pending dataset always comes through a MGMT_PENDING_SET command, it seems easier to update the code than to change the spec for this. Since there are already some limited checks done (e.g. if (isNetworkKeyUpdated || compare > 0)), it may be worth just getting this right.

That said, I don't fully understand the current code of DatasetManager::Save(const Dataset &aDataset): it explicitly allows a new configuration to be applied while the related dataset is not saved into memory. If there's a use case for that, I feel it should at least be commented in the code why/how.

EskoDijk (Contributor) commented
Added PR #10026 for the CLI documentation issue mentioned in my last comment.

Still pending for this issue are the two problems mentioned in the last two paragraphs of my last comment.

jwhui pushed a commit that referenced this issue Apr 16, 2024
…ing datasets (#10026)

Based on discussion in #9901, this clarifies in the documentation
that the CLI `dataset` commands can only be used in specific
situations (like testing), and that these commands allow setting
invalid (combinations of) parameters also. It points to the
Commissioner as the correct way of doing dataset updates in
production.

It also adds "Overview" sections that were missing in two README
files.