
Alarm streaming: CLEARED versus TOMBSTONE #548

Open
roshan-joyce-fujitsu opened this issue Aug 21, 2023 · 9 comments


@roshan-joyce-fujitsu

In TR547 v2.0 page 60, it says:

CLEARED should not be used. A cleared alarm is a DELETE/TOMBSTONE.

However in section 4.7 (page 63), it says:

1. The Tombstone/clear shall cause the compaction process to remove the corresponding active alarm.
2. The Tombstone shall also cause the compaction process to remove the clear (that immediately precedes it)

This raises the question of whether a stream record with perceived-severity CLEARED needs to be created when an alarm is cleared, followed by a delete/tombstone of the original alarm.

It also raises the question of what should be used as the "key" for alarms in a compacted log.
Should it be the detector-uuid?

@nigel-r-davis
Collaborator

In TR-548 the alarm is considered as an entity. When the alarm state goes to cleared, the entity is considered as having been deleted (so the active alarm list is essentially the set of alarm entities present). There should be no entity representing the existence of a CLEARED alarm (otherwise the log would continue to grow with cleared alarms). If necessary, the delete can carry extra details such as the clearing reason (in the same way that it could carry a deletion reason in any other case). This has not been assumed to be necessary at this stage and hence is not detailed. Are there details in the cleared alarm that you think need to be conveyed?
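To make the entity view concrete, here is a minimal Python sketch (not taken from TR-547 or TR-548) of a consumer rebuilding the active alarm list by replaying the stream. The `StreamRecord` type and the CREATE/UPDATE/DELETE operation names are hypothetical; the only point illustrated is that a clear removes the entity, so no CLEARED entry survives.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record model: operation names and fields are illustrative
# assumptions, not TR-547/TR-548 definitions.
@dataclass(frozen=True)
class StreamRecord:
    entity_key: str               # opaque key identifying one alarm entity
    operation: str                # "CREATE", "UPDATE" or "DELETE"
    body: Optional[dict] = None   # alarm details; None for DELETE

def active_alarms(records):
    """Replay records in log order; the result is the active alarm list."""
    active = {}
    for rec in records:
        if rec.operation in ("CREATE", "UPDATE"):
            active[rec.entity_key] = rec.body
        elif rec.operation == "DELETE":
            # A clear is a DELETE: the entity disappears, nothing CLEARED is kept.
            active.pop(rec.entity_key, None)
        else:
            raise ValueError(f"unknown operation {rec.operation!r}")
    return active

log = [
    StreamRecord("a1", "CREATE", {"perceived-severity": "MAJOR"}),
    StreamRecord("a2", "CREATE", {"perceived-severity": "MINOR"}),
    StreamRecord("a1", "UPDATE", {"perceived-severity": "CRITICAL"}),
    StreamRecord("a2", "DELETE"),  # alarm a2 cleared
]
print(active_alarms(log))  # {'a1': {'perceived-severity': 'CRITICAL'}}
```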

The text in TR-548 is sloppy. It should state that "The Tombstone (representing the clear) shall cause..." and "The Tombstone shall also cause the compaction process to remove the Delete (representing the alarm clear)...". I will read through the related sections and try to catch any related ambiguities.

I hope this clarifies.

@nigel-r-davis
Collaborator

I have checked the text and have changed the section you commented on to...

As noted earlier, the alarm DELETE record (representing the alarm clear) shall be followed immediately by a Tombstone record. As also noted, the deletion of an alarm detector, where the alarm was active prior to the deletion, shall cause at least the logging of a Tombstone record. Allowing for compaction delay:

  1. The Tombstone/Delete (representing the alarm clear) shall cause the compaction process to remove the corresponding active alarm.
  2. The Tombstone shall also cause the compaction process to remove the Delete record (that immediately precedes it).
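The compaction rules above can be approximated with the behaviour of a Kafka compacted topic, where a record with a null value acts as a tombstone. This is an assumption-laden Python simulation, not broker code: the hypothetical `compact` function keeps only the latest record per key and, as a simplification, drops tombstones immediately rather than after a retention period.

```python
from typing import List, Optional, Tuple

# A log entry is (key, value); value None models a Tombstone, as in a Kafka
# compacted topic. The dict values are illustrative, not TR-548 schemas.
Entry = Tuple[str, Optional[dict]]

def compact(log: List[Entry], drop_tombstones: bool = True) -> List[Entry]:
    """Keep only the latest entry per key, preserving offset order.

    Simplification: with drop_tombstones=True, tombstones are removed at once;
    a real broker retains them for a configured period first.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = sorted(
        ((offset, key, value) for key, (offset, value) in latest.items()),
        key=lambda t: t[0],
    )
    return [(key, value) for _, key, value in survivors
            if not (drop_tombstones and value is None)]

log = [
    ("a1", {"op": "CREATE", "severity": "MAJOR"}),
    ("a2", {"op": "CREATE", "severity": "MINOR"}),
    ("a1", {"op": "DELETE"}),  # the Delete representing the alarm clear
    ("a1", None),              # Tombstone immediately after the Delete
]
print(compact(log))  # [('a2', {'op': 'CREATE', 'severity': 'MINOR'})]
```

Both the active alarm and the Delete for `a1` disappear because the Tombstone is the latest record for that key. The same holds when a detector is deleted while its alarm is active and only a Tombstone is logged: the raise followed directly by a Tombstone also compacts away.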

@nigel-r-davis
Collaborator

I also made a minor adjustment to the earlier section...

 CLEARED should not be used. A cleared alarm is represented by a DELETE/TOMBSTONE.

@roshan-joyce-fujitsu
Author

Are there details in the cleared alarm that you think need to be conveyed?

No, I did not have any extra clear information in mind. I was merely trying to understand how a clear would be modeled in the stream.

@roshan-joyce-fujitsu
Author

When an alarm is modeled as an entity, what is its unique id?

Is it the log-record/log-record-body/condition-detector/detector-uuid?
Or a combination of some other fields?

If the controller is using a compacted log in Kafka to maintain the alarm stream, what would be the "key" for a message corresponding to the log-record for an alarm?

@amazzini
Collaborator

Regarding the alarm entity, we need to check whether this is clarified in TR-547 and, if not, perhaps reuse the TR-548 statements.

@roshan-joyce-fujitsu
Author

Hi @amazzini , @nigel-r-davis

Regarding the unique identifier of an alarm, the following are the candidates (as I understand it):

  1. tapi-fm:fault-management-context/active-condition/uuid
  2. A combination of the following fields of tapi-fm:fault-management-context/active-condition:
    • target-object-identifier
    • detected-condition/detected-condition-name
    • detected-condition/detected-condition-qualifier
  3. log-record/log-record-body/condition-detector/detector-uuid
  4. A combination of the following fields of log-record-body/condition-detector/:
    • measured-entity-uuid
    • detected-condition/detected-condition-name
    • detected-condition/detected-condition-qualifier
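For options 1 and 3 the existing uuid could be used as the message key directly. For the field combinations in options 2 and 4, one possibility is to hash the fields into a deterministic name-based UUID (Python's `uuid.uuid5`), so the key keeps the usual uuid format. This is a hypothetical sketch for option 4: `ALARM_KEY_NAMESPACE`, `entity_key` and the example values are made up, and the fields could be swapped for those of option 2.

```python
import json
import uuid

# Hypothetical namespace for alarm entity keys; any fixed UUID works as long
# as every producer uses the same one.
ALARM_KEY_NAMESPACE = uuid.UUID("6f1c2e0a-3b1d-4c5e-9a7b-2d8e4f6a1c3b")

def entity_key(measured_entity_uuid: str,
               detected_condition_name: str,
               detected_condition_qualifier: str = "") -> str:
    """Derive an opaque, deterministic key from the option-4 fields.

    JSON-encoding the list avoids ambiguous joins such as ("ab", "c") vs ("a", "bc").
    """
    name = json.dumps([measured_entity_uuid,
                       detected_condition_name,
                       detected_condition_qualifier])
    return str(uuid.uuid5(ALARM_KEY_NAMESPACE, name))

# Example values are invented for illustration.
raise_key = entity_key("11111111-2222-3333-4444-555555555555", "LOS")
clear_key = entity_key("11111111-2222-3333-4444-555555555555", "LOS")
print(raise_key == clear_key)  # True: raise and clear map to the same key
```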

@amazzini
Collaborator

Regarding option 2, the target-object-name should also be included (the local class, a child of the global class identified by target-object-identifier).
There is a fifth candidate, similar to option 2 but based on event-notification plus detected-condition, e.g. for the case where the active-condition is not available.

@nigel-r-davis
Copy link
Collaborator

Hi,

From a streaming perspective, the entity-key needs to be the same value for the raise, any changes and the clear of any particular detection. The entity-key needs to be unique across all alarms active at any one time. It is OK for an entity-key to be used again, so long as it does not violate the uniqueness need.

So, for example, the entity-key could be a unique identifier of the detector (for a single condition). In this case, whenever the alarm is active, it will use the same identifier. The active-clear-active cycle ensures temporal isolation between successive uses of the key, and compaction will clean up as usual.
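These rules can be expressed as a simple check over a sequence of raise/change/clear events. The `check_key_usage` function and the `(kind, key)` event tuples are hypothetical illustrations, not part of any TAPI or TR-548 definition; the check accepts reuse of a key after a clear but flags a raise on a key that is already active.

```python
def check_key_usage(events):
    """Return violations of the entity-key rules described above.

    events: iterable of (kind, key), with kind in {"RAISE", "CHANGE", "CLEAR"}.
    """
    active = set()
    problems = []
    for index, (kind, key) in enumerate(events):
        if kind == "RAISE":
            if key in active:
                problems.append(f"{index}: {key} raised while already active")
            active.add(key)
        elif kind in ("CHANGE", "CLEAR"):
            if key not in active:
                problems.append(f"{index}: {kind} for inactive key {key}")
            elif kind == "CLEAR":
                active.discard(key)  # key becomes free for a later raise
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return problems

# Active-clear-active cycle on the same detector key: no violations.
cycle = [("RAISE", "d1"), ("CHANGE", "d1"), ("CLEAR", "d1"),
         ("RAISE", "d1"), ("CLEAR", "d1")]
print(check_key_usage(cycle))  # []
```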

The entity-key is usually a uuid format. There is no reason in the standard to specify how the entity-key is generated as it is treated as opaque.

From a standard perspective, you could use any of the methods you propose so considering each in turn:

  1. Looks good so long as the uuid is not reused (other than for the same detector)
  2. Looks good as I suspect the combination of condition name and qualifier gives a single unique alarm point
  3. Possibly a challenge if the detector can raise more than one alarm, but assuming that by definition it cannot, then also good
  4. Similar to (2)

I hope this is clear enough.


3 participants