feat(ACI): Allow processing dynamic data conditions #93851
Conversation
Codecov Report

✅ All tests successful. No failed tests found.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #93851      +/-   ##
==========================================
+ Coverage   87.97%   88.02%   +0.04%
==========================================
  Files       10364    10381      +17
  Lines      599531   601651    +2120
  Branches    23306    23306
==========================================
+ Hits       527435   529596    +2161
+ Misses      71627    71586      -41
  Partials      469      469
```
Picking up work on tests, but adding this review to indicate that I've seen your latest changes.
```python
else:
    self.trigger_resolve_counts[trigger.id] = 0
is_anomalous = has_anomaly(potential_anomaly, trigger.label)
fired_incident_triggers = self.handle_trigger_anomalies(
```
very clean, nice!
```python
# this is a bit of a hack - anomaly detection data packets send extra data we need to pass along
values = data_packet.packet["values"]
if values.get("value") is not None:
    return values.get("value")
return values
```
I don't love this solution, but here's what's happening:

We need to send additional data in an anomaly detection data packet, so instead of passing

```python
values={"value": aggregation_value}
```

as we do for a regular metric issue, we pass:

```python
values={
    "values": {
        "value": aggregation_value,
        "source_id": str(self.subscription.id),
        "subscription_id": subscription_update["subscription_id"],
        "timestamp": self.last_update,
    },
},
```

This works great for anomaly detection, but if we modify the data packet for a static or dynamic metric issue like:

```python
values={
    "values": {
        "value": aggregation_value,
    },
},
```

we'll run into an issue here, because `value` is nested inside a dictionary. The workaround extracts the correct value for a regular metric issue and passes along the entire dict for dynamic ones as needed, but it feels a little hacky. I haven't yet come up with a better solution.
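For illustration, the two packet shapes and the extraction logic described above can be sketched with TypedDicts. These names and field sets are hypothetical stand-ins, not the actual Sentry types:

```python
from typing import TypedDict, Union


class MetricPacketValues(TypedDict):
    # regular metric issue: values={"value": aggregation_value}
    value: int


class AnomalyPacketInner(TypedDict):
    value: int
    source_id: str
    subscription_id: str
    timestamp: float


class AnomalyPacketValues(TypedDict):
    # anomaly detection: the payload is nested one level deeper
    values: AnomalyPacketInner


def extract_value(
    values: Union[MetricPacketValues, AnomalyPacketValues],
) -> Union[int, AnomalyPacketValues]:
    # a regular packet exposes "value" at the top level; anomaly packets
    # keep the whole dict so the extra fields survive downstream
    if "value" in values:
        return values["value"]
    return values
```

At runtime TypedDicts are plain dicts, so the narrowing still has to probe a key, but the union annotation at least documents both shapes for the type checker.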
I don't think this is too hacky tbh. i think the wonkiness is coming from us having different data packets for a single detector. if we want to improve this, i think it'd be like introducing a new kind of detector, and each one handles their own data packet formats.
high level i think this looks great -- maps the data packet correctly and should solve the issue. we could probably improve the type safety a bit by providing a new type to the data condition handler, and extra credit for the union type.
```python
if has_anomaly and not trigger_matches_status:
    metrics.incr(
        "incidents.alert_rules.threshold.alert",
```
just curious - do you have any plans to make a new Datadog dashboard for incidents? It could be a cool use case to have the workflow engine powerpack + all these metrics on a single dashboard.
```python
    timestamp=self.last_update,
)
if self.alert_rule.detection_type == AlertRuleDetectionType.DYNAMIC:
    packet = QuerySubscriptionUpdate(
```
🤔 should we create a second data packet type definition for this? Then we can do something like: `MetricIssueDataPacket = QuerySubscriptionUpdate | AnomalyDetectionUpdate`
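A rough sketch of that union alias; the field sets here are illustrative, not the real TypedDict definitions:

```python
from typing import Any, TypedDict, Union


class QuerySubscriptionUpdate(TypedDict):
    # simplified: the regular subscription update payload
    subscription_id: str
    values: dict


class AnomalyDetectionUpdate(TypedDict):
    # simplified: anomaly updates carry a timestamp Seer needs
    subscription_id: str
    values: dict
    timestamp: float


# the union the comment proposes: one name covering both packet payloads
MetricIssueDataPacket = Union[QuerySubscriptionUpdate, AnomalyDetectionUpdate]


def packet_kind(packet: MetricIssueDataPacket) -> str:
    # TypedDicts are plain dicts at runtime, so narrow on a distinguishing key
    return "anomaly" if "timestamp" in packet else "metric"
```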
```python
@with_feature("organizations:anomaly-detection-alerts")
@with_feature("organizations:anomaly-detection-rollout")
@with_feature("organizations:workflow-engine-metric-alert-processing")
def test_seer_call_dual_processing(self, mock_seer_request: MagicMock):
```
can we decompose this test into multiple bits? it's a bit hard for me to tell how this is testing seer calls -- maybe we could pull stuff out into setup methods and extract `verify_deserialized_body` into a method to DRY it a little?
I broke this out into 3 tests but the next PR I want to work on is breaking this gigantic test file out into like 4 files and refactoring the workflow engine parts https://getsentry.atlassian.net/browse/ACI-392 so I'm gonna punt cleaning it up in this one.
```python
seer_return_value_3: DetectAnomaliesResponse = {
    "success": True,
    "timeseries": [
        {
            "anomaly": {"anomaly_score": 0.5, "anomaly_type": AnomalyType.NONE.value},
            "timestamp": 1,
            "value": 1,
        }
    ],
}
```
might be cool to turn this into a helper method with kwargs / default values to generate these through all the tests
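Such a helper might look like the following (hypothetical name and defaults, mirroring the fixture above; the return value is just a `DetectAnomaliesResponse`-shaped dict):

```python
from typing import Any


def make_detect_anomalies_response(
    *,
    success: bool = True,
    anomaly_score: float = 0.5,
    anomaly_type: str = "none",
    timestamp: int = 1,
    value: int = 1,
) -> dict:
    # builds a response-shaped dict; tests override only what they care about
    return {
        "success": success,
        "timeseries": [
            {
                "anomaly": {"anomaly_score": anomaly_score, "anomaly_type": anomaly_type},
                "timestamp": timestamp,
                "value": value,
            }
        ],
    }
```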
Bug: Method Returns Incorrect Type

The `extract_value` method is annotated to return `int`, but it returns a `dict` when `values.get("value")` is `None`. This violates the type contract, leading to runtime errors for code expecting an integer.

src/sentry/incidents/grouptype.py, lines 101 to 107 in 7c6773e:

```python
def extract_value(self, data_packet: DataPacket[QuerySubscriptionUpdate]) -> int:
    # this is a bit of a hack - anomaly detection data packets send extra data we need to pass along
    values = data_packet.packet["values"]
    if values.get("value") is not None:
        return values.get("value")
    return values
```
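The minimal fix the report implies is to make the annotation match the behavior. A standalone sketch (not the actual method, which takes a `DataPacket`):

```python
from typing import Any, Union


def extract_value(packet_values: dict) -> Union[int, dict]:
    # honest return type: int for a regular metric packet,
    # the whole dict for an anomaly detection packet
    value = packet_values.get("value")
    if value is not None:
        return value
    return packet_values
```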
Bug: Function Returns Falsy Object Instead of Boolean

The function `get_anomaly_evaluation_from_workflow_engine` has an inconsistent return type. When an anomaly evaluation is found but evaluates as falsy (e.g., an empty `DetectorEvaluationResult` object), the function incorrectly returns this object instead of `False`. This violates the `bool | None` expectation of the calling code in `subscription_processor.py`, causing an `AssertionError` where `isinstance(is_anomalous, bool)` fails.

src/sentry/seer/anomaly_detection/get_historical_anomalies.py, lines 45 to 58 in 7c6773e:

```python
def get_anomaly_evaluation_from_workflow_engine(
    detector: Detector,
    data_packet_processing_results: list[
        tuple[Detector, dict[DetectorGroupKey, DetectorEvaluationResult]]
    ],
) -> bool | DetectorEvaluationResult | None:
    evaluation = None
    for result in data_packet_processing_results:
        if result[0] == detector:
            evaluation = result[1].get("values")
            if evaluation:
                return evaluation.priority == DetectorPriorityLevel.HIGH
    return evaluation
```

src/sentry/incidents/subscription_processor.py, lines 452 to 458 in 7c6773e:

```python
is_anomalous = get_anomaly_evaluation_from_workflow_engine(detector, results)
if is_anomalous is None:
    # we only care about True and False — None indicates no change
    continue
assert isinstance(is_anomalous, bool)
```
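One way to address the report is an explicit `None` check instead of a truthiness test, so the function only ever returns `True`, `False`, or `None`. A self-contained sketch with a stand-in `Evaluation` class and a hypothetical `HIGH` constant in place of `DetectorPriorityLevel.HIGH`:

```python
from dataclasses import dataclass
from typing import Optional

HIGH = 75  # stand-in for DetectorPriorityLevel.HIGH


@dataclass
class Evaluation:
    # stand-in for DetectorEvaluationResult
    priority: int


def get_anomaly_evaluation(detector: object, results: list) -> Optional[bool]:
    # True/False when an evaluation exists; None means "no change"
    for result_detector, evaluations in results:
        if result_detector == detector:
            evaluation = evaluations.get("values")
            if evaluation is not None:  # explicit check, never return the object itself
                return evaluation.priority == HIGH
    return None
```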
would be nice to add the types and improve the type safety here, but can also do as a follow up etc.
```diff
 def extract_value(self, data_packet: DataPacket[QuerySubscriptionUpdate]) -> int:
-    return data_packet.packet["values"]["value"]
+    # this is a bit of a hack - anomaly detection data packets send extra data we need to pass along
+    values = data_packet.packet["values"]
+    if values.get("value") is not None:
```
nit: 🤔 it might be a little safer to do this using types; https://github.com/getsentry/sentry/pull/93851/files#r2170337582 would let us do `if isinstance(data_packet, SeerDataPacket)` kind of thing instead.
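The isinstance-based narrowing could look roughly like this, with hypothetical dataclasses standing in for the real packet types (`SeerDataPacket` is only the name floated in the comment, not an existing class):

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class QueryDataPacket:
    # regular metric subscription update (simplified)
    value: int


@dataclass
class SeerDataPacket(QueryDataPacket):
    # anomaly detection update with the extra fields Seer needs
    source_id: str
    subscription_id: str
    timestamp: float


def extract_value(packet: QueryDataPacket) -> Union[int, SeerDataPacket]:
    # isinstance narrowing instead of probing dict keys
    if isinstance(packet, SeerDataPacket):
        return packet  # anomaly detection needs the full packet downstream
    return packet.value
```

Type checkers understand `isinstance` narrowing on real classes, which is the safety win over `values.get("value")` probing.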
Cool I'll do a follow up just to get this out 👍
Suspect Issues: this pull request was deployed and Sentry observed the following issues:
Disables dual processing of dynamic alerts until we can fix https://sentry.sentry.io/issues/6713407596/?environment=prod&project=1&query=%21level%3Ainfo%20%21level%3Awarning&referrer=issue-stream&stream_index=0, which is pausing deploys. Because I merged #94574 after merging #93851, I can't safely revert it.
Enable processing dynamic data conditions through workflow engine.