feat(ACI): Allow processing dynamic data conditions #93851


Merged: 15 commits from ceorourke/ACI-368 into master on Jun 27, 2025

Conversation

ceorourke (Member)

Enable processing dynamic data conditions through the workflow engine.

@ceorourke ceorourke requested a review from a team as a code owner June 18, 2025 19:52
@ceorourke ceorourke requested review from a team and mifu67 June 18, 2025 19:52
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jun 18, 2025
@ceorourke ceorourke marked this pull request as draft June 18, 2025 19:54

codecov bot commented Jun 18, 2025

Codecov Report

Attention: Patch coverage is 99.53488% with 1 line in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

File with missing lines                          Patch %   Lines
src/sentry/incidents/subscription_processor.py   97.36%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #93851      +/-   ##
==========================================
+ Coverage   87.97%   88.02%   +0.04%     
==========================================
  Files       10364    10381      +17     
  Lines      599531   601651    +2120     
  Branches    23306    23306              
==========================================
+ Hits       527435   529596    +2161     
+ Misses      71627    71586      -41     
  Partials      469      469              

mifu67 (Contributor) left a comment

Picking up work on tests, but adding this review to indicate that I've seen your latest changes.

else:
    self.trigger_resolve_counts[trigger.id] = 0
is_anomalous = has_anomaly(potential_anomaly, trigger.label)
fired_incident_triggers = self.handle_trigger_anomalies(
Contributor

very clean, nice!

Comment on lines +103 to +107
# this is a bit of a hack - anomaly detection data packets send extra data we need to pass along
values = data_packet.packet["values"]
if values.get("value") is not None:
    return values.get("value")
return values
ceorourke (Member, Author)

I don't love this solution but here's what's happening:

We need to send additional data in an anomaly detection data packet so instead of passing:

values={"value": aggregation_value}

as we do for a regular metric issue, we pass:

  values={
      "values": {
          "value": aggregation_value,
          "source_id": str(self.subscription.id),
          "subscription_id": subscription_update["subscription_id"],
          "timestamp": self.last_update,
      },
  },

this works great for anomaly detection, but if we modify the data packet for a static or dynamic metric issue like:

  values={
      "values": {
          "value": aggregation_value,
      },
  },

we'll run into an issue here because value is nested inside of a dictionary.

The workaround here extracts the correct value for a regular metric issue and passes along the entire dict for dynamic ones as needed, but it feels a little hacky. I haven't yet come up with a better solution.
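Put together, the workaround can be sketched as a standalone snippet (the DataPacket dataclass below is a simplified stand-in for illustration, not Sentry's actual class):

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DataPacket:
    # simplified stand-in for sentry's DataPacket wrapper
    packet: dict[str, Any]


def extract_value(data_packet: DataPacket) -> Any:
    # Regular metric issue packets put the number directly under "value";
    # anomaly detection packets nest extra fields, so "value" is absent
    # at the top level and the whole dict is passed along instead.
    values = data_packet.packet["values"]
    if values.get("value") is not None:
        return values.get("value")
    return values


regular = DataPacket(packet={"values": {"value": 42}})
dynamic = DataPacket(packet={"values": {"values": {"value": 42, "source_id": "1"}}})
```

Here `extract_value(regular)` yields the bare aggregation value, while `extract_value(dynamic)` hands the full nested dict to the anomaly detection path.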

Contributor

I don't think this is too hacky, to be honest. I think the wonkiness comes from us having different data packets for a single detector. If we want to improve this, I think it'd mean introducing a new kind of detector, where each one handles its own data packet format.

@ceorourke ceorourke marked this pull request as ready for review June 26, 2025 20:31
@ceorourke ceorourke requested a review from a team as a code owner June 26, 2025 20:31
@ceorourke ceorourke requested review from a team and mifu67 and removed request for a team June 26, 2025 20:31
saponifi3d (Contributor) left a comment

High level, I think this looks great -- it maps the data packet correctly and should solve the issue. We could probably improve the type safety a bit by providing a new type to the data condition handler, and extra credit for the union type.



if has_anomaly and not trigger_matches_status:
    metrics.incr(
        "incidents.alert_rules.threshold.alert",
Contributor

Just curious: do you have any plans to make a new Datadog dashboard for incidents? It could be a cool use case to have the workflow engine powerpack plus all these metrics on a single dashboard.

    timestamp=self.last_update,
)
if self.alert_rule.detection_type == AlertRuleDetectionType.DYNAMIC:
    packet = QuerySubscriptionUpdate(
Contributor

🤔 Should we create a second data packet type definition for this? Then we could do something like: MetricIssueDataPacket = QuerySubscriptionUpdate | AnomalyDetectionUpdate

@with_feature("organizations:anomaly-detection-alerts")
@with_feature("organizations:anomaly-detection-rollout")
@with_feature("organizations:workflow-engine-metric-alert-processing")
def test_seer_call_dual_processing(self, mock_seer_request: MagicMock):
Contributor

Can we decompose this test into multiple pieces? It's a bit hard for me to tell how this is testing Seer calls -- maybe we could pull things out into setup methods and move verify_deserialized_body into a shared method to DRY it up a little?

ceorourke (Member, Author)

I broke this out into 3 tests, but the next PR I want to work on breaks this gigantic test file into roughly 4 files and refactors the workflow engine parts (https://getsentry.atlassian.net/browse/ACI-392), so I'm going to punt on cleaning it up further in this one.

Comment on lines +738 to +747
seer_return_value_3: DetectAnomaliesResponse = {
    "success": True,
    "timeseries": [
        {
            "anomaly": {"anomaly_score": 0.5, "anomaly_type": AnomalyType.NONE.value},
            "timestamp": 1,
            "value": 1,
        }
    ],
}
Contributor

It might be cool to turn this into a helper method with kwargs / default values to generate these across all the tests.
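A hypothetical test helper along those lines; the name make_seer_response, its defaults, and the AnomalyType stand-in below are illustrative, not code from the PR:

```python
from enum import Enum


class AnomalyType(Enum):
    # stand-in for sentry's AnomalyType enum, values illustrative
    NONE = "none"
    HIGH_CONFIDENCE = "anomaly_higher_confidence"


def make_seer_response(
    success=True,
    anomaly_score=0.5,
    anomaly_type=AnomalyType.NONE,
    timestamp=1,
    value=1,
):
    # Build a DetectAnomaliesResponse-shaped dict with overridable defaults,
    # so each test only spells out the fields it cares about.
    return {
        "success": success,
        "timeseries": [
            {
                "anomaly": {
                    "anomaly_score": anomaly_score,
                    "anomaly_type": anomaly_type.value,
                },
                "timestamp": timestamp,
                "value": value,
            }
        ],
    }
```

A test would then call e.g. `make_seer_response(anomaly_type=AnomalyType.HIGH_CONFIDENCE, value=7)` instead of restating the full literal each time.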

cursor bot left a comment

Bug: Method Returns Incorrect Type

The extract_value method is annotated to return int, but it returns a dict when values.get("value") is None. This violates the type contract, leading to runtime errors for code expecting an integer.

src/sentry/incidents/grouptype.py#L101-L107

def extract_value(self, data_packet: DataPacket[QuerySubscriptionUpdate]) -> int:
    # this is a bit of a hack - anomaly detection data packets send extra data we need to pass along
    values = data_packet.packet["values"]
    if values.get("value") is not None:
        return values.get("value")
    return values


Bug: Function Returns Falsy Object Instead of Boolean

The function get_anomaly_evaluation_from_workflow_engine has an inconsistent return type. When an anomaly evaluation is found but evaluates as falsy (e.g., an empty DetectorEvaluationResult object), the function incorrectly returns this object instead of False. This violates the bool | None expectation of the calling code in subscription_processor.py, causing an AssertionError where isinstance(is_anomalous, bool) fails.

src/sentry/seer/anomaly_detection/get_historical_anomalies.py#L45-L58

def get_anomaly_evaluation_from_workflow_engine(
    detector: Detector,
    data_packet_processing_results: list[
        tuple[Detector, dict[DetectorGroupKey, DetectorEvaluationResult]]
    ],
) -> bool | DetectorEvaluationResult | None:
    evaluation = None
    for result in data_packet_processing_results:
        if result[0] == detector:
            evaluation = result[1].get("values")
            if evaluation:
                return evaluation.priority == DetectorPriorityLevel.HIGH
    return evaluation

src/sentry/incidents/subscription_processor.py#L452-L458

is_anomalous = get_anomaly_evaluation_from_workflow_engine(detector, results)
if is_anomalous is None:
    # we only care about True and False — None indicates no change
    continue
assert isinstance(is_anomalous, bool)



@ceorourke ceorourke requested a review from saponifi3d June 27, 2025 19:42
saponifi3d (Contributor) left a comment

Would be nice to add the types and improve the type safety here, but that can also be done as a follow-up.

Comment on lines 102 to +105
 def extract_value(self, data_packet: DataPacket[QuerySubscriptionUpdate]) -> int:
-    return data_packet.packet["values"]["value"]
+    # this is a bit of a hack - anomaly detection data packets send extra data we need to pass along
+    values = data_packet.packet["values"]
+    if values.get("value") is not None:
Contributor

nit: 🤔 it might be a little safer to do this using types; https://github.com/getsentry/sentry/pull/93851/files#r2170337582

That would let us do an if isinstance(data_packet, SeerDataPacket) kind of check instead.
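The isinstance-based narrowing might look something like this. SeerDataPacket is the reviewer's hypothetical name, and these dataclasses are illustrative, not from the merged code:

```python
from dataclasses import dataclass


@dataclass
class DataPacket:
    # base packet: just the aggregation value
    value: float


@dataclass
class SeerDataPacket(DataPacket):
    # anomaly detection packet: extra fields Seer needs downstream
    source_id: str
    subscription_id: str


def extract_value(packet: DataPacket):
    # Runtime narrowing instead of probing dict keys: anomaly detection
    # packets keep their extra fields, plain packets yield the bare value.
    if isinstance(packet, SeerDataPacket):
        return packet
    return packet.value
```

The subclass check makes the two packet shapes explicit in the type system rather than encoded in the presence or absence of a dict key.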

ceorourke (Member, Author)

Cool I'll do a follow up just to get this out 👍

@ceorourke ceorourke merged commit 3097b96 into master Jun 27, 2025
65 checks passed
@ceorourke ceorourke deleted the ceorourke/ACI-368 branch June 27, 2025 20:03

sentry-io bot commented Jun 27, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:


ceorourke added a commit that referenced this pull request Jun 27, 2025
Disables dual processing dynamic alerts until we can fix
https://sentry.sentry.io/issues/6713407596/?environment=prod&project=1&query=%21level%3Ainfo%20%21level%3Awarning&referrer=issue-stream&stream_index=0
which is pausing deploys.

Because I merged #94574 after
merging #93851 I can't safely
revert it.
@github-actions github-actions bot locked and limited conversation to collaborators Jul 13, 2025