@jchrostek-dd
Contributor

Task

https://datadoghq.atlassian.net/browse/SVLS-7836?atlOrigin=eyJpIjoiMWNmZTMzOGE4NGEwNDE4MTk5Njk0N2ZmMmU3MzExMjgiLCJwIjoiaiJ9

Overview

The extension neither creates SnapStart spans nor emits SnapStart metrics. This PR adds both.

When a Lambda function with SnapStart enabled is invoked for the first time, we receive Platform.RestoreStart and Platform.RestoreReport events. These effectively take the place of the Platform.InitStart and Platform.InitReport events, so the code flow is nearly identical to how we handle the cold start span and duration metric.

Note: when a SnapStart instance is restored, we still receive the Platform.InitStart and Platform.InitReport events in addition to Platform.RestoreStart and Platform.RestoreReport. However, these Init events do not come from the sandbox starting for that invocation; they are generated when the snapshot is created. This is very misleading: you can see that this trace is more than 3 hours long, because the Lambda was invoked more than 3 hours after the snapshot version was created. (This is the current experience.)
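
The selection logic described above can be sketched roughly as follows. This is a minimal illustration and not the extension's actual code: `PlatformEvent`, `StartupKind`, `StartupSpan`, and `pick_startup_span` are hypothetical names, and the event payloads are simplified assumptions. The point it shows is that once any Restore event has been seen, the Init events (which date from snapshot creation) are ignored.

```rust
/// Hypothetical, simplified view of the four platform events discussed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PlatformEvent {
    InitStart { time_ms: u64 },
    InitReport { duration_ms: f64 },
    RestoreStart { time_ms: u64 },
    RestoreReport { duration_ms: f64 },
}

/// Which kind of startup span should be attached to the first invocation.
#[derive(Debug, PartialEq)]
enum StartupKind {
    ColdStart,
    SnapStartRestore,
}

/// Hypothetical span description: start timestamp plus reported duration.
#[derive(Debug, PartialEq)]
struct StartupSpan {
    kind: StartupKind,
    start_ms: u64,
    duration_ms: f64,
}

/// Picks the startup span for an invocation. If any Restore event is present,
/// the Init events are ignored, because they describe snapshot creation and
/// not the sandbox that served this invocation.
fn pick_startup_span(events: &[PlatformEvent]) -> Option<StartupSpan> {
    let (mut init_start, mut init_duration) = (None, None);
    let (mut restore_start, mut restore_duration) = (None, None);

    for event in events {
        match *event {
            PlatformEvent::InitStart { time_ms } => init_start = Some(time_ms),
            PlatformEvent::InitReport { duration_ms } => init_duration = Some(duration_ms),
            PlatformEvent::RestoreStart { time_ms } => restore_start = Some(time_ms),
            PlatformEvent::RestoreReport { duration_ms } => restore_duration = Some(duration_ms),
        }
    }

    let saw_restore = restore_start.is_some() || restore_duration.is_some();
    let (kind, start, duration) = if saw_restore {
        (StartupKind::SnapStartRestore, restore_start, restore_duration)
    } else {
        (StartupKind::ColdStart, init_start, init_duration)
    };

    // A span is only produced once both the start and the report have arrived.
    Some(StartupSpan {
        kind,
        start_ms: start?,
        duration_ms: duration?,
    })
}
```

With Init events from snapshot creation followed hours later by Restore events, this sketch yields a short restore span instead of a multi-hour init span.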

Testing

I deployed my own extension with the changes and confirmed we are now getting a restore span and not an init span, link.

@jchrostek-dd requested a review from a team as a code owner October 30, 2025 21:54
pub duration_ms: f64,
}

/// Restore report metrics
Contributor

Please add test coverage for new struct and new field above.

Contributor Author

Added.

@litianningdatadog
Contributor

Besides the unit tests, can we consider adding tests in bottlecap/tests/ to verify the following?
- SnapStart events create restore spans
- Restore spans are prioritized over cold start spans
- Metrics are correctly emitted
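
As a rough idea of what the metric check could look like, here is a minimal sketch. It does not use the extension's real metric pipeline: `RecordingSink`, `emit_restore_duration`, and the metric name `example.snapstart.restore_duration_ms` are all hypothetical stand-ins chosen only so that the assertion pattern is visible.

```rust
/// Hypothetical in-memory sink that records emitted metrics for assertions.
#[derive(Default)]
struct RecordingSink {
    points: Vec<(String, f64)>,
}

impl RecordingSink {
    /// Records one metric point as (name, value).
    fn gauge(&mut self, name: &str, value: f64) {
        self.points.push((name.to_string(), value));
    }
}

/// Hypothetical metric name, not the one used by the extension.
const RESTORE_DURATION_METRIC: &str = "example.snapstart.restore_duration_ms";

/// Emits the restore duration only when the report actually carried one.
fn emit_restore_duration(sink: &mut RecordingSink, duration_ms: Option<f64>) {
    if let Some(ms) = duration_ms {
        sink.gauge(RESTORE_DURATION_METRIC, ms);
    }
}
```

A test would then feed a restore report into the code under test and assert that exactly one point with the expected name and value was recorded.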

{
error!("Failed to send platform restore report to processor: {}", e);
}
}
Contributor

@litianningdatadog Nov 5, 2025

Can we log an error when metrics is None? That may indicate malformed data.
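
One possible shape for that check, as a minimal sketch: `RestoreReportMetrics`, its `duration_ms` field, and `restore_duration_or_log` are assumptions made for illustration, and `eprintln!` stands in for the `error!` macro used in the diff so that the snippet has no external dependencies.

```rust
/// Hypothetical stand-in for the restore report metrics struct in the diff.
#[derive(Debug, Clone, PartialEq)]
struct RestoreReportMetrics {
    duration_ms: f64,
}

/// Returns the restore duration, or logs an error and returns None when the
/// report carried no metrics (which may indicate a malformed event).
fn restore_duration_or_log(metrics: Option<&RestoreReportMetrics>) -> Option<f64> {
    match metrics {
        Some(m) => Some(m.duration_ms),
        None => {
            // The real code would likely use `error!` here instead.
            eprintln!("Platform restore report is missing metrics; event may be malformed");
            None
        }
    }
}
```

The caller can then skip span and metric creation when `None` is returned, while the log line leaves a trace of the unexpected payload.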

Contributor

@litianningdatadog left a comment

Left some comments.

@jchrostek-dd merged commit 290016e into main Nov 6, 2025
39 checks passed
@jchrostek-dd deleted the john/snapstart-span branch November 6, 2025 21:17

3 participants