Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUM-1040 Stop Core Instance #1541

Merged
merged 8 commits into from
Jan 15, 2024
Merged

Conversation

maxep
Copy link
Member

@maxep maxep commented Nov 7, 2023

What and why?

Add public interface for stopping core instance of the SDK. Stopping the core will deallocate all Features and their storage & upload units.

How?

In parity with Android, the stopInstance static method will stop and unregister all feature of the core. Their storage and upload process will be deallocated so no event-write-context and upload will execute.

Note

This change was a good opportunity to streamline flushing mechanism of the core (e.g. #1522). But the number of GDC queues involved in the core and Features made the task very tedious with high risk of introducing flakiness. I believe we should continue decreasing the number of queues, by using exclusion-queue as described in WWDC 2017 - Modernizing Grand Central Dispatch Usage, with clear and efficient queue hierarchy. We should do so after building continuous benchmark to make sure we don't decrease performances.

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

  • Run unit tests for Core, RUM, Trace, Logs, CR and WVT
  • Run unit tests for Session Replay
  • Run integration tests
  • Run smoke tests
  • Run tests for tools/

@maxep maxep self-assigned this Nov 7, 2023
@maxep maxep requested a review from a team as a code owner November 7, 2023 12:57
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Nov 7, 2023

Datadog Report

Branch report: maxep/RUM-1040/stop-core-instance
Commit report: 451d541

dd-sdk-ios: 0 Failed, 0 New Flaky, 2415 Passed, 0 Skipped, 4m 44.84s Wall Time

maciejburda
maciejburda previously approved these changes Nov 8, 2023
Copy link
Contributor

@maciejburda maciejburda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@maxep maxep marked this pull request as draft November 13, 2023 09:34
@maxep maxep force-pushed the maxep/RUM-1040/stop-core-instance branch from 451d541 to a768e3e Compare November 15, 2023 10:00
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Nov 15, 2023

Datadog Report

Branch report: maxep/RUM-1040/stop-core-instance
Commit report: a768e3e

dd-sdk-ios: 0 Failed, 0 New Flaky, 2419 Passed, 0 Skipped, 4m 49.62s Wall Time

@maxep maxep force-pushed the maxep/RUM-1040/stop-core-instance branch from a768e3e to 4774bc5 Compare January 8, 2024 16:28
@maxep maxep marked this pull request as ready for review January 8, 2024 16:29
@maxep maxep requested a review from a team as a code owner January 8, 2024 16:29
@maxep maxep force-pushed the maxep/RUM-1040/stop-core-instance branch 2 times, most recently from 9534e0c to 33373b7 Compare January 9, 2024 18:37
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Jan 9, 2024

Datadog Report

Branch report: maxep/RUM-1040/stop-core-instance
Commit report: 6066c10
Test service: dd-sdk-ios

✅ 0 Failed, 2545 Passed, 0 Skipped, 25m 10s Wall Time
🔻 Test Sessions change in coverage: 6 decreased, 5 increased, 3 no change

🔻 Code Coverage Decreases vs Default Branch (6)

This report shows up to 5 code coverage decreases.

  • test DatadogLogsTests iOS 44.74% (Δ-0.15%) - Details
  • test DatadogLogsTests tvOS 44.80% (Δ-0.15%) - Details
  • test DatadogInternalTests tvOS 81.21% (Δ-0.06%) - Details
  • test DatadogWebViewTrackingTests iOS 23.05% (Δ-0.02%) - Details
  • test DatadogCrashReportingTests tvOS 29.03% (Δ-0.02%) - Details

@maxep maxep force-pushed the maxep/RUM-1040/stop-core-instance branch from 33373b7 to bbbe421 Compare January 10, 2024 09:12
@maxep
Copy link
Member Author

maxep commented Jan 10, 2024

This PR is finally ready for review, it has increased in size due to the added integration tests.

Comment on lines +50 to +57
def do_DELETE(self):
"""
Routes all incoming DELETE requests
"""
self.__route([
(r"/requests$", self.__DELETE_requests),
])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding this endpoint to remove recorded requests so we can clear the mock server before restarting the core instance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👌

@@ -39,11 +39,6 @@ private class CompositedURLSessionDelegate: NSObject, URLSessionTaskDelegate, UR
}
}

/// An example of instrumenting existing `URLSessionDelegate` with `DDURLSessionDelegate` through inheritance.
private class CustomURLSessionDelegate: NSObject, URLSessionDataDelegate {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving to its own file with internal accessor for re-use in stop-core-instance test.

@maxep maxep force-pushed the maxep/RUM-1040/stop-core-instance branch from bbbe421 to 495c05b Compare January 10, 2024 11:07
CHANGELOG.md Outdated Show resolved Hide resolved
@@ -489,13 +497,10 @@ public enum Datadog {
#endif

internal static func internalFlushAndDeinitialize(instanceName: String = CoreRegistry.defaultInstanceName) {
assert(CoreRegistry.instance(named: instanceName) is DatadogCore, "SDK must be first initialized.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no assert anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember why I removed it TBH! probably in an attempt to flush before stopping the core. Now, I don't see why we assert initialisation here 🤔

DatadogCore/Sources/Datadog.swift Show resolved Hide resolved
@maxep maxep force-pushed the maxep/RUM-1040/stop-core-instance branch from 495c05b to 1f31222 Compare January 10, 2024 12:56
Copy link
Contributor

@ganeshnj ganeshnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, can we do some testing that no objects are being leaked when the core instance is stopped. Ideally we want to have memory graph something like

     ...................
..../                   \......
 start               stop

Comment on lines 264 to 318
scope.eventWriteContext { context, writer in
writer.write(value: FeatureMock.Event(event: "should not be sent"))
}

core.stop()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question/ This looks flaky ❄️ , doesn't it? If it takes more than "min file age" from writing an event to stopping the core, data will be uploaded. What is the rationale here?

Copy link
Member Author

@maxep maxep Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I can remove the first EWC. Rationale was to assert that no upload is triggered after stopping the core, even with event in the pipe.

Comment on lines +50 to +57
def do_DELETE(self):
"""
Routes all incoming DELETE requests
"""
self.__route([
(r"/requests$", self.__DELETE_requests),
])

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👌

@ncreated
Copy link
Collaborator

ncreated commented Jan 11, 2024

This looks good, can we do some testing that no objects are being leaked when the core instance is stopped. Ideally we want to have memory graph something like

     ...................
..../                   \......
 start               stop

I agree with @ganeshnj ☝️ that we must test SDK allocations. However, I don't think the stop() model proposed in this PR delivers this promise. It is rather "hard stop" implementation that only deallocates DatadogCore instance without awaiting pending asynchronous operations that may be still keeping it. Correct me if I'm wrong @maxep

@maxep maxep force-pushed the maxep/RUM-1040/stop-core-instance branch from 27efbdc to 3324a70 Compare January 11, 2024 13:12
@maxep
Copy link
Member Author

maxep commented Jan 11, 2024

@ncreated, @ganeshnj That's a good call!
It is indeed a hard stop that deallocates the core and its features, it does not wait for async operations (same in Android btw), see the note in the PR description. I will run some profiling 👍

Copy link
Collaborator

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done 👌. Great work on fixing the memory profile 💪.

Comment on lines +311 to +312
let core = CoreRegistry.unregisterInstance(named: instanceName) as? DatadogCore
core?.stop()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question/ Why not making stop() a requirement in DatadogCoreProtocol? I have no opinion on this (both may be fine), just wondering on what was the rationale.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stop is only required from high level apis, there is no usage from Features so there no need for making it part of the protocol. The requirement might change in the future tho!

@maxep maxep merged commit 6b0928a into develop Jan 15, 2024
9 checks passed
@maxep maxep deleted the maxep/RUM-1040/stop-core-instance branch January 15, 2024 11:18
@ncreated ncreated mentioned this pull request Jan 25, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants