Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUM-1837 Update logic to send N batches sequentially in each cycle #1531

Merged
merged 30 commits into from
Nov 17, 2023

Conversation

maciejburda
Copy link
Contributor

@maciejburda maciejburda commented Oct 25, 2023

What and why?

Modifies the core logic to get a list of batches instead of single batch. It's done through a new configuration called BatchProcessingLevel that allows controlling the amount of batches processed sequentially without a delay within one reading/uploading cycle. Currently it exposed 3 levels: low, medium and high that translate to 1, 10 and 100 of batches processed. By default it's taking up to 10 batches in a cycle.

This logic improves the data upload when batch back pressure occurs.

How?

Configuration is delivered to DataUploadWorker which implementation was modified to read up to X batches instead of single batch.

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

  • Run unit tests for Core, RUM, Trace, Logs, CR and WVT
  • Run unit tests for Session Replay
  • Run integration tests
  • Run smoke tests
  • Run tests for tools/

@maciejburda maciejburda force-pushed the maciey/RUM-1837-batch-uploading-modification branch 3 times, most recently from 804a108 to dceec59 Compare October 30, 2023 11:23
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Oct 30, 2023

Datadog Report

Branch report: maciey/RUM-1837-batch-uploading-modification
Commit report: 6f1660e

dd-sdk-ios: 1 Failed (0 Known Flaky), 0 New Flaky, 2705 Passed, 0 Skipped, 25m 13.94s Wall Time

❌ Failed Tests (1)

  • testRUMStopSessionScenario - RUMStopSessionScenarioTests

@maciejburda maciejburda marked this pull request as ready for review October 30, 2023 11:59
@maciejburda maciejburda requested a review from a team as a code owner October 30, 2023 11:59
Copy link
Collaborator

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this improvement be measured? During backlog grooming we discussed adding "batch processing level" to "Batch Deleted" telemetry, so we can correlate batch delivery with certain configuration. This part of definition of done seems lacking.

CHANGELOG.md Outdated Show resolved Hide resolved
@maciejburda maciejburda force-pushed the maciey/RUM-1837-batch-uploading-modification branch from c69045b to 53b36cc Compare November 6, 2023 11:47
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Nov 6, 2023

Datadog Report

Branch report: maciey/RUM-1837-batch-uploading-modification
Commit report: 49400c8

dd-sdk-ios: 0 Failed, 0 New Flaky, 2708 Passed, 0 Skipped, 27m 2.58s Wall Time

Copy link
Member

@maxep maxep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I see room for improvement in perfs: our DataUploader uses a semaphore which will be blocking the thread. So I think we should balance the load instead of uploading in a loop. I left a suggestion in comment, LMKWYT!

DatadogCore/Sources/Core/Storage/Reading/Reader.swift Outdated Show resolved Hide resolved
DatadogCore/Sources/Core/Upload/DataUploadWorker.swift Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
DatadogCore/Sources/Core/Storage/Reading/Reader.swift Outdated Show resolved Hide resolved
DatadogCore/Sources/Core/Upload/DataUploadWorker.swift Outdated Show resolved Hide resolved
DatadogCore/Sources/Core/Upload/DataUploadWorker.swift Outdated Show resolved Hide resolved
@maciejburda
Copy link
Contributor Author

maciejburda commented Nov 10, 2023

@maxep @ncreated
I've made RAM tests using log pressure example from:
#185

10 batches:
10-batches

develop:
develop

Since files are symlinks turned into data just before upload synchronously - we just see extended allocation, but it's not a spike. In my tests it was way faster to upload everything when using 10-batches - which is a goal of the PR.

Using multiple DispatchWorkItem will be cumbersome as we rely on plenty of different states. I actually explored this solution before jumping into a loop, and I would advice strongly against it. Blocking the thread is in my opinion fine. DataUploadWorker was designed in a way where it's in control of it's own work and since DataUploader is using semaphore it's playing nicely together in a synchronous fashion. Note that we only schedule next DispatchWorkItem after we finish current one.

I think that there is a potential in refactoring this into a OperationQueue, but it will become quite a big change. I think it's better iterate on existing setup first.

@maciejburda maciejburda requested a review from a team as a code owner November 13, 2023 17:36
@maciejburda
Copy link
Contributor Author

maciejburda commented Nov 13, 2023

@maxep @ncreated I refactored the code to use multiple DispatchWorkItem maintaining all the designed behaviours. I added ability to cancel, which can come in handy when we want to restart the flow under certain conditions (i.e. when app goes into background).

Please take another look 🙏

@ncreated ncreated self-requested a review November 14, 2023 13:12
@maciejburda maciejburda force-pushed the maciey/RUM-1837-batch-uploading-modification branch from 4104f67 to daa7235 Compare November 14, 2023 14:24
@maciejburda maciejburda changed the title RUM-1837 Update logic to send 10 batches sequentially in each cycle RUM-1837 Update logic to send N batches sequentially in each cycle Nov 14, 2023
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Nov 14, 2023

Datadog Report

Branch report: maciey/RUM-1837-batch-uploading-modification
Commit report: af5ac06

❄️ dd-sdk-ios: 1 Failed (0 Known Flaky), 1 New Flaky, 2355 Passed, 0 Skipped, 4m 56.32s Wall Time

❌ Failed Tests (1)

  • testWhenCPUUnderHeavyLoadItMeasuresHigherCPUTicks - VitalCPUReaderTest - Details

    Expand for error
     Assertion Failure at VitalCPUReaderTests.swift:24: XCTAssertGreaterThan failed: ("188.20427311010408") is not greater than ("206.39051796067693")
    

New Flaky Tests (1)

  • testWhenDataIsBeingUploaded_itPrintsUploadProgressInformation - DataUploadWorkerTests - Last Failure

    Expand for error
     
     Assertion Failure at DataUploadWorkerTests.swift:375: XCTAssertEqual failed: ("3") is not equal to ("2")
     Assertion Failure at DataUploadWorkerTests.swift:377: XCTAssertEqual failed: ("NTP time synchronization completed.
     Server time will be used for signing events (0.015s difference with device time).") is not equal to ("⏳ (DlKtZ0nALW) Uploading batches...") - Batch start information should be printed to \`userLogger\`. All captured logs:
     [(level: DatadogInternal.CoreLoggerLevel.debug, message: "NTP time synchronization completed.\nServer time will be used for signing events (0.015s difference with device time).", error: nil), (level: DatadogInternal.CoreLoggerLevel.debug, message: "⏳ (DlKtZ0nALW) Uploading batches...", error: nil), (level: DatadogInternal.CoreLoggerLevel.debug, message: "   → (DlKtZ0nALW) not delivered, will be retransmitted: xeKlmUsPWx", error: nil)]
     Assertion Failure at DataUploadWorkerTests.swift:383: XCTAssertEqual failed: ("⏳ (DlKtZ0nALW) Uploading batches...") is not equal to ("   → (DlKtZ0nALW) not delivered, will be retransmitted: xeKlmUsPWx") - Batch completion information should be printed to \`userLogger\`. All captured logs:
     [(level: DatadogInternal.CoreLoggerLevel.debug, message: "NTP time synchronization completed.\nServer time will be used for signing events (0.015s difference with device time).", error: nil), (level: DatadogInternal.CoreLoggerLevel.debug, message: "⏳ (DlKtZ0nALW) Uploading batches...", error: nil), (level: DatadogInternal.CoreLoggerLevel.debug, message: "   → (DlKtZ0nALW) not delivered, will be retransmitted: xeKlmUsPWx", error: nil)]
    

Copy link
Member

@maxep maxep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @maciejburda 👏
I've left one minor comment, but it looks great!

DatadogCore/Sources/Core/Upload/DataUploadWorker.swift Outdated Show resolved Hide resolved
maxep
maxep previously approved these changes Nov 16, 2023
Copy link
Member

@maxep maxep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the updates 🙏

@maciejburda maciejburda merged commit 7ee772b into develop Nov 17, 2023
21 checks passed
@maciejburda maciejburda deleted the maciey/RUM-1837-batch-uploading-modification branch November 17, 2023 11:02
@maciejburda maciejburda mentioned this pull request Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants