Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: Fixes encoded strings performance #2056

Merged
merged 5 commits into from
Dec 17, 2020

Conversation

fnarenji
Copy link
Contributor

@fnarenji fnarenji commented Dec 10, 2020

Pull Request Template

Description

This PR optimizes encoded string performance by:

  • improve the algorithm that identifies the charset to remove redundant loops in TryEncodeCompressedString
  • keep track of the offsets at which string reference indexes are stored to avoid a full-pass over all values of the document (& cuts allocations by 50%)

Before:

Method Serializer EnableEncodedStrings Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Write Text False 126.0 ms 2.44 ms 2.81 ms 1000.0000 - - 75.67 MB
Write Text True 121.8 ms 2.28 ms 2.02 ms 1000.0000 - - 75.67 MB
Write BinaryNaive False 241.4 ms 1.64 ms 1.37 ms 1000.0000 - - 82.24 MB
Write BinaryNaive True 515.9 ms 5.00 ms 4.43 ms 3000.0000 - - 183.48 MB
Write BinaryBlitting False 150.8 ms 3.01 ms 7.21 ms 1000.0000 - - 71.71 MB
Write BinaryBlitting True 415.8 ms 3.27 ms 3.06 ms 3000.0000 - - 172.95 MB

After:

Method Serializer EnableEncodedStrings Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
Write Text False 123.2 ms 1.96 ms 1.74 ms 122.7 ms 1000.0000 - - 75.67 MB
Write Text True 121.3 ms 0.78 ms 0.65 ms 121.0 ms 1000.0000 - - 75.67 MB
Write BinaryNaive False 249.7 ms 4.97 ms 8.30 ms 245.8 ms 1000.0000 - - 86.15 MB
Write BinaryNaive True 278.5 ms 4.75 ms 4.45 ms 278.4 ms 1000.0000 - - 99.1 MB
Write BinaryBlitting False 147.2 ms 2.89 ms 4.67 ms 146.7 ms 1000.0000 - - 73.54 MB
Write BinaryBlitting True 178.1 ms 3.51 ms 5.14 ms 177.8 ms 1000.0000 - - 86.49 MB

BinaryNaive exclusively uses public SDK primitives, whereas BinaryBlitting uses the typed JSON shortcuts. The EnableEncodedStrings has no impact on Text run.

Benchmark Updates

Text Navigator to Binary Writer

Before
Method payload sourceFormat destinationFormat Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
NavigatorToWriter NutritionData Text Binary 34,552.14 us 619.581 us 579.557 us 1000.0000 333.3333 - 5572.44 KB
After
Method payload sourceFormat destinationFormat Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
NavigatorToWriter NutritionData Text Binary 24,779.95 us 515.386 us 755.445 us 875.0000 500.0000 125.0000 5053.33 KB

30% reduction in time and 10% reduction in memory

Binary String Write MicroBenchmark

Before
Method namedWriteDelegate benchmarkSerializationFormat Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ExecuteWriteMicroBenchmark string Binary 1,023.25 ms 9.114 ms 8.525 ms 35000.0000 2000.0000 2000.0000 145.39 MB
After
Method namedWriteDelegate benchmarkSerializationFormat Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ExecuteWriteMicroBenchmark string Binary 452.70 ms 8.968 ms 15.229 ms 36000.0000 2000.0000 2000.0000 161.39 MB

half the time

Type of change

Closing issues

To automatically close an issue: closes #IssueNumber

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the required format: "[Internal] Category: (Adds|Fixes|Refactors) Description"

Examples:
Diagnostics: Adds GetElapsedClientLatency to CosmosDiagnostics
PartitionKey: Fixes null reference when using default(PartitionKey)
[v4] Client Encryption: Refactors code to external project
[Internal] Query: Adds code generator for CosmosNumbers for easy additions in the future.

@fnarenji fnarenji force-pushed the users/flnarenj/m/encstrperf branch 2 times, most recently from 2588775 to f654b42 Compare December 10, 2020 08:29
@fnarenji fnarenji changed the title Internal Performance: adds performance improvement to encoded strings Internal Performance: Adds performance improvement to encoded strings Dec 10, 2020
@fnarenji fnarenji changed the title Internal Performance: Adds performance improvement to encoded strings Internal Performance: Fixes encoded strings performance Dec 10, 2020
bchong95
bchong95 previously approved these changes Dec 16, 2020
Copy link
Contributor

@bchong95 bchong95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks good to me. Once I get back to a computer I will run the benchmarks on my machine and update the baselines.

Thanks for making the change :)

@ealsur
Copy link
Member

ealsur commented Dec 16, 2020

This is not Internal though, right? If it would benefit end users, so can we change the title?

@fnarenji fnarenji changed the title Internal Performance: Fixes encoded strings performance Performance: Fixes encoded strings performance Dec 16, 2020
@bchong95
Copy link
Contributor

This is not Internal though, right? If it would benefit end users, so can we change the title?

No user uses binary writer, so it's still internal.

Copy link
Contributor

@bchong95 bchong95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the change. Very excited for the 2x speed up on the binary string write path!

@sboshra sboshra merged commit 9415200 into master Dec 17, 2020
@sboshra sboshra deleted the users/flnarenj/m/encstrperf branch December 17, 2020 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants