Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use +@ instead of dup for duplicating strings #2704

Merged
merged 3 commits into from
Apr 11, 2023

Conversation

nirebu
Copy link
Contributor

@nirebu nirebu commented Mar 21, 2023

String#+@ was introduced in Ruby 2.3: it is faster and a bit cheaper on
the memory side as well than #dup. However, to retain the helper contract
we need to return "thawed" copies of the original values, so we need to
first call String#-@ to return a frozen copy, then call String#+@ to
effectively have an unfrozen copy of the original value.

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "benchmark-ips"
  gem "benchmark-memory"
end

Benchmark.ips do |x|
  x.report("+(-@)") { v = "foo"; +(-v) }
  x.report("dup") { v = "foo"; v.dup }

  x.compare!
end

Benchmark.memory do |x|
  x.report("+(-@)") { v = "foo"; +(-v) }
  x.report("dup") { v = "foo"; v.dup }

  x.compare!
end
Warming up --------------------------------------
               +(-@)   781.717k i/100ms
                 dup   711.791k i/100ms
Calculating -------------------------------------
               +(-@)      7.801M (± 0.8%) i/s -     39.086M in   5.010682s
                 dup      7.138M (± 0.7%) i/s -     36.301M in   5.085850s

Comparison:
               +(-@):  7800971.1 i/s
                 dup:  7138065.1 i/s - 1.09x  slower

Calculating -------------------------------------
               +(-@)    80.000  memsize (     0.000  retained)
                         2.000  objects (     0.000  retained)
                         1.000  strings (     0.000  retained)
                 dup    80.000  memsize (     0.000  retained)
                         2.000  objects (     0.000  retained)
                         1.000  strings (     0.000  retained)

Comparison:
               +(-@):         80 allocated
                 dup:         80 allocated - same

What does this PR do?
Changes how the gem duplicates frozen strings, in favor of a more performant method.

Motivation
I was benchmarking a client application and found the modified line as a source of retained strings. Even if I couldn't replicate the issue in the synthetic benchmark, this change reduced the number of retained strings coming from that allocation.

Additional Notes
Additional ref: rails/rails@1b86d90

How to test the change?

Please find above the code for benchmarking the change.

@nirebu nirebu requested a review from a team March 21, 2023 07:47
@github-actions github-actions bot added the core Involves Datadog core libraries label Mar 21, 2023
@TonyCTHsu
Copy link
Contributor

👋 @nirebu , I am aware that In Ruby 2.3 or later unary plus operator is faster than String#dup is documented. However, this would break our support policy for Ruby 2.1/2.2.

Our team is currently planning the deprecation for those old rubies, perhaps we would introduce the changes by then

@nirebu
Copy link
Contributor Author

nirebu commented Mar 21, 2023

@TonyCTHsu thanks for getting back. Totally agree with you, judging by the broken tests I can guess this is not the right time for this.

Closing this then :) Will get back to it in the future!

@nirebu nirebu closed this Mar 21, 2023
@marcotc
Copy link
Member

marcotc commented Mar 24, 2023

@nirebu, given this is a hot code path, could you add a guard around old ruby versions and actually implement this for newer rubies? Something like:

# Introduced in Ruby 2.3
if String.method_defined?(:+@)
  def self.frozen_or_dup(v)
    v.frozen? ? v : +v
  end
else
  def self.frozen_or_dup(v)
    v.frozen? ? v : v.dup
  end
end

@marcotc marcotc reopened this Mar 24, 2023
@marcotc marcotc added the performance Involves performance (e.g. CPU, memory, etc) label Mar 24, 2023
@nirebu
Copy link
Contributor Author

nirebu commented Mar 24, 2023

@nirebu, given this is a hot code path, could you add a guard around old ruby versions and actually implement this for newer rubies?

@marcotc It's a hot code path indeed, and thanks for the suggestion. I can take care of it.

I will take a look at the failing specs as well.

@nirebu nirebu changed the title Use @+ instead of dup for duplicating strings Use +@ instead of dup for duplicating strings Mar 26, 2023
@nirebu
Copy link
Contributor Author

nirebu commented Mar 26, 2023

@marcotc I've updated my PR with your suggestion, however the failing specs tell a different story: it seems that it is mandatory to have a different object_id on the frozen_or_dup return value.

Given how String#+@ returns self if the string is not frozen, and if this behaviour is desirable, the specs should be tweaked as well to just check the value instead of be_a_frozen_copy_of, which in turns expects different object_ids.

I have yet to dig in how these Strings are used downstream, but it seems that we want different objects given this comment.

EDIT: I've read #1783 's discussion, and it seems that changing the behaviour to use +@ is a breaking change. As far as I understood, users are able to modify spans, and having non-copies of those objects could lead to some undesirable side effects. I say this because we would eventually call freeze on something that the library doesn't expect to become frozen if it wasn't in the first place.

@marcotc
Copy link
Member

marcotc commented Mar 27, 2023

Hey, @nirebu you are completely right, and sorry for us not catching this earlier.

We could technically use -@ to return a frozen copy of the String or itself if already frozen: it would provide us the same level of confidence that the String value won't change on us during processing.
This is a semantic change, but if all tests pass it might be promising.

One issue might be related to users of the the Processing Pipeline, given you are allowed to arbitrarily modify the span: it's possible users are performing modifying operation on Strings belonging to a trace or span.

@nirebu nirebu force-pushed the nirebu/avoid-string-dup branch 2 times, most recently from cf257fe to 615ebe5 Compare March 28, 2023 08:10
@nirebu
Copy link
Contributor Author

nirebu commented Mar 28, 2023

@marcotc I've updated my PR (and main comment). The full explanation is in the commit message and main comment, however I think I've found a way to squeeze a bit more performance while keeping the same contract for the helper: if v is frozen, return it, otherwise return a non-frozen copy (which may end up or not end up frozen where it's used).

Another optimisation I can think of would be to have a specialised helper to use in theDatadog::Tracing::Correlation::Identifier class, where the results are then frozen again. I've added a commit to add this, let me know what you think.

# String#+@ was introduced in Ruby 2.3
if String.method_defined?(:+@) && String.method_defined?(:-@)
def self.frozen_or_dup(v)
v.frozen? ? v : +(-v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nirebu, can you leave a comment here explaining why +(-v) is being done here? I think this will raise questions in the future when someone stumbles upon this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcotc added!

@marcotc
Copy link
Member

marcotc commented Mar 28, 2023

@nirebu, the Datadog::Tracing::Correlation::Identifier changes look good, thank you!

String#+@ was introduced in Ruby 2.3: it is faster and a bit cheaper on
the memory side as well than #dup. However, to retain the helper contract
we need to return "thawed" copies of the original values, so we need to
first call String#-@ to return a frozen copy, then call String#+@ to
effectively have an unfrozen copy of the original value.

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "benchmark-ips"
  gem "benchmark-memory"
end

Benchmark.ips do |x|
  x.report("+(-@)") { v = "foo"; +(-v) }
  x.report("dup") { v = "foo"; v.dup }

  x.compare!
end

Benchmark.memory do |x|
  x.report("+(-@)") { v = "foo"; +(-v) }
  x.report("dup") { v = "foo"; v.dup }

  x.compare!
end

Warming up --------------------------------------
               +(-@)   781.717k i/100ms
                 dup   711.791k i/100ms
Calculating -------------------------------------
               +(-@)      7.801M (± 0.8%) i/s -     39.086M in   5.010682s
                 dup      7.138M (± 0.7%) i/s -     36.301M in   5.085850s

Comparison:
               +(-@):  7800971.1 i/s
                 dup:  7138065.1 i/s - 1.09x  slower

Calculating -------------------------------------
               +(-@)    80.000  memsize (     0.000  retained)
                         2.000  objects (     0.000  retained)
                         1.000  strings (     0.000  retained)
                 dup    80.000  memsize (     0.000  retained)
                         2.000  objects (     0.000  retained)
                         1.000  strings (     0.000  retained)

Comparison:
               +(-@):         80 allocated
                 dup:         80 allocated - same
In the Datadog::Tracing::Correlation::Identifier the unfrozen copies of
strings are then frozen again: that's a step that can be skipped if the
original value is already frozen, so we can save allocations.

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "benchmark-ips"
  gem "benchmark-memory"
end

Benchmark.ips do |x|
  x.report("+(-@).freeze") { v = "foo"; c = +(-v).freeze }
  x.report("-v") { v = "foo"; c = -v }

  x.compare!
end

Benchmark.memory do |x|
  x.report("+(-@).freeze") { v = "foo"; c = +(-v).freeze }
  x.report("-v") { v = "foo"; c = -v }

  x.compare!
end

Warming up --------------------------------------
        +(-@).freeze   691.075k i/100ms
                  -v   886.370k i/100ms
Calculating -------------------------------------
        +(-@).freeze      6.965M (± 0.5%) i/s -     35.245M in   5.060545s
                  -v      8.839M (± 0.6%) i/s -     44.318M in   5.014288s

Comparison:
                  -v:  8838792.7 i/s
        +(-@).freeze:  6964819.0 i/s - 1.27x  slower

Calculating -------------------------------------
        +(-@).freeze    80.000  memsize (     0.000  retained)
                         2.000  objects (     0.000  retained)
                         1.000  strings (     0.000  retained)
                  -v    40.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         1.000  strings (     0.000  retained)

Comparison:
                  -v:         40 allocated
        +(-@).freeze:         80 allocated - 2.00x more
@nirebu
Copy link
Contributor Author

nirebu commented Mar 29, 2023

@marcotc I should have addressed all feedback.

@nirebu
Copy link
Contributor Author

nirebu commented Apr 5, 2023

@marcotc is there anything else I can do here?

Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much, @nirebu! 🙇

@marcotc marcotc merged commit 4ce935e into DataDog:master Apr 11, 2023
152 checks passed
@github-actions github-actions bot added this to the 1.11.0 milestone Apr 11, 2023
@nirebu nirebu deleted the nirebu/avoid-string-dup branch April 12, 2023 05:49
@lloeki lloeki removed this from the 1.11.0 milestone Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries performance Involves performance (e.g. CPU, memory, etc) tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants