Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rule preferring usage of Char over String if possible for single character Strings #90

Closed
Sija opened this issue Feb 5, 2019 · 8 comments
Assignees
Labels

Comments

@Sija
Copy link
Member

Sija commented Feb 5, 2019

In many cases single character Strings could be safely replaced by Chars ("." => '.'), e.g.:

- "foo/bar".index("/")
+ "foo/bar".index('/')

⚡️ Benchmark:

require "benchmark"

Benchmark.ips do |x|
  io1 = IO::Memory.new
  io2 = IO::Memory.new

  x.report("<< Char") do
    io1 << 'a' << 'b' << 'c'
  end
  x.report("<< String") do
    io2 << "a" << "b" << "c"
  end
end
~ ❯ crystal run x.cr --release                                                                                                                                                                            
  << Char  47.06M ( 21.25ns) (±23.32%)       fastest
<< String  29.56M ( 33.83ns) (±17.99%)  1.59× slower

~ ❯ crystal run x.cr --release
  << Char  54.37M ( 18.39ns) (±31.44%)       fastest
<< String  28.09M ( 35.61ns) (±15.39%)  1.94× slower

~ ❯ crystal run x.cr --release
  << Char   48.6M ( 20.58ns) (±23.82%)       fastest
<< String  28.97M ( 34.51ns) (±17.46%)  1.68× slower

Several PRs regarding this, made by yours truly:

@veelenga veelenga self-assigned this Feb 6, 2019
@straight-shoota
Copy link
Contributor

Using a Char instead of String is not always better and it's impossible to detect automatically.

@veelenga
Copy link
Member

veelenga commented Feb 7, 2019

@straight-shoota could you please share an example when it is not better?

@Sija
Copy link
Member Author

Sija commented Feb 7, 2019

@straight-shoota I'm quite surprised but from a preliminary testing I just did it seems that in some cases using Char is actually slower 😕 It's very much implementation dependent and so it varies from method to method, but you're certainly right in saying it's not always the case...

Benchmark of String#index for example:

  String#index(Char)  206.8M (  4.84ns) (±11.90%)  0 B/op   1.72× slower
String#index(String) 355.61M (  2.81ns) (±14.81%)  0 B/op        fastest

@Sija
Copy link
Member Author

Sija commented Feb 7, 2019

Whoaaa, there's sth srsly wrong with some of the methods accepting Char, see this:

require "benchmark"

N = 10_000
str = "#{"x" * 100}/#{"y" * 100}@#{"z" * 100}"

Benchmark.ips do |x|
  x.report("String#includes?(Char)") do
    N.times { str.includes?('@') }
  end
  x.report("String#includes?(String)") do
    N.times { str.includes?("@") }
  end
end
  String#includes?(Char)  63.03k ( 15.87µs) (± 3.61%)  0 B/op  5622.70× slower
String#includes?(String) 354.38M (  2.82ns) (±16.05%)  0 B/op          fastest

String#includes? calls String#index under-the-hood, which seems to be really under-optimized for Char (calling String#each_char_with_index which in turn calls String#each_char, which creates an instance of CharIterator). What's odd is that benchmarking String#index shows difference of several magnitudes lower that String#includes?...

@straight-shoota
Copy link
Contributor

@Sija The difference of several magnitudes is very likely caused by an unintended performance optimization. 2.82 ns is way to fast for 10.000 String#includes?(String) calls, it's more likely the run time of only a single call.

Note that String#index(Char) on a string with only ASCII characters (and the search character also being ASCII) does not use a each_char but works directly on Slice#index. Otherwise there would be some allocations for the CharIterator.

@Sija
Copy link
Member Author

Sija commented Feb 7, 2019

The difference of several magnitudes is very likely caused by an unintended performance optimization. 2.82 ns is way to fast for 10.000 String#includes?(String) calls, it's more likely the run time of only a single call.

Seems you're right, removing N.times {} yields more understandable results:

  String#includes?(Char) 231.21M (  4.33ns) (±12.16%)  0 B/op   1.84× slower
String#includes?(String) 425.72M (  2.35ns) (±13.37%)  0 B/op        fastest

Note that String#index(Char) on a string with only ASCII characters (and the search character also being ASCII) does not use a each_char but works directly on Slice#index. Otherwise there would be some allocations for the CharIterator.

True that, although creating Slice is still less performant than operating on a raw pointer as String#index(String) does.

@veelenga
Copy link
Member

@Sija @straight-shoota what is your suggestion here? Should we force using char over string in some cases or not?

@veelenga
Copy link
Member

Closing this for now, @Sija be free to re-open for further discussion.

@Sija Sija added rule and removed feature labels Oct 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants