Crystal version #20

Closed: wants to merge 12 commits into base: master

3 participants

akitaonrails commented Jun 22, 2016

This takes the original C-based fast_blank and adds a condition to extconf.rb that compiles a Crystal version instead. Everything else (except for a couple of specs) should work just the same.

The goal:

  • to see if the Crystal version is competitive with the C version
  • if the numbers are competitive, it should clearly be easier to write and maintain a Crystal version than a C one
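The extconf.rb switch described above can be pictured with a small Ruby sketch. It is not the PR's actual file: the helper name `configure_fast_blank` and the hand-written `Makefile.crystal` are hypothetical, and the two tool checks are inferred from the `checking for crystal... yes` / `checking for llvm-config... yes` lines that extconf.rb prints later in this thread. The mkmf calls are passed in as callables so the decision logic can be exercised without a Crystal toolchain.

```ruby
require "fileutils"

# Hypothetical name of a hand-written Makefile that drives the Crystal build.
CRYSTAL_MAKEFILE = "Makefile.crystal"

# Picks the Crystal build when both tools are on the PATH, the C build otherwise.
#   finder   - callable returning a path or nil (mkmf's find_executable in extconf.rb)
#   fallback - callable that writes the regular C Makefile (e.g. create_makefile)
#   dir      - directory containing the hand-written Makefile
def configure_fast_blank(finder, fallback, dir: Dir.pwd)
  if finder.call("crystal") && finder.call("llvm-config")
    # Both tools found: use the hand-written Crystal Makefile as-is.
    FileUtils.cp(File.join(dir, CRYSTAL_MAKEFILE), File.join(dir, "Makefile"))
    puts "Crystal version of the Makefile copied"
    :crystal
  else
    # Otherwise fall back to the normal mkmf-generated C build.
    fallback.call
    :c
  end
end
```

In a real extconf.rb, after `require "mkmf"`, the call might look like `configure_fast_blank(method(:find_executable), -> { create_makefile("fast_blank") })`; mkmf's `find_executable` is what prints the `checking for ...` lines.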

I ran the benchmarks on my machine (Ubuntu 14.04 64-bit, Ruby 2.3.1, Crystal 0.18.2), running both the C and the Crystal benchmarks on the same machine for a fair comparison. These are the numbers:

C Version benchmark:

================== Test String Length: 0 ==================
Calculating -------------------------------------
          Fast Blank   175.955k i/100ms
  Fast ActiveSupport   177.386k i/100ms
          Slow Blank    84.206k i/100ms
      New Slow Blank   173.732k i/100ms
-------------------------------------------------
          Fast Blank     26.134M (± 6.1%) i/s -    129.855M
  Fast ActiveSupport     25.632M (±10.0%) i/s -    125.767M
          Slow Blank      1.633M (± 6.6%) i/s -      8.168M
      New Slow Blank     19.162M (±11.6%) i/s -     94.336M

Comparison:
          Fast Blank: 26134380.9 i/s
  Fast ActiveSupport: 25632474.8 i/s - 1.02x slower
      New Slow Blank: 19162018.3 i/s - 1.36x slower
          Slow Blank:  1632519.4 i/s - 16.01x slower


================== Test String Length: 6 ==================
Calculating -------------------------------------
          Fast Blank   126.348k i/100ms
  Fast ActiveSupport   148.406k i/100ms
          Slow Blank    64.285k i/100ms
      New Slow Blank    93.053k i/100ms
-------------------------------------------------
          Fast Blank      9.335M (± 6.6%) i/s -     46.370M
  Fast ActiveSupport     10.306M (± 4.3%) i/s -     51.497M
          Slow Blank      1.428M (± 6.9%) i/s -      7.136M
      New Slow Blank      2.153M (± 3.8%) i/s -     10.794M

Comparison:
  Fast ActiveSupport: 10305820.1 i/s
          Fast Blank:  9334875.2 i/s - 1.10x slower
      New Slow Blank:  2153314.6 i/s - 4.79x slower
          Slow Blank:  1428318.5 i/s - 7.22x slower


================== Test String Length: 14 ==================
Calculating -------------------------------------
          Fast Blank   171.908k i/100ms
  Fast ActiveSupport   171.954k i/100ms
          Slow Blank   111.441k i/100ms
      New Slow Blank    82.109k i/100ms
-------------------------------------------------
          Fast Blank     17.847M (±12.4%) i/s -     87.673M
  Fast ActiveSupport     19.275M (± 8.5%) i/s -     95.434M
          Slow Blank      3.201M (± 5.0%) i/s -     16.048M
      New Slow Blank      1.610M (± 5.9%) i/s -      8.047M

Comparison:
  Fast ActiveSupport: 19275233.9 i/s
          Fast Blank: 17847351.8 i/s - 1.08x slower
          Slow Blank:  3201263.8 i/s - 6.02x slower
      New Slow Blank:  1610193.5 i/s - 11.97x slower


================== Test String Length: 24 ==================
Calculating -------------------------------------
          Fast Blank   161.284k i/100ms
  Fast ActiveSupport   156.552k i/100ms
          Slow Blank   107.136k i/100ms
      New Slow Blank    75.888k i/100ms
-------------------------------------------------
          Fast Blank     11.639M (± 7.0%) i/s -     57.740M
  Fast ActiveSupport     11.571M (±10.8%) i/s -     56.985M
          Slow Blank      2.772M (± 5.5%) i/s -     13.821M
      New Slow Blank      1.489M (± 5.3%) i/s -      7.437M

Comparison:
          Fast Blank: 11638605.2 i/s
  Fast ActiveSupport: 11571159.9 i/s - 1.01x slower
          Slow Blank:  2771977.6 i/s - 4.20x slower
      New Slow Blank:  1488644.4 i/s - 7.82x slower


================== Test String Length: 136 ==================
Calculating -------------------------------------
          Fast Blank   159.586k i/100ms
  Fast ActiveSupport   160.982k i/100ms
          Slow Blank   106.354k i/100ms
      New Slow Blank    77.455k i/100ms
-------------------------------------------------
          Fast Blank     11.774M (± 5.9%) i/s -     58.568M
  Fast ActiveSupport     11.237M (±12.9%) i/s -     55.056M
          Slow Blank      2.546M (±13.4%) i/s -     12.550M
      New Slow Blank      1.282M (±16.4%) i/s -      6.274M

Comparison:
          Fast Blank: 11773722.8 i/s
  Fast ActiveSupport: 11237239.4 i/s - 1.05x slower
          Slow Blank:  2545953.1 i/s - 4.62x slower
      New Slow Blank:  1282478.2 i/s - 9.18x slower

Crystal version benchmark:

================== Test String Length: 0 ==================
Calculating -------------------------------------
          Fast Blank    93.058k i/100ms
  Fast ActiveSupport    90.831k i/100ms
          Slow Blank    85.016k i/100ms
      New Slow Blank   172.320k i/100ms
-------------------------------------------------
          Fast Blank      2.338M (± 8.0%) i/s -     11.632M
  Fast ActiveSupport      2.218M (±16.6%) i/s -     10.718M
          Slow Blank      1.559M (± 9.6%) i/s -      7.736M
      New Slow Blank     18.679M (±17.5%) i/s -     89.089M

Comparison:
      New Slow Blank: 18678711.7 i/s
          Fast Blank:  2338327.0 i/s - 7.99x slower
  Fast ActiveSupport:  2218331.7 i/s - 8.42x slower
          Slow Blank:  1558660.2 i/s - 11.98x slower


================== Test String Length: 6 ==================
Calculating -------------------------------------
          Fast Blank    72.629k i/100ms
  Fast ActiveSupport    84.776k i/100ms
          Slow Blank    62.860k i/100ms
      New Slow Blank    79.570k i/100ms
-------------------------------------------------
          Fast Blank      1.831M (±13.5%) i/s -      9.006M
  Fast ActiveSupport      2.033M (± 7.4%) i/s -     10.173M
          Slow Blank      1.444M (± 4.6%) i/s -      7.229M
      New Slow Blank      2.150M (± 4.4%) i/s -     10.742M

Comparison:
      New Slow Blank:  2150109.6 i/s
  Fast ActiveSupport:  2032594.7 i/s - 1.06x slower
          Fast Blank:  1831031.2 i/s - 1.17x slower
          Slow Blank:  1444152.9 i/s - 1.49x slower


================== Test String Length: 14 ==================
Calculating -------------------------------------
          Fast Blank   103.028k i/100ms
  Fast ActiveSupport   104.600k i/100ms
          Slow Blank   117.672k i/100ms
      New Slow Blank    83.105k i/100ms
-------------------------------------------------
          Fast Blank      2.909M (± 9.8%) i/s -     14.424M
  Fast ActiveSupport      3.004M (± 8.0%) i/s -     14.958M
          Slow Blank      3.241M (± 4.1%) i/s -     16.239M
      New Slow Blank      1.624M (± 5.5%) i/s -      8.144M

Comparison:
          Slow Blank:  3241261.2 i/s
  Fast ActiveSupport:  3003942.4 i/s - 1.08x slower
          Fast Blank:  2908911.0 i/s - 1.11x slower
      New Slow Blank:  1624285.3 i/s - 2.00x slower


================== Test String Length: 24 ==================
Calculating -------------------------------------
          Fast Blank    97.126k i/100ms
  Fast ActiveSupport    96.868k i/100ms
          Slow Blank    92.417k i/100ms
      New Slow Blank    77.581k i/100ms
-------------------------------------------------
          Fast Blank      2.287M (±11.7%) i/s -     11.267M
  Fast ActiveSupport      2.507M (± 8.1%) i/s -     12.496M
          Slow Blank      2.798M (± 4.5%) i/s -     13.955M
      New Slow Blank      1.523M (± 4.7%) i/s -      7.603M

Comparison:
          Slow Blank:  2798137.8 i/s
  Fast ActiveSupport:  2506713.8 i/s - 1.12x slower
          Fast Blank:  2287223.6 i/s - 1.22x slower
      New Slow Blank:  1522530.9 i/s - 1.84x slower


================== Test String Length: 136 ==================
Calculating -------------------------------------
          Fast Blank    85.992k i/100ms
  Fast ActiveSupport    83.403k i/100ms
          Slow Blank   107.259k i/100ms
      New Slow Blank    79.743k i/100ms
-------------------------------------------------
          Fast Blank      1.851M (±11.7%) i/s -      9.115M
  Fast ActiveSupport      1.955M (± 8.6%) i/s -      9.758M
          Slow Blank      2.761M (± 7.6%) i/s -     13.729M
      New Slow Blank      1.514M (± 8.3%) i/s -      7.496M

Comparison:
          Slow Blank:  2760863.7 i/s
  Fast ActiveSupport:  1954917.9 i/s - 1.41x slower
          Fast Blank:  1850951.5 i/s - 1.49x slower
      New Slow Blank:  1513753.1 i/s - 1.82x slower
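To see how far this Crystal build is from the C one, a quick Ruby check can divide the C "Fast Blank" throughput by the Crystal "Fast Blank" throughput for each string length. The i/s figures are copied from the two runs above; the helper name `slowdown` is just for illustration.

```ruby
# "Fast Blank" throughput in iterations per second, keyed by test string
# length, copied from the C and Crystal runs above.
C_FAST_BLANK = {
  0 => 26_134_380.9, 6 => 9_334_875.2, 14 => 17_847_351.8,
  24 => 11_638_605.2, 136 => 11_773_722.8
}.freeze

CRYSTAL_FAST_BLANK = {
  0 => 2_338_327.0, 6 => 1_831_031.2, 14 => 2_908_911.0,
  24 => 2_287_223.6, 136 => 1_850_951.5
}.freeze

# Ratio C / Crystal per string length: how many times faster C was.
def slowdown(c_ips, crystal_ips)
  c_ips.map { |len, ips| [len, (ips / crystal_ips.fetch(len)).round(2)] }.to_h
end

slowdown(C_FAST_BLANK, CRYSTAL_FAST_BLANK).each do |len, ratio|
  puts "length #{len}: C is #{ratio}x faster"
end
```

With these numbers, the C extension comes out between roughly 5x and 11x faster than this Crystal build, depending on string length.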

akitaonrails added some commits Jun 22, 2016

Adding a Crystal implementation that exactly mimics the Ruby version. It's still not 100% compatible with the original ActiveSupport version (#blank_as?)
Adding the same #blank_as? implementation: more specs pass, but some still don't. Adjusting the benchmark to require the extension directly. Adjusting the README to update the C benchmark and add the Crystal numbers

luizpaulogodoy commented Jun 22, 2016

👍

Reimplementing like the C version, but Crystal is still slow. Also, trying to cast '\u0000' to a Ruby string through libruby's String#from_ruby raises an ArgumentError (string contains null). Still needs investigating

akitaonrails Jun 23, 2016

cc @asterite @phoffer8

I am still doing something wrong. The Crystal version is lagging behind even the slowest Ruby version for some reason.

I am having trouble when the string contains a null character ("\u0000"): it blows up in the libruby wrapper, which makes some specs fail.

I tried using the same implementation as the C version (without a regex), and even avoided convenience methods such as String#each_char, but nothing changed.

Maybe I am using the wrong CC compile/optimization flags?

Could use some help investigating.
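For reference, the regex-free approach described above boils down to a per-character loop. The Ruby sketch below is an illustration rather than the gem's code: the name `loop_blank?` is hypothetical, and the assumption (worth checking against the C source) is that the C `#blank?` treats only ASCII whitespace plus the NUL character as blank. It also shows that "\u0000" is an ordinary character inside a Ruby String, so only a conversion that treats the data as a NUL-terminated C string has reason to reject it.

```ruby
# Hypothetical C-style blank check: no regex, one codepoint at a time.
# Assumption: like the C #blank?, only ASCII whitespace and NUL count as blank.
ASCII_BLANK = [0x00, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x20].freeze

def loop_blank?(str)
  str.each_codepoint do |cp|
    # Bail out on the first character outside the blank set.
    return false unless ASCII_BLANK.include?(cp)
  end
  true # empty or all-blank strings end up here
end

# "\u0000" is a normal one-character Ruby String; nothing here needs
# a NUL-terminated C string, so nothing rejects it.
p "\u0000".length           # => 1
p loop_blank?("")           # => true
p loop_blank?(" \t\r\n")    # => true
p loop_blank?("\u0000")     # => true
p loop_blank?(" x ")        # => false
```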

akitaonrails Jun 23, 2016

The latest patch still performs badly:

Fast Blank is the Crystal #blank?, and Fast ActiveSupport is the Crystal #blank_as?.

Slow Blank and New Slow Blank are pure MRI Ruby using a regex. The Crystal version shouldn't be this close to the Ruby versions.

================== Test String Length: 0 ==================
Calculating -------------------------------------
          Fast Blank   130.848k i/100ms
  Fast ActiveSupport   137.333k i/100ms
          Slow Blank    83.552k i/100ms
      New Slow Blank   172.358k i/100ms
-------------------------------------------------
          Fast Blank      7.857M (± 9.7%) i/s -     38.731M
  Fast ActiveSupport      7.746M (±12.5%) i/s -     37.904M
          Slow Blank      1.472M (±14.7%) i/s -      7.185M
      New Slow Blank     17.987M (±16.9%) i/s -     86.524M

Comparison:
      New Slow Blank: 17987192.9 i/s
          Fast Blank:  7856517.6 i/s - 2.29x slower
  Fast ActiveSupport:  7746000.4 i/s - 2.32x slower
          Slow Blank:  1471995.7 i/s - 12.22x slower


================== Test String Length: 6 ==================
Calculating -------------------------------------
          Fast Blank    86.678k i/100ms
  Fast ActiveSupport    79.319k i/100ms
          Slow Blank    65.693k i/100ms
      New Slow Blank    70.350k i/100ms
-------------------------------------------------
          Fast Blank      2.023M (±15.0%) i/s -      9.881M
  Fast ActiveSupport      2.308M (±18.9%) i/s -     11.105M
          Slow Blank      1.088M (±19.3%) i/s -      5.255M
      New Slow Blank      1.625M (±23.1%) i/s -      7.598M

Comparison:
  Fast ActiveSupport:  2308215.1 i/s
          Fast Blank:  2023149.3 i/s - 1.14x slower
      New Slow Blank:  1624922.4 i/s - 1.42x slower
          Slow Blank:  1088234.4 i/s - 2.12x slower


================== Test String Length: 14 ==================
Calculating -------------------------------------
          Fast Blank    74.098k i/100ms
  Fast ActiveSupport    67.948k i/100ms
          Slow Blank    73.262k i/100ms
      New Slow Blank    61.959k i/100ms
-------------------------------------------------
          Fast Blank      3.374M (±13.8%) i/s -     16.450M
  Fast ActiveSupport      2.692M (± 9.9%) i/s -     13.318M
          Slow Blank      3.146M (± 8.7%) i/s -     15.532M
      New Slow Blank      1.623M (± 6.1%) i/s -      8.117M

Comparison:
          Fast Blank:  3373729.1 i/s
          Slow Blank:  3145984.1 i/s - 1.07x slower
  Fast ActiveSupport:  2692438.7 i/s - 1.25x slower
      New Slow Blank:  1623077.0 i/s - 2.08x slower


================== Test String Length: 24 ==================
Calculating -------------------------------------
          Fast Blank    91.982k i/100ms
  Fast ActiveSupport    92.785k i/100ms
          Slow Blank    89.864k i/100ms
      New Slow Blank    67.668k i/100ms
-------------------------------------------------
          Fast Blank      1.926M (±10.4%) i/s -      9.566M
  Fast ActiveSupport      2.229M (± 6.4%) i/s -     11.134M
          Slow Blank      2.761M (± 6.1%) i/s -     13.749M
      New Slow Blank      1.488M (± 6.8%) i/s -      7.443M

Comparison:
          Slow Blank:  2761086.7 i/s
  Fast ActiveSupport:  2229438.7 i/s - 1.24x slower
          Fast Blank:  1926024.9 i/s - 1.43x slower
      New Slow Blank:  1488038.2 i/s - 1.86x slower


================== Test String Length: 136 ==================
Calculating -------------------------------------
          Fast Blank    50.628k i/100ms
  Fast ActiveSupport    51.896k i/100ms
          Slow Blank   105.472k i/100ms
      New Slow Blank    80.191k i/100ms
-------------------------------------------------
          Fast Blank    777.116k (± 5.0%) i/s -      3.898M
  Fast ActiveSupport    793.106k (± 7.5%) i/s -      3.944M
          Slow Blank      2.535M (±14.1%) i/s -     12.340M
      New Slow Blank      1.372M (±10.6%) i/s -      6.816M

Comparison:
          Slow Blank:  2534870.9 i/s
      New Slow Blank:  1372307.8 i/s - 1.85x slower
  Fast ActiveSupport:   793105.8 i/s - 3.20x slower
          Fast Blank:   777116.2 i/s - 3.26x slower
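The two pure-Ruby baselines can be pictured roughly as follows. This is a sketch, not the benchmark file itself: the method names `slow_blank?` and `new_slow_blank?` are assumptions, and the first regex is the `[[:space:]]` pattern that ActiveSupport's `String#blank?` is built on. Both checks go through MRI's regex engine, and on UTF-8 strings `[[:space:]]` also matches Unicode whitespace such as U+3000.

```ruby
# Assumed shape of the two pure-Ruby baselines; method names are illustrative.
class String
  # ActiveSupport-style check: the whole string is whitespace.
  def slow_blank?
    /\A[[:space:]]*\z/ === self
  end

  # Variant: empty, or contains no non-whitespace character at all.
  def new_slow_blank?
    empty? || !(/[[:^space:]]/ === self)
  end
end

["", "\r\n\r\n  ", "this is a test", "\u3000"].each do |s|
  puts "#{s.inspect}: slow=#{s.slow_blank?} new_slow=#{s.new_slow_blank?}"
end
```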

asterite Jun 23, 2016

@akitaonrails I'm getting:

$ cd ext && ruby fast_blank/extconf.rb && make
checking for crystal... yes
checking for llvm-config... yes
Crystal version of the Makefile copied
crystal src/cr_fast_blank.cr --release --link-flags "-dynamic -bundle -Wl,-undefined,dynamic_lookup" -o fast_blank.bundle
mv cr_fast_blank.bundle fast_blank.bundle
mv: cr_fast_blank.bundle: No such file or directory
make: *** [fast_blank.bundle] Error 1

What am I doing wrong?

akitaonrails Jun 23, 2016

@asterite Sorry, my mistake. Because there is already a 'fast_blank' folder, I had to name the Crystal source 'src/cr_fast_blank.cr'. The Crystal compiler generates 'src/cr_fast_blank.bundle', and I then 'mv' it to '..' as 'fast_blank.bundle'. It seems that in the Mac version I forgot to type 'src/' in the mv command. I just pushed a commit to fix that; please pull and try again.

By the way, the Makefile runs different commands on Ubuntu and on OS X. I am on Ubuntu, which is why I overlooked this OS X-specific typo.

asterite Jun 23, 2016

@akitaonrails Still no luck:

$ pwd
/Users/asterite-manas/Sandbox/crystal/fast_blank
$ cd ext && ruby fast_blank/extconf.rb && make
checking for crystal... yes
checking for llvm-config... yes
Crystal version of the Makefile copied
crystal src/cr_fast_blank.cr --release --link-flags "-dynamic -bundle -Wl,-undefined,dynamic_lookup" -o fast_blank.bundle
mv src/cr_fast_blank.bundle fast_blank.bundle
mv: src/cr_fast_blank.bundle: No such file or directory
make: *** [fast_blank.bundle] Error 1

akitaonrails Jun 23, 2016

This is strange. The extconf.rb script just generated the correct Makefile. You can cd into ext and check whether cr_fast_blank.bundle is there; if it is, just mv it to fast_blank.bundle and that's it. Sorry, I'm heading to the airport and won't be able to tweak it until tomorrow.

asterite Jun 23, 2016

@akitaonrails I managed to compile and run it :-)

akitaonrails Jul 5, 2016

Hi, sorry for the long delay. I finally got time to return to this issue.

@asterite I noticed that the resulting binaries behave differently on Ubuntu and OS X. On OS X El Capitan I get consistently GOOD results:

================== Test String Length: 0 ==================
Warming up --------------------------------------
          Fast Blank   224.561k i/100ms
  Fast ActiveSupport   222.267k i/100ms
          Slow Blank    65.257k i/100ms
      New Slow Blank   218.927k i/100ms
Calculating -------------------------------------
          Fast Blank     21.967M (± 4.6%) i/s -    109.586M in   5.001229s
  Fast ActiveSupport     22.101M (± 4.1%) i/s -    110.467M in   5.007896s
          Slow Blank      1.073M (± 2.0%) i/s -      5.416M in   5.049760s
      New Slow Blank     19.429M (± 4.0%) i/s -     96.985M in   5.000880s

Comparison:
  Fast ActiveSupport: 22100706.9 i/s
          Fast Blank: 21967083.6 i/s - same-ish: difference falls within error
      New Slow Blank: 19428558.7 i/s - 1.14x slower
          Slow Blank:  1073053.0 i/s - 20.60x slower


================== Test String Length: 6 ==================
Warming up --------------------------------------
          Fast Blank   191.657k i/100ms
  Fast ActiveSupport   197.443k i/100ms
          Slow Blank    59.726k i/100ms
      New Slow Blank    92.509k i/100ms
Calculating -------------------------------------
          Fast Blank      8.592M (± 3.4%) i/s -     42.931M in   5.002842s
  Fast ActiveSupport      9.323M (± 3.2%) i/s -     46.794M in   5.024754s
          Slow Blank    933.543k (± 2.0%) i/s -      4.718M in   5.056400s
      New Slow Blank      1.774M (± 2.7%) i/s -      8.881M in   5.010976s

Comparison:
  Fast ActiveSupport:  9323242.7 i/s
          Fast Blank:  8592076.6 i/s - 1.09x slower
      New Slow Blank:  1773591.7 i/s - 5.26x slower
          Slow Blank:   933543.2 i/s - 9.99x slower


================== Test String Length: 14 ==================
Warming up --------------------------------------
          Fast Blank   217.579k i/100ms
  Fast ActiveSupport   219.251k i/100ms
          Slow Blank   108.983k i/100ms
      New Slow Blank    64.484k i/100ms
Calculating -------------------------------------
          Fast Blank     17.892M (± 3.9%) i/s -     89.425M in   5.006639s
  Fast ActiveSupport     18.786M (± 3.9%) i/s -     93.839M in   5.003915s
          Slow Blank      2.411M (± 3.3%) i/s -     12.097M in   5.022778s
      New Slow Blank      1.030M (± 2.1%) i/s -      5.159M in   5.011957s

Comparison:
  Fast ActiveSupport: 18786330.2 i/s
          Fast Blank: 17892076.9 i/s - same-ish: difference falls within error
          Slow Blank:  2411216.3 i/s - 7.79x slower
      New Slow Blank:  1029749.9 i/s - 18.24x slower


================== Test String Length: 24 ==================
Warming up --------------------------------------
          Fast Blank   201.588k i/100ms
  Fast ActiveSupport   205.157k i/100ms
          Slow Blank    95.901k i/100ms
      New Slow Blank    61.401k i/100ms
Calculating -------------------------------------
          Fast Blank     10.735M (±11.9%) i/s -     46.970M in   5.015147s
  Fast ActiveSupport     11.836M (± 4.0%) i/s -     59.085M in   5.000830s
          Slow Blank      1.979M (± 2.9%) i/s -      9.974M in   5.043224s
      New Slow Blank    942.277k (± 3.4%) i/s -      4.728M in   5.023364s

Comparison:
  Fast ActiveSupport: 11836116.5 i/s
          Fast Blank: 10735159.6 i/s - same-ish: difference falls within error
          Slow Blank:  1979411.5 i/s - 5.98x slower
      New Slow Blank:   942276.8 i/s - 12.56x slower


================== Test String Length: 136 ==================
Warming up --------------------------------------
          Fast Blank   201.447k i/100ms
  Fast ActiveSupport   203.688k i/100ms
          Slow Blank    97.240k i/100ms
      New Slow Blank    61.616k i/100ms
Calculating -------------------------------------
          Fast Blank     10.874M (± 3.3%) i/s -     54.391M in   5.007674s
  Fast ActiveSupport     11.892M (± 3.3%) i/s -     59.477M in   5.007113s
          Slow Blank      1.994M (± 3.9%) i/s -     10.016M in   5.031827s
      New Slow Blank    964.325k (± 3.6%) i/s -      4.868M in   5.054301s

Comparison:
  Fast ActiveSupport: 11892273.2 i/s
          Fast Blank: 10874334.0 i/s - 1.09x slower
          Slow Blank:  1994065.9 i/s - 5.96x slower
      New Slow Blank:   964325.2 i/s - 12.33x slower

The "Fast" versions are the native Crystal ones and the "Slow" versions are MRI Ruby.

On an Ubuntu machine I get consistently BAD results:

================== Test String Length: 0 ==================
Calculating -------------------------------------
          Fast Blank   137.644k i/100ms
  Fast ActiveSupport   138.318k i/100ms
          Slow Blank    72.109k i/100ms
      New Slow Blank   151.683k i/100ms
-------------------------------------------------
          Fast Blank      8.063M (± 4.1%) i/s -     40.330M
  Fast ActiveSupport      7.880M (± 8.6%) i/s -     39.006M
          Slow Blank      1.607M (± 6.8%) i/s -      8.004M
      New Slow Blank     19.326M (± 8.1%) i/s -     95.864M

Comparison:
      New Slow Blank: 19325547.8 i/s
          Fast Blank:  8063380.8 i/s - 2.40x slower
  Fast ActiveSupport:  7879726.2 i/s - 2.45x slower
          Slow Blank:  1606953.4 i/s - 12.03x slower


================== Test String Length: 6 ==================
Calculating -------------------------------------
          Fast Blank   108.852k i/100ms
  Fast ActiveSupport    70.605k i/100ms
          Slow Blank    66.627k i/100ms
      New Slow Blank    84.926k i/100ms
-------------------------------------------------
          Fast Blank      3.951M (±15.9%) i/s -     17.525M
  Fast ActiveSupport      1.270M (± 4.2%) i/s -      6.354M
          Slow Blank      1.338M (± 7.0%) i/s -      6.663M
      New Slow Blank      2.194M (± 5.0%) i/s -     10.955M

Comparison:
          Fast Blank:  3950536.7 i/s
      New Slow Blank:  2194200.6 i/s - 1.80x slower
          Slow Blank:  1338334.2 i/s - 2.95x slower
  Fast ActiveSupport:  1269842.9 i/s - 3.11x slower


================== Test String Length: 14 ==================
Calculating -------------------------------------
          Fast Blank   117.034k i/100ms
  Fast ActiveSupport   106.957k i/100ms
          Slow Blank   100.057k i/100ms
      New Slow Blank    74.057k i/100ms
-------------------------------------------------
          Fast Blank      3.864M (± 4.9%) i/s -     19.311M
  Fast ActiveSupport      3.037M (± 4.2%) i/s -     15.188M
          Slow Blank      3.314M (± 5.0%) i/s -     16.609M
      New Slow Blank      1.600M (± 5.3%) i/s -      7.998M

Comparison:
          Fast Blank:  3863724.5 i/s
          Slow Blank:  3313866.0 i/s - 1.17x slower
  Fast ActiveSupport:  3037103.9 i/s - 1.27x slower
      New Slow Blank:  1599564.4 i/s - 2.42x slower


================== Test String Length: 24 ==================
Calculating -------------------------------------
          Fast Blank    98.364k i/100ms
  Fast ActiveSupport    71.871k i/100ms
          Slow Blank    89.585k i/100ms
      New Slow Blank    70.762k i/100ms
-------------------------------------------------
          Fast Blank      2.801M (± 5.9%) i/s -     13.968M
  Fast ActiveSupport      1.399M (± 4.9%) i/s -      7.043M
          Slow Blank      2.545M (± 5.4%) i/s -     12.721M
      New Slow Blank      1.468M (± 4.8%) i/s -      7.359M

Comparison:
          Fast Blank:  2801422.3 i/s
          Slow Blank:  2544573.4 i/s - 1.10x slower
      New Slow Blank:  1468243.3 i/s - 1.91x slower
  Fast ActiveSupport:  1398659.2 i/s - 2.00x slower


================== Test String Length: 136 ==================
Calculating -------------------------------------
          Fast Blank    55.046k i/100ms
  Fast ActiveSupport    45.574k i/100ms
          Slow Blank    90.240k i/100ms
      New Slow Blank    71.276k i/100ms
-------------------------------------------------
          Fast Blank    858.295k (± 3.8%) i/s -      4.294M
  Fast ActiveSupport    651.345k (± 2.8%) i/s -      3.281M
          Slow Blank      2.597M (± 4.7%) i/s -     12.995M
      New Slow Blank      1.467M (± 5.6%) i/s -      7.341M

Comparison:
          Slow Blank:  2596516.9 i/s
      New Slow Blank:  1466785.4 i/s - 1.77x slower
          Fast Blank:   858294.6 i/s - 3.03x slower
  Fast ActiveSupport:   651345.3 i/s - 3.99x slower

So I have 3 hypotheses:

  1. LLVM on OS X is simply better than on Ubuntu 14.04
  2. My Makefile uses 2 different ways to compile, on the OS X version just the crystal command is all it takes to generate the .bundle file. On Ubuntu I use CC to link the .o generated by crystal into a shared-object. Could this add extra overheads somehow? Could the crystal command generate the .so directly?
  3. Are there different compile flags to optimize for Linux?

@asterite could you help out with those? I am not so familiar with the LLVM tooling enough to know those answers.


akitaonrails Jul 5, 2016

OK, stupid me: I forgot to add the --release flag to the Linux build! Just adding it already gave very different (WAY better) results:

cc @asterite

================== Test String Length: 0 ==================
Calculating -------------------------------------
          Fast Blank   152.235k i/100ms
  Fast ActiveSupport   146.990k i/100ms
          Slow Blank    77.584k i/100ms
      New Slow Blank   159.082k i/100ms
-------------------------------------------------
          Fast Blank     10.356M (±16.2%) i/s -     45.670M
  Fast ActiveSupport     10.679M (± 5.6%) i/s -     53.210M
          Slow Blank      1.615M (± 7.1%) i/s -      8.069M
      New Slow Blank     19.509M (± 8.2%) i/s -     96.722M

Comparison:
      New Slow Blank: 19509084.4 i/s
  Fast ActiveSupport: 10679497.5 i/s - 1.83x slower
          Fast Blank: 10356215.8 i/s - 1.88x slower
          Slow Blank:  1614781.4 i/s - 12.08x slower


================== Test String Length: 6 ==================
Calculating -------------------------------------
          Fast Blank   146.407k i/100ms
  Fast ActiveSupport   143.819k i/100ms
          Slow Blank    64.780k i/100ms
      New Slow Blank    87.385k i/100ms
-------------------------------------------------
          Fast Blank      8.063M (± 6.4%) i/s -     40.116M
  Fast ActiveSupport      7.944M (± 5.7%) i/s -     39.694M
          Slow Blank      1.334M (± 7.2%) i/s -      6.672M
      New Slow Blank      2.134M (± 6.0%) i/s -     10.661M

Comparison:
          Fast Blank:  8063131.3 i/s
  Fast ActiveSupport:  7944055.0 i/s - 1.01x slower
      New Slow Blank:  2133716.8 i/s - 3.78x slower
          Slow Blank:  1334160.2 i/s - 6.04x slower


================== Test String Length: 14 ==================
Calculating -------------------------------------
          Fast Blank   142.824k i/100ms
  Fast ActiveSupport   143.170k i/100ms
          Slow Blank   101.076k i/100ms
      New Slow Blank    73.825k i/100ms
-------------------------------------------------
          Fast Blank      8.167M (± 6.5%) i/s -     40.705M
  Fast ActiveSupport      7.868M (± 5.9%) i/s -     39.229M
          Slow Blank      3.335M (± 5.8%) i/s -     16.678M
      New Slow Blank      1.554M (± 5.5%) i/s -      7.752M

Comparison:
          Fast Blank:  8167184.0 i/s
  Fast ActiveSupport:  7867974.8 i/s - 1.04x slower
          Slow Blank:  3334851.6 i/s - 2.45x slower
      New Slow Blank:  1553911.8 i/s - 5.26x slower


================== Test String Length: 24 ==================
Calculating -------------------------------------
          Fast Blank   131.060k i/100ms
  Fast ActiveSupport   131.229k i/100ms
          Slow Blank    85.616k i/100ms
      New Slow Blank    72.079k i/100ms
-------------------------------------------------
          Fast Blank      5.866M (± 5.4%) i/s -     29.357M
  Fast ActiveSupport      5.766M (± 5.2%) i/s -     28.739M
          Slow Blank      2.395M (±12.3%) i/s -     11.815M
      New Slow Blank      1.204M (±16.4%) i/s -      5.838M

Comparison:
          Fast Blank:  5865634.8 i/s
  Fast ActiveSupport:  5766221.6 i/s - 1.02x slower
          Slow Blank:  2395286.6 i/s - 2.45x slower
      New Slow Blank:  1204401.9 i/s - 4.87x slower


================== Test String Length: 136 ==================
Calculating -------------------------------------
          Fast Blank    72.763k i/100ms
  Fast ActiveSupport    95.883k i/100ms
          Slow Blank    75.816k i/100ms
      New Slow Blank    58.862k i/100ms
-------------------------------------------------
          Fast Blank      2.488M (± 5.1%) i/s -     12.442M
  Fast ActiveSupport      2.540M (± 3.2%) i/s -     12.752M
          Slow Blank      2.533M (± 7.3%) i/s -     12.585M
      New Slow Blank      1.280M (±13.7%) i/s -      6.298M

Comparison:
  Fast ActiveSupport:  2540464.3 i/s
          Slow Blank:  2532795.4 i/s - 1.00x slower
          Fast Blank:  2488027.4 i/s - 1.02x slower
      New Slow Blank:  1279936.2 i/s - 1.98x slower

Some of the results in the last batch are still not great, but at the very least they are comparable to MRI, not worse. I will keep looking into the compilation flags.

akitaonrails added some commits Jul 5, 2016

updating lib_ruby.cr to the latest from crystalized_ruby. Tweaking compilation flags to make the Linux version as competitive as the OS X version, but seems like it's way more efficient on OS X still
Fixing the dirty patch in the extconf to generate the makefile in the correct place, and changing the spec back to require from the ruby path instead of giving it an absolute path
Adjusting the Makefile some more, but no more performance jumps were possible just tweaking linking flags. Adding rb_rescue mapping to avoid 'string contain null bytes' but still unable to make specs pass as \u0000 from Ruby ends up as just an empty string in Crystal
fixing the makefile for os x. also trying to improve a tiny bit by not creating a crystal string if the ruby version is empty by checking it with ruby's c length function, not so much of a difference though
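The \u0000 symptom in the commit above is consistent with the bytes passing through a NUL-terminated C string somewhere on the way into Crystal. That is an assumption, since the lib_ruby.cr binding code isn't shown here. A minimal C sketch of the difference, with hypothetical helper names and an ASCII-only whitespace test: measuring the string with strlen stops at the first NUL byte, while an explicit length (what RSTRING_LEN supplies) keeps it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII whitespace only: tab, LF, VT, FF, CR (0x09-0x0D) and space.
   A simplification; the real blank? also accepts Unicode spaces. */
static bool is_ascii_space(unsigned char c) {
    return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Scans exactly len bytes, so embedded NUL bytes are examined too. */
static bool blank_bytes(const char *ptr, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (!is_ascii_space((unsigned char)ptr[i])) {
            return false;
        }
    }
    return true;
}

/* Hypothetical lossy path: the length comes from strlen, which stops at
   the first NUL, so "\0" looks like an empty (and therefore blank) string. */
static bool blank_via_cstring(const char *s) {
    return blank_bytes(s, strlen(s));
}
```

With a one-byte buffer holding only NUL, blank_bytes(buf, 1) returns false, matching Ruby, where "\u0000".blank? is false because NUL is not whitespace, while blank_via_cstring(buf) returns true.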

asterite Jul 6, 2016

@akitaonrails I don't think you should use Crystal's String class; you should work with raw bytes.


akitaonrails Jul 6, 2016

@asterite my hypothesis is that unless I can cast C-Ruby's VALUE* directly into a Crystal String or Array, the Crystal version will still pay the cost of "copying" data from C to Crystal. The C version can work on the string directly (it's in the very same memory space). Does that make sense? Maybe we've hit a wall with this particular use case.
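To make the copying hypothesis concrete, here is a C sketch (not the actual Crystal binding) of the two shapes being compared: scanning the caller's bytes in place, which working from a pointer and a length allows, versus first copying them into a freshly allocated buffer, which stands in for building a separate string object. The function names are hypothetical and the whitespace test is ASCII-only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* ASCII-only whitespace test (0x09-0x0D and space); a simplification. */
static bool is_ascii_space(unsigned char c) {
    return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Reads the caller's bytes directly: no allocation, one pass. */
static bool blank_in_place(const char *ptr, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (!is_ascii_space((unsigned char)ptr[i])) {
            return false;
        }
    }
    return true;
}

/* Models converting to a separate string first: allocate, copy every
   byte, then scan the copy and free it. Same answer, more work per call. */
static bool blank_with_copy(const char *ptr, size_t len) {
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        abort();
    }
    if (len > 0) {
        memcpy(copy, ptr, len);
    }
    copy[len] = '\0';
    bool result = blank_in_place(copy, len);
    free(copy);
    return result;
}
```

Both functions return the same answer; the copy only adds an allocation, a free, and an extra pass over the bytes on every call, which matters most when the check itself is as cheap as blank?.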


asterite Jul 6, 2016

@akitaonrails to avoid copying memory you should use RSTRING_PTR and RSTRING_LEN. But Ruby strings can have arbitrary encodings, so you'll also want to use STR_ENC_GET and rb_enc_codepoint_len. Of course, binding RSTRING_PTR directly is impossible because it's a C macro, so you'll have to expand its code by hand.

To get the same performance as the C version, you'll have to write exactly the same code as the C version, only with Crystal syntax. Anything else will be either incorrect or slow. Even the Rust implementation presented here is wrong, because Rust also assumes all strings are UTF-8, which is not the case in Ruby. You have to decode the string to check for whitespace codepoints, and decoding a string of arbitrary encoding is hard (or at least tedious). This is what Ruby's rb_enc_codepoint_len does for you.
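The decode-then-classify loop described above can be sketched in C. Ruby's real code calls rb_enc_codepoint_len with the string's own encoding; this sketch swaps in a hypothetical UTF-8-only decoder (utf8_codepoint_len) so it compiles without Ruby headers. It skips overlong and surrogate checks and reports malformed bytes as "not blank" where Ruby would raise. The whitespace set is the Unicode White_Space code points.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unicode White_Space code points. */
static bool is_unicode_space(uint32_t cp) {
    switch (cp) {
    case 0x09: case 0x0A: case 0x0B: case 0x0C: case 0x0D:
    case 0x20: case 0x85: case 0xA0: case 0x1680:
    case 0x2028: case 0x2029: case 0x202F: case 0x205F: case 0x3000:
        return true;
    default:
        return cp >= 0x2000 && cp <= 0x200A;
    }
}

/* Hypothetical UTF-8-only stand-in for rb_enc_codepoint_len.
   Requires s < e. Decodes one code point, stores its byte length in *len,
   and returns -1 for a malformed or truncated sequence. */
static int32_t utf8_codepoint_len(const unsigned char *s,
                                  const unsigned char *e, int *len) {
    unsigned char c = s[0];
    int n;
    uint32_t cp;
    if (c < 0x80)                { n = 1; cp = c; }
    else if ((c & 0xE0) == 0xC0) { n = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { n = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { n = 4; cp = c & 0x07; }
    else return -1;
    if (e - s < n) return -1;
    for (int i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;  /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *len = n;
    return (int32_t)cp;
}

/* Walks the buffer code point by code point, without copying it. */
static bool blank_utf8(const char *ptr, size_t len) {
    if (len == 0) return true;
    const unsigned char *s = (const unsigned char *)ptr;
    const unsigned char *e = s + len;
    while (s < e) {
        int n = 0;
        int32_t cp = utf8_codepoint_len(s, e, &n);
        if (cp < 0 || !is_unicode_space((uint32_t)cp)) return false;
        s += n;
    }
    return true;
}
```

A byte-by-byte ASCII scan would call "\xC2\xA0" (a UTF-8 no-break space) non-blank, which is one way an implementation that ignores encoding ends up disagreeing with Ruby. The same bytes can also mean different code points in another encoding, which is why the real code passes the string's encoding to the decoder.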

But there's really no need to do this. I personally don't think writing Ruby extensions in Crystal is something I'd push forward, mainly because it's a GC'ed language best suited to high-level applications. I wouldn't write a kernel in Crystal, nor would I write Ruby extensions in it. But this is just my personal opinion; I'm sure other Crystallers have other points of view.


asterite Jul 6, 2016

Another example: implementing Fibonacci as a Ruby extension in Crystal. You'll get incredible speeds. In fact, if you compare a bare Fibonacci in Ruby and in Crystal, you'll see Crystal is way faster, and the same applies to a language like Rust.

Of course, try passing a number like 100 or 1000 to that function. Ruby will happily give you the correct result, while in Crystal and Rust the result will quickly overflow, giving an incorrect result. So sure, it's faster, but it has completely different semantics, so in my opinion those benchmarks are flawed: they are comparing apples to bananas.
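The overflow point is easy to see with fixed-width integers. This C sketch (C rather than Crystal or Rust, with hypothetical function names) computes Fibonacci numbers in uint64_t: fib_wrapping silently wraps modulo 2^64, as unsigned arithmetic does in C, while fib_checked reports the overflow instead. F(93) is the largest Fibonacci number that fits in 64 unsigned bits; Ruby's integers transparently grow to arbitrary precision, so fib(100) there is still exact.

```c
#include <stdbool.h>
#include <stdint.h>

/* Silent wraparound: every addition is performed modulo 2^64. */
static uint64_t fib_wrapping(unsigned n) {
    uint64_t a = 0, b = 1;           /* a = F(i - 1), b = F(i) */
    if (n == 0) return 0;
    for (unsigned i = 1; i < n; i++) {
        uint64_t next = a + b;       /* F(i + 1), possibly wrapped */
        a = b;
        b = next;
    }
    return b;                        /* F(n) mod 2^64 */
}

/* Same loop, but refuses to produce a wrong number. Returns false
   if F(n) does not fit in uint64_t; otherwise stores it in *out. */
static bool fib_checked(unsigned n, uint64_t *out) {
    uint64_t a = 0, b = 1;
    if (n == 0) { *out = 0; return true; }
    for (unsigned i = 1; i < n; i++) {
        if (a > UINT64_MAX - b) return false;  /* a + b would overflow */
        uint64_t next = a + b;
        a = b;
        b = next;
    }
    *out = b;
    return true;
}
```

fib_wrapping(94) returns 1293530146158671551, which is F(94) mod 2^64 and has nothing to do with the real F(94) = 19740274219868223167; fib_checked(94, &out) returns false instead.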


akitaonrails Jul 7, 2016

I agree with you; I reached the same conclusions. In the blog post I concluded that I chose the wrong use case (fast_blank). Unless I jump through all the hoops to expand RSTRING_PTR and the other macros, this would be beside the point.

It's definitely not a Crystal flaw; it's a flaw in my choice of use case. My goal was more of a proof of concept to see how it would turn out. On the positive side, even with the added overhead, the Crystal version still performs 2 to 3 times faster than C-Ruby, and how to integrate Crystal into a Ruby gem is clearer now.

I'd expect use cases with significantly more computationally intensive operations to benefit more than a simple blank check, where the data-copying overhead is larger than the blank comparison itself. Passing large data sets to Crystal and doing expensive computation there would give some apps better performance.

And as the algorithms get more complicated, they become harder to write and maintain in C and easier in Crystal. We would be trading raw performance for maintainability, which is a good trade.

I will explore some more use cases, but this particular one is a no go, so I will close this PR.

Thanks for the help! :-)


asterite Jul 7, 2016

I actually think combining Crystal with Ruby is a great idea, but maybe in other ways. For example, with https://github.com/mperham/sidekiq.cr you write your front-end in Ruby (Ruby is wonderful for this) and then write some CPU-demanding tasks in Crystal. In this case there's no need to mix the languages in a single process, but you are still using both of them.
