
Erlang implementation #10

Merged (5 commits, Sep 2, 2015)

potatosalad commented May 15, 2015

I was curious how Erlang would compare using binary pattern matching. It's patterned roughly after the Elixir implementation.

Update 2015-05-20: Added the option to run the Erlang implementation in unsafe, binary, or regex mode. Unsafe reads the entire dataset into memory, which is fastest but may not work on larger datasets. The other modes read each file line by line instead.

Update 2015-05-15: Using binary:match/2,3 and binary:part/2,3 dropped the timing on my machine from 8.9 seconds to 2.5 seconds.
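A minimal sketch of the binary:match/2 + binary:part/3 approach (my illustration, not the code from this PR; the module and function names are hypothetical): find a needle with binary:match/2 and slice around it with binary:part/3, instead of splitting the whole line into sub-binaries.

```erlang
%% Sketch only: illustrates binary:match/2 + binary:part/3,
%% not the actual implementation from this PR.
-module(match_sketch).
-export([contains/2, field_after/2]).

%% True if Needle occurs anywhere in Line.
contains(Line, Needle) ->
    binary:match(Line, Needle) =/= nomatch.

%% Return everything after the first occurrence of Sep, without
%% materializing the part before it (as binary:split/2 would).
field_after(Line, Sep) ->
    case binary:match(Line, Sep) of
        nomatch ->
            nomatch;
        {Pos, Len} ->
            Start = Pos + Len,
            binary:part(Line, Start, byte_size(Line) - Start)
    end.
```

The point is that binary:match/2 only returns a `{Position, Length}` tuple, so nothing is allocated for parts of the line you never look at.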

Hardware: MacBook Pro 2.6GHz i7 (quad core) with 16GB RAM and PCIe SSD (very similar to the hardware in the part 2 blog post)

| Implementation | Time |
| --- | --- |
| Erlang w/ binary pattern matching (unsafe) | ~~8.9s~~ 2.5s |
| Erlang w/ binary pattern matching (binary) | 4.4s |
| Erlang w/ regular expression (regex) | 6.6s |
| Ruby w/ Celluloid | 31.6s |
| Ruby w/ grosser/parallel | 8.6s |
| Elixir | 13.3s |
| Golang w/ Substring | 8.6s |
While playing around with the implementation, I wrote three other binary pattern matching implementations which used different strategies (these aren't included in this pull request):

| Implementation | Time |
| --- | --- |
| Erlang w/ ets (erlang_ets branch) | ~~8.9s~~ 2.5s |
| Erlang w/ basho/riak_pipe (erlang_riak_pipe branch) | ~~30.4s~~ 15.2s |
| Erlang w/ single process dict (erlang_single_process branch) | ~~33.4s~~ 12.6s |

The riak_pipe implementation is interesting as it could be run across multiple nodes and could potentially scale horizontally to handle larger datasets better than some of the other implementations.

potatosalad referenced this pull request on May 17, 2015: Rust implementation #14 (merged)

josevalim commented May 20, 2015

Really nice! It should not be hard to adapt Elixir's implementation to this pattern too, if it is "allowed" (I thought the regex was a hard requirement).

josevalim commented May 20, 2015

Also worth pointing out that the current implementation loads all files into memory, which may be OK depending on the size of the dataset.

potatosalad commented May 20, 2015

@josevalim Yeah, I probably drifted away from the original intent of this project. I saw that the Go implementation used substring pattern matching, so I thought binary pattern matching might also be allowed. Plus, it was too much fun to find out how fast I could get it to go.

I made a few changes so the implementation could be run in 1 of 3 modes:

  1. `unsafe` uses my initial "cheating" implementation 😁 (~2.5s)
  2. `binary` uses binary pattern matching with file:read_line/1 (~4.4s)
  3. `regex` uses regular expressions with file:read_line/1 (~6.6s)
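For context, the line-by-line shape shared by the binary and regex modes looks roughly like this (my sketch, with hypothetical names, not the repo's actual code): open the file in binary mode with read-ahead buffering and loop over file:read_line/1 until eof.

```erlang
%% Hypothetical sketch of a file:read_line/1 loop, as used by the
%% binary and regex modes; not the actual code from this PR.
-module(line_reader).
-export([count_lines/1]).

count_lines(Path) ->
    %% raw + read_ahead keeps per-line reads cheap; binary mode makes
    %% file:read_line/1 return binaries suitable for binary:match/2 etc.
    {ok, IoDevice} = file:open(Path, [read, binary, raw, {read_ahead, 64 * 1024}]),
    try
        loop(IoDevice, 0)
    after
        file:close(IoDevice)
    end.

loop(IoDevice, Count) ->
    case file:read_line(IoDevice) of
        {ok, _Line} -> loop(IoDevice, Count + 1);
        eof         -> Count
    end.
```

Each `_Line` would be handed to either the binary-pattern-matching or the regex matcher; only the current line is resident in memory, which is the trade-off against the unsafe mode.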
josevalim commented May 20, 2015

@potatosalad definitely. I have learned quite a bit from your experiment too. :D I have even opened a thread on the Erlang mailing list asking about binary:split/3, which seems to be the bottleneck in the Elixir example...

potatosalad changed the title from "Erlang implementation (binary pattern matching)" to "Erlang implementation" on May 20, 2015

dimroc commented Sep 2, 2015

I'm late to the party, but great job optimizing the Erlang implementation. We've had a lot of language submissions that are optimized far better than my original submissions. But comparisons are becoming apples to oranges at this point because of inconsistencies across the languages (regex vs. substring, ASCII vs. Unicode, map reduction, loading everything into memory as opposed to streaming, etc.).

Great job with the binary:split/3 discovery; I noticed the Elixir implementation improved dramatically. I will merge all of these into master and write about the differences.

dimroc added a commit that referenced this pull request Sep 2, 2015

dimroc merged commit f5118cc into dimroc:master on Sep 2, 2015

josevalim commented Sep 5, 2015

@dimroc let me know if you would like those changes to be ported to Elixir. I would really like to follow Erlang's footsteps here; otherwise it will generate a whole "Elixir is 5x slower than Erlang" narrative, which is certainly not true. :)

Thanks for merging this.

dimroc commented Sep 5, 2015

Sure @josevalim, that would be great. Can you do me two favors though?

  1. Keep the original implementation, similar to how @potatosalad has three modes.
  2. Help me explain the difference between the implementations 😄. I feel like I only have a superficial grasp of the powers of binary:match and binary:split and would love to have a better write-up in the next blog post.

You can see the early (early) draft of that post here (minus bar charts): https://github.com/dimroc/blog/blob/source/_drafts/2015-08-31-etl-language-showdown-pt3.md

Hope I'm not asking too much.

josevalim commented Sep 5, 2015

Definitely. I will work on that and hopefully I'll have something on Monday. :)

josevalim commented Sep 6, 2015

Btw, I can explain number 2 right now. :)

  1. The move from regex to binary:split/2 makes things faster because, as in other languages, there is a cost to running regular expressions which can be avoided by using operations on binaries directly.
  2. After moving to binary:split/2, you can further speed things up by precompiling the pattern you want to split on. If you are going to split or match on "knicks", you can precompile the pattern beforehand, which improves performance in tight loops (since you skip recompiling it on every match/split). @potatosalad was particularly smart here because he calculated all uppercase and lowercase permutations of "knicks" into a single pattern.
  3. Then @potatosalad found he could get even better performance by not using split and using binary:match/2 instead. binary:split/2 in Erlang is implemented in two steps: first it uses binary:match/2 to get all positions, and then it uses binary:part/2 to extract a binary from those positions. However, imagine you have a CSV row "foo,bar,baz,bat" and you are interested only in the columns bar and bat. Even if you don't use foo and baz, binary:split/2 will convert those to binaries. @potatosalad found he could avoid that extra cost by not converting foo and baz to binaries at all.
  4. Bonus: @potatosalad also realized that binary:split/2 could be faster if implemented in C. Remember when I said that binary:split/2 is binary:match/2 + binary:part/2? Because binary:part/2 needs to check boundaries and binary:split/2 is implemented in Erlang, binary:part/2 called by split rechecks boundaries that split already knows are valid, which adds extra cost. @potatosalad was extra awesome by sending a PR to Erlang/OTP.
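Points 2 and 3 above can be sketched together (my illustration with hypothetical names, not code from the repo): compile the pattern once with binary:compile_pattern/1, then use binary:match/2 per line, so no sub-binaries are allocated for lines or columns you never use.

```erlang
%% Sketch of precompiled patterns (point 2) + match-without-split
%% (point 3). Names are mine, not from the repo.
-module(pattern_sketch).
-export([count_matches/2]).

%% Count how many Lines contain any of the given Words.
count_matches(Lines, Words) ->
    %% Compile once, outside the loop: Words can be a list such as
    %% [<<"knicks">>, <<"Knicks">>, <<"KNICKS">>, ...] covering the
    %% case permutations mentioned above.
    Pattern = binary:compile_pattern(Words),
    lists:foldl(
      fun(Line, Acc) ->
              %% binary:match/2 returns {Pos, Len} or nomatch; nothing
              %% is allocated for the rest of the line, unlike split.
              case binary:match(Line, Pattern) of
                  nomatch      -> Acc;
                  {_Pos, _Len} -> Acc + 1
              end
      end, 0, Lines).
```

Precompiling matters because binary:match/2 and binary:split/2 otherwise compile the pattern on every call, which is pure overhead inside a tight per-line loop.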

@potatosalad, feel free to correct me if I got anything wrong. :)

potatosalad commented Sep 7, 2015

@josevalim After reviewing the code again, that all sounds correct. Thank you for the explanation; it's much more informative than what I wrote in the description above. Also, sorry to have caused you extra work so things wouldn't appear to be "Elixir is 5x slower than Erlang" 😬

Interesting side notes that will hopefully be part of OTP 19:

I ran the original Elixir implementation, which used String.split/2, against the binary:split/2,3 pull request at erlang/otp#771, and the time dropped from ~12 seconds to ~8 seconds on my machine. So the original, and arguably more maintainable, version of the code should see a roughly 1.5x speedup without any modifications if I'm able to get my OTP PR accepted for OTP 19.

I was also curious how much of a speedup a split-based solution running with the OTP PR would provide for Elixir. If you're interested, it's on this branch: potatosalad/etl-language-comparison@match_and_split.

| Implementation | OTP 18 | OTP 18 patched |
| --- | --- | --- |
| Elixir (binary) | 2.9s | 2.9s |
| Elixir (regex) | 7.9s | 6.8s |
| Elixir (split) | 3.7s | 2.5s |

Both the regex and split implementations look like they benefit from the binary:split/2,3 patch. We're now the same speed as the Erlang implementation I posted in this PR.

However, while fooling around with the implementations, I stumbled upon an even faster method:

Elixir (match) 1.1s

Therefore, while there is currently no similar implementation in Erlang, let the record show that Elixir is roughly 2-3x faster than Erlang 😉

Also, I think 1.1 seconds is currently the fastest runtime for any of the languages posted on this repository (it beats the Rust implementation by roughly 1-2x).

dimroc commented Nov 16, 2015

This is all great information, so I'm thinking it would be great to capture details of the implementations in a README.md in the erlang/ folder. @potatosalad, would you be up to the task? You seem the most qualified 😄
