Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a benchmark between *_with? and select+regexp #6

Closed
wants to merge 2 commits into from

Conversation

wolflee
Copy link
Collaborator

@wolflee wolflee commented Dec 16, 2015

Please do NOT merge, only for reference purpose.

run rspec spec/bm.rb to see the results

@wolflee
Copy link
Collaborator Author

wolflee commented Dec 16, 2015

@fahchen

@fahchen
Copy link
Member

fahchen commented Dec 16, 2015

I'm afraid that there is something wrong with your code.

2.2.3 :006 > code = 110100
 => 110100
2.2.3 :007 > gb2260 = GB2260.new
 => #<GB2260:0x007f8b3b7dcd50 @revision="2014">
2.2.3 :008 > gb2260.regex_prefectures code
 => [#<GB2260::Division:0x007f8b3b3c2120 @code="110100", @name="市辖区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c2080 @code="110101", @name="东城区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1fb8 @code="110102", @name="西城区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1f18 @code="110105", @name="朝阳区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1e78 @code="110106", @name="丰台区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1db0 @code="110107", @name="石景山区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1d38 @code="110108", @name="海淀区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1c98 @code="110109", @name="门头沟区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1bd0 @code="110111", @name="房山区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1b30 @code="110112", @name="通州区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1a90 @code="110113", @name="顺义区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c19c8 @code="110114", @name="昌平区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1950 @code="110115", @name="大兴区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c18b0 @code="110116", @name="怀柔区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c17e8 @code="110117", @name="平谷区", @revision="2014">, #<GB2260::Division:0x007f8b3b3c1748 @code="110200", @name="县", @revision="2014">, #<GB2260::Division:0x007f8b3b3c16a8 @code="110228", @name="密云县", @revision="2014">, #<GB2260::Division:0x007f8b3b3c15e0 @code="110229", @name="延庆县", @revision="2014">]
2.2.3 :009 > gb2260.prefectures code
 => [#<GB2260::Division:0x007f8b3b25b818 @code="110100", @name="市辖区", @revision="2014">, #<GB2260::Division:0x007f8b3b25b728 @code="110200", @name="县", @revision="2014">]
2.2.3 :010 >

Here is my code and benchmark https://gist.github.com/fahchen/04c126e7c2eddeda9fe6

@wolflee
Copy link
Collaborator Author

wolflee commented Dec 16, 2015

Nice catch! (and the regex)
But that's just showing that regex is more error-prone to noobs(like me) and less human readable.

Actually if we change the method like this:

  def prefectures(province_code)
    province_prefix = province_code.to_s[0,2].freeze
    Division.batch(all_code
      .select { |c| c.start_with?(province_prefix) }
      .select { |c| c.end_with?(PREFECTURE_SUFFIX) }
      .reject { |c| c.end_with?(PROVINCE_SUFFIX) },
    @revision)
  end

We can get even better performance:
Total allocated: 61239 bytes (62 objects)

The old code does a ton of allocations because it send the filter method symbols on-the-fly. The only reason it was made like this is to avoid CodeClimate duplication penalty. So if we really want the performance, we must bare with the lower CodeClimate GPA and don't "cheat" any more.

@fahchen
Copy link
Member

fahchen commented Dec 16, 2015

+1 for the lastest code

Calculating -------------------------------------
   prefectures.start_with?   143.000  i/100ms
   prefectures.regex    62.000  i/100ms
-------------------------------------------------
   prefectures.start_with?      1.464k (± 8.7%) i/s -      7.293k
   prefectures.regex    681.787  (± 3.5%) i/s -      3.410k

Comparison:
   prefectures.start_with?:     1464.1 i/s
   prefectures.regex:      681.8 i/s - 2.15x slower

@wolflee
Copy link
Collaborator Author

wolflee commented Dec 25, 2015

Closing this as come up with a better solution.

@wolflee wolflee closed this Dec 25, 2015
@wolflee wolflee deleted the regexp_perf branch December 25, 2015 05:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants