Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic solution for sorting with nils? #62

Closed
gnilrets opened this issue Feb 27, 2016 · 2 comments
Closed

Generic solution for sorting with nils? #62

gnilrets opened this issue Feb 27, 2016 · 2 comments

Comments

@gnilrets
Copy link
Contributor

I've been working on a way to improve the performance of joins and have once again run into an issue with sorting data that contains nil values. I've worked out what I think is a generic solution for sorting nil data. It does involve monkey patching NilClass, but I think it's pretty tame. Do you see any issues if we monkey patch NilClass and use Sortable when sorting DataFrames & Vectors? (Alternatively, we could use refinements instead of monkey patches).

# Puts nils before anything else
class NilClass
  include Comparable

  def <=>(other)
    other.nil? ? 0 : -1
  end
end

class Sortable
  include Comparable

  def initialize(value)
    @value = value
  end

  attr_reader :value

  def <=>(other)
    # when @value is nil, use <=> from NilClass
    # when other.value is nil, reverse comparison order and then use <=> from NilClass
    @value <=> other.value || -(other.value <=> @value)
  end
end

numbers_and_nil = (0.upto(10).to_a + [nil]*10).shuffle
puts numbers_and_nil.sort_by { |v| Sortable.new(v) }
# => [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

strings_and_nil = (0.upto(10).to_a.map { |v| v.to_s } + [nil]*10).shuffle.map
puts strings_and_nil.sort_by { |v| Sortable.new(v) }
# => [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, "0", "1", "10", "2", "3", "4", "5", "6", "7", "8", "9"]

arrays_and_nil = (0.upto(10).to_a + [nil]*10).shuffle.map { |v| [v] }
puts arrays_and_nil.sort_by { |v| Sortable.new(v) }
# => [[nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [nil], [0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]

arrays_and_nil_in_2nd_position = (0.upto(10).to_a + [nil]*10).shuffle.map { |v| [(v || 1).to_s, v] }
puts arrays_and_nil_in_2nd_position.sort_by { |v| Sortable.new(v) }
# => [["0", 0], ["1", nil], ["1", nil], ["1", nil], ["1", nil], ["1", nil], ["1", nil], ["1", nil], ["1", nil], ["1", nil], ["1", nil], ["1", 1], ["10", 10], ["2", 2], ["3", 3], ["4", 4], ["5", 5], ["6", 6], ["7", 7], ["8", 8], ["9", 9]]
@v0dro
Copy link
Member

v0dro commented Feb 28, 2016

Looks like good solution, but won't the overhead of creating Sortable objects for each run of the sort_by create a GC bottleneck?

I'm good with monkey patching for now. We'll shift it to refinements once we drop support for Ruby 2.0. It's yet to be stable in the latest JRuby too.

@v0dro
Copy link
Member

v0dro commented Mar 9, 2016

Has been resolved in #67 and #63.

@v0dro v0dro closed this as completed Mar 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants