Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Hash#put and optimize Set#add? #8116

Merged
merged 3 commits into from
Sep 8, 2019

Conversation

asterite
Copy link
Member

@asterite asterite commented Aug 24, 2019

Right now Set#add? is implemented doing:

if includes?(object)
  false
else
  add(object)
  true
end

That's a bit inefficient because there's a double hash lookup.

It would be cool to be be able to insert a key-value pair in a Hash and know if that was a new value or not (well, what Set#add? does) but that method doesn't exist. So here we introduce it as Hash#store.

Then we rewrite Set#add? using that new method.

Benchmark code:

require "benchmark"

set = Set{0}
1000.times do |i|
  set.add?(i)
end

Benchmark.ips do |x|
  x.report("add?") do
    set.clear
    1000.times do |i|
      set.add?(i)
    end
  end
end

Results:

old: 66.60k ( 15.01µs) (± 2.27%)
new: 99.17k ( 10.08µs) (± 4.75%)

Not a big difference but I still think it's bad if this can't be implemented efficiently, and I also think Hash#put might be a generally useful method because you can't do that efficiently right now.

src/hash.cr Outdated Show resolved Hide resolved
src/hash.cr Outdated Show resolved Hide resolved
@rdp
Copy link
Contributor

rdp commented Aug 26, 2019

store is a bit of an overlapping name with "add" etc. hmm...maybe set_if_absent or something like that?

@asterite
Copy link
Member Author

I borrowed the name store from Ruby's API. It also exists there though with a slightly different semantic. But I think nobody uses store so it's probably fine to change the meaning (and it's fine either way because we are a separate language).

@jhass
Copy link
Member

jhass commented Aug 27, 2019

I think the most common name for the "insert into map" operation is put. We could even do put(key, value) #=> old_value besides put(key) { value } #=> old_value and let []= be put(key, value); value.

This also brings me to the somewhat weird behavior of store returning the old value in case of an existing value and the new value in case of an updated value, I think it might be more consistent to return the default value in the latter case, or just always nil (again, in the no existing value case) even.

@asterite
Copy link
Member Author

@jhass Great idea! I like the name put, I think it's present in C# and Java. It's also shorter than store.

I thought about having a put overload returning the old value or nil if it didn't exist. Or maybe the default value if it didn't exist? The problem with that is that the resulting type will always include nil because we can't statically know that there's a default value.

That's why I think a single put overload that gives you a block to return whatever you want might be better.

Maybe we can make it return nil if it was not present. But I'm not sure... I don't know how frequent this method will be used, I just needed something to be able to efficiently implement Set#add? and this is enough.

Let's vote for a name:

  • 👀 put
  • 🚀 store

@jhass
Copy link
Member

jhass commented Aug 27, 2019

I wouldn't mind put's return value being V? and ignoring the default value, that's already the case for []?.

@asterite
Copy link
Member Author

@jhass Hm, good point. I though []? would return the default value in case it was missing, but I guess it makes sense for it to return nil because one usually knows whether a hash has a default value or not and if it does then one uses [] instead of []?. So I guess put returning the old value or nil makes sense. Once we decide on the name I'll update this PR.

@RX14
Copy link
Contributor

RX14 commented Sep 7, 2019

Looks like everyone's agreed on put, this can probably be moved forward.

This lets you store a key-value pair and know whether an old value was
present and what is it. We might be tempted to use a method that returns
the value or `nil` if it wasn't already there but this doesn't work if
values allow `nil` as a valid value. So `store` is in a way the
equivalent of `fetch` but when storing a value.
@asterite asterite changed the title Add Hash#store and optimize Set#add? Add Hash#put and optimize Set#add? Sep 8, 2019
@asterite
Copy link
Member Author

asterite commented Sep 8, 2019

Renamed! If we merge we can squash the commits or I can rewrite the history, but maybe it's not bad to keep the commit about the decision to rename store to put.

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think wee need the history leading to the result. There is still a reference to the PR and the discussion behind it.

@asterite asterite added this to the 0.31.0 milestone Sep 8, 2019
@asterite asterite merged commit 08ff145 into crystal-lang:master Sep 8, 2019
@asterite asterite deleted the opt/set-add branch September 8, 2019 10:17
@RX14
Copy link
Contributor

RX14 commented Sep 9, 2019

I'm having second thoughts about this, since I was confused by the name and thought this was a "put this value if it doesn't exist, else do nothing" call, which Hash is missing (but seems useful).

This method should probably be #replace, and have a version without the block which always returns the previous value, in addition to the current version.

@straight-shoota
Copy link
Member

Java has Map.put(key, value) (equal to Hash#[]=) and HashMap.putIfAbsent(key, value) which only stores if the key does not exist. The counterpart Map.replace(key, value) only stores if the key exists. There is also an overload Map.replace(key, old_value, new_value) which only stores if the key is set to old_value.

dnamsons pushed a commit to dnamsons/crystal that referenced this pull request Jan 10, 2020
* Add `Hash#store`

This lets you store a key-value pair and know whether an old value was
present and what is it. We might be tempted to use a method that returns
the value or `nil` if it wasn't already there but this doesn't work if
values allow `nil` as a valid value. So `store` is in a way the
equivalent of `fetch` but when storing a value.

* Set: optimize `add?` by avoiding a double hash lookup

* Rename `Hash#store` to `Hash#put`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants