Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add `Hash#put` and optimize `Set#add?` #8116

Merged
merged 3 commits into from Sep 8, 2019

Conversation

@asterite
Copy link
Member

commented Aug 24, 2019

Right now Set#add? is implemented doing:

if includes?(object)
  false
else
  add(object)
  true
end

That's a bit inefficient because there's a double hash lookup.

It would be cool to be be able to insert a key-value pair in a Hash and know if that was a new value or not (well, what Set#add? does) but that method doesn't exist. So here we introduce it as Hash#store.

Then we rewrite Set#add? using that new method.

Benchmark code:

require "benchmark"

set = Set{0}
1000.times do |i|
  set.add?(i)
end

Benchmark.ips do |x|
  x.report("add?") do
    set.clear
    1000.times do |i|
      set.add?(i)
    end
  end
end

Results:

old: 66.60k ( 15.01µs) (± 2.27%)
new: 99.17k ( 10.08µs) (± 4.75%)

Not a big difference but I still think it's bad if this can't be implemented efficiently, and I also think Hash#put might be a generally useful method because you can't do that efficiently right now.

src/hash.cr Outdated Show resolved Hide resolved
src/hash.cr Outdated Show resolved Hide resolved
@asterite asterite force-pushed the asterite:opt/set-add branch from a9d4fac to 6cd8286 Aug 25, 2019
@rdp

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

store is a bit of an overlapping name with "add" etc. hmm...maybe set_if_absent or something like that?

@asterite

This comment has been minimized.

Copy link
Member Author

commented Aug 26, 2019

I borrowed the name store from Ruby's API. It also exists there though with a slightly different semantic. But I think nobody uses store so it's probably fine to change the meaning (and it's fine either way because we are a separate language).

@jhass

This comment has been minimized.

Copy link
Member

commented Aug 27, 2019

I think the most common name for the "insert into map" operation is put. We could even do put(key, value) #=> old_value besides put(key) { value } #=> old_value and let []= be put(key, value); value.

This also brings me to the somewhat weird behavior of store returning the old value in case of an existing value and the new value in case of an updated value, I think it might be more consistent to return the default value in the latter case, or just always nil (again, in the no existing value case) even.

@asterite

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2019

@jhass Great idea! I like the name put, I think it's present in C# and Java. It's also shorter than store.

I thought about having a put overload returning the old value or nil if it didn't exist. Or maybe the default value if it didn't exist? The problem with that is that the resulting type will always include nil because we can't statically know that there's a default value.

That's why I think a single put overload that gives you a block to return whatever you want might be better.

Maybe we can make it return nil if it was not present. But I'm not sure... I don't know how frequent this method will be used, I just needed something to be able to efficiently implement Set#add? and this is enough.

Let's vote for a name:

  • 👀 put
  • 🚀 store
@jhass

This comment has been minimized.

Copy link
Member

commented Aug 27, 2019

I wouldn't mind put's return value being V? and ignoring the default value, that's already the case for []?.

@asterite

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2019

@jhass Hm, good point. I though []? would return the default value in case it was missing, but I guess it makes sense for it to return nil because one usually knows whether a hash has a default value or not and if it does then one uses [] instead of []?. So I guess put returning the old value or nil makes sense. Once we decide on the name I'll update this PR.

@RX14

This comment has been minimized.

Copy link
Member

commented Sep 7, 2019

Looks like everyone's agreed on put, this can probably be moved forward.

asterite added 3 commits Aug 24, 2019
This lets you store a key-value pair and know whether an old value was
present and what is it. We might be tempted to use a method that returns
the value or `nil` if it wasn't already there but this doesn't work if
values allow `nil` as a valid value. So `store` is in a way the
equivalent of `fetch` but when storing a value.
@asterite asterite force-pushed the asterite:opt/set-add branch from 6cd8286 to e14da91 Sep 8, 2019
@asterite asterite changed the title Add `Hash#store` and optimize `Set#add?` Add `Hash#put` and optimize `Set#add?` Sep 8, 2019
@asterite

This comment has been minimized.

Copy link
Member Author

commented Sep 8, 2019

Renamed! If we merge we can squash the commits or I can rewrite the history, but maybe it's not bad to keep the commit about the decision to rename store to put.

Copy link
Member

left a comment

I don't think wee need the history leading to the result. There is still a reference to the PR and the discussion behind it.

@asterite asterite added this to the 0.31.0 milestone Sep 8, 2019
@asterite asterite merged commit 08ff145 into crystal-lang:master Sep 8, 2019
5 of 6 checks passed
5 of 6 checks passed
ci/circleci: test_preview_mt Your tests failed on CircleCI
Details
ci/circleci: check_format Your tests passed on CircleCI!
Details
ci/circleci: test_darwin Your tests passed on CircleCI!
Details
ci/circleci: test_linux Your tests passed on CircleCI!
Details
ci/circleci: test_linux32 Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@asterite asterite deleted the asterite:opt/set-add branch Sep 8, 2019
@RX14

This comment has been minimized.

Copy link
Member

commented Sep 9, 2019

I'm having second thoughts about this, since I was confused by the name and thought this was a "put this value if it doesn't exist, else do nothing" call, which Hash is missing (but seems useful).

This method should probably be #replace, and have a version without the block which always returns the previous value, in addition to the current version.

@straight-shoota

This comment has been minimized.

Copy link
Member

commented Sep 9, 2019

Java has Map.put(key, value) (equal to Hash#[]=) and HashMap.putIfAbsent(key, value) which only stores if the key does not exist. The counterpart Map.replace(key, value) only stores if the key exists. There is also an overload Map.replace(key, old_value, new_value) which only stores if the key is set to old_value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.