Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordered dict based on SwissDict #38167

Conversation

PallHaraldsson
Copy link
Contributor

@PallHaraldsson PallHaraldsson commented Oct 24, 2020

Fixes #38163 (not 38135 that got here by accident...).

See this interesting talk, why I chose this implementation:

https://www.youtube.com/watch?v=ncHmEUmJZf4&t=3s

This talk describes the process of design and optimization that starts with std::unordered_map and ends with a new design we call "SwissTable", a 2-level N-way associative hash table. Our implementation of this new design gets 2-3x better performance with significant memory reductions (compared to unordered_map) and is being broadly deployed across Google.

https://abseil.io/blog/20180927-swisstables

[EDIT: Attribution: Code is modified code from @eulerkochy's taken from JuliaCollections/DataStructures.jl.]

@KristofferC
Copy link
Sponsor Member

Testing if CI ok with it. And it seems to be, with "1 failing, 3 pending, and 12 successful checks".

I mean, of course it is since the new Dict you added is not used anywhere.

@PallHaraldsson PallHaraldsson marked this pull request as ready for review October 25, 2020 21:14
@PallHaraldsson
Copy link
Contributor Author

It passes now, with tests, except for a false alarm for freebsd:

command timed out: 3600 seconds without output running [b'sh', b'-c', b'gmake -j6 VERBOSE=1 TAGGED_RELEASE_BANNER="Official https://julialang.org/ release" SRCCACHE=/tmp/srccache USECCACHE=1 JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)" MARCH=x86-64  LLVM_ASSERTIONS=1 FORCE_ASSERTIONS=1 USE_BINARYBUILDER=1  release'], attempting to kill
signal (15): Terminated
in expression starting at compiler/bootstrap.jl:8

@KristofferC
Copy link
Sponsor Member

I am sorry but I am going to close this. There is no desire to have a second implementation of a dictionary with the only difference about it being ordered. People who want such a dictionary can just get the package.

@tpapp
Copy link
Contributor

tpapp commented Oct 26, 2020

The code has significant overlap with

https://github.com/JuliaCollections/DataStructures.jl/blob/9f3295869aa976309ef94947029e404c5970dd18/src/swiss_dict.jl

without attribution. @PallHaraldsson, can you please explain? Wording of the PR implies that this is your work, eg

See this interesting talk, why I chose this implementation:

Do you happen to be the same person as @eulerkochy, who implemented this as JuliaCollections/DataStructures.jl#634?

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Oct 26, 2020

without attribution [..] can you please explain?

Yes. This is an important matter to answer, otherwise I wasn't going to (do more at JuliaLang regarding Dicts or) comment on this PR, as it was closed, and made clear it will not be merged, so this is a moot point?

"based on SwissDict" (i.e. Julia code), is/was my imperfect attribution, while I informed the author "I'm copying your implementation to Base" at: JuliaCollections/DataStructures.jl#693 after I started this PR here, before I took Draft status off.

No, I'm not @eulerkochy, I wish! We've corresponded on benchmarking elsewhere prior, where he actually responded (and explained he's busy).

I can see my language confused, while I quoted Google's "SwissTable" that's not the same as the Julia code term SwissDict (I'm not aware that Google uses that term), for the code I used, implying that I reused that Julia code, not me rewriting the C++ code. The code here (his code, mostly) is a reimplementation of "new “Swiss Table” family of hashtables in Abseil", I'm assuming following that design.

I should have been more careful (while mens rea doesn't apply), with taking the Draft status off, or even before, adding precise attribution. I did put in "(from OrderedCollections.jl/DataStructures.jl)" at the very top on my first PR in the series of Dict PRs... #37804. I did NOT specifying a person in any of the cases. I believe I tracked down @oxinabox as the (only?) writer (and yes, we are also different persons) for OrderedDict. So I'm sorry, to you too if this wasn't kosher.

I wouldn't say I'm not taking the MIT license seriously, "only requiring preservation of copyright", and I note their names aren't listed in the files I copied from, thinking you can copy from one MIT licenced project to another MIT licensed (Julia).

If I didn't follow protocol, then please inform me, so I can do better.

[@eulerkochy deserves all the credit for the implementation for SwissDict, and of course Google for the design. It's been said this is "just copy-paste a Dict implementation from a package", so was at least clear to some, while I did a bit more, didn't only rename, I added stuff at the bottom of the file.]

@tpapp
Copy link
Contributor

tpapp commented Oct 27, 2020

If I didn't follow protocol, then please inform me, so I can do better.

Generally, when you copy a large body of code from some source, it is common courtesy to be very explicit about that. Not only for legal and ethical reasons; it also informs the readers of the PR that we are talking about code that has already been reviewed and tested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ODict to Base for OrderedDict
3 participants