Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should this package be in Base? Or as stdlib? #31

Open
PallHaraldsson opened this issue Sep 28, 2020 · 4 comments
Open

Should this package be in Base? Or as stdlib? #31

PallHaraldsson opened this issue Sep 28, 2020 · 4 comments

Comments

@PallHaraldsson
Copy link

I noticed:

Simultaneously, we are pushing the performance of working with dictionaries to be closer to that of working with arrays.

I'm looking into replacing Dict with OrderedDict (already done successfully in Base) or other such variant.

My work should maybe be independent of yours, unless you know it might be helpful to incorporate it too. I'll look more closely laer at this package.

@andyferris
Copy link
Owner

My original goal here was a prototype for improvements to AbstractDict for Julia 2.0. Changes here are breaking so there would be a need to wait until then.

Midway through that I realised ordered dictionaries were a massive usability improvement, so that got incorporated here also.

The concrete implementation of Base.Dict is almost orthogonal in a sense - so long as we can merge it in a non-breaking way, it could enter Base at a different time (and Julia 1.6 makes a lot of sense to me, also).

And - thanks @PallHaraldsson for your consideration and effort! :)

@PallHaraldsson
Copy link
Author

PallHaraldsson commented Oct 9, 2020

Do you have an idea for what would be the best implementation of ordered Dict to put into Base? OrderedDict should have been a drop-in replacement, but I ran into some trouble. If your is faster anyway, I may look into that one. In either case, LittleDict is faster (while I'm not sure it's been benchmarked against your), so see my idea here of combining:

JuliaCollections/OrderedCollections.jl#57 (comment)

I'm a bit conflicted, should I use the most tested code, OrderedDict (originally Jeff's), or the what we would like to have eventually your or OrderedRobinDict:

JuliaLang/julia#37804 (comment)

Do you feel it's important to have UDict, unordered, so that new code could switch the the old implementation? Python implements their ordered by modifying the old one adding doubly linked lists, so I think time complexity is the same, just memory overhead and slight slowdown.

Do you think Set should stay unordered? The same argument against it could be made, serialization doesn't preserve order (it seems less important for sets), but having ordered doesn't seem super important, as not mathematically meaningful.

@andyferris
Copy link
Owner

It’s kind of nice having Set and Dict so closely related - I would probably try to maintain that.

The code here would be more challenging to integrate with Base. It makes radical departures in APIs and drops support for WeakKeyDict.

My experience with Base (and large software in general) is that it’s easier to iterate than to replace. If I were to try implement what’s here for example, I would be tempted to do the API changes and Dict internal changes separately.

Do we have benchmarks comparing all the implementations? There is some benchmark code here - we could maybe add the Robin dictionary or any hybrid (little/big) dictionary you come up with to the benchmarks?

@PallHaraldsson
Copy link
Author

Hi, it's VERY discouraging to see my PR being closed (after all the work, mostly on other dead-end related PRs), the one working (passing CI with tests), and the issue I made for it, where you answered:

I think the value would be in modifying the behaviour in Base to be better by default rather than in providing more functionality or exporting more types from Base

I didn't have time to respond, and rather do not want to do it now (at JuliaLang). That's what I tried and is a dead-end, is (at least a minorly) braking change, or at least an avenue I couldn't have gotten merged for 1.6 (or anyone prior to 2.0?).

I think I might be able to convince one person at a time, and it might be you: my only concern was if the API of your type here might be better (conceptually), and if ODict would need to not contradict it, so I wouldn't introduce a new type that would also have to be changed later in a braking change, if yours is more ideal.

I'm not blaming you for the issue and PR closed, or anyone really. I can see their point.

What you wrote seems inconsistent with your earlier comment:

If ordered hash maps are inherently slower at insertion/deletion then I'd say Julia might want actually both ordered and unordered versions built in. The ordered one being the default dictionary in Base. The unordered one, being

If ODict is slower, then you would want it in (and I assume you would also want it in if ordered is faster)? But you mean you want it in only as Dict (a breaking change)? Actually ODict is likely 2x faster (at least Google's design, I didn't benchmark much the actual translated SwissDict from C++ code).

I do see ODict replacing Dict later, just not now (while the original plan was right away, not using it as an opt-in). Getting an ordered dict in, any ordered, seemed time critical for 1.6, why I put in a ton of work investigating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants