Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordered Dict based on RobinDict implementations #38155

Conversation

PallHaraldsson
Copy link
Contributor

No description provided.

@JeffBezanson
Copy link
Sponsor Member

Please stop opening PRs like this. Do you really expect somebody to just click "merge" on a large PR with zero description? A change this significant, in particular, would need a detailed proposal and benchmarks.

@PallHaraldsson PallHaraldsson deleted the ordered_dict_based_on_robin branch October 23, 2020 22:29
@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Oct 23, 2020

Sorry about that, no I was trying out a different implementation, and wasn't expecting anyone to even look until tests passed. I'm having trouble testing on my local machine with them slowing down so much. I actually thought this one would work, then would have filled out.

@oscardssmith
Copy link
Member

Note that you can mark PRs as drafts to signal to people that you know it isn't ready and are just experimenting.

@PallHaraldsson
Copy link
Contributor Author

Thanks, I'm just new to this, and have now done it to #38145, but it seems too late here (and maybe would be unlikely to be merged anyway; at least I can no longer see the CI errors). Anyway, I'm not sure I'll finish this, it's getting very frustrating.

@JeffBezanson
Copy link
Sponsor Member

You can put in the description something like "Just opening this to see if CI passes". Then we will know what's going on.

@KristofferC
Copy link
Sponsor Member

Anyway, I'm not sure I'll finish this, it's getting very frustrating.

I can understand that. However, I must say that it doesn't seem like you appreciate the magnitude of the change you are trying to make. You are trying to replace a core data structure which is used everywhere and has very high requirements on its performance. In addition, it introduces a new concept into Base (the concept of a dict being ordered). A PR like that would require extensive benchmarks, a discussion about the trade-offs, potential ways it might break code, how other data structures like Set are influenced etc etc. It might have been a gold idea to discuss the idea and make sure enough people are on board with the idea itself before starting fighting with CI.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Oct 24, 2020

I do realize "the magnitude of the change", by now... :) and my new proposal avoids all those issues, by not replacing Dict, the extensive benchmarks are not needed, this wouldn't break any code, since a new type, wouldn't change Set either, and I also think Dict is often not a performance bottleneck.

@StefanKarpinski
Copy link
Sponsor Member

But adding a new Base type is an even bigger change.

@PallHaraldsson
Copy link
Contributor Author

Why "even bigger change"? I guess we should rather continue the discussion at my proposal #38163, as the implementation hasn't been decided (i.e. maybe not based on [Ordered]RobinDict, this failed PR).

The size of the PR is similar or smaller, what I meant is it's way easier to get through CI.

@JeffBezanson
Copy link
Sponsor Member

There is no reason to simply move the code for OrderedDict (which already exists) into Base. The change (if any) to consider here is whether Base.Dict should be ordered.

@timholy
Copy link
Sponsor Member

timholy commented Oct 26, 2020

@PallHaraldsson, I think part of what people are saying is that there seems to be a mismatch between the seriousness of the change you're proposing and the depth of justification, analysis, and implementation you're providing. All the code you're introducing to base Julia already exists and is available to any Julian in the world---there is no new functionality that your PRs enable that isn't already achievable, and there are precious few advantages to having it in Base over a package. The only question is what should the "default" implementation in Julia be? This is a question that touches on many issues and requires serious benchmarking, profiling, and use-case analysis. None of this work has appeared in any of your PRs; your justifications are essentially quotes pulled from discourse. Reading what the community says and trying to take action to address their concerns is noble, and I endorse the spirit with which you're doing this work (I know you are genuinely trying to make Julia better), but the reason it hasn't been done yet is that it's a lot more complicated than copying code in OrderedCollections.jl and DataStructures.jl and pasting it here.

How to do this more productively? Here are examples of some questions that need to be addressed: are there bugs in the existing implementation? What kind of code gets easier to write if we make them ordered, and are there examples where things get more confusing? (For example, ordering seems to lead to periodic requests to support indexing by order-of-insertion, but this sets up difficult-to-resolve questions: JuliaCollections/OrderedCollections.jl#64. The absence of what seems to be natural functionality can cause additional confusion that the existing implementation does not.) What kinds of workloads do we want to optimize for, which will get faster, and which will get slower? Where are the performance bottlenecks in various implementations? How do changes affect both compile-time latency and runtime performance? There is essentially no benefit to submitting PRs such as these without such analysis.

The other issue to keep in mind is that there are nearly 1000 watchers of this repository, and they get an email or notification for each commit or comment you make. Too many "trial balloons" and people could start feeling like they are getting spammed. Likewise, 50 posts each reporting one additional tweak or tiny chance in a benchmark would also be annoying; much better to spend several days developing a whole suite and then posting a serious analysis of the results.

I'd suggest that you abandon the notion of getting this change in for Julia 1.6, and focus instead on building a more sustainable path for your contributions to Julia: choose a simpler starting project, start building familiarity with the tools you need to analyze code and its weaknesses, and develop an area of expertise where you deeply understand the issues and tradeoffs that affect design. From there, you can slowly grow your "portfolio" until you arrive at the issues (like core data structures and compile latency) that you seem to find interesting but which have high demands in terms of making contributions. In addition to Julia itself, there are many packages where you could surely gain this kind of expertise, and the fact that some packages are not "under the microscope" in the way that Julia itself is may make it easier for you to iterate through the cycle of problem identification, proposed fix, and seeing your changes merged. The more times you get through this cycle, the faster you learn, and if you start with a less high-profile project than the language itself, you may find that your learning curve is steeper and gets you faster to where you want to be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants