Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Replaced Dict with OrderedDict #37804

Closed

Conversation

PallHaraldsson
Copy link
Contributor

@PallHaraldsson PallHaraldsson commented Sep 29, 2020

(from OrderedCollections.jl/DataStructures.jl).

Fixes: #34265 See also my argument: #37761 (comment)

[My first real PR, or I mean the first where I had to recompile Julia.]

Added also other variants, e.g. OrderedRobinDict, for now disabled.
UDict is the old unordered (or unpredictable order) Dict. It is for now
NOT exported, unclear it should be as there are better replacements
e.g. SwissDict based on Google's Swiss Table.

I would like packageeval run on this.

I know there are no tests (they are in the packages I took the code from). The 2x slowdown isn't real. I.e. starting Julia where I have Revise and OhMyREPL with @time

0.948864 seconds (552.62 k allocations: 34.550 MiB)
0.856972 seconds (747.65 k allocations: 48.796 MiB, 2.50% gc time)

vs. Commit a0a68a5 (14 days old master)

0.592275 seconds (635.56 k allocations: 40.700 MiB, 1.61% gc time)
0.686102 seconds (668.09 k allocations: 44.774 MiB, 2.82% gc time)

is because of precompilation issues. Note, however former has fewer allocations, so I'm optimistic about the speed (for those, after fixing packages themselves). I looked into it and at least OhMyREPL needs more statements, independent of this, and with this a bit more.

By adding precompile statements to Revise I got slowdown limited to 10%:

$ ./julia --startup-file=no -e "@time using Revise"
  0.796831 seconds (716.50 k allocations: 47.621 MiB, 1.93% gc time)
vs.
  0.721835 seconds (661.93 k allocations: 45.353 MiB, 2.18% gc time)

There are tricks possible to make OrderedDict much faster, but not in time for Julia 1.6.0 (faster than Dict is now, for this/these cases). Startup of Julia itself is 174.5 ms vs. 162.2, extra 12 ms or 7% [EDIT: no longer slower in to come PR], likely could be faster than with Dict as it is now. See: JuliaCollections/OrderedCollections.jl#57 (comment)

…ures.jl).

Added also other variants, e.g. OrderedRobinDict, for now disabled.
UDict is the old unordered (or unpredictable order) Dict. It is for now
exported, but unclear it should be as there are better replacements
e.g. SwissDict based on Google's Swiss Table.
@PallHaraldsson
Copy link
Contributor Author

Note, only real added file is base/ordered_dict.jl and just Dict->UDict renaming in dict.jl, so can ignore the other files. Helpful to merge the other disabled variants(?), and RobinDict, in case people want to experiment with it in base.

@PallHaraldsson
Copy link
Contributor Author

The only CI fail, whitespace (well plus now "makedocs") is I guess:

# Types
UDict,
OrderedRobinDict,
RobinDict,
    AbstractChannel,
    AbstractChannel,

for me, I wanted it to stand out, as in some sense WIP. Interesting that you can exporst undefined stuff (only UDict of those is really defined).

@JeffBezanson
Copy link
Sponsor Member

We are not going to dump a whole slew of new dictionary types into Base. It would help your case if the PR did only as described, and just made Dict ordered, but it's still very uncertain whether we will do this.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Sep 29, 2020

People asked for not just replacing, but also having the old Dict. I guess that's reasonable, and first with replaced it affectesd types dependent on it e.g. IdDict WeakKeyDict, but then wasn't sure, so decided to only replace Dict, making sure those other previously dependent types are still unordered.

It's very easy to drop the other files, they are even not included (except for old Dict support). I was just working on different implementation of OrderedDict too compare, e.g. OrderedRobinDict, and if anyone would like to compare then I have the files in. We would drop them later, before Julia 1.6 release.

This shouldn't stop packageeval?

@StefanKarpinski
Copy link
Sponsor Member

This PR isn't building, let alone passing CI, never mind being able to do a PkgEval run.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Sep 29, 2020

[Outdated, see my next comment.]

It's (only?) false alarms:

A.
Unrelated to this PR:

cp: cannot stat 'dist-extras/7z.*': No such file or directory

B.

ERROR: LoadError: `makedocs` encountered errors. Terminating build

The new Dict works for me, I compiled locally (first commit), with make, wasn't aware of doc issue, seemingly this:

help?> Dict
[..]
  Base.Dict is of type UnionAll.

I guess because of my: const Dict = OrderedDict [EDIT: gone in latest commit.]

Is it ok to not export UDict from Base (intentionally excluding users of Julia), does it interfere with using it internally? Maybe that broke something in my subsequent commit?

@PallHaraldsson
Copy link
Contributor Author

Linux CI compete (strange as I see "trigger 0 builds, 1 pending builds").

@KristofferC
Copy link
Sponsor Member

Linux CI compete (strange as I see "trigger 0 builds, 1 pending builds").

The package phase completed but the tests fail, see e.g. https://build.julialang.org/#/builders/34/builds/4250/steps/5/logs/stdio.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Sep 30, 2020

The problem seems to be "unbound type parameters" in Method (really "UndefVarErrors"?), I'm just not sure what that means. See details on error below.

Sets seem to work however, with Dict and UDict:

julia> Base.UDict("A" => 1, "B" => 2)
Base.UDict{String, Int64} with 2 entries:
  "B" => 2
  "A" => 1

julia> Set(ans)
Set{Pair{String, Int64}} with 2 elements:
  "B" => 2
  "A" => 1

vs.

julia> Dict("A" => 1, "B" => 2)
Dict{String, Int64} with 2 entries:
  "A" => 1
  "B" => 2

julia> Set(ans)
Set{Pair{String, Int64}} with 2 elements:
  "A" => 1
  "B" => 2

Error in last test, be e.g. not in similar test higher up:

# Test that Core and Base are free of UndefVarErrors
# not using isempty so this prints more information when it fails
@testset "detect_unbound_args in Base and Core" begin
    # TODO: review this list and remove everything between test_broken and test
    let need_to_handle_undef_sparam =
            Set{Method}(detect_unbound_args(Core; recursive=true))
        pop!(need_to_handle_undef_sparam, which(Core.Compiler.eltype, Tuple{Type{Tuple{Any}}}))
        @test_broken need_to_handle_undef_sparam == Set()
        pop!(need_to_handle_undef_sparam, which(Core.Compiler._cat, Tuple{Any, AbstractArray}))
        pop!(need_to_handle_undef_sparam, first(methods(Core.Compiler.same_names)))
        @test need_to_handle_undef_sparam == Set()
    end
    let need_to_handle_undef_sparam =
            Set{Method}(detect_unbound_args(Base; recursive=true))
        pop!(need_to_handle_undef_sparam, which(Base._totuple, (Type{Tuple{Vararg{E}}} where E, Any, Any)))
        pop!(need_to_handle_undef_sparam, which(Base.eltype, Tuple{Type{Tuple{Any}}}))
        pop!(need_to_handle_undef_sparam, first(methods(Base.same_names)))
        @test_broken need_to_handle_undef_sparam == Set()
        pop!(need_to_handle_undef_sparam, which(Base._cat, Tuple{Any, AbstractArray}))
        pop!(need_to_handle_undef_sparam, which(Base.byteenv, (Union{AbstractArray{Pair{T,V}, 1}, Tuple{Vararg{Pair{T,V}}}} where {T<:AbstractString,V},)))
        pop!(need_to_handle_undef_sparam, which(Base.float, Tuple{AbstractArray{Union{Missing, T},N} where {T, N}}))
        pop!(need_to_handle_undef_sparam, which(Base.zero, Tuple{Type{Union{Missing, T}} where T}))
        pop!(need_to_handle_undef_sparam, which(Base.one, Tuple{Type{Union{Missing, T}} where T}))
        pop!(need_to_handle_undef_sparam, which(Base.oneunit, Tuple{Type{Union{Missing, T}} where T}))
        @test need_to_handle_undef_sparam == Set()
    end
end

Does it mean I need to export the new type UDict from Base, for Julia itself to work? [I tried get same error.]

It may be this line (nr 52 in ordered_dict.jl implicated in test failure):

Dict(kv::Tuple{Vararg{Pair{K,V}}}) where {K,V}  = Dict{K,V}(kv)

seen by running the tests above from the REPL:

detect_unbound_args in Base and Core: Test Failed at REPL[2]:23
  Expression: need_to_handle_undef_sparam == Set()
   Evaluated: Set(Method[Dict(kv::Tuple{Vararg{Pair{K, V}, N} where N}) where {K, V} in Base at ordered_dict.jl:52]) == Set{Any}()
Stacktrace:
 [1] top-level scope
   @ REPL[2]:23
 [2] top-level scope
   @ ~/myjulia/julia/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1113
 [3] top-level scope
   @ REPL[2]:3
Test Summary:                        | Pass  Fail  Broken  Total
detect_unbound_args in Base and Core |    1     1       2      4
ERROR: Some tests did not pass: 1 passed, 1 failed, 0 errored, 2 broken.

I just took the (working?!) code from elsewhere, and just renamed OrderedDict to Dict.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Sep 30, 2020

I don't know about those test if false alarm or code broken, but I now get 157.5 ms startup of Julia (with change not in PR), faster than my best so far on master 162.2 ms. And also faster than with my current PR (with additional precompile for Revise), while still 12% slower:

$ ./julia --startup-file=no -e "@time using Revise"
  0.784178 seconds (747.22 k allocations: 49.842 MiB, 1.95% gc time)

vs.
$ julia --startup-file=no -e "@time using Revise"
  0.698489 seconds (672.96 k allocations: 46.114 MiB, 2.33% gc time)

[I didn't improve Revive startup speed by much, but as my precompile statements improved unchanged Revise more on my unchanged Julia master (despite now more allocations), now 12% difference. Also since my downloaded Julia master is isn't brandnew, is "Commit a0a68a5 (15 days old master)" master may have regressed and not because of my code.]

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Oct 18, 2020

I could silence the test error by commenting out this one line:

# problematic line: OrderedDict(kv::Tuple{Vararg{Pair{K,V}}}) where {K,V}  = OrderedDict{K,V}(kv)

OrderedDict(ps::Pair...)                  = OrderedDict(ps)

Is the former redundant with the latter? As of Julia 1.6 only? It there still time to get this in Julia 1.6?

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Oct 19, 2020

The docss test is a false alarm kind of, but the API for OrderedDict isn't the same so I need to look into e.g.:

A.

Error in testset Random:
Error During Test at /usr/home/julia/buildbot/worker-tabularasa/tester_freebsd64/build/share/julia/test/testdefs.jl:22
  Got exception outside of a @test
  LoadError: MethodError: no method matching isslotfilled(::Dict{Int64, Nothing}, ::Int64)
  Closest candidates are:
    isslotfilled(!Matched::Base.UDict, ::Int64) at dict.jl:172

B.

Error During Test at /buildworker/worker/tester_linux64/build/share/julia/test/asyncmap.jl:6
  Test threw exception
  Expression: allunique(asyncmap((x->begin
                sleep(1.0)
                objectid(current_task())
            end), 1:10))
  MethodError: no method matching ht_keyindex2!(::Dict{UInt64, Nothing}, ::UInt64)
  Closest candidates are:
    ht_keyindex2!(!Matched::Base.UDict{K, V}, ::Any) where {K, V} at dict.jl:305
  Stacktrace:
   [1] allunique(C::Vector{UInt64})
     @ Base ./set.jl:0
   [2] top-level scope
     @ /buildworker/worker/tester_linux64/build/share/julia/test/asyncmap.jl:6
   [3] include(mod::Module, _path::String)
     @ Base ./Base.jl:392
   [4] include
     @ /buildworker/worker/tester_linux64/build/share/julia/test/testdefs.jl:13 [inlined]
   [5] macro expansion
     @ /buildworker/worker/tester_linux64/build/share/julia/test/testdefs.jl:25 [inlined]
   [6] macro expansion
     @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1113 [inlined]
   [7] macro expansion
     @ /buildworker/worker/tester_linux64/build/share/julia/test/testdefs.jl:24 [inlined]
   [8] macro expansion
     @ ./timing.jl:310 [inlined]
   [9] top-level scope
     @ /buildworker/worker/tester_linux64/build/share/julia/test/testdefs.jl:22

C.
And fix tests, that assume a order:
Evaluated: "Set([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)])" == "Set([(3.0, 3.0), (2.0, 2.0), (1.0, 1.0)])"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make Dict ordered?
4 participants