Custom iterator breaks on last row of a readdlm-loaded array #3

Open
loleg opened this issue Sep 30, 2018 · 5 comments · May be fixed by JuliaLang/julia#29458

@loleg

loleg commented Sep 30, 2018

When using a custom iterator over an array created with `readdlm`, the last row is broken in Julia 1.0. (Yes, I did test this in 0.7, with no warnings.) This behavior looks like a bug to me and may be related to JuliaLang/julia#28763.

```julia
using DelimitedFiles # for readdlm

mutable struct mytbl # a custom object
    source
    function mytbl(csvdata::Base.GenericIOBuffer)
        source = readdlm(csvdata, ',')
        source = convert(Array, source[2:end,:]) # drop the header row
        new(source)
    end
end

Base.length(it::mytbl) = size(it.source, 1)

function Base.iterate(it::mytbl, (el, i)=(it.source[1,:], 1))
    return i >= length(it) ? nothing : (el, (it.source[i + 1,:], i + 1))
end

# some sample data
TABLE_CAST = """id,height,age,name,occupation
1,10.0,1,string1,2012-06-15 00:00:00
2,10.1,2,string2,2013-06-15 01:00:00
"""

table = mytbl(IOBuffer(TABLE_CAST))

[ row for row in table ]
# Returns: the last element is #undef
# Expected: the last element is the second row

[ row[1] for row in table ]
# Returns: [1, 239486056]
# Expected: [1, 2]

[ row[2] for row in table ]
# Returns: [10.0, 1.18e-315]
# Expected: [10.0, 10.1]
```

I've also tested this with more rows; the last row is always... well, broken.

Further discussion and a workaround are at https://discourse.julialang.org/t/last-row-is-broken-when-iterating-an-array/15696
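A quick way to see the mismatch is to drive the iteration protocol by hand and count how many elements `iterate` actually yields, compared with what `length` declares. This is a self-contained sketch: `RowIter` is a hypothetical stand-in for `mytbl` that wraps a plain `Matrix{Any}` instead of `readdlm` output, and `count_iterated` is a hypothetical helper; the `iterate` method copies the state logic from the report above.

```julia
# Hypothetical stand-in for `mytbl`: wraps a plain matrix, no readdlm needed.
struct RowIter
    source::Matrix{Any}
end

Base.length(it::RowIter) = size(it.source, 1)

# Same state logic as the iterate method in the report:
# the state carries the *next* row and its index.
function Base.iterate(it::RowIter, (el, i)=(it.source[1, :], 1))
    return i >= length(it) ? nothing : (el, (it.source[i + 1, :], i + 1))
end

# Hypothetical helper: call iterate repeatedly and count the yielded elements.
function count_iterated(it)
    n = 0
    nxt = iterate(it)
    while nxt !== nothing
        _, state = nxt
        n += 1
        nxt = iterate(it, state)
    end
    return n
end

it = RowIter(Any[1 10.0; 2 10.1])
length(it)          # 2 (declared)
count_iterated(it)  # 1 (actually yielded)
```

With two rows, `length` declares 2 but only one element comes out, so anything that preallocates from `length` (as `collect` does for `HasLength` iterators) is left with an unfilled last slot.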

@KristofferC
Collaborator

KristofferC commented Sep 30, 2018

Isn't this just a faulty `iterate` definition? `length` declares `size(it.source, 1)` elements, but `iterate` returns `nothing` after only `size(it.source, 1) - 1` of them. Using something like

```julia
function Base.iterate(it::mytbl, i=1)
    return i > length(it) ? nothing : (it.source[i, :], (i+1))
end
```

we get

```julia-repl
julia> [ row for row in table ]
2-element Array{Array{Any,1},1}:
 [1, 10.0, 1, "string1", "2012-06-15 00:00:00"]
 [2, 10.1, 2, "string2", "2013-06-15 01:00:00"]

julia> [ row[1] for row in table ]
2-element Array{Int64,1}:
 1
 2

julia> [ row[2] for row in table ]
2-element Array{Float64,1}:
 10.0
 10.1
```
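For reference, here is the suggested fix applied to a self-contained stand-in. `FixedRowIter` is hypothetical (it wraps a plain `Matrix{Any}` rather than `readdlm` output); the `iterate` method is the integer-state version from the comment above, and the `eltype` declaration is an optional extra, not part of the suggestion.

```julia
# Hypothetical stand-in for `mytbl` using the integer-state iterate.
struct FixedRowIter
    source::Matrix{Any}
end

Base.length(it::FixedRowIter) = size(it.source, 1)

# Optional: lets `collect(it)` allocate a Vector{Vector{Any}} instead of Vector{Any}.
Base.eltype(::Type{FixedRowIter}) = Vector{Any}

# The state is the index of the next row to yield; stop once it passes length(it).
function Base.iterate(it::FixedRowIter, i::Int=1)
    return i > length(it) ? nothing : (it.source[i, :], i + 1)
end

it = FixedRowIter(Any[1 10.0; 2 10.1])
rows = [row for row in it]        # both rows, no #undef slot
firsts = [row[1] for row in it]   # [1, 2]
```

Because the number of elements `iterate` yields now matches `length`, the preallocated result is filled completely.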

@loleg
Author

loleg commented Sep 30, 2018

So I suppose there are two things going on here. First, instead of iteration simply stopping (or raising an error), the last item returned is a weird undefined row that shows strange values when accessed by index. Second, the documentation on iterators could use better examples, like yours, @KristofferC.

@nalimilan
Member

Maybe the array comprehension code could check that the number of collected elements matches the declared length for HasLength/HasShape iterators? That would catch errors like this.
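The length check only matters for `HasLength`/`HasShape` iterators, because those are the ones `collect` preallocates for; `SizeUnknown` iterators are collected by growing the result instead. As a hedged illustration, the hypothetical `UnsizedRows` below keeps the report's off-by-one `iterate` but declares `Base.SizeUnknown()`, so the result comes out one row short but with every slot defined.

```julia
# Hypothetical iterator with the same off-by-one state logic as the report,
# but it opts out of HasLength, so no length is trusted for preallocation.
struct UnsizedRows
    source::Matrix{Any}
end

Base.IteratorSize(::Type{UnsizedRows}) = Base.SizeUnknown()

function Base.iterate(it::UnsizedRows, (el, i)=(it.source[1, :], 1))
    return i >= size(it.source, 1) ? nothing : (el, (it.source[i + 1, :], i + 1))
end

# The comprehension grows its result as elements arrive.
rows = [row for row in UnsizedRows(Any[1 10.0; 2 10.1])]
```

Here `rows` holds only the first row: still wrong, but visibly short rather than padded with garbage.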

@JeffBezanson
Contributor

Yes, this is very similar to JuliaLang/julia#28763. Adding a check for iterators that yield fewer elements than their declared length sounds good; that at least fixes the problem in one direction.
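The proposed check can be sketched outside Base as an eager collect that preallocates from `length` and then verifies how many elements actually arrived. This is not how `Base.collect` is implemented; `checked_collect` and `ShortIter` are hypothetical names, and the "fewer" error text simply mirrors the message reported for JuliaLang/julia#29458 further down. In a simple loop like this one, checking the other direction (too many elements) is also cheap, so the sketch does both.

```julia
# Hypothetical eager collect: trusts length(it) for preallocation,
# then errors if the iterator disagrees with its declared length.
function checked_collect(it)
    n = length(it)
    out = Vector{Any}(undef, n)   # preallocated slots, like collect for HasLength
    got = 0
    for x in it
        got += 1
        got > n && error("iterator returned more elements than its declared length")
        out[got] = x
    end
    got < n && error("iterator returned fewer elements than its declared length")
    return out
end

# Hypothetical iterator with the same kind of bug: declares 3 elements, yields only 1 and 2.
struct ShortIter end
Base.length(::ShortIter) = 3
Base.iterate(::ShortIter, i=1) = i >= 3 ? nothing : (i, i + 1)

ok = checked_collect(1:3)   # Any[1, 2, 3]
err = try
    checked_collect(ShortIter())
    nothing
catch e
    e                        # the "fewer elements" ErrorException
end
```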

@nalimilan
Member

JuliaLang/julia#29458 catches this:

```julia-repl
julia> [ row for row in table ]
ERROR: iterator returned fewer elements than its declared length
Stacktrace:
 [1] collect_to!(::Array{Array{Any,1},1}, ::Base.Generator{mytbl,getfield(Main, Symbol("##3#4"))}, ::Int64, ::Tuple{Array{Any,1},Int64}) at ./array.jl:690
 [2] collect_to_with_first!(::Array{Array{Any,1},1}, ::Array{Any,1}, ::Base.Generator{mytbl,getfield(Main, Symbol("##3#4"))}, ::Tuple{Array{Any,1},Int64}) at ./array.jl:663
 [3] collect(::Base.Generator{mytbl,getfield(Main, Symbol("##3#4"))}) at ./array.jl:644
 [4] top-level scope at none:0

julia> [ row[1] for row in table ]
ERROR: iterator returned fewer elements than its declared length
Stacktrace:
 [1] collect_to!(::Array{Int64,1}, ::Base.Generator{mytbl,getfield(Main, Symbol("##5#6"))}, ::Int64, ::Tuple{Array{Any,1},Int64}) at ./array.jl:690
 [2] collect_to_with_first!(::Array{Int64,1}, ::Int64, ::Base.Generator{mytbl,getfield(Main, Symbol("##5#6"))}, ::Tuple{Array{Any,1},Int64}) at ./array.jl:663
 [3] collect(::Base.Generator{mytbl,getfield(Main, Symbol("##5#6"))}) at ./array.jl:644
 [4] top-level scope at none:0
```

4 participants