Skip to content
This repository has been archived by the owner on May 5, 2019. It is now read-only.

Stack should use similar_nullable, not NullableArray #54

Merged
merged 3 commits into from
Apr 23, 2017
Merged

Stack should use similar_nullable, not NullableArray #54

merged 3 commits into from
Apr 23, 2017

Conversation

cjprybol
Copy link
Contributor

@cjprybol cjprybol commented Apr 13, 2017

  • Adds a test set to trigger multiple entries in unstack warning
  • Adds a similar_nullable function for NullableCategoricalArrays
  • Tests to make sure column types are converted to their appropriate nullable type.

@@ -98,8 +98,8 @@ function stack(dt::AbstractDataTable, measure_vars::Vector{Int}, id_var::Int;
end
function stack(dt::AbstractDataTable, measure_var::Int, id_vars::Vector{Int};
variable_name::Symbol=:variable, value_name::Symbol=:value)
stackdt(dt, [measure_var], id_vars;
variable_name=variable_name, value_name=value_name)
stack(dt, [measure_var], id_vars;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local tests say no, so it must be something I hadn't yet reverted. I'm not sure why this is the only stack function calling stackdt, but I'll investigate that when I update the indexing of stackdt's custom array views.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message shouldn't mention this change now that it's not included.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Could you add a test for the incorrect type that you fixed?

@@ -341,4 +341,21 @@ module TestDataTable
@test find(c -> isa(c, NullableCategoricalArray), categorical!(DataTable(A=1:3, B=4:6), :A).columns) == [1]
@test find(c -> isa(c, NullableCategoricalArray), categorical!(DataTable(A=1:3, B=4:6), [1]).columns) == [1]
@test find(c -> isa(c, NullableCategoricalArray), categorical!(DataTable(A=1:3, B=4:6), 1).columns) == [1]

@testset "multiple entries in unstack warnings" begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great but should go to a separate commit since it's not related to the above changes AFAICT.

@cjprybol
Copy link
Contributor Author

Updated. I tagged on a commit at the end to fix the deprecation warnings for similar_nullable as the new tests triggered the depwarn.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this exercizes the code path, but doesn't test that warnings are thrown, right? The commit message could explain this. I don't know how to catch warnings, but ideally we would check that they are thrown as expected.

@@ -15,6 +15,9 @@ similar_nullable{T,R}(dv::CategoricalArray{T,R}, dims::@compat(Union{Int, Tuple{
similar_nullable(dt::AbstractDataTable, dims::Int) =
DataTable(Any[similar_nullable(x, dims) for x in columns(dt)], copy(index(dt)))

similar_nullable{T,R}(dv::NullableCategoricalArray{T,R}, dims::Union{Int, Tuple{Vararg{Int}}}) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be more logical to add this definition right after that for CategoricalArray.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AbstractDataTable function that separates the two will disappear in #44. I thought adding the new function to the end of the set like this would minimize the risk of a merge conflict, although that's purely speculative. I can reverse them if you'd like, but I'm not sure it will matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but it's always better to have commits be correct when taken in isolation.

udt = unstack(long, :variable, :value)
@test udt == DataTable(Any[collect(1:12), repeat(1:3, inner = 4),
repeat(1:4, inner = 3),
Nullable{Float64}.(linspace(0, 1, 12)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just Nullable. is enough. Or even NullableArray since that's a more common structure to use.

@testset "unstack nullable promotion" begin
long = DataTable(Any[repeat([:c, :d], inner=12),
vcat(linspace(0, 1, 12), linspace(2, 1, 12)),
repeat(1:12, outer=2), repmat(repeat(1:3, inner=4), 2),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Only applies if you keep this.) Simply repeat(1:3, inner=4, outer=2). Same below.

@@ -342,6 +342,40 @@ module TestDataTable
@test find(c -> isa(c, NullableCategoricalArray), categorical!(DataTable(A=1:3, B=4:6), [1]).columns) == [1]
@test find(c -> isa(c, NullableCategoricalArray), categorical!(DataTable(A=1:3, B=4:6), 1).columns) == [1]

@testset "unstack nullable promotion" begin
long = DataTable(Any[repeat([:c, :d], inner=12),
vcat(linspace(0, 1, 12), linspace(2, 1, 12)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you find a simpler example which doesn't involve repeat nor linspace? Since the goal here is to test promotion, focus on testing types rather than contents. BTW, types are not tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this example from the unstack docstring, replacing rand with linspace to avoid srand and random number issues. I've updated it a bit to make it a little simpler. The two tested types here are converting standard arrays to NullableArrays and CategoricalArrays to NullableCategoricalArrays, with the latter being the primary issue of the current master implementation. Other types are tested in assorted places where other calls to unstack are made.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this example from the unstack docstring, replacing rand with linspace to avoid srand and random number issues.

Yes, but there's no reason to use such a complicated example here. Just a handful a values are enough to test that promotion works.

The two tested types here are converting standard arrays to NullableArrays and CategoricalArrays to NullableCategoricalArrays,

But you should explicitly test for column types of the result. The current code only checks that it compares equal, which can happen with any nullable vector, not just NullableCategoricalArray (and in the future, maybe even with a non-nullable vector with the new Nullable implementation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

NullableCategoricalArray(linspace(2, 1, 12))],
[:id, :a, :b, :c, :d])
@test unstack(long) ==
unstack(long, :id, :variable, :value) ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect indentation.

dt = DataTable(id=[1, 2, 1, 2], variable=["a", "b", "a", "b"], value=[3, 4, 5, 6])
a = unstack(dt, :id, :variable, :value)
b = unstack(dt, :variable, :value)
@test a == b == DataTable(id = Nullable[1, 2], a = Nullable[5, Nullable()], b = Nullable[Nullable(), 6])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nullable[ isn't needed for the two last uses and is actually even counter-productive here since the array will have an abstract element type. Elsewhere, use Nullable{Int}.

@ararslan
Copy link
Member

On 0.6 you can test warnings with @test_warn (IIRC). That may have been ported to BaseTestNext, in which case you could use that.

@cjprybol
Copy link
Contributor Author

Thanks for the suggestion, Alex! Didn't know about @test_warn and @test_nowarn, but I'm glad they are in 0.6! I've updated the tests to check if those macros are defined and if so, the warnings are checked. Let's see if the CI tests for 0.5 complain or if that will work.

I couldn't find anything in BaseTestNext or compat to extend supporting those macros into v0.5


@testset "duplicate entries in unstack warnings" begin
dt = DataTable(id=NullableArray([1, 2, 1, 2]), variable=["a", "b", "a", "b"], value=[3, 4, 5, 6])
if isdefined(Symbol(:@test_warn))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be isdefined(Base.Test, Symbol("@test_warn")). This is why the tests failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't get isdefined to work because Julia will still try and expand the @test_warn macro during parsing. I think it needs a @static check to avoid the if block entirely during parsing in v0.5. A static check against the first tagged version with the @test_warn macro works, although I'm not sure how to reference the exact commit. JuliaLang/julia@28a11ff

Copy link
Member

@ararslan ararslan Apr 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run ./contrib/commit-name.sh 28a11ffd4d2e8451492013f18dad5e9a576233f6 from the root directory of a clone of Julia to get the exact version number for a particular commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get around the eager macro expansion,

if isdefined(Base.Test, Symbol("@test_warn"))
    include_string("""
        @test_warn ...
    """)
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just bookmarked both of those comments.

In v0.6, where the macros test_warn and test_nowarn are defined, tests
will actually check the warning messages. In v0.5 warnings are not
checked but can be manually confirmed by reviewing the output of running
Pkg.test("DataTables").
@@ -235,15 +232,10 @@ function unstack(dt::AbstractDataTable, colkey::Int, value::Int)
end
keycol = NullableCategoricalArray(dt[colkey])
valuecol = dt[value]
dt1 = dt[g.idx[g.starts], g.cols]
dt1 = nullable!(dt[g.idx[g.starts], g.cols], g.cols)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to specify g.cols the second time AFAICT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nullify! function I had written for the pre-refactor PR didn't need any specification of columns when you want to nullify all of the columns, but this existing nullable! function does require columns to be specified.

julia> nullable!(DataTable(a = 1:4))
ERROR: MethodError: no method matching nullable!(::DataTables.DataTable)
Closest candidates are:
  nullable!(::DataTables.DataTable, ::Union{Real, Symbol}) at /Users/Cameron/.julia/v0.6/DataTables/src/datatable/datatable.jl:779
  nullable!(::DataTables.DataTable, ::Array{T<:Union{Real, Symbol},1}) where T<:Union{Real, Symbol} at /Users/Cameron/.julia/v0.6/DataTables/src/datatable/datatable.jl:783
  nullable!(::Array{Symbol,1}, ::DataTables.AbstractDataTable) at deprecated.jl:50
  ...

julia> nullable!(DataTable(a = 1:4), :a)
4×1 DataTables.DataTable
│ Row │ a │
├─────┼───┤
│ 11 │
│ 22 │
│ 33 │
│ 44

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so that's fine for now.

[:id, :variable, :value])
udt = unstack(dt)
@test udt == unstack(dt, :variable, :value) == unstack(dt, :id, :variable, :value)
@test udt == DataTable(Any[NullableCategoricalArray([1, 2]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nullable[1, 2] and so on is enough here, NullableCategoricalArray is only needed in the type check.

Stack was using NullableArray, which would incorrectly cast
CategoricalArrays to NullableArrays{CategoricalValue}. Adds a new
similar_nullable function for NullableCategoricalArrays.
@ararslan ararslan merged commit 817ab76 into JuliaData:master Apr 23, 2017
@cjprybol cjprybol deleted the cjp/stacktypes branch April 24, 2017 06:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants