Make CoefTable implement the Tables.jl interface #629

nalimilan · 2021-01-05T10:41:46Z

This allows retrieving the contents of a `CoefTable` object in a convenient form, notably a `DataFrame`. To avoid introducing a dependency on Tables.jl, `CoefTable` has to iterate `NamedTuple`s, so that it implements the row-table interface implicitly. This is inefficient since `CoefTable` uses column-based storage, but given the typical size of such tables it should not matter.
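
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ToyCoefTable` and its fields are hypothetical stand-ins, and every column is assumed to hold `Float64` values. It only shows how a column-stored table can expose its rows as `NamedTuple`s through `Base.iterate`, without depending on Tables.jl.

```julia
# Hypothetical stand-in for a column-stored coefficient table (not StatsBase's type).
struct ToyCoefTable
    cols::Vector{Vector{Float64}}   # one vector per column
    colnms::Vector{String}          # column names
    rownms::Vector{String}          # one name per row
end

Base.length(t::ToyCoefTable) = length(t.cols[1])

# Row type: a NamedTuple with a Name field followed by one Float64 field per column.
function Base.eltype(t::ToyCoefTable)
    names = (:Name, Symbol.(t.colnms)...)
    types = Tuple{String, ntuple(_ -> Float64, length(t.colnms))...}
    return NamedTuple{names, types}
end

# Build each row on demand by picking element i from every column.
function Base.iterate(t::ToyCoefTable, i::Int=1)
    i > length(t) && return nothing
    vals = (t.rownms[i], (col[i] for col in t.cols)...)
    return eltype(t)(vals), i + 1
end

t = ToyCoefTable([[0.1, 0.2], [0.01, 0.02]], ["Estimate", "Stderror"], ["a", "b"])
rows = collect(t)
```

Each row is assembled from the columns at iteration time, which is the inefficiency mentioned above; for tables with a handful of rows the cost should be negligible.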

nalimilan · 2021-01-05T10:44:17Z

test/statmodels.jl

 ──────────────────────────────────────────
     Estimate   Stderror        df       p
 ──────────────────────────────────────────
 [1]  0.112582  0.0566454  0.381813  0.8198
 [2]  0.368314  0.120781   0.815104  0.6699
 [3]  0.344454  0.179574   0.242208  0.4531
 ──────────────────────────────────────────"""
+@test length(ct) === 3
+@test eltype(ct) ==
+NamedTuple{(:Name, :Estimate, :Stderror, :df, :p),
+           Tuple{String,Float64,Float64,Float64,Float64}}
+@test collect(ct) == [
+    (Name = "[1]", Estimate = 0.11258244478647295, Stderror = 0.05664544616214151,
+     df = 0.38181274408522614, p = 0.8197779704008801)


I'm hesitant as to whether it's better to always create a "Name" column even when no names have been specified. That makes the result more predictable, but that also creates a column which is nearly useless...

I think it is better to keep a fixed schema for this.

Well, there isn't really a fixed schema since each package determines the column names and types. Actually, "Name" would be the only column which is guaranteed to be present. :-)

Ah - in this case I would leave this out (i.e. rely on the schema provided by the package) and allow packages to specify the schema in full. Then actually we should check if packages provide a proper schema.

bkamins · 2021-01-05T11:22:22Z

Can we add DataFrames.jl as [extras] dependency and just check if a DataFrame is correctly produced (it is not strictly necessary, but I would find it nice to have in the tests 😄).

nalimilan · 2021-01-05T12:33:54Z

> Can we add DataFrames.jl as [extras] dependency and just check if a DataFrame is correctly produced (it is not strictly necessary, but I would find it nice to have in the tests 😄).

Is that really useful? As long as we check that the interface is implemented everything should be OK.

nalimilan · 2021-01-05T12:33:59Z

test/statmodels.jl

+NamedTuple{(:Name, :Estimate, :Stderror, :df, :p),
+           Tuple{String,Float64,Float64,Float64,Float64}}


Suggested change

NamedTuple{(:Name, :Estimate, :Stderror, :df, :p),
           Tuple{String,Float64,Float64,Float64,Float64}}

bkamins · 2021-01-05T12:52:29Z

> Is that really useful?

This was a soft suggestion, mainly to show users in the tests how this can be used. Alternatively, maybe we could mention it in a docstring. The point is to have a clear visual signal somewhere about what we allow (and a DataFrame is very easy to recognize visually).

jerlich · 2021-01-05T15:43:37Z

Thanks for being super responsive about this!

kleinschmidt · 2021-01-05T15:59:06Z

I do think it could be a good example in the docs ("civilians" might not understand that Tables.jl integration means you can just call DataFrame on it...)

quinnj · 2021-01-05T16:12:15Z

src/statmodels.jl

+Base.length(ct::CoefTable) = length(ct.cols[1])
+function Base.eltype(ct::CoefTable)
+    nmtype = isempty(ct.rownms) ? String : eltype(ct.rownms)
+    NamedTuple{(Symbol("Name"), Symbol.(ct.colnms)...),


The other option is we could return a custom "view" struct here instead of a NamedTuple. That could make it a bit more efficient and is a common pattern for other "table" types. For example, it's common to have something like:

    struct CoefRow
        table::CoefTable
        rownumber::Int
    end

Then on top of that you just need to implement propertynames and getproperty that correspond to the "row values" to be returned. The property names could be computed a single time in the initial Base.iterate(ct::CoefTable) method and passed as an additional piece of iterator state. It's obviously a bit more code/work, so maybe not worth it if these tables are always going to be small and simple anyway.
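
A minimal sketch of the row-view pattern described above, under stated assumptions: `ToyTable` and `ToyRow` are hypothetical names, all columns are assumed to be `Float64`, and the column lookup is done on each property access rather than being cached in the iterator state as suggested.

```julia
# Hypothetical column-stored table standing in for CoefTable.
struct ToyTable
    cols::Vector{Vector{Float64}}
    colnms::Vector{Symbol}
end

# Lightweight row view: holds only a reference to the table and a row index.
struct ToyRow
    table::ToyTable
    rownumber::Int
end

# getfield is used here because getproperty is overloaded for ToyRow below.
Base.propertynames(r::ToyRow) = Tuple(getfield(r, :table).colnms)

# Resolve a property name to a column, then index it at this row.
function Base.getproperty(r::ToyRow, name::Symbol)
    t = getfield(r, :table)
    j = findfirst(==(name), t.colnms)
    j === nothing && throw(ArgumentError("no column named $name"))
    return t.cols[j][getfield(r, :rownumber)]
end

Base.length(t::ToyTable) = length(t.cols[1])
Base.eltype(::Type{ToyTable}) = ToyRow
Base.iterate(t::ToyTable, i::Int=1) = i > length(t) ? nothing : (ToyRow(t, i), i + 1)

t = ToyTable([[0.1, 0.2], [0.01, 0.02]], [:Estimate, :Stderror])
rows = collect(t)
```

No row data is copied during iteration; values are read from the underlying columns only when a property such as `rows[2].Stderror` is accessed.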

Interesting, thanks. Though these tables are always super small so as you say I'm not sure it's worth the additional complexity. If performance really mattered we should implement a column-wise approach anyway.

nalimilan · 2021-01-05T16:25:59Z

OK, I've pushed changes to avoid adding a "Name" column when there are no names and to mention Tables.jl and DataFrames in the docstring.
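
As a rough illustration of the behavior described above (not the code that was pushed), the row type could be computed so that a `Name` field appears only when row names are present. `rowtype` is a hypothetical helper, and all value columns are assumed to be `Float64`.

```julia
# Hypothetical helper: derive the NamedTuple row type from the column names,
# prepending a String-typed :Name field only when row names were supplied.
function rowtype(colnms::Vector{String}, rownms::Vector{String})
    names = Symbol.(colnms)
    types = fill(Float64, length(colnms))   # assumption: every column is Float64
    if !isempty(rownms)
        names = [:Name; names]
        types = [String; types]
    end
    return NamedTuple{Tuple(names), Tuple{types...}}
end

without_names = rowtype(["Estimate", "p"], String[])
with_names = rowtype(["Estimate", "p"], ["x1", "x2"])
```

The trade-off discussed earlier in the thread applies: the schema now depends on whether names were given, in exchange for not carrying a nearly useless column.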

nalimilan requested review from kleinschmidt and quinnj January 5, 2021 10:41

nalimilan force-pushed the nl/coeftable branch from 718f096 to d7ae41d Compare January 5, 2021 10:47

kleinschmidt approved these changes Jan 5, 2021

quinnj reviewed Jan 5, 2021

Review fixes

868289e

kleinschmidt approved these changes Jan 5, 2021

nalimilan merged commit ed3b86e into master Feb 1, 2021

nalimilan deleted the nl/coeftable branch February 1, 2021 08:44

nalimilan mentioned this pull request Feb 1, 2021

Make coeftable objects conform to Tables API #527

Closed
