-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make CoefTable implement the Tables.jl interface #629
Conversation
test/statmodels.jl
Outdated
────────────────────────────────────────── | ||
Estimate Stderror df p | ||
────────────────────────────────────────── | ||
[1] 0.112582 0.0566454 0.381813 0.8198 | ||
[2] 0.368314 0.120781 0.815104 0.6699 | ||
[3] 0.344454 0.179574 0.242208 0.4531 | ||
──────────────────────────────────────────""" | ||
@test length(ct) === 3 | ||
@test eltype(ct) == | ||
NamedTuple{(:Name, :Estimate, :Stderror, :df, :p), | ||
Tuple{String,Float64,Float64,Float64,Float64}} | ||
@test collect(ct) == [ | ||
(Name = "[1]", Estimate = 0.11258244478647295, Stderror = 0.05664544616214151, | ||
df = 0.38181274408522614, p = 0.8197779704008801) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesitant as to whether it's better to always create a "Name" column even when no names have been specified. That makes the result more predictable, but that also creates a column which is nearly useless...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to keep a fixed schema for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well there isn't really a fixed schema since each package determines the column names and types. Actually, "Names" would be the only column which is guaranteed to be present. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - in this case I would leave this out (i.e. rely on the schema provided by the package) and allow packages to specify the schema in full. Then actually we should check if packages provide a proper schema.
This allows retrieving the contents of a `CoefTable` object in a convenient form, notably a `DataFrame`. To avoid introducing a dependency on Tables.jl, `CoefTable` has to iterate `NamedTuple`s, so that it implements the row-table interface implicitly. This is inefficient since `CoefTable` uses a column-based storage, but given the typical size of such tables it should not matter.
718f096
to
d7ae41d
Compare
Can we add DataFrames.jl as [extras] dependency and just check if a |
Is that really useful? As long as we check that the interface is implemented everything should be OK. |
test/statmodels.jl
Outdated
NamedTuple{(:Name, :Estimate, :Stderror, :df, :p), | ||
Tuple{String,Float64,Float64,Float64,Float64}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NamedTuple{(:Name, :Estimate, :Stderror, :df, :p), | |
Tuple{String,Float64,Float64,Float64,Float64}} | |
NamedTuple{(:Name, :Estimate, :Stderror, :df, :p), | |
Tuple{String,Float64,Float64,Float64,Float64}} |
This was a soft suggestion - mainly to show users in the tests how this can be used. Alternatively maybe we could put some docstring about it. The point is to have somewhere a clear visual signal about what we allow (and |
Thanks for being super responsive about this! |
I do think it could be a good example in the docs ("civilians" might not understand that Tables.jl integration means you can just call |
src/statmodels.jl
Outdated
Base.length(ct::CoefTable) = length(ct.cols[1]) | ||
function Base.eltype(ct::CoefTable) | ||
nmtype = isempty(ct.rownms) ? String : eltype(ct.rownms) | ||
NamedTuple{(Symbol("Name"), Symbol.(ct.colnms)...), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other option is we could return a custom "view" struct here instead of a NamedTuple. That could make it a bit more efficient and is a common pattern for other "table" types. For example, it's common to have something like:
struct CoefRow
table::CoefTable
rownumber::Int
end
Then on top of that you just need to implement propertynames
and getproperty
that correspond to the "row values" to be returned. The property names could be computed a single time in the initial Base.iterate(ct::CoefTable)
method and passed as an additional piece of iterator state. It's obviously a bit more code/work, so maybe not worth it if these tables are always going to be small and simple anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, thanks. Though these tables are always super small so as you say I'm not sure it's worth the additional complexity. If performance really mattered we should implement a column-wise approach anyway.
OK, I've pushed changes to avoid adding a "Name" column when there are no names and to mention Tables.jl and DataFrames in the docstring. |
This allows retrieving the contents of a
CoefTable
object in a convenient form, notably aDataFrame
. To avoid introducing a dependency on Tables.jl,CoefTable
has to iterateNamedTuple
s, so that it implements the row-table interface implicitly. This is inefficient sinceCoefTable
uses a column-based storage, but given the typical size of such tables it should not matter.