Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement HypothesisCoding and ~un-export~ deprecate ContrastsCoding #158

Merged
merged 9 commits into from
Feb 13, 2020

Conversation

kleinschmidt
Copy link
Member

ContrastsCoding is a footgun unless you really know what you're doing.
Hypothesis coding is a more natural way of specifying manual contrasts, since it
directly expresses differences between conditional means of levels in the matrix
(which is the pseudo-inverse of the contrasts matrix). This PR introduces a
HypothesisCoding <: AbstractContrasts which is exported, and un-exports
ContrastsCoding (and adds a note in the docs/help text that points to
hypotheses)

@codecov-io
Copy link

codecov-io commented Oct 2, 2019

Codecov Report

Merging #158 into master will increase coverage by 0.35%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #158      +/-   ##
==========================================
+ Coverage   86.11%   86.47%   +0.35%     
==========================================
  Files           9        9              
  Lines         497      510      +13     
==========================================
+ Hits          428      441      +13     
  Misses         69       69
Impacted Files Coverage Δ
src/StatsModels.jl 100% <ø> (ø) ⬆️
src/contrasts.jl 87.5% <93.75%> (+3.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51f1749...3fc0a8b. Read the comment docs.

@kleinschmidt
Copy link
Member Author

I've also implemented sequential difference/sliding/repeated contrasts coding for the hell of it (see #120)

src/contrasts.jl Outdated Show resolved Hide resolved
src/contrasts.jl Outdated Show resolved Hide resolved
src/contrasts.jl Outdated Show resolved Hide resolved
Contrasts are derived the hypothesis matrix by taking the pseudoinverse:

```jldoctest hyp
julia> sdiff_contrasts = pinv(sdiff_hypothesis)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
julia> sdiff_contrasts = pinv(sdiff_hypothesis)
julia> pinv(sdiff_hypothesis)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used anywhere but it is more to help with reading...

src/contrasts.jl Outdated Show resolved Hide resolved

"""
mutable struct HypothesisCoding <: AbstractContrasts
hypotheses::Matrix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be parameterized on the eltype?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, although it's going to get converted to whatever the eltype of the final model matrix is anyway. But maybe it would help the compiler?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if the compiler can know the concrete type of such objects in typical situations (which I think is the case), it will generate much more efficient code. That should matter especially if you iterate over rows.

Though I've noticed this affects other contrasts, so maybe for a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it only affect this and ContrastsCoding (the only other one that involves a manually specified matrix)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And anyway, I don't think the contrasts themselves ever get used in any hot code paths (that's all the ContrastsMatrix struct...)

base::Nothing
levels::Union{Vector,Nothing}

function HypothesisCoding(hypotheses, base, levels)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why specify base if it's always nothing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's code that tries to access the field. I could overload getproperty for this type instead though, or refactor that code...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since HypothesisCoding is new, how could existing code try to access that field? :-)

Anyway, in general I think fields are considered private unless specified otherwise. And unexporting ContrastsCoding is much more breaking. BTW, maybe we should keep exporting it for now until we want to make another breaking release? It's simpler to break many things at the same time than force people to update small things continuously.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Let's keep it exported but maybe add a warning to teh constructor or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a depwarn because the deprecations weren't quite doing it correctly (giving a spurious method overwritten warning)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base field is used in the constructor of ContrastsMatrix. Could change that to use an accessor method but that seems like overkill at this point.

src/contrasts.jl Outdated
Specify how to code a categorical variable in terms of a *hypothesis matrix*.
For a variable with ``k`` levels, this should be a ``k-1 \times k`` matrix.
Each row of the hypothesis corresponds to the hypothesis tested by the
corresponding predictor, given as the weights given to the "cell mean" of each
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand that phrasing. The repetition of "given" doesn't help. :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about "expressed as the weights of the "cell mean" of each of the k values of the predictor"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I guess I don't really understand what "cell means" refers to in this context, but maybe that's OK.

@kleinschmidt
Copy link
Member Author

For non-breaking-ness, perhaps it would be better to deprecate ContrastsCoding with a message that it will be un-exported in the next minor release? Rather than removing it right now.

@nalimilan
Copy link
Member

For non-breaking-ness, perhaps it would be better to deprecate ContrastsCoding with a message that it will be un-exported in the next minor release? Rather than removing it right now.

Yes, that sounds better, but I always forget whether that's possible or not. We could also just keep exporting it for now, and only remove it in the next minor release (when we want to break other, more fundamental things).

src/contrasts.jl Outdated Show resolved Hide resolved
src/contrasts.jl Outdated
Specify how to code a categorical variable in terms of a *hypothesis matrix*.
For a variable with ``k`` levels, this should be a ``k-1 \times k`` matrix.
Each row of the hypothesis corresponds to the hypothesis tested by the
corresponding predictor, given as the weights given to the "cell mean" of each
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I guess I don't really understand what "cell means" refers to in this context, but maybe that's OK.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good apart from the suggestion.

src/contrasts.jl Outdated Show resolved Hide resolved
@kleinschmidt kleinschmidt changed the title Implement HypothesisCoding and un-export ContrastsCoding Implement HypothesisCoding and ~~un-export~~ deprecate ContrastsCoding Feb 13, 2020
@kleinschmidt kleinschmidt changed the title Implement HypothesisCoding and ~~un-export~~ deprecate ContrastsCoding Implement HypothesisCoding and ~un-export~ deprecate ContrastsCoding Feb 13, 2020
@kleinschmidt kleinschmidt merged commit 0630f46 into master Feb 13, 2020
@nalimilan nalimilan deleted the dfk/contrasts-update branch February 13, 2020 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants