add effects and ANOVAtest function for lm #70

Closed
wants to merge 2 commits into from

@timema timema commented May 16, 2014

This replaces the ANOVA table that was discussed previously, which raised concerns about its potential misuse and abuse. Instead, this is a specific full vs. reduced model test in which the effects from the desired terms are grouped together into a hypothesis to form an ANOVA-style F test.

The effects function extracts the effects from an LmMod produced either with a formula and dataframe or with a y and X matrix. ANOVAtest lets you specify terms in an LmMod model for an ANOVA (full vs. reduced) test. You can specify one or more terms if the model was produced with a formula and data frame. If the LmMod was produced with a y and X, you can specify the columns of X that you would like to group together. ANOVAtest is also a type that holds DF, SS, MS, F, and either the pval or -log10pval. It has a show function that produces a table.

Note that if the model is produced from a formula and dataframe, the terms can be specified by their position in the formula, where the intercept is 0. A single term can have multiple degrees of freedom if it involves factors. If the model is produced from a specified y and X, the user needs to specify which columns of the X matrix should be tested (i.e. we do not know the terms from the model if factors are involved).

The names, the option for log10pval, and the way I did the show function are just what I put together; I have no attachment to them. I was also not certain how to deal with the difference between term numbers and column numbers for models derived with and without formulas and data frames.

Below are some simple examples.
using DataFrames, GLM #GLM with the changes from this pull request

a=randn(100,4);
b=convert(DataFrame,a);

c=fit(LmMod,x1~x2+x3+x4,b) #fit model with formula and dataframe
d=ANOVAtest(c,[2,3],log10pval=true) #test multiple terms of the formula, reporting -log10 of the p-value
e=ANOVAtest(c,3) #test a single term of the formula
f=ANOVAtest(c,[3]) #a single term can also be given as a one-element array

g=fit(LmMod,c.model.pp.X,c.model.rr.y) #fit model with X and y, without formula or dataframe
h=ANOVAtest(g,[3,4]) #test multiple columns of X
i=ANOVAtest(g,4) #test a single column of X
j=ANOVAtest(g,[4]) #a single column can also be given as a one-element array

@timema timema changed the title from "add effects and anovatable function for lm" to "add effects and ANOVAtest function for lm" May 16, 2014
@simonster
Member

The F value here is still tied to sequential sum of squares and I believe it will only match the comparison of the full model versus the reduced model with the given terms dropped if 1) the predictors are orthogonal or 2) the predictor(s) being dropped are the last predictors in the model. Otherwise the comparison performed is equivalent to regressing the dropped predictors out of the predictors that came after them in the model specification before computing the reduced model. This may make sense if the predictors have a canonical order, but I don't think that's usually the case. I'd rather start with a function that takes two LmMod objects, one nested in the other, and just computes the statistics based on the residuals of those two models. @dmbates, what say you?
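For reference, the comparison of two nested models based only on their residuals is usually written as the standard full vs. reduced F statistic. The notation here is an assumption and does not come from the PR: $\mathrm{RSS}_r$ and $\mathrm{RSS}_f$ are the residual sums of squares of the reduced and full models, $p_r$ and $p_f$ are their numbers of estimated coefficients, and $n$ is the number of observations.

```latex
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_f) / (p_f - p_r)}{\mathrm{RSS}_f / (n - p_f)}
```

Under the null hypothesis that the dropped coefficients are all zero (and with the usual normal-error assumptions), this statistic follows an F distribution with $p_f - p_r$ and $n - p_f$ degrees of freedom. It does not depend on the order in which the predictors appear in the model specification.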

@diegozea

I was looking for a Julia ANOVA implementation and I found this and https://github.com/JuliaStats/GLM.jl/pull/65/files. It would be great to have a way to perform an ANOVA test in Julia (it's on the list in #7).

@nalimilan
Member

@dmbates Could you give your opinion here? It's too bad that this PR has been abandoned.

ibadr added a commit to ibadr/ANOVA.jl that referenced this pull request Jan 19, 2017
@LewisHein
Contributor

LewisHein commented May 16, 2017

I am going to jump in here with a strong vote for getting this merged. Julia+GLM is such a capable, useful system that I was really surprised that it doesn't have ANOVA integrated with it already.

I have an outside perspective and know very little of the JuliaStats priorities, but I think getting the ANOVA working would be a humongous step forward in recruiting R users.

Edit: My strong vote includes a willingness to write/edit Julia code if necessary

@nalimilan
Member

@LewisHein Thanks for offering your help. If I understood Douglas's objections at #65 correctly, the idea is that the only correct way (and the simplest implementation-wise) to perform an ANOVA is to fit several models and compare them. This doesn't look like what this PR is doing.

As @simonster suggested above, we should rather have a function which would take two or more models and which would return a table comparing them. This should be quite easy to do. The returned object should probably be a custom type with one field for each column (similar to CoefTable).
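To make the idea concrete, here is a minimal sketch of such a comparison, not the implementation that was eventually merged. It assumes the nested models are given as design matrices ordered from smallest to largest, uses only the LinearAlgebra standard library, and the names `NestedFTable`, `rss` and `compare_nested` are hypothetical.

```julia
using LinearAlgebra

# Hypothetical result type: one field per column of the comparison table.
struct NestedFTable
    dof::Vector{Int}        # residual degrees of freedom of each model
    rss::Vector{Float64}    # residual sum of squares of each model
    fstat::Vector{Float64}  # F statistic versus the previous model (NaN for the first)
end

# Residual sum of squares of the least-squares fit of y on X.
rss(X::AbstractMatrix, y::AbstractVector) = sum(abs2, y - X * (X \ y))

# Compare nested design matrices, ordered from the smallest to the largest model.
function compare_nested(y::AbstractVector, Xs::AbstractMatrix...)
    n = length(y)
    dofs = [n - rank(X) for X in Xs]
    rsss = [rss(X, y) for X in Xs]
    fs = fill(NaN, length(Xs))
    for i in 2:length(Xs)
        # Drop in RSS per added coefficient, relative to the larger model's residual variance.
        num = (rsss[i-1] - rsss[i]) / (dofs[i-1] - dofs[i])
        den = rsss[i] / dofs[i]
        fs[i] = num / den
    end
    return NestedFTable(dofs, rsss, fs)
end

# Usage: intercept-only model versus intercept plus one predictor.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
X0 = ones(6, 1)
X1 = hcat(ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
tbl = compare_nested(y, X0, X1)
```

A p-value column is left out to keep the sketch free of dependencies; it could be obtained from the upper tail of an F distribution with the matching degrees of freedom, for example with Distributions.jl.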

Then, as an option, we could have a convenience function similar to Stata's nestreg, which would take a series of formulas, each giving the list of terms to add in the next model. It's up to you, depending on your needs.

Anyway, looks like you'll have to make a new pull request (which should include tests).

@timema
Author

timema commented May 17, 2017

@dmbates was the one who gave me the code for the effects function that underlies the rest of the pull request (not saying the whole implementation is approved by him). That being said, I do not care one way or another how this is done. In my use cases, I typically only care about being able to test the significance of a multi-level factor variable (i.e. one where you can't just test a single beta coefficient), so the effects function is perfectly appropriate for my needs.

@nalimilan
Member

Ah, good to hear. Sorry if I missed something, I just skimmed the diff. I hope Douglas will answer at #181.

Just to be clear about the possible alternatives, what do you think of my second suggestion, inspired by nestreg? IIUC it should suit your needs too, shouldn't it?

@timema
Author

timema commented May 17, 2017

I am fine with what you suggested; it is much more thorough. There should be at least two options and/or different functions: one where you specify a full vs. reduced model (for speed, if you are only interested in one specific test) and one that compares all models (when someone wants a rough idea about each variable).

@timema
Author

timema commented May 17, 2017

I never responded or mentioned it originally because I did not want to put words in @dmbates's mouth and/or assume I knew what his opinion was.

@nalimilan
Member

F-test has been implemented in #182. Leaving open since we still need a similar function for the likelihood ratio test, and maybe convenience functions to run a series of comparisons from formulas.

@palday
Member

palday commented Dec 18, 2021

@nalimilan we now have lrtest and there's ANOVA.jl if a user really wants classical ANOVA tables, so I think we could close this now.

@nalimilan nalimilan closed this Dec 19, 2021