Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable importance #29

Closed
wants to merge 43 commits into from
Closed

Variable importance #29

wants to merge 43 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Mar 7, 2016

Almond and me (in coordination with Fabian Scheipl) wrote a function to extract and plot the (in-bag) risk reduction per base-learner from a fitted mboost-model.
The contribution (to risk reduction) of each base-learner can be used as a measure for variable importance of the different base-learners or variables in the model.

Function varimp(object) simply returns a object of class varimp with the corresponding risk reductions per base-learner.
The plot function (eventually a lattice::barchart) for these varimp objects additionally offers some visualisation options like the display in percentages or absolute values, number and order of the displayed bars and whether to focus on base-learners or variables involved in the base-leaners.

t-8-n and others added 30 commits February 8, 2016 22:17
- changed selprob order to decreasing=FALSE
still
-> no selprob
-> no "other"
-> no maxchar
…s ordered factors instead of sorting the data.frame. Now, variables are correctly sorted as well.

bl per variable are accumulated again
variables in interactions sorted for identifiability

selprob -> selfreq
@hofnerb
Copy link
Member

hofnerb commented Mar 7, 2016

Thanks a lot. We currently have an issue with Travis-CI but I will try to fix this and have a look at your PR afterwards.

@ghost
Copy link
Author

ghost commented Mar 20, 2016

Just as a short information - the Travis build still includes two notes caused by the variable importance code:

plot.varimp: no visible binding for global variable ‘ylab’
plot.varimp: no visible binding for global variable ‘blearner’

We already tried Hadley Wickhams advice to solve the problem via the following line:
if (getRversion() >= "2.15.1") globalVariables(c("ylab", "blearner"))
but unfortunately that didn't work out. Besides the notes, we also got additional errors:

Error in registerNames(names, package, ".__global__", add) :
The namespace for package "mboostDevel" is locked; no changes in
the global variables list may be made.

Question is, can we keep it that way despite the notes?

@hofnerb
Copy link
Member

hofnerb commented Mar 21, 2016

No, the issue has to be resolved.

  1. Please do specify all possible options of arguments like type and blorder in the function definition, e.g. type = c("blearner", "variable") and use type <- match.arg(type) in your function body.

  2. Do not throw an error if the user wants to specify xlabs and ylabs. Use xlab = NULL and ylab = NULL in your function definition. If these arguments are not set, i.e., NULL, keep the default labels as you do now. Otherwise use the user specified labels. By having xlab and ylab in your function definition this should also remove the warning.

Regarding the issue with blearner I do not have a solution a.t.m. I can remember that we had similar issues before but do not remember how we've fixed them.

@ghost
Copy link
Author

ghost commented Mar 21, 2016

Ok, thanks for the advice. We implemented remarks 1) and 2) and it helped eliminating the note for ylab, of course.
blearner is still an issue. This seems to be quite a common problem!? Is there maybe anyone else who we could ask for support?

@hofnerb
Copy link
Member

hofnerb commented Mar 22, 2016

I now seem to have found the error. type = "variable" should not work as you use the object blearner which isn't defined:

 if( type == "variable" ) {
   barchart(variable ~ reduction, groups = blearner, data = plot_data,
    horizontal = TRUE, xlab = xlab, ylab = ylab, xlim = xlim,
    scales = list(x = list(tck = c(1,0), at = seq(0,sum(x), length.out = 5))), 
    stack = TRUE, auto.key = auto.key, ...)
}

I'd guess this should be plot_data[, "blearner"]? Might that be the case?

@ghost
Copy link
Author

ghost commented Mar 22, 2016

..yes, that's it - thanks a lot!! The note for blearner is gone, too.

Actually the barchart was correctly created with the call in the post above (with groups = blearner), but it also works with groups = plot_data[, "blearner"].

Is there anything else that got your attention or can we leave the code as it is?

@hofnerb
Copy link
Member

hofnerb commented Jul 4, 2016

Hi @tkuehn13,

I was a bit too lazy (and had too little time) to pull in your patch bevor changing the structure of mboost (I got rid of the stupid sub folders mboostPatch and mboostDevel. Now, I cannot use your patch easily for mboostDevel anymore. Do you have any ideas how to solve this issue? I do not want to loose your change track etc by simply copy-pasting the files in the new structure...

Sorry for this.
Benjamin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants