Automatic simcf and tile #18

Open
chrisadolph opened this issue Jan 29, 2012 · 5 comments

Comments

@chrisadolph
Owner

When I use simcf in my own work, I usually write a long helper function to which I pass my formula and data. The helper function looks through the formula and runs a series of if() statements that check for each covariate that could appear in the model and, if it is present, add a new scenario for that covariate to the cf object. This gets complex if there are potentially interaction terms, so inside each if() statement are sub-if() statements that check for possible interactions. It also gets complex when there are categorical or compositional variables, which need to be set in logically consistent ways. In the end, this function is basically sorting all covariates into continuous, binary, categorical, ordered, and compositional piles, and creating appropriate first-difference counterfactuals for each (respectively: mean to mean + 1 sd; 0 to 1; mean to each category, or baseline to each other category; mean composition to ratio-preserving counterfactual; etc.).
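
As a rough illustration of the sorting step described above, here is a minimal base R sketch that builds pre and post values for continuous and binary covariates only. The names `default_fd` and `build_scenarios` are hypothetical and not part of simcf, and the sketch assumes no interaction terms, no categorical or compositional covariates, and that every covariate not under study is held at its mean.

```r
# Hypothetical helper: default first-difference values for one covariate.
# `type` is supplied by the user; nothing here is part of simcf.
default_fd <- function(x, type = c("continuous", "binary")) {
  type <- match.arg(type)
  switch(type,
    continuous = c(pre = mean(x), post = mean(x) + sd(x)),
    binary     = c(pre = 0, post = 1)
  )
}

# One scenario (row) per covariate: only the covariate under study moves;
# every other covariate stays at its mean (an assumption of this sketch).
build_scenarios <- function(formula, data, types) {
  covs <- all.vars(formula)[-1]  # drop the response; ignores interactions
  base <- sapply(data[covs], mean)
  xpre <- xpost <- matrix(base, nrow = length(covs), ncol = length(covs),
                          byrow = TRUE, dimnames = list(covs, covs))
  for (v in covs) {
    fd <- default_fd(data[[v]], types[[v]])
    xpre[v, v]  <- fd[["pre"]]
    xpost[v, v] <- fd[["post"]]
  }
  list(xpre = xpre, xpost = xpost)
}

# Toy data, invented for illustration only.
d <- data.frame(y = c(1, 2, 3, 4), x1 = c(0, 1, 0, 1), x2 = c(1, 2, 3, 4))
s <- build_scenarios(y ~ x1 + x2, d, c(x1 = "binary", x2 = "continuous"))
s$xpre
s$xpost
```

Each row of `xpre` and `xpost` is one scenario, so the two matrices differ only on the diagonal.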

We want a function which, given a formula and dataframe, and some tips on what is categorical, ordered, or compositional, does this automagically. When I do this for my own projects, I spend little time messing around with cfChange() code, which gets written for me.

Once we have an automagic simcf function, writing automagic ropeladder code to go with it should be much easier. Indeed, the likely final call from a user, for a model with binary variables x1 and x3, continuous x2, and ordered x4, would be something like:

res <- lm(y~x1+x2+x3+x4, data)
auto.ropeladder(res, data, binary=c(1,3), continuous=c(2), ordered=c(4), conf=0.95)
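
One possible first step for such a call is to turn the index arguments into a type label for each term in the fitted model. The sketch below is only an assumption about how that could work: `covariate_types` is a hypothetical helper, the data are simulated, and auto.ropeladder() itself does not exist yet.

```r
# Hypothetical helper: map index arguments to a named vector of covariate types.
covariate_types <- function(model, binary = integer(0),
                            continuous = integer(0), ordered = integer(0)) {
  labels <- attr(terms(model), "term.labels")
  idx <- list(binary = binary, continuous = continuous, ordered = ordered)
  if (anyDuplicated(unlist(idx))) stop("each covariate needs exactly one type")
  types <- setNames(rep(NA_character_, length(labels)), labels)
  for (type in names(idx)) types[idx[[type]]] <- type
  if (anyNA(types)) {
    stop("untyped covariates: ", paste(labels[is.na(types)], collapse = ", "))
  }
  types
}

# Simulated data, invented for illustration only.
set.seed(1)
data <- data.frame(y = rnorm(20), x1 = rbinom(20, 1, 0.5), x2 = rnorm(20),
                   x3 = rbinom(20, 1, 0.5),
                   x4 = sample(1:4, 20, replace = TRUE))
res <- lm(y ~ x1 + x2 + x3 + x4, data)
types <- covariate_types(res, binary = c(1, 3), continuous = c(2),
                         ordered = c(4))
types
```

Failing early on untyped or doubly typed covariates keeps the later scenario-building code from guessing.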

All this will be hard, but will be the core of our second release of tile+simcf to CRAN.

@ghost

ghost commented Feb 25, 2012

Hey Chris,

I would like to start chipping away at this project -- would you mind posting/sending along one of your example helper functions so I can see what you've done in the past? Thanks,

Mike

@chrisadolph
Owner Author

I've uploaded two examples (with working code, data, and example pdf output; one for logit, and one for ordered probit) of how I do this here. There isn't much documented here in terms of what these data are, so let me know if you need background.

@ghost

ghost commented Feb 28, 2012

Chris,

Could you outline a bit more specifically what the counterfactuals should be for each type of covariate? Here is what I have:

continuous: mean to mean + 1 SD (got it)
binary: 0 to 1 (got it)
ordered: mean to each category or baseline to each other category (not sure how to implement either of these options; I'm thinking specifically about the cfChange line that would mirror this code for continuous variables):

cfChange(xscen, paste(s.clean), x = mean(data) + sd(data), scen=scen.num)

compositional piles: to be honest, I'm not sure what a compositional pile is, or how I would compare its mean to a ratio-preserving counterfactual.

Any info is appreciated. Thanks,

@chrisadolph
Owner Author

On 2/28/12 8:24 AM, mikefree88 wrote:

> Chris,
>
> Could you outline a bit more specifically what the counterfactuals should be for each type of covariate? Here is what I have:

There should also be the ability to globally override these defaults with alternatives for the pre and post values for each type. (E.g., if I want to set all continuous covariate scenarios to run from (mean - 1 sd) to (mean + 1 sd), I should be able to.)
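
The global override idea could be sketched in base R as a list of default rules that a user-supplied list partially replaces. Everything here (`fd_defaults`, `fd_rules`, and the shape of the rule list) is a hypothetical design and not existing simcf behavior.

```r
# Hypothetical default rules: each type maps to functions giving pre and post.
fd_defaults <- list(
  continuous = list(pre  = function(x) mean(x),
                    post = function(x) mean(x) + sd(x)),
  binary     = list(pre  = function(x) 0,
                    post = function(x) 1)
)

# Merge user overrides onto the defaults; entries not named stay as they are.
fd_rules <- function(overrides = list()) modifyList(fd_defaults, overrides)

# Override only the pre value for continuous covariates: (mean - 1 sd).
rules <- fd_rules(list(continuous = list(pre = function(x) mean(x) - sd(x))))
x <- c(2, 4, 6)
c(pre = rules$continuous$pre(x), post = rules$continuous$post(x))
```

Because `modifyList()` merges nested lists, overriding `pre` for one type leaves that type's `post` rule and all other types untouched.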

> continuous: mean to mean + 1 SD (got it)

yes

> binary: 0 to 1 (got it)

yes

> ordered: mean to each category or baseline to each other category

yes

> (not sure how to implement either of these options, thinking specifically about the cfChange line that would mirror this code for continuous variables):
>
> cfChange(xscen, paste(s.clean), x = mean(data) + sd(data), scen=scen.num)

You need to set up one scenario for each level of the categorical variable, so you have to figure out how many categories there are and what they are.
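
A minimal sketch of that enumeration step, using the "baseline to each other category" option: the helper `level_scenarios` and the `educ` example are hypothetical. How each row becomes a cfChange() call depends on how the levels are coded in the model, which this sketch does not cover.

```r
# Hypothetical helper: one scenario per non-baseline level of a factor.
level_scenarios <- function(x, baseline = NULL) {
  x <- as.factor(x)
  lev <- levels(x)
  if (is.null(baseline)) baseline <- lev[1]  # default: first level
  stopifnot(baseline %in% lev)
  others <- setdiff(lev, baseline)
  data.frame(scen = seq_along(others),
             pre  = rep(baseline, length(others)),
             post = others,
             stringsAsFactors = FALSE)
}

# Invented example: a three-level ordered covariate.
educ <- factor(c("low", "mid", "high", "mid"),
               levels = c("low", "mid", "high"))
scen <- level_scenarios(educ)
scen
```

The number of rows is the number of scenarios to allocate for this covariate.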


> compositional piles: to be honest, I'm not sure what a compositional pile is, or how I would compare its mean to a ratio-preserving counterfactual.

Not sure what "pile" means either. Suppose you have a three-part compositional variable, like this:

{% of population < 18 years, % of population >= 18 and < 65, % of population >= 65}

In any specific case, these components will sum to a fixed constraint, like 1.0 or 100.

If three covariates (or two covariates and a reference category) are defined by the user as belonging to the same composition, then any hypothetical change in one should lead to a logically compatible change in the others, preserving the fixed sum of all components. rpcf() can help calculate these; I'll tell you more about this later.
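
To make the constraint concrete, here is a small base R sketch of one ratio-preserving rule: set one component to a new value and rescale the remaining components so that their ratios to each other and the overall total are unchanged. This is an illustration only; `ratio_preserving` is a hypothetical function, the age shares are invented, and rpcf() may work differently.

```r
# Hypothetical helper (not simcf's rpcf()): set component `i` of a composition
# to `value`, rescaling the others so their ratios and the total are preserved.
ratio_preserving <- function(comp, i, value) {
  total <- sum(comp)
  rest <- comp[-i]
  stopifnot(value >= 0, value <= total, sum(rest) > 0)
  out <- comp
  out[i] <- value
  out[-i] <- rest * (total - value) / sum(rest)
  out
}

# Invented age composition summing to 1.
age <- c(under18 = 0.25, age18to64 = 0.60, age65plus = 0.15)
cf <- ratio_preserving(age, i = 3, value = 0.20)
cf
sum(cf)
```

Raising the oldest share shrinks the other two proportionally, so the ratio of `under18` to `age18to64` stays the same as in the original composition.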

Chris

> Any info is appreciated. Thanks,



@ghost

ghost commented Mar 16, 2012

Quick question about defaults: to clarify, the xpost values should always be the same as the xpre values unless the variable is the one being simulated, right? For example, with a binary variable, we would set xpre to 0 in all scenarios, and set xpost to 0 as well in every scenario except the one that evaluates that variable. I just wanted to double check. Thanks,

Mike
