subset and na.action arguments #29

Open
MrDomani opened this issue May 23, 2023 · 4 comments

Comments

@MrDomani
Collaborator

lm() lets the user supply subset and na.action arguments. The first filters the data by a condition, and the second determines how NA values are handled. Both usually drop some observations.
However, the ranking function r() is currently evaluated before either of these steps happens. This means the ranks are computed from observations that are later dropped, so some ECDF values might not be present in the final model matrix. Should we:

  1. Raise an error whenever this happens and prompt the user to deal with it themselves
  2. Try to handle it ourselves (could be difficult)
  3. Do nothing (or just raise a warning), because it does not interfere with our theory (I doubt that, but I don't know for sure)
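
To make the ordering problem concrete, here is a minimal sketch in base R. It does not use the package's r(); ecdf_rank() is a hypothetical stand-in that simply evaluates the empirical CDF at each observation, and the data are made up. It relies only on the fact that model.frame() evaluates the formula's variables on the full data and applies subset afterwards.

```r
# Hypothetical stand-in for the package's r(): ECDF of x evaluated at x.
ecdf_rank <- function(x) ecdf(x)(x)

# Made-up data for illustration only.
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2)
)

# model.frame() is what lm() uses internally: the ranks are computed on
# all six rows, and only then are the rows with x <= 3 dropped.
mf <- model.frame(ecdf_rank(y) ~ ecdf_rank(x), data = d, subset = x > 3)
ranks_in_model <- mf[[2]]

# Ranks computed after subsetting, which is presumably what a user who
# passes subset = x > 3 has in mind.
ranks_after_subset <- ecdf_rank(d$x[d$x > 3])

print(ranks_in_model)      # 4/6, 5/6, 1
print(ranks_after_subset)  # 1/3, 2/3, 1
```

The two vectors differ, which is the discrepancy the three options above are about.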
@danielwilhelm
Owner

danielwilhelm commented May 24, 2023 via email

@MrDomani
Collaborator Author

MrDomani commented Jun 2, 2023

Currently, an error is thrown if the user supplies na.action or subset, or if an NA value is present anywhere (because lm by default removes rows containing an NA in any variable, and that affects the calculation of ranks).

@MrDomani
Collaborator Author

It turns out that this is more complicated than I expected. As I mentioned, subsetting and NA handling occur after the ranking function is evaluated. I do not see a quick, easy way to handle this. The options I see are:

a) copy a lot of code from lm() (calculating ranks after model.frame, and remembering to handle r() correctly in other places) and fit the linear model ourselves rather than calling lm, or
b) evaluate get_all_vars, subset the result and handle NAs (which could possibly be inferred from model.frame), and supply it as the data argument to lm.
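
A rough sketch of what option b) could look like, under several assumptions: prefilter_data() and ecdf_rank() are hypothetical names (ecdf_rank() stands in for r()), the subset is passed as an already evaluated logical vector rather than an unevaluated expression, and NA handling is hard-coded to "drop incomplete rows" instead of honouring an arbitrary na.action.

```r
# Hypothetical stand-in for the package's r(): ECDF of x evaluated at x.
ecdf_rank <- function(x) ecdf(x)(x)

# Hypothetical helper: build the data that lm() should see, with
# subsetting and NA removal done before any ranking takes place.
prefilter_data <- function(formula, data, keep = rep(TRUE, nrow(data))) {
  keep <- !is.na(keep) & keep                  # treat NA in the condition as FALSE
  vars <- get_all_vars(formula, data)          # raw variables named in the formula
  vars <- vars[keep, , drop = FALSE]           # apply the subset first
  vars[complete.cases(vars), , drop = FALSE]   # then drop rows with any NA
}

# Made-up data for illustration only.
d <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 8.1, NA, 12.2)
)

f <- ecdf_rank(y) ~ ecdf_rank(x)
clean <- prefilter_data(f, d, keep = d$x > 1)

# Ranks are now computed only on the rows that survive filtering.
fit <- lm(f, data = clean)
print(model.frame(fit))
```

A real implementation would also have to capture subset and na.action from the call the way lm() does, which is where most of the difficulty described above lies.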

On the other hand, these features are far from critical, and the user can achieve the same result without much work (for example, with the subset function from base R and drop_na from the tidyr package).
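
For illustration, the user-side workaround might look like the sketch below. It uses base R only: na.omit() takes the place of tidyr's drop_na() so the example has no dependencies, ecdf_rank() is again a hypothetical stand-in for r(), and the data are made up. Note that na.omit() drops rows with an NA in any column of the data frame, not only in the model variables.

```r
# Hypothetical stand-in for the package's r(): ECDF of x evaluated at x.
ecdf_rank <- function(x) ecdf(x)(x)

# Made-up data for illustration only.
d <- data.frame(
  x = c(3, 1, NA, 2, 5, 4),
  y = c(6, 2, 5, NA, 9, 8),
  g = c("a", "a", "a", "b", "a", "b")
)

# Filter and remove NAs up front, so that no rows are dropped inside lm().
d_sub <- subset(d, g == "a")
d_clean <- na.omit(d_sub)

# The ranks are computed on exactly the rows used in the fit.
fit <- lm(ecdf_rank(y) ~ ecdf_rank(x), data = d_clean)
print(model.frame(fit))
```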

A lot of work for not so much gain. I would assign this issue a low priority and work on other matters.

@danielwilhelm
Owner

danielwilhelm commented Jun 23, 2023 via email
