Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed solution to issue with poly() function #176

Open
1 of 3 tasks
ray-p144 opened this issue Aug 18, 2021 · 0 comments
Open
1 of 3 tasks

Proposed solution to issue with poly() function #176

ray-p144 opened this issue Aug 18, 2021 · 0 comments

Comments

@ray-p144
Copy link

ray-p144 commented Aug 18, 2021

Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

Background

It seems that the requirement to use stats::poly() rather than just poly() in a model formula has been a persistent issue (#79 #133 #175, and in the prediction package)

Reproducible error:

set.seed(99)

library(margins)

data <- data.frame(y = rnorm(n = 100, mean = 0, sd = 1),
                   x = rnorm(n = 100, mean = 0, sd = 1))

fit <- lm(formula = y ~ poly(x, degree = 2), data = data)
margins(fit)
#> Error in poly(x, degree = 2, coefs = list(alpha = c(-0.087688783657271, : could not find function "poly"


fit2 <- lm(formula = y ~ stats::poly(x, degree = 2), data = data)
margins(fit2)
#> Average marginal effects
#> lm(formula = y ~ stats::poly(x, degree = 2), data = data)
#>        x
#>  0.04987

Proposed solution:

The solution that I would like to propose is to provide a reducemem option for the various methods of the margins() function.

I have debugged the error to be the result of this line of code where the ".Environment" attribute of model[["terms"]] is removed.

I propose that the reducemem option be used to allow the user to turn this feature off so the formula will work as expected (not only for the stats::poly() function, but any other user-defined functions they would like to have in the formula).

Example of working fix using assignInNamespace() function:

fixedfunc <-
    function(model, 
             data = find_data(model, parent.frame()), 
             variables = NULL,
             at = NULL, 
             type = c("response", "link"),
             vcov = stats::vcov(model),
             vce = c("delta", "simulation", "bootstrap", "none"),
             iterations = 50L, # if vce == "bootstrap" or "simulation"
             unit_ses = FALSE,
             eps = 1e-7,
             reducemem = TRUE,
             ...) {
        
        # match.arg()
        type <- match.arg(type)
        vce <- match.arg(vce)
        
        # setup data
        data_list <- build_datalist(data, at = at)
        if (is.null(names(data_list))) {
            names(data_list) <- NA_character_
        }
        at_specification <- attr(data_list, "at_specification")
        
        # identify classes of terms in `model`
        varslist <- find_terms_in_model(model, variables = variables)
        
        # reduce memory profile
        model[["model"]] <- NULL
        
        ## Potential solution
        if (reducemem == TRUE) {
            ## Option 1
            attr(model[["terms"]], ".Environment") <- NULL
            ## Option 2
            # attr(model[["terms"]], ".Environment") <- new.env(parent = parent.env(globalenv()))
        }
        
        # calculate marginal effects
        out <- list()
        for (i in seq_along(data_list)) {
            out[[i]] <- build_margins(model = model, 
                                      data = data_list[[i]], 
                                      variables = variables,
                                      type = type, 
                                      vcov = vcov, 
                                      vce = vce, 
                                      iterations = iterations, 
                                      unit_ses = unit_ses, 
                                      eps = eps,
                                      varslist = varslist,
                                      ...)
            out[[i]][["_at_number"]] <- i
        }
        if (vce == "delta") {
            jac <- do.call("rbind", lapply(out, attr, "jacobian"))
            rownames(jac) <- paste0(rownames(jac), ".", rep(seq_len(length(out)), each = length(unique(rownames(jac)))))
            vc <- jac %*% vcov %*% t(jac)
        } else {
            jac <- NULL
            vc <- NULL
        }
        
        # return value
        structure(do.call("rbind", out), 
                  class = c("margins", "data.frame"), 
                  at = if (is.null(at)) NULL else at_specification,
                  type = type,
                  call = if ("call" %in% names(model)) model[["call"]] else NULL,
                  model_class = class(model),
                  vce = vce, 
                  vcov = vc,
                  jacobian = jac,
                  weighted = FALSE,
                  iterations = if (vce == "bootstrap") iterations else NULL)
    }

bugfunc <- get("margins.lm", envir = asNamespace("margins"))
environment(fixedfunc) <- environment(bugfunc)

assignInNamespace("margins.lm",
                  value = fixedfunc,
                  ns = "margins")

margins(fit, reducemem = FALSE)
#> Average marginal effects
#> lm(formula = y ~ poly(x, degree = 2), data = data)
#>        x
#>  0.04987


margins(fit2, reducemem = FALSE)
#> Average marginal effects
#> lm(formula = y ~ stats::poly(x, degree = 2), data = data)
#>        x
#>  0.04987

Alternative option

An additional thought I had was that rather than setting the ".Environment" attribute to NULL, you can just create a new environment that has the same parents as the global environment by doing:

## Option 2
attr(model[["terms"]], ".Environment") <- new.env(parent = parent.env(globalenv()))

This would prevent creating a massive memory footprint if the user has a large dataset loaded into R, but still allow use of any packages that were attached prior to running the margins() function.

Created on 2021-08-18 by the reprex package (v2.0.1)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant