Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tidyverse "no visible binding for global variable" notes #2758

Open
6 tasks
dlebauer opened this issue Dec 17, 2020 · 20 comments
Open
6 tasks

Fix tidyverse "no visible binding for global variable" notes #2758

dlebauer opened this issue Dec 17, 2020 · 20 comments
Labels
Misc: good first issue Priority: 01 - Low Topic: CRAN Changes related to getting PEcAn CRAN ready

Comments

@dlebauer
Copy link
Member

dlebauer commented Dec 17, 2020

Adapted from discussion with @infotroph in #2756

  • an easy but tedious and inelegant fix for a problem that's mostly just annoying
  • We are currently ignoring 571 no visible binding for global variable NOTEs across all of PEcAn, and my wild guess is that half to two-thirds of these are Tidyverse calls.

Shortcu:t For anyone who runs into this error and doesn't want to implement this fix: just add the appropriate note causing the build to fail to the package's <pkg>/test/Rcheck_reference.log, like this: 4a303b7

If you are in a good place to tackle this: we support you, get comfy! Bonus points if you fix all of the issues in the package you are working on. A special prize if you fix all (currently 563) of these Notes.

TODO:

  • find list of violations in <pkg>/test/Rcheck_reference.log files
    • find . -name Rcheck_reference.log | xargs grep "no visible binding for global variable"
    • to count occurrences add | wc -l to the end of the line above
  • prepend variables with .data$ as done here: https://github.com/PecanProject/pecan/pull/2756/files#diff-296112e719b8455a8be63e9cac941111d4ae1700081aa05b3a9adfa8d5b9738aR210
  • add rlang to Imports for relevant package to description
  • add ##' @importFrom rlang .data to <pkg>R/PEcAn.<pkgname>-package.R. See PEcAn.DB for an example.
    • can also be done per function, but not sure when this would be preferred
  • delete (relevant) lines starting <function name>: no visible binding for global variable <variable> in <pkg>/test/Rcheck_reference.log
  • update documentation

Alternative solutions:

  • change build checks to allow these notes
  • or add notes as they appear to <pkg>/test/Rcheck_reference.log so that build ignores it
  • wait until a more elegant solution evolves

See also

Originally posted by @infotroph in #2756 (comment)

dlebauer added a commit to jessicaguo/pecan that referenced this issue Dec 17, 2020
fixing no visible binding; see PecanProject#2758
@infotroph

This comment has been minimized.

@dlebauer

This comment has been minimized.

@moki1202
Copy link
Collaborator

@dlebauer hello , is this issue still open ? if yes I would like to work on this. thankyou

@ashiklom
Copy link
Member

@moki1202 Yes, this issue is still open. We have hundreds of these notes that need to be addressed (e.g., for base/utils, base/db, modules/data.atmosphere).

For each package, follow @dlebauer 's detailed instructions above.

@moki1202
Copy link
Collaborator

@ashiklom @infotroph We need to prepend only variables with .data$ that cause the run check error "no visible binding for global variable " . please correct me if I'm wrong. Some help on this issue would be appreciated.

@ashiklom
Copy link
Member

@moki1202 More or less. TLDR: You'll want to do this for any variables referring to column names in dplyr (and similar) functions (e.g., select, mutate, filter, summarize, etc.).

The root of the problem is that the dplyr package provides the shortcut to refer to columns inside the target data.frame in a more convenient way:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

mydata <- tribble(
  ~food, ~category, ~price,
  "apple", "fruit", 1.25,
  "tomato", "vegetable", 1.40,
  "cucumber", "vegetable", 1.70,
  "orange", "fruit", 0.85,
  "banana", "fruit", 0.30
)

mydata %>%
  select(food, price)
#> # A tibble: 5 x 2
#>   food     price
#>   <chr>    <dbl>
#> 1 apple     1.25
#> 2 tomato    1.4 
#> 3 cucumber  1.7 
#> 4 orange    0.85
#> 5 banana    0.3

mydata %>%
  filter(category == "fruit")
#> # A tibble: 3 x 3
#>   food   category price
#>   <chr>  <chr>    <dbl>
#> 1 apple  fruit     1.25
#> 2 orange fruit     0.85
#> 3 banana fruit     0.3

mydata %>%
  mutate(price_cents = price * 100)
#> # A tibble: 5 x 4
#>   food     category  price price_cents
#>   <chr>    <chr>     <dbl>       <dbl>
#> 1 apple    fruit      1.25         125
#> 2 tomato   vegetable  1.4          140
#> 3 cucumber vegetable  1.7          170
#> 4 orange   fruit      0.85          85
#> 5 banana   fruit      0.3           30

However, this can create some ambiguity. For instance, in the second column below, how can you be sure what price refers to?

price <- 1.00

cheap <- mydata %>%
  filter(price > 0.9)

More insidiously, the following expression will silently succeed if price is defined in the current environment, but return every row in the data frame (because 1.00 > 0.9 is always true):

mydata %>%
  select(-price) %>%
  filter(price > 0.9)

# # A tibble: 5 x 2
#   food     category 
#  <chr>    <chr>    
# 1 apple    fruit    
# 2 tomato   vegetable
# 3 cucumber vegetable
# 4 orange   fruit    
# 5 banana   fruit    

...but if price is not defined in the global environment, it will throw an error.

This ambiguity is exactly the source of the warning from the R CMD check -- price, food, and category all look like variables that don't have definitions in the current environment. The way to make it crystal clear that these variables are coming from inside the data is to prefix these variables with .data$, as in:

mydata %>%
  filter(.data$price > 0.9)
#> # A tibble: 3 x 3
#>   food     category  price
#>   <chr>    <chr>     <dbl>
#> 1 apple    fruit      1.25
#> 2 tomato   vegetable  1.4 
#> 3 cucumber vegetable  1.7

mydata %>%
  select(.data$food, .data$price)
#> # A tibble: 5 x 2
#>   food     price
#>   <chr>    <dbl>
#> 1 apple     1.25
#> 2 tomato    1.4 
#> 3 cucumber  1.7 
#> 4 orange    0.85
#> 5 banana    0.3

mydata %>%
  mutate(price_cents = .data$price * 100)
#> # A tibble: 5 x 4
#>   food     category  price price_cents
#>   <chr>    <chr>     <dbl>       <dbl>
#> 1 apple    fruit      1.25         125
#> 2 tomato   vegetable  1.4          140
#> 3 cucumber vegetable  1.7          170
#> 4 orange   fruit      0.85          85
#> 5 banana   fruit      0.3           30

Created on 2021-02-17 by the reprex package (v1.0.0)

@moki1202
Copy link
Collaborator

@ashiklom thank you for the help. I tried to do my best to update the files here. some critical review would be highly appreciated for further work in base/db, modules/data.atmosphere.

ashiklom added a commit that referenced this issue Mar 8, 2021
ashiklom added a commit that referenced this issue Mar 10, 2021
@Sarthakaga15
Copy link
Contributor

@dlebauer @ashiklom Hi, I am new to open source . Is this issue still open ? If would like to work on this if possible. Thanks

@ashiklom
Copy link
Member

ashiklom commented Jan 7, 2022

Thanks for your interest!

Yes, this issue is still open, though we've made some progress on it -- #2773, #2774.

In each PEcAn module, there is a file tests/Rcheck_reference.log; for example, here is the one for the benchmark module:

add_workflow_info: no visible binding for global variable ‘id’
add_workflow_info: no visible binding for global variable ‘workflow_id’

Inside those files, look for messages like "No visible binding for global variable". Some of those are for tidyverse functions and use the .data$<variable> solution described here. Some others are for data.table, which uses similar functionality but requires a different solution. Some others may be genuine references to global variables, which is really bad but also requires a different and probably more complex solution.

@infotroph
Copy link
Member

@Sarthakaga15 Adding two details to @ashiklom's answer:

  1. There are a select few packages with no Rcheck_reference.log. These are modules that we've cleaned up well enough to eliminate every message from R CMD check, including any "no visible binding..."'s that were there before we deleted the reference log.
  2. The reference logs are only occasionally updated, so if you see a no visible binding... in a given package's Rcheck_reference.log but can't find any such variable in the function it references, it's possible that someone has fixed that one without updating the reference log. If in doubt, you can run rcmdcheck::rcmdcheck("path/to/package") to see if R CMD check still sees a problem with the current code.

@moki1202
Copy link
Collaborator

@Sarthakaga15 Feel free to ping me if you get stuck anywhere. I'm sure I can help you out in some places :)

@akshat-max
Copy link

Hey I am newbie in open source and want to work on this issue is it still open ? and some resources would be helpful for me :D

@moki1202
Copy link
Collaborator

Hello @akshat-max! Glad you want to work on this. For now, you can refer to the conversation above for the description and the suggested solution. For starting, @Sarthakaga15 is working on fixing the no visible binding for global variables checks for PEcAn.workflow so pick up any package from modules and start working. If you get stuck somewhere feel free to ping me 😃.

@mdietze mdietze closed this as completed in 06350ac Mar 6, 2022
@infotroph infotroph reopened this Mar 6, 2022
@Its-Maniaco
Copy link
Contributor

Hi! @dlebauer @moki1202 @ashiklom . Can I work on this issue?

@moki1202
Copy link
Collaborator

Yes it is! We have multiple PEcAn packages that need some fixing. I would suggest, you have a good look at the conversations above and start working! In case you get stuck somewhere feel free to mention me here.

@Its-Maniaco
Copy link
Contributor

Thank you. Will give it a look and start working ASAP.

@Aariq Aariq added the Topic: CRAN Changes related to getting PEcAn CRAN ready label Jul 14, 2022
@vaishase
Copy link

vaishase commented Aug 5, 2022

@infotroph @dlebauer @moki1202 Hi, I am new to open source and would love to work on this issue if it's still open.

@moki1202
Copy link
Collaborator

moki1202 commented Aug 5, 2022

@vaishase yes it is! there are still a lot of packages that need work. Pick a package and check the rcheckreference.log file in the tests folder. check out #2971 and #2965, these PRs solve similar notes and warning that you'll be picking up. I would suggest you have a good look at the conversations above and start working. Happy coding and feel free to ping me if you get stuck somewhere.

@vaishase
Copy link

vaishase commented Aug 5, 2022

@vaishase yes it is! there are still a lot of packages that need work. Pick a package and check the rcheckreference.log file in the tests folder. check out #2971 and #2965, these PRs solve similar notes and warning that you'll be picking up. I would suggest you have a good look at the conversations above and start working. Happy coding and feel free to ping me if you get stuck somewhere.

Sure, thanks! I'll look into it.

@Aariq
Copy link
Collaborator

Aariq commented Dec 12, 2022

Packages that still have this check warning:

  • db
  • qaqc
  • settings
  • utils
  • visualization
  • workflow
  • ed
  • linkages
  • lpjguess
Code to get this list:
library(stringr)
library(purrr)
library(dplyr)

logs <- list.files(
  pattern = "Rcheck_reference.log",
  recursive = TRUE,
  full.names = TRUE
)

logs |>
  set_names(
    dirname(logs) |> dirname() |> basename()
  ) |>
  map_lgl(~{
    readLines(.x) |>
      str_detect("no visible binding for global variable") |> 
      any()
  }) |> as_tibble(rownames = "package") |> filter(value)

meetagrawal09 added a commit to meetagrawal09/pecan that referenced this issue Feb 6, 2023
infotroph added a commit to meetagrawal09/pecan that referenced this issue Feb 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Misc: good first issue Priority: 01 - Low Topic: CRAN Changes related to getting PEcAn CRAN ready
Projects
None yet
Development

No branches or pull requests

9 participants