Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing one-country-regions #270

Merged
merged 12 commits into from Apr 5, 2022
Merged

Conversation

julie-vienne
Copy link
Contributor

@julie-vienne julie-vienne commented Mar 28, 2022

Adding country->region mapper, as done here:
2DegreesInvesting/r2dii.climate.stress.test@45e0064

@julie-vienne julie-vienne assigned jdhoffa and ghost Mar 28, 2022
@julie-vienne julie-vienne requested review from jdhoffa and a user March 28, 2022 13:23
@julie-vienne julie-vienne unassigned jdhoffa and ghost Mar 28, 2022
# Add countries for regions -----------------------------------------------

countries_for_regions_mapper_lookup <- tibble::tibble(
region = c("brazil", "brazil", "india", "india", "japan", "japan", "russia", "russia", "south africa", "south africa", "united states", "united states"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdhoffa leaving this PR to you to review. Just a quick not, @julie-vienne adapted the solution from a patch I did in stresstesting, so the added countries are restricted to those that are of relevance to stresstesting. I am not sure if that is sufficient in this context.
And a quick question. So that our project partners can use the patch this needs to be on CRAN. Is there a new CRAN release planned in the next weeks?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also @cjyetman

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MirijaHa I don't have enough knowledge or background to answer these questions, so you'll have to wait for @jdhoffa

That being said, these are a couple thoughts...

  • as far as I understand, the P4B team is preparing for their annual big update, so I suspect they will be making a new release on CRAN in the near future... I believe it's intended for April 1, but I don't know for sure

  • I don't know the real intended purpose of this region_isos object, but it does seem to me that this PR not only changes the values in that object, but also changes/adds conceptually what that object contains (single-country groups, versus multi-country groups). You've also made it clear that these changes are specific to stress testing, though this pkg is intended for a broader audience. That may be fine, but my gut, instinctual reaction is that modifying an object like that could be controversial, e.g. it would seem more fitting to make broader changes that are relevant to all the intended users. But as I said, I don't know much detail about this pkg, so @jdhoffa will have a much better sense of whether this type of change is reasonable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for example, see my other comment in this PR thread... if this script were used in its current state in this PR as intended (to create the data/region_isos.rda, which then gets included when the pkg is built), I'm almost certain it will not pass the test for "outputs expected regions for weo_2020". It could be the case that these new single-country regions that are defined here shoud be in the "expected regions for weo_2020" and the test for that needs to be updated as well, but I don't know what the bar is for adding new expected regions in this context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MirijaHa thanks for the context, @julie-vienne also mentioned to me the inspiration for the PR and it makes sense to me. We are indeed planning a CRAN release in the coming week(s), so I will be sure to bundle this in the next release.

@cjyetman I will do a more complete review now, but thanks for the input. Definitely, we still need to source the .R files and update a few tests to include the new regions. From a methodological perspective, this is definitely an "ok" thing to do, since some scenarios do have "country granularity", we just need to cross-reference these one-country regions against the scenarios that offer them.

Will do a complete review no :-)

data-raw/region_isos.R Outdated Show resolved Hide resolved
@cjyetman
Copy link
Member

I assume this will make this test fail (once data-raw/region_isos.R has been run and a new data/region_isos.rda has been committed)
https://github.com/2DegreesInvesting/r2dii.data/blob/d29c64618d02403c6efcea6f955ede830e4c637c/tests/testthat/test-region_isos.R#L24-L52

Copy link
Member

@jdhoffa jdhoffa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some particular comments directly to the PR, but additionally:

This PR currently only affects the file data-raw/region_isos.R. This file will need to be sourced to actually update the data in data/region_isos.rda (there's no way you could've known this since I didn't tell you hahaha). The steps that need to be followed are:

To add rows to data (an existing dataset in r2dii.data)

  • In "data-raw/data.R", edit file as necessary (something like):
    library(usethis)
    # Source: @<contributor> <URL to issue or pull request>
    newdata <- read_csv_(file.path("data-raw", "newdata.csv"))
    use_data(newdata, overwrite = TRUE)
    
  • Add regression test in tests/testthat/data.R to test that the desired change has successfully been applied to data
  • source("data-raw/data_dictionary.R"); devtools::load_all() (only necessary if column names of the dataset have changed)
  • Source data-raw/ and test all datasets remain the same:
    r2dii.usethis::source_data_raw()
    devtools::load_all()
    devtools::test()
    devtools::check()
    
  • If tests fail, consider if it is expected or unexpected, and adjust as necessary
  • Edit NEWS.md explaining dataset change, and referencing the relevant GH issue number.

# Add countries for regions -----------------------------------------------

countries_for_regions_mapper_lookup <- tibble::tibble(
region = c("brazil", "brazil", "india", "india", "japan", "japan", "russia", "russia", "south africa", "south africa", "united states", "united states"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MirijaHa thanks for the context, @julie-vienne also mentioned to me the inspiration for the PR and it makes sense to me. We are indeed planning a CRAN release in the coming week(s), so I will be sure to bundle this in the next release.

@cjyetman I will do a more complete review now, but thanks for the input. Definitely, we still need to source the .R files and update a few tests to include the new regions. From a methodological perspective, this is definitely an "ok" thing to do, since some scenarios do have "country granularity", we just need to cross-reference these one-country regions against the scenarios that offer them.

Will do a complete review no :-)

Comment on lines 251 to 252
isos = c("br", "br", "in", "in", "jp", "jp", "ru", "ru", "za", "za", "us", "us"),
source = rep(c("weo_2019", "weo_2020"), 6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a long-term maintenance perspective, I am not in favour of manually doubling each country, and using rep(c("weo_2019", "weo_2020"), 6).

I would rather define separate objects for each source, and then bind them later. That way when we need to add a new scenario, we just add another object, and bind it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be something like this?

countries_for_regions_mapper_lookup <- tibble::tribble(
  ~region,          ~isos,  ~source,
  "brazil",         "br",   "weo_2019",
  "brazil",         "br",   "weo_2020",
  "india",          "in",   "weo_2019",
  "india",          "in",   "weo_2020",
  "japan",          "jp",   "weo_2019",
  "japan",          "jp",   "weo_2020",
  "russia",         "ru",   "weo_2019",
  "russia",         "ru",   "weo_2020",
  "south africa",   "za",   "weo_2019",
  "south africa",   "za",   "weo_2020",
  "united states",  "us",   "weo_2019",
  "united states",  "us",   "weo_2020"
)

or separated further by source?

countries_for_regions_we02019 <- tibble::tribble(
  ~region,          ~isos,  ~source,
  "brazil",         "br",   "weo_2019",
  "india",          "in",   "weo_2019",
  "japan",          "jp",   "weo_2019",
  "russia",         "ru",   "weo_2019",
  "south africa",   "za",   "weo_2019",
  "united states",  "us",   "weo_2019",
)

countries_for_regions_we02020 <- tibble::tribble(
  ~region,          ~isos,  ~source,
  "brazil",         "br",   "weo_2020",
  "india",          "in",   "weo_2020",
  "japan",          "jp",   "weo_2020",
  "russia",         "ru",   "weo_2020",
  "south africa",   "za",   "weo_2020",
  "united states",  "us",   "weo_2020"
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup exactly. I think the latter would be preferable, as it clearly separates each source, so adding a new scenario would be relatively trivial.

@jdhoffa
Copy link
Member

jdhoffa commented Mar 29, 2022

One final comment, I have just transferred an issue from https://github.com/2DegreesInvesting/r2dii.analysis that actually relates directly to this, see #271.

@julie-vienne can you add the text "Closes #271" as a comment on this PR? When this PR is merged, that issue will then automatically be closed. But, more importantly, it links the context of this PR (the "solution), to the issue (the "question"). In software development, this is a very crucial thing to do, especially if we ever want to revisit this change later down the road, and want to understand the context of why the change was made.

@julie-vienne
Copy link
Contributor Author

Closes #271

@julie-vienne
Copy link
Contributor Author

The commit 'Updating region_isos snapshot' is intentionally replacing the file region_isos.md with the newest version. This will make sure that the snapshot test passes as we're expecting the updated region_isos.md file.

@jdhoffa jdhoffa linked an issue Apr 4, 2022 that may be closed by this pull request
Copy link
Member

@jdhoffa jdhoffa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment here is on the addition of tibble as a dependency. Everything looks great otherwise, but see if you can, try to find a way to write the test that you added using only base R functions.

You could, for example, do something like this:

country_regions_weo_2019 <- c(
  "brazil",
  "india",
  "japan",
  "russia",
  "south africa",
  "united states"
)

expect_equal(
  setdiff(
    country_regions_weo_2019,
    region_isos[region_isos$source == "weo_2019",][['region']]
  ),
  character(0)
)

setdiff(a, b) will just check that a is a subset of b (ie. all elements of a are contained in b). If that is the case, it should return character(0).

NEWS.md Outdated
@@ -1,5 +1,7 @@
# r2dii.data (development version)

* 'region_isos' gained country specific regions for 'weo_2019' and 'weo_2020' (#270 @julie-vienne)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: We usually use the present tense for NEWS entries, ie. "region_isos gains ... ". Also, usually, the author mentioned in NEWS is the one that raised the issue (and only if they are external). You (@julie-vienne) will automatically be credited as you are the author of the PR. It is just meant as a way to show gratitude to external people contributing to open source projects.

(Also FYI any review comment preceded by "NIT" means "nit-picky", meaning they are just the opinion of the reviewer and not crucial to address to merge the PR).

DESCRIPTION Outdated
@@ -44,7 +44,8 @@ Suggests:
covr,
rlang,
rmarkdown,
testthat (>= 2.1.0)
testthat (>= 2.1.0),
tibble
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, tibble has been added as a soft dependency (Suggests). In general, try to introduce as few dependencies as possible. Especially since tibble is only required for tests, it would be preferable to try to think of a way to write the test without using that package (ie. using only base R functions).

Note: tibble along with other tidyverse packages are also already used in functions in the data-raw folder. However, since that folder is only ever sourced by the package maintainer, the packages used within it are not strict dependencies of the r2dii.data package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI.... the tests containing calls to {tibble} were triggering warnings (which cause failure) in R CMD check, so I suggested that @julie-vienne add {tibble} to Suggests (along with a full explanation of not wanting to add it as a hard dependency by adding it to Imports).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@jdhoffa jdhoffa Apr 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. In this case, the desired recourse would be to avoid tibble in the tests at all, rather than adding it to Suggests. r2dii.data has very few dependencies, since it is just a package of datasets. So, I would like to avoid adding any unless there is a very good reason (even soft ones) :-)

Copy link
Contributor Author

@julie-vienne julie-vienne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the previous commits should have been pushed together as they relate to the same topic.

Copy link
Member

@jdhoffa jdhoffa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @julie-vienne :-)

@jdhoffa jdhoffa merged commit e41b854 into main Apr 5, 2022
@jdhoffa jdhoffa deleted the bridging_file_1country_region branch April 5, 2022 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

region_isos needs to map regions to isos, but also countries to isos
3 participants