add new summarize_weighted_buildout function #141

jdhoffa · 2020-07-21T14:42:28Z

This is a WIP, and requires #136 to be merged before checks will pass. I open here as a draft just to track my work.

Remains to be done:

Add tests
Write an article outlining the methodology
Refactor (summarize_weighted_buildout is very similar to summarize_weighted_production and I imagine there must be a more DRY way to accomplish this
Filter out input data if the start-year production is 0 (as this would result in "infinite" build-out percentage)

Relates to #132

R/summarize_weighted_buildout.R

…olume-build-out

tests/testthat/test-summarize_weighted_buildout.R

maurolepore · 2020-07-23T17:00:58Z

Surprisingly, all calls to join seem to use by and that I expected to suppress this message:

#> Joining, by = c("sector", "technology")

--

library(r2dii.data)
library(r2dii.match)
#> Warning: package 'r2dii.match' was built under R version 4.0.2
devtools::load_all()
#> Loading r2dii.analysis

master <- loanbook_demo %>%
  match_name(ald_demo) %>%
  prioritize() %>%
  join_ald_scenario(ald_demo,
                    scenario_demo_2020,
                    region_isos = region_isos_demo
  )

summarize_weighted_buildout(master)
#> Joining, by = c("sector", "technology")
#> # A tibble: 138 x 4
#>    sector     technology  year weighted_buildout
#>    <chr>      <chr>      <int>             <dbl>
#>  1 automotive electric    2020       0          
#>  2 automotive electric    2021       0.000000765
#>  3 automotive electric    2022       0.00000153 
#>  4 automotive electric    2023       0.00000230 
#>  5 automotive electric    2024       0.00000306 
#>  6 automotive electric    2025       0.00000383 
#>  7 automotive electric    2026       0.00000459 
#>  8 automotive hybrid      2020       0.0000528  
#>  9 automotive hybrid      2021       0.0000501  
#> 10 automotive hybrid      2022       0.0000474  
#> # … with 128 more rows

^{Created on 2020-07-23 by the reprex package (v0.3.0)}

--

Found the issue here. A call to inner_join() was passing to by not the character vector c("sector", "technology") but the columns c(.data$sector, .data$technology).

R/summarize_weighted_production.R

R/target_market_share.R

tests/testthat/test-summarize_weighted_production.R

vignettes/production-percent-change.Rmd

maurolepore · 2020-07-28T19:10:35Z

@jdhoffa, is it safe to update the regression references? I can't figure out what is that changed them. The change happens seems to happen between f5d9959 and 0e57f9c but it's not clear to me what it is. Do you know?

The previous approach is too recent in rlang, and would break unless the user has a recent version of rlang.

jdhoffa · 2020-07-28T19:46:22Z

@maurolepore yes it's safe to update. Im truly not sure what is causing them to mess up.

Conflicts: tests/testthat/test-summarize_weighted_production.R

maurolepore · 2020-07-28T20:37:30Z

tests/testthat/test-summarize_weighted_production.R

+test_that("is sensitive to `use_credit_limit`", {
+  data <- fake_master(
+    name_ald = rep(c("company a", "company b"), 2),
+    year = c(2020, 2020, 2021, 2021),
+    id_loan = c("i1", "i2", "i1", "i2"),
+    loan_size_credit_limit = c(20, 30, 20, 30),
+    production = c(10, 10, 20, 40),
+  )
+  out2 <- data %>%
+    summarize_weighted_percent_change(name_ald, use_credit_limit = TRUE)
+  out3 <- data %>%
+    summarize_weighted_percent_change(name_ald, use_credit_limit = FALSE)
+
+  expect_false(identical(out2, out3))
+})


This comes from expectations that I found outside a call to test_that(). I change the code to reflect how I would test that summarize_weighted_percent_change() is sensitive to use_credit_limit.

maurolepore · 2020-07-28T20:41:25Z

tests/testthat/test-summarize_weighted_production.R

-# TODO: Come back to this after the methodology article is written
-# test_that("with known input outputs as expected", {
-#   # styler: off
-  data <- fake_master(
-    name_ald = rep(c("company a", "company b"),2),
-    year =                  c(2020, 2020, 2021, 2021),
-    id_loan    =            c("i1", "i2", "i1", "i2"),
-    production            = c( 10,   10,   20,   40),
-  )
-  out1 <- summarize_weighted_percent_change(data, name_ald)
-  out1$weighted_percent_change
-  expect_equal(out1$weighted_percent_change, c(0,0,50,150))
-
-  # Is sensitive to `use_credit_limit`
-  # Reversing loan_size and production outputs reverse result
-  data2 <- fake_master(
-    name_ald = rep(c("company a", "company b"),2),
-    year =                  c(2020, 2020, 2021, 2021),
-    id_loan    =            c("i1", "i2", "i1", "i2"),
-    loan_size_credit_limit = c( 20,   30,   20,   30),
-    production            = c( 10,   10,   20,   40),
-  )
-  out2 <- summarize_weighted_percent_change(data2, name_ald, use_credit_limit = TRUE)
-  expect_equal(out2$weighted_percent_change, c(0,0,40,180))
-#   # styler: on
-# })
-


@jdhoffa, I'm not sure about removing this. It wasn't inside a test, nor it seems that the code and comments correspond to each other. Can you check this is really to throw away?

ah yes, i didn't mean to remove the test entirely! i meant only to remove the TODO comment haha.
I put the test back in

maurolepore · 2020-07-28T20:54:53Z

vignettes/production-percent-change.Rmd

@@ -0,0 +1,61 @@
+---
+title: "Indicator Choices: Production vs. Percent Change in Production"


Is "production" enough or do we need to qualify "production" with some word to more precisely distinguish it from "percent change in production"? Without a qualifier, percent in production seems like a subgroup of production when it is instead an alternative.

Three qualifiers that come to my mind are weighted, absolute, and raw.

I elected to add "Weighted" in every case

maurolepore · 2020-07-28T20:59:04Z

vignettes/production-percent-change.Rmd

+We aggregate the percent-change in production for each company to the portfolio-level, by using the same loan-weighted average as above. In particular, for each loan $l_j$ to company $j$, we have: 
+$$ \overline{\chi_i} = \sum_j \left[ \chi_{i,j} * \dfrac{l_j}{\sum_j l_j} \right]$$
+
+It should be noted that the percent change, $\chi$, is undefined for $0$ initial production. Intuitively, this makes sense, since you would require an "infinite percent" build-out to grow to anything from 0. For this reason, any company having $0$ initial production is filtered out prior to calculating the percent change indicator.


Where it says "$0$ initial production" to replace $0$ with just 0. You seem to refer not to the equation but to the value 0.

Maybe you mean "When initial production ($0$) takes a value of 0 (zero) ... ". If so, maybe you need to reword a bit.

maurolepore · 2020-07-28T21:00:12Z

vignettes/production-percent-change.Rmd

+
+```{r}
+# using the master dataset defined in the previous chunk:
+summarize_weighted_percent_change(master)


I removed the calls to head() because they seem necessary for tibbles. Is there a reason you prefer head()?

Hmmm I thought they were formatting strangely without head(), but it looks fine to me now.

maurolepore

@jdhoffa, I suggested a few changes and made a few comments. If you feel it's ready, then let's merge. If not, I'm happy to review again after you incorporate my review.

jdhoffa · 2020-07-29T09:54:04Z

I will merge once the checks pass! Thanks @maurolepore

add new summarize_weighted_buildout function

af42b27

maurolepore assigned jdhoffa Jul 23, 2020

maurolepore added this to In progress in r2dii via automation Jul 23, 2020

jdhoffa added 8 commits July 23, 2020 15:19

test summarize_weighted_buildout

a14f8d9

begin to make tests pass

c14a3f6

touch volume buildout article

540efd3

Merge branch 'master' into 132-volume-build-out

324202d

ignore article

0dbfc79

further tests for summarize_weighted_buildout

337c814

tidy summarize_weighted_buildout

0084735

document

a6fed72

maurolepore reviewed Jul 23, 2020

View reviewed changes

R/summarize_weighted_buildout.R Outdated Show resolved Hide resolved

maurolepore and others added 4 commits July 23, 2020 10:45

Suggest small rewrite for clarity

26cbda4

Remove leftover switch to turn off styler

af46931

add crucial equations to methodology article

d77c391

Merge remote-tracking branch 'origin/132-volume-build-out' into 132-v…

a7f2892

…olume-build-out

maurolepore reviewed Jul 23, 2020

View reviewed changes

tests/testthat/test-summarize_weighted_buildout.R Outdated Show resolved Hide resolved

DRY

dc7f26e

maurolepore reviewed Jul 23, 2020

View reviewed changes

R/summarize_weighted_production.R Outdated Show resolved Hide resolved

maurolepore reviewed Jul 23, 2020

View reviewed changes

R/summarize_weighted_production.R Outdated Show resolved Hide resolved

maurolepore reviewed Jul 23, 2020

View reviewed changes

R/summarize_weighted_production.R Show resolved Hide resolved