`sector_profile()` now accounts for unmatched `type`, `sector` or `subsector` #739

maurolepore · 2024-02-27T14:45:54Z

Closes NAs in benchmark-columns should yield NAs in risk_category (except "all") #638
Relates to The emissions*() functions preserve unmatched products and missing benchmarks #639
Note to self: Test `sector*()` focusing on `type`, `sector` and `subsector` #782

--

Given a clustered matching one but not a second type of scenario, when the scenarios dataset has the two types, then the second type and its corresponding scenario are still present in grouped_by, and the mismatch is reflected correctly in the value.

The expected behaviour is captured in this GoogleSheet and explained in #739 (comment) (thanks @Tilmon).

TODO

EXCEPTIONS

Slide here any item that you intentionally choose to not do.

maurolepore · 2024-03-04T18:58:49Z

@Tilmon (cc' @AnneSchoenauer)

I'm struggling to see the difference between the case "unmatched products" and the case "missing benchmarks".

For emissions*() I could clearly understand the difference and and create different tests for the cases with "unmatched product" versus "missing benchmark". For example, to test the case "unmatched product" I could add a product in the companies dataset that did not exist in the products dataset. And to test the case "missing benchmark" I could add an NA in the isic_4digit column of the products dataset.

However, for sector*() I can't understand the difference as clearly, and I realize that my tests for both cases essentially use the same type of toy input data: An NA or "unmatched" in either the columns sector, subsector, or type of the companies dataset.

Your #738 (comment) seems to support the idea that these two cases are no different:

I agree that in this case the unmatched value in sector is the important relationship.

The best way to clarify things is through a reproducible example that you are familiar with because it's based on the one you created in this GoogleSheet. At this point I'm so confused that I don't know if you think it shows one or both cases, but I hope it's a good start.

reprex

library(tibble)
devtools::load_all()
#> ℹ Loading tiltIndicator

packageVersion("tiltIndicator")
#> [1] '0.0.0.9211'

withr::local_options(list(tibble.print_max = Inf, width = 500))

companies <- tribble(
  ~companies_id, ~clustered, ~activity_uuid_product_uuid, ~tilt_sector, ~tilt_subsector,       ~type,     ~sector,  ~subsector,
            "a",        "a",                         "a",          "a",             "a",       "ipr",     "total",    "energy",
            "a",        "a",                         "a",          "a",             "a",       "weo",     "total",    "energy",
            "a",        "b",                 "unmatched",  "unmatched",     "unmatched", "unmatched", "unmatched", "unmatched",
            "a",        "c",                 "unmatched",          "c",             "c",       "ipr", "land use",   "land use",
            "a",        "c",                 "unmatched",          "c",             "c",       "weo",         NA,           NA
  )

scenarios <- tribble(
     ~sector,   ~subsector,  ~year, ~reductions, ~type, ~scenario,
     "total",     "energy",   2050,         1.0, "ipr",       "a",
     "total",     "energy",   2050,         0.6, "weo",       "a",
  "land use",   "land use",   2050,         0.3, "ipr",       "a"
)

sector_profile(companies, scenarios) |> unnest_product()
#> # A tibble: 5 × 11
#>   companies_id grouped_by risk_category profile_ranking clustered activity_uuid_product_uuid tilt_sector scenario  year type      tilt_subsector
#>   <chr>        <chr>      <chr>                   <dbl> <chr>     <chr>                      <chr>       <chr>    <dbl> <chr>     <chr>         
#> 1 a            ipr_a_2050 high                      1   a         a                          a           a         2050 ipr       a             
#> 2 a            weo_a_2050 medium                    0.6 a         a                          a           a         2050 weo       a             
#> 3 a            <NA>       <NA>                     NA   b         unmatched                  unmatched   <NA>        NA unmatched unmatched     
#> 4 a            ipr_a_2050 low                       0.3 c         unmatched                  c           a         2050 ipr       c             
#> 5 a            <NA>       <NA>                     NA   c         unmatched                  c           <NA>        NA weo       c

sector_profile(companies, scenarios) |> unnest_company()
#> # A tibble: 8 × 4
#>   companies_id grouped_by risk_category value
#>   <chr>        <chr>      <chr>         <dbl>
#> 1 a            ipr_a_2050 high          0.25 
#> 2 a            ipr_a_2050 medium        0    
#> 3 a            ipr_a_2050 low           0.25 
#> 4 a            ipr_a_2050 <NA>          0.5  
#> 5 a            weo_a_2050 high          0    
#> 6 a            weo_a_2050 medium        0.333
#> 7 a            weo_a_2050 low           0    
#> 8 a            weo_a_2050 <NA>          0.667

It it possible that this is already all you want? If yes, we're done and I can close this PR. If not, what do you expect?

Tilmon · 2024-03-04T20:47:12Z

Hi @maurolepore ,

let me respond to the points you raised below. I first start with a more detailed description of how to think about the two different cases and then explain why I think the reprex is unfortunately wrong.

Description of the "two cases"

I'm struggling to see the difference between the case "unmatched products" and the case "missing benchmarks".

I must admit that I find the difference in the case of sector*() also more complicated, but it's still there. Let's think about it that way: For the sector*() it doesn't matter, whether a clustered is matched to Ecoinvent or not. What matters is:

Whether the clustered has a tilt_sector. If it doesn't have a tilt_sector , the clustered will neither have a sector or subsector for any of the types. Hence, you can think of the tilt_sector as the activity_uuid_product_uuid of sector*(): If the tilt_sector (activity_uuid_product_uuid) is missing, you won't have any sector data (co2_footprint) that can be used to calculate the profile. Because ultimately, the scenario sectors give us the info on the reduction targets. I.e., no scenario sector = no result at all. Hence, tilt_sector == unmatched should be have as the activity_uuid_product_uuid == unmatched in the emission*().
If the tilt_sector & tilt_subsector leads to a sector & subsector for either of the type ipr or weo or both or none. You can think of the sector x subsector x type x year combination as the 6 benchmarks in the emission*() . For each clustered with a tilt_sector, we want to show all benchmarks, i.e. all corresponding combinations of sector x subsector x type x year for the specific tilt_sector x tilt_subsector , even if some are NAs. I.e., for every clustered with a tilt_sector, we should show the benchmarks weo_2030, weo_2050, ipr_2030, ipr_2050, even if some are NA (as in your reprex clustered c has no corresponding weo sector) - similar to an NA in isic_4digit in emission_profile().

Reprex
Your reprex shows BOTH cases.

clustered b has no tilt_sector (equivalent of no activity_uuid_product_uuid) and will hence lead to no results for the indicator (as there is no sector, equivalent to no co2_footprint). This is what I describe under 1.
clustered c has a tilt_sector (equivalent to matched product with activity_uuid_product_uuid) and hence will lead to results for the sector x subsector x type x year combination, even if there will be some NAs (equivalent to an activity_uuid_product_uuid leading to results for at least some of the 6 benchmarks).

So the reprex example is great to discuss the issue. I see two problems with the reprex you shared, one on product-level and one on company-level:

product-level: clustered c should have the grouped_by value "weo_a_2050" instead of "NA". We have a tilt_sector for that clustered and hence should show all benchmarks, even if they are NA.
company-level: The results show that for ipr_a_2050, there are 25% of products in high and 25% in low risk category. That can't be right, because we only have 3 products. Instead, we have 1 clustered with high, 1 with low, and one with NA benchmark , i.e. we should have for the benchmark ipr_a_2050 the following values: high 1/3, medium 0, low 1/3, NA 1/3.

I hope this helps. Let me know if it does!

P.S. I realize my comment here was not very helpful, as it's not wrong but leads to more confusion about the two different cases.

Your #738 (comment) seems to support the idea that these two cases are no different: "But I agree that in this case the unmatched value in sector is the important relationship."

I agree that in this case the unmatched value in sector is the important relationship.

maurolepore · 2024-05-17T22:10:45Z

it's best to pause the sector_upstream_profile() work for now -- @Tilmon in #739 (comment)

Noted (#784).

Thanks!

Handle missing benchmark

4fea92b

maurolepore changed the title ~~Handle missing benchmark~~ The sector*() functions now handle missing "benchmark" Feb 27, 2024

maurolepore mentioned this pull request Feb 29, 2024

The sector*() functions now preserve unmatched products #738

Merged

11 tasks

maurolepore added 3 commits March 4, 2024 09:29

Fix merge conflicts

40b2b35

wip

62dccc0

wip

f62764c

maurolepore mentioned this pull request Mar 5, 2024

Create Bundesbank data (focus on Germany and exclude *_upstream()) 2DegreesInvesting/tiltIndicatorAfter#153

Closed

10 tasks

maurolepore self-assigned this Mar 5, 2024

maurolepore changed the title ~~The sector*() functions now handle missing "benchmark"~~ The sector*() functions should preserve "missing benchmarks" Mar 5, 2024

maurolepore mentioned this pull request Mar 12, 2024

Explore why the company "auto will gmbh" has 40% of products with NA in sector_profile #745

Closed

maurolepore added the bug Something isn't working label Mar 14, 2024

Tilmon added this to the Database launch milestone Apr 23, 2024

Merge branch 'main' into 733_sector_missing_benchmark

6e56a7d

maurolepore force-pushed the 733_sector_missing_benchmark branch from 7e45582 to 6e56a7d Compare May 13, 2024 14:39

maurolepore added 13 commits May 13, 2024 15:51

Revert to main

8a0d1ef

Unmatched tilt_sector yields NA in grouped_by

9be3307

Add related expectations

895f1ca

Test matching sector, subsector, and type

adc8d16

Style

199dae6

Polish

1d1dca2

Polish

fcb44ec

Typo

d25901d

Prune

6289cd9

Refactor

5561309

Refactor

0a55f66

Refactor

3d9462d

Refactor

ec88b56

Tilmon mentioned this pull request May 14, 2024

Generate all tilt output data after completion of webtool-relevant tickets #783

Open

4 tasks

maurolepore added 24 commits May 17, 2024 23:25

Refactor

12f6815

refactor

b893978

refactor

a88dd29

refactor

d224336

refactor

e4d8df7

refactor

75635f7

refactor

50c0e8f

refactor

d8dd898

Refactor

bfad25f

refactor

8d16b6f

Refactor

5c2a938

refactor

0c1b4c6

refactor

d0679bc

refactor

4a10b1c

refactor

c763e00

refactor

1dd185b

refactor

1d8cd31

refactor

b19ff6d

update snaps

9fbe783

refactor

85739d0

refactor

ede3981

refactor

e21b506

refactor

40f3956

Refactor: Gather thigns that change for the same reasons

9e66215

maurolepore marked this pull request as ready for review May 20, 2024 17:21

maurolepore merged commit 61b4fa9 into main May 20, 2024
14 checks passed

maurolepore deleted the 733_sector_missing_benchmark branch May 20, 2024 17:21

This was referenced Jul 2, 2024

profile_sector() fails to handle case "c" 2DegreesInvesting/tiltIndicatorAfter#279

Open

profile_sector() fails to handle case "b" 2DegreesInvesting/tiltIndicatorAfter#280

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`sector_profile()` now accounts for unmatched `type`, `sector` or `subsector` #739

`sector_profile()` now accounts for unmatched `type`, `sector` or `subsector` #739

maurolepore commented Feb 27, 2024 •

edited

Loading

maurolepore commented Mar 4, 2024 •

edited

Loading

Tilmon commented Mar 4, 2024 •

edited

Loading

maurolepore commented May 17, 2024 •

edited

Loading

sector_profile() now accounts for unmatched type, sector or subsector #739

sector_profile() now accounts for unmatched type, sector or subsector #739

Conversation

maurolepore commented Feb 27, 2024 • edited Loading

maurolepore commented Mar 4, 2024 • edited Loading

Tilmon commented Mar 4, 2024 • edited Loading

maurolepore commented May 17, 2024 • edited Loading

`sector_profile()` now accounts for unmatched `type`, `sector` or `subsector` #739

`sector_profile()` now accounts for unmatched `type`, `sector` or `subsector` #739

maurolepore commented Feb 27, 2024 •

edited

Loading

maurolepore commented Mar 4, 2024 •

edited

Loading

Tilmon commented Mar 4, 2024 •

edited

Loading

maurolepore commented May 17, 2024 •

edited

Loading