The emissions*() functions preserve unmatched products and missing benchmarks #639

Merged
merged 42 commits into from
Feb 27, 2024

Conversation

maurolepore
Member

@maurolepore maurolepore commented Dec 12, 2023

Relates to

Conflicts with

--

This PR focuses on emissions_profile*(). (For sector_profile*() see #738 (unmatched products) and PENDING (missing benchmarks).)

@Tilmon and @AnneSchoenauer, please see the reprexes and let me know if this is what you expect or what needs to change. I'm aware one case should be impossible, but that's something we can discuss later (#732). (Tilman, I believe you already saw this behaviour; nothing new for you here.)

reprex: `emissions_profile()`
library(tibble)
devtools::load_all()
#> ℹ Loading tiltIndicator

options(tibble.print_max = Inf, width = 500)

companies <- tribble(
    ~activity_uuid_product_uuid, ~clustered, ~companies_id,
                            "a",        "a",           "a",
                            "b",        "b",           "a",
                            "c",        "c",           "a"
)

co2 <- tibble::tribble(
    ~activity_uuid_product_uuid, ~co2_footprint, ~isic_4digit, ~tilt_sector, ~unit,
                            "a",              1,     "'1234'",          "a",   "a",
                            "b",              1,           NA,          "a",   "a"
)

result <- emissions_profile(companies, co2)

result |> unnest_product()
#> # A tibble: 13 × 7
#>    companies_id grouped_by       risk_category profile_ranking clustered activity_uuid_product_uuid co2_footprint
#>    <chr>        <chr>            <chr>                   <dbl> <chr>     <chr>                              <dbl>
#>  1 a            all              high                        1 a         a                                      1
#>  2 a            isic_4digit      high                        1 a         a                                      1
#>  3 a            tilt_sector      high                        1 a         a                                      1
#>  4 a            unit             high                        1 a         a                                      1
#>  5 a            unit_isic_4digit high                        1 a         a                                      1
#>  6 a            unit_tilt_sector high                        1 a         a                                      1
#>  7 a            all              high                        1 b         b                                      1
#>  8 a            isic_4digit      <NA>                       NA b         b                                      1
#>  9 a            tilt_sector      high                        1 b         b                                      1
#> 10 a            unit             high                        1 b         b                                      1
#> 11 a            unit_isic_4digit <NA>                       NA b         b                                      1
#> 12 a            unit_tilt_sector high                        1 b         b                                      1
#> 13 a            <NA>             <NA>                       NA c         c                                     NA

result |> unnest_company()
#> # A tibble: 24 × 4
#>    companies_id grouped_by       risk_category value
#>    <chr>        <chr>            <chr>         <dbl>
#>  1 a            all              high          0.667
#>  2 a            all              medium        0    
#>  3 a            all              low           0    
#>  4 a            all              <NA>          0.333
#>  5 a            isic_4digit      high          0.333
#>  6 a            isic_4digit      medium        0    
#>  7 a            isic_4digit      low           0    
#>  8 a            isic_4digit      <NA>          0.667
#>  9 a            tilt_sector      high          0.667
#> 10 a            tilt_sector      medium        0    
#> 11 a            tilt_sector      low           0    
#> 12 a            tilt_sector      <NA>          0.333
#> 13 a            unit             high          0.667
#> 14 a            unit             medium        0    
#> 15 a            unit             low           0    
#> 16 a            unit             <NA>          0.333
#> 17 a            unit_isic_4digit high          0.333
#> 18 a            unit_isic_4digit medium        0    
#> 19 a            unit_isic_4digit low           0    
#> 20 a            unit_isic_4digit <NA>          0.667
#> 21 a            unit_tilt_sector high          0.667
#> 22 a            unit_tilt_sector medium        0    
#> 23 a            unit_tilt_sector low           0    
#> 24 a            unit_tilt_sector <NA>          0.333
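
In the product-level output above, product "b" gets `NA` in both `isic_4digit` and `unit_isic_4digit`, while its other benchmarks are ranked. A minimal sketch of that pattern, assuming each benchmark is defined by the `co2` columns its name suggests (the `benchmark_columns` mapping is hypothetical and inferred from the output, not taken from the package source):

```r
# Hypothetical mapping from benchmark name to the co2 columns it groups by.
# This is an assumption based on the benchmark names, not package internals.
benchmark_columns <- list(
  all = character(0),
  isic_4digit = "isic_4digit",
  tilt_sector = "tilt_sector",
  unit = "unit",
  unit_isic_4digit = c("unit", "isic_4digit"),
  unit_tilt_sector = c("unit", "tilt_sector")
)

# The co2 row for product "b" in the reprex: isic_4digit is missing.
co2_b <- data.frame(isic_4digit = NA_character_, tilt_sector = "a", unit = "a")

# A benchmark is available only if none of its grouping columns is NA.
has_benchmark <- vapply(
  benchmark_columns,
  function(cols) !any(is.na(unlist(co2_b[1, cols, drop = FALSE]))),
  logical(1)
)

print(has_benchmark)
```

Under this assumption, only `isic_4digit` and `unit_isic_4digit` come out as unavailable, which matches rows 8 and 11 of the product-level output.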
reprex: `emissions_profile_upstream()`
library(tibble)
devtools::load_all()
#> ℹ Loading tiltIndicator

options(tibble.print_max = Inf, width = 500)

companies <- tribble(
  ~activity_uuid_product_uuid, ~clustered, ~companies_id,
                          "a",        "a",           "a",
                          "b",        "a",           "a",
                  "unmatched",        "a",           "a"
)

co2 <- tribble(
  ~activity_uuid_product_uuid, ~input_activity_uuid_product_uuid, ~input_co2_footprint, ~input_isic_4digit, ~input_tilt_sector, ~input_tilt_subsector, ~input_unit,
                          "a",                               "a",                    1,           "'1234'",                "a",                   "a",         "a",
                          "b",                               "a",                    1,           "'1234'",                "a",                   "a",         "a"
)

result <- emissions_profile_upstream(companies, co2)

result |> unnest_product()
#> # A tibble: 13 × 8
#>    companies_id grouped_by                   risk_category profile_ranking clustered activity_uuid_product_uuid input_activity_uuid_product_uuid input_co2_footprint
#>    <chr>        <chr>                        <chr>                   <dbl> <chr>     <chr>                      <chr>                                          <dbl>
#>  1 a            all                          high                        1 a         a                          a                                                  1
#>  2 a            input_isic_4digit            high                        1 a         a                          a                                                  1
#>  3 a            input_tilt_sector            high                        1 a         a                          a                                                  1
#>  4 a            input_unit                   high                        1 a         a                          a                                                  1
#>  5 a            input_unit_input_isic_4digit high                        1 a         a                          a                                                  1
#>  6 a            input_unit_input_tilt_sector high                        1 a         a                          a                                                  1
#>  7 a            all                          high                        1 a         b                          a                                                  1
#>  8 a            input_isic_4digit            high                        1 a         b                          a                                                  1
#>  9 a            input_tilt_sector            high                        1 a         b                          a                                                  1
#> 10 a            input_unit                   high                        1 a         b                          a                                                  1
#> 11 a            input_unit_input_isic_4digit high                        1 a         b                          a                                                  1
#> 12 a            input_unit_input_tilt_sector high                        1 a         b                          a                                                  1
#> 13 a            <NA>                         <NA>                       NA a         unmatched                  <NA>                                              NA

result |> unnest_company()
#> # A tibble: 24 × 4
#>    companies_id grouped_by                   risk_category value
#>    <chr>        <chr>                        <chr>         <dbl>
#>  1 a            all                          high          0.667
#>  2 a            all                          medium        0    
#>  3 a            all                          low           0    
#>  4 a            all                          <NA>          0.333
#>  5 a            input_isic_4digit            high          0.667
#>  6 a            input_isic_4digit            medium        0    
#>  7 a            input_isic_4digit            low           0    
#>  8 a            input_isic_4digit            <NA>          0.333
#>  9 a            input_tilt_sector            high          0.667
#> 10 a            input_tilt_sector            medium        0    
#> 11 a            input_tilt_sector            low           0    
#> 12 a            input_tilt_sector            <NA>          0.333
#> 13 a            input_unit                   high          0.667
#> 14 a            input_unit                   medium        0    
#> 15 a            input_unit                   low           0    
#> 16 a            input_unit                   <NA>          0.333
#> 17 a            input_unit_input_isic_4digit high          0.667
#> 18 a            input_unit_input_isic_4digit medium        0    
#> 19 a            input_unit_input_isic_4digit low           0    
#> 20 a            input_unit_input_isic_4digit <NA>          0.333
#> 21 a            input_unit_input_tilt_sector high          0.667
#> 22 a            input_unit_input_tilt_sector medium        0    
#> 23 a            input_unit_input_tilt_sector low           0    
#> 24 a            input_unit_input_tilt_sector <NA>          0.333

.


TODO

  • Link related issue/PR.
  • Describe the goal of the PR. Avoid details that are clear in the diff.
  • Mark the PR as draft.
  • Include a unit test.
  • Review your own PR in "Files changed".
  • Ensure the PR branch is updated.
  • Ensure the checks pass.
  • Change the status from draft to ready.
  • Polish the PR title and description.
  • Assign a reviewer.

EXCEPTIONS

  • Slide here any item that you intentionally choose to not do.

@maurolepore maurolepore changed the title Leave a failing test At product level, NA in a benchmark column yields NA in the corresponding risk_category and profile_ranking Feb 5, 2024
@maurolepore maurolepore changed the title At product level, NA in a benchmark column yields NA in the corresponding risk_category and profile_ranking At product level, NA in a benchmark column yields NA in the corresponding risk_category and profile_ranking and preserve unmatched products Feb 8, 2024
@maurolepore maurolepore changed the title At product level, NA in a benchmark column yields NA in the corresponding risk_category and profile_ranking and preserve unmatched products NA in a benchmarks yield NA in risk_category / profile_ranking (#638), and preserve unmatched products (#567) Feb 8, 2024
@maurolepore maurolepore changed the title NA in a benchmarks yield NA in risk_category / profile_ranking (#638), and preserve unmatched products (#567) In emissions_profile*() preserve NA in benchmarks (#638) and unmatched products (#567) Feb 9, 2024
@maurolepore maurolepore changed the title In emissions_profile*() preserve NA in benchmarks (#638) and unmatched products (#567) In emissions_profile*() preserve missing benchmarks (#638) and unmatched products (#567) Feb 9, 2024
@maurolepore maurolepore force-pushed the 638_risk_category-must-be-NA branch 3 times, most recently from aa70209 to a6a2440 Compare February 10, 2024 18:07
@maurolepore maurolepore changed the title In emissions_profile*() preserve missing benchmarks (#638) and unmatched products (#567) emissions_profile*() preserves missing benchmarks & unmatched products Feb 10, 2024
@maurolepore maurolepore changed the title emissions_profile*() preserves missing benchmarks & unmatched products emissions*() preserves missing benchmarks & unmatched products Feb 10, 2024
Member Author

@maurolepore maurolepore left a comment


@AnneSchoenauer,

I believe I'm close to what you want for emissions_profile() and emissions_profile_upstream() at product level (see reprex in the top comment of this PR).

But at company level the output is weird. I get different things with different benchmarks and I'm quite lost about what to actually expect.

Please see the comments below where I @mention you.

@@ -157,3 +157,58 @@
18 0


# FIXME? at company level, `NA` in a benchmark yields the expected `value`s (#638)
Member Author


@AnneSchoenauer,

Note that these three different benchmarks with similar inputs behave differently. Is this a bug? What do you actually expect in each case?

@@ -166,3 +166,58 @@
18 0


# FIXME? at company level, `NA` in a benchmark yields the expected `value`s (#638)
Member Author


@AnneSchoenauer,

Note that these three different benchmarks with similar inputs behave differently. Is this a bug? What do you actually expect in each case?

@AnneSchoenauer

Hi @maurolepore - thanks a lot for this. I am not 100% sure how to calculate it correctly, but here is the data I expect at product level and company level:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)

# Directly create a toy dataset with specified columns and values
companies_co2 <- tibble(
  companies_id = rep("a", 13),
  grouped_by = c("all", "isic_4digit", "tilt_sector", "unit", "unit_isic_4digit", "unit_tilt_sector", "NA", 
                 "all", "isic_4digit", "tilt_sector", "unit", "unit_isic_4digit", "unit_tilt_sector"),
  risk_category = c("high", NA, "high", "high", NA, "high", NA, "medium", "medium", "medium", "medium", "medium", "medium"),
  profile_ranking = c(1, NA, 1, 1, NA, 1, NA, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
  clustered = c(rep("a", 6), "b", rep("c", 6)),
  activity_uuid_product_uuid = c(rep("a", 6), NA, rep("c", 6)),
  co2_footprint = c(rep(1, 6), NA, rep(0.5, 6))
)

# View the created toy dataset
print(companies_co2)
#> # A tibble: 13 × 7
#>    companies_id grouped_by       risk_category profile_ranking clustered
#>    <chr>        <chr>            <chr>                   <dbl> <chr>    
#>  1 a            all              high                      1   a        
#>  2 a            isic_4digit      <NA>                     NA   a        
#>  3 a            tilt_sector      high                      1   a        
#>  4 a            unit             high                      1   a        
#>  5 a            unit_isic_4digit <NA>                     NA   a        
#>  6 a            unit_tilt_sector high                      1   a        
#>  7 a            NA               <NA>                     NA   b        
#>  8 a            all              medium                    0.5 c        
#>  9 a            isic_4digit      medium                    0.5 c        
#> 10 a            tilt_sector      medium                    0.5 c        
#> 11 a            unit             medium                    0.5 c        
#> 12 a            unit_isic_4digit medium                    0.5 c        
#> 13 a            unit_tilt_sector medium                    0.5 c        
#> # ℹ 2 more variables: activity_uuid_product_uuid <chr>, co2_footprint <dbl>

# Intended final dataset format
companies <- tibble(
  companies_id = rep("a", 24),
  grouped_by = c("all", "all", "all", "NA", "isic_4digit", "isic_4digit", "isic_4digit", "NA", "tilt_sector", "tilt_sector", "tilt_sector", "NA", "unit", "unit", "unit", "NA", "unit_isic_4digit", "unit_isic_4digit", "unit_isic_4digit", "NA", "unit_tilt_sector", "unit_tilt_sector", "unit_tilt_sector", "NA"),
  risk_category = c("high", "medium", "low", NA, "high", "medium", "low", NA, "high", "medium", "low", NA, "high", "medium", "low", NA, "high", "medium", "low", NA, "high", "medium", "low", NA),
  risk_share = c(0.33, 0.33, 0, 0.33, 0, 0.33, 0, 0.67, 0.33, 0.33, 0, 0.33, 0.33, 0.33, 0, 0.33, 0, 0.33, 0, 0.67, 0.33, 0.33, 0, 0.33)
)

# View the intended final dataset
print(companies)
#> # A tibble: 24 × 4
#>    companies_id grouped_by  risk_category risk_share
#>    <chr>        <chr>       <chr>              <dbl>
#>  1 a            all         high                0.33
#>  2 a            all         medium              0.33
#>  3 a            all         low                 0   
#>  4 a            NA          <NA>                0.33
#>  5 a            isic_4digit high                0   
#>  6 a            isic_4digit medium              0.33
#>  7 a            isic_4digit low                 0   
#>  8 a            NA          <NA>                0.67
#>  9 a            tilt_sector high                0.33
#> 10 a            tilt_sector medium              0.33
#> # ℹ 14 more rows

Created on 2024-02-13 with reprex v2.0.2

I created 3 products as it is easier to see the shares at company level. As you can see, one product ('clustered' 'b') was not matched with ecoinvent, so it is NA. We also have a product ('clustered' 'a') that could not be grouped in the 'isic' benchmarks. Therefore, at company level we always have one third in the NA category, because 'b' could not be matched. In the isic benchmarks, two out of three products have no risk category, so 2/3 are NA. For the other benchmarks we can match and benchmark two products, 'a' and 'c'. As 'a' is a high product and 'c' is a medium product, we have 1/3 in 'high', 1/3 in 'medium' and 1/3 in NA. Does this make sense?
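
The arithmetic described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the `product` data frame and the `share_by_risk()` helper are hypothetical, and the unmatched product "b" is expanded to one `NA` row per benchmark so that it counts towards every benchmark's `NA` share.

```r
benchmarks <- c("all", "isic_4digit", "tilt_sector", "unit",
                "unit_isic_4digit", "unit_tilt_sector")

# Hypothetical product-level data: one row per product and benchmark.
product <- data.frame(
  clustered = rep(c("a", "b", "c"), each = 6),
  grouped_by = rep(benchmarks, times = 3),
  risk_category = c(
    "high", NA, "high", "high", NA, "high", # "a": no isic benchmarks
    rep(NA, 6),                             # "b": unmatched (assumed NA everywhere)
    rep("medium", 6)                        # "c": medium in every benchmark
  )
)

# Share of products in each risk category, keeping NA as its own category.
share_by_risk <- function(risk_category) {
  risk_levels <- c("high", "medium", "low")
  x <- factor(risk_category, levels = risk_levels)
  counts <- table(x, useNA = "always")
  shares <- as.numeric(counts) / length(x)
  names(shares) <- c(risk_levels, NA)
  shares
}

# One vector of shares (high, medium, low, NA) per benchmark.
shares <- lapply(split(product$risk_category, product$grouped_by), share_by_risk)

print(round(shares$all, 2))
print(round(shares$isic_4digit, 2))
```

With these inputs, `shares$all` is 1/3 high, 1/3 medium, 0 low and 1/3 `NA`, and `shares$isic_4digit` is 0 high, 1/3 medium, 0 low and 2/3 `NA`, which agrees with the `risk_share` column in the intended dataset.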

@maurolepore
Member Author

maurolepore commented Feb 20, 2024

@Tilmon FYI I already fixed the output at company level for the remaining case (preserve unmatched products) using the 6-benchmarks approach. Or so I think.

See #729. We can have a focused conversation there.

@Tilmon
Contributor

Tilmon commented Feb 21, 2024

@maurolepore thanks a lot for your detailed response to my comment. This really helps to clarify the issues. As discussed in yesterday's sprint, I reviewed your proposed solutions carefully and will now respond to your questions here:

ml01: Do you agree with my interpretation of the reprex in the four bullets immediately above?

Yes, I agree. The code does what it is supposed to do for three cases (missing benchmark on product- and company-level, unmatched product on product-level). I will respond to whether the latest changes to the company-level output in case of unmatched products fix the problem with that case here #729.

ml02: Considering that adding the 7th benchmark would be relatively hard, shall we a) wait for Anne's input b) explore an easier alternative with the existing 6-benchmarks, or c) something else?

Summarizing our conclusion from yesterday's tech meeting: As a 7th benchmark requires significant extra effort, we'll stick to a solution with six benchmarks for now to get the issue fixed. When @AnneSchoenauer is back, we can discuss whether we want to move to 7 benchmarks in the mid or long term for improved usability.

ml03: Can you help me create toy datasets that passed to our real code would generate an output at company level with value different than 0 in two of these: "high", "medium", and "low"? For example, I would want value to be c(1/3, 1/3, 1/3) where risk_category is c("high", "medium", NA), respectively. Conceptually it seems possible as you and Anne "draw" such results but in practice I fail to do that -- even if I create inputs that look almost identical to what you draw. That is a crucial motivation for you and Anne to write real reprexes -- instead of conceptual examples that only pretend to have used our code. That would put your real knowledge of the data and methodology in contact with the real code and tell us for sure if the code behaves as you expect.

I'll put it on my list and try to find some time later today!

@maurolepore
Member Author

Thanks @Tilmon

RE ml03: I realize this item stands alone in its own issue: #730

@maurolepore maurolepore changed the title emissions*() preserves missing benchmarks & unmatched products emissions*() preserves missing benchmarks Feb 21, 2024
@maurolepore maurolepore changed the title emissions*() preserves missing benchmarks emissions*() preserves missing benchmarks & unmatched products Feb 21, 2024
@maurolepore maurolepore changed the title emissions*() preserves missing benchmarks & unmatched products The emissions*() functions preserve unmatched products and missing benchmarks Feb 25, 2024
@AnneSchoenauer

AnneSchoenauer commented Feb 26, 2024

Dear @maurolepore,

Thanks a lot for this! It is such a substantial change, and it is good to see that we are getting our heads around this.
Here is my review of the reprexes:

This PR focuses on emissions_profile*(). (For sector_profile*() see #738 (unmatched products) and PENDING (missing benchmarks).)

@Tilmon and @AnneSchoenauer, please see the reprexes and let me know if this is what you expect or what needs to change. I'm aware one case should be impossible, but that's something we can discuss later (#732). (Tilman, I believe you already saw this behaviour; nothing new for you here.)

reprex: emissions_profile()

library(tibble)
devtools::load_all()
#> ℹ Loading tiltIndicator

options(tibble.print_max = Inf, width = 500)

companies <- tribble(
    ~activity_uuid_product_uuid, ~clustered, ~companies_id,
                            "a",        "a",           "a",
                            "b",        "b",           "a",
                            "c",        "c",           "a"
)

co2 <- tibble::tribble(
    ~activity_uuid_product_uuid, ~co2_footprint, ~isic_4digit, ~tilt_sector, ~unit,
                            "a",              1,     "'1234'",          "a",   "a",
                            "b",              1,           NA,          "a",   "a"
)

result <- emissions_profile(companies, co2)
result |> unnest_product()
#> # A tibble: 13 × 7
#>    companies_id grouped_by       risk_category profile_ranking clustered activity_uuid_product_uuid co2_footprint
#>    <chr>        <chr>            <chr>                   <dbl> <chr>     <chr>                              <dbl>
#>  1 a            all              high                        1 a         a                                      1
#>  2 a            isic_4digit      high                        1 a         a                                      1
#>  3 a            tilt_sector      high                        1 a         a                                      1
#>  4 a            unit             high                        1 a         a                                      1
#>  5 a            unit_isic_4digit high                        1 a         a                                      1
#>  6 a            unit_tilt_sector high                        1 a         a                                      1
#>  7 a            all              high                        1 b         b                                      1
#>  8 a            isic_4digit      <NA>                       NA b         b                                      1
#>  9 a            tilt_sector      high                        1 b         b                                      1
#> 10 a            unit             high                        1 b         b                                      1
#> 11 a            unit_isic_4digit <NA>                       NA b         b                                      1
#> 12 a            unit_tilt_sector high                        1 b         b                                      1
#> 13 a            <NA>             <NA>                       NA c         c                                     NA

result |> unnest_company()
#> # A tibble: 24 × 4
#>    companies_id grouped_by       risk_category value
#>    <chr>        <chr>            <chr>         <dbl>
#>  1 a            all              high          0.667
#>  2 a            all              medium        0    
#>  3 a            all              low           0    
#>  4 a            all              <NA>          0.333
#>  5 a            isic_4digit      high          0.333
#>  6 a            isic_4digit      medium        0    
#>  7 a            isic_4digit      low           0    
#>  8 a            isic_4digit      <NA>          0.667
#>  9 a            tilt_sector      high          0.667
#> 10 a            tilt_sector      medium        0    
#> 11 a            tilt_sector      low           0    
#> 12 a            tilt_sector      <NA>          0.333
#> 13 a            unit             high          0.667
#> 14 a            unit             medium        0    
#> 15 a            unit             low           0    
#> 16 a            unit             <NA>          0.333
#> 17 a            unit_isic_4digit high          0.333
#> 18 a            unit_isic_4digit medium        0    
#> 19 a            unit_isic_4digit low           0    
#> 20 a            unit_isic_4digit <NA>          0.667
#> 21 a            unit_tilt_sector high          0.667
#> 22 a            unit_tilt_sector medium        0    
#> 23 a            unit_tilt_sector low           0    
#> 24 a            unit_tilt_sector <NA>          0.333

This is exactly what I expect! So case 1 in which there are missing benchmarks is done!

reprex: emissions_profile_upstream()

library(tibble)
devtools::load_all()
#> ℹ Loading tiltIndicator

options(tibble.print_max = Inf, width = 500)

companies <- tribble(
  ~activity_uuid_product_uuid, ~clustered, ~companies_id,
                          "a",        "a",           "a",
                          "b",        "a",           "a",
                  "unmatched",        "a",           "a"
)

co2 <- tribble(
  ~activity_uuid_product_uuid, ~input_activity_uuid_product_uuid, ~input_co2_footprint, ~input_isic_4digit, ~input_tilt_sector, ~input_tilt_subsector, ~input_unit,
                          "a",                               "a",                    1,           "'1234'",                "a",                   "a",         "a",
                          "b",                               "a",                    1,           "'1234'",                "a",                   "a",         "a"
)

result <- emissions_profile_upstream(companies, co2)

result |> unnest_product()
#> # A tibble: 13 × 8
#>    companies_id grouped_by                   risk_category profile_ranking clustered activity_uuid_product_uuid input_activity_uuid_product_uuid input_co2_footprint
#>    <chr>        <chr>                        <chr>                   <dbl> <chr>     <chr>                      <chr>                                          <dbl>
#>  1 a            all                          high                        1 a         a                          a                                                  1
#>  2 a            input_isic_4digit            high                        1 a         a                          a                                                  1
#>  3 a            input_tilt_sector            high                        1 a         a                          a                                                  1
#>  4 a            input_unit                   high                        1 a         a                          a                                                  1
#>  5 a            input_unit_input_isic_4digit high                        1 a         a                          a                                                  1
#>  6 a            input_unit_input_tilt_sector high                        1 a         a                          a                                                  1
#>  7 a            all                          high                        1 a         b                          a                                                  1
#>  8 a            input_isic_4digit            high                        1 a         b                          a                                                  1
#>  9 a            input_tilt_sector            high                        1 a         b                          a                                                  1
#> 10 a            input_unit                   high                        1 a         b                          a                                                  1
#> 11 a            input_unit_input_isic_4digit high                        1 a         b                          a                                                  1
#> 12 a            input_unit_input_tilt_sector high                        1 a         b                          a                                                  1
#> 13 a            <NA>                         <NA>                       NA a         unmatched                  <NA>                                              NA

result |> unnest_company()
#> # A tibble: 24 × 4
#>    companies_id grouped_by                   risk_category value
#>    <chr>        <chr>                        <chr>         <dbl>
#>  1 a            all                          high          0.667
#>  2 a            all                          medium        0    
#>  3 a            all                          low           0    
#>  4 a            all                          <NA>          0.333
#>  5 a            input_isic_4digit            high          0.667
#>  6 a            input_isic_4digit            medium        0    
#>  7 a            input_isic_4digit            low           0    
#>  8 a            input_isic_4digit            <NA>          0.333
#>  9 a            input_tilt_sector            high          0.667
#> 10 a            input_tilt_sector            medium        0    
#> 11 a            input_tilt_sector            low           0    
#> 12 a            input_tilt_sector            <NA>          0.333
#> 13 a            input_unit                   high          0.667
#> 14 a            input_unit                   medium        0    
#> 15 a            input_unit                   low           0    
#> 16 a            input_unit                   <NA>          0.333
#> 17 a            input_unit_input_isic_4digit high          0.667
#> 18 a            input_unit_input_isic_4digit medium        0    
#> 19 a            input_unit_input_isic_4digit low           0    
#> 20 a            input_unit_input_isic_4digit <NA>          0.333
#> 21 a            input_unit_input_tilt_sector high          0.667
#> 22 a            input_unit_input_tilt_sector medium        0    
#> 23 a            input_unit_input_tilt_sector low           0    
#> 24 a            input_unit_input_tilt_sector <NA>          0.333

@Tilmon, I think you have now created a new benchmark called "not matched", right? If so, the behaviour here would not be what you expect, as this case would include the benchmark "not matched" as you indicated here. I like the idea of adding another benchmark a lot, but I think the current status is fine for now: for the Bundesbank the above reprexes are enough, and they show how I expect the code to behave. Is this fine for you, @Tilmon? I would then leave adding another benchmark as an enhancement for a later stage. What do you think?

@maurolepore
Member Author

Thanks both for your input.

@AnneSchoenauer RE

This is exactly what I expect! So case 1 in which there are missing benchmarks is done!

Note the reprex shows not only the case of a missing benchmark but also the case of an unmatched product. Here I reproduce just the relevant bit. Focus on the value of activity_uuid_product_uuid and note:

  • "a" is matched.
  • "b" is matched but has a missing benchmark.
  • "c" is unmatched.
# ... more code
companies <- tribble(
    ~activity_uuid_product_uuid, ~clustered, ~companies_id,
                            "a",        "a",           "a",
                            "b",        "b",           "a",
                            "c",        "c",           "a"
)

co2 <- tibble::tribble(
    ~activity_uuid_product_uuid, ~co2_footprint, ~isic_4digit, ~tilt_sector, ~unit,
                            "a",              1,     "'1234'",          "a",   "a",
                            "b",              1,           NA,          "a",   "a"
)
# ... more code

I'll assume this case is also handled as you expect -- since otherwise Tilman and you would have likely noticed.
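
For reference, the three cases in the bullets above can be told apart with a plain left join. This is a minimal sketch in base R, not the package's code: the `status` column and the choice of `isic_4digit`, `tilt_sector` and `unit` as the benchmark columns are assumptions made for illustration.

```r
companies <- data.frame(
  activity_uuid_product_uuid = c("a", "b", "c"),
  clustered = c("a", "b", "c"),
  companies_id = "a"
)

co2 <- data.frame(
  activity_uuid_product_uuid = c("a", "b"),
  co2_footprint = c(1, 1),
  isic_4digit = c("'1234'", NA),
  tilt_sector = c("a", "a"),
  unit = c("a", "a")
)

# Left join: keep every company product, matched or not.
joined <- merge(companies, co2, by = "activity_uuid_product_uuid", all.x = TRUE)

# Assumed benchmark columns; a product is "unmatched" if its id is absent from co2.
benchmark_cols <- c("isic_4digit", "tilt_sector", "unit")
is_matched <- joined$activity_uuid_product_uuid %in% co2$activity_uuid_product_uuid
has_missing <- rowSums(is.na(joined[benchmark_cols])) > 0

joined$status <- ifelse(
  !is_matched, "unmatched",
  ifelse(has_missing, "missing benchmark", "matched")
)

print(joined[c("activity_uuid_product_uuid", "status")])
```

Here "a" is classified as matched, "b" as missing benchmark, and "c" as unmatched; the unmatched check runs first because an unmatched product also has `NA` in every benchmark column after the join.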

As discussed today during sprint planning, I'll go ahead and merge this PR.
