-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The emissions*()
functions preserve unmatched products and missing benchmarks
#639
Conversation
875b228
to
ff513ad
Compare
NA
in a benchmark column yields NA
in the corresponding risk_category
and profile_ranking
NA
in a benchmark column yields NA
in the corresponding risk_category
and profile_ranking
NA
in a benchmark column yields NA
in the corresponding risk_category
and profile_ranking
and preserve unmatched products
NA
in a benchmark column yields NA
in the corresponding risk_category
and profile_ranking
and preserve unmatched productsNA
in a benchmarks yield NA
in risk_category / profile_ranking
(#638), and preserve unmatched products (#567)
87bd2db
to
b30a95f
Compare
NA
in a benchmarks yield NA
in risk_category / profile_ranking
(#638), and preserve unmatched products (#567)emissions_profile*()
preserve NA
in benchmarks (#638) and unmatched products (#567)
emissions_profile*()
preserve NA
in benchmarks (#638) and unmatched products (#567)emissions_profile*()
preserve missing benchmarks (#638) and unmatched products (#567)
aa70209
to
a6a2440
Compare
emissions_profile*()
preserve missing benchmarks (#638) and unmatched products (#567)emissions_profile*()
preserves missing benchmarks & unmatched products
emissions_profile*()
preserves missing benchmarks & unmatched productsemissions*()
preserves missing benchmarks & unmatched products
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I'm close to what you want for emissions_profile()
and emissions_profile_upstream()
at product level (see reprex in the top comment of this PR).
But at company level the output is weird. I get different things with different bechmarks and I'm quite lost about what to actually expect.
Please see the comments below where I @mention
you.
@@ -157,3 +157,58 @@ | |||
18 0 | |||
|
|||
|
|||
# FIXME? at company level, `NA` in a benchmark yields the expected `value`s (#638) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these three different benchmarks with similar inputs behave differently. Is this a bug? What do you actually expect in each case?
@@ -166,3 +166,58 @@ | |||
18 0 | |||
|
|||
|
|||
# FIXME? at company level, `NA` in a benchmark yields the expected `value`s (#638) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these three different benchmarks with similar inputs behave differently. Is this a bug? What do you actually expect in each case?
Hi @maurolepore - thanks a lot for this. I am not 100% sure how to calculate it correctly but I provide to you the following data on product_level and company_level: library(dplyr)
#>
#> Attache Paket: 'dplyr'
#> Die folgenden Objekte sind maskiert von 'package:stats':
#>
#> filter, lag
#> Die folgenden Objekte sind maskiert von 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tibble)
# Directly create a toy dataset with specified columns and values
companies_co2 <- tibble(
companies_id = rep("a", 13),
grouped_by = c("all", "isic_4digit", "tilt_sector", "unit", "unit_isic_4digit", "unit_tilt_sector", "NA",
"all", "isic_4digit", "tilt_sector", "unit", "unit_isic_4digit", "unit_tilt_sector"),
risk_category = c("high", NA, "high", "high", NA, "high", NA, "medium", "medium", "medium", "medium", "medium", "medium"),
profile_ranking = c(1, NA, 1, 1, NA, 1, NA, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
clustered = c(rep("a", 6), "b", rep("c", 6)),
activity_uuid_product_uuid = c(rep("a", 6), NA, rep("c", 6)),
co2_footprint = c(rep(1, 6), NA, rep(0.5, 6))
)
# View the created toy dataset
print(companies_co2)
#> # A tibble: 13 × 7
#> companies_id grouped_by risk_category profile_ranking clustered
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 a all high 1 a
#> 2 a isic_4digit <NA> NA a
#> 3 a tilt_sector high 1 a
#> 4 a unit high 1 a
#> 5 a unit_isic_4digit <NA> NA a
#> 6 a unit_tilt_sector high 1 a
#> 7 a NA <NA> NA b
#> 8 a all medium 0.5 c
#> 9 a isic_4digit medium 0.5 c
#> 10 a tilt_sector medium 0.5 c
#> 11 a unit medium 0.5 c
#> 12 a unit_isic_4digit medium 0.5 c
#> 13 a unit_tilt_sector medium 0.5 c
#> # ℹ 2 more variables: activity_uuid_product_uuid <chr>, co2_footprint <dbl>
# Intended final dataset format
companies <- tibble(
companies_id = rep("a", 24),
grouped_by = c("all", "all", "all", "NA", "isic_4digit", "isic_4digit", "isic_4digit", "NA", "tilt_sector", "tilt_sector", "tilt_sector", "NA", "unit", "unit", "unit", "NA", "unit_isic_4digit", "unit_isic_4digit", "unit_isic_4digit", "NA", "unit_tilt_sector", "unit_tilt_sector", "unit_tilt_sector", "NA"),
risk_category = c("high", "medium", "low", NA, "high", "medium", "low", NA, "high", "medium", "low", NA, "high", "medium", "low", NA, "high", "medium", "low", NA, "high", "medium", "low", NA),
risk_share = c(0.33, 0.33, 0, 0.33, 0, 0.33, 0, 0.67, 0.33, 0.33, 0, 0.33, 0.33, 0.33, 0, 0.33, 0, 0.33, 0, 0.67, 0.33, 0.33, 0, 0.33)
)
# View the intended final dataset
print(companies)
#> # A tibble: 24 × 4
#> companies_id grouped_by risk_category risk_share
#> <chr> <chr> <chr> <dbl>
#> 1 a all high 0.33
#> 2 a all medium 0.33
#> 3 a all low 0
#> 4 a NA <NA> 0.33
#> 5 a isic_4digit high 0
#> 6 a isic_4digit medium 0.33
#> 7 a isic_4digit low 0
#> 8 a NA <NA> 0.67
#> 9 a tilt_sector high 0.33
#> 10 a tilt_sector medium 0.33
#> # ℹ 14 more rows Created on 2024-02-13 with reprex v2.0.2 I created 3 products as it is easier to see the shares on company level. As you can see one product 'clustered' 'b' was not been matched with ecoinvent - therefore it is NA. Plus we have a product namely 'clustered' 'a' which was not beeing able to be grouped in the 'isic' benchmarks. Therefore on company level we always have one third in the NA category as 'b' was not been able to be matched. If we look at the isic benchmarks, then two out of three products have no risk_value so 2/3 are 'NA'. For the other benchmarks we are able to match and benchmark two products 'a' and 'c'. As 'a' is a high product it and 'c' has a medium product we have 1/3 in 'high', 1/3 in 'medium' and 1/3 in 'NA'. Does this make sense? |
@maurolepore thanks a lot for your detailed response to my comment. This really helps to clarify the issues. As discussed in yesterday's sprint, I reviewed your proposed solutions carefully and will now respond to your questions here:
Yes, I agree. The code does what it is supposed to do for three cases (missing benchmark on product- and company-level, unmatched product on product-level). I will respond to whether the latest changes to the company-level output in case of unmatched products fix the problem with that case here #729.
Summarizing our conclusion from yesterday's tech meeting: As 7th benchmark requires significant extra effort, we'll stick to a solution with six benchmarks for now to get the issue fixed. When @AnneSchoenauer is back, we can discuss, if we want to move to 7 benchmarks in the mid- or long-term for improved usability.
I'll put it on my list and try to find some time later today! |
emissions*()
preserves missing benchmarks & unmatched productsemissions*()
preserves missing benchmarks
emissions*()
preserves missing benchmarksemissions*()
preserves missing benchmarks & unmatched products
… in all levels of 'risk_catrgory' (#737)
emissions*()
preserves missing benchmarks & unmatched productsemissions*()
functions preserve unmatched products and missing benchmarks
Dear @maurolepore, Thanks a lot for this! It is such a substantial change and good to see that we get our heads around this.
This is exactly what I expect! So case 1 in which there are missing benchmarks is done!
I think @Tilmon you now created a new benchmarked called "not matched" right? If this is the case the behaviour here would be not what you expect as this case would include the benchmark "not matched" as you indicated here. I like the idea about adding another benchmark a lot but I think for now it would be fine with the status that we have - so for the Bundesbank the above reprexes would be enough and they picture how I expect the code to behave. Is this fine for you @Tilmon? I would then leave this for an enhancement at a later stage to add another benchmark? What do you think? |
Thanks both for your input.
Note the reprex shows not only the case of a missing benchmark but also the case of an unmatched product. Here I reproduce just the relevant bit. Focus on the value of
# ... more code
companies <- tribble(
~activity_uuid_product_uuid, ~clustered, ~companies_id,
"a", "a", "a",
"b", "b", "a",
"c", "c", "a"
)
co2 <- tibble::tribble(
~activity_uuid_product_uuid, ~co2_footprint, ~isic_4digit, ~tilt_sector, ~unit,
"a", 1, "'1234'", "a", "a",
"b", 1, NA, "a", "a"
)
# ... more code I'll assume this case is also handled as you expect -- since otherwise Tilman and you would have likely noticed. As discussed today during sprint planning, I'll go ahead and merge this PR. |
Relates to
NA
s in benchmark-columns should yieldNA
s inrisk_category
(except "all") #638emissions/sector_profile()
at product level should preserve unmatched products, filling withNA
s #657Conflicts with
xstr_polish_output_at_company_level()
should dropNA
s inrisk_category
#393--
This PR focuses on
emissions_profile*()
. (Forsector_profile*()
see #738 (unmatched products) and PENDING (missing benchmarks).)@Tilmon and @AnneSchoenauer please see the reprexes and let me know if this is what you expect or what needs to change. I'm aware one case should be impossible but that's something we can discuss later (#732). (Tilman, I believe you already saw this behaviour, Nothing new for you here.)
reprex `emissions_profile()`
reprex: `emissions_profile_upstream()`
.
TODO
EXCEPTIONS