Add units column to commodities input file#1110
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1110 +/- ##
==========================================
+ Coverage 87.52% 87.58% +0.06%
==========================================
Files 55 55
Lines 7429 7474 +45
Branches 7429 7474 +45
==========================================
+ Hits 6502 6546 +44
Misses 630 630
- Partials 297 298 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This pull request adds a units column to the commodities input file (commodities.csv) to enable validation that all SED/SVD output flows from the same process have consistent units. This is important because annual fixed costs are distributed proportionally between outputs based on flow coefficients, which only makes sense if all outputs share the same units.
Changes:
- Added
units: Stringfield to theCommoditystruct for storing unit information - Implemented
validate_output_flows_units()function to check that all SED/SVD outputs from each process have matching units - Updated all example commodities.csv files with a units column (using "PJ" for energy commodities, "kt" for CO2, and "NOUNIT" as a placeholder)
- Added new test fixtures and test cases to verify the validation works correctly
- Created a utility shell script to help add columns to CSV files across all examples
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/commodity.rs | Added units field to the Commodity struct with documentation |
| src/input/commodity.rs | Updated CSV reading logic to parse the units field and updated test helper functions |
| src/input/process/flow.rs | Implemented validation function to check SED/SVD output units consistency and added comprehensive test cases |
| src/fixture.rs | Added new test fixtures for commodities with different units and updated existing fixtures |
| src/process.rs | Updated test commodity constructors to include units field |
| src/input/agent/commodity_portion.rs | Updated test commodity constructors to include units field |
| schemas/input/commodities.yaml | Added schema definition for the units field with clear documentation |
| examples/*/commodities.csv | Added units column to all example commodity files with appropriate values |
| tests/add_column.sh | Added utility script to automatically add columns to CSV files across examples |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #[rstest] | ||
| fn single_sed_svd_output(svd_commodity: Commodity, process: Process) { | ||
| let commodity = Rc::new(svd_commodity); | ||
| let (_, flows_map) = build_maps( | ||
| process, | ||
| std::iter::once((commodity.id.clone(), flow(commodity.clone(), 1.0))), | ||
| None, | ||
| ); | ||
|
|
||
| // Single output should always pass validation | ||
| validate_output_flows_units(&flows_map).unwrap(); | ||
| } |
There was a problem hiding this comment.
Add a test case for processes that have no SED/SVD outputs (e.g., a process that only produces OTH commodities). This would help catch the validation bug identified on lines 107-111 where the validation would fail for processes with zero SED/SVD outputs.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #[rstest] | ||
| fn output_flows_matching_units( | ||
| svd_commodity: Commodity, | ||
| sed_commodity: Commodity, | ||
| process: Process, | ||
| ) { | ||
| // Both commodities have the same units | ||
| assert_eq!(svd_commodity.units, sed_commodity.units); | ||
|
|
||
| let commodity1 = Rc::new(svd_commodity); | ||
| let commodity2 = Rc::new(sed_commodity); | ||
| let (_, flows_map) = build_maps( | ||
| process, | ||
| [ | ||
| (commodity1.id.clone(), flow(commodity1.clone(), 1.0)), | ||
| (commodity2.id.clone(), flow(commodity2.clone(), 2.0)), | ||
| ] | ||
| .into_iter(), | ||
| None, | ||
| ); | ||
|
|
||
| // Validation should pass since the units are the same | ||
| validate_output_flows_units(&flows_map).unwrap(); | ||
| } | ||
|
|
||
| #[rstest] | ||
| fn output_flows_mismatched_units( | ||
| sed_commodity_pj: Commodity, | ||
| sed_commodity_tonnes: Commodity, | ||
| process: Process, | ||
| ) { | ||
| // Ensure the two commodities have different units | ||
| assert_ne!(sed_commodity_pj.units, sed_commodity_tonnes.units); | ||
|
|
||
| let commodity1 = Rc::new(sed_commodity_pj); | ||
| let commodity2 = Rc::new(sed_commodity_tonnes); | ||
| let (_, flows_map) = build_maps( | ||
| process, | ||
| [ | ||
| (commodity1.id.clone(), flow(commodity1.clone(), 1.0)), | ||
| (commodity2.id.clone(), flow(commodity2.clone(), 2.0)), | ||
| ] | ||
| .into_iter(), | ||
| None, | ||
| ); | ||
|
|
||
| // Different units should cause validation to fail | ||
| let result = validate_output_flows_units(&flows_map); | ||
| assert!(result.is_err()); | ||
| assert!( | ||
| result | ||
| .unwrap_err() | ||
| .to_string() | ||
| .contains("has SED/SVD outputs with different units") | ||
| ); | ||
| } | ||
|
|
||
| #[rstest] | ||
| fn output_flows_other_commodity_ignored( | ||
| sed_commodity_pj: Commodity, | ||
| other_commodity: Commodity, | ||
| process: Process, | ||
| ) { | ||
| // Modify OTH commodity to have different units | ||
| let mut other_commodity = other_commodity; | ||
| other_commodity.units = "tonnes".into(); | ||
| assert_ne!(sed_commodity_pj.units, other_commodity.units); | ||
|
|
||
| let sed_commodity = Rc::new(sed_commodity_pj); | ||
| let oth_commodity = Rc::new(other_commodity); | ||
|
|
||
| let (_, flows_map) = build_maps( | ||
| process, | ||
| [ | ||
| (sed_commodity.id.clone(), flow(sed_commodity.clone(), 1.0)), | ||
| (oth_commodity.id.clone(), flow(oth_commodity.clone(), 2.0)), | ||
| ] | ||
| .into_iter(), | ||
| None, | ||
| ); | ||
|
|
||
| // OTH commodity should be ignored, validation should pass | ||
| validate_output_flows_units(&flows_map).unwrap(); | ||
| } | ||
|
|
||
| #[rstest] | ||
| fn single_sed_svd_output(svd_commodity: Commodity, process: Process) { | ||
| let commodity = Rc::new(svd_commodity); | ||
| let (_, flows_map) = build_maps( | ||
| process, | ||
| std::iter::once((commodity.id.clone(), flow(commodity.clone(), 1.0))), | ||
| None, | ||
| ); | ||
|
|
||
| // Single output should always pass validation | ||
| validate_output_flows_units(&flows_map).unwrap(); | ||
| } |
There was a problem hiding this comment.
Consider adding a test case that verifies inputs (negative coefficients) are correctly ignored during unit validation. For example, a process with SED/SVD inputs having different units but SED/SVD outputs with the same units should pass validation. This would explicitly demonstrate that the validation logic correctly filters flows by checking flow.coeff.value() > 0.0.
There was a problem hiding this comment.
Agree.
You may be able to make some of these tests parametrised to make life easier. For example, you could take coeffs for process flows and expected error message as arguments, then you could check for a few different kinds of errors with the same test. Just a thought!
alexdewar
left a comment
There was a problem hiding this comment.
Nearly there! There is one bug which needs fixing: validation will currently fail for processes without SED/SVD outputs.
Also, I don't think we want to allow users to supply an empty string for the unit name. Let's make them put something for every commodity.
I think we should add units to all the examples in this PR. @ahawkes can you advise what the units should be for the commodities in the simple, two_outputs and circularity examples? There's a lot of overlap between them -- presumably the units are the same where the commodity ID is the same?
schemas/input/commodities.yaml
Outdated
| This field is used for validation only and is not used for internal unit conversions. MUSE | ||
| will validate that all SED/SVD output flows from the same process have the same units, | ||
| as the annual fixed costs are distributed proportionally between outputs based on flow coefficients. | ||
| E.g. `Petajoules`, `Tonnes`. |
There was a problem hiding this comment.
I'd give the abbreviations because that's what users will probably do:
| E.g. `Petajoules`, `Tonnes`. | |
| E.g. "PJ" (petajoules) or "t" (tonnes) |
| .unwrap_err() | ||
| .to_string() | ||
| .contains("has SED/SVD outputs with different units") | ||
| ); |
There was a problem hiding this comment.
I'd just use the assert_error! macro in fixture.rs for this and match the whole message.
There was a problem hiding this comment.
@alexdewar
I've encountered a problem with matching the exact message because we're using unordered hashmaps/hashsets, so you'll get different errors popping up depending on the hashing order decided for process_flows at runtime for example. I can try and make the ordering consistent if you think it's worth it though.
There was a problem hiding this comment.
Ah, we've come across this in other places... Actually, I do think it would be worth switching to ordered maps/sets where needed to avoid this.
There was a problem hiding this comment.
I've got consistent ordering now, may be a simpler way to do it but not sure. The HashMap of process flows from where we do the validation seems to run quite deep, so rather than changing this to an IndexMap I just collect all the possible errors and sort them (by process_id, region, and year) before printing the error message
src/input/process/flow.rs
Outdated
| let sed_svd_output_units: HashSet<&str> = flows | ||
| .values() | ||
| .filter_map(|flow| { | ||
| let commodity = &flow.commodity; | ||
| (flow.coeff.value() > 0.0 | ||
| && matches!( | ||
| commodity.kind, | ||
| CommodityType::ServiceDemand | CommodityType::SupplyEqualsDemand | ||
| )) | ||
| .then_some(commodity.units.as_str()) | ||
| }) | ||
| .collect(); | ||
|
|
||
| ensure!( | ||
| sed_svd_output_units.len() <= 1, |
There was a problem hiding this comment.
Validation will fail if there are any processes without SED/SVD outputs. How about:
| let sed_svd_output_units: HashSet<&str> = flows | |
| .values() | |
| .filter_map(|flow| { | |
| let commodity = &flow.commodity; | |
| (flow.coeff.value() > 0.0 | |
| && matches!( | |
| commodity.kind, | |
| CommodityType::ServiceDemand | CommodityType::SupplyEqualsDemand | |
| )) | |
| .then_some(commodity.units.as_str()) | |
| }) | |
| .collect(); | |
| ensure!( | |
| sed_svd_output_units.len() <= 1, | |
| let mut sed_svd_output_units = flows | |
| .values() | |
| .filter_map(|flow| { | |
| let commodity = &flow.commodity; | |
| (flow.coeff.value() > 0.0 | |
| && matches!( | |
| commodity.kind, | |
| CommodityType::ServiceDemand | CommodityType::SupplyEqualsDemand | |
| )) | |
| .then_some(commodity.units.as_str()) | |
| }); | |
| let Some(first_unit) = sed_svd_output_units.next() else { | |
| // No SED/SVD output flows | |
| continue; | |
| }; | |
| ensure!( | |
| sed_svd_output_units.all(|unit|unit == first_unit), |
It would be good to have a test for this case too.
There was a problem hiding this comment.
I belatedly noticed that your way also works 😛
| #[rstest] | ||
| fn single_sed_svd_output(svd_commodity: Commodity, process: Process) { | ||
| let commodity = Rc::new(svd_commodity); | ||
| let (_, flows_map) = build_maps( | ||
| process, | ||
| std::iter::once((commodity.id.clone(), flow(commodity.clone(), 1.0))), | ||
| None, | ||
| ); | ||
|
|
||
| // Single output should always pass validation | ||
| validate_output_flows_units(&flows_map).unwrap(); | ||
| } |
| #[rstest] | ||
| fn output_flows_matching_units( | ||
| svd_commodity: Commodity, | ||
| sed_commodity: Commodity, | ||
| process: Process, | ||
| ) { | ||
| // Both commodities have the same units | ||
| assert_eq!(svd_commodity.units, sed_commodity.units); | ||
|
|
||
| let commodity1 = Rc::new(svd_commodity); | ||
| let commodity2 = Rc::new(sed_commodity); | ||
| let (_, flows_map) = build_maps( | ||
| process, | ||
| [ | ||
| (commodity1.id.clone(), flow(commodity1.clone(), 1.0)), | ||
| (commodity2.id.clone(), flow(commodity2.clone(), 2.0)), | ||
| ] | ||
| .into_iter(), | ||
| None, | ||
| ); | ||
|
|
||
| // Validation should pass since the units are the same | ||
| validate_output_flows_units(&flows_map).unwrap(); | ||
| } | ||
|
|
||
| #[rstest] | ||
| fn output_flows_mismatched_units( | ||
| sed_commodity_pj: Commodity, | ||
| sed_commodity_tonnes: Commodity, | ||
| process: Process, | ||
| ) { | ||
| // Ensure the two commodities have different units | ||
| assert_ne!(sed_commodity_pj.units, sed_commodity_tonnes.units); | ||
|
|
||
| let commodity1 = Rc::new(sed_commodity_pj); | ||
| let commodity2 = Rc::new(sed_commodity_tonnes); | ||
| let (_, flows_map) = build_maps( | ||
| process, | ||
| [ | ||
| (commodity1.id.clone(), flow(commodity1.clone(), 1.0)), | ||
| (commodity2.id.clone(), flow(commodity2.clone(), 2.0)), | ||
| ] | ||
| .into_iter(), | ||
| None, | ||
| ); | ||
|
|
||
| // Different units should cause validation to fail | ||
| let result = validate_output_flows_units(&flows_map); | ||
| assert!(result.is_err()); | ||
| assert!( | ||
| result | ||
| .unwrap_err() | ||
| .to_string() | ||
| .contains("has SED/SVD outputs with different units") | ||
| ); | ||
| } | ||
|
|
||
| #[rstest] | ||
| fn output_flows_other_commodity_ignored( | ||
| sed_commodity_pj: Commodity, | ||
| other_commodity: Commodity, | ||
| process: Process, | ||
| ) { | ||
| // Modify OTH commodity to have different units | ||
| let mut other_commodity = other_commodity; | ||
| other_commodity.units = "tonnes".into(); | ||
| assert_ne!(sed_commodity_pj.units, other_commodity.units); | ||
|
|
||
| let sed_commodity = Rc::new(sed_commodity_pj); | ||
| let oth_commodity = Rc::new(other_commodity); | ||
|
|
||
| let (_, flows_map) = build_maps( | ||
| process, | ||
| [ | ||
| (sed_commodity.id.clone(), flow(sed_commodity.clone(), 1.0)), | ||
| (oth_commodity.id.clone(), flow(oth_commodity.clone(), 2.0)), | ||
| ] | ||
| .into_iter(), | ||
| None, | ||
| ); | ||
|
|
||
| // OTH commodity should be ignored, validation should pass | ||
| validate_output_flows_units(&flows_map).unwrap(); | ||
| } | ||
|
|
||
| #[rstest] | ||
| fn single_sed_svd_output(svd_commodity: Commodity, process: Process) { | ||
| let commodity = Rc::new(svd_commodity); | ||
| let (_, flows_map) = build_maps( | ||
| process, | ||
| std::iter::once((commodity.id.clone(), flow(commodity.clone(), 1.0))), | ||
| None, | ||
| ); | ||
|
|
||
| // Single output should always pass validation | ||
| validate_output_flows_units(&flows_map).unwrap(); | ||
| } |
There was a problem hiding this comment.
Agree.
You may be able to make some of these tests parametrised to make life easier. For example, you could take coeffs for process flows and expected error message as arguments, then you could check for a few different kinds of errors with the same test. Just a thought!
src/fixture.rs
Outdated
| demand: DemandMap::new(), | ||
| units: "tonnes".into(), | ||
| } | ||
| } |
There was a problem hiding this comment.
I guess we probs won't want these fixtures anywhere outside of input/commodity.rs? In which case that's probably where they should live
|
Oh, I'm being stupid. You already fixed that bug. Nvm. But let's still add units to the examples. |
|
Simple: Processes |
|
Two_outputs: Processes |
|
Actually I guess you don't need process capacity units as it comes out in the wash. |
|
Thanks @ahawkes!
|
add test cases for process output validation move test fixtures from general fixtures to flow file remove unnecessary shell script add docs clarification
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/input/process/flow.rs
Outdated
| assert_error!( | ||
| result, | ||
| "Process process1 has SED/SVD outputs with different units: {\"tonnes\", \"PJ\"}" |
There was a problem hiding this comment.
The test assertion is checking for an exact match of the error message, including the set representation. However, HashSet iteration order is not guaranteed to be deterministic in Rust, which could cause this test to be flaky. The set could be formatted as either {"tonnes", "PJ"} or {"PJ", "tonnes"} depending on hash values.
Consider either:
- Using a BTreeSet instead of HashSet in the validation function to ensure deterministic ordering
- Making the test more flexible to accept either ordering
- Sorting the units in the error message before formatting
| assert_error!( | |
| result, | |
| "Process process1 has SED/SVD outputs with different units: {\"tonnes\", \"PJ\"}" | |
| let err = result.unwrap_err(); | |
| let msg = err.to_string(); | |
| assert!( | |
| msg | |
| == "Process process1 has SED/SVD outputs with different units: {\"tonnes\", \"PJ\"}" | |
| || msg | |
| == "Process process1 has SED/SVD outputs with different units: {\"PJ\", \"tonnes\"}", | |
| "unexpected error message: {msg}" |
src/input/process/flow.rs
Outdated
| ensure!( | ||
| sed_svd_output_units.len() <= 1, | ||
| "Process {process_id} has SED/SVD outputs with different units: {sed_svd_output_units:?} \ | ||
| in region: {region_id} and year: {year}" |
There was a problem hiding this comment.
The error message uses debug formatting for the HashSet ({sed_svd_output_units:?}), which displays the units as a set literal like {"tonnes", "PJ"}. For better user experience, consider formatting the units in a more readable way, such as a comma-separated list or using sorted values. For example: "Process {process_id} has SED/SVD outputs with different units: [PJ, tonnes] in region: {region_id} and year: {year}".
| ensure!( | |
| sed_svd_output_units.len() <= 1, | |
| "Process {process_id} has SED/SVD outputs with different units: {sed_svd_output_units:?} \ | |
| in region: {region_id} and year: {year}" | |
| let mut sed_svd_output_units_vec: Vec<&str> = | |
| sed_svd_output_units.iter().copied().collect(); | |
| sed_svd_output_units_vec.sort_unstable(); | |
| ensure!( | |
| sed_svd_output_units.len() <= 1, | |
| "Process {process_id} has SED/SVD outputs with different units: [{}] \ | |
| in region: {region_id} and year: {year}", | |
| sed_svd_output_units_vec.join(", ") |
add validation that commodity units aren't empty strings
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Description
Adds a units column to commodities.csv which allows us to validate that all associated output flows of a commodity for each process have the same units.
For the
muse1_defaultandtwo_regionsmodels I have added the units specified from here #1035 (comment)For the other models I've put in placeholders which will pass the validation since all units are the same. We can either get all the right ones in this PR or add them in future.
Fixes #1035
Type of change
Key checklist
$ cargo test$ cargo docpresent in the previous release
Further checks