Add "other" industry demand #355
…en/add-chemical-industry
…into add-other-industry-demand
…ther-industry-demand
@tud-mchen6 could you confirm that the only change in this PR compared to #340 is the `chemicals_industry.py` and `other_industry.py` files?
Just realised that the PR title mentions only "other", so feel free to ignore my "chemical_industry.py" comments.
jrc_prod_df = jrc_prod_df.drop(specific_industries, level="cat_name")

# -------------------------------------------------------------------------
# Process data
I would split out these main commented steps into separate functions
# if it can be met by electricity (exclusively or otherwise),
# then it's an end-use electricity demand
electrical_consumption = (
    jrc.get_carrier_demand("Electricity", demand, jrc_energy_df)
For each of the named carriers ("Electricity", "Natural gas (incl.biogas)", etc.), I'd move them to constants at the start of the file so they're easier to see grouped together.
Moved these to the configuration. Fully flexible now!
"subsector", level="cat_name"
)
all_other_consumption_filled = all_other_consumption_filled.stack()
breakpoint()
leftover from debugging
fixed
If compared to the commit of #340 at the same time point, the only change is indeed that. Now the branch of #340 has developed quite far (e.g. integrating the JRC data processing script), and these branches need to be merged with some effort. However, #340 has never developed the functionality of chemical industry or "other" industry, so functionality-wise they are still separated.
This PR now integrates the JRC module code and
@irm-codebase ready to rebase onto develop now that #340 is merged in!
@brynpickering ready for review!
CHE pre-processing was leading to some funky issues, so I've added comments on how I dealt with it.
@@ -90,8 +81,11 @@ def fill_missing_countries_years(
_to_fill = _to_fill.bfill(dim="year")
all_filled = _to_fill.ffill(dim="year")

all_filled = jrc.ensure_standard_coordinates(all_filled)
all_filled = all_filled.assign_attrs(units="twh")
# TODO: CHE has no values for "Wood and wood products" and "Transport Equipment".
CHE was triggering `assert` failures due to some missing data. For now I am just assuming those values are 0 (same as SCEC).
This is mostly because the CHE processing above this module does not seem to provide data for these sectors, meaning they are filled with `nan` in all years, so none of our filling methods work.
Let me know what you think.
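The fallback described in this comment could look roughly like the sketch below. It works on plain Python lists rather than the module's xarray objects, and `fill_years` and its `all_missing_fill` parameter are hypothetical names that do not appear in the PR.

```python
import math


def fill_years(values, all_missing_fill=0.0):
    """Backward-fill, then forward-fill a yearly series.

    If every year is NaN, neither fill can work, so the whole series is
    set to `all_missing_fill` (assumed to be 0 here, as in the comment).
    """
    if all(math.isnan(v) for v in values):
        return [all_missing_fill] * len(values)

    filled = list(values)
    # Backward fill: take the value from the next year.
    for i in range(len(filled) - 2, -1, -1):
        if math.isnan(filled[i]):
            filled[i] = filled[i + 1]
    # Forward fill: take the value from the previous year.
    for i in range(1, len(filled)):
        if math.isnan(filled[i]):
            filled[i] = filled[i - 1]
    return filled


nan = math.nan
partly_missing = fill_years([nan, 1.0, nan, 3.0, nan])
fully_missing = fill_years([nan, nan, nan])
```

Here `partly_missing` becomes `[1.0, 1.0, 3.0, 3.0, 3.0]` and `fully_missing` becomes `[0.0, 0.0, 0.0]`, which is the case that applies to the two CHE categories.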
Yeah, we don't have much choice on that. I assume they are in "other industry" in the CHE data so we can't extract them.
Looks good, just some minor changes to make. I'll go through it again tomorrow by running the code and checking the results at different points. Would be good if a data consistency check existed somewhere, i.e. that no energy demand is lost / added.
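One possible shape for such a consistency check is sketched below. It is only an illustration: `assert_demand_conserved` is a hypothetical helper, demand is held in plain dictionaries instead of xarray objects, and both stages are assumed to be expressed in the same quantity and unit (TWh).

```python
import math


def assert_demand_conserved(before, after, rel_tol=1e-9):
    """Raise if the total demand differs between two processing stages."""
    total_before = math.fsum(before.values())
    total_after = math.fsum(after.values())
    if not math.isclose(total_before, total_after, rel_tol=rel_tol):
        raise AssertionError(
            f"Energy demand not conserved: {total_before} TWh before, "
            f"{total_after} TWh after."
        )


# Hypothetical demand in TWh, keyed by (country, carrier).
raw = {("DEU", "electricity"): 10.0, ("DEU", "gas"): 5.0}
processed = {("DEU", "electricity"): 12.5, ("DEU", "gas"): 2.5}
assert_demand_conserved(raw, processed)  # passes: both totals are 15 TWh
```

Comparing totals only catches lost or added energy overall. A stricter variant could compare per country or per year.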
modules/industry/config.yaml
Outdated
@@ -9,5 +9,10 @@ industry:
placeholder-out1:
placeholder-out2:
params:
specific-industries: ["Iron and steel", "Chemicals Industry"]
I'd rename this param. "specific" isn't very descriptive and "industries" might be better as "subsectors". `subsectors-to-decarbonise`, `subsectors-to-electrify`, `electrified-subsectors`, ... ? I don't like any of those particularly but maybe they can trigger a better idea 😅
`separated-subsectors`, `subsectors-to-process-individually`?
I changed it to `non-generic-categories` and changed `other` to `generic-config`.
Hopefully this makes processing clearer.
modules/industry/industry.smk
Outdated
output:
    path_output = f"{BUILD_PATH}/annual_demand_steel.nc"
script: f"{SCRIPT_PATH}/steel_industry.py"
if "Iron and steel" in config["params"]["specific-industries"]:
Is this the best way to add this conditionality? @timtroendle maybe you can comment.
The other approach would be to use a conditional list as inputs in a later rule, e.g.:
rule merge_industry_demands:
    input:
        specific_industries = expand(f"{BUILD_PATH}/annual_demand_{{subsector}}.nc", subsector=subsector_translator(config["params"]["specific-industries"]))
where `subsector_translator` is a helper function to map e.g. `Iron and steel` to `steel`.
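A minimal sketch of the helper mentioned in this comment, written in plain Python so it can run outside Snakemake. The mapping `CATEGORY_TO_STEM`, the `chemicals` stem and the `demand_paths` function are assumptions made for illustration. Only the `annual_demand_steel.nc` file name comes from the rule under review.

```python
# Hypothetical mapping from JRC-IDEES category names to file name stems.
CATEGORY_TO_STEM = {
    "Iron and steel": "steel",
    "Chemicals Industry": "chemicals",  # assumed stem, not taken from the PR
}


def subsector_translator(categories):
    """Translate configured category names into file name stems."""
    return [CATEGORY_TO_STEM[category] for category in categories]


def demand_paths(build_path, categories):
    """Build the list of demand files a downstream rule would require."""
    return [
        f"{build_path}/annual_demand_{stem}.nc"
        for stem in subsector_translator(categories)
    ]


paths = demand_paths("build", ["Iron and steel", "Chemicals Industry"])
```

With this approach every rule can stay unconditional: a rule only runs when a downstream rule asks for its output, so categories left out of the config never trigger their processing scripts.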
Not sure I have something useful to say as I am not fully aware of the context. I wonder: Why would steel industry be excluded? Is there a use-case for that? If not, then there is no conditionality needed.
If it is not a "specific" industry then it gets automatically lumped in as "other". So it's possible to overhaul the steel industry to decarbonise feedstocks or to just pipe all demands without converting any processes
> Is this the best way to add this conditionality? @timtroendle maybe you can comment.
> The other approach would be to use a conditional list as inputs in a later rule, e.g.:
> rule merge_industry_demands:
>     input:
>         specific_industries = expand(f"{BUILD_PATH}/annual_demand_{{subsector}}.nc", subsector=subsector_translator(config["params"]["specific-industries"]))
> where `subsector_translator` is a helper function to map e.g. `Iron and steel` to `steel`.
I like this approach because it's pretty easy to follow and does not add much complexity.
For now I'll keep it as-is since the merging step needs development (SCEC also has a scaling step there).
modules/industry/industry.smk
Outdated
conda: CONDA_PATH
params:
    config_params = config["params"],
This is a moment to separate out params. Pass `specific-industries` and `other` to the script individually so that e.g. `steel` param changes don't re-trigger this rule.
input:
output: f"{BUILD_PATH}/other_industry.csv"
path_energy_balances = config["inputs"]["path-energy-balances"],
Since inputs are all paths, I tend to prefer not prepending with `path_` here and `path-` in the config.
I'd like to keep `path` to avoid confusion between variables holding data and those holding strings.
A bit more explicit.
but if all inputs hold strings...? How about `config["input-paths"]["energy-balances"]` etc?
    path_jrc_industry_production: str,
    path_output: Optional[str] = None,
) -> xr.DataArray:
    """Execute the default data processing pipeline all non-specific industries.
- """Execute the default data processing pipeline all non-specific industries.
+ """Merge all industries not selected for individual processing into a single `other` subsector using a default data processing pipeline.
# Process data:
# Extract useful dem. -> remove useful dem. from rest -> extract final dem.
selected_useful = config_params["other"]["useful-demands"]
other_useful_demand = jrc.convert_subsec_demand_to_carrier(
subsec -> subsector. Not worth the loss of readability caused by removing letters.
final_method = config_params["other"]["final-energy-method"]
jrc_energy = jrc_energy.drop_sel(subsection=selected_useful)
if final_method == "priority":
Are there other methods on the near-term horizon? If not, it doesn't seem worth introducing this feature.
I added another method that just keeps all the final demand without assumptions. There are probably better methods to process this part, but these two are good enough for now.
I changed the check to a `match`-`case` style so this can grow over time.
)

# Fix the naming
for carrier in JRC_TO_CALLIOPE:
This is quite verbose. Something like:
other_demand.coords["carrier_name"] = other_demand["carrier_name"].to_series().rename(index=JRC_TO_CALLIOPE).index
I've actually removed the renaming step from this file. It makes more sense to do it once we merge all category files into one and re-scale it (if necessary).
def transform_final_demand_by_priority(
    jrc_energy: xr.Dataset, carrier_priority: list[str]
) -> xr.DataArray:
    """Transform final demand of all sectors by giving priority to certain carriers.
subsectors
Changed to "category" (see prev. comment).
@brynpickering I've implemented your comments, and a couple of extras. The biggest updates are:
I'm almost happy for this one to go through! Just a couple of minor comments on naming and unit checking.
@@ -9,5 +9,10 @@ industry:
placeholder-out1:
placeholder-out2:
params:
steel:
non-generic-categories: ["Iron and steel", "Chemicals Industry"]
I'm still concerned that these will cause confusion down the line. How about "discrete-categories" and "merged-categories"?
what about "explicit-categories" or "explicit-subsectors" or even "explicitly-modelled-subsectors"?
I thought of "explicit" initially too. I then had a look in the dictionary and decided it wasn't quite right (it's more about leaving nothing implied). I'm not sure that "discrete" is right either. "separate"? "independent"?
Hmmm... I also struggled with names. A good name has to do two things:
- Specify that these are specific / individual to a given category.
- Imply that all the rest will go to the "other" / generic / merged.
Here are words with the antonym, based on your comments.
- Specific / generic categories
- Separate / combined categories < --- I like this one the best.
- Independent / dependent categories
- Explicit / implicit categories
- Discrete / joined categories
I suggest "separate-category-processing" and "combined-category-processing".
recycled-steel-share: 0.5 # % of recycled scrap steel for H-DRI
generic-config:
"merged-categories-config"?
output: f"{BUILD_PATH}/other_industry.csv"
script: f"{SCRIPT_PATH}/other_industry.py"
path_output = f"{BUILD_PATH}/annual_demand_generic.nc"
script: f"{SCRIPT_PATH}/generic_processing.py"
"merged_category_processing.py"? I would always have `category` added to whatever adjective we choose!
Agree. Why not use "subsector" instead of "category"?
@irm-codebase is using "category" as it aligns with the JRC-IDEES naming convention. I don't really mind if it's subsector or category, so long as it remains the same throughout the whole submodule.
@timtroendle: Bryn is right on this one. Category is not a word we use often, but it makes sense in the context of JRC-IDEES data. I'd like to keep it that way.
jrc_prod = jrc_prod.drop_sel(cat_name=non_generic_categories)

# Process data:
# Extract useful dem. -> remove useful dem. from rest -> extract final dem.
dem -> demand
other_final_demand = transform_final_demand_by_priority(
    jrc_energy, generic_config["final-energy-carriers"]
)
case "keep everything":
I assume this doesn't lead to any double counting?
An assert statement would be helpful.
@@ -64,16 +92,15 @@ def get_subsection_final_intensity(
final_intensity = useful_intensity / carrier_eff

# Prettify
final_intensity = ensure_standard_coordinates(final_intensity)
final_intensity = final_intensity.assign_attrs(units="twh/kt")
final_intensity = standardize(final_intensity, "twh/kt", name="final_intensity")
Is it worth checking the unit of the incoming data before setting this to `twh/kt`? I.e., `useful_demand` should have the unit `twh`.
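One way to express the suggested check is sketched below. `check_units` and `intensity_units` are hypothetical helpers. They take plain attribute dictionaries, which is the same shape as an xarray object's `.attrs`, and the `kt` unit for production is an assumption inferred from the `twh/kt` target unit.

```python
def check_units(attrs, expected):
    """Raise if the `units` attribute is missing or not the expected one."""
    found = attrs.get("units")
    if found != expected:
        raise ValueError(f"Expected units {expected!r}, found {found!r}.")


def intensity_units(demand_attrs, production_attrs):
    """Validate both inputs before labelling their ratio."""
    check_units(demand_attrs, "twh")
    check_units(production_attrs, "kt")  # assumed production unit
    return "twh/kt"


units = intensity_units({"units": "twh"}, {"units": "kt"})
```

In the module this would be called with something like `useful_demand.attrs` before the ratio is labelled, so a mislabelled input fails early instead of silently receiving the `twh/kt` label.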
useful_intensity = ensure_standard_coordinates(useful_intensity)
useful_intensity = useful_intensity.assign_attrs(units="twh/kt")
useful_intensity.name = "useful_intensity"
useful_intensity = standardize(useful_intensity, "twh/kt", name="useful_intensity")
same as above RE checking unit
Looks good to me overall. I have a few minor comments.
@@ -4,7 +4,7 @@

### Added (models)

* **ADD** industry module and steel industry energy demand processing. NOT CONNECTED TO THE MAIN WORKFLOW. Industry sectors pending: chemical, "other". (Fixes #308, #310, #347, #345 and #346)
* **ADD** industry module and steel industry energy demand processing. NOT CONNECTED TO THE MAIN WORKFLOW. Industry sectors pending: chemical. (Fixes #308, #309, #310, #347, #345 and #346)
Should this be "ADD industry module including steel and other industry energy demand"?
@@ -13,36 +13,46 @@ validate(config, "./schema.yaml")

# Ensure rules are defined in order.
# Otherwise commands like "rules.rulename.output" won't work!
rule steel_industry:
message: "Calculate energy demand for the 'Iron and steel' sector in JRC-IDEES."
if "Iron and steel" in config["params"]["non-generic-categories"]:
The condition doesn't seem necessary to me. Instead of making the rule conditional, make the inputs to a rule downstream conditional.
So, if "iron and steel" is in the config, then a downstream rule will require the file `f"{BUILD_PATH}/annual_demand_steel.nc"`. You will need that anyways, right? Would be good to see how this is integrated eventually.
path_output = f"{BUILD_PATH}/annual_demand_steel.nc"
script: f"{SCRIPT_PATH}/steel_processing.py"

if "Chemicals Industry" in config["params"]["non-generic-categories"]:
As above.
carrier_eff = carrier_tot["useful"] / carrier_tot["final"]

# Fill NaNs (where there is demand, but no consumption in that country)
# First by country avg. (all years), then by year avg. (all countries).
This comment should have an ASSUME statement so we can find it.
})

# Prettify
new_carrier_useful_dem = standardize(new_carrier_useful_dem, "twh")
Could be inlined with line 188.
jrc_prod = xr.open_dataarray(path_jrc_industry_production)

# Remove data from all specifically processed industries
cat_names_df = cat_names_df[~cat_names_df["jrc_idees"].isin(non_generic_categories)]
Would it make sense to explicitly list the "generic_categories"/"generically_modelled_subsectors" instead of using all that are non_generic?
This would (1) document better which subsectors are included here, and (2) safeguard that list against possible future changes.
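If both lists were given explicitly, a small validation step could guard against categories being dropped or counted twice. The sketch below is only an illustration under that assumption: `validate_category_split` is a hypothetical helper, and the category lists are shortened examples using names that appear in this thread.

```python
def validate_category_split(all_categories, separate, combined):
    """Check that the two lists partition the available categories."""
    all_set, separate_set, combined_set = (
        set(all_categories), set(separate), set(combined)
    )
    overlap = separate_set & combined_set
    if overlap:
        raise ValueError(f"Categories listed twice: {sorted(overlap)}")
    unknown = (separate_set | combined_set) - all_set
    if unknown:
        raise ValueError(f"Categories not in the data: {sorted(unknown)}")
    missing = all_set - separate_set - combined_set
    if missing:
        raise ValueError(f"Categories not assigned: {sorted(missing)}")


available = [
    "Iron and steel",
    "Chemicals Industry",
    "Wood and wood products",
    "Transport Equipment",
]
validate_category_split(
    available,
    separate=["Iron and steel", "Chemicals Industry"],
    combined=["Wood and wood products", "Transport Equipment"],
)
```

The trade-off is more configuration to maintain, in exchange for an error whenever the upstream category list changes.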
other_demand = jrc.standardize(other_demand, "twh")

if path_output:
What is this for? If it's needed, then the type hint of the return type in the function signature must be updated.
Fixes #309 .
Adding "other" industry in the industry module.
#340 is a prerequisite for this PR.
Checklist
Any checks which are not relevant to the PR can be pre-checked by the PR creator. All others should be checked by the reviewer. You can add extra checklist items here if required by the PR.