Add "other" industry demand #355

Open
wants to merge 19 commits into
base: develop
from

Conversation

tud-mchen6

Fixes #309.

Adding "other" industry in the industry module.

#340 is a prerequisite for this PR.

Checklist

Any checks which are not relevant to the PR can be pre-checked by the PR creator. All others should be checked by the reviewer. You can add extra checklist items here if required by the PR.

  • CHANGELOG updated
  • Minimal workflow tests pass
  • Tests added to cover contribution (not relevant)
  • Documentation updated (not relevant)
  • Configuration schema updated (not relevant)

Member

@brynpickering brynpickering left a comment

@tud-mchen6 could you confirm that the only change in this PR compared to #340 is the chemicals_industry.py and other_industry.py files?

modules/industry/src/chemicals_industry.py (5 outdated review threads, resolved)
Member

@brynpickering brynpickering left a comment

Just realised that the PR title mentions only "other", so feel free to ignore my "chemical_industry.py" comments.

jrc_prod_df = jrc_prod_df.drop(specific_industries, level="cat_name")

# -------------------------------------------------------------------------
# Process data
Member

I would split out these main commented steps into separate functions

# if it can be met by electricity (exclusively or otherwise),
# then it's an end-use electricity demand
electrical_consumption = (
jrc.get_carrier_demand("Electricity", demand, jrc_energy_df)
Member

For each of the named carriers ("Electricity", "Natural gas (incl.biogas)", etc.), I'd move them to constants at the start of the file so they're easier to see grouped together.

Contributor

Moved these to the configuration. Fully flexible now!

"subsector", level="cat_name"
)
all_other_consumption_filled = all_other_consumption_filled.stack()
breakpoint()
Member

leftover from debugging

Contributor

fixed

@tud-mchen6
Author

@tud-mchen6 could you confirm that the only change in this PR compared to #340 is the chemicals_industry.py and other_industry.py files?

Compared to the commit of #340 at the same point in time, the only change is indeed that other_industry.py is added, along with the relevant rule in the industry.smk file.

The branch of #340 has since developed quite far (e.g. integrating the JRC data processing script), so merging these branches will take some effort. However, #340 has never developed the functionality for the chemical industry or "other" industry, so functionality-wise they are still separate.

@irm-codebase irm-codebase added the Industry Industrial energy demand label Apr 23, 2024
@irm-codebase
Contributor

This PR now integrates the JRC module code and xarray processing.
Also, I've added some quality of life processing for "other industries":

  • You can now "turn off" the processing of specific sectors via the configuration. All non-specific sectors will be parsed through the "other" rule.
  • You can now select which carriers to extract at a final energy level and at an end-use level.
  • You can select the method of extraction. For now only "priority" is available, which reflects how SCEC does it.

@brynpickering
Member

@irm-codebase ready to rebase onto develop now that #340 is merged in!

Contributor

@irm-codebase irm-codebase left a comment

@brynpickering ready for review!

CHE pre-processing was leading to some funky issues, so I've added comments on how I dealt with it.

@@ -90,8 +81,11 @@ def fill_missing_countries_years(
_to_fill = _to_fill.bfill(dim="year")
all_filled = _to_fill.ffill(dim="year")

all_filled = jrc.ensure_standard_coordinates(all_filled)
all_filled = all_filled.assign_attrs(units="twh")
# TODO: CHE has no values for "Wood and wood products" and "Transport Equipment".
Contributor

CHE was triggering assert failures due to some missing data. For now I am just assuming those values are 0 (same as SCEC).

This is mostly because the CHE processing above this module does not seem to provide data for these sectors, meaning they are filled with nan in all years, so none of our filling methods work.

Let me know what you think.

Member

Yeah, we don't have much choice on that. I assume they are in "other industry" in the CHE data so we can't extract them.

Member

@brynpickering brynpickering left a comment

Looks good, just some minor changes to make. I'll go through it again tomorrow by running the code and checking the results at different points. Would be good if a data consistency check existed somewhere, i.e. that no energy demand is lost / added.
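
One way such a consistency check could look is sketched below. It is only an illustration under stated assumptions: the function name `assert_demand_conserved` is hypothetical, the numbers are made up, and plain dictionaries stand in for the xarray objects used in the module.

```python
import math
from collections.abc import Mapping


def assert_demand_conserved(
    demand_in: Mapping[str, float],
    demand_out: Mapping[str, float],
    rel_tol: float = 1e-9,
) -> None:
    """Raise if the total demand changed between two processing steps."""
    total_in = math.fsum(demand_in.values())
    total_out = math.fsum(demand_out.values())
    if not math.isclose(total_in, total_out, rel_tol=rel_tol):
        raise AssertionError(
            f"Demand not conserved: {total_in} TWh in, {total_out} TWh out."
        )


# Made-up values in TWh: keyed by category before processing,
# and by carrier after processing.
by_category = {"Wood and wood products": 3.0, "Transport Equipment": 1.5}
by_carrier = {"Electricity": 2.5, "Natural gas (incl.biogas)": 2.0}
assert_demand_conserved(by_category, by_carrier)
```

Comparing totals only catches demand that is lost or added overall; it would not notice demand shifted between countries or years, so a real check would probably compare per-country, per-year sums.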

@@ -9,5 +9,10 @@ industry:
placeholder-out1:
placeholder-out2:
params:
specific-industries: ["Iron and steel", "Chemicals Industry"]
Member

I'd rename this param. "specific" isn't very descriptive and "industries" might be better as "subsectors". subsectors-to-decarbonise, subsectors-to-electrify, electrified-subsectors, ... ? I don't like any of those particularly but maybe they can trigger a better idea 😅

Member

separated-subsectors, subsectors-to-process-individually?

Contributor

I changed it to non-generic-categories and changed other to generic-config.
Hopefully this makes processing clearer.

output:
path_output = f"{BUILD_PATH}/annual_demand_steel.nc"
script: f"{SCRIPT_PATH}/steel_industry.py"
if "Iron and steel" in config["params"]["specific-industries"]:
Member

Is this the best way to add this conditionality? @timtroendle maybe you can comment.

The other approach would be to use a conditional list as inputs in a later rule, e.g.:

rule merge_industry_demands:
  input:
    specific_industries = expand(f"{BUILD_PATH}/annual_demand_{{subsector}}.nc", subsector=subsector_translator(config["params"]["specific-industries"]))

where subsector_translator is a helper function to map e.g. "Iron and steel" to "steel".
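
A minimal sketch of what such a `subsector_translator` helper might look like follows. The mapping contents are assumptions for illustration: only the `annual_demand_steel.nc` file name appears in this PR, and the `"chemicals"` stem is hypothetical.

```python
# Hypothetical mapping from JRC-IDEES category names to file name stems.
# Only the "steel" stem is taken from this PR; "chemicals" is an assumption.
CATEGORY_TO_FILE_STEM = {
    "Iron and steel": "steel",
    "Chemicals Industry": "chemicals",
}


def subsector_translator(categories: list[str]) -> list[str]:
    """Map configured category names to the stems used in output file names."""
    unknown = [name for name in categories if name not in CATEGORY_TO_FILE_STEM]
    if unknown:
        raise ValueError(f"No processing rule for categories: {unknown}")
    return [CATEGORY_TO_FILE_STEM[name] for name in categories]


build_path = "build"  # stand-in for BUILD_PATH
inputs = [
    f"{build_path}/annual_demand_{stem}.nc"
    for stem in subsector_translator(["Iron and steel"])
]
print(inputs)  # ['build/annual_demand_steel.nc']
```

Failing loudly on an unknown category means a typo in the configuration surfaces when the workflow is parsed rather than as a missing input file later on.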

Member

Not sure I have something useful to say as I am not fully aware of the context. I wonder: Why would steel industry be excluded? Is there a use-case for that? If not, then there is no conditionality needed.

Member

If it is not a "specific" industry then it gets automatically lumped in as "other". So it's possible to overhaul the steel industry to decarbonise feedstocks or to just pipe all demands without converting any processes

Contributor

@irm-codebase irm-codebase Jun 11, 2024

Is this the best way to add this conditionality? @timtroendle maybe you can comment.

The other approach would be to use a conditional list as inputs in a later rule, e.g.:

rule merge_industry_demands:
  input:
    specific_industries = expand(f"{BUILD_PATH}/annual_demand_{{subsector}}.nc", subsector=subsector_translator(config["params"]["specific-industries"]))

where subsector_translator is a helper function to map e.g. "Iron and steel" to "steel".

I like this approach because it's pretty easy to follow and does not add much complexity.
For now I'll keep it as-is since the merging step needs development (SCEC also has a scaling step there).

modules/industry/industry.smk (resolved)
conda: CONDA_PATH
params:
config_params = config["params"],
Member

This is a moment to separate out params. Pass specific-industries and other to the script individually so that e.g. steel param changes don't re-trigger this rule.

input:
output: f"{BUILD_PATH}/other_industry.csv"
path_energy_balances = config["inputs"]["path-energy-balances"],
Member

Since inputs are all paths, I tend to prefer not prepending with path_ here and path- in the config.

Contributor

I'd like to keep path to avoid confusion between variables holding data and those holding strings.
A bit more explicit.

Member

but if all inputs hold strings...? How about config["input-paths"]["energy-balances"] etc?

path_jrc_industry_production: str,
path_output: Optional[str] = None,
) -> xr.DataArray:
"""Execute the default data processing pipeline all non-specific industries.
Member

Suggested change
- """Execute the default data processing pipeline all non-specific industries.
+ """Merge all industries not selected for individual processing into a single `other` subsector using a default data processing pipeline.

# Process data:
# Extract useful dem. -> remove useful dem. from rest -> extract final dem.
selected_useful = config_params["other"]["useful-demands"]
other_useful_demand = jrc.convert_subsec_demand_to_carrier(
Member

subsec -> subsector. Dropping letters is not worth the loss of readability.


final_method = config_params["other"]["final-energy-method"]
jrc_energy = jrc_energy.drop_sel(subsection=selected_useful)
if final_method == "priority":
Member

Are there other methods on the near-term horizon? If not, it doesn't seem worth introducing this feature.

Contributor

I added another method that just keeps all the final demand without assumptions. There are probably better methods to process this part, but these two are good enough for now.

I changed the check to a match-case style so this can grow over time.

)

# Fix the naming
for carrier in JRC_TO_CALLIOPE:
Member

This is quite verbose. Something like:

other_demand.coords["carrier_name"] = other_demand["carrier_name"].to_series().rename(index=JRC_TO_CALLIOPE).index

Contributor

I've actually removed the renaming step from this file. It makes more sense to do it once we merge all category files into one and re-scale it (if necessary).

def transform_final_demand_by_priority(
jrc_energy: xr.Dataset, carrier_priority: list[str]
) -> xr.DataArray:
"""Transform final demand of all sectors by giving priority to certain carriers.
Member

subsectors

Contributor

Changed to "category" (see prev. comment).

@irm-codebase
Contributor

@brynpickering I've implemented your comments, and a couple of extras.

The biggest updates are:

  • improved names of stuff to reduce ambiguity
  • standardized naming from "sector" to "category" to match how JRC names things
  • carrier names are no longer modified (better do it when aggregating/rescaling files)
  • added an additional processing option for final energy demand

Member

@brynpickering brynpickering left a comment

I'm almost happy for this one to go through! Just a couple of minor comments on naming and unit checking.

@@ -9,5 +9,10 @@ industry:
placeholder-out1:
placeholder-out2:
params:
steel:
non-generic-categories: ["Iron and steel", "Chemicals Industry"]
Member

I'm still concerned that these will cause confusion down the line. How about "discrete-categories" and "merged-categories"?

Member

what about "explicit-categories" or "explicit-subsectors" or even "explicitly-modelled-subsectors"?

Member

I thought of "explicit" initially too. I then had a look in the dictionary and decided it wasn't quite right (it's more about leaving nothing implied). I'm not sure that "discrete" is right either. "separate"? "independent"?

Contributor

@irm-codebase irm-codebase Jun 30, 2024

Hmmm... I also struggled with names. A good name has to do two things:

  • Specify that these are specific / individual to a given category.
  • Imply that all the rest will go to the "other" / generic / merged.

Here are words with the antonym, based on your comments.

  • Specific / generic categories
  • Separate / combined categories <--- I like this one the best.
  • Independent / dependent categories
  • Explicit / implicit categories
  • Discrete / joined categories

I suggest "separate-category-processing" and "combined-category-processing".

recycled-steel-share: 0.5 # % of recycled scrap steel for H-DRI
generic-config:
Member

"merged-categories-config"?

- output: f"{BUILD_PATH}/other_industry.csv"
- script: f"{SCRIPT_PATH}/other_industry.py"
+ path_output = f"{BUILD_PATH}/annual_demand_generic.nc"
+ script: f"{SCRIPT_PATH}/generic_processing.py"
Member

"merged_category_processing.py"? I would always have category added to whatever adjective we choose!

Member

Agree. Why not use "subsector" instead of "category"?

Member

@irm-codebase is using "category" as it aligns with the JRC-IDEES naming convention. I don't really mind if it's subsector or category, so long as it remains the same throughout the whole submodule.

Contributor

@timtroendle: Bryn is right on this one. Category is not a word we use often, but it makes sense in the context of JRC-IDEES data. I'd like to keep it that way.

jrc_prod = jrc_prod.drop_sel(cat_name=non_generic_categories)

# Process data:
# Extract useful dem. -> remove useful dem. from rest -> extract final dem.
Member

dem -> demand

other_final_demand = transform_final_demand_by_priority(
jrc_energy, generic_config["final-energy-carriers"]
)
case "keep everything":
Member

I assume this doesn't lead to any double counting?

Member

An assert statement would be helpful.

@@ -64,16 +92,15 @@ def get_subsection_final_intensity(
final_intensity = useful_intensity / carrier_eff

# Prettify
final_intensity = ensure_standard_coordinates(final_intensity)
final_intensity = final_intensity.assign_attrs(units="twh/kt")
final_intensity = standardize(final_intensity, "twh/kt", name="final_intensity")
Member

Is it worth checking the unit of the incoming data before setting this to twh/kt? I.e., useful_demand should have the unit twh.
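
A minimal sketch of such a unit check, assuming the unit is stored under `attrs["units"]` as in the snippets in this PR. The function name `check_units` is hypothetical, and a `SimpleNamespace` stands in for an xarray object so the example runs without extra dependencies.

```python
from types import SimpleNamespace


def check_units(data, expected: str) -> None:
    """Raise if `data.attrs["units"]` is missing or differs from `expected`."""
    found = data.attrs.get("units")
    if found != expected:
        raise ValueError(f"Expected units {expected!r}, found {found!r}.")


# Stand-in for an xarray object; only the `attrs` mapping matters here.
useful_demand = SimpleNamespace(attrs={"units": "twh"})
check_units(useful_demand, "twh")
```

Calling the check on the inputs before the division would make the "twh/kt" label on the result a verified statement rather than an assumption.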

useful_intensity = ensure_standard_coordinates(useful_intensity)
useful_intensity = useful_intensity.assign_attrs(units="twh/kt")
useful_intensity.name = "useful_intensity"
useful_intensity = standardize(useful_intensity, "twh/kt", name="useful_intensity")
Member

same as above RE checking unit

Member

@timtroendle timtroendle left a comment

Looks good to me overall. I have a few minor comments.

@@ -4,7 +4,7 @@

### Added (models)

- * **ADD** industry module and steel industry energy demand processing. NOT CONNECTED TO THE MAIN WORKFLOW. Industry sectors pending: chemical, "other". (Fixes #308, #310, #347, #345 and #346)
+ * **ADD** industry module and steel industry energy demand processing. NOT CONNECTED TO THE MAIN WORKFLOW. Industry sectors pending: chemical. (Fixes #308, #309, #310, #347, #345 and #346)
Member

Should this be "ADD industry module including steel and other industry energy demand"?

@@ -13,36 +13,46 @@ validate(config, "./schema.yaml")

# Ensure rules are defined in order.
# Otherwise commands like "rules.rulename.output" won't work!
rule steel_industry:
message: "Calculate energy demand for the 'Iron and steel' sector in JRC-IDEES."
if "Iron and steel" in config["params"]["non-generic-categories"]:
Member

The condition doesn't seem necessary to me. Instead of making the rule conditional, make the inputs to a rule downstream conditional.

So, if "iron and steel" is in the config, then a downstream rule will require the file f"{BUILD_PATH}/annual_demand_steel.nc". You will need that anyways, right? Would be good to see how this is integrated eventually.

path_output = f"{BUILD_PATH}/annual_demand_steel.nc"
script: f"{SCRIPT_PATH}/steel_processing.py"

if "Chemicals Industry" in config["params"]["non-generic-categories"]:
Member

As above.

carrier_eff = carrier_tot["useful"] / carrier_tot["final"]

# Fill NaNs (where there is demand, but no consumption in that country)
# First by country avg. (all years), then by year avg. (all countries).
Member

This comment should have an ASSUME statement so we can find it.

})

# Prettify
new_carrier_useful_dem = standardize(new_carrier_useful_dem, "twh")
Member

Could be inlined with line 188.

jrc_prod = xr.open_dataarray(path_jrc_industry_production)

# Remove data from all specifically processed industries
cat_names_df = cat_names_df[~cat_names_df["jrc_idees"].isin(non_generic_categories)]
Member

Would it make sense to explicitly list the "generic_categories"/"generically_modelled_subsectors" instead of using all that are non_generic?

This would (1) better document which subsectors are included here, and (2) safeguard that list against possible future changes.
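
If both lists were given explicitly, a small validation step could confirm that together they cover every category exactly once. The sketch below is only an illustration: the function name `check_category_partition` is hypothetical, and the category names are the ones mentioned elsewhere in this PR.

```python
def check_category_partition(
    all_categories: list[str], separate: list[str], generic: list[str]
) -> None:
    """Raise unless `separate` and `generic` split `all_categories` exactly."""
    all_set, separate_set, generic_set = (
        set(all_categories), set(separate), set(generic)
    )
    overlap = separate_set & generic_set
    if overlap:
        raise ValueError(f"Categories listed twice: {sorted(overlap)}")
    unknown = (separate_set | generic_set) - all_set
    if unknown:
        raise ValueError(f"Categories not in the data: {sorted(unknown)}")
    unassigned = all_set - separate_set - generic_set
    if unassigned:
        raise ValueError(f"Categories not assigned: {sorted(unassigned)}")


check_category_partition(
    all_categories=[
        "Iron and steel",
        "Chemicals Industry",
        "Wood and wood products",
        "Transport Equipment",
    ],
    separate=["Iron and steel", "Chemicals Industry"],
    generic=["Wood and wood products", "Transport Equipment"],
)
```

With this in place, a category added to the data in the future would raise an error until someone decides which list it belongs to.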

other_demand = jrc.standardize(other_demand, "twh")

if path_output:
Member

What is this for? If it's needed, then the type hint of the return type in the function signature must be updated.

Labels
Industry Industrial energy demand
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get demand for "other" industry and process it
4 participants