Add "other" industry demand #355
Changes from 19 commits
a2b7d35
547f66d
3d4c3ca
0904879
1a02a58
bb9b95f
7acc8c8
368be03
657b50f
34f5c45
bffdfbf
16b5995
e23a4a4
dd446c2
e6068a0
7f0165c
67bcc6d
56306e0
de2ba7f
f734975
18fd2c9
1fdc6ea
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,5 +9,10 @@ industry: | |
placeholder-out1: | ||
placeholder-out2: | ||
params: | ||
steel: | ||
non-generic-categories: ["Iron and steel", "Chemicals Industry"] | ||
I'm still concerned that these will cause confusion down the line. How about "discrete-categories" and "merged-categories"?
What about "explicit-categories" or "explicit-subsectors", or even "explicitly-modelled-subsectors"?
I thought of "explicit" initially too. I then looked it up in the dictionary and decided it wasn't quite right (it's more about leaving nothing implied). I'm not sure that "discrete" is right either. "Separate"? "Independent"?
Hmmm... I also struggled with names. A good name has to do two things:
Here are words with the antonym, based on your comments.
I suggest "separate-category-processing" and "combined-category-processing".
I personally think "subsectors" is clearer, because "category" is generic. "Category" is also an implementation detail: should we move to a different data source than JRC, that name would change. Anyway, "category" is fine by me. I like the "separate" and "combined" distinction!
@timtroendle: I also do not like "category", but keeping the code close to the data helps in this case, because JRC-IDEES is already hard to wrap your head around due to the number of columns it has (sector/category/subcategory/process/energy type...). A different data source is going to require a different approach. This (and the JRC module above it) is just very tied to how the data was constructed. I think this is unavoidable for something as heterogeneous as industry.
Agreed, "separate" / "combined" sounds good!
||
steel-config: | ||
recycled-steel-share: 0.5 # % of recycled scrap steel for H-DRI | ||
generic-config: | ||
"merged-categories-config"?
||
final-energy-method: "by priority" | ||
final-energy-carriers: ["Electricity", "Natural gas (incl. biogas)", "Diesel oil (incl. biofuels)"] | ||
useful-demands: ["Low enthalpy heat"] |
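The `generic-config` block above selects how the combined categories are processed. As a minimal sketch, assuming the YAML has already been loaded into a plain Python dict, the checks below mirror what the processing script expects: `final-energy-method` must be one of the two values its `match` statement handles (`"by priority"` or `"keep everything"`), `useful-demands` must be a list of names, and `"by priority"` needs a non-empty carrier list. The function name `check_generic_config` is hypothetical; in this workflow such constraints would more naturally be expressed in `schema.yaml`.

```python
SUPPORTED_FINAL_METHODS = {"by priority", "keep everything"}


def _is_str_list(value: object) -> bool:
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def check_generic_config(generic_config: dict) -> None:
    """Raise ValueError if a (hypothetical) generic-config dict cannot be processed."""
    method = generic_config.get("final-energy-method")
    if method not in SUPPORTED_FINAL_METHODS:
        raise ValueError(f"Unsupported final energy method: {method}.")
    if not _is_str_list(generic_config.get("useful-demands")):
        raise ValueError("'useful-demands' must be a list of strings.")
    # Only "by priority" reads the carrier list, and it needs at least one carrier.
    if method == "by priority":
        carriers = generic_config.get("final-energy-carriers")
        if not _is_str_list(carriers) or not carriers:
            raise ValueError("'by priority' needs a non-empty 'final-energy-carriers' list.")


# The values from the config above pass the check.
check_generic_config(
    {
        "final-energy-method": "by priority",
        "final-energy-carriers": [
            "Electricity",
            "Natural gas (incl. biogas)",
            "Diesel oil (incl. biofuels)",
        ],
        "useful-demands": ["Low enthalpy heat"],
    }
)
```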
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,36 +13,46 @@ validate(config, "./schema.yaml") | |
|
||
# Ensure rules are defined in order. | ||
# Otherwise commands like "rules.rulename.output" won't work! | ||
rule steel_industry: | ||
message: "Calculate energy demand for the 'Iron and steel' sector in JRC-IDEES." | ||
if "Iron and steel" in config["params"]["non-generic-categories"]: | ||
The condition doesn't seem necessary to me. Instead of making the rule conditional, make the inputs to a downstream rule conditional. So, if "iron and steel" is in the config, then a downstream rule will require the file
||
rule steel_processing: | ||
message: "Calculate energy demand for the 'Iron and steel' sector in JRC-IDEES." | ||
conda: CONDA_PATH | ||
params: | ||
steel_config = config["params"]["steel-config"] | ||
input: | ||
You are using both the "filling.py" and "jrc_idees_parser.py" scripts in this rule, and Snakemake isn't aware of these dependencies. You should add both scripts to the input of this rule so that the rule gets triggered when they are changed.
Is there a way to tell Snakemake to "monitor" a full folder? Hard-coding every single utility file seems very brittle.
I think yes, but you also don't want to monitor a full folder, right? I agree that linking each file individually isn't ideal, but that's all you can do. That's why we keep scripts as self-contained as possible. If necessary, we move things to the lib, which also circumvents this problem by providing updates through the environment.
I think moving things to On the other hand, making self-contained scripts bloats code and leads to repeated functionality.
I will add them for now, but this approach worries me.
But that's for the future, maybe.
||
path_energy_balances = config["inputs"]["path-energy-balances"], | ||
path_cat_names = config["inputs"]["path-cat-names"], | ||
path_carrier_names = config["inputs"]["path-carrier-names"], | ||
path_jrc_industry_energy = config["inputs"]["path-jrc-industry-energy"], | ||
path_jrc_industry_production = config["inputs"]["path-jrc-industry-production"], | ||
output: | ||
path_output = f"{BUILD_PATH}/annual_demand_steel.nc" | ||
script: f"{SCRIPT_PATH}/steel_processing.py" | ||
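Following the review thread in this rule, a rule can list its imported utility modules alongside the data files so that editing `filling.py` or `jrc_idees_parser.py` makes Snakemake treat the output as outdated. A minimal sketch, assuming the modules live in a `utils/` folder under `SCRIPT_PATH` (inferred from `from utils import filling` in the script) and using a hypothetical helper name `script_inputs`:

```python
from pathlib import Path

# Utility modules imported by the industry scripts via `from utils import ...`.
UTILITY_MODULES = ("filling.py", "jrc_idees_parser.py")


def script_inputs(
    script_path: str, modules: tuple[str, ...] = UTILITY_MODULES
) -> dict[str, str]:
    """Return one named input per utility module, located under `<script_path>/utils`."""
    return {
        f"util_{Path(name).stem}": str(Path(script_path) / "utils" / name)
        for name in modules
    }
```

The resulting dict of named paths could be returned from an input function and passed to a rule via Snakemake's `unpack()`. Listing the modules explicitly, rather than watching the whole folder, keeps each dependency visible, which is what the reviewers asked for.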
|
||
if "Chemicals Industry" in config["params"]["non-generic-categories"]: | ||
As above.
||
rule chemicals_processing: | ||
message: "." | ||
conda: CONDA_PATH | ||
params: | ||
input: | ||
output: | ||
script: f"{SCRIPT_PATH}/chemicals_processing.py" | ||
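The review comments on these conditional rules suggest keeping every rule unconditional and making the inputs of a downstream rule conditional instead. A minimal sketch of such an input helper, assuming the downstream rule is the planned combine step: only `annual_demand_steel.nc` and `annual_demand_generic.nc` appear in this diff, while the chemicals file name and the names `SEPARATE_CATEGORY_OUTPUTS` and `separate_category_inputs` are hypothetical.

```python
# Hypothetical mapping from JRC-IDEES category to the file its rule produces.
SEPARATE_CATEGORY_OUTPUTS = {
    "Iron and steel": "annual_demand_steel.nc",
    "Chemicals Industry": "annual_demand_chemicals.nc",  # hypothetical file name
}


def separate_category_inputs(config: dict, build_path: str) -> list[str]:
    """List the demand files a downstream rule should require, given the config."""
    categories = config["params"]["non-generic-categories"]
    unknown = set(categories) - set(SEPARATE_CATEGORY_OUTPUTS)
    if unknown:
        raise ValueError(f"No separate processing rule for: {sorted(unknown)}")
    files = [f"{build_path}/{SEPARATE_CATEGORY_OUTPUTS[cat]}" for cat in categories]
    # The combined categories are always processed, so their file is always required.
    files.append(f"{build_path}/annual_demand_generic.nc")
    return files
```

A downstream rule could use it as `input: lambda wildcards: separate_category_inputs(config, BUILD_PATH)`. Since Snakemake only schedules rules whose outputs are requested, `steel_processing` would then run only when "Iron and steel" is configured for separate processing, and no `if` around the rule is needed.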
|
||
rule generic_processing: | ||
message: "Calculate energy demand for all other industry sectors in JRC-IDEES." | ||
brynpickering marked this conversation as resolved.
|
||
conda: CONDA_PATH | ||
params: | ||
config_steel = config["params"]["steel"] | ||
non_generic_categories = config["params"]["non-generic-categories"], | ||
generic_config = config["params"]["generic-config"], | ||
input: | ||
As above, this rule needs the scripts it imports listed as "input" here.
This was added to the "combine" rule.
For the sake of being explicit rather than implicit, I would add them here, too.
Also, the "combine" rule is downstream of this, so this won't work, right? I would definitely make the dependencies explicit; then you do not even have to think about this at all.
||
path_energy_balances = config["inputs"]["path-energy-balances"], | ||
Since inputs are all paths, I tend to prefer not prepending with
I'd like to keep
but if all inputs hold strings...? How about
||
path_cat_names = config["inputs"]["path-cat-names"], | ||
path_carrier_names = config["inputs"]["path-carrier-names"], | ||
path_jrc_industry_energy = config["inputs"]["path-jrc-industry-energy"], | ||
path_jrc_industry_production = config["inputs"]["path-jrc-industry-production"], | ||
output: | ||
path_output = f"{BUILD_PATH}/annual_demand_steel.nc" | ||
script: f"{SCRIPT_PATH}/steel_industry.py" | ||
|
||
rule chemical_industry: | ||
message: "." | ||
conda: CONDA_PATH | ||
params: | ||
input: | ||
output: | ||
script: f"{SCRIPT_PATH}/chemicals.py" | ||
|
||
rule other_industry: | ||
message: "." | ||
conda: CONDA_PATH | ||
params: | ||
input: | ||
output: f"{BUILD_PATH}/other_industry.csv" | ||
script: f"{SCRIPT_PATH}/other_industry.py" | ||
path_output = f"{BUILD_PATH}/annual_demand_generic.nc" | ||
script: f"{SCRIPT_PATH}/generic_processing.py" | ||
"merged_category_processing.py"? I would always have
Agree. Why not use "subsector" instead of "category"?
@irm-codebase is using "category" as it aligns with the JRC-IDEES naming convention. I don't really mind if it's subsector or category, so long as it remains the same throughout the whole submodule.
@timtroendle: Bryn is right on this one. Category is not a word we use often, but it makes sense in the context of JRC-IDEES data. I'd like to keep it that way.
Will it become
That would make sense! The idea is:
||
|
||
# rule combine_and_scale: | ||
# message: "." | ||
|
All the requested name changes apply here, too.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,142 @@ | ||
from typing import Optional | ||
|
||
import pandas as pd | ||
import xarray as xr | ||
from utils import filling | ||
from utils import jrc_idees_parser as jrc | ||
|
||
|
||
def get_generic_demand( | ||
non_generic_categories: list, | ||
generic_config: dict, | ||
path_energy_balances: str, | ||
path_cat_names: str, | ||
path_carrier_names: str, | ||
path_jrc_industry_energy: str, | ||
path_jrc_industry_production: str, | ||
path_output: Optional[str] = None, | ||
) -> xr.DataArray: | ||
"""Processing of industry categories not selected for individual processing. | ||
|
||
Merges all energy demand into a single `generic` category using a configurable data processing pipeline. | ||
|
||
Args: | ||
non_generic_categories (list): categories with separate processing (will be ignored). | ||
generic_config (dict): configuration for generic category processing. | ||
path_energy_balances (str): country energy balances (usually from eurostat). | ||
path_cat_names (str): eurostat category mapping file. | ||
path_carrier_names (str): eurostat carrier name mapping file. | ||
path_jrc_industry_energy (str): jrc country-specific industrial energy demand file. | ||
path_jrc_industry_production (str): jrc country-specific industrial production file. | ||
path_output (str): location of the generic demand output file. | ||
|
||
Returns: | ||
xr.DataArray: industrial demand per country. | ||
""" | ||
# Load data | ||
energy_balances_df = pd.read_csv( | ||
path_energy_balances, index_col=[0, 1, 2, 3, 4] | ||
).squeeze("columns") | ||
cat_names_df = pd.read_csv(path_cat_names, header=0, index_col=0) | ||
carrier_names_df = pd.read_csv(path_carrier_names, header=0, index_col=0) | ||
jrc_energy = xr.open_dataset(path_jrc_industry_energy) | ||
jrc_prod = xr.open_dataarray(path_jrc_industry_production) | ||
|
||
# Remove data from all specifically processed industries | ||
cat_names_df = cat_names_df[~cat_names_df["jrc_idees"].isin(non_generic_categories)] | ||
Would it make sense to explicitly list the "generic_categories"/"generically_modelled_subsectors" instead of deriving them as everything not in non_generic_categories? This would (1) better document which subsectors are included here, and (2) safeguard that list against possible future changes.
I'd rather avoid that, because it introduces the risk of not processing an added category. I have my sights on this other dataset: https://iopscience.iop.org/article/10.1088/2753-3751/ad4e39
||
jrc_energy = jrc_energy.drop_sel(cat_name=non_generic_categories) | ||
jrc_prod = jrc_prod.drop_sel(cat_name=non_generic_categories) | ||
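The filtering above derives the combined categories as everything not configured for separate processing, which the author prefers because a newly added category is then processed by default. A minimal sketch of a guard that keeps that behaviour while failing early on an unknown name in the config, assuming the full list of category names is available (for instance the `jrc_idees` column of the category-names file, which may contain repeats); `split_categories` is a hypothetical name.

```python
def split_categories(
    all_categories: list[str], separate: list[str]
) -> tuple[list[str], list[str]]:
    """Split category names into (separately processed, combined), losing none."""
    # Deduplicate while keeping order; a mapping file may list a category repeatedly.
    all_unique = list(dict.fromkeys(all_categories))
    separate_unique = list(dict.fromkeys(separate))
    unknown = [cat for cat in separate_unique if cat not in all_unique]
    if unknown:
        # Fail early with a clear message, e.g. on a typo in the config.
        raise ValueError(f"Unknown categories configured for separate processing: {unknown}")
    # Anything not listed is combined, so newly added categories are never skipped.
    combined = [cat for cat in all_unique if cat not in separate_unique]
    return separate_unique, combined
```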
|
||
# Process data: | ||
# Extract useful demand -> remove useful demand from rest -> extract final demand.
dem -> demand
||
selected_useful = generic_config["useful-demands"] | ||
other_useful_demand = jrc.convert_subsection_demand_to_carrier( | ||
jrc_energy, selected_useful | ||
) | ||
|
||
final_method = generic_config["final-energy-method"] | ||
jrc_energy = jrc_energy.drop_sel(subsection=selected_useful) | ||
|
||
match final_method: | ||
case "by priority": | ||
other_final_demand = transform_final_demand_by_priority( | ||
jrc_energy, generic_config["final-energy-carriers"] | ||
) | ||
case "keep everything": | ||
I assume this doesn't lead to any double counting?
An assert statement would be helpful.
@brynpickering it shouldn't. It's basically "assume nothing, do nothing with the data and just give me the final demand". If a useful demand is requested, the lines above should remove it, so no double counting is possible.
||
other_final_demand = jrc_energy["final"].sum(["section", "subsection"]) | ||
other_final_demand = jrc.standardize(other_final_demand, "twh") | ||
case _: | ||
raise ValueError(f"Unsupported final energy method: {final_method}.") | ||
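One review comment on the `"keep everything"` branch asks for an assertion against double counting. A minimal sketch, assuming the subsections used for useful demand and those left for final demand can be listed as plain names (in the script, `selected_useful` and, after `drop_sel`, the values of the `subsection` coordinate); the function name and the second pair of subsection names are hypothetical. It raises `AssertionError` explicitly rather than using `assert`, so the check is not skipped when Python runs with `-O`.

```python
from collections.abc import Iterable


def assert_no_double_counting(
    useful_subsections: Iterable[str], final_subsections: Iterable[str]
) -> None:
    """Fail if any subsection feeds both the useful- and final-demand branches."""
    overlap = sorted(set(useful_subsections) & set(final_subsections))
    if overlap:
        # Explicit raise instead of `assert`, so `python -O` cannot skip the check.
        raise AssertionError(
            f"Subsections counted as both useful and final demand: {overlap}"
        )


# Passes: "Low enthalpy heat" is not among the (hypothetical) final-demand subsections.
assert_no_double_counting(["Low enthalpy heat"], ["Steam processing", "Drying"])
```

In `get_generic_demand`, it would sit right after `jrc_energy.drop_sel(subsection=selected_useful)`, e.g. `assert_no_double_counting(selected_useful, jrc_energy["subsection"].values)`.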
|
||
# Combine and fill missing countries | ||
other_demand = xr.concat( | ||
[other_useful_demand, other_final_demand], dim="carrier_name" | ||
) | ||
|
||
other_demand = filling.fill_missing_countries_years( | ||
energy_balances_df, cat_names_df, carrier_names_df, other_demand | ||
) | ||
|
||
other_demand = jrc.standardize(other_demand, "twh") | ||
|
||
if path_output: | ||
What is this for? If it's needed, then the return type hint in the function signature must be updated.
If you mean Otherwise, the function would always return a
If it's needed for testing purposes, I'd recommend letting the function return a dataset that is stored somewhere else, as discussed in the last dev call. Otherwise you increase the complexity of the function signature.
I'm a bit puzzled. That is already what this is doing: the I'll just remove it.
||
other_demand.to_netcdf(path_output) | ||
|
||
return other_demand | ||
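The `path_output` discussion ends with the suggestion to let the function only return its result and to write the file elsewhere. A minimal sketch of that split, with hypothetical function names and toy data, and with a JSON file standing in for NetCDF so that it runs with the standard library only; in the real script the write step would be the `to_netcdf` call, moved into the `if __name__ == "__main__":` block.

```python
import json
import tempfile
from pathlib import Path


def compute_demand(demand_by_country: dict[str, float], scale: float) -> dict[str, float]:
    """Pure computation (hypothetical): no file I/O, so it is easy to test."""
    return {country: value * scale for country, value in demand_by_country.items()}


def main(path_output: str) -> None:
    """Entry point: compute first, then persist; only this layer touches files."""
    demand = compute_demand({"DEU": 1.0, "FRA": 2.0}, scale=0.5)
    Path(path_output).write_text(json.dumps(demand))


if __name__ == "__main__":
    # Stand-in for `snakemake.output.path_output` in the real workflow.
    with tempfile.TemporaryDirectory() as tmp:
        main(str(Path(tmp) / "annual_demand_generic.json"))
```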
|
||
|
||
def transform_final_demand_by_priority( | ||
jrc_energy: xr.Dataset, carrier_priority: list[str] | ||
) -> xr.DataArray: | ||
"""Transform final demand of generic categories by giving priority to certain carriers. | ||
|
||
Steps: | ||
1. Assume that all demand that could consume a carrier will be met by said carrier. | ||
2. Drop overlapping consumption so that demand is met by carriers with the given priority. | ||
3. Combine. | ||
|
||
E.g., if carrier priority is [Electricity, Natural gas, Diesel] then: | ||
- Electricity: if met exclusively or otherwise, it's final electrical demand. | ||
- Natural gas: if met exclusively or otherwise, EXCEPT for overlapping cases with Electricity. | ||
- Diesel: if met exclusively or otherwise, EXCEPT for overlapping cases with all the above. | ||
|
||
Args: | ||
jrc_energy (xr.Dataset): JRC energy dataset. | ||
carrier_priority (list[str]): carriers to take in order of priority. | ||
|
||
Returns: | ||
xr.DataArray: dataset filled with demands for the given carriers. | ||
""" | ||
carrier_final_dem = {} | ||
|
||
for carrier in carrier_priority: | ||
dem_replaced = jrc.replace_final_demand_by_carrier(carrier, jrc_energy) | ||
dem_replaced = dem_replaced.to_dataframe().dropna() | ||
for dem_replaced_prev in carrier_final_dem.values(): | ||
dem_replaced = dem_replaced.drop(dem_replaced_prev.index, errors="ignore") | ||
carrier_final_dem[carrier] = dem_replaced | ||
|
||
for carrier, df in carrier_final_dem.items(): | ||
carrier_final_dem[carrier] = ( | ||
df["final"].to_xarray().assign_coords(carrier_name=carrier) | ||
) | ||
|
||
final_dem = xr.concat(carrier_final_dem.values(), dim="carrier_name") | ||
final_dem = final_dem.sum(["section", "subsection"]) | ||
|
||
final_dem = jrc.standardize(final_dem, "twh") | ||
|
||
return final_dem | ||
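The three steps in the docstring above can be illustrated without xarray. A minimal sketch, assuming final demand is a dict keyed by `(section, subsection)` and that the set of carriers each subsection could consume is already known (in the script this comes from `jrc.replace_final_demand_by_carrier`, whose internals are not shown in this diff); the data and the function name are hypothetical. Each carrier, in priority order, claims every subsection it could serve that a higher-priority carrier has not already claimed, which is what the `drop(..., errors="ignore")` loop does with the dataframe index.

```python
Key = tuple[str, str]  # (section, subsection)


def final_demand_by_priority(
    demand: dict[Key, float],
    can_use: dict[Key, set[str]],
    carrier_priority: list[str],
) -> dict[str, float]:
    """Assign each subsection's final demand to its highest-priority usable carrier."""
    claimed: set[Key] = set()
    totals: dict[str, float] = {}
    for carrier in carrier_priority:
        # Step 1: all subsections that could consume this carrier...
        candidates = {key for key, carriers in can_use.items() if carrier in carriers}
        # Step 2: ...minus those already met by a higher-priority carrier.
        mine = candidates - claimed
        claimed |= mine
        # Step 3: combine by summing over (section, subsection).
        totals[carrier] = sum((demand[key] for key in mine), 0.0)
    return totals


demand = {
    ("Process heat", "Kilns"): 10.0,
    ("Process heat", "Dryers"): 4.0,
    ("Machines", "Motors"): 6.0,
}
can_use = {
    ("Process heat", "Kilns"): {"Natural gas", "Diesel"},
    ("Process heat", "Dryers"): {"Electricity", "Natural gas"},
    ("Machines", "Motors"): {"Electricity"},
}
print(final_demand_by_priority(demand, can_use, ["Electricity", "Natural gas", "Diesel"]))
# {'Electricity': 10.0, 'Natural gas': 10.0, 'Diesel': 0.0}
```

In this sketch, demand that none of the listed carriers could serve is not assigned to any carrier; as far as the visible code shows, that matches the effect of `dropna()` in the loop above.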
|
||
|
||
if __name__ == "__main__": | ||
get_generic_demand( | ||
non_generic_categories=snakemake.params.non_generic_categories, | ||
generic_config=snakemake.params.generic_config, | ||
path_energy_balances=snakemake.input.path_energy_balances, | ||
path_cat_names=snakemake.input.path_cat_names, | ||
path_carrier_names=snakemake.input.path_carrier_names, | ||
path_jrc_industry_energy=snakemake.input.path_jrc_industry_energy, | ||
path_jrc_industry_production=snakemake.input.path_jrc_industry_production, | ||
path_output=snakemake.output.path_output, | ||
) |
Should this be "ADD industry module including steel and other industry energy demand"?
Updated it with a simpler message. I'll add chemical industry once that is done.