Skip to content

Remove ProcessName column from consumption files#448

Merged
tsmbland merged 5 commits intodevelopfrom
process_name
Aug 8, 2024
Merged

Remove ProcessName column from consumption files#448
tsmbland merged 5 commits intodevelopfrom
process_name

Conversation

@tsmbland
Copy link
Copy Markdown
Collaborator

@tsmbland tsmbland commented Aug 8, 2024

Description

One of my biggest gripes with the input files was that when specifying consumption data you had to specify a process (with the "ProcessName" column). This gives the impression that demand is for a particular process, but this is NOT true. Demand is for the commodities, and is agnostic to the process. Even though process must be specified in the file, this information isn't used in the model. If consumption data is specified for multiple processes, this just ends up being summed. Same thing with preset supply data which shares the same reader function (although this isn't used in any of the examples)

I think it would be much clearer to remove this column from the consumption files (or at least not to mandate it). However, we still need things to work if the column is present.

The main changes I've made are to the read_csv_outputs function (which I've also renamed), taking the summing operation from the PresetSector class and moving it here, and then dropping the ProcessName column. This will now work with or without the ProcessName column, and I think it makes it clearer what's actually going on.

I've also updated the documentation accordingly.

Fixes #391

Type of change

Please add a line in the relevant section of
CHANGELOG.md to
document the change (include PR #) - note reverse order of PR #s.

  • New feature (non-breaking change which adds functionality)
  • Optimization (non-breaking, back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (whatever its nature)

Key checklist

  • All tests pass: $ python -m pytest
  • The documentation builds and looks OK: $ python -m sphinx -b html docs docs/build

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 8, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 5 lines in your changes missing coverage. Please review.

Project coverage is 71.28%. Comparing base (b2220a6) to head (679f673).

Files Patch % Lines
src/muse/readers/csv.py 50.00% 3 Missing and 1 partial ⚠️
src/muse/sectors/preset_sector.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #448      +/-   ##
===========================================
- Coverage    71.31%   71.28%   -0.04%     
===========================================
  Files           44       44              
  Lines         5881     5885       +4     
  Branches      1153     1154       +1     
===========================================
+ Hits          4194     4195       +1     
- Misses        1366     1369       +3     
  Partials       321      321              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tsmbland tsmbland marked this pull request as ready for review August 8, 2024 10:26
@tsmbland tsmbland requested a review from alexdewar August 8, 2024 10:26
Copy link
Copy Markdown
Member

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I would consider giving users a warning if they supply a ProcessName column, but other than that I think it's all good.

Comment thread src/muse/readers/csv.py
assert all(u in data.columns for u in indices)

# Legacy: drop ProcessName column and sum data (PR #448)
if "ProcessName" in data.columns:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about warning users that having a ProcessName column is deprecated here?

Comment thread src/muse/readers/csv.py
datas = {}
for path in allfiles:
data = pd.read_csv(path, low_memory=False)
assert all(u in data.columns for u in indices)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that assert statements aren't included in optimised Python bytecode. But I guess if one of these columns is missing an error will be raised below?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok I didn't know that, but yes, an error will be raised below

@tsmbland tsmbland enabled auto-merge August 8, 2024 13:00
@tsmbland tsmbland merged commit 0f1f518 into develop Aug 8, 2024
@tsmbland tsmbland deleted the process_name branch August 8, 2024 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

Remove ProcessName column from consumption files

2 participants