Remove ProcessName column from consumption files #448
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #448 +/- ##
===========================================
- Coverage 71.31% 71.28% -0.04%
===========================================
Files 44 44
Lines 5881 5885 +4
Branches 1153 1154 +1
===========================================
+ Hits 4194 4195 +1
- Misses 1366 1369 +3
Partials 321 321
alexdewar left a comment
LGTM. I would consider giving users a warning if they supply a ProcessName column, but other than that I think it's all good.
    assert all(u in data.columns for u in indices)

    # Legacy: drop ProcessName column and sum data (PR #448)
    if "ProcessName" in data.columns:
How about warning users that having a ProcessName column is deprecated here?
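For illustration, the suggested warning could look something like the sketch below. The helper name `warn_if_process_name` and the message text are hypothetical and not taken from the PR; only the "ProcessName" column name comes from the diff.

```python
import warnings


def warn_if_process_name(columns):
    """Hypothetical helper: warn when a legacy ProcessName column is present.

    Returns True if the column was found (and a warning was issued).
    """
    if "ProcessName" in columns:
        warnings.warn(
            "The ProcessName column in consumption files is deprecated "
            "and will be ignored.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
        return True
    return False


# Example: a legacy file header triggers the warning, a new-style one does not
warn_if_process_name(["RegionName", "ProcessName", "Timeslice", "heat"])
warn_if_process_name(["RegionName", "Timeslice", "heat"])
```

One design point to weigh: Python hides `DeprecationWarning` by default unless it is triggered from `__main__`, so `FutureWarning` may be the better category if the message is meant for end users rather than developers.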
    datas = {}
    for path in allfiles:
        data = pd.read_csv(path, low_memory=False)
        assert all(u in data.columns for u in indices)
Note that assert statements aren't included in optimised Python bytecode. But I guess if one of these columns is missing an error will be raised below?
Ah ok I didn't know that, but yes, an error will be raised below
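If an explicit check were preferred over the assert, a minimal sketch might look like this. The function name `check_columns` and the example column names are assumptions for illustration; the PR itself keeps the assert and relies on the later error.

```python
def check_columns(columns, indices):
    """Hypothetical helper: raise a clear error if required columns are missing.

    Unlike an assert, this check still runs when Python is started with -O.
    """
    missing = [u for u in indices if u not in columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


# Example: passes silently when every index column is present
check_columns(["RegionName", "Timeslice", "heat"], ["RegionName", "Timeslice"])
```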
Description
One of my biggest gripes with the input files was that when specifying consumption data you had to specify a process (with the "ProcessName" column). This gives the impression that demand is for a particular process, but this is NOT true. Demand is for the commodities, and is agnostic to the process. Even though a process must be specified in the file, this information isn't used in the model. If consumption data is specified for multiple processes, it just ends up being summed. The same applies to preset supply data, which shares the same reader function (although this isn't used in any of the examples).
I think it would be much clearer to remove this column from the consumption files (or at least not to mandate it). However, we still need things to work if the column is present.
The main changes I've made are to the read_csv_outputs function (which I've also renamed), taking the summing operation from the PresetSector class and moving it here, and then dropping the ProcessName column. This will now work with or without the ProcessName column, and I think it makes it clearer what's actually going on.
I've also updated the documentation accordingly.
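As a rough sketch of the behaviour described above (not the actual code in this PR), summing over processes and dropping the column could be done with a pandas groupby. The function name `sum_over_processes`, the default index columns, and the sample data are all assumptions made for illustration.

```python
import pandas as pd


def sum_over_processes(data, indices=("RegionName", "Timeslice")):
    """Hypothetical helper: sum rows over ProcessName, then drop that column.

    `indices` are assumed index columns; the real reader may use others.
    Data without a ProcessName column is returned unchanged.
    """
    if "ProcessName" not in data.columns:
        return data
    return (
        data.drop(columns="ProcessName")
        .groupby(list(indices), as_index=False)
        .sum()
    )


# Made-up legacy data: two processes supply "heat" in the same region/timeslice
legacy = pd.DataFrame(
    {
        "RegionName": ["R1", "R1", "R1"],
        "ProcessName": ["gasboiler", "heatpump", "gasboiler"],
        "Timeslice": [1, 1, 2],
        "heat": [1.0, 2.0, 4.0],
    }
)
result = sum_over_processes(legacy)
```

Here the two rows for timeslice 1 are combined into a single row, so the result matches what a file written without the ProcessName column would contain.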
Fixes #391
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s.
Key checklist
$ python -m pytest
$ python -m sphinx -b html docs docs/build
Further checks