Document input format by alexdewar · Pull Request #588 · EnergySystemsModellingLab/MUSE2

alexdewar · 2025-06-02T10:33:07Z

Description

I started working on #530 and realised it probably made more sense to just document the input format rather than attempting to describe each of the (many) small checks we're doing in prose form, so that's what I've done here.

Rather than manually writing a bunch of markdown tables, I thought it would be easier to generate them from some source files. I've used the table schema format to document the CSV files, which is published by the people who make the frictionless Python framework. The nice thing about using schemas is that we could also use them to validate the data, which would tell us whether we've forgotten to document any fields or if the types have changed etc. (NB: This would be purely for documentation -- the Rust code already validates the input data perfectly well.)
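As a rough illustration of the validation idea, the sketch below checks a CSV header against the field names listed in a table schema. It is not taken from this PR: the schema contents, the sample CSV text and the `check_header` helper are all hypothetical, and a real setup would presumably load the YAML schema and let frictionless do the checking.

```python
import csv
import io

# Hypothetical table schema, written as the dict a YAML loader would return.
SCHEMA = {
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "description", "type": "string"},
    ]
}


def check_header(csv_text: str, schema: dict) -> list[str]:
    """Return a list of mismatches between the CSV header and the schema."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    expected = [field["name"] for field in schema["fields"]]
    problems = []
    # Columns the schema documents but the file lacks.
    for name in expected:
        if name not in header:
            problems.append(f"missing column: {name}")
    # Columns the file has but the schema does not document.
    for name in header:
        if name not in expected:
            problems.append(f"undocumented column: {name}")
    return problems


print(check_header("id,description,colour\nGBR,Great Britain,red\n", SCHEMA))
```

An empty list means the header and the schema agree, which is the "have we forgotten to document any fields" check in its simplest form.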

There wasn't a tool to produce documentation from table schemas already, so I knocked together a script. It's a bit rough around the edges -- in retrospect, I wish I'd used jinja with a template instead -- but I figure it's probably fine for now. We can reuse it with some tweaks when we come to documenting the output format (#529).
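For readers who want a feel for what such a script does, here is a minimal sketch that turns a schema's field list into a markdown table. It uses plain string formatting rather than the `table2md` dependency the PR adds, and the function name `fields_to_markdown`, the column titles and the example fields are assumptions for illustration only.

```python
def fields_to_markdown(fields: list[dict[str, str]]) -> str:
    """Render schema fields as a two-column markdown table."""
    lines = ["| Field | Description |", "| --- | --- |"]
    for field in fields:
        # A missing description becomes an empty cell rather than an error.
        description = field.get("description", "")
        lines.append(f"| `{field['name']}` | {description} |")
    return "\n".join(lines)


# Hypothetical fields, not copied from any schema in this PR.
example_fields = [
    {"name": "id", "description": "A unique identifier"},
    {"name": "description"},
]
print(fields_to_markdown(example_fields))
```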

I haven't documented model.toml yet, but we could adopt a similar approach there. frictionless doesn't support TOML directly, but we could just use a JSON schema, which is more or less the same thing.

Closes #530.

Type of change

Bug fix (non-breaking change to fix an issue)
New feature (non-breaking change to add functionality)
Refactoring (non-breaking, non-functional change to improve maintainability)
Optimization (non-breaking change to speed up the code)
Breaking change (whatever its nature)
Documentation (improve or add documentation)

Key checklist

All tests pass: $ cargo test
The documentation builds and looks OK: $ cargo doc

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

codecov · 2025-06-02T10:34:09Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.79%. Comparing base (6432176) to head (58cec12).
Report is 86 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #588      +/-   ##
==========================================
- Coverage   89.47%   84.79%   -4.69%     
==========================================
  Files          37       37              
  Lines        3544     3301     -243     
  Branches     3544     3301     -243     
==========================================
- Hits         3171     2799     -372     
- Misses        179      313     +134     
+ Partials      194      189       -5


Copilot

Pull Request Overview

This PR documents the input file format by generating markdown tables from YAML table schemas. Key changes include the addition of new YAML schema files for various input files, improvements to the documentation generation script, and updates to the GitHub action to integrate the new documentation process.

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

File	Description
schemas/input/regions.yaml	Added basic region schema for defining regions.
schemas/input/processes.yaml	Added main processes schema with field definitions.
schemas/input/process_parameters.yaml	Added process parameters schema with details on process inputs.
schemas/input/process_flows.yaml	Added schema for commodity flows of each process.
schemas/input/process_availabilities.yaml	Added schema for process availabilities.
schemas/input/demand_slicing.yaml	Added schema for annual demand slicing details.
schemas/input/demand.yaml	Added schema for service demand commodity entries.
schemas/input/commodity_costs.yaml	Added schema for commodity cost definitions.
schemas/input/commodities.yaml	Added schema for commodities.
schemas/input/assets.yaml	Added schema for asset definitions.
schemas/input/agents.yaml	Added schema for agent definitions.
schemas/input/agent_search_space.yaml	Added schema for agent search space definitions.
schemas/input/agent_objectives.yaml	Added schema for agent objectives with decision rule details.
schemas/input/agent_cost_limits.yaml	Added schema for agent cost limits.
schemas/input/agent_commodity_portions.yaml	Added schema for commodity demand portions per agent.
docs/input_format.md	New documentation page for the input format.
docs/generate_input_format_doc.py	Script to auto-generate input format documentation from schemas.
docs/SUMMARY.md	Updated to include the Input Format documentation link.
doc-requirements.txt	Added table2md dependency requirement.
.github/actions/generate-docs/action.yml	Updated GitHub action to install deps and generate input docs.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

tsmbland · 2025-06-09T08:06:09Z

I managed to create input_format.md, but when I run mdbook serve I get the following error:

2025-06-09 08:59:25 [ERROR] (mdbook::utils): Error: Unable to read "Input Format" (C:\Users\tbland\Documents\Code\MUSE_2.0\docs\./input_format.md)
2025-06-09 08:59:25 [ERROR] (mdbook::utils):    Caused By: stream did not contain valid UTF-8

Maybe a Windows thing?

Also, we should either add this file to gitignore or commit it - what do you think?

tsmbland · 2025-06-09T08:22:54Z

+
+
+if __name__ == "__main__":
+    print(generate_markdown(), end="")


Suggested change

-    print(generate_markdown(), end="")
+    output_path = _DOCS_DIR / "input_format.md"
+    output_path.write_text(generate_markdown(), encoding="utf-8")

Any reason not to do this? (fixes the utf-8 problem for me on Windows, and simpler just to run python generate_input_format_doc.py)

Not really. I think that's a cleaner way of doing things.
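The effect of the suggested change can be sketched in isolation: passing `encoding="utf-8"` to `Path.write_text` fixes the bytes on disk regardless of platform, whereas text redirected from stdout is encoded according to the platform or shell default. The sample text and the temporary directory below are assumptions for illustration; only the `write_text(..., encoding="utf-8")` call mirrors the suggestion.

```python
import tempfile
from pathlib import Path

# Hypothetical content containing a non-ASCII character.
text = "Unit: €/GJ"

with tempfile.TemporaryDirectory() as tmp:
    output_path = Path(tmp) / "input_format.md"
    # Explicit encoding: the file is written as UTF-8 on every platform.
    output_path.write_text(text, encoding="utf-8")
    raw = output_path.read_bytes()

# The euro sign is stored as the three-byte UTF-8 sequence E2 82 AC,
# so a strict UTF-8 reader such as mdbook can decode the file.
print(raw)
print(raw.decode("utf-8") == text)
```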

tsmbland · 2025-06-09T08:31:44Z

+    with path.open() as f:
+        data = yaml.safe_load(f)
+
+    info = data["title"]


The "{title}. {description}" format reads really weirdly for most files, e.g.

"Commodity demand portions for agents. Portions of commodity demand for which agents are responsible."

I would say we don't need a title field for the files; just go straight to the description.

Actually, I would split this into "Description" and "Notes", as also suggested for the individual fields, and put the notes underneath the table
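One way to picture the proposed layout is a small rendering helper that puts the description above the table and the notes below it. This is only a sketch: `render_section`, the heading style, the CSV filename and the sample note are hypothetical, and the actual script may structure its output differently.

```python
def render_section(filename: str, description: str, table: str, notes: str = "") -> str:
    """Render one file's documentation: description, then table, then notes."""
    parts = [f"## `{filename}`", description.strip(), table.strip()]
    # Notes are optional and only appear (under the table) when present.
    if notes.strip():
        parts.append("Notes\n\n" + notes.strip())
    return "\n\n".join(parts) + "\n"


# Hypothetical inputs; the description is the example quoted above.
print(
    render_section(
        "agent_commodity_portions.csv",
        "Portions of commodity demand for which agents are responsible.",
        "| Field | Description |\n| --- | --- |\n| `agent_id` | The agent |",
        "A placeholder note that would appear beneath the table.",
    )
)
```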

tsmbland

I think we can refine this as we go along, but as a starting point this is awesome. Good job!

tsmbland · 2025-06-09T08:34:44Z

+def fields2table(fields: list[dict[str, str]]) -> str:
+    data = [
+        {
+            "Field": f"`{f['name']}`",


I would change the column titles to "Field", "Description" and "Notes"

tsmbland · 2025-06-09T08:44:16Z

+description: |
+  Defines processes for the system.
+
+  Every SED (supply equals demand) commodity must have both producer and consumer processes for


Comments like this are hard to know where to place, as they involve multiple tables. I would say this probably makes more sense in the commodities file, but I can see why you would include it here. Alternatively, we could have a separate file documenting global validation checks. What do you think?

That's a good point. I put it here kind of mindlessly because the relevant check is in with process-related code. I think we should probably just move it to commodities.yaml.

We might want to have somewhere to mention global checks. I'd like to turn the document into a jinja template at some point, and that'll make adding a preamble with info about global checks a bit easier.
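To make the template idea concrete, here is a minimal sketch of a page template with a slot for a preamble on global checks. It deliberately uses `string.Template` from the standard library as a stand-in for jinja, so the syntax differs from what a jinja template would look like; the page title, the `render_page` helper and the sample text are hypothetical.

```python
from string import Template

# Stand-in for a jinja template: a preamble slot followed by per-file sections.
PAGE = Template("# Input format\n\n$preamble\n\n$sections\n")


def render_page(preamble: str, sections: list[str]) -> str:
    """Fill the page template with a preamble and the per-file sections."""
    body = "\n\n".join(section.strip() for section in sections)
    return PAGE.substitute(preamble=preamble.strip(), sections=body)


# Hypothetical content for illustration.
print(
    render_page(
        "Some checks span several files and are described here.",
        ["## `commodities.csv`\n\nDefines commodities.", "## `processes.csv`\n\nDefines processes."],
    )
)
```

Keeping the preamble in the template rather than in any one schema file gives cross-table checks a natural home.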

alexdewar · 2025-06-09T10:36:17Z

> I managed to create input_format.md, but when I run mdbook serve I get the following error:
> 2025-06-09 08:59:25 [ERROR] (mdbook::utils): Error: Unable to read "Input Format" (C:\Users\tbland\Documents\Code\MUSE_2.0\docs\./input_format.md)
> 2025-06-09 08:59:25 [ERROR] (mdbook::utils):    Caused By: stream did not contain valid UTF-8
> Maybe a Windows thing?

Ah, yes. It's because the default file encoding for Python on Windows isn't UTF-8 for historical reasons (which is a constant source of annoyance). Maybe we should just write directly to the file, as you suggest.

> Also, we should either add this file to gitignore or commit it - what do you think?

I'm thinking the gitignore route would be cleaner, otherwise the committed file will be perpetually out of date. I stuck a placeholder file in docs for it as I did for the (also generated) command-line help, but maybe we should just have them both in gitignore instead. Mdbook doesn't mind if the files don't exist.

alexdewar added 10 commits June 2, 2025 10:34

.gitignore: Ignore ignorable Python-related files

ef8b0d9

Add script to generate documentation for input format from schemas

0d5258d

Add placeholder for input_format.md

2f91314

CI: Generate documentation for input format

10bbd51

Document time_slices.csv

524e584

Document regions.csv

60108d8

Document agents-related files

303bb90

Document assets.csv

d035839

Document commodity-related files

981df49

Document process-related files

7d11895

alexdewar marked this pull request as ready for review June 2, 2025 10:43

alexdewar requested review from Copilot and tsmbland June 2, 2025 10:43

Copilot AI reviewed Jun 2, 2025

Comment thread schemas/input/processes.yaml Outdated

Comment thread docs/generate_input_format_doc.py Outdated

alexdewar and others added 3 commits June 2, 2025 11:45

Fix: end_year must be >= start_year

db6cb8c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Use yaml.safe_load

8b36937

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

user_guide.md: Link to command-line help

5d84a36

tsmbland reviewed Jun 9, 2025

tsmbland approved these changes Jun 9, 2025

alexdewar added 5 commits June 9, 2025 11:46

Add generated doc files to gitignore

7a8c5a9

Write directly to file cf. stdout

f66fd43

Fix filename in output file

a50e2a1

Put descriptions in separate "notes" sections

acba455

Move notes about SED and SVD commodities to commodities.yaml

cc226b8

alexdewar enabled auto-merge June 10, 2025 09:10

alexdewar disabled auto-merge June 10, 2025 09:11

Fix CI workflow for generating input format doc

58cec12

alexdewar enabled auto-merge June 10, 2025 09:11

alexdewar merged commit bbef385 into main Jun 10, 2025
7 checks passed

alexdewar deleted the document-input-format branch June 10, 2025 09:17


