Document input format#588

Merged
alexdewar merged 19 commits into main from document-input-format
Jun 10, 2025

Conversation

@alexdewar
Collaborator

@alexdewar alexdewar commented Jun 2, 2025

Description

I started working on #530 and realised it probably made more sense to just document the input format rather than attempting to describe each of the (many) small checks we're doing in prose form, so that's what I've done here.

Rather than manually writing a bunch of markdown tables, I thought it would be easier to generate them from some source files. I've used the table schema format to document the CSV files, which is published by the people who make the frictionless Python framework. The nice thing about using schemas is that we could also use them to validate the data, which would tell us whether we've forgotten to document any fields or if the types have changed etc. (NB: This would be purely for documentation -- the Rust code already validates the input data perfectly well.)
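As a rough illustration of the validation idea, the sketch below compares a CSV header against the field names in a Table Schema descriptor, which is enough to spot undocumented or missing columns. The schema content, the `check_header` helper and the sample CSV are all hypothetical; the real schemas are YAML files under `schemas/input/`, and a full validator such as frictionless would also check types and constraints.

```python
import csv
import io

# Hypothetical Table Schema descriptor, written inline as a dict.
# (Assumption: the real descriptors are YAML files in schemas/input/.)
SCHEMA = {
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "description", "type": "string"},
    ]
}


def check_header(csv_text: str, schema: dict) -> tuple[set[str], set[str]]:
    """Return (undocumented, missing) column names for a CSV file's header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    documented = {field["name"] for field in schema["fields"]}
    found = set(header)
    return found - documented, documented - found


# Made-up CSV with one column that the schema does not document.
sample = "id,description,extra\nGBR,United Kingdom,x\n"
undocumented, missing = check_header(sample, SCHEMA)
print("undocumented:", undocumented, "missing:", missing)
```

A check along these lines could run in CI against the example input files, so that a new column without a schema entry is reported rather than silently left undocumented.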

There wasn't a tool to produce documentation from table schemas already, so I knocked together a script. It's a bit rough around the edges -- in retrospect, I wish I'd used jinja with a template instead -- but I figure it's probably fine for now. We can reuse it with some tweaks when we come to documenting the output format (#529).

I haven't documented model.toml yet, but we could adopt a similar approach there. frictionless doesn't support TOML directly, but we could just use a JSON schema, which is more or less the same thing.

Closes #530.

Type of change

  • Bug fix (non-breaking change to fix an issue)
  • New feature (non-breaking change to add functionality)
  • Refactoring (non-breaking, non-functional change to improve maintainability)
  • Optimization (non-breaking change to speed up the code)
  • Breaking change (whatever its nature)
  • Documentation (improve or add documentation)

Key checklist

  • All tests pass: $ cargo test
  • The documentation builds and looks OK: $ cargo doc

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@codecov

codecov Bot commented Jun 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.79%. Comparing base (6432176) to head (58cec12).
Report is 86 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #588      +/-   ##
==========================================
- Coverage   89.47%   84.79%   -4.69%     
==========================================
  Files          37       37              
  Lines        3544     3301     -243     
  Branches     3544     3301     -243     
==========================================
- Hits         3171     2799     -372     
- Misses        179      313     +134     
+ Partials      194      189       -5     

@alexdewar alexdewar marked this pull request as ready for review June 2, 2025 10:43
@alexdewar alexdewar requested review from Copilot and tsmbland June 2, 2025 10:43
Contributor

Copilot AI left a comment

Pull Request Overview

This PR documents the input file format by generating markdown tables from YAML table schemas. Key changes include the addition of new YAML schema files for various input files, improvements to the documentation generation script, and updates to the GitHub action to integrate the new documentation process.

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| schemas/input/regions.yaml | Added basic region schema for defining regions. |
| schemas/input/processes.yaml | Added main processes schema with field definitions. |
| schemas/input/process_parameters.yaml | Added process parameters schema with details on process inputs. |
| schemas/input/process_flows.yaml | Added schema for commodity flows of each process. |
| schemas/input/process_availabilities.yaml | Added schema for process availabilities. |
| schemas/input/demand_slicing.yaml | Added schema for annual demand slicing details. |
| schemas/input/demand.yaml | Added schema for service demand commodity entries. |
| schemas/input/commodity_costs.yaml | Added schema for commodity cost definitions. |
| schemas/input/commodities.yaml | Added schema for commodities. |
| schemas/input/assets.yaml | Added schema for asset definitions. |
| schemas/input/agents.yaml | Added schema for agent definitions. |
| schemas/input/agent_search_space.yaml | Added schema for agent search space definitions. |
| schemas/input/agent_objectives.yaml | Added schema for agent objectives with decision rule details. |
| schemas/input/agent_cost_limits.yaml | Added schema for agent cost limits. |
| schemas/input/agent_commodity_portions.yaml | Added schema for commodity demand portions per agent. |
| docs/input_format.md | New documentation page for the input format. |
| docs/generate_input_format_doc.py | Script to auto-generate input format documentation from schemas. |
| docs/SUMMARY.md | Updated to include the Input Format documentation link. |
| doc-requirements.txt | Added table2md dependency requirement. |
| .github/actions/generate-docs/action.yml | Updated GitHub action to install deps and generate input docs. |

Comment thread schemas/input/processes.yaml Outdated
Comment thread docs/generate_input_format_doc.py Outdated
alexdewar and others added 3 commits June 2, 2025 11:45
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@tsmbland
Collaborator

tsmbland commented Jun 9, 2025

I managed to create input_format.md, but when I run mdbook serve I get the following error:

2025-06-09 08:59:25 [ERROR] (mdbook::utils): Error: Unable to read "Input Format" (C:\Users\tbland\Documents\Code\MUSE_2.0\docs\./input_format.md)
2025-06-09 08:59:25 [ERROR] (mdbook::utils):    Caused By: stream did not contain valid UTF-8

Maybe a Windows thing?

Also, we should either add this file to gitignore or commit it - what do you think?

Comment thread docs/generate_input_format_doc.py Outdated


if __name__ == "__main__":
    print(generate_markdown(), end="")
Collaborator

Suggested change
-    print(generate_markdown(), end="")
+    output_path = _DOCS_DIR / "input_format.md"
+    output_path.write_text(generate_markdown(), encoding="utf-8")

Any reason not to do this? (fixes the utf-8 problem for me on Windows, and simpler just to run python generate_input_format_doc.py)

Collaborator Author

Not really. I think that's a cleaner way of doing things.
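For reference, a minimal sketch of the pattern agreed on in this thread: write the generated text to a file with an explicit encoding instead of printing it and relying on the platform default. The `generate_markdown` function here is a stand-in stub, and the output goes to a temporary directory instead of the real `_DOCS_DIR`.

```python
import tempfile
from pathlib import Path


def generate_markdown() -> str:
    """Stand-in for the real generator; returns text with one non-ASCII character."""
    return "# Input format\n\nCaf\u00e9\n"


with tempfile.TemporaryDirectory() as tmp:
    # Assumption: a temporary directory replaces the real docs directory here.
    output_path = Path(tmp) / "input_format.md"
    # Explicit encoding, so the result does not depend on the OS default.
    output_path.write_text(generate_markdown(), encoding="utf-8")
    raw = output_path.read_bytes()

# The bytes on disk decode cleanly as UTF-8, whatever the platform default is.
text = raw.decode("utf-8")
```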

Comment thread docs/generate_input_format_doc.py Outdated
with path.open() as f:
    data = yaml.safe_load(f)

info = data["title"]
Collaborator

The "{title}. {description}" format reads really weirdly for most files, e.g.

"Commodity demand portions for agents. Portions of commodity demand for which agents are responsible."

I would say we don't need a title field for the files; just go straight to the description.

Collaborator

Actually, I would split this into "Description" and "Notes", as also suggested for the individual fields, and put the notes underneath the table
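One way this layout could look, as a sketch only: the description goes above the field table and the notes underneath it. The `notes` key, the `render_file_section` helper, the CSV file name and the sample text are assumptions for illustration, not taken from the actual script or schemas.

```python
def render_file_section(file_name: str, schema: dict, table: str) -> str:
    """Place the description above the table and any notes underneath it."""
    parts = [f"## `{file_name}`", schema["description"].strip(), table]
    # Assumption: notes live under a separate (hypothetical) "notes" key.
    notes = schema.get("notes", "").strip()
    if notes:
        parts.append(f"Notes:\n\n{notes}")
    return "\n\n".join(parts) + "\n"


# Hypothetical schema content and a placeholder table, for illustration only.
schema = {
    "description": "Portions of commodity demand for which agents are responsible.",
    "notes": "An illustrative note that would appear underneath the table.",
}
table = "| Field | Description | Notes |\n| --- | --- | --- |"
section = render_file_section("agent_commodity_portions.csv", schema, table)
print(section)
```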

Collaborator

@tsmbland tsmbland left a comment

I think we can refine this as we go along, but as a starting point this is awesome. Good job!

def fields2table(fields: list[dict[str, str]]) -> str:
    data = [
        {
            "Field": f"`{f['name']}`",
Collaborator

I would change the column titles to "Field", "Description" and "Notes"
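A minimal sketch of a table with those three columns, building the Markdown by hand. This is not the script's actual implementation (which depends on table2md); the per-field `notes` key and the sample fields are hypothetical.

```python
def fields2table(fields: list[dict[str, str]]) -> str:
    """Render schema fields as a Markdown table with Field/Description/Notes columns."""
    header = ["Field", "Description", "Notes"]
    rows = [
        # Assumption: each field may carry optional "description" and "notes" keys.
        [f"`{f['name']}`", f.get("description", ""), f.get("notes", "")]
        for f in fields
    ]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * len(header)) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


# Hypothetical fields, for illustration only.
example_fields = [
    {"name": "id", "description": "A unique identifier", "notes": "Must not be empty"},
    {"name": "description", "description": "A human-readable label"},
]
markdown_table = fields2table(example_fields)
print(markdown_table)
```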

Comment thread schemas/input/processes.yaml Outdated
description: |
  Defines processes for the system.

  Every SED (supply equals demand) commodity must have both producer and consumer processes for
Collaborator

@tsmbland tsmbland Jun 9, 2025

Comments like this are hard to know where to place, as they involve multiple tables. I would say this probably makes more sense in the commodities file, but I can see why you would include it here. Alternatively, we could have a separate file documenting global validation checks. What do you think?

Collaborator Author

That's a good point. I put it here kind of mindlessly because the relevant check is in with process-related code. I think we should probably just move it to commodities.yaml.

We might want to have somewhere to mention global checks. I'd like to turn the document into a jinja template at some point, which will make it a bit easier to add a preamble with info about global checks.

@alexdewar
Collaborator Author

> I managed to create input_format.md, but when I run mdbook serve I get the following error:
>
> 2025-06-09 08:59:25 [ERROR] (mdbook::utils): Error: Unable to read "Input Format" (C:\Users\tbland\Documents\Code\MUSE_2.0\docs\./input_format.md)
> 2025-06-09 08:59:25 [ERROR] (mdbook::utils):    Caused By: stream did not contain valid UTF-8
>
> Maybe a Windows thing?

Ah, yes. It's because the default file encoding for Python on Windows isn't UTF-8 for historical reasons (which is a constant source of annoyance). Maybe we should just write directly to the file, as you suggest.
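To make the failure mode concrete, the sketch below encodes a string with a legacy code page and then tries to read the bytes back as UTF-8. It assumes cp1252, a common default on Western-locale Windows installs; the encoding actually used in the failing run is not stated in this thread.

```python
text = "caf\u00e9"  # contains one non-ASCII character

# Assumption: cp1252 stands in for whatever legacy default was in effect.
legacy_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")

try:
    legacy_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    # This is the same class of problem that mdbook reports as
    # "stream did not contain valid UTF-8".
    decodes_as_utf8 = False

print(legacy_bytes, utf8_bytes, decodes_as_utf8)
```

Passing `encoding="utf-8"` wherever the script opens or writes a file avoids depending on the platform default.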

> Also, we should either add this file to gitignore or commit it - what do you think?

I'm thinking the gitignore route would be cleaner; otherwise the committed file will be perpetually out of date. I stuck a placeholder file in docs for it, as I did for the (also generated) command-line help, but maybe we should just have them both in gitignore instead. mdbook doesn't mind if the files don't exist.

@alexdewar alexdewar enabled auto-merge June 10, 2025 09:10
@alexdewar alexdewar disabled auto-merge June 10, 2025 09:11
@alexdewar alexdewar enabled auto-merge June 10, 2025 09:11
@alexdewar alexdewar merged commit bbef385 into main Jun 10, 2025
7 checks passed
@alexdewar alexdewar deleted the document-input-format branch June 10, 2025 09:17

Development

Successfully merging this pull request may close these issues.

Document the list of validation checks that are performed

3 participants