Fix random state for kmean clustering #313

Merged: 2 commits, Feb 10, 2022

Conversation

koen-vg
Contributor

koen-vg commented Feb 10, 2022

K-means clustering of networks is not deterministic by default. As a result, repeated runs of the `simplify_network` and `cluster_network` rules can, and usually do, produce different results. This makes results hard to reproduce from a pypsa-eur configuration file alone.

Changes proposed in this Pull Request

The fix is to supply a fixed random state to the k-means algorithm.
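To illustrate why a fixed random state helps, here is a minimal toy sketch in plain Python. It is not the code changed in this pull request: `kmeans_1d` and its `seed` argument are made up for illustration, with `seed` standing in for the `random_state` parameter that scikit-learn's `KMeans` accepts.

```python
import random

def kmeans_1d(points, k, seed=None, n_iter=20):
    """Toy 1-D k-means; `seed` plays the role of a fixed random state."""
    rng = random.Random(seed)        # seed=None gives a different stream each run
    centers = rng.sample(points, k)  # random initialisation is the source of non-determinism
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 9.8, 9.9, 10.0]
run_a = kmeans_1d(points, k=3, seed=0)
run_b = kmeans_1d(points, k=3, seed=0)
assert run_a == run_b  # identical seed, identical clustering
```

With `seed=None` the initial centers differ between runs, so the final labels may differ as well.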

One could also consider doing this in PyPSA itself, in the `busmap_by_kmeans` function in `networkclustering.py`. That is an equally valid option, but perhaps a bit opinionated.
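One way to pin the seed on the calling side while leaving the library neutral is to set it only as a default in the keyword arguments that get passed through. The sketch below shows that pattern; `cluster_with_defaults` and `fake_cluster_fn` are hypothetical names, not functions from PyPSA or pypsa-eur.

```python
def cluster_with_defaults(cluster_fn, n_clusters, **algorithm_kwds):
    """Hypothetical wrapper: pin the seed unless the caller chose one."""
    # setdefault leaves an explicitly passed random_state untouched.
    algorithm_kwds.setdefault("random_state", 0)
    return cluster_fn(n_clusters, **algorithm_kwds)

def fake_cluster_fn(n_clusters, **kwds):
    # Stand-in for a real clustering routine; it just echoes its arguments.
    return {"n_clusters": n_clusters, **kwds}

default_call = cluster_with_defaults(fake_cluster_fn, 37)
override_call = cluster_with_defaults(fake_cluster_fn, 37, random_state=42)
```

Callers who want a different seed can still pass their own `random_state`, so the default does not take the choice away from them.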

I have not rigorously checked whether there are other sources of randomness in the network building process, but I don't think there are.
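A simple way to look for remaining randomness is to run a step several times and compare fingerprints of its output. The following is only a sketch of that idea: `seeded_build` and `unseeded_build` are hypothetical stand-ins for a workflow step, not part of pypsa-eur.

```python
import hashlib
import json
import random

def fingerprint(result):
    """Stable hash of a JSON-serialisable result."""
    blob = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def is_reproducible(build, n_runs=3):
    """Return True if `build()` yields the same fingerprint on every run."""
    return len({fingerprint(build()) for _ in range(n_runs)}) == 1

def seeded_build():
    # Hypothetical stand-in for a workflow step with a fixed seed.
    rng = random.Random(0)
    return [rng.random() for _ in range(5)]

def unseeded_build():
    # Same step without a seed; draws differ between calls.
    return [random.random() for _ in range(5)]

seeded_ok = is_reproducible(seeded_build)
unseeded_ok = is_reproducible(unseeded_build)
```

For real network files, the same comparison could be made on a checksum of the written output, assuming the file format itself does not embed timestamps or similar run-specific data.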

Checklist

  • I tested my contribution locally and it seems to work fine.
  • N/A Code and workflow changes are sufficiently documented.
  • N/A Newly introduced dependencies are added to envs/environment.yaml and envs/environment.docs.yaml.
  • N/A Changes in configuration options are added in all of config.default.yaml, config.tutorial.yaml, and test/config.test1.yaml.
  • N/A Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes.

When the kmeans algorithm is used to cluster networks, this is not
deterministic by default. The result is that repeated runs of the
`simplify_network` and `cluster_network` rules can and usually do
produce different results that vary somewhat randomly. This makes
results less reproducible when given only a pypsa-eur configuration
file.

The fix is to supply a fixed random state to the k-means algorithm.
@fneum
Member

fneum commented Feb 10, 2022

That's a good idea!

fneum enabled auto-merge February 10, 2022 15:27
fneum merged commit 402f2cd into PyPSA:master Feb 10, 2022
pz-max referenced this pull request in pypsa-meets-earth/pypsa-earth Feb 12, 2022
fneum pushed a commit that referenced this pull request Mar 6, 2023
ci: reduce duplications of test/config.*.yaml