Fix random state for kmean clustering #313

koen-vg · 2022-02-10T15:11:19Z

When the k-means algorithm is used to cluster networks, the result is not deterministic by default. As a consequence, repeated runs of the simplify_network and cluster_network rules can, and usually do, produce different results. This makes results harder to reproduce from a pypsa-eur configuration file alone.

Changes proposed in this Pull Request

The fix is to supply a fixed random state to the k-means algorithm.
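To illustrate why a fixed random state helps, here is a minimal stdlib-only sketch of k-means. It is not the PyPSA or pypsa-eur code: `toy_kmeans`, its `seed` parameter and the sample points are hypothetical names made up for this example. The only random step is the choice of initial centers, so seeding that choice makes the whole clustering repeatable.

```python
import random


def toy_kmeans(points, k, seed=None, n_iter=20):
    """Toy k-means (Lloyd's algorithm) on 2-D points; returns one label per point.

    Hypothetical helper for illustration only, not part of PyPSA.
    """
    # With seed=None the generator is seeded from the OS, so the initial
    # centers (and possibly the final clusters) can change between runs.
    rng = random.Random(seed)
    centers = rng.sample(points, k)

    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2,
            )
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


# Fixed sample data, so that only the clustering seed matters below.
data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(60)]

# Same seed, same data: the two runs give identical labels.
run_a = toy_kmeans(points, k=4, seed=0)
run_b = toy_kmeans(points, k=4, seed=0)
print(run_a == run_b)
```

Without a seed, two runs may or may not agree, which is the reproducibility problem described above.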

One could consider doing this in PyPSA itself, in the busmap_by_kmeans functions in networkclustering.py. That's an equally valid option, but maybe a bit opinionated.
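For illustration, a default seed does not have to prevent callers from choosing their own. The sketch below assumes the clustering function accepts a `random_state` keyword argument; `cluster_with_default_seed` and `fake_cluster` are hypothetical stand-ins, not PyPSA functions.

```python
def cluster_with_default_seed(cluster_fn, *args, **algorithm_kwds):
    """Call cluster_fn with random_state=0 unless the caller passed one.

    Hypothetical wrapper for illustration only.
    """
    # setdefault only fills in the seed when the caller did not provide it.
    algorithm_kwds.setdefault("random_state", 0)
    return cluster_fn(*args, **algorithm_kwds)


def fake_cluster(n_clusters, **kwds):
    """Stand-in for a real clustering call; just echoes the keywords it received."""
    return kwds


print(cluster_with_default_seed(fake_cluster, 5))
print(cluster_with_default_seed(fake_cluster, 5, random_state=123))
```

The first call receives the default seed, while the second keeps the seed chosen by the caller.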

I have not rigorously checked whether there are other sources of randomness in the network building process, but I don't think there are.

Checklist

I tested my contribution locally and it seems to work fine.
N/A Code and workflow changes are sufficiently documented.
N/A Newly introduced dependencies are added to envs/environment.yaml and envs/environment.docs.yaml.
N/A Changes in configuration options are added in all of config.default.yaml, config.tutorial.yaml, and test/config.test1.yaml.
N/A Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
A note for the release notes doc/release_notes.rst is amended in the format of previous release notes.


fneum · 2022-02-10T15:27:36Z

That's a good idea!


ci: reduce duplications of test/config.*.yaml

koen-vg added 2 commits February 10, 2022 15:57

Document the k-means random state fix

a2d3edd

fneum approved these changes Feb 10, 2022


fneum enabled auto-merge February 10, 2022 15:27

fneum merged commit 402f2cd into PyPSA:master Feb 10, 2022

fneum pushed a commit that referenced this pull request Mar 6, 2023

Merge pull request #313 from PyPSA/ci-config

f21f0ea

ci: reduce duplications of test/config.*.yaml

koen-vg mentioned this pull request May 9, 2024

Clustering problem seemingly caused by addition of new TYNDP projects #996

Closed

2 tasks
