
iter_morphology output is unsorted #59

Closed · arnaudon opened this issue Mar 17, 2021 · 11 comments

@arnaudon
Contributor

The ordering of morphologies returned by this function is sometimes not reproducible:

https://github.com/BlueBrain/morph-tool/blob/master/morph_tool/utils.py#L16

I think this is because of pathlib's globbing; maybe we can add a sorted() somewhere to make the order deterministic?

@asanin-epfl @adrien-berchet, what do you think?
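
Something like this, roughly (just a sketch; the actual signature and extension handling in morph_tool/utils.py may differ):

```python
from pathlib import Path

# Rough sketch of the proposed change, not the actual morph_tool implementation:
# collect the morphology files and return them in sorted order so that the
# iteration is deterministic across runs and filesystems.
def iter_morphology_files(folder, extensions=('.swc', '.asc', '.h5')):
    files = (p for p in Path(folder).iterdir() if p.suffix.lower() in extensions)
    return iter(sorted(files))
```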

@adrien-berchet
Member

Yes, sorting is probably the best.

@asanin-epfl
Contributor

As an alternative: why should it guarantee any order at all? If users need a specific order, they can sort on their side. What do you think?

@wizmer
Contributor

wizmer commented Mar 17, 2021

I agree with this, yes.

@arnaudon
Contributor Author

It was randomly breaking a test in NeuroR that hardcoded the expected order of morphologies, see BlueBrain/NeuroR#62.
That was the origin of this issue, and I suspect it may happen again. So you think it is better to sort in all the tests that use this function, rather than just sorting its output?

@asanin-epfl
Contributor

I'd say sort its output where it is needed (i.e. in those failing tests).

@arnaudon
Contributor Author

What is the good reason for going through the pain of updating all the tests to make sure they won't randomly fail in a future PR, rather than making a small change here?

@wizmer
Contributor

wizmer commented Mar 17, 2021

By sorting, you are complicating the function and decreasing its performance. You would also need to update the docstring to make the expected ordering explicit. It is better to keep functions simple, with minimal functionality.
Moreover, if a test fails because it expects a specific ordering, the test is not testing what it should. I don't know what the issue is in this case, but you can probably fix it by changing a [] to a {} somewhere in the expected result.
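
For example, something along these lines (test folder and file names invented for illustration) would make the check independent of iteration order:

```python
from morph_tool.utils import iter_morphology_files

# Hypothetical test snippet: compare the set of file names, so the assertion
# no longer depends on the order in which the files are yielded.
found = {path.name for path in iter_morphology_files('test_data')}
assert found == {'neuron_a.swc', 'neuron_b.asc'}
```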

@arnaudon
Contributor Author

So the simplest option is to not use that function in the NeuroR tests, and just use a custom sorted one.

  • What change in performance does sorted() induce?
  • Sure, the docstring would need updating.
  • This function is already so minimal that it is nearly useless, and it seems to only break reproducibility.
  • The test compares against {morphs[0]: result_morph0, morphs[1]: result_morph1}, where morphs is the output of this non-deterministic function, so the key/value pairs are random and the assert fails randomly.

@wizmer
Contributor

wizmer commented Mar 17, 2021

> What change in performance does sorted() induce?

The change is O(1) to O(N log N) just to get the first file. It is massive, like really massive. Try doing next(iter_morphology_files(folder)) versus next(iter(sorted(iter_morphology_files(folder)))) on a folder containing a large number of morphologies on GPFS: you could easily go from ~10 ms to 10 s.
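
A sketch of where the cost comes from (the folder path here is just an assumption):

```python
from morph_tool.utils import iter_morphology_files

folder = 'morphologies'  # assumed: a directory containing many morphology files

# The generator is lazy: taking the first element only touches the first match.
first_lazy = next(iter_morphology_files(folder))

# sorted() has to exhaust the generator and compare every path before anything
# is returned, so even taking a single file forces a full directory scan.
first_sorted = next(iter(sorted(iter_morphology_files(folder))))
```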

> This function is already so minimal that it is nearly useless, and it seems to only break reproducibility.

I must be using it in 20 places, lol. I wouldn't call it useless! And I have no need for the sorting.

> The test compares against {morphs[0]: result_morph0, morphs[1]: result_morph1}, where morphs is the output of this non-deterministic function, so the key/value pairs are random and the assert fails randomly.

Have you tried using assert_dict_equal from nose? That should be invariant to the ordering. Or dictdiffer?
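
Roughly like this (made-up dicts, just to show the calls):

```python
from nose.tools import assert_dict_equal
import dictdiffer

# Made-up result dicts; both checks compare contents regardless of key order
# (plain dict equality in Python is already order-independent).
expected = {'simple.swc': 3, 'simple.asc': 5}
actual = {'simple.asc': 5, 'simple.swc': 3}

assert_dict_equal(actual, expected)                  # unittest-style diff on failure
assert not list(dictdiffer.diff(expected, actual))   # empty diff means the dicts match
```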

wizmer changed the title from "iter_morphology non reproducible" to "iter_morphology output is unsorted" on Mar 17, 2021
@arnaudon
Contributor Author

Ah OK, indeed, that is massive; I didn't realize. The issue is not about comparing unsorted dicts, it is that the dict constructed as the expected result has the wrong key/value pairs when the keys are shuffled. I'll open an issue in NeuroR to fix all the tests then.
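
For the record, the kind of fix I have in mind on the NeuroR side (file names, folder and values are hypothetical):

```python
from morph_tool.utils import iter_morphology_files

def process(path):
    """Stand-in for whatever NeuroR actually computes per morphology."""
    return path.suffix

# Key both the expected and the actual results by file name, so the key/value
# pairing no longer depends on the (unordered) iteration order.
expected = {'simple.swc': '.swc', 'simple.asc': '.asc'}
actual = {path.name: process(path) for path in iter_morphology_files('test_data')}
assert actual == expected
```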

@adrien-berchet
Member

I guess we can close this issue?
