
iter_morphology output is unsorted #59

Closed · arnaudon opened this issue Mar 17, 2021 · 11 comments

@arnaudon
Contributor

The ordering of morphologies returned by this function is sometimes not reproducible:

https://github.com/BlueBrain/morph-tool/blob/master/morph_tool/utils.py#L16

I think this is because of pathlib's globbing; maybe we can add a sorted() somewhere to make the order deterministic?

@asanin-epfl @adrien-berchet, what do you think?
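
Something like this, roughly (just a sketch; the actual signature and extension handling in morph_tool/utils.py may differ):

```python
from pathlib import Path

# Rough sketch of the proposed change, not the actual morph_tool implementation:
# collect the morphology files and return them in sorted order so that the
# iteration is deterministic across runs and filesystems.
def iter_morphology_files(folder, extensions=('.swc', '.asc', '.h5')):
    files = (p for p in Path(folder).iterdir() if p.suffix.lower() in extensions)
    return iter(sorted(files))
```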

@adrien-berchet
Member

Yes, sorting is probably the best.

@asanin-epfl
Contributor

As an alternative: why should it guarantee any order at all? If users need a specific order, they can sort on their side. What do you think?

@wizmer
Contributor

wizmer commented Mar 17, 2021

I agree with this, yes.

@arnaudon
Contributor Author

It was randomly breaking a test in NeuroR that hardcoded the expected order of morphologies, see BlueBrain/NeuroR#62.
That was the origin of this issue, and I suspect it may happen again. So you think it is better to sort in all the tests that use this function, rather than just sorting its output?

@asanin-epfl
Contributor

I'd say sort its output where it is needed (i.e. in those failing tests).

@arnaudon
Contributor Author

What is the good reason for going through the pain of updating all the tests to make sure they won't randomly fail in a future PR, rather than making a small change here?

@wizmer
Contributor

wizmer commented Mar 17, 2021

By sorting, you are complicating the function and decreasing its performance. You would also need to update the docstring to make the expected ordering explicit. It is better to keep functions simple, with minimal functionality.
Moreover, if a test fails because it expects a specific ordering, the test is not testing what it should. I don't know what the issue is in this case, but you can probably fix it by changing a [] to a {} somewhere in the expected result.
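
For example, something along these lines (test folder and file names invented for illustration) would make the check independent of iteration order:

```python
from morph_tool.utils import iter_morphology_files

# Hypothetical test snippet: compare the set of file names, so the assertion
# no longer depends on the order in which the files are yielded.
found = {path.name for path in iter_morphology_files('test_data')}
assert found == {'neuron_a.swc', 'neuron_b.asc'}
```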

@arnaudon
Contributor Author

So the simplest option is to not use that function in the NeuroR tests, and just use a custom sorted one.

  • What change in performance does sorted() induce?
  • Sure, the docstring would need updating.
  • This function is already so minimal that it is nearly useless, and it seems to only break reproducibility.
  • The test compares against {morphs[0]: result_morph0, morphs[1]: result_morph1}, where morphs is the output of this non-deterministic function, so the key/value pairs are random and the assert fails randomly.

@wizmer
Contributor

wizmer commented Mar 17, 2021

> What change in performance does sorted() induce?

The change is O(1) to O(N log N) just to get the first file. It is massive, like really massive. Try doing next(iter_morphology_files(folder)) versus next(iter(sorted(iter_morphology_files(folder)))) on a folder containing a large number of morphologies on GPFS: you could easily go from ~10 ms to 10 s.
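
A sketch of where the cost comes from (the folder path here is just an assumption):

```python
from morph_tool.utils import iter_morphology_files

folder = 'morphologies'  # assumed: a directory containing many morphology files

# The generator is lazy: taking the first element only touches the first match.
first_lazy = next(iter_morphology_files(folder))

# sorted() has to exhaust the generator and compare every path before anything
# is returned, so even taking a single file forces a full directory scan.
first_sorted = next(iter(sorted(iter_morphology_files(folder))))
```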

> This function is already so minimal that it is nearly useless, and it seems to only break reproducibility.

I must be using it in 20 places, lol. I wouldn't call it useless! And I have no need for the sorting.

> The test compares against {morphs[0]: result_morph0, morphs[1]: result_morph1}, where morphs is the output of this non-deterministic function, so the key/value pairs are random and the assert fails randomly.

Have you tried using assert_dict_equal from nose? That should be invariant to the ordering. Or dictdiffer?
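
Roughly like this (made-up dicts, just to show the calls):

```python
from nose.tools import assert_dict_equal
import dictdiffer

# Made-up result dicts; both checks compare contents regardless of key order
# (plain dict equality in Python is already order-independent).
expected = {'simple.swc': 3, 'simple.asc': 5}
actual = {'simple.asc': 5, 'simple.swc': 3}

assert_dict_equal(actual, expected)                  # unittest-style diff on failure
assert not list(dictdiffer.diff(expected, actual))   # empty diff means the dicts match
```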

wizmer changed the title from "iter_morphology non reproducible" to "iter_morphology output is unsorted" on Mar 17, 2021
@arnaudon
Contributor Author

Ah OK, indeed, that is massive; I didn't realize. The issue is not about comparing unsorted dicts, it is that the dict constructed as the expected result has the wrong key/value pairs when the keys are shuffled. I'll open an issue in NeuroR to fix all the tests then.
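
For the record, the kind of fix I have in mind on the NeuroR side (file names, folder and values are hypothetical):

```python
from morph_tool.utils import iter_morphology_files

def process(path):
    """Stand-in for whatever NeuroR actually computes per morphology."""
    return path.suffix

# Key both the expected and the actual results by file name, so the key/value
# pairing no longer depends on the (unordered) iteration order.
expected = {'simple.swc': '.swc', 'simple.asc': '.asc'}
actual = {path.name: process(path) for path in iter_morphology_files('test_data')}
assert actual == expected
```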

@adrien-berchet
Member

I guess we can close this issue?
