Ensure that results from repeated runs are deterministic #80

Merged 3 commits on Sep 23, 2013

astrofrog
Contributor

Here's the output of a simple script:

from astrodendro import Dendrogram
from astropy.io import fits
image = fits.getdata('L1448_13CO.fits')
d = Dendrogram.compute(image, min_value=1.2, min_delta=0.2, min_npix=10, verbose=True)
print(list(d.all_structures)[0].indices())

The output is not always the same, which is going to cause problems if users try to reproduce bugs. Structure IDs may also not always refer to the same structures from one run to the next. Dictionaries are used in several places, so their arbitrary iteration order must be what is causing the differences. Here are two consecutive runs:

mac-robitaille2:dendro-core tom$ python test.py 
Generating dendrogram using 78,465 of 6,637,050 pixels (1.1822270436413769% of data)
[========================================>] 100%
(array([223, 222, 224, 217, 221, 217, 217, 221, 222, 222, 221, 219, 223,
       220, 223, 219, 222, 220, 220, 220, 223, 221, 222, 218, 217, 222,
       222, 220, 222, 220, 221, 219, 221, 221, 221, 221, 220, 221, 220,
       224, 219, 221, 221, 221, 222, 222, 220, 222, 220, 218, 218, 219,
       219, 218, 221, 221, 223, 223, 221, 219, 220, 224, 224, 225, 221,
       223, 223, 219, 221, 220, 221, 218, 220, 223, 224, 216, 219, 220, 216]), array([44, 44, 45, 44, 42, 44, 44, 42, 46, 43, 42, 43, 44, 45, 44, 43, 45,
       43, 43, 43, 44, 45, 44, 43, 43, 45, 45, 44, 45, 44, 43, 43, 43, 43,
       43, 43, 44, 43, 44, 45, 43, 45, 45, 45, 44, 44, 45, 45, 43, 44, 44,
       44, 44, 44, 44, 46, 45, 45, 44, 44, 42, 44, 44, 46, 44, 45, 43, 44,
       44, 42, 44, 44, 43, 46, 46, 43, 45, 43, 43]), array([81, 79, 80, 78, 85, 79, 80, 84, 80, 81, 83, 81, 80, 79, 79, 80, 78,
       84, 82, 81, 82, 82, 81, 82, 79, 80, 81, 81, 82, 80, 84, 83, 83, 82,
       81, 80, 79, 79, 78, 79, 82, 78, 79, 81, 82, 80, 81, 79, 80, 81, 80,
       81, 82, 79, 80, 79, 79, 80, 79, 79, 82, 79, 81, 79, 81, 81, 82, 80,
       82, 84, 78, 78, 79, 80, 79, 78, 79, 78, 79]))
mac-robitaille2:dendro-core tom$ python test.py 
Generating dendrogram using 78,465 of 6,637,050 pixels (1.1822270436413769% of data)
[========================================>] 100%
(array([156, 156, 154, 156, 156, 154, 156, 154, 154, 156, 155, 153, 154,
       155, 154, 156, 155, 154, 153, 157, 158, 158, 154, 155, 153, 154,
       156, 155, 155, 154, 156, 154, 153, 156, 153, 154, 153, 154, 155,
       155, 155, 155, 153, 155, 153, 155, 155, 155, 156, 153, 158, 158,
       155, 155, 155, 155, 153, 155, 156, 156, 157, 154, 157, 154, 154,
       154, 155, 155, 155, 155, 155, 154, 154, 154, 153, 152, 155, 154,
       155, 154, 155, 158, 158, 156, 156, 155, 157, 157, 156, 155, 157,
       157, 156, 155, 154, 155, 155, 154, 154, 155, 158, 154]), array([75, 78, 78, 78, 73, 77, 75, 80, 77, 75, 78, 73, 78, 76, 73, 73, 81,
       77, 79, 74, 75, 76, 77, 76, 76, 78, 74, 78, 81, 76, 80, 78, 78, 74,
       79, 78, 78, 77, 78, 77, 77, 72, 77, 72, 76, 77, 78, 78, 81, 76, 76,
       76, 76, 75, 74, 79, 77, 79, 76, 79, 75, 76, 73, 79, 79, 79, 75, 75,
       75, 79, 75, 79, 79, 80, 77, 78, 80, 74, 73, 79, 80, 76, 76, 76, 77,
       73, 76, 76, 77, 73, 78, 76, 77, 74, 75, 74, 76, 75, 75, 76, 75, 78]), array([37, 34, 37, 33, 37, 36, 36, 34, 35, 34, 32, 37, 33, 34, 37, 38, 35,
       33, 35, 37, 37, 37, 34, 39, 35, 39, 37, 35, 34, 37, 34, 35, 36, 39,
       33, 36, 35, 39, 36, 34, 35, 39, 35, 40, 37, 39, 34, 33, 34, 36, 38,
       39, 38, 34, 36, 36, 34, 37, 37, 33, 37, 35, 37, 34, 35, 36, 38, 37,
       36, 38, 35, 37, 38, 35, 36, 36, 34, 37, 39, 33, 35, 40, 41, 36, 33,
       36, 37, 38, 34, 38, 33, 41, 35, 39, 38, 37, 36, 37, 36, 33, 41, 34]))
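
A minimal sketch of the suspected cause, assuming the structures are ordinary objects that rely on Python's default identity-based hash. `Node` and its `idx` attribute are hypothetical stand-ins, not astrodendro's actual classes.

```python
class Node(object):
    """Hypothetical stand-in for a dendrogram structure."""

    def __init__(self, idx):
        self.idx = idx


nodes = [Node(i) for i in range(8)]

# No __hash__ is defined, so hash(node) is derived from the object's
# identity rather than from idx. In CPython that identity is a memory
# address, which can change from one process to the next.
hashes = [hash(n) for n in nodes]

# Set iteration follows the hash values, so this order is not guaranteed
# to be the same on every run, even though the contents are identical.
seen_order = [n.idx for n in set(nodes)]
print(seen_order)
```

The same elements always come back, only their order is unspecified, which would be enough to change which structure ends up first in a listing.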

@astrofrog
Contributor Author

One way to achieve this is to use ordered dictionaries. Unfortunately, collections.OrderedDict was only added in Python 2.7, so it is not available in Python 2.6.
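
For reference, a short sketch of what the ordered-dictionary approach looks like on Python 2.7 or later. The keys and values are made up for illustration and are not taken from astrodendro.

```python
from collections import OrderedDict  # available from Python 2.7 / 3.1

# Hypothetical mapping from structure index to a label.
structures = OrderedDict()
for idx in (3, 1, 2):
    structures[idx] = "structure-%d" % idx

# Iteration follows insertion order, independent of hash values.
keys_in_order = list(structures)
print(keys_in_order)  # [3, 1, 2]
```

Since Python 3.7 a plain `dict` also preserves insertion order, but a `set` still does not.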

@astrofrog
Contributor Author

The attached code does the trick.

@ChrisBeaumont - what do you think?

@ChrisBeaumont
Contributor

Maybe rename sorted_by_idx to _sorted_by_idx? Otherwise, looks good to me.

@astrofrog
Contributor Author

Done - I need to try and figure out if and how to add a test for this.

@astrofrog
Contributor Author

I think a test is going to be hard, precisely because it's non-deterministic, so let's forget about a test for now.
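
One possible shape for such a test, offered only as a sketch: run the same snippet in several child interpreters and check that the output is identical. `SNIPPET` is a hypothetical placeholder for the real dendrogram script. Varying `PYTHONHASHSEED` only perturbs string and bytes hashes, so this would not necessarily exercise ordering that depends on object identity.

```python
import os
import subprocess
import sys

# Hypothetical placeholder for the script whose output must be stable.
SNIPPET = "print(sorted(set(['a', 'b', 'c', 'd'])))"


def run_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    return subprocess.check_output([sys.executable, "-c", SNIPPET], env=env)


# Collect the distinct outputs; a deterministic script yields exactly one.
outputs = set(run_with_seed(seed) for seed in (0, 1, 2))
deterministic = len(outputs) == 1
print(deterministic)
```

A passing run does not prove determinism, it only fails to find a difference across the seeds tried.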

@keflavich
Contributor

As an aside - ordered dictionaries are included in astropy.extern. Since astrodendro requires astropy anyway, you could use their ordereddict structure.

@astrofrog
Contributor Author

I tried that and there was a reason why it didn't work but I can't remember...

@@ -191,7 +196,7 @@ def next_idx():
 adjacent = [structures[a].ancestor for a in adjacent]

 # Remove duplicates
-adjacent = list(set(adjacent))
+adjacent = _sorted_by_idx(list(set(adjacent)))
Contributor

I don't think the list() call is needed.
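
To illustrate the point, here is a sketch under the assumption that `_sorted_by_idx` is a thin wrapper around `sorted()` keyed on an integer `idx` attribute. The PR's actual helper is not shown in this thread, so both the implementation and the `Structure` class below are hypothetical.

```python
from operator import attrgetter


class Structure(object):
    """Hypothetical stand-in; only the idx attribute matters here."""

    def __init__(self, idx):
        self.idx = idx


def _sorted_by_idx(structures):
    # Assumed implementation: order by the integer idx attribute.
    # sorted() accepts any iterable, so a set can be passed directly.
    return sorted(structures, key=attrgetter("idx"))


a, b, c = Structure(2), Structure(0), Structure(1)
adjacent = [a, b, a, c, b]  # contains duplicates

with_list = _sorted_by_idx(list(set(adjacent)))
without_list = _sorted_by_idx(set(adjacent))
sorted_ids = [s.idx for s in without_list]
print(sorted_ids)  # [0, 1, 2]
```

Under that assumption both calls return the same list, so the intermediate `list()` only adds a copy.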

@astrofrog
Contributor Author

@ChrisBeaumont - fixed the unnecessary list()

@ChrisBeaumont
Contributor

Ready to merge

astrofrog added a commit that referenced this pull request Sep 23, 2013
Ensure that results from repeated runs are deterministic
astrofrog merged commit 6ff8a3a into dendrograms:master Sep 23, 2013

3 participants