Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make final dendrograms immutable #25

Open
astrofrog opened this issue Jun 19, 2013 · 12 comments
Open

Make final dendrograms immutable #25

astrofrog opened this issue Jun 19, 2013 · 12 comments
Milestone

Comments

@astrofrog
Copy link
Contributor

As discussed in #20, part of the infrastructure in the Structure class (e.g. the tree index) relies on the dendrogram being immutable after the computation. One of the suggestions we mentioned there was to split Structure into two classes - one representing mutable structures that are used for the initial computation, and one that is used for the final structure, which is efficient but immutable.

One question that I'm wondering now is that if we do this, the tree can no longer be pruned by the user after the fact, so is that what we want?

If we decide we want an alternative that leaves the possibility of having a flexible tree after the computation, we can use the hash of the index map to check whether the tree has changed - Numpy arrays can be hashed, so this is trivial. If the hash of the index map changes, all cached variables (including the tree index) would get reset.

@low-sky low-sky closed this as completed Jun 19, 2013
@low-sky low-sky reopened this Jun 19, 2013
@low-sky
Copy link
Contributor

low-sky commented Jun 19, 2013

Ooops. wrong button. I think a post-computation prune of the tree will often be desired. It's probably more efficient to decimate a tree than to recompute it using new parameters. While this probably isn't needed at high priority right now, it could be a medium-term goal to offer functionality that moves one immutable tree and associated index map through a set of pruning options and produces a different tree.

@astrofrog
Copy link
Contributor Author

@low-sky - that's a good point - if we provide a function to wrap the pruning, then we can control the mutability/immutability.

@ChrisBeaumont
Copy link
Contributor

Ah, the pruning issue is a good point.

For speed reasons, I'm not sure we want to rely on a hash of the index map -- this potentially involves a scan of the entire dataset every time a Structure property is accessed.

Likewise, some profiling will determine whether it's faster to prune by creating a new set of structures, or by modifying the old ones in place. It isn't obvious to me which one will be faster, so we shouldn't rule out either option yet

@astrofrog
Copy link
Contributor Author

@ChrisBeaumont - so for now,do you suggest that we should not go ahead and split the structures, or go ahead and then investigate how best to prune the tree?

@ChrisBeaumont
Copy link
Contributor

The pruning question is more important, for sure.

@astrofrog, do you have time to put together some pseudocode of what pruning might look like? I suspect it will be conceptually easier to generate a new tree, rather than traversing the current one and modifying parent/child references throughout. I could be wrong about this.

That code will help me understand if creating a new tree from a prune is dramatically inefficient or complicated. Unless it is, I think splitting the sturctures into mutable/immutable classes is fine.

If we decide to split the structure classes, we can put off actually implementing pruning until that refactoring is done.

@astrofrog
Copy link
Contributor Author

@ChrisBeaumont - here's a simple example, but I think it's not yet correct, because it would need to make sure that merge_level is set to the right value, and I'm not sure how you can figure that out because it's not clear whether if 3 leaves end up being merged into a structure, which two would have merged first, without recomputing the dendrogram. Or am I missing something?

import numpy as np

import matplotlib.pyplot as plt
from astrodendro import Dendrogram
from astropy.io import fits

d = Dendrogram.compute(fits.getdata("MSX_E.fits"), verbose=True, min_data_value=1.e-5, min_delta=1.e-5)

while True:
    m = 0
    for structure in d.leaves:
        if structure.get_npix() < 20:
            m += 1
            if structure.parent is not None:
                structure.parent.children.remove(structure)
                structure.parent._merge(structure)
                d.index_map[d.index_map == structure.idx] = structure.parent.idx
            else:
                d.index_map[d.index_map == structure.idx] = 0
            d.nodes_dict.pop(structure.idx)
    print("Merged {0} leaves".format(m))
    if m == 0:
        break

I'm actually not sure I quite get the point of merge_level - to me it should actually be a property of the child structure, not the parent structure - i.e. if 3 leaves merge to form a branch, then different leaves merged at different points - isn't it more interesting to find out which leaf merged when rather than which two structures merged first in the parent? I think I'm generally confused about merge_level and why that should be used to find the height of a leaf, especially if that leaf did not merge at that level.

Of course, we could just go ahead with the structure split, make the tree immutable and then if we find that pruning is needed and we can figure it out, then we can transform the tree back to a mutable tree, prune it, and make it immutable again?

I guess I also don't get why one would prune rather than just compute the tree with the right settings to start with since we can prune-as-you-go (:tm:)?

@ChrisBeaumont
Copy link
Contributor

Do we even need merge_level? It's currently only used within Structure.height.

In fact, I don't understand what height describes -- it isn't useful for drawing purposes, since it's sensitive to noise spikes. For drawing purposes, I think a better definition for height is min(c.vmin for c in self.children) if self.children else self.vmax.

Likewise, a useful definition for the bottom of a structure for drawing purposes is self.parent.height if self.parent is not None else self.vmin

I'm fine if we want to eliminate merge_level

@astrofrog
Copy link
Contributor Author

Sounds good to me! I was equally confused about the need for merge_level, so I think we can just remove it.

Do you think we should go ahead and try and split the Structure classes?

@ChrisBeaumont
Copy link
Contributor

I'll take a pass at it, and show you the code.

I think pruning won't be a performance concern, since prune-as-you-go ( TM ) and make-a-new-tree ( TM ) should both be fast-ish (minutes, if the data fits into RAM). We can come back to this if someone decides interactive, real-time post-pruning is crucial

@astrofrog
Copy link
Contributor Author

Ok, that sounds like a good plan. Feel free to attach the code to this issue.

@ChrisBeaumont ChrisBeaumont mentioned this issue Jun 21, 2013
5 tasks
@astrofrog
Copy link
Contributor Author

@ChrisBeaumont - what do you think we should do about this? (making dendrograms immutable). Shall we try and get that into 0.1, or delay until 0.2?

@ChrisBeaumont
Copy link
Contributor

Yes, this can wait
On Jul 12, 2013 6:08 AM, "Thomas Robitaille" notifications@github.com
wrote:

@ChrisBeaumont https://github.com/ChrisBeaumont - what do you think we
should do about this? (making dendrograms immutable). Shall we try and get
that into 0.1, or delay until 0.2?


Reply to this email directly or view it on GitHubhttps://github.com//issues/25#issuecomment-20868481
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants