Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Largest connected set returns a state when it should return an empty set #175

Closed
MaaikeG opened this issue Dec 9, 2021 · 10 comments
Closed

Comments

@MaaikeG
Copy link
Collaborator

MaaikeG commented Dec 9, 2021

The following should print the empty set [] but instead prints [3].

import numpy as np
from deeptime.markov import TransitionCountEstimator

estimator = TransitionCountEstimator(lagtime=1, count_mode='sliding')

full_counts_model = estimator.fit_fetch(np.asarray([0, 1, 2, 3]))
submodel = full_counts_model.submodel_largest(
        connectivity_threshold=1,
        directed=True)
print(submodel.state_symbols)
@clonker
Copy link
Member

clonker commented Dec 9, 2021

Yeah I think you are right. This is an error in the connectivity_threshold implementation which is not respected for self-connectivity (i.e. loops k -> k for state k). What do you think @thempel ?

@thempel
Copy link
Collaborator

thempel commented Dec 9, 2021

Well... so one could say that the set of states consists of 4 independent, disconnected sets ([0], [1], [2], [3]). Since they all have the same size, submodel_largest will return a random one, in this case [3]. It's obviously a bit useless to have a 1-state submodel though, so maybe one should print some warning?

Some thoughts: I could imagine that returning empty sets may break something downstream. It's not technically wrong to have a 1-state model, so I wouldn't necessarily raise an error.

@clonker
Copy link
Member

clonker commented Dec 9, 2021

I would agree except for the connectivity threshold - if it is set to a value that is greater than what can be found on the count matrix diagonal then the one-state set should be discarded I think.

@clonker
Copy link
Member

clonker commented Dec 9, 2021

Like think about the 1x1 count matrix C=[[1]] and you set a connectivity threshold of 10... then the single connected component no longer fulfills the constraints given by connectivity threshold and there are (by that definition) no more connected sets.

@thempel
Copy link
Collaborator

thempel commented Dec 9, 2021

As far as I remember, connectivity is only defined by transitions between different states, excluding self-transitions. In the graph sense: Two nodes are connected if there is an edge between them - the node does still exist in the graph if it's disconnected from everything else. So in the above trajectory, the connected sets with a connectivity threshold of >1 would still be [0], ..., [3].

@clonker
Copy link
Member

clonker commented Dec 9, 2021

Yeah to confirm: I just looked this up, singleton nodes are by definition connected components. Whether it is desirable to eliminate these if the weight of its loop is below connectivity threshold is a philosophical (and practical) consideration. Perhaps this should be documented somewhere more clearly.

@MaaikeG
Copy link
Collaborator Author

MaaikeG commented Dec 9, 2021

In that case, would it make sense to at least return the singleton set that contains the most samples?

@clonker
Copy link
Member

clonker commented Dec 9, 2021

If we define it to be that way then yes it would make sense, I don't see the the immediate benefit from it though, can you elaborate?

@MaaikeG
Copy link
Collaborator Author

MaaikeG commented Dec 9, 2021

If I want to restrict my samples to the largest connected set, I can imagine I would want to 'keep' as many samples as possible. But I guess it doesn't really matter because this is not a realistic use case. Just something I came accross while testing. So just updating the docs sounds good.

@clonker
Copy link
Member

clonker commented Dec 9, 2021

Alright, docs updated. In case of sets with more than one element it's not so clear how to order them anyways.

@clonker clonker closed this as completed Dec 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants