Largest connected set returns a state when it should return an empty set #175

MaaikeG · 2021-12-09T09:21:39Z

The following should print the empty set [] but instead prints [3].

import numpy as np
from deeptime.markov import TransitionCountEstimator

estimator = TransitionCountEstimator(lagtime=1, count_mode='sliding')

full_counts_model = estimator.fit_fetch(np.asarray([0, 1, 2, 3]))
submodel = full_counts_model.submodel_largest(
        connectivity_threshold=1,
        directed=True)
print(submodel.state_symbols)

The text was updated successfully, but these errors were encountered:

clonker · 2021-12-09T09:55:17Z

Yeah I think you are right. This is an error in the connectivity_threshold implementation which is not respected for self-connectivity (i.e. loops k -> k for state k). What do you think @thempel ?

thempel · 2021-12-09T10:42:47Z

Well... so one could say that the set of states consists of 4 independent, disconnected sets ([0], [1], [2], [3]). Since they all have the same size, submodel_largest will return a random one, in this case [3]. It's obviously a bit useless to have a 1-state submodel though, so maybe one should print some warning?

Some thoughts: I could imagine that returning empty sets may break something downstream. It's not technically wrong to have a 1-state model, so I wouldn't necessarily raise an error.

clonker · 2021-12-09T10:46:02Z

I would agree except for the connectivity threshold - if it is set to a value that is greater than what can be found on the count matrix diagonal then the one-state set should be discarded I think.

clonker · 2021-12-09T10:47:03Z

Like think about the 1x1 count matrix C=[[1]] and you set a connectivity threshold of 10... then the single connected component no longer fulfills the constraints given by connectivity threshold and there are (by that definition) no more connected sets.

thempel · 2021-12-09T10:54:12Z

As far as I remember, connectivity is only defined by transitions between different states, excluding self-transitions. In the graph sense: Two nodes are connected if there is an edge between them - the node does still exist in the graph if it's disconnected from everything else. So in the above trajectory, the connected sets with a connectivity threshold of >1 would still be [0], ..., [3].

clonker · 2021-12-09T11:56:44Z

Yeah to confirm: I just looked this up, singleton nodes are by definition connected components. Whether it is desirable to eliminate these if the weight of its loop is below connectivity threshold is a philosophical (and practical) consideration. Perhaps this should be documented somewhere more clearly.

MaaikeG · 2021-12-09T14:07:52Z

In that case, would it make sense to at least return the singleton set that contains the most samples?

clonker · 2021-12-09T14:15:56Z

If we define it to be that way then yes it would make sense, I don't see the the immediate benefit from it though, can you elaborate?

MaaikeG · 2021-12-09T14:19:09Z

If I want to restrict my samples to the largest connected set, I can imagine I would want to 'keep' as many samples as possible. But I guess it doesn't really matter because this is not a realistic use case. Just something I came accross while testing. So just updating the docs sounds good.

clonker · 2021-12-09T14:35:47Z

Alright, docs updated. In case of sets with more than one element it's not so clear how to order them anyways.

clonker mentioned this issue Dec 9, 2021

Minor docs updates #176

Merged

clonker closed this as completed Dec 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Largest connected set returns a state when it should return an empty set #175

Largest connected set returns a state when it should return an empty set #175

MaaikeG commented Dec 9, 2021

clonker commented Dec 9, 2021

thempel commented Dec 9, 2021

clonker commented Dec 9, 2021

clonker commented Dec 9, 2021

thempel commented Dec 9, 2021

clonker commented Dec 9, 2021

MaaikeG commented Dec 9, 2021

clonker commented Dec 9, 2021

MaaikeG commented Dec 9, 2021

clonker commented Dec 9, 2021

Largest connected set returns a state when it should return an empty set #175

Largest connected set returns a state when it should return an empty set #175

Comments

MaaikeG commented Dec 9, 2021

clonker commented Dec 9, 2021

thempel commented Dec 9, 2021

clonker commented Dec 9, 2021

clonker commented Dec 9, 2021

thempel commented Dec 9, 2021

clonker commented Dec 9, 2021

MaaikeG commented Dec 9, 2021

clonker commented Dec 9, 2021

MaaikeG commented Dec 9, 2021

clonker commented Dec 9, 2021