markovModel(lag,n + 1) #55

Closed
noeliaferruz opened this issue Jun 8, 2016 · 5 comments
Comments

@noeliaferruz

Hello!

Question: lately, when I try to build n macrostates, I sometimes get n-1. Is this an error?

Thanks,
Noelia

model=Model(dataTica)
#model.plotTimescales(lags=range(1,100,5))
model.markovModel(50,5)
model.viewStates(protein="protein and name CA")

2016-06-07 12:58:55,952 - htmd.model - INFO - 24.4% of the data was used
2016-06-07 12:58:55,959 - htmd.model - INFO - Number of trajectories that visited each macrostate:
2016-06-07 12:58:55,960 - htmd.model - INFO - [5 4 5 2]
2016-06-07 12:58:55,962 - htmd.model - INFO - Take care! Macro 3 has been visited only in 2 trajectories:
2016-06-07 12:58:55,962 - htmd.model - INFO - id = 15
parent = None
input = []
trajectory = ['./filtered/e2s6_e1s8p0f473/output.filtered.xtc']
molfile = ./filtered/filtered.pdb
2016-06-07 12:58:55,963 - htmd.model - INFO - id = 16
parent = None
input = []
trajectory = ['./filtered/e2s7_e1s8p0f331/output.filtered.xtc']
molfile = ./filtered/filtered.pdb
[Parallel(n_jobs=1)]: Done   1 tasks       | elapsed:    6.8s
[Parallel(n_jobs=1)]: Done   2 tasks       | elapsed:   13.9s
[Parallel(n_jobs=1)]: Done   3 tasks       | elapsed:   20.5s
[Parallel(n_jobs=1)]: Done   4 tasks       | elapsed:   27.2s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:   27.2s finished
@stefdoerr
Contributor

Getting n-1 is normal. It has to do with the implementation of PCCA.

The real problem, it seems to me, is that your data is extremely disconnected: at a lag of 50 frames you are throwing away about 75% of it, leaving essentially only 16 trajectories of data.
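
For reference, both figures can be read off the log in the original post. The snippet below is only a small arithmetic check, not part of HTMD. Reading the 16 as the sum of the per-macrostate visit counts is an assumption; a trajectory that visits several macrostates would be counted more than once in that sum.

```python
# Values copied from the log output in the original post.
fraction_used_percent = 24.4           # "24.4% of the data was used"
visits_per_macrostate = [5, 4, 5, 2]   # trajectories that visited each macrostate

# Share of the data that was discarded at this lag time.
fraction_discarded_percent = 100.0 - fraction_used_percent

# Assumption: the "16 trajectories" corresponds to the summed visit counts.
total_visits = sum(visits_per_macrostate)

print(f"discarded: {fraction_discarded_percent:.1f}%")
print(f"visits summed over macrostates: {total_visits}")
```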

@stefdoerr
Contributor

p.s. there is a new option units= where you can provide units for your lag time. So instead of 50 you can do model.markovModel(5, 5, units='ns') so that if you later change your fstep you won't accidentally do the analysis on a different lag time.
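
To illustrate why fixing the lag in physical units is safer, here is a minimal sketch of the frames/ns conversion. `lag_in_frames` is a hypothetical helper written for this example, not an HTMD function, and the fstep of 0.1 ns is an assumption chosen so that 5 ns matches the 50 frames used above.

```python
def lag_in_frames(lag_ns: float, fstep_ns: float) -> int:
    """Convert a lag time in ns to a whole number of frames (hypothetical helper)."""
    if fstep_ns <= 0:
        raise ValueError("fstep_ns must be positive")
    frames = round(lag_ns / fstep_ns)
    if frames < 1:
        raise ValueError("lag time is shorter than one frame")
    return frames


# Assumed fstep of 0.1 ns: a 5 ns lag is the 50 frames from the original call.
print(lag_in_frames(5, 0.1))

# If fstep later changes to 0.05 ns, the same 5 ns lag becomes 100 frames,
# whereas a hard-coded lag of 50 frames would silently mean 2.5 ns.
print(lag_in_frames(5, 0.05))
```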

@noeliaferruz
Author

Thanks,

I like the units= option!

It just happened in another system where I actually have a lot of data and 100.0% of the data was used. Is it indicative of a bad model even though the implied timescales are good, or is it nothing to worry about?

@stefdoerr
Contributor

stefdoerr commented Jun 8, 2016

Well, it depends on how you interpret it. I would not trust the kinetics of a macrostate which has only occurred in 2 trajectories. Bootstrapping, for example, would give you huge errors.
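
A rough way to see the bootstrapping problem: if only 2 trajectories visit a macrostate, a resampled data set can easily contain neither of them. The sketch below uses a plain bootstrap with replacement over trajectory indices, which may differ from the resampling scheme HTMD uses. The total of 16 trajectories is an assumption taken from the discussion above, and the function name is made up for this example.

```python
import random


def fraction_of_resamples_missing_state(n_traj, n_visiting, n_rounds=10000, seed=0):
    """Estimate how often a bootstrap resample contains none of the
    trajectories that visit a given macrostate (hypothetical helper)."""
    rng = random.Random(seed)
    # Label the trajectories that visit the macrostate 0 .. n_visiting - 1.
    visiting = set(range(n_visiting))
    missing = 0
    for _ in range(n_rounds):
        # Draw n_traj trajectory indices with replacement.
        sample = [rng.randrange(n_traj) for _ in range(n_traj)]
        if visiting.isdisjoint(sample):
            missing += 1
    return missing / n_rounds


n_traj = 16      # assumption: total number of trajectories in the model
n_visiting = 2   # from the log: the macrostate was visited in 2 trajectories

# Closed form for the same quantity: each draw misses both trajectories
# with probability (n_traj - n_visiting) / n_traj.
analytic = ((n_traj - n_visiting) / n_traj) ** n_traj
estimate = fraction_of_resamples_missing_state(n_traj, n_visiting)

print(f"analytic: {analytic:.3f}, simulated: {estimate:.3f}")
```

Under these assumptions roughly 12% of the resamples would not contain the macrostate at all, and many of the rest would contain only one of its two trajectories, so any kinetic estimate for it would vary widely between resamples.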

The way I usually think about it is:
Does this state look interesting?

  • yes: restart simulations from there to gather statistics
  • no: reduce the number of macrostates and it will be joined into another macrostate with more statistics

The more macrostates you add, the more macrostates with bad statistics you will get.

Also look at your timescales. If you only have one slow process, the correct thing to do would be to make 2 macrostates for example.
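
As a sketch of that reasoning, one could look for the largest gap in the implied timescales and use one more macrostate than the number of processes above the gap. This is a simple heuristic written for illustration, not something HTMD provides; the function name, the `gap_ratio` threshold, and the timescale values are all made up.

```python
def suggest_macrostates(timescales, gap_ratio=2.0):
    """Suggest a macrostate count from implied timescales (hypothetical heuristic).

    Finds the largest ratio between consecutive timescales (sorted slowest
    first). Processes above that gap count as slow, and n slow processes
    suggest n + 1 macrostates. Returns None if no ratio reaches gap_ratio.
    """
    ts = sorted(timescales, reverse=True)
    if len(ts) < 2 or min(ts) <= 0:
        return None
    ratios = [ts[i] / ts[i + 1] for i in range(len(ts) - 1)]
    largest = max(ratios)
    if largest < gap_ratio:
        return None  # no clear separation between slow and fast processes
    n_slow = ratios.index(largest) + 1
    return n_slow + 1


# Made-up timescales with a single slow process well separated from the rest.
print(suggest_macrostates([120.0, 8.0, 6.5, 5.0]))
```

With one clearly separated slow process, as in the made-up example, the heuristic suggests 2 macrostates.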

Thinking about it, I would love to have a feature like

ace = AcemdLocal()
ace.submit(model, macro=2)

which starts simulations from macrostate 2 to improve sampling!

@noeliaferruz
Author

ok, thanks!
Yes that function would be cool!
