Linking mixture and partition models #25

bqminh · 2017-05-19T14:30:03Z

Allow mixture models (or PMSF) to be combined with partitioned data analysis. One could imagine the weight parameters of the mixture being either linked across partitions or unlinked.

Message from Andrew Roger:

One thing that might be really useful is for people to be able to use partition models at the same time as these PMSF (or mixture) models. For instance, they may wish to have different branch lengths and alpha shape parameters for different partitions, but at the same time use the mixture models (or PMSF) with the same weights for all the partitions (or maybe partition-specific weights).

It is relevant to recent debates between Nicolas Lartillot (phylobayes CAT model) and Ken Halanych over animal phylogeny. Halanych suggests it's more important to partition data than to accommodate site-heterogeneity (i.e. through mixture models like CAT) for accurate phylogenetic inference. Nicolas Lartillot argues the opposite. I think Nicolas is mostly correct that site-heterogeneity is usually more important to accommodate than partitioning, but ultimately having both partitions and the ability to have site-heterogeneity would lead to the most model ‘realism’ in my view.

Just to follow up on the rationale for a model that allows partitions AND mixture models (where the mixture models are ‘linked’ across partitions).

I think the issue of gene-specific ‘heterotachy’ (i.e. different genes having different branch lengths) can cause problems if it is ignored and branch lengths are linked across partitions. This is the general problem of heterotachy (see a paper we wrote on this in 2005). However, as I mentioned in my phyloseminar talk, I think the site-specific ‘constraints’ on evolution are probably an even more important issue in phylogenetics. Hence the need for the site profile mixture models and the PMSF models we’ve developed.

Ideally, however, it would be nice to be able to have both at the same time. In this case I really don’t think it is necessary for the partitions to have different ‘weights’ for different mixture classes (or even different gamma distribution shape parameters), so I think linking the mixture models so that the weights are the same for all sites in all partitions is fine. The main rationale for partitioning, in my view, is to allow for separate branch lengths for different partitions. In my view, for concatenated protein alignments, allowing for different exchangeabilities or different mixture weights per partition is not really important. I realize I haven’t given you a lot of literature references to back up my assertions; these are based more on my own intuition.
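
To make the linked versus unlinked distinction concrete, here is one way the per-site likelihood could be written. This is only a sketch: the notation is assumed for illustration and does not come from the thread or from IQ-TREE. Here $D_i$ is site $i$, $p(i)$ is the partition containing site $i$, $T$ is the shared tree topology, $b_p$ and $\alpha_p$ are the branch lengths and gamma shape of partition $p$, and $\pi_k$ is the frequency profile of mixture class $k$ out of $K$ classes.

$$
\text{unlinked weights:}\quad L_i = \sum_{k=1}^{K} w_{p(i),k}\, P\!\left(D_i \mid T,\, b_{p(i)},\, \alpha_{p(i)},\, \pi_k\right),
\qquad \sum_{k=1}^{K} w_{p,k} = 1 \ \text{ for each partition } p
$$

$$
\text{linked weights:}\quad L_i = \sum_{k=1}^{K} w_{k}\, P\!\left(D_i \mid T,\, b_{p(i)},\, \alpha_{p(i)},\, \pi_k\right),
\qquad \sum_{k=1}^{K} w_{k} = 1
$$

In the linked case only the branch lengths (and, if desired, the alpha shape parameter) remain partition-specific, which matches the proposal above.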

hcwangdal · 2017-05-19T16:12:41Z

Hi Andrew and Minh, Using -m LG+C20+F+G with -sp partition_file, IQ-TREE does run a partitioned analysis and the C20 mixture at the same time. However, it is not clear whether the mixture weights and alpha are optimized separately for each partition. The log file seems to suggest that they are optimized separately for the partitions, but the optimized weights and alpha are not given in the log file.

bqminh · 2017-05-19T16:43:19Z

Hi Huaichun, it’s true that IQ-TREE already allows each partition to evolve under a C20+… model. However, all mixture weights are unlinked and thus estimated separately across partitions, as you correctly suspected. This can be quite over-parameterized for many data sets. Thus, I now have it on the agenda to add an option to link the C20 weights. This should be available in version 1.6.X. Have a nice weekend, Minh
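
As a rough illustration of the over-parameterization concern, the sketch below counts free mixture-weight parameters when weights are unlinked versus linked. It is a back-of-the-envelope count, not IQ-TREE's internal accounting; the function name `free_weight_params` and the example partition counts are hypothetical. It assumes only that each set of K weights sums to 1 and therefore has K - 1 free parameters.

```python
def free_weight_params(num_partitions: int, num_classes: int, linked: bool) -> int:
    """Count free mixture-weight parameters (illustrative only).

    A set of `num_classes` weights is constrained to sum to 1, so it
    contributes `num_classes - 1` free parameters. Linked weights share
    one set across all partitions; unlinked weights use one set per
    partition.
    """
    if num_partitions < 1 or num_classes < 1:
        raise ValueError("num_partitions and num_classes must be >= 1")
    per_set = num_classes - 1
    return per_set if linked else num_partitions * per_set


if __name__ == "__main__":
    # Hypothetical partition counts with a 20-class mixture such as C20.
    for partitions in (1, 10, 100):
        unlinked = free_weight_params(partitions, 20, linked=False)
        linked = free_weight_params(partitions, 20, linked=True)
        print(f"{partitions:>3} partitions: unlinked={unlinked}, linked={linked}")
```

With 20 classes, the unlinked count grows by 19 per partition, while the linked count stays at 19 regardless of how many partitions there are.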


bqminh · 2020-05-30T02:51:44Z

This feature turned out to be too difficult to implement, and we decided not to implement it.

bqminh added the enhancement label May 19, 2017

bqminh added this to the v1.6.0 milestone May 19, 2017

bqminh added the high-priority label Dec 9, 2017

bqminh removed the high-priority label Mar 4, 2019

bqminh modified the milestones: v1.6.0, v1.7.0 Mar 4, 2019

bqminh closed this as completed May 30, 2020
