Splitting Vcf per chr & Optimal Parameter for Phase Correction #25

Closed
casia16 opened this issue Apr 21, 2022 · 1 comment

Comments

@casia16

casia16 commented Apr 21, 2022

Hi,
Thank you for developing this great LAI method. So far I have been able to explore a lot with my own cattle data, and it was nice to see the results confirm our hypothesis.

I have two questions:
First, should I split the VCF file per chromosome, given that the algorithm does not treat SNPs as independent, or is this handled automatically?
Second, I found quite a few switch errors. I tried the smoothing for phase correction; it helps, but some switches remain unfixed. Is there a recommended procedure for finding optimal parameters (for example, lambda)? When I increase lambda, the results seem to improve (fewer switch errors), more so than when I change rate_vote and threshold.

@gdurif
Collaborator

gdurif commented Apr 21, 2022

Hi,
Thanks for your interest in Loter.
Regarding your questions:

  1. Yes, you should split the VCF file per chromosome. Loter assumes that the input is a set of consecutive SNPs belonging to the same DNA molecule.
  2. Regarding the smoothing for phase correction:
     • The lambda parameter controls the likelihood of a local ancestry switch between consecutive SNPs (when considering haplotypes independently). A higher lambda means longer ancestry chunks/tracts, which can counterbalance phasing switch errors between homologous haplotypes but does not correct them per se. You can widen the range of lambda values, or shift it towards larger values, to consider longer ancestry chunks in the procedure.
     • The threshold parameter (between 0 and 1) is the one controlling the smoothing of the phase correction: the lower the value, the stronger the smoothing (if I remember correctly).
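As an illustration of point 1 above, here is a minimal sketch of splitting an uncompressed VCF into one file per chromosome using only the Python standard library. The function name `split_vcf_by_chrom` and the output naming scheme are hypothetical and not part of Loter; for compressed or indexed files, a dedicated tool such as bcftools is probably the better choice.

```python
import os


def split_vcf_by_chrom(vcf_path, out_dir):
    """Write one uncompressed VCF per chromosome and return {chrom: path}.

    Hypothetical helper: every header line (starting with '#') is copied
    into each output file, and each record is routed by its CHROM column.
    """
    os.makedirs(out_dir, exist_ok=True)
    header = []   # header lines seen so far
    handles = {}  # chrom -> open output file
    paths = {}    # chrom -> output path
    try:
        with open(vcf_path) as src:
            for line in src:
                if line.startswith("#"):
                    header.append(line)
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom not in handles:
                    # First record for this chromosome: start a new file.
                    paths[chrom] = os.path.join(out_dir, f"{chrom}.vcf")
                    handles[chrom] = open(paths[chrom], "w")
                    handles[chrom].writelines(header)
                handles[chrom].write(line)
    finally:
        for fh in handles.values():
            fh.close()
    return paths
```

Each resulting file then holds consecutive SNPs from a single chromosome and can be loaded and passed to Loter separately. Note that this sketch keeps one file handle open per chromosome, which is fine for a few dozen chromosomes but not for assemblies with thousands of scaffolds.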

I am surprised by what you observed. Maybe it is a combination of both points above: the ancestry tracts are too short and the smoothing is not strong enough. You could address both at once by increasing the lambda range and decreasing the smoothing threshold.
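One way to explore this suggestion systematically is to enumerate a small grid of candidate settings that shift the lambda range upwards and lower the threshold, then compare the switch errors obtained with each. The sketch below only builds the grid. The helper names, the dictionary keys `range_lambda` and `threshold`, and all numeric values are illustrative assumptions (they are not Loter's documented defaults), so check them against the Loter function you actually call.

```python
def lambda_range(start, stop, step):
    """Return evenly spaced lambda values in [start, stop)."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 6) for i in range(n)]


def candidate_settings(base_lambdas, shifts, thresholds):
    """Build one settings dict per (shift, threshold) combination.

    A larger shift moves the whole lambda range towards larger values
    (longer ancestry tracts); a lower threshold is meant to give stronger
    smoothing, per the discussion above. Key names are assumptions.
    """
    for t in thresholds:
        if not 0.0 <= t <= 1.0:
            raise ValueError(f"threshold must be between 0 and 1, got {t}")
    settings = []
    for shift in shifts:
        shifted = [round(lam + shift, 6) for lam in base_lambdas]
        for t in thresholds:
            settings.append({"range_lambda": shifted, "threshold": t})
    return settings


# Illustrative grid: two lambda ranges crossed with two thresholds.
grid = candidate_settings(
    lambda_range(1.5, 5.5, 0.5), shifts=[0.0, 2.0], thresholds=[0.9, 0.8]
)
```

Each entry in `grid` could then be passed to the phase-correction run for one chromosome, keeping the setting that leaves the fewest unresolved switches on data where the true phase or ancestry is known.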

Best

@gdurif gdurif closed this as completed Apr 20, 2023