Understanding the grapetree distance computation #87

Closed
crarlus opened this issue Feb 27, 2020 · 1 comment

crarlus commented Feb 27, 2020

Hi,
I am not sure I understand the "--missing" option in the CLI version of grapetree correctly. This is what the help text says:

  --missing HANDLER, -y HANDLER
                        ONLY FOR symmetric DISTANCE MATRIX.
                        0: [DEFAULT] ignore missing data in pairwise comparison.
                        1: Remove column with missing data.
                        2: treat data as an allele.
                        3: Absolute number of allelic differences.

I ran the distance calculation for some allele profiles. When I compare the resulting distances with those from a custom distance script, it looks as if --missing 3 actually gives me something like "ignore missing data in pairwise comparison".

Could you please clarify:

  • What do you mean by "Absolute number of allelic differences"? In earlier versions this case was termed "Naive counting of absolute differences between profiles".
  • How are "-" entries treated in this case (--missing 3)?
  • And how are the differences computed for --missing 0?

Also I noticed that the resulting minimum spanning trees MSTreev2 contain the distances according to --missing 3?

Thanks for your help!

zheminzhou (Collaborator) commented

For --missing 0, the differences are calculated as:
(Number of different alleles)/(Number of loci that are present in both genomes)*(Total number of loci)
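
A minimal Python sketch of the formula above may help when checking it against a custom script. It is not GrapeTree's own code: the function name `distance_missing0` is hypothetical, "-" is assumed to mark a missing locus (as in the question), and returning 0.0 when two profiles share no loci is an assumption of this sketch.

```python
MISSING = "-"  # assumption: "-" marks a missing locus, as in the question


def distance_missing0(profile_a, profile_b, missing=MISSING):
    """Scaled distance as described for --missing 0 (hypothetical helper)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    total_loci = len(profile_a)
    # Keep only loci that are present in both genomes.
    shared = [(a, b) for a, b in zip(profile_a, profile_b)
              if a != missing and b != missing]
    if not shared:
        # Assumption of this sketch: nothing comparable, report 0.0.
        return 0.0
    different = sum(1 for a, b in shared if a != b)
    # (different alleles) / (loci present in both) * (total loci)
    return different / len(shared) * total_loci


# 1 difference over 2 shared loci, scaled to 3 loci in total.
print(distance_missing0("111", "12-"))  # 1.5
```

For the profiles 111 and 12-, this gives 1/2*3 = 1.5, so the difference rate observed on the shared loci is extrapolated to the full set of loci.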

For --missing 3, loci that are missing in either genome are ignored, and only loci where both genomes carry an allele and those alleles differ are counted. For example:
A: 111
B: 12-
will have a distance of 1.
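
The counting rule for --missing 3 can be sketched in Python as follows. This is an illustration rather than GrapeTree's implementation: the function name `distance_missing3` is hypothetical, and "-" is assumed to mark a missing locus.

```python
def distance_missing3(profile_a, profile_b, missing="-"):
    """Absolute number of allelic differences (hypothetical helper).

    A locus counts only if both profiles have an allele there
    and the two alleles differ; missing loci contribute nothing.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a != missing and b != missing and a != b
    )


# Locus 2 differs (1 vs 2); locus 3 is missing in B and is skipped.
print(distance_missing3("111", "12-"))  # 1
```

Unlike the --missing 0 formula, the count here is not rescaled by the number of shared loci, so it is always an integer.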

"Also I noticed that the resulting minimum spanning trees MSTreev2 contain the distances according to --missing 3?"

Yes. To make the results easier to interpret, the branch lengths of the MSTreeV2 result are re-calculated using "--missing 3".
