added mutual information via whichFst #208
Open
Added an option to calculate the mutual information (MI) instead of Fst, as a choice for `realSFS fst`. MI has nice properties that Fst lacks (it is a true metric, it is additive, it satisfies the triangle inequality, etc.). Some prefer MI to Fst for selection scans, demographics, etc. A reference is here.

The MI is calculated by passing the `-whichFst 2` flag to `realSFS fst index`. Here, the numerator is the mutual information and the denominator is the joint entropy, so the "global" result from `realSFS fst stats` is the normalized mutual information (aka Shannon differentiation), a metric that is bounded in [0,1]. As with vanilla Fst, the `print` option prints out the numerator (MI) and denominator (joint entropy) per site. Because of the additivity property, the weighted vs. unweighted distinction is moot. To avoid redundant code, `realSFS fst stats` still labels the output as "Fst" even though it is not, and with three populations it gives meaningless population branch statistics. However, the initial `realSFS fst index` prints a warning to this effect.

I also changed the formatting for `realSFS fst print` to output numbers with higher precision and to switch to scientific notation if necessary. This avoids annoying round-off errors where both numerator and denominator are small.

Finally, I updated the help message for `realSFS fst` to reflect these updates and also the other options, copying from the wiki where possible.
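
For readers unfamiliar with the numerator and denominator described above, here is a minimal Python sketch of the per-site quantities. It is not the code in this PR: it assumes a biallelic site, known allele frequencies `p1` and `p2` in two populations, and equal population weights, whereas `realSFS` works from the SAF likelihoods and the 2D SFS. The function names are hypothetical, and combining sites as a ratio of sums is an assumption made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def site_mi_and_joint_entropy(p1, p2, w1=0.5, w2=0.5):
    """Return (MI, joint entropy) between population label and allele.

    p1, p2: frequency of one allele in each population (illustrative inputs).
    w1, w2: population weights, assumed to sum to 1 (equal by default).
    """
    # Joint distribution over (population, allele) pairs.
    joint = [w1 * p1, w1 * (1.0 - p1), w2 * p2, w2 * (1.0 - p2)]
    pooled = w1 * p1 + w2 * p2  # allele frequency in the pooled population

    h_pop = entropy([w1, w2])
    h_allele = entropy([pooled, 1.0 - pooled])
    h_joint = entropy(joint)

    # I(pop; allele) = H(pop) + H(allele) - H(pop, allele)
    mi = h_pop + h_allele - h_joint
    return mi, h_joint


def normalized_mi(sites):
    """Sum numerators and denominators over sites, then take the ratio.

    Assumption: the global value is a ratio of sums over sites.
    """
    pairs = [site_mi_and_joint_entropy(p1, p2) for p1, p2 in sites]
    num = sum(mi for mi, _ in pairs)
    den = sum(h for _, h in pairs)
    return num / den if den > 0.0 else float("nan")
```

Under these assumptions, two populations fixed for different alleles give a normalized value of 1, and two populations with identical frequencies give 0, which matches the [0,1] bound stated above.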