
Odd probabilities under 'IM' for certain parameters #6

Open
DRL opened this issue Apr 14, 2023 · 3 comments
Labels: bug (Something isn't working)

@DRL (Member) commented Apr 14, 2023

Hey,

sorry to bother you. Either I am not setting up the IM model correctly, or there are still issues with parameter combinations that are:

  • not related to me=0
  • not related to FGVs (four-gamete violations; but those could/should be addressed as well, see below)
# setup
import agemo
import numpy as np

# two populations with two samples each; deme 0 is the (empty) ancestral population
sample_configuration = [(), ('a', 'a'), ('b', 'b')]
# per-axis size: k_max (= 2) mutations per branch type, + 2
mutation_shape = tuple(np.array([2, 2, 2, 2]) + 2)
# branch types mapped to mutation-type axes: 'b'/'aab' -> 0, 'a'/'abb' -> 1, 'ab' -> 2, 'aa'/'bb' -> 3
branchtype_dict = {'a': 1, 'abb': 1, 'b': 0, 'aab': 0, 'aa': 3, 'bb': 3, 'ab': 2}
branch_type_object = agemo.BranchTypeCounter(sample_configuration, branchtype_dict=branchtype_dict)
mutation_type_object = agemo.MutationTypeCounter(branch_type_object, mutation_shape)

# IM_AB: migration between demes 1 and 2, followed by a split into ancestral deme 0
IM_AB_events = [
    agemo.MigrationEvent(len(sample_configuration), 1, 2),
    agemo.PopulationSplitEvent(len(sample_configuration) + 1, 0, 1, 2)]
# IM_BA: identical, except the migration direction is reversed
IM_BA_events = [
    agemo.MigrationEvent(len(sample_configuration), 2, 1),
    agemo.PopulationSplitEvent(len(sample_configuration) + 1, 0, 1, 2)]

IM_AB_gf = agemo.GfMatrixObject(branch_type_object, IM_AB_events)
IM_AB_evaluator = agemo.BSFSEvaluator(IM_AB_gf, mutation_type_object, seed=19)
IM_BA_gf = agemo.GfMatrixObject(branch_type_object, IM_BA_events)
IM_BA_evaluator = agemo.BSFSEvaluator(IM_BA_gf, mutation_type_object, seed=19)
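
For reference, my reading of the 8-entry parameter vector handed to evaluate below (the ordering is inferred from the calls themselves, not taken from the agemo docs; build_params is just a hypothetical helper):

# hypothetical helper; ordering inferred from the evaluate() calls below
def build_params(Ne_A_B, Ne_A, Ne_B, me, theta_branch, num_branch_thetas=4):
    # one theta per branch type, all tied to the same per-branch theta here
    return np.array([Ne_A_B, Ne_A, Ne_B, me] + [theta_branch] * num_branch_thetas)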

For certain parameter combinations, IM_AB yields strange evaluations ...

# 'Ne_A_B': 0.9243959689201172, 
# 'Ne_A': 1.0, 
# 'Ne_B': 1.0431145700530513, 
# 'me': 0.2927732481387363, 
# 'T': 0.41070633695811387, 
# 'theta_branch': 0.4988579482629309
>>> IM_AB_bsfs = IM_AB_evaluator.evaluate(0.4988579482629309, np.array([0.9243959689201172, 1.0, 1.0431145700530513, 0.2927732481387363, 0.4988579482629309, 0.4988579482629309, 0.4988579482629309, 0.4988579482629309]), time=0.41070633695811387)
>>> np.sum(IM_AB_bsfs)
151.26742867062194
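
If I read the output correctly, the entries of the returned array are probabilities, so the total should never exceed 1, which makes 151.27 anomalous on its face. A quick guard (a sketch; the tolerance is an arbitrary choice):

# sketch: flag evaluations with an impossible total probability mass
total = np.sum(IM_AB_bsfs)
if total > 1 + 1e-8 or np.any(IM_AB_bsfs < -1e-8):
    print(f"anomalous evaluation: total = {total}")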

One gets the same result if one evaluates IM_BA_evaluator with the Ne_A and Ne_B values switched:

>>> IM_BA_bsfs = IM_BA_evaluator.evaluate(0.4988579482629309, np.array([0.9243959689201172, 1.0431145700530513, 1.0, 0.2927732481387363, 0.4988579482629309, 0.4988579482629309, 0.4988579482629309, 0.4988579482629309]), time=0.41070633695811387)
>>> np.sum(IM_BA_bsfs)
151.26742867062194

Part of this (but not all) is due to non-zero probabilities for FGVs in the resulting array...

# getting FGV indices: entries where both the hetAB count (axis 2)
# and the fixed count (axis 3) are non-zero
import itertools

def get_fgv_idxs(mutation_shape):
    hetAB_idxs, fixed_idxs = zip(*[
        (hetAB_idx, fixed_idx)
        for hetAB_idx, fixed_idx in itertools.product(
            np.arange(mutation_shape[2]), np.arange(mutation_shape[3]))
        if hetAB_idx > 0 and fixed_idx > 0])
    fgv_idxs = (..., np.array(hetAB_idxs), np.array(fixed_idxs))
    return fgv_idxs
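
Equivalently, the same indices can be built with a boolean mask (a sketch under the same assumption that axes 2 and 3 hold the hetAB and fixed counts):

# vectorized variant of get_fgv_idxs; np.nonzero yields the same index pairs
def get_fgv_idxs_vectorized(mutation_shape):
    hetAB_grid, fixed_grid = np.meshgrid(
        np.arange(mutation_shape[2]), np.arange(mutation_shape[3]), indexing='ij')
    mask = (hetAB_grid > 0) & (fixed_grid > 0)
    hetAB_idxs, fixed_idxs = np.nonzero(mask)
    return (..., hetAB_idxs, fixed_idxs)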

But if one sets those to zero, weird probabilities remain ...

>>> FGV_IDXS = get_fgv_idxs(mutation_shape)
>>> IM_AB_bsfs[FGV_IDXS] = 0
>>> np.sum(IM_AB_bsfs)
101.17863915209988

Two other (less severe) problematic parameter combinations are:

# 'Ne_A_B': 0.9123506906653297, 'Ne_A': 1.0, 'Ne_B': 1.0932156306196785, 'me': 0.349603259461486, 'T': 0.4506720050470323, 'theta_branch': 0.4897223826961803
>>> x = IM_AB_evaluator.evaluate(0.4897223826961803, np.array([0.9123506906653297, 1.0, 1.0932156306196785, 0.349603259461486, 0.4897223826961803, 0.4897223826961803, 0.4897223826961803, 0.4897223826961803]), time=0.4506720050470323)
>>> np.sum(x)
4.149056433231137
>>> x[FGV_IDXS] = 0
>>> np.sum(x)
3.1011605633468884

# 'Ne_A_B': 0.7681084188251829, 'Ne_A': 1.0, 'Ne_B': 0.9931514622734886, 'me': 0.28192265422781476, 'T': 0.6007353717109128, 'theta_branch': 0.4263605904528828
>>> x = IM_AB_evaluator.evaluate(0.4263605904528828, np.array([0.7681084188251829, 1.0, 0.9931514622734886, 0.28192265422781476, 0.4263605904528828, 0.4263605904528828, 0.4263605904528828, 0.4263605904528828]), time=0.6007353717109128)
>>> np.sum(x)
1.177225966483018
>>> x[FGV_IDXS] = 0
>>> np.sum(x)
1.1184237023913917
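
For completeness, a minimal sketch of an automatic flag for such evaluations (the conditions and tolerances are my choices, not anything from agemo or gimble):

# sketch of an anomaly check; thresholds are arbitrary
def is_anomalous(bsfs, fgv_idxs, atol=1e-8):
    has_negative = np.any(bsfs < -atol)     # probabilities must be >= 0
    bad_total = np.sum(bsfs) > 1 + atol     # total mass must be <= 1
    fgv_mass = np.sum(bsfs[fgv_idxs])       # FGV entries should carry ~zero mass
    return has_negative or bad_total or fgv_mass > atol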

If you need more examples, I can supply them ...

cheers,

dom

@GertjanBisschop (Member) commented

Thanks Dom, will look into this.

At first glance, the IM model seems to be set up correctly, so something odd must be happening.

One thing I want to note immediately is that the matrix returned by evaluate does not contain the marginal probabilities (see here). This means that all values in the resulting matrix have already been modified by whatever 'bad' values were added to the matrix during the graph traversal, so simply setting those to zero doesn't give much information. I will have to check the output before the adjust_marginals step.
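
A toy illustration of why zeroing after the fact is uninformative (this is NOT agemo's actual adjust_marginals, just the general point that subtraction-style marginal adjustment bakes bad values into neighbouring entries):

# toy example: a 'last bin' defined by subtraction inherits the damage
exact = np.array([0.2, 0.3, 50.0])   # 50.0 stands in for a bad traversal value
last_bin = 1.0 - exact.sum()         # -49.5: already corrupted
exact[2] = 0.0                       # zeroing the bad entry afterwards ...
# ... does not restore last_bin; the contamination happened earlier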

@GertjanBisschop (Member) commented

This is a scatter plot comparing all the values of the 8512 Taylor series coefficients (112 values for each of the 72 nodes in the graph) for the parameter set that produces the error (y-axis) against a version of that set rounded to a single decimal (x-axis). It clearly shows that some values that are tiny for the rounded set turn out to be huge for the error-producing set. So there must be an additional source of floating-point precision errors beyond those I have already addressed.

[figure: scatter plot of coefficient values, error-producing parameter set (y-axis) vs. rounded parameter set (x-axis)]
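
To reproduce that kind of comparison, a sketch with placeholder data (the real inputs would be the two arrays of extracted coefficient values):

import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(0)
# placeholder stand-ins for the 8512 coefficients of each parameter set
coeffs_rounded = np.abs(rng.normal(size=8512))
coeffs_error = coeffs_rounded * rng.lognormal(sigma=2.0, size=8512)
plt.scatter(coeffs_rounded, coeffs_error, s=4)
plt.xscale('log'); plt.yscale('log')
plt.xlabel('rounded parameter set'); plt.ylabel('error-producing parameter set')
plt.show()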

Thanks again for reporting this Dom. Will get to the bottom of this.

@GertjanBisschop GertjanBisschop self-assigned this Apr 18, 2023
@GertjanBisschop GertjanBisschop added this to the 0.0.3 milestone Apr 18, 2023
@GertjanBisschop GertjanBisschop added the bug Something isn't working label Apr 18, 2023
@DRL (Member, Author) commented Apr 20, 2023

Hi Gertjan,

Thanks for looking into this. Just to keep you updated: I have made a new gimble release that includes some code to deal with anomalies:

  • makegrid complains when it sees one (I have not hit one yet, though)
  • optimize deals with it by warning the user and handing nlopt -inf if an anomaly is detected (roughly as in the sketch below). This seems to work well enough for the Heliconius data; it converges without problems ...
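
A rough sketch of that optimize guard (hypothetical wiring; gimble's actual code differs), reusing the is_anomalous check from above:

import warnings
# warn and hand nlopt -inf when an anomalous evaluation is detected,
# so the optimizer rejects the point
def composite_loglik(params, grad, evaluator, theta, time, data, fgv_idxs):
    bsfs = evaluator.evaluate(theta, params, time=time)
    if is_anomalous(bsfs, fgv_idxs):
        warnings.warn("anomalous bSFS evaluation; returning -inf to nlopt")
        return -np.inf
    # composite log-likelihood: observed counts times log-probabilities
    return float(np.sum(data * np.log(np.where(bsfs > 0, bsfs, 1.0))))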

@GertjanBisschop GertjanBisschop modified the milestones: 0.0.3, 0.0.4 Dec 29, 2023