
Odd probabilities under 'IM' for certain parameters #6

Open
DRL opened this issue Apr 14, 2023 · 3 comments
Labels: bug (Something isn't working)

@DRL (Member) commented Apr 14, 2023

Hey,

sorry to bother you. Either I am not setting up the IM model correctly, or there are still issues with parameter combinations that are:

  • not related to me=0
  • not related to FGVs (four-gamete violations; but those could/should be addressed as well, see below)
# setup
import agemo
import numpy as np

# two populations with two samples each; deme 0 is the (empty) ancestral population
sample_configuration = [(), ('a', 'a'), ('b', 'b')]
# per-axis size: k_max (= 2) mutations per branch type, + 2
mutation_shape = tuple(np.array([2, 2, 2, 2]) + 2)
# branch types mapped to mutation-type axes: 'b'/'aab' -> 0, 'a'/'abb' -> 1, 'ab' -> 2, 'aa'/'bb' -> 3
branchtype_dict = {'a': 1, 'abb': 1, 'b': 0, 'aab': 0, 'aa': 3, 'bb': 3, 'ab': 2}
branch_type_object = agemo.BranchTypeCounter(sample_configuration, branchtype_dict=branchtype_dict)
mutation_type_object = agemo.MutationTypeCounter(branch_type_object, mutation_shape)

# IM_AB: migration between demes 1 and 2, followed by a split into ancestral deme 0
IM_AB_events = [
    agemo.MigrationEvent(len(sample_configuration), 1, 2),
    agemo.PopulationSplitEvent(len(sample_configuration) + 1, 0, 1, 2)]
# IM_BA: identical, except the migration direction is reversed
IM_BA_events = [
    agemo.MigrationEvent(len(sample_configuration), 2, 1),
    agemo.PopulationSplitEvent(len(sample_configuration) + 1, 0, 1, 2)]

IM_AB_gf = agemo.GfMatrixObject(branch_type_object, IM_AB_events)
IM_AB_evaluator = agemo.BSFSEvaluator(IM_AB_gf, mutation_type_object, seed=19)
IM_BA_gf = agemo.GfMatrixObject(branch_type_object, IM_BA_events)
IM_BA_evaluator = agemo.BSFSEvaluator(IM_BA_gf, mutation_type_object, seed=19)
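
For reference, my reading of the 8-entry parameter vector handed to evaluate below (the ordering is inferred from the calls themselves, not taken from the agemo docs; build_params is just a hypothetical helper):

# hypothetical helper; ordering inferred from the evaluate() calls below
def build_params(Ne_A_B, Ne_A, Ne_B, me, theta_branch, num_branch_thetas=4):
    # one theta per branch type, all tied to the same per-branch theta here
    return np.array([Ne_A_B, Ne_A, Ne_B, me] + [theta_branch] * num_branch_thetas)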

For certain parameter combinations, IM_AB yields strange evaluations ...

# 'Ne_A_B': 0.9243959689201172, 
# 'Ne_A': 1.0, 
# 'Ne_B': 1.0431145700530513, 
# 'me': 0.2927732481387363, 
# 'T': 0.41070633695811387, 
# 'theta_branch': 0.4988579482629309
>>> IM_AB_bsfs = IM_AB_evaluator.evaluate(0.4988579482629309, np.array([0.9243959689201172, 1.0, 1.0431145700530513, 0.2927732481387363, 0.4988579482629309, 0.4988579482629309, 0.4988579482629309, 0.4988579482629309]), time=0.41070633695811387)
>>> np.sum(IM_AB_bsfs)
151.26742867062194
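
If I read the output correctly, the entries of the returned array are probabilities, so the total should never exceed 1, which makes 151.27 anomalous on its face. A quick guard (a sketch; the tolerance is an arbitrary choice):

# sketch: flag evaluations with an impossible total probability mass
total = np.sum(IM_AB_bsfs)
if total > 1 + 1e-8 or np.any(IM_AB_bsfs < -1e-8):
    print(f"anomalous evaluation: total = {total}")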

One gets the same result if one evaluates IM_BA_evaluator with the Ne_A and Ne_B values switched:

>>> IM_BA_bsfs = IM_BA_evaluator.evaluate(0.4988579482629309, np.array([0.9243959689201172, 1.0431145700530513, 1.0, 0.2927732481387363, 0.4988579482629309, 0.4988579482629309, 0.4988579482629309, 0.4988579482629309]), time=0.41070633695811387)
>>> np.sum(IM_BA_bsfs)
151.26742867062194

Part of this (but not all) is due to non-zero probabilities for FGVs in the resulting array...

# getting FGV indices: entries where both the hetAB count (axis 2)
# and the fixed count (axis 3) are non-zero
import itertools

def get_fgv_idxs(mutation_shape):
    hetAB_idxs, fixed_idxs = zip(*[
        (hetAB_idx, fixed_idx)
        for hetAB_idx, fixed_idx in itertools.product(
            np.arange(mutation_shape[2]), np.arange(mutation_shape[3]))
        if hetAB_idx > 0 and fixed_idx > 0])
    fgv_idxs = (..., np.array(hetAB_idxs), np.array(fixed_idxs))
    return fgv_idxs
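
Equivalently, the same indices can be built with a boolean mask (a sketch under the same assumption that axes 2 and 3 hold the hetAB and fixed counts):

# vectorized variant of get_fgv_idxs; np.nonzero yields the same index pairs
def get_fgv_idxs_vectorized(mutation_shape):
    hetAB_grid, fixed_grid = np.meshgrid(
        np.arange(mutation_shape[2]), np.arange(mutation_shape[3]), indexing='ij')
    mask = (hetAB_grid > 0) & (fixed_grid > 0)
    hetAB_idxs, fixed_idxs = np.nonzero(mask)
    return (..., hetAB_idxs, fixed_idxs)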

But if one sets those to zero, weird probabilities remain ...

>>> FGV_IDXS = get_fgv_idxs(mutation_shape)
>>> IM_AB_bsfs[FGV_IDXS] = 0
>>> np.sum(IM_AB_bsfs)
101.17863915209988

Two other (less severe) problematic parameter combinations are:

# 'Ne_A_B': 0.9123506906653297, 'Ne_A': 1.0, 'Ne_B': 1.0932156306196785, 'me': 0.349603259461486, 'T': 0.4506720050470323, 'theta_branch': 0.4897223826961803
>>> x = IM_AB_evaluator.evaluate(0.4897223826961803, np.array([0.9123506906653297, 1.0, 1.0932156306196785, 0.349603259461486, 0.4897223826961803, 0.4897223826961803, 0.4897223826961803, 0.4897223826961803]), time=0.4506720050470323)
>>> np.sum(x)
4.149056433231137
>>> x[FGV_IDXS] = 0
>>> np.sum(x)
3.1011605633468884

# 'Ne_A_B': 0.7681084188251829, 'Ne_A': 1.0, 'Ne_B': 0.9931514622734886, 'me': 0.28192265422781476, 'T': 0.6007353717109128, 'theta_branch': 0.4263605904528828
>>> x = IM_AB_evaluator.evaluate(0.4263605904528828, np.array([0.7681084188251829, 1.0, 0.9931514622734886, 0.28192265422781476, 0.4263605904528828, 0.4263605904528828, 0.4263605904528828, 0.4263605904528828]), time=0.6007353717109128)
>>> np.sum(x)
1.177225966483018
>>> x[FGV_IDXS] = 0
>>> np.sum(x)
1.1184237023913917
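
For completeness, a minimal sketch of an automatic flag for such evaluations (the conditions and tolerances are my choices, not anything from agemo or gimble):

# sketch of an anomaly check; thresholds are arbitrary
def is_anomalous(bsfs, fgv_idxs, atol=1e-8):
    has_negative = np.any(bsfs < -atol)     # probabilities must be >= 0
    bad_total = np.sum(bsfs) > 1 + atol     # total mass must be <= 1
    fgv_mass = np.sum(bsfs[fgv_idxs])       # FGV entries should carry ~zero mass
    return has_negative or bad_total or fgv_mass > atol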

If you need more examples, I can supply them ...

cheers,

dom

@GertjanBisschop (Member) commented

Thanks Dom, will look into this.

At first glance, the IM model seems to be set up correctly, so something odd must be happening.

One thing I want to note immediately is that the matrix returned by evaluate does not contain the marginal probabilities (see here). This means that all values in the resulting matrix have already been modified by whatever 'bad' values were added to the matrix during the graph traversal, so simply setting those to zero doesn't give much information. I will have to check the output before the adjust_marginals step.
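
A toy illustration of why zeroing after the fact is uninformative (this is NOT agemo's actual adjust_marginals, just the general point that subtraction-style marginal adjustment bakes bad values into neighbouring entries):

# toy example: a 'last bin' defined by subtraction inherits the damage
exact = np.array([0.2, 0.3, 50.0])   # 50.0 stands in for a bad traversal value
last_bin = 1.0 - exact.sum()         # -49.5: already corrupted
exact[2] = 0.0                       # zeroing the bad entry afterwards ...
# ... does not restore last_bin; the contamination happened earlier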

@GertjanBisschop (Member) commented

This is a scatter plot comparing all the values of the 8512 Taylor series coefficients (112 values for each of the 72 nodes in the graph) for the parameter set that produces the error (y-axis) against a version of that set rounded to a single decimal (x-axis). It clearly shows that some values that are tiny for the rounded set turn out to be huge for the error-producing set. So there must be an additional source of floating-point precision errors beyond those I have already addressed.

[figure: scatter plot of coefficient values, error-producing parameter set (y-axis) vs. rounded parameter set (x-axis)]
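
To reproduce that kind of comparison, a sketch with placeholder data (the real inputs would be the two arrays of extracted coefficient values):

import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(0)
# placeholder stand-ins for the 8512 coefficients of each parameter set
coeffs_rounded = np.abs(rng.normal(size=8512))
coeffs_error = coeffs_rounded * rng.lognormal(sigma=2.0, size=8512)
plt.scatter(coeffs_rounded, coeffs_error, s=4)
plt.xscale('log'); plt.yscale('log')
plt.xlabel('rounded parameter set'); plt.ylabel('error-producing parameter set')
plt.show()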

Thanks again for reporting this Dom. Will get to the bottom of this.

@GertjanBisschop GertjanBisschop self-assigned this Apr 18, 2023
@GertjanBisschop GertjanBisschop added this to the 0.0.3 milestone Apr 18, 2023
@GertjanBisschop GertjanBisschop added the bug Something isn't working label Apr 18, 2023
@DRL (Member, Author) commented Apr 20, 2023

Hi Gertjan,

Thanks for looking into this. Just to keep you updated: I have made a new gimble release that includes some code to deal with anomalies:

  • makegrid complains when it sees one (I have not hit one yet, though)
  • optimize deals with it by warning the user and handing nlopt -inf if an anomaly is detected (roughly as in the sketch below). This seems to work well enough for the Heliconius data; it converges without problems ...
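
A rough sketch of that optimize guard (hypothetical wiring; gimble's actual code differs), reusing the is_anomalous check from above:

import warnings
# warn and hand nlopt -inf when an anomalous evaluation is detected,
# so the optimizer rejects the point
def composite_loglik(params, grad, evaluator, theta, time, data, fgv_idxs):
    bsfs = evaluator.evaluate(theta, params, time=time)
    if is_anomalous(bsfs, fgv_idxs):
        warnings.warn("anomalous bSFS evaluation; returning -inf to nlopt")
        return -np.inf
    # composite log-likelihood: observed counts times log-probabilities
    return float(np.sum(data * np.log(np.where(bsfs > 0, bsfs, 1.0))))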

@GertjanBisschop GertjanBisschop modified the milestones: 0.0.3, 0.0.4 Dec 29, 2023