### purpose

figure out whether the 269 replicates that failed when using 20k loci encoded as individual genotypes have any simulation-level parameters in common

### conclusion

all SS-Mtn landscapes completed, compared to ~25% of SS-Clines and Est-Clines landscapes (24% and 26%, respectively). Additionally, the only popsize-migration level that completed 100% was N-variable-m-variable; each of the other popsize and migration levels completed about 36-39%.

In [1]:
from pythonimports import *

import runtime_API as rt
import MVP_summary_functions as mvp

t0 = dt.now()  # notebook timer

rt.latest_commit()
session_info.show()

#########################################################
Today:	August 12, 2024 - 15:14:12 EDT
python version: 3.8.5
conda env: mvp_env

Current commit of pythonimports:
commit 6a767410e7b569adbf9df526de108f22ef50aad8
Author: Brandon Lind <lind.brandon.m@gmail.com>
Date:   Wed Mar 6 13:42:13 2024 -0700

Current commit of MVP_offsets:
commit 5ce82f4d655645237a0f4026fa32e220226dc373
Author: Brandon Lind <lind.brandon.m@gmail.com>
Date:   Thu May 16 13:02:58 2024 -0400

Current commit of MVP_runtime:
commit df87dd2c708ac0fdeebcfaaca239473ca2c487af
Author: Brandon Lind <lind.brandon.m@gmail.com>
Date:   Tue Mar 19 16:17:46 2024 -0400
#########################################################



# get data 

### get simulation parameters

In [2]:
params = mvp.read_params_file()

sub_params = params[params.seed.astype(str).isin(rt.seeds)]

sub_params.shape

100%|███████████████| 2250/2250 [00:02<00:00, 841.13it/s]


(540, 36)

### get results

In [3]:
ind_results = rt.load_results(source='ind', ignore_20k=False)

ind_results.shape

keeping records for models using 20k loci
ind shape = (189100, 17)
Function `load_results` completed after : 0-00:00:18


(189100, 19)

### subset ind results to 20k loci sets

In [4]:
ind = ind_results[ind_results.num_loci == '20000'].copy()

In [5]:
# confirm that fewer seeds completed than were expected (i.e., some jobs failed at the ind level)
len(set(ind.seed)), len(set(sub_params.seed))

(271, 540)
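
The gap between those two counts is the 269 failed replicates named in the purpose (540 - 271). A minimal sketch of how the failed seeds themselves could be listed, using small made-up frames in place of `sub_params` and `ind`. The toy seed values are hypothetical, and casting both sides to `str` is an assumption based on the `astype(str)` used when subsetting `params` above:

```python
import pandas as pd

# toy stand-ins (hypothetical values): one row per seed in the params table,
# many rows per seed in the results table
toy_params = pd.DataFrame({"seed": [101, 102, 103, 104]})
toy_ind = pd.DataFrame({"seed": ["101", "101", "103", "103"]})

# cast both sides to str so an int seed and a str seed compare equal
expected_seeds = set(toy_params["seed"].astype(str))
completed_seeds = set(toy_ind["seed"].astype(str))

# seeds that were expected but never produced results
failed_seeds = sorted(expected_seeds - completed_seeds)
print(len(failed_seeds), failed_seeds)
```

With the real data, the length of the resulting list would be expected to match the 269 failures, provided the seed dtypes are reconciled as assumed here.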

# which subparameters are not equally represented?

In [6]:
def get_counts(column):
    """Get expected counts of simulation seeds and number of seeds that completed evaluation.
    
    Notes
    -----
    - each seed is a simulation replicate
    """
    # sanity check that each rep evaluated 100 gardens
    assert all(ind[column].value_counts() % 100 == 0)
    
    # what are the expected counts of simulation-level parameters?
    expected_counts = sub_params[column].value_counts()

    # what are the actual counts of completed jobs of these simulation-level parameters?
    # (divide row counts by the 100 common gardens evaluated per seed)
    actual_counts = ind[column].value_counts() / 100

    return expected_counts, actual_counts

In [7]:
rt.hue_order.keys()

dict_keys(['landscape', 'glevel', 'pleio', 'slevel', 'popsize', 'migration', 'noncausal_env', 'marker_set', 'program', 'demography', 'num_loci', 'final_la_bin', 'source'])

In [8]:
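# print the fraction of expected seeds that completed for each simulation-level parameter
# (a ratio below 1 means fewer seeds completed than expected)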
for column in rt.hue_order.keys():
    if column in params.columns.tolist():
        print(ColorText(f'\n{column}').bold().blue())
        
        expected_counts, actual_counts = get_counts(column)

        print(
            ColorText(
                str(actual_counts / expected_counts)
            )
        )


landscape
Est-Clines    0.261111
SS-Clines     0.244444
SS-Mtn        1.000000
Name: landscape, dtype: float64

glevel
highly-polygenic    0.505556
mod-polygenic       0.500000
oligogenic          0.500000
Name: glevel, dtype: float64

pleio
no pleiotropy    0.511111
pleiotropy       0.492593
Name: pleio, dtype: float64

slevel
equal-S      0.511111
unequal-S    0.492593
Name: slevel, dtype: float64

popsize
N-cline-N-to-S            0.370370
N-cline-center-to-edge    0.361111
N-equal                   0.388889
N-variable                1.000000
Name: popsize, dtype: float64

migration
m-breaks      0.388889
m-constant    0.373457
m-variable    1.000000
Name: migration, dtype: float64


# sanity check

confirm that `/` between two `pd.Series` aligns on index labels rather than position (this matters because `expected_counts` and `actual_counts` are ordered differently)

In [9]:
expected_counts, actual_counts = get_counts('landscape')

In [10]:
expected_counts

Est-Clines    180
SS-Clines     180
SS-Mtn        180
Name: landscape, dtype: int64

In [11]:
actual_counts

SS-Mtn        180.0
Est-Clines     47.0
SS-Clines      44.0
Name: landscape, dtype: float64

In [12]:
actual_counts / expected_counts

Est-Clines    0.261111
SS-Clines     0.244444
SS-Mtn        1.000000
Name: landscape, dtype: float64

In [13]:
for subparam in expected_counts.index:
    print(subparam, actual_counts.loc[subparam] / expected_counts.loc[subparam])

Est-Clines 0.2611111111111111
SS-Clines 0.24444444444444444
SS-Mtn 1.0
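
One caveat with index alignment: a level with zero completed seeds would be absent from `actual_counts`, so its ratio would come out as `NaN` rather than 0. That does not happen in the output above, where every level has at least some completions. A small sketch with made-up counts (the level names `Toy-A`, `Toy-B`, and `Toy-C` are hypothetical):

```python
import pandas as pd

expected = pd.Series({"Toy-A": 10, "Toy-B": 10, "Toy-C": 10})
# different order than `expected`, and Toy-C is missing entirely
actual = pd.Series({"Toy-B": 5.0, "Toy-A": 10.0})

# division aligns on labels; the missing label yields NaN
ratio = actual / expected

# reindexing to the expected labels first turns the missing level into 0
filled = actual.reindex(expected.index, fill_value=0) / expected

print(ratio)
print(filled)
```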


In [14]:
formatclock(dt.now() - t0)

'0-00:00:22'