This notebook simulates phenotypes from standardized genetic effect sizes by calling the sim_phenotypes function with the standardized parameter set to True. The resulting dataframe is shown as well. A second example passes custom effect sizes to sim_phenotypes_custom.

In [None]:
import numpy as np
import pygrgl
import matplotlib.pyplot as plt 

from grg_pheno_sim.phenotype import sim_phenotypes, sim_phenotypes_custom


The following cell only converts the gzipped VCF file into a GRG for use in the phenotype simulation, and it skips the conversion if the GRG file already exists. The bash script works as expected provided the relative path to the source data file is correct.

In [None]:
%%script bash --out /dev/null
if [ ! -f test-200-samples.grg ]; then
  grg construct -p 10 ../data/test-200-samples.vcf.gz --out-file test-200-samples.grg
fi

In [None]:
grg_1 = pygrgl.load_immutable_grg("test-200-samples.grg")  # load a sample GRG stored in the same directory

heritability = 0.33

phenotypes_standardized_genes = sim_phenotypes(grg_1, heritability=heritability, standardized=True)

The initial effect sizes are 
     mutation_id  effect_size  causal_mutation_id
0             43    -0.043778                   0
1             49    -0.004024                   0
2             54    -0.005199                   0
3             55     0.020529                   0
4             67     0.000948                   0
..           ...          ...                 ...
995        10798     0.001852                   0
996        10804     0.018198                   0
997        10844    -0.007695                   0
998        10860    -0.022206                   0
999        10888    -0.018921                   0

[1000 rows x 3 columns]


AttributeError: '_grgl.GRG' object has no attribute 'num_individuals'

In [None]:
phenotypes_standardized_genes

Unnamed: 0,individual_id,genetic_value,causal_mutation_id,environmental_noise,phenotype
0,0,-0.698857,0,0.241578,-0.457280
1,1,0.285342,0,-0.824492,-0.539151
2,2,0.270383,0,0.594952,0.865335
3,3,-0.418744,0,0.745673,0.326929
4,4,-0.093789,0,-1.546767,-1.640556
...,...,...,...,...,...
195,195,-0.182525,0,-0.709333,-0.891858
196,196,-0.143619,0,0.510025,0.366406
197,197,-0.698934,0,-0.312840,-1.011774
198,198,-0.472975,0,-0.004061,-0.477036
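
In the table above, each phenotype is the sum of genetic_value and environmental_noise. The sketch below shows one common way to draw noise so that the genetic variance makes up a chosen fraction (the heritability) of the phenotypic variance. It is a minimal illustration on toy data, not the grg_pheno_sim implementation: the helper name add_environmental_noise and the toy genetic values are assumptions, and the library may apply additional normalization.

```python
import numpy as np

def add_environmental_noise(genetic_values, heritability, rng):
    """Hypothetical helper: return (noise, phenotype) with Var(G) / Var(P) close to `heritability`."""
    var_g = np.var(genetic_values)
    # h2 = var_g / (var_g + var_e)  =>  var_e = var_g * (1 - h2) / h2
    var_e = var_g * (1.0 - heritability) / heritability
    noise = rng.normal(loc=0.0, scale=np.sqrt(var_e), size=genetic_values.shape)
    return noise, genetic_values + noise

rng = np.random.default_rng(0)
genetic_values = rng.normal(size=100_000)  # toy values, not taken from the GRG
noise, phenotype = add_environmental_noise(genetic_values, 0.33, rng)

# Realized heritability on the simulated sample; it fluctuates around the target.
realized_h2 = np.var(genetic_values) / np.var(phenotype)
print(f"realized heritability: {realized_h2:.3f}")
```

With only 200 individuals, as in this notebook, the realized ratio can deviate noticeably from the target because both variances are estimated from a small sample.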


In [None]:
input_effects_dict = {
    1:  0.50,   # mutation 1 has effect +0.5
    11: -0.30,  # mutation 11 has effect -0.3
    50:  0.10   # mutation 50 has effect +0.1
}
df = sim_phenotypes_custom(
    grg_1,
    input_effects=input_effects_dict,  # or beta_list, or input_effects_df
    heritability=0.7,
    standardized=True
)


The initial effect sizes are 
   mutation_id  effect_size  causal_mutation_id
0            1          0.5                   0
1           11         -0.3                   0
2           50          0.1                   0
The genetic values of the individuals are 
     individual_id  genetic_value  causal_mutation_id
0                0      -0.061468                   0
1                1      -0.061468                   0
2                2      -0.061468                   0
3                3      -0.061468                   0
4                4      -0.061468                   0
..             ...            ...                 ...
195            195      -0.061468                   0
196            196       4.036411                   0
197            197      -0.061468                   0
198            198      -0.061468                   0
199            199      -0.061468                   0

[200 rows x 3 columns]
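
Most individuals above share the same genetic value. That pattern is what one would expect if genotypes are centered and scaled before being multiplied by the effect sizes, because every individual carrying none of the causal mutations then receives the same baseline. The sketch below illustrates this with a toy genotype matrix and the common standardization (x - 2p) / sqrt(2p(1 - p)). Whether grg_pheno_sim uses exactly this convention is an assumption, and the genotype matrix is made up for illustration.

```python
import numpy as np

# Toy genotype matrix (assumed data): rows are individuals,
# columns are mutations, entries are alternate-allele counts (0, 1, or 2).
genotypes = np.array([
    [0, 0, 0],
    [0, 0, 0],
    [2, 1, 0],
    [0, 0, 1],
], dtype=float)
effects = np.array([0.5, -0.3, 0.1])  # same effect sizes as the custom example

freq = genotypes.mean(axis=0) / 2.0       # allele frequency p of each mutation
sd = np.sqrt(2.0 * freq * (1.0 - freq))   # binomial standard deviation of the count
standardized = (genotypes - 2.0 * freq) / sd

# Genetic value of each individual: standardized genotypes times effect sizes.
genetic_values = standardized @ effects
print(genetic_values)
```

The first two individuals carry no causal alleles and therefore receive identical genetic values, while carriers are shifted away from that baseline.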
