# **get the numbers to Rob to calculate how far out of equilibrium the genome is**

### **calculation summary**
- µ = mutation rate from the 104K paper = 1.28e-8
- M = mutability of the human genome = 4.054508384615098e-05
- M * c = µ
- c = µ/M = 0.00031569795363032963
- Mo = observed intergenic mutability = 3.16788354502687e-05
- Ms = simulated intergenic mutability = 2.936349e-05
- µo = Mo * c = observed intergenic mutation rate = 1.0000943525041771e-08
- µs = Ms * c = simulated intergenic mutation rate = 9.269993704444648e-09
- µo - µs = 7.309498205971238e-10
- µGC = mutation rate of a sequence with the same GC content as the simulated sequences at equilibrium but NO triplet structure.
    - GC content at equilibrium = 0.351194
    - mutability of seq with same GC content = 4.230043402232742e-05
    - mut rate of seq with same GC content = 1.3354160458523541e-08

In [1]:
!ls -lah madeleine/

total 1.2G
drwxr-xr-x. 3 omanmade domain users 4.0K Jan 14 16:50 .
drwxr-xr-x. 6 omanmade domain users 4.0K Jan 14 17:01 ..
-rwxr-xr-x. 1 omanmade domain users 101M Sep 16 10:27 CDS_GRCh38_fromOGfileUnsorted.txt
-rwxr-xr-x. 1 omanmade domain users  19M Sep 16 10:27 CDS_maxBounds_dict.txt
-rwxr-xr-x. 1 omanmade domain users 249M Sep 16 10:27 Exons_GRCh38_fromOGfileUnsorted.txt
-rwxr-xr-x. 1 omanmade domain users 249M Sep 16 10:27 Exons_GRCh38.txt
-rwxr-xr-x. 1 omanmade domain users  40M Sep 16 10:27 Homo_sapiens.GRCh38.100.chr.gff3.gz
-rwxr-xr-x. 1 omanmade domain users  44M Sep 16 10:27 Homo_sapiens.GRCh38.100.gff3.gz
-rwxr-xr-x. 1 omanmade domain users 441M Sep 16 10:27 Homo_sapiens_sorted.GRCh38.100.gff3
-rwxr-xr-x. 1 omanmade domain users  42M Sep 16 10:27 Homo_sapiens_sorted.GRCh38.100.gff3.gz
drwxr-xr-x. 2 omanmade domain users  142 Sep 16 10:27 .ipynb_checkpoints
-rwxr-xr-x. 1 omanmade domain users  12K Sep 16 10:27 README


In [53]:
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from numpy.random import choice

In [5]:
# step 1: get mutability of the human genome 
jonsson_denovo_muts = 108778
jonsson_callable_bases = 2682890000
mutabilityHumanGenome = jonsson_denovo_muts/jonsson_callable_bases

In [80]:
mutabilityHumanGenome

4.054508384615098e-05

In [6]:
# step 2: find the conversion factor between mutation rate and mutability
mutRateHumanGenome = 1.28e-8
mutab_mutRate_conversion = mutRateHumanGenome/mutabilityHumanGenome

In [7]:
mutab_mutRate_conversion

0.00031569795363032963

### **step #3: intergenic mutability**

In [81]:
#from the whole genome intergenic vs codon calculations 
mutab_intergenicgenome = 3.16788354502687e-05
mutRateObs_intergenic = mutab_intergenicgenome * mutab_mutRate_conversion
mutRateObs_intergenic

1.0000943525041771e-08

In [82]:
#from the ../comparing_equilibrium_point
mutab_nonCoding = 2.936349e-05
mutRateSim_nonCoding = mutab_nonCoding*mutab_mutRate_conversion
mutRateSim_nonCoding

9.269993704444648e-09

In [83]:
mutRateObs_intergenic - mutRateSim_nonCoding

7.309498205971238e-10
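The absolute difference can also be expressed relative to the observed rate, which may be easier to interpret. A minimal sketch, assuming a relative gap is a useful summary here (the notebook itself only reports the absolute difference); `relative_gap` is a name introduced for illustration, and the inputs are copied from the cells above:

```python
# values copied from the cells above
mutab_mutRate_conversion = 1.28e-8 / (108778 / 2682890000)
mutab_intergenicgenome = 3.16788354502687e-05  # observed intergenic mutability
mutab_nonCoding = 2.936349e-05                 # simulated (equilibrium) mutability

mutRateObs_intergenic = mutab_intergenicgenome * mutab_mutRate_conversion
mutRateSim_nonCoding = mutab_nonCoding * mutab_mutRate_conversion

# hypothetical summary: fraction by which the observed rate exceeds the simulated one
relative_gap = (mutRateObs_intergenic - mutRateSim_nonCoding) / mutRateObs_intergenic
print("relative gap: {:.1%}".format(relative_gap))
```

Because the conversion factor multiplies both rates, it cancels in this ratio, so the same fraction (roughly 7%) follows directly from the two mutabilities.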

### **step 4: find the GC content of the noncoding sequences at equilibrium**

In [42]:
# load the noncoding DNA from the "comparing equilibrium" folder
trialN = 99
random_DNAF_dict = {"gens": list(range(10000))}
for i in range(trialN):
    # keep only the first line of each trial file
    with open("../comparing_equilibrium_point/data/trueRandom/{}_DNA.txt".format(i)) as f:
        random_DNAF_dict["trial"+str(i)] = f.readlines()[0]

In [48]:
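GC_percent_list = []
# note: this loop also visits the "gens" entry, a list of ints with no "C" or "G",
# so one value of 0 is appended alongside the per-trial GC fractions
for trial_name, dna in random_DNAF_dict.items():
    C_count = dna.count("C")
    G_count = dna.count("G")
    GC_percent = (C_count + G_count) / len(dna)
    GC_percent_list.append(GC_percent)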

In [52]:
GC_content_equilibrium = np.mean(GC_percent_list)

In [84]:
GC_content_equilibrium

0.351194
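The dictionary built above also holds a `gens` entry that is not a DNA string, and lines returned by `readlines()` may carry a trailing newline. A sketch of a GC calculation restricted to the trial sequences, assuming both points matter for the result; `gc_fraction` and the toy dictionary are hypothetical and not part of the original notebook:

```python
def gc_fraction(dna):
    """Fraction of G and C bases in a DNA string, ignoring surrounding whitespace."""
    dna = dna.strip()
    return (dna.count("G") + dna.count("C")) / len(dna)

# toy stand-in for random_DNAF_dict (illustrative data only)
toy_dict = {
    "gens": list(range(8)),
    "trial0": "ATGCATGC\n",
    "trial1": "AAAATTGC\n",
}

# keep only the entries whose key marks them as a trial sequence
gc_values = [
    gc_fraction(dna)
    for name, dna in toy_dict.items()
    if name.startswith("trial")
]
gc_mean = sum(gc_values) / len(gc_values)
print(gc_mean)
```

Filtering on the key keeps the denominator of the mean equal to the number of trials, so a non-sequence entry cannot pull the average toward zero.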

### **step 4.1: find the mutability of a DNA string with the same GC content as the noncoding sequences at equilibrium**

In [56]:
# single bases and their sampling probabilities: A and T split (1 - GC), C and G split GC
trips = ["A","T","C","G"]
trip_prob = [(1-GC_content_equilibrium)/2,(1-GC_content_equilibrium)/2,GC_content_equilibrium/2,GC_content_equilibrium/2]

In [61]:
equiv_dna = "".join(choice(trips, 10000, p=trip_prob))#randomly sample and combine into string 
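The sampled string changes on every run because `choice` draws from NumPy's global random state. A sketch of a reproducible variant using a seeded generator; the seed value is arbitrary, the GC content is hard-coded from the output above, and the names `bases`, `base_prob`, and `rng` are introduced only for this example:

```python
import numpy as np

GC_content_equilibrium = 0.351194  # value reported above
bases = ["A", "T", "C", "G"]
base_prob = [(1 - GC_content_equilibrium) / 2] * 2 + [GC_content_equilibrium / 2] * 2

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability
equiv_dna = "".join(rng.choice(bases, size=10000, p=base_prob))

# the sampled GC fraction should sit close to the target value
sampled_gc = (equiv_dna.count("G") + equiv_dna.count("C")) / len(equiv_dna)
print(len(equiv_dna), sampled_gc)
```

With a fixed seed, the downstream mutability estimate becomes repeatable, though it will not match a number produced from an unseeded draw.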

In [65]:
with open("../Human_mutability_model/Model_2020_12_02_genomeWide.txt") as f:
    model = json.load(f)

In [66]:
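# look up the model mutability for the triplet centered on each interior position
equi_muts = []
for i in range(1, len(equiv_dna)-1):
    triplet = equiv_dna[i-1:i+2]
    mut = model[triplet][0]  # first element of the model entry is used as the mutability
    equi_muts.append(mut)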

In [71]:
print("mutability of equivalent sequence is ",np.mean(equi_muts))

mutability of equivalent sequence is  4.230043402232742e-05


In [86]:
print("mutation rate of equivalent sequence is ",np.mean(equi_muts)*mutab_mutRate_conversion)

mutation rate of equivalent sequence is  1.3354160458523541e-08
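Because the bases of the equivalent sequence are drawn independently, its mean mutability can also be computed as a probability-weighted average over all 64 triplets, which avoids the sampling noise of a single 10,000-base string. A sketch under the assumption that the model maps each triplet to a list whose first element is its mutability, as the lookup `model[triplet][0]` suggests; `expected_mutability` is a hypothetical helper and the toy model values are made up:

```python
from itertools import product

def expected_mutability(model, gc_content):
    """Probability-weighted mean of model[triplet][0] under independent bases."""
    base_prob = {
        "A": (1 - gc_content) / 2,
        "T": (1 - gc_content) / 2,
        "C": gc_content / 2,
        "G": gc_content / 2,
    }
    total = 0.0
    for b1, b2, b3 in product("ATCG", repeat=3):
        weight = base_prob[b1] * base_prob[b2] * base_prob[b3]
        total += weight * model[b1 + b2 + b3][0]
    return total

# toy model (illustrative values only): triplets centered on C or G mutate twice as fast
toy_model = {
    "".join(t): [2e-5 if t[1] in "CG" else 1e-5]
    for t in product("ATCG", repeat=3)
}
toy_expected = expected_mutability(toy_model, 0.351194)
print(toy_expected)
```

Passing the real model and `GC_content_equilibrium` to the same function would give a value to compare against the sampled mean; multiplying it by the conversion factor would then give the corresponding mutation rate.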
