# Random k Rare SNPs Example

In this example we simulate a trait from an Exponential distribution using a prespecified number k of randomly selected rare SNPs, with effect sizes simulated based on their minor allele frequencies.

Using SnpArrays.jl, we read in the set of PLINK files, filter them according to our desired thresholds, and save the filtered files to our own machine.


In [1]:
using SnpArrays, TraitSimulation, LinearAlgebra, DataFrames



In [2]:
# Number of SNPs to use in the simulation (user-specified)
k = 5
# Exponentially distributed response with a log link
ResponseDistribution = ExponentialResponse()
InverseLinkFunction = LogLink();

# Data

Our data come from the data directory of the SnpArrays.jl package. We filter the set of PLINK files with SnpArrays.jl and save the filtered PLINK files to our own machine.


## Input PLINK files

In [3]:
filepath = SnpArrays.datadir("EUR_subset")

"/Users/sarahji/.julia/packages/SnpArrays/d0iJw/src/../data/EUR_subset"

In [4]:
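EUR_snpdata = SnpArray(filepath * ".bed")
# Keep individuals and SNPs with a genotyping success rate of at least 98%, and SNPs with MAF of at least 2%
rowmask, colmask = SnpArrays.filter(EUR_snpdata, min_success_rate_per_row = 0.98, min_success_rate_per_col = 0.98, min_maf = 0.02);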
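
To make the three thresholds concrete, the following sketch recomputes call rates and minor allele frequencies on a small hand-made genotype matrix in plain Julia. It is only an illustration of the quantities involved, done in a single pass; it is not how `SnpArrays.filter` is implemented. The toy matrix `G`, the helpers `call_rate` and `minor_allele_freq`, and the looser 0.75 call-rate threshold are all hypothetical.

```julia
# Toy genotype matrix: rows are individuals, columns are SNPs.
# Entries count copies of one allele (0, 1, or 2); `missing` marks a failed call.
G = [0 1 2 missing;
     0 2 2 0;
     1 2 2 0;
     0 1 2 0;
     0 2 2 0]

# Fraction of non-missing genotypes in a row or column.
call_rate(v) = count(!ismissing, v) / length(v)

# Minor allele frequency of one SNP, ignoring missing genotypes.
function minor_allele_freq(snp)
    observed = collect(skipmissing(snp))
    p = sum(observed) / (2 * length(observed))   # frequency of the counted allele
    return min(p, 1 - p)
end

row_rates = [call_rate(G[i, :]) for i in axes(G, 1)]
col_rates = [call_rate(G[:, j]) for j in axes(G, 2)]
mafs      = [minor_allele_freq(G[:, j]) for j in axes(G, 2)]

# Single-pass masks with toy thresholds (hypothetical values)
toy_rowmask = row_rates .>= 0.75
toy_colmask = (col_rates .>= 0.75) .& (mafs .>= 0.02)
```

In this toy data the last two SNPs are monomorphic, so their minor allele frequency is zero and `toy_colmask` drops them, while every individual passes the row threshold.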

In [5]:
# `filepath` is already the full path prefix returned by SnpArrays.datadir
SnpData(filepath);

# Example: Exponential GLM Trait

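$$
Y_i \sim \text{Exponential}(\mu_i), \qquad \log(\mu_i) = \eta_i
$$

Here $\mu_i$ is the mean of the trait for individual $i$ and $\eta_i$ is the linear predictor defined by `meanformula`.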
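
The Exponential response and log link selected earlier (`ExponentialResponse()` and `LogLink()`) can be illustrated without any packages beyond Julia's standard library. The sketch below is a minimal stand-in for the idea, not TraitSimulation.jl's implementation: the genotype matrix `X`, the effect sizes `β`, and the seed are made up for illustration.

```julia
using Random

# Hypothetical inputs: 6 individuals, 2 SNPs, made-up effect sizes.
X = [2.0 1.0;
     2.0 2.0;
     1.0 2.0;
     0.0 1.0;
     2.0 0.0;
     1.0 1.0]
β = [0.3, -0.5]

η = X * β        # linear predictor, one value per individual
μ = exp.(η)      # inverse of the log link: the mean of each individual's trait

# randexp draws from an Exponential distribution with mean 1;
# multiplying by μ[i] rescales the draw to have mean μ[i].
rng = MersenneTwister(2019)
y_toy = μ .* randexp(rng, length(μ))
```

Because the log link exponentiates the linear predictor, every mean is positive regardless of the signs of the effect sizes, which is what an Exponential response requires.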


In [7]:
meanformula, ES, df = Generate_Random_Model_Chisq("tmp_filtered_EURdata", k)
df

First 10 rows of `df` (all columns are `Float64`):

| Row | rs34151105 | rs113560219 | rs1882989 | rs8069133 | rs112221137 |
|---|---|---|---|---|---|
| 1 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 |
| 2 | 2.0 | 2.0 | 1.0 | 2.0 | 2.0 |
| 3 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| 4 | 2.0 | 2.0 | 0.0 | 2.0 | 2.0 |
| 5 | 2.0 | 2.0 | 0.0 | 2.0 | 2.0 |
| 6 | 1.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| 7 | 1.0 | 2.0 | 1.0 | 1.0 | 2.0 |
| 8 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 |
| 9 | 2.0 | 2.0 | 0.0 | 1.0 | 2.0 |
| 10 | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 |


In [8]:
ES

5-element Array{Float64,1}:
  0.33691883856125   
 -0.48312605987491186
 -0.20004457041833135
  0.21403891114130175
  0.34764424897095303
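
The text states that effect sizes depend on minor allele frequency but does not show the rule. One possible rule, suggested by the `Chisq` in the function name but not confirmed by this notebook, is to evaluate the chi-squared density with one degree of freedom at each SNP's MAF, so that rarer variants receive larger magnitudes, and then attach a random sign. The sketch below is an assumption for illustration only: `chisq1_pdf`, `toy_effect_sizes`, and the MAF values are hypothetical, and the numbers it produces are not expected to match `ES`.

```julia
using Random

# Density of the chi-squared distribution with 1 degree of freedom, for x > 0.
chisq1_pdf(x) = exp(-x / 2) / sqrt(2π * x)

# Hypothetical rule: magnitude = density evaluated at the MAF, sign chosen at random.
function toy_effect_sizes(mafs; rng = Random.GLOBAL_RNG)
    signs = rand(rng, [-1.0, 1.0], length(mafs))
    return signs .* chisq1_pdf.(mafs)
end

toy_mafs = [0.02, 0.05, 0.10, 0.25, 0.40]   # made-up minor allele frequencies
toy_ES = toy_effect_sizes(toy_mafs; rng = MersenneTwister(1))
```

Since this density decreases for positive arguments, the magnitudes in `toy_ES` shrink as the MAF grows, which encodes the common assumption that rarer variants have larger effects.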

In [10]:
model_rare = GLMTrait(meanformula, df, ResponseDistribution, InverseLinkFunction)
simulated_trait = DataFrame(SimTrait = simulate(model_rare))

First 10 rows of `simulated_trait` (`SimTrait` is `Float64`):

| Row | SimTrait |
|---|---|
| 1 | 0.127165 |
| 2 | 0.167254 |
| 3 | 0.0717752 |
| 4 | 0.300423 |
| 5 | 1.12006 |
| 6 | 0.0715877 |
| 7 | 1.82175 |
| 8 | 0.355358 |
| 9 | 1.61617 |
| 10 | 0.550812 |


In [8]:
describe(simulated_trait)

| Row | variable | mean | min | median | max | nunique | nmissing | eltype |
|---|---|---|---|---|---|---|---|---|
| 1 | SimTrait | 0.755844 | 0.00750831 | 0.431558 | 9.8021 | | | Float64 |


In [12]:
y = Matrix(simulated_trait)[:]

379-element Array{Float64,1}:
 0.12716478511016469
 0.16725443529787087
 0.07177516747909278
 0.3004234733613135 
 1.1200645810021495 
 0.0715877141221013 
 1.8217490687080495 
 0.3553584958639212 
 1.6161700489290174 
 0.5508123951094465 
 0.07134615474953974
 0.8189498087892602 
 2.504663655593077  
 ⋮                  
 0.7185580220290531 
 0.06152095650583073
 0.3208334025357145 
 0.547671050086926  
 0.06308026691745156
 0.7726246115550569 
 0.27501931538928687
 0.3129044787998209 
 0.5102493972192254 
 0.36916085218768596
 0.3875262368116244 
 0.00925508978215359

In [15]:
z = ones(length(y))
Eurbm = SnpBitMatrix{Float64}(EUR_snpdata, model = ADDITIVE_MODEL, center = true, scale = true);


# Output: Save Filtered PLINK files to local machine

In [9]:
# Write the filtered PLINK files with the prefix "tmp_filtered_EURdata"
filtsnpdata = SnpArrays.filter(filepath, rowmask, colmask, des = "tmp_filtered_EURdata");