# Random k Rare SNPs Example

In this example we use SnpArrays.jl to read genotype data and simulate a trait from an Exponential distribution. The trait depends on a prespecified number k of SNPs, with simulated effect sizes based on their minor allele frequencies.

Using SnpArrays.jl, we read in a set of PLINK files, filter them according to our desired parameters, and save the filtered files to our own machine.


In [1]:
using SnpArrays, TraitSimulation, LinearAlgebra, DataFrames

In [2]:
# User specifies the number of SNPs to use for simulation
k = 5
ResponseDistribution = ExponentialResponse()
InverseLinkFunction = LogLink();

# Data

Our data come from the data directory of the SnpArrays.jl package. With SnpArrays.jl we can filter the set of PLINK files and save the filtered files to our own machine.
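To make the filtering criteria concrete, here is a minimal sketch of what a per-sample success rate, a per-SNP success rate, and a minor allele frequency (MAF) threshold mean on a toy genotype matrix. It uses only the Julia standard library; the matrix `G`, the helper `toy_maf`, and the `_toy` mask names are hypothetical and not part of SnpArrays.jl. The package's own `SnpArrays.filter` may behave differently, for example by repeating the check until both masks stabilize.

```julia
using Statistics

# Toy genotype matrix (hypothetical data): rows are samples, columns are SNPs.
# Entries count copies of one allele (0, 1 or 2); `missing` is a failed call.
G = Union{Missing, Int}[
    0  1  2        0
    1  1  missing  0
    2  0  2        0
    1  2  2        0
]

# Fraction of successfully called genotypes per sample (row) and per SNP (column).
row_success = vec(mean(.!ismissing.(G); dims = 2))
col_success = vec(mean(.!ismissing.(G); dims = 1))

# Minor allele frequency of one SNP column, ignoring missing calls.
function toy_maf(col)
    calls = collect(skipmissing(col))
    isempty(calls) && return 0.0
    p = sum(calls) / (2 * length(calls))   # frequency of the counted allele
    return min(p, 1 - p)
end

col_maf = [toy_maf(view(G, :, j)) for j in 1:size(G, 2)]

# Single-pass masks using the same thresholds as in this example.
rowmask_toy = row_success .>= 0.98
colmask_toy = (col_success .>= 0.98) .& (col_maf .>= 0.02)
```

In this toy matrix the second sample and the third SNP fail the success-rate threshold, and the fourth SNP is monomorphic, so it fails the MAF threshold.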


## Input PLINK files

In [3]:
filepath = SnpArrays.datadir("EUR_subset")

"/Users/sarahji/.julia/packages/SnpArrays/d0iJw/src/../data/EUR_subset"

In [4]:
EUR_snpdata = SnpArray(filepath * ".bed")
# Keep samples and SNPs with at least 98% genotyping success, and SNPs with MAF of at least 0.02
rowmask, colmask = SnpArrays.filter(EUR_snpdata,
    min_success_rate_per_row = 0.98,
    min_success_rate_per_col = 0.98,
    min_maf = 0.02);

In [5]:
SnpData(SnpArrays.datadir(filepath));

# Example: Exponential GLM Trait

$$
Y_i ∼ \text{Exponential}(\mu_i), \qquad \log(\mu_i) = \mathbf{x}_i^T \boldsymbol{\beta}
$$
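As a rough illustration of what an exponential response with a log link means, the sketch below draws a trait by hand using only the Julia standard library. The design matrix `X`, the effect sizes `β`, and the seed are all hypothetical, and this is not how TraitSimulation.jl implements `simulate`; it only shows the generative idea: form the linear predictor, exponentiate it to get the mean, then draw an exponential variate with that mean.

```julia
using Random

rng = MersenneTwister(2024)            # arbitrary seed, for reproducibility

n = 1_000                               # number of samples (hypothetical)
X = Float64.(rand(rng, 0:2, n, 2))      # toy genotype dosages for two SNPs
β = [0.3, -0.2]                         # hypothetical effect sizes

η = X * β                               # linear predictor
μ = exp.(η)                             # inverse of the log link: μ = exp(η)

# randexp draws Exponential variates with mean 1; scaling by μ gives mean μ.
y = μ .* randexp(rng, n)
```

Because the log link exponentiates the linear predictor, every mean is strictly positive, which is what an Exponential response requires.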


In [6]:
meanformula, df = Generate_Random_Model_Chisq("tmp_filtered_EURdata", k)
df

| Row | rs34151105 | rs113560219 | rs1882989 | rs8069133 | rs112221137 |
| --- | --- | --- | --- | --- | --- |
|  | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 |
| 2 | 2.0 | 2.0 | 1.0 | 2.0 | 2.0 |
| 3 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| 4 | 2.0 | 2.0 | 0.0 | 2.0 | 2.0 |
| 5 | 2.0 | 2.0 | 0.0 | 2.0 | 2.0 |
| 6 | 1.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| 7 | 1.0 | 2.0 | 1.0 | 1.0 | 2.0 |
| 8 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 |
| 9 | 2.0 | 2.0 | 0.0 | 1.0 | 2.0 |
| 10 | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 |
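The selected SNP columns enter the mean formula with effect sizes tied to their minor allele frequencies. The exact rule used by `Generate_Random_Model_Chisq` is not shown here, so the sketch below uses a purely illustrative, hypothetical rule in which rarer variants receive larger effects, `β_j = c / sqrt(2 * maf_j * (1 - maf_j))`. The MAF values and the constant `c` are made up for the illustration.

```julia
# Hypothetical minor allele frequencies for k = 5 SNPs (not taken from the data).
maf = [0.02, 0.03, 0.05, 0.10, 0.20]

# Hypothetical scaling constant controlling the overall effect magnitude.
c = 0.1

# Illustrative rule: effect size inversely proportional to the genotype
# standard deviation under Hardy-Weinberg equilibrium, sqrt(2p(1 - p)).
β = c ./ sqrt.(2 .* maf .* (1 .- maf))

for (p, b) in zip(maf, β)
    println("maf = ", p, "  effect size = ", round(b; digits = 3))
end
```

Under this rule the rarest SNP gets the largest effect size, which is one common way to let rare variants contribute appreciably to a simulated trait.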


In [7]:
model_rare = GLMTrait(meanformula, df, ResponseDistribution, InverseLinkFunction)
simulated_trait = DataFrame(SimTrait = simulate(model_rare))

| Row | SimTrait |
| --- | --- |
|  | Float64 |
| 1 | 3.16846 |
| 2 | 1.44674 |
| 3 | 1.53901 |
| 4 | 0.126399 |
| 5 | 0.00803335 |
| 6 | 0.19885 |
| 7 | 1.33937 |
| 8 | 0.261177 |
| 9 | 0.0839421 |
| 10 | 0.195838 |


In [8]:
describe(simulated_trait)

| Row | variable | mean | min | median | max | nunique | nmissing | eltype |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | Symbol | Float64 | Float64 | Float64 | Float64 | Nothing | Nothing | DataType |
| 1 | SimTrait | 0.755844 | 0.00750831 | 0.431558 | 9.8021 |  |  | Float64 |


# Output: Save Filtered PLINK files to local machine

In [9]:
filtsnpdata = SnpArrays.filter(SnpArrays.datadir(filepath), rowmask, colmask, des = "tmp_filtered_EURdata");
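
As a quick sanity check after saving, one could confirm that the filtered file set exists under the chosen prefix. This sketch assumes the files are written to the current working directory with the usual PLINK extensions `.bed`, `.bim` and `.fam`; the helper name `plink_files_present` is hypothetical.

```julia
# Hypothetical helper: report which PLINK files exist for a given prefix.
function plink_files_present(prefix::AbstractString)
    return Dict(ext => isfile(prefix * ext) for ext in (".bed", ".bim", ".fam"))
end

status = plink_files_present("tmp_filtered_EURdata")
for (ext, found) in status
    println("tmp_filtered_EURdata", ext, found ? " found" : " not found")
end
```

Since `Generate_Random_Model_Chisq` reads from the `tmp_filtered_EURdata` prefix, a check like this can also be run before that step to make sure the filtered files are already in place.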