# Subset and impute mouse data

In [1]:
using SnpArrays

## Subset

Because running fastPHASE takes a long time, let's subset the mouse data to its first 300 samples.

In [2]:
datapath = normpath(SnpArrays.datadir())
mouse = SnpData(joinpath(datapath, "mouse"))
n, p = size(mouse)

SnpArrays.filter(mouse, 1:300, 1:p, des="mouse")

SnpData(people: 300, snps: 10150,
snp_info: 
 Row │ chromosome  snpid       genetic_distance  position  allele1  allele2
     │ String      String      Float64           Int64     String   String
─────┼──────────────────────────────────────────────────────────────────────
   1 │ 1           rs3683945           0.0              0  A        G
   2 │ 1           rs3707673           0.1              1  G        A
   3 │ 1           rs6269442           0.11751          2  A        G
   4 │ 1           rs6336442           0.135771         3  A        G
   5 │ 1           rs13475700          0.24268          5  A        C
   6 │ 1           rs3658242           0.251925         6  A        T
…,
person_info: 
 Row │ fid        iid         father        mother        sex        phenotype
     │ Abstract…  Abstract…   Abstract…     Abstract…     Abstract…  Abstract…
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ 1_3        A048005080  H2.3:C5.2(3)  H2.3:G2.

## Impute 

Because the mouse data have missing genotypes, let's impute every missing genotype with 0.
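Before imputing, it can help to see how much is actually missing. The following is only a minimal sketch on a toy `Matrix{UInt8}`, not part of the original notebook: the helper `missing_rate` and the matrix `g` are hypothetical names introduced for illustration. It relies on the 2-bit coding that SnpArrays uses, where `0x01` marks a missing genotype. Since a `SnpArray` is an `AbstractMatrix{UInt8}`, the same calls should apply to it as well.

```julia
# Toy genotype matrix (rows = samples, columns = SNPs) in 2-bit codes:
# 0x00 = A1/A1, 0x01 = missing, 0x02 = A1/A2, 0x03 = A2/A2.
g = UInt8[0x00 0x01 0x03;
          0x01 0x02 0x03;
          0x02 0x01 0x03;
          0x03 0x00 0x03]

# Hypothetical helper: fraction of missing entries in each SNP (column).
missing_rate(m::AbstractMatrix{UInt8}; code::UInt8 = 0x01) =
    vec(sum(m .== code, dims = 1)) ./ size(m, 1)

rates = missing_rate(g)        # per-SNP missing rate
total = count(==(0x01), g)     # total number of missing genotypes
```

On this toy matrix, the first SNP has one missing entry out of four samples, the second has two, and the third has none.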

In [3]:
# copy the bim/fam files; only the bed file (genotypes) changes
cp("mouse.fam", "mouse.imputed.fam", force=true)
cp("mouse.bim", "mouse.imputed.bim", force=true)

"mouse.imputed.bim"

In [4]:
# impute all missing genotypes with 0
# SnpArrays 2-bit codes: 0x00 = A1/A1, 0x01 = missing, 0x02 = A1/A2, 0x03 = A2/A2
x = SnpArray("mouse.bed")
ximputed = SnpArray("mouse.imputed.bed", 300, p)

for j in 1:p, i in 1:300
    if x[i, j] == 0x00 || x[i, j] == 0x01
        ximputed[i, j] = 0x00
    elseif x[i, j] == 0x02
        ximputed[i, j] = 0x02
    else
        ximputed[i, j] = 0x03
    end
end
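The recoding rule in the loop above can also be written as a small element-wise function, which makes it easy to check on a toy example. This is a sketch under stated assumptions rather than the notebook's own method: `impute_zero` and `toy` are hypothetical names, and the data are a plain `Matrix{UInt8}` instead of a memory-mapped `SnpArray`.

```julia
# Hypothetical helper mirroring the loop above:
# missing (0x01) becomes A1/A1 (0x00); every other code is left unchanged.
impute_zero(code::UInt8) = code == 0x01 ? 0x00 : code

# Toy genotype matrix containing three missing entries.
toy = UInt8[0x00 0x01;
            0x02 0x03;
            0x01 0x01]

imputed = impute_zero.(toy)                 # broadcast the rule over all entries

n_missing_before = count(==(0x01), toy)     # missing entries before imputation
n_missing_after  = count(==(0x01), imputed) # should be zero afterwards
```

Counting the `0x01` entries before and after is a cheap sanity check that could likewise be run on `x` and `ximputed`.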

In [5]:
;ls

mouse.bed
mouse.bim
mouse.fam
mouse.imputed.bed
mouse.imputed.bim
mouse.imputed.fam
subset.ipynb
