---
### **Data Bootcamp for Genomic Prediction in Plant Breeding** ###
### **University of Minnesota Plant Breeding Center** ###
#### **June 20 - 22, 2022** ####
---

### **Practical 6: Cross Selection** ###

<br />
<br />

#### **Source Scripts and Load Data**


In [6]:
WorkDir <- getwd()
setwd(WorkDir)

##Source in functions to be used
source("R_Functions/GS_Pipeline_Jan_2022_FnsApp.R")
source("R_Functions/bootcamp_functions.R")
gc()



Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union



   *****       ***   vcfR   ***       *****
   This is vcfR 1.12.0 
     browseVignettes('vcfR') # Documentation
     citation('vcfR') # Citation
   *****       *****      *****       *****



Attaching package: 'bWGR'


The following objects are masked from 'package:NAM':

    CNT, GAU, GRM, IMP, KMUP, KMUP2, SPC, SPM, emBA, emBB, emBC, emBL,
    emCV, emDE, emEN, emGWA, emML, emML2, emRR, markov, mkr, mkr2X,
    mrr, mrr2X, mrrFast, wgr



Attaching package: 'emoa'


The following object is masked from 'package:dplyr':

    coalesce



Attaching package: 'MASS'


The following object is masked from 'package:dplyr':

    select



Attaching package: 'sommer'


The following objects are masked from 'package:rrBLUP':

    A.mat, GWAS


Welcome to rTASSEL (version 0.9.26)

| | used | (Mb) | gc trigger | (Mb) | max used | (Mb) |
|---|---|---|---|---|---|---|
| Ncells | 5804502 | 310.0 | 8802824 | 470.2 | 6426122 | 343.2 |
| Vcells | 10019363 | 76.5 | 15504737 | 118.3 | 12851603 | 98.1 |


In [None]:

# Import phenotypes
pheno <- read.csv("barley_cross_pred_pheno.csv")


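# Load marker data (first column holds line names and becomes the row names)
geno <- read.csv('barley_cross_pred_geno.csv', row.names=1)
# read.csv() can alter column names that are not valid R names, so re-read the
# header row and restore the original marker names
mrk_names <- scan("barley_cross_pred_geno.csv", what='character', sep=',', nlines=1)[-1]
colnames(geno) <- mrk_names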

# Load genetic map for variance prediction below
map <- read.csv(file = "genoForMap2.csv", na.strings = c("NA", "#N/A"))



#### **Manipulate, format, and impute data**

In [None]:

# Randomly select 1000 markers to make computations faster
ranNum <- sample.int(dim(geno)[2], 1000)
geno2 <- geno[, ranNum]


# Impute missing data using naive imputation
geno_imp <- replaceNAwithMean(geno2)
geno_imp <- round(geno_imp, 0)
geno_imp[which(geno_imp==0)] <- -1


# Write the file out and read it back in with header=F so that the marker names become the first row
write.csv(geno_imp, 'geno_imp.csv')
geno_imp2 <- read.csv('geno_imp.csv', header=F)


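# Match up markers in map with marker data file
# Marker names sit in the first row; the first column holds the line names
mrkname <- as.character(unlist(geno_imp2[1, -1]))
ndx <- match(map$mrk, mrkname)
# Keep only map markers found in the genotype file. Logical indexing is used
# because negative indexing with an empty index would drop every row.
map2 <- map[!is.na(ndx), ]
ndxNa2 <- match(mrkname, map2$mrk)
# Keep the line-name column plus the markers that are present in the map
geno_imp3 <- geno_imp2[, c(TRUE, !is.na(ndxNa2))]
# Order the map by chromosome and position, dropping its first column
map3 <- map2[order(map2$chr, map2$pos), -1]
# Reorder the genotype columns to follow the map order
ndxOrd <- match(map3$mrk, as.character(unlist(geno_imp3[1, ])))
geno_imp4 <- geno_imp3[, c(1, ndxOrd)]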
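
The imputation step above can be illustrated on a toy matrix. This is only a sketch: `impute_col_mean()` is a hypothetical stand-in for `replaceNAwithMean()` (which lives in the sourced function file and may be implemented differently), and the example assumes markers are coded 1 and -1. It shows why the rounding and recoding lines are needed: a column mean is usually fractional, and a mean that rounds to 0 would otherwise leave a value outside the 1 / -1 coding.

```r
# Toy marker matrix coded 1 / -1 with missing calls (values are made up)
toy <- matrix(c( 1, -1, NA,
                 1, NA, -1,
                -1, -1,  1,
                 1, -1, -1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("line", 1:4), paste0("mrk", 1:3)))

# Hypothetical stand-in for replaceNAwithMean(): fill each NA with its column mean
impute_col_mean <- function(m) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    m[miss, j] <- mean(m[, j], na.rm = TRUE)
  }
  m
}

toy_imp <- impute_col_mean(toy)
toy_imp <- round(toy_imp, 0)   # snap imputed means to the nearest integer
toy_imp[toy_imp == 0] <- -1    # recode any value that rounded to 0 as -1

toy_imp
```

In this toy case the missing call in `mrk3` is imputed as -1/3, rounds to 0, and is then recoded to -1, so the final matrix contains only 1 and -1.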


#### **Identify parents and build the crossing table** ####

In [6]:
# Identify a set of parents for which to predict cross combinations. Arbitrarily use 100 lines from the middle of the set to save computation time and simplify data handling.

parents <- pheno$line_name[101:200]

cross_table <- t(combn(parents, 2))
colnames(cross_table) <- c("Par1", "Par2")

cross_table <- as.data.frame(cross_table)
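
To see what the crossing table looks like, here is the same `combn()` pattern on four made-up parent names (`P1` to `P4` are placeholders, not lines from the barley data). Each row is one unordered pair, so there are no selfs and no reciprocal duplicates.

```r
# Four placeholder parents, for illustration only
toy_parents <- c("P1", "P2", "P3", "P4")

# combn() returns one pair per column; transpose so each row is a cross
toy_crosses <- as.data.frame(t(combn(toy_parents, 2)))
colnames(toy_crosses) <- c("Par1", "Par2")

toy_crosses
nrow(toy_crosses)   # number of crosses among 4 parents
choose(100, 2)      # number of crosses among 100 parents
```

Four parents give 6 crosses, and the 100 parents used above give `choose(100, 2)` = 4950 crosses, which is why the parent set is kept small.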



#### **Execute PopVar function**

In [4]:

# Call PopVar function, deterministic version

pop_predict_out <- pop.predict2(G.in = geno_imp4, y.in = pheno, map.in = map3, crossing.table = cross_table)

write.csv(pop_predict_out, "pop_predict_out.csv")
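
One common way to use predicted cross means and genetic variances for cross selection is the expected mean of the best progeny: the cross mean plus a selection intensity times the genetic standard deviation. The sketch below works through that arithmetic on made-up numbers. The data frame `toy_pred` and its column names are hypothetical and are not the actual output of `pop.predict2()`, and the selected fraction of 0.1 is an illustrative value.

```r
# Fraction of progeny selected in each cross (illustrative value)
tail_p <- 0.1

# Standardized selection intensity for the upper tail of a normal distribution
sel_int <- dnorm(qnorm(1 - tail_p)) / tail_p

# Made-up predictions for three crosses (hypothetical column names)
toy_pred <- data.frame(
  cross     = c("A x B", "C x D", "E x F"),
  pred_mu   = c(100, 98, 101),   # predicted cross mean
  pred_varG = c(4, 25, 1)        # predicted genetic variance
)

# Expected mean of the superior progeny: mean + intensity * genetic SD
toy_pred$mu_sp <- toy_pred$pred_mu + sel_int * sqrt(toy_pred$pred_varG)

# Rank crosses by the superior progeny mean
toy_pred[order(toy_pred$mu_sp, decreasing = TRUE), ]
```

In this toy example the cross with the lowest predicted mean ranks first because its larger predicted variance more than compensates, which is the motivation for predicting variance as well as the mean.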
