<img src="https://github.com/slt666666/FAO_lecture/blob/main/title.png?raw=true" alt="title" height="300px">


# Genomic Prediction - example -

In this notebook, we will perform a genomic prediction analysis using sample data.

We will also see how a genomic prediction model can be applied to a breeding strategy.

This notebook should help you understand:

* the process of a genomic prediction analysis

* how to use a genomic prediction model in breeding

## Contents of this notebook

* Review of Genomic Prediction

* Building a genomic prediction model with a sample dataset

  * We use a rice population and its grain number phenotype.

* Application of the genomic prediction model

  * We will consider the ideal genotype for grain number.

# Main contents

# Review of Genomic Prediction

Genomic prediction means building a prediction model that explains phenotype from genotype, using a large dataset.

<img src="https://github.com/slt666666/FAO_lecture/blob/main/genomic_prediction.png?raw=true" alt="colab" height="300px">

The process for building a genomic prediction model and checking its performance is:

1. Split the data into 80% (training data) and 20% (test data).

2. Build a prediction model from the training data. (We use an ElasticNet regression model in this notebook.)

3. Predict the phenotype of the test data **from genotype** with the model.

4. Compare the predicted and observed phenotypes to check the performance of the model.

<img src="https://github.com/slt666666/FAO_lecture/blob/main/gpmethod.png?raw=true" alt="colab" height="400px">
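
The four steps above can be sketched on synthetic data. This is only an illustration under stated assumptions: the data are simulated (not the rice dataset), all variable names are hypothetical, and ordinary least squares is used as a simple stand-in for the ElasticNet model used later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 200 lines x 10 SNPs coded 0 or 2.
n_lines, n_snps = 200, 10
genotype = rng.choice([0, 2], size=(n_lines, n_snps)).astype(float)
true_effects = np.zeros(n_snps)
true_effects[[1, 4]] = [3.0, -2.0]  # only two SNPs affect the trait
phenotype = 100 + genotype @ true_effects + rng.normal(0, 1.0, n_lines)

# 1. Split into 80% training data and 20% test data.
order = rng.permutation(n_lines)
n_test = int(n_lines * 0.2)
test_idx, train_idx = order[:n_test], order[n_test:]

# 2. Fit a model on the training data (least squares, NOT ElasticNet).
X_train = np.column_stack([np.ones(len(train_idx)), genotype[train_idx]])
coef, *_ = np.linalg.lstsq(X_train, phenotype[train_idx], rcond=None)

# 3. Predict the test phenotype from genotype only.
X_test = np.column_stack([np.ones(len(test_idx)), genotype[test_idx]])
predicted = X_test @ coef

# 4. Compare predicted and observed phenotypes.
r = np.corrcoef(predicted, phenotype[test_idx])[0, 1]
print(f"test lines: {len(test_idx)}, training lines: {len(train_idx)}")
print(f"correlation between predicted and observed: {r:.2f}")
```

The key point is that the test lines are never seen while fitting, so the correlation in step 4 estimates how well the model would predict new lines.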

Once we have a good genomic prediction model, we can apply it to improve the breeding strategy.

<img src="https://github.com/slt666666/FAO_lecture/blob/main/apply_model.png?raw=true" alt="colab" height="600px">



# Experience Genomic Prediction
In this notebook, we build a genomic prediction model and apply it to genomic breeding.

In [None]:
# Prepare modules & packages
!wget -O genomic_prediction.py "https://github.com/slt666666/FAO_lecture/blob/main/genomic_prediction.py?raw=true"

from genomic_prediction import load_dataset
from genomic_prediction import split_dataset
from genomic_prediction import make_genomic_prediction_model
from genomic_prediction import check_equation
from genomic_prediction import predict_phenotype
from genomic_prediction import check_accuracy
from genomic_prediction import show_estimated_SNP_effect
from genomic_prediction import predict_progeny_phenotype
from genomic_prediction import predict_customized_genotype

## Materials

We generated a NAM (nested association mapping) population by crossing rice cultivar A with 5 other cultivars (B to F).

<img src="https://github.com/slt666666/FAO_lecture/blob/main/nam.png?raw=true" alt="colab" height="300px">

Then, we sequenced and phenotyped this population.

<img src="https://github.com/slt666666/FAO_lecture/blob/main/genopheno.png?raw=true" alt="colab" height="300px">

Please run the code below to load the dataset!

The dataset contains almost 1000 lines.

- SNP genotype (0 = cultivar A, 2 = other cultivar)
- Phenotypes: leaf width (LW_mean) and grain number (GN_mean)

In [None]:
genotype, phenotype = load_dataset()
display(genotype)
display(phenotype)

## 1. Separate dataset

First, we'll split the whole dataset into training data and test data.

* Training data (80%): for building the prediction model.

* Test data (20%): for checking the performance of the model.

Please run the code below to split the dataset!

In [None]:
test_genotype, test_phenotype, train_genotype, train_phenotype = split_dataset(genotype, phenotype, "GN_mean", test=0.2)
print("test data is {} lines.".format(test_phenotype.shape[0]))
print("training data is {} lines.".format(train_phenotype.shape[0]))

## 2. Make prediction model

After splitting the dataset, we'll build a genomic prediction model that explains phenotype from genotype, based on the training data (80%).

<img src="https://github.com/slt666666/FAO_lecture/blob/main/simulation15.png?raw=true" alt="colab" height="200px">

Please run the code below to generate a prediction model.

```
※ Memo
In this notebook, we use a regression model (ElasticNet) as the prediction model.
We skip the details of the model in this lecture because it is not a statistics lecture.
```

In [None]:
GN_prediction_model = make_genomic_prediction_model(train_genotype, train_phenotype, "GN_mean")
check_equation("GN_mean", GN_prediction_model)
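
For readers who want a rough idea of what an ElasticNet model does: it is a linear model (phenotype = intercept + sum of SNP effect × SNP genotype) fitted with L1 and L2 penalties that shrink most SNP effects toward zero. The sketch below implements this with coordinate descent on synthetic data. It is an illustration only, not the code inside `genomic_prediction.py`; the function `fit_elastic_net`, its parameters `alpha` and `l1_ratio`, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def fit_elastic_net(X, y, alpha=0.3, l1_ratio=0.9, n_iter=200):
    """Minimize 1/(2n)*||y - b0 - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2 by cyclic coordinate descent."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering takes care of the intercept
    w = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    residual = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of SNP j with the residual, excluding its own effect.
            rho = Xc[:, j] @ (residual + Xc[:, j] * w[j]) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1 - l1_ratio))
            residual += Xc[:, j] * (w[j] - new_w)  # keep residual up to date
            w[j] = new_w
    intercept = y_mean - x_mean @ w
    return intercept, w

# Synthetic example (hypothetical): 150 lines, 30 SNPs, 3 causal SNPs.
rng = np.random.default_rng(1)
X = rng.choice([0, 2], size=(150, 30)).astype(float)
true_w = np.zeros(30)
true_w[[2, 10, 25]] = [4.0, -3.0, 2.0]
y = 50 + X @ true_w + rng.normal(0, 1.0, 150)

intercept, w = fit_elastic_net(X, y)
print("SNPs with non-zero estimated effect:", np.flatnonzero(w))
print("estimated effects of the causal SNPs:", np.round(w[[2, 10, 25]], 2))
```

Because of the penalties, the estimated effects are somewhat smaller in magnitude than the simulated ones, and SNPs without a real effect tend to get an effect of exactly zero.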

## 3. Predict phenotype of test data

After generating the prediction model, we have to check its performance (accuracy).

To check accuracy, we use the test data, which was not used to build the model.

<img src="https://github.com/slt666666/FAO_lecture/blob/main/simulation16.png?raw=true" alt="colab" height="200px">

First, we predict phenotype values from genotype using the generated model.

Then, we compare the predicted values with the observed values.

If the predicted and observed values are very similar, the model is accurate.

Please run the code below to predict the phenotype values of the test data and compare the predicted and observed values!

In [None]:
predicted_test_phenotype = predict_phenotype(test_genotype, GN_prediction_model)
check_accuracy(predicted_test_phenotype, test_phenotype, "GN_mean")

The code above calculates a correlation coefficient and draws a scatter plot of the predicted and observed values.

The correlation coefficient is over 0.85, so the generated model looks good.
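
As a rough illustration of the accuracy check, the Pearson correlation coefficient between predicted and observed values can be computed by hand as below. The function name `pearson_r` and the five grain-number values are hypothetical and are not taken from the dataset; how `check_accuracy` computes its value internally is not shown in this notebook.

```python
import math

def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

# Hypothetical grain numbers for five test lines (not values from the dataset).
observed = [120.0, 135.0, 150.0, 110.0, 142.0]
predicted = [123.0, 131.0, 146.0, 115.0, 140.0]
print(f"r = {pearson_r(predicted, observed):.3f}")
```

Note that the correlation coefficient measures how well the predicted values follow the observed ones linearly; it does not by itself tell you how large the prediction errors are.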

# Applying genomic prediction model to breeding strategy

In this section, we apply the generated model to a breeding strategy.

## Consider best genotype for traits

If we can generate a highly accurate genomic prediction model, we can use it to consider which genotype is ideal for a trait.

<img src="https://github.com/slt666666/FAO_lecture/blob/main/simulation18.png?raw=true" alt="colab" height="200px">

So, in this section, let's try to construct a good genotype for grain number.

### Make customized genotype & predict phenotype

For example, if we introgressed the mutations on chromosomes 1 and 5 into cultivar A, how would the phenotype change?

We can predict this phenotype with the genomic prediction model.

Please run the code below to check how the phenotype changes with this genotype!

In [None]:
predict_customized_genotype(genotype, ["chr01", "chr05"], GN_prediction_model, "GN_mean")
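
The idea behind this kind of prediction can be sketched as follows, assuming a linear model and the 0/2 genotype coding described above: start from cultivar A (all SNPs coded 0), set the SNPs on the chosen chromosomes to 2, and sum the SNP effects. The SNP names, effect values, intercept, and the helper `predict_custom` are all hypothetical; the internals of `predict_customized_genotype` are not shown in this notebook.

```python
# Hypothetical SNP effects from a fitted linear model (not real estimates).
# SNP IDs are assumed to start with the chromosome name, e.g. "chr01_1000".
snp_effects = {
    "chr01_1000": 1.5,
    "chr01_2000": -0.5,
    "chr05_1000": 2.0,
    "chr12_1000": -1.0,
}
intercept = 100.0  # hypothetical predicted grain number of cultivar A

def predict_custom(chromosomes, snp_effects, intercept):
    """Predict the phenotype of cultivar A carrying donor alleles (coded 2)
    on the given chromosomes and its own alleles (coded 0) elsewhere."""
    prediction = intercept
    for snp, effect in snp_effects.items():
        chromosome = snp.split("_")[0]
        dosage = 2 if chromosome in chromosomes else 0
        prediction += effect * dosage
    return prediction

print(predict_custom([], snp_effects, intercept))                  # 100.0
print(predict_custom(["chr01", "chr05"], snp_effects, intercept))  # 106.0
```

In this toy example, swapping chromosomes 1 and 5 adds 2 × (1.5 − 0.5 + 2.0) = 6 to the predicted value.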

As in the simulation above, we can predict the phenotype value of any genotype.

With this approach, we can identify the best genotype for a trait!

<img src="https://github.com/slt666666/FAO_lecture/blob/main/simulation19.png?raw=true" alt="colab" height="200px">

### Play with prediction!
Try to find the best genotype by editing the genotype of each chromosome.
You can edit the code below and run it.

Examples:

* To change the genotype of chromosome 12:

`predict_customized_genotype(genotype, ["chr12"], GN_prediction_model, "GN_mean")`

* To change the genotype of chromosomes 1, 7, and 12:

`predict_customized_genotype(genotype, ["chr01", "chr07", "chr12"], GN_prediction_model, "GN_mean")`

* To change the genotype of chromosomes 1, 2, 3, and 4:

`predict_customized_genotype(genotype, ["chr01", "chr02", "chr03", "chr04"], GN_prediction_model, "GN_mean")`


Please use the code below and try to find the best genotype for grain number!

In [None]:
predict_customized_genotype(genotype, ["chr01", "chr02", "chr03", "chr04"], GN_prediction_model, "GN_mean")

This simulation edits the genotype at the chromosome level.

Of course, we can also work at the region level or the gene level.

We should also consider which materials (cultivars/genotypes) are actually available, so that the ideal genotype is also a feasible one.
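
The chromosome-level search can also be automated instead of done by hand. The sketch below tries every subset of chromosomes and keeps the one with the highest predicted value. It assumes a linear model in which each chromosome contributes independently; the per-chromosome gains, the base prediction, and the function `best_combination` are hypothetical values and names chosen for illustration only.

```python
from itertools import combinations

# Hypothetical predicted change in grain number when a whole chromosome of
# cultivar A is replaced by the donor genotype (not real estimates).
chromosome_gain = {"chr01": 2.0, "chr02": -1.5, "chr03": 0.5, "chr04": -0.2}
base_prediction = 100.0  # hypothetical prediction for cultivar A

def best_combination(chromosome_gain, base_prediction):
    """Try every subset of chromosomes and return the best one."""
    names = sorted(chromosome_gain)
    best_set, best_value = (), base_prediction
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            value = base_prediction + sum(chromosome_gain[c] for c in subset)
            if value > best_value:
                best_set, best_value = subset, value
    return best_set, best_value

best_set, best_value = best_combination(chromosome_gain, base_prediction)
print(best_set, best_value)  # ('chr01', 'chr03') 102.5
```

With the 12 rice chromosomes there are only 2^12 = 4096 subsets, so an exhaustive search like this stays cheap. Under a purely linear model the answer is simply every chromosome with a positive gain, but the exhaustive version would still work if the prediction function were replaced by a non-linear one.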

---
## Summary

In this notebook, we demonstrated a **Genomic Prediction** analysis using unpublished data.

With a genomic prediction model, you can predict phenotypes from genotype information.

Thus, you can calculate the best combination for generating new cultivars.

You can also find the ideal genotype for a trait.

If you can generate a good population with high genetic diversity, genomic prediction is one way to develop high-yield cultivars.

