# PCA plot Examples

In [1]:
import caplot
from bokeh.plotting import show
from bokeh.io import output_notebook
output_notebook()

## Dataset
The dataset used in this notebook is described in [data/SampleData.md](data/SampleData.md)

This dataset contains all the sample information (2504 rows and 68 columns). The columns are as follows:
- s: sample id
- pheno-: phenotypic information including subpopulation, superpopulation, age, t2d, bmi and isFemale
- sample_qc-: quality-control metrics computed by hail.sample_qc
- Principal Component Analysis (PCA)
  - pcaSS1-scores_: The first 3 principal component vectors, computed from a randomly selected 1% of variants
  - pcaSS2-scores_: The first 10 principal component vectors, computed from a randomly selected 10% of variants
  - pcaMAF-scores_: The first 10 principal component vectors, computed from common variants with minor allele frequency above 1%
  - pca-scores_: The first 20 principal component vectors, computed from all variants
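
The prefix convention above makes it easy to group columns by their origin. The following is a minimal sketch using only the standard library; the column names in `columns` are an illustrative subset (the `sample_qc-call_rate` name in particular is an assumption), not the real 68-column header.

```python
from collections import defaultdict

# Illustrative subset of column names following the prefix convention above.
# These are assumptions for the example, not the actual file header.
columns = [
    "s",
    "pheno-age",
    "pheno-superpopulation",
    "sample_qc-call_rate",
    "pcaSS1-scores_1",
    "pcaMAF-scores_1",
    "pcaMAF-scores_2",
    "pca-scores_1",
]


def group_by_prefix(names):
    """Group column names by the text before the first '-'."""
    groups = defaultdict(list)
    for name in names:
        # Columns without a '-' (such as the sample id 's') form their own group.
        prefix = name.split("-", 1)[0]
        groups[prefix].append(name)
    return dict(groups)


groups = group_by_prefix(columns)
for prefix, members in groups.items():
    print(prefix, members)
```

Grouping this way gives a quick overview of which PCA variants are available before choosing the columns to plot.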

**1) All in one go**\
In this example, we create an instance of `caplot.PCA` and pass all the settings in one go.\
The plot keeps only samples with type 2 diabetes (`pheno-t2d=1`) and highlights the European population (`EUR` super-population).\
The sample ID and age are shown in the hover tooltip.\
The points are colored by the age of the samples.

In [2]:
plot = caplot.PCA(
    source='data/samples.tsv.gz',
    filterQuery='SELECT * FROM data WHERE "pheno-t2d"=1',
    highlightQuery='SELECT * FROM data WHERE "pheno-superpopulation"="EUR"',
    hovers={'ID': 's', 'Age': 'pheno-age'},
    plots='["pcaMAF-scores_1", "pcaMAF-scores_2"]',
    coloringColumn='pheno-age',
    coloringPalette='Magma256',
    coloringStyle='Continuous')
plot.Show()

TypeError: __init__() got an unexpected keyword argument 'plots'
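
To illustrate what the filter and highlight queries select, here is a minimal sketch that runs similar queries against an in-memory SQLite table named `data`. It does not show how `caplot` evaluates queries internally, which is not described here. The sample rows are made up, and applying the highlight condition on top of the filter is an assumption. The sketch writes the string literal as `'EUR'` with single quotes, which is the standard SQL form; column names containing `-` stay in double quotes.

```python
import sqlite3

# Made-up rows: (sample id, t2d flag, super-population, age).
rows = [
    ("S1", 1, "EUR", 54),
    ("S2", 0, "EUR", 61),
    ("S3", 1, "AFR", 47),
]

conn = sqlite3.connect(":memory:")
# Column names containing '-' must be quoted as identifiers.
conn.execute(
    'CREATE TABLE data ("s" TEXT, "pheno-t2d" INTEGER, '
    '"pheno-superpopulation" TEXT, "pheno-age" INTEGER)'
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

# Filter: keep only samples with type 2 diabetes.
filtered = conn.execute(
    'SELECT s FROM data WHERE "pheno-t2d"=1 ORDER BY s'
).fetchall()

# Highlight: among the filtered samples, select the EUR super-population.
highlighted = conn.execute(
    'SELECT s FROM data WHERE "pheno-t2d"=1 '
    "AND \"pheno-superpopulation\"='EUR' ORDER BY s"
).fetchall()
conn.close()

print("filtered:", [r[0] for r in filtered])
print("highlighted:", [r[0] for r in highlighted])
```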

**2) Step by Step**\
Using the following methods:
- `LoadData`
- `Filter`
- `Highlight`
- `Hover`
- `Configure`

In [2]:
plot = caplot.PCA()

plot.data = 'data/samples.tsv.gz'
plot.filter = 'SELECT * FROM data WHERE "pheno-t2d"=1' 
plot.highlight = 'SELECT * FROM data WHERE "pheno-superpopulation"="EUR"'
plot.hovers = {'ID': 's', 'Age': 'pheno-age'}
plot.subplots = ['pcaMAF-scores_1', 'pcaMAF-scores_2', 'pcaMAF-scores_3']
plot.coloringColumn = 'pheno-superpopulation'
plot.coloringStyle = 'Categorical'
plot.coloringPalette = 'Category10'
plot.numCols = 2

plot.Show()

In [5]:
plot.coloringColumn = 'pheno-age'
plot.coloringStyle = 'Continuous'
plot.coloringPalette = 'Magma256'
plot.Show()

**3) Change and re-plot**\
You can call any of the above methods again to modify the plot.\
Here we change the highlight to focus on African population (`AFR` super-population)

In [4]:
plot.Highlight('SELECT * FROM data WHERE "pheno-superpopulation"="AFR"')
plot.Show()

**4) Save the plot**\
The file format is inferred from the file extension.\
See documentation for the supported formats.

In [5]:
plot.SaveAs('results/test3.html')
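
Inferring a format from the file extension can be sketched as below. The `FORMATS` mapping and the `infer_format` helper are hypothetical and only illustrate the idea; they are not part of `caplot`, and the actual list of supported formats is in its documentation.

```python
from pathlib import Path

# Hypothetical mapping for illustration only; not the caplot list.
FORMATS = {".html": "html", ".png": "png", ".svg": "svg"}


def infer_format(filename):
    """Return the output format implied by the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return FORMATS[suffix]


print(infer_format("results/test3.html"))
```

Rejecting unknown extensions early, rather than falling back to a default, avoids silently writing a file in an unexpected format.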

In [6]:
plot.Configure(
    plots='["pcaMAF-scores_1", "pcaMAF-scores_2", "pcaMAF-scores_3", "pcaMAF-scores_4"]',
    coloringColumn='pheno-age',
    coloringPalette='Magma256',
    coloringStyle='Continuous',
    numCols=2)
plot.Show()

**5) Try widgets**\
The plot responds to changes made in the widgets.

In [2]:
plot = caplot.PCA()

plot.data = 'data/samples.tsv.gz'
plot.filter = 'SELECT * FROM data WHERE "pheno-t2d"=1' 
plot.highlight = 'SELECT * FROM data WHERE "pheno-superpopulation"="EUR"'
plot.hovers = {'ID': 's', 'Age': 'pheno-age'}
plot.subplots = ['pcaMAF-scores_1', 'pcaMAF-scores_2', 'pcaMAF-scores_3']
plot.coloringColumn = 'pheno-superpopulation'
plot.coloringStyle = 'Categorical'
plot.coloringPalette = 'Category10'
plot.numCols = 2

plot.ShowWithForm()
