## Introduction
1. Copy number variations (CNVs): large genomic insertion or deletion events;
2. Length of CNVs: ranges from 50 base pairs to kilobases or even megabases;
3. Previous CNV research: (1) detection of CNVs associated with disease risks, (2) identification of gene sets with CNV burden;
4. Challenge: CNVs often span multiple genes (1-30 or more), so it is unclear which gene within a given CNV event confers genetic susceptibility.

Figure 5

## Motivation and Aim
1. Genome-wide association studies (GWAS) have limitations: (1) Risk alleles are usually at low frequencies and difficult to detect, (2) most loci have small effect sizes, not likely to be deleterious mutations, (3) causal variants or genes in GWAS loci are often unclear;
2. CNVs are believed to play a critical role in psychiatric disorders, such as schizophrenia (SCZ) [1];
3. The prevalence of SCZ in the United States ranges between 0.25% and 0.64% [2]; features of SCZ include: (1) early onset, usually in late adolescence or early adulthood, (2) cognitive impairment/dysfunction, a core feature of SCZ, (3) difficulty in social activities, (4) potential to cause disability;
4. Inspired by statistical fine-mapping of causal variants in linkage-disequilibrium blocks from GWAS, we aim to develop a new approach that exploits large-scale genome-wide CNV data from case-control studies to map susceptibility genes for psychiatric disorders;
5. The approach can be integrated with other gene-level datasets, e.g. results from exome-sequencing studies.

## Data available
1. Swedish schizophrenia population-based case-control exome sequencing CNV data from dbGaP;
2. Schizophrenia case-control CNV data from the International Schizophrenia Consortium (ISC) study;
3. hg19 refGene from UCSC genome annotation database.

## Approach and algorithm
Suppose the genome is divided into disjoint blocks such that no CNV is shared between blocks (CNVs may still overlap one another within a block). Within a block $R$, we may have multiple, possibly overlapping, CNV events, and we assume there is at least one causal gene in $R$. To infer the causal CNV-gene configuration from case-control data, we leverage the statistical machinery of Bayesian regression.

Assume a spike-and-slab mixture prior for each gene effect $\beta_j$ and a logistic regression model for the phenotype:
$$\beta_{j} \sim (1 - \pi_{j})\delta_0 + \pi_{j}\,g(\cdot)$$
where $g(\cdot)$ is the density of $N(\mu,\sigma^2)$, and
$$\text{logit}\,P(y_i = 1) = \log\left[\frac{P(y_i = 1)}{1 - P(y_i = 1)}\right] = \beta_0 + \sum_{j=1}^m \beta_j d_{ij}$$

$\pi_{j}$: prior inclusion probability of the $j$-th gene in a CNV-gene block

$\delta_0$: point mass at zero (the spike)

$y_i$: the phenotypic status of sample $i$ (1 = case, 0 = control)

$\beta_0$: intercept (baseline log-odds)

$\beta_j$: effect size (log odds ratio) of the $j$-th gene

$d_{ij}$: whether the $j$-th gene overlaps a CNV event in sample $i$ (1 = yes, 0 = no)

$m$: number of genes in the block

$\mu, \sigma$: prior mean and standard deviation of the slab, where $\mu \neq 0$
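The prior and the phenotype model can be made concrete with a few lines of Python. This is only a minimal sketch: the function names `draw_effects` and `case_probability` are hypothetical, and the numeric values ($\pi_j = 0.05$, $\mu = 0.777$, $\sigma = 0.844$, baseline risk $0.05$) are borrowed from elsewhere in these notes purely for illustration.

```python
import math
import random

def draw_effects(m, pi, mu, sigma, rng):
    """Draw m gene effects: exactly 0 with probability 1 - pi (spike),
    otherwise a draw from N(mu, sigma^2) (slab)."""
    return [rng.gauss(mu, sigma) if rng.random() < pi else 0.0 for _ in range(m)]

def case_probability(intercept, effects, d_i):
    """P(y_i = 1) under the logistic model, given one sample's 0/1 overlap vector d_i."""
    eta = intercept + sum(b * d for b, d in zip(effects, d_i))
    return 1.0 / (1.0 + math.exp(-eta))

rng = random.Random(1)
effects = draw_effects(m=2000, pi=0.05, mu=0.777, sigma=0.844, rng=rng)
frac_nonzero = sum(b != 0.0 for b in effects) / len(effects)

# A sample with no CNV overlap has the baseline risk implied by the intercept.
intercept = math.log(0.05 / 0.95)
baseline = case_probability(intercept, effects, [0] * len(effects))

print(f"fraction of non-zero effects: {frac_nonzero:.3f}")
print(f"baseline risk without any CNV overlap: {baseline:.3f}")
```

The fraction of non-zero effects should sit close to the inclusion probability, and the baseline risk equals the inverse logit of the intercept.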

## Simulation processes
1. Obtain CNV-gene blocks: simulate CNV-gene blocks containing at least a certain number of genes/exons (no CNVs overlap between blocks).
2. Obtain the genome-wide CNV-gene pattern ($X$ matrix): randomly sample each block and merge the sampled blocks into one simulated individual. Repeat this process 100,000 times.
3. Obtain the spike-and-slab prior: set the penetrance/prevalence $p$ to $0.05$; then $p \approx \frac{e^{\beta_0}}{1+e^{\beta_0}}$ and $\beta_0 \approx \log \frac{p}{1-p}$. Set $\pi_j = 0.05$, so about $95\%$ of the $\beta_j$ are set to 0. The odds ratio (OR) for the $j$-th gene is $\exp(\beta_j)$.
4. Obtain the phenotype $y$: first calculate $P(y_i = 1)=\frac{e^{x_i\boldsymbol{\beta}+\beta_0}}{1+e^{x_i\boldsymbol{\beta}+\beta_0}}$, then draw each $y_i$ from Bernoulli($P(y_i = 1)$) to label sample $i$ as a case (1) or a control (0). Select all cases (about $5\%$) and randomly select an equal number of controls.
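The simulation steps above can be sketched on a single toy block. Several pieces here are assumptions made for illustration only: `simulate_overlap` is a hypothetical stand-in for sampling real CNV-gene blocks, the 2% CNV carrier rate is invented, the single causal effect of 0.98 is arbitrary, and 20,000 individuals are used instead of 100,000 to keep the run short.

```python
import math
import random

rng = random.Random(0)
n_samples, n_genes = 20000, 5
prevalence = 0.05
beta0 = math.log(prevalence / (1 - prevalence))  # baseline log-odds, about -2.944
beta = [0.0, 0.0, 0.98, 0.0, 0.0]                # one causal gene (illustrative value)

def simulate_overlap(rng):
    """Hypothetical CNV generator: with probability 0.02 a CNV covers
    a random run of adjacent genes in the block; otherwise no gene is hit."""
    row = [0] * n_genes
    if rng.random() < 0.02:
        start = rng.randrange(n_genes)
        end = rng.randrange(start, n_genes)
        for j in range(start, end + 1):
            row[j] = 1
    return row

# Step 2: CNV-gene pattern matrix X, one row per simulated individual.
X = [simulate_overlap(rng) for _ in range(n_samples)]

# Step 4: logistic risk per individual, then a Bernoulli draw for case status.
y = []
for row in X:
    eta = beta0 + sum(b * x for b, x in zip(beta, row))
    p_i = 1.0 / (1.0 + math.exp(-eta))
    y.append(1 if rng.random() < p_i else 0)

# Keep all cases and an equally sized random subset of controls.
cases = [i for i, yi in enumerate(y) if yi == 1]
controls = rng.sample([i for i, yi in enumerate(y) if yi == 0], len(cases))

print(f"beta0 = {beta0:.3f}")
print(f"cases = {len(cases)}, matched controls = {len(controls)}")
```

Because almost all individuals carry no CNV in the block, the simulated case fraction stays close to the chosen prevalence.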

## Results
1. Use the R package `varbvs` to obtain prior parameters for the MCMC method: $\pi = 0.0438$, $\mu = 0.777$ and $\sigma = 0.844$.
2. Map susceptibility genes within one CNV-gene block using the R packages `susieR` (SuSiE) and `varbvs`, obtaining posterior inclusion probabilities (PIP) and, where available, credible sets (CS);
3. Perform Bayesian logistic regression using the Python package `PyMC3`.

        
- Block example 1 (5 genes, 1 positive effect gene): 

        sample_ID  phenotype	740	741	742	743	744
        655	1	0	0	1	0	0
        774	1	0	0	1	0	0
        826	1	0	0	1	0	0
        950	1	0	0	1	0	0
        1006	1	0	0	1	0	0
        1011	1	0	0	1	0	0
        1454	1	0	0	1	0	0
        1678	1	1	1	1	1	1
        2079	1	0	0	0	1	0
        2276	1	0	0	1	0	0
        2691	1	0	0	1	0	0
        3187	1	1	1	1	1	1
        3837	1	0	0	1	0	0
        4900	1	0	0	1	0	0
        5713	1	0	0	1	0	0
        6080	1	1	1	1	1	1
        6278	1	0	0	1	0	0
        6368	1	0	0	0	1	0
        7037	0	0	0	0	1	0
        7891	0	1	1	1	1	1
        8321	0	0	0	1	0	0
        11199	0	0	0	1	0	0
        12311	0	1	1	1	1	1

        gene index	d_c	d_nc	nd_c	nd_nc	p
        gene_742	16   6690	4	6702	0.011757
        gene_743	5	6701	3	6703	0.726481
        gene_744	3	6703	2	6704	1.000000
        gene_741	3	6703	2	6704	1.000000
        gene_740	3	6703	2	6704	1.000000
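The count columns are not defined in these notes. Matching them against the sample matrix suggests that `d_c` and `d_nc` are cases with and without a CNV over the gene, and `nd_c` and `nd_nc` are the corresponding controls (for gene 742, 16 of the listed cases and 4 of the listed controls carry a 1). The `p` column looks like a two-sided Fisher's exact test on that 2x2 table; that is an assumption, and the sketch below simply checks it with a small standard-library implementation (`fisher_exact_two_sided` is a hypothetical helper, not the code used to produce the table).

```python
from math import comb

def fisher_exact_two_sided(cases_cnv, cases_no_cnv, controls_cnv, controls_no_cnv):
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of
    all tables (with the same margins) that are no more likely than the observed one."""
    n_cases = cases_cnv + cases_no_cnv
    n_controls = controls_cnv + controls_no_cnv
    carriers = cases_cnv + controls_cnv

    def weight(x):
        # Unnormalised probability that x of the carriers are cases.
        return comb(n_cases, x) * comb(n_controls, carriers - x)

    observed = weight(cases_cnv)
    lowest = max(0, carriers - n_controls)
    highest = min(carriers, n_cases)
    extreme = sum(w for w in map(weight, range(lowest, highest + 1)) if w <= observed)
    # Exact integer arithmetic until the final division.
    return extreme / comb(n_cases + n_controls, carriers)

p_742 = fisher_exact_two_sided(16, 6690, 4, 6702)
p_744 = fisher_exact_two_sided(3, 6703, 2, 6704)
print(f"gene_742: p = {p_742:.6f}")
print(f"gene_744: p = {p_744:.6f}")
```

If the assumption holds, the two printed values should agree with the `p` column for gene_742 and gene_744.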

|gene index|simulated effect|SuSiE|varbvs|PyMC3|
|:---:|:---:|:---:|:---:|:---:|
|740|0|0.0442|0.0913|0.0290|
|741|0|0.0442|0.0913|0.0370|
|742|0.9806|**0.8173**|**0.6076**|**0.4680**|
|743|0|0.0501|0.0804|0.0255|
|744|0|0.0442|0.0913|0.0310|

- Block example 2 (14 genes, 2 positive effect genes):

        sample_ID	phenotype	1019	1020	1021	1022	1023	1024	1025	1026	1027	1028	1029	1030	1031	1032
        789	 1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
        1198	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
        1291	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
        1748	1	0	0	1	0	0	0	0	0	0	0	0	0	0	0
        2306	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
        2377	1	0	0	1	0	0	0	0	0	0	0	0	0	0	0
        2401	1	0	0	1	0	0	0	0	0	0	0	0	0	0	0
        3285	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0
        3477	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
        3550	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0
        4497	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
        4555	1	0	0	1	0	0	0	0	0	0	0	0	0	0	0
        5394	1	0	0	1	0	0	0	0	0	0	0	0	0	0	0
        5531	1	0	0	1	0	0	0	0	0	0	0	0	0	0	0
        6230	1	0	0	1	0	0	0	0	0	0	0	0	0	0	0
        6390	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
        9210	0	1	1	0	0	0	0	0	0	0	0	0	0	0	0
        9488	0	1	1	1	1	1	1	1	1	1	1	1	1	1	1
        11470	0	1	1	1	1	1	1	1	1	1	1	1	1	1	1
        11809	0	1	1	0	0	0	0	0	0	0	0	0	0	0	0
        12698	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0
        
        gene index	d_c  d_nc	nd_c  nd_nc	p
        gene_1021	14   6692	3	6703	0.012672
        gene_1023	7	6699	2	6704	0.179541
        gene_1024	7	6699	2	6704	0.179541
        gene_1022	7	6699	2	6704	0.179541
        gene_1025	7	6699	2	6704	0.179541
        gene_1027	7	6699	2	6704	0.179541
        gene_1030	7	6699	2	6704	0.179541
        gene_1031	7	6699	2	6704	0.179541
        gene_1032	7	6699	2	6704	0.179541
        gene_1029	7	6699	2	6704	0.179541
        gene_1026	7	6699	2	6704	0.179541
        gene_1028	7	6699	2	6704	0.179541
        gene_1020	9	6697	4	6702	0.266611
        gene_1019	9	6697	4	6702	0.266611
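Many genes in this block share exactly the same counts, which happens when they are covered by exactly the same CNVs; such genes cannot be told apart from the CNV data alone. The sketch below groups genes by their carrier pattern, using three representative rows copied from the sample matrix for this block (every listed sample follows one of these three patterns). The helper `indistinguishable_groups` is hypothetical and only illustrates the idea.

```python
from collections import defaultdict

genes = list(range(1019, 1033))

# Three representative carrier patterns from the block 2 sample matrix
# (sample IDs 789, 1748 and 3285), one 0/1 entry per gene.
rows = {
    789: [1] * 14,                # CNV spanning the whole block
    1748: [0, 0, 1] + [0] * 11,   # CNV covering gene 1021 only
    3285: [1, 1] + [0] * 12,      # CNV covering genes 1019 and 1020
}

def indistinguishable_groups(genes, rows):
    """Group genes whose overlap status is identical in every sample."""
    groups = defaultdict(list)
    for j, gene in enumerate(genes):
        pattern = tuple(row[j] for row in rows.values())
        groups[pattern].append(gene)
    return list(groups.values())

groups = indistinguishable_groups(genes, rows)
for group in groups:
    print(group)
```

Genes that land in the same group have identical columns in the design matrix, so any regression-based method has to split evidence among them.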

|gene index|simulated effect|SuSiE|varbvs|PyMC3|
|:---:|:---:|:---:|:---:|:---:|
|1019|0|0.0414|0.0500|0.0385|
|1020|0|0.0414|0.0500|0.0365|
|1021|0.60|0.2900|0.2861|0.3040|
|1022|0|0.0570|0.0617|0.0580|
|1023|0|0.0570|0.0617|0.0505|
|1024|0|0.0570|0.0617|0.0650|
|1025|0|0.0570|0.0617|0.0495|
|1026|0|0.0570|0.0617|0.0475|
|1027|0|0.0570|0.0617|0.0640|
|1028|0|0.0570|0.0617|0.0635|
|1029|0|0.0570|0.0617|0.0585|
|1030|0|0.0570|0.0617|0.0485|
|1031|0|0.0570|0.0617|0.0580|
|1032|1.07|0.0570|0.0617|0.0580|

- Block example 3 (12 genes, 3 positive effect genes):

        gene index  d_c	d_nc   nd_c  nd_nc   p
        gene_671	99	6607	9	6697	1.956916e-20
        gene_672	99	6607	9	6697	1.956916e-20
        gene_676	20	6686	0	6706	1.880480e-06
        gene_678	20	6686	0	6706	1.880480e-06
        gene_677	20	6686	0	6706	1.880480e-06
        gene_673	23	6683	5	6701	9.016016e-04
        gene_674	23	6683	5	6701	9.016016e-04
        gene_675	23	6683	5	6701	9.016016e-04
        gene_667	12	6694	2	6704	1.289476e-02
        gene_668	17	6689	6	6700	3.453722e-02
        gene_670	17	6689	6	6700	3.453722e-02
        gene_669	17	6689	6	6700	3.453722e-02

|gene index|simulated effect|SuSiE|varbvs|PyMC3|
|:---:|:---:|:---:|:---:|:---:|
|667|0|8.8818e-16|0.0435|0.0240|
|668|0|3.3307e-16|0.0330|0.0195|
|669|0|3.3307e-16|0.0330|0.0110|
|670|0|3.3307e-16|0.0330|0.0150|
|671|1.2865|**0.4999**|1|0.5460|
|672|0.5374|**0.4999**|0.0192|0.5185|
|673|0|7.9936e-15|0.0338|0.0185|
|674|0|7.9936e-15|0.0338|0.0230|
|675|0|7.9936e-15|0.0338|0.0300|
|676|0|5.2757e-13|0.0614|0.1270|
|677|0|5.2757e-13|0.0614|0.1325|
|678|0.9833|5.2757e-13|0.0614|0.1080|

## References
1. C. Lowther et al., Genomic Disorders in Psychiatry-What Does the Clinician Need to Know? Current Psychiatry Reports 19 (2017).
2. E. Q. Wu et al., Annual prevalence of diagnosed schizophrenia in the USA: a claims data analysis approach. Psychol Med (2006).

## ASHG talks
Session 81

Poison exons in neurodevelopmental disorders: from development and disease to therapeutic target
1. What are poison exons: small exonic regions that, when spliced into an RNA transcript, lead to premature truncation of the protein (they contain a stop codon).
2. When does inclusion occur: inclusion of poison exons occurs during specific times in neurodevelopment, and splicing occurs in a cell-specific manner.
3. Many genes implicated in neurodevelopmental disorders (epilepsy, autism, and malformations of cortical development) harbor poison exons, including ion channels (SCN1A/2A/8A), epigenetic regulators (CHD2, MBD5), and cytoskeletal proteins (FLNA).
4. How poison exons are identified in neuronal development: splicing mechanisms and variants that disrupt splicing.
5. Identification of patient-specific variants that lead to aberrant inclusion of poison exons.

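- Dr. Mefford: describes the current genetic landscape of neurodevelopmental disorders and strategies to identify the molecular etiology in undiagnosed cases; also introduces the concept of poison exons and their function.
- Dr. Zhang: discusses the studies that identified poison exons in neuronal development, the splicing mechanisms that govern their use, and variants detected in patients with malformations of cortical development that disrupt their splicing.
- Dr. Carvill: discusses the identification of patient-specific variants that lead to aberrant inclusion of poison exons in genes implicated in epilepsy, including SCN1A, and functional studies in patient-derived induced neurons.
- Dr. Isom: describes an antisense oligonucleotide that targets an SCN1A poison exon, preventing its inclusion and thus restoring full-length protein and preventing SCN1A-related mortality in an animal model of epilepsy.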

This session will focus on the role of poison exons in epilepsy. Poison exons are small exonic regions that, when spliced into an RNA transcript, lead to premature truncation of a protein. Inclusion of poison exons occurs during specific times in neurodevelopment, and splicing occurs in a cell-specific manner. Many of the genes implicated in neurodevelopmental disorders, including epilepsy, autism, and malformations of cortical development, harbor these poison exons. These include ion channels (SCN1A/2A/8A), epigenetic regulators (CHD2, MBD5), and cytoskeletal proteins (FLNA). In this session, Dr. Mefford will open by describing the current genetic landscape of neurodevelopmental disorders and strategies to identify the molecular etiology in undiagnosed cases. Dr. Mefford will also introduce the concept of poison exons and their function. Dr. Zhang will discuss the studies that identified poison exons in neuronal development, the splicing mechanisms that govern their use, and variants detected in patients with malformations of cortical development that disrupt their splicing. Dr. Carvill will discuss the identification of patient-specific variants that lead to aberrant inclusion of poison exons in genes implicated in epilepsy, including SCN1A, and functional studies in patient-derived induced neurons. Dr. Isom will describe an antisense oligonucleotide that targets an SCN1A poison exon, preventing its inclusion and thus restoring full-length protein and preventing SCN1A-related mortality in an animal model of epilepsy.

## ASHG posters
Approximately LD independent regions for biobank scale genetic analyses
1. Genetic fine-mapping methods assume that loci are independent in order to produce valid estimates.
2. Introduce a new method for identifying approximately LD-independent regions that minimizes different functions of the full "off-diagonal" LD band at region boundaries (i.e. minimizes off-diagonal LD scores), and apply it to the UK Biobank.
3. Use $\sum_{ij}1_{\{r_{ij}>0.60\}}$ instead of the naive off-diagonal LD score (odLDSC) $\sum_{ij}r_{ij}^2$.
4. The choice of region boundaries affects fine-mapping results.
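The two boundary scores from the poster notes can be compared on a toy example. This is a rough sketch under stated assumptions: the 4x4 matrix of pairwise $r$ values is invented, `off_diagonal_scores` is a hypothetical helper, and the sums run over every pair of variants on opposite sides of a candidate boundary rather than over a limited band.

```python
def off_diagonal_scores(r, boundary, threshold=0.60):
    """For a candidate boundary, score the LD between variants on its left
    (index < boundary) and variants on its right (index >= boundary).
    Returns (sum of r^2, number of pairs with r above the threshold)."""
    sum_sq = 0.0
    n_high = 0
    for i in range(boundary):
        for j in range(boundary, len(r)):
            sum_sq += r[i][j] ** 2
            n_high += int(r[i][j] > threshold)
    return sum_sq, n_high

# Toy correlation matrix: variants 0-1 and 2-3 form two tightly linked pairs.
r = [
    [1.00, 0.80, 0.10, 0.05],
    [0.80, 1.00, 0.10, 0.10],
    [0.10, 0.10, 1.00, 0.80],
    [0.05, 0.10, 0.80, 1.00],
]

scores = {b: off_diagonal_scores(r, b) for b in (1, 2, 3)}
for b, (sum_sq, n_high) in scores.items():
    print(f"boundary before variant {b}: sum r^2 = {sum_sq:.4f}, pairs with r > 0.60 = {n_high}")

best_by_count = min(scores, key=lambda b: scores[b][1])
print("boundary with fewest high-LD pairs:", best_by_count)
```

In this toy case both scores favor the same boundary; the thresholded count differs from the naive sum mainly when many weak correlations add up across a boundary.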