# Cracking the long non-coding RNA epitranscriptome

### Adrien Leger,  EMBL EIPOD fellow

#### Enright Group, Dept. of Pathology, University of Cambridge

#### Marcia Group, EMBL Grenoble

#### Birney Group, EMBL-EBI

#### [aleg@ebi.ac.uk](mailto:aleg@ebi.ac.uk) / [a-slide @github](https://github.com/a-slide)

**Other people involved in the project**
* Aurelien Guy Duché (EMBL-EBI)
* Junting Zhang (EMBL-EBI)
* Jyoti Choudhary (CRUK)
* Tommaso Leonardi (Gurdon Institute)
* Paulo Amaral (Gurdon Institute)
* Harvey Che (Gurdon Institute)

# My training and research experience

![](images/training_and_research.png)

# The long non-coding RNAs

##### A poorly understood, loosely defined class of RNA
* RNAs of 200+ bases "without" an ORF
* ± PolyA tail
* Enriched in the nucleus
* Expressed around 10- to 100-fold less than mRNAs

![](images/lncRNA_data.png)

<small>From Mukherjee, N. et al. Nat. Struct. Mol. Biol. 24, 86–96 (2017)</small>

## Functions of lncRNAs

**Only a small fraction have demonstrated regulatory functions**

![](images/Models-of-lncRNA-functions.png)

<small>Morlando, M et al. Long Non-Coding RNAs: New Players in Hematopoiesis and Leukemia. Frontiers in medicine. 2. 23 (2015)</small>

## Functional features of lncRNAs are hard to predict

* Base pairing with nucleic acid target

    ~ 40-50 % True positive <small>(Terai et al Comprehensive prediction of lncRNA–RNA interactions in human transcriptome, BMC Genomics 2016 17:12)</small>

* Interaction and recruitment of proteins

    ~ 30-60 % True positive <small>(Lu et al.: Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics 2013 14:651)</small>

* 2D/3D folding
    
    ?

**Incomplete model ?** 

# The Epitranscriptome

##### Collection of all the RNA post-transcriptional modifications (PTM)

* Highly conserved feature found in archaea, bacteria and eukarya (+ positional conservation)
* More than 100 known RNA modifications

![](images/Common_nucleosides.svg)

## RNA modifications in rRNA and tRNA

![](images/rRNA_tRNA_modif.png)

* Heavily modified
* Participate in the RNA folding and stability 
* Modify ligand interactions particularly in the tRNA anticodon region

## RNA modifications in messenger RNAs

* Terminal modifications are well known

    * PolyA tailing ± uridylation
    * 5' capping with 7-methylguanosine

* Internal modifications recently described by NGS based methods

	* PTM IP + NGS ➡️ **m6A, m1A, m6Am**
	* Chemical treatment + NGS ➡️ **m5C, 5hmC, Pseudouridine**
	* Direct sequencing + Editing detection ➡️ **Inosine**
    
![](images/m6Aseq.png)

## Role of RNA modifications in mRNA regulation 

* Writer, Eraser and Reader proteins found for some modifications ➡️ **Dynamic layer**
* Various functions in mRNA splicing, stability, translation, decay...  
* Some modifications can alter **RNA structure**
* Direct or indirect impact on **protein recruitment**

![](images/m6A_switch.jpg)

<small>Liu, N. et al. N6-methyladenosine-dependent RNA structural switches regulate RNA-protein interactions. Nature 518, 560–564 (2015).
</small> 

## Overview of various mRNA PTM and their suggested functions

![](images/mRNA_mod_loc.png)

<small>Hoernes, T. P. and Erlacher, M. D. Translating the epitranscriptome. Wiley Interdiscip Rev RNA (2016)</small> 

## RNA modification sites found in lncRNA

Data mining of all the published RNA modification datasets (2016) 

![](images/PTM_expression_lncRNA.png)

## The Epitranscriptome of lncRNAs 

* mRNA modifications are also detected in lncRNAs

* Real extent of RNA modifications in lncRNAs ?

* Function of modifications in lncRNAs ?

* Impact on structure and protein recruitment

# Towards more sensitive methods for PTM analysis of lncRNAs

* **lncRNA capture at RNA level**

    Design of overlapping probes to capture as many annotated lncRNAs as possible

* **Quantitative analysis of modified RNA nucleosides by LC-MS/MS**

	⊕ Cheap, quick, quantitative, simultaneous detection of many PTM
    
	⊝ No sequence information

*  **Native detection of PTM by Nanopore direct RNA sequencing**    

	⊕ Virtually no processing of samples, single molecule, single nucleotide resolution
    
	⊝ Early days, lower accuracy, very competitive
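The overlapping-probe design for lncRNA capture can be sketched as a simple tiling of each transcript. This is only an illustration: the function name `tile_probes`, the 120-nt probe length and the 60-nt step are assumptions, not the parameters used in the project, and the conversion of each window into an actual capture oligo is left out.

```python
def tile_probes(sequence, probe_len=120, step=60):
    """Return (start, window) tuples tiling `sequence` with overlapping windows.

    probe_len and step are hypothetical values; step < probe_len gives overlap.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    if len(sequence) <= probe_len:
        # Transcript shorter than one probe: use it whole
        return [(0, sequence)]
    starts = list(range(0, len(sequence) - probe_len + 1, step))
    # Add a final window flush with the 3' end so the tail is always covered
    last = len(sequence) - probe_len
    if starts[-1] != last:
        starts.append(last)
    return [(s, sequence[s:s + probe_len]) for s in starts]


transcript = "ACGU" * 100  # toy 400-nt transcript
probes = tile_probes(transcript)
for start, window in probes:
    print(start, len(window))
```

A step of half the probe length means every internal base is covered by two windows, which is one simple way to obtain overlapping probes.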

## Detecting and quantifying RNA modifications by MS


![](images/mRNA_PTM_mass.png)
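As a small worked example of why mass spectrometry can separate some modified nucleosides, the sketch below computes monoisotopic masses from elemental compositions. The dictionary and function names are illustrative assumptions; the compositions are the standard formulas of the neutral nucleosides.

```python
# Monoisotopic masses of the most abundant isotopes (Da)
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental compositions of a few neutral nucleosides
NUCLEOSIDES = {
    "adenosine": {"C": 10, "H": 13, "N": 5, "O": 4},
    "m6A": {"C": 11, "H": 15, "N": 5, "O": 4},
    "uridine": {"C": 9, "H": 12, "N": 2, "O": 6},
    "pseudouridine": {"C": 9, "H": 12, "N": 2, "O": 6},
}


def monoisotopic_mass(composition):
    """Sum the monoisotopic masses of all atoms in a composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


masses = {name: monoisotopic_mass(comp) for name, comp in NUCLEOSIDES.items()}
methyl_shift = masses["m6A"] - masses["adenosine"]
for name, mass in masses.items():
    print(f"{name}: {mass:.4f} Da")
print(f"m6A - adenosine: {methyl_shift:.4f} Da")
```

Methylation adds one CH2 unit, so m6A is shifted relative to adenosine, whereas pseudouridine is an isomer of uridine and has the same mass, so mass alone cannot tell those two apart.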

## LC MS/MS experimental strategy

![](images/MS-workflow.png)

## LC MS/MS Preliminary results

![](images/MS_nucleosides_example.png)

## Native detection of PTM by Nanopore direct RNA sequencing

* 5-6 bases in the pore at a time = the signal depends on sequence context
* Already shown for DNA and for specific positions of rRNA

![](images/RNA_mod_nanopore.png)

## How to detect RNA modifications by nanopore sequencing 

**Direct basecalling of modified nucleotides from the raw signal**

* Based on Deep Learning Basecalling
* Training from a labeled raw signal with all possible k-mer combinations
* Can potentially identify a large number of modified bases simultaneously

**Post-processing of the basecalled signal**

* Performed after basecalling with a modification-tolerant basecaller
* Much simpler approach
* Compare paired unmodified/modified conditions OR use a model for a modified base
* Only indicates the presence of a particular modification
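The paired-comparison idea can be illustrated with a toy example: for one k-mer, compare the current levels observed in an unmodified and a modified sample. This is a sketch under stated assumptions: the function `welch_t` and the current values are invented for illustration, and this is not the statistic of any specific tool.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical mean current levels (pA) for one 5-mer in two conditions
unmodified = [101.2, 99.8, 100.5, 100.9, 99.6, 100.1]
modified = [104.0, 103.1, 105.2, 103.8, 104.6, 104.3]

t = welch_t(modified, unmodified)
print(f"Welch t = {t:.2f}")
```

A large absolute value flags a position where the two conditions differ, but it does not say which modification is responsible.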

## Nanopore training experimental strategy
    
![](images/Nanopore-workflow.png)

## Where are the barcoded blocks ?

Basecalling with Albacore 2.1 for RNA 

![](images/traces.png)

Only 110,884 blocks found in the 82,418 valid Fast5 files (out of 1M in total)

## Preliminary data for m6A and dmA

![](images/Nanopore_distrib.png)

## Preliminary data for m6A and dmA

For all the 9-mers with a mod in the center

Extract all the 5-mers

Generate an artificial average signal

![](images/artificial_traces.png)
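The three steps above can be sketched as follows. This is one possible reading of the procedure, not the actual analysis code: the function names, the example 9-mer and the current values are all hypothetical.

```python
from statistics import mean


def center_kmers(ninemer, k=5):
    """Return the k-mers of a 9-mer that overlap its central base, in order."""
    if len(ninemer) != 9:
        raise ValueError("expected a 9-mer")
    center = len(ninemer) // 2
    return [
        ninemer[i:i + k]
        for i in range(len(ninemer) - k + 1)
        if i <= center < i + k
    ]


def artificial_trace(ninemer, kmer_levels):
    """Average the observed levels of each overlapping 5-mer, in order."""
    return [mean(kmer_levels[kmer]) for kmer in center_kmers(ninemer)]


# Hypothetical observed current levels (pA) per 5-mer; values are invented
levels = {
    "CUGGA": [90.0, 92.0],
    "UGGAC": [100.0, 102.0],
    "GGACU": [110.0, 114.0],
    "GACUG": [105.0, 107.0],
    "ACUGA": [95.0, 99.0],
}

kmers = center_kmers("CUGGACUGA")  # the central A is the candidate modified base
trace = artificial_trace("CUGGACUGA", levels)
print(kmers)
print(trace)
```

With 5-mers, every window of the 9-mer contains the central base, so the averaged trace has five points, one per position of the modification within the pore.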

## Preliminary predictive model for m6A

Try to predict A/m6A using a regularized binary logistic regression model

90% data to build the model / 10% for prediction

![](images/logit.png)

Better than a coin toss !
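For readers unfamiliar with the approach, here is a minimal, dependency-free sketch of an L2-regularized binary logistic regression with a 90% / 10% split. It runs on synthetic stand-in data, not on the nanopore signal shown above; the feature layout, the hyperparameters and all function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, l2=0.1, lr=0.2, epochs=300):
    """Fit weights and bias by batch gradient descent with an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Synthetic stand-in data: 5 signal features per site, label 1 = "m6A", 0 = "A"
rng = random.Random(0)
data = []
for label in (0, 1):
    for _ in range(200):
        data.append(([rng.gauss(1.5 * label, 1.0) for _ in range(5)], label))
rng.shuffle(data)

split = int(0.9 * len(data))  # 90% to build the model, 10% held out
train, test = data[:split], data[split:]
w, b = fit_logistic([x for x, _ in train], [y for _, y in train])
accuracy = sum(predict(w, b, x) == y for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The synthetic classes are deliberately well separated, so the accuracy printed here says nothing about the performance obtained on real reads.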

## What's next ?

**RNA-specific, modification-aware basecaller**
* Need a better training dataset
* High performance GPU workstation
* Possible to do better than Albacore

**Paired comparison of unmodified vs modified (Nanoraw, Nanopolish, Tombo...)**
* Modification Writer KO
* For a specific RNA (XIST for example) RNA vs IVT cDNA

**Integration of single nucleotide modification maps with structural datasets**
* SHAPE-Seq of epitranscriptomics KO
* Identify modification switches

## What's after next: CRISPR/Cas9 screening of epitranscriptomic effectors

![](images/screening_protocol2.png)

## Extra projects related to Nanopore seq

* PolyA tail length to analyse RNA stability following YTHDF2 KO
* Gene Isoform subcellular localization
