```python
import ihtml
```

# Mapping the epitranscriptome with Nanopore direct RNA sequencing

### Adrien Leger, EMBL EIPOD/Marie Curie fellow

#### Anton Enright Group, Dept of Pathology, University of Cambridge

#### Ewan Birney Group, EMBL-EBI

#### [aleg@ebi.ac.uk](mailto:aleg@ebi.ac.uk) / [a-slide @github](https://github.com/a-slide)

**Other people involved in the project**
* Aurelien Guy Duché (EMBL-EBI)
* Tomas Fitzgerald (EMBL-EBI)
* Tommaso Leonardi (Gurdon Institute)
* Paulo Amaral (Gurdon Institute)
* Harvey Che (Gurdon Institute)

# The Epitranscriptome

##### Collection of all the RNA post-transcriptional modifications (PTM)

* Highly conserved feature found in archaea, bacteria and eukarya (including conservation of the modified positions)
* More than 100 known RNA modifications

![](images/Common_nucleosides.svg)


## RNA modifications in rRNA and tRNA

![](images/rRNA_tRNA_modif.png)

* Heavily modified
* Participate in RNA folding and stability
* Modulate ligand interactions, particularly in the tRNA anticodon region

## RNA modifications in messenger RNAs

* Terminal modifications are well known

    * Poly(A) tailing ± uridylation
    * 5' capping with 7-methylguanosine (m7G)

* Internal modifications recently described by NGS-based methods

    * PTM immunoprecipitation (IP) + NGS ➡️ **m6A, m1A, m6Am**
    * Chemical treatment + NGS ➡️ **m5C, 5hmC, Pseudouridine**
    * Direct sequencing + editing detection ➡️ **Inosine**

![](images/m6Aseq.png)

## Role of RNA modifications in mRNA regulation 

* Writer, eraser and reader proteins identified for some modifications ➡️ **Dynamic layer**
* Various functions in mRNA splicing, stability, translation, decay...
* Some modifications can alter **RNA structure**
* Direct or indirect impact on **protein recruitment**

![](images/m6A_switch.jpg)

<small>Liu, N. et al. N6-methyladenosine-dependent RNA structural switches regulate RNA-protein interactions. Nature 518, 560–564 (2015).
</small> 

# Towards more sensitive methods for RNA modifications

* **Quantitative analysis of modified RNA nucleosides by LC-MS/MS**

	⊕ Cheap, quick, quantitative, simultaneous detection of many PTMs

	⊝ No sequence information

* **Native detection of PTM by Nanopore direct RNA sequencing**

	⊕ Virtually no sample processing, single-molecule and single-nucleotide resolution

	⊝ Early days, lower accuracy, probably not quantitative

## Overview of nanopore sequencing

```
%%ihtml 1100
<iframe align="middle" width="1777" height="1000" src="https://www.youtube.com/embed/GUb1TZvMWsw" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
```

## Native detection of PTM by Nanopore direct RNA sequencing

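5 to 6 bases occupy the pore at a time, so the signal reflects a sequence context rather than a single base

Detection of modifications has already been shown for DNA and for specific positions in rRNA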

<img src="images/RNA_mod_nanopore.png" width="1600">
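Because the current at any moment reflects a 5mer rather than a single base, one modified nucleotide affects the signal of every 5mer that overlaps it. A minimal Python sketch listing those overlapping 5mers; `overlapping_kmers` is a hypothetical helper, the example sequence is invented, and the sign convention (offset of the 5mer centre relative to the modified base) is an assumption.

```python
def overlapping_kmers(seq, pos, k=5):
    """Return (offset, kmer) for every k-mer of `seq` that contains seq[pos].

    offset = index of the k-mer centre minus `pos` (assumed convention),
    so for k=5 it runs from -2 (k-mer centred 2 nt upstream) to +2.
    """
    half = k // 2
    result = []
    for start in range(pos - k + 1, pos + 1):
        end = start + k
        if start < 0 or end > len(seq):
            continue  # this k-mer would run past an end of the sequence
        centre = start + half
        result.append((centre - pos, seq[start:end]))
    return result


# Hypothetical sequence with a candidate modified A at index 4 (GGACU motif)
for offset, kmer in overlapping_kmers("AUGGACUGC", 4):
    print(f"{offset:+d} {kmer}")
```

Near the ends of a read fewer than five 5mers cover the base, which the `continue` branch handles.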

## How to predict RNA modifications by nanopore sequencing 

**Direct basecalling of modified nucleotides from raw signal**

* Based on deep-learning basecalling
* Requires training on a labeled dataset covering all possible kmer combinations
* Elegant and powerful but very complex
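The size of such a labeled training set grows quickly with the alphabet: with m modified symbols added to A, C, G, U there are (4 + m)^5 possible 5mers. A minimal sketch counting them, assuming hypothetical one-letter placeholder codes `M` for m6A and `D` for dmA (not a standard alphabet):

```python
from itertools import product


def kmer_space(alphabet, k=5):
    """Every k-mer over `alphabet`, as strings."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


CANONICAL = "ACGU"
# 'M' (m6A) and 'D' (dmA) are hypothetical one-letter placeholder codes
for alphabet in (CANONICAL, CANONICAL + "M", CANONICAL + "MD"):
    kmers = kmer_space(alphabet)
    n_modified = sum(1 for km in kmers if set(km) - set(CANONICAL))
    print(f"{len(alphabet)} letters: {len(kmers)} 5mers, {n_modified} with a modified base")
```

Adding a single modified A already takes the space from 1024 to 3125 5mers, 2101 of which contain the modification and must all be represented in the training data.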

**Post-processing of the basecalled signal**

* Basecalling with a modification-tolerant basecaller, followed by post-processing
* Identify modified bases by comparing the signal to canonical or alternative models
* Easier, but probably less sensitive
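One common way to compare a signal against canonical and alternative models is a log-likelihood ratio between per-kmer Gaussian current models. This technique is swapped in for illustration and is not necessarily the one used in this project. The helper names are hypothetical, and the model means and standard deviations are invented placeholders, not real pore-model values.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_likelihood_ratio(events, canonical, alternative):
    """Sum of log P(x | alternative) - log P(x | canonical) over signal events.

    `canonical` and `alternative` are (mean, sd) tuples for the same kmer;
    a positive result favours the alternative (modified) model.
    """
    return sum(
        gaussian_logpdf(x, *alternative) - gaussian_logpdf(x, *canonical)
        for x in events
    )


# Invented placeholder models (mean, sd in pA) for a single kmer: NOT real values
canonical_model = (110.0, 3.0)
modified_model = (104.0, 3.0)
events = [103.5, 105.2, 104.1]  # hypothetical mean currents of aligned events
print(log_likelihood_ratio(events, canonical_model, modified_model))
```

Summing over events lets evidence accumulate across reads covering the same site, which is why this kind of post-processing can still work with a basecaller that ignores modifications.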


## Experimental strategy for generating nanopore training data

<img src="images/Nanopore-workflow.svg" width="1600">

## Example of raw signal traces

<img src="images/traces.png" width="1600">

## Influence of the sequence context on the ability to detect a modification

<img src="images/kmer_through_pore.svg" width="1400">

## Sequence distribution of training blocks

<img src="images/seq_distrib_plot.svg" width="1400">


## Impact of modifications on the overall block

<img src="images/QC_plots.svg" width="1400">

# Kmer-level analysis

* Pairwise comparison of signal intensities for each 5mer

* Pairwise comparison of dwell times for each 5mer

* Comparison of A/m6A, A/dmA and m6A/dmA with a Mann-Whitney U test + 1000 bootstraps → empirical p-value

* Combine the 2 p-values (signal intensity and dwell time) with Fisher's method

* Plot the distribution of p-values for all 5mers
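A minimal stdlib-only Python sketch of this per-5mer test. The exact resampling scheme is not specified here, so this assumes a bootstrap under the null hypothesis: both groups are resampled with replacement from the pooled values, and the empirical p-value is the fraction of resamples whose U statistic deviates from its null expectation n1·n2/2 at least as much as the observed one. `bootstrap_mwu_pvalue` and `fisher_combine` are hypothetical helper names, and the measurements are simulated, not real data.

```python
import math
import random


def mann_whitney_u(x, y):
    """U statistic of sample x versus y (ties count 0.5), by direct pair counting."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def bootstrap_mwu_pvalue(x, y, n_boot=1000, seed=0):
    """Two-sided empirical p-value for U, bootstrapping from the pooled data (null)."""
    rng = random.Random(seed)
    n1, n2 = len(x), len(y)
    null_centre = n1 * n2 / 2  # expected U when both groups share one distribution
    observed = abs(mann_whitney_u(x, y) - null_centre)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_boot):
        bx = rng.choices(pooled, k=n1)
        by = rng.choices(pooled, k=n2)
        if abs(mann_whitney_u(bx, by) - null_centre) >= observed:
            hits += 1
    # +1 in numerator and denominator keeps p > 0, so log(p) is defined below
    return (hits + 1) / (n_boot + 1)


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under H0.

    For even df the chi-square survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)**i / i!
    """
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 0.0
    for i in range(len(pvalues)):
        if i > 0:
            term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Simulated (not real) per-read measurements for one 5mer, A vs m6A
sim = random.Random(42)
a_intensity = [sim.gauss(100.0, 3.0) for _ in range(20)]
m6a_intensity = [sim.gauss(104.0, 3.0) for _ in range(20)]
a_dwell = [sim.expovariate(1 / 8.0) for _ in range(20)]
m6a_dwell = [sim.expovariate(1 / 10.0) for _ in range(20)]

p_intensity = bootstrap_mwu_pvalue(a_intensity, m6a_intensity)
p_dwell = bootstrap_mwu_pvalue(a_dwell, m6a_dwell)
print(p_intensity, p_dwell, fisher_combine([p_intensity, p_dwell]))
```

Fisher's method assumes the combined p-values are independent; intensity and dwell time measured on the same reads may be correlated, so the combined value is best read as approximate. `scipy.stats.mannwhitneyu` and `scipy.stats.combine_pvalues` (with `method="fisher"`) provide library equivalents of the two building blocks.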

## Significantly different 5mers with central modification

<img src="images/pos0_pvalue.svg" width="1000">

**Percentage of significant 5mers (p-value < 0.05)**

<table align="left" style="font-size:75%;">
  <tr>
    <th></th>
    <th>Signal intensity<br></th>
    <th>Dwell time<br></th>
    <th>Combination</th>
  </tr>
  <tr>
    <th>A/m6A</th>
    <td>68.36 %<br></td>
    <td>37.50 %<br></td>
    <td>73.05 %</td>
  </tr>
  <tr>
    <th>A/dmA</th>
    <td>76.17 %<br></td>
    <td>27.34 %<br></td>
    <td>76.56 %<br></td>
  </tr>
  <tr>
    <th>m6A/dmA</th>
    <td>41.80 %<br></td>
    <td>26.17 %<br></td>
    <td>48.44 %<br></td>
  </tr>
</table></div>
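All percentages in this table are multiples of 1/256, which suggests (an inference from the numbers, not stated on the slide) that each comparison covers the 4^4 = 256 NNANN contexts with the adenosine at the centre. A minimal sketch of that denominator and of the percentage calculation; `central_a_contexts` and `percent_significant` are hypothetical helpers, and the p-values below are placeholders.

```python
from itertools import product


def central_a_contexts(k=5, alphabet="ACGU"):
    """All k-mers with A at the central position (NNANN for k=5)."""
    flank = k // 2
    return [
        "".join(left) + "A" + "".join(right)
        for left in product(alphabet, repeat=flank)
        for right in product(alphabet, repeat=flank)
    ]


def percent_significant(pvalues, alpha=0.05):
    """Percentage of p-values strictly below the significance threshold."""
    return 100.0 * sum(p < alpha for p in pvalues) / len(pvalues)


contexts = central_a_contexts()
print(len(contexts))  # 256
# Placeholder p-values: 175 significant contexts out of 256
placeholder = [0.01] * 175 + [0.5] * (len(contexts) - 175)
print(round(percent_significant(placeholder), 2))  # 68.36
```

Under this reading, 68.36 % corresponds to 175 of the 256 contexts.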

## Significantly different adjacent 5mers

<img src="images/all_pos_pvalue.svg" width="1400">

**Percentage of significant 5mers (p-value < 0.05)**

<table align="left" style="font-size:75%;">
  <tr>
    <th class="tg-yw4l"></th>
    <th class="tg-yw4l">Position -2</th>
    <th class="tg-yw4l">Position -1</th>
    <th class="tg-yw4l">Position 0</th>
    <th class="tg-yw4l">Position 1</th>
    <th class="tg-yw4l">Position 2</th>
  </tr>
  <tr>
    <th class="tg-yw4l">A/m6A</th>
    <td class="tg-yw4l">86.72 %</td>
    <td class="tg-yw4l">91.02 %</td>
    <td class="tg-yw4l">73.05 %</td>
    <td class="tg-yw4l">62.89 %</td>
    <td class="tg-yw4l">59.38 %</td>
  </tr>
  <tr>
    <th class="tg-yw4l">A/dmA</th>
    <td class="tg-yw4l">72.94 %</td>
    <td class="tg-yw4l">73.44 %</td>
    <td class="tg-yw4l">76.56 %</td>
    <td class="tg-yw4l">46.48 %</td>
    <td class="tg-yw4l">48.44 %</td>
  </tr>
  <tr>
    <th class="tg-yw4l">m6A/dmA</th>
    <td class="tg-yw4l">76.86 %</td>
    <td class="tg-yw4l">70.31 %</td>
    <td class="tg-yw4l">48.44 %</td>
    <td class="tg-yw4l">27.34 %</td>
    <td class="tg-yw4l">33.98 %</td>
  </tr>
</table>

## Combining 5mers into 7mers and 9mers

<img src="images/7-9mers_combine.svg" width="1000">

**Percentage of significant composite kmers (p-value < 0.05)**

<table align="left" style="font-size:75%;">
  <tr>
    <th class="tg-yw4l"></th>
    <th class="tg-yw4l">Composite 7mers</th>
    <th class="tg-yw4l">Composite 9mers</th>
  </tr>
  <tr>
    <th class="tg-yw4l">A/m6A</th>
    <td class="tg-yw4l">97.34%</td>
    <td class="tg-yw4l">99.8%</td>
  </tr>
  <tr>
    <th class="tg-yw4l">A/dmA</th>
    <td class="tg-yw4l">91.7%</td>
    <td class="tg-yw4l">97.75%</td>
  </tr>
  <tr>
    <th class="tg-yw4l">m6A/dmA</th>
    <td class="tg-yw4l">79.42%</td>
    <td class="tg-yw4l">92.44%</td>
  </tr>
</table>
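How composite kmers are scored is not spelled out here. A minimal sketch under two stated assumptions: a composite 7mer centred on the modified base spans the three overlapping 5mers at offsets -1..+1 (a 9mer spans five, at offsets -2..+2), and their p-values are merged with Fisher's method as in the 5mer analysis. `composite_pvalue` is a hypothetical helper and the p-values are invented.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method; chi-square survival function with 2k df in closed form."""
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 0.0
    for i in range(len(pvalues)):
        if i > 0:
            term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def composite_pvalue(pvalue_by_offset, k):
    """Combine the 5mer p-values making up a composite k-mer (k = 5, 7 or 9).

    Assumes a composite k-mer centred on the modified base spans the 5mers
    at offsets -(k-5)/2 .. +(k-5)/2 (three 5mers for k=7, five for k=9).
    """
    half = (k - 5) // 2
    return fisher_combine([pvalue_by_offset[o] for o in range(-half, half + 1)])


# Hypothetical per-offset 5mer p-values for one modified site
pvals = {-2: 0.20, -1: 0.04, 0: 0.10, 1: 0.30, 2: 0.08}
print(composite_pvalue(pvals, 7))  # offsets -1, 0, +1
print(composite_pvalue(pvals, 9))  # offsets -2 .. +2
```

Under Fisher's method several moderately low p-values reinforce each other, so this kind of aggregation would be one plausible reason why longer composite kmers reach significance more often.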

# What's next?

**Generate more concatemer data for more modifications, at higher depth**
* Improved ligation protocol to yield longer concatemers
* New oligo design to improve ligation efficiency and simplify block segmentation
* 10 nucleotides, canonical and modified (A, 2'OMe-A, m6A, m2A, I, U, 2'OMe-U, m3U, PseudoU, dihydroU)
* Explore on-chip maskless photolithographic synthesis

**RNA specific modification-aware basecaller**
* Need a better training dataset
* Update the basecaller to handle extra output states
* High performance GPU workstation

**Pairwise comparison of unmodified vs modified RNA (Nanoraw, Nanopolish, Tombo...)**
* Modification writer KO / in vitro transcription
* For a specific RNA (e.g. XIST): native RNA vs in vitro transcribed (IVT) RNA from cDNA

**Integration of single nucleotide modification maps with structural datasets**
* SHAPE-Seq of epitranscriptomics KO
* Identify modification switches