This package provides functions for the analysis of splice junctions and
their association with somatic mutations. It integrates the output of
several tools which predict splicing effects from mutations or which
detect expressed splice junctions from RNA-seq data into a standardized
splice junction format based on genomic coordinates. Detected splice
junctions can be filtered against canonical splice junctsion and
annotated with affected transcript sequences, CDS, and resulting peptide
sequences. Splice2neo currently supports splice events from alternative
3’/5’ splice sites, exons skipping, intron retentions, exitrons and
mutually exclusive exons.
Integrating splice2neo functions and detection rules based on splice
effect scores and RNA-seq support facilitates the identification of
mutation-associated splice junctions which are tumor-specific and can
encode neoantigen candidates.
Documentation: https://tron-bioinformatics.github.io/splice2neo/
For a more detailed description and a full example workflow see the vignette
The R package splice2neo is not yet on CRAN or Bioconductor. Therefore, you need to install splice2neo from github.
if (!requireNamespace("remotes", quietly = TRUE)) {
install.packages("remotes")
}
remotes::install_github("TRON-Bioinformatics/splice2neo")
This is a basic example of how to use some functions.
library(splice2neo)
# load human genome reference sequence
requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)
bsg <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
We start with some example splice junctions provided with the package.
junc_df <- dplyr::tibble(
junc_id = toy_junc_id[c(1, 6, 10)]
)
junc_df
#> # A tibble: 3 × 1
#> junc_id
#> <chr>
#> 1 chr2:152389996-152392205:-
#> 2 chr2:179415981-179416357:-
#> 3 chr2:179446225-179446226:-
Next, we find the transcripts which are in the same genomic region as the splice junction and that may be affected by the junction.
junc_df %>%
add_tx(toy_transcripts) %>%
head()
#> # A tibble: 6 × 3
#> junc_id tx_id tx_lst
#> <chr> <chr> <named list>
#> 1 chr2:152389996-152392205:- ENST00000409198 <GRanges>
#> 2 chr2:152389996-152392205:- ENST00000172853 <GRanges>
#> 3 chr2:152389996-152392205:- ENST00000397345 <GRanges>
#> 4 chr2:152389996-152392205:- ENST00000427231 <GRanges>
#> 5 chr2:152389996-152392205:- ENST00000618972 <GRanges>
#> 6 chr2:152389996-152392205:- ENST00000413693 <GRanges>
We modify the canonical transcripts by introducing the splice junctions. Then we add the transcript sequence in a fixed-sized window around the junction positions, the context sequence.
toy_junc_df %>%
head()
#> # A tibble: 6 × 2
#> junc_id tx_id
#> <chr> <chr>
#> 1 chr2:152389996-152392205:- ENST00000409198
#> 2 chr2:152389996-152390729:- ENST00000409198
#> 3 chr2:152389955-152389956:- ENST00000397345
#> 4 chr2:152388410-152392205:- ENST00000409198
#> 5 chr2:152388410-152390729:- ENST00000409198
#> 6 chr2:179415981-179416357:- ENST00000342992
toy_junc_df %>%
add_context_seq(transcripts = toy_transcripts, size = 400, bsg = bsg) %>%
head()
#> # A tibble: 6 × 8
#> junc_id tx_id tx_mod_id junc_pos_tx cts_seq cts_junc_pos cts_size cts_id
#> <chr> <chr> <chr> <int> <chr> <chr> <int> <chr>
#> 1 chr2:1523899… ENST… ENST0000… 16412 AAGAAG… 200 400 90bfc…
#> 2 chr2:1523899… ENST… ENST0000… 16517 AAGAAG… 200 400 26f77…
#> 3 chr2:1523899… ENST… ENST0000… 21620 AAGAAG… 0,200,1745,… 1945 f1f2c…
#> 4 chr2:1523884… ENST… ENST0000… 16412 AAGAAG… 200 400 d4f9e…
#> 5 chr2:1523884… ENST… ENST0000… 16517 AAGAAG… 200 400 c715a…
#> 6 chr2:1794159… ENST… ENST0000… 83789 TGGATT… 200 400 0128d…
Here, we use the splice junctions to modify the coding sequences (CDS) of the reference transcripts. The resulting CDS sequences are translated into protein sequence and further annotated with the peptide around the junction, the relative position of the splice junction in the peptide, and the location of the junction in an open reading frame (ORF).
toy_junc_df %>%
# add peptide sequence
add_peptide(cds=toy_cds, flanking_size = 13, bsg = bsg) %>%
# select subset of columns
dplyr::select(junc_id, peptide_context, junc_in_orf, cds_description) %>%
head()
#> # A tibble: 6 × 4
#> junc_id peptide_context junc_in_orf cds_description
#> <chr> <chr> <lgl> <chr>
#> 1 chr2:152389996-152392205:- NRHFKYATQLMNEIC TRUE mutated cds
#> 2 chr2:152389996-152390729:- HLLAKTAGDQISQIC TRUE mutated cds
#> 3 chr2:152389955-152389956:- MLTALYNSHMWSQVMSDGM TRUE mutated cds
#> 4 chr2:152388410-152392205:- NRHFKYATQLMNEIKYRKNYEK… TRUE mutated cds
#> 5 chr2:152388410-152390729:- <NA> TRUE truncated cds
#> 6 chr2:179415981-179416357:- SDPSKFTLAVSPVAGTPDYIDV… TRUE mutated cds
Please report issues here: https://github.com/TRON-Bioinformatics/splice2neo/issues
Lang, Franziska, Patrick Sorn, Martin Suchan, Alina Henrich, Christian Albrecht, Nina Koehl, Aline Beicht, et al. 2023. “Prediction of Tumor-Specific Splicing from Somatic Mutations as a Source of Neoantigen Candidates.” bioRxiv. https://doi.org/10.1101/2023.06.27.546494.