# Analysis Notebook - Count Genes and Events

This notebook processes the raw counts as provided by rMATS and performs some descriptive statistical analysis. It is used to produce the following outputs. 

## Data files created by this notebook
Output text files are written to the ``data/`` directory (at the same level as the ``jupyter`` directory). 

1. **gene_AS.tsv**: Alternative splicing events per gene
2. **genesWithCommonAS.tsv
3. **Total_AS_by_chr.tsv**: Total alternative splicing events per chromosome
4. **Total_AS_by_geneSymbol.tsv**: Count the number of tissues in which specific genes show significant alternative splicing
5. **Total_AS_by_tissue.tsv**: Count the number of significant splicing events per tissue
6. **Total_AS_by_splicingtype.tsv**: Count number of significant splicing events for each of the 5' alternative splicing categories
7. **Significant_AS_events.tsv**: ?? Counts of significant events per slicing type per tissue
8. **SplicingIndex_chr.tsv**: Splicing index by chr (number of sigificant AS events per 1000 exons)

In [1]:
defaultW <- getOption("warn")  # suppress warnings for this cell
options(warn = -1) 
library(dplyr)
library(ggplot2)
library(limma)
library(multtest)
library(Biobase)
library(edgeR)
library(tibble)
library(R.utils)
library(rtracklayer)

options(warn = defaultW)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: BiocGenerics

Loading required package: parallel


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB


The following object is masked from ‘package:limma’:

    plotMA


The following objects are masked from ‘package:dplyr’:

    combine, intersect, setdiff, union


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
   

## 1. Download all the rMATS results

Each of the alternative splicing output files are downloaded here:

### 1.1 get released rMATS GTF annotations

For each splicing type, the junctions are defined, so we have 5 specific annotated splicing specific junction ID annotation files:

1. **fromGTF.A3SS.txt**: annotations for the alternative 3' splice site junctions
2. **fromGTF.A5SS.txt**: annotations for the alternative 5' splice site junctions
3. **fromGTF.MXE.txt**: annotations for the mutually exclusive exon junctions
4. **fromGTF.RI.txt**: annotations for the retained introns junctions
5. **fromGTF.SE.txt**: annotations for the skipped exon junctions

## 1.2 Unpack the data.tar file if necessary
To run this script, we need to import three compressed files and unpack them.

| file | sha256 | filename  |
|------|------  |-----------|
|  1   | b0c4bb23b96d77aba7e731fa2a15dc74a34daf490312478aca94443f9a6d4e90 | results/data_as_dge.tar.gz |
|  2   | a0c2c5a7d7cfa0a89c8a39e2f7a4c6c3ac8c6a860f721077c087614505d869cf | rmats_and_annotation.tar.gz |


In [2]:
data_as_dge_file_dir <- list.files("../../mounted-data", pattern='data_as_dge.tar.gz')
data_as_dge_file_dir
rmats_and_annotation_dir <- list.files("../../mounted-data", pattern='rmats_and_annotation.tar.gz')
rmats_and_annotation_dir

In [3]:
data_as_dge_file_dir <- list.files("../../mounted-data", pattern='data_as_dge.tar.gz')
data_as_dge_file <- paste("../../mounted-data", data_as_dge_file_dir, 'robinson-bucket/notebooks/data_as_dge', sep='/')
data_as_dge_file_tar_gz <- paste(data_as_dge_file, '.tar.gz', sep='')
message("In order to unpack the necessary files, execute the following commands on the shell.")
message("data_as_dge.tar.gz")
mycommand = paste("tar xvfz ",data_as_dge_file_tar_gz, "-C ../data", sep=" ")
message(mycommand)
message("checking sha256sum")
mycommand = paste("sha256sum", data_as_dge_file_tar_gz, sep = " ")
message(mycommand)
rmats_annot_file_dir <- list.files("../../mounted-data", pattern='rmats_and_annotation.tar.gz')
rmats_annot_file <- paste("../../mounted-data", data_as_dge_file_dir, 'robinson-bucket/notebooks/rmats_and_annotation', sep='/')
rmats_annot_file_tar_gz <- paste(rmats_annot_file, '.tar.gz', sep='')
message("In order to unpack the necessary files, execute the following commands on the shell.")
message("rmats_and_annotation.tar.gz")
mycommand = paste("tar xvfz ",rmats_annot_file_tar_gz, "-C ../data", sep=" ")
message(mycommand)
message("checking sha256sum")
mycommand = paste("sha256sum", rmats_annot_file_tar_gz, sep = " ")
message(mycommand)

In order to unpack the necessary files, execute the following commands on the shell.

data_as_dge.tar.gz

tar xvfz  ../../mounted-data/5eeba2b5143fa00113f95642-data_as_dge.tar.gz-5eeba2b5143fa00113f95642/robinson-bucket/notebooks/data_as_dge.tar.gz -C ../data

checking sha256sum

sha256sum ../../mounted-data/5eeba2b5143fa00113f95642-data_as_dge.tar.gz-5eeba2b5143fa00113f95642/robinson-bucket/notebooks/data_as_dge.tar.gz

In order to unpack the necessary files, execute the following commands on the shell.

rmats_and_annotation.tar.gz

tar xvfz  ../../mounted-data/5eeba2b5143fa00113f95642-data_as_dge.tar.gz-5eeba2b5143fa00113f95642/robinson-bucket/notebooks/rmats_and_annotation.tar.gz -C ../data

checking sha256sum

sha256sum ../../mounted-data/5eeba2b5143fa00113f95642-data_as_dge.tar.gz-5eeba2b5143fa00113f95642/robinson-bucket/notebooks/rmats_and_annotation.tar.gz



In [4]:
## get the rmats 3.2.5 discovered/annotated junction information in GTF format
message("Decompressing fromGTF.tar.gz into ../data")
system("mkdir -p ../data && tar xvfz ../data/fromGTF.tar.gz -C ../data", intern = TRUE)
system("gunzip ../data/fromGTF.*txt.gz", intern = TRUE)
message("Done!\n")

Decompressing fromGTF.tar.gz into ../data



Done!




### 2  Refined results
We define **refined results* as (FC > 1.5 and pVal < 0.05) for the sex\*as_event coefficient result for the linear model

### 2.1 getTissueReduction

In [5]:
tissue_reduction_filename <- "../assets/tissues.tsv"
tissue_reduction <- read.table(tissue_reduction_filename, header=TRUE, sep="\t",
                               skipNul=FALSE, stringsAsFactors = FALSE)
colnames(tissue_reduction)  <- c("SMTSD","female","male","include","display_name")
tissue_reduction <- tissue_reduction[tissue_reduction$display_name != "n/a",]
tissue_reduction$display_name <- factor(tissue_reduction$display_name)
levels(tissue_reduction$display_name)
message("We extracted ", length(levels(tissue_reduction$display_name))," different tissues with at least 50 samples in both M & f")

We extracted 39 different tissues with at least 50 samples in both M & f



### 2.2 Read in refined results and annotations

In [6]:
significant_results_dir = "../data/"
pattern = "model_B_sex_as_events_refined.csv"
files <- list.files(path = significant_results_dir, pattern = pattern)
as_types <- c("a3ss", "a5ss", "mxe", "ri", "se")
length(files)

In [7]:
a3ss_annot <- read.table(file = "../data/fromGTF.A3SS.txt", sep = "\t", quote = "\"", header = T, stringsAsFactors = F)
a5ss_annot <- read.table(file = "../data/fromGTF.A5SS.txt", sep = "\t", quote = "\"", header = T, stringsAsFactors = F)
mxe_annot <- read.table(file = "../data/fromGTF.MXE.txt", sep = "\t", quote = "\"", header = T, stringsAsFactors = F)
ri_annot <- read.table(file = "../data/fromGTF.RI.txt", sep = "\t", quote = "\"", header = T, stringsAsFactors = F)
se_annot <- read.table(file = "../data/fromGTF.SE.txt", sep = "\t", quote = "\"", header = T, stringsAsFactors = F)

In [8]:
head(se_annot)

Unnamed: 0_level_0,ID,GeneID,geneSymbol,chr,strand,exonStart_0base,exonEnd,upstreamES,upstreamEE,downstreamES,downstreamEE
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<chr>,<int>,<int>,<int>,<int>,<int>,<int>
1,1,ENSG00000034152.18,MAP2K3,chr17,+,21287990,21288091,21284709,21284969,21295674,21295769
2,2,ENSG00000034152.18,MAP2K3,chr17,+,21303182,21303234,21302142,21302259,21304425,21304553
3,3,ENSG00000034152.18,MAP2K3,chr17,+,21295674,21295769,21287990,21288091,21296085,21296143
4,4,ENSG00000034152.18,MAP2K3,chr17,+,21295674,21295769,21287990,21288091,21298412,21298479
5,5,ENSG00000034152.18,MAP2K3,chr17,+,21295674,21295769,21284710,21284969,21296085,21296143
6,6,ENSG00000034152.18,MAP2K3,chr17,+,21295674,21295769,21284710,21284969,21298412,21298479


In [31]:
gene_as = data.frame()
counts <- rep(NA, length(files))
length(files)

In [33]:
for (i in 1:length(files)) {
    lines  <- read.table(file=paste0(significant_results_dir, files[i]), 
                                     header = TRUE, sep = ",", quote = "\"'", skipNul = FALSE)
#    message(paste(dim(lines)[1] >0),collapse = "")
    if (dim(lines)[1] > 0) {
        event     <- as.vector(as.character(rownames(lines)))
        tissue1   <- gsub("_AS_model_B_sex_as_events_refined.csv","", files[i], fixed = TRUE)
        counts[i] <- dim(lines)[1]
        event_idx <- substring(event, regexpr("[0-9]+$", event))
        res       <- data.frame()
        if (grepl("^a3ss_", files[i])) {
            # remove the first 5 letters of the string 
            tissue2 <- substring(tissue1,6)
            idx <- match(event_idx, a3ss_annot$ID)
            res <- data.frame(GeneJunction <- event,
                              ASE          <- "A3SS", 
                              ASE_IDX      <- idx,
                              Tissue       <- tissue2,
                              counts       <- counts[i],
                              Display      <- tissue_reduction[tissue_reduction$SMTSD == tissue2, "display_name"],
                              GeneSymbol   <- a3ss_annot$geneSymbol[idx],
                              GeneID       <- a3ss_annot$GeneID[idx],
                              chr          <- a3ss_annot$chr[idx],
                              logFC        <- lines$logFC,
                              AveExpr      <- lines$AveExpr,
                              t            <- lines$t,
                              PValue       <- lines$P.Value,
                              AdjPVal      <- lines$adj.P.Val,
                              B            <- lines$B)
            colnames(res) <- c("GeneJunction","ASE","ASE_IDX","Tissue","counts","Display",
                               "GeneSymbol","GeneID","chr","logFC","AveExpr","t","PValue","AdjPVal","B")
            gene_as <- rbind(gene_as,res)
            
        } else if (grepl("^a5ss_", files[i])) {
            # remove the first 5 letters of the string 
            tissue2 <- substring(tissue1,6)
            idx <- match(event_idx, a5ss_annot$ID)
            res <- data.frame(GeneJunction <- event,
                              ASE          <- "A5SS", 
                              ASE_IDX      <- idx,
                              Tissue       <- tissue2,
                              counts       <- counts[i],
                              Display      <- tissue_reduction[tissue_reduction$SMTSD == tissue2, "display_name"],
                              GeneSymbol   <- a5ss_annot$geneSymbol[idx],
                              GeneID       <- a5ss_annot$GeneID[idx],
                              chr          <- a5ss_annot$chr[idx],
                              logFC        <- lines$logFC,
                              AveExpr      <- lines$AveExpr,
                              t            <- lines$t,
                              PValue       <- lines$P.Value,
                              AdjPVal      <- lines$adj.P.Val,
                              B            <- lines$B)
            colnames(res) <- c("GeneJunction","ASE","ASE_IDX","Tissue","counts","Display",
                               "GeneSymbol","GeneID","chr","logFC","AveExpr","t","PValue","AdjPVal","B")
            gene_as <- rbind(gene_as,res)
        } else if (grepl("^mxe_", files[i])) {
            # remove the first 4 letters of the string 
            tissue2 <- substring(tissue1,5)
            idx <- match(event_idx, a3ss_annot$ID)
            res <- data.frame(GeneJunction <- event,
                              ASE          <- "MXE", 
                              ASE_IDX      <- idx,
                              Tissue       <- tissue2,
                              counts       <- counts[i],
                              Display      <- tissue_reduction[tissue_reduction$SMTSD == tissue2, "display_name"],
                              GeneSymbol   <- mxe_annot$geneSymbol[idx],
                              GeneID       <- mxe_annot$GeneID[idx],
                              chr          <- mxe_annot$chr[idx],
                              logFC        <- lines$logFC,
                              AveExpr      <- lines$AveExpr,
                              t            <- lines$t,
                              PValue       <- lines$P.Value,
                              AdjPVal      <- lines$adj.P.Val,
                              B            <- lines$B)
            colnames(res) <- c("GeneJunction","ASE","ASE_IDX","Tissue","counts","Display",
                               "GeneSymbol","GeneID","chr","logFC","AveExpr","t","PValue","AdjPVal","B")
            gene_as <- rbind(gene_as,res)
        } else if (grepl("^se_", files[i])) {
            # remove the first 3 letters of the string 
            tissue2 <- substring(tissue1,4)
            idx <- match(event_idx, se_annot$ID)
            res <- data.frame(GeneJunction <- event,
                              ASE          <- "SE", 
                              ASE_IDX      <- idx,
                              Tissue       <- tissue2,
                              counts       <- counts[i],
                              Display      <- tissue_reduction[tissue_reduction$SMTSD == tissue2, "display_name"],
                              GeneSymbol   <- se_annot$geneSymbol[idx],
                              GeneID       <- se_annot$GeneID[idx],
                              chr          <- se_annot$chr[idx],
                              logFC        <- lines$logFC,
                              AveExpr      <- lines$AveExpr,
                              t            <- lines$t,
                              PValue       <- lines$P.Value,
                              AdjPVal      <- lines$adj.P.Val,
                              B            <- lines$B)
            colnames(res) <- c("GeneJunction","ASE","ASE_IDX","Tissue","counts","Display",
                               "GeneSymbol","GeneID","chr","logFC","AveExpr","t","PValue","AdjPVal","B")
            gene_as <- rbind(gene_as,res)
        } else if (grepl("^ri_", files[i])){
            # remove the first 3 letters of the string 
            tissue2 <- substring(tissue1,4)
            idx <- match(event_idx, ri_annot$ID)
            res <- data.frame(GeneJunction <- event,
                              ASE          <- "RI", 
                              ASE_IDX      <- idx,
                              Tissue       <- tissue2,
                              counts       <- counts[i],
                              Display      <- tissue_reduction[tissue_reduction$SMTSD == tissue2, "display_name"],
                              GeneSymbol   <- ri_annot$geneSymbol[idx],
                              GeneID       <- ri_annot$GeneID[idx],
                              chr          <- ri_annot$chr[idx],
                              logFC        <- lines$logFC,
                              AveExpr      <- lines$AveExpr,
                              t            <- lines$t,
                              PValue       <- lines$P.Value,
                              AdjPVal      <- lines$adj.P.Val,
                              B            <- lines$B)
            colnames(res) <- c("GeneJunction","ASE","ASE_IDX","Tissue","counts","Display",
                               "GeneSymbol","GeneID","chr","logFC","AveExpr","t","PValue","AdjPVal","B")
            gene_as <- rbind(gene_as,res)
        }
        
    } #if has sig. events
    
} #for all files
colnames(gene_as) <- c("GeneJunction","ASE","ASE_IDX","Tissue","counts","Display","GeneSymbol","GeneID","chr","logFC","AveExpr","t","PValue","AdjPVal","B")
n_unique_genes <- length(summary(as.factor(gene_as$GeneSymbol),maxsum=50000))
message("We extracted a total of ",nrow(gene_as)," significant alternative splicing events (gene_as)")
message("This includes ", n_unique_genes, " total genes")

We extracted a total of 12740 significant alternative splicing events (gene_as)

This includes 2887 total genes



In [34]:
head(gene_as)

Unnamed: 0_level_0,GeneJunction,ASE,ASE_IDX,Tissue,counts,Display,GeneSymbol,GeneID,chr,logFC,AveExpr,t,PValue,AdjPVal,B
Unnamed: 0_level_1,<fct>,<fct>,<int>,<fct>,<int>,<fct>,<fct>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,XIST-2253,A3SS,2253,adipose_subcutaneous,4,Adipose (sc),XIST,ENSG00000229807.11,chrX,-4.4086049,3.196317,-36.48897,4.635568e-154,3.893877e-150,310.016049
2,XIST-2252,A3SS,2252,adipose_subcutaneous,4,Adipose (sc),XIST,ENSG00000229807.11,chrX,-2.4147126,3.64769,-21.921057,1.444102e-78,6.065229000000001e-75,160.028167
3,GREB1L-4933,A3SS,4933,adipose_subcutaneous,4,Adipose (sc),GREB1L,ENSG00000141449.14,chr18,1.2793173,2.115005,7.123138,3.052112e-12,8.545914e-09,16.692429
4,RHCG-1776,A3SS,1776,adipose_subcutaneous,4,Adipose (sc),RHCG,ENSG00000140519.14,chr15,-0.6930009,1.636472,-3.922124,9.797866e-05,0.03919146,1.142232
5,XIST-2253,A3SS,2253,adipose_visceral_omentum,12,Adipose (v),XIST,ENSG00000229807.11,chrX,-4.4403352,3.113532,-33.9508,2.654474e-123,2.209585e-119,241.826117
6,XIST-2252,A3SS,2252,adipose_visceral_omentum,12,Adipose (v),XIST,ENSG00000229807.11,chrX,-2.4506832,3.650617,-18.890779,2.817671e-58,1.172715e-54,114.731682


### 3 Data Structures for Figures

### 3.1 gene_as.tsv

This file contains (description)
Here is a typical line
<pre>
A data.frame: 6 × 15
GeneJunction	ASE	ASE_IDX	Tissue	counts	Display	GeneSymbol	GeneID	chr	logFC	AveExpr	t	PValue	AdjPVal	B
<fct>	<fct>	<int>	<fct>	<int>	<fct>	<fct>	<fct>	<fct>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
1	XIST-2253	A3SS	2253	adipose_subcutaneous	4	Adipose (sc)	XIST	ENSG00000229807.11	chrX	-4.4086049	3.196317	-36.488970	4.635568e-154	3.893877e-150	310.016049
2	XIST-2252	A3SS	2252	adipose_subcutaneous	4	Adipose (sc)	XIST	ENSG00000229807.11	chrX	-2.4147126	3.647690	-21.921057	1.444102e-78	6.065229e-75	160.028167
3	GREB1L-4933	A3SS	4933	adipose_subcutaneous	4	Adipose (sc)	GREB1L	ENSG00000141449.14	chr18	1.2793173	2.115005	7.123138	3.052112e-12	8.545914e-09	16.692429
4	RHCG-1776	A3SS	1776	adipose_subcutaneous	4	Adipose (sc)	RHCG	ENSG00000140519.14	chr15	-0.6930009	1.636472	-3.922124	9.797866e-05	3.919146e-02	1.142232
5	XIST-2253	A3SS	2253	adipose_visceral_omentum	12	Adipose (v)	XIST	ENSG00000229807.11	chrX	-4.4403352	3.113532	-33.950800	2.654474e-123	2.209585e-119	241.826117
6	XIST-2252	A3SS	2252	adipose_visceral_omentum	12	Adipose (v)	XIST	ENSG00000229807.11	chrX	-2.4506832	3.650617	-18.890779	2.817671e-58	1.172715e-54	114.731682
</pre>
There are 2887 significant events in the file.

In [35]:
glimpse(gene_as)
gene_as$Tissue <- factor(gene_as$Tissue)
length(levels(gene_as$Tissue))
table(is.na(gene_as$Display))
table(gene_as$Display)
colnames(gene_as)
write.table(gene_as, "../data/gene_as.tsv", quote=FALSE, sep="\t")
head(gene_as)
tissue_reduction$display_name <- factor(tissue_reduction$display_name)

Observations: 12,740
Variables: 15
$ GeneJunction [3m[90m<fct>[39m[23m XIST-2253, XIST-2252, GREB1L-4933, RHCG-1776, XIST-2253,…
$ ASE          [3m[90m<fct>[39m[23m A3SS, A3SS, A3SS, A3SS, A3SS, A3SS, A3SS, A3SS, A3SS, A3…
$ ASE_IDX      [3m[90m<int>[39m[23m 2253, 2252, 4933, 1776, 2253, 2252, 4819, 4818, 4820, 45…
$ Tissue       [3m[90m<fct>[39m[23m adipose_subcutaneous, adipose_subcutaneous, adipose_subc…
$ counts       [3m[90m<int>[39m[23m 4, 4, 4, 4, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, …
$ Display      [3m[90m<fct>[39m[23m Adipose (sc), Adipose (sc), Adipose (sc), Adipose (sc), …
$ GeneSymbol   [3m[90m<fct>[39m[23m XIST, XIST, GREB1L, RHCG, XIST, XIST, WNT2, WNT2, WNT2, …
$ GeneID       [3m[90m<fct>[39m[23m ENSG00000229807.11, ENSG00000229807.11, ENSG00000141449.…
$ chr          [3m[90m<fct>[39m[23m chrX, chrX, chr18, chr15, chrX, chrX, chr7, chr7, chr7, …
$ logFC        [3m[90m<dbl>[39m[23m -4.4086049, -2.4147126, 1.2793173, -0.69300


FALSE 
12740 


         Adipose (sc)           Adipose (v)         Adrenal gland 
                  156                   116                    92 
                Aorta      Atrial appendage                Breast 
                  278                    54                  8722 
              Caudate Cerebellar hemisphere            Cerebellum 
                   32                    24                    70 
      Coronary artery                Cortex       EBV-lymphocytes 
                   42                    52                    46 
      Esophagus (gej)         Esophagus (m)        Esophagus (mu) 
                   58                    66                   522 
          Fibroblasts        Frontal cortex           Hippocampus 
                  194                    22                    72 
         Hypothalamus        Left ventricle                 Liver 
                   26                    48                    74 
                 Lung     Nucleus accumbens              Panc

Unnamed: 0_level_0,GeneJunction,ASE,ASE_IDX,Tissue,counts,Display,GeneSymbol,GeneID,chr,logFC,AveExpr,t,PValue,AdjPVal,B
Unnamed: 0_level_1,<fct>,<fct>,<int>,<fct>,<int>,<fct>,<fct>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,XIST-2253,A3SS,2253,adipose_subcutaneous,4,Adipose (sc),XIST,ENSG00000229807.11,chrX,-4.4086049,3.196317,-36.48897,4.635568e-154,3.893877e-150,310.016049
2,XIST-2252,A3SS,2252,adipose_subcutaneous,4,Adipose (sc),XIST,ENSG00000229807.11,chrX,-2.4147126,3.64769,-21.921057,1.444102e-78,6.065229000000001e-75,160.028167
3,GREB1L-4933,A3SS,4933,adipose_subcutaneous,4,Adipose (sc),GREB1L,ENSG00000141449.14,chr18,1.2793173,2.115005,7.123138,3.052112e-12,8.545914e-09,16.692429
4,RHCG-1776,A3SS,1776,adipose_subcutaneous,4,Adipose (sc),RHCG,ENSG00000140519.14,chr15,-0.6930009,1.636472,-3.922124,9.797866e-05,0.03919146,1.142232
5,XIST-2253,A3SS,2253,adipose_visceral_omentum,12,Adipose (v),XIST,ENSG00000229807.11,chrX,-4.4403352,3.113532,-33.9508,2.654474e-123,2.209585e-119,241.826117
6,XIST-2252,A3SS,2252,adipose_visceral_omentum,12,Adipose (v),XIST,ENSG00000229807.11,chrX,-2.4506832,3.650617,-18.890779,2.817671e-58,1.172715e-54,114.731682


In [36]:
head(gene_as[gene_as$chr=="chrX",])
x_as_events <- gene_as[gene_as$chr=="chrX",]
message("There were ",nrow(gene_as)," total significant alternative splicing events (gene_as)")
message("There were ",nrow(x_as_events)," total significant alternative splicing events on the X chromosome (gene_as)")
message("i.e., ", (100*nrow(x_as_events)/nrow(gene_as)), "% of all significant AS events were on the X chromosome")

Unnamed: 0_level_0,GeneJunction,ASE,ASE_IDX,Tissue,counts,Display,GeneSymbol,GeneID,chr,logFC,AveExpr,t,PValue,AdjPVal,B
Unnamed: 0_level_1,<fct>,<fct>,<int>,<fct>,<int>,<fct>,<fct>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,XIST-2253,A3SS,2253,adipose_subcutaneous,4,Adipose (sc),XIST,ENSG00000229807.11,chrX,-4.408605,3.196317,-36.48897,4.635568e-154,3.893877e-150,310.016
2,XIST-2252,A3SS,2252,adipose_subcutaneous,4,Adipose (sc),XIST,ENSG00000229807.11,chrX,-2.414713,3.64769,-21.92106,1.444102e-78,6.065229000000001e-75,160.0282
5,XIST-2253,A3SS,2253,adipose_visceral_omentum,12,Adipose (v),XIST,ENSG00000229807.11,chrX,-4.440335,3.113532,-33.9508,2.654474e-123,2.209585e-119,241.8261
6,XIST-2252,A3SS,2252,adipose_visceral_omentum,12,Adipose (v),XIST,ENSG00000229807.11,chrX,-2.450683,3.650617,-18.89078,2.817671e-58,1.172715e-54,114.7317
17,XIST-2253,A3SS,2253,adrenal_gland,6,Adrenal gland,XIST,ENSG00000229807.11,chrX,-4.59924,3.465058,-30.97273,2.477109e-94,2.0314769999999998e-90,174.6587
18,XIST-2252,A3SS,2252,adrenal_gland,6,Adrenal gland,XIST,ENSG00000229807.11,chrX,-2.668392,3.991548,-21.0579,1.509534e-60,6.189843999999999e-57,115.7385


There were 12740 total significant alternative splicing events (gene_as)

There were 1566 total significant alternative splicing events on the X chromosome (gene_as)

i.e., 12.2919937205651% of all significant AS events were on the X chromosome



### 3.2 Tissue specific data frame

In [37]:
data <- data.frame(Tissue=gene_as$Display, ASE=gene_as$ASE, Counts=gene_as$counts)

numberOfUniqueTissues <- length(summary(as.factor(data$Tissue),maxsum=500))
numberOfASEmechanisms <- length(summary(as.factor(data$ASE),maxsum=500))

message("data now has ",numberOfUniqueTissues, " tissues and ", numberOfASEmechanisms, " ASE categories")
message("ASE:")
summary(as.factor(data$ASE),maxsum=500)

data now has 39 tissues and 5 ASE categories

ASE:



### 3.3 Count splicing event by chromosome

Count the number of significant alternative splicing events per chromosome and write to the file **Total_AS_by_chr.tsv**.

In [38]:
res2 <- gene_as          %>% 
       group_by(chr)    %>% 
       count(chr)       %>% 
       arrange(desc(n)) %>% 
       as.data.frame()
res2$chr <- factor(res2$chr, levels = res2$chr)
length(res2$chr)
res2
glimpse(res2)
write.table(res2, file= "../data/Total_AS_by_chr.tsv", sep="\t", quote = FALSE, row.names=F)

chr,n
<fct>,<int>
chrX,1566
chr1,1272
chr19,884
chr11,748
chr2,748
chr3,736
chr17,726
chr12,668
chr16,626
chr4,584


Observations: 23
Variables: 2
$ chr [3m[90m<fct>[39m[23m chrX, chr1, chr19, chr11, chr2, chr3, chr17, chr12, chr16, chr4, …
$ n   [3m[90m<int>[39m[23m 1566, 1272, 884, 748, 748, 736, 726, 668, 626, 584, 496, 486, 450…


### 3.4 Count most frequent spliced genes 

In [39]:
res3 <- gene_as %>% 
       group_by(GeneSymbol) %>% count(GeneSymbol) %>% arrange(desc(n)) %>% as.data.frame()
res3$GeneSymbol <- factor(res3$GeneSymbol, levels = res3$GeneSymbol)
length(res3$GeneSymbol)
head(res3)
write.table(res3, file = "../data/Total_AS_by_geneSymbol.tsv", sep = "\t", quote=FALSE, row.names = F)

Unnamed: 0_level_0,GeneSymbol,n
Unnamed: 0_level_1,<fct>,<int>
1,XIST,684
2,DDX3X,194
3,KDM5C,94
4,ZFX,94
5,KDM6A,54
6,UCA1,50


### 3.5 Count most frequent splicing by tissue

In [40]:
res4 <- gene_as %>% 
       group_by(Display) %>% 
       count(Display) %>% 
       arrange(desc(n)) %>% 
       as.data.frame()
res4$Display <- factor(res4$Display, levels = res4$Display)
length(res4$Display)
res4
write.table(res4, file = "../data/Total_AS_by_tissue.tsv", sep = "\t", row.names = F)

Display,n
<fct>,<int>
Breast,8722
Nucleus accumbens,670
Esophagus (mu),522
Aorta,278
Thyroid,208
Fibroblasts,194
Skeletal muscle,160
Adipose (sc),156
Pituitary,130
Skin (exposed),128


###  3.6 Significant Count by splicing type 
We define **significant** to be FC > 1.5 and pVal < 0.05

Our starting values were the significant events, all meeting the criteria FC > 1.5 and pVal < 0.05


In [41]:
res5 <- gene_as %>% group_by(ASE) %>% count(ASE) %>% arrange(desc(n)) %>% as.data.frame()
res5$ASE <- factor(res5$ASE, levels = res5$ASE)
head(res5)
write.table(res5, file= "../data/Total_AS_by_splicingtype.tsv")

Unnamed: 0_level_0,ASE,n
Unnamed: 0_level_1,<fct>,<int>
1,SE,9278
2,A3SS,1386
3,A5SS,842
4,RI,836
5,MXE,398


###  3.7 Significant Count by splicing type (significant == FC > 1.5 and pVal < 0.05)

In [42]:
A3SS_keep <- as.character(gene_as$ASE) %in% "A3SS"
table(A3SS_keep)
A3SS.gene_as <- data.frame(gene_as[A3SS_keep == TRUE,])

A5SS_keep <- as.character(gene_as$ASE) %in% "A5SS"
table(A5SS_keep)
A5SS.gene_as <- data.frame(gene_as[A5SS_keep == TRUE,])

MXE_keep  <- as.character(gene_as$ASE) %in% "MXE"
table(MXE_keep)
MXE.gene_as <- data.frame(gene_as[MXE_keep == TRUE,])

SE_keep   <- as.character(gene_as$ASE) %in% "SE"
table(SE_keep)
SE.gene_as <- data.frame(gene_as[SE_keep == TRUE,])

RI_keep   <- as.character(gene_as$ASE) %in% "RI"
table(RI_keep)
RI.gene_as <- data.frame(gene_as[RI_keep == TRUE,])

dim(A3SS.gene_as)
dim(A5SS.gene_as)
dim(MXE.gene_as)
dim(SE.gene_as)
dim(RI.gene_as)


A3SS_keep
FALSE  TRUE 
11354  1386 

A5SS_keep
FALSE  TRUE 
11898   842 

MXE_keep
FALSE  TRUE 
12342   398 

SE_keep
FALSE  TRUE 
 3462  9278 

RI_keep
FALSE  TRUE 
11904   836 

### 3.8 Siginficant spliced by Gene for each splicing factor

In [43]:
A3SS.res <- A3SS.gene_as %>% group_by(GeneSymbol) %>% count(GeneSymbol) %>% arrange(desc(n)) %>% as.data.frame()
A3SS.res$GeneSymbol <- factor(A3SS.res$GeneSymbol, levels = A3SS.res$GeneSymbol)
message("Significant spliced genes for A3SS\n",
        paste(length(A3SS.res$GeneSymbol)), collapse=" ")
head(A3SS.res)

A5SS.res <- A5SS.gene_as %>% group_by(GeneSymbol) %>% count(GeneSymbol) %>% arrange(desc(n)) %>% as.data.frame()
A5SS.res$GeneSymbol <- factor(A5SS.res$GeneSymbol, levels = A5SS.res$GeneSymbol)
message("Significant spliced genes for A5SS\n",
        paste(length(A5SS.res$GeneSymbol)), collapse=" ")
head(A5SS.res)

MXE.res <- MXE.gene_as %>% group_by(GeneSymbol) %>% count(GeneSymbol) %>% arrange(desc(n)) %>% as.data.frame()
MXE.res$GeneSymbol <- factor(MXE.res$GeneSymbol, levels = MXE.res$GeneSymbol)
message("Significant spliced genes for MXE\n",
        paste(length(MXE.res$GeneSymbol)), collapse=" ")
head(MXE.res)

RI.res <- RI.gene_as %>% group_by(GeneSymbol) %>% count(GeneSymbol) %>% arrange(desc(n)) %>% as.data.frame()
RI.res$GeneSymbol <- factor(RI.res$GeneSymbol, levels = RI.res$GeneSymbol)
message("Significant spliced genes for RI\n",
        paste(length(RI.res$GeneSymbol)), collapse=" ")
head(RI.res)

SE.res <- SE.gene_as %>% group_by(GeneSymbol) %>% count(GeneSymbol) %>% arrange(desc(n)) %>% as.data.frame()
SE.res$GeneSymbol <- factor(SE.res$GeneSymbol, levels = SE.res$GeneSymbol)
message("Significant spliced genes for SE\n",
        paste(length(SE.res$GeneSymbol)), collapse=" ")
head(SE.res)

Significant spliced genes for A3SS
455 



Unnamed: 0_level_0,GeneSymbol,n
Unnamed: 0_level_1,<fct>,<int>
1,XIST,154
2,DDX3X,42
3,UCA1,22
4,HAND2-AS1,14
5,STRA6,12
6,NDRG4,12


Significant spliced genes for A5SS
329 



Unnamed: 0_level_0,GeneSymbol,n
Unnamed: 0_level_1,<fct>,<int>
1,DDX3X,46
2,PUDP,26
3,MYB,12
4,LINC01198,8
5,FRMD5,8
6,WDR31,8


Significant spliced genes for MXE
125 



Unnamed: 0_level_0,GeneSymbol,n
Unnamed: 0_level_1,<fct>,<int>
1,XIST,44
2,DDX3X,14
3,SORBS2,10
4,AMT,8
5,ACSL6,8
6,CACNA1D,6


Significant spliced genes for RI
327 



Unnamed: 0_level_0,GeneSymbol,n
Unnamed: 0_level_1,<fct>,<int>
1,DDX3X,28
2,NLRP2,8
3,UCA1,8
4,PLCXD1,8
5,CELSR2,8
6,MYH14,8


Significant spliced genes for SE
2361 



Unnamed: 0_level_0,GeneSymbol,n
Unnamed: 0_level_1,<fct>,<int>
1,XIST,486
2,KDM5C,94
3,ZFX,94
4,DDX3X,64
5,KDM6A,48
6,CES1,36


### 3.9 Count most frequent spliced genes

In [44]:
res <- gene_as %>% group_by(GeneSymbol) %>% count(GeneSymbol) %>% arrange(desc(n)) %>% as.data.frame()
res$GeneSymbol <- factor(res$GeneSymbol, levels = res$GeneSymbol)
length(res$GeneSymbol)
res2 <- data %>% group_by(Tissue) %>% 
    summarise(Total = sum(Counts)) %>%
    arrange(desc(Total)) %>%
    as.data.frame()

#Add number of tissues
nTissues <- rep(NA, length(res))
for (i in 1:nrow(res)) {
  df_gene <- gene_as %>% filter(GeneSymbol == res$GeneSymbol[i])
  nTissues[i] <- length(unique(df_gene$Tissue))
}
res$Tissues <- nTissues
head(res)
write.table(res, file = "../data/genesWithCommonAS.tsv", sep = "\t", quote = F, row.names = F)

Unnamed: 0_level_0,GeneSymbol,n,Tissues
Unnamed: 0_level_1,<fct>,<int>,<int>
1,XIST,684,39
2,DDX3X,194,20
3,KDM5C,94,33
4,ZFX,94,27
5,KDM6A,54,24
6,UCA1,50,1


### 3.10 Count most frequent spliced chromosomes
To get an indication of which chromosome has the most frequent slicing event (regardless of type)
We create an index based upon the number of exons per chromosome.

get the annotation file, at this writing, gencode.v30.annotation.gtf
The information as to the number of exons within the chromosome may be found there

In [45]:
if (!("gencode.v30.annotation.gtf.gz" %in% list.files("../data/"))) {
    message("downloading gencode v30 annotation\n")
    system("wget -O ../data/gencode.v30.annotation.gtf.gz ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/gencode.v30.annotation.gtf.gz")
    message("Done!\n")
    message("Unzipping compressed file gencode.v30.annotation.gtf.gz..")
    system("gunzip ../data/gencode.v30.annotation.gtf.gz", intern = TRUE)
    message("Done! gencode.v30.annotation.gtf can be found in ../data/")
}
gencode <- import("../data/gencode.v30.annotation.gtf")

downloading gencode v30 annotation


Done!


Unzipping compressed file gencode.v30.annotation.gtf.gz..

Done! gencode.v30.annotation.gtf can be found in ../data/



In [46]:
exons <- gencode[ gencode$type == "exon", ]
exons <- as.data.frame(exons)

#Obtain chromosomes we have splicing information for (recall we did not use chr Y in our analysis)
all_chr <- as.character(unique(gene_as$chr))
chr_counts <- rep(0, length(all_chr))


for (i in 1:length(all_chr)) {
  chr_counts[i] <- nrow(exons[exons$seqnames == all_chr[i], ])
}

exon_counts <- data.frame(chr = all_chr, counts = chr_counts)

# Count most frequent spliced chromosomes
res <- gene_as %>% group_by(chr) %>% count(chr) %>% arrange(desc(n)) %>% as.data.frame()
res$chr <- factor(res$chr, levels = res$chr)

idx <- match(res$chr, exon_counts$chr)

res$ExonCounts <- exon_counts$counts[idx]

res$Index <- (res$n / res$ExonCounts) * 1000

res_sorted <- res %>% arrange(desc(Index))
res_sorted$chr <- factor(res_sorted$chr, levels = res_sorted$chr)
glimpse(res_sorted)

Observations: 23
Variables: 4
$ chr        [3m[90m<fct>[39m[23m chrX, chr22, chr19, chr4, chr1, chr16, chr11, chr15, chr17…
$ n          [3m[90m<int>[39m[23m 1566, 342, 884, 584, 1272, 626, 748, 450, 726, 668, 736, 4…
$ ExonCounts [3m[90m<dbl>[39m[23m 40029, 28655, 74466, 50420, 118996, 61199, 75976, 47343, 7…
$ Index      [3m[90m<dbl>[39m[23m 39.121637, 11.935090, 11.871190, 11.582705, 10.689435, 10.…


In [47]:
write.table(data,       file = "../data/Significant_AS_events.tsv", sep = "\t", row.names = F, quote = F)
write.table(res_sorted, file = "../data/SplicingIndex_chr.tsv", sep = "\t", quote = F, row.names = F)

### 3.11 gene_dge.tsv

Single data structure for all the DGE results

The files called (tissue)\_DGE\_refined.csv contain lists of genes found to have statistically significant differential expression.
The mapping files contain the ENSG id to gene symbol maps.

In [48]:
significant_results_dir = "../data/"
pattern = "_DGE_refined.csv"
files <- list.files(path = significant_results_dir, pattern = pattern)
map_pattern <- "_DGE_ensg_map.csv"
map_files <- list.files(path = significant_results_dir, pattern = map_pattern)
message("We got ", length(files), " files with significant DGEs and ", length(map_files), " mapping files")

We got 39 files with significant DGEs and 39 mapping files



In [49]:
gene_dge = data.frame()
counts <- rep(NA, length(files))

In [50]:
for (i in 1:length(files)) {
   lines  <- read.table(file=paste0(significant_results_dir, files[i]), 
                                     header = TRUE, sep = ",", quote = "\"'", skipNul = FALSE)
    if (dim(lines)[1] > 0) {
         tissue1    <- gsub(pattern,"", files[i], fixed = TRUE)
         map_lines  <- read.table(file=paste0(paste0(significant_results_dir, tissue1),map_pattern),
                                     header = TRUE, sep = ",", quote = "\"'", skipNul = FALSE)
         counts[i]  <- dim(lines)[1]    
         tissue1    <- gsub(pattern,"", files[i], fixed = TRUE)
         map_lines  <- read.table(file=paste0(paste0(significant_results_dir, tissue1),map_pattern),
                                     header = TRUE, sep = ",", quote = "\"'", skipNul = FALSE)
         ensg_ver   <- as.vector(as.character(rownames(lines)))
         ensg_no_ver<- as.vector(as.character(map_lines$ensg_names))
         ensg_genes <- as.vector(as.character(map_lines$ensg_genes))
         counts[i]  <- dim(lines)[1]  
         res <- data.frame(Tissue       <- tissue1,
                           ENSG_ver     <- ensg_ver,
                           ENSG_no_ver  <- ensg_no_ver,
                           GeneSymbol   <- ensg_genes,
                           counts       <- counts[i],
                           Display      <- tissue_reduction[tissue_reduction$SMTSD == tissue1, "display_name"],
                           logFC        <- lines$logFC,
                           AveExpr      <- lines$AveExpr,
                           t            <- lines$t,
                           PValue       <- lines$P.Value,
                           AdjPVal      <- lines$adj.P.Val,
                           B            <- lines$B)
         colnames(res) <- c("Tissue","ENSG_ver","ENSG_no_ver","GeneSymbol","counts","Display",
                            "logFC","AveExpr","t","PValue","AdjPVal","B")
         gene_dge <- rbind(gene_dge, res)
    } #if has sig. events
} #for all files
colnames(gene_dge) <- c("Tissue","ENSG_ver","ENSG_no_ver","GeneSymbol","counts","Display",
                        "logFC","AveExpr","t","PValue","AdjPVal","B")
n_unique_genes <- length(summary(as.factor(gene_dge$GeneSymbol),maxsum=50000))
message("We extracted a total of ",nrow(gene_dge)," significant differential gene events (gene_dge)")
message("This includes ", n_unique_genes, " total genes")

We extracted a total of 12633 significant differential gene events (gene_dge)

This includes 7417 total genes



In [51]:
write.table(gene_dge, "../data/gene_dge.tsv", quote=FALSE, sep="\t")

### Appendix - Metadata

For replicability and reproducibility purposes, we also print the following metadata:

1. Checksums of **'artefacts'**, files generated during the analysis and stored in the folder directory **`data`**
2. List of environment metadata, dependencies, versions of libraries using `utils::sessionInfo()` and [`devtools::session_info()`](https://devtools.r-lib.org/reference/session_info.html)

### Appendix 1. Checksums with the sha256 algorithm

In [52]:
rm (notebookid)
notebookid   = "countGenesAndEvents"
notebookid

message("Generating sha256 checksums of the file `../data/Total_AS_by_tissue.tsv` directory .. ")
system(paste0("cd ../data && find . -name SplicingIndex_chr.tsv -exec sha256sum {} \\;  >  ../metadata/", notebookid, "_sha256sums.txt"), intern = TRUE)
message("Done!\n")

message("Generating sha256 checksums of the file `../data/Significant_events.tsv` directory .. ")
system(paste0("cd ../data && find . -name SplicingIndex_chr.tsv -exec sha256sum {} \\;  >  ../metadata/", notebookid, "_sha256sums.txt"), intern = TRUE)
message("Done!\n")

message("Generating sha256 checksums of the file `../data/Significant_events.tsv` directory .. ")
system(paste0("cd ../data && find . -name SplicingIndex_chr.tsv -exec sha256sum {} \\;  >  ../metadata/", notebookid, "_sha256sums.txt"), intern = TRUE)
message("Done!\n")


paste0("../metadata/", notebookid, "_sha256sums.txt")

data.table::fread(paste0("../metadata/", notebookid, "_sha256sums.txt"), header = FALSE, col.names = c("sha256sum", "file"))

“object 'notebookid' not found”


Generating sha256 checksums of the file `../data/Total_AS_by_tissue.tsv` directory .. 



Done!


Generating sha256 checksums of the file `../data/Significant_events.tsv` directory .. 



Done!


Generating sha256 checksums of the file `../data/Significant_events.tsv` directory .. 



Done!




sha256sum,file
<chr>,<chr>
3fc39e8757ee71d9314c4e8d860a2b89d1cd38737eabbbf6e31ab5c45f635930,./SplicingIndex_chr.tsv


### Appendix 2. Libraries metadata

In [53]:
dev_session_info   <- devtools::session_info()
utils_session_info <- utils::sessionInfo()

message("Saving `devtools::session_info()` objects in ../metadata/devtools_session_info.rds  ..")
saveRDS(dev_session_info, file = paste0("../metadata/", notebookid, "_devtools_session_info.rds"))
message("Done!\n")

message("Saving `utils::sessionInfo()` objects in ../metadata/utils_session_info.rds  ..")
saveRDS(utils_session_info, file = paste0("../metadata/", notebookid ,"_utils_info.rds"))
message("Done!\n")

dev_session_info$platform
dev_session_info$packages[dev_session_info$packages$attached==TRUE, ]

Saving `devtools::session_info()` objects in ../metadata/devtools_session_info.rds  ..

Done!


Saving `utils::sessionInfo()` objects in ../metadata/utils_session_info.rds  ..

Done!




 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       Ubuntu 18.04.3 LTS          
 system   x86_64, linux-gnu           
 ui       X11                         
 language en_US.UTF-8                 
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Etc/UTC                     
 date     2020-06-18                  

Unnamed: 0_level_0,package,ondiskversion,loadedversion,path,loadedpath,attached,is_base,date,source,md5ok,library
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<lgl>,<lgl>,<chr>,<chr>,<lgl>,<fct>
Biobase,Biobase,2.46.0,2.46.0,/opt/conda/lib/R/library/Biobase,/opt/conda/lib/R/library/Biobase,True,False,2019-10-29,Bioconductor,,/opt/conda/lib/R/library
BiocGenerics,BiocGenerics,0.32.0,0.32.0,/opt/conda/lib/R/library/BiocGenerics,/opt/conda/lib/R/library/BiocGenerics,True,False,2019-10-29,Bioconductor,,/opt/conda/lib/R/library
dplyr,dplyr,0.8.4,0.8.4,/opt/conda/lib/R/library/dplyr,/opt/conda/lib/R/library/dplyr,True,False,2020-01-31,CRAN (R 3.6.2),,/opt/conda/lib/R/library
edgeR,edgeR,3.28.0,3.28.0,/opt/conda/lib/R/library/edgeR,/opt/conda/lib/R/library/edgeR,True,False,2019-10-29,Bioconductor,,/opt/conda/lib/R/library
GenomeInfoDb,GenomeInfoDb,1.22.0,1.22.0,/opt/conda/lib/R/library/GenomeInfoDb,/opt/conda/lib/R/library/GenomeInfoDb,True,False,2019-10-29,Bioconductor,,/opt/conda/lib/R/library
GenomicRanges,GenomicRanges,1.38.0,1.38.0,/opt/conda/lib/R/library/GenomicRanges,/opt/conda/lib/R/library/GenomicRanges,True,False,2019-10-29,Bioconductor,,/opt/conda/lib/R/library
ggplot2,ggplot2,3.2.1,3.2.1,/opt/conda/lib/R/library/ggplot2,/opt/conda/lib/R/library/ggplot2,True,False,2019-08-10,CRAN (R 3.6.1),,/opt/conda/lib/R/library
IRanges,IRanges,2.20.0,2.20.0,/opt/conda/lib/R/library/IRanges,/opt/conda/lib/R/library/IRanges,True,False,2019-10-29,Bioconductor,,/opt/conda/lib/R/library
limma,limma,3.42.0,3.42.0,/opt/conda/lib/R/library/limma,/opt/conda/lib/R/library/limma,True,False,2019-10-29,Bioconductor,,/opt/conda/lib/R/library
multtest,multtest,2.42.0,2.42.0,/opt/conda/lib/R/library/multtest,/opt/conda/lib/R/library/multtest,True,False,2019-10-29,Bioconductor,,/opt/conda/lib/R/library
