# Gene Expression Analysis of Rat Testis Across Developmental Stages

## Brad Hansen

### Data from [NCBI GEO](https://www.ncbi.nlm.nih.gov/geo/)

Search terms: *testis*, *rat*, *testes*, *postnatal*

Only RNA-seq data were used (not microarray).

Data used:

GEO study [GSE108348](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108348)

> Study GSE108348 includes transcriptomes from three species (*mouse, rat, chicken*) and four organs (*testis, brain, liver, kidney*) across five developmental stages (*E 13.5, E 18.5-19, PND 1-2, 8-10 weeks, 24 months*). This project considers the testis data for Wistar rats across the developmental stages. The data are bulk RNA-seq from an Illumina HiSeq 2000.

GEO study [GSE162152](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162152)

> Study GSE162152 includes gene expression profiling of testis (*also liver and cerebellum*) across five species (*opossum, mouse, rat, rhesus macaque, human*). The authors specifically targeted circRNAs through RNase R treatment, though this analysis uses only the untreated samples. The data are bulk RNA-seq from an Illumina HiSeq 2500. Rat samples were taken at 16 weeks of age.

GEO study [GSE125483](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125483)

> Study GSE125483 includes gene expression profiles from 12 tissues across 4 species (*cynomolgus macaque, mouse, rat, and dog*). Here we use testis data from 9-week-old Brown Norway rats.

GEO study [GSE85420](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85420)

> Study GSE85420 includes gene expression profiling after exposure to 2,2',4,4'-tetrabromodiphenyl ether. This project uses the testis RNA-seq data for the control group of Wistar rats at PND 120 (\~17 weeks).

GEO study [GSE41637](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41637)

> Study GSE41637 looks at transcriptome differences across mammals. The study reports that samples were taken from animals *of breeding age* because of transcriptome stability. The *Rattus norvegicus* samples were analyzed using an Illumina Genome Analyzer IIx.

GEO study [GSE53960](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53960)

> Study GSE53960 assesses the *"transcriptomic profiling across 11 organs, 4 ages, and 2 sexes of Fischer 344 rats."* This project uses the testis data from 2-, 6-, 21-, and 104-week-old Fischer 344 rats.


# Periods Covered


| Period     | Species    | Strain      | Source                        |
|-------|-----|-----|----|
| Embryonic Day 13.5 (*midstage embryo*)      | Rat        | Wistar                            | [GSE108348](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108348)                                                                            |
| Embryonic Day 18.5-19 (*late embryo*)       | Rat        | Wistar                            | [GSE108348](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108348)                                                                            |
| Postnatal Day 1-2 (*neonate*)               | Rat        | Wistar                            | [GSE108348](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108348)                                                                            |
| 2 weeks old (*young*)                       | Rat        | Fischer 344                       | [GSE53960](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53960)                                                                              |
| 6 weeks old (*young*)                       | Rat        | Fischer 344                       | [GSE53960](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53960)                                                                              |
| 8-10 weeks old (*young adult*), 9 weeks old | Rat        | Wistar, Brown Norway              | [GSE108348](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108348), [GSE125483](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125483) |
| 16 weeks old (*adult*)                      | Rat        | Norway Brown (*NCBI Taxon 10116*) | [GSE162152](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162152)                                                                            |
| 17 weeks old (*adult*)                      | Rat        | Wistar                            | [GSE85420](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85420)                                                                              |
| 21 weeks old (*adult*)                      | Rat        | Fischer 344                       | [GSE53960](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53960)                                                                              |
| 2 years old (*aged adult*)                  | Rat        | Wistar, Fischer 344               | [GSE108348](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108348), [GSE53960](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53960)   |
| *Breeding Age*                              | Rat        | Sprague-Dawley                    | [GSE41637](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41637)                                                                              |
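
For downstream modeling it can help to encode the periods above as an ordered factor so that stages sort developmentally rather than alphabetically. The following is a minimal sketch, not the notebook's own code: the short stage labels and the `periods` object are hypothetical, and the *Breeding Age* row is left out because no specific age is given for it.

```r
# Hypothetical shorthand labels for the periods in the table, youngest to oldest.
stage_levels <- c("E13.5", "E18.5-19", "PND1-2", "2wk", "6wk",
                  "8-10wk", "16wk", "17wk", "21wk", "2yr")

# One row per period/study combination listed in the table.
periods <- data.frame(
  stage = c("E13.5", "E18.5-19", "PND1-2", "2wk", "6wk", "8-10wk", "8-10wk",
            "16wk", "17wk", "21wk", "2yr", "2yr"),
  study = c("GSE108348", "GSE108348", "GSE108348", "GSE53960", "GSE53960",
            "GSE108348", "GSE125483", "GSE162152", "GSE85420", "GSE53960",
            "GSE108348", "GSE53960"),
  stringsAsFactors = FALSE
)

# An ordered factor keeps the developmental order in sorts, plots, and contrasts.
periods$stage <- factor(periods$stage, levels = stage_levels, ordered = TRUE)

# Number of studies contributing to each period.
studies_per_stage <- table(periods$stage)
```

Only the 8-10 week and 2-year periods are covered by more than one study, which is worth keeping in mind when separating stage effects from study effects.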


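In [5]:
library(plyranges)
# Attach plyr before dplyr/tidyverse; attaching it afterwards masks dplyr verbs such as mutate() and summarise()
library(plyr)
library(tidyverse)  # attaches dplyr, so a separate library(dplyr) call is unnecessary
library(DESeq2)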


GEO study [GSE108348](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108348)

Data available: text file of unique read counts

* Reads aligned with HISAT2 (2.0.5)
* Ensembl release 94 annotations used as the reference, with additional gene models assembled with StringTie 1.3.5
* Genome build: rn6
* Uniquely mapped reads counted with featureCounts from the Rsubread package (version 1.32.4) in R 3.5

SRR reads:
        SRR6396793
        SRR6396803
        SRR6396794
        SRR6396795
        SRR6396796
        SRR6396797
        SRR6396798
        SRR6396799
        SRR6396800
        SRR6396801
        SRR6396802
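
Re-quantifying these runs starts with fetching them from SRA. The sketch below only builds the command strings for the sra-tools programs `prefetch` and `fasterq-dump`; it does not run them. The helper name `sra_commands` and the output directory `fastq` are hypothetical, and `--split-files` is included on the assumption that some runs may be paired-end.

```r
# Build (but do not execute) sra-tools commands for a set of run accessions.
# `sra_commands` and `out_dir` are illustrative names, not part of the notebook.
sra_commands <- function(runs, out_dir = "fastq") {
  runs <- unique(trimws(runs))
  # Guard against typos in the accession list.
  stopifnot(all(grepl("^SRR[0-9]+$", runs)))
  data.frame(
    run      = runs,
    prefetch = sprintf("prefetch %s", runs),
    dump     = sprintf("fasterq-dump %s --split-files --outdir %s",
                       runs, shQuote(out_dir)),
    stringsAsFactors = FALSE
  )
}

# The eleven GSE108348 runs listed above form a contiguous accession range.
gse108348_runs <- sprintf("SRR%d", 6396793:6396803)
cmds <- sra_commands(gse108348_runs)

# To actually download, each string could be passed to system(), e.g.
# for (cmd in cmds$prefetch) system(cmd)
```

Keeping the commands in a data frame makes it easy to review them, or to write them to a job script, before anything is downloaded.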

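In [34]:
# Load per-sample gene count tables written by STAR (--quantMode GeneCounts)

countPATH <- "/bigdata/faustmanlab/bch/gse108348_counts/counts"

# list.files() takes a regular expression, not a glob, so anchor the .tab extension
files <- list.files(path = countPATH, pattern = "\\.tab$", full.names = TRUE)
# Name each file by the text before the first underscore in its file name
# (assumes names such as <sample>_ReadsPerGene.out.tab)
names(files) <- sub("_.*$", "", basename(files))
countslist <- lapply(files, read.delim, sep = "\t", header = FALSE)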



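In [41]:
# Column layout of STAR's ReadsPerGene.out.tab; a name other than `colnames` avoids shadowing base::colnames()
count_cols <- c("gene", "unstranded_counts", "htseq_count-1streadalign", "htseq_countrev-2ndreadalign")

# Apply the same column names to every sample's table
for (i in seq_along(countslist)) {
  colnames(countslist[[i]]) <- count_cols
}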
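
A natural next step is to collapse the per-sample tables into a single genes-by-samples matrix. STAR's `ReadsPerGene.out.tab` begins with four summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`) that need to be dropped first. The sketch below uses a made-up two-sample list in place of `countslist`; the helper `build_count_matrix` is hypothetical, and the choice of the unstranded column is an assumption that depends on the library's strandedness.

```r
# Summary rows that STAR writes at the top of every ReadsPerGene.out.tab file.
star_summary <- c("N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous")

# Toy stand-in for `countslist`; gene IDs and counts are invented for illustration.
toy_table <- function(counts) {
  data.frame(
    gene = c(star_summary, "geneA", "geneB", "geneC"),
    unstranded_counts = c(10, 5, 3, 1, counts),
    stringsAsFactors = FALSE
  )
}
toy_counts <- list(sample1 = toy_table(c(100, 0, 25)),
                   sample2 = toy_table(c(80, 2, 30)))

# Drop the summary rows, confirm all samples share one gene order,
# then bind the chosen count column from each sample into a matrix.
build_count_matrix <- function(countslist, column = "unstranded_counts") {
  kept <- lapply(countslist, function(df) df[!df$gene %in% star_summary, ])
  genes <- kept[[1]]$gene
  same_order <- vapply(kept, function(df) identical(df$gene, genes), logical(1))
  stopifnot(all(same_order))
  mat <- vapply(kept, function(df) as.numeric(df[[column]]),
                numeric(length(genes)))
  rownames(mat) <- genes
  mat
}

count_matrix <- build_count_matrix(toy_counts)
```

A matrix in this shape, with genes in rows and samples in columns, is the form that `DESeq2::DESeqDataSetFromMatrix()` takes as `countData`.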



GEO study [GSE53960](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53960)

TAR of text files:

* Sequenced reads were trimmed for adaptor sequence and masked for low-complexity or low-quality sequence, then mapped to the rn4 whole genome using TopHat v2.0.4 with default parameters.
* Alignment results were then processed using Cufflinks v2.0.2 for gene and transcript quantification with default parameters. For samples with 2-3 technical replicates, average FPKM (Fragments Per Kilobase per Million mapped reads) values were used.
* Genome build: rn4
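
The replicate averaging described above can be sketched in a few lines of base R. The matrix values, run names, and the `run_to_sample` mapping are invented for illustration, and `average_replicates` is a hypothetical helper rather than anything the study authors published.

```r
# Toy FPKM matrix: genes in rows, sequencing runs in columns (values are made up).
fpkm <- matrix(
  c(10, 12, 30,
     0,  2,  4,
     5,  5,  8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("run1", "run2", "run3"))
)

# Hypothetical mapping: run1 and run2 are technical replicates of sampleX.
run_to_sample <- c(run1 = "sampleX", run2 = "sampleX", run3 = "sampleY")

# Average the FPKM of all runs that belong to the same biological sample.
average_replicates <- function(mat, groups) {
  groups <- groups[colnames(mat)]   # align the mapping to the column order
  samples <- unique(groups)
  vapply(samples,
         function(s) rowMeans(mat[, groups == s, drop = FALSE]),
         numeric(nrow(mat)))
}

fpkm_avg <- average_replicates(fpkm, run_to_sample)
```

`drop = FALSE` keeps single-run samples as one-column matrices, so `rowMeans()` works whether a sample has one run or several.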


SRR reads:
        SRR1170487
        SRR1170488
        SRR1170489
        SRR1170490
        SRR1170491
        SRR1170492
        SRR1170493
        SRR1170494
        SRR1170495
        SRR1170496
        SRR1170497
        SRR1170498
        SRR1170499
        SRR1170500
        SRR1170501
        SRR1170502
        SRR1170503
        SRR1170504
        SRR1170505
        SRR1170506
        SRR1170507
        SRR1170508
        SRR1170509
        SRR1170510
        SRR1170511
        SRR1170512
        SRR1170513
        SRR1170514
        SRR1170515
        SRR1170516
        SRR1170517
        SRR1170518

GEO study [GSE162152](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162152)

The processed txt files are specific to circRNAs, so the raw data from SRA are needed.

SRA files:
        SRR13142136
        SRR13142137
        SRR13142138

GEO study [GSE85420](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85420) 

txt files with processed data:

* Reads were mapped to the reference rat genome (rn5) using the TopHat 2 aligner.
* Aligned reads were used for assembly of novel transcripts and differential expression of novel and reference transcripts with Cuffdiff 2.1.1.
* Supplementary file format and content: XLSX file showing log2(FPKM) for the control and exposed conditions and the log2 of their FPKM ratio for each transcript, as well as transcript coordinates and gene names.
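
Because the processed values are on a log2 scale, they need to be back-transformed before they can be compared with linear-scale FPKM from other studies. The sketch below uses invented numbers and hypothetical column names; it assumes the values are plain log2(FPKM) with no pseudocount and that the ratio is exposed over control, neither of which is confirmed by the file description.

```r
# Toy values in the layout described above (numbers and column names are made up).
tx <- data.frame(
  transcript   = c("tx1", "tx2", "tx3"),
  log2_control = c(3, 0, 5),
  log2_exposed = c(4, 0, 4),
  stringsAsFactors = FALSE
)

# Undo the log2 transform to recover FPKM on the linear scale.
tx$fpkm_control <- 2^tx$log2_control
tx$fpkm_exposed <- 2^tx$log2_exposed

# On the log scale a ratio is a difference (assumed direction: exposed / control).
tx$log2_ratio <- tx$log2_exposed - tx$log2_control
```

Only the control columns matter for this project, since the analysis uses the unexposed Wistar rats.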

SRA files:
        SRR4017429
        SRR4017448
        SRR4017447
        SRR4017446
        SRR4017445
        
        
        SRR4017431
        SRR4017432
        SRR4017433
        SRR4017434
        SRR4017435
        SRR4017436
        SRR4017437
        SRR4017438
        SRR4017439
        SRR4017440
        SRR4017441
        SRR4017442
        SRR4017443
        SRR4017444

GEO study [GSE41637](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41637)

Supplementary files include mapped reads:

* Mapped to rn4

SRA files:
        SRR594427 (strain F344/cr1)
        SRR594436 (strain BN/SsNHsd)
        SRR594445 (strain Sprague-Dawley)


