# Loading data for DESeq2 analyses

<big><b>File completed</b></big> (08/11/2021) <br>  

Bénédicte Noblet

- About session for IFB core cluster
- Retrieve and adapt SRAtable file to get all information
- From sample counts data to a DESeq2 dataset
- Prepare R objects for downstream analysis

---

## <b>About session for IFB core cluster</b>

<em>loaded JupyterLab</em> : Version 2.2.9

In [1]:
session_parameters <- function(){
    
    jupytersession <- c(system('echo "=== Cell launched on $(date) ==="', intern = TRUE),
                        system('squeue -hu $USER', intern = TRUE))
    
    jobid <- system("squeue -hu $USER | awk '/jupyter/ {print $1}'", intern = TRUE)
    jupytersession <- c(jupytersession,
                        "=== Current IFB session size: Medium (4CPU, 10GB) or Large (10CPU, 50GB) ===",
                        system(paste("sacct --format=JobID,AllocCPUS,NODELIST -j", jobid), intern = TRUE))
    print(jupytersession[1:6])
    
    return(invisible(NULL))
}

session_parameters()

[1] "=== Cell launched on Wed Aug 11 15:54:23 CEST 2021 ==="                         
[2] "          18219911      fast  jupyter  bnoblet  R    1:25:01      1 cpu-node-17"
[3] "=== Current IFB session size: Medium (4CPU, 10GB) or Large (10CPU, 50GB) ==="   
[4] "       JobID  AllocCPUS        NodeList "                                       
[5] "------------ ---------- --------------- "                                       
[6] "18219911              4     cpu-node-17 "                                       


---
## <b>I- Retrieve and adapt SRAtable file to get all information</b>

### **1- Loading SRA metadata table**

In [2]:
sratable <- read.table("/shared/projects/gonseq/Building/Data/info/16samples_SraRunTable.txt",
                       header=TRUE, sep=",", na.strings="")
sratable

Run,AGE,Assay.Type,AvgSpotLen,Bases,BioProject,BioSample,Bytes,Center.Name,Consent,⋯,LibrarySelection,LibrarySource,Organism,Platform,ReleaseDate,Sample.Name,Sexe,source_name,SRA.Study,Tissue
<chr>,<chr>,<chr>,<int>,<dbl>,<chr>,<chr>,<dbl>,<chr>,<chr>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
SRR7430706,6GW+2d,RNA-Seq,100,5233393500,PRJNA478051,SAMN09495494,3175420095,GEO,public,⋯,cDNA,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2020-02-20T00:00:00Z,GSM3223657,Female,Ovary,SRP151462,Ovary
SRR7430707,6GW+2d,RNA-Seq,100,4564533100,PRJNA478051,SAMN09495493,2739850864,GEO,public,⋯,cDNA,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2020-02-20T00:00:00Z,GSM3223658,Female,Ovary,SRP151462,Ovary
SRR7430708,6GW+6d,RNA-Seq,100,5490941400,PRJNA478051,SAMN09495487,3232029711,GEO,public,⋯,cDNA,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2020-02-20T00:00:00Z,GSM3223659,Female,Ovary,SRP151462,Ovary
SRR7430709,6GW+5d,RNA-Seq,100,4802762900,PRJNA478051,SAMN09495486,2884317954,GEO,public,⋯,cDNA,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2020-02-20T00:00:00Z,GSM3223660,Female,Ovary,SRP151462,Ovary
SRR7430710,6GW+0d,RNA-Seq,100,5651159100,PRJNA478051,SAMN09495492,3418092037,GEO,public,⋯,cDNA,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2020-02-20T00:00:00Z,GSM3223661,Male,Testis,SRP151462,Testis
SRR7430711,6GW+0d,RNA-Seq,100,4837998700,PRJNA478051,SAMN09495491,2905912075,GEO,public,⋯,cDNA,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2020-02-20T00:00:00Z,GSM3223662,Male,Testis,SRP151462,Testis
SRR7430712,6GW+3d,RNA-Seq,100,4351153200,PRJNA478051,SAMN09495490,2613300399,GEO,public,⋯,cDNA,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2020-02-20T00:00:00Z,GSM3223663,Male,Testis,SRP151462,Testis
SRR7430713,6GW+4d,RNA-Seq,100,5225035300,PRJNA478051,SAMN09495489,3120369302,GEO,public,⋯,cDNA,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2020-02-20T00:00:00Z,GSM3223664,Male,Testis,SRP151462,Testis
SRR7430738,11GW+6d,RNA-Seq,100,5237656400,PRJNA478051,SAMN09495549,3303005156,GEO,public,⋯,cDNA,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2020-02-20T00:00:00Z,GSM3223689,Female,Ovary,SRP151462,Ovary
SRR7430739,12GW,RNA-Seq,100,4500870600,PRJNA478051,SAMN09495541,2830500219,GEO,public,⋯,cDNA,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2020-02-20T00:00:00Z,GSM3223690,Female,Ovary,SRP151462,Ovary


### **2- Addition of factors of interest**

If not present, we need to add a column for each factor of interest.  

In the case of the `gonseq` study (extracted from Lecluze et al., 2020), there are two biological factors:
- time or development stage
- tissue sampling or sex determination

#### **2.a- Factor 1: development stage**

<div class="alert alert-block alert-danger">
    <b>The authors made a mistake when annotating the samples: the metadata are labelled in Gestational Weeks (GW), whereas the paper uses Post-Conceptional Weeks (PCW). The biological data favor PCW units (sexual differentiation at 6 PCW / 8 GW).</b>  <br>
I correct this below and, at the same time, simplify the time points into two stages for the statistical analysis.
</div>

In [3]:
sratable[grepl("6GW", sratable$AGE), "Stage"] <- "6PCW"
sratable[grepl("11GW|12GW", sratable$AGE), "Stage"] <- "12PCW"
unique(sratable[ , c("AGE", "Stage")])

Unnamed: 0_level_0,AGE,Stage
Unnamed: 0_level_1,<chr>,<chr>
1,6GW+2d,6PCW
3,6GW+6d,6PCW
4,6GW+5d,6PCW
5,6GW+0d,6PCW
7,6GW+3d,6PCW
8,6GW+4d,6PCW
9,11GW+6d,12PCW
10,12GW,12PCW
12,11GW+4d,12PCW
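
The patterns used above are not anchored, so `"6GW"` would also match a hypothetical `16GW` sample. A minimal sketch on toy values (the `age` vector and the `to_stage` helper are illustrative names, not part of the original notebook) anchors the patterns and stops if a sample is left without a stage:

```r
# Hypothetical helper: map annotated ages to simplified stages.
to_stage <- function(age){
    stage <- rep(NA_character_, length(age))
    stage[grepl("^6GW", age)] <- "6PCW"              # "^" prevents matching e.g. "16GW"
    stage[grepl("^(11|12)GW", age)] <- "12PCW"
    if (anyNA(stage)){
        stop("No stage defined for: ",
             paste(unique(age[is.na(stage)]), collapse = ", "))
    }
    stage
}

age <- c("6GW+2d", "6GW+0d", "11GW+6d", "12GW")      # toy values, same format as the AGE column
stage <- to_stage(age)
print(stage)
```

With the 16 samples of this study both versions should give the same result; the anchored version only matters if other ages are added later.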


#### **2.b- Factor 2: gonadal type**

Let's first see which columns we already have:

In [5]:
print( colnames(sratable) )

 [1] "Run"                 "AGE"                 "Assay.Type"         
 [4] "AvgSpotLen"          "Bases"               "BioProject"         
 [7] "BioSample"           "Bytes"               "Center.Name"        
[10] "Consent"             "DATASTORE.filetype"  "DATASTORE.provider" 
[13] "DATASTORE.region"    "Experiment"          "GEO_Accession..exp."
[16] "Instrument"          "LibraryLayout"       "LibrarySelection"   
[19] "LibrarySource"       "Organism"            "Platform"           
[22] "ReleaseDate"         "Sample.Name"         "Sexe"               
[25] "source_name"         "SRA.Study"           "Tissue"             
[28] "Stage"              


In [6]:
unique(sratable[ , c("Sexe", "Tissue")])

Unnamed: 0_level_0,Sexe,Tissue
Unnamed: 0_level_1,<chr>,<chr>
1,Female,Ovary
5,Male,Testis


Only two combinations exist for the `Sexe` and `Tissue` columns: Female-Ovary and Male-Testis. Since the `Tissue` column already gives the organ of origin, we will use it as the second factor.
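
The same conclusion can be reached programmatically. This is only a sketch on toy metadata (the `meta` data frame is made up for illustration): two columns carry the same information when each level of one is paired with exactly one level of the other.

```r
# Toy metadata mimicking the Sexe and Tissue columns
meta <- data.frame(Sexe   = c("Female", "Female", "Male", "Male"),
                   Tissue = c("Ovary", "Ovary", "Testis", "Testis"))

crosstab <- table(meta$Sexe, meta$Tissue)

# Redundant columns: every row and every column of the cross-table has a single non-empty cell
redundant <- all(rowSums(crosstab > 0) == 1) && all(colSums(crosstab > 0) == 1)
print(redundant)
```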

#### **2.c- List of all factors, number of samples per group and names with meaning**

We will define a variable to hold the column names of the factors:

In [7]:
varnames <- c("Stage", "Tissue")

Let's check that we have at least 2 samples per condition, to be sure that we can perform a statistical analysis with this dataset.

In [8]:
table(sratable[, varnames])

       Tissue
Stage   Ovary Testis
  12PCW     4      4
  6PCW      4      4
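
If you prefer a test over reading the table by eye, the check can be written as below. This is a sketch on a toy design (the `toy` data frame is an assumption); the threshold of 2 is the one stated above.

```r
# Toy design: 2 stages x 2 tissues, 2 samples per group
toy <- data.frame(Stage  = rep(c("6PCW", "12PCW"), each = 4),
                  Tissue = rep(c("Ovary", "Testis"), times = 4))
varnames <- c("Stage", "Tissue")

group_sizes <- table(toy[, varnames])
enough_replicates <- all(group_sizes >= 2)    # an empty group (0 sample) also fails the test
print(enough_replicates)
```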

With the following cells, we will add a column of human-readable sample names:
- prepare a table with the name prefix of each group and a counter for its number of samples

In [9]:
groups <- unique(sratable[, varnames, drop = FALSE])   # drop = FALSE keeps a data.frame, even with a single factor

groups[ , "counts"] <- 0
groups[ , "names"] <- ""
for (i in seq_along(varnames)){
    # append each factor value in lowercase, followed by a "-" separator
    groups[ , "names"] <- paste0(groups[ , "names"],
                                 tolower(groups[ , varnames[i]]), "-")
}
groups

Unnamed: 0_level_0,Stage,Tissue,counts,names
Unnamed: 0_level_1,<chr>,<chr>,<dbl>,<chr>
1,6PCW,Ovary,0,6pcw-ovary-
5,6PCW,Testis,0,6pcw-testis-
9,12PCW,Ovary,0,12pcw-ovary-
12,12PCW,Testis,0,12pcw-testis-


- sort rows in alphanumerical order of the column we prefer (here *Run* accessions)

In [10]:
colsorter <- "Run"                                         # colname to perform lines sorting
sratable <- sratable[order(sratable[, colsorter]), ]       # basic and not extensively tested: check that it works as you expect


rownames(sratable) <- 1:dim(sratable)[1]

In [12]:
sratable[ , c(colsorter, varnames)]

Unnamed: 0_level_0,Run,Stage,Tissue
Unnamed: 0_level_1,<chr>,<chr>,<chr>
1,SRR7430706,6PCW,Ovary
2,SRR7430707,6PCW,Ovary
3,SRR7430708,6PCW,Ovary
4,SRR7430709,6PCW,Ovary
5,SRR7430710,6PCW,Testis
6,SRR7430711,6PCW,Testis
7,SRR7430712,6PCW,Testis
8,SRR7430713,6PCW,Testis
9,SRR7430738,12PCW,Ovary
10,SRR7430739,12PCW,Ovary


- add an individual number to each sample.

*Remark*: I rely on the integer values of TRUE/FALSE (1 and 0, respectively) to test whether all factors of a sample match a group: the sum of the comparisons must equal the number of factors. That way, the code does not assume a fixed number of factors. *Feel free to add a batch column (no spaces or special characters in batch values!) to `varnames` if you want it in the sample names, but don't forget to remove it later from `varInt` in section II.1.*

In [13]:
sratable[ , "Names"] <- ""     # to be sure column is empty else it appends
groups[ , "counts"] <- 0       # to be sure counts are null else number increases each time we try

for (i in 1:dim(sratable)[1]){
    for (j in 1:dim(groups)[1]){
        if (sum( sratable[i, varnames] == groups[j, varnames] ) == length(varnames)){
            groups[j, "counts"] <- groups[j, "counts"] + 1
            sratable[i, "Names"] <- paste0(groups[j, "names"], groups[j, "counts"])
        }
    }
}
sratable[ ,c("Run", "Names")]

Unnamed: 0_level_0,Run,Names
Unnamed: 0_level_1,<chr>,<chr>
1,SRR7430706,6pcw-ovary-1
2,SRR7430707,6pcw-ovary-2
3,SRR7430708,6pcw-ovary-3
4,SRR7430709,6pcw-ovary-4
5,SRR7430710,6pcw-testis-1
6,SRR7430711,6pcw-testis-2
7,SRR7430712,6pcw-testis-3
8,SRR7430713,6pcw-testis-4
9,SRR7430738,12pcw-ovary-1
10,SRR7430739,12pcw-ovary-2
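
For larger tables, the nested loop can be replaced by a vectorised version. The sketch below is an alternative, not the code used in this notebook, and runs on a toy table (the `toy` data frame is made up); it relies on base R `ave()` to number samples within each group.

```r
# Toy sample table with two factors, rows already sorted as wanted
toy <- data.frame(Stage  = c("6PCW", "6PCW", "12PCW", "6PCW", "12PCW"),
                  Tissue = c("Ovary", "Testis", "Ovary", "Ovary", "Ovary"))
varnames <- c("Stage", "Tissue")

# One lowercase prefix per sample, e.g. "6pcw-ovary"
prefix <- tolower(do.call(paste, c(toy[, varnames, drop = FALSE], sep = "-")))

# Running index of each sample inside its own group
index <- ave(seq_along(prefix), prefix, FUN = seq_along)

toy$Names <- paste(prefix, index, sep = "-")
print(toy$Names)
```

As in the loop version, the numbering follows the current row order, so sort the table first.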


### **3- Countfiles specification**

We will add here two columns to have:  
- filenames of count files

In [14]:
sample.colname <- "Run"                                     # or whatever reference you use for your featureCounts loop
filename.extension <- "_paired-reverse-stranded.counts"     # or other filename text you add when performing read summarization

sratable[, "Filename"] <- paste(sratable[ , sample.colname], filename.extension, sep="")
print(sratable$Filename)

 [1] "SRR7430706_paired-reverse-stranded.counts"
 [2] "SRR7430707_paired-reverse-stranded.counts"
 [3] "SRR7430708_paired-reverse-stranded.counts"
 [4] "SRR7430709_paired-reverse-stranded.counts"
 [5] "SRR7430710_paired-reverse-stranded.counts"
 [6] "SRR7430711_paired-reverse-stranded.counts"
 [7] "SRR7430712_paired-reverse-stranded.counts"
 [8] "SRR7430713_paired-reverse-stranded.counts"
 [9] "SRR7430738_paired-reverse-stranded.counts"
[10] "SRR7430739_paired-reverse-stranded.counts"
[11] "SRR7430740_paired-reverse-stranded.counts"
[12] "SRR7430741_paired-reverse-stranded.counts"
[13] "SRR7430742_paired-reverse-stranded.counts"
[14] "SRR7430743_paired-reverse-stranded.counts"
[15] "SRR7430744_paired-reverse-stranded.counts"
[16] "SRR7430745_paired-reverse-stranded.counts"


- full filepath to these files

In [15]:
path.to.counts <- "/shared/projects/gonseq/Building/Results/featurecounts/"      # destination folder when performing read summarization (with / at the end!)

sratable[, "Filepath"] <- paste(path.to.counts, sratable[ , "Filename"], sep="")
print(sratable$Filepath)

 [1] "/shared/projects/gonseq/Building/Results/featurecounts/SRR7430706_paired-reverse-stranded.counts"
 [2] "/shared/projects/gonseq/Building/Results/featurecounts/SRR7430707_paired-reverse-stranded.counts"
 [3] "/shared/projects/gonseq/Building/Results/featurecounts/SRR7430708_paired-reverse-stranded.counts"
 [4] "/shared/projects/gonseq/Building/Results/featurecounts/SRR7430709_paired-reverse-stranded.counts"
 [5] "/shared/projects/gonseq/Building/Results/featurecounts/SRR7430710_paired-reverse-stranded.counts"
 [6] "/shared/projects/gonseq/Building/Results/featurecounts/SRR7430711_paired-reverse-stranded.counts"
 [7] "/shared/projects/gonseq/Building/Results/featurecounts/SRR7430712_paired-reverse-stranded.counts"
 [8] "/shared/projects/gonseq/Building/Results/featurecounts/SRR7430713_paired-reverse-stranded.counts"
 [9] "/shared/projects/gonseq/Building/Results/featurecounts/SRR7430738_paired-reverse-stranded.counts"
[10] "/shared/projects/gonseq/Building/Results/featurecounts/SRR
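
The trailing `/` requirement can be avoided with base R `file.path()`, which inserts the separator itself. A small sketch with made-up run accessions and a hypothetical folder:

```r
runs <- c("SRR0000001", "SRR0000002")                 # hypothetical run accessions
filename.extension <- "_paired-reverse-stranded.counts"
path.to.counts <- "/hypothetical/featurecounts"       # no trailing slash needed here

filepaths <- file.path(path.to.counts, paste0(runs, filename.extension))
print(filepaths)
```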

- check that all files exist and identify missing ones, if any

In [16]:
seeninfolder <- NULL
for (i in 1:length(sratable[, "Filepath"])){
    seeninfolder <- c(seeninfolder, file.exists(sratable[i, "Filepath"]))
}

# display readable output
if (sum( !unique(seeninfolder) ) == 0){
    print("All files are present.")
} else {
    print("Missing file(s):")
    sratable[!seeninfolder, c(sample.colname, "Filepath")]
}

[1] "All files are present."
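
Since `file.exists()` accepts a whole vector of paths, the same check can be written without a loop. The sketch below uses temporary files instead of the real count files, so the paths are placeholders:

```r
# Two toy paths: only the first file is actually created
paths <- c(tempfile(fileext = ".counts"), tempfile(fileext = ".counts"))
invisible(file.create(paths[1]))

present <- file.exists(paths)             # vectorised: one TRUE/FALSE per path
if (all(present)){
    print("All files are present.")
} else {
    print("Missing file(s):")
    print(paths[!present])
}
```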


### **4- See final table and save it**

In [17]:
sratable[ , c(sample.colname, varnames, "Names", "Filename", "Filepath")]

Unnamed: 0_level_0,Run,Stage,Tissue,Names,Filename,Filepath
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
1,SRR7430706,6PCW,Ovary,6pcw-ovary-1,SRR7430706_paired-reverse-stranded.counts,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430706_paired-reverse-stranded.counts
2,SRR7430707,6PCW,Ovary,6pcw-ovary-2,SRR7430707_paired-reverse-stranded.counts,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430707_paired-reverse-stranded.counts
3,SRR7430708,6PCW,Ovary,6pcw-ovary-3,SRR7430708_paired-reverse-stranded.counts,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430708_paired-reverse-stranded.counts
4,SRR7430709,6PCW,Ovary,6pcw-ovary-4,SRR7430709_paired-reverse-stranded.counts,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430709_paired-reverse-stranded.counts
5,SRR7430710,6PCW,Testis,6pcw-testis-1,SRR7430710_paired-reverse-stranded.counts,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430710_paired-reverse-stranded.counts
6,SRR7430711,6PCW,Testis,6pcw-testis-2,SRR7430711_paired-reverse-stranded.counts,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430711_paired-reverse-stranded.counts
7,SRR7430712,6PCW,Testis,6pcw-testis-3,SRR7430712_paired-reverse-stranded.counts,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430712_paired-reverse-stranded.counts
8,SRR7430713,6PCW,Testis,6pcw-testis-4,SRR7430713_paired-reverse-stranded.counts,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430713_paired-reverse-stranded.counts
9,SRR7430738,12PCW,Ovary,12pcw-ovary-1,SRR7430738_paired-reverse-stranded.counts,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430738_paired-reverse-stranded.counts
10,SRR7430739,12PCW,Ovary,12pcw-ovary-2,SRR7430739_paired-reverse-stranded.counts,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430739_paired-reverse-stranded.counts


Let's create a folder before saving the file:

In [18]:
destinationfolder <- "/shared/projects/gonseq/Building/Results/Routputs/"     # where you want to find files
system(paste("mkdir -p", destinationfolder), intern = TRUE)
print( system(paste("ls -l", destinationfolder), intern = TRUE) )

[1] "total 0"


<div class="alert alert-block alert-success">
    Even though the output of the last command of a cell is displayed automatically in a Jupyter Notebook, I explicitly use the <code>print( )</code> function to display vectors as in an R console:
    <ul>
        <li>
            Results take less space, as several items can be printed on the same line
        </li>
        <li>
            One can get used to the R console display this way
        </li>
    </ul>
</div>

In [19]:
print( ls() )

 [1] "colsorter"          "destinationfolder"  "filename.extension"
 [4] "groups"             "i"                  "j"                 
 [7] "path.to.counts"     "sample.colname"     "seeninfolder"      
[10] "session_parameters" "sratable"           "varnames"          


In [20]:
print( colnames(sratable) )

 [1] "Run"                 "AGE"                 "Assay.Type"         
 [4] "AvgSpotLen"          "Bases"               "BioProject"         
 [7] "BioSample"           "Bytes"               "Center.Name"        
[10] "Consent"             "DATASTORE.filetype"  "DATASTORE.provider" 
[13] "DATASTORE.region"    "Experiment"          "GEO_Accession..exp."
[16] "Instrument"          "LibraryLayout"       "LibrarySelection"   
[19] "LibrarySource"       "Organism"            "Platform"           
[22] "ReleaseDate"         "Sample.Name"         "Sexe"               
[25] "source_name"         "SRA.Study"           "Tissue"             
[28] "Stage"               "Names"               "Filename"           
[31] "Filepath"           


<div class="alert alert-block alert-warning">
    TODO Béné 07/16/2021:
    <ul>
        <li>
            add selection step to remove unwanted or unused columns
        </li>
        <li>
            adapt or add Claire's Rmarkdown file to retrieve quality metrics in <code>sratable</code> (better than in <code>counts</code>, it would disturb statistical analysis). Maybe in a third distinct table?
        </li>
    </ul>
</div>

In [22]:
myname <- "16samples_SraRunTable_v2.txt"                                # don't forget to adapt it to your project 
write.table(sratable,
            file = paste0(destinationfolder, myname),
            append = FALSE, sep = "\t", row.names = FALSE)

In [23]:
print( system(paste("ls -lh", destinationfolder), intern = TRUE) )

[1] "total 12K"                                                                   
[2] "-rw-rw----+ 1 bnoblet bnoblet 8.3K Aug 11 16:19 16samples_SraRunTable_v2.txt"
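
Before moving on, it can be reassuring to check that the saved file reloads identically with the options used later in section II. A sketch on a toy table written to a temporary file (names and values are made up):

```r
toy <- data.frame(Run   = c("SRR0000001", "SRR0000002"),
                  Names = c("6pcw-ovary-1", "12pcw-testis-1"))
tmpfile <- tempfile(fileext = ".txt")

# Same writing options as above
write.table(toy, file = tmpfile, append = FALSE, sep = "\t", row.names = FALSE)

# Same reading options as in loadTargetFile (section II.2)
reloaded <- read.table(tmpfile, header = TRUE, sep = "\t", na.strings = "",
                       stringsAsFactors = FALSE)

same_content <- identical(reloaded$Run, toy$Run) &&
                identical(reloaded$Names, toy$Names)
print(same_content)
```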


**Remark**: *Version numbering starts from the original SraRunTable (v1), even though that file is stored in another folder.*

---
## <b>II- From sample counts data to a DESeq2 dataset</b>

The function `DESeqDataSetFromMatrix( )` can be used if you already have a matrix of read counts, for example one prepared with the featureCounts function (Liao, Smyth, and Shi 2013) of the Rsubread package. To use `DESeqDataSetFromMatrix( )`, the user should provide the counts matrix, the information about the samples (the columns of the count matrix) as a DataFrame or `data.frame`, and the design formula (adapted from the DESeq2 vignette: http://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html).
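
One practical consequence: the rows of the sample table must describe the columns of the count matrix in the same order. The base R sketch below, on toy objects (`counts` and `coldata` are made up here), shows one way to enforce and check this before building the dataset; it does not call DESeq2 itself.

```r
# Toy count matrix: 2 genes x 3 samples
counts <- matrix(c(10L, 0L, 5L, 3L, 7L, 1L), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# Toy sample table, deliberately in a different order
coldata <- data.frame(Stage = factor(c("6PCW", "6PCW", "12PCW")),
                      row.names = c("s3", "s1", "s2"))

# Reorder the sample table so that its rows follow the matrix columns
coldata <- coldata[colnames(counts), , drop = FALSE]
aligned <- identical(rownames(coldata), colnames(counts))
print(aligned)
```

In this notebook the order should already match, because the count columns are built from the rows of the target table.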

<div class="alert alert-block alert-info">
    This section <b>uses, with some changes, <a href="https://github.com/PF2-pasteur-fr/SARTools">SARTools</a> scripts and functions</b> to prepare the counts data from featureCounts output files. <br>
    A <a href="https://bioinfo-fr.net/sartools-lanalyse-differentielle-pour-tous">presentation</a> of SARTools can be read in French in <b>Bioinfo-fr.net</b>'s blog.
</div>

### **1- Parameters to be modified by the user**

We begin with a cleaning step, to be sure that no conflicting or previously set variable can interfere with our analysis.

In [24]:
rm(list = ls())
print(ls())

character(0)


We need to identify the experiment design, which we will name `targetFile`, along with the columns containing other sample information. The obtained table will later be considered as column data for the summarizing DESeq Data Set (dds) object.

In [30]:
targetFile <- paste0("/shared/projects/gonseq/Building/",
                     "Results/Routputs/16samples_SraRunTable_v2.txt") # path to the design/target file

essentials <- c("Names", "Filepath", "Run")          # colnames where no missing value is expected (column for rownames first, filepath second)
varInt <- c("Stage", "Tissue")                       # colname(s) for factor(s) of interest
batch <- NULL                                        # if any blocking factor: colname for batch factor

We read in a count matrix, which we will name `counts`, and the sample information table, which we will name `coldata`.

### **2- Loading target file**

We will use this function to load the design file prepared in section I or elsewhere.  Please check that:  
- filepath is correct  
- all factors of interest and batch columns, if any, are present
- file names  
- extension  

In [32]:
loadTargetFile <- function(targetFile, varInt, essentials, batch){
    
    # check that the file exists, then load it
    if (!file.exists(targetFile)){
        stop(paste0("Design file not found at ", targetFile,
                   ": please check if present and filepath is correct"))
    }
    target <- read.table(targetFile, header=TRUE, sep="\t", na.strings="")

    # check features are present...
    if (!all(varInt %in% names(target))){
        stop(paste("The factor(s) of interest", paste(varInt, collapse=", "),
                   "is (are) not (all) in the target file"))
    }
    if (!is.null(batch) && !I(batch %in% names(target))){
        stop(paste("The batch effect", batch, "is not in the target file"))
    }

    # ... consistent ...
    if (min(table(target[, varInt]))<2){
        stop("At least one group of interest has a level without replicates")
    } 
    if (any(is.na(target[, c(essentials, varInt, batch)]))){
        stop("NA are present in the target file")
    } 

    # ... without any format issue
    for (i in seq_along(varInt)){    # one iteration per factor, also valid with a single factor
        if (any(grepl("[[:punct:]]", as.character(target[,varInt[i]])))){
            stop(paste("The", varInt[i], "variable contains punctuation characters, please remove them"))
        }
    }
    if (!is.null(batch) && is.numeric(target[,batch])){
        warning(paste("The", batch, "variable is numeric. Use factor() or rename the levels with letters to convert it into a factor"))
    }

    # Some adaptations
    rownames(target) <- as.character(target[ , essentials[1]])
    for (i in 1:length(varInt)){
        target[ , varInt[i]] <- as.factor(target[ , varInt[i]])
    } # converting varInt columns to factor(s) to avoid a later warning message
    
    # removed as a better display can be done directly calling for object
    # cat("Target file:\n")
    # print(target)
    
    return(target)
}

In [33]:
target <- loadTargetFile(targetFile=targetFile, varInt=varInt, essentials=essentials, batch=batch)
target[ , c(essentials, varInt, batch)]

Unnamed: 0_level_0,Names,Filepath,Run,Stage,Tissue
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<fct>,<fct>
6pcw-ovary-1,6pcw-ovary-1,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430706_paired-reverse-stranded.counts,SRR7430706,6PCW,Ovary
6pcw-ovary-2,6pcw-ovary-2,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430707_paired-reverse-stranded.counts,SRR7430707,6PCW,Ovary
6pcw-ovary-3,6pcw-ovary-3,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430708_paired-reverse-stranded.counts,SRR7430708,6PCW,Ovary
6pcw-ovary-4,6pcw-ovary-4,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430709_paired-reverse-stranded.counts,SRR7430709,6PCW,Ovary
6pcw-testis-1,6pcw-testis-1,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430710_paired-reverse-stranded.counts,SRR7430710,6PCW,Testis
6pcw-testis-2,6pcw-testis-2,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430711_paired-reverse-stranded.counts,SRR7430711,6PCW,Testis
6pcw-testis-3,6pcw-testis-3,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430712_paired-reverse-stranded.counts,SRR7430712,6PCW,Testis
6pcw-testis-4,6pcw-testis-4,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430713_paired-reverse-stranded.counts,SRR7430713,6PCW,Testis
12pcw-ovary-1,12pcw-ovary-1,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430738_paired-reverse-stranded.counts,SRR7430738,12PCW,Ovary
12pcw-ovary-2,12pcw-ovary-2,/shared/projects/gonseq/Building/Results/featurecounts/SRR7430739_paired-reverse-stranded.counts,SRR7430739,12PCW,Ovary
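
A point worth checking at this step: `as.factor()` orders levels alphabetically, so `"12PCW"` comes before `"6PCW"` and becomes the first (reference) level. If you would rather compare against the earliest stage, one option is `relevel()`. This is a sketch on a toy vector, to adapt to your own columns:

```r
stage <- as.factor(c("6PCW", "6PCW", "12PCW", "12PCW"))
print(levels(stage))                      # alphabetical order puts "12PCW" first

# Option: make the earliest stage the reference level
stage <- relevel(stage, ref = "6PCW")
print(levels(stage))
```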


### **3- Loading count data: creation of the matrix of counts from the individual counts files**

In [34]:
loadCountData <- function(target, essentials, skip=0,
                          featuresToRemove=c("alignment_not_unique", "ambiguous", "no_feature", "not_aligned", "too_low_aQual")){

    labels <- as.character(target[, essentials[1]])
    files <- as.character(target[, essentials[2]])

    # detect if input count files are from featureCounts or HTSeq-count
    f1 <- read.table(files[1], sep="\t", quote="\"", header=FALSE, skip=5, nrows=5, stringsAsFactors=FALSE)
    if (ncol(f1) >= 7 && is.numeric(f1[,7])){
        # counter featurecounts
        idCol <- 1
        countsCol <- 7
        header <- TRUE
    } else{
        if (ncol(f1) >= 2 && is.numeric(f1[,2])){
            # counter htseq-count
            idCol <- 1
            countsCol <- 2
            header <- FALSE
        } else{
            stop("Can't determine if count files come from HTSeq-count or featureCounts")
        }
    }

    # loading first file
    rawCounts <- read.table(files[1], sep="\t", quote="\"", header=header, skip=skip, stringsAsFactors=FALSE)
    rawCounts <- rawCounts[ ,c(idCol, countsCol)]
    colnames(rawCounts) <- c("Id", labels[1])
    
    if (any(duplicated(rawCounts$Id))){
        stop("Duplicated feature names in ", files[1], ": ",
             paste(unique(rawCounts$Id[duplicated(rawCounts$Id)]), collapse=", "))
    }
    cat("Loading files:\n")
    cat(files[1], ": ", length(rawCounts[,labels[1]]), " rows and ", sum(rawCounts[,labels[1]]==0), " null count(s)\n", sep="")
    
    # loading remaining files
    for (i in seq_along(files)[-1]){    # skipped entirely when there is a single file
        tmp <- read.table(files[i], sep="\t", quote="\"", header=header, skip=skip, stringsAsFactors=FALSE)
        tmp <- tmp[, c(idCol, countsCol)]
        colnames(tmp) <- c("Id", labels[i])
        if (any(duplicated(tmp$Id))){
            stop("Duplicated feature names in ", files[i], ": ",
                 paste(unique(tmp$Id[duplicated(tmp$Id)]), collapse=", "))
        }
        rawCounts <- merge(rawCounts, tmp, by="Id", all=TRUE)
        cat(files[i],": ",length(tmp[,labels[i]])," rows and ",sum(tmp[,labels[i]]==0)," null count(s)\n",sep="")
    }

    rawCounts[is.na(rawCounts)] <- 0
    counts <- as.matrix(rawCounts[,-1])
    rownames(counts) <- rawCounts[,1]
    counts <- counts[order(rownames(counts)),]

    # check that input counts are integers to fit edgeR and DESeq2 requirements
    if (any(counts %% 1 != 0)){
        stop("Input counts are not integer values as required by DESeq2 and edgeR.")
    }
    
    # removing not usable features
    cat("\nFeatures removed:\n")
    for (f in setdiff(featuresToRemove,"")){
        match <- grep(f, rownames(counts))
        if (length(match)>0){
            cat(rownames(counts)[match],sep="\n")
            counts <- counts[-match,]
        }
    }
    
    # silenced section: better display outside
    # cat("\nTop of the counts matrix:\n")
    # print(head(counts))
    # cat("\nBottom of the counts matrix:\n")
    # print(tail(counts))
    
    return(counts)
}

<div class="alert alert-block alert-warning">
    The following cell takes some time before displaying its progress, please be patient.
</div>

In [35]:
counts <- loadCountData(target=target, essentials=essentials)

Loading files:
/shared/projects/gonseq/Building/Results/featurecounts/SRR7430706_paired-reverse-stranded.counts: 60710 rows and 27672 null count(s)
/shared/projects/gonseq/Building/Results/featurecounts/SRR7430707_paired-reverse-stranded.counts: 60710 rows and 28662 null count(s)
/shared/projects/gonseq/Building/Results/featurecounts/SRR7430708_paired-reverse-stranded.counts: 60710 rows and 27213 null count(s)
/shared/projects/gonseq/Building/Results/featurecounts/SRR7430709_paired-reverse-stranded.counts: 60710 rows and 27788 null count(s)
/shared/projects/gonseq/Building/Results/featurecounts/SRR7430710_paired-reverse-stranded.counts: 60710 rows and 27493 null count(s)
/shared/projects/gonseq/Building/Results/featurecounts/SRR7430711_paired-reverse-stranded.counts: 60710 rows and 29473 null count(s)
/shared/projects/gonseq/Building/Results/featurecounts/SRR7430712_paired-reverse-stranded.counts: 60710 rows and 29412 null count(s)
/shared/projects/gonseq/Building/Results/featurecounts
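
To make the merging step of `loadCountData` easier to follow, here is the same logic on two tiny made-up tables (`s1` and `s2` stand for two count files already reduced to their `Id` and count columns):

```r
s1 <- data.frame(Id = c("geneA", "geneB"), `6pcw-ovary-1` = c(10, 0), check.names = FALSE)
s2 <- data.frame(Id = c("geneB", "geneC"), `6pcw-ovary-2` = c(4, 7), check.names = FALSE)

rawCounts <- merge(s1, s2, by = "Id", all = TRUE)   # all = TRUE keeps features seen in only one file
rawCounts[is.na(rawCounts)] <- 0                    # a feature absent from a file gets a count of 0

counts <- as.matrix(rawCounts[, -1])
rownames(counts) <- rawCounts[, 1]
print(counts)
```

With featureCounts files produced from the same annotation, all files list the same features, so the `NA` replacement should rarely be triggered.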

In [36]:
# from loadCountData function
cat("Top of the counts matrix:\n")
head(counts)

Top of the counts matrix:


Unnamed: 0,6pcw-ovary-1,6pcw-ovary-2,6pcw-ovary-3,6pcw-ovary-4,6pcw-testis-1,6pcw-testis-2,6pcw-testis-3,6pcw-testis-4,12pcw-ovary-1,12pcw-ovary-2,12pcw-ovary-3,12pcw-ovary-4,12pcw-testis-1,12pcw-testis-2,12pcw-testis-3,12pcw-testis-4
ENSG00000000003.15,4428,4296,3557,3805,4354,4829,4610,3595,4447,4390,5015,3702,2023,2381,1808,3779
ENSG00000000005.6,58,78,37,106,17,90,54,13,25,33,42,21,4,7,10,6
ENSG00000000419.13,986,911,909,828,835,1151,952,1031,824,948,1263,933,804,876,816,1515
ENSG00000000457.14,845,707,595,673,807,924,672,575,639,660,584,532,691,550,523,1035
ENSG00000000460.17,675,793,881,649,724,831,752,1124,871,1300,1477,1241,564,517,541,928
ENSG00000000938.13,43,32,10,30,45,60,45,67,15,25,23,16,73,84,100,151


In [37]:
# from loadCountData function
cat("Bottom of the counts matrix:\n")
tail(counts)

Bottom of the counts matrix:


Unnamed: 0,6pcw-ovary-1,6pcw-ovary-2,6pcw-ovary-3,6pcw-ovary-4,6pcw-testis-1,6pcw-testis-2,6pcw-testis-3,6pcw-testis-4,12pcw-ovary-1,12pcw-ovary-2,12pcw-ovary-3,12pcw-ovary-4,12pcw-testis-1,12pcw-testis-2,12pcw-testis-3,12pcw-testis-4
ENSG00000288694.1,6,2,14,12,0,1,1,2,22,38,51,17,20,3,11,11
ENSG00000288695.1,5,9,5,5,3,3,7,4,2,1,1,3,10,6,2,5
ENSG00000288696.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
ENSG00000288697.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
ENSG00000288698.1,1,1,0,3,6,2,3,0,3,1,2,0,1,1,4,0
ENSG00000288699.1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### **4- Creating DESeqDataSet (``dds``) object**

#### **4.a- Loading DESeq2 library**

In [38]:
library("DESeq2", quietly = TRUE)


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min



Attaching package: ‘S4Vectors’


The following object is masked from ‘package:base’:

    expand.grid



Attaching package: ‘MatrixGenerics’


The following objects are masked from ‘package:

<del>Because we request silent library loading, <i>no message confirming package attaching is printed, and most often, no errors/warnings are printed if package attaching fails</i> (from the <i>Help</i> section). <br>
Please <b>check that this library is in the <i>other attached packages</i> section</b> below: </del>

Let's show `sessionInfo( )` to record package versions (*version used for development:* DESeq2_1.30.1): 

In [39]:
sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.0.3/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] DESeq2_1.30.1               SummarizedExperiment_1.20.0
 [3] Biobase_2.50.0              MatrixGenerics_1.2.1       
 [5] matrixStats_0.59.0          GenomicRanges_1.42.0       
 [7] GenomeInfoDb_1.26.7         IRanges_2.24.1             
 [9] S4Vectors_0.28.

In [40]:
print( ls() )

[1] "batch"            "counts"           "essentials"       "featuresToRemove"
[5] "loadCountData"    "loadTargetFile"   "target"           "targetFile"      
[9] "varInt"          


#### **4.b- Operating on dataset**

Let's recall the names of our factors of interest and batch variables:

In [41]:
print( batch )
print( varInt )

NULL
[1] "Stage"  "Tissue"


Please adapt the `design` parameter to your own experimental design:   

> Be sure to put the main factor of interest in the last position  
> To add an interaction term, use the `factor1:factor2` format

In [43]:
# Constructing a DESeqDataSet object
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = target,
                              design = ~ Tissue + Stage + Tissue:Stage)
dds <- estimateSizeFactors(dds)
dds

class: DESeqDataSet 
dim: 60710 16 
metadata(1): version
assays(1): counts
rownames(60710): ENSG00000000003.15 ENSG00000000005.6 ...
  ENSG00000288698.1 ENSG00000288699.1
rowData names(0):
colnames(16): 6pcw-ovary-1 6pcw-ovary-2 ... 12pcw-testis-3
  12pcw-testis-4
colData names(32): Run AGE ... Filepath sizeFactor
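
To see which coefficients a design such as the one above encodes, the formula can be expanded with base R's `model.matrix()` on a small made-up sample table. The `toy_target` data frame below is an assumption for illustration, not the real `target`. This is only a sketch of how R interprets the formula; it does not run DESeq2.

```r
# Toy sample table (made up): 2 tissues x 2 stages, 2 replicates each.
toy_target <- data.frame(
  Tissue = factor(rep(c("ovary", "testis"), times = 4)),
  Stage  = factor(rep(c("6pcw", "12pcw"), each = 4), levels = c("6pcw", "12pcw"))
)

# Expand the design formula into its model matrix and list the coefficients.
design_matrix <- model.matrix(~ Tissue + Stage + Tissue:Stage, data = toy_target)
print(colnames(design_matrix))

# The shorthand `~ Tissue * Stage` expands to the same three terms.
shorthand_matrix <- model.matrix(~ Tissue * Stage, data = toy_target)
print(identical(colnames(design_matrix), colnames(shorthand_matrix)))
```

The first level of each factor acts as the reference. The explicit `levels` argument makes `6pcw` the reference stage; by default R would sort the levels alphabetically and pick `12pcw`.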

#### **4.c- Discovering the DESeqDataSet (dds) object**

The ``DESeqDataSet`` object contains several tables; we can explore each one using its own accessor function.  
*I use the ``head`` function to limit the output displayed in the Jupyter notebook: as you may have noticed, the ``DESeqDataSet`` object has a huge number of rows!*
- counts table

In [44]:
head( counts(dds) )

Unnamed: 0,6pcw-ovary-1,6pcw-ovary-2,6pcw-ovary-3,6pcw-ovary-4,6pcw-testis-1,6pcw-testis-2,6pcw-testis-3,6pcw-testis-4,12pcw-ovary-1,12pcw-ovary-2,12pcw-ovary-3,12pcw-ovary-4,12pcw-testis-1,12pcw-testis-2,12pcw-testis-3,12pcw-testis-4
ENSG00000000003.15,4428,4296,3557,3805,4354,4829,4610,3595,4447,4390,5015,3702,2023,2381,1808,3779
ENSG00000000005.6,58,78,37,106,17,90,54,13,25,33,42,21,4,7,10,6
ENSG00000000419.13,986,911,909,828,835,1151,952,1031,824,948,1263,933,804,876,816,1515
ENSG00000000457.14,845,707,595,673,807,924,672,575,639,660,584,532,691,550,523,1035
ENSG00000000460.17,675,793,881,649,724,831,752,1124,871,1300,1477,1241,564,517,541,928
ENSG00000000938.13,43,32,10,30,45,60,45,67,15,25,23,16,73,84,100,151


- sample information

In [47]:
head( colData(dds), n = 3 )

DataFrame with 3 rows and 32 columns
                     Run         AGE  Assay.Type AvgSpotLen      Bases
             <character> <character> <character>  <integer>  <numeric>
6pcw-ovary-1  SRR7430706      6GW+2d     RNA-Seq        100 5233393500
6pcw-ovary-2  SRR7430707      6GW+2d     RNA-Seq        100 4564533100
6pcw-ovary-3  SRR7430708      6GW+6d     RNA-Seq        100 5490941400
              BioProject    BioSample      Bytes Center.Name     Consent
             <character>  <character>  <numeric> <character> <character>
6pcw-ovary-1 PRJNA478051 SAMN09495494 3175420095         GEO      public
6pcw-ovary-2 PRJNA478051 SAMN09495493 2739850864         GEO      public
6pcw-ovary-3 PRJNA478051 SAMN09495487 3232029711         GEO      public
             DATASTORE.filetype DATASTORE.provider       DATASTORE.region
                    <character>        <character>            <character>
6pcw-ovary-1          fastq,sra         s3,gs,ncbi s3.us-east-1,gs.US,n..
6pcw-ovary-2         

- gene or transcript information

In [48]:
head( mcols(dds), n=6)

DataFrame with 6 rows and 0 columns

Even though this dataframe is empty, as we haven't provided any information yet, we can at least see its row names.

In [49]:
print( head(rownames(mcols(dds))) )

[1] "ENSG00000000003.15" "ENSG00000000005.6"  "ENSG00000000419.13"
[4] "ENSG00000000457.14" "ENSG00000000460.17" "ENSG00000000938.13"


## <b>III- Prepare R objects for downstream analysis</b>

### **1- Filtering dataset: unexpressed and very lowly expressed features**

A rather basic filter for non-expressed genes (or features) is to remove those with zero counts across all samples.  
We will use a slightly different, more stringent threshold: we will remove features whose read counts, summed over all samples, are lower than the total number of samples in the experiment (here 16).  

Moreover, the read counts we consider are those normalized by the DESeq2 method.  
This way, we limit the effect of differences in *sequencing depth and RNA composition* (terms from the [HBC training page](https://hbctraining.github.io/DGE_workshop_salmon/lessons/02_DGE_count_normalization.html), **Common normalization methods** section) between samples, so that the filter is not driven by a few samples.

In [50]:
# normalized = TRUE follows Claire's way of doing it; this is not the case in Sandrine's vignette
rowsum.in.norm <- rowSums( counts(dds, normalized=TRUE)) >= dim(dds)[2]
table(rowsum.in.norm)

rowsum.in.norm
FALSE  TRUE 
28695 32015 
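
The filter above can be reproduced on a toy matrix to make the two steps explicit: first a size factor per sample, then a threshold on the sum of normalized counts. The sketch below is a simplified base-R re-implementation of the median-of-ratios idea, written for illustration only (it is not DESeq2's own code, and the `toy_counts` values are made up).

```r
# Toy count matrix (made-up values): 5 genes x 3 samples.
toy_counts <- matrix(c( 10,  20,  30,
                         0,   0,   1,
                       100, 210, 290,
                         5,   9,  16,
                         0,   1,   0),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:5), paste0("sample", 1:3)))

# Log geometric mean of each gene; genes with a zero count give -Inf and are skipped.
log_geo_means <- rowMeans(log(toy_counts))
usable <- is.finite(log_geo_means)

# Size factor of a sample = median ratio of its counts to the gene-wise geometric means.
size_factors <- apply(toy_counts, 2, function(cnts) {
  exp(median((log(cnts) - log_geo_means)[usable & cnts > 0]))
})

# Normalize, then keep genes whose summed normalized counts reach the number of samples.
norm_counts <- sweep(toy_counts, 2, size_factors, "/")
keep <- rowSums(norm_counts) >= ncol(toy_counts)
print(round(size_factors, 2))
print(keep)
```

Here `gene2` and `gene5` are dropped: each has a single read overall, which is below the threshold of 3 (the number of toy samples).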

In [52]:
dds.filteredbyNnorm <- dds[rowsum.in.norm, ]

In [66]:
dds.filteredbyNnorm

class: DESeqDataSet 
dim: 32015 16 
metadata(1): version
assays(1): counts
rownames(32015): ENSG00000000003.15 ENSG00000000005.6 ...
  ENSG00000288695.1 ENSG00000288698.1
rowData names(3): ensemblid symbol entrezid
colnames(16): 6pcw-ovary-1 6pcw-ovary-2 ... 12pcw-testis-3
  12pcw-testis-4
colData names(32): Run AGE ... Filepath sizeFactor

### **2- Add gene names annotation**
<div class="alert alert-block alert-info">
    We will use the <i>AnnotationDbi</i> and <i>org.Hs.eg.db</i> (human database!) packages to do so, as done by Love <i>et al.</i>, 2019. <br>
    See the vignette's section <a href="http://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#annotating-and-exporting-results"><b>7 Annotating and exporting results</b></a>.
</div>

#### **2.a- Loading `AnnotationDbi` and `org.Hs.eg.db` libraries**

In [54]:
library("AnnotationDbi", quietly = TRUE)
library("org.Hs.eg.db", quietly = TRUE)   # Hs stands for Homo Sapiens: it's for Human genome!





In [55]:
sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.0.3/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.12.0         AnnotationDbi_1.52.0       
 [3] DESeq2_1.30.1               SummarizedExperiment_1.20.0
 [5] Biobase_2.50.0              MatrixGenerics_1.2.1       
 [7] matrixStats_0.59.0          GenomicRanges_1.42.0       
 [9] GenomeInfoDb_1.

The database object has the following column names; each one can be used to add a new annotation to the `mcols(dds)` dataframe, using another column as the key.

In [56]:
print( columns(org.Hs.eg.db) )

 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
[11] "GO"           "GOALL"        "IPI"          "MAP"          "OMIM"        
[16] "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"        
[21] "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"       "UNIGENE"     
[26] "UNIPROT"     


#### **2.b- Adding gene symbols and NCBI EntrezID**

The IDs used in this database object carry no version number, so we create a new column (the first one) in `mcols(dds)` to store the unversioned `ensemblid` references:

In [58]:
mcols(dds.filteredbyNnorm)$ensemblid <- gsub("\\..*","", rownames(mcols(dds.filteredbyNnorm)))
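
As a quick illustration of what this substitution does, here is the same regular expression applied to three of the row names shown earlier. It is a minimal stand-alone sketch; the variable names are only for illustration.

```r
# Three versioned Ensembl gene IDs taken from the dds row names above.
versioned_ids <- c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000288698.1")

# Remove the first dot and everything after it (the version suffix).
unversioned_ids <- gsub("\\..*", "", versioned_ids)
print(unversioned_ids)

# Sanity check: stripping versions should not create duplicated identifiers.
print(anyDuplicated(unversioned_ids) == 0)
```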

We can now ask for:
- gene symbols

In [59]:
mcols(dds.filteredbyNnorm)$symbol <- mapIds(org.Hs.eg.db,
                                            keys = mcols(dds.filteredbyNnorm)$ensemblid,
                                            column = "SYMBOL",
                                            keytype = "ENSEMBL",
                                            multiVals = "first")

'select()' returned 1:many mapping between keys and columns



> Despite the `multiVals = "first"` parameter, the `mapIds( )` function tells us that some keys (here ENSEMBL IDs) have several matches (SYMBOLs).  
Indeed, it calls the `select( )` function, which returns all available pairs. *See the use of `select( )` below in section 2.c*.

In [60]:
table(is.na(mcols(dds.filteredbyNnorm)$symbol))


FALSE  TRUE 
22695  9320 

- NCBI EntrezID number

In [61]:
mcols(dds.filteredbyNnorm)$entrezid <- mapIds(org.Hs.eg.db,
                                              keys = mcols(dds.filteredbyNnorm)$ensemblid,
                                              column = "ENTREZID",
                                              keytype = "ENSEMBL",
                                              multiVals = "first")

'select()' returned 1:many mapping between keys and columns



In [62]:
table(is.na(mcols(dds.filteredbyNnorm)$entrezid))


FALSE  TRUE 
22695  9320 

**Remark**: Some ENSEMBL genes have neither a symbol nor an *EntrezID*.  <br>
I had a look at one of them: they seem to be putative genes (new ones, or with little evidence yet).
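
To make the "1:many" message and the `NA` counts more concrete, the behaviour can be imitated on a tiny lookup table in base R. This is only an emulation under stated assumptions: the identifiers are made up, and `match()` is used here as a stand-in for keeping the first match, not as a description of how `mapIds( )` works internally.

```r
# Made-up mapping table: ENSG_A has two symbols, ENSG_C has none.
toy_map <- data.frame(
  ENSEMBL = c("ENSG_A", "ENSG_A", "ENSG_B"),
  SYMBOL  = c("SYM1", "SYM1_ALT", "SYM2"),
  stringsAsFactors = FALSE
)
toy_keys <- c("ENSG_A", "ENSG_B", "ENSG_C")

# match() returns the position of the first hit, or NA when the key is absent.
first_symbol <- toy_map$SYMBOL[match(toy_keys, toy_map$ENSEMBL)]
names(first_symbol) <- toy_keys
print(first_symbol)

# Same summary as in the notebook: how many keys got no annotation?
print(table(is.na(first_symbol)))
```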

#### **2.c- Playing with database object and `select( )` function to see values**

Other information in this database object can be retrieved with the `select( )` function, which returns every matching row for several columns at once:

In [64]:
select(x = org.Hs.eg.db,
       keys = mcols(dds.filteredbyNnorm)$ensemblid,
       keytype = "ENSEMBL",
       columns = c("ALIAS", "ENSEMBLTRANS", "REFSEQ"))

'select()' returned 1:many mapping between keys and columns



ENSEMBL,ALIAS,ENSEMBLTRANS,REFSEQ
<chr>,<chr>,<chr>,<chr>
ENSG00000000003,T245,,NM_001278740
ENSG00000000003,T245,,NM_001278741
ENSG00000000003,T245,,NM_001278742
ENSG00000000003,T245,,NM_001278743
ENSG00000000003,T245,,NM_003270
ENSG00000000003,T245,,NP_001265669
ENSG00000000003,T245,,NP_001265670
ENSG00000000003,T245,,NP_001265671
ENSG00000000003,T245,,NP_001265672
ENSG00000000003,T245,,NP_003261


<div class="alert alert-block alert-warning">
    Béné TODO 08/10/2021: add transcript annotation if available in exprAnalysis. -> may be available with <code>mapIds( )</code>
</div>

### **3- Saving objects for later use: RData file**

In [69]:
print( ls() )

 [1] "batch"               "counts"              "dds"                
 [4] "dds.filteredbyNnorm" "destinationfolder"   "essentials"         
 [7] "loadCountData"       "loadTargetFile"      "rowsum.in.norm"     
[10] "target"              "targetFile"          "varInt"             


Let's recall the folder where we want to save R output files:

In [67]:
destinationfolder <- "/shared/projects/gonseq/Building/Results/Routputs/"     # where you want to find files

Let's save everything in a file...

In [70]:
save(list = ls(),
     file = paste0(destinationfolder, "Pipe7_ending_session_all.RData"))
print(system(paste("ls -lh", destinationfolder), intern = TRUE))

[1] "total 8.3K"                                                                    
[2] "-rw-rw----+ 1 bnoblet bnoblet 8.3K Aug 11 16:19 16samples_SraRunTable_v2.txt"  
[3] "-rw-rw----+ 1 bnoblet bnoblet 4.5M Aug 11 18:05 Pipe7_ending_session_all.RData"


... and just keep the DESeqDataSet objects along with the useful column names and destinationfolder.

In [71]:
save(list = c("dds", "dds.filteredbyNnorm", "essentials", "varInt", "batch", "destinationfolder"),
     file = paste0(destinationfolder, "Pipe7_ending_session_2dds_3colnameset_1folder.RData"))
print(system(paste("ls -lh", destinationfolder), intern = TRUE))

[1] "total 4.5M"                                                                                         
[2] "-rw-rw----+ 1 bnoblet bnoblet 8.3K Aug 11 16:19 16samples_SraRunTable_v2.txt"                       
[3] "-rw-rw----+ 1 bnoblet bnoblet 3.0M Aug 11 18:22 Pipe7_ending_session_2dds_3colnameset_1folder.RData"
[4] "-rw-rw----+ 1 bnoblet bnoblet 4.5M Aug 11 18:05 Pipe7_ending_session_all.RData"                     
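
For the next notebook, the saved objects come back with `load( )`. The round trip can be sketched with toy objects and a temporary file; all names below (`session_env`, `toy_dds`, `toy_file`, ...) are hypothetical and stand in for the real objects and paths.

```r
# Toy objects stored in their own environment, standing in for dds, varInt, etc.
session_env <- new.env()
assign("toy_dds", matrix(1:4, nrow = 2), envir = session_env)
assign("toy_varInt", c("Stage", "Tissue"), envir = session_env)

# file.path() adds the separator itself, so the folder needs no trailing slash.
toy_file <- file.path(tempdir(), "toy_session.RData")
save(list = ls(session_env), envir = session_env, file = toy_file)

# load() restores the objects and returns their names; a fresh environment
# avoids silently overwriting objects of the current session.
restored_env <- new.env()
restored_names <- load(toy_file, envir = restored_env)
print(sort(restored_names))
print(identical(get("toy_varInt", envir = restored_env), c("Stage", "Tissue")))
```

Loading into a separate environment is a design choice: a plain `load(file)` would restore everything into the current workspace, replacing any object that has the same name.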
