# Reproducibility Testing for Transcriptomic analysis of circulating endothelial cells in sickle cell anemia (SCA) stroke

## Step 0: Download the data
- Download the count matrix file (.txt) from NCBI

## Step I: Prepare Input files in R
- Load the count matrix into R and create the following files:
    1. Formatted count matrix
    2. Sample sheet
    3. Contrast sheet
    4. Annotation sheet
- Note: I will share the prepared files with you, so you do not need to create them yourself.

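For orientation, a contrast sheet for this pipeline is a small CSV with one row per comparison. The sketch below writes a one-row example to a scratch directory so it cannot overwrite the shared files. The column names follow the nf-core/differentialabundance documentation as I understand it. The id and the levels control and stroke are assumptions inferred from the output folder names later in this notebook, so the shared sdc_contrasts.csv may differ.

```shell
#!/usr/bin/env bash
# Illustration only: write a one-row contrast sheet to a scratch directory.
# Assumed values: the sample sheet has a "condition" column with the levels
# "control" and "stroke"; the id "stroke_vs_control" is a guess.
example_dir="$(mktemp -d)"

cat > "$example_dir/example_contrasts.csv" <<'EOF'
id,variable,reference,target,blocking
stroke_vs_control,condition,control,stroke,
EOF

# Show the file so the layout is easy to compare with the shared one
cat "$example_dir/example_contrasts.csv"
```

The reference level is the baseline group and the target level is the group compared against it. The blocking column is left empty here.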
## Step II: Run the Nextflow pipeline - nf-core/differentialabundance
#### This is the step we will reproduce. Here is what we need: 
1. An environment with Nextflow installed
2. Input files
3. A can-do attitude

### Process
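a. Create and activate a conda environment

In [None]:
%%bash
# Make `conda activate` usable in this non-interactive cell (it only lasts for this cell)
source "$(conda info --base)/etc/profile.d/conda.sh"
conda create --yes --name BCHM5420_env
conda activate BCHM5420_env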

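b. Install Nextflow (per Anaconda)

In [None]:
%%bash
# Install into the environment by name, since activation does not carry over between cells
conda install --yes --name BCHM5420_env bioconda::nextflow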

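Before moving on, it can help to confirm that the tools are visible from the shell. This is a minimal sketch that relies only on the shell's built-in command -v; have_cmd is a helper name made up for this example. Only one of conda or docker is needed, depending on which profile you use to run the pipeline.

```shell
#!/usr/bin/env bash
# Report whether each tool is on PATH.
# have_cmd is a hypothetical helper, not part of Nextflow or conda.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

for tool in nextflow conda docker; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```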
c. Create a directory tree

In [1]:
%%bash
mkdir -p pipeline \
         scripts \
         results/differentialabundance \
         r_analysis/plots \
         logs

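Copy the prepared input files into the pipeline directory (for example, drag and drop them from the Windows folder into the pipeline folder). If you navigated into pipeline to do this, move back up to the project root afterwards, because the run commands use paths such as pipeline/sdc_counts.tsv relative to it.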

In [None]:
%%bash
cd ..

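A quick check that all four input files are in place can save a failed launch. The sketch below assumes it is run from the project root (the directory that contains pipeline). The file names are the ones passed to the run commands in this notebook; check_inputs is a helper name made up for this example.

```shell
#!/usr/bin/env bash
# Check that the four prepared input files exist and are non-empty.
# check_inputs is a hypothetical helper; it returns non-zero if any file is missing.
check_inputs() {
    local dir="${1:-pipeline}" missing=0 f
    for f in sdc_samplesheet.tsv sdc_counts.tsv sdc_contrasts.csv sdc_annotations.tsv; do
        if [ -s "$dir/$f" ]; then
            echo "ok       $dir/$f"
        else
            echo "MISSING  $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

check_inputs pipeline || echo "Copy the missing files into pipeline/ before running the pipeline."
```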
d. Run the pipeline using either:

A conda profile:

In [None]:
%%bash
nextflow run nf-core/differentialabundance \
    --study_name "stroke_in_sickle_cell" \
    --study_type rnaseq \
    --input pipeline/sdc_samplesheet.tsv \
    --matrix pipeline/sdc_counts.tsv \
    --contrasts pipeline/sdc_contrasts.csv \
    --outdir test_results \
    --features pipeline/sdc_annotations.tsv \
    --features_id_col gene_id \
    --features_name_col gene_name \
    -r dev \
    -profile conda \
    -with-report report.html \
    -with-trace trace.txt

or a Docker profile:

In [None]:
%%bash
nextflow run nf-core/differentialabundance \
    --study_name "stroke_in_sickle_cell" \
    --study_type rnaseq \
    --input pipeline/sdc_samplesheet.tsv \
    --matrix pipeline/sdc_counts.tsv \
    --contrasts pipeline/sdc_contrasts.csv \
    --outdir test_results \
    --features pipeline/sdc_annotations.tsv \
    --features_id_col gene_id \
    --features_name_col gene_name \
    -r dev \
    -profile docker \
    -with-report report.html \
    -with-trace trace.txt

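Once either run finishes, the file written by -with-trace gives a quick way to confirm that every task completed. The sketch below assumes trace.txt is in the current directory and is tab-separated with a header row that includes a status column, which is my understanding of the default Nextflow trace layout. summarize_trace is a helper name made up for this example.

```shell
#!/usr/bin/env bash
# Count tasks per status in a Nextflow trace file.
# The "status" column is located by its header name rather than by position.
# summarize_trace is a hypothetical helper, not a Nextflow command.
summarize_trace() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
        col     { count[$col]++ }
        END     { for (s in count) print s, count[s] }
    ' "$1" | sort
}

if [ -f trace.txt ]; then
    summarize_trace trace.txt
else
    echo "trace.txt not found; run the pipeline first"
fi
```

Any status other than COMPLETED or CACHED is worth looking up in report.html before comparing results.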
### Expected output 
1. In your test_results/plots/differential/stroke_vs_control_stroke_in_sickle_cell/png directory (test_results is the --outdir value used in the run commands), you should find this plot: 

![volcano.png](attachment:9d5751f2-bc8a-4c0d-9056-4612804a064d.png)

2. In your test_results/plots/exploratory/condition/png directory, you should find these plots (among others): 

![pca2d.png](attachment:db77ffc3-f9fc-401c-a96c-1a4469f55c14.png)

![sample_dendrogram.png](attachment:862d60a5-fb39-4891-a19c-ee5278be18b9.png)

3. In your test_results/tables/differential folder, you should find two .txt files containing results from the DESeq2 analysis:

![tableshot.png](attachment:9bed59fd-3f82-4c6d-9335-9111ae2d49ec.png)

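To compare your run against the outputs listed above without clicking through folders, the files can be listed from the shell. This sketch assumes the output directory is test_results, as set by --outdir in the run commands; list_expected_outputs is a helper name made up for this example.

```shell
#!/usr/bin/env bash
# List the PNG plots and the differential tables under an output directory.
# list_expected_outputs is a hypothetical helper; it prints nothing if the
# directories do not exist yet.
list_expected_outputs() {
    local outdir="${1:-test_results}"
    find "$outdir/plots" -type f -name '*.png' 2>/dev/null | sort
    find "$outdir/tables/differential" -type f 2>/dev/null | sort
}

list_expected_outputs test_results
```

The listing should include the volcano plot, the PCA and dendrogram plots, and the two tables described above.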
## Step IV: GSEA in R
We can discuss this briefly if there is time left.

## Step V: Cytoscape
We can discuss this briefly if there is time left.