BioXpress DESeq step

Step 3 of the BioXpress pipeline.

General Flow of Scripts

run_per_study.py -> run_per_tissue.py -> run_per_case.py
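Each step is launched by its own shell script, so the three steps can be run back to back with a small Python driver. This is a hypothetical helper and is not part of the repository. It assumes the three .sh scripts are in the current working directory and that a non-zero exit code means a step failed. It stops at the first failure instead of continuing.

```python
import subprocess
from typing import Callable, Sequence

# Order taken from the flow above: per study, then per tissue, then per case.
STEP_SCRIPTS = ["run_per_study.sh", "run_per_tissue.sh", "run_per_case.sh"]


def run_pipeline(
    scripts: Sequence[str] = STEP_SCRIPTS,
    runner: Callable[[list[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> list[str]:
    """Run each step with `sh`, stopping at the first non-zero exit code.

    Returns the scripts that completed successfully.
    """
    completed = []
    for script in scripts:
        code = runner(["sh", script])
        if code != 0:
            print(f"{script} failed with exit code {code}; stopping")
            break
        completed.append(script)
    return completed
```

Calling `run_pipeline()` runs all three steps in order. The `runner` parameter lets you dry-run or test the driver without starting jobs that take several hours.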

Procedure

DESeq Step 1: Run the script run_per_study.sh

Summary

The Python script run_per_study.py passes arguments to the R script deseq.R, which uses the count and category files generated in the Annotation step to calculate differential expression and its statistical significance. The result is a set of files per study, including the normalized reads (DESeq normalization method), the DE results and significance, and QC files such as the PCA plot.
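The hand-off from Python to R can be sketched with subprocess. The argument order used here (count file, category file, output folder) is an assumption for illustration, as are the names build_deseq_command and run_deseq. Check run_per_study.py and deseq.R for the actual interface. On the R side, positional arguments like these are usually read with commandArgs(trailingOnly = TRUE).

```python
import subprocess
from pathlib import Path


def build_deseq_command(count_file, category_file, out_dir, r_script="deseq.R"):
    """Assemble the Rscript call for one input set (hypothetical argument order)."""
    return ["Rscript", str(r_script), str(count_file), str(category_file), str(out_dir)]


def run_deseq(count_file, category_file, out_dir, dry_run=False):
    """Run deseq.R once; with dry_run=True, only build and return the command."""
    cmd = build_deseq_command(count_file, category_file, out_dir)
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if the R script exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```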

Note: this step is time consuming (~2-3 hours of run time)

Method

Edit the hard-coded paths in the script run_per_study.py

Set in_dir to the folder containing the final per-study output files of the Annotation step

Set out_dir to the folder where the output files will be written

Ensure that the file list_files/studies.csv lists all of the studies you wish to process. Note: if 2-3 hours cannot be set aside to run all studies at once, the studies can be run separately by creating separate list files, each containing a subset of the studies.

Run the shell script: sh run_per_study.sh

Note: the R libraries loaded in deseq.R must be installed before running on a new server or system, because the scripts do not install them
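On a new system, checking for missing R packages first can prevent a long job from failing partway through. The sketch below asks Rscript which packages from a list are not installed. The package names are placeholders, so replace them with the libraries that deseq.R actually loads. DESeq2 is distributed through Bioconductor and is installed with BiocManager::install("DESeq2"), not install.packages().

```python
import shutil
import subprocess

# Placeholder list: replace with the library() calls actually present in deseq.R.
REQUIRED_R_PACKAGES = ["DESeq2", "ggplot2"]


def missing_packages_expr(packages):
    """Build an R expression that prints each package not installed, one per line."""
    vec = ", ".join(f'"{p}"' for p in packages)
    return f'cat(setdiff(c({vec}), rownames(installed.packages())), sep = "\\n")'


def find_missing_r_packages(packages=REQUIRED_R_PACKAGES):
    """Return the names of packages that Rscript reports as not installed."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found on PATH")
    result = subprocess.run(
        ["Rscript", "-e", missing_packages_expr(packages)],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```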

Output

A set of files:

log file

deSeq_reads_normalized.csv - Normalized read counts (DESeq normalization method applied)

results_significance.csv - Differential expression results (log2 fold change, log2FC) and statistical significance (t-test)

dispersion.png

distance_heatmap.png

pca.png - Principal component analysis (PCA) plot, useful for checking how well the Primary Tumor and Solid Tissue Normal samples cluster into their respective groups
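The DESeq normalization used for deSeq_reads_normalized.csv is the median-of-ratios method. For each sample, the size factor is the median across genes of the ratio between that sample's count and the gene's geometric mean across all samples. The normalized counts are the raw counts divided by that size factor. The NumPy sketch below shows the calculation for a genes-by-samples matrix. It is not the code in deseq.R, and the values in the output file come from the R implementation.

```python
import numpy as np


def size_factors(counts: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factors for a genes x samples matrix of raw counts.

    At least one gene must be non-zero in every sample; otherwise the
    median is taken over an empty set and the result is NaN.
    """
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have a zero geometric mean; drop them.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1)           # per-gene log geometric mean
    log_ratios = log_counts - log_geo_means[:, None]  # each sample vs. the pseudo-reference
    return np.exp(np.median(log_ratios, axis=0))      # one factor per sample


def normalize(counts: np.ndarray) -> np.ndarray:
    """Divide each sample (column) by its size factor."""
    return np.asarray(counts, dtype=float) / size_factors(counts)
```

For example, if one sample has exactly twice the counts of the other for every gene, the size factors are 1/sqrt(2) and sqrt(2). After normalization, the two columns match.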

DESeq Step 2: Run the script run_per_tissue.sh

Summary

The Python script run_per_tissue.py passes arguments to the R script deseq.R, which uses the count and category files generated in the Annotation step to calculate differential expression and its statistical significance. The result is a set of files per tissue, including the normalized reads (DESeq normalization method), the DE results and significance, and QC files such as the PCA plot.
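deseq.R pairs each sample in the count file with a label from the category file. A quick consistency check before a long run can catch mismatched sample IDs early. This sketch makes hypothetical assumptions about the file layout, so adjust it to the actual Annotation output:

- The count CSV has gene IDs in its first column and one column per sample.
- The category CSV has a header row, followed by rows of sample ID and group label (for example Primary Tumor or Solid Tissue Normal).

```python
import csv
from collections import Counter


def check_inputs(count_file, category_file):
    """Compare sample IDs between a count file and a category file.

    Returns (samples with no category, categorized samples with no count
    column, number of count-file samples per group label).
    """
    with open(count_file, newline="") as fh:
        header = next(csv.reader(fh))
    count_samples = set(header[1:])  # first column assumed to hold gene IDs

    labels = {}
    with open(category_file, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)  # skip the header row
        for row in reader:
            if len(row) >= 2:
                labels[row[0]] = row[1]
    label_samples = set(labels)

    missing_label = sorted(count_samples - label_samples)
    missing_counts = sorted(label_samples - count_samples)
    groups = Counter(labels[s] for s in count_samples & label_samples)
    return missing_label, missing_counts, groups
```

Before a full run, both lists should be empty, and each group label should have enough samples for a comparison.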

Note: this step is time consuming (~2-3 hours of run time)

Method

Edit the hard-coded paths in the script run_per_tissue.py

Set in_dir to the folder containing the final per-tissue output files of the Annotation step

Set out_dir to the folder where the output files will be written

Ensure that the file list_files/tissue.dat lists all of the tissues you wish to process. Note: if 2-3 hours cannot be set aside to run all tissues at once, the tissues can be run separately by creating separate dat files, each containing a subset of the tissues.

Run the shell script: sh run_per_tissue.sh
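Splitting the tissue list into smaller dat files, as suggested in the note on list_files/tissue.dat, can be automated. This sketch assumes the list holds one tissue per line with no header, but the actual format may differ. It writes round-robin batches to a chosen folder. You would then point run_per_tissue.py at one batch file per run. The function name and the batch file naming are hypothetical.

```python
from pathlib import Path


def split_list_file(list_file, n_batches, out_dir):
    """Write the non-empty lines of list_file into up to n_batches files.

    Returns the paths of the batch files written, in order.
    """
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    src = Path(list_file)
    entries = [line.strip() for line in src.read_text().splitlines() if line.strip()]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n_batches):
        batch = entries[i::n_batches]  # round-robin keeps batch sizes within one of each other
        if not batch:
            continue
        path = out / f"{src.stem}_batch{i + 1}{src.suffix}"
        path.write_text("\n".join(batch) + "\n")
        paths.append(path)
    return paths
```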

Output

A set of files:

log file

deSeq_reads_normalized.csv - Normalized read counts (DESeq normalization method applied)

results_significance.csv - Differential expression results (log2 fold change, log2FC) and statistical significance (t-test)

dispersion.png

distance_heatmap.png

pca.png - Principal component analysis (PCA) plot, useful for checking how well the Primary Tumor and Solid Tissue Normal samples cluster into their respective groups
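For downstream use, results_significance.csv can be filtered by adjusted p-value and effect size. The column names log2FoldChange and padj are the defaults in DESeq2's results() table and are assumed here. If deseq.R writes different headers, pass them in as arguments. The thresholds (padj < 0.05, |log2FC| >= 1) are common illustrative choices, not BioXpress settings.

```python
import csv
import math


def significant_genes(path, padj_col="padj", lfc_col="log2FoldChange",
                      alpha=0.05, min_abs_lfc=1.0):
    """Return (gene, log2FC, padj) tuples passing both thresholds, largest |log2FC| first."""
    hits = []
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        # Assume the first column holds gene IDs (R's write.csv leaves its header empty).
        gene_col = reader.fieldnames[0]
        for row in reader:
            try:
                padj = float(row[padj_col])
                lfc = float(row[lfc_col])
            except (TypeError, ValueError):  # e.g. "NA" when no adjusted p-value exists
                continue
            if math.isnan(padj) or math.isnan(lfc):
                continue
            if padj < alpha and abs(lfc) >= min_abs_lfc:
                hits.append((row[gene_col], lfc, padj))
    hits.sort(key=lambda h: abs(h[1]), reverse=True)
    return hits
```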

DESeq Step 3: Run the script run_per_case.sh

Summary

The Python script run_per_case.py passes arguments to the R script deseq.R, which uses the count and category files generated in the Annotation step to calculate differential expression and its statistical significance. The result is a set of files per case, including the normalized reads (DESeq normalization method), the DE results and significance, and QC files such as the PCA plot.

Note: this step is time consuming (~2-3 hours of run time)

Method

Edit the hard-coded paths in the script run_per_case.py

Set in_dir to the folder containing the final per-case output files of the Annotation step

Set out_dir to the folder where the output files will be written

Ensure that the file list_files/cases.csv lists all of the cases you wish to process. Note: if 2-3 hours cannot be set aside to run all cases at once, the cases can be run separately by creating separate list files, each containing a subset of the cases.

Run the shell script: sh run_per_case.sh
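If a long per-case run is interrupted, the remaining cases can be collected into a new list file and run separately, as the note on list_files/cases.csv allows. This sketch makes two assumptions about the layout:

- list_files/cases.csv has one case ID in its first column and no header.
- Each case writes its outputs to a subfolder of out_dir named after the case.

pending_cases and write_case_list are hypothetical helpers.

```python
import csv
from pathlib import Path


def pending_cases(list_file, out_dir, marker="results_significance.csv"):
    """Return case IDs from list_file for which out_dir/<case>/<marker> does not exist yet."""
    with open(list_file, newline="") as fh:
        cases = [row[0].strip() for row in csv.reader(fh) if row and row[0].strip()]
    return [case for case in cases if not (Path(out_dir) / case / marker).exists()]


def write_case_list(cases, path):
    """Write one case ID per line, e.g. as a smaller replacement list file."""
    Path(path).write_text("".join(f"{case}\n" for case in cases))
```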

Output

A set of files:

log file

deSeq_reads_normalized.csv - Normalized read counts (DESeq normalization method applied)

results_significance.csv - Differential expression results (log2 fold change, log2FC) and statistical significance (t-test)

dispersion.png

distance_heatmap.png

pca.png - Principal component analysis (PCA) plot, useful for checking how well the Primary Tumor and Solid Tissue Normal samples cluster into their respective groups
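The same separation can be checked interactively, or on a subset of samples, with a NumPy PCA of the normalized counts. This is a simplified stand-in for whatever deseq.R does to produce pca.png. It uses a log2(x + 1) transform. By contrast, DESeq2's plotPCA works on variance-stabilized data and, by default, uses only the 500 most variable genes.

```python
import numpy as np


def pca_scores(normalized_counts: np.ndarray, n_components: int = 2):
    """PCA of samples from a genes x samples matrix of normalized counts.

    Returns (scores, explained_variance_ratio); scores is samples x n_components.
    """
    x = np.log2(np.asarray(normalized_counts, dtype=float) + 1.0).T  # samples x genes
    x = x - x.mean(axis=0)                                           # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]                  # sample coordinates
    var = s ** 2
    return scores, var[:n_components] / var.sum()
```

When tumor and normal samples are well separated, they fall on opposite sides of PC1, and the first component explains most of the variance.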
