paramormyrops_rnaseq

Analysis scripts and code for our research article: Losilla, M., Luecke, D.M. & Gallant, J.R. The transcriptional correlates of divergent electric organ discharges in Paramormyrops electric fish. BMC Evol Biol 20, 6 (2020). https://doi.org/10.1186/s12862-019-1572-3

This repository contains files with the code we used in our analysis.

The table below serves as a guide to understand the flow of the code. It details the order in which the code was executed, along with a description and comments of each step. Notes are shown in bold text.

Note: that a Singularity file is provided in the folder trinity_singularity to run on high performance computing systems. This would allow any user capable of running Singularity images to recreate the exact computing environment used for these analyses, though it is not required.

script/command file	description	comments	additional_outputs (These are provided in the folder named additional_files)
sh_01_FastQCraw.sh	assess quality of raw reads
sh_02_trim_rename_unzip.sh	trim, rename and unzip reads
sh_03_FastQCtrimmed.sh	assess quality of trimmed reads
The NCBI transcripts file we used as reference for the align and count steps was from: NCBI Paramormyrops kingsleyae Annotation Release 100, based on genome assembly PKINGS_0.1. We downloaded the transcripts file from here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/872/115/GCF_002872115.1_PKINGS_0.1 We used the file called: rna.fna.gz, and removed the sole rRNA transcript present: XR_002837744.1
cmd_generate_gene_to_trans_file.txt	generate a gene-to-transcript list from the NCBI transcripts file	this list is required by the align and count steps	gene-trans-map.txt
sh_04a_RSEMindex.sh	Index the NCBI transcripts file	calls the singularity container
sh_04a_bash.sh	Index the NCBI transcripts file	executes commands within the singularity container
sh_04b_RSEMperIndiv.sh	Aligns reads to NCBI transcripts file and counts reads per gene	calls the singularity container
sh_04b_bash.sh	Aligns reads to NCBI transcripts file and counts reads per gene	executes commands within the singularity container
sh_04c_matrices.sh	Build gene expression matrices	calls the singularity container
sh_04c_bash.sh	Build gene expression matrices	executes commands within the singularity container
At this point the gene expression matrices (RSEM.gene.counts.matrix and RSEM.gene.TMM.counts.matrix ) use gene names and symbols from the NCBI transcriptome. However, EntrezGeneIDs are preferred for downstream analyses. Therefore, I converted their gene names and symbols to Pkings EntrezGeneIDs with the next R code. The converted files were assigned to the original file names. The original files were first renamed to: <orginal name>_ORIG_gene_symbols
translate_gene_IDs.Rmd	Replace gene names and symbols with EntrezGeneIDs in the gene expression matrices generate a file with the columns Pking EntrezGeneID, gene name, gene symbol and type of gene for each of the predicted 27610 P. kingsleyae genes. This file is named Dic.PkingEntrezGeneID-to-name_symbol_type.txt	This code runs on the renamed files	Dic.PkingEntrezGeneID-to-name_symbol_type.txt
sh_05a_DE_analyses.sh	Data exploration - Correlation matrix, PCA DGE and MA plots - all 10 possible pairwise OTU comparisons	calls the singularity container
sh_05a_bash_DE_genes.sh	Data exploration - Correlation matrix, PCA DGE and MA plots - all 10 possible pairwise OTU comparisons	executes commands within the singularity container We modified 2) to use the function estimateDisp() instead of the functions estimateCommonDisp() and estimateTagwiseDisp()	uses the samples.txt file
Clustering_of_DEG_mean.Rmd	For each phenotype pair, extract the genes that meet the expression filters (Set B groups) plot expression patterns of the genes in each group from 1)	generates black & white and colored plots for Set B genes (These plots served informational purposes)
generate_suppl_files_DEG_comparisons_and_groups.Rmd	generate the supplemental files with the details of the 10 DGE comparisons and Set B groups
sh_06_blastp.sh	blast P. kingsleyae proteins to D. rerio proteins	output is split into 7 files, we merged all to one file afterwards
Annotation_wrangling.Rmd	For each ontology, generate two 'dictionaries': Pking Entrez Gene IDs to D. rerio GO IDs D. rerio GO IDs to GO terms	Files from 2) were not used in later scripts, they served as references	Dic.PkingEntrezGeneID-to-GO.{ontology}.txt Dic.{ontology}.GOid_to_term.txt
enrichment_on_Pkings _all_10_DGE_comparisons.Rmd	GO enrichment on all 10 DGE comparisons Horizontal bar plot significant GO terms	Xcel file from 1) is part of the supplementary files This code also produces a file with information on each upregulated gene annotated to enriched GO terms, including how many GO terms the gene was annotated to for a given upregulated list and ontology (frequency). The file served informational purposes
enrichment_on_Pkings_clusters.Rmd	GO enrichment on Set B groups Horizontal bar plot significant GO terms	Xcel file from 1) is part of the supplementary files the horizontal bar plot from 2) served informational purposes) This code also produces a file with information on each upregulated gene annotated to enriched GO terms, including how many GO terms the gene was annotated to for a given upregulated list and ontology (frequency). The file served informational purposes
set_C.Rmd	Intersect upregulated genes from Sets A' and B (these intersected genes are Set C) GO enrichment on Set C genes plot expression patterns of Set C genes Horizontal bar plot significant GO terms	The outputs are: one file per list of upregulated genes one file per list of enriched GO terms Xcel file with upregulated genes (consolidation of output 1) Xcel file with enriched GO terms (consolidation of output 2) Xcel file with information on each upregulated gene annotated to enriched GO terms, including how many GO terms the gene was annotated to for a given upregulated list and ontology (frequency). The file served informational purposes Color plots for Set C genes expression patterns Horizontal bar plot with enriched GO terms Outputs 3) and 4) are part of the supplemental files Outputs 6) and 7) make up Figs. 4-6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

paramormyrops_rnaseq

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
additional_files		additional_files
trinity_singularity		trinity_singularity
.gitignore		.gitignore
Annotation_wrangling.Rmd		Annotation_wrangling.Rmd
Clustering_of_DEG_mean.Rmd		Clustering_of_DEG_mean.Rmd
README.md		README.md
cmd_generate_gene_to_trans_file.txt		cmd_generate_gene_to_trans_file.txt
enrichment_on_Pkings _all_10_DGE_comparisons.Rmd		enrichment_on_Pkings _all_10_DGE_comparisons.Rmd
enrichment_on_Pkings_clusters.Rmd		enrichment_on_Pkings_clusters.Rmd
generate_suppl_files_DEG_comparisons_and_groups.Rmd		generate_suppl_files_DEG_comparisons_and_groups.Rmd
set_C.Rmd		set_C.Rmd
sh_01_FastQCraw.sh		sh_01_FastQCraw.sh
sh_02_trim_rename_unzip.sh		sh_02_trim_rename_unzip.sh
sh_03_FastQCtrimmed.sh		sh_03_FastQCtrimmed.sh
sh_04a_RSEMindex.sh		sh_04a_RSEMindex.sh
sh_04a_bash.sh		sh_04a_bash.sh
sh_04b_RSEMperIndiv.sh		sh_04b_RSEMperIndiv.sh
sh_04b_bash.sh		sh_04b_bash.sh
sh_04c_bash.sh		sh_04c_bash.sh
sh_04c_matrices.sh		sh_04c_matrices.sh
sh_05a_DE_analyses.sh		sh_05a_DE_analyses.sh
sh_05a_bash_DE_genes.sh		sh_05a_bash_DE_genes.sh
sh_06_blastp.sh		sh_06_blastp.sh
translate_gene_IDs.Rmd		translate_gene_IDs.Rmd

msuefishlab/paramormyrops_rnaseq

Folders and files

Latest commit

History

Repository files navigation

paramormyrops_rnaseq

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages