Hosts the website for all code and analysis for the Pacific Ocean-wide metatranscriptome survey study:
https://shu251.github.io/pacificocean-metaT/
June 2023
- Converting to a Quarto document. Previous Rmd files will remain or be placed in ARCHIVE. Any existing code on the HPC will be removed and replaced with individual .R scripts from here. The Quarto document will show the entire code as blocks with explanation.
The sequence read table, 03-abundance_tables/ReadTable_ByContig.csv, has raw counts for all sequences that hit contigs (via Salmon).
To do: the sequence count table includes sequences that did not have any annotations. This percentage needs to be determined, and these sequences should be placed somewhere else for now. Query the annotation table (below) for the number of sequences with annotations.
This file, 02-annotation_table/TaxonomicAndFunctionalAnnotations.csv, includes the sequence IDs that have taxonomy and/or function.
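The to-do above comes down to a set comparison between the IDs in the count table and the IDs in the annotation table. Below is a minimal sketch in Python (standard library only), not the project's actual script. It assumes both CSVs have a header row, that the annotation table's ID column is called ShortSeqID, and that the count table's ID column is called SequenceID (a hypothetical name; swap in the real header). The toy data is made up purely for illustration.

```python
import csv
import io

def read_ids(handle, id_column):
    """Collect the non-empty values of one column from an open CSV file."""
    return {row[id_column] for row in csv.DictReader(handle) if row[id_column]}

def percent_unannotated(count_ids, annotated_ids):
    """Percentage of count-table IDs that are absent from the annotation table."""
    if not count_ids:
        return 0.0
    missing = count_ids - annotated_ids
    return 100.0 * len(missing) / len(count_ids)

# Toy stand-ins for the two CSV files (made-up IDs and column names).
toy_counts = io.StringIO("SequenceID,sample_A\nseq1,10\nseq2,0\nseq3,5\nseq4,7\n")
toy_annotations = io.StringIO(
    "ShortSeqID,Taxonomy\nseq1,Eukaryota\nseq3,Eukaryota\nseq3,Eukaryota\n"
)

count_ids = read_ids(toy_counts, "SequenceID")
annotated_ids = read_ids(toy_annotations, "ShortSeqID")
unannotated_ids = count_ids - annotated_ids  # candidates to set aside for now
pct = percent_unannotated(count_ids, annotated_ids)
print(f"{pct:.1f}% of counted sequences have no annotation")
```

For the real tables, the same functions can be given file handles from open("03-abundance_tables/ReadTable_ByContig.csv", newline="") and open("02-annotation_table/TaxonomicAndFunctionalAnnotations.csv", newline=""). Using sets means repeated IDs in the annotation table are counted once.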
Within the annotation table, there are duplicated ShortSeqIDs. The Salmon count table is generated by running reads against nucleotide sequences, whereas taxonomy is determined by EUKulele, which uses proteins. Because there are sometimes multiple ORFs on the same contig (TransDecoder), the same ShortSeqID can appear on more than one row.
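To see how many ShortSeqIDs are affected by this duplication, one option is to tally rows per ID. This is a hedged sketch in Python (standard library only). It assumes the annotation table has a header row with a column named ShortSeqID; the ORF and Taxonomy columns and all values in the toy data are hypothetical.

```python
import csv
import io
from collections import Counter

def duplicated_ids(handle, id_column="ShortSeqID"):
    """Return {ID: row count} for every ID that appears on more than one row."""
    tally = Counter(row[id_column] for row in csv.DictReader(handle))
    return {seq_id: n for seq_id, n in tally.items() if n > 1}

# Toy annotation table: contig1 carries two ORFs, so its ID appears twice.
toy_table = io.StringIO(
    "ShortSeqID,ORF,Taxonomy\n"
    "contig1,orf1,Eukaryota\n"
    "contig1,orf2,Eukaryota\n"
    "contig2,orf1,Eukaryota\n"
)

dups = duplicated_ids(toy_table)
print(f"{len(dups)} ShortSeqID(s) appear on more than one row: {dups}")
```

How to resolve the duplicates (for example, keeping one annotation per contig) is a separate decision that this sketch does not make.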
Using tximport and DESeq2 for proper library normalization.