Working Data Dir
Serratus Working Bucket (~): s3://serratus-public/
All data files are stored in our AWS S3 bucket. It serves as the project's working directory and holds raw, less-organized data.
Data associated with each electronic lab notebook entry can be stored in this directory. Each folder is named by date (YYMMDD), matching the date of the notebook file. For example, the data for the experiment serratus/notebook/200411_CoV_Divergence_Simulations.ipynb is found in s3://serratus-public/notebook/200411/.
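The notebook-to-folder convention can be written as a small helper. This is a minimal sketch, not part of the Serratus code base: `notebook_data_prefix` is a hypothetical name, and it assumes every notebook file name begins with a six-digit YYMMDD date followed by an underscore.

```python
from pathlib import PurePosixPath

BUCKET = "s3://serratus-public"  # working bucket named on this page


def notebook_data_prefix(notebook_path: str) -> str:
    """Map a notebook file to the S3 folder that holds its data.

    Hypothetical helper; assumes the file name starts with a YYMMDD date
    followed by an underscore, e.g. 200411_CoV_Divergence_Simulations.ipynb.
    """
    name = PurePosixPath(notebook_path).name
    date = name.split("_", 1)[0]
    if len(date) != 6 or not date.isdigit():
        raise ValueError(f"expected a YYMMDD prefix in {name!r}")
    return f"{BUCKET}/notebook/{date}/"


print(notebook_data_prefix("serratus/notebook/200411_CoV_Divergence_Simulations.ipynb"))
# s3://serratus-public/notebook/200411/
```

Raising on a missing date prefix keeps a misnamed notebook from silently landing in the wrong folder.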
- ~/out/200525_viro/bam: Aligned output files, named by SRA accession
- ~/out/200525_viro/summary: .summary files for this experiment
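Paths on this page use `~` as shorthand for the bucket root. The sketch below expands that shorthand and builds a per-run BAM path. Only the `~` mapping comes from this page; `resolve` and `bam_path` are hypothetical names, and the `<accession>.bam` file name is an assumption about how "named by SRA accession" is spelled out.

```python
BUCKET = "s3://serratus-public"  # bucket that "~" stands for on this page


def resolve(path: str) -> str:
    """Expand a leading '~' to the bucket root; other paths pass through."""
    if path == "~":
        return BUCKET + "/"
    if path.startswith("~/"):
        return f"{BUCKET}/{path[2:]}"
    return path


def bam_path(experiment: str, accession: str) -> str:
    # Hypothetical layout: assumes one BAM per run, named <accession>.bam.
    return resolve(f"~/out/{experiment}/bam/{accession}.bam")


print(resolve("~/out/200525_viro/summary"))
# s3://serratus-public/out/200525_viro/summary
```

For example, `bam_path("200525_viro", "SRRXXXXXXX")` (a placeholder accession) gives `s3://serratus-public/out/200525_viro/bam/SRRXXXXXXX.bam`.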
Reference sequence sets and their associated index files, including pan-genomes and mega-genomes, in both nucleotide and protein form.
Examples:
- ~/seq/cov0: All CoV sequences from NCBI
  - NCBI search: "(Coronaviridae) AND "viruses"[porgn:txid10239]"
  - Date Accessed: 2020/03/30
  - Results: 33296
- ~/seq/hgr1: Human rDNA testing sequence
  - From this publication
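To re-check a search like the one behind ~/seq/cov0, the query can be sent to NCBI's E-utilities `esearch` endpoint. This sketch only builds the URL and does not fetch it. Two assumptions: the database is `nuccore` (the page does not say which NCBI database was searched), and the outer quotes around the query on this page delimit it rather than belong to it. `esearch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query text from the cov0 entry, without the delimiting outer quotes (assumption).
COV0_QUERY = '(Coronaviridae) AND "viruses"[porgn:txid10239]'


def esearch_url(term: str, db: str = "nuccore", retmax: int = 0) -> str:
    """Build an esearch URL; retmax=0 requests only the hit count, not IDs."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term, 'retmax': retmax})}"


print(esearch_url(COV0_QUERY))
```

The count returned today will differ from the 33296 recorded on 2020/03/30, since NCBI holdings grow over time.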
SRA Accession and Run Information master tables, accessed via the SRA website with the following base filter. In the queries below, BASE stands for this filter without its trailing AND <SUBFILTER>:
"type_rnaseq"[Filter] AND cluster_public[prop] AND "platform illumina"[Properties] AND "cloud s3"[Properties] NOT "scRNA"[All Fields] AND <SUBFILTER>
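Each category query below is the base filter followed by a category-specific term. A minimal sketch of that substitution, assuming BASE is the filter above with "AND <SUBFILTER>" removed and that each subfilter carries its own leading AND or NOT (as in "BASE NOT ..."). `sra_query` is a hypothetical name.

```python
# Assumed meaning of BASE: the basic filter without its trailing "AND <SUBFILTER>".
BASE = (
    '"type_rnaseq"[Filter] AND cluster_public[prop] AND '
    '"platform illumina"[Properties] AND "cloud s3"[Properties] '
    'NOT "scRNA"[All Fields]'
)


def sra_query(subfilter: str) -> str:
    """Substitute BASE textually; the subfilter supplies its own AND/NOT."""
    return f"{BASE} {subfilter.strip()}"


print(sra_query('AND "Homo sapiens"[Organism]'))
```

Plain text substitution mirrors how the queries are written on this page; no extra parentheses are added, so the result evaluates exactly as a query typed by hand would.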
- Test Data Set
  - Mammals and CoV+ swabs for testing the pipeline
  - SARS-CoV-2: PRJNA616446
  - Felis catus: PRJNA432069
  - Homo sapiens (HCT116): PRJEB29794
  - Macaca fascicularis: PRJNA553361
  - Mus musculus: PRJNA553361
  - Date Accessed: 2020/04/07
  - Results: 49 libraries
- Non-Human, Non-Mouse Mammals
  BASE AND ("Mammalia"[Organism] NOT "Homo sapiens"[Organism]) NOT "Mus musculus"[orgn]
  - Date Accessed: 2020/03/28
  - Results: 66926, 0.15 PB
- Human
  BASE AND "Homo sapiens"[Organism]
  - Date Accessed: 2020/03/05
  - Results: 520257, 4.75 PB
- Mouse
  BASE AND "Mus musculus"[orgn]
  - Results: 539233
  - Not accessed
- Vertebrates, Non-mammal
  BASE NOT "Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]
  - Date Accessed: 2020/03/29
  - Results: 74532, 0.115 PB
- Invertebrates
  BASE NOT "Vertebrata"[Organism]
  - Date Accessed: 2020/03/30
  - Results: 403639, 0.7 PB
- HCT116 RNAseq
  - For testing; ca. 1000 entries from the human HCT116 cell line
- CoV Positive Control (known CoV)
  "platform illumina"[Properties] OR "platform bgiseq"[Properties] AND txid694002[Organism:exp]
  - Date Accessed: 2020/04/27
  - Results: 862 samples
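The recorded counts and sizes allow a quick estimate of the average data volume per entry in each category. The figures are copied from the list above; Mouse is left out because no size was recorded. Treating PB as decimal (10^15 bytes) is an assumption, and `mean_gb_per_entry` is a hypothetical name.

```python
PB = 10**15  # assumption: decimal petabytes
GB = 10**9

# (category, entries, total size in PB) copied from the list above.
RESULTS = [
    ("Non-Human, Non-Mouse Mammals", 66926, 0.15),
    ("Human", 520257, 4.75),
    ("Vertebrates, Non-mammal", 74532, 0.115),
    ("Invertebrates", 403639, 0.7),
]


def mean_gb_per_entry(entries: int, size_pb: float) -> float:
    """Average size per entry in GB, given an entry count and a total in PB."""
    return size_pb * PB / entries / GB


for name, entries, size_pb in RESULTS:
    print(f"{name}: {mean_gb_per_entry(entries, size_pb):.2f} GB per entry")
```

Under these assumptions, Human entries average roughly 9.1 GB each, versus about 1.5 to 2.2 GB for the other categories.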
Sequence Files
- ../bam/: aligned BAM files for breaking into blocks
- ../bam-block: BAM output of the fq-blocks, which requires merging
- ../fq/: sequencing reads of various lengths
- ../fq-block: fq files broken into 'blocks'
- ../out: example output data of re-aligned reads
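Breaking a FASTQ file into blocks only works if no record is split across two blocks. The sketch below shows one way to do that, assuming standard 4-line FASTQ records and a fixed number of reads per block. It is an illustration only: this page does not describe how the pipeline sizes or names its fq-blocks, and `fq_blocks` is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fq_blocks(lines: Iterable[str], reads_per_block: int) -> Iterator[List[str]]:
    """Yield consecutive blocks of whole FASTQ records (4 lines per read)."""
    if reads_per_block < 1:
        raise ValueError("reads_per_block must be >= 1")
    it = iter(lines)
    while True:
        block = list(islice(it, 4 * reads_per_block))
        if not block:
            return
        if len(block) % 4:
            # A partial record means the input was truncated or malformed.
            raise ValueError("truncated FASTQ record")
        yield block
```

Because it consumes an iterator, the function can stream an open file line by line without loading the whole FASTQ into memory; each yielded block could then be written out and aligned independently, producing the per-block BAMs that need merging.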