CRISPR Spacer Database

A database of known CRISPR spacers and their host taxonomy, compiled from four sources:
(1) the CRISPRCasdb, built using CRISPRCasFinder on completely assembled genomes from RefSeq
(2) a set of spacers built using CRISPRDetect on all prokaryotic assemblies in NCBI's RefSeq (December 2017)
(3) a set of spacers found using MinCED (based on CRT) in 24,345 high-quality metagenome-assembled genomes (MAGs) from the human microbiome
(4) a set of spacers found using MinCED in the 24,706 species-representative sequences in GTDB

Data Columns

Accession: accession of the contig that the CRISPR array is found on
ArrayID: ID of the array (distinguishes arrays when more than one is found on a genome)
SpacerID: position of the spacer within the CRISPR array
SpacerSeq: sequence of the spacer (used to match against viral sequences)
Accession3: genome accession
TaxonomyGTDB: host lineage from GTDB
Source: data source
NCBItaxid: NCBI taxonomy ID (many rows are redundant apart from differing taxids)
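
Because many rows differ only in NCBItaxid, it can help to collapse them before further analysis. The following is a minimal sketch, assuming the table is a tab-separated file with a header row using the column names above. The file layout, the `load_unique_spacers` helper, and the example rows are all hypothetical and only illustrate the idea.

```python
import csv
import io

# Columns that identify a spacer record. NCBItaxid is deliberately left out,
# so rows that differ only in taxid collapse into a single record.
KEY_COLUMNS = ("Accession", "ArrayID", "SpacerID", "SpacerSeq",
               "Accession3", "TaxonomyGTDB", "Source")


def load_unique_spacers(handle, delimiter="\t"):
    """Read the table and merge rows that differ only in NCBItaxid.

    Assumes a delimited text file with a header row (an assumption,
    the README does not state the file format).
    """
    merged = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        key = tuple(row[col] for col in KEY_COLUMNS)
        if key not in merged:
            record = {col: row[col] for col in KEY_COLUMNS}
            record["NCBItaxids"] = set()
            merged[key] = record
        merged[key]["NCBItaxids"].add(row["NCBItaxid"])
    return list(merged.values())


# Made-up rows for illustration only, not real database content.
example = (
    "Accession\tArrayID\tSpacerID\tSpacerSeq\tAccession3\tTaxonomyGTDB\tSource\tNCBItaxid\n"
    "contig_1\t1\t1\tACGTACGTAC\tgenome_1\td__Bacteria\texample\t111\n"
    "contig_1\t1\t1\tACGTACGTAC\tgenome_1\td__Bacteria\texample\t222\n"
    "contig_1\t1\t2\tTTGACCGATA\tgenome_1\td__Bacteria\texample\t111\n"
)

records = load_unique_spacers(io.StringIO(example))
for record in records:
    print(record["SpacerSeq"], sorted(record["NCBItaxids"]))
```

To read a real file, pass an open file handle instead of the `io.StringIO` object, and adjust `delimiter` if the table is not tab-separated.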

The most important columns are SpacerSeq, TaxonomyGTDB, and Source; the accession columns (Accession and Accession3) are probably also useful.
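
As a rough illustration of how SpacerSeq, TaxonomyGTDB, and Source might be used together, the sketch below looks for exact occurrences of each spacer (on either strand) in a viral sequence and reports the host lineages of the hits. This is a simplification chosen for brevity: it uses exact substring matching rather than an alignment tool, and the `hosts_for_virus` helper, the sequences, and the lineage strings are all hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def hosts_for_virus(virus_seq, spacers):
    """Return (TaxonomyGTDB, Source) for spacers found exactly in virus_seq.

    `spacers` is an iterable of (SpacerSeq, TaxonomyGTDB, Source) tuples.
    Both strands are checked; mismatches are not tolerated.
    """
    virus_seq = virus_seq.upper()
    hits = []
    for spacer_seq, lineage, source in spacers:
        query = spacer_seq.upper()
        if query in virus_seq or reverse_complement(query) in virus_seq:
            hits.append((lineage, source))
    return hits


# Made-up spacers and viral sequence for illustration only.
spacers = [
    ("ACGTACGTAC", "d__Bacteria;p__Example_A", "example"),  # forward-strand hit
    ("TTGACCGATA", "d__Bacteria;p__Example_B", "example"),  # reverse-strand hit
    ("CCCCCAAAAA", "d__Bacteria;p__Example_C", "example"),  # no hit
]
virus = "GGGACGTACGTACGGGTATCGGTCAAGG"

hits = hosts_for_virus(virus, spacers)
for lineage, source in hits:
    print(lineage, source)
```

Exact matching keeps the example short; a real analysis would more likely allow a small number of mismatches.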