https://www.drivendata.org/competitions/63/genetic-engineering-attribution/
https://community.drivendata.org/c/genetic-engineering-attribution/36
https://docs.google.com/spreadsheets/d/1U9AG42qBrN4eNr10D4Y3i-MfpjsI2wODUvLjggwaaNc/edit#gid=0
- download BLAST+ for local use: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
or, on Debian/Ubuntu Linux:
sudo apt-get install ncbi-blast+
- download and extract the database: https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
- download taxonomy
(the taxonomy database doesn't seem to work)
download the taxid mapping file:
ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/
sed '1d' nucl_gb.accession2taxid | awk '{print $2" "$3}' > accession_taxonid
(nt is a nucleotide database, so use the nucl_gb mapping file)
update_blastdb taxdb
extract it into the same folder as the db files
add
export BLASTDB=/media/ac/BLAST
to .bashrc, or create a .ncbirc file in the home directory.
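Both ways of pointing BLAST at the db folder can be sketched as below. This is a minimal sketch, not a definitive setup: `blast_dir` reuses the path from the notes, and `demo_home` is a hypothetical scratch directory standing in for the home directory so that nothing real is overwritten. The `[BLAST]` section header is the layout BLAST+ reads from `.ncbirc`.

```shell
# Directory holding the BLAST db files (path taken from the notes).
blast_dir=/media/ac/BLAST

# Hypothetical stand-in for $HOME; set demo_home="$HOME" to apply for real.
demo_home=$(mktemp -d)

# Option 1: environment variable. It has to be exported so that
# blastn and makeblastdb inherit it from the shell.
echo "export BLASTDB=$blast_dir" >> "$demo_home/.bashrc"

# Option 2: a .ncbirc file with a [BLAST] section.
printf '[BLAST]\nBLASTDB=%s\n' "$blast_dir" > "$demo_home/.ncbirc"

cat "$demo_home/.ncbirc"
```

Only one of the two options is needed.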
- Build Database:
makeblastdb -in nt -parse_seqids -dbtype nucl -out nt -taxid_map accession_taxonid
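The -taxid_map file should hold one sequence ID and one taxid per line. Below is a minimal sketch of what the sed/awk step produces and how to sanity-check the result before building. The sample rows are made up for illustration (not real accessions), and the four-column, tab-separated layout with one header line is assumed from the NCBI accession2taxid files.

```shell
workdir=$(mktemp -d)

# Hypothetical sample in the accession2taxid layout:
# accession, accession.version, taxid, gi (tab-separated, one header line).
printf 'accession\taccession.version\ttaxid\tgi\n' > "$workdir/sample.accession2taxid"
printf 'AB000001\tAB000001.1\t1111\t100\n' >> "$workdir/sample.accession2taxid"
printf 'AB000002\tAB000002.1\t2222\t200\n' >> "$workdir/sample.accession2taxid"

# Same pipeline as in the notes: drop the header, keep accession.version and taxid.
sed '1d' "$workdir/sample.accession2taxid" | awk '{print $2" "$3}' > "$workdir/accession_taxonid"

# Sanity check: count lines that do not have exactly two fields
# or whose second field is not a numeric taxid.
bad_lines=$(awk 'NF != 2 || $2 !~ /^[0-9]+$/ {n++} END {print n+0}' "$workdir/accession_taxonid")

cat "$workdir/accession_taxonid"
echo "malformed lines: $bad_lines"
```

A non-zero count on the real map file would be worth investigating before running makeblastdb.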
- Run alignment:
blastn -db nt -query test_seqs_group_0.fasta -out test.txt -num_threads 15 -outfmt "6 qseqid sseqid pident length mismatch gapopen sstart send evalue staxids sscinames sblastnames stitle" -num_alignments 1
https://www.tutorialspoint.com/biopython/biopython_overview_of_blast.htm
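The tabular output requested with -outfmt 6 is tab-separated, with one column per field name in the order given on the blastn command line. Below is a minimal sketch that pulls out the query ID, taxid and scientific name. The two rows are made-up placeholders rather than real BLAST hits, and `hits` and `summary` are hypothetical file names.

```shell
hits=$(mktemp)
summary=$(mktemp)

# Made-up rows in the 13-column layout:
# qseqid sseqid pident length mismatch gapopen sstart send evalue
# staxids sscinames sblastnames stitle
printf 'q1\tAB000001.1\t99.5\t200\t1\t0\t10\t209\t1e-50\t1111\tExample species A\texample group\tExample title A\n' > "$hits"
printf 'q2\tAB000002.1\t97.0\t150\t4\t1\t5\t154\t1e-30\t2222\tExample species B\texample group\tExample title B\n' >> "$hits"

# Column 1 = qseqid, column 10 = staxids, column 11 = sscinames.
# -F'\t' keeps names that contain spaces in one field.
awk -F'\t' '{print $1 "\t" $10 "\t" $11}' "$hits" > "$summary"

cat "$summary"
```

For the real run, replace "$hits" with test.txt.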