Aidan Coyle, afcoyle@uw.edu

2021/01/20

Roberts lab at SAFS


# 22_hemat1.6_blast

This script produces a BLASTx annotation table by BLASTing *Hematodinium* transcriptome v1.6 against the UniProt/Swiss-Prot database.

BLAST was run on Mox, so all commands below are copied over from the Mox terminal unless otherwise specified.

**Transcriptome**: Link and background info for hemat_transcriptomev1.6 available [here](https://robertslab.github.io/sams-notebook/2021/03/08/Transcriptome-Assembly-Hematodinium-Transcriptomes-v1.6-and-v1.7-with-Trinity-on-Mox.html). 

Direct link to folder with data available [here](https://gannet.fish.washington.edu/Atumefaciens/20210308_hemat_trinity_v1.6_v1.7/hemat_transcriptome_v1.6.fasta_trinity_out_dir/). 

Transcriptome md5sum is f9c8f96a49506e1810ff4004426160d8

In [None]:
# Working from the login node of Mox, in /gscratch/srlab/afcoyle
# Download hemat_transcriptomev1.6 (curl -k skips TLS certificate verification)
[afcoyle@mox2 afcoyle]$ curl -o projects/hemat1.6_blastx/hemat_transcriptomev1.6.fasta \
-k https://gannet.fish.washington.edu/Atumefaciens/20210308_hemat_trinity_v1.6_v1.7/hemat_transcriptome_v1.6.fasta_trinity_out_dir/hemat_transcriptome_v1.6.fasta

In [None]:
# Verify checksum
[afcoyle@mox2 afcoyle]$ md5sum projects/hemat1.6_blastx/hemat_transcriptomev1.6.fasta | grep "f9c8f96a49506e1810ff4004426160d8"

# f9c8f96a49506e1810ff4004426160d8  projects/hemat1.6_blastx/hemat_transcriptomev1.6.fasta
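
The `md5sum | grep` check above prints the matching line on success but is easy to misread when nothing prints. A minimal sketch of a stricter check follows; `verify_md5` is a hypothetical helper name, not something used in the original run, and it assumes `md5sum` and `awk` are on the path.

```shell
# Hypothetical helper (not part of the original workflow):
# compare a file's md5sum against an expected value and
# return a non-zero status on mismatch.
verify_md5() {
  vm_actual=$(md5sum "$1" | awk '{print $1}')
  if [ "$vm_actual" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $vm_actual, expected $2)" >&2
    return 1
  fi
}
```

Usage with the checksum listed above would look like `verify_md5 projects/hemat1.6_blastx/hemat_transcriptomev1.6.fasta f9c8f96a49506e1810ff4004426160d8`. The same helper could be reused for the later transfer checks.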

In [None]:
# Build a protein BLAST database from the UniProt/Swiss-Prot FASTA
/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin/makeblastdb \
-in /gscratch/srlab/blastdbs/uniprot_sprot_20200123/uniprot_sprot.fasta \
-dbtype prot \
-parse_seqids \
-out /gscratch/srlab/afcoyle/projects/hemat1.6_blastx/output/blastdbs/uniprot_blastdb

### Mox Slurm script for BLASTx of *Hematodinium* transcriptome v1.6

In [None]:
#!/bin/bash
## Job Name
#SBATCH --job-name=afcoyle_opilioblast
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=1-12:00:00
## Memory per node
#SBATCH --mem=120G
## Turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=afcoyle@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/afcoyle


/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin/blastx \
-task="blastx" \
-query /gscratch/srlab/afcoyle/projects/hemat1.6_blastx/hemat_transcriptomev1.6.fasta \
-db /gscratch/srlab/afcoyle/projects/hemat1.6_blastx/output/blastdbs/uniprot_blastdb \
-out /gscratch/srlab/afcoyle/projects/hemat1.6_blastx/output/hemat1.6_blastxres.tab \
-evalue 1E-05 \
-num_threads 40 \
-max_target_seqs 1 \
-outfmt 6
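
Because the job writes default tabular output (`-outfmt 6`), each row of `hemat1.6_blastxres.tab` has 12 tab-separated columns and the file has no header. A query can still appear on more than one row, since each HSP gets its own line. The sketch below counts distinct query IDs with at least one hit; `count_annotated_queries` is a hypothetical helper name and was not part of the original run.

```shell
# Default -outfmt 6 columns, in order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

# Hypothetical helper: count distinct query IDs (column 1) in a
# tab-separated BLAST table.
count_annotated_queries() {
  awk -F'\t' '!seen[$1]++ { n++ } END { print n + 0 }' "$1"
}
```

Once the job finishes, this could be run as `count_annotated_queries /gscratch/srlab/afcoyle/projects/hemat1.6_blastx/output/hemat1.6_blastxres.tab`.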

In [None]:
# Submit job to the Slurm scheduler on Mox
[afcoyle@mox2 afcoyle]$ sbatch jobs/20210312_hemat1.6_blastx.sh
Submitted batch job 1699544

In [None]:
# Transfer BLAST output from Mox to Gannet
# These commands were run from a folder on Gannet
# Gannet folder: /volume2/web/nerka/mox_transfers/scrubbed/
rsync -avz --progress \
afcoyle@mox.hyak.uw.edu:/gscratch/srlab/afcoyle/projects/hemat1.6_blastx/output/hemat1.6_blastxres.tab \
hemat_proj/

In [None]:
# Check md5sums to ensure the transfer was correct.
# To obtain, run md5sum hemat1.6_blastxres.tab on both Mox and Gannet. Both should return:

# 13f4f58c4333fbca3b0c030b986838bc  hemat1.6_blastxres.tab

In [None]:
# Transfer BLAST output from Gannet to local machine
# Ran command from local machine
# Using absolute path for the destination, as relative path fails
rsync -chavzP --stats \
afcoyle@gannet.fish.washington.edu:/volume2/web/nerka/mox_transfers/scrubbed/hemat_proj/hemat1.6_blastxres.tab \
/mnt/c/Users/acoyl/Documents/GitHub/hemat_bairdi_transcriptome/output/BLASTs/uniprot_swissprot

In [None]:
# Verify checksums still match by running md5sum ../output/BLASTs/uniprot_swissprot/hemat1.6_blastxres.tab on local machine

# 13f4f58c4333fbca3b0c030b986838bc  output/BLASTs/uniprot_swissprot/hemat1.6_blastxres.tab
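
A matching checksum confirms the transfer, but not that the table itself looks as expected. As a rough sanity check under the settings used above (`-outfmt 6`, `-evalue 1E-05`), every row should have 12 fields and an e-value (column 11) no larger than the threshold. `check_blast_table` is a hypothetical helper name, not part of the original workflow.

```shell
# Hypothetical helper: check that every row of a tab-separated BLAST
# table has 12 fields and an e-value (column 11) at or below a cutoff.
# Usage: check_blast_table <file> <max_evalue>
check_blast_table() {
  awk -F'\t' -v max="$2" '
    NF != 12 {
      printf "line %d: expected 12 fields, found %d\n", NR, NF
      bad = 1
    }
    NF == 12 && $11 + 0 > max + 0 {
      printf "line %d: e-value %s exceeds %s\n", NR, $11, max
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

For the local copy this might be called as `check_blast_table output/BLASTs/uniprot_swissprot/hemat1.6_blastxres.tab 1e-05`; it prints one line per problem row and returns a non-zero status if any are found.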