Unix OS or Unix Emulator
Python
FastQC
Trimmomatic
SPAdes
BLASTn and BLASTx
QUAST
MITOS (or use the webserver)
Sequence aligner (e.g. MAFFT)
Phylogenetic tree tool (e.g. FastTree)
NCBI BioProject: PRJNA852289
You can use any sample from this BioProject, for example Sample 1EA:
https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR19860262&display=download
We will be using the Nema-mtDB available at:
https://github.com/WormsEtAl/Nematode-Mitochondrial-Database
conda create --name ENVIRONMENT_NAME python=3.9 ipykernel -y
conda activate ENVIRONMENT_NAME
A1 1EA
B1 1EB
C1 1EC
D1 1ED
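for file in *.fastq
do
# With gawk, an empty field separator makes every character its own field,
# so $1 is the first character of the file name
wellID=$(echo "$file" | awk -F"" '{print $1}')
# Look up the sample ID (second tab-separated column) for that well in the mapping file
sampleID=$(grep "$wellID" list | cut -f2)
mv "$file" "${sampleID}${file}"
done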
cd "$working_directory"
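The renaming loop above depends on looking up each well ID in the two-column `list` file. A minimal sketch of a stricter lookup follows. It assumes that file names begin with the well ID followed by an underscore and that `list` has the well ID in column 1 and the sample ID in column 2; the helper names `lookup_sample_id` and `well_id_of` are hypothetical. Comparing the first column exactly keeps `A1` from also matching `A10`.

```shell
# Hypothetical helper: print the sample ID for an exact well ID match.
# Assumes a whitespace- or tab-separated mapping file: well ID, then sample ID.
lookup_sample_id() {
    awk -v well="$1" '$1 == well { print $2; exit }' "$2"
}

# Hypothetical helper: well ID = text before the first underscore in the file name.
well_id_of() {
    printf '%s\n' "${1%%_*}"
}

# Usage (dry run; prints the mv commands instead of executing them):
# for file in *.fastq; do
#     sampleID=$(lookup_sample_id "$(well_id_of "$file")" list)
#     [ -n "$sampleID" ] && echo mv "$file" "${sampleID}${file}"
# done
```

Running the loop with `echo` first makes it easy to check the new names before any file is moved.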
mkdir fastqc_raw-reads
fastqc R1.fastq R2.fastq -o fastqc_raw-reads
mkdir trimmed-reads
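# PE mode expects four output files: paired and unpaired reads for each direction
trimmomatic PE -threads 8 SampleX_R1_001.fastq SampleX_R2_001.fastq \
trimmed-reads/forward_reads_paired_output trimmed-reads/forward_reads_unpaired_output \
trimmed-reads/reverse_reads_paired_output trimmed-reads/reverse_reads_unpaired_output \
ILLUMINACLIP:adapters.fa:2:30:10:8:true \
LEADING:3 TRAILING:3 \
SLIDINGWINDOW:4:10 MINLEN:36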
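Trimmomatic's PE mode takes the forward and reverse files as a pair, so trimming several samples in one go requires deriving each R2 file name from its R1 file name. The sketch below assumes the `_R1_001.fastq` / `_R2_001.fastq` naming used above; the helper names `mate_of` and `sample_prefix_of` are hypothetical.

```shell
# Hypothetical helper: derive the reverse-read file name from a forward-read name.
# Assumes the *_R1_001.fastq / *_R2_001.fastq naming convention.
mate_of() {
    printf '%s\n' "${1%_R1_001.fastq}_R2_001.fastq"
}

# Hypothetical helper: print the sample prefix (file name without the read suffix).
sample_prefix_of() {
    printf '%s\n' "${1%_R1_001.fastq}"
}

# Usage (dry run; prints each pair that would be passed to trimmomatic PE):
# for r1 in *_R1_001.fastq; do
#     r2=$(mate_of "$r1")
#     [ -f "$r2" ] && echo "$(sample_prefix_of "$r1"): $r1 $r2"
# done
```

The sample prefix can also be used to build per-sample output names inside `trimmed-reads`.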
metaspades.py -1 *R1_001.fastq -2 *R2_001.fastq -o metaspades_assembly
blastn \
-task megablast \
-query assembled_contigs.fasta \
-db /home/genome/databases/nt \
-outfmt '6 qseqid staxids bitscore std sscinames sskingdoms stitle' \
-culling_limit 5 \
-num_threads 48 \
-evalue 1e-10 \
-out SampleX.vs.nt.cul5.1e10.megablast.out
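In this `-outfmt` string, `std` expands to the 12 default tabular columns, so each output line has 18 tab-separated fields: qseqid (1), staxids (2), bitscore (3), the std columns (4 to 15), sscinames (16), sskingdoms (17), and stitle (18). As a rough sketch of how the table could be summarized, the hypothetical helper `best_hit_taxa` below reports, for each contig, the scientific name and kingdom of its highest-bitscore hit.

```shell
# Hypothetical helper: for each contig, report the hit with the highest bitscore.
# Column layout assumed from the -outfmt string above:
#   1 qseqid, 2 staxids, 3 bitscore, 4-15 std columns,
#   16 sscinames, 17 sskingdoms, 18 stitle
best_hit_taxa() {
    awk -F'\t' '
        !($1 in best) || $3 + 0 > best[$1] {
            if (!($1 in best)) order[++n] = $1   # remember first-seen contig order
            best[$1] = $3 + 0
            name[$1] = $16
            kingdom[$1] = $17
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%s\t%s\t%s\n", order[i], best[order[i]], name[order[i]], kingdom[order[i]]
        }
    ' "$1"
}

# Usage:
# best_hit_taxa SampleX.vs.nt.cul5.1e10.megablast.out
```

The output has four tab-separated columns: contig ID, best bitscore, scientific name, and kingdom.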
barrnap --kingdom euk --threads 12 --outseq euk-rRNA-seqs.fasta contigs.fasta
barrnap --kingdom mito --threads 12 --outseq mito-rRNA-seqs.fasta contigs.fasta
quast.py contigs.fasta -o quast_results
blastn \
-query mito-contigs.fasta \
-subject Nema-mt-DB_v1.0.fasta \
-outfmt "6 qseqid sseqid pident qcovs length qstart qend sstart send gaps bitscore evalue" \
-out blastn_output.tsv
awk -v cutoff1="98" '$3 >= cutoff1 {print}' blastn_output.tsv | awk -v cutoff2="80" '$4 >= cutoff2 {print}'| cut -f2 -d":" | head -n1 | awk -F"|" '{print $(NF-1),$NF}'| cut -f1 >nematode-mito-contigids.list
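The two cutoffs above act on column 3 (pident) and column 4 (qcovs) of the tabular output. A minimal sketch of the same thresholds in a single awk pass follows, assuming the goal is the list of query contig IDs (column 1) with at least one hit that meets both cutoffs. The helper name `contigs_passing` and its optional arguments are hypothetical.

```shell
# Hypothetical helper: list query contig IDs with at least one hit passing both cutoffs.
# Columns assumed from the -outfmt string above: 1 qseqid, 3 pident, 4 qcovs.
# Arguments: BLAST table, minimum percent identity (default 98),
#            minimum query coverage (default 80).
contigs_passing() {
    awk -F'\t' -v min_id="${2:-98}" -v min_cov="${3:-80}" '
        $3 + 0 >= min_id + 0 && $4 + 0 >= min_cov + 0 && !seen[$1]++ { print $1 }
    ' "$1"
}

# Usage:
# contigs_passing blastn_output.tsv 98 80
```

Each contig ID is printed once, in the order it first passes the filter.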
grep --no-group-separator -A 1 -f nematode-mito-contigids.list contigs.fasta >nematode_contigs.fasta
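The `grep -A 1` approach above returns the header plus one following line, so it only recovers complete sequences when each record sits on a single line. If `contigs.fasta` has sequences wrapped across several lines, something like the following awk sketch may be safer. It matches the record ID exactly (the header text up to the first space, without `>`) instead of as a substring; the helper name `extract_fasta_records` is hypothetical.

```shell
# Hypothetical helper: print FASTA records whose ID is listed in an ID file,
# keeping every sequence line of each selected record.
# Arguments: ID list (one ID per line), FASTA file.
extract_fasta_records() {
    awk '
        NR == FNR { if ($1 != "") wanted[$1] = 1; next }   # first file: load the IDs
        /^>/      { id = substr($1, 2); keep = (id in wanted) }   # header: decide per record
        keep                                                # print lines of kept records
    ' "$1" "$2"
}

# Usage:
# extract_fasta_records nematode-mito-contigids.list contigs.fasta >nematode_contigs.fasta
```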
while read -r line
do
grep -w "$line" blastn_output.tsv | head -n1
done<nematode-mito-contigids.list | cut -f2 >top_blastn_hits.tsv
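The loop above rescans `blastn_output.tsv` once per contig ID. As a possible alternative, the sketch below collects the same top hits in one pass, relying on BLAST normally listing each query's hits best-first. It differs from the loop in two ways: it matches the contig ID against column 1 only, and it prints results in table order, not list order. The helper name `top_hit_subjects` is hypothetical.

```shell
# Hypothetical helper: print the subject ID (column 2) of the first listed hit
# for each contig ID named in the ID file.
# Arguments: ID list (one contig ID per line), BLAST tabular output.
top_hit_subjects() {
    awk -F'\t' '
        NR == FNR { if ($1 != "") wanted[$1] = 1; next }   # first file: contig IDs
        ($1 in wanted) && !seen[$1]++ { print $2 }         # second file: first hit per contig
    ' "$1" "$2"
}

# Usage:
# top_hit_subjects nematode-mito-contigids.list blastn_output.tsv >top_blastn_hits.tsv
```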
while read -r line
do
grep -w "$line" blastn_output.tsv
done<species.list | cut -f2
Upload the candidate nematode contig sequences to the MITOS server for annotation at:
https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fmitos2%2Fmitos2%2F2.1.3%20galaxy0
Make sure to select the Invertebrate (5) genetic code for translation.
grep --no-group-separator -A 1 "$taxa_rank_name" Nemat-mtDB_v1.0.fasta | grep --no-group-separator -A 1 "$gene_name" >reference_sequences.fasta
cat annotated_nematode_nt_sequences.fasta reference_sequences.fasta >working_sequences.fasta
mafft --adjustdirectionaccurately working_sequences.fasta >working_sequences.aln.fasta
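Before building the tree, it can be worth confirming that the alignment file is well formed, since every sequence in an alignment should have the same length once gaps are counted. The sketch below is one way to check this; the helper name `check_alignment_lengths` is hypothetical.

```shell
# Hypothetical helper: verify that every sequence in an aligned FASTA has the same length.
# Prints the common alignment length and returns 0 on success; returns 1 otherwise.
check_alignment_lengths() {
    awk '
        /^>/ { if (n) lens[len]++; n++; len = 0; next }   # new record: store previous length
             { gsub(/[ \t\r]/, ""); len += length($0) }    # sequence line: accumulate length
        END  {
            if (n) lens[len]++                             # store the last record
            count = 0
            for (l in lens) { count++; last = l }
            if (count == 1) { print last; exit 0 }
            exit 1
        }
    ' "$1"
}

# Usage:
# check_alignment_lengths working_sequences.aln.fasta || echo "alignment lengths differ"
```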
FastTree -nt -gtr working_sequences.aln.fasta >working_sequences.tree