Cutadapt
- Using SYM_VAR primers to extract Symbiodinium (symbiont) sequences from the original raw sequences
- https://cutadapt.readthedocs.io/en/stable/guide.html#five-prime-adapters

Parameter Notes:
- The assembled contig file is used as the input file
- SYM_VAR primer sequences are from Hume et al. (2018)
- Paired primers: if paired-end sequences were used as input files, the reverse complements of both primers would be included as well
- Linked primers (a 5' and a 3' primer joined as one adapter) are apparently only for specific scenarios such as barcoding
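
Since the reverse primer contains the ambiguity codes W and Y, a reverse complement has to handle more than A, C, G and T. A minimal sketch of one way to do that in the shell is shown here; `revcomp` is a hypothetical helper name, not part of cutadapt.

```shell
# Reverse-complement a primer, including IUPAC ambiguity codes.
# revcomp is a hypothetical helper name used only for this sketch.
revcomp() {
  printf '%s\n' "$1" |
    # Reverse the string one character at a time.
    awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
    # Complement each base: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H;
    # S, W and N are their own complements.
    tr 'ACGTRYKMBDHVSWN' 'TGCAYRMKVHDBSWN'
}

revcomp "GAATTGCAGAACTCCGTGAACC"      # SYM_VAR forward primer
revcomp "CGGGTTCWCTTGTYTGACTTCATGC"   # SYM_VAR reverse primer
```

For the two primers used in this notebook, this should print `GGTTCACGGAGTTCTGCAATTC` and `GCATGAAGTCARACAAGWGAACCCG`.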

In [None]:
# Create a new environment, activate it, and install cutadapt into it
conda create -n cutadaptenv
conda activate cutadaptenv
conda install -c bioconda cutadapt

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=180G  # Requested Memory
#SBATCH -p cpu-long  # Partition
#SBATCH -t 24:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o /project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/bash_scripts/slurm-%j.out  # %j = job ID

# Set your input and output files
SAMPLENAME="mcav"
INPUTDIR="results/mcav_assembly3"

input_fastq="$SAMPLENAME.contigs.fa"
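# Braces are needed here: bash reads $SAMPLENAME_symb as one (unset) variable name.
# Job 13905035 below ran without them, so its output was written to ".contigs.fasta".
output_fasta="${SAMPLENAME}_symb.contigs.fasta"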

#load modules
module load miniconda/22.11.1-1
conda activate cutadaptenv

# Set your primer sequences
forward_primer="GAATTGCAGAACTCCGTGAACC"
reverse_primer="CGGGTTCWCTTGTYTGACTTCATGC"

# Run cutadapt
cutadapt \
  -g "$forward_primer" \
  -a "$reverse_primer" \
  --discard-untrimmed \
  -o working/trimmed_ITS2/"$output_fasta" \
  $INPUTDIR/"$input_fastq"

#JOB ID: 13905035
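
Note on the output file name: bash treats underscores as part of a variable name, so an unbraced `$SAMPLENAME_symb` is looked up as a variable called `SAMPLENAME_symb`. That is consistent with the `-o working/trimmed_ITS2/.contigs.fasta` path recorded in the log for this job. A small sketch of the difference, using the same sample name as the script:

```shell
SAMPLENAME="mcav"

# Unbraced: the shell looks up a variable named SAMPLENAME_symb, which is unset,
# so the sample prefix disappears from the file name.
unbraced="$SAMPLENAME_symb.contigs.fasta"

# Braced: the variable name ends at the closing brace.
braced="${SAMPLENAME}_symb.contigs.fasta"

echo "unbraced: $unbraced"
echo "braced:   $braced"
```

The first line prints `.contigs.fasta` and the second prints `mcav_symb.contigs.fasta`.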


SLURM OUTPUT 
Loading miniconda version 22.11.1-1
This is cutadapt 2.6 with Python 3.7.16
Command line parameters: -g GAATTGCAGAACTCCGTGAACC -a CGGGTTCWCTTGTYTGACTTCATGC --discard-untrimmed -o working/trimmed_ITS2/.contigs.fasta results/mcav_assembly3/mcav.contigs.fa
Processing reads on 1 core in single-end mode ...
Finished in 26.64 s (32 us/read; 1.89 M reads/minute).

=== Summary ===

Total reads processed:                 839,850
Reads with adapters:                    23,485 (2.8%)
Reads written (passing filters):        23,485 (2.8%)

Total basepairs processed:   435,829,081 bp
Total written (filtered):     12,077,612 bp (2.8%)

=== Adapter 1 ===

Sequence: GAATTGCAGAACTCCGTGAACC; Type: regular 5'; Length: 22; Trimmed: 13528 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-22 bp: 2

Overview of removed sequences
length	count	expect	max.err	error counts
3	9739	13122.7	0	9739
4	2849	3280.7	0	2849
5	694	820.2	0	694
6	146	205.0	0	146
7	62	51.3	0	62
8	9	12.8	0	9
9	4	3.2	0	0 4
10	13	0.8	1	0 13
11	5	0.2	1	0 5
12	3	0.1	1	0 3
13	1	0.0	1	0 1
24	1	0.0	2	1
119	1	0.0	2	1
2819	1	0.0	2	1


=== Adapter 2 ===

Sequence: CGGGTTCWCTTGTYTGACTTCATGC; Type: regular 3'; Length: 25; Trimmed: 9957 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-25 bp: 2

Bases preceding removed adapters:
  A: 23.2%
  C: 23.8%
  G: 25.1%
  T: 27.9%
  none/other: 0.0%

Overview of removed sequences
length	count	expect	max.err	error counts
3	7883	13122.7	0	7883
4	1639	3280.7	0	1639
5	258	820.2	0	258
6	65	205.0	0	65
7	31	51.3	0	31
8	11	12.8	0	11
9	20	3.2	0	4 16
10	24	0.8	1	2 22
11	14	0.2	1	1 13
12	6	0.1	1	0 6
13	4	0.0	1	0 4
14	1	0.0	1	0 1
17	1	0.0	1	1
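
The length tables in the log above suggest that most reported matches are only a few bases long, with counts close to the "expect" column. As a rough check, the sketch below sums the Adapter 1 counts (length and count columns copied by hand from the log) for removed sequences shorter than a cutoff. `adapter1_table` and `short_matches` are hypothetical helper names, and the 10 bp cutoff is an arbitrary assumption.

```shell
# Length and count columns copied from the Adapter 1 table in the log.
adapter1_table() {
  cat <<'EOF'
3 9739
4 2849
5 694
6 146
7 62
8 9
9 4
10 13
11 5
12 3
13 1
24 1
119 1
2819 1
EOF
}

# Sum the counts of removed sequences shorter than the cutoff given as $1,
# and report them against the total number of trimmed reads.
short_matches() {
  awk -v cutoff="$1" '
    $1 < cutoff { short += $2 }
    { total += $2 }
    END { printf "%d of %d\n", short, total }
  '
}

adapter1_table | short_matches 10
```

On these values the sketch reports `13503 of 13528`, so nearly all forward-primer hits removed fewer than 10 bases. Because `--discard-untrimmed` keeps any read with a match, many of the 23,485 contigs written may be chance matches rather than true primer sites. cutadapt's `--overlap` (`-O`) option sets the minimum overlap (default 3), so a higher value may be worth testing.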

In [None]:
# Troubleshooting the script below:

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=180G  # Requested Memory
#SBATCH -p cpu-long  # Partition
#SBATCH -t 24:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o /project/pi_sarah_gignouxwolfsohn_uml_edu/brooke/bash_scripts/slurm-%j.out  # %j = job ID

# Set your input and output files
SAMPLENAME="mcav"
INPUTDIR="results/mcav_assembly3"

# No inner single quotes: inside double quotes they would become part of the file name
input_fastq="${SAMPLENAME}.contigs.fa"
output_fastq="${SAMPLENAME}_symb.contigs.fastq"

#load modules
module load miniconda/22.11.1-1
conda activate cutadaptenv

# Set your primer sequences
forward_primer="GAATTGCAGAACTCCGTGAACC"
reverse_primer="CGGGTTCWCTTGTYTGACTTCATGC"
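# Get reverse complements of the primers
# (tr ACGT TCGA left C and G unchanged; this set also covers IUPAC codes such as W and Y)
forward_rc=$(echo "$forward_primer" | tr ACGTRYKMBDHVSWN TGCAYRMKVHDBSWN | rev)
reverse_rc=$(echo "$reverse_primer" | tr ACGTRYKMBDHVSWN TGCAYRMKVHDBSWN | rev)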
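# Run cutadapt
# Note: -A and -G apply to the second read of a pair, so cutadapt expects two input
# files and a -p output with them; only one input file is given here.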
cutadapt \
  -g "$forward_primer" \
  -a "$reverse_primer" \
  -A "$forward_rc" \
  -G "$reverse_rc" \
  --discard-untrimmed \
  -o working/trimmed_ITS2/"$output_fastq" \
  $INPUTDIR/"$input_fastq"

#JOB ID: 13879028
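
For comparison, a paired-end run would pass two input files and a second output with `-p`. The sketch below only prints such a command so it can be checked before running. The file names are hypothetical placeholders, and the primer orientation is an assumption (R1 starting with the forward primer, R2 starting with the reverse primer); it is not the command used in this notebook.

```shell
# Primers from the script above, plus their IUPAC-aware reverse complements.
forward_primer="GAATTGCAGAACTCCGTGAACC"
reverse_primer="CGGGTTCWCTTGTYTGACTTCATGC"
forward_rc="GGTTCACGGAGTTCTGCAATTC"
reverse_rc="GCATGAAGTCARACAAGWGAACCCG"

# Hypothetical paired-end file names; replace with the real R1/R2 files.
r1_in="sample_R1.fastq"
r2_in="sample_R2.fastq"
r1_out="trimmed_R1.fastq"
r2_out="trimmed_R2.fastq"

# -g/-a act on R1 and -G/-A act on R2 (assumed orientation, see note above).
paired_cmd="cutadapt -g $forward_primer -a $reverse_rc -G $reverse_primer -A $forward_rc --discard-untrimmed -o $r1_out -p $r2_out $r1_in $r2_in"

# Print the command instead of running it.
echo "$paired_cmd"
```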


In [None]:
#Testing linked adapter:
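
cutadapt writes a linked adapter as two sequences joined by three dots (`ADAPTER1...ADAPTER2`). A minimal sketch of how the adapter string could be built for these primers follows. The output file name is a hypothetical placeholder, the command is only printed, and whether each half is anchored or required depends on the option used and the cutadapt version, so this should be checked against the guide before running.

```shell
# Forward primer and the IUPAC-aware reverse complement of the reverse primer.
forward_primer="GAATTGCAGAACTCCGTGAACC"
reverse_rc="GCATGAAGTCARACAAGWGAACCCG"

# Linked adapter: 5' part, three dots, 3' part.
linked_adapter="${forward_primer}...${reverse_rc}"

# Hypothetical output name; input path as in the scripts above.
linked_cmd="cutadapt -a $linked_adapter --discard-untrimmed -o working/trimmed_ITS2/mcav_symb.linked.fasta results/mcav_assembly3/mcav.contigs.fa"

# Print the command instead of running it.
echo "$linked_cmd"
```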