# Geneious Protocol

## Overview

This protocol provides a standardized method for building consensus sequences from raw sequence files using Geneious software. It includes steps for uploading raw files, assembling contigs, and conducting quality control during sequence analysis. The method is adapted from the [SI Barcode Network Informatics Documentation](https://sibarcodenetwork.readthedocs.io/en/latest/sequence_qc.html) and Geneious tutorials.

## Background

Standardizing sequence analysis methods ensures repeatability in research and transparency in methodology, which are essential for future studies and peer evaluations.

---

## Methodology

### Step 1: Folder Organization

Organize your workspace in Geneious. Suggested folder structure for **each gene marker**:

-   **01_Raw Sequences**

-   **02_Trimmed Sequences**

-   **03_Assembled Sequences**

-   **04_Final Sequences for Alignment**


---

### Step 2: Add Primers

Add primer sequences to Geneious so they can be selected during trimming in Step 4.

1. Create a new folder called "Primers".
2. Go to **Sequence - New Sequence**.
3. Copy the primer sequence into the respective field.
4. In the "Name" field, enter the primer name (e.g., CYTB151F).
5. In the "Description" field, enter "primer" plus the direction (e.g., primer forward).
6. Change "Type" to "Primer".
7. Press OK.
8. Repeat Steps 2-7 for each primer.


---

### Step 3: Upload and Prepare Raw Files

1. Upload raw sequence files into the **01_Raw Sequences** folder.  
2. Rename files to include sample ID, extraction ID, location, morph, and primer.  
   Example: `BFHL-4505_BC111_WA_RB_CYTB151F`.  
3. Set read directions for all sequences.  
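
The naming convention above can be checked before upload with a short script. This is a minimal sketch, not part of the Geneious workflow: it assumes the five fields are separated by underscores and contain no underscores themselves, and `parse_read_name` is a hypothetical helper name.

```python
from typing import NamedTuple


class ReadName(NamedTuple):
    sample_id: str
    extraction_id: str
    location: str
    morph: str
    primer: str


def parse_read_name(name: str) -> ReadName:
    """Split a read name into its five underscore-separated fields.

    Assumption: no field contains an underscore. A trailing ".ab1"
    extension is dropped if present.
    """
    if name.lower().endswith(".ab1"):
        name = name[:-4]
    parts = name.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, found {len(parts)}: {name!r}")
    return ReadName(*parts)


print(parse_read_name("BFHL-4505_BC111_WA_RB_CYTB151F"))
```

Names that raise `ValueError` are the ones to fix before setting read directions.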

---

### Step 4: Trim Poor-Quality Bases

1. Copy sequences to the **02_Trimmed Sequences** folder.  
2. Use the **Annotate & Predict - Trim Ends...** function:  
   - Check **Annotate new trimmed regions**
   - Check **Trim primers** and select your respective forward and reverse primers.
   - Set **Allow Mismatches** to 2.
   - Set **Minimum Match Length** to 10.
   - Set **Error Probability Limit** to 0.05.
   - Select **Trim 5’ End** and **Trim 3’ End**. 
   - Leave everything else unselected.
3. Review the trimming annotations and adjust them manually if necessary. If the automatic trim cuts into good-quality sequence or misses a poor-quality region, create a custom trim region instead.  
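
To build intuition for the **Error Probability Limit** setting, the sketch below implements a generic modified Mott trim on Phred quality scores. It assumes, based on the Geneious manual's description of this option, that Geneious uses an algorithm of this family; it is an illustration only and not Geneious's own code. The function names are hypothetical.

```python
def error_probability(phred: int) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-phred / 10)


def mott_trim(quals: list[int], limit: float = 0.05) -> tuple[int, int]:
    """Return the half-open (start, end) range of bases to keep.

    Each base scores (limit - error probability). The kept region is the
    contiguous run with the highest total score; (0, 0) means that no
    base passes.
    """
    best_sum, best_range = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += limit - error_probability(q)
        if run_sum <= 0:
            # The running segment is no longer worth keeping; restart after it.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_range = run_sum, (run_start, i + 1)
    return best_range


# Two poor bases at each end (Q5) around four good bases (Q30).
print(mott_trim([5, 5, 30, 30, 30, 30, 5, 5]))  # (2, 6)
```

A lower limit trims more aggressively, because each base has to be more reliable to contribute a positive score.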


---

### Step 5: Sequence Renaming

1. Copy sequences into a new folder with the same name followed by **“_renamed”**.
2. Use **Edit - Batch rename...** to rename sequences for contig assembly. This will make the forward and reverse sequence names identical so that Geneious knows which ones to combine. 
    - First, remove the primer name and file extension from the end of the sequence name.
        - Check **Remove** and enter the number of characters to remove from the end of the sequence name so that the primer name and file extension are removed.
        - Example: `BFHL-4505_BC111_WA_RB_CYTB151F.ab1` becomes `BFHL-4505_BC111_WA_RB`
    - Second, remove the extraction name from the middle of the sequence name.
        - Under "Advanced", check **Regex** and enter `(\w{5})_` in the **Replace parts matching:** field.
        - Enter nothing in the **With:** field.
        - Example: `BFHL-4505_BC111_WA_RB` becomes `BFHL-4505_WA_RB`
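
The two renaming passes can be rehearsed outside Geneious before running the batch rename. The sketch below mirrors them with Python's `re` module, which may differ in detail from the regex engine Geneious uses. `assembly_name` is a hypothetical helper, and the reverse primer name `CYTB270R` is made up for illustration.

```python
import re


def assembly_name(file_name: str, primer: str) -> str:
    """Mimic the two batch-rename passes on a single read name."""
    # Pass 1: remove a fixed number of characters from the end
    # (underscore + primer name + ".ab1" extension).
    n_remove = len(f"_{primer}.ab1")
    trimmed = file_name[:-n_remove]
    # Pass 2: delete every run of five word characters followed by "_".
    return re.sub(r"(\w{5})_", "", trimmed)


forward = assembly_name("BFHL-4505_BC111_WA_RB_CYTB151F.ab1", "CYTB151F")
reverse = assembly_name("BFHL-4505_BC111_WA_RB_CYTB270R.ab1", "CYTB270R")
print(forward, reverse)  # both reads now share one name
```

Note that `(\w{5})_` matches any five word characters followed by an underscore, not only extraction IDs. A sample ID such as `AB123` would also be removed, so test the pattern on a few representative names first.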


---

### Step 6: Assemble Contigs

1. Assemble sequences into contigs based on read names (i.e., based on the first space). Use the following parameters:
    - **Assembler:** Geneious
    - **Sensitivity:** Highest Sensitivity/Slow
    - Check **Use existing trim regions**.
    - **Assembly Name:** {Reads Name}
    - Check **Save assembly report**.
    - If desired, check **Save in sub-folder**.
    - Check **Save contigs**.
    - Set **Don't merge variants with coverage over approximately** to 6
    - Check **Merge homopolymer variants**
2. Move the assembly report and assemblies to the **03_Assembled Sequences** folder.  


---

### Step 7: Quality Control
1. Manually review contigs in the assembly report. Successful assemblies are those with at least 75% HQ bases.
    - Move assembled contigs with $\geq$ 75% HQ bases to the **Contig_assembled_highquality** subfolder.
    - Move assembled contigs with $<$ 75% HQ bases to the **Contig_assembled_lowquality** subfolder.
    - Copy reads that failed to assemble to the **Contig_not_assembled** subfolder.
    - Move failed reads with $\geq$ 75% HQ bases to the **Contig_not_assembled_goodread** subfolder.
    
2. Address base call disagreements and gaps. For each sequence, use CTRL+D to step through each base call disagreement and gap region. Make sure that you agree with the Geneious calls. If not, edit accordingly.  
   - Check whether the gap is due to a dye blob or another error (e.g., a repeated base). This is likely a chromatogram reading error in which a longer signal than usual was picked up due to dye.
   - Check whether the gap is due to an F and R alignment error. Shifts in the reading frame usually mean that a nucleotide or gap was added to one of the traces incorrectly. This type of error can usually be corrected by inspecting the properly aligned section of the sequence to locate the mistake. 
   - If neither of the above applies, fill the gap with the best-guess base based on the chromatogram. If unsure, fill with a heterozygous (ambiguity) base.

3. Align consensus sequences from contigs (ignore single reads for now). Inspecting the alignment shows whether further end trimming is needed, for example where the Geneious assembler did not remove primers or poor-quality end regions.
    - Go to **Align/Assemble - Multiple Align...**.
    - Select **Consensus Align** in the top portion of the window.
    - Set **Threshold** to 65% in the **Consensus Options** menu.
    - In the dropdown **Sequence alignment options** menu, select **MUSCLE alignment**.
    - Click “OK”.
    - Rename alignment as "Contig alignment".
    - First, check for problematic sequences that are introducing gaps due to incorrect editing during substep 2 above. If any are present, right click on the sequence, select "Go to referenced sequence", and re-edit accordingly.
    - Check alignment ends for untrimmed primer regions or poor quality regions that need to be removed. Delete these areas and hit Save.
    - Make sure there are no stop codons in the sequences.
        - In the **Display** tab on the right, check **Translation** on "All Sequences".
        - Set **Genetic Code:** to "Invertebrate Mitochondrial"
        - Set **Relative to:** to "Alignment"
        - Try each reading frame and make sure there is one where there are no stop codons in any of your sequences. If not, inspect and edit your sequences like above until this is fixed.
        
4. Exit the alignment and generate consensus sequences from the final contig set. 
    - Select all sequences within the alignment.
    - Right click and select **Extract Regions...**.
    - Check **Extract region as list of sequences** and press OK.
    - Move the resulting list to the **Consensus_preliminary** subfolder. Right click on the list and select **Extract Sequences from List...** to extract sequences into the folder. Delete the "Consensus" file and list file since you do not need these.

5. Copy high-quality and chosen intermediate-quality single reads to the same **Consensus_preliminary** folder. For each single read, go to **Trim Ends...** and check **Remove existing trimmed regions from sequences**. Hit OK.
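
The sorting rule in substep 1 can be written out explicitly, which helps when working through a long assembly report. This is a minimal sketch: `triage_subfolder` is a hypothetical helper, the HQ percentage is taken as read from the Geneious report rather than recomputed, and exactly 75% is treated as high quality.

```python
def triage_subfolder(hq_percent: float, assembled: bool, cutoff: float = 75.0) -> str:
    """Return the Step 7 subfolder for a contig or an unassembled read."""
    if assembled:
        if hq_percent >= cutoff:
            return "Contig_assembled_highquality"
        return "Contig_assembled_lowquality"
    # Reads that failed to assemble: good reads get their own subfolder.
    if hq_percent >= cutoff:
        return "Contig_not_assembled_goodread"
    return "Contig_not_assembled"


print(triage_subfolder(82.4, assembled=True))
print(triage_subfolder(60.0, assembled=False))
```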


---

### Step 8: Final Quality Check

1.  Align final consensus sequences and good/intermediate single reads, and make sure everything looks OK.
    - Go to **Align/Assemble - Multiple Align...**.
    - Select **MUSCLE Alignment** in the top portion of the window.
    - Click “OK”.
    - Rename alignment file as "Consensus alignment".
    - Check alignment for gaps, untrimmed ends, and reading frame stop codons. You might have to **reverse complement** single reads and/or trim their ends more to match up with the contig consensus sequences.
    
2. BLAST sequences to ensure they belong to the target group. If doing this in Geneious, BLAST small batches of 15 or fewer sequences at a time, since larger batches can become time-consuming.
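
The stop-codon check used in Steps 7 and 8 can also be run on an exported alignment. The sketch below assumes NCBI translation table 5 (invertebrate mitochondrial), in which only `TAA` and `TAG` are stop codons. Frames are counted along alignment columns, so a codon containing a gap is never reported as a stop. The function names are hypothetical.

```python
# Stop codons under the invertebrate mitochondrial code (NCBI table 5).
# TGA codes for Trp and AGA/AGG for Ser in this table, so they are not stops.
STOP_CODONS = {"TAA", "TAG"}


def stop_positions(seq: str, frame: int) -> list[int]:
    """Return 0-based start positions of stop codons in the given frame."""
    seq = seq.upper()
    return [
        i for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def open_frames(aligned: dict[str, str]) -> list[int]:
    """Return the frames (0, 1, 2) with no stop codon in any sequence."""
    return [
        frame for frame in range(3)
        if not any(stop_positions(seq, frame) for seq in aligned.values())
    ]


example = {"seq_a": "CATGGCATAACC", "seq_b": "CCTAGGCATCCC"}
print(open_frames(example))  # [0]
```

An empty result means no single frame is free of stop codons across the alignment, which points back to the editing and trimming checks above.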


---

### Step 9: Export Final Consensus Sequences

1. Export consensus sequences into a single FASTA file.
2. Proceed to further alignment and phylogenetic analyses using tools like MEGA.  
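
Before moving on to downstream tools, it can be worth confirming that the exported FASTA file contains the expected records. The reader below is a minimal sketch with a hypothetical function name; it takes the whole header line after `>` as the sequence name and rejects duplicate names.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {name: sequence} dictionary."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate sequence name: {name!r}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            # Sequence lines may be wrapped; join them back together.
            records[name] += line
    return records


exported = ">BFHL-4505_WA_RB\nACGTAC\nGGTT\n>BFHL-4506_WA_RB\nACGTACGG\n"
records = read_fasta(exported)
print(len(records), [len(seq) for seq in records.values()])
```

Comparing the record count and names against the contents of **04_Final Sequences for Alignment** catches missing or duplicated exports early.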


---

This structured approach ensures a robust and repeatable pipeline for sequence analysis using Geneious software.
