# **Methods:**

Genome sequencing data for *Alteromonas macleodii*, consisting of Nanopore long reads and Illumina short reads, were provided as a CSV file in the course repository and organized into Nextflow channels for long reads (longread_ch) and short reads (shortread_ch). Initial quality control was performed on the short reads using FastQC v0.12.0, and the long reads were filtered with Filtlong v0.3.0 (--min_length 500 and --keep_percent 90 parameters) to remove short and lower-quality reads.

Flye v2.9.6 was used to assemble the filtered long reads into contigs. The short reads were aligned to the Flye assembly using Bowtie2 v2.5.4, and the alignments were sorted with SAMtools v1.22.1. The assembly was then polished with Pilon v1.24 (--threads ${task.cpus} parameter) to correct sequencing errors, producing the final polished genome assembly.

Prokka v1.14.6 was used to annotate genomic features in the polished assembly, and BUSCO v6.0.0 was used to assess its completeness. QUAST v5.3.0 was used to evaluate assembly quality against a reference genome (-r parameter) retrieved with NCBI Datasets v18.6.0. Two QUAST processes were run: QUAST_unpolished compared the unpolished Flye assembly against the reference, and QUAST compared the Pilon-polished assembly against the same reference. Genomic features were visualized using PyCirclize in a Jupyter notebook environment. All Nextflow processes were executed on a shared compute cluster using Conda environments, with CPU usage observed for multi-threaded tools.


# **Results:**

**Figure 1:** *Alteromonas macleodii* genomic features visualized with PyCirclize in a Jupyter notebook environment (circos_plot.ipynb).

![alt text](genome.png) 

### QUAST results:

| Metric                            | QUAST (polished) | QUAST_unpolished |
|-----------------------------------|------------------|------------------|
| Genome fraction (%)               | 98.429           | 98.429           |
| Duplication ratio                 | 1.001            | 1.001            |
| # of misassemblies                | 19               | 19               |
| # of mismatches per 100 kbp       | 18.18            | 18.33            |
| Total length of the assembly (bp) | 4627920          | 4627920          |
| GC content (%)                    | 44.70            | 44.70            |

Differences: Of the metrics reported here, only the number of mismatches per 100 kbp differed. It was slightly higher for the unpolished assembly (18.33) than for the polished assembly (18.18). All other metrics were identical.
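To put this difference in perspective, the short sketch below computes the absolute and relative change in mismatch rate from the two table values. The variable names are illustrative and are not part of the pipeline.

```python
# Mismatch rates (per 100 kbp) copied from the QUAST table above.
unpolished = 18.33  # Flye assembly before polishing
polished = 18.18    # assembly after Pilon polishing

# Absolute and relative reduction in the mismatch rate.
absolute_drop = unpolished - polished
relative_drop_pct = 100 * absolute_drop / unpolished

print(f"Absolute drop: {absolute_drop:.2f} mismatches per 100 kbp")
print(f"Relative drop: {relative_drop_pct:.2f}%")
```

The relative reduction is under 1%, so polishing had only a small effect on this metric.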


### BUSCO results:

The string indicating the BUSCO results: C:99.6%[S:99.5%,D:0.1%],F:0.0%,M:0.4%,n:1828

BUSCO (Benchmarking Universal Single-Copy Orthologs) is a tool that assesses completeness of the genome assembly by searching for highly conserved sets of single-copy genes that are expected to be present in most genomes of a lineage. The C:99.6% indicates that 99.6% of expected genes were found complete, with S:99.5% present as single copies and D:0.1% duplicated. F:0.0% shows no genes were fragmented, and M:0.4% means 0.4% were missing. n:1828 is the total number of orthologous gene clusters searched.
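As a sanity check on how the fields relate, the following minimal sketch parses the summary string with a regular expression and confirms that C = S + D and that C + F + M sums to 100%. The function name `parse_busco_summary` is hypothetical and is not part of BUSCO. The gene counts it derives are approximate because the percentages are rounded to one decimal place.

```python
import re


def parse_busco_summary(summary: str) -> dict:
    """Parse a BUSCO one-line summary into percentages (float) and n (int)."""
    pattern = (
        r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
        r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
    )
    match = re.search(pattern, summary)
    if match is None:
        raise ValueError(f"Unrecognized BUSCO summary: {summary!r}")
    result = {key: float(value) for key, value in match.groupdict().items()}
    result["n"] = int(result["n"])
    return result


busco = parse_busco_summary("C:99.6%[S:99.5%,D:0.1%],F:0.0%,M:0.4%,n:1828")

# Complete = single-copy + duplicated; complete + fragmented + missing = 100%.
assert abs(busco["C"] - (busco["S"] + busco["D"])) < 0.05
assert abs(busco["C"] + busco["F"] + busco["M"] - 100.0) < 0.05

# Approximate gene counts implied by the rounded percentages.
approx_complete = round(busco["n"] * busco["C"] / 100)
approx_missing = round(busco["n"] * busco["M"] / 100)
print(f"~{approx_complete} complete, ~{approx_missing} missing of {busco['n']}")
```

With these values, roughly 1821 of the 1828 orthologs are complete and about 7 are missing.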


**Figure 2:** BUSCO plot.

![alt text](busco_figure.png)


### Conclusion: 

Overall, I believe the experiment was successful in generating a high-quality assembly relative to the reference genome. The BUSCO values indicate that the polished assembly is highly complete: 99.6% of expected genes were found complete (99.5% as single copies), with very few duplicated (0.1%) or missing (0.4%). Mismatches and misassemblies for the polished assembly were both low, at 18.18 mismatches per 100 kbp and 19 misassemblies. The reference genome had a genome size of 4.7 Mb and a GC content of 44.5%, while the genome assembled in this experiment had a size slightly over 4.6 Mb and a GC content of 44.7%. Both values are close to those of the reference, which supports the conclusion that the assembly is of high quality.
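The comparison with the reference can be quantified with a short calculation. This is a rough sketch: the reference size is taken as the rounded 4.7 Mb quoted above, so the length difference is approximate, and the variable names are illustrative.

```python
# Assembly values from the QUAST table; reference values as quoted in the text.
assembly_length_bp = 4_627_920
assembly_gc_pct = 44.70
reference_length_bp = 4_700_000  # assumption: 4.7 Mb is a rounded figure
reference_gc_pct = 44.5

# Relative size difference and GC difference in percentage points.
length_diff_pct = 100 * (reference_length_bp - assembly_length_bp) / reference_length_bp
gc_diff_points = assembly_gc_pct - reference_gc_pct

print(f"Assembly is {length_diff_pct:.2f}% shorter than the reference")
print(f"GC content differs by {gc_diff_points:.1f} percentage points")
```

Under these assumptions the assembly is about 1.5% shorter than the reference and differs in GC content by 0.2 percentage points.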

