# Synteny within leptophytes

Here we examine how rearranged, if at all, the four leptophyte plastid genomes are.

One genome, REFM_CHLORO_00001, is composed of a single contig. The other three plastid genomes are composed of multiple contigs, which may not be in the correct order or orientation, so we first fix that manually.

In [None]:
# Print the Python version to check that it is 3.10.5
import json
import os
import pandas as pd
import sys
import numpy as np
import __init__


print(sys.version)
%load_ext autoreload
%autoreload 2

In [6]:
# The important data paths are stored in PATH_FILE
PATH_FILE = "../../PATHS.json"

# Use a context manager so the file handle is closed after reading
with open(PATH_FILE, "r") as fh:
    paths_dict = json.load(fh)


## 1. TARA_CHLORO_00478  
This ptMAG (original name: CHL_MED_Bin_250_53_c) is the second most complete genome after REFM_CHLORO_00001. It is composed of six contigs. 

By comparing plastid maps of TARA_CHLORO_00478 with that of REFM_CHLORO_00001 manually, I determined that the contigs should be joined together in this order:
- C_0 (should be reverse complemented)
- C_3 (shares the genes psbE and trnG-TCC with C_0; should be reverse complemented)  
- C_1 (a small gap between C_3 and C_1; should be reverse complemented)  
- C_4 (no gap between C_1 and C_4)  
- C_5 (shares the gene psaA with C_4; should be reverse complemented)  
- C_2 (a gap between C_5 and C_2)  

I manually placed the contigs in the correct order. 
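
The reordering itself was done by hand, but the same layout can be written down as a small script. The following is only a minimal sketch: the helper names `reverse_complement` and `order_contigs`, the `LAYOUT_00478` constant, and the toy sequences are all hypothetical and are not part of the original workflow. It also does nothing about the gaps or shared genes between neighbouring contigs noted in the list above.

```python
# Hypothetical helpers; the notebook performed this step manually.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def order_contigs(contigs, layout):
    """Return (name, sequence) pairs in layout order, flipping where requested.

    contigs: dict mapping contig name -> sequence string
    layout:  list of (contig name, reverse_complement?) tuples
    """
    ordered = []
    for name, flip in layout:
        seq = contigs[name]
        ordered.append((name, reverse_complement(seq) if flip else seq))
    return ordered


# Layout for TARA_CHLORO_00478 as listed above: (contig, reverse complement?)
LAYOUT_00478 = [
    ("C_0", True),
    ("C_3", True),
    ("C_1", True),
    ("C_4", False),
    ("C_5", True),
    ("C_2", False),
]

# Toy sequences standing in for the real contigs (illustration only).
toy = {"C_0": "AAC", "C_1": "GGT", "C_2": "TTA",
       "C_3": "CAT", "C_4": "ACG", "C_5": "NNA"}

for name, seq in order_contigs(toy, LAYOUT_00478):
    print(name, seq)
```

Keeping the layout as an explicit list of `(contig, flip)` pairs makes the manual decision reproducible; the layouts for the other two ptMAGs could be recorded in the same way.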


## 2. TARA_CHLORO_00158
This ptMAG (original name: CHL_AOS_Bin_125_5_c) is the third most complete genome, after REFM_CHLORO_00001 and TARA_CHLORO_00478. It is composed of three contigs.

By comparing plastid maps of TARA_CHLORO_00158 with that of REFM_CHLORO_00001 manually, I determined that the contigs should be joined together in this order:  
- C_1 (should be reverse complemented)  
- C_0 (big gap between C_0 and C_1)    
- C_2 (overlaps with C_0 for psbC and psbD)    

As before, I manually placed the contigs in the correct order.

## 3. TARA_CHLORO_00332  
This ptMAG (original name: CHL_ARC_Bin_77_4_c) is the least complete genome. It is composed of 15 contigs.

By comparing plastid maps of TARA_CHLORO_00332 with that of REFM_CHLORO_00001 manually, I determined that the contigs should be joined together in this order:  

- C_5   
- C_12 (reverse complement)    
- C_14 (reverse complement)   
- C_2 (reverse complement)  
- C_6  
- C_13 (reverse complement)  
- C_9 (reverse complement)  
- C_3 (reverse complement)  
- C_0  
- C_11  
- C_10  
- C_1 (reverse complement)  
- C_7   
- C_4  
- C_8     

As before, I manually placed the contigs in the correct order.

## 4. Synteny analyses

I used the GenBank files to visualise synteny with the tool [PyGenomeViz](https://pygenomeviz.streamlit.app/). Sequence similarity was calculated using MMseqs in a reciprocal best-hit search between CDSs.

Overall, the four genomes are very similar in gene order and gene content, with only minor differences.
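
To make the idea of a reciprocal best hit concrete, here is a toy sketch of the concept. It is not how MMseqs or PyGenomeViz implement the search; the function names and the hit tuples (with made-up scores) are assumptions for illustration only.

```python
def best_hits(hits):
    """Map each query to its highest-scoring target.

    hits: iterable of (query, target, score) tuples.
    """
    best = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up hits between the CDSs of two hypothetical genomes A and B.
a_vs_b = [
    ("A_psaA", "B_psaA", 900.0),
    ("A_psaA", "B_psaB", 300.0),
    ("A_psaB", "B_psaB", 880.0),
    ("A_rbcL", "B_psaB", 50.0),
]
b_vs_a = [
    ("B_psaA", "A_psaA", 900.0),
    ("B_psaB", "A_psaB", 880.0),
    ("B_psaB", "A_psaA", 300.0),
]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, `A_rbcL` is dropped because its only hit, `B_psaB`, has a better partner in the other direction. Requiring agreement in both directions is what keeps spurious one-way matches out of the synteny links.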


### 4.1. Gene order

Here, I focus on protein-coding genes rather than tRNA genes. Two main differences were observed.

1. **psaD inversion**  
The psaD gene was inverted in the lineage comprising TARA_CHLORO_00478, TARA_CHLORO_00158, and TARA_CHLORO_00332.

2. **psaB - groEL region inversion**  
A nearly 10 kbp region spanning from the psaB gene to the groEL gene is inverted in the lineage comprising TARA_CHLORO_00478 and TARA_CHLORO_00158.
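
The inversions above were identified by eye from the synteny plots. As a rough cross-check, candidate inversions could also be flagged from gene strand alone. This is a sketch under stated assumptions: the function name `flipped_blocks` is hypothetical, the gene names `g1` to `g6` are placeholders rather than the real gene order, and both genomes are assumed to be written in the same overall orientation.

```python
def flipped_blocks(reference, query):
    """Group consecutive query genes whose strand differs from the reference.

    reference, query: lists of (gene_name, strand) in genome order,
    where strand is +1 or -1. Genes absent from the reference are ignored.
    """
    ref_strand = dict(reference)
    blocks, current = [], []
    for gene, strand in query:
        if gene not in ref_strand:
            continue  # gene missing from the reference: ignore it
        if strand != ref_strand[gene]:
            current.append(gene)  # extend the current run of flipped genes
        elif current:
            blocks.append(current)  # an unflipped gene closes the run
            current = []
    if current:
        blocks.append(current)
    return blocks


# Placeholder genomes: g2-g4 form an inverted block, g6 is flipped on its own.
reference = [("g1", 1), ("g2", 1), ("g3", -1), ("g4", 1), ("g5", 1), ("g6", -1)]
query = [("g1", 1), ("g4", -1), ("g3", 1), ("g2", -1), ("g5", 1), ("g6", 1)]

for block in flipped_blocks(reference, query):
    print(block)
```

A multi-gene run would correspond to a region inversion such as the psaB - groEL one, and a single-gene run to a case like psaD. Any flagged block would still need to be confirmed against the synteny plot, since strand differences alone do not show that the gene order within the block is reversed.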
