# 16S rRNA phylogeny

16S rRNA sequences were obtained by first running Prokka 1.14.6 on all genomes:

In [None]:
f="name of genome file".fna
filename=$(basename "$f")
name=${filename%.fna}   # strip the .fna extension to get the genome name
prokka_ed -o "prokka_$name" --prefix "$name" --locustag "$name" --kingdom Bacteria --metagenome --cpus 2 "$f"
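
Because Prokka was run on every genome, the commands above can be wrapped in a loop. The following is a minimal sketch, assuming (hypothetically) that all genome files sit in a `genomes/` directory and end in `.fna`. The function name `annotate_genome` and the `PROKKA` and `DRY_RUN` variables are illustrative and not part of Prokka itself.

```shell
# Sketch: run the Prokka command shown above on every genome.
# Assumptions (not from the original analysis): genomes live in ./genomes/*.fna;
# PROKKA can be overridden if the executable has a different name.
PROKKA=${PROKKA:-prokka_ed}

annotate_genome() {
    f=$1
    name=$(basename "$f" .fna)          # genome name without directory and extension
    # Build the command line in the function's positional parameters
    set -- "$PROKKA" -o "prokka_$name" --prefix "$name" --locustag "$name" \
        --kingdom Bacteria --metagenome --cpus 2 "$f"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"                       # only print the command
    else
        "$@"                            # run the annotation
    fi
}

for f in genomes/*.fna; do
    [ -e "$f" ] || continue             # skip when the glob matches nothing
    annotate_genome "$f"
done
```

Setting `DRY_RUN=1` before the loop prints the commands without running them, which is a cheap way to check the derived names first.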

The 16S sequences of each file were then extracted by combining a grep command with seqtk:

In [None]:
# Collect the IDs of all sequences annotated as 16S rRNA ("input" is the Prokka .ffn file)
grep "16S ribosomal" input | sed "s/ .*//" | sed "s/>//" > rRNA_list_
seqtk subseq "name of file".ffn rRNA_list_ > rRNA_list.frn
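
A quick sanity check on the extraction is to count FASTA headers. The sketch below assumes that Prokka marks incomplete rRNA genes with `(partial)` in the product description of the `.ffn` headers. `count_seqs` and `count_partial_16s` are illustrative helper names.

```shell
# Count all FASTA records in the given file(s)
count_seqs() {
    cat "$@" | grep -c '^>'
}

# Count 16S records whose header carries the "partial" flag
# (assumption: partial genes are labelled "(partial)" in the header)
count_partial_16s() {
    cat "$@" | grep '^>' | grep '16S ribosomal' | grep -c 'partial'
}

# Example usage (file names as in the commands above):
#   count_seqs rRNA_list.frn
#   count_partial_16s "name of file".ffn
```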

A total of 71 rRNA sequences were found this way, of which 11 were partial. For both the untrimmed and the trimmed 16S rRNA sequences, the per-genome files were concatenated and aligned with MAFFT v7.525.
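
The concatenation and alignment step could look roughly like the sketch below. The MAFFT options actually used are not recorded here, so `--auto` (which lets MAFFT choose a strategy) is an assumption, as are the helper name `align_16s` and the intermediate file name `all_16S.frn`.

```shell
# Sketch: concatenate per-genome 16S FASTA files and align them with MAFFT.
# Assumptions: --auto is used because the original options are unknown;
# all_16S.frn is a hypothetical name for the concatenated file.
align_16s() {
    out=$1; shift                       # first argument: alignment file to write
    cat "$@" > all_16S.frn              # remaining arguments: per-genome FASTA files
    mafft --auto all_16S.frn > "$out"
}

# Example usage:
#   align_16s mafft_16S_align_nd rRNA_list*.frn
```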

Next, a tree was built from the alignment of the untrimmed 16S rRNA sequences with iqtree2 version 2.3.6:

In [None]:
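f=mafft_16S_align_nd
# GTR substitution model with four gamma rate categories; output files are prefixed with $f.GTRG4
iqtree2 -s "$f" -m GTR+G4 -pre "$f.GTRG4"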

![alt text](images/16S_rRNA1.png)

The resulting tree is cluttered and hard to read, which is why we chose to collapse the 16S rRNA sequences per species. Sequences were collapsed when they shared at least 99% identity. Of the 71 rRNA sequences, 26 remained:

In [None]:
mkdir -p clustered_output

for file in rRNA_list*.frn; do
    base=$(basename "$file" .frn)
    cd-hit -i "$file" -o "clustered_output/${base}_cdhit" -c 0.99 -n 5
done
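
Next to each representative FASTA file, CD-HIT writes a `.clstr` file listing the members of every cluster. The sketch below tallies cluster sizes from those files, which shows which sequences were merged. `cluster_sizes` is an illustrative helper, and the `.clstr` layout (a `>Cluster N` line followed by one line per member) is assumed from CD-HIT's usual output.

```shell
# Print "file <TAB> cluster name <TAB> number of members" for each cluster
cluster_sizes() {
    awk '
        /^>Cluster/ {
            # a new cluster starts: report the previous one first
            if (name != "") print file "\t" name "\t" n
            file = FILENAME; name = substr($0, 2); n = 0
            next
        }
        NF { n++ }                      # every non-empty line is one member
        END { if (name != "") print file "\t" name "\t" n }
    ' "$@"
}

# Example usage:
#   cluster_sizes clustered_output/*_cdhit.clstr
#   cat clustered_output/*_cdhit | grep -c '^>'    # representative sequences kept
```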


![alt text](images/16S_rRNA_tree_collapsed.png)

In the tree, most sequences have been collapsed, except for those belonging to Sodalis ligni (free-living) and Sodalis pierantonius (endosymbiont). We chose to leave the tree unrooted, as it is only used to see how similar the sequences are to each other. As Sodalis ligni is a free-living species, it makes sense that it still has multiple copies of the same gene. Sodalis pierantonius, however, is an endosymbiont. In the species tree, S. pierantonius is still fairly close to the root, so it should have had time to lose duplicate genes. It could be that S. pierantonius has not yet lost the genes needed for recombination.

# Mummer

To check how much the genomes have diverged from one another, synteny between species was assessed with mummerplots. Additionally, we checked how similar the two additional genomes were to each other. Because the genomes belong to different species, promer version 3.07 was used instead of nucmer: promer compares the six-frame amino acid translations of both sequences, which makes it more sensitive for divergent genomes. Mummerplot 3.5 was then used to visualise the alignments:



In [9]:
promer --maxmatch -p "output_name" "Genome_file_1".fna "Genome_file_2".fna
mummerplot --color "output_name".delta --png -p "new_output_name"
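
Since several genome pairs were compared, the two commands can be wrapped in a small function. This is a sketch only: `plot_pair`, its arguments, and the `pairs.txt` file in the usage comment are hypothetical, and the flags are the ones used in the commands above. For simplicity the same prefix is used for both tools.

```shell
# Sketch: run promer and mummerplot for one genome pair.
# Arguments: reference FASTA, query FASTA, output prefix (all illustrative).
plot_pair() {
    ref=$1; qry=$2; prefix=$3
    # promer writes "$prefix.delta"; mummerplot is only run if promer succeeds
    promer --maxmatch -p "$prefix" "$ref" "$qry" &&
        mummerplot --color --png -p "$prefix" "$prefix.delta"
}

# Example usage with a hypothetical pairs.txt holding one
# "reference query prefix" triple per line:
#   while read -r ref qry prefix; do
#       plot_pair "$ref" "$qry" "$prefix"
#   done < pairs.txt
```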

A total of six different mummerplots were generated:

- Sodalis sp. strain CWE vs Mikella endobia strain MEPMAR
- Dickeya dadantii strain X112 vs Pectobacterium cacticida strain CFBP3628
- Moranella endobia strain PCVAL vs Sodalis ligni strain dw23
- Sodalis pierantonius strain SOPE vs Sodalis praecaptivus strain HS1
- Mikella endobia strain MEPMAR vs Sodalis ligni strain dw23
- Sodalis sp. strain Et.F2 vs Sodalis pierantonius strain SOPE

These combinations were chosen because all genomes except that of Sodalis sp. strain CWE are complete, which resulted in readable mummerplots.

Comparison of the two most recent species according to the tree:
- Sodalis sp. strain CWE vs Mikella endobia strain MEPMAR

Comparison of two free-living species:
- Dickeya dadantii strain X112 vs Pectobacterium cacticida strain CFBP3628

Comparison of endosymbionts with a free-living species:
- Moranella endobia strain PCVAL vs Sodalis ligni strain dw23
- Mikella endobia strain MEPMAR vs Sodalis ligni strain dw23

Comparisons within the same clade:
- Sodalis sp. strain Et.F2 vs Sodalis pierantonius strain SOPE
- Sodalis pierantonius strain SOPE vs Sodalis praecaptivus strain HS1



Sodalis sp. strain CWE vs Mikella endobia strain MEPMAR


![alt text](images/CWEvsMikella.delta.png)
No clear synteny can be seen, even though the two are closely related in the tree and furthest away from the root.



Dickeya dadantii strain X112 vs Pectobacterium cacticida strain CFBP3628

![alt text](images/DickeyavsPecto.delta.png)
Some synteny can be seen in the bottom right and top left. These syntenic regions probably encode important proteins needed for basic function.

Moranella endobia strain PCVAL vs Sodalis ligni strain dw23

![alt text](images/MoranellavsLigni.delta.png)
No synteny can be seen. This is probably because Sodalis ligni is a free-living species and Moranella is an endosymbiont from a different genus.

Mikella endobia strain MEPMAR vs Sodalis ligni strain dw23

![alt text](images/MikellavsLigni.delta.png)
Here, too, no clear synteny can be seen. This is expected, as one is free-living and the other is an endosymbiont. Additionally, they do not appear to be close on the species tree.

Sodalis sp. strain Et.F2 vs Sodalis pierantonius strain SOPE

![alt text](images/EtF2vspierantonius.delta.png)
Some synteny can be seen, which is expected as both belong to the same clade.

Sodalis pierantonius strain SOPE vs Sodalis praecaptivus strain HS1

![alt text](images/PierantoniusvsPraecaptivus.delta.png)
Synteny can be seen, as expected for two members of the same clade, but there is also a horizontal line, possibly caused by repeats or mobile elements.