# rpl36 phylogeny

In this notebook, we look at the rpl36 phylogeny and alignment. The gene is interesting because [Rice and Palmer, 2006](https://doi.org/10.1186/1741-7007-4-31) found that haptophytes and cryptophytes share a laterally transferred rpl36 gene, called rpl36-c (c for cryptophyte), which is distinct from the rpl36-p type gene (p for plastid) found in other plastid types. The rpl36-c gene was horizontally transferred from a bacterial donor that is difficult to identify, and it has completely replaced the native rpl36 gene.

If the new group (leptophytes) also has the rpl36-c gene, it provides further evidence that the new lineage is linked to cryptophytes and haptophytes.

In [1]:
# Check if python is 3.10.5
import json
import os
import pandas as pd
import sys
import numpy as np
import __init__


print(sys.version)
%load_ext autoreload
%autoreload 2

3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]


In [2]:
# we store the important data paths in PATH_FILE
PATH_FILE = "../../PATHS.json"

# use a context manager so the file handle is closed after reading
with open(PATH_FILE, "r") as path_file:
    paths_dict = json.load(path_file)
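
For orientation, here is a minimal sketch of the layout that `PATHS.json` would need for the lookups in this notebook to work. The nested key names are taken from the cells that follow; every directory value is a placeholder, and `example_paths` and `get_path` are hypothetical names used only for illustration, not part of the project.

```python
# Hypothetical layout, inferred only from the keys this notebook reads.
# All directory values are placeholders, not the project's real paths.
example_paths = {
    "ANALYSIS_DATA": {
        "RPL36": {
            "DATASET": {"ROOT": "path/to/rpl36/dataset"},
            "ALIGNMENTS": {
                "ALIGNED": "path/to/rpl36/alignments/aligned",
                "TRIMMED": "path/to/rpl36/alignments/trimmed",
            },
            "TREES": "path/to/rpl36/trees",
        }
    }
}


def get_path(paths, *keys):
    """Walk nested keys and raise a readable error naming the missing key."""
    node = paths
    for i, key in enumerate(keys):
        if not isinstance(node, dict) or key not in node:
            raise KeyError("missing key: " + " -> ".join(keys[: i + 1]))
        node = node[key]
    return node


print(get_path(example_paths, "ANALYSIS_DATA", "RPL36", "ALIGNMENTS", "TRIMMED"))
```

A helper like this fails early with the full key path when an entry is missing, which is easier to debug than a bare `KeyError` in a later cell.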

## Version 3

Extract rpl36 gene from the references and MAGs in the subset phylogeny in Figure 2 (107 taxa, 93 genes).

### 3.1 Extract rpl36 gene
We prepare the file `rpl36_replaced.fasta`.

### 3.2 Align

We align with mafft-linsi.

In [3]:
## Define data folder
ALL_FASTA = paths_dict["ANALYSIS_DATA"]["RPL36"]["DATASET"]["ROOT"]

## Define output folder
MAFFT = paths_dict["ANALYSIS_DATA"]["RPL36"]["ALIGNMENTS"]["ALIGNED"]

In [None]:
%%bash -s "$ALL_FASTA" "$MAFFT"

sbatch /crex/proj/naiss2023-6-81/Mahwash/beta-Cyclocitral/uppmax_scripts/script_bin/job_mafft-linsi.sh "$1"/v3/rpl36_replaced.fasta "$2"/v3/rpl36.mafft.fasta
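
The contents of `job_mafft-linsi.sh` are not shown in this notebook. As a rough sketch, assuming the job script wraps a plain L-INS-i run with default settings, the same alignment could be produced locally as below. `mafft-linsi` is MAFFT's shorthand for `--localpair --maxiterate 1000`; the function names `linsi_command` and `run_linsi` are hypothetical.

```python
import shutil
import subprocess


def linsi_command(in_fasta):
    """Return the argument list for MAFFT's L-INS-i mode."""
    # `mafft-linsi` is a shorthand for these two options.
    return ["mafft", "--localpair", "--maxiterate", "1000", in_fasta]


def run_linsi(in_fasta, out_fasta):
    """Run L-INS-i locally and write the alignment to out_fasta."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft not found on PATH")
    with open(out_fasta, "w") as out:
        # MAFFT writes the alignment to stdout, so redirect it to the file.
        subprocess.run(linsi_command(in_fasta), stdout=out, check=True)


# Show the command without running it (the file name is the one used above).
print(" ".join(linsi_command("rpl36_replaced.fasta")))
```

Any extra options set inside the cluster job script (threads, memory, module loading) are not reproduced here.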

We manually trim the ends of the alignment. The final alignment had 103 taxa (after removing empty sequences) and 48 amino acid positions.
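
The trimming itself was done by hand, but dropping empty sequences and checking the final dimensions can be scripted. The following is a minimal sketch, assuming that "empty" means a sequence made up only of gap characters (`-`); the function names and the three-sequence demo alignment are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {header: sequence}, keeping input order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def drop_empty(records, gap_chars="-"):
    """Keep only sequences that contain at least one non-gap character."""
    return {name: seq for name, seq in records.items() if seq.strip(gap_chars)}


def alignment_shape(records):
    """Return (number of taxa, number of columns); all rows must be equal length."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError("sequences have unequal lengths: %s" % sorted(lengths))
    return len(records), (lengths.pop() if lengths else 0)


# Made-up demo alignment: taxonB is all gaps and should be dropped.
demo = ">taxonA\nMKVR\n>taxonB\n----\n>taxonC\nMK-R\n"
kept = drop_empty(read_fasta(demo))
print(alignment_shape(kept))
```

Run on the trimmed rpl36 alignment, the returned tuple should match the 103 taxa and 48 columns reported above. Note that this parser keeps only the last record when two headers are identical.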

In [3]:
## Define input folder
FASTA = paths_dict["ANALYSIS_DATA"]["RPL36"]["ALIGNMENTS"]["TRIMMED"]

## Define output folder
OUT = paths_dict["ANALYSIS_DATA"]["RPL36"]["TREES"]

In [None]:
%%bash -s "$FASTA" "$OUT"

sbatch /crex/proj/naiss2023-6-81/Mahwash/beta-Cyclocitral/uppmax_scripts/script_bin/job_iqtree.sh "$1"/v3/rpl36.mafft.trimmed.fasta "$2"/v3/rpl36

I didn't really use the phylogeny, as the support values were (expectedly) very low. The rpl36 alignment was plotted next to the phylogeny in Figure 2 using the script `plot_tree_rpl36aln.R` (provided in the `src` folder).

## References

