# Otatea (only) trees


## Otatea 1


In this notebook, I reduced the outgroup to only 2 samples. I branch `bambus1` with `min_samples_locus=2` (further filtering is done with window_extracter).

The command used is:
```bash
ipyrad -p params-bambus1.txt -b otatea1 MM_A1 MM_A2 MM_A3 MM_A4 MM_A5 MM_B1 MM_B2 MM_B3 MM_B4 MM_B5 MM_C1 MM_C2 MM_C3 MM_C4 MM_C8 MM_D1 MM_D2 MM_D3 MM_D4 MM_E1 MM_E2 MM_E3 MM_E4 MM_F1 MM_F2 MM_F3 MM_F4 MM_G1 MM_G2 MM_G3 MM_G4 MM_H1 MM_H2 MM_H3 MM_H4 MM_H7 MM_A6 MM_C5
```

In [37]:
```python
import ipyrad.analysis as ipa
```

In [38]:
```python
# use window_extracter to filter bad samples

# path to the hdf5 database
SEQS = "./otatea1_outfiles/otatea1.seqs.hdf5"
```

In [39]:
```python
# ##solving bytes problems in hdf5
# # load h5py module
# import h5py
# import numpy as np


# #iterate over all names and change them to bytes
# with h5py.File(SEQS, "a") as io5:
#     names = io5["phymap"].attrs["phynames"]
#     del io5["phymap"].attrs["phynames"]
#     io5["phymap"].attrs["phynames"] = np.asarray([str.encode(i) for i in names], dtype=bytes)
```
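The commented-out cell above re-encodes the `phynames` attribute as bytes. As written, `str.encode(i)` raises a `TypeError` when a name is already stored as bytes, so the cell cannot safely be run twice. A minimal sketch of an idempotent version of just the conversion step (the helper `names_to_bytes` is hypothetical; reading and writing the attribute with h5py is left out):

```python
def names_to_bytes(names, encoding="utf-8"):
    """Return names as a list of bytes, encoding only the entries that are str.

    Entries that are already bytes are passed through unchanged, so calling
    this on its own output is a no-op.
    """
    converted = []
    for name in names:
        if isinstance(name, bytes):
            converted.append(name)
        elif isinstance(name, str):
            converted.append(name.encode(encoding))
        else:
            raise TypeError(f"unexpected name type: {type(name).__name__}")
    return converted


# mixed input: one name already encoded, one still a str
print(names_to_bytes(["MM_A1", b"MM_A2"]))  # [b'MM_A1', b'MM_A2']
```

The returned list could then be wrapped in `np.asarray(..., dtype=bytes)` as in the original cell.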

In [40]:
```python
# get the number of loci to make sure all of them are included in the analysis
radloci = ipa.window_extracter(SEQS)
n_radloci = radloci.scaffold_table.shape[0]

loci_toUse = radloci.scaffold_table.index[:-1].tolist()  # all loci except the last entry of the scaffold table
```

In [None]:
```python
wex = ipa.window_extracter(
    data=SEQS,
    scaffold_idxs=loci_toUse,
    # exclude=bad_samples,  # keep all samples
    name="otatea1",
    mincov=2,  # minimum number of samples with data at a site
    # rmincov=0.1,
)
```
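`mincov=2` asks window_extracter to keep only alignment columns with data for at least two samples. The sketch below illustrates that idea on a toy alignment held as a dict of equal-length strings; it is a conceptual illustration, not ipyrad's implementation, and it assumes `N` and `-` are the only missing-data characters.

```python
MISSING = set("N-")  # assumed missing-data characters


def filter_mincov(alignment, mincov):
    """Keep only columns where at least `mincov` sequences have a non-missing base."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [
        col for col in range(length)
        if sum(alignment[n][col] not in MISSING for n in names) >= mincov
    ]
    return {n: "".join(alignment[n][c] for c in keep) for n in names}


toy = {
    "MM_A1": "ACGTN",
    "MM_A2": "AC-TN",
    "MM_B1": "ANNTA",
}
print(filter_mincov(toy, mincov=2))  # {'MM_A1': 'ACT', 'MM_A2': 'ACT', 'MM_B1': 'ANT'}
```

Raising `mincov` drops more columns, which is the trade-off between matrix size and missing data explored later for the `otatea7_fullOlmeca` runs.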

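In [None]:
```python
wex.stats
```

In [None]:
```python
wex.run(force=True)
```

In [None]:
```python
# 40 threads (T), 100 bootstrap replicates (N), GTRCAT model (m)
rax = ipa.raxml(wex.outfile, name=wex.name, T=40, N=100, m="GTRCAT")
print(rax.command)
```

In [None]:
```python
rax.run(force=True)
```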

In [76]:
```python
rax.trees
```

```
bestTree                   /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bestTree.otatea1
bipartitions               /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bipartitions.otatea1
bipartitionsBranchLabels   /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bipartitionsBranchLabels.otatea1
bootstrap                  /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bootstrap.otatea1
info                       /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_info.otatea1
```

In [77]:
```python
import toytree
import pandas
```

In [78]:
```python
tree = toytree.tree("./analysis-raxml/RAxML_bipartitions.otatea1")

# import real names as a dict {sample code: species name};
# read_csv(squeeze=True) was removed in pandas 2.0, so squeeze the one-column frame instead
names = pandas.read_csv("nombres_especies_RAD.csv", index_col=0, usecols=[0, 1]).squeeze("columns")
names = names.to_dict()  # put them in dict form

# new_tips = [f"{names[tip]}_{tip[3:]}" for tip in tree.get_tip_labels()]

# set dict with new tips (real name and code; tip[3:] drops the "MM_" prefix)
new_tips = {tip: f"{names[tip]}_{tip[3:]}" for tip in tree.get_tip_labels()}

# rename the tips in the tree itself
tree = tree.set_node_values(feature="name", values=new_tips)

# root the tree on Guadua
tree = tree.root(wildcard="G.")

# collapse nodes with bootstrap support below 50
tree = tree.collapse_nodes(min_support=50)

# set colors for very bad samples (few reads) AFTER ROOTING
# bad_samples = ["MM_B3","MM_B2"]  #"MM_C8","MM_H1"
# new_colors = ["red" if tip in [f"{names[i]}_{i[3:]}" for i in bad_samples] else "black" for tip in tree.get_tip_labels()]

c, a, m = tree.draw(
    width=1000,
    node_labels=tree.get_node_values("support"),
    node_sizes=15,
    # tip_labels_colors=new_colors,
)
```
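The renaming step builds labels of the form `<species name>_<code>`, where `tip[3:]` drops the `MM_` prefix. If a tip is missing from the CSV, the dict lookup raises a `KeyError`. A small sketch of the same mapping with a fallback to the original label; the two species entries are inferred from the outgroup labels used in the last section of this notebook, and `MM_X9` is a hypothetical unknown sample.

```python
def rename_tips(tip_labels, names, prefix_len=3):
    """Map each tip to '<species>_<code>', keeping the original label if unknown.

    prefix_len=3 strips the 'MM_' prefix, as tip[3:] does in the notebook cell.
    """
    new_tips = {}
    for tip in tip_labels:
        species = names.get(tip)
        new_tips[tip] = f"{species}_{tip[prefix_len:]}" if species else tip
    return new_tips


names = {"MM_F4": "O.carrilloi", "MM_G4": "O.glauca"}
print(rename_tips(["MM_F4", "MM_G4", "MM_X9"], names))
# {'MM_F4': 'O.carrilloi_F4', 'MM_G4': 'O.glauca_G4', 'MM_X9': 'MM_X9'}
```

The resulting dict has the same shape as `new_tips`, so it could be passed to `set_node_values` unchanged.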

In [79]:
```python
import toyplot.svg
toyplot.svg.render(c, "otatea1.svg")
```

## Otatea 7


In this case I used only 3 Olmeca species as the outgroup (MM_D5 MM_G5 MM_F5).

The command used is:
```bash
ipyrad -p params-bambus7.txt -b otatea7 MM_A1 MM_A2 MM_A3 MM_A4 MM_A5 MM_B1 MM_B2 MM_B3 MM_B4 MM_B5 MM_C1 MM_C2 MM_C3 MM_C4 MM_C8 MM_D1 MM_D2 MM_D3 MM_D4 MM_E1 MM_E2 MM_E3 MM_E4 MM_F1 MM_F2 MM_F3 MM_F4 MM_G1 MM_G2 MM_G3 MM_G4 MM_H1 MM_H2 MM_H3 MM_H4 MM_H7 MM_D5 MM_G5 MM_F5
```

In [32]:
```python
import ipyrad.analysis as ipa
```

In [33]:
```python
# use window_extracter to filter bad samples

# path to the hdf5 database
SEQS = "./otatea7_outfiles/otatea7.seqs.hdf5"
```

In [34]:
```python
# ##solving bytes problems in hdf5
# # load h5py module
# import h5py
# import numpy as np


# #iterate over all names and change them to bytes
# with h5py.File(SEQS, "a") as io5:
#     names = io5["phymap"].attrs["phynames"]
#     del io5["phymap"].attrs["phynames"]
#     io5["phymap"].attrs["phynames"] = np.asarray([str.encode(i) for i in names], dtype=bytes)
```

In [35]:
```python
# get the number of loci to make sure all of them are included in the analysis
radloci = ipa.window_extracter(SEQS)
n_radloci = radloci.scaffold_table.shape[0]

loci_toUse = radloci.scaffold_table.index[:-1].tolist()  # all loci except the last entry of the scaffold table
```

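In [36]:
```python
wex = ipa.window_extracter(
    data=SEQS,
    scaffold_idxs=loci_toUse,
    # exclude=bad_samples,  # keep all samples
    name="otatea7",
    mincov=3,  # minimum number of samples with data at a site
    # rmincov=0.1,
    exclude=["reference"],  # drop the reference sequence from the matrix
)
```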

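In [37]:
```python
wex.stats
```

|   | scaffold     | start | end     | sites   | snps   | missing | samples |
|---|--------------|-------|---------|---------|--------|---------|---------|
| 0 | concatenated | 0     | 5062289 | 5062289 | 102812 | 0.687   | 39      |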
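The `wex.stats` table reports, for the concatenated matrix, the number of sites, the number of variable sites (`snps`), the proportion of missing cells and the number of samples. A rough sketch of how such summaries can be computed from an alignment; the exact definitions ipyrad uses (for instance how ambiguity codes count toward `snps`) are an assumption here, and this naive version treats every distinct character other than `N`/`-` as a base.

```python
MISSING = set("N-")  # assumed missing-data characters


def alignment_stats(alignment):
    """Return (sites, variable_sites, missing_fraction, samples) for a dict of sequences."""
    seqs = list(alignment.values())
    sites = len(seqs[0])
    variable = 0
    missing = 0
    for col in range(sites):
        column = [s[col] for s in seqs]
        missing += sum(base in MISSING for base in column)
        # count a site as variable if it has more than one distinct non-missing base
        if len({b for b in column if b not in MISSING}) > 1:
            variable += 1
    return sites, variable, round(missing / (sites * len(seqs)), 3), len(seqs)


toy = {"MM_A1": "ACGTN", "MM_A2": "AC-TA", "MM_B1": "ATNTA"}
print(alignment_stats(toy))  # (5, 1, 0.2, 3)
```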


In [38]:
```python
wex.run(force=True)
```

```
Wrote data to /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-window_extracter/otatea7.phy
```


In [39]:
```python
rax = ipa.raxml(wex.outfile, name=wex.name, T=40, N=100, m="GTRCAT")
print(rax.command)
```

```
/home/camayal/miniconda3/envs/ipyrad/bin/raxmlHPC-PTHREADS-AVX2 -f a -T 40 -m GTRCAT -n otatea7 -w /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml -s /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-window_extracter/otatea7.phy -p 54321 -N 100 -x 12345
```
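The printed command shows the RAxML flags that `ipa.raxml` assembled: `-f a` (rapid bootstrap analysis plus a search for the best-scoring ML tree in one run), `-T` threads, `-m GTRCAT`, `-n` run name, `-w` output directory (RAxML expects an absolute path here), `-s` alignment, `-p`/`-x` parsimony and rapid-bootstrap seeds, and `-N 100` bootstrap replicates. A sketch that rebuilds an equivalent command from these values, for example to rerun RAxML outside the notebook; the function name and the shortened `/data/...` paths are hypothetical.

```python
import shlex


def raxml_command(binary, alignment, name, workdir, threads=40,
                  model="GTRCAT", replicates=100, pseed=54321, xseed=12345):
    """Build a raxmlHPC command line using the same flags printed by rax.command."""
    args = [
        binary,
        "-f", "a",              # rapid bootstrap + best-scoring ML tree
        "-T", str(threads),     # number of threads (PTHREADS build)
        "-m", model,            # substitution model
        "-n", name,             # run name, used as suffix of the output files
        "-w", workdir,          # output directory (absolute path)
        "-s", alignment,        # input alignment in PHYLIP format
        "-p", str(pseed),       # parsimony random seed
        "-N", str(replicates),  # number of bootstrap replicates
        "-x", str(xseed),       # rapid bootstrap random seed
    ]
    return " ".join(shlex.quote(a) for a in args)


print(raxml_command("raxmlHPC-PTHREADS-AVX2", "/data/otatea7.phy",
                    "otatea7", "/data/analysis-raxml"))
```

`shlex.quote` only adds quotes when a path contains shell-special characters, so ordinary paths print exactly as in the command above.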


In [40]:
```python
rax.run(force=True)
```

```
job otatea7 finished successfully
```

In [41]:
```python
rax.trees
```

```
bestTree                   /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bestTree.otatea7
bipartitions               /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bipartitions.otatea7
bipartitionsBranchLabels   /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bipartitionsBranchLabels.otatea7
bootstrap                  /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bootstrap.otatea7
info                       /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_info.otatea7
```

In [42]:
```python
import toytree
import pandas
```

In [44]:
```python
tree = toytree.tree("./analysis-raxml/RAxML_bipartitions.otatea7")

# import real names as a dict {sample code: species name};
# read_csv(squeeze=True) was removed in pandas 2.0, so squeeze the one-column frame instead
names = pandas.read_csv("nombres_especies_RAD.csv", index_col=0, usecols=[0, 1]).squeeze("columns")
names = names.to_dict()  # put them in dict form
names["reference"] = "ref"

# new_tips = [f"{names[tip]}_{tip[3:]}" for tip in tree.get_tip_labels()]

# set dict with new tips (real name and code; tip[3:] drops the "MM_" prefix)
new_tips = {tip: f"{names[tip]}_{tip[3:]}" for tip in tree.get_tip_labels()}

# rename the tips in the tree itself
tree = tree.set_node_values(feature="name", values=new_tips)

# root the tree on the Olmeca outgroup
tree = tree.root(wildcard="Ol.")

# collapse nodes with bootstrap support below 50
tree = tree.collapse_nodes(min_support=50)

# set colors for very bad samples (few reads) AFTER ROOTING
# bad_samples = ["MM_B3","MM_B2"]  #"MM_C8","MM_H1"
# new_colors = ["red" if tip in [f"{names[i]}_{i[3:]}" for i in bad_samples] else "black" for tip in tree.get_tip_labels()]

c, a, m = tree.draw(
    width=1000,
    node_labels=tree.get_node_values("support"),
    node_sizes=15,
    # tip_labels_colors=new_colors,
)
```
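`collapse_nodes(min_support=50)` turns internal branches with bootstrap support below 50 into polytomies. A self-contained sketch of that operation on a tree stored as nested Python dicts (this representation and the helpers are hypothetical, not toytree's internals, and branch lengths are ignored): a low-support internal node is removed and its children are attached directly to its parent.

```python
def collapse_low_support(node, min_support=50):
    """Return a copy of `node` in which internal children with support < min_support
    are replaced by their own (already collapsed) children."""
    if "children" not in node:  # tip: nothing to collapse
        return dict(node)
    new_children = []
    for child in node["children"]:
        child = collapse_low_support(child, min_support)
        # nodes without a support value are treated as well supported and kept
        if "children" in child and child.get("support", 100) < min_support:
            new_children.extend(child["children"])  # splice grandchildren up
        else:
            new_children.append(child)
    out = {k: v for k, v in node.items() if k != "children"}
    out["children"] = new_children
    return out


def to_newick(node):
    """Write tips by name and internal nodes with their support as a label."""
    if "children" not in node:
        return node["name"]
    inner = ",".join(to_newick(c) for c in node["children"])
    return f"({inner}){node.get('support', '')}"


toy_tree = {"children": [
    {"support": 30, "children": [{"name": "A"}, {"name": "B"}]},
    {"support": 90, "children": [{"name": "C"}, {"name": "D"}]},
]}
print(to_newick(collapse_low_support(toy_tree)) + ";")  # (A,B,(C,D)90);
```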

In [45]:
```python
import toyplot.svg
toyplot.svg.render(c, "otatea7.svg")
```

## Otatea7_fullOlmeca


In this case I used all Olmeca species as the outgroup (MM_D5 MM_G5 MM_F5 MM_C5 MM_E5).

The command used is:
```bash
ipyrad -p params-bambus7.txt -b otatea7_fullOlmeca MM_A1 MM_A2 MM_A3 MM_A4 MM_A5 MM_B1 MM_B2 MM_B3 MM_B4 MM_B5 MM_C1 MM_C2 MM_C3 MM_C4 MM_C8 MM_D1 MM_D2 MM_D3 MM_D4 MM_E1 MM_E2 MM_E3 MM_E4 MM_F1 MM_F2 MM_F3 MM_F4 MM_G1 MM_G2 MM_G3 MM_G4 MM_H1 MM_H2 MM_H3 MM_H4 MM_H7 MM_D5 MM_G5 MM_F5 MM_C5 MM_E5
```

In [32]:
```python
import ipyrad.analysis as ipa
```

In [34]:
```python
# use window_extracter to filter bad samples

# path to the hdf5 database
SEQS = "./otatea7_fullOlmeca_outfiles/otatea7_fullOlmeca.seqs.hdf5"
```

In [48]:
```python
# ##solving bytes problems in hdf5
# # load h5py module
# import h5py
# import numpy as np


# #iterate over all names and change them to bytes
# with h5py.File(SEQS, "a") as io5:
#     names = io5["phymap"].attrs["phynames"]
#     del io5["phymap"].attrs["phynames"]
#     io5["phymap"].attrs["phynames"] = np.asarray([str.encode(i) for i in names], dtype=bytes)
```

In [49]:
```python
# get the number of loci to make sure all of them are included in the analysis
radloci = ipa.window_extracter(SEQS)
n_radloci = radloci.scaffold_table.shape[0]

loci_toUse = radloci.scaffold_table.index[:-1].tolist()  # all loci except the last entry of the scaffold table
```

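In [23]:
```python
wex = ipa.window_extracter(
    data=SEQS,
    scaffold_idxs=loci_toUse,
    # exclude=bad_samples,  # keep all samples
    name="otatea7_fullOlmeca_min2",
    mincov=2,  # minimum number of samples with data at a site
    # rmincov=0.1,
    exclude=["reference"],  # drop the reference sequence from the matrix
)
```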

In [24]:
```python
wex.stats, wex.name
```

```
(       scaffold  start      end    sites   snps  missing  samples
 0  concatenated      0  6295444  6295444  90854    0.729       36,
 'otatea7_fullOlmeca_min2')
```

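In [None]:
```python
wex.stats, wex.name
```

|   | scaffold     | start | end     | sites   | snps   | missing | samples |
|---|--------------|-------|---------|---------|--------|---------|---------|
| 0 | concatenated | 0     | 4371244 | 4371244 | 111324 | 0.655   | 41      |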


In [25]:
```python
wex.run(force=True)
```

```
Wrote data to /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-window_extracter/otatea7_fullOlmeca_min2.phy
```


In [86]:
```python
rax = ipa.raxml(wex.outfile, name=wex.name, T=40, N=100, m="GTRCAT")
print(rax.command)
```

```
/home/camayal/miniconda3/envs/ipyrad/bin/raxmlHPC-PTHREADS-AVX2 -f a -T 40 -m GTRCAT -n otatea7_fullOlmeca -w /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml -s /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-window_extracter/otatea7_fullOlmeca.phy -p 54321 -N 100 -x 12345
```


In [87]:
```python
rax.run(force=True)
```

```
job otatea7_fullOlmeca finished successfully
```

In [88]:
```python
rax.trees
```

```
bestTree                   /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bestTree.otatea7_fullOlmeca
bipartitions               /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bipartitions.otatea7_fullOlmeca
bipartitionsBranchLabels   /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bipartitionsBranchLabels.otatea7_fullOlmeca
bootstrap                  /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bootstrap.otatea7_fullOlmeca
info                       /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_info.otatea7_fullOlmeca
```

In [89]:
```python
import toytree
import pandas
```

In [39]:
```python
tree = toytree.tree("./analysis-raxml/RAxML_bipartitions.otatea7_fullOlmeca_min2")

# import real names as a dict {sample code: species name};
# read_csv(squeeze=True) was removed in pandas 2.0, so squeeze the one-column frame instead
names = pandas.read_csv("nombres_especies_RAD.csv", index_col=0, usecols=[0, 1]).squeeze("columns")
names = names.to_dict()  # put them in dict form
names["reference"] = "ref"

# new_tips = [f"{names[tip]}_{tip[3:]}" for tip in tree.get_tip_labels()]

# set dict with new tips (real name and code; tip[3:] drops the "MM_" prefix)
new_tips = {tip: f"{names[tip]}_{tip[3:]}" for tip in tree.get_tip_labels()}

# rename the tips in the tree itself
tree = tree.set_node_values(feature="name", values=new_tips)

# root the tree on the Olmeca outgroup
tree = tree.root(wildcard="Ol.")

# collapse nodes with bootstrap support below 50
tree = tree.collapse_nodes(min_support=50)

# set colors for very bad samples (few reads) AFTER ROOTING
# bad_samples = ["MM_B3","MM_B2"]  #"MM_C8","MM_H1"
# new_colors = ["red" if tip in [f"{names[i]}_{i[3:]}" for i in bad_samples] else "black" for tip in tree.get_tip_labels()]

c, a, m = tree.draw(
    width=1000,
    node_labels=tree.get_node_values("support"),
    node_sizes=15,
    # tip_labels_colors=new_colors,
)
```
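`tree.root(wildcard="Ol.")` roots on the tips whose labels contain the string `Ol.`. Since, as we read toytree's `wildcard` argument, this is a plain substring match, it can be worth listing which tips it selects before rooting. A stdlib sketch; the helper `select_tips` is hypothetical, and apart from `O.carrilloi_F4` and `O.glauca_G4` (used in the last section) the labels are made up for illustration. The optional regular expression can be stricter than a substring, for example anchoring the match at the start of the label.

```python
import re


def select_tips(tip_labels, wildcard=None, regex=None):
    """Return tips whose label contains `wildcard`, or matches `regex` via re.search."""
    if wildcard is not None:
        return [t for t in tip_labels if wildcard in t]
    if regex is not None:
        return [t for t in tip_labels if re.search(regex, t)]
    raise ValueError("give either wildcard or regex")


# hypothetical renamed labels, for illustration only
tips = ["O.carrilloi_F4", "O.glauca_G4", "Ol.sp_D5", "Ol.sp_G5"]
print(select_tips(tips, wildcard="Ol."))  # ['Ol.sp_D5', 'Ol.sp_G5']
print(select_tips(tips, regex=r"^O\."))   # ['O.carrilloi_F4', 'O.glauca_G4']
```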

In [91]:
```python
import toyplot.svg
toyplot.svg.render(c, "otatea7_fullOlmeca_mincov2.svg")
```

Missing-data values for all `otatea7_fullOlmeca` runs (the `minN` suffix is the `mincov` value):

| run                     | scaffold     | start | end     | sites   | snps   | missing | samples |
|-------------------------|--------------|-------|---------|---------|--------|---------|---------|
| otatea7_fullOlmeca_min2 | concatenated | 0     | 6971558 | 6971558 | 112990 | 0.762   | 41      |
| otatea7_fullOlmeca_min3 | concatenated | 0     | 5218909 | 5218909 | 112778 | 0.699   | 41      |
| otatea7_fullOlmeca_min4 | concatenated | 0     | 4371244 | 4371244 | 111324 | 0.655   | 41      |
| otatea7_fullOlmeca_min5 | concatenated | 0     | 3787908 | 3787908 | 106232 | 0.616   | 41      |
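A quick calculation on these values shows the trade-off of raising `mincov`: going from min2 to min5 keeps about 54% of the sites but about 94% of the SNPs, while missing data drops from 0.762 to 0.616. Assuming the min5 columns are a subset of the min2 columns, nearly all of the removed columns are therefore invariant. The snippet only redoes that arithmetic on the copied values:

```python
# values copied from the table above (concatenated matrix of each run)
runs = {
    "min2": {"sites": 6971558, "snps": 112990, "missing": 0.762},
    "min3": {"sites": 5218909, "snps": 112778, "missing": 0.699},
    "min4": {"sites": 4371244, "snps": 111324, "missing": 0.655},
    "min5": {"sites": 3787908, "snps": 106232, "missing": 0.616},
}


def retained(runs, baseline="min2"):
    """Fraction of the baseline's sites and SNPs kept by each run, rounded to 3 decimals."""
    base = runs[baseline]
    return {
        name: (round(r["sites"] / base["sites"], 3), round(r["snps"] / base["snps"], 3))
        for name, r in runs.items()
    }


for name, (sites_frac, snps_frac) in retained(runs).items():
    print(f"{name}: {sites_frac:.3f} of sites, {snps_frac:.3f} of SNPs")
# min2: 1.000 of sites, 1.000 of SNPs
# min3: 0.749 of sites, 0.998 of SNPs
# min4: 0.627 of sites, 0.985 of SNPs
# min5: 0.543 of sites, 0.940 of SNPs
```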

## Only Otatea, without outgroup

The command used is:
```bash
ipyrad -p params-bambus7.txt -b otatea7_ONLY MM_A1 MM_A2 MM_A3 MM_A4 MM_A5 MM_B1 MM_B2 MM_B3 MM_B4 MM_B5 MM_C1 MM_C2 MM_C3 MM_C4 MM_C8 MM_D1 MM_D2 MM_D3 MM_D4 MM_E1 MM_E2 MM_E3 MM_E4 MM_F1 MM_F2 MM_F3 MM_F4 MM_G1 MM_G2 MM_G3 MM_G4 MM_H1 MM_H2 MM_H3 MM_H4 MM_H7
```
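The branches in this notebook share the same 36 Otatea samples and differ only in the outgroup, which matches the `samples` column of the stats tables (36 for otatea7_ONLY, 39 for otatea7, 41 for the otatea7_fullOlmeca summary). A small sketch that recovers the outgroup of a branch by set difference against the otatea7_ONLY sample list; the helper names are hypothetical and the command strings are copied from this notebook.

```python
def branch_samples(cmd):
    """Return the sample names given after `-b <new_assembly_name>` in an ipyrad command."""
    tokens = cmd.split()
    i = tokens.index("-b")
    return tokens[i + 2:]  # skip "-b" and the new assembly name


def outgroup(branch_cmd, ingroup_cmd):
    """Sorted samples that are in branch_cmd but not in ingroup_cmd."""
    return sorted(set(branch_samples(branch_cmd)) - set(branch_samples(ingroup_cmd)))


ONLY_CMD = (
    "ipyrad -p params-bambus7.txt -b otatea7_ONLY "
    "MM_A1 MM_A2 MM_A3 MM_A4 MM_A5 MM_B1 MM_B2 MM_B3 MM_B4 MM_B5 "
    "MM_C1 MM_C2 MM_C3 MM_C4 MM_C8 MM_D1 MM_D2 MM_D3 MM_D4 "
    "MM_E1 MM_E2 MM_E3 MM_E4 MM_F1 MM_F2 MM_F3 MM_F4 "
    "MM_G1 MM_G2 MM_G3 MM_G4 MM_H1 MM_H2 MM_H3 MM_H4 MM_H7"
)
FULL_CMD = (
    "ipyrad -p params-bambus7.txt -b otatea7_fullOlmeca "
    "MM_A1 MM_A2 MM_A3 MM_A4 MM_A5 MM_B1 MM_B2 MM_B3 MM_B4 MM_B5 "
    "MM_C1 MM_C2 MM_C3 MM_C4 MM_C8 MM_D1 MM_D2 MM_D3 MM_D4 "
    "MM_E1 MM_E2 MM_E3 MM_E4 MM_F1 MM_F2 MM_F3 MM_F4 "
    "MM_G1 MM_G2 MM_G3 MM_G4 MM_H1 MM_H2 MM_H3 MM_H4 MM_H7 "
    "MM_D5 MM_G5 MM_F5 MM_C5 MM_E5"
)

print(len(branch_samples(ONLY_CMD)))  # 36
print(outgroup(FULL_CMD, ONLY_CMD))   # ['MM_C5', 'MM_D5', 'MM_E5', 'MM_F5', 'MM_G5']
```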

In [1]:
```python
import ipyrad.analysis as ipa
```

In [4]:
```python
# use window_extracter to filter bad samples

# path to the hdf5 database
SEQS = "./otatea7_ONLY_outfiles/otatea7_ONLY.seqs.hdf5"
```

In [5]:
```python
# # ##solving bytes problems in hdf5
# # # load h5py module
# import h5py
# import numpy as np


# #iterate over all names and change them to bytes
# with h5py.File(SEQS, "a") as io5:
#     names = io5["phymap"].attrs["phynames"]
#     del io5["phymap"].attrs["phynames"]
#     io5["phymap"].attrs["phynames"] = np.asarray([str.encode(i) for i in names], dtype=bytes)
```

In [6]:
```python
# get the number of loci to make sure all of them are included in the analysis
radloci = ipa.window_extracter(SEQS)
n_radloci = radloci.scaffold_table.shape[0]

loci_toUse = radloci.scaffold_table.index[:-1].tolist()  # all loci except the last entry of the scaffold table
```

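In [9]:
```python
wex = ipa.window_extracter(
    data=SEQS,
    scaffold_idxs=loci_toUse,
    # exclude=bad_samples,  # keep all samples
    name="otatea7_ONLY",
    mincov=2,  # minimum number of samples with data at a site
    # rmincov=0.1,
    exclude=["reference"],  # drop the reference sequence from the matrix
)
```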

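In [10]:
```python
wex.stats
```

|   | scaffold     | start | end     | sites   | snps  | missing | samples |
|---|--------------|-------|---------|---------|-------|---------|---------|
| 0 | concatenated | 0     | 6295444 | 6295444 | 90854 | 0.729   | 36      |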


In [11]:
```python
wex.run(force=True)
```

```
Wrote data to /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-window_extracter/otatea7_ONLY.phy
```


In [12]:
```python
rax = ipa.raxml(wex.outfile, name=wex.name, T=40, N=100, m="GTRCAT")
print(rax.command)
```

```
/home/camayal/miniconda3/envs/ipyrad/bin/raxmlHPC-PTHREADS-AVX2 -f a -T 40 -m GTRCAT -n otatea7_ONLY -w /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml -s /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-window_extracter/otatea7_ONLY.phy -p 54321 -N 100 -x 12345
```


In [13]:
```python
rax.run(force=True)
```

```
job otatea7_ONLY finished successfully
```

In [14]:
```python
rax.trees
```

```
bestTree                   /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bestTree.otatea7_ONLY
bipartitions               /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bipartitions.otatea7_ONLY
bipartitionsBranchLabels   /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bipartitionsBranchLabels.otatea7_ONLY
bootstrap                  /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_bootstrap.otatea7_ONLY
info                       /mnt/data0/camayal/GDRIVE/otherCAML/bambus/analysis-raxml/RAxML_info.otatea7_ONLY
```

In [16]:
```python
import toytree
import pandas
```

In [20]:
```python
tree = toytree.tree("./analysis-raxml/RAxML_bipartitions.otatea7_ONLY")

# import real names as a dict {sample code: species name};
# read_csv(squeeze=True) was removed in pandas 2.0, so squeeze the one-column frame instead
names = pandas.read_csv("nombres_especies_RAD.csv", index_col=0, usecols=[0, 1]).squeeze("columns")
names = names.to_dict()  # put them in dict form
names["reference"] = "ref"

# new_tips = [f"{names[tip]}_{tip[3:]}" for tip in tree.get_tip_labels()]

# set dict with new tips (real name and code; tip[3:] drops the "MM_" prefix)
new_tips = {tip: f"{names[tip]}_{tip[3:]}" for tip in tree.get_tip_labels()}

# rename the tips in the tree itself
tree = tree.set_node_values(feature="name", values=new_tips)

# no outgroup in this branch: root on O. carrilloi (F4) and O. glauca (G4)
tree = tree.root(["O.carrilloi_F4", "O.glauca_G4"])

# collapse nodes with bootstrap support below 50
tree = tree.collapse_nodes(min_support=50)

# set colors for very bad samples (few reads) AFTER ROOTING
# bad_samples = ["MM_B3","MM_B2"]  #"MM_C8","MM_H1"
# new_colors = ["red" if tip in [f"{names[i]}_{i[3:]}" for i in bad_samples] else "black" for tip in tree.get_tip_labels()]

c, a, m = tree.draw(
    width=1000,
    node_labels=tree.get_node_values("support"),
    node_sizes=15,
    # tip_labels_colors=new_colors,
)
```

In [21]:
```python
import toyplot.svg
toyplot.svg.render(c, "otatea7_ONLY.svg")
```