# Phylogeny of *Muscari* using genomic ddRAD data
In this notebook I document the results ...

In [1]:
## conda install ipyrad -c ipyrad
## conda install toytree -c conda-forge
## conda install sra-tools -c bioconda
## conda install entrez-direct -c bioconda

In [1]:
## import
import ipyrad as ip
import ipyrad.analysis as ipa
import ipyparallel as ipp
import pandas as pd
import toytree
import toyplot

## print versions of ipyrad and toytree
print("ipyrad v. {}".format(ip.__version__))
print("toytree v. {}".format(toytree.__version__))

## print Python version
from platform import python_version
print("Python v.", python_version())

ipyrad v. 0.9.64
toytree v. 2.0.5
Python v. 3.7.9


#### Parallel processes on independent Python kernels
To start a parallel client you must run the command-line program `ipcluster`. This starts a number of independent Python processes (kernels) to which we can then send work. The cluster can be stopped and restarted independently of this notebook, which is convenient on a compute cluster where many cores are not always immediately available.

Open a terminal and type the following command to start an ipcluster instance with N engines.

In [3]:
## ipcluster start --n=16

In [2]:
## connect to cluster
ipyclient = ipp.Client()
## cluster_info() prints the summary itself (it returns None)
ip.cluster_info(ipyclient)

Parallel connection | Cryptantha: 64 cores


## Data Assembly
### Create an Assembly object and modify *ipyrad* params file
This object stores the parameters of the assembly and the organization of the data.

In [7]:
## Provide a name for the assembly
data = ip.Assembly("Muscari")

New Assembly: Muscari


In [8]:
## set parameters
data.set_params("project_dir", "Mus_Assembly")
data.set_params("sorted_fastq_path", "./Mus_fastq/*.fastq.gz")
data.set_params("clust_threshold", 0.85)
data.set_params("max_Hs_consens", 0.05)
data.set_params("restriction_overhang", ('TGCAG', 'GGCC'))
data.set_params("output_formats", "*")
data.set_params("datatype", "ddrad")

## see / print all parameters
data.get_params()

0   assembly_name               Muscari                                      
1   project_dir                 ./Mus_Assembly                               
2   raw_fastq_path                                                           
3   barcodes_path                                                            
4   sorted_fastq_path           ./Mus_fastq/*.fastq.gz                       
5   assembly_method             denovo                                       
6   reference_sequence                                                       
7   datatype                    ddrad                                        
8   restriction_overhang        ('TGCAG', 'GGCC')                            
9   max_low_qual_bases          5                                            
10  phred_Qscore_offset         33                                           
11  mindepth_statistical        6                                            
12  mindepth_majrule            6                               

### Assemble the data (steps 1 to 6)

In [9]:
## run steps 1 & 2 of the assembly
data.run("12", force = True)

Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:04 | loading reads        | s1 |
[####################] 100% 0:00:22 | processing reads     | s2 |


In [11]:
## set clustering threshold to 0.85 and run assembly steps 3-6
data_clust85 = data.branch("data_clust85")
data_clust85.set_params("clust_threshold", 0.85)
data_clust85.run("3456", force = True)

Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:04 | dereplicating        | s3 |
[####################] 100% 0:06:57 | clustering/mapping   | s3 |
[####################] 100% 0:00:01 | building clusters    | s3 |
[####################] 100% 0:00:00 | chunking clusters    | s3 |
[####################] 100% 0:04:10 | aligning clusters    | s3 |
[####################] 100% 0:00:16 | concat clusters      | s3 |
[####################] 100% 0:00:01 | calc cluster stats   | s3 |
[####################] 100% 0:00:14 | inferring [H, E]     | s4 |
[####################] 100% 0:00:01 | calculating depths   | s5 |
[####################] 100% 0:00:02 | chunking clusters    | s5 |
[####################] 100% 0:01:51 | consens calling      | s5 |
[####################] 100% 0:00:03 | indexing alleles     | s5 |
[####################] 100% 0:00:05 | concatenating inputs | s6 |
[####################] 100% 0:06:20 | clustering across    | s6 |
[####################] 100% 0:00:

In [77]:
## show assembly stats up to step 6
## data_clust85.stats.sort_values(by=['reads_consens'])
data_clust85.stats

| sample | state | reads_raw | reads_passed_filter | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|
| Bellevalia_dubia_W6083 | 6 | 1030736 | 1029284 | 95294 | 20981 | 0.013 | 0.003 | 19338 |
| Bellevalia_paradoxa_ED1272 | 6 | 1636142 | 1634727 | 108498 | 30267 | 0.013 | 0.003 | 27749 |
| Bellevalia_speciosa_W6085 | 6 | 1416391 | 1414294 | 95536 | 25347 | 0.016 | 0.003 | 22957 |
| Brimeura_amethystina_W6084 | 6 | 1554459 | 1551802 | 424844 | 28296 | 0.034 | 0.005 | 20711 |
| Leopoldia_caucasica_ED1262 | 6 | 1462581 | 1461153 | 77305 | 22469 | 0.013 | 0.003 | 20653 |
| Leopoldia_comosa_ED1256 | 6 | 1299389 | 1298312 | 90402 | 25831 | 0.015 | 0.003 | 23363 |
| Leopoldia_comosa_ED1274 | 6 | 1464810 | 1463759 | 90898 | 24322 | 0.015 | 0.002 | 22075 |
| Leopoldia_comosa_ED3539 | 6 | 2065757 | 2064748 | 368808 | 46895 | 0.013 | 0.003 | 42303 |
| Leopoldia_comosa_ED3965 | 6 | 1232250 | 1231244 | 94455 | 25479 | 0.013 | 0.003 | 23472 |
| Leopoldia_cycladica_W6082 | 6 | 1664161 | 1661171 | 152200 | 36680 | 0.025 | 0.003 | 30683 |


In [71]:
## set clustering threshold to 0.90 and run assembly steps 3-6
data_clust90 = data.branch("data_clust90")
data_clust90.set_params("clust_threshold", 0.90)
data_clust90.run("3456", force = True)

Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:04 | dereplicating        | s3 |
[####################] 100% 0:07:54 | clustering/mapping   | s3 |
[####################] 100% 0:00:01 | building clusters    | s3 |
[####################] 100% 0:00:00 | chunking clusters    | s3 |
[####################] 100% 0:04:11 | aligning clusters    | s3 |
[####################] 100% 0:00:16 | concat clusters      | s3 |
[####################] 100% 0:00:01 | calc cluster stats   | s3 |
[####################] 100% 0:00:14 | inferring [H, E]     | s4 |
[####################] 100% 0:00:01 | calculating depths   | s5 |
[####################] 100% 0:00:01 | chunking clusters    | s5 |
[####################] 100% 0:01:47 | consens calling      | s5 |
[####################] 100% 0:00:02 | indexing alleles     | s5 |
[####################] 100% 0:00:05 | concatenating inputs | s6 |
[####################] 100% 0:08:47 | clustering across    | s6 |
[####################] 100% 0:00:

In [74]:
## show assembly stats up to step 6
## data_clust90.stats.sort_values(by=['reads_consens'])
data_clust90.stats

| sample | state | reads_raw | reads_passed_filter | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|
| Bellevalia_dubia_W6083 | 6 | 1030736 | 1029284 | 99978 | 21634 | 0.01 | 0.002 | 20435 |
| Bellevalia_paradoxa_ED1272 | 6 | 1636142 | 1634727 | 116174 | 31020 | 0.01 | 0.002 | 29197 |
| Bellevalia_speciosa_W6085 | 6 | 1416391 | 1414294 | 102373 | 26225 | 0.012 | 0.002 | 24454 |
| Brimeura_amethystina_W6084 | 6 | 1554459 | 1551802 | 458877 | 28228 | 0.023 | 0.004 | 22748 |
| Leopoldia_caucasica_ED1262 | 6 | 1462581 | 1461153 | 81510 | 23058 | 0.01 | 0.002 | 21710 |
| Leopoldia_comosa_ED1256 | 6 | 1299389 | 1298312 | 96020 | 26541 | 0.011 | 0.003 | 24753 |
| Leopoldia_comosa_ED1274 | 6 | 1464810 | 1463759 | 96268 | 25062 | 0.011 | 0.002 | 23433 |
| Leopoldia_comosa_ED3539 | 6 | 2065757 | 2064748 | 378944 | 47433 | 0.01 | 0.003 | 43841 |
| Leopoldia_comosa_ED3965 | 6 | 1232250 | 1231244 | 98231 | 26091 | 0.009 | 0.002 | 24631 |
| Leopoldia_cycladica_W6082 | 6 | 1664161 | 1661171 | 160429 | 38312 | 0.021 | 0.003 | 33215 |
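
In every row shown, `reads_consens` is higher at a clustering threshold of 0.90 than at 0.85. A small sketch for quantifying that change per sample, using values copied from the two tables above; `consens_change` is a hypothetical helper, not part of ipyrad:

```python
def consens_change(before, after):
    """Per-sample (absolute, relative) change in consensus reads
    between two assemblies, for samples present in both."""
    return {
        name: (after[name] - before[name],
               (after[name] - before[name]) / before[name])
        for name in before
        if name in after
    }


# reads_consens copied from the clust85 and clust90 tables above (subset)
clust85 = {
    "Bellevalia_dubia_W6083": 19338,
    "Brimeura_amethystina_W6084": 20711,
    "Leopoldia_comosa_ED3539": 42303,
}
clust90 = {
    "Bellevalia_dubia_W6083": 20435,
    "Brimeura_amethystina_W6084": 22748,
    "Leopoldia_comosa_ED3539": 43841,
}

for name, (delta, rel) in consens_change(clust85, clust90).items():
    print(f"{name}: +{delta} ({rel:.1%})")
```

On the Assembly objects themselves, `data_clust90.stats.reads_consens - data_clust85.stats.reads_consens` should give the absolute differences directly, since pandas aligns the two columns on the sample names in the index.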


In [10]:
## set clustering threshold to 0.95 and run assembly steps 3-6
data_clust95 = data.branch("data_clust95")
data_clust95.set_params("clust_threshold", 0.95)
data_clust95.run("3456", force = True)

Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:04 | dereplicating        | s3 |
[####################] 100% 0:09:28 | clustering/mapping   | s3 |
[####################] 100% 0:00:01 | building clusters    | s3 |
[####################] 100% 0:00:00 | chunking clusters    | s3 |
[####################] 100% 0:04:07 | aligning clusters    | s3 |
[####################] 100% 0:00:17 | concat clusters      | s3 |
[####################] 100% 0:00:01 | calc cluster stats   | s3 |
[####################] 100% 0:00:13 | inferring [H, E]     | s4 |
[####################] 100% 0:00:01 | calculating depths   | s5 |
[####################] 100% 0:00:02 | chunking clusters    | s5 |
[####################] 100% 0:01:40 | consens calling      | s5 |
[####################] 100% 0:00:03 | indexing alleles     | s5 |
[####################] 100% 0:00:06 | concatenating inputs | s6 |
[####################] 100% 0:13:19 | clustering across    | s6 |
[####################] 100% 0:00:

## Final assembly with different `min_samples_locus` settings for different analyses

1. Phylogenetic analysis 
    - RAxML
    - MrBayes
    - tetRAD
2. Population analysis
    - PCA
    - STRUCTURE
    - TreeMix
3. Test for introgression using the ABBA-BABA test
    - ...
    
#### If you are coming back to continue from here, load the Assembly objects saved after step 6

In [76]:
## load assembly objects when coming back
data = ip.load_json("./Mus_Assembly/Muscari.json")
data_clust85 = ip.load_json("./Mus_Assembly/data_clust85.json")
data_clust90 = ip.load_json("./Mus_Assembly/data_clust90.json")

## check again the stat-file sorted by number of consensus reads
#data.stats.sort_values(by=['reads_consens'])

## check name
#data.stats

loading Assembly: Muscari
from saved path: ~/GBS/Muscari/Mus_Assembly/Muscari.json
loading Assembly: data_clust85
from saved path: ~/GBS/Muscari/Mus_Assembly/data_clust85.json
loading Assembly: data_clust90
from saved path: ~/GBS/Muscari/Mus_Assembly/data_clust90.json


### 1. Assembly for Phylogenetic analysis
#### *But first, let's exclude samples with a low read number (< 1000 reads after step 6), samples outside the target group, and samples with odd placements in preliminary analyses:*

**Samples outside the target group are:**
- ...

In [None]:
## exclude samples from assembly with ...
keep_list = [i for i in data.samples.keys() if i not in [
    ## ... low read number (< 5000)
    #"", "",
    
    ## ... other samples to exclude
    "", "", "",
]]

## make a new data branch from the keep_list
data = data.branch("data", subsamples = keep_list, force = True)

## double check taxon sampling
#data.stats.sort_values(by=['reads_consens'])
data.stats
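
Instead of typing the low-coverage sample names by hand, `keep_list` could be derived from the `reads_consens` column. A minimal sketch, assuming the cutoff is a free parameter (the text above mentions 1000 reads, the code comment 5000) and that any other samples to drop are listed explicitly; `samples_to_keep` is a hypothetical helper and the numbers are copied from the clust85 stats table:

```python
def samples_to_keep(reads_consens, min_reads, exclude=()):
    """Return sample names with at least `min_reads` consensus reads,
    skipping any name listed in `exclude`. Input order is preserved."""
    excluded = set(exclude)
    return [
        name
        for name, n_reads in reads_consens.items()
        if n_reads >= min_reads and name not in excluded
    ]


# reads_consens values copied from the clust85 stats table (subset)
reads_consens = {
    "Bellevalia_dubia_W6083": 19338,
    "Brimeura_amethystina_W6084": 20711,
    "Leopoldia_comosa_ED3539": 42303,
}

# illustrative cutoff only, not the threshold used in this study
keep_list = samples_to_keep(reads_consens, min_reads=20000)
print(keep_list)
```

With a loaded Assembly, `data.stats.reads_consens.to_dict()` gives the same name-to-count mapping, and the resulting list can be passed to `data.branch(..., subsamples = keep_list)` as in the cell above.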

In [None]:
################################################################
#############    TEMPLATE :::: do not run    ###################
################################################################

## ::: Template for step 7 assembly with all samples ::: ##
## create a branch for outputs with min_samples = x
min4 = data.branch("min_4")
min4.set_params("min_samples_locus", 4)
min4.run("7")

## ::: Template for step 7 assembly with in- and outgroup ::: ##
## create a branch for outputs with min_samples = x BUT only for ingroup
pops = data.branch("pops")
pops.populations = {
    "ingroup": (20, [i for i in pops.samples if "Frai" in i]),
    "outgroup": (0, [i for i in pops.samples if "Frai" not in i])
}
pops.run("7", force = True)

################################################################
#############    TEMPLATE :::: do not run    ###################
################################################################

In [78]:
## ::::::: WORK IN PROGRESS
## Write the results of the percentage loop into a dictionary
## that can then be used in the following steps,
## instead of making the dictionary by hand.


## first check the number of remaining ingroup samples (all samples minus the 4 outgroup samples)
ingroup = data_clust85.stats.state.count() - 4
print("Number of ingroup taxa:", ingroup)
print("Calculate different sets of missing data:")

## for loop to calculate different values for min_samples_locus
percent = [10, 15, 20, 25, 30, 35, 40]
for i in percent:
    res = ingroup / 100 * i
    print(i,"% = ", round(res))

Number of ingroup taxa: 39
Calculate different sets of missing data:
10 % =  4
15 % =  6
20 % =  8
25 % =  10
30 % =  12
35 % =  14
40 % =  16
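
The work-in-progress note above asks for the dictionary to be generated from this loop instead of being typed by hand. A minimal sketch of that, assuming the same rounding as the loop (`round(n_ingroup * pct / 100)`); `min_samples_by_percent` is a hypothetical helper name, not part of ipyrad:

```python
def min_samples_by_percent(n_ingroup, percents):
    """Map each percentage of ingroup samples to a min_samples_locus value.

    Uses Python's built-in round(), as the loop above does. Note that
    round() rounds exact halves to the nearest even integer (12.5 -> 12).
    """
    return {pct: round(n_ingroup * pct / 100) for pct in percents}


# 39 ingroup taxa, as printed by the cell above
percent = [10, 15, 20, 25, 30, 35, 40]
sample_dict = min_samples_by_percent(39, percent)
print(sample_dict)
```

With the Assembly loaded, `min_samples_by_percent(data_clust85.stats.state.count() - 4, percent)` should reproduce the hand-written `sample_dict` used in the step 7 loops (`{10: 4, 15: 6, 20: 8, 25: 10, 30: 12, 35: 14, 40: 16}`).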


In [79]:
## Cluster Threshold 85
## -------------------
## Run the final assembly step 7 through a for loop with different min_samples_locus values,
## based on the estimated number of remaining samples MINUS the outgroup

## make a dictionary with the percentage of ingroup samples a locus must be shared by as keys
## and the corresponding min_samples_locus (based on the number of "ingroup samples") as values
sample_dict = {10: 4,
               15: 6,
               20: 8,
               25: 10,
               30: 12,
               35: 14,
               40: 16}

## loop over the dictionary 
for key, value in sample_dict.items():
    newname = "pops{}_clust85".format(key)
    newdata = data_clust85.branch(newname)
    newdata.populations = {
        "ingroup":  (value, [i for i in newdata.samples if "B" not in i]),
        "outgroup": (0,     [i for i in newdata.samples if "B" in i]),
         }
    ## run final step on every iteration of the loop
    newdata.run("7", force = True)

Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:03 | applying filters     | s7 |
[####################] 100% 0:00:16 | building arrays      | s7 |
[####################] 100% 0:00:07 | writing conversions  | s7 |
[####################] 100% 0:00:17 | indexing vcf depths  | s7 |
[####################] 100% 0:00:49 | writing vcf output   | s7 |
Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:04 | applying filters     | s7 |
[####################] 100% 0:00:10 | building arrays      | s7 |
[####################] 100% 0:00:04 | writing conversions  | s7 |
[####################] 100% 0:00:06 | indexing vcf depths  | s7 |
[####################] 100% 0:00:29 | writing vcf output   | s7 |
Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:04 | applying filters     | s7 |
[####################] 100% 0:00:07 | building arrays      | s7 |
[####################] 100% 0:00:03 | writing conversions  | s7 |
[############
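
The loop above assigns every sample whose name contains a capital "B" to the outgroup. That works for the Bellevalia and Brimeura samples, but it would also catch any ingroup sample with a "B" elsewhere in its name. A sketch of a stricter split, assuming the outgroup is exactly the samples whose names start with `Bellevalia_` or `Brimeura_` (the four samples removed for the population assembly); `build_populations` is a hypothetical helper that returns the same `{name: (min_samples, [samples])}` structure assigned to `newdata.populations`:

```python
# Assumption: outgroup = all samples of these two genera
OUTGROUP_PREFIXES = ("Bellevalia_", "Brimeura_")


def build_populations(sample_names, min_ingroup, outgroup_prefixes=OUTGROUP_PREFIXES):
    """Return an ipyrad-style populations dict: {pop: (min_samples, [names])}."""
    outgroup = [s for s in sample_names if s.startswith(outgroup_prefixes)]
    ingroup = [s for s in sample_names if not s.startswith(outgroup_prefixes)]
    return {
        "ingroup": (min_ingroup, ingroup),
        "outgroup": (0, outgroup),
    }


# sample names taken from the stats tables (subset)
samples = [
    "Bellevalia_dubia_W6083",
    "Brimeura_amethystina_W6084",
    "Leopoldia_comosa_ED1256",
    "Leopoldia_cycladica_W6082",
]
print(build_populations(samples, 2))
```

Inside the loop, `newdata.populations = build_populations(newdata.samples, value)` would replace the two list comprehensions, since iterating over `newdata.samples` yields the sample names.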

In [75]:
## Cluster Threshold 90
## -------------------
## Run the final assembly step 7 through a for loop with different min_samples_locus values,
## based on the estimated number of remaining samples MINUS the outgroup

## make a dictionary with the percentage of ingroup samples a locus must be shared by as keys
## and the corresponding min_samples_locus (based on the number of "ingroup samples") as values
sample_dict = {10: 4,
               15: 6,
               20: 8,
               25: 10,
               30: 12,
               35: 14,
               40: 16}

## loop over the dictionary 
for key, value in sample_dict.items():
    newname = "pops{}_clust90".format(key)
    newdata = data_clust90.branch(newname)
    newdata.populations = {
        "ingroup":  (value, [i for i in newdata.samples if "B" not in i]),
        "outgroup": (0,     [i for i in newdata.samples if "B" in i]),
         }
    ## run final step on every iteration of the loop
    newdata.run("7", force = True)

Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:05 | applying filters     | s7 |
[####################] 100% 0:00:19 | building arrays      | s7 |
[####################] 100% 0:00:08 | writing conversions  | s7 |
[####################] 100% 0:00:23 | indexing vcf depths  | s7 |
[####################] 100% 0:00:56 | writing vcf output   | s7 |
Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:03 | applying filters     | s7 |
[####################] 100% 0:00:13 | building arrays      | s7 |
[####################] 100% 0:00:05 | writing conversions  | s7 |
[####################] 100% 0:00:09 | indexing vcf depths  | s7 |
[####################] 100% 0:00:37 | writing vcf output   | s7 |
Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:03 | applying filters     | s7 |
[####################] 100% 0:00:09 | building arrays      | s7 |
[####################] 100% 0:00:03 | writing conversions  | s7 |
[############

In [11]:
## Cluster Threshold 95
## -------------------
## Run the final assembly step 7 through a for loop with different min_samples_locus values,
## based on the estimated number of remaining samples MINUS the outgroup

## make a dictionary with the percentage of ingroup samples a locus must be shared by as keys
## and the corresponding min_samples_locus (based on the number of "ingroup samples") as values
sample_dict = {10: 4,
               15: 6,
               20: 8,
               25: 10,
               30: 12,
               35: 14,
               40: 16}

## loop over the dictionary 
for key, value in sample_dict.items():
    newname = "pops{}_clust95".format(key)
    newdata = data_clust95.branch(newname)
    newdata.populations = {
        "ingroup":  (value, [i for i in newdata.samples if "B" not in i]),
        "outgroup": (0,     [i for i in newdata.samples if "B" in i]),
         }
    ## run final step on every iteration of the loop
    newdata.run("7", force = True)

Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:05 | applying filters     | s7 |
[####################] 100% 0:00:18 | building arrays      | s7 |
[####################] 100% 0:00:09 | writing conversions  | s7 |
[####################] 100% 0:00:25 | indexing vcf depths  | s7 |
[####################] 100% 0:00:51 | writing vcf output   | s7 |
Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:03 | applying filters     | s7 |
[####################] 100% 0:00:12 | building arrays      | s7 |
[####################] 100% 0:00:05 | writing conversions  | s7 |
[####################] 100% 0:00:10 | indexing vcf depths  | s7 |
[####################] 100% 0:00:35 | writing vcf output   | s7 |
Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:03 | applying filters     | s7 |
[####################] 100% 0:00:09 | building arrays      | s7 |
[####################] 100% 0:00:04 | writing conversions  | s7 |
[############

In [None]:
## Same as above, but without an ingroup/outgroup split: min_samples_locus applies to all samples
sample_dict = {10: 4,
               15: 6,
               20: 8,
               25: 10,
               30: 11,
               35: 13,
               40: 15}

## loop over the dictionary 
for key, value in sample_dict.items():
    newname = "min_{}".format(key)
    newdata = data.branch(newname)
    newdata.set_params("min_samples_locus", value)
    newdata.run("7", force = True)

### 2. Assembly for population analysis with outgroups removed

In [60]:
## load assembly object when coming back
data_clust90 = ip.load_json("./Mus_Assembly/data_clust90.json")

## check name
#data.stats

loading Assembly: data_clust90
from saved path: ~/GBS/Muscari/Mus_Assembly/data_clust90.json


In [16]:
## exclude the four outgroup samples (Bellevalia and Brimeura) from the assembly
keep_list = [i for i in data_clust90.samples.keys() if i not in [
    "Bellevalia_dubia_W6083", "Bellevalia_paradoxa_ED1272",
    "Bellevalia_speciosa_W6085", "Brimeura_amethystina_W6084"
]]

## make a new data branch from the keep_list
nout_clust90 = data_clust90.branch("nout_clust90", subsamples = keep_list, force = True)

## double check taxon sampling
#nout_clust90.stats.sort_values(by=['reads_consens'])
#nout_clust90.stats

In [17]:
## run final assembly without outgroups; keep only loci shared by at least 20 ingroup samples
nout_clust90.set_params("min_samples_locus", 20)
nout_clust90.run("7", force = True)

Parallel connection | Cryptantha: 64 cores
[####################] 100% 0:00:03 | applying filters     | s7 |
[####################] 100% 0:00:03 | building arrays      | s7 |
[####################] 100% 0:00:01 | writing conversions  | s7 |
[####################] 100% 0:00:01 | indexing vcf depths  | s7 |
[####################] 100% 0:00:05 | writing vcf output   | s7 |


### 3. Assembly for D-Statistics

In [None]:
## load assembly object when coming back
data = ip.load_json("./Mus_Assembly/Muscari.json")

## check name
#data.stats

## Phylogenetic downstream analysis
First, check whether you need to install additional packages that are not included in the ipyrad dependencies. Use the following commands to install them in a terminal.

In [None]:
## following programs are required
# conda install toytree -c conda-forge
# conda install tetrad -c eaton-lab -c conda-forge
# conda install raxml -c bioconda

### RAxML

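Create a raxml analysis object for the backbone tree:
```
## load the step 7 Assembly to analyse (here: clust85, 30 % ingroup coverage)
pops30 = ip.load_json("./Mus_Assembly/pops30_clust85.json")

rax = ipa.raxml(
    name = pops30.name,
    data = pops30.outfiles.phy,
    workdir = "./Mus_Analysis/Mus_RAxML",
    T = 16,
    N = 200,
    o = "Bellevalia_paradoxa_ED1272",
    )
```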

Analyses for this study were performed directly with the RAxML command-line tool using a custom script.
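
The custom script itself is not part of this notebook. As a rough sketch of how such a run could be scripted, the function below only builds a `raxmlHPC-PTHREADS` command line (rapid bootstrap plus ML search, `-f a`) using the thread, bootstrap and outgroup settings from the template above. The binary name, the GTRGAMMA model, the seed and the helper name `raxml_command` are assumptions; this is not the author's script. RAxML names its support tree `RAxML_bipartitions.<run name>`, so a run name such as `pops15_clust90.phy` yields file names like those loaded in the plotting cells.

```python
def raxml_command(alignment, run_name, outgroup, threads=16, bootstraps=200,
                  seed=12345, model="GTRGAMMA", binary="raxmlHPC-PTHREADS"):
    """Build (but do not run) a RAxML rapid-bootstrap + ML search command."""
    return [
        binary,
        "-f", "a",                 # rapid bootstrap analysis and best-scoring ML tree
        "-m", model,               # substitution model
        "-p", str(seed),           # random seed for parsimony starting trees
        "-x", str(seed),           # random seed for rapid bootstrapping
        "-N", str(bootstraps),     # number of bootstrap replicates
        "-T", str(threads),        # number of threads (PTHREADS build only)
        "-o", ",".join(outgroup),  # comma-separated outgroup taxa, no spaces
        "-s", alignment,           # PHYLIP alignment written by ipyrad step 7
        "-n", run_name,            # suffix appended to all output file names
    ]


cmd = raxml_command(
    alignment="./Mus_Assembly/pops15_clust90_outfiles/pops15_clust90.phy",
    run_name="pops15_clust90.phy",
    outgroup=["Brimeura_amethystina_W6084"],
)
print(" ".join(cmd))
# To execute: import subprocess; subprocess.run(cmd, check=True)
```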

#### Plot RAxML `clust85` trees together

In [74]:
## Plot all six clust85 RAxML trees together

## Load trees
tre15 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust85_20210812/RAxML_bipartitions.pops_15.phy")
tre20 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust85_20210812/RAxML_bipartitions.pops_20.phy")
tre25 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust85_20210812/RAxML_bipartitions.pops_25.phy")
tre30 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust85_20210812/RAxML_bipartitions.pops_30.phy")
tre35 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust85_20210812/RAxML_bipartitions.pops_35.phy")
tre40 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust85_20210812/RAxML_bipartitions.pops_40.phy")

tre15 = tre15.root(wildcard = "Brimeura")
tre20 = tre20.root(wildcard = "Brimeura")
tre25 = tre25.root(wildcard = "Brimeura")
tre30 = tre30.root(wildcard = "Brimeura")
tre35 = tre35.root(wildcard = "Brimeura")
tre40 = tre40.root(wildcard = "Brimeura")


## set dimensions of the canvas
canvas = toyplot.Canvas(width = 2000, height = 2000)

## dissect canvas into multiple cartesian areas (x1, x2, y1, y2)
ax0 = canvas.cartesian(bounds=('2%',  '30%', '5%',  '47.5%'))
ax1 = canvas.cartesian(bounds=('33%', '63%', '5%',  '47.5%'))
ax2 = canvas.cartesian(bounds=('66%', '96%', '5%',  '47.5%'))
ax3 = canvas.cartesian(bounds=('2%',  '30%', '50%', '97.5%'))
ax4 = canvas.cartesian(bounds=('33%', '63%', '50%', '97.5%'))
ax5 = canvas.cartesian(bounds=('66%', '96%', '50%', '97.5%'))

# call draw with the 'axes' argument to pass it to a specific cartesian area
style = {
    "tip_labels_align": True,
    "tip_labels_style": {"font-size": "11px"},
    "node_labels_style":{"font-size": "12px",
                        "baseline-shift": "7px",
                        "-toyplot-anchor-shift": "-13px"},
}
tre15.ladderize(1).draw(
    axes = ax0,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre20.ladderize(1).draw(
    axes = ax1,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre25.ladderize(1).draw(
    axes = ax2,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre30.ladderize(1).draw(
    axes = ax3,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre35.ladderize(1).draw(
    axes = ax4,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre40.ladderize(1).draw(
    axes = ax5,
    **style,
    node_sizes = 0,
    node_labels = 'support');

## hide the axes (e.g., ticks and splines)
ax0.show = False; ax1.show = False; ax2.show = False;
ax3.show = False; ax4.show = False; ax5.show = False;

## add names for the single trees
canvas.text(1000, 50, '<b><i>Muscari</i></b> — RAxML — Clustering threshold 85 %', style = {"font-size": "24px"})
canvas.text(150, 125, '85 % missing data', style={"font-size": "18px"})
canvas.text(800, 125, '80 % missing data', style={"font-size": "18px"})
canvas.text(1450, 125, '75 % missing data', style={"font-size": "18px"})
canvas.text(150, 1025, '70 % missing data', style={"font-size": "18px"})
canvas.text(800, 1025, '65 % missing data', style={"font-size": "18px"})
canvas.text(1450, 1025, '60 % missing data', style={"font-size": "18px"});

In [75]:
import toyplot.pdf
toyplot.pdf.render(canvas, "/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/RAxML_Figures/Suppl-Fig_Mus_RAxML_clust85_20210812_15-20-25-30-35-40_Anno.pdf");
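
The clust85, clust90 and clust95 plotting cells differ only in the tree files and the title. A sketch of a reusable helper that reproduces the same 2 x 3 layout, label positions and style (values copied from the clust85 cell); `panel_layout` and `plot_tree_grid` are hypothetical names. toytree and toyplot are imported inside `plot_tree_grid` so the layout part can be checked without them, and panel labels follow the convention used above (missing data = 100 minus the percentage in the assembly name).

```python
# Panel geometry copied from the plotting cell above: 3 columns x 2 rows
COL_BOUNDS = [("2%", "30%"), ("33%", "63%"), ("66%", "96%")]
ROW_BOUNDS = [("5%", "47.5%"), ("50%", "97.5%")]
LABEL_X = [150, 800, 1450]
LABEL_Y = [125, 1025]

STYLE = {
    "tip_labels_align": True,
    "tip_labels_style": {"font-size": "11px"},
    "node_labels_style": {"font-size": "12px",
                          "baseline-shift": "7px",
                          "-toyplot-anchor-shift": "-13px"},
}


def panel_layout(percents):
    """Return (bounds, (x, y), label) for each panel in row-major order."""
    if len(percents) > len(COL_BOUNDS) * len(ROW_BOUNDS):
        raise ValueError("at most six panels fit in this layout")
    panels = []
    for idx, pct in enumerate(percents):
        row, col = divmod(idx, len(COL_BOUNDS))
        bounds = COL_BOUNDS[col] + ROW_BOUNDS[row]
        panels.append((bounds, (LABEL_X[col], LABEL_Y[row]),
                       f"{100 - pct} % missing data"))
    return panels


def plot_tree_grid(tree_files, title, root_wildcard="Brimeura"):
    """Draw up to six trees; tree_files is a list of (percent, path) pairs."""
    import toyplot
    import toytree

    canvas = toyplot.Canvas(width=2000, height=2000)
    canvas.text(1000, 50, title, style={"font-size": "24px"})
    layout = panel_layout([pct for pct, _ in tree_files])
    for (_, path), (bounds, (x, y), label) in zip(tree_files, layout):
        ax = canvas.cartesian(bounds=bounds)
        tree = toytree.tree(path).root(wildcard=root_wildcard)
        tree.ladderize(1).draw(axes=ax, node_sizes=0,
                               node_labels="support", **STYLE)
        ax.show = False  # hide ticks and splines
        canvas.text(x, y, label, style={"font-size": "18px"})
    return canvas


# Example (requires toytree and the tree files):
# base = "/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust90_20210816"
# files = [(p, f"{base}/RAxML_bipartitions.pops{p}_clust90.phy")
#          for p in (15, 20, 25, 30, 35, 40)]
# canvas = plot_tree_grid(files, "RAxML - Clustering threshold 90 %")
# import toyplot.pdf; toyplot.pdf.render(canvas, "clust90_grid.pdf")
```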

#### Plot RAxML `clust90` trees together

In [76]:
## Plot all six clust90 RAxML trees together

## Load trees
tre15 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust90_20210816/RAxML_bipartitions.pops15_clust90.phy")
tre20 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust90_20210816/RAxML_bipartitions.pops20_clust90.phy")
tre25 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust90_20210816/RAxML_bipartitions.pops25_clust90.phy")
tre30 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust90_20210816/RAxML_bipartitions.pops30_clust90.phy")
tre35 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust90_20210816/RAxML_bipartitions.pops35_clust90.phy")
tre40 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust90_20210816/RAxML_bipartitions.pops40_clust90.phy")

tre15 = tre15.root(wildcard = "Brimeura")
tre20 = tre20.root(wildcard = "Brimeura")
tre25 = tre25.root(wildcard = "Brimeura")
tre30 = tre30.root(wildcard = "Brimeura")
tre35 = tre35.root(wildcard = "Brimeura")
tre40 = tre40.root(wildcard = "Brimeura")


## set dimensions of the canvas
canvas = toyplot.Canvas(width = 2000, height = 2000)

## dissect canvas into multiple cartesian areas (x1, x2, y1, y2)
ax0 = canvas.cartesian(bounds=('2%',  '30%', '5%',  '47.5%'))
ax1 = canvas.cartesian(bounds=('33%', '63%', '5%',  '47.5%'))
ax2 = canvas.cartesian(bounds=('66%', '96%', '5%',  '47.5%'))
ax3 = canvas.cartesian(bounds=('2%',  '30%', '50%', '97.5%'))
ax4 = canvas.cartesian(bounds=('33%', '63%', '50%', '97.5%'))
ax5 = canvas.cartesian(bounds=('66%', '96%', '50%', '97.5%'))

# call draw with the 'axes' argument to pass it to a specific cartesian area
style = {
    "tip_labels_align": True,
    "tip_labels_style": {"font-size": "11px"},
    "node_labels_style":{"font-size": "12px",
                        "baseline-shift": "7px",
                        "-toyplot-anchor-shift": "-13px"},
}
tre15.ladderize(1).draw(
    axes = ax0,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre20.ladderize(1).draw(
    axes = ax1,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre25.ladderize(1).draw(
    axes = ax2,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre30.ladderize(1).draw(
    axes = ax3,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre35.ladderize(1).draw(
    axes = ax4,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre40.ladderize(1).draw(
    axes = ax5,
    **style,
    node_sizes = 0,
    node_labels = 'support');

# hide the axes (e.g., ticks and splines)
ax0.show = False; ax1.show = False; ax2.show = False;
ax3.show = False; ax4.show = False; ax5.show = False;

## add names for the single trees
canvas.text(1000, 50, 'RAxML — Clustering threshold 90 %', style = {"font-size": "24px"})
canvas.text(150, 125, '85 % missing data', style={"font-size": "18px"})
canvas.text(800, 125, '80 % missing data', style={"font-size": "18px"})
canvas.text(1450, 125, '75 % missing data', style={"font-size": "18px"})
canvas.text(150, 1025, '70 % missing data', style={"font-size": "18px"})
canvas.text(800, 1025, '65 % missing data', style={"font-size": "18px"})
canvas.text(1450, 1025, '60 % missing data', style={"font-size": "18px"});

In [77]:
import toyplot.pdf
toyplot.pdf.render(canvas, "/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/RAxML_Figures/Suppl-Fig_Mus_RAxML_clust90_20210816_15-20-25-30-35-40_Anno.pdf");

#### Plot RAxML `clust95` trees together

In [79]:
# Plot all six clust95 RAxML trees together

## Load trees
tre15 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust95_20210823/RAxML_bipartitions.pops15_clust95.phy")
tre20 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust95_20210823/RAxML_bipartitions.pops20_clust95.phy")
tre25 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust95_20210823/RAxML_bipartitions.pops25_clust95.phy")
tre30 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust95_20210823/RAxML_bipartitions.pops30_clust95.phy")
tre35 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust95_20210823/RAxML_bipartitions.pops35_clust95.phy")
tre40 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust95_20210823/RAxML_bipartitions.pops40_clust95.phy")

tre15 = tre15.root(wildcard = "Brimeura")
tre20 = tre20.root(wildcard = "Brimeura")
tre25 = tre25.root(wildcard = "Brimeura")
tre30 = tre30.root(wildcard = "Brimeura")
tre35 = tre35.root(wildcard = "Brimeura")
tre40 = tre40.root(wildcard = "Brimeura")


## set dimensions of the canvas
canvas = toyplot.Canvas(width = 2000, height = 2000)

## dissect canvas into multiple cartesian areas (x1, x2, y1, y2)
ax0 = canvas.cartesian(bounds=('2%',  '30%', '5%',  '47.5%'))
ax1 = canvas.cartesian(bounds=('33%', '63%', '5%',  '47.5%'))
ax2 = canvas.cartesian(bounds=('66%', '96%', '5%',  '47.5%'))
ax3 = canvas.cartesian(bounds=('2%',  '30%', '50%', '97.5%'))
ax4 = canvas.cartesian(bounds=('33%', '63%', '50%', '97.5%'))
ax5 = canvas.cartesian(bounds=('66%', '96%', '50%', '97.5%'))

# call draw with the 'axes' argument to pass it to a specific cartesian area
style = {
    "tip_labels_align": True,
    "tip_labels_style": {"font-size": "11px"},
    "node_labels_style":{"font-size": "12px",
                        "baseline-shift": "7px",
                        "-toyplot-anchor-shift": "-13px"},
}
tre15.ladderize(1).draw(
    axes = ax0,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre20.ladderize(1).draw(
    axes = ax1,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre25.ladderize(1).draw(
    axes = ax2,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre30.ladderize(1).draw(
    axes = ax3,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre35.ladderize(1).draw(
    axes = ax4,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tre40.ladderize(1).draw(
    axes = ax5,
    **style,
    node_sizes = 0,
    node_labels = 'support');

# hide the axes (e.g., ticks and splines)
ax0.show = False; ax1.show = False; ax2.show = False;
ax3.show = False; ax4.show = False; ax5.show = False;

## add names for the single trees
canvas.text(1000, 50, 'RAxML — Clustering threshold 95 %', style = {"font-size": "24px"})
canvas.text(150, 125, '85 % missing data', style={"font-size": "18px"})
canvas.text(800, 125, '80 % missing data', style={"font-size": "18px"})
canvas.text(1450, 125, '75 % missing data', style={"font-size": "18px"})
canvas.text(150, 1025, '70 % missing data', style={"font-size": "18px"})
canvas.text(800, 1025, '65 % missing data', style={"font-size": "18px"})
canvas.text(1450, 1025, '60 % missing data', style={"font-size": "18px"});

In [80]:
import toyplot.pdf
toyplot.pdf.render(canvas, "/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/RAxML_Figures/Suppl-Fig_Mus_RAxML_clust95_20210823_15-20-25-30-35-40_Anno.pdf");

### tetRAD
##### run a single tetRAD analysis

In [36]:
# the path to your sequence data in HDF5 format
# (stored as snps_file so the Assembly variable `data` is not overwritten)
snps_file = "/home/tim/GBS/Muscari/Mus_Assembly/pops_15_outfiles/pops_15.snps.hdf5"

In [12]:
# init analysis object with input data and (optional) parameter options
tet = ipa.tetrad(
    name = "Mus_pops_15",
    data = snps_file,
    workdir = "./Mus_Analysis/Mus_tetRAD",
    nquartets = 1e6,
    nboots = 200,
)

loading snps array [44 taxa x 114197 snps]
max unlinked SNPs per quartet [nloci]: 14705
quartet sampler [full]: 135751 / 135751


In [13]:
tet.run(auto = True, force = True)

Parallel connection | Cryptantha: 64 cores
initializing quartet sets database
[####################] 100% 0:00:07 | full tree * | avg SNPs/qrt: 1014 
[####################] 100% 0:00:04 | boot rep. 1 | avg SNPs/qrt: 1017 
Keyboard Interrupt by user



#### Run multiple tetRAD analyses in a for loop

##### Run tetRAD with clustering threshold `clust85` & Plot trees together

In [None]:
## store the path to each *.snps.hdf5 file in a dictionary, keyed by assembly name
hdf5_files = {
    "pop15": "/home/tim/GBS/Muscari/Mus_Assembly/pops15_clust85_outfiles/pops15_clust85.snps.hdf5",
    "pop20": "/home/tim/GBS/Muscari/Mus_Assembly/pops20_clust85_outfiles/pops20_clust85.snps.hdf5",
    "pop25": "/home/tim/GBS/Muscari/Mus_Assembly/pops25_clust85_outfiles/pops25_clust85.snps.hdf5",
    "pop30": "/home/tim/GBS/Muscari/Mus_Assembly/pops30_clust85_outfiles/pops30_clust85.snps.hdf5",
    "pop35": "/home/tim/GBS/Muscari/Mus_Assembly/pops35_clust85_outfiles/pops35_clust85.snps.hdf5",
    "pop40": "/home/tim/GBS/Muscari/Mus_Assembly/pops40_clust85_outfiles/pops40_clust85.snps.hdf5"
}

In [None]:
## Iterate through the dictionary and run a tetRAD analysis for each assembly
## (names yield the Mus_tet_popXX tree files loaded in the plotting cell below)

for key, value in hdf5_files.items():
    tet = ipa.tetrad(
        name = "Mus_tet_" + str(key),
        data = value,
        workdir = "./Mus_Analysis/Mus_tetRAD/tet_clust85",
        nquartets = 1e6,
        nboots = 200)
    ## run
    tet.run(auto = True, force = True)
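The plotting cells load each tree from `<workdir>/<name><suffix>`, with the suffixes `.tree`, `.tree.cons` and `.tree.boots`. Assuming tetrad names its output files this way (inferred from the paths used in this notebook, not from ipyrad's documentation), a small check can build the expected paths and confirm they exist before plotting. This catches a mismatch between the `name` given to `ipa.tetrad` and the path passed to `toytree.tree`. `tetrad_tree_paths` is a hypothetical helper.

```python
from pathlib import Path

def tetrad_tree_paths(workdir, name_prefix, keys, suffix=".tree.cons"):
    """Hypothetical helper: map each assembly key to its expected tree file."""
    return {key: Path(workdir) / (name_prefix + key + suffix) for key in keys}

keys = ["pop15", "pop20", "pop25", "pop30", "pop35", "pop40"]
paths = tetrad_tree_paths("./Mus_Analysis/Mus_tetRAD/tet_clust90", "Mus_tet_clust90_", keys)
for key, path in paths.items():
    # report missing files here instead of failing later inside toytree.tree()
    print(key, path, "ok" if path.exists() else "MISSING")
```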

In [None]:
## Plot all six clust85 tetRAD coalescent trees together
## Load trees
tet15 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust85/Mus_tet_pop15.tree.cons").root(wildcard = "Brimeura")
tet20 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust85/Mus_tet_pop20.tree.cons").root(wildcard = "Brimeura")
tet25 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust85/Mus_tet_pop25.tree.cons").root(wildcard = "Brimeura")
tet30 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust85/Mus_tet_pop30.tree.cons").root(wildcard = "Brimeura")
tet35 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust85/Mus_tet_pop35.tree.cons").root(wildcard = "Brimeura")
tet40 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust85/Mus_tet_pop40.tree.cons").root(wildcard = "Brimeura")

## set dimensions of the canvas
canvas = toyplot.Canvas(width = 2000, height = 2000)

## dissect canvas into multiple cartesian areas (x1, x2, y1, y2)
ax0 = canvas.cartesian(bounds=('2%',  '30%', '5%',  '47.5%'))
ax1 = canvas.cartesian(bounds=('33%', '63%', '5%',  '47.5%'))
ax2 = canvas.cartesian(bounds=('66%', '96%', '5%',  '47.5%'))
ax3 = canvas.cartesian(bounds=('2%',  '30%', '50%', '97.5%'))
ax4 = canvas.cartesian(bounds=('33%', '63%', '50%', '97.5%'))
ax5 = canvas.cartesian(bounds=('66%', '96%', '50%', '97.5%'))

## define the style once and reuse it for every tree
style = {
    "tip_labels_align": True,
    "tip_labels_style": {"font-size": "11px"},
    "node_labels_style":{"font-size": "12px",
                        "baseline-shift": "7px",
                        "-toyplot-anchor-shift": "-13px"},
}
tet15.ladderize(1).draw(
    axes = ax0,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet20.ladderize(1).draw(
    axes = ax1,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet25.ladderize(1).draw(
    axes = ax2,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet30.ladderize(1).draw(
    axes = ax3,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet35.ladderize(1).draw(
    axes = ax4,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet40.ladderize(1).draw(
    axes = ax5,
    **style,
    node_sizes = 0,
    node_labels = 'support');

## hide the axes (e.g., ticks and spines)
ax0.show = False; ax1.show = False; ax2.show = False;
ax3.show = False; ax4.show = False; ax5.show = False;

## add names for the single trees
canvas.text(1000, 50, 'tetRAD/SVDQuartet — Clustering threshold 85 %', style = {"font-size": "24px"})
canvas.text(150, 125, '85 % missing data', style={"font-size": "18px"})
canvas.text(800, 125, '80 % missing data', style={"font-size": "18px"})
canvas.text(1450, 125, '75 % missing data', style={"font-size": "18px"})
canvas.text(150, 1025, '70 % missing data', style={"font-size": "18px"})
canvas.text(800, 1025, '65 % missing data', style={"font-size": "18px"})
canvas.text(1450, 1025, '60 % missing data', style={"font-size": "18px"});
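The six-panel figures in this section repeat the same layout for every clustering threshold. That geometry could live in one place. The sketch below regenerates the `bounds` tuples (x1, x2, y1, y2), the title anchor points, and the titles used in the cell above, filling panels row by row. The title rule, "(100 − N) % missing data" for assembly `popsN`, is read off the panel titles in these cells and is not taken from ipyrad. `panel_layout` is a hypothetical helper.

```python
# Extents copied from the canvas.cartesian(...) calls above, as (x1, x2) and (y1, y2)
COLS = [("2%", "30%"), ("33%", "63%"), ("66%", "96%")]
ROWS = [("5%", "47.5%"), ("50%", "97.5%")]
# Title anchor points copied from the canvas.text(...) calls above
TITLE_X = [150, 800, 1450]
TITLE_Y = [125, 1025]

def panel_layout(pops=(15, 20, 25, 30, 35, 40)):
    """Hypothetical helper: return (pop, bounds, title_xy, title) per panel, row by row."""
    panels = []
    for i, pop in enumerate(pops):
        row, col = divmod(i, len(COLS))
        bounds = COLS[col] + ROWS[row]          # (x1, x2, y1, y2)
        title_xy = (TITLE_X[col], TITLE_Y[row])
        # title rule read off the panel titles: popsN -> (100 - N) % missing data
        title = "{} % missing data".format(100 - pop)
        panels.append((pop, bounds, title_xy, title))
    return panels

for panel in panel_layout():
    print(panel)
```

Each tuple could then drive one loop over `canvas.cartesian(bounds=bounds)`, `tree.ladderize(1).draw(axes=ax, **style, node_sizes=0, node_labels='support')` and `canvas.text(x, y, title, style={"font-size": "18px"})`. That loop would replace the six copies of each call.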

In [None]:
import toyplot.pdf
toyplot.pdf.render(canvas, "/home/tim/GBS/Muscari/Mus_Analysis/Mus_tetRAD/tetRAD_Figures/Suppl-Fig_Mus_tetRAD-consens_clust85_20210811_15-20-25-30-35-40_Anno.pdf");

##### Run tetRAD with clustering threshold `clust90` & Plot trees together

In [None]:
## store the path to each *.snps.hdf5 file in a dictionary, keyed by assembly name
hdf5_files = {
    "pop15": "/home/tim/GBS/Muscari/Mus_Assembly/pops15_clust90_outfiles/pops15_clust90.snps.hdf5",
    "pop20": "/home/tim/GBS/Muscari/Mus_Assembly/pops20_clust90_outfiles/pops20_clust90.snps.hdf5",
    "pop25": "/home/tim/GBS/Muscari/Mus_Assembly/pops25_clust90_outfiles/pops25_clust90.snps.hdf5",
    "pop30": "/home/tim/GBS/Muscari/Mus_Assembly/pops30_clust90_outfiles/pops30_clust90.snps.hdf5",
    "pop35": "/home/tim/GBS/Muscari/Mus_Assembly/pops35_clust90_outfiles/pops35_clust90.snps.hdf5",
    "pop40": "/home/tim/GBS/Muscari/Mus_Assembly/pops40_clust90_outfiles/pops40_clust90.snps.hdf5"
}

In [None]:
## Iterate through the dictionary and run a tetRAD analysis for each assembly

for key, value in hdf5_files.items():
    tet = ipa.tetrad(
        name = "Mus_tet_clust90_" + str(key),
        data = value,
        workdir = "./Mus_Analysis/Mus_tetRAD/tet_clust90",
        nquartets = 1e6,
        nboots = 200)
    ## run
    tet.run(auto = True, force = True)

In [None]:
## Plot all six clust90 tetRAD coalescent trees together
## Load trees
tet15 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust90/Mus_tet_clust90_pop15.tree.cons").root(wildcard = "Brimeura")
tet20 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust90/Mus_tet_clust90_pop20.tree.cons").root(wildcard = "Brimeura")
tet25 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust90/Mus_tet_clust90_pop25.tree.cons").root(wildcard = "Brimeura")
tet30 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust90/Mus_tet_clust90_pop30.tree.cons").root(wildcard = "Brimeura")
tet35 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust90/Mus_tet_clust90_pop35.tree.cons").root(wildcard = "Brimeura")
tet40 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust90/Mus_tet_clust90_pop40.tree.cons").root(wildcard = "Brimeura")

## set dimensions of the canvas
canvas = toyplot.Canvas(width = 2000, height = 2000)

## dissect canvas into multiple cartesian areas (x1, x2, y1, y2)
ax0 = canvas.cartesian(bounds=('2%',  '30%', '5%',  '47.5%'))
ax1 = canvas.cartesian(bounds=('33%', '63%', '5%',  '47.5%'))
ax2 = canvas.cartesian(bounds=('66%', '96%', '5%',  '47.5%'))
ax3 = canvas.cartesian(bounds=('2%',  '30%', '50%', '97.5%'))
ax4 = canvas.cartesian(bounds=('33%', '63%', '50%', '97.5%'))
ax5 = canvas.cartesian(bounds=('66%', '96%', '50%', '97.5%'))

## define the style once and reuse it for every tree
style = {
    "tip_labels_align": True,
    "tip_labels_style": {"font-size": "11px"},
    "node_labels_style":{"font-size": "12px",
                        "baseline-shift": "7px",
                        "-toyplot-anchor-shift": "-13px"},
}
tet15.ladderize(1).draw(
    axes = ax0,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet20.ladderize(1).draw(
    axes = ax1,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet25.ladderize(1).draw(
    axes = ax2,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet30.ladderize(1).draw(
    axes = ax3,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet35.ladderize(1).draw(
    axes = ax4,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet40.ladderize(1).draw(
    axes = ax5,
    **style,
    node_sizes = 0,
    node_labels = 'support');

## hide the axes (e.g., ticks and spines)
ax0.show = False; ax1.show = False; ax2.show = False;
ax3.show = False; ax4.show = False; ax5.show = False;

## add names for the single trees
canvas.text(1000, 50, 'tetRAD/SVDQuartet — Clustering threshold 90 %', style = {"font-size": "24px"})
canvas.text(150, 125, '85 % missing data', style={"font-size": "18px"})
canvas.text(800, 125, '80 % missing data', style={"font-size": "18px"})
canvas.text(1450, 125, '75 % missing data', style={"font-size": "18px"})
canvas.text(150, 1025, '70 % missing data', style={"font-size": "18px"})
canvas.text(800, 1025, '65 % missing data', style={"font-size": "18px"})
canvas.text(1450, 1025, '60 % missing data', style={"font-size": "18px"});

In [None]:
import toyplot.pdf
toyplot.pdf.render(canvas, "/home/tim/GBS/Muscari/Mus_Analysis/Mus_tetRAD/tetRAD_Figures/Suppl-Fig_Mus_tetRAD-consens_clust90_20210816_15-20-25-30-35-40_Anno.pdf");

##### Run tetRAD with clustering threshold `clust95` & Plot trees together

In [4]:
## store the path to each *.snps.hdf5 file in a dictionary, keyed by assembly name
hdf5_files = {
    "pop15": "/home/tim/GBS/Muscari/Mus_Assembly/pops15_clust95_outfiles/pops15_clust95.snps.hdf5",
    "pop20": "/home/tim/GBS/Muscari/Mus_Assembly/pops20_clust95_outfiles/pops20_clust95.snps.hdf5",
    "pop25": "/home/tim/GBS/Muscari/Mus_Assembly/pops25_clust95_outfiles/pops25_clust95.snps.hdf5",
    "pop30": "/home/tim/GBS/Muscari/Mus_Assembly/pops30_clust95_outfiles/pops30_clust95.snps.hdf5",
    "pop35": "/home/tim/GBS/Muscari/Mus_Assembly/pops35_clust95_outfiles/pops35_clust95.snps.hdf5",
    "pop40": "/home/tim/GBS/Muscari/Mus_Assembly/pops40_clust95_outfiles/pops40_clust95.snps.hdf5"
}

In [5]:
## Iterate through the dictionary and run a tetRAD analysis for each assembly

for key, value in hdf5_files.items():
    tet = ipa.tetrad(
        name = "Mus_tet_clust95_" + str(key),
        data = value,
        workdir = "./Mus_Analysis/Mus_tetRAD/tet_clust95",
        nquartets = 1e6,
        nboots = 200)
    ## run
    tet.run(auto = True, force = True)

loading snps array [43 taxa x 131859 snps]
max unlinked SNPs per quartet [nloci]: 21083
quartet sampler [full]: 123410 / 123410
Parallel connection | Cryptantha: 64 cores
initializing quartet sets database
[####################] 100% 0:00:09 | full tree * | avg SNPs/qrt: 1051 
[####################] 100% 0:00:05 | boot rep. 1 | avg SNPs/qrt: 1069 
[####################] 100% 0:00:04 | boot rep. 2 | avg SNPs/qrt: 1091 
[####################] 100% 0:00:05 | boot rep. 3 | avg SNPs/qrt: 1061 
[####################] 100% 0:00:04 | boot rep. 4 | avg SNPs/qrt: 1026 
[####################] 100% 0:00:04 | boot rep. 5 | avg SNPs/qrt: 1074 
[####################] 100% 0:00:05 | boot rep. 6 | avg SNPs/qrt: 1073 
[####################] 100% 0:00:05 | boot rep. 7 | avg SNPs/qrt: 1045 
[####################] 100% 0:00:05 | boot rep. 8 | avg SNPs/qrt: 1020 
[####################] 100% 0:00:04 | boot rep. 9 | avg SNPs/qrt: 1056 
[####################] 100% 0:00:05 | boot rep. 10 | avg SNPs/qrt: 1089 

[####################] 100% 0:00:02 | boot rep. 16 | avg SNPs/qrt: 1056 
[####################] 100% 0:00:03 | boot rep. 17 | avg SNPs/qrt: 1056 
[####################] 100% 0:00:02 | boot rep. 18 | avg SNPs/qrt: 1059 
[####################] 100% 0:00:02 | boot rep. 19 | avg SNPs/qrt: 1037 
[####################] 100% 0:00:02 | boot rep. 20 | avg SNPs/qrt: 1052 
[####################] 100% 0:00:02 | boot rep. 21 | avg SNPs/qrt: 1063 
[####################] 100% 0:00:02 | boot rep. 22 | avg SNPs/qrt: 1073 
[####################] 100% 0:00:02 | boot rep. 23 | avg SNPs/qrt: 1064 
[####################] 100% 0:00:02 | boot rep. 24 | avg SNPs/qrt: 1033 
[####################] 100% 0:00:02 | boot rep. 25 | avg SNPs/qrt: 1061 
[####################] 100% 0:00:03 | boot rep. 26 | avg SNPs/qrt: 1033 
[####################] 100% 0:00:02 | boot rep. 27 | avg SNPs/qrt: 1029 
[####################] 100% 0:00:02 | boot rep. 28 | avg SNPs/qrt: 1025 

[####################] 100% 0:00:02 | boot rep. 34 | avg SNPs/qrt: 1019 
[####################] 100% 0:00:02 | boot rep. 35 | avg SNPs/qrt: 1061 
[####################] 100% 0:00:02 | boot rep. 36 | avg SNPs/qrt: 988 
[####################] 100% 0:00:02 | boot rep. 37 | avg SNPs/qrt: 1035 
[####################] 100% 0:00:02 | boot rep. 38 | avg SNPs/qrt: 1092 
[####################] 100% 0:00:02 | boot rep. 39 | avg SNPs/qrt: 1075 
[####################] 100% 0:00:02 | boot rep. 40 | avg SNPs/qrt: 1078 
[####################] 100% 0:00:02 | boot rep. 41 | avg SNPs/qrt: 1010 
[####################] 100% 0:00:02 | boot rep. 42 | avg SNPs/qrt: 1029 
[####################] 100% 0:00:02 | boot rep. 43 | avg SNPs/qrt: 1020 
[####################] 100% 0:00:02 | boot rep. 44 | avg SNPs/qrt: 1026 
[####################] 100% 0:00:02 | boot rep. 45 | avg SNPs/qrt: 1076 
[####################] 100% 0:00:02 | boot rep. 46 | avg SNPs/qrt: 1002 

[####################] 100% 0:00:02 | boot rep. 53 | avg SNPs/qrt: 1012 
[####################] 100% 0:00:02 | boot rep. 54 | avg SNPs/qrt: 1040 
[####################] 100% 0:00:02 | boot rep. 55 | avg SNPs/qrt: 996 
[####################] 100% 0:00:02 | boot rep. 56 | avg SNPs/qrt: 1049 
[####################] 100% 0:00:02 | boot rep. 57 | avg SNPs/qrt: 1044 
[####################] 100% 0:00:02 | boot rep. 58 | avg SNPs/qrt: 1030 
[####################] 100% 0:00:02 | boot rep. 59 | avg SNPs/qrt: 1045 
[####################] 100% 0:00:02 | boot rep. 60 | avg SNPs/qrt: 1065 
[####################] 100% 0:00:02 | boot rep. 61 | avg SNPs/qrt: 1007 
[####################] 100% 0:00:02 | boot rep. 62 | avg SNPs/qrt: 959 
[####################] 100% 0:00:02 | boot rep. 63 | avg SNPs/qrt: 1002 
[####################] 100% 0:00:02 | boot rep. 64 | avg SNPs/qrt: 1069 
[####################] 100% 0:00:02 | boot rep. 65 | avg SNPs/qrt: 977 

[####################] 100% 0:00:01 | boot rep. 72 | avg SNPs/qrt: 1021 
[####################] 100% 0:00:02 | boot rep. 73 | avg SNPs/qrt: 1020 
[####################] 100% 0:00:02 | boot rep. 74 | avg SNPs/qrt: 1015 
[####################] 100% 0:00:02 | boot rep. 75 | avg SNPs/qrt: 991 
[####################] 100% 0:00:02 | boot rep. 76 | avg SNPs/qrt: 979 
[####################] 100% 0:00:02 | boot rep. 77 | avg SNPs/qrt: 1013 
[####################] 100% 0:00:02 | boot rep. 78 | avg SNPs/qrt: 1010 
[####################] 100% 0:00:02 | boot rep. 79 | avg SNPs/qrt: 1014 
[####################] 100% 0:00:02 | boot rep. 80 | avg SNPs/qrt: 1014 
[####################] 100% 0:00:02 | boot rep. 81 | avg SNPs/qrt: 980 
[####################] 100% 0:00:02 | boot rep. 82 | avg SNPs/qrt: 1000 
[####################] 100% 0:00:02 | boot rep. 83 | avg SNPs/qrt: 985 
[####################] 100% 0:00:02 | boot rep. 84 | avg SNPs/qrt: 984 

[####################] 100% 0:00:01 | boot rep. 92 | avg SNPs/qrt: 1000 
[####################] 100% 0:00:02 | boot rep. 93 | avg SNPs/qrt: 975 
[####################] 100% 0:00:02 | boot rep. 94 | avg SNPs/qrt: 999 
[####################] 100% 0:00:02 | boot rep. 95 | avg SNPs/qrt: 970 
[####################] 100% 0:00:02 | boot rep. 96 | avg SNPs/qrt: 962 
[####################] 100% 0:00:02 | boot rep. 97 | avg SNPs/qrt: 947 
[####################] 100% 0:00:02 | boot rep. 98 | avg SNPs/qrt: 950 
[####################] 100% 0:00:02 | boot rep. 99 | avg SNPs/qrt: 955 
[####################] 100% 0:00:02 | boot rep. 100 | avg SNPs/qrt: 944 
[####################] 100% 0:00:02 | boot rep. 101 | avg SNPs/qrt: 972 
[####################] 100% 0:00:02 | boot rep. 102 | avg SNPs/qrt: 938 
[####################] 100% 0:00:02 | boot rep. 103 | avg SNPs/qrt: 983 
[####################] 100% 0:00:02 | boot rep. 104 | avg SNPs/qrt: 973 

In [6]:
## Plot all six clust95 tetRAD coalescent trees together
## Load trees
tet15 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust95/Mus_tet_clust95_pop15.tree.cons").root(wildcard = "Brimeura")
tet20 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust95/Mus_tet_clust95_pop20.tree.cons").root(wildcard = "Brimeura")
tet25 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust95/Mus_tet_clust95_pop25.tree.cons").root(wildcard = "Brimeura")
tet30 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust95/Mus_tet_clust95_pop30.tree.cons").root(wildcard = "Brimeura")
tet35 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust95/Mus_tet_clust95_pop35.tree.cons").root(wildcard = "Brimeura")
tet40 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust95/Mus_tet_clust95_pop40.tree.cons").root(wildcard = "Brimeura")

## set dimensions of the canvas
canvas = toyplot.Canvas(width = 2000, height = 2000)

## dissect canvas into multiple cartesian areas (x1, x2, y1, y2)
ax0 = canvas.cartesian(bounds=('2%',  '30%', '5%',  '47.5%'))
ax1 = canvas.cartesian(bounds=('33%', '63%', '5%',  '47.5%'))
ax2 = canvas.cartesian(bounds=('66%', '96%', '5%',  '47.5%'))
ax3 = canvas.cartesian(bounds=('2%',  '30%', '50%', '97.5%'))
ax4 = canvas.cartesian(bounds=('33%', '63%', '50%', '97.5%'))
ax5 = canvas.cartesian(bounds=('66%', '96%', '50%', '97.5%'))

## define the style once and reuse it for every tree
style = {
    "tip_labels_align": True,
    "tip_labels_style": {"font-size": "11px"},
    "node_labels_style":{"font-size": "12px",
                        "baseline-shift": "7px",
                        "-toyplot-anchor-shift": "-13px"},
}
tet15.ladderize(1).draw(
    axes = ax0,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet20.ladderize(1).draw(
    axes = ax1,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet25.ladderize(1).draw(
    axes = ax2,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet30.ladderize(1).draw(
    axes = ax3,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet35.ladderize(1).draw(
    axes = ax4,
    **style,
    node_sizes = 0,
    node_labels = 'support');

tet40.ladderize(1).draw(
    axes = ax5,
    **style,
    node_sizes = 0,
    node_labels = 'support');

## hide the axes (e.g., ticks and spines)
ax0.show = False; ax1.show = False; ax2.show = False;
ax3.show = False; ax4.show = False; ax5.show = False;

## add names for the single trees
canvas.text(1000, 50, 'tetRAD/SVDQuartet — Clustering threshold 95 %', style = {"font-size": "24px"})
canvas.text(150, 125, '85 % missing data', style={"font-size": "18px"})
canvas.text(800, 125, '80 % missing data', style={"font-size": "18px"})
canvas.text(1450, 125, '75 % missing data', style={"font-size": "18px"})
canvas.text(150, 1025, '70 % missing data', style={"font-size": "18px"})
canvas.text(800, 1025, '65 % missing data', style={"font-size": "18px"})
canvas.text(1450, 1025, '60 % missing data', style={"font-size": "18px"});

In [7]:
import toyplot.pdf
toyplot.pdf.render(canvas, "/home/tim/GBS/Muscari/Mus_Analysis/Mus_tetRAD/tetRAD_Figures/Suppl-Fig_Mus_tetRAD-consens_clust95_20210824_15-20-25-30-35-40_Anno.pdf");

#### Plot tetRAD trees
##### Plot cloud tree with custom tip order

In [12]:
treeorder = ["Brimeura_amethystina_W6084", "Bellevalia_paradoxa_ED1272",
           "Bellevalia_dubia_W6083", "Bellevalia_speciosa_W6085",
           "Muscari_macrocarpum_ED1252", "Muscari_racemosum_ED1258",
           "Pseudomuscari_chalusicum_ED1255", "Pseudomuscari_azureum_ED1270",
           "Pseudomuscari_inconstrictum_ED3234", "Muscari_parviflorum_ED1245",
           "Muscari_commutatum_ED3538", "Muscari_sivrihisardaghlarensis_ED1278",
           "Muscari_vularlii_ED3232", "Muscari_anatolicum_W6087",
           "Muscari_discolor_ED1266", "Pseudomuscari_coeruleum_ED1261",
           "Pseudomuscari_pallens_ED1267", "Muscari_adilii_W6090",
           "Muscari_armeniacum_ED1244", "Muscari_armeniacum_W6089",
           "Muscari_neglectum_ED1253", "Muscari_baeticum_ED1281", 
           "Muscari_neglectum_ED1254", "Muscari_botryoides_ED1279",
           "Muscari_pulchellum_ED3231", "Muscari_kerkis_ED1280",
           "Muscari_bourgaei_ED1259", "Muscari_latifolium_ED1265",
           "Leopoldia_tenuiflora_ED1263", "Leopoldia_longipes_ED3233",
           "Muscari_massayanum_ED1251", "Leopoldia_neumannii_ED1607",
           "Leopoldia_neumannii_ED1243", "Muscari_mirum_ED1250",
           "Leopoldia_matritensis_ED1282", "Leopoldia_spreitzenhoferi_ED1248",
           "Leopoldia_cycladica_W6082", "Leopoldia_weissii_W6081",
           "Leopoldia_caucasica_ED1262", "Leopoldia_comosa_ED3539",
           "Leopoldia_comosa_ED3965", "Leopoldia_comosa_ED1274", "Leopoldia_comosa_ED1256"]
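`draw_cloud_tree(fixed_order=...)` is expected to receive a tip order that names every sample exactly once. A typo, or a sample missing from one assembly, is easy to overlook in a 43-name list. A plain-Python check can catch both. `check_fixed_order` is a hypothetical helper, not part of toytree.

```python
from collections import Counter

def check_fixed_order(order, tip_labels):
    """Hypothetical helper: compare a fixed tip order with a tree's tip labels.

    Returns (duplicates, missing, unknown): names listed more than once,
    tips absent from the order, and names in the order that are not tips.
    """
    duplicates = sorted(name for name, n in Counter(order).items() if n > 1)
    missing = sorted(set(tip_labels) - set(order))
    unknown = sorted(set(order) - set(tip_labels))
    return duplicates, missing, unknown

# toy example: "b" is listed twice and "c" is missing from the order
print(check_fixed_order(["a", "b", "b"], ["a", "b", "c"]))
```

With toytree 2.x, the tip names could come from `tetcloud30.treelist[0].get_tip_labels()`. All three returned lists should be empty before plotting.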

In [4]:
## Load the 200 bootstrap trees from the clust90 pops30 tetRAD analysis and root them
tetcloud30 = toytree.mtree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_tetRAD/tet_clust90/Mus_tet_clust90_pop30.tree.boots")
tetcloud30.treelist = [i.root(["Brimeura_amethystina_W6084"]) for i in tetcloud30.treelist]

## plot the rooted bootstrap trees as a cloud tree
canvas, axes, mark = tetcloud30.draw_cloud_tree(
    height = 600,
    width = 400,
    
    ## use a fixed tip order to make it comparable with the consensus tree
    fixed_order = treeorder,
    use_edge_lengths = False,
    edge_style = {"stroke-opacity": 0.05,
                  "stroke-width": 1}
);


##### Plot consensus tree against cloud tree

In [18]:
## Load the tetRAD consensus tree (clust90, pops30) and root it with Brimeura
constree30 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust90/Mus_tet_clust90_pop30.tree.cons" ).root(wildcard = "Brimeura")

## Load the tetRAD bootstrap trees and root them with Brimeura
tetcloud30 = toytree.mtree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_tetRAD/tet_clust90/Mus_tet_clust90_pop30.tree.boots")
tetcloud30.treelist = [i.root(["Brimeura_amethystina_W6084"]) for i in tetcloud30.treelist]

## set dimensions of the canvas
canvas = toyplot.Canvas(width = 1300, height = 900)

## dissect canvas into multiple cartesian areas (x1, x2, y1, y2)
ax0 = canvas.cartesian(bounds=('5%',  '47.5%', '5%',  '95%'))
ax1 = canvas.cartesian(bounds=('52.5%', '95%', '5%',  '95%'))

# call draw with the 'axes' argument to pass it to a specific cartesian area
style = {"tip_labels_align": True,
         "tip_labels_style": {"font-size": "12px"},
         "node_labels_style":{"font-size": "12px",
                              "baseline-shift": "7px",
                              "-toyplot-anchor-shift": "-13px"},
}

cstyle = {"tip_labels_align": True,
          "layout": 'l',
          "tip_labels_style": {"font-size": "12px"},
          "node_labels_style":{"font-size": "12px",
                               "baseline-shift": "7px",
                               "-toyplot-anchor-shift": "-13px"},
}

constree30.ladderize(1).draw(
    axes = ax0,
    **style,
    node_sizes = 0,
    node_labels = 'support');

## plot the rooted bootstrap trees as a cloud tree
tetcloud30.draw_cloud_tree(
    axes = ax1,
    fixed_order = treeorder,  ## fixed tip order, comparable with the consensus tree
    **cstyle,
    use_edge_lengths = False,
    #tip_labels = False,
    edge_style = {"stroke-opacity": 0.05,
                  "stroke-width": 1}
);

# hide the axes (e.g., ticks and spines)
ax0.show = False; ax1.show = False;

In [19]:
import toyplot.pdf
toyplot.pdf.render(canvas, "/home/tim/GBS/Muscari/Mus_Analysis/Mus_tetRAD/tetRAD_Figures/Fig_Mus_tet_clust90_cons-cloud_20210816_pops30.pdf");

##### plot RAxML tree against tetRAD consensus tree

In [44]:
## Load the tetRAD consensus tree (clust85, pops20) and root it with Brimeura
#constree30 = toytree.tree("./Mus_Analysis/Mus_tetRAD/tet_clust90/Mus_tet_clust90_pop30.tree.cons" ).root(wildcard = "Brimeura")
constree30 = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_tetRAD/tet_clust85/Mus_tet_pop20.tree.cons").root(wildcard = "Brimeura")

tre = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust85_20210812/RAxML_bipartitions.pops_20.phy")
rtre = tre.root(wildcard = "Brimeura")

## Define the leucantha clade to be rotated in the tree
comosa = ["Leopoldia_cycladica_W6082", "Leopoldia_weissii_W6081", 
          "Leopoldia_spreitzenhoferi_ED1248", "Leopoldia_matritensis_ED1282", "Leopoldia_caucasica_ED1262"]

## set dimensions of the canvas
canvas = toyplot.Canvas(width = 1400, height = 900)

## dissect canvas into multiple cartesian areas (x1, x2, y1, y2)
ax0 = canvas.cartesian(bounds=('5%',  '60%', '5%',  '95%'))
ax1 = canvas.cartesian(bounds=('57.5%', '95%', '5%',  '95%'))

# call draw with the 'axes' argument to pass it to a specific cartesian area
style = {"tip_labels_align": True,
         "tip_labels_style": {"font-size": "12px"},
         "node_labels_style":{"font-size": "12px",
                              "baseline-shift": "7px",
                              "-toyplot-anchor-shift": "-13px"},
}

cstyle = {"tip_labels_align": True,
          "layout": 'l',
          "tip_labels_style": {"font-size": "12px"},
          "node_labels_style":{"font-size": "12px",
                               "baseline-shift": "7px",
                               "-toyplot-anchor-shift": "13px"},
}

#rotate_node(wildcard = "comosa").

rtre.ladderize(1).draw(
    axes = ax0,
    **style,
    node_labels = 'support',
    node_sizes = 0,
    );



constree30.ladderize(1).draw(
    axes = ax1,
    **cstyle,
    node_sizes = 0,
    node_labels = 'support');

# hide the axes (e.g., ticks and spines)
ax0.show = False; ax1.show = False;

In [45]:
import toyplot.pdf
toyplot.pdf.render(canvas, "/home/tim/GBS/Muscari/Mus_Analysis/FiguresForPaper/Fig_Mus_RAxML_tet_clust85_pops20.pdf");

##### plot all three tetRAD trees against each other

In [27]:
## Load the tetRAD full-data and consensus trees (pops30) and root them with Brimeura
fulltree30 = toytree.tree("./Mus_Analysis/Mus_tetRAD/Mus_tet_pop30.tree"      ).root(wildcard = "Brimeura")
constree30 = toytree.tree("./Mus_Analysis/Mus_tetRAD/Mus_tet_pop30.tree.cons" ).root(wildcard = "Brimeura")

## Load the tetRAD bootstrap trees and root them with Brimeura
cloudtree30 = toytree.mtree("./Mus_Analysis/Mus_tetRAD/Mus_tet_pop30.tree.boots")
cloudtree30.treelist = [i.root(["Brimeura_amethystina_W6084"]) for i in cloudtree30.treelist]

## set dimensions of the canvas
canvas = toyplot.Canvas(width = 1800, height = 900)

## dissect canvas into multiple cartesian areas (x1, x2, y1, y2)
ax0 = canvas.cartesian(bounds=('2%',  '30%', '5%',  '97.5%'))
ax1 = canvas.cartesian(bounds=('33%', '61%', '5%',  '97.5%'))
ax2 = canvas.cartesian(bounds=('64%', '91%', '5%',  '97.5%'))

# call draw with the 'axes' argument to pass it to a specific cartesian area
style = {
    "tip_labels_align": True,
    "tip_labels_style": {"font-size": "12px"},
    "node_labels_style":{"font-size": "12px",
                        "baseline-shift": "7px",
                        "-toyplot-anchor-shift": "-13px"},
}
fulltree30.ladderize(1).draw(
    axes = ax0,
    **style,
    node_sizes = 0,
    node_labels = 'support');

constree30.ladderize(1).draw(
    axes = ax1,
    **style,
    node_sizes = 0,
    node_labels = 'support');

## plot the rooted bootstrap trees as a cloud tree
cloudtree30.draw_cloud_tree(
    axes = ax2,
    fixed_order = treeorder,  ## fixed tip order, comparable with the consensus tree
    **style,
    use_edge_lengths = False,
    edge_style = {"stroke-opacity": 0.05,
                  "stroke-width": 1}
);

# hide the axes (e.g., ticks and spines)
ax0.show = False; ax1.show = False; ax2.show = False;

## 2. Population analysis of Muscari with outgroups removed

In [20]:
## path to the SNP data (HDF5) for the STRUCTURE analysis
dataclust90 = "/home/tim/GBS/Muscari/Mus_Assembly/nout_clust90_outfiles/nout_clust90.snps.hdf5"

In [104]:
# group individuals into populations
imap = {
    "Leop": ["Leopoldia_tenuiflora_ED1263", "Muscari_massayanum_ED1251", "Leopoldia_longipes_ED3233", 
             "Leopoldia_neumannii_ED1243", "Leopoldia_neumannii_ED1607", "Muscari_mirum_ED1250",
             "Leopoldia_caucasica_ED1262", "Leopoldia_matritensis_ED1282", "Leopoldia_comosa_ED3539",
             "Leopoldia_comosa_ED1274", "Leopoldia_comosa_ED3965", "Leopoldia_comosa_ED1256",
             "Leopoldia_weissii_W6081", "Leopoldia_cycladica_W6082", "Leopoldia_spreitzenhoferi_ED1248"],
    "Musc": ["Pseudomuscari_pallens_ED1267", "Pseudomuscari_coeruleum_ED1261", 
             "Muscari_sivrihisardaghlarensis_ED1278", "Muscari_anatolicum_W6087", "Muscari_vularlii_ED3232",
             "Muscari_discolor_ED1266", "Muscari_adilii_W6090", "Muscari_armeniacum_ED1244", 
             "Muscari_armeniacum_W6089", "Muscari_neglectum_ED1253", "Muscari_neglectum_ED1254",
             "Muscari_baeticum_ED1281", "Muscari_botryoides_ED1279", "Muscari_commutatum_ED3538"],
    "Pull": ["Muscari_pulchellum_ED3231", "Muscari_kerkis_ED1280", "Muscari_bourgaei_ED1259", "Muscari_latifolium_ED1265"],
    "Pseu": ["Pseudomuscari_chalusicum_ED1255", "Pseudomuscari_inconstrictum_ED3234",
             "Pseudomuscari_azureum_ED1270", "Muscari_parviflorum_ED1245"],
    "Mosc": ["Muscari_racemosum_ED1258", "Muscari_macrocarpum_ED1252"],
}

# require that at least 40% of samples in each group have data
minmap = {i: 0.4 for i in imap}
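As we understand `ipa.structure`'s filters, a site is kept only if two conditions hold. First, enough samples overall must have data (`mincov`, here treated as a fraction of all samples). Second, enough samples within every `imap` group must have data (`minmap`, a fraction of that group's samples). The sketch below applies that rule to a single site. It is a hypothetical illustration: `site_passes` is not an ipyrad function, and it ignores the indel, bi-allelic and minor-allele filters that ipyrad also reports.

```python
def site_passes(has_data, imap, minmap, mincov):
    """Hypothetical sketch of mincov/minmap filtering for one site.

    has_data maps sample name -> True if that sample has a call at the site.
    """
    samples = [s for group in imap.values() for s in group]
    # overall coverage across all grouped samples
    if sum(has_data[s] for s in samples) / len(samples) < mincov:
        return False
    # per-group coverage must reach each group's minmap fraction
    for pop, group in imap.items():
        if sum(has_data[s] for s in group) / len(group) < minmap[pop]:
            return False
    return True

# toy example with two groups
toy_imap = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"]}
toy_minmap = {pop: 0.5 for pop in toy_imap}
site = {"a1": True, "a2": False, "b1": True, "b2": True, "b3": False}
print(site_passes(site, toy_imap, toy_minmap, mincov=0.5))
```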

In [109]:
struct = ipa.structure(
    name = "Mus_STRUC_clust90",
    data = dataclust90,
    imap = imap,
    minmap = minmap,
    mincov = 0.5,
    workdir = "./Mus_Analysis/Mus_Structure/Mus_STRUC_clust90_20210817"
)

Samples: 39
Sites before filtering: 21154
Filtered (indels): 1410
Filtered (bi-allel): 2684
Filtered (mincov): 816
Filtered (minmap): 12106
Filtered (subsample invariant): 8
Filtered (minor allele frequency): 0
Filtered (combined): 13610
Sites after filtering: 7549
Sites containing missing values: 7009 (92.85%)
Missing values in SNP matrix: 44127 (14.99%)
SNPs (total): 7549
SNPs (unlinked): 1117
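The summary separates all SNPs (7549) from unlinked SNPs (1117). STRUCTURE's basic model assumes loci in linkage equilibrium. Assuming "unlinked" here means one SNP drawn per RAD locus, the subsampling can be sketched as below. `subsample_unlinked` is a hypothetical helper, and ipyrad's own sampler may differ in detail.

```python
import random

def subsample_unlinked(snp_loci, seed=None):
    """Hypothetical helper: keep one randomly chosen SNP per locus.

    snp_loci[i] is the locus ID of SNP i; returns the kept SNP indices, sorted.
    """
    rng = random.Random(seed)
    by_locus = {}
    for idx, locus in enumerate(snp_loci):
        by_locus.setdefault(locus, []).append(idx)
    return sorted(rng.choice(idxs) for idxs in by_locus.values())

# toy example: 6 SNPs on 3 loci -> 3 unlinked SNPs
snp_loci = [0, 0, 1, 2, 2, 2]
kept = subsample_unlinked(snp_loci, seed=12345)
print(kept)
```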


#### Run STRUCTURE and plot results
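The `burnin` parameter sets the number of MCMC iterations discarded as burn-in, and `numreps` sets the number of iterations sampled afterwards. Together they determine the length of each run.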
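As a rough sizing aid, each STRUCTURE run performs `burnin` iterations followed by `numreps` sampled iterations. The values below are copied from the cells that follow. The arithmetic covers only the MCMC iterations, not wall-clock time.

```python
# Values copied from the notebook cells that follow
burnin = 100000               # struct.mainparams.burnin
numreps = 500000              # struct.mainparams.numreps
nreps = 10                    # replicates per K in struct.run(...)
kvalues = [2, 3, 4, 5, 6, 7]

iters_per_run = burnin + numreps          # iterations in one STRUCTURE run
total_runs = nreps * len(kvalues)         # runs across all values of K
total_iters = iters_per_run * total_runs
print(iters_per_run, total_runs, total_iters)
```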

In [110]:
struct.mainparams.burnin  = 100000
struct.mainparams.numreps = 500000

## see all mainparams
print(struct.mainparams)

## see or set extraparams
print(struct.extraparams)

burnin             100000              
extracols          0                   
label              1                   
locdata            0                   
mapdistances       0                   
markernames        0                   
markovphase        0                   
missing            -9                  
notambiguous       -999                
numreps            500000              
onerowperind       0                   
phased             0                   
phaseinfo          0                   
phenotype          0                   
ploidy             2                   
popdata            0                   
popflag            0                   
recessivealleles   0                   

admburnin           500                 
alpha               1.0                 
alphamax            10.0                
alphapriora         1.0                 
alphapriorb         2.0                 
alphapropsd         0.025               
ancestdist          0            

In [111]:
## set a range of k-values to test
kvalues = [2, 3, 4, 5, 6, 7]

In [112]:
## submit a batch of 10 replicate jobs for each value of K
for kpop in kvalues:
    struct.run(kpop = kpop, nreps = 10, seed = 12345, ipyclient = ipyclient)#, force = True)

[####################] 100% 0:26:34 | running 10 structure jobs 
[####################] 100% 0:31:39 | running 10 structure jobs 
[####################] 100% 0:35:56 | running 10 structure jobs 
[####################] 100% 0:40:20 | running 10 structure jobs 
[####################] 100% 0:45:10 | running 10 structure jobs 
[####################] 100% 0:49:26 | running 10 structure jobs 


#### Analyze results: check results in the Evanno table

In [113]:
etable = struct.get_evanno_table(kvalues)
etable

| K | Nreps | lnPK | lnPPK | deltaK | estLnProbMean | estLnProbStdev |
|---|---|---|---|---|---|---|
| 2 | 10 | 0.0 | 0.0 | 0.0 | -12555.27 | 640.698 |
| 3 | 10 | 869.95 | 515.97 | 0.656 | -11685.32 | 785.985 |
| 4 | 10 | 353.98 | 293.27 | 0.384 | -11331.34 | 763.608 |
| 5 | 10 | 60.71 | 165.13 | 0.872 | -11270.63 | 189.334 |
| 6 | 10 | 225.84 | 165.05 | 0.113 | -11044.79 | 1466.782 |
| 7 | 10 | 60.79 | 0.0 | 0.0 | -10984.0 | 1942.331 |

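The columns of this table are the Evanno et al. (2005) statistics. To make them easier to read, here is a minimal pandas sketch that recomputes them from `estLnProbMean` and `estLnProbStdev` (values copied from the table above). The `evanno` function is hypothetical, and setting the undefined first and last K to 0 is an assumption chosen to match ipyrad's output.

```python
import pandas as pd

# per-K mean and standard deviation of ln P(X|K), copied from the table above
lnp = pd.DataFrame(
    {"estLnProbMean": [-12555.27, -11685.32, -11331.34, -11270.63, -11044.79, -10984.0],
     "estLnProbStdev": [640.698, 785.985, 763.608, 189.334, 1466.782, 1942.331]},
    index=[2, 3, 4, 5, 6, 7],
)

def evanno(df):
    """Hypothetical re-implementation of the Evanno statistics."""
    mean = df["estLnProbMean"]
    out = pd.DataFrame(index=df.index)
    # L'(K) = L(K) - L(K-1); undefined for the smallest K, set to 0
    out["lnPK"] = mean.diff().fillna(0.0)
    # |L''(K)| = |L'(K+1) - L'(K)|; undefined at both ends of the K range, set to 0
    lnppk = (out["lnPK"].shift(-1) - out["lnPK"]).abs()
    lnppk.iloc[0] = 0.0
    lnppk.iloc[-1] = 0.0
    out["lnPPK"] = lnppk
    # deltaK = |L''(K)| / sd(L(K))
    out["deltaK"] = out["lnPPK"] / df["estLnProbStdev"]
    return out

print(evanno(lnp).round(3))
```

Because |L''(K)| needs both neighbours of K, deltaK cannot be computed for the smallest and largest K tested (here K = 2 and K = 7), which is why those rows are 0.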

In [116]:
etable100 = struct.get_evanno_table(kvalues, max_var_multiple=100, quiet=True)
etable100

| K | Nreps | lnPK | lnPPK | deltaK | estLnProbMean | estLnProbStdev |
|---|---|---|---|---|---|---|
| 2 | 10 | 0.0 | 0.0 | 0.0 | -12555.27 | 640.698 |
| 3 | 10 | 869.95 | 515.97 | 0.656 | -11685.32 | 785.985 |
| 4 | 10 | 353.98 | 293.27 | 0.384 | -11331.34 | 763.608 |
| 5 | 10 | 60.71 | 165.13 | 0.872 | -11270.63 | 189.334 |
| 6 | 10 | 225.84 | 165.05 | 0.113 | -11044.79 | 1466.782 |
| 7 | 10 | 60.79 | 0.0 | 0.0 | -10984.0 | 1942.331 |


#### Get permuted reps with CLUMPP

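Use CLUMPP to permute (align) the cluster labels of the replicate runs for each value of K, so that the replicates can be summarized in one table. Replicates can optionally be excluded with the `max_var_multiple` parameter; here it is commented out, so all replicates are kept (`max_var=0` in the output).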
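Replicate STRUCTURE runs can recover the same clusters under different labels (label switching), which is why the replicates must be permuted before they are averaged. The following minimal NumPy sketch illustrates the idea for two replicates by brute-forcing all K! column orders. `align_to_reference` and the example Q matrices are hypothetical. CLUMPP itself uses its own similarity statistic and, for larger K, the greedy search strategies configured in the next cell.

```python
import itertools
import numpy as np

def align_to_reference(ref, q):
    """Permute the columns of Q matrix `q` (samples x K) to best match `ref`.

    Brute force over all K! column orders, minimizing the Frobenius norm of
    the difference. Illustration only: not CLUMPP's algorithm or statistic.
    """
    k = ref.shape[1]
    best_perm, best_dist = None, np.inf
    for perm in itertools.permutations(range(k)):
        dist = np.linalg.norm(ref - q[:, list(perm)])
        if dist < best_dist:
            best_perm, best_dist = perm, dist
    return q[:, list(best_perm)], best_perm

# two hypothetical replicates that found the same clusters with swapped labels
rep1 = np.array([[0.9, 0.1, 0.0],
                 [0.2, 0.7, 0.1],
                 [0.0, 0.1, 0.9]])
rep2 = rep1[:, [2, 0, 1]]
aligned, perm = align_to_reference(rep1, rep2)
print(perm)  # (1, 2, 0)
```

Brute force needs K! comparisons per replicate (5,040 at K = 7), which is why heuristic search algorithms become attractive as K grows.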

In [122]:
## CLUMPP settings used to summarize the replicates
struct.clumppparams.m = 3                ## use the LargeKGreedy algorithm
struct.clumppparams.greedy_option = 2    ## test random input orders of the replicates
struct.clumppparams.repeats = 100000     ## number of random input orders to test

In [123]:
qtable = struct.get_clumpp_table(kvalues)#, max_var_multiple=100.)

[K2] 10/10 results permuted across replicates (max_var=0).
[K3] 10/10 results permuted across replicates (max_var=0).
[K4] 10/10 results permuted across replicates (max_var=0).
[K5] 10/10 results permuted across replicates (max_var=0).
[K6] 10/10 results permuted across replicates (max_var=0).
[K7] 10/10 results permuted across replicates (max_var=0).

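According to the ipyrad documentation, `max_var_multiple` uses the replicate with the smallest variance as a benchmark and drops replicates whose variance is more than N times larger, while 0 disables the filter. Below is a minimal sketch of that rule. `keep_replicates` and the example variances are hypothetical. It is also an assumption that the variance is the per-run "Variance of ln likelihood" reported by STRUCTURE, and that values exactly at the threshold are kept.

```python
def keep_replicates(variances, max_var_multiple=0):
    """Return the indices of replicates that pass the variance filter.

    Hypothetical helper, not ipyrad code. A value of 0 (or None) disables
    filtering, matching the max_var=0 reported above.
    """
    if not max_var_multiple:
        return list(range(len(variances)))
    benchmark = min(variances)
    # replicates with a variance strictly above benchmark * N are excluded
    return [i for i, v in enumerate(variances) if v <= benchmark * max_var_multiple]

# hypothetical per-replicate variances; replicate 3 has clearly not converged
variances = [120.0, 135.5, 98.2, 15000.0, 110.3]
print(keep_replicates(variances))       # [0, 1, 2, 3, 4]
print(keep_replicates(variances, 100))  # [0, 1, 2, 4]
```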

In [124]:
# create a canvas and set its size
canvas = toyplot.Canvas(width=400, height=300)

# plot the negative mean log probability of each K (left y axis, dark red)
axes = canvas.cartesian(ylabel="-estLnProbMean")
axes.plot(etable.estLnProbMean * -1, color="darkred", marker="o")
axes.y.spine.style = {"stroke": "darkred"}

# plot deltaK on a second y axis (right side, steel blue) that shares the x axis
axes = axes.share("x", ylabel="deltaK", ymax=etable.deltaK.max() * 1.25)
axes.plot(etable.deltaK, color="steelblue", marker="o")
axes.y.spine.style = {"stroke": "steelblue"}

# label the x axis with the K values
axes.x.ticks.locator = toyplot.locator.Explicit(range(len(etable.index)), etable.index)
axes.x.label.text = "K (N ancestral populations)"

#### Analyze results: Barplots

In [145]:
k = 3
table = struct.get_clumpp_table(k)

[K3] 10/10 results permuted across replicates (max_var=0).


In [146]:
# sort samples by their ancestry proportions (columns 0 to k-1)
table.sort_values(by=list(range(k)), inplace=True)

# alternatively, sort by a list of names (here taken from imap);
# this overrides the sort above
import itertools
onames = list(itertools.chain(*imap.values()))
table = table.loc[onames]

In [147]:
# build barplot
canvas = toyplot.Canvas(width=1000, height=500)
axes = canvas.cartesian(bounds=("10%", "90%", "10%", "45%"))
axes.bars(table)

# add sample names as labels to the x-axis
ticklabels = table.index.tolist()
axes.x.ticks.locator = toyplot.locator.Explicit(labels=ticklabels)
axes.x.ticks.labels.angle = -60
axes.x.ticks.show = True
axes.x.ticks.labels.offset = 10
axes.x.ticks.labels.style = {"font-size": "12px"}

In [88]:
## Plot the resulting tree
## modified tree version with reduced branch length for the outgroup (Brimeura)
tre = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust85_20210812/RAxML_bipartitions.pops_20_STRUCT.phy")
rtre = tre.root(wildcard = "Brimeura")

# keep the returned canvas, axes and mark objects so the figure can be exported
canvas, axes, mark = rtre.ladderize(1).draw(
    width = 1400,
    height = 900,
    #use_edge_lengths = False,
    tip_labels_align = True,
    tip_labels_style = {"font-size": "16px"},
    node_labels = 'support',
    node_sizes = 0,
    node_labels_style = {"font-size": "15px",
                         "baseline-shift": "7px",
                         "-toyplot-anchor-shift": "-13px"},
    );

In [119]:
myorder = [## subgen. Muscarimia
           "Muscari_racemosum_ED1258", "Muscari_macrocarpum_ED1252",
           ## subgen. Pseudomuscari
           "Pseudomuscari_chalusicum_ED1255", "Pseudomuscari_azureum_ED1270",
           "Pseudomuscari_inconstrictum_ED3234", "Muscari_parviflorum_ED1245",
           ## subgen. Muscari
           "Muscari_commutatum_ED3538", "Muscari_sivrihisardaghlarensis_ED1278",
           "Muscari_anatolicum_W6087", "Muscari_vularlii_ED3232",
           "Muscari_discolor_ED1266", "Pseudomuscari_pallens_ED1267",
           "Pseudomuscari_coeruleum_ED1261", "Muscari_armeniacum_ED1244",
           "Muscari_armeniacum_W6089", "Muscari_adilii_W6090",
           "Muscari_neglectum_ED1253", "Muscari_baeticum_ED1281",
           "Muscari_neglectum_ED1254", "Muscari_botryoides_ED1279",
           ## subgen. Pulchellum
           "Muscari_pulchellum_ED3231", "Muscari_kerkis_ED1280",
           "Muscari_latifolium_ED1265", "Muscari_bourgaei_ED1259",
           ## subgen. Leopoldia
           "Leopoldia_tenuiflora_ED1263", "Muscari_massayanum_ED1251",
           "Leopoldia_longipes_ED3233",  "Leopoldia_neumannii_ED1243", 
           "Leopoldia_neumannii_ED1607", "Muscari_mirum_ED1250",
           
           "Leopoldia_comosa_ED3539", "Leopoldia_comosa_ED1274",
           "Leopoldia_comosa_ED3965", "Leopoldia_comosa_ED1256",
           
           "Leopoldia_caucasica_ED1262", "Leopoldia_matritensis_ED1282",
           "Leopoldia_spreitzenhoferi_ED1248", "Leopoldia_weissii_W6081", "Leopoldia_cycladica_W6082"]
print("custom ordering")
print(qtable[2].loc[myorder])

custom ordering
                                           0      1
Muscari_racemosum_ED1258               0.110  0.890
Muscari_macrocarpum_ED1252             0.126  0.874
Pseudomuscari_chalusicum_ED1255        0.316  0.684
Pseudomuscari_azureum_ED1270           0.320  0.680
Pseudomuscari_inconstrictum_ED3234     0.309  0.691
Muscari_parviflorum_ED1245             0.316  0.684
Muscari_commutatum_ED3538              0.372  0.628
Muscari_sivrihisardaghlarensis_ED1278  0.811  0.189
Muscari_anatolicum_W6087               0.802  0.198
Muscari_vularlii_ED3232                0.810  0.190
Muscari_discolor_ED1266                0.813  0.187
Pseudomuscari_pallens_ED1267           0.817  0.183
Pseudomuscari_coeruleum_ED1261         0.816  0.184
Muscari_armeniacum_ED1244              0.818  0.182
Muscari_armeniacum_W6089               0.817  0.183
Muscari_adilii_W6090                   0.815  0.185
Muscari_neglectum_ED1253               0.822  0.178
Muscari_baeticum_ED1281                0.820  0.

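`qtable[K].loc[myorder]` returns only the rows named in `myorder`, so a sample missing from the list would silently disappear from the bar plots, and (in recent pandas versions) a name that is not in the table raises a `KeyError`. A small sanity check can help before plotting. `check_order` is a hypothetical helper, and the call on the real objects is shown as a comment.

```python
def check_order(order, index):
    """Hypothetical check that `order` lists every name in `index` exactly once.

    Returns (duplicated names, names missing from order, names not in index).
    """
    order, index = list(order), list(index)
    dups = sorted({n for n in order if order.count(n) > 1})
    missing = sorted(set(index) - set(order))
    extra = sorted(set(order) - set(index))
    return dups, missing, extra

# in this notebook all three lists should be empty:
# check_order(myorder, qtable[2].index)
print(check_order(["A", "B", "B", "D"], ["A", "B", "C"]))  # (['B'], ['C'], ['D'])
```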
#### Plot all STRUCTURE results against Phylogeny

In [120]:
etable

| K | Nreps | lnPK | lnPPK | deltaK | estLnProbMean | estLnProbStdev |
|---|---|---|---|---|---|---|
| 2 | 10 | 0.0 | 0.0 | 0.0 | -12555.27 | 640.698 |
| 3 | 10 | 869.95 | 515.97 | 0.656 | -11685.32 | 785.985 |
| 4 | 10 | 353.98 | 293.27 | 0.384 | -11331.34 | 763.608 |
| 5 | 10 | 60.71 | 165.13 | 0.872 | -11270.63 | 189.334 |
| 6 | 10 | 225.84 | 165.05 | 0.113 | -11044.79 | 1466.782 |
| 7 | 10 | 60.79 | 0.0 | 0.0 | -10984.0 | 1942.331 |


In [126]:
## get tree from RAxML results
## modified tree version with reduced branch length for the outgroup (Brimeura)
tre = toytree.tree("/home/tim/GBS/Muscari/Mus_Analysis/Mus_RAxML/Mus_RAxML_clust85_20210812/RAxML_bipartitions.pops_20_STRUCT.phy")
rtre = tre.root(wildcard = "Brimeura")

## CSS styling for the bar outlines
style = {"stroke": toyplot.color.near_black,
         "stroke-width": 0.25}

## the bounds of each cartesian area are (x1, x2, y1, y2),
## measured from the left and top edges of the canvas:
##    y1
## x1    x2
##    y2

## build the canvas and divide it into one area for the tree and one per K
c = toyplot.Canvas(width = 900, height = 700)
a1 = c.cartesian(bounds=('1%', '46.5%', '5%', '95%'))       # The tree
a2 = c.cartesian(bounds=('50.5%', '59%', '5.25%', '86.25%'))  # K=2
a3 = c.cartesian(bounds=('59.5%', '68%', '5.25%', '86.25%'))  # K=3
a4 = c.cartesian(bounds=('68.5%', '77%', '5.25%', '86.25%'))  # K=4
a5 = c.cartesian(bounds=('77.5%', '86%', '5.25%', '86.25%'))  # K=5
a6 = c.cartesian(bounds=('86.5%', '95%', '5.25%', '86.25%'))  # K=6
a1.show = False
a2.show = False
a3.show = False
a4.show = False
a5.show = False
a6.show = False

## draw the tree
rtre.ladderize(1).draw(
    axes = a1,
    use_edge_lengths = True,
    tip_labels_align = True,
    tip_labels_style = {"font-size": "9px"},
    node_labels = "support",
    node_sizes = 0,
    node_labels_style={"font-size": "9px",
                       "baseline-shift": "7px",
                       "-toyplot-anchor-shift": "-8px"});

## draw the STRUCTURE bar plots
## 'along' sets the axis the bars are arranged along: 'x' gives vertical bars, 'y' horizontal bars
a2.bars(qtable[2].loc[myorder], style = style, along = 'y');
a3.bars(qtable[3].loc[myorder], style = style, along = 'y');
a4.bars(qtable[4].loc[myorder], style = style, along = 'y');
a5.bars(qtable[5].loc[myorder], style = style, along = 'y');
a6.bars(qtable[6].loc[myorder], style = style, along = 'y');

## add a K header above each bar plot and its deltaK value (from etable) below it
for x, kval in zip([495, 575, 655, 735, 815], [2, 3, 4, 5, 6]):
    c.text(x, 23, 'K = {}'.format(kval), style={"font-size": "13px"})
    c.text(x, 615, '{:.1f}'.format(etable.deltaK.loc[kval]), style={"font-size": "10px"})
c.text(655, 630, 'delta <b>K</b>', style={"font-size": "10px"});

In [127]:
import toyplot.pdf
toyplot.pdf.render(c, "/home/tim/GBS/Muscari/Mus_Analysis/FiguresForPaper/Mus_clust90_pop20_RAxML_STRUCTURE_20210817.pdf");

### PCA

In [45]:
# init pca object with input data and (optional) parameter options
pca = ipa.pca(
    data = dataclust90,
    imap = imap,
    minmap = minmap,
    mincov = 0.5,
    impute_method = "sample",
)

Samples: 39
Sites before filtering: 21154
Filtered (indels): 1410
Filtered (bi-allel): 2684
Filtered (mincov): 816
Filtered (minmap): 12741
Filtered (subsample invariant): 8
Filtered (minor allele frequency): 0
Filtered (combined): 14097
Sites after filtering: 7062
Sites containing missing values: 6522 (92.35%)
Missing values in SNP matrix: 37116 (13.48%)
SNPs (total): 7062
SNPs (unlinked): 1034
Imputation: 'sampled'; (0, 1, 2) = 89.0%, 7.6%, 3.5%

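As we understand the ipyrad documentation, `impute_method="sample"` fills missing genotypes by sampling from the allele frequencies of each `imap` population, and the `Imputation` line above appears to report how the imputed genotypes were split among 0, 1 and 2. The sketch below illustrates that idea. `impute_sampled`, the missing-data code 9, and the fallback frequency of 0 for a population with no observed genotypes are all assumptions, not ipyrad's implementation.

```python
import numpy as np

def impute_sampled(geno, pops, missing=9, seed=None):
    """Fill missing diploid genotypes by sampling from population allele frequencies.

    geno: samples x SNPs integer array of 0/1/2 genotypes (alternate-allele counts)
    pops: dict mapping population name -> list of row indices in `geno`
    missing: code used for missing genotypes (assumed to be 9 here)
    """
    rng = np.random.default_rng(seed)
    out = geno.copy()
    for rows in pops.values():
        rows = np.asarray(rows)
        for j in range(geno.shape[1]):
            col = geno[rows, j]
            observed = col[col != missing]
            # alternate-allele frequency among this population's observed genotypes;
            # if nothing was observed, fall back to 0 (an arbitrary assumption)
            freq = observed.sum() / (2 * observed.size) if observed.size else 0.0
            miss_rows = rows[col == missing]
            # each missing diploid genotype is drawn as Binomial(2, freq)
            out[miss_rows, j] = rng.binomial(2, freq, size=miss_rows.size)
    return out

geno = np.array([[0, 2, 9],
                 [9, 2, 1],
                 [2, 0, 9],
                 [2, 9, 2]])
print(impute_sampled(geno, {"A": [0, 1], "B": [2, 3]}, seed=1))
```

The `minmap` filter applied above keeps only sites with enough observed samples per population, so the fallback case should rarely matter in practice.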

In [46]:
# run the PCA analysis
pca.run()

Subsampling SNPs: 1034/7062

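The line `Subsampling SNPs: 1034/7062` matches the `SNPs (unlinked): 1034` count above: one SNP is drawn per RAD locus so that the SNPs used for the PCA are approximately unlinked. The following minimal sketch shows such a subsample. `subsample_unlinked` is hypothetical and assumes each SNP's locus id is known.

```python
import numpy as np

def subsample_unlinked(snp_loci, seed=None):
    """Pick one SNP index at random from each locus.

    snp_loci[i] is the locus id of SNP i (hypothetical input format).
    """
    rng = np.random.default_rng(seed)
    snp_loci = np.asarray(snp_loci)
    keep = [rng.choice(np.flatnonzero(snp_loci == locus))
            for locus in np.unique(snp_loci)]
    return np.sort(np.array(keep))

loci = [0, 0, 0, 1, 2, 2]
idx = subsample_unlinked(loci, seed=42)
print(idx, "->", len(idx), "unlinked SNPs out of", len(loci))
```

Running this with different seeds yields different SNP sets, which is what the replicate subsampling in `pca2.run(nreplicates=...)` further down explores.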

In [47]:
# store the PC axes as a dataframe
df = pd.DataFrame(pca.pcaxes[0], index=pca.names)

# write the PC axes to a CSV file
df.to_csv("pca_analysis.csv")

# show the first ten samples and the first 10 PC axes
df.iloc[:10, :10].round(2)

| sample | PC0 | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Leopoldia_caucasica_ED1262 | -4.85 | -3.75 | -1.07 | -0.78 | -0.43 | -0.5 | 0.01 | 0.39 | 0.14 | 1.06 |
| Leopoldia_comosa_ED1256 | -5.71 | -3.42 | -2.2 | -2.6 | -0.5 | -3.05 | -1.49 | 0.86 | -1.46 | -3.58 |
| Leopoldia_comosa_ED1274 | -5.5 | -3.92 | -1.52 | -2.1 | -0.7 | -1.76 | -1.1 | 0.75 | -1.15 | -2.73 |
| Leopoldia_comosa_ED3539 | -5.87 | -4.09 | -2.36 | -2.3 | -0.76 | -2.86 | -1.52 | 1.27 | -0.72 | -2.81 |
| Leopoldia_comosa_ED3965 | -5.6 | -4.39 | -1.59 | -2.08 | -1.09 | -2.65 | -0.82 | 1.32 | -1.75 | -3.44 |
| Leopoldia_cycladica_W6082 | -4.8 | -3.34 | -2.46 | -1.57 | 0.22 | -1.19 | -0.2 | -0.69 | 0.11 | 2.96 |
| Leopoldia_longipes_ED3233 | -4.72 | -1.28 | 0.08 | 0.84 | 0.17 | 4.69 | 0.65 | 1.35 | 3.32 | -1.42 |
| Leopoldia_matritensis_ED1282 | -5.23 | -3.42 | -1.5 | -0.82 | -0.39 | -1.01 | -0.15 | -0.26 | 1.15 | 5.46 |
| Leopoldia_neumannii_ED1243 | -5.16 | -1.66 | -0.26 | 3.48 | 0.75 | 5.28 | 1.89 | -2.51 | -6.1 | -1.54 |
| Leopoldia_neumannii_ED1607 | -5.2 | -1.9 | -0.48 | 3.35 | 1.15 | 5.07 | 1.9 | -2.34 | -5.42 | -0.42 |

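Each column of this table is a sample's coordinate on one principal component. For readers unfamiliar with how such coordinates arise, here is a generic sketch of PCA on a complete samples x SNPs genotype matrix via the singular value decomposition. `pca_scores` and the random genotypes are hypothetical, and ipyrad's own centering/scaling choices may differ, so the numbers will not match the table exactly.

```python
import numpy as np

def pca_scores(geno, n_axes=2):
    """Project samples onto the leading principal components.

    geno: samples x SNPs matrix of 0/1/2 genotypes without missing values.
    Returns the sample scores and the fraction of variance per axis.
    """
    x = geno - geno.mean(axis=0)              # center each SNP column
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_axes] * s[:n_axes]       # sample coordinates on the PC axes
    explained = s ** 2 / np.sum(s ** 2)       # variance explained by each axis
    return scores, explained[:n_axes]

rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(6, 20))       # hypothetical genotypes
scores, explained = pca_scores(geno)
print(scores.round(2))
print(explained.round(3))
```

The sign of each PC axis is arbitrary, so replicate runs (or different software) can return mirrored coordinates for the same data.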

In [48]:
pca.draw(0, 2);
pca.draw(0, 1);


In [56]:
# init pca object with input data and (optional) parameter options
pca2 = ipa.pca(
    data = dataclust90,
    imap = imap,
    minmap = minmap,
    mincov = 0.5,
    impute_method = "sample",
)

# run 25 replicate subsamples of unlinked SNPs (seeded) and draw PC0 against PC1, PC2 and PC3
pca2.run(nreplicates = 25, seed=123)
pca2.draw(0, 1);
pca2.draw(0, 2);
pca2.draw(0, 3);

#pca2.draw(0, 1, outfile = "./Mus_Analysis/Mus_PCA/Mus_clust90_PCA0-1_20210816.pdf");
#pca2.draw(0, 2, outfile = "./Mus_Analysis/Mus_PCA/Mus_clust90_PCA0-2_20210816.pdf");
#pca2.draw(0, 3, outfile = "./Mus_Analysis/Mus_PCA/Mus_clust90_PCA0-3_20210816.pdf");

Samples: 39
Sites before filtering: 21154
Filtered (indels): 1410
Filtered (bi-allel): 2684
Filtered (mincov): 0
Filtered (minmap): 12741
Filtered (subsample invariant): 8
Filtered (minor allele frequency): 0
Filtered (combined): 14097
Sites after filtering: 7062
Sites containing missing values: 6522 (92.35%)
Missing values in SNP matrix: 37116 (13.48%)
SNPs (total): 7062
SNPs (unlinked): 1034
Imputation: 'sampled'; (0, 1, 2) = 88.9%, 7.5%, 3.5%
Subsampling SNPs: 1034/7062


## 3. Test for introgression and incomplete lineage sorting

### BUCKy

In [6]:
## software requirements
## conda install -c BioBuilds mrbayes
## conda install -c ipyrad ipyrad
## conda install -c ipyrad bucky

Create a `bucky` analysis object

The two required arguments are `name` and `data`. The `data` argument should be a `.loci` or `.alleles.loci` file. The name is used to name the output files, which are written to `{workdir}/{name}/{number}.nexus`. BUCKy does not handle missing data well, so a locus is only included if it contains data for all samples in the analysis. By default, all samples found in the loci file are used, unless you pass a list of names (the `samples` argument) to subsample taxa, as we do here; a smaller sample set can increase the number of loci that pass this filter. It is best to select one individual per species or subspecies.

Additional parameters can be set in the `.params` dictionary. The `maxloci` argument limits the total number of loci so that a test run finishes faster. It is commented out here, because BUCKy runs quite fast and it is usually practical to use all loci in a real analysis.
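The filtering rule described above (a locus is used only if every selected sample is present) can be illustrated on a plain-text `.loci` file. The following minimal sketch assumes each locus is a block of `name  sequence` lines closed by a line starting with `//`, which is how ipyrad lays out `.loci` files. `complete_loci` is hypothetical and is not how `ipa.bucky` is implemented internally.

```python
def complete_loci(loci_text, samples):
    """Return the loci (dicts name -> sequence) that contain every name in `samples`.

    Assumes one 'name  sequence' line per sample and a '//' line closing each locus.
    """
    wanted = set(samples)
    loci, current = [], {}
    for line in loci_text.splitlines():
        if line.startswith("//"):
            # end of a locus: keep it only if every requested sample is present
            if wanted.issubset(current):
                loci.append({name: current[name] for name in samples})
            current = {}
        elif line.strip():
            name, seq = line.split()
            current[name.lstrip(">")] = seq
    return loci

example = "\n".join([
    "A   ACGT", "B   ACGA", "//      -   *|0|",
    "A   TTGT", "//          |1|",
])
print(complete_loci(example, ["A", "B"]))  # [{'A': 'ACGT', 'B': 'ACGA'}]
```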

In [None]:
## make a list of sample names you wish to include in your BUCKy analysis
samples = ["Brimeura_amethystina_W6084", "Bellevalia_paradoxa_ED1272",
           "Bellevalia_dubia_W6083", "Bellevalia_speciosa_W6085",
           "Muscari_racemosum_ED1258", "Muscari_macrocarpum_ED1252",
           "Pseudomuscari_chalusicum_ED1255", "Pseudomuscari_azureum_ED1270",
           "Muscari_parviflorum_ED1245", "Pseudomuscari_inconstrictum_ED3234",
           "Muscari_commutatum_ED3538", "Muscari_sivrihisardaghlarensis_ED1278",
           "Muscari_anatolicum_W6087", "Muscari_vularlii_ED3232",
           "Muscari_discolor_ED1266", "Pseudomuscari_pallens_ED1267",
           "Pseudomuscari_coeruleum_ED1261", "Muscari_adilii_W6090",
           "Muscari_armeniacum_ED1244", "Muscari_armeniacum_W6089",
           "Muscari_neglectum_ED1253", "Muscari_baeticum_ED1281",
           "Muscari_botryoides_ED1279", "Muscari_neglectum_ED1254",
           "Muscari_pulchellum_ED3231", "Muscari_kerkis_ED1280",
           "Muscari_bourgaei_ED1259", "Muscari_latifolium_ED1265",
           "Leopoldia_tenuiflora_ED1263", "Leopoldia_longipes_ED3233",
           "Muscari_massayanum_ED1251", "Leopoldia_neumannii_ED1243",
           "Leopoldia_neumannii_ED1607", "Muscari_mirum_ED1250",
           "Leopoldia_matritensis_ED1282", "Leopoldia_spreitzenhoferi_ED1248",
           "Leopoldia_cycladica_W6082", "Leopoldia_weissii_W6081",
           "Leopoldia_caucasica_ED1262", "Leopoldia_comosa_ED3539",
           "Leopoldia_comosa_ED3965", "Leopoldia_comosa_ED1274", "Leopoldia_comosa_ED1256"
]

In [None]:
## initiate a bucky object
buck = ipa.bucky(
    name = "Mus_bucky",
    data = "./Mus_Assembly/pops30_clust90_outfiles/pedic.alleles.loci",
    workdir = "./Mus_Analysis/Mus_Bucky",
    samples = samples,
    minsnps = 0,
    #maxloci=100,
)

In [None]:
## print the params dictionary
buck.params