# Make tree for East Woods
### v1, 2018-04-11<br>ahipp@mortonarb.org
This is our first round of East Woods data analysis, and the tree that we produce today will not be the final version. But we can edit from here, and we'll use this notebook to perform and document analyses.

Steps:
1. Read species list and tree files
2. Clean species names
3. Prune tree
4. Visualize tree, make sure things are working well
5. Quantify what species are missing
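
Steps 3 and 5 come down to comparing the cleaned species names against the tree's tip labels. Here is a minimal sketch on a toy tree, assuming names are already in `Genus_epithet` form; `tr.toy` and `spp.toy` are hypothetical stand-ins, not the East Woods data or the Zanne tree.

```r
library(ape)

# Toy stand-ins (hypothetical): a 4-tip tree and a 3-name species list
tr.toy <- read.tree(text = "((Acer_rubrum:1,Acer_nigrum:1):1,(Viola_pubescens:1.5,Vitis_riparia:1.5):0.5);")
spp.toy <- c("Acer_rubrum", "Vitis_riparia", "Achillea_millefolium")

# Step 3: drop every tip that is not in the species list
in.tree <- intersect(spp.toy, tr.toy$tip.label)
tr.pruned <- drop.tip(tr.toy, setdiff(tr.toy$tip.label, in.tree))

# Step 5: species in the list with no matching tip in the tree
missing.spp <- setdiff(spp.toy, tr.toy$tip.label)

tr.pruned$tip.label
missing.spp
```

Matching on exact strings means spelling variants and synonyms will show up as missing, so the missing list is worth reading by eye.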

We'll stop at this point for now; in the next round, we have at least two options:
* use the `makeMat` function from the [morton project](https://github.com/andrew-hipp/morton/blob/master/R/makeMat.R)... this is essentially a Phylomatic approach with a bit more control;
* add in taxa using sequence data

We'll figure this out later.

## Read data 
First, let's make a working directory and move into it.

In [None]:
if(!"WORKING" %in% dir('../')) dir.create('../WORKING')
setwd('../WORKING')
getwd()

Now let's get our data. I've put what we want into a folder called 'DATA' that is at the same level as 'WORKING'. Our data are in three spreadsheets, one each for shrubs, trees and herbs. We'll deal with that later.

In [27]:
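# list the species files once so the data and the list names stay in the same order
spp.files <- dir('../DATA', pattern = '^spp\\..*\\.csv$', full.names = TRUE)
spp.list <- lapply(spp.files, read.csv, as.is = TRUE)
# escape the dot and anchor the match so only the file extension is stripped
names(spp.list) <- sub('\\.csv$', '', basename(spp.files))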
temp <- sapply(spp.list, dim)
temp
message(paste('Total spp entries:', sum(temp[1,])))

,spp.herb.2007,spp.shrub.2007,spp.trees.2007
rows,2780,862,4782
columns,8,10,8


Total spp entries: 8424
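
The three tables do not have the same number of columns (8, 10 and 8), so they cannot be stacked directly with `rbind`. One possible way to combine them later is to keep only the columns they share. This is a sketch on toy data; the column names `plot`, `cover`, `stems` and `height` are hypothetical, and only `species` is known to exist in the real files.

```r
# Toy stand-ins (hypothetical) for tables with different column sets
spp.toy <- list(
  herb  = data.frame(plot = c("A1", "A2"),
                     species = c("Viola pubescens", "Achillea millefolium"),
                     cover = c(5, 10), stringsAsFactors = FALSE),
  shrub = data.frame(plot = "A1", species = "Viburnum opulus",
                     stems = 3, height = 1.2, stringsAsFactors = FALSE)
)

# Columns present in every table
shared <- Reduce(intersect, lapply(spp.toy, names))

# Stack the shared columns, tagging each row with the table it came from
spp.all <- do.call(rbind, lapply(names(spp.toy), function(nm) {
  data.frame(source = nm, spp.toy[[nm]][, shared, drop = FALSE],
             stringsAsFactors = FALSE)
}))
spp.all
```

Dropping the unshared columns loses information, so this only suits analyses that need the common fields.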


And let's get our tree. We'll use the `ape` package for most tree manipulations, and `ggtree` for visualization.

In [18]:
library(ape)
#library(ggtree)

if(!exists('tr.zanne')) tr.zanne <- read.tree('../DATA/phylo.zanne.tre') # check first, b/c it's a slow file to load
tr.zanne


Phylogenetic tree with 31749 tips and 31748 internal nodes.

Tip labels:
	Blasia_pusilla, Lunularia_cruciata, Marchantia_polymorpha, Riccia_fluitans, Reboulia_hemisphaerica, Marchantia_foliacea, ...
Node labels:
	, , , , , , ...

Rooted; includes branch lengths.

In [None]:
plot(tr.zanne, type = 'fan', show.tip.label = FALSE)

## Clean species names
Let's get one list of species names, and make sure they have underscores instead of spaces between the genus and epithet. Then let's check to make sure there are no monomials or trinomials.

In [40]:
spp.vect <- do.call(c, lapply(spp.list, function(x) x$species))
message(paste('Total spp entries:', length(spp.vect)))
spp.vect <- sort(unique(trimws(spp.vect)))
message(paste('Unique spp entries:', length(spp.vect)))
spp.vect <- gsub(" ", "_", spp.vect)
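# collapse (not sep) joins genus and epithet into one string per name; with sep, paste
# returns the two pieces unjoined and sapply builds a 2-row matrix, hence 636 (2 x 318) in the recorded output
spp.vect <- sapply(strsplit(spp.vect, "_"), function(x) paste(x[1:2], collapse = "_"))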
spp.vect <- unique(spp.vect)
message(paste('Unique binomials:', length(spp.vect)))
spp.vect

Total spp entries: 8424
Unique spp entries: 318
Unique binomials: 636


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
??,Acalypha,Acer,Acer,Acer,Acer,Acer,Acer,Acer,Achillea,⋯,Viburnum,Viburnum,Viburnum,Viburnum,Viburnumspecies,Viburnum,Vine,Viola,Viola,Vitis
,rhomboidea,negundo,nigrum,platanoides,rubrum,saccharinum,saccharum,species,millefolium,⋯,opulus,prunifolium,rafinesquianum,recognitum,,species,species,pubescens,species,riparia
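
The check for monomials and trinomials can be sketched by counting the underscore-separated words in each name. It would need to run on the names before they are cut down to their first two words. The vector `nm.toy` is a hypothetical sample modelled on entries visible in the output above, and `Acer_saccharum_nigrum` is an invented trinomial for illustration.

```r
# Hypothetical sample of underscore-joined names
nm.toy <- c("Acer_rubrum", "Viburnumspecies", "Acer_saccharum_nigrum", "??", "Viola_species")

# Number of underscore-separated words in each name
n.words <- lengths(strsplit(nm.toy, "_", fixed = TRUE))

monomials  <- nm.toy[n.words == 1]   # one word: missing epithet or a typo
trinomials <- nm.toy[n.words >= 3]   # three or more words: infraspecific names

# Entries with a placeholder epithet are not identified to species
unidentified <- nm.toy[grepl("_species$", nm.toy)]

monomials
trinomials
unidentified
```

Names flagged here will not match tip labels in the tree, so they are worth resolving or setting aside before pruning.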
