# Rows and rows of trees, figuratively speaking  
### What factors contribute to the size of trees on Vancouver boulevards?

#### Eric Dennis

#### Prepared and submitted as part of "Data Visualization" 

May 2022

## Introduction

Trees are fascinating in their rich diversity, but it is easy to take for granted that they are simply where they are and not think any deeper about them. Prior to this final project, I was completely oblivious to the fact that a city might keep a database of the trees planted on street boulevards. I quickly gained an appreciation for the sheer number of trees maintained by the City of Vancouver and the effort involved in collecting and updating the database. The database sparked a number of questions regarding tree size and genus, and how tree diameters and heights might vary with age or with the installation of a root barrier.

### Questions of interest
1) Is the size of a tree related to age?
2) Is the size of the tree related to the tree genus?
3) Does the installation of a root barrier affect the size of the tree? 

### Dataset description


The City of Vancouver maintains a database of trees planted on street boulevards at this [website](https://opendata.vancouver.ca/explore/dataset/street-trees/information/?disjunctive.species_name&disjunctive.common_name&disjunctive.height_range_id&disjunctive.on_street&disjunctive.neighbourhood_name&location=12,49.22567,-123.11022&dataChart=eyJxdWVyaWVzIjpbeyJjb25maWciOnsiZGF0YXNldCI6InN0cmVldC10cmVlcyIsIm9wdGlvbnMiOnsiZGlzanVuY3RpdmUuc3BlY2llc19uYW1lIjp0cnVlLCJkaXNqdW5jdGl2ZS5jb21tb25fbmFtZSI6dHJ1ZSwiZGlzanVuY3RpdmUuaGVpZ2h0X3JhbmdlX2lkIjp0cnVlLCJkaXNqdW5jdGl2ZS5vbl9zdHJlZXQiOnRydWUsImRpc2p1bmN0aXZlLm5laWdoYm91cmhvb2RfbmFtZSI6dHJ1ZX19LCJjaGFydHMiOlt7ImFsaWduTW9udGgiOnRydWUsInR5cGUiOiJsaW5lIiwiZnVuYyI6IkFWRyIsInlBeGlzIjoiaGVpZ2h0X3JhbmdlX2lkIiwic2NpZW50aWZpY0Rpc3BsYXkiOnRydWUsImNvbG9yIjoiIzAyNzlCMSJ9XSwieEF4aXMiOiJkYXRlX3BsYW50ZWQiLCJtYXhwb2ludHMiOiIiLCJ0aW1lc2NhbGUiOiJ5ZWFyIiwic29ydCI6IiJ9XSwiZGlzcGxheUxlZ2VuZCI6dHJ1ZSwiYWxpZ25Nb250aCI6dHJ1ZX0%3D). This database contains information such as tree height, diameter, location, age, tree genus, species and common name, along with other related characteristics. At the latest update of this webpage, there were 151,516 records of public trees. As this number of records is much too large to handle for the purposes of this project, a reduced dataset of 5000 records taken randomly from the full database has been provided for this analysis. While this makes the dataset easier to work with, it should be noted that conclusions drawn from this dataset may not be reflected in the full database.

* **small_unique_vancouver**
    * This file contains the pre-prepared [csv file](https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv). This csv comprises 5000 unique records of Vancouver trees and contains 21 columns of categorical and numeric data associated with each tree.


## Methods and analysis

### Data import and column descriptions

A good place to start is to read in **small_unique_vancouver** and look at the datatype of each column. The column *date_planted* contains a date stamp, so this column will be parsed as a datetime datatype.

In [1]:
# Import the necessary libraries
import pandas as pd
import altair as alt

# Lift Altair's default row limit so the whole dataframe can be plotted
alt.data_transformers.enable('default', max_rows=1000000)

# Read in the dataset, parsing date_planted as a datetime column
raw_trees = pd.read_csv('https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv', parse_dates=['date_planted'])

print(raw_trees.shape)
raw_trees.head()

(5000, 21)


Unnamed: 0.1,Unnamed: 0,std_street,on_street,species_name,neighbourhood_name,date_planted,diameter,street_side_name,genus_name,assigned,...,plant_area,curb,tree_id,common_name,height_range_id,on_street_block,cultivar_name,root_barrier,latitude,longitude
0,10747,W 20TH AV,W 20TH AV,PLATANOIDES,Riley Park,2000-02-23,28.5,EVEN,ACER,N,...,15,Y,21421,NORWAY MAPLE,4,0,,N,49.252711,-123.106323
1,12573,W 18TH AV,W 18TH AV,CALLERYANA,Arbutus-Ridge,1992-02-04,6.0,ODD,PYRUS,N,...,7,Y,129645,CHANTICLEER PEAR,2,2300,CHANTICLEER,N,49.25635,-123.158709
2,29676,ROSS ST,ROSS ST,NIGRA,Sunset,NaT,12.0,ODD,PINUS,N,...,7,Y,154675,AUSTRIAN PINE,4,7800,,N,49.213486,-123.083254
3,8856,DOMAN ST,DOMAN ST,AMERICANA,Killarney,1999-11-12,11.0,EVEN,FRAXINUS,N,...,7,Y,180803,AUTUMN APPLAUSE ASH,4,6900,AUTUMN APPLAUSE,N,49.220839,-123.036721
4,21098,EAST BOULEVARD,EAST BOULEVARD,HIPPOCASTANUM,Shaughnessy,NaT,15.5,ODD,AESCULUS,Y,...,N,Y,74364,COMMON HORSECHESTNUT,4,5200,,N,49.238514,-123.154958


In [2]:
raw_trees.dtypes

Unnamed: 0                     int64
std_street                    object
on_street                     object
species_name                  object
neighbourhood_name            object
date_planted          datetime64[ns]
diameter                     float64
street_side_name              object
genus_name                    object
assigned                      object
civic_number                   int64
plant_area                    object
curb                          object
tree_id                        int64
common_name                   object
height_range_id                int64
on_street_block                int64
cultivar_name                 object
root_barrier                  object
latitude                     float64
longitude                    float64
dtype: object

### Columns of interest

The datatype analysis shows that *raw_trees* contains 12 categorical, 8 numeric, and 1 datetime columns, as described below in Table 1. Bolded column names signify columns that are useful for answering our questions.

#### Table 1: Column description and datatypes from small_unique_vancouver
| Column                                     | Description                                                  |Datatype      |
|--------------------------------------------|:-------------------------------------------------------------|:-------------|
| Unnamed: 0                                 | A number associated to each tree with no data explanation    |numeric (int) |    
| std_street                                 | Street name where tree is associated                         |categorical   |
| on_street                                  | Street name where tree is physically located                 |categorical   |
| **species_name**                           | Name of tree species                                         |categorical   |
| **neighbourhood_name**                     | Local municipal area where tree is located                   |categorical   |
| **date_planted**                           | Date on which tree was planted                               |datetime      |
| **diameter**                               | Diameter of the tree in inches at DBH                        |numeric (dec) |
|                                            | (Diameter at Breast Height)                                  |              |
| street_side_name                           | The street side which the tree is physically located on      |categorical   |
|                                            | (Even, Odd or Median)                                        |              |
| **genus_name**                             | Name of the genus group of tree                              |categorical   |
| assigned                                   | Indicates whether the address is made up to associate        |categorical   |
|                                            | the tree with a nearby lot (Yes or No)                       |              |
| civic_number                               | Numeric street address where tree is associated              |numeric (int) |
| plant_area                                 | Indicates space around tree if sidewalk, tree grate          |categorical   |
| curb                                       | Indicates if tree is planted on a curb (Yes or No)           |categorical   |
| tree_id                                    | Unique numerical ID of tree                                  |numeric (int) |
| **common_name**                            | The common name of the tree                                  |categorical   |
| **height_range_id**                        | Provides an integer value indicator for height range of 0-10 |numeric (int) |
|                                            | for every 10 feet (e.g., 0 = 0-10 ft, 1 = 10-20 ft,          |              |
|                                            | 2 = 20-30 ft, and 10 = 100+ ft)                              |              |
| on_street_block                            | The street block at which the tree is physically located on  |numeric (int) |
| cultivar_name                              | The name of the tree cultivar                                |categorical   |
| **root_barrier**                           | Indicates if a root barrier is installed on the tree or not  |categorical   |
| latitude                                   | The geographic latitude of the tree's location               |numeric (dec) |
| longitude                                  | The geographic longitude of the tree's location              |numeric (dec) |
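
The `height_range_id` encoding described in Table 1 can be decoded back into a range in feet. The helper below is a minimal sketch based only on that description (each unit covers 10 feet, and 10 means 100 feet or more). The function name `height_range_label` is hypothetical and is not part of the dataset or of any library.

```python
def height_range_label(height_range_id: int) -> str:
    """Turn a height_range_id (0-10) into a readable range in feet.

    Assumes the encoding described in Table 1: each unit spans 10 feet,
    and the top value (10) means 100 feet or more.
    """
    if not 0 <= height_range_id <= 10:
        raise ValueError("height_range_id is expected to be between 0 and 10")
    if height_range_id == 10:
        return "100+ ft"
    lower = height_range_id * 10
    return f"{lower}-{lower + 10} ft"


# Decode a few example IDs
for example_id in (0, 2, 10):
    print(example_id, "->", height_range_label(example_id))
```

A label like this could be used in tooltips or axis titles so that readers do not need to translate the ID themselves.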




EDA of this dataset revealed that only 2363 of the 5000 *date_planted* entries contain non-null values. For the age-related questions, the dataframes were cleaned to remove trees that did not have a planting date.

An initial clean up of the dataframe is useful to more easily target the information we desire. The dataframe **trees_cleaned** includes only the highlighted columns from Table 1 and will be used as the foundation for detailed analysis. The two dataframes **trees_date_cleaned** and **trees_year_planted** were also prepared. These cleaned dataframes are read in below.
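
The missing planting dates mentioned above can be checked directly with pandas. The snippet below is a minimal sketch on a small made-up dataframe (the values in `toy_trees` are invented for illustration and are not taken from the dataset); the same calls could be applied to `raw_trees`.

```python
import pandas as pd

# Toy stand-in for raw_trees; values are illustrative only.
toy_trees = pd.DataFrame({
    "genus_name": ["ACER", "PYRUS", "PINUS", "FRAXINUS"],
    "date_planted": pd.to_datetime(["2000-02-23", "1992-02-04", None, "1999-11-12"]),
})

# Count the rows that do and do not have a planting date.
n_with_date = toy_trees["date_planted"].notna().sum()
n_missing = toy_trees["date_planted"].isna().sum()
print(f"{n_with_date} with a date, {n_missing} missing")

# dropna(subset=...) keeps only the rows where date_planted is present.
toy_dated = toy_trees.dropna(subset=["date_planted"])
print(toy_dated.shape)
```

Using `dropna(subset=...)` is one alternative to boolean filtering with `isna()`; both approaches keep the same rows.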

In [3]:
# Preparing the dataframes trees_cleaned, trees_date_cleaned, and trees_year_planted as in the EDA.

#trees_cleaned: drop the columns that are not needed for the analysis
trees_cleaned = raw_trees.drop(columns=['Unnamed: 0', 'std_street', 'on_street', 'street_side_name', 'assigned',
                                        'civic_number', 'plant_area', 'curb', 'tree_id', 'on_street_block',
                                        'cultivar_name', 'latitude', 'longitude'])
print(trees_cleaned.shape)

#trees_date_cleaned
trees_date_cleaned = trees_cleaned[~trees_cleaned['date_planted'].isna()]
print(trees_date_cleaned.shape)

#trees_year_planted
trees_year_planted = trees_date_cleaned.assign(year = trees_date_cleaned['date_planted'].dt.year)
print(trees_year_planted.shape) #Year column is now included.
print(trees_year_planted.info())
trees_year_planted.head()

(5000, 8)
(2363, 8)
(2363, 9)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2363 entries, 0 to 4998
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   species_name        2363 non-null   object        
 1   neighbourhood_name  2363 non-null   object        
 2   date_planted        2363 non-null   datetime64[ns]
 3   diameter            2363 non-null   float64       
 4   genus_name          2363 non-null   object        
 5   common_name         2363 non-null   object        
 6   height_range_id     2363 non-null   int64         
 7   root_barrier        2363 non-null   object        
 8   year                2363 non-null   int64         
dtypes: datetime64[ns](1), float64(1), int64(2), object(5)
memory usage: 184.6+ KB
None


Unnamed: 0,species_name,neighbourhood_name,date_planted,diameter,genus_name,common_name,height_range_id,root_barrier,year
0,PLATANOIDES,Riley Park,2000-02-23,28.5,ACER,NORWAY MAPLE,4,N,2000
1,CALLERYANA,Arbutus-Ridge,1992-02-04,6.0,PYRUS,CHANTICLEER PEAR,2,N,1992
3,AMERICANA,Killarney,1999-11-12,11.0,FRAXINUS,AUTUMN APPLAUSE ASH,4,N,1999
5,PERSICA,West End,2012-04-05,3.0,PARROTIA,VANESSA PERSIAN IRONWOOD,1,N,2012
7,OFFICINALIS,Kensington-Cedar Cottage,2001-04-02,3.0,MAGNOLIA,CHINESE MAGNOLIA,2,N,2001


The dataframes we are handling now contain only the 8 columns of interest and non-null values for dates. While the removal of approximately half of our data limits how far conclusions can be extrapolated to the full database, tree records with unknown ages cannot help answer the age-related question.

In [4]:
#The summary statistics for the dataframe of interest:
trees_year_planted.describe()

Unnamed: 0,diameter,height_range_id,year
count,2363.0,2363.0,2363.0
mean,6.348578,1.812103,2003.265764
std,4.611835,0.963907,7.126328
min,1.0,0.0,1989.0
25%,3.0,1.0,1997.0
50%,4.75,2.0,2003.0
75%,8.0,2.0,2009.0
max,56.0,8.0,2019.0


### Question 1: Is the size of a tree related to age?
Intuitively, the older a tree is, the taller and wider it should be. Does this concept hold up with this dataset? The concatenated scatterplots of diameter and height vs year planted in Figure 1 show an interesting trend.

In [5]:
size_year_text = alt.TitleParams(
    'Figure 1: Trees can get bigger as they get older',
     subtitle = 'The spread of tree diameter and height increases with age',
     fontSize=20,
     subtitleColor='blue',
     anchor='middle')

year_diameter_plot = alt.Chart(trees_year_planted).mark_circle().encode(
    alt.X('diameter', title='Diameter of tree (DBH)', scale=alt.Scale(zero=False)),
    alt.Y('year', title='Year of planting', scale=alt.Scale(zero=False), axis=alt.Axis(format=' ')),
    alt.Size('count()', title='Number of trees'),
    tooltip=['diameter', 'height_range_id']
    ).properties(height=350, width=350)

year_height_plot = year_diameter_plot.encode(alt.X('height_range_id', title='Height range of tree (1 unit/10 feet)', scale=alt.Scale(zero=False)))

year_size_plot = year_diameter_plot|year_height_plot
year_size_plot = year_size_plot.properties(title=size_year_text)
year_size_plot

Figure 1 shows that there are many more younger trees than older trees, as counts are concentrated in 2005-2020, with 2010-2015 being the busiest years for tree planting in this dataset. Both diameter and height are concentrated in the top left corner, indicating that younger trees tend to be small. As the trees get older, the spread of the data in both diameter and height gets larger. This suggests that some trees are certainly getting larger as they age. However, it is far from the full story to say that older trees are larger than their younger counterparts: there are many examples of old trees that are small in the bottom left of each plot, along with several notable "middle-aged" trees, mostly planted in 2000-2005, that are relatively large and appear as outliers towards the right side of the plots. Multiple other factors are likely in play in tree size, one of which could be the genus of the tree.
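
The widening spread with age described above can also be put into numbers. The snippet below is a minimal sketch, not part of the original analysis: it uses a small made-up dataframe (`toy_trees`, with invented values) and summarises the spread of diameter for each planting year. The same grouping could be applied to `trees_year_planted`.

```python
import pandas as pd

# Toy stand-in for trees_year_planted; values are illustrative only.
toy_trees = pd.DataFrame({
    "year":     [1990, 1990, 1990, 2015, 2015, 2015],
    "diameter": [3.0, 12.0, 30.0, 2.0, 3.0, 4.0],
})

# Spread of diameter for each planting year: min, max, standard deviation and range.
spread_by_year = toy_trees.groupby("year")["diameter"].agg(["min", "max", "std"])
spread_by_year["range"] = spread_by_year["max"] - spread_by_year["min"]
print(spread_by_year)
```

In this toy data the older cohort has a much wider range than the younger one, which is the pattern the scatterplots suggest visually.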

### Question 2: Is the size of the tree related to the tree genus?

As determined in the EDA, there were 56, 119, and 240 unique genus, species, and common names, respectively, in **trees_year_planted**. As such, genus was considered the most space-efficient category to plot. Since many genus counts are less than 20, it is hard to draw any firm conclusions from such small sample sizes. Thus, it is prudent to create a new dataframe, **genus_filtered**, that includes only genus names that appear at least 20 times.

In [6]:
#Unique values in desired columns

def number_of_uniques(dataframe, column):
    """Print the number of unique values in a dataframe column."""
    print(len(dataframe[column].unique()))

number_of_uniques(trees_year_planted, 'genus_name') #Number of unique genus entries

number_of_uniques(trees_year_planted, 'species_name') #Number of unique species entries

number_of_uniques(trees_year_planted, 'common_name') #Number of unique common name entries

56
119
240


In [7]:
#Filtering to create genus_filtered dataframe
genus_filtered = trees_year_planted.groupby('genus_name').filter(lambda x: len(x) >= 20)
print(genus_filtered.shape)
genus_filtered.head()

#code source: https://softhints.com/pandas-how-to-filter-results-of-value_counts/

(2191, 9)


Unnamed: 0,species_name,neighbourhood_name,date_planted,diameter,genus_name,common_name,height_range_id,root_barrier,year
0,PLATANOIDES,Riley Park,2000-02-23,28.5,ACER,NORWAY MAPLE,4,N,2000
1,CALLERYANA,Arbutus-Ridge,1992-02-04,6.0,PYRUS,CHANTICLEER PEAR,2,N,1992
3,AMERICANA,Killarney,1999-11-12,11.0,FRAXINUS,AUTUMN APPLAUSE ASH,4,N,1999
5,PERSICA,West End,2012-04-05,3.0,PARROTIA,VANESSA PERSIAN IRONWOOD,1,N,2012
7,OFFICINALIS,Kensington-Cedar Cottage,2001-04-02,3.0,MAGNOLIA,CHINESE MAGNOLIA,2,N,2001


The **genus_filtered** dataframe now contains 2191 records. 

Using the revised **genus_filtered**, the original bar chart from EDA is replotted in Figure 2, with the count of each genus included for clarity.

In [8]:
#Bar chart of genus counts from genus_filtered

genus_click = alt.selection_multi(fields=['genus_name'])

genus_count_plot = alt.Chart(genus_filtered).mark_bar().encode(
    alt.X('count()', title='Total trees'),
    alt.Y('genus_name', title='Name of genus',  sort='-x'),
    opacity=alt.condition(genus_click, alt.value(0.9), alt.value(0.1))
    ).add_selection(genus_click
    ).properties(title='Figure 2: Total number of trees for each genus')

text = alt.Chart(genus_filtered).mark_text(align='center', dx=10).encode(
    alt.Y('genus_name', sort='-x'),
    alt.X('count()'),
    alt.Text('count()'))

(genus_count_plot + text)

The Acer genus is the most abundant in the dataset, making up about 30% of the total trees. This genus includes maples, which are a common sight on Vancouver street boulevards. Each of the other genera makes up between 1 and 14% of the dataset. This is a wide variety of trees, but how does this relate to tree size? To begin to answer this question, we can look at the boxplots in Figure 3. Combining these boxplots with Figure 2 through a multi-selection allows us to gain an appreciation of the proportion of each genus in the dataset alongside its median height and diameter.
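
The percentage shares quoted above can be obtained from the genus counts. The snippet below is a minimal sketch on a made-up series (`toy_genus`, with invented values chosen so that Acer makes up 30%); applying the same call to `genus_filtered['genus_name']` would give the shares for the real data.

```python
import pandas as pd

# Toy stand-in for genus_filtered['genus_name']; values are illustrative only.
toy_genus = pd.Series(
    ["ACER", "ACER", "ACER", "PRUNUS", "PRUNUS",
     "TILIA", "QUERCUS", "FRAXINUS", "CARPINUS", "MALUS"],
    name="genus_name",
)

# normalize=True returns fractions instead of counts; convert to percentages.
genus_share = toy_genus.value_counts(normalize=True).mul(100).round(1)
print(genus_share)
```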

In [9]:
#Defining function to sort diameter and height lists by median

def sort_order(df,groupby_col,median_col):
    return df.groupby(groupby_col)[median_col].median().sort_values().index.tolist()

#Diameter
genus_diameter_order = sort_order(genus_filtered,'genus_name','diameter')
print(genus_diameter_order)

#Height
genus_height_order = sort_order(genus_filtered,'genus_name','height_range_id')
print(genus_height_order)

#Generating the boxplots
genus_size_text = alt.TitleParams(
    'Figure 3: Median diameters and heights of tree varieties follow similar trends',
     subtitle = 'The sorted order of genus does not change much for diameter or height',
     fontSize=20,
     subtitleColor='blue',
     anchor='middle')

genus_diameter_plot = alt.Chart(genus_filtered).mark_boxplot().encode(
    alt.X('diameter', title='Diameter of tree (DBH)', scale=alt.Scale(zero=False)),
    alt.Y('genus_name', title='Name of genus', sort=genus_diameter_order),
    opacity=alt.condition(genus_click, alt.value(0.9), alt.value(0.1))
    ).add_selection(genus_click)

genus_height_plot = genus_diameter_plot.encode(alt.X('height_range_id', title='Height range of tree (1 unit/10 feet)', scale=alt.Scale(zero=False)), 
                                               alt.Y('genus_name', title='Name of genus', sort=genus_height_order))

#Concatenating with Figure 2
genus_size_plot = ((genus_count_plot.properties(title=' ', height=400, width=250)) + text)|genus_diameter_plot.properties(height=400, width=250)|genus_height_plot.properties(height=400, width=250)
genus_size_plot = genus_size_plot.properties(title=genus_size_text)
genus_size_plot


['MAGNOLIA', 'CORNUS', 'PARROTIA', 'SYRINGA', 'CERCIDIPHYLLUM', 'CRATAEGUS', 'SORBUS', 'PYRUS', 'PRUNUS', 'MALUS', 'STYRAX', 'ACER', 'FAGUS', 'CARPINUS', 'FRAXINUS', 'GLEDITSIA', 'LIQUIDAMBAR', 'QUERCUS', 'TILIA', 'PLATANUS']
['CERCIDIPHYLLUM', 'CORNUS', 'CRATAEGUS', 'SORBUS', 'PRUNUS', 'SYRINGA', 'PARROTIA', 'MAGNOLIA', 'STYRAX', 'QUERCUS', 'PYRUS', 'ACER', 'LIQUIDAMBAR', 'GLEDITSIA', 'FRAXINUS', 'FAGUS', 'CARPINUS', 'MALUS', 'TILIA', 'PLATANUS']


Figure 3 highlights an interesting trend. The sorted orders of the genera by diameter and by height are very similar. Among the genera discussed here, a genus at the narrow end of the sorted diameter order tends to be at the lower end of the height order. A similar relationship is seen for the genera in the middle and at the higher ends of the sorted orders. The genus click tool allows us to quickly focus on one specific genus and see where it sits in the diameter and height orders.

One limitation of the conclusions that can be made here is that height was recorded as a binned height range ID, with one integer value for every 10 feet of height. Plotting medians of these binned values looks cumbersome in Figure 3 and could certainly skew the sorting order, especially for any genus that has a small total count in the dataset. An example of this is the Gleditsia genus, which has only 40 records and appears only as outliers in the height range box plot.
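
To illustrate how binning can hide real differences, the snippet below is a minimal sketch using invented heights for two hypothetical genera. The helper `to_height_range_id` is an assumption that mimics the 10-foot binning described in Table 1; it is not how the City of Vancouver necessarily records the data.

```python
from statistics import median


def to_height_range_id(height_ft: float) -> int:
    """Bin a height in feet into 10-foot ranges, capped at 10 (100+ ft)."""
    return min(int(height_ft // 10), 10)


# Two made-up genera with different true heights in feet; values are illustrative only.
genus_a_ft = [21, 22, 23, 24, 25]
genus_b_ft = [25, 27, 29, 29, 29]

for name, heights in [("genus A", genus_a_ft), ("genus B", genus_b_ft)]:
    binned = [to_height_range_id(h) for h in heights]
    print(name, "median ft:", median(heights), "median bin:", median(binned))
```

Both toy genera end up with the same median bin even though their median heights in feet differ, so a sort order based on binned medians cannot separate them.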

### Question 3: Does the installation of a root barrier affect the size of the tree? 

We can see from Figures 1-3 that tree size is related to age and genus. What about a root barrier, that is, some sort of barrier that prevents roots from growing beyond a certain point around the tree base? It would be expected that a root barrier would stunt growth. If the roots can't grow, would you expect the tree above them to grow?

As a first step in this question, let's explore the proportion of trees with or without root barriers in the **trees_year_planted** dataframe with a bar chart in Figure 4.

In [10]:
#Bar chart showing proportion of trees with and without root barriers.

root_barrier_bar = alt.Chart(trees_year_planted).mark_bar().encode(
    alt.X('count()', title='Number of trees'),
    alt.Y('root_barrier', sort='x', title=None),
    alt.Color('root_barrier', legend=None, scale=alt.Scale(scheme='tableau10'))
    ).properties(title='Figure 4: About 13% of trees have root barriers installed')

root_barrier_text = alt.Chart(trees_year_planted).mark_text(align='center', dx=15).encode(
    alt.Y('root_barrier', sort='x'),
    alt.X('count()'),
    alt.Text('count()'))

root_barrier_bar + root_barrier_text

We can see from Figure 4 that the dataset is weighted heavily towards trees without root barriers, with only 13% having a root barrier installed.

Can these root barriers impede growth of the trees? The stacked histograms of total trees vs diameter and height in Figure 5 help to visualize this.

In [13]:
#Selecting height and diameter columns
size_columns= trees_year_planted.select_dtypes('number').columns.tolist()[0:2]
print(size_columns)

#Plotting the stacked histogram
barrier_stack_title = alt.TitleParams(
    'Figure 5: Root barriers do not appear to hinder tree growth',
     subtitle = 'Proportions of trees with and without barriers appear constant',
     fontSize=20,
     subtitleColor='red',
     anchor='middle')

root_barrier_stack = alt.Chart(trees_year_planted).mark_bar().encode(
    alt.X(alt.repeat('column'), type='quantitative', scale=alt.Scale(zero=False), bin=alt.Bin(maxbins=30)),
    alt.Y('count()', title='Total trees'),
    alt.Color('root_barrier', title="Root barrier", scale=alt.Scale(scheme='tableau10'))
    ).properties(height=350, width=350
    ).repeat(column=size_columns
    )

root_barrier_stack = root_barrier_stack.properties(title=barrier_stack_title)
root_barrier_stack


['diameter', 'height_range_id']


The ratio of trees with and without a root barrier in Figure 5 seems to be similar for each size bin. That is, there appears to be a fairly constant Yes to No ratio in each of the bins for both diameter and height. There are fewer trees in each bin as the diameter or height increases, but the relative heights of Yes vs No appear about the same. This suggests that root barriers are not a significant hindrance to the size of trees. It could be argued that the absence of any tree with a root barrier above 16 inches in diameter and 50 feet in height suggests that root barriers can prevent trees from becoming very large, but the small size of the dataset prevents us from drawing any firm conclusions from this observation.
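
The Yes to No ratio per size bin can also be tabulated rather than read off the histogram. The snippet below is a minimal sketch on a made-up dataframe (`toy_trees`, with invented values and arbitrary bin edges); it is not part of the original analysis, but the same calls could be applied to `trees_year_planted`.

```python
import pandas as pd

# Toy stand-in for trees_year_planted; values are illustrative only.
toy_trees = pd.DataFrame({
    "diameter":     [2.0, 3.0, 4.0, 4.5, 6.0, 7.0, 8.0, 9.0, 12.0, 14.0],
    "root_barrier": ["N", "N", "N", "Y", "N", "N", "N", "Y", "N", "N"],
})

# Bin the diameters (bin edges are arbitrary), then compute the share of
# Y and N within each bin; normalize="index" makes each row sum to 1.
diameter_bin = pd.cut(toy_trees["diameter"], bins=[0, 5, 10, 20])
barrier_share = pd.crosstab(diameter_bin, toy_trees["root_barrier"], normalize="index")
print(barrier_share.round(2))
```

A table like this makes it easier to judge whether the share of trees with a barrier really stays constant across bins, and to spot bins such as the largest one here that contain no trees with a barrier.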

## Discussion

The trees of Vancouver have taken us on quite a journey and we have uncovered a number of observations relating to their size in the course of this analysis.

From Figure 1, we are able to see that the spread of tree heights and diameters becomes larger as the trees in the dataset get older. It is not as simple as trees getting bigger as they get older: some remain small, others get very tall, and most end up somewhere in between. Since many different genera and species make up this dataset, genus is also going to play a role in how big a tree will grow. The contribution of genus to the size of trees was laid out in Figure 3, where differences in median tree diameters and heights were observed between the genera in the dataset.

An interesting side observation is that the Acer genus, encompassing maples, makes up about 30% of the dataset, by far the most of any genus. The Wikipedia page for [Maples](https://en.wikipedia.org/wiki/Maple) states that these trees are in the order of 33-148 feet high, so they are a good medium-sized tree to plant on a boulevard without worrying about them growing to tower over the city. Their dramatic colour changes in fall are quite spectacular to see. These two attributes combined provide some insight as to why city planners would pick these trees to line their streets.

Lastly, Figure 5 suggested that installation of root barriers did not play a significant role in hindering heights of trees. This is somewhat surprising as I would have thought that tree growth would be hampered if tree roots are prevented from properly growing out as far as they want.

This project has certainly given me a greater appreciation for the thought that goes into the boulevard trees in a city, along with the effort needed to maintain a large, up-to-date database. One future direction that I see as a viable and interesting topic to delve deeper into is the distribution of the different genera and species of trees among the neighbourhoods of Vancouver.

## Dashboard

In [12]:
# Dropdown controls
dropdown = alt.binding_select(name='Genus ', options=genus_diameter_order)
genus_dropdown = alt.selection_single(fields=['genus_name'], bind=dropdown)

brush = alt.selection_interval()

# Year Dash plots
year_title_dash= alt.TitleParams('Trees can get bigger as they get older', anchor='middle')

year_diameter_dash = alt.Chart(genus_filtered).mark_circle().encode(
    alt.X('diameter', title='Diameter of tree (DBH)', scale=alt.Scale(zero=False)),
    alt.Y('year', title='Year of planting', scale=alt.Scale(zero=False), axis=alt.Axis(format=' ')),
    alt.Size('count()', title='Number of trees'),
    color=alt.condition(brush, 'genus_name', alt.value('lightgray')),
    opacity=alt.condition(genus_dropdown, alt.value(1), alt.value(0))
    ).add_selection(brush, genus_dropdown).properties(height=280, width=280)

year_height_dash = year_diameter_dash.encode(alt.X('height_range_id', title='Height range of tree (1 unit/10 feet)', scale=alt.Scale(zero=False)))

year_dash = (year_diameter_dash|year_height_dash).properties(title=year_title_dash)

# Genus dash plots

genus_title_dash= alt.TitleParams('Median diameters and heights of tree varieties follow similar trends', anchor='middle')

genus_diameter_dash = alt.Chart(genus_filtered).mark_boxplot().encode(
    alt.X('diameter', title='Diameter of tree (DBH)', scale=alt.Scale(zero=False)),
    alt.Y('genus_name', title='Name of genus', sort=genus_diameter_order),
    opacity=alt.condition(genus_dropdown, alt.value(0.9), alt.value(0.05))
    ).add_selection(brush, genus_dropdown).properties(height=300, width=250)

genus_height_dash = genus_diameter_dash.encode(alt.X('height_range_id', title='Height range of tree (1 unit/10 feet)', scale=alt.Scale(zero=False)), 
                                               alt.Y('genus_name', title='Name of genus', sort=genus_height_order))

genus_dash = (genus_diameter_dash|genus_height_dash).properties(title=genus_title_dash)

#Combining together
dash_plots = (year_dash)&(genus_dash)
dash_plots

## References

Not all the work in this notebook is original. Parts that were borrowed from other resources are as follows:

### Resources used
- UBC Data Visualization modules and assignments for plot inspiration and code snippets.
- Programming in Python for Data Science for wrangling refreshers.
- [The data](https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv) based off the City of Vancouver database.
- The City of Vancouver [street trees data portal](https://opendata.vancouver.ca/explore/dataset/street-trees/information/?disjunctive.species_name&disjunctive.common_name&disjunctive.height_range_id&disjunctive.on_street&disjunctive.neighbourhood_name&dataChart=eyJxdWVyaWVzIjpbeyJjb25maWciOnsiZGF0YXNldCI6InN0cmVldC10cmVlcyIsIm9wdGlvbnMiOnsiZGlzanVuY3RpdmUuc3BlY2llc19uYW1lIjp0cnVlLCJkaXNqdW5jdGl2ZS5jb21tb25fbmFtZSI6dHJ1ZSwiZGlzanVuY3RpdmUuaGVpZ2h0X3JhbmdlX2lkIjp0cnVlLCJkaXNqdW5jdGl2ZS5vbl9zdHJlZXQiOnRydWUsImRpc2p1bmN0aXZlLm5laWdoYm91cmhvb2RfbmFtZSI6dHJ1ZX19LCJjaGFydHMiOlt7ImFsaWduTW9udGgiOnRydWUsInR5cGUiOiJsaW5lIiwiZnVuYyI6IkFWRyIsInlBeGlzIjoiaGVpZ2h0X3JhbmdlX2lkIiwic2NpZW50aWZpY0Rpc3BsYXkiOnRydWUsImNvbG9yIjoiIzAyNzlCMSJ9XSwieEF4aXMiOiJkYXRlX3BsYW50ZWQiLCJtYXhwb2ludHMiOiIiLCJ0aW1lc2NhbGUiOiJ5ZWFyIiwic29ydCI6IiJ9XSwiZGlzcGxheUxlZ2VuZCI6dHJ1ZSwiYWxpZ25Nb250aCI6dHJ1ZX0%3D&location=12,49.22567,-123.11022) for column descriptions.
- Altair documentation including, but not limited to, 
    - [Title parameters](https://altair-viz.github.io/user_guide/generated/core/altair.TitleParams.html)
    - [Customization](https://altair-viz.github.io/user_guide/customization.html)
- [Color schemes](https://vega.github.io/vega/docs/schemes/) from Vega github.
- This [Softhints](https://softhints.com/pandas-how-to-filter-results-of-value_counts/) page for help with filtering value counts. 
- Maple [wikipedia page](https://en.wikipedia.org/wiki/Maple)
- Analise Hoffmann for EDA feedback.
