![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Trees in Strathcona County

Strathcona County collects data on all trees that are on public land. We are going to explore this dataset.

## Getting Ready

This section sets up several things behind the scenes that the rest of this notebook needs. Most of the code cells in this section are ready to run, so you won't have to modify them. You don't need to understand every task these cells perform to complete the challenges, but feel free to ask mentors about anything that makes you curious.

### Importing Libraries

`▸Run` the cell below to import the required Python libraries.

In [None]:
# We will store the data in a 'dataframe' using pandas
import pandas as pd
# Plotly Express will be used to make graphs
import plotly.express as px
# We will visualize the coordinates in a map using folium
import folium
from folium.plugins import MarkerCluster
print('Setup Complete')

### Importing Data

We'll use a dataset provided by Strathcona County on [data.strathcona.ca](https://data.strathcona.ca/Environment/Trees/ig6t-pdus). It contains tree locations and types, and it is updated four times per year.
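The download link returns a plain CSV file, which `pd.read_csv` turns into a dataframe. The sketch below parses two sample rows written out by hand, using the column names from the output further down. The real file may be formatted differently, and the inline text only stands in for the download. Some values, such as `"ash,green"`, are wrapped in quotes, so the comma inside them does not split the value into two columns.

```python
import io
import pandas as pd

# Two hand-written sample rows using the column names shown in the notebook output
# (a stand-in for the real download, which may be formatted differently)
sample_csv = """the_geom,TreeSiteID,Orig_Name,Name,longitude,latitude,Genus,Species
POINT (-113.27990691495326 53.51475642946733),3180.0,pine spp,Pine spp,-113.279907,53.514756,Pine,Pine spp
POINT (-113.28087645626125 53.52109987792144),3414.0,"ash,green",Green Ash,-113.280876,53.521100,Ash,Green Ash
"""

# read_csv accepts any file-like object, so StringIO stands in for the URL here
sample = pd.read_csv(io.StringIO(sample_csv))

print(sample.shape)                  # (2, 8): the quoted "ash,green" stays in one column
print(sample['Orig_Name'].tolist())  # ['pine spp', 'ash,green']
```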

In [None]:
trees = pd.read_csv('https://data.strathcona.ca/api/views/ig6t-pdus/rows.csv?accessType=DOWNLOAD')
# display the data
trees

| | the_geom | TreeSiteID | Orig_Name | Name | longitude | latitude | Genus | Species |
|---|---|---|---|---|---|---|---|---|
| 0 | POINT (-113.27990691495326 53.51475642946733) | 3180.0 | pine spp | Pine spp | -113.279907 | 53.514756 | Pine | Pine spp |
| 1 | POINT (-113.28087645626125 53.52109987792144) | 3414.0 | ash,green | Green Ash | -113.280876 | 53.521100 | Ash | Green Ash |
| 2 | POINT (-113.25761177298322 53.53540572039219) | 6169.0 | ash,green | Green Ash | -113.257612 | 53.535406 | Ash | Green Ash |
| 3 | POINT (-113.25328772314695 53.539579013605476) | 7801.0 | ash,green | Green Ash | -113.253288 | 53.539579 | Ash | Green Ash |
| 4 | POINT (-113.31951009270878 53.527176356856444) | 8117.0 | ash,green | Green Ash | -113.319510 | 53.527176 | Ash | Green Ash |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 34444 | POINT (-113.14017862601068 53.55290677437548) | 41203.0 | elm,brandon | Brandon Elm | -113.140179 | 53.552907 | Elm | Brandon Elm |
| 34445 | POINT (-113.2649735660162 53.555572941713656) | 40993.0 | elm,american | American Elm | -113.264974 | 53.555573 | Elm | American Elm |
| 34446 | POINT (-113.28986295225532 53.55552474620585) | 41084.0 | oak,bur | Bur Oak | -113.289863 | 53.555525 | Oak | Bur Oak |
| 34447 | POINT (-113.29937574240984 53.55305299330779) | 40882.0 | buckeye,ohio | Ohio Buckeye | -113.299376 | 53.553053 | Buckeye | Ohio Buckeye |


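There is information about more than 34,000 individual trees in the County! Because the dataset is updated four times per year, the exact number may be different when you run this notebook.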
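Row labels start at 0, so the last label in the output (34447) is one less than the number of rows. Instead of reading the count from the index, you can use `len()` or the `.shape` attribute, which give it directly. Here is a minimal sketch on a small hypothetical dataframe, `trees_sample`:

```python
import pandas as pd

# A tiny hypothetical stand-in for the trees dataframe
trees_sample = pd.DataFrame({
    'Name': ['Pine spp', 'Green Ash', 'Green Ash', 'Bur Oak'],
    'latitude': [53.514756, 53.521100, 53.535406, 53.555525],
    'longitude': [-113.279907, -113.280876, -113.257612, -113.289863],
})

print(trees_sample.index[-1])  # 3 -> the last row label
print(len(trees_sample))       # 4 -> the number of rows
print(trees_sample.shape)      # (4, 3) -> (rows, columns)
```

On the real data, `len(trees)` gives the current number of trees.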

## Data Cleaning

We can see that the column `the_geom` duplicates the information in the `longitude` and `latitude` columns, so we can drop it.
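To check this, we can pull the two numbers out of one `the_geom` value. The text inside `POINT (...)` lists longitude first, then latitude. Rounded to six decimal places (the precision pandas displays by default), they match the `longitude` and `latitude` values shown in the same row. Here is a minimal sketch using the value from the first row of the output above:

```python
# One `the_geom` value copied from the first row of the output
geom = 'POINT (-113.27990691495326 53.51475642946733)'

# Strip the 'POINT (' prefix and the ')' suffix, then split on the space
inside = geom[len('POINT ('):-1]
lon_text, lat_text = inside.split()
lon, lat = float(lon_text), float(lat_text)

# Round to 6 decimals to compare with the displayed longitude and latitude
print(round(lon, 6), round(lat, 6))  # -113.279907 53.514756
```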

In [None]:
trees.drop(columns=['the_geom'], inplace=True)
trees

## Analysis

We can now do some analysis of the dataset, such as figuring out which tree types are the most common.

We'll group the data by `Name` and use the `size()` method to count how many trees of each kind there are. `reset_index(name='count')` turns the result into a dataframe with a `count` column, and `.sort_values()` then sorts by that column, largest first.
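Here is the same recipe applied to a small hypothetical dataframe, `mini`, so each step is easy to follow. pandas' `value_counts()` is a common shortcut that counts and sorts in one step. It is shown here for comparison and is not what the notebook uses.

```python
import pandas as pd

# Hypothetical mini dataset: five trees, three kinds
mini = pd.DataFrame({'Name': ['Green Ash', 'Bur Oak', 'Green Ash', 'Pine spp', 'Green Ash']})

# Same steps as the notebook: group, count rows per group, name the result 'count'
counts = mini.groupby('Name').size().reset_index(name='count')
counts = counts.sort_values(by='count', ascending=False)
print(counts)

# Shortcut: value_counts() counts and sorts (largest first) in one call
print(mini['Name'].value_counts())
```

Both approaches put Green Ash first with 3 trees. The tied kinds with 1 tree each may appear in either order.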

In [None]:
counts_by_name = trees.groupby('Name').size().reset_index(name='count')
counts_by_name.sort_values(by='count', ascending=False, inplace=True)
counts_by_name

You can now see the most common types of trees in Strathcona County. Let's visualize the five most common with a pie chart.

In [None]:
px.pie(counts_by_name.head(5), values='count', names='Name', title='Most Common Trees in Strathcona County')
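The pie chart only receives the first five rows, so each slice is a share of those five kinds, not a share of all trees in the County. A sketch with hypothetical counts (tree kinds labelled A to G) shows the difference:

```python
import pandas as pd

# Hypothetical counts, already sorted largest first like counts_by_name (total = 100)
counts = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'count': [40, 20, 15, 10, 6, 5, 4],
})
top5 = counts.head(5)

# What the pie chart shows: each slice divided by the top-five total (91)
share_of_top5 = top5['count'] / top5['count'].sum()

# Share of all trees: each count divided by the overall total (100)
share_of_all = top5['count'] / counts['count'].sum()

print(round(share_of_top5.iloc[0], 3))  # 0.44
print(round(share_of_all.iloc[0], 3))   # 0.4
```

The most common kind looks a bit larger in the pie than its true share of all trees.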

We can also show the ten most common as a bar graph.

In [None]:
px.bar(counts_by_name.head(10), x='Name', y='count', title='Most Common Trees in Strathcona County')

## Mapping Data

Since we have a dataframe with `latitude` and `longitude` columns, we will use the Python library called `folium` to visualize our data on a map.

First we will create and display a map. To figure out where the center of the map should be, we'll find the median values from those columns.
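One reason to use the median rather than the mean is that a few trees far from the rest barely move the median, but they can pull the mean toward them. Here is a sketch with hypothetical latitudes, one of them far to the north:

```python
import pandas as pd

# Hypothetical latitudes: four trees close together and one far away
latitudes = pd.Series([53.51, 53.52, 53.53, 53.54, 54.50])

print(latitudes.median())  # 53.53: the middle value, unaffected by the outlier
print(latitudes.mean())    # pulled north by the single distant tree
```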

In [None]:
median_latitude = trees['latitude'].median()
median_longitude = trees['longitude'].median()

tree_map = folium.Map(location=[median_latitude, median_longitude], zoom_start=10)
display(tree_map)

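There are other map styles that we can try by passing one of these names as the `tiles` argument:

* `openstreetmap`
* `stamen terrain`
* `stamen toner`
* `stamen watercolor`
* `cartodb positron`
* `cartodb dark_matter`
* `mapbox bright` (limited zoom levels)
* `mapbox control room` (limited zoom levels)

Not every style works with every version of folium. Stamen tiles moved to Stadia Maps in 2023 and may now need an API key, and the Mapbox styles may be unavailable in newer versions. The OpenStreetMap and CartoDB styles are the most dependable choices.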

In [None]:
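# Stamen styles such as 'stamen terrain' may require an API key in newer versions of folium, so this uses a CartoDB style
tree_map = folium.Map(location=[median_latitude, median_longitude], zoom_start=10, tiles='cartodb positron')
display(tree_map)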

We can now add the tree locations into our map. 

In the cell below we will [iterate](https://www.merriam-webster.com/dictionary/iteration) over each row in the dataframe and create a marker for each tree with `folium.Marker`. Each marker is placed at the tree's `latitude` and `longitude` and shows its `Name` in a `popup`. The markers are added to `marker_cluster`, a `MarkerCluster` on `tree_map` that groups nearby markers together so the map stays readable when zoomed out.
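`iterrows()` yields `(index, row)` pairs, and each row can be indexed by column name. Here is a sketch of the same loop without folium, using a small hypothetical dataframe `trees_sample`, that collects the inputs each marker would receive. `itertuples()` is shown as an alternative that is usually faster than `iterrows()` on large dataframes.

```python
import pandas as pd

# Hypothetical rows with the same column names as the trees dataframe
trees_sample = pd.DataFrame({
    'Name': ['Pine spp', 'Green Ash'],
    'latitude': [53.514756, 53.521100],
    'longitude': [-113.279907, -113.280876],
})

# iterrows() yields (index, row) pairs; each row is a Series indexed by column name
markers = []
for index, row in trees_sample.iterrows():
    location = [row['latitude'], row['longitude']]  # folium expects [latitude, longitude]
    markers.append((location, row['Name']))

# itertuples() yields namedtuples, which is typically faster than iterrows()
markers_fast = [([t.latitude, t.longitude], t.Name) for t in trees_sample.itertuples(index=False)]

print(markers == markers_fast)  # True
```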

This cell will take a while to run because it creates one marker for every tree. You'll know it's still running if you see `[*]` at the top left of the cell.
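If you only want to try out the map first, one option (not part of the original notebook) is to plot a random sample of rows. `DataFrame.sample` with `random_state` picks the same rows every time. The sample size of 1000 below is an arbitrary choice, and `big` is a hypothetical stand-in for the trees dataframe.

```python
import pandas as pd

# Hypothetical stand-in for the trees dataframe with 5000 rows
big = pd.DataFrame({'TreeSiteID': range(5000)})

# Pick 1000 rows at random; random_state makes the choice repeatable
subset = big.sample(n=1000, random_state=42)

print(len(subset))  # 1000
```

In the mapping cell, looping over `trees.sample(n=1000, random_state=42).iterrows()` would place 1,000 markers instead of one for every tree.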

In [None]:
marker_cluster = MarkerCluster().add_to(tree_map)

# Add one green tree marker per row, skipping any rows without coordinates
for _, row in trees.dropna(subset=['latitude', 'longitude']).iterrows():
    location = [row['latitude'], row['longitude']]
    name = row['Name']
    folium.Marker(
        location=location,
        popup=folium.Popup(name, sticky=True),
        icon=folium.Icon(color='green', icon='tree', prefix='fa'),
    ).add_to(marker_cluster)

display(tree_map)

You can now continue your own analysis in the [next notebook](trees-challenge.ipynb).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)