![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg?raw=true)

# Strathcona County Tree Location

##### Summary

This data set is provided by Strathcona County under its Environment category: https://data.strathcona.ca/Environment/Tree/v78i-7ntw

##### Content
The data set records the location of each tree. The data are loaded four times per year from the 'treeworks' dataset. Last updated: April 6th, 2016.


## Downloading and parsing data into a 'dataframe'

We begin by downloading the data directly from the website. 

https://data.strathcona.ca/Environment/Tree/v78i-7ntw

We can do this by selecting the 'API' tag and choosing CSV format on the top right side. Pressing the 'Copy' button will give us the URL we need to download the full dataset.
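Socrata-style endpoints like this one usually return only a first page of rows by default, so the plain URL may not deliver every record. The sketch below builds a URL that asks for more rows, assuming the endpoint honours the standard `$limit` query parameter. The helper name `build_csv_url` and the value `50000` are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

# Endpoint used in this notebook
BASE_URL = "https://data.strathcona.ca/resource/v78i-7ntw.csv"


def build_csv_url(base_url, limit):
    """Return base_url with a `$limit` query parameter appended (hypothetical helper)."""
    # safe="$" keeps the dollar sign readable instead of encoding it as %24
    query = urlencode({"$limit": limit}, safe="$")
    return f"{base_url}?{query}"


# 50000 is an arbitrary illustrative value, not a documented dataset size
link_all = build_csv_url(BASE_URL, 50000)
print(link_all)
```

The resulting string can be passed to `pd.read_csv` in the same way as the plain URL.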


In [None]:
#!pip install folium

In [None]:
# Import modules
# We will store the data into a 'dataframe' using pandas
import pandas as pd
# We want to be as precise as possible in keeping tree coordinates
from decimal import Decimal, getcontext
# We will visualize the coordinates on a map using the folium package
import folium
# We want to cluster them using the MarkerCluster submodule from folium plugins
from folium.plugins import MarkerCluster

In [None]:
# Download data from API 
# Main Source: https://data.strathcona.ca/Environment/Tree/v78i-7ntw
# Pick API Tag - I chose the CSV option
link = "https://data.strathcona.ca/resource/v78i-7ntw.csv"

# Read and parse the CSV data into a pandas dataframe
rawData = pd.read_csv(link)

# Rename columns
rawData = rawData.rename(columns={"treesiteid": "ID", "name": "name","location":"location"})

# Look at the first five rows
rawData.head()

---
### Exercise 1

Look at the table above. It has three columns: a tree id column (`ID`), a tree name column (`name`) and a tree location column (`location`).  

1. Look at the table above under the `location` column. Is there anything strange in the values under it? 

---

## Data Cleanup

The table contains an ID uniquely identifying each tree, the tree's name, and its location given as a pair of coordinates. 

A clean coordinate pair would look like

`(53.5227206433883, -113.324197520184)`

Our data set is not clean: each entry contains the special characters `\n, \n`. We need to clean it up before we can visualize it. 

The special character `\n` is known as a 'line break'. 

This tells us that the coordinates are given as a string. 

In the cell below we will clean the data following three steps:

1. Remove special characters `\n, \n`
2. Remove left parenthesis `(` and right parenthesis `)`
3. Separate the pair into an `X` and `Y` number - and create separate columns (one for each)
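The three steps above can be sketched on a small sample before applying them to the full table. This is a minimal illustration, assuming the raw `location` strings follow the pattern described above. The dataframe `sample`, its tree names and its second coordinate pair are made up for the example.

```python
import pandas as pd

# Hypothetical sample that mimics the raw format of the `location` column
sample = pd.DataFrame({
    "ID": [1, 2],
    "name": ["Spruce", "Elm"],
    "location": [
        "\n,  \n(53.5227206433883, -113.324197520184)",
        "\n,  \n(53.5301, -113.3102)",
    ],
})

# Step 1: remove the line-break characters (literal match, no regex needed)
cleaned = sample["location"].str.replace("\n,  \n", "", regex=False)

# Step 2: remove the left and right parentheses
cleaned = cleaned.str.replace("(", "", regex=False).str.replace(")", "", regex=False)

# Step 3: split each pair on the comma and store the two halves as text columns
parts = cleaned.str.split(",", expand=True)
sample["X"] = parts[0].str.strip()
sample["Y"] = parts[1].str.strip()

print(sample[["ID", "name", "X", "Y"]])
```

Here `X` and `Y` are still strings; the notebook converts them to numbers in Step 3 below.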

### Step 1. Remove special characters `\n, \n`

In [None]:
# Data cleanup - Remove line breaks 
rawData = rawData.replace('\n,  \n','', regex=True)

### Step 2. Remove left parenthesis `(` and right parenthesis `)` 

In [None]:
# Data cleanup - Remove left parenthesis 
rawData = rawData.replace(r"\(","", regex=True)

---
### Exercise 2

The cell above removes the left parenthesis `(`. 

1. Using the cell below, enter a command similar to the one above, only this time remove the right parenthesis `)`
2. Run the cell and confirm that the odd characters and parentheses were removed

---

In [None]:
# Your code here
rawData = rawData.replace(r"\)","", regex=True)
# Look at the first five entries
rawData.head()

### Step 3. Separate the pair into an `X` and `Y` number - and create separate columns (one for each)

In the cell below we will manipulate the values under the `location` column. 
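Before running the conversion, it may help to see how `Decimal` treats a single cleaned pair. The sketch below assumes the string has already been stripped of line breaks and parentheses. It illustrates that building a `Decimal` from text keeps every digit, while the context precision only affects the results of arithmetic. The variable names are illustrative.

```python
from decimal import Decimal, getcontext

# Context precision: applies to arithmetic results, not to construction from text
getcontext().prec = 16

# A cleaned coordinate pair, as it would appear after Steps 1 and 2
pair = "53.5227206433883, -113.324197520184"

# Split on the comma and strip the surrounding spaces
x_text, y_text = [part.strip() for part in pair.split(",")]

# Converting from a string keeps every digit exactly as written
x = Decimal(x_text)
y = Decimal(y_text)
print(x, y)

# Unary plus applies the context, rounding to 16 significant digits
rounded = +Decimal("1.23456789012345678901")
print(rounded)
```

Converting with `float` instead would store the nearest binary approximation, which is usually accurate enough for plotting.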

In [None]:

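# Set the Decimal context precision to 16 significant digits
# (this affects arithmetic; Decimal values built from strings keep every digit)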
getcontext().prec = 16

# Get the maximum number of records
MAX_RECORDS = len(rawData)

# Let us separate the 'location' column into 'X' and 'Y' coordinates using the split() method
x_coor = [Decimal(each[1]["location"].split(",")[0]) for each in rawData[0:MAX_RECORDS].iterrows()]
y_coor = [Decimal(each[1]["location"].split(",")[1]) for each in rawData[0:MAX_RECORDS].iterrows()]

# Create a new column called 'X' to store the first coordinate in location
rawData["X"] = x_coor
# Create a new column called 'Y' to store the second coordinate in location
rawData["Y"] = y_coor

# Look at the first five rows
rawData.head()

## Data Visualization

Now that we have cleaned up the dataframe and separated the string `location` values into separate numerical values containing the `X` and `Y` coordinates, we will use the `folium` package to visualize our data geographically. 


---
### Exercise 3 

1. Look up the coordinates for Strathcona County
2. In the cell below, enter the North coordinate and the West coordinate into separate variables (we have created the variable names for you). Make sure you enter numbers only!
3. These will be the initial coordinates that will help us locate our map. 
4. Run the cell to display the map. Ensure you are in the right location (hint: Edmonton should appear on the map)

---

In [None]:
# Your code here 
north_coordinate = 53.5701
west_coordinate = -113.0741

# Initial coordinates 
SC_COORDINATES = [north_coordinate, west_coordinate]

# Create a map using our initial coordinates
# The default OpenStreetMap tiles are used here; the 'Stamen Terrain' tiles may be unavailable in recent folium releases
map_osm = folium.Map(location=SC_COORDINATES, zoom_start=10)

# Display the map 
display(map_osm)

## Displaying the tree locations

We can now add the tree locations into our map. 

In the cell below we will iterate over each record in our dataframe `rawData`. 

We will then add markers (one marker for each pair of coordinates) using the `folium.Marker` function. 

We will pass the `X` and `Y` coordinates using the `location` parameter, and mark each tree with its `name` using the `popup` parameter. 

We will add each marker to our `marker_cluster`, which is itself added to our map `map_osm`. 
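The iteration described above can be sketched without drawing anything. The example assumes a dataframe with `name`, `X` and `Y` columns, and it collects the values that would be handed to the `location` and `popup` parameters. The dataframe `trees` and the list `marker_specs` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical cleaned sample with the same columns as the notebook's dataframe
trees = pd.DataFrame({
    "name": ["Spruce", "Elm"],
    "X": [53.52, 53.53],
    "Y": [-113.32, -113.31],
})

marker_specs = []
# iterrows() yields (index, row) pairs; unpacking avoids writing each[1] repeatedly
for _, row in trees.iterrows():
    marker_specs.append({
        # First coordinate, then second coordinate, as the `location` parameter expects
        "location": [float(row["X"]), float(row["Y"])],
        # Text shown when the marker is clicked
        "popup": row["name"],
    })

print(marker_specs)
```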

Run the cell below to see the locations of the trees.

In [None]:
# Create marker cluster and add to our map
marker_cluster = MarkerCluster().add_to(map_osm)

# Iterate over each record, and add tree x and y coordinates, as well as tree name
MAX_RECORDS = len(rawData)
# For each record in rawData
for each in rawData[0:MAX_RECORDS].iterrows():
    # Use folium.Marker function, use X and Y coordinates to specify location
    folium.Marker(location = [each[1]['X'],each[1]['Y']], 
                  # Add tree name
                  popup=folium.Popup(each[1]['name'],sticky=True),
                  #Make color/style changes here
                  icon=folium.Icon(color='green', icon='tree', prefix='fa'),
                  # Make sure our trees cluster nicely!
                  clustered_marker = True).add_to(marker_cluster)


# Show the map
display(map_osm)

# Optional - you can save this map as an HTML file
map_osm.save('TreeMap.html')

---
### Exercise 4

Use the interactive map above for this exercise. 

1. Click on the largest cluster. It will break into smaller subclusters. (Hint: it has over 900 trees)
2. Click on the next largest subcluster. It will break again into smaller subclusters. (Hint: it has over 200 trees)
3. Observe and ask yourself: how are the subclusters distributed? Where are the clusters bigger: in green areas or in the city?
4. Pick any subcluster and continue zooming in until you see tree icons. Click on the tree icons to reveal their names. 
5. Take note of: the total number of trees in the smallest cluster that revealed tree names, and their tree names. Add your observations by double clicking on this cell. How many trees of each kind did you find?

#### Your observations here

---

## Further Visualization and Statistics

A natural question to ask is which kind of tree is the most common. 

We will next group and plot the data to find out. 
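The grouping step can be tried on a small made-up sample first. This sketch assumes a dataframe with a `name` column. The tree names are invented, and the `percent` column is an addition that shows roughly what a pie chart would display.

```python
import pandas as pd

# Hypothetical sample of tree names
sample = pd.DataFrame({"name": ["Spruce", "Elm", "Spruce", "Ash", "Spruce", "Elm"]})

# Count how many rows share each name
counts = sample.groupby("name").size().reset_index(name="count")

# Sort so the most common tree comes first
counts = counts.sort_values("count", ascending=False).reset_index(drop=True)

# Share of each kind of tree, in percent
counts["percent"] = 100 * counts["count"] / counts["count"].sum()

print(counts)
```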

1. Setting up cufflinks for data visualization.

In [None]:
#load "cufflinks" library under short name "cf"
import cufflinks as cf

#command to display graphics correctly in Jupyter notebook
cf.go_offline()

def enable_plotly_in_cell():
    import IPython
    from plotly.offline import init_notebook_mode
    display(IPython.core.display.HTML('''
        <script src="/static/components/requirejs/require.js"></script>
  '''))
    init_notebook_mode(connected=False)
    
get_ipython().events.register('pre_run_cell', enable_plotly_in_cell)

2. Grouping data by `name` and counting. 

In [None]:
count_by_tree_name = rawData.groupby("name").size().reset_index(name="count")

3. Visualizing. 

In [None]:
count_by_tree_name.iplot(kind="pie",values="count",labels="name",title="Tree Percentages") 

---
### Exercise 5

1. Hover over the plot. 
2. What are the three most common trees in Strathcona County? Double click this cell and enter your response below. 

#### Your observations here

---

## Bonus: How many trees per park?

In the section below we will have an opportunity to repeat the exercise above using data on parks from Strathcona County. 

Run the cells below, and take note of the number of trees in each park, vs the number of trees where no park was identified. 
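The counting can also be approximated in code. The sketch below is one hypothetical approach, not the method used in this notebook: it treats a park as a single point and counts the trees within a chosen radius using the haversine great-circle distance. The coordinates, the 200 m radius and the function names are all assumptions made for illustration.

```python
import math

# Mean Earth radius in metres
EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_trees_near(park, trees, radius_m):
    """Count the trees whose distance to the park point is at most radius_m."""
    return sum(
        1 for lat, lon in trees
        if haversine_m(park[0], park[1], lat, lon) <= radius_m
    )


# Hypothetical park location and tree locations
park = (53.5300, -113.3100)
trees = [(53.5301, -113.3101), (53.5305, -113.3095), (53.6000, -113.2000)]

near = count_trees_near(park, trees, radius_m=200)
print(near)
```

A real park covers an area rather than a point, so a count like this is only a rough guide.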

In [None]:
### PARK DATA
link_park = "https://data.strathcona.ca/resource/533n-6rzt.csv"
# Read the CSV data into a dataframe 
parkData = pd.read_csv(link_park)
# Cleanup data 
parkData = parkData.replace('\n,  \n','', regex=True)
parkData = parkData.replace(r"\)","", regex=True)
parkData = parkData.replace(r"\(","", regex=True)

MAX_RECORDS = len(parkData)

# Let us separate the 'location' column into 'X' and 'Y' coordinates using the split() method
x_coor = [Decimal(each[1]["location"].split(",")[0]) for each in parkData[0:MAX_RECORDS].iterrows()]
y_coor = [Decimal(each[1]["location"].split(",")[1]) for each in parkData[0:MAX_RECORDS].iterrows()]

# Create a new column called 'X' to store the first coordinate in location
parkData["X"] = x_coor
# Create a new column called 'Y' to store the second coordinate in location
parkData["Y"] = y_coor

# Keep only parks with a wilderness trail, a nature appreciation area, or an off-leash area
parkData = parkData[(parkData["wilderness_trail"]=="YES")|  (parkData["nature_appreciation"]=="Yes") |  (parkData["off_leash"]=="Yes")]
# Look at the first five rows
parkData.head()

In [None]:
marker_cluster = MarkerCluster().add_to(map_osm)

MAX_RECORDS = len(parkData)   
for each in parkData[0:MAX_RECORDS].iterrows():
    # Use folium.Marker function, use X and Y coordinates to specify location
    folium.Marker(location = [each[1]['X'],each[1]['Y']], 
                  # Add park name
                  popup=folium.Popup(each[1]['park_name'],sticky=True),
                  #Make color/style changes here
                  icon=folium.Icon(color='blue', icon='acorn', prefix='fa'),
                  # Make sure our parks cluster nicely!
                  clustered_marker = True).add_to(marker_cluster)
# Show the map
display(map_osm)

# Optional - you can save this map as an HTML file
map_osm.save('TreeMapParks.html')

---
### Exercise 6

1. Interact with the map by clicking on the clusters. Blue markers denote parks. Click on them to reveal their names. 
2. Compare the number of trees in regions with no parks vs regions with parks. 
3. Pick a park, count the number of trees and keep track of their names. 

#### Your observations here

---


![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)