![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Data Detective:
## Where to Replant Trees After the Zombie Apocalypse 

#### What you need to do

The zombie apocalypse hit Strathcona County, but these zombies decided to eat trees. 

Now the County has a problem: they need to replant trees, but don’t know where. You’ve been hired to tell them what areas of the County need trees replanted and which areas still have lots of trees.

The County needs this information to promote wellness and outdoor activities. During the apocalypse no one wanted to go to parks when the zombies were munching the trees. The County also needs a recommendation for what kinds of trees would be best to plant. 

##### Summary

This data set is provided by Strathcona County Recreation, Parks, and Culture and is available at https://data.strathcona.ca/Environment/Tree/v78i-7ntw.

##### Content
Tree locations and types (common names). The data are loaded four times per year from the 'treeworks' dataset. Last updated: December 14th, 2019.

## Downloading the data into a 'dataframe'

We begin by downloading the data directly from the website. From [the website](https://data.strathcona.ca/Environment/Tree/v78i-7ntw) we selected the 'API' tag and chose CSV format on the top right side. Pressing the 'Copy' button gave us [the URL](https://data.strathcona.ca/resource/v78i-7ntw.csv) we need to download the full dataset.

In [None]:
!pip install folium --upgrade

In [None]:
# Import libraries or modules that we will need

# We will store the data in a 'dataframe' using pandas
import pandas as pd
# We will visualize the coordinates on a map using folium
import folium
# We want to cluster nearby markers using the MarkerCluster class from folium's plugins
from folium.plugins import MarkerCluster

print("Importing Python libraries was successful!")

In [None]:
# Download data from the API
# Main Source: https://data.strathcona.ca/Environment/Tree/v78i-7ntw
# Pick the API tag - we chose the CSV option
link = "https://data.strathcona.ca/resource/v78i-7ntw.csv"

# Read and parse the CSV data into a pandas dataframe
rawData = pd.read_csv(link)

# Rename the tree ID column ('name' and 'location' keep their names)
rawData = rawData.rename(columns={"treesiteid": "ID"})

# Look at the first five rows
rawData.head()
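Socrata-powered portals such as data.strathcona.ca usually return only the first page of results from an API URL like this one; as far as we know, the default page size is 1,000 rows. If `len(rawData)` prints exactly 1000, the download was probably cut short. The sketch below assumes the endpoint accepts Socrata's `$limit` query parameter and builds a URL that asks for more rows. `with_row_limit` is a hypothetical helper and `50000` is an arbitrary ceiling, not the real number of trees.

```python
# Hypothetical helper: append a Socrata "$limit" query parameter to an API URL.
# ASSUMPTION: the Strathcona County endpoint honours "$limit"; 50000 is an arbitrary
# ceiling chosen for illustration, not the real number of trees in the dataset.
BASE_URL = "https://data.strathcona.ca/resource/v78i-7ntw.csv"


def with_row_limit(base_url: str, limit: int) -> str:
    """Return base_url with a $limit query parameter added."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Use '&' if the URL already has a query string, otherwise start one with '?'
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}$limit={limit}"


link_with_limit = with_row_limit(BASE_URL, 50000)
print(link_with_limit)
# To download (needs network access and pandas imported as pd):
# rawData = pd.read_csv(link_with_limit)
```

Loading more rows will change the cluster sizes you see on the map in Challenge 4.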

---
### Challenge 1

Look at the table above. It has three columns: a tree ID column (`ID`), a tree name column (`name`) and a tree location column (`location`).  

1. Look at the values in the `location` column. Is there anything strange about them? 

---

## Data Cleaning

The table contains an ID uniquely identifying each tree, the tree's name, and its location as a pair of coordinates. 

A clean coordinate pair would look like

`(53.5227206433883, -113.324197520184)`

Our data set is not clean: each entry contains the special characters `\n, \n`. We need to clean it up before we can visualize it. 

The special character `\n` is known as a 'line break'. 

Characters like these can only appear in text, which tells us that each coordinate pair was read as a string rather than as numbers. 

In the cell below we will clean the data using the following steps:

1. Remove special characters `\n, \n`
2. Remove left parenthesis `(` and right parenthesis `)`
3. Split each pair into separate `latitude` and `longitude` coordinates, creating one column for each
4. Remove `'location'` column

In [None]:
# Helper function to clean up the data

def clean_dataframe(dataframe):
    """
    Function description: this function takes as input a dataframe with special characters
    and returns a 'clean' dataframe - a dataframe with no special characters and no parentheses,
    with separate numerical latitude and longitude columns
    """
    try:
        # Remove special characters
        dataframe = dataframe.replace('\n,  \n', '', regex=True)
        # Data cleanup - remove parentheses (regex=False treats them as plain characters)
        dataframe['location'] = dataframe['location'].str.replace('(', '', regex=False).str.replace(')', '', regex=False)
        # Split the column into latitude and longitude
        dataframe[['latitude', 'longitude']] = dataframe['location'].str.split(",", n=1, expand=True)
        # Convert the coordinate text into numbers (anything unreadable becomes NaN)
        for column in ['latitude', 'longitude']:
            dataframe[column] = pd.to_numeric(dataframe[column].str.strip(), errors='coerce')
        # Delete the 'location' column
        dataframe = dataframe.drop(['location'], axis=1)

        return dataframe

    except (KeyError, AttributeError, TypeError, ValueError):
        print("WARNING! Make sure you are passing a pandas dataframe, and make sure your dataframe contains "
              "a column named 'location' with comma-separated values.")
        raise
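To see what the helper function does, here are cleaning steps 1 to 3 applied to a single `location` value in plain Python. The raw value is an assumption: we guess that each entry is an empty address part (`\n,  \n`) followed by the coordinate pair, which matches the pattern the function removes. Compare it with `rawData['location'].iloc[0]` before relying on it.

```python
# One raw "location" value. ASSUMPTION: this shape (an empty address part "\n,  \n"
# followed by "(lat, lon)") is our guess at what a raw entry looks like.
raw_location = "\n,  \n(53.5227206433883, -113.324197520184)"

# Step 1: remove the special characters
step1 = raw_location.replace("\n,  \n", "")
# Step 2: remove the left and right parentheses
step2 = step1.replace("(", "").replace(")", "")
# Step 3: split on the first comma into latitude and longitude text
lat_text, lon_text = step2.split(",", 1)

# float() ignores the leading space in front of the longitude text
latitude = float(lat_text)
longitude = float(lon_text)
print(latitude, longitude)
```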

---
### Challenge 2

1. Use the helper function to clean up the data. 
2. Run the cell and confirm that the odd characters and parentheses were removed and that we now have `latitude` and `longitude` columns. 

---

In [None]:
# Your code here
rawData = clean_dataframe()
# Look at the first five entries
rawData.head()

## Data Visualization

Now that we have cleaned up the dataframe and separated the string `location` values into separate numerical values containing the `latitude` and `longitude` coordinates, we will use the Python library called `folium` to visualize our data geographically.
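Splitting a string gives pieces of text, so the new columns only hold numbers once they have been converted. A minimal sketch on a small hypothetical dataframe (the names and values are made up): `pd.to_numeric` with `errors="coerce"` turns anything that cannot be read as a number into `NaN`, and `dropna` then removes rows without usable coordinates.

```python
import pandas as pd

# Hypothetical rows right after splitting: latitude/longitude are still text (dtype object)
cleaned = pd.DataFrame({
    "name": ["Tree A", "Tree B", "Tree C"],
    "latitude": ["53.52", "53.54", None],
    "longitude": [" -113.32", " -113.29", None],
})
print(cleaned.dtypes)

# Strip stray spaces, then convert; errors="coerce" turns unreadable values into NaN
for column in ["latitude", "longitude"]:
    cleaned[column] = pd.to_numeric(cleaned[column].str.strip(), errors="coerce")

# Drop rows with no usable coordinates so they cannot break the map markers later
cleaned = cleaned.dropna(subset=["latitude", "longitude"])
print(cleaned.dtypes)
print(cleaned)
```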

---
### Challenge 3 

1. Look up the coordinates for Strathcona County: https://www.google.com/search?q=strathcona+county+latitude+and+longitude&oq=strathcona+county
2. In the cell below, enter the North coordinate (latitude) and the West coordinate (longitude) into separate variables (we have created the variable names for you). Make sure you enter numbers only, no letters! Longitudes west of Greenwich are negative, so put a minus sign in front of the longitude.
3. These will be the initial coordinates that will help us locate our map. 
4. Run the cell to display the map. Check that you are in the right location (hint: Edmonton should appear on the map).
---
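Search results often write coordinates with a hemisphere letter (N, S, E or W), while folium expects signed decimal degrees. A sketch of that conversion with a hypothetical helper, `to_signed_degrees`; the example values are arbitrary and are not the Strathcona County answer.

```python
def to_signed_degrees(value: float, hemisphere: str) -> float:
    """Convert a coordinate written with a hemisphere letter (N, S, E, W)
    into signed decimal degrees."""
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    # South latitudes and West longitudes are negative; North and East are positive
    return -abs(value) if hemisphere in {"S", "W"} else abs(value)


# Illustrative values only
print(to_signed_degrees(45.0, "N"))  # 45.0
print(to_signed_degrees(75.5, "W"))  # -75.5
```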

In [None]:
# Your code here 
latitude = 
longitude = 

# Initial coordinates 
SC_COORDINATES = [latitude, longitude]

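# Create a map using our initial coordinates
# ('Stamen Terrain' tiles are no longer served by Stamen, so we use the default OpenStreetMap tiles)
map_osm = folium.Map(location=SC_COORDINATES, zoom_start=10, tiles='OpenStreetMap')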

# Display the map 
display(map_osm)

## Displaying the tree locations

We can now add the tree locations into our map. 

In the cell below we will [iterate](https://www.merriam-webster.com/dictionary/iteration) over each record in our dataframe `rawData`. 

We will then add markers (one marker for each pair of coordinates) using `folium.Marker`. 

We will pass the `latitude` and `longitude` coordinates using the `location` parameter, and mark each tree with its `name` using the `popup` parameter. 

We will add each marker to `marker_cluster`, which is attached to our map `map_osm`. 

Run the cell below to see the locations of the trees.

In [None]:
# Create marker cluster and add to our map
marker_cluster = MarkerCluster().add_to(map_osm)

# Number of records to plot (lower this number to plot fewer trees)
MAX_RECORDS = len(rawData)
# For each record with coordinates, add the tree's latitude, longitude and name
for _, tree in rawData.iloc[:MAX_RECORDS].dropna(subset=['latitude', 'longitude']).iterrows():
    # Use folium.Marker, passing latitude and longitude to specify the location
    folium.Marker(location=[tree['latitude'], tree['longitude']],
                  # Add tree name (str() guards against missing names)
                  popup=folium.Popup(str(tree['name']), sticky=True),
                  # Make color/style changes here
                  icon=folium.Icon(color='green', icon='tree', prefix='fa')).add_to(marker_cluster)

# Show the map
display(map_osm)

# Optional - you can save this map as an HTML file
#map_osm.save('TreeMap.html')
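`iterrows()` works, but `itertuples()` is usually faster on large dataframes. A sketch of collecting the marker information that way, using a small hypothetical dataframe in place of `rawData` (the names and coordinates are made up):

```python
import pandas as pd

# Hypothetical stand-in for the cleaned rawData
trees = pd.DataFrame({
    "ID": [1, 2, 3],
    "name": ["Tree A", "Tree B", "Tree C"],
    "latitude": [53.52, None, 53.55],
    "longitude": [-113.32, -113.30, -113.28],
})

markers = []
# itertuples() yields one named tuple per row; skip rows without coordinates first
for row in trees.dropna(subset=["latitude", "longitude"]).itertuples(index=False):
    # On a named tuple, row.name is the 'name' column (see the note below about iterrows)
    markers.append(([row.latitude, row.longitude], str(row.name)))

for location, popup_text in markers:
    print(location, popup_text)
```

Each `(location, popup_text)` pair could then be passed as `folium.Marker(location=location, popup=popup_text).add_to(marker_cluster)`. Be careful when switching back to `iterrows()`: there, `row.name` is the row's index label, not the tree name, which is why bracket indexing (`['name']`) is the safe choice.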

---
### Challenge 4

Use the interactive map above for this exercise. You will see 'clusters' of trees. 

Clusters with 100 or more trees are coloured red, clusters with 10 to 99 trees are coloured yellow, and clusters with fewer than 10 trees are coloured green. A single tree is shown as a green marker with a tree icon. 
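The colour rule can be written out as a small function. `cluster_colour` is a hypothetical helper, not part of folium; we believe it matches the default thresholds of Leaflet.markercluster (the JavaScript library behind `MarkerCluster`), where a cluster of exactly 100 already counts as a large one.

```python
def cluster_colour(tree_count: int) -> str:
    """Colour a cluster: fewer than 10 trees -> green,
    10 to 99 -> yellow, 100 or more -> red."""
    if tree_count < 1:
        raise ValueError("a cluster must contain at least one tree")
    if tree_count < 10:
        return "green"
    if tree_count < 100:
        return "yellow"
    return "red"


for count in [3, 10, 57, 100, 950]:
    print(count, cluster_colour(count))
```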

1. Click on the largest cluster (hint: it has over 900 trees). It will break into smaller clusters. 
2. How many red clusters do you see? How many yellow clusters, and how many green ones? (Hint: there are more than two red clusters.)
3. Pick a red cluster and click on it. Are the clusters evenly distributed? If not, where are the clusters concentrated? 
4. Identify three areas that would benefit from more trees. What are the names of the streets or neighbourhoods where they are located?

#### Your answers and observations here:

---

## Further Visualization and Statistics

A natural question to ask is what is the most common kind of tree. To find out, we'll group and plot the data.

We start by setting up our visualizing environment. 

In [None]:
# Load the "cufflinks" library under the short name "cf"
import cufflinks as cf

# Command to display graphics correctly in Jupyter notebook
cf.go_offline()

def enable_plotly_in_cell(info=None):
    # Newer IPython versions pass an 'info' argument to pre_run_cell callbacks;
    # accepting (and ignoring) it keeps this working with older versions too
    from IPython.display import HTML, display
    from plotly.offline import init_notebook_mode
    display(HTML('''<script src="/static/components/requirejs/require.js"></script>'''))
    init_notebook_mode(connected=False)

get_ipython().events.register('pre_run_cell', enable_plotly_in_cell)

First we'll group data by `name` using the `groupby()` method. 

Then we'll use the `size()` method to count how many trees of each kind there are. 

Next we'll sort the counts from largest to smallest. 

Run the cell below to perform these steps and show the five most common trees in Strathcona County. 

In [None]:
# This cell groups trees by name, and counts them
count_by_tree_name = rawData.groupby("name").size().reset_index(name="count")
# once it does that, it sorts the counts in descending order
ordered_count = count_by_tree_name.sort_values(by='count', ascending=False)
# And displays the first 5 results. 
ordered_count.head()
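The group, count and sort steps can also be done in one call with `value_counts()`, which counts each distinct value and sorts the counts in descending order by default. A sketch on a few hypothetical tree names showing that both approaches give the same ranking:

```python
import pandas as pd

# Hypothetical tree names standing in for rawData["name"]
trees = pd.DataFrame({"name": ["Spruce", "Ash", "Spruce", "Elm", "Spruce", "Ash"]})

# The three steps from the text: group by name, count each group, sort descending
by_groupby = (
    trees.groupby("name").size()
    .reset_index(name="count")
    .sort_values(by="count", ascending=False)
)
print(by_groupby)

# value_counts() does the same grouping, counting and descending sort in one call
by_value_counts = trees["name"].value_counts()
print(by_value_counts)
```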

You can see the most common tree in Strathcona County. Let's visualize these data in a pie chart.

In [None]:
ordered_count.iloc[0:5].iplot(kind="pie",values="count",labels="name",title="Five Most Common Trees") 

In [None]:
ordered_count.iplot(kind="pie",values="count",labels="name",title="All Trees") 
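A pie chart's percentages are each slice's value divided by the total of all slices in that chart. The two charts above therefore use different totals: the first divides by the five most common trees only, the second by all trees. A sketch of both calculations on hypothetical counts (the names and numbers are made up):

```python
import pandas as pd

# Hypothetical counts standing in for ordered_count (already sorted, largest first)
ordered = pd.DataFrame({
    "name": ["Spruce", "Ash", "Elm", "Pine", "Maple", "Birch"],
    "count": [40, 25, 15, 10, 6, 4],
})

# Share of ALL trees: the total used by the "All Trees" chart
ordered["pct_all"] = ordered["count"] * 100 / ordered["count"].sum()

# Share among the top five only: the total used by the "Five Most Common Trees" chart
top5 = ordered.iloc[0:5].copy()
top5["pct_top5"] = top5["count"] * 100 / top5["count"].sum()

print(ordered.round(1))
print(top5.round(1))
```

Because the top-five total can never exceed the total of all trees, a tree's percentage in "Five Most Common Trees" is always at least as large as its percentage in "All Trees".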

---
### Challenge 5

1. Hover over the plots.
2. What is the percentage associated with each of the five most common trees? 
3. What is the most common, or 'dominant' type of tree? 
4. What tree species would you recommend restoring and why? 

#### Your answers and observations here

---

# Conclusions

Edit this cell to describe **where you would recommend planting trees** and the **types of trees recommended**. Include any data filtering and sorting steps that you recommend, and why you would recommend them.



## Reflections

Write about some or all of the following questions, either individually in separate markdown cells or as a group.
- What is something you learned through this process?
- How well did your group work together? Why do you think that is?
- What were some of the hardest parts?
- What are you proud of? What would you like to show others?
- Are you curious about anything else related to this? Did anything surprise you?
- How can you apply your learning to future activities?

![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)