![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg?raw=true)

# Data detective: are there still zombies in Strathcona County?


#### What you need to do

The apocalypse happened because of a chemical explosion that turned people into tree-eating zombies. However, the County’s zombie hunter, James Bond, found that Poplar spp trees turn them to ash. 

There are reports that the zombie apocalypse in Strathcona County is over and that parks are safe. People don’t know if it’s true. That’s where you come in. 

It was recently discovered that zombies do not like Poplar trees, and that parks where Poplar spp trees are found tend to be safe areas. 

There are two things the County needs you to do:

1. Use their data (you’ll download it) to find zombie “hot spots” in local parks. Red, yellow, and orange areas on the map = zombies. 

2. Use their data (you’ll also download it) to find the biggest group of “Poplar spp” trees. 

##### Data 

The tree data set is provided by Strathcona County (Environment): https://data.strathcona.ca/Environment/Tree/v78i-7ntw

The park data set is provided by Strathcona County (Recreation-Culture): https://data.strathcona.ca/Recreation-Culture/Parks/533n-6rzt


## Downloading and parsing data into a 'dataframe'

We begin by downloading the data directly from the website. 

https://data.strathcona.ca/Environment/Tree/v78i-7ntw

We can do this by selecting the 'API' button at the top right and choosing the CSV format. Pressing the 'Copy' button will give us the URL we need to download the dataset.
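Socrata-based portals such as this one typically return only the first 1,000 rows from a `/resource/` endpoint unless you ask for more with the SODA `$limit` query parameter. The sketch below builds such a URL with the standard library; the helper name `soda_csv_url` and the limit of 50,000 rows are our own illustrative choices, so check the portal's API documentation for its actual limits.

```python
from urllib.parse import urlencode


def soda_csv_url(dataset_id, limit=None, domain="data.strathcona.ca"):
    """Build a CSV endpoint URL for a Socrata dataset (hypothetical helper).

    If `limit` is given, append the SODA `$limit` parameter so that more
    than the default number of rows is returned.
    """
    base = f"https://{domain}/resource/{dataset_id}.csv"
    if limit is None:
        return base
    # safe="$" keeps the dollar sign readable instead of encoding it as %24
    return base + "?" + urlencode({"$limit": limit}, safe="$")


print(soda_csv_url("v78i-7ntw"))
print(soda_csv_url("v78i-7ntw", limit=50000))
```

The resulting URL can be passed straight to pandas, for example `pd.read_csv(soda_csv_url("v78i-7ntw", limit=50000))`.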


In [1]:
!pip install git+https://github.com/python-visualization/folium -q



In [2]:
# Import modules
# We will store the data in a 'dataframe' using pandas
import pandas as pd
# We want to be as precise as possible in keeping tree coordinates
from decimal import *
# We will visualize the coordinates on a map using the folium package
import folium
# We want to cluster them using the MarkerCluster plugin from folium.plugins
from folium.plugins import MarkerCluster
# display() shows maps and tables inside the notebook
from IPython.display import display

print("Importing Python libraries was successful!")

Importing Python libraries was successful!


### Tree Data

In [3]:
# Download tree data from the API
# Main source: https://data.strathcona.ca/Environment/Tree/v78i-7ntw
# Pick the API button - I chose the CSV option
link = "https://data.strathcona.ca/resource/v78i-7ntw.csv"

# Read and parse the CSV data into a pandas DataFrame
treeData = pd.read_csv(link)

# Rename the site ID column ('name' and 'location' keep their names)
treeData = treeData.rename(columns={"treesiteid": "ID"})

# Look at the first five rows
treeData.head()

| | ID | name | location |
|---|---|---|---|
| 0 | 22439 | Poplar spp | `\n, \n(53.5227206433883, -113.324197520184)` |
| 1 | 19106 | Spruce spp | `\n, \n(53.5580354097226, -113.311519305079)` |
| 2 | 30088 | Colorado Spruce | `\n, \n(53.5279674658552, -113.311713218285)` |
| 3 | 33800 | Schubert Chokecherry | `\n, \n(53.515577567706, -113.320470094948)` |
| 4 | 13305 | Green Ash | `\n, \n(53.5194628194035, -113.324964991257)` |


### Park Data

In [4]:
# Download park data from the API
# Main source: https://data.strathcona.ca/Recreation-Culture/Parks/533n-6rzt
# Pick the API button - I chose the CSV option

link_park = "https://data.strathcona.ca/resource/533n-6rzt.csv"
# Read the CSV data into a DataFrame
parkData = pd.read_csv(link_park)

# Remove unused columns
parkData = parkData.drop(columns=['athletic_park', 'bmx_skate', 'campground', 'cross_country_skiing',
                                  'football', 'golf', 'outdoor_rink', 'baseball',
                                  'playground', 'rugby', 'soccer', 'tennis', 'tot_playground',
                                  'volleyball', 'water_recreation', 'x_coord',
                                  'y_coord', 'longitude', 'latitude'])
# Display the first few rows
parkData.head()

| | parkid | park_name | purpose | day_use | nature_appreciation | off_leash | wilderness_trail | location |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Antler Lake Uncas Community Hall | Community Hall or Centre | NO | NO | NO | NO | `\n, \n(53.5036720717526, -112.97119804582)` |
| 1 | 20 | Brookville Community Hall | Community Hall or Centre | NO | NO | NO | NO | `\n, \n(53.5707675993221, -112.999606215391)` |
| 2 | 144 | A.J. Ottewell Community Centre | Recreation | NO | NO | NO | NO | `\n, \n(53.5317110349762, -113.321813252836)` |
| 3 | 105 | Smeltzer House | Recreation | YES | YES | NO | NO | `\n, \n(53.522350868774, -113.319096832185)` |
| 4 | 124 | Westlake Beach Park | Recreation | NO | NO | NO | NO | `\n, \n(53.4067763916463, -112.919368859337)` |


---
### Challenge 1

Explore the dataset above. 

What kinds of parks would have trees in them? 

Using your knowledge of pandas, select an appropriate data subset.

Here are some hint conditions that can help you narrow down the data: 

`condition_1 = parkData["wilderness_trail"]=="YES"`

`condition_2 = parkData["nature_appreciation"]=="YES"`

`condition_3 = parkData["off_leash"]=="YES"`


Conditions are combined with `|` (logical OR), with each condition wrapped in parentheses. Add at least one more condition to the filter, following the same format. 

A sample code snippet is provided in the cell below. 

---

In [5]:
# Conditions that describe parks likely to contain trees
condition_1 = parkData["wilderness_trail"] == "YES"
condition_2 = parkData["nature_appreciation"] == "YES"
condition_3 = parkData["off_leash"] == "YES"
condition_4 = parkData["purpose"] == "Recreation"
condition_5 = parkData["day_use"] == "YES"
# Keep parks that meet condition 1 OR condition 2 (add more conditions with |)
parkData = parkData[(condition_1) | (condition_2)]
# Look at the first five rows
parkData.head()

| | parkid | park_name | purpose | day_use | nature_appreciation | off_leash | wilderness_trail | location |
|---|---|---|---|---|---|---|---|---|
| 3 | 105 | Smeltzer House | Recreation | YES | YES | NO | NO | `\n, \n(53.522350868774, -113.319096832185)` |
| 5 | 28 | Clarkdale Meadows Pond | Storm Water Management Facilit | NO | YES | NO | NO | `\n, \n(53.5503288685846, -113.254860852091)` |
| 7 | 147 | Fultonvale Elementary Junior High School | School | NO | YES | NO | YES | `\n, \n(53.4575490207503, -113.19078051146)` |
| 13 | 112 | Summerton Park | Recreation | NO | YES | NO | NO | `\n, \n(53.557477112004, -113.266941473586)` |
| 16 | 163 | Upper Nottingham Lake Park East | Recreation | NO | YES | NO | NO | `\n, \n(53.515831109296, -113.27355464205)` |
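The `|` operator keeps a row when *any* condition is true; `&` keeps it only when *all* conditions are true, and `~` negates a condition. Because `|` and `&` bind more tightly than `==` in Python, each comparison must sit in its own parentheses (or be stored in a variable first, as the cell above does). The sketch below uses a small made-up table that reuses the column names of `parkData`; the park names are hypothetical. It also shows one way to make a comparison ignore upper/lower case, which guards against typing `"Yes"` when the data says `"YES"`.

```python
import pandas as pd

# Hypothetical mini park table: made-up rows that reuse parkData's column names
parks = pd.DataFrame({
    "park_name": ["Park A", "Park B", "Park C", "Park D"],
    "wilderness_trail": ["YES", "NO", "NO", "YES"],
    "nature_appreciation": ["NO", "Yes", "NO", "YES"],
})

# Each condition is a Series of True/False values, one per row
trail = parks["wilderness_trail"] == "YES"
# .str.upper() makes the comparison ignore case ("Yes" and "YES" both match)
nature = parks["nature_appreciation"].str.upper() == "YES"

either = parks[trail | nature]      # OR: at least one condition is True
both = parks[trail & nature]        # AND: every condition is True
neither = parks[~(trail | nature)]  # NOT: neither condition is True

print(either["park_name"].tolist())   # ['Park A', 'Park B', 'Park D']
print(both["park_name"].tolist())     # ['Park D']
print(neither["park_name"].tolist())  # ['Park C']
```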


## Data Cleanup

The two tables contain data identifying parks and trees along with their `location`. 

A clean coordinate pair would look like

`(53.5227206433883, -113.324197520184)`

Our data set is not clean: each entry contains the special characters `\n, \n`. We need to clean it up before we can visualize it. 

The special character `\n` is known as a 'line break' (a newline). 

This tells us that the coordinates are stored as text (a string), not as numbers. 

In the cell below we will clean the data following four steps:

1. Remove the special characters `\n, \n`
2. Remove the left parenthesis `(` and the right parenthesis `)`
3. Split the pair into `latitude` and `longitude` coordinates, and create a separate column for each
4. Remove the `'location'` column
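To see what these steps do, here they are applied by hand to a single value using plain Python string methods. This sketch assumes the `\n` characters are real line breaks with one space between them, as the sample output above suggests. Step 4 (dropping the column) only applies to the whole dataframe, so it is not shown.

```python
# One raw value shaped like the 'location' column (assumed: real newline characters)
raw = "\n, \n(53.5227206433883, -113.324197520184)"

# Step 1: remove the special characters "\n, \n"
step1 = raw.replace("\n, \n", "")
# Step 2: remove the left and right parentheses
step2 = step1.replace("(", "").replace(")", "")
# Step 3: split at the comma into latitude and longitude text, then convert to numbers
lat_text, lon_text = step2.split(",", 1)
latitude = float(lat_text)
longitude = float(lon_text)  # float() ignores the leading space

print(step2)
print(latitude, longitude)
```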

In [6]:
# Helper function to clean up data

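def clean_dataframe(dataframe):
    """
    Function description: this function takes as input a dataframe with special characters
    and returns a 'clean' dataframe - a dataframe with no special characters and no parentheses,
    as well as numeric latitude and longitude columns
    """
    try:
        # Work on a copy so the original dataframe is left unchanged
        dataframe = dataframe.copy()
        # Step 1: remove the special characters (leading line breaks, comma and spaces)
        dataframe['location'] = dataframe['location'].str.replace(r'^\s*,\s*', '', regex=True)
        # Step 2: remove the parentheses (regex=False treats them as plain characters)
        dataframe['location'] = (dataframe['location']
                                 .str.replace('(', '', regex=False)
                                 .str.replace(')', '', regex=False))
        # Step 3: split at the first comma and store each half as a number
        coordinates = dataframe['location'].str.split(',', n=1, expand=True)
        dataframe['latitude'] = coordinates[0].str.strip().astype(float)
        dataframe['longitude'] = coordinates[1].str.strip().astype(float)
        # Step 4: remove the original 'location' column
        dataframe = dataframe.drop(columns=['location'])

        return dataframe

    except (AttributeError, KeyError, TypeError, ValueError):
        print("WARNING! Make sure you are passing a pandas dataframe, and make sure your dataframe "
              "contains a column named 'location' with comma-separated values.")
        raise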
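An alternative to the replace-and-split approach is to pull both numbers out of the parentheses with one regular expression, using pandas' `Series.str.extract`. This is a different technique from the helper above, shown as a sketch that assumes every location ends in a `(latitude, longitude)` pair of decimal numbers; the sample values are copied from the tree table, and values that don't match the pattern become `NaN` instead of raising an error.

```python
import pandas as pd

# Sample values shaped like the 'location' column (assumed: real newline characters)
locations = pd.Series([
    "\n, \n(53.5227206433883, -113.324197520184)",
    "\n, \n(53.5580354097226, -113.311519305079)",
])

# Capture the two signed decimal numbers inside the parentheses
pattern = r"\((-?\d+\.\d+),\s*(-?\d+\.\d+)\)"
coords = locations.str.extract(pattern).astype(float)
coords.columns = ["latitude", "longitude"]
print(coords)
```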

---
### Challenge 2a

1. Use the helper function to clean up the tree data `treeData`.
2. Run the cell and confirm that the odd characters and parentheses were removed and that we have a latitude and a longitude column. 

---

In [7]:
# Your code here
treeData = clean_dataframe(treeData)
# Look at the first five entries
treeData.head()

| | ID | name | latitude | longitude |
|---|---|---|---|---|
| 0 | 22439 | Poplar spp | 53.5227206433883 | -113.324197520184 |
| 1 | 19106 | Spruce spp | 53.5580354097226 | -113.311519305079 |
| 2 | 30088 | Colorado Spruce | 53.5279674658552 | -113.311713218285 |
| 3 | 33800 | Schubert Chokecherry | 53.515577567706 | -113.320470094948 |
| 4 | 13305 | Green Ash | 53.5194628194035 | -113.324964991257 |


---
### Challenge 2b

1. Use the helper function to clean up the park data `parkData`. (Hint: use the cell above to help you)
2. Run the cell and confirm that the odd characters and parentheses were removed and that we have a latitude and a longitude column. 

---

In [8]:
# Your code here
parkData = clean_dataframe(parkData)
# Look at the first five entries
parkData.head()

| | parkid | park_name | purpose | day_use | nature_appreciation | off_leash | wilderness_trail | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 105 | Smeltzer House | Recreation | YES | YES | NO | NO | 53.522350868774 | -113.319096832185 |
| 5 | 28 | Clarkdale Meadows Pond | Storm Water Management Facilit | NO | YES | NO | NO | 53.5503288685846 | -113.254860852091 |
| 7 | 147 | Fultonvale Elementary Junior High School | School | NO | YES | NO | YES | 53.4575490207503 | -113.19078051146 |
| 13 | 112 | Summerton Park | Recreation | NO | YES | NO | NO | 53.557477112004 | -113.266941473586 |
| 16 | 163 | Upper Nottingham Lake Park East | Recreation | NO | YES | NO | NO | 53.515831109296 | -113.27355464205 |


## Data Visualization

Now that we have cleaned up the dataframes and split the string `location` values into separate `latitude` and `longitude` columns, we will use the `folium` package to visualize our data geographically. 


---
### Challenge 3 

1. Look up the coordinates for Strathcona County.
2. In the cell below, enter the North coordinate (latitude) and the West coordinate (longitude) into separate variables (we have created the variable names for you). Make sure you enter numbers only! Longitudes west of the prime meridian are negative, so the West coordinate needs a minus sign.
3. These will be the initial coordinates used to center our map. 
4. Run the cell to display the map. Make sure you are in the right location (hint: Edmonton should appear on the map).

---
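Coordinates you look up are often written with a hemisphere letter (N/S/E/W) or in degrees, minutes and seconds, while `folium` expects signed decimal degrees: north and east are positive, south and west are negative. The sketch below converts between the two forms; the function name `to_decimal_degrees` is our own, and the sample values are illustrative, not an official coordinate for the County.

```python
def to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    # One degree is 60 minutes, and one minute is 60 seconds
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return value


# Illustrative example: 53° 34' 12" N and 113° 4' 27" W
latitude = to_decimal_degrees(53, 34, 12, "N")
longitude = to_decimal_degrees(113, 4, 27, "W")
print(round(latitude, 4), round(longitude, 4))

# A value already in decimal degrees only needs its sign fixed
print(to_decimal_degrees(113.0741, hemisphere="W"))
```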

In [9]:
# Your code here 
latitude = 53.5701
longitude = -113.0741

# Initial coordinates 
SC_COORDINATES = [latitude, longitude]

# Create a map using our initial coordinates and OpenStreetMap tiles
map_osm = folium.Map(location=SC_COORDINATES, zoom_start=10, tiles='OpenStreetMap')

# Display the map 
display(map_osm)

## Displaying the tree and park locations

We can now add the tree locations into our map. 

In the cell below we will iterate over each record in our dataframe `treeData`. We repeat this for our `parkData` dataframe. 

Parks are shown as blue markers, while trees are shown as green markers with a tree icon. 

![PT](ParkTree.png)

We will then add markers (one marker for each pair of coordinates) using the `folium.Marker` function. 

We will pass the `latitude` and `longitude` coordinates using the `location` parameter, and label each tree with its `name` using the `popup` parameter. 

We will add each marker to a `marker_cluster`, which has been added to our map `map_osm`, so that nearby markers are grouped together. 

Finally, we add a heat map layer in which each tree is weighted by how many trees of the same species appear in the data set (the `count` column). 

Run the cell below to see the locations of the trees and parks. 

In [12]:
from folium.plugins import HeatMap

# Count how many trees share each species name; this is used as the heat map weight
treeData['count'] = treeData.groupby('name')['name'].transform('count')

# Create marker cluster and add to our map
marker_cluster = MarkerCluster().add_to(map_osm)

# For each record in treeData (skipping rows without coordinates), add a marker with the tree name
for _, row in treeData.dropna(subset=['latitude', 'longitude']).iterrows():
    # Use folium.Marker function, use latitude and longitude to specify location
    folium.Marker(location=[row['latitude'], row['longitude']],
                  # Add tree name
                  popup=folium.Popup(row['name'], sticky=True),
                  # Make color/style changes here
                  icon=folium.Icon(color='green', icon='tree', prefix='fa')
                  # Adding the marker to the cluster makes our trees cluster nicely!
                  ).add_to(marker_cluster)

# Add park data points in a separate cluster
marker_cluster = MarkerCluster().add_to(map_osm)

for _, row in parkData.dropna(subset=['latitude', 'longitude']).iterrows():
    # Use folium.Marker function, use latitude and longitude to specify location
    folium.Marker(location=[row['latitude'], row['longitude']],
                  # Add park name
                  popup=folium.Popup(row['park_name'], sticky=True),
                  # Make color/style changes here
                  icon=folium.Icon(color='blue', icon='acorn', prefix='fa')
                  # Adding the marker to the cluster makes our parks cluster nicely!
                  ).add_to(marker_cluster)

# Add heatmap: each point is [latitude, longitude, weight] as plain Python numbers
heat_data = treeData[['latitude', 'longitude', 'count']].dropna().values.tolist()
hm_wide = HeatMap(heat_data,
                  min_opacity=0.5,
                  radius=15, blur=20,
                  max_zoom=1)

map_osm.add_child(hm_wide)
# Show the map
display(map_osm)

# Optional - you can save this map as an HTML file
map_osm.save('TreeMapParks.html')
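The `count` column drives the heat map: `groupby('name')['name'].transform('count')` gives every tree the number of trees in the data set that share its species name, so each point is weighted by how common its species is, not only by how many trees stand nearby. The sketch below shows this on a small made-up list of trees (the coordinates are hypothetical), then builds the `[latitude, longitude, weight]` rows that are handed to `HeatMap`.

```python
import pandas as pd

# Hypothetical sample of five trees (made-up coordinates)
trees = pd.DataFrame({
    "name": ["Poplar spp", "Green Ash", "Poplar spp", "Spruce spp", "Poplar spp"],
    "latitude": [53.5227, 53.5195, 53.5228, 53.5580, 53.5226],
    "longitude": [-113.3242, -113.3250, -113.3241, -113.3115, -113.3243],
})

# transform keeps one value per original row: the size of that row's species group
trees["count"] = trees.groupby("name")["name"].transform("count")
print(trees["count"].tolist())

# Rows of [latitude, longitude, weight] as plain Python floats
heat_data = trees[["latitude", "longitude", "count"]].values.tolist()
print(heat_data[0])
```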

---
### Challenge 4: Find the special tree configurations

The map above contains 'warm' and 'cold' regions. Warm regions have a higher concentration of trees - and zombies! 

There is one special kind of tree configuration, where the trees are clustered into a star-like shape. These trees are usually of the same species. Below is an example:

![Tree](5StarPoplar.png)

All of these trees are of the same species: the Poplar spp. Can you find it using the heat regions and park locations? 

Use the interactive map above for this exercise. 

1. Use the blue park markers and the heat regions as guides, and click on the tree clusters near them. 
2. If you cannot find a cluster when you zoom in, zoom back out and try a different heat region. 
3. Once you find the special configuration, click on the tree icons to reveal the species.
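If you want to double-check your visual search, one rough way to locate the biggest group of a single species (task 2 at the top of this notebook) is grid binning: sort every tree of that species into small square cells by its coordinates and count trees per cell. This is a swapped-in technique, not part of the original notebook. The function name `densest_cell`, the cell size of 0.001 degrees (roughly 110 m north-south), and the sample points are all hypothetical, and a group that straddles a cell boundary will be split across two cells.

```python
import math
from collections import Counter


def densest_cell(trees, species, cell_size=0.001):
    """Return ((lat, lon) of the cell's south-west corner, tree count) for the
    grid cell holding the most trees of `species`, or None if there are none.

    `trees` is an iterable of (name, latitude, longitude) tuples.
    """
    counts = Counter()
    for name, lat, lon in trees:
        if name == species:
            # Integer cell indices: trees in the same small square share a key
            cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
            counts[cell] += 1
    if not counts:
        return None
    (row, col), n = counts.most_common(1)[0]
    return (row * cell_size, col * cell_size), n


# Hypothetical sample points (made-up coordinates)
sample = [
    ("Poplar spp", 53.52271, -113.32419),
    ("Poplar spp", 53.52275, -113.32412),
    ("Poplar spp", 53.52279, -113.32415),
    ("Poplar spp", 53.55803, -113.31151),
    ("Spruce spp", 53.52273, -113.32416),
]
print(densest_cell(sample, "Poplar spp"))
```

With numeric coordinates and missing values dropped, the cleaned tree data could be passed in as `zip(treeData['name'], treeData['latitude'], treeData['longitude'])`.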

#### Your observations here

We found a tree cluster with ....... trees in ........ Park. 

The trees in the cluster were ...... species. 

---

## Final remarks

In this notebook we explored two datasets that helped us find the special tree configuration that will defeat the zombies. Strathcona County can be saved if we plant more configurations like this in other parks. 

We found ....... species clusters in ....... Park, making this the safest park. 

We can make other parks safer by bringing in more tree samples and special configurations. 

![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)