### **Metric 3: Trees per Square Mile**
The **number of trees** in a neighborhood is a useful metric for judging how *good* a neighborhood is. Neighborhoods with lots of trees will likely have more green space and more nature, making them much more **liveable**. The raw *number* of trees, however, is not quite good enough: a larger neighborhood has room for more trees, so raw counts favor big neighborhoods. Instead, we can find the **trees per square mile** in each neighborhood. This is a much better metric because it is *normalized* by the size of the neighborhood. To do this, we count the trees in each neighborhood using the **trees.csv** data file, then divide by the square mileage of each neighborhood, found in the **demographics.csv** data file.
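To see why the raw count can mislead, here is a tiny sketch with made-up numbers. The neighborhood names and values are hypothetical and do not come from either data file.

```python
# Hypothetical numbers, NOT taken from trees.csv or demographics.csv
tree_counts = {"Big Hood": 1000, "Small Hood": 600}
square_miles = {"Big Hood": 2.0, "Small Hood": 0.5}

# Trees per square mile = number of trees / area in square miles
density = {name: tree_counts[name] / square_miles[name] for name in tree_counts}

# The neighborhood with the most trees is not the one with the densest tree cover
most_trees = max(tree_counts, key=tree_counts.get)
densest = max(density, key=density.get)
print(most_trees, densest)
```

Here the larger neighborhood wins on raw count, but the smaller one has more than twice the trees per square mile.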

In [1]:
# Obviously, we need to import Pandas first
import pandas as pd

# Then, we load both data files that we need: trees and demographics
trees = pd.read_csv('trees.csv')
demographics = pd.read_csv('../crime/demographics.csv', index_col='Neighborhood_2010_HOOD')

First, we'll deal with the trees data set, which contains the info we need about the trees in each neighborhood.

In [2]:
# We create a simplified version of the dataframe with only two columns: 'neighborhood' and 'id' (used to count the number of trees)
treesCount = trees[['neighborhood', 'id']]
# We group the data frame by neighborhood, counting the number of trees in each neighborhood
treesCount = treesCount.groupby('neighborhood').count()
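One detail worth checking here is how `groupby(...).count()` treats missing values. The sketch below uses a small hypothetical frame (the rows are invented, only the column names match the ones used above) to show that rows with no neighborhood are left out of the groups by default, and that `count()` tallies non-null `id` values in each group.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror trees.csv
toy_trees = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "neighborhood": ["Friendship", "Friendship", "Manchester", None, "Manchester"],
})

# Same pattern as above: count non-null 'id' values per neighborhood
counts = toy_trees[["neighborhood", "id"]].groupby("neighborhood").count()

# size() counts rows per group instead of non-null values in a column
sizes = toy_trees.groupby("neighborhood").size()

# The row whose neighborhood is missing does not appear in either result
print(counts)
print(sizes)
```

If the real file has trees with no neighborhood recorded, they would silently drop out of the totals in the same way, so comparing `counts['id'].sum()` against `len(toy_trees)` is a quick sanity check.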

Great, we have the number of trees in each neighborhood, but as we said, that's not the best metric. We want trees per square mile. Next, we'll deal with the demographics data set, which we need in order to extract the square mileage of each neighborhood.

In [3]:
# Again, we simplify the data set because we only need the square mileage column
sqmiles = demographics[['Neighborhood_2010_SQMILES']]
# We sort by neighborhood name, which is the index. pandas aligns on index labels when dividing, so this is mainly for readability
sqmiles = sqmiles.sort_index()

Alright, we have both of our datasets simplified and nicely organized. Now we just need to put it all together and find the trees per square mile in each neighborhood!

In [4]:
# Create our new, final dataframe to store trees per square mile in each neighborhood
treesSqMile = treesCount.copy()
# Rename the 'id' column to 'trees_sqmile' for clarity
treesSqMile.rename(columns={'id': "trees_sqmile"}, inplace=True)
# Divide the number of trees in each neighborhood by the square mileage of each neighborhood
treesSqMile['trees_sqmile'] = treesCount['id'] / sqmiles['Neighborhood_2010_SQMILES']
# Sort the dataframe from neighborhood with greatest to neighborhood with least trees per square mile
treesSqMile = treesSqMile.sort_values(by='trees_sqmile', ascending=False)
# Save to csv file
treesSqMile.to_csv("trees_psqm.csv")
# Display the first 5 rows in the dataframe
treesSqMile.head()

neighborhood,trees_sqmile
Allegheny Center,3585.714286
Friendship,2886.792453
Allegheny West,2425.531915
Central Northside,2332.046332
Manchester,2275.985663
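Because the division matches rows by index label, a neighborhood spelled differently in the two files would end up with a missing value rather than an error. The following is a minimal sketch of that behavior on hypothetical data (the names `Hood A`, `Hood B`, `Hood C` and all numbers are made up), not a claim that the real files contain such mismatches.

```python
import pandas as pd

# Hypothetical tree counts and areas; note the differing spelling of the third name
toy_counts = pd.Series({"Hood A": 300, "Hood B": 500, "Hood C": 50})
toy_sqmiles = pd.Series({"Hood B": 0.5, "Hood A": 0.25, "hood c": 0.2})

# Division aligns on index labels, regardless of row order
toy_density = toy_counts / toy_sqmiles

# Labels present in only one of the two Series produce NaN
unmatched = toy_density[toy_density.isna()].index.tolist()
print(toy_density)
print(unmatched)
```

Running the same `isna()` check on the real result is a cheap way to spot neighborhoods that were dropped from the ranking because their names did not line up.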


And we're done! We've found the number of trees per square mile in every neighborhood in Pittsburgh. Displayed above are the top 5 neighborhoods to live in if you really value your greenery.