## Introduction

For our presentation we decided to focus on the environmental side of the neighborhoods. 

In [None]:
# Just import commands.
import pandas            as pd
import numpy             as np
import matplotlib.pyplot as plt
import geopandas
%matplotlib inline

### ----- Yuqing -----

### Illegal Dump Sites

To find the most environmentally friendly neighborhood, I chose the data on illegal dump sites in Allegheny County.

In [None]:
# show the first ten rows of the data
dumps = pd.read_csv('Datasets/illegaldumpsites.csv')
dumps.head(10)

Since the data covers all of Allegheny County, I need to pick out the rows for Pittsburgh. I also found that some illegal dump sites are already marked as completed, and I assume these have been cleaned up. So I also need to remove the sites whose status is 'Completed'.

In [None]:
# pick out the dump sites in the city of Pittsburgh that are not yet completed
dump_in_pitts = dumps.loc[dumps['City'] == 'Pittsburgh']
dump_in_pitts2 = dump_in_pitts.loc[dump_in_pitts['Status'] != 'Completed']
dump_in_pitts2.head(10)
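
As a side note, the two filters can also be written as one combined mask, which keeps both conditions on the same DataFrame. This is only a minimal sketch on made-up rows: `dumps_demo` and the status value `'Surveyed'` are assumptions for illustration, and only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical sample rows; only the column names mirror the real dataset.
dumps_demo = pd.DataFrame({
    "City": ["Pittsburgh", "Pittsburgh", "McKeesport", "Pittsburgh"],
    "Status": ["Surveyed", "Completed", "Surveyed", "Surveyed"],
    "Neighborhood": ["Hays", "Hays", None, "Beltzhoover"],
})

# Both conditions are evaluated on the same frame, so the mask always lines up row by row.
mask = (dumps_demo["City"] == "Pittsburgh") & (dumps_demo["Status"] != "Completed")
open_sites = dumps_demo.loc[mask]
print(open_sites)
```

The parentheses around each comparison are needed because `&` binds more tightly than `==` and `!=` in Python.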

In [None]:
# remove unnecessary columns and keep 'Status' as the column used to count the dump sites in each neighborhood
clean_dump_in_pitts = dump_in_pitts2.drop(['site_name', 'City', 'location_description','latitude', 'longitude', 'estimated_tons','Unnamed: 8'], axis=1)
# group the count from smallest to largest
dump_in_neighborhoods = clean_dump_in_pitts.groupby('Neighborhood').size().sort_values()
dump_in_neighborhoods.head(10)

From the data above, I find that the neighborhoods from 'Beltzhoover' to 'Perry South' each have more than 10 dump sites. So these neighborhoods can be removed from consideration for the most environmentally friendly neighborhood in Pittsburgh.
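
The cutoff of 10 can be applied directly to the counts instead of reading it off the table. A small sketch, assuming made-up counts in `counts_demo` (the real values come from `dump_in_neighborhoods`):

```python
import pandas as pd

# Hypothetical counts per neighborhood, sorted ascending like dump_in_neighborhoods.
counts_demo = pd.Series(
    {"Hays": 2, "Carrick": 7, "Beltzhoover": 11, "Perry South": 15}
).sort_values()

threshold = 10  # cutoff used in the text above

# Boolean indexing splits the neighborhoods into the two groups.
excluded = counts_demo[counts_demo > threshold]
remaining = counts_demo[counts_demo <= threshold]
print(excluded)
```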

I draw a bar chart to show the number of dump sites in each neighborhood.

In [None]:
dump_in_neighborhoods.plot.bar(figsize=[30,10])

I also draw a map of the dump sites per neighborhood to make the pattern easier to see.

In [None]:
new = clean_dump_in_pitts.groupby('Neighborhood').count()
neighborhoods = geopandas.read_file("Neighborhoods/Neighborhoods_.shp")
bin_map = neighborhoods.merge(new, how='left', left_on='hood', right_on='Neighborhood')
bin_map.plot(column='Status',
             cmap='GnBu',
             edgecolor="black",
             legend=True,
             legend_kwds={'label':"Number of illegal dump sites"},
             figsize=(15,10),
             missing_kwds={"color": "lightgrey"}
            )
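
One thing to keep in mind when reading this map: a neighborhood with no rows in the filtered dump data comes out of the left merge as a missing value, so it is drawn in light grey rather than as zero. Here is a minimal sketch of that effect with plain DataFrames. The names `hoods_demo` and `site_counts` and their values are made up for illustration, and whether a missing value should be read as zero dump sites is a judgment call.

```python
import pandas as pd

# Hypothetical stand-ins for the shapefile table and the grouped dump-site counts.
hoods_demo = pd.DataFrame({"hood": ["Hays", "Shadyside", "Carrick"]})
site_counts = pd.DataFrame({"Neighborhood": ["Hays", "Carrick"], "Status": [4, 2]})

joined = hoods_demo.merge(site_counts, how="left",
                          left_on="hood", right_on="Neighborhood")

# Shadyside has no matching row, so its count is NaN until it is filled explicitly.
joined["sites"] = joined["Status"].fillna(0).astype(int)
print(joined[["hood", "Status", "sites"]])
```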

### ----- Kenny -----

### Smart Trash in Our Neighborhoods 

One of our main focal points in determining the best neighborhood is analyzing the environmental aspects of each area and trying to find particular attributes that we would desire in our ideal neighborhood. In this case, it is the general cleanliness of the neighborhood. Nobody wants to live in an area where the streets are full of litter and the sidewalks are cluttered with trash. One proposed approach is "smart waste management" and its use of smart trash cans. Pittsburgh has adopted this system by deploying trash cans that monitor the volume of trash in each bin. This allows municipalities and trash management workers to spend their time emptying the bins that are fullest instead of rotating on a weekly schedule. While this cannot account for people who litter anyway, it does reduce excess waste, as trash bins are more likely to have room when they are needed.

### Analysis

To start, we will take a brief look at the information provided by the TrashContainers dataset.

In [None]:
containers  = pd.read_csv('Datasets/TrashContainers.csv')
containers.head(10)

Certain columns, such as the receptacle model id, the dates, and the fire zones, are not particularly relevant to our analysis, since we are going to focus on where smart trash containers are concentrated. We can drop these columns to refine the data a little bit.

In [None]:
# drop the columns that are not needed for the analysis
containers = containers.drop(columns=['receptacle_model_id', 'assignment_date',
                                      'last_updated_date', 'fire_zone'])
containers.head(10)

This allows us to see what fields we are working with in the smart trash dataset. The next step is to count the smart trash cans in each neighborhood.

In [None]:
bins = containers.groupby('neighborhood').count() # forming a dataset that is used to produce a choropleth map of the bins

containers['neighborhood'].value_counts().head(20)
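
A small caveat about the two counting calls above: `groupby(...).count()` counts the non-missing values in every column separately, while `value_counts()` (like `groupby(...).size()`) counts rows. The two agree only when the column being read has no gaps. A minimal sketch with made-up rows, where `containers_demo` and its values are assumptions:

```python
import pandas as pd

# Hypothetical rows: one Shadyside bin is missing its ward value.
containers_demo = pd.DataFrame({
    "neighborhood": ["Shadyside", "Shadyside", "East Liberty"],
    "ward": [7.0, None, 11.0],
})

per_column = containers_demo.groupby("neighborhood").count()  # non-missing values per column
per_row = containers_demo.groupby("neighborhood").size()      # rows per neighborhood

print(per_column.loc["Shadyside", "ward"], per_row["Shadyside"])
```

So if a map is colored by a column such as `ward`, any bins with a missing ward would be left out of that neighborhood's total.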

This shows the number of trash bins in each neighborhood. I want to visualize this as a choropleth map to see where the smart trash bins are concentrated across neighborhoods.

In [None]:
neighborhoods = geopandas.read_file("Neighborhoods/Neighborhoods_.shp")
bin_map = neighborhoods.merge(bins, how='left', left_on='hood', right_on='neighborhood')
bin_map.plot(column='ward',
             cmap='OrRd',
             edgecolor="black",
             legend=True,
             legend_kwds={'label':"Number of Smart Trash Bins"},
             figsize=(15,10),
             missing_kwds={"color": "lightgrey"}
            )


In the end, Shadyside is the best neighborhood based solely on the concentration of smart trash bins located there. This does not come as a surprise, as Shadyside is quite renowned for its attractiveness and cleanliness.

### ----- Anderis -----

Pittsburgh is a pretty large city. Despite all the high rises and parking complexes, there are still quite a lot of trees throughout each neighborhood of the city. For my part of the project I am going to be looking at the number of (legally documented and cared for) trees within each neighborhood. This section covers the counts of these trees as well as their general wellbeing. All of these factors go into my metric to decide which neighborhood is truly the best in Pittsburgh (based on this arbitrary metric :D).

In [None]:
# Initializing datasets and changing some of the indexes to better fit my needs
trees    = pd.read_csv('Datasets/Trees.csv', low_memory=False)
fname    = "Datasets/Neighborhoods.geojson"
pitt_map = geopandas.read_file(fname)
pitt_map = pitt_map.rename(columns={'Neighborhood_2010_HOOD' : 'Neighborhood'})
df       = pd.DataFrame(data=trees['neighborhood'].value_counts(sort=False))
df       = df.rename(columns={'neighborhood' : 'count'}).reset_index()
df       = df.rename(columns={'index' : 'neighborhood'})

# Sorted both datasets so they would match up
pitt_map = pitt_map.sort_values(by='Neighborhood').reset_index()
df       = df.sort_values(by='neighborhood').reset_index()

# merges the two data sets together with a concatenation (rows are matched by position, not by name).
frame    = [pitt_map,df]
merged   = pd.concat(frame, axis=1, ignore_index=False, sort=False)

# general variables to help setup the Choropleth map
variable   = 'count'
vmin, vmax = 0, 5073
fig, ax    = plt.subplots(1, figsize=(10, 6))

# turns off the axis lines
ax.axis('off')

# sets up the legend for the map
sm   = plt.cm.ScalarMappable(cmap='Greens', norm=plt.Normalize(vmin=vmin, vmax=vmax))
cbar = fig.colorbar(sm)

# writes the map
merged.plot(column=variable, cmap='Greens', linewidth=0.8, vmin=vmin , vmax=vmax , ax=ax, edgecolor='0.8')
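
The concatenation above lines the two tables up by position, which only works if both contain exactly the same neighborhoods in the same order. A key-based merge is one alternative that does not depend on row order. This is just a sketch on made-up data (`map_demo`, `trees_demo`, and `merged_demo` are hypothetical names), not a drop-in replacement for the cell above.

```python
import pandas as pd

# Hypothetical stand-ins for the map table and the tree records.
map_demo = pd.DataFrame({"Neighborhood": ["Hays", "Shadyside", "Carrick"]})
trees_demo = pd.Series(["Shadyside", "Carrick", "Shadyside"], name="neighborhood")

# Count the trees per neighborhood and name the two resulting columns explicitly.
tree_counts = (trees_demo.value_counts()
               .rename_axis("neighborhood")
               .reset_index(name="count"))

# Match rows by neighborhood name; neighborhoods without trees get a missing count.
merged_demo = map_demo.merge(tree_counts, how="left",
                             left_on="Neighborhood", right_on="neighborhood")
print(merged_demo)
```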

This is the first of my choropleth maps!

This one is quite simple: it maps out all the different neighborhoods and colors them in based on the total number of trees within the neighborhood. Obviously Squirrel Hill South, the largest neighborhood, is top of the list on this one.

In [None]:
# Lists out the neighborhoods with the top 10 highest tree counts.
merged[['neighborhood','count']].sort_values(by='count', ascending=False).head(10)

Now do we get all the information we want from that map? Of course not! Obviously the largest neighborhood has the most trees. So let's try incorporating the areas of the neighborhoods into the calculations!

In [None]:
# Calculates the amount of trees per square kilometer within each neighborhood
TSK = (merged['count']).div(merged['SHAPE_Area'].mul(100000)).to_frame('Trees per Square Kilometer')

# Concatenates the datasets. Be prepared for lots of mergedx variables.. I'm not the best at naming :D
merged2 = pd.concat([merged, TSK], axis=1, ignore_index=False, sort=False)

# once again just variable stuff for the map
variable = 'Trees per Square Kilometer'
vmin,vmax = 10, 151
fig, ax = plt.subplots(1, figsize=(10, 6))
ax.axis('off')

# legend stuff
sm = plt.cm.ScalarMappable(cmap='Greens', norm=plt.Normalize(vmin=vmin, vmax=vmax))
cbar = fig.colorbar(sm)

# draws out the map
merged2.plot(column=variable, cmap='Greens', linewidth=0.8, vmin=vmin , vmax=vmax , ax=ax, edgecolor='0.8')
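
The density cell scales `SHAPE_Area` by a fixed factor, so the "per square kilometer" label depends on what units that column is stored in, which I have not verified here. As a sketch of the arithmetic only, assuming a hypothetical area column given in square metres (`area_m2` and all values are made up):

```python
import pandas as pd

# Hypothetical neighborhoods with tree counts and areas in square metres.
demo = pd.DataFrame({
    "neighborhood": ["A", "B"],
    "count": [500, 1200],
    "area_m2": [500_000.0, 4_000_000.0],
})

# 1 square kilometre = 1,000,000 square metres.
demo["area_km2"] = demo["area_m2"] / 1_000_000
demo["trees_per_km2"] = demo["count"] / demo["area_km2"]
print(demo[["neighborhood", "trees_per_km2"]])
```

The ranking between neighborhoods does not change with a constant scale factor, only the numbers shown on the legend do.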

Now the map is looking a lot different! Accounting for area as well as the total number of trees means that Allegheny Center, a fairly small neighborhood, is shown to have the highest density of trees per square kilometer. In fact Squirrel Hill South, our highest before, does not even fall into the top 10 now.

In [None]:
merged2[['neighborhood','Trees per Square Kilometer']].sort_values(by='Trees per Square Kilometer', ascending=False).head(10)

Our new top 10 has a lot of smaller neighborhoods starting to gain spots. Of course, not every tree is created equal. So how do all of these neighborhoods fare when the health of the trees is a concern?

The beginning of this next section of code uses the conditions of the trees to calculate a total number of healthy trees. I used a scoring system running from Dead, which counts as -2 trees, all the way up to Excellent, which counts as 1.4 trees. Trees with no recorded condition count as -1. This way neighborhoods that take more care of their trees gain more of an advantage.

In [None]:
# scores assigned to each tree condition.
cond_scores = {'Dead': -2, 'Critical': 0.2, 'Poor': 0.4, 'Fair': 0.6,
               'Good': 1.0, 'Very Good': 1.2, 'Excellent': 1.4}

# list holding the score of every tree: missing conditions (NaN) get -1,
# known conditions get their score, and anything else is kept unchanged.
cond_list = [-1 if pd.isna(x) else cond_scores.get(x, x) for x in trees['condition']]

# adding the values into a DataFrame (DataFrame.append was removed in pandas 2.0)
cond_val = pd.DataFrame(cond_list, columns=['Tree Health'])

# merging the dataframe into the Tree.csv file DataFrame
tree_merge = pd.concat([trees,cond_val], axis=1, ignore_index=False, sort=False)

# groups the scores by the neighborhood they belong to and sums them, ordered by neighborhood name.
tree_health = tree_merge.groupby(by='neighborhood')['Tree Health'].sum().sort_index()

# turns the result into a one-column DataFrame with a positional index to better match later DataFrames.
tree_health = tree_health.to_frame().reset_index(drop=True)

# Hey look! Another mergex variable. Sadly, there's still more later.
merged3 = pd.concat([merged2,tree_health], axis=1, ignore_index=False, sort=False)

# calculates what percent of trees are healthy within each neighborhood
# (summed Tree Health divided by the original tree count, times 100).
perc_health = (merged3['Tree Health'] / merged3['count'] * 100).to_frame('Tree Health Percentage')

# Another mergex variable!
merged4 = pd.concat([merged3,perc_health], axis=1, ignore_index=False, sort=False)

# variables for the Choropleth map :D
variable = 'Tree Health Percentage'
vmin,vmax = 0, 100
fig, ax = plt.subplots(1, figsize=(10, 6))
ax.axis('off')

# Legends never die
sm = plt.cm.ScalarMappable(cmap='Greens', norm=plt.Normalize(vmin=vmin, vmax=vmax))
cbar = fig.colorbar(sm)

# makes the map
merged4.plot(column=variable, cmap='Greens', linewidth=0.8, vmin=vmin , vmax=vmax , ax=ax, edgecolor='0.8')

So now we have a map based on the percentage of healthy trees within each of the neighborhoods. Some of the neighborhoods did really well, with Chartiers City getting an astounding 86.4% on the scale of healthy trees. But while there are highs like this, there are far, far worse lows. Poor Hays somehow managed to pull off a -200% on the scale of healthy trees. In total, 9 of the neighborhoods scored a negative number for this section. A negative score in this case means that the penalties for dead trees and trees with no recorded condition outweigh the points earned by the living ones.
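
To make the percentage scale a bit more concrete, here is the same arithmetic on two made-up neighborhoods. The weights are the ones from the scoring cell; the helper `health_percentage` and the example tree lists are hypothetical.

```python
# Condition weights copied from the scoring cell; None stands for a missing condition.
WEIGHTS = {"Dead": -2, "Critical": 0.2, "Poor": 0.4, "Fair": 0.6,
           "Good": 1.0, "Very Good": 1.2, "Excellent": 1.4, None: -1}

def health_percentage(conditions):
    """Sum of the weights divided by the number of trees, times 100."""
    total = sum(WEIGHTS[c] for c in conditions)
    return total / len(conditions) * 100

all_dead = health_percentage(["Dead", "Dead", "Dead"])     # hypothetical neighborhood
mixed = health_percentage(["Good", "Good", "Dead", None])  # hypothetical neighborhood
print(all_dead, mixed)
```

Since -2 is the lowest weight, -200% is the lowest score this scale can produce, and it is only reached when every recorded tree in a neighborhood is marked Dead.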

In [None]:
merged4[['neighborhood','Tree Health Percentage']].sort_values(by='Tree Health Percentage', ascending = False).head(10)

In [None]:
merged4[['neighborhood','Tree Health Percentage']].sort_values(by='Tree Health Percentage', ascending = True).head(10)

Yeah, the low scores here are abysmal. As another note, some entries in the dataset had the value "NaN" for condition, which, from looking at the data, appeared to coincide with places where only stumps were left. So stumps are actually classified differently from "Dead" within this dataset. The low scores (especially Hays) therefore come from a lot of either dead trees or stumps where trees used to be (as of March 7, 2021, when this was last updated).

But anyway, we now have the percentage of healthy trees within each neighborhood! So does that mean Chartiers City gets to take home the Golden Tree Crown? Not quite yet. We need to once again account for the area of each of these neighborhoods. Then we will finally be able to crown the Truest Healthy Tree filled neighborhood in Pittsburgh!

In [None]:
# Calculates The number of healthy trees per Square Kilometer
HTSK = (merged4['Tree Health']).div(merged4['SHAPE_Area'].mul(100000)).to_frame('Health of Trees per Square Kilometer')

# hey the final mergex variable. They grow up so fast.
merged5 = pd.concat([merged4, HTSK], axis=1, ignore_index=False, sort=False)

# more map variables. 
variable = 'Health of Trees per Square Kilometer'
vmin,vmax = 0, 113
fig, ax = plt.subplots(1, figsize=(10, 6))
ax.axis('off')

# There have been many legends carried down through these long few days. Legends that one day, the Healthy 
# trees may soon rise up and take their rightful place among the streets of Pittsburgh.
sm = plt.cm.ScalarMappable(cmap='Greens', norm=plt.Normalize(vmin=vmin, vmax=vmax))
cbar = fig.colorbar(sm)

# The Final Map Creation :D
merged5.plot(column=variable, cmap='Greens', linewidth=0.8, vmin=vmin , vmax=vmax , ax=ax, edgecolor='0.8')

And now we finally have our winner. It goes to Allegheny Center! This neighborhood has proven to have not only the highest density of trees, but also the highest density of healthy trees! We can finally crown the best neighborhood in Pittsburgh based on these arbitrary measurements I have compounded together.

In [None]:
merged5[['neighborhood','Health of Trees per Square Kilometer']].sort_values(by='Health of Trees per Square Kilometer',ascending = False).head(10)

As a bit of bonus information: I already showed one of these lists earlier, but here are the rest of the worst neighborhoods for each category I tested above! So let's get right into it, starting with tree count!

In [None]:
merged[['neighborhood','count']].sort_values(by='count', ascending=True).head(10)

Next we have the worst in tree density!

In [None]:
merged2[['neighborhood','Trees per Square Kilometer']].sort_values(by='Trees per Square Kilometer', ascending=True).head(10)

And finally, the worst of the worst based on the density of healthy trees!

In [None]:
merged5[['neighborhood','Health of Trees per Square Kilometer']].sort_values(by='Health of Trees per Square Kilometer',ascending = True).head(10)

So the absolute worst neighborhood for trees is Hays, coming in last place in 3 out of the 4 different measurements.

That's all for this data. Hopefully more trees will continue to be planted around the city.

## Final Conclusions

So we've gone through the data, but which neighborhood truly reigns supreme?
To figure that out, we did some basic addition, awarding each neighborhood 1 to 10 points based on its placement in each of our respective datasets.

So the overall winner of the best neighborhood in Pittsburgh (based on a few arbitrary metrics) is

# *Drum Roll*

## East Liberty

This neighborhood didn't get any number 1 spots in the datasets, but it made up for it by being generally good, scoring a solid second place in trash containers and a strong 6th place in number of trees. Dump sites didn't help its score at all, but it still managed to come out on top.
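
The point total can be reproduced with a few lines. This is a sketch under an assumption: the exact mapping from placement to points is not spelled out above, so `points_for_place` assumes 1st place earns 10 points, 10th place earns 1, and anything outside the top 10 earns nothing. That assumption is at least consistent with East Liberty's 14 points from a 2nd and a 6th place.

```python
def points_for_place(place):
    """Assumed scheme: 1st place earns 10 points, 10th earns 1, anything lower earns 0."""
    return 11 - place if 1 <= place <= 10 else 0

# Placements for East Liberty as described above; None means it earned no points there.
east_liberty_places = {"trash containers": 2, "tree count": 6, "dump sites": None}

total = sum(points_for_place(p) for p in east_liberty_places.values() if p is not None)
print(total)
```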

## Top 5 scores

### East Liberty     - 14
### Allegheny Center - 10
### Shadyside        - 10
### Squirrel Hill S  - 10
### Friendship       - 9