# Environmental Impacts by Neighborhood

## By Ashu Sangar

Data set: [Illegal Dump Sites](https://data.wprdc.org/dataset/allegheny-county-illegal-dump-sites/resource/ee834d8d-ae71-4b3b-b02b-312ba321ff17?view_id=1988368f-ec43-430c-9478-bd31164c1326)

In [33]:
import geopandas as gpd
import pandas as pd

# load dataset

illegal_dsites = gpd.read_file('illegaldumpsites.csv')

illegal_dsites.head(3)


| Unnamed: 0 | site_name | Status | City | Neighborhood | estimated_tons | location_description | latitude | longitude | field_9 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | St. Martin Street | Surveyed | Pittsburgh | Allentown | 0.5 | | 40.42221971 | -79.99022525 | | |
| 1 | Brosville Street | Surveyed | Pittsburgh | Allentown | 3.0 | | 40.42370101 | -79.98657393 | | |
| 2 | McCain Street | Surveyed | Pittsburgh | Allentown | 1.0 | | 40.42427063 | -79.99022675 | | |
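
The `dtype: object` reported for the neighborhood totals later in this notebook suggests that the tonnage column was loaded as text rather than as numbers. As a minimal sketch of one alternative, assuming the file is a plain CSV, `pd.read_csv` infers numeric column types on load. The in-memory `sample_csv` below is a stand-in built from the three rows in the preview (with the empty columns left out), not the real file.

```python
import io

import pandas as pd

# Three rows copied from the preview above, standing in for illegaldumpsites.csv.
sample_csv = io.StringIO(
    "site_name,Status,City,Neighborhood,estimated_tons,latitude,longitude\n"
    "St. Martin Street,Surveyed,Pittsburgh,Allentown,0.5,40.42221971,-79.99022525\n"
    "Brosville Street,Surveyed,Pittsburgh,Allentown,3.0,40.42370101,-79.98657393\n"
    "McCain Street,Surveyed,Pittsburgh,Allentown,1.0,40.42427063,-79.99022675\n"
)

sample_df = pd.read_csv(sample_csv)

# Inspect the inferred column types; estimated_tons should be a float column.
print(sample_df.dtypes)

# With a numeric column, sum() performs arithmetic addition.
sample_total = sample_df["estimated_tons"].sum()
print(sample_total)
```

Checking `dtypes` right after loading is a cheap way to confirm that a column meant for arithmetic is actually numeric before any aggregation is run on it.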


The neighborhood name is in the "Neighborhood" column, and the estimated amount of garbage, in tons, is in the "estimated_tons" column. This gives us our metric: the neighborhood with the lowest total tonnage of illegally dumped garbage will be deemed the best.
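
To make the metric concrete, here is a small sketch on made-up data. The neighborhood labels `A`, `B`, `C` and their tonnages are hypothetical and are not taken from the data set. The idea is to sum the tons per neighborhood and then sort ascending so the lowest total comes first.

```python
import pandas as pd

# Hypothetical toy data; names and tonnages are invented for illustration only.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "C", "C"],
    "estimated_tons": [0.5, 3.0, 1.0, 2.0, 0.25],
})

# Total tons of illegal garbage per neighborhood.
totals = toy.groupby("Neighborhood")["estimated_tons"].sum()

# Sort ascending: the first entry is the "best" neighborhood under this metric.
ranked = totals.sort_values()
best = ranked.index[0]

print(ranked)
print("Best neighborhood:", best)
```

Note that a neighborhood with no recorded dump sites would not appear in such a ranking at all, so the "best" here means best among the neighborhoods present in the data.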

In [36]:
# To tally the total amount of garbage in each neighborhood, we can use the groupby function in pandas.
# The rows are already in alphabetical order by neighborhood, although groupby does not require this.

# create dataframe that simplifies given info

nb_df = pd.DataFrame(columns = ["Street", "City", "Neighborhood", "Total Garbage Tons"])

nb_df["Street"] = illegal_dsites["site_name"]
nb_df["City"] = illegal_dsites["City"]
nb_df["Neighborhood"] = illegal_dsites["Neighborhood"]
nb_df["Total Garbage Tons"] = illegal_dsites["estimated_tons"]

# display the first five rows of the dataframe
print(nb_df.head())


              Street        City Neighborhood Total Garbage Tons
0  St. Martin Street  Pittsburgh    Allentown                0.5
1   Brosville Street  Pittsburgh    Allentown                  3
2      McCain Street  Pittsburgh    Allentown                  1
3          Ceres Way  Pittsburgh    Allentown                0.5
4      Eureka Street  Pittsburgh    Allentown                0.1


In [41]:
# group by neighborhood and display the total tons for each one
# note: the output below has dtype object, so the values are strings and sum() joins them instead of adding them

neighborhood_totals = nb_df.groupby('Neighborhood')['Total Garbage Tons'].sum()

print(neighborhood_totals)


Neighborhood
Allentown                         0.5310.50.150.3
Arlington                     342.520.50.51.521.5
Avalon                                     0.50.5
Banksville                                      1
Barking Slopes                                 12
                                   ...           
Wilkins                                 0.50.31.5
Wilkins/Monroeville                           1.5
Wilkinsburg               20.51.320.50.52.50.30.3
Wilkinsburg/Penn Hills                          6
Windgap                                   20.51.5
Name: Total Garbage Tons, Length: 152, dtype: object
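
The totals above are not numeric sums. The `dtype: object` line and values such as `0.5310.50.150.3` for Allentown indicate that the tonnages are strings, so `sum()` joined them end to end (the Allentown value begins with `0.5`, `3`, `1`, `0.5`, `0.1`, the first five rows printed earlier). A minimal sketch of one way to address this, assuming every entry in the column is meant to be a number, is to convert with `pd.to_numeric` before summing. The `tons_as_text` series below is a stand-in holding those five values as strings.

```python
import pandas as pd

# The first five Allentown tonnages from the earlier printout, stored as strings.
# dtype=object is set explicitly to mirror the dtype shown in the output above.
tons_as_text = pd.Series(["0.5", "3", "1", "0.5", "0.1"], dtype=object)

# Summing strings joins them together rather than adding them.
concatenated = tons_as_text.sum()

# Convert to numbers first; errors="coerce" turns unparseable entries into NaN,
# which sum() skips by default.
tons_numeric = pd.to_numeric(tons_as_text, errors="coerce")
numeric_total = tons_numeric.sum()

print(repr(concatenated))
print(numeric_total)
```

In the notebook itself, the same conversion could be applied to `nb_df["Total Garbage Tons"]` before calling `groupby`, which should yield a float result for each neighborhood. Because `errors="coerce"` silently drops entries it cannot parse, it is worth counting the resulting NaN values to see whether any tonnage was lost.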
