# BEST NEIGHBORHOOD

## Introduction

<font size="4">To find the best neighborhood in Pittsburgh, we used datasets provided by the WPRDC. We compared neighborhoods on three metrics:</font>
* <font size = "4">Cleanliness</font>
* <font size = "4">Arrest Counts</font>
* <font size = "4">Average Air Pollution</font>
  <br>
<font size ="4">
These three metrics affect a neighborhood's quality of life and community well-being. Cleanliness indicates how well maintained a neighborhood is. It reflects community pride and the quality of local services, both of which can influence property values. Arrest counts offer a view of public safety and crime levels. Neighborhoods with fewer arrests typically offer residents a greater sense of safety and a better environment for families and individuals. Average air pollution measures environmental health, which can affect residents' health. Together, these metrics give us a picture of a neighborhood's livability.
</font>

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import fpsnippets

## Arrest Counts

## Average Air Pollution

<font size="4">To determine the best neighborhood in Pittsburgh, we analyzed average air pollution levels across neighborhoods using data from the Allegheny County Air Quality Emissions dataset. We chose this metric because air pollution can directly affect residents' health.</font>

### Steps

<font size="4">We calculated the average air pollution for each neighborhood by:</font>
1. <font size ="4">Summing emissions at each pair of geographic coordinates</font>
2. <font size ="4">Sorting those geographic coordinates into their respective neighborhoods</font>
3. <font size ="4">Finding the average tons of pollutants per year for each neighborhood</font>
4. <font size ="4">Identifying the neighborhoods with the lowest averages</font>
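
The steps above can be sketched on a small made-up table. Everything in this example is an assumption for illustration: the coordinates, the tonnage values, the "Area A"/"Area B" names, and the `lookup_neighborhood` helper, which is a hypothetical stand-in for the `fpsnippets.geo_to_neighborhood` call used in the notebook.

```python
import pandas as pd

# Hypothetical emissions records; none of these values come from the real data.
# The first two rows share one coordinate pair (one location, two reports).
records = pd.DataFrame({
    "lat": [40.0, 40.0, 40.1, 40.2],
    "lon": [-80.0, -80.0, -80.1, -80.2],
    "tons_per_yr": [1.0, 3.0, 6.0, 2.0],
})


def lookup_neighborhood(lat, lon):
    """Hypothetical stand-in for fpsnippets.geo_to_neighborhood."""
    return "Area B" if lat >= 40.2 else "Area A"


# Step 1: total emissions at each coordinate pair.
per_location = records.groupby(["lat", "lon"])["tons_per_yr"].sum().reset_index()

# Step 2: assign each coordinate pair to a neighborhood.
per_location["neighborhood"] = [
    lookup_neighborhood(lat, lon)
    for lat, lon in zip(per_location["lat"], per_location["lon"])
]

# Step 3: average the per-location totals within each neighborhood.
avg_by_neighborhood = per_location.groupby("neighborhood")["tons_per_yr"].mean()

# Step 4: sort so the lowest averages come first.
lowest_first = avg_by_neighborhood.sort_values()
print(lowest_first)
```

In this toy table, "Area A" averages the totals of its two locations (4.0 and 6.0), not its three raw rows, so the order of summing and averaging changes the result.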

### Results

In [5]:
# Load the emissions data (one row per reported emission)
df = pd.read_csv('AQEData.csv')

# Step 1: total tons per year at each coordinate pair
location_pollution = df.groupby(['lat', 'lon'])['tons_per_yr'].sum().reset_index()

# Step 2: map each coordinate pair to its neighborhood
location_pollution['neighborhood'] = location_pollution.apply(
    lambda row: fpsnippets.geo_to_neighborhood(row['lat'], row['lon']),
    axis=1
)

# Step 3: average the per-location totals within each neighborhood
neighborhood_avg_pollution = location_pollution.groupby('neighborhood')['tons_per_yr'].mean().reset_index()

# Count how many individual reports fall in each neighborhood
df['neighborhood'] = df.apply(
    lambda row: fpsnippets.geo_to_neighborhood(row['lat'], row['lon']),
    axis=1
)
report_counts = df.groupby('neighborhood').size().reset_index(name='num_reports')

# Combine the averages and report counts into one table
neighborhood_stats = neighborhood_avg_pollution.merge(report_counts, on='neighborhood')
neighborhood_stats.columns = ['neighborhood', 'avg_tons_per_yr', 'num_reports']

# Step 4: sort so the lowest averages come first
neighborhood_stats_sorted = neighborhood_stats.sort_values(by='avg_tons_per_yr', ascending=True)

print("Neighborhoods with the lowest average air pollution:")
print(neighborhood_stats_sorted.head(15))

Neighborhoods with the lowest average air pollution:
                neighborhood  avg_tons_per_yr  num_reports
5                    Carrick         3.160000           31
14             Homewood West         3.160000           31
11                    Esplen         5.840000           26
24       Squirrel Hill South         7.630000           31
20        Point Breeze North       118.430000          130
27       Upper Lawrenceville       188.677800          171
12                 Fairywood       304.280000           16
15                   Larimer       534.410000          160
23          South Side Flats      1294.155000           66
17       Lower Lawrenceville      1451.656545          162
4       California-Kirkbride      1981.590000           22
8                    Chateau      2079.330000           14
16  Lincoln-Lemington-Belmar      7444.010000           99
3                  Brookline      9437.223400          233
2                      Bluff     12648.700400          397


<font size = "4">The lowest average is 3.16 tons per year, shared by Carrick and Homewood West. Esplen follows at 5.84 tons per year.</font>

## THE BEST NEIGHBORHOOD

<font size="4">Each of our three metrics produced a different top neighborhood, but one neighborhood stands out when all three are considered together. That neighborhood is **Esplen.** Esplen ranks eleventh for cleanliness, tenth for lowest arrest counts, and third for lowest average air pollution. No other neighborhood ranks as well across all three metrics.</font>
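
The notebook does not state how the three rankings were combined. One simple way to make such a comparison is to add up each neighborhood's rank position in every metric and pick the smallest total. The sketch below assumes that rank-sum approach and uses placeholder neighborhood names with made-up ranks, so it illustrates the idea only and does not reproduce the result above.

```python
# Hypothetical rank positions (1 = best) for three placeholder neighborhoods.
# These numbers are invented for illustration, not taken from the datasets.
ranks = {
    "Neighborhood X": {"cleanliness": 1, "arrests": 30, "air_pollution": 25},
    "Neighborhood Y": {"cleanliness": 12, "arrests": 9, "air_pollution": 4},
    "Neighborhood Z": {"cleanliness": 20, "arrests": 2, "air_pollution": 18},
}

# Sum the rank positions per neighborhood; a lower total is better overall.
rank_totals = {
    name: sum(metric_ranks.values()) for name, metric_ranks in ranks.items()
}

# Pick the neighborhood with the smallest rank total.
best = min(rank_totals, key=rank_totals.get)
print(best, rank_totals[best])
```

Here "Neighborhood X" is first in one metric but loses overall, while "Neighborhood Y" wins by being consistently good without leading any single metric. A rank sum weights every metric equally, so a different weighting could change the outcome.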