# INTRODUCTION:
     
The primary focus of this project is to use datasets from the Western Pennsylvania Regional Data Center (WPRDC) to argue for the best neighborhood in Pittsburgh. What counts as "best" is subjective, so our team defined it using three categories; the neighborhood that excels in these categories is deemed the best. There are many criteria we could have used, such as affordability, housing, happiness, community, air quality, parks, sidewalks, schools, and population. Every neighborhood has a mix of characteristics that can hurt or boost its status. The ones we consider most important are described in the next section.


# The Metrics:
The three main criteria we used to determine the best neighborhood are housing affordability, safety, and longevity. Housing affordability is the simplest: the less expensive a neighborhood is, the more likely people are to move there and the less financial stress housing causes. In particular, we use the affordability dataset to see which neighborhoods have cheaper but still good-quality houses. Another major criterion is how safe a neighborhood is. Neighborhoods with lower crime rates are more attractive because residents have less fear of becoming victims. For this, we use a dataset of arrest incidents. An arrest indicates that a situation may have become harmful, so a neighborhood with more arrests may be a more aggressive and harmful one. Finally, the longevity of a neighborhood's residents can help us gauge how happy the neighborhood is. We assume that people who live longer usually have happier, less stressful lives, so the higher the longevity of a neighborhood's residents, the more likely it is a happy, low-stress neighborhood.

In [70]:
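# Import the necessary tools
import pandas as pd
import matplotlib.pyplot as plt  # for charts
import seaborn as sns  # for statistical plots
from IPython.display import display  # explicit import so display() also works outside a notebook cell

# Render plots inline and use a grid theme
%matplotlib inline
sns.set_theme(style="whitegrid")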

# URLs 
affordability_url = "https://data.wprdc.org/datastore/dump/ed0d1550-c300-4114-865c-82dc7c23235b"
safety_url = "https://data.wprdc.org/datastore/dump/e03a89dd-134a-4ee8-a2bd-62c40aeebc6f"
longevity_url = "https://data.wprdc.org/dataset/ee735209-4de7-4ea4-b446-bf0f0f6d9cb3/resource/c2e1500a-a12a-4e91-be94-76c6a892b7e2/download/nhoodmedianage20112015.csv"

In [103]:

datasets = {}

# Load each dataset; a failed download is reported and that dataset is skipped
sources = {
    'Affordability': affordability_url,
    'Safety': safety_url,
    'Longevity': longevity_url,
}

for name, url in sources.items():
    try:
        datasets[name] = pd.read_csv(url)
        print("Data Loaded Successfully.")
    except Exception as e:
        print(f"Error loading data: {e}")

# Organizing and displaying the top 5 neighborhoods 

# Affordability data:

if 'Affordability' in datasets:
    
    affordability_data = datasets['Affordability']

    # Sort the individual property records by current delinquent tax, lowest first
    affordability_data_sorted = affordability_data.sort_values(by='current_delq_tax', ascending=True)

    # Show the 5 property records with the least current delinquent tax
    print("\n Top 5 Neighborhoods based on Least Current Delinquency Tax:")
    display(affordability_data_sorted.head(5))

    # Count how many tax-delinquent properties each neighborhood has
    delinquent_counts = affordability_data['neighborhood'].value_counts()

    # value_counts() sorts in descending order, so the last 5 entries are the neighborhoods with the fewest
    print("\n Top 5 Neighborhoods with Least Deliquency Tax")
    display(delinquent_counts.tail(5))



# Safety data:
if 'Safety' in datasets:
    safety_data = datasets['Safety']
    

    # Sort the arrest records alphabetically by neighborhood
    safety_data_sorted = safety_data.sort_values(by='INCIDENTNEIGHBORHOOD', ascending=True)
    print("\n Some Arrest Data:")
    display(safety_data_sorted.head(5))

    # Count the number of arrests in each neighborhood
    arrest_counts = safety_data['INCIDENTNEIGHBORHOOD'].value_counts()

    # value_counts() sorts in descending order, so the last 5 entries have the fewest arrests
    print("\n Top 5 Neighborhoods by Least Arrest Count")
    display(arrest_counts.tail(5))




# Longevity data:
if 'Longevity' in datasets:
    longevity_data = datasets['Longevity']

    # Sort by the overall median age at death across all neighborhoods, highest first
    longevity_data_sorted = longevity_data.sort_values(by='TOTAL MD AGE AT DEATH', ascending=False)
    print("\n Top 5 Neighborhoods based on Longevity")
    display(longevity_data_sorted.head(5))


Data Loaded Successfully.
Data Loaded Successfully.
Data Loaded Successfully.

 Top 5 Neighborhoods based on Least Current Delinquency Tax:


| | _id | pin | address | billing_city | current_delq_tax | current_delq_pi | prior_years | prior_delq_tax | prior_delq_pi | state_description | neighborhood | council_district | ward | public_works_division | pli_division | police_zone | fire_zone | longitude | latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25473 | 687883 | 8000T00071000000 | 106 BERRY ST UNIT 28 | PITTSBURGH, PA | 0.0 | 0 | 19 | 1805.29 | 1993.21 | Residential | Windgap | 2.0 | 28.0 | 5.0 | 28.0 | 6.0 | 1-16 | -80.068832 | 40.448316 |
| 10989 | 673399 | 0084D00261000000 | 6290 BROAD ST | PITTSBURGH, PA | 0.0 | 0 | 1 | 2808.39 | 795.67 | Commercial | East Liberty | 9.0 | 11.0 | 2.0 | 11.0 | 5.0 | 3-10 | -79.919389 | 40.460432 |
| 10990 | 673400 | 0084E00014000000 | 321 S NEGLEY AVE | IRVING, TX | 0.0 | 0 | 1 | 2824.83 | 800.34 | Residential | East Liberty | 7.0 | 8.0 | 2.0 | 8.0 | 5.0 | 3-23 | -79.932488 | 40.459963 |
| 10991 | 673401 | 0084E00034000000 | 344 AMBER ST | PITTSBURGH, PA | 0.0 | 0 | 1 | 1031.92 | 197.78 | Residential | East Liberty | 7.0 | 8.0 | 2.0 | 8.0 | 5.0 | 3-23 | -79.932417 | 40.459217 |
| 22941 | 685351 | 0056F00174000000 | 205 GLEN CALADH ST | PITTSBURGH, PA | 0.0 | 0 | 18 | 7738.14 | 9204.82 | Residential | Hazelwood | 5.0 | 15.0 | 3.0 | 15.0 | 4.0 | 2-13 | -79.941981 | 40.409743 |



 Top 5 Neighborhoods with Least Deliquency Tax


neighborhood
Glen Hazel           9
Arlington Heights    3
Squirrel Hill        2
Allegheny Center     2
Crawford Roberts     1
Name: count, dtype: int64


 Some Arrest Data:


| | _id | PK | CCR | AGE | GENDER | RACE | ARRESTTIME | ARRESTLOCATION | OFFENSES | INCIDENTLOCATION | INCIDENTNEIGHBORHOOD | INCIDENTZONE | INCIDENTTRACT | COUNCIL_DISTRICT | PUBLIC_WORKS_DIVISION | X | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9838 | 9839 | 1986937 | 17111618 | 30.0 | F | W | 2017-06-23T11:57:00 | 900 Block 2ND AV Pittsburgh, PA 15219 | 5503 Disorderly Conduct. / 5505 Public Drunken... | East Ohio ST & Cedar AV Pittsburgh, PA 15212 | Allegheny Center | 1 | 2204.0 | 1.0 | 1.0 | -80.001694 | 40.453312 |
| 18474 | 18475 | 1997976 | 18047358 | 38.0 | F | B | 2018-03-13T20:14:00 | 200 Block East Ohio ST Pittsburgh, PA 15212 | 2701 Simple Assault. | 200 Block East Ohio ST Pittsburgh, PA 15212 | Allegheny Center | 1 | 2204.0 | 1.0 | 1.0 | -80.00366 | 40.453022 |
| 54614 | 71508 | 2046619 | 22039380 | 34.0 | M | B | 2022-03-17T11:30:00 | Allegheny SQ E Pittsburgh, PA 15212 | 3502 Burglary. | Allegheny SQ Pittsburgh, PA 15212 | Allegheny Center | 1 | 2204.0 | 1.0 | 1.0 | -80.007024 | 40.451502 |
| 26412 | 26920 | 2008067 | 18229846 | 26.0 | M | B | 2018-11-25T03:00:00 | 900 Block 2nd AV Pittsburgh, PA 15219 | 2701 Simple Assault. / 2706 Terroristic Threats. | 200 Block East Ohio ST Pittsburgh, PA 15212 | Allegheny Center | 1 | 2204.0 | | | | |
| 46736 | 58463 | 2035166 | 21008186 | 38.0 | M | B | 2021-02-24T18:16:00 | 200 Block East Ohio ST Pittsburgh, PA 15212 | 2701 Simple Assault. / 2709(a)(1) Harassment b... | 200 Block East Ohio ST Pittsburgh, PA 15212 | Allegheny Center | 1 | 2204.0 | | | | |



 Top 5 Neighborhoods by Least Arrest Count


INCIDENTNEIGHBORHOOD
Regent Square              37
Central Northside          23
Mt. Oliver Boro            18
Troy Hill-Herrs Island      6
Mt. Oliver Neighborhood     2
Name: count, dtype: int64


 Top 5 Neighborhoods based on Longevity


| | NEIGHBORHOOD | BLACKdeaths | Black MD AGE AT DEATH | WHITEdeaths | White MD AGE AT DEATH | TOTALdeaths* | TOTAL MD AGE AT DEATH |
|---|---|---|---|---|---|---|---|
| 89 | SQUIRREL HILL SOUTH | 58.0 | 83.2 | 721.0 | 86.2 | 802.0 | 85.9 |
| 76 | NORTH OAKLAND | 45.0 | 70.9 | 227.0 | 86.8 | 279.0 | 85.6 |
| 71 | BANKSVILLE | 4.0 | | 247.0 | 85.5 | 253.0 | 85.5 |
| 53 | POINT BREEZE | 13.0 | 66.7 | 158.0 | 85.7 | 175.0 | 85.1 |
| 68 | SQUIRREL HILL NORTH | 6.0 | 75.0 | 231.0 | 85.3 | 242.0 | 85.0 |


# The Best Neighborhood:
For each dataset, we sorted the data to determine the best neighborhood. For housing affordability, we used the current delinquent tax data. A property becomes tax delinquent when its owner has not paid property taxes by the due date, which results in accumulated interest. We treat a neighborhood with a high delinquency rate as less affordable, on the reasoning that the taxes on its houses are high enough that owners fall behind. We sorted the records in ascending order of current delinquent tax and counted the delinquent properties in each neighborhood; the neighborhoods with the fewest are treated as the most affordable. The top 5 most affordable neighborhoods by this measure were: Glen Hazel, Arlington Heights, Squirrel Hill, Allegheny Center, and Crawford Roberts.
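The per-neighborhood comparison described here can be sketched on a small scale. This is only an illustration, not the exact analysis in the notebook: the neighborhood labels and tax amounts are made-up toy values, and only the column names `neighborhood` and `current_delq_tax` are taken from the affordability dataset. It reports both the number of delinquent records and the total amount owed, since a count alone says nothing about how large the debts are.

```python
import pandas as pd

# Hypothetical toy rows: labels "A"/"B"/"C" and all amounts are invented for illustration.
toy_delinquency = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "C", "C", "C"],
    "current_delq_tax": [0.0, 120.5, 300.0, 0.0, 50.0, 75.25],
})

# One row per neighborhood: number of delinquent records and total current delinquent tax.
summary = (
    toy_delinquency.groupby("neighborhood")["current_delq_tax"]
    .agg(delinquent_properties="count", total_current_delq_tax="sum")
    .sort_values(["delinquent_properties", "total_current_delq_tax"])
)

print(summary)
```

Sorting by the count first and the total second puts the neighborhoods with the fewest delinquent records at the top, so `summary.head(5)` would play the role of the "top 5" list.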

Next, we collected and sorted the safety data for each neighborhood, using the number of arrests recorded there. The more arrests, the less safe we consider the neighborhood, and vice versa. The neighborhoods with the fewest arrests were: Regent Square, Central Northside, Mt. Oliver Boro, Troy Hill-Herrs Island, and Mt. Oliver Neighborhood.
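As a small illustration of the counting step, the sketch below uses made-up records (the labels "A", "B", "C" are hypothetical; only the column name `INCIDENTNEIGHBORHOOD` comes from the arrest dataset). It selects the lowest counts with `nsmallest`, which states the intent more directly than taking the tail of a descending list. Note that `value_counts()` skips records with a missing neighborhood, and a neighborhood with no arrests at all never appears in the result.

```python
import pandas as pd

# Hypothetical toy arrest records; the final record has no neighborhood recorded.
toy_arrests = pd.DataFrame({
    "INCIDENTNEIGHBORHOOD": ["A", "A", "A", "B", "C", "C", None],
})

# Arrests per neighborhood; missing values are dropped by default.
toy_counts = toy_arrests["INCIDENTNEIGHBORHOOD"].value_counts()

# The two neighborhoods with the fewest arrests, smallest first.
fewest = toy_counts.nsmallest(2)

print(fewest)
```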

Lastly, to show that a neighborhood is a happy, low-stress area, we collected and sorted data on the median age at death of each neighborhood's residents. Sorted by the total median age at death, the neighborhoods with the greatest longevity were: Squirrel Hill South, North Oakland, Banksville, Point Breeze, and Squirrel Hill North.
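One practical detail when comparing the datasets side by side is that the longevity data spells neighborhood names in upper case (for example `SQUIRREL HILL SOUTH`), while the arrest data uses mixed case. A minimal sketch of one way to line them up is shown below. The helper `normalize_name` and the arrest counts are hypothetical; the two longevity values are the ones shown in the output above.

```python
import pandas as pd

# Two rows copied from the longevity output above.
toy_longevity = pd.DataFrame({
    "NEIGHBORHOOD": ["SQUIRREL HILL SOUTH", "BANKSVILLE"],
    "TOTAL MD AGE AT DEATH": [85.9, 85.5],
})

# Hypothetical arrest counts, keyed by mixed-case names as in the arrest dataset.
toy_arrest_counts = pd.Series(
    {"Squirrel Hill South": 10, "Banksville": 4}, name="arrest_count"
)


def normalize_name(names: pd.Series) -> pd.Series:
    """Hypothetical helper: trim whitespace and upper-case names so they can be matched."""
    return names.str.strip().str.upper()


longevity = toy_longevity.assign(key=normalize_name(toy_longevity["NEIGHBORHOOD"]))
arrests = toy_arrest_counts.rename_axis("neighborhood").reset_index()
arrests["key"] = normalize_name(arrests["neighborhood"])

# Keep only neighborhoods that appear in both tables.
combined = longevity.merge(arrests, on="key", how="inner")

print(combined[["key", "TOTAL MD AGE AT DEATH", "arrest_count"]])
```

Names that differ by more than case (such as a dataset that lists "Squirrel Hill" without North or South) would still need to be mapped by hand.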

With all of that in mind, the best neighborhood stands out as Squirrel Hill. It has some of the most affordable housing, high longevity, and a relatively low crime rate for its size. Two objections can be raised: its arrest count (1,102 for Squirrel Hill North and South combined) is very high, which suggests an unsafe environment, and its large population is the reason its longevity figures are high. These objections fail to account for the size of the neighborhood. Squirrel Hill is significantly larger than the other neighborhoods, whose small size helps them rank so highly. The fact that Squirrel Hill ranks so highly despite being a larger neighborhood suggests that it is one of the best neighborhoods to live in.
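The size argument can be made concrete by dividing arrests by population and then combining the three criteria into one score. The sketch below is an assumption-laden illustration rather than the method used in this notebook: every number is invented, the population column does not come from any of the three datasets, and averaging the ranks with equal weights is just one simple choice.

```python
import pandas as pd

# Hypothetical figures for three made-up neighborhoods; none come from the WPRDC data.
toy = pd.DataFrame(
    {
        "arrest_count": [1000, 40, 300],
        "population": [25000, 500, 10000],
        "delinquent_properties": [2, 9, 30],
        "median_age_at_death": [85.0, 78.0, 80.0],
    },
    index=["Large", "Small", "Medium"],
)

# Arrests per 1,000 residents, so large and small neighborhoods are comparable.
toy["arrests_per_1000"] = toy["arrest_count"] / toy["population"] * 1000

# Rank each criterion so that 1 is best: fewer arrests, fewer delinquencies, higher age at death.
ranks = pd.DataFrame({
    "safety": toy["arrests_per_1000"].rank(ascending=True),
    "affordability": toy["delinquent_properties"].rank(ascending=True),
    "longevity": toy["median_age_at_death"].rank(ascending=False),
})

# Equal-weight average of the three ranks (an assumed weighting); lower is better.
toy["mean_rank"] = ranks.mean(axis=1)

print(toy.sort_values("mean_rank"))
```

In this toy data the large neighborhood has the most arrests in absolute terms but a lower per-capita rate than the small one, which is the pattern the paragraph above argues for.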

In [127]:

# Sum the arrests for Squirrel Hill North and South; .get() returns 0 if a name is missing from arrest_counts
arrest_total = arrest_counts.get('Squirrel Hill North', 0) + arrest_counts.get('Squirrel Hill South', 0)

print(f"Arrest count for Squirrel Hill: {arrest_total}")


Arrest count for Squirrel Hill: 1102


# Conclusion
A data-driven approach to finding the best neighborhood gives a drastically different answer from my personal favorite. The data-driven approach leaves out personal bias and rests on statistical information. It does not account for an individual's experiences or perspective; instead, it takes a general, broad view of a neighborhood based on data. My personal favorite, by contrast, is shaped by personal experience and bias. For instance, I personally find Brentwood Borough to be a nice neighborhood because the houses are affordable and the people are nice. Furthermore, it is a small community with a low crime rate. In the data-driven ranking, this neighborhood isn't ranked highly because of its small size and population.