# Final Project
## Luke Braido and Eric Wang
### "Darius Enjoyers"
#### Favorite Neighborhood: North Shore
#### Target Audience: People thinking of moving to Pittsburgh

# Datasets used:
**Police Incident Blotter (WPRDC) (crimes):** Neighborhoods with fewer crimes are preferable.

**City Parks (WPRDC) (parks):** Parks are visually appealing, give people places to interact, and add to property values.

**SNAP Census Housing Data (WPRDC) (housing):** More affordable, higher-quality housing makes a neighborhood preferable.

### Dataset Category: Quality of Life

# Crimes in Pittsburgh

```python
import pandas as pd

# Load the Police Incident Blotter dataset (tab-separated)
crimes = pd.read_csv("crimes.tsv", sep="\t")

# Count the number of incidents per neighborhood (rows with no neighborhood are skipped)
crime_counts = crimes['INCIDENTNEIGHBORHOOD'].value_counts()

# Rank the neighborhoods by crime count, with 1 being the fewest crimes;
# method='min' gives tied neighborhoods the same (lowest) rank
crime_counts_ranked = crime_counts.sort_values().reset_index()
crime_counts_ranked.columns = ['Neighborhood', 'Crime Count']
crime_counts_ranked['Rank'] = crime_counts_ranked['Crime Count'].rank(method='min')

# Convert the Rank column to integer to remove the .0
crime_counts_ranked['Rank'] = crime_counts_ranked['Rank'].astype(int)

# Save the ranked data to a CSV file
crime_counts_ranked.to_csv("ranked_crime_counts.csv", index=False)

# Print the ranked neighborhoods without the index
print("\nRanked crime counts by neighborhood (1 = least crimes):")
print(crime_counts_ranked[['Rank', 'Neighborhood', 'Crime Count']].to_string(index=False))
```


```
Ranked crime counts by neighborhood (1 = least crimes):
 Rank                Neighborhood  Crime Count
    1             Mt. Oliver Boro          101
    2     Mt. Oliver Neighborhood          117
    3      Troy Hill-Herrs Island          252
    4              Outside County          256
    5                   Ridgemont          296
    6              Chartiers City          339
    7               New Homestead          353
    8               East Carnegie          401
    8               Outside State          401
   10              Swisshelm Park          443
   11                 Summer Hill          523
   12                Mount Oliver          541
   13               Regent Square          565
   14                        Hays          569
   15           Arlington Heights          573
   16                     Oakwood          629
   17                   St. Clair          657
   18                      Esplen          705
   19                  Glen Hazel          778
```

![Crime Photo](CrimeCount.png)
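The jump from rank 8 to rank 10 in the output above comes from `rank(method='min')`: East Carnegie and Outside State tie at 401 crimes, both get rank 8, and the next neighborhood gets rank 10. The sketch below reproduces this on the first ten rows of that output and, for comparison, shows `method='dense'`, which does not skip a rank after a tie. It is only an illustration on copied values, not part of the original analysis.

```python
import pandas as pd

# Crime counts copied from the first ten rows of the ranked output above
crime_counts = pd.Series(
    {
        "Mt. Oliver Boro": 101,
        "Mt. Oliver Neighborhood": 117,
        "Troy Hill-Herrs Island": 252,
        "Outside County": 256,
        "Ridgemont": 296,
        "Chartiers City": 339,
        "New Homestead": 353,
        "East Carnegie": 401,
        "Outside State": 401,
        "Swisshelm Park": 443,
    },
    name="Crime Count",
)

ranks = pd.DataFrame({
    "Crime Count": crime_counts,
    # 'min': tied values share the lowest rank, and the following rank is skipped
    "min": crime_counts.rank(method="min").astype(int),
    # 'dense': tied values share a rank, and no rank is skipped afterward
    "dense": crime_counts.rank(method="dense").astype(int),
})
print(ranks.loc[["East Carnegie", "Outside State", "Swisshelm Park"]])
```

Because only the lowest counts are included, their ranks match the full table; higher counts elsewhere in the dataset cannot change them.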

**Conclusion:**

Based on this crime dataset, the "best neighborhood" is Mt. Oliver Boro, with only 101 crimes. North Shore ranks 65th with 3728 crimes, which puts it in the lower-middle part of the rankings. This doesn't really surprise me, since North Shore is a high-traffic area with many people passing through it every day, probably because it is centrally located in Pittsburgh and has many attractions.

# Parks in Pittsburgh

```python
import pandas as pd

# Load the dataset
df_parks = pd.read_csv("parks.tsv", sep="\t")

# Count the number of parks per neighborhood
park_counts = df_parks['neighborhood'].value_counts()

# Rank the neighborhoods by park count, with 1 being the most parks
park_counts_ranked = park_counts.sort_values(ascending=False).reset_index()
park_counts_ranked.columns = ['Neighborhood', 'Park Count']
park_counts_ranked['Rank'] = park_counts_ranked['Park Count'].rank(method='min', ascending=False)

# Convert the Rank column to integer to remove the .0
park_counts_ranked['Rank'] = park_counts_ranked['Rank'].astype(int)

# Print the ranked neighborhoods without the index
print("\nRanked park counts by neighborhood (1 = most parks):")
print(park_counts_ranked[['Rank', 'Neighborhood', 'Park Count']].to_string(index=False))

# Save the ranked park counts to a CSV file
park_counts_ranked.to_csv("ranked_park_counts.csv", index=False)
```



```
Ranked park counts by neighborhood (1 = most parks):
 Rank              Neighborhood  Park Count
    1              East Liberty          12
    2 Central Business District          10
    3                 Beechview           8
    4         South Side Slopes           7
    4              Point Breeze           7
    6       Squirrel Hill South           6
    6          Mount Washington           6
    6                  Sheraden           6
    6                 Hazelwood           6
    6          South Side Flats           6
   11             South Oakland           5
   11                 Troy Hill           5
   13        Marshall-Shadeland           4
   13                   Elliott           4
   13           Central Oakland           4
   13          Brighton Heights           4
   17                  Garfield           3
   17                Greenfield           3
   17               North Shore           3
   17         Central Northside           3
   17            Swiss
```

![Park Photo](ParkCount.png)
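With `ascending=False`, the largest park count gets rank 1, and `method='min'` gives every tied neighborhood the best rank in its group. In the output above, 16 neighborhoods have more than 3 parks, so every neighborhood with 3 parks, including North Shore, is ranked 17. The sketch below checks this using the 20 complete rows of that output (the truncated last row is left out, so the list of ties here is only partial).

```python
import pandas as pd

# Park counts copied from the first 20 rows of the ranked output above
park_counts = pd.Series({
    "East Liberty": 12, "Central Business District": 10, "Beechview": 8,
    "South Side Slopes": 7, "Point Breeze": 7,
    "Squirrel Hill South": 6, "Mount Washington": 6, "Sheraden": 6,
    "Hazelwood": 6, "South Side Flats": 6,
    "South Oakland": 5, "Troy Hill": 5,
    "Marshall-Shadeland": 4, "Elliott": 4, "Central Oakland": 4, "Brighton Heights": 4,
    "Garfield": 3, "Greenfield": 3, "North Shore": 3, "Central Northside": 3,
})

# ascending=False: the most parks gets rank 1.
# method='min': tied neighborhoods all get the best (lowest) rank of their group.
park_rank = park_counts.rank(method="min", ascending=False).astype(int)

# Neighborhoods in this excerpt that share North Shore's rank
tied_with_north_shore = park_rank[park_rank == park_rank["North Shore"]].index.tolist()
print(park_rank["North Shore"], tied_with_north_shore)
```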

**Conclusion:**

Based on this dataset of parks in Pittsburgh, the "best neighborhood" is East Liberty, with 12 parks. North Shore is tied for 17th with 3 parks, which puts it in the upper-middle part of the rankings. This surprised me a bit, because North Shore is one of the smallest neighborhoods in Pittsburgh, so I expected it to have far fewer parks than other neighborhoods.

# Housing in Pittsburgh

```python
import pandas as pd

# Load the SNAP housing dataset and preview the first five rows
house = pd.read_csv("housedata.tsv", sep="\t")
house.head()
```

| Unnamed: 0 | _id | Neighborhood | Sector # | Population (2010) | Total # Units (2000) | Total # Units (2010) | % Occupied Units (2010) | % Vacant Units (2010) | # Occupied Units (2010) | % Owner Occupied Units (2010) | ... | % Units Built before 1939 | Median Home Value (2000) | Med. Val. ('00 in '10 Dollars) | Median Home Value (2010) | % Change Real Value 2000-2010 | Median Sale Price (2010) | # Sales Counted (2010) | Foreclosures (2008) | Foreclosures (2010) | % of all Housing Units Foreclosed (2010) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Allegheny Center | 3 | 933 | 675 | 1052 | 51.5% | 48.5% | 535 | 10.1% | ... | 1.3% | \$86,500 | \$109,535 | \$136,300 | 24.4% |  | 0 | 0 | 0 | 0.0% |
| 1 | 2 | Allegheny West | 3 | 462 | 390 | 355 | 74.9% | 25.1% | 203 | 18.2% | ... | 57.4% | \$159,700 | \$202,228 | \$123,600 | -38.9% | \$309,940 | 7 | 0 | 1 | 0.3% |
| 2 | 3 | Allentown | 6 | 2500 | 1505 | 1291 | 80.0% | 20.0% | 953 | 59.2% | ... | 62.9% | \$34,300 | \$43,434 | \$42,200 | -2.8% | \$8,500 | 70 | 27 | 11 | 0.9% |
| 3 | 4 | Arlington | 7 | 1869 | 880 | 886 | 86.6% | 13.4% | 754 | 65.4% | ... | 72.3% | \$38,800 | \$49,132 | \$44,200 | -10.0% | \$15,397 | 34 | 12 | 13 | 1.5% |
| 4 | 5 | Arlington Heights | 7 | 244 | 557 | 148 | 91.2% | 8.8% | 139 | 18.7% | ... | 9.2% | \$45,000 | \$56,984 | \$64,400 | 13.0% |  | 0 | 0 | 0 | 0.0% |
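The preview shows that percentages and dollar amounts are stored as text such as `51.5%` and `$309,940`, and that some cells are empty. The sketch below applies the same regex cleanup used in the scoring cell to a few values copied from this preview, including Allegheny Center's empty Median Sale Price cell (assumed here to load as NaN, which is what `read_csv` does for empty fields), to show that missing values stay NaN instead of raising an error.

```python
import pandas as pd

# A few raw values copied from the housedata.tsv preview above;
# float("nan") stands in for the empty cell that read_csv loads as NaN
raw = pd.DataFrame({
    "% Occupied Units (2010)": ["51.5%", "74.9%", "80.0%"],
    "Median Sale Price (2010)": [float("nan"), "$309,940", "$8,500"],
})

cleaned = raw.copy()
for column in cleaned.columns:
    cleaned[column] = (
        cleaned[column]
        .replace({r"[$%,]": ""}, regex=True)  # strip '$', '%' and ',' from text values
        .astype(float)                        # NaN stays NaN; text becomes a number
    )
print(cleaned)
```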


```python
import pandas as pd

# Load the dataset
file_path = 'housedata.tsv'
data = pd.read_csv(file_path, sep='\t')

# Columns used in the score; they are stored as text such as "51.5%" or "$136,300"
columns_to_clean = [
    '% Occupied Units (2010)',
    '% Owner Occupied Units (2010)',
    '% of all Housing Units Foreclosed (2010)',
    'Median Home  Value (2010)',
]

# Strip '$', '%' and ',' characters and convert each column to float
for column in columns_to_clean:
    data[column] = (
        data[column]
        .replace({r'[$%,]': ''}, regex=True)
        .astype(float)
    )

# Calculate the weighted score for each neighborhood.
# Note: median home value is in dollars while the other columns are percentages,
# so the home-value term makes up almost all of each score.
data['Score'] = (
    data['% Occupied Units (2010)'] * 0.4 +
    data['% Owner Occupied Units (2010)'] * 0.3 +
    data['Median Home  Value (2010)'] * 0.2 -
    data['% of all Housing Units Foreclosed (2010)'] * 0.1
)

# Sort the neighborhoods by score in descending order
# (neighborhoods with missing values get a NaN score and are placed last)
sorted_data = data.sort_values('Score', ascending=False)

pd.set_option('display.max_rows', None)  # Display all rows
pd.set_option('display.max_columns', None)  # Display all columns

print(sorted_data[['Neighborhood', 'Score']])  # Print every neighborhood with its score
```

```
                 Neighborhood     Score
75        Squirrel Hill North  69034.94
55              North Oakland  54646.57
67                  Shadyside  53018.17
62               Point Breeze  48121.75
76        Squirrel Hill South  40331.52
65              Regent Square  39320.72
39              Highland Park  33532.72
33                 Friendship  33500.97
79             Strip District  32404.76
71           South Side Flats  30797.91
19            Central Oakland  28911.82
0            Allegheny Center  27283.63
16  Central Business District  25199.73
18          Central Northside  24905.27
1              Allegheny West  24755.39
54              New Homestead  23544.61
14       California-Kirkbride  22684.91
5                  Banksville  22233.11
81             Swisshelm Park  21784.59
58                    Oakwood  21751.30
48                 Manchester  20308.16
24           Duquesne Heights  20152.19
63         Point Breeze North  19947.41
23           Crawford-Roberts  18963.69
```


![Housing Photo](houses.png)

**Conclusion:**

When analyzing this dataset, I created a point system to determine which neighborhood had the best housing. The score weights the 2010 percentage of occupied units by 0.4, the percentage of owner-occupied units by 0.3, and the median home value by 0.2, then subtracts 0.1 times the percentage of housing units foreclosed. Based on these scores, Squirrel Hill North is the best place for housing in Pittsburgh. In comparison, our favorite, North Shore, is one of the worst neighborhoods for housing: it scored very low, likely because it has mostly apartments rather than houses. However, the dataset also had too little data on North Shore to say conclusively whether its housing is good or bad.
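The point system is a weighted sum. As a check, the sketch below plugs in the 2010 values for Allegheny Center and Allegheny West from the `house.head()` preview and reproduces their scores in the sorted output (27283.63 and 24755.39). It also shows that, because median home value is measured in dollars while the other three inputs are percentages, the home-value term accounts for more than 99% of each score. The dictionary keys and the `housing_score` function are hypothetical names chosen for illustration.

```python
# Inputs copied from the housedata.tsv preview (2010 values, percentages as plain numbers)
neighborhoods = {
    "Allegheny Center": {"occupied": 51.5, "owner_occupied": 10.1,
                         "median_value": 136300, "foreclosed": 0.0},
    "Allegheny West": {"occupied": 74.9, "owner_occupied": 18.2,
                       "median_value": 123600, "foreclosed": 0.3},
}


def housing_score(occupied, owner_occupied, median_value, foreclosed):
    """Hypothetical helper using the same weights as the Score column."""
    return occupied * 0.4 + owner_occupied * 0.3 + median_value * 0.2 - foreclosed * 0.1


for name, row in neighborhoods.items():
    score = housing_score(**row)
    # Fraction of the score contributed by the median home value term
    value_share = row["median_value"] * 0.2 / score
    print(f"{name}: score={score:.2f}, home-value share={value_share:.1%}")
```

If each factor is meant to carry its stated weight, one option would be to rescale the four columns to a common range (for example, min-max scaling) before applying the weights, which could change the ranking.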

# Final Calculations:
The five neighborhoods with the lowest total rank, based on the sum of their housing score rank, crime count rank, and park count rank, are:

### 1. Central Northside
Total Rank: 51 (Score Rank 14 + Crime Rank 20 + Park Rank 17)

### 2. Point Breeze
Total Rank: 66 (Score Rank 4 + Crime Rank 58 + Park Rank 4)

### 3. Regent Square
Total Rank: 70 (Score Rank 6 + Crime Rank 13 + Park Rank 51)

### 4. Strip District
Total Rank: 105 (Score Rank 9 + Crime Rank 61 + Park Rank 35)

### 5. Squirrel Hill North
Total Rank: 119 (Score Rank 1 + Crime Rank 67 + Park Rank 51)

**Final Conclusion:**

Our favorite neighborhood, North Shore, did not even make the top 5. It landed near the middle of the pack in the crime and park rankings and was not ranked in the housing dataset at all. This didn't surprise us much, because we expected it to be around the middle of each dataset. The actual best neighborhood based on our datasets was Central Northside, with a total rank of 51. To find each neighborhood's total rank, we added up its ranks from the three datasets and sorted the totals from lowest to highest, so a lower total is better.
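The total-rank method can be sketched with `DataFrame.merge`. The helper name `total_rank` and its column layout are hypothetical; the ranks are the ones reported for Central Northside, Regent Square, and Strip District, plus North Shore's crime rank (65) and park rank (17). Because `merge` performs an inner join by default, North Shore drops out of the result, which mirrors it not being ranked in the housing dataset.

```python
import pandas as pd


def total_rank(score_ranks, crime_ranks, park_ranks):
    """Sum each neighborhood's three ranks and sort from lowest (best) total.

    Hypothetical helper: each input is a DataFrame with 'Neighborhood' and
    'Rank' columns. merge() defaults to an inner join, so a neighborhood
    missing from any one table (or spelled differently in it) is dropped.
    """
    merged = (
        score_ranks.rename(columns={"Rank": "Score Rank"})
        .merge(crime_ranks.rename(columns={"Rank": "Crime Rank"}), on="Neighborhood")
        .merge(park_ranks.rename(columns={"Rank": "Park Rank"}), on="Neighborhood")
    )
    merged["Total Rank"] = merged[["Score Rank", "Crime Rank", "Park Rank"]].sum(axis=1)
    return merged.sort_values("Total Rank").reset_index(drop=True)


# Ranks taken from the sections above; North Shore has no housing score rank
score = pd.DataFrame({"Neighborhood": ["Central Northside", "Regent Square", "Strip District"],
                      "Rank": [14, 6, 9]})
crime = pd.DataFrame({"Neighborhood": ["Central Northside", "Regent Square", "Strip District", "North Shore"],
                      "Rank": [20, 13, 61, 65]})
parks = pd.DataFrame({"Neighborhood": ["Central Northside", "Regent Square", "Strip District", "North Shore"],
                      "Rank": [17, 51, 35, 17]})

result = total_rank(score, crime, parks)
print(result.to_string(index=False))
```

The same inner join would also drop neighborhoods whose names differ between datasets; for example, "Troy Hill-Herrs Island" in the crime data and "Troy Hill" in the parks data may refer to the same area but would not match. Keeping such neighborhoods would require standardizing names first, or an outer join with some rule for filling in missing ranks.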