# What Neighborhood Is Best to Walk Your Fish?

### By Team Garlic (Haiden Hunter, Andrew Noonan, and Jeffery He)

![alt text](https://i.pinimg.com/736x/41/88/1c/41881cdd0b6c1c5eb9ee45b3e7395c0b.jpg)

## Introduction

What is the weirdest activity you can imagine? I hope that walking your pet fish was your first thought, because it was ours. Now let's take that silly idea and make it comically practical. You would need a place to walk your fish. At first, we thought of public rivers, ponds, and lakes. However, in the neighborhoods around Pittsburgh, the rivers dominate the waterways. And what if a much bigger fish is lurking in that murky water, waiting to eat your beloved pet? Instead, we turned to public pools. This assumes your fish is supernatural and can survive in chlorine-dense water, but that is a given. Now, you do not want someone to mistake your pet fish for one that is meant to be cooked, right? So we also analyzed local fish fry locations in order to avoid those areas. And to top everything off, you and your fish deserve the highest air quality, so we also made sure that the local air quality was up to standard. But in the end, it's all about your fish's happiness. So we will combine all these metrics into a *Fish Enjoyment Score* to ultimately determine the best neighborhood to walk your fish.

## The Metrics

The *Fish Enjoyment Score* is the ultimate fish walkability metric, created by analyzing:

1. The number of fish fries.
2. The number of public pools.
3. The air quality of the area.

To reach our conclusion, we analyzed the following datasets:

1. [Pittsburgh Fish Fry Locations](https://data.wprdc.org/dataset/pittsburgh-fish-fry-map)

    This dataset contains lists of local fish fry locations from the last five years.
    
2. [Public Pool Inspections](https://data.wprdc.org/dataset/allegheny-county-public-swimming-pool-hot-tub-and-spa-inspections)

    This dataset contains the data collected by pool inspectors around the Pittsburgh area.
    
3. [Air Quality](https://data.wprdc.org/dataset/allegheny-county-air-quality)

    This dataset contains ratings of the local air quality.
    
4. [Reference List of Pittsburgh Neighborhoods](https://data.wprdc.org/dataset/neighborhoods2/resource/668d7238-cfd2-492e-b397-51a6e74182ff)

    This dataset contains a list of all Pittsburgh neighborhoods.




## Metric #1: Avoid Fish Fries at All Costs

##### Haiden Hunter

We begin by importing all of our data and merging it into one file.

In [None]:
import pandas as pd
import fpsnippets

fishFryData2023 = pd.read_csv("https://data.wprdc.org/datastore/dump/511a29f6-3217-4f61-a9ba-b3b5b35ab5fb")
fishFryData2022 = pd.read_csv("https://data.wprdc.org/dataset/682daad1-6d3a-45d3-8710-6c961146e19b/resource/f4d7e81a-ac39-4f84-a249-c68524e8258a/download/2022_pittsburgh_fish_fry_locations.csv")
fishFryData2021 = pd.read_csv("https://data.wprdc.org/dataset/682daad1-6d3a-45d3-8710-6c961146e19b/resource/dfa58a5b-d221-411f-bd7e-32837ff99993/download/2021_pittsburgh_fish_fry_locations.csv")
fishFryData2020 = pd.read_csv("https://data.wprdc.org/dataset/682daad1-6d3a-45d3-8710-6c961146e19b/resource/d802d628-bd44-47cc-bbc9-c691f9026ca1/download/2020_pittsburgh_fish_fry_locations.csv")
fishFryData2019 = pd.read_csv("https://data.wprdc.org/dataset/682daad1-6d3a-45d3-8710-6c961146e19b/resource/5b58c467-8e6a-4abc-9dd5-a39881770b3c/download/2019_pittsburgh_fish_fry_locations.csv")
# Stack the five yearly files into one table; ignore_index=True renumbers the rows from 0.
fishFryData = pd.concat([fishFryData2023, fishFryData2022, fishFryData2021, fishFryData2020, fishFryData2019], ignore_index=True)

fishFryData.head(3)

Now, this data contains a lot of unnecessary information. Let's remove the unneeded columns.

In [None]:
fishFryData = fishFryData[['validated', 'venue_name', 'latitude', 'longitude']]
fishFryData.head(3)

This is progress, but we will need to spend some time getting the data into a state that is best
suited to analyzing it by location. I began this process by converting all of the raw data into a list
of neighborhoods using the `fpsnippets.geo_to_neighborhood` function.

In [None]:
finalFishFryData = pd.DataFrame()

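for x, row in fishFryData.iterrows():
    # Map each fish fry's coordinates to a neighborhood name.
    neighborhood = fpsnippets.geo_to_neighborhood(row['latitude'], row['longitude'])
    # Skip locations for which no neighborhood was returned.
    if neighborhood is not None:
        # Keep only validated entries; the flag is checked as either "t" or True.
        if row["validated"] == "t" or row["validated"] == True:
            finalFishFryData = pd.concat([finalFishFryData, pd.DataFrame([{'Neighborhood':neighborhood}])])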

finalFishFryData = finalFishFryData.reset_index(drop=True)

finalFishFryData.head(3)
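The loop above grows the DataFrame one row at a time. As a rough sketch of an alternative, the same filtering can be done by collecting the neighborhood names in a plain list and building the DataFrame once at the end. The sample rows and the `toy_geo_to_neighborhood` function below are hypothetical stand-ins (the real lookup is `fpsnippets.geo_to_neighborhood`), used only so the example runs on its own.

```python
import pandas as pd


def toy_geo_to_neighborhood(lat, lon):
    """Hypothetical stand-in for fpsnippets.geo_to_neighborhood."""
    if pd.isna(lat) or pd.isna(lon):
        return None  # no coordinates, so no neighborhood
    return "Bloomfield" if lat > 40.45 else "Carrick"


# Made-up sample rows that mimic the columns kept in fishFryData.
sample = pd.DataFrame({
    "validated": ["t", True, "f", "t"],
    "latitude": [40.46, 40.40, 40.47, None],
    "longitude": [-79.95, -79.99, -79.94, -80.00],
})

# Keep rows whose validated flag is "t" or True.
valid = sample[sample["validated"].map(lambda v: v in ("t", True))]

# Look up a neighborhood for each remaining row.
hoods = [
    toy_geo_to_neighborhood(lat, lon)
    for lat, lon in zip(valid["latitude"], valid["longitude"])
]

# Drop failed lookups and build the DataFrame in one step.
result = pd.DataFrame({"Neighborhood": [h for h in hoods if h is not None]})
print(result)
```

Building the DataFrame once avoids copying the whole table on every iteration, which tends to matter more as the number of rows grows.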

Much better! All that is left is to turn it into a comprehensive list to analyze. To do this, I used a reference list of all the neighborhoods in Pittsburgh, then looped through our list of fish fry neighborhoods and kept a tally. What we get is a completed list of the number of fish fries in each neighborhood.

In [None]:
listOfHoods = pd.read_csv("https://data.wprdc.org/datastore/dump/668d7238-cfd2-492e-b397-51a6e74182ff")
listOfHoods = listOfHoods[['hood']]
neighborhoodFishFrys = pd.DataFrame(columns=["Neighborhood", "Number of Fish Fries"])

for index, row in listOfHoods.iterrows():
    hood = row["hood"]
    count = 0
    for index2, row2 in finalFishFryData.iterrows():
        if hood == row2["Neighborhood"]:
            count = count + 1
    neighborhoodFishFrys = pd.concat([neighborhoodFishFrys, pd.DataFrame([{'Neighborhood':hood, 'Number of Fish Fries':count}])])

neighborhoodFishFrys = neighborhoodFishFrys.sort_values(by=['Number of Fish Fries'], ascending=False).reset_index(drop=True)

# Adjusting pandas display settings.
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

display(neighborhoodFishFrys)
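The tally above can also be sketched without nested loops, using `value_counts` to count the fish fries and `reindex` to add the neighborhoods that have none. The neighborhood names and counts below are made-up sample data, not values from the real datasets.

```python
import pandas as pd

# Made-up stand-ins for listOfHoods['hood'] and finalFishFryData.
all_hoods = ["Bloomfield", "Carrick", "Shadyside"]
fish_fry_hoods = pd.DataFrame({"Neighborhood": ["Carrick", "Bloomfield", "Carrick"]})

# Count fish fries per neighborhood, then fill in 0 for neighborhoods with none.
counts = (
    fish_fry_hoods["Neighborhood"]
    .value_counts()
    .reindex(all_hoods, fill_value=0)
)

# Shape the result like neighborhoodFishFrys, sorted from most to fewest fish fries.
tally = (
    pd.DataFrame({
        "Neighborhood": counts.index,
        "Number of Fish Fries": counts.to_numpy(),
    })
    .sort_values("Number of Fish Fries", ascending=False)
    .reset_index(drop=True)
)
print(tally)
```

On this sample, Carrick comes first with two fish fries and Shadyside last with zero, which mirrors what the loop would produce for the same input.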

## Metric #2: Public Pool Search

##### Jeffery He

To begin, let's import the data.

In [None]:
import pandas as pd

poolData = pd.read_csv("DATAFILES/AquaticInspections.csv")

poolData.head(3)

Remove unwanted columns.

In [None]:
poolData = poolData[['Facility Latitude', 'Facility Longitude', 'Venue Type', 'Inspection Passed']]
poolData.head(3)

Now let's sort the data by neighborhood using `fpsnippets`.

In [None]:
poolDataHood = pd.DataFrame()

for x, row in poolData.iterrows():
    hood = fpsnippets.geo_to_neighborhood(row['Facility Latitude'], row['Facility Longitude'])
    if hood != None:
        poolDataHood = pd.concat([poolDataHood, pd.DataFrame([{'Hood':hood, 'Venue Type':row['Venue Type'], 'Inspection Passed':row['Inspection Passed']}])])
        
poolDataHood = poolDataHood.reset_index(drop=True)
poolDataHood.head(3)

Now we can remove all data rows that do *NOT* pass inspection. We would not want your fish swimming in bad water!

In [None]:
poolDataHoodMask = poolDataHood[poolDataHood['Inspection Passed'].str.contains('t')]
poolDataHoodMask = poolDataHoodMask.reset_index(drop=True)
poolDataHoodMask.head(3)

Now let's make sure that we are only dealing with pools.

In [None]:
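# Filter the already-inspected rows (poolDataHoodMask) so the inspection filter is not discarded.
poolDataHoodMask = poolDataHoodMask[poolDataHoodMask['Venue Type'].str.contains('POOLS')]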
poolDataHoodMask = poolDataHoodMask.reset_index(drop=True)
poolDataHoodMask.head(10)

Now we have a list of Pittsburgh pools that have passed inspection. This data can be used when computing the best fish-friendly neighborhood.
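To use this list in a score, it helps to have one number per neighborhood. The sketch below applies both filters with a single combined mask and then counts the remaining rows per neighborhood. The sample rows, including the `"SPAS"` venue type and the `"t"`/`"f"` inspection flags, are assumptions made for illustration and are not taken from the inspection dataset.

```python
import pandas as pd

# Made-up rows that mimic the columns of poolDataHood.
pools = pd.DataFrame({
    "Hood": ["Bloomfield", "Bloomfield", "Carrick", "Carrick"],
    "Venue Type": ["POOLS", "SPAS", "POOLS", "POOLS"],
    "Inspection Passed": ["t", "t", "f", "t"],
})

# Both conditions must hold: the inspection passed and the venue is a pool.
passed = pools["Inspection Passed"] == "t"
is_pool = pools["Venue Type"].str.contains("POOLS")
good_pools = pools[passed & is_pool]

# One count per neighborhood.
pool_counts = good_pools.groupby("Hood").size()
print(pool_counts)
```

Note that this counts inspection rows. If the same pool was inspected more than once, it would be counted more than once unless duplicates are removed first.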

## Metric #3: Air Quality Verification

##### Andrew Noonan

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
airQuality = pd.read_csv("https://data.wprdc.org/datastore/dump/4ab1e23f-3262-4bd3-adbf-f72f0119108b")
airQuality.head(3)

I visualized the dataset to get an initial picture of where neighborhoods stand.

In [None]:
plt.figure(figsize=(12, 6))
plt.bar(airQuality['site'], airQuality['index_value'], color='skyblue')
plt.xlabel('Neighborhood')
plt.ylabel('Air Quality Value')
plt.title('Air Quality in Different Neighborhoods')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
# Show the plot
plt.show()

I created a query mask to eliminate any neighborhoods that had moderate or bad air quality and printed the dataset.

In [None]:
air_query_mask = airQuality['description'].str.contains("Good")
good_air = airQuality[air_query_mask]
good_air.head(3)
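If the dataset contains more than one reading per site (an assumption; we have not confirmed this here), the filtered rows can be averaged per site before comparing them. The sketch below uses made-up site names and values with the same `site`, `index_value`, and `description` columns used above.

```python
import pandas as pd

# Made-up readings; real site names and values will differ.
readings = pd.DataFrame({
    "site": ["Site A", "Site A", "Site B", "Site C"],
    "index_value": [20, 30, 40, 75],
    "description": ["Good", "Good", "Good", "Moderate"],
})

# Keep only readings described as "Good".
good = readings[readings["description"] == "Good"]

# Average the index per site; sort ascending so the cleanest air comes first.
mean_by_site = good.groupby("site")["index_value"].mean().sort_values()
print(mean_by_site)
```

A lower air quality index means cleaner air, so the first site in the sorted result is the best one in this sample.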

I visualized the dataset of neighborhoods with good air.

In [None]:
plt.figure(figsize=(12, 6))
plt.bar(good_air['site'], good_air['index_value'], color='skyblue')
plt.xlabel('Neighborhood')
plt.ylabel('Air Quality Value')
plt.title('Air Quality in Different Neighborhoods')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better readability
plt.tight_layout()
# Show the plot
plt.show()

As we can see, the suburbs of Pittsburgh, especially Lawrenceville and West Mifflin, come out ahead. However, the dataset reported some inconsistencies with how Lawrenceville 2 records air quality in comparison to the rest of the sites. So if we were judging purely on air quality, West Mifflin would win. Yet the rest of our data consists of neighborhoods within Pittsburgh, and as the graph shows, Pittsburgh is not terribly far behind. So the air quality can be considered high enough in any Pittsburgh neighborhood to walk your fish.

## Fish Enjoyment Score Calculator

##### Haiden Hunter

We can now combine all of our data into one final score to determine the best neighborhood to walk your fish.

In [None]:
listOfHoods = pd.read_csv("https://data.wprdc.org/datastore/dump/668d7238-cfd2-492e-b397-51a6e74182ff")
listOfHoods = listOfHoods[['hood']]
fishEnjoymentScore = pd.DataFrame(columns=["Neighborhood", "Score"])
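The cell above only sets up an empty table. As one possible way to fill it in, the sketch below rewards pools and penalizes fish fries. The formula (pools minus fish fries) and all sample numbers are purely hypothetical and are not the team's actual scoring. Air quality is left out because the analysis above treats it as acceptable in every Pittsburgh neighborhood.

```python
import pandas as pd

# Made-up inputs: neighborhood list plus per-neighborhood counts.
hoods = pd.DataFrame({"Neighborhood": ["Bloomfield", "Carrick", "Shadyside"]})
fish_fries = pd.Series({"Bloomfield": 1, "Carrick": 2})
pools = pd.Series({"Bloomfield": 1, "Shadyside": 2})

score = hoods.copy()
# Neighborhoods missing from a count get 0.
score["Fish Fries"] = score["Neighborhood"].map(fish_fries).fillna(0).astype(int)
score["Pools"] = score["Neighborhood"].map(pools).fillna(0).astype(int)

# Hypothetical formula: each pool adds a point, each fish fry removes one.
score["Score"] = score["Pools"] - score["Fish Fries"]
score = score.sort_values("Score", ascending=False).reset_index(drop=True)
print(score)
```

With these sample numbers, Shadyside ranks first (two pools, no fish fries) and Carrick last (no pools, two fish fries). Different weights would change the ranking, so the formula should be chosen deliberately.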

![alt text](https://media.giphy.com/media/LnnvGmYxaHiMlMKfuh/giphy.gif)

## Questions?

### Thank you.