Introduction:

This is our final project for Big Ideas in Computing and Information, in which we were tasked with finding the 'best' neighborhood in Pittsburgh. To decide what 'best' means, we looked at many different datasets, which gave us several ideas for a metric. We started out with finding the highest rent-to-income ratio among neighborhoods, but eventually settled on finding the best neighborhood to own a dog in.

The Metric:

For our final project we found the best neighborhood to own a dog in. To do this, we used three metrics: the number of parks, the amount of traffic, and the number of smart trash containers a neighborhood has. We chose these metrics because parks are a great place to walk a dog, heavy traffic is unsafe for dogs, and trash containers are needed while walking a dog. Combining these three metrics let us work out which neighborhood is the best one to own a dog in. Below are links to the datasets used in our investigation.

- https://data.wprdc.org/dataset/smart-trash-containers/resource/75b83ac9-8069-4cf1-bcc3-b9e6b04487d9
- https://data.wprdc.org/dataset/parks/resource/fa329e3d-89ff-4708-8dall-81bfedcad11d/view/3dae5fa4-b30f-467c-84bc-f42e670b2fce
- https://data.wprdc.org/dataset/traffic-count-data-city-of-pittsburgh

**The Best Neighborhood**

In [1]:
#Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

First we need to import the datasets and preprocess/clean up the data for our use case. For each dataset, we will be looking for the neighborhood and our chosen statistic.


**Trash Containers**

In [None]:
# import data set and get only the neighborhoods
containers = pd.read_csv('nfb25/trash-containers.csv')
containers = containers[['neighborhood']]
containers.head()

This data lists each trash container along with the neighborhood it is in, so after keeping only the neighborhood column we are left with one row per container. This is a good start, but it does not yet give us the number of containers in each neighborhood.

In [None]:
# get count of each neighborhood
containers = containers['neighborhood'].value_counts()
data = {
  "neighborhood": containers.keys(),
  "count": containers.values
}

containers = pd.DataFrame(data)
containers.head()

The code above counts the occurrences of each neighborhood and totals them. These values are then placed in a new column labeled count, representing the total number of smart trash containers located in each neighborhood. The rows are sorted from highest to lowest for easier use. After cleanup, we are left with the chart below.
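As a small illustration of the counting step, here is the same idea on a tiny made-up sample. The rows below are invented for the example and are not taken from the real dataset; only the column name `neighborhood` matches our data.

```python
import pandas as pd

# Made-up sample: one row per container, as in the real file
sample = pd.DataFrame({
    "neighborhood": [
        "Shadyside", "Bloomfield", "Shadyside",
        "Oakland", "Shadyside", "Bloomfield",
    ]
})

# value_counts() tallies each neighborhood and sorts from most to fewest
counts = sample["neighborhood"].value_counts()

# Rebuild a two-column DataFrame so it can be plotted or merged later
sample_counts = pd.DataFrame({
    "neighborhood": counts.index,
    "count": counts.values,
})
print(sample_counts)
```

Keeping the result as a DataFrame with a named `neighborhood` column, rather than a Series, makes it easier to merge with the other metrics later on.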


In [None]:
# display chart of trash containers
plt.figure(figsize=(15, 10))
plt.xlabel('Neighborhood')
plt.ylabel('Containers')
plt.bar(containers['neighborhood'], containers['count'], width = .6, color='skyblue')
plt.title('Smart Trash Containers Per Neighborhood')
plt.xticks(rotation=45, ha='right')
plt.xticks(fontsize=8)
plt.tight_layout()
plt.show()

The chart above shows the number of smart trash containers located in each neighborhood. For our metric, the best neighborhood is located on the left as the more containers the better. Based on the data, Shadyside has the greatest number of smart trash containers, making it the best in this metric.

**Traffic**

In [None]:
# import formatted traffic for testing
traffic = pd.read_csv('dantewarhola/traffic.csv')
traffic = traffic.rename(columns={'average_daily_car_traffic': 'traffic'})
traffic.head()

The code above shows the average daily traffic per neighborhood in Pittsburgh. There is a lot of information inside this dataframe that we do not need.

In [None]:
df = pd.read_csv('dantewarhola/traffic.csv')
df = df[['average_daily_car_traffic', 'neighborhood']]

# Convert 'None' to NaN and sort the DataFrame
df_sorted = df.replace('None', pd.NA).sort_values(by='average_daily_car_traffic', ascending=True)

# Convert the column to integers, handling NaN values
columns_to_convert = ['average_daily_car_traffic']

for col in columns_to_convert:
    df_sorted[col] = pd.to_numeric(df_sorted[col], errors='coerce').astype('Int64')

df_sorted = df_sorted.groupby('neighborhood', as_index=False).mean().sort_values(by="average_daily_car_traffic").dropna()

df_sorted.head()

The code above removes the columns we do not need, converts the traffic counts to numbers, averages them for each neighborhood, and sorts the result in ascending order.
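To make the cleanup easier to follow, here is a minimal sketch of the same steps on made-up readings. The neighborhoods "A", "B", and "C" and their values are hypothetical, and the string "None" is assumed to stand in for a missing count.

```python
import pandas as pd

# Made-up readings; "None" stands in for a missing count
raw = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "average_daily_car_traffic": ["100", "300", "50", "None", "None"],
})

# Non-numeric entries become NaN instead of raising an error
raw["average_daily_car_traffic"] = pd.to_numeric(
    raw["average_daily_car_traffic"], errors="coerce"
)

# mean() skips NaN inside a group; a group with no valid
# readings stays NaN and is removed by dropna()
per_neighborhood = (
    raw.groupby("neighborhood", as_index=False)["average_daily_car_traffic"]
    .mean()
    .dropna()
    .sort_values(by="average_daily_car_traffic")
    .reset_index(drop=True)
)
print(per_neighborhood)
```

In this sample, "B" keeps its one valid reading, while "C" has no valid readings at all and is dropped, which is how neighborhoods without traffic data would disappear from the chart.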

In [None]:
# display chart of average daily traffic (uses the cleaned, sorted dataframe)
plt.figure(figsize=(15, 10))
plt.xlabel('Neighborhood')
plt.ylabel('Traffic')
plt.bar(df_sorted['neighborhood'], df_sorted['average_daily_car_traffic'], width = .6, color='skyblue')
plt.title('Average Daily Traffic Per Neighborhood')
plt.xticks(rotation=45, ha='right')
plt.xticks(fontsize=8)
plt.tight_layout()
plt.show()

The chart above shows the daily average traffic count per neighborhood. As traffic is unsafe for dogs, the best neighborhood is located on the left side of the chart. Based on this, the best neighborhood to own a dog in based on traffic would be South Side Flats.

**Parks**

In [None]:
# Read the parks data (path is relative to the repository root)
parks = pd.read_csv('ROW03/Park.csv')
# Keep only the name and neighborhood columns
parks = parks[['name', 'neighborhood']]
# Count the number of times each neighborhood appears (one row per park)
numberofparks = parks['neighborhood'].value_counts()
# Display chart of parks
plt.figure(figsize=(20, 10))
plt.xlabel('Neighborhood')
plt.ylabel('Parks')
plt.bar(numberofparks.index, numberofparks, width=0.5, color='skyblue')
plt.title('Parks Per Neighborhood')
plt.xticks(rotation=45, ha='right', fontsize=8)
plt.tight_layout()
plt.show()

The chart above shows the neighborhood names on the x axis and the number of parks in each neighborhood on the y axis. For our metric, the more parks the better, because more parks give dogs more places to run around without a leash and to interact with other dogs and people.

Since each metric is measured very differently, we need some way to combine them. To do this, we computed a weighted score for each neighborhood and used the pandas method 'rank' to order the neighborhoods by that score. The weights make sure that each metric contributes to the final result instead of one metric with large values drowning out the others.

In [None]:
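# Combine container counts with the cleaned average traffic per neighborhood
combined = pd.merge(containers, df_sorted, on='neighborhood')
# Negate traffic so that less traffic gives a higher score
combined['average_daily_car_traffic'] = combined['average_daily_car_traffic'] * -1
# Each container is weighted as 50 vehicles of daily traffic; rank 1 is best
combined['rank'] = (combined['count'] * 50 + combined['average_daily_car_traffic']).rank(ascending=False, method='min').astype('int64')
combined = combined.sort_values(by='rank')
print(combined)

# TODO: tune the weights and how the ranking system works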
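One possible way to bring all three metrics into a single ranking is sketched below. This is a variation for illustration, not the exact method used in the cell above: it ranks each metric separately first and then combines the ranks. The neighborhoods "A", "B", and "C", their values, and the equal weights are all made up.

```python
import pandas as pd

# Made-up values for three hypothetical neighborhoods
metrics = pd.DataFrame({
    "neighborhood": ["A", "B", "C"],
    "containers": [10, 4, 7],
    "parks": [3, 5, 1],
    "traffic": [9000, 2000, 5000],
})

# Rank each metric so that 1 is best:
# more containers and more parks are better, less traffic is better
metrics["containers_rank"] = metrics["containers"].rank(ascending=False, method="min")
metrics["parks_rank"] = metrics["parks"].rank(ascending=False, method="min")
metrics["traffic_rank"] = metrics["traffic"].rank(ascending=True, method="min")

# Hypothetical equal weights; placeholders, not the project's weights
weights = {"containers_rank": 1.0, "parks_rank": 1.0, "traffic_rank": 1.0}
metrics["score"] = sum(metrics[col] * w for col, w in weights.items())

# The lowest combined score is the best overall neighborhood
metrics["rank"] = metrics["score"].rank(method="min").astype("int64")
metrics = metrics.sort_values(by="rank").reset_index(drop=True)
print(metrics[["neighborhood", "score", "rank"]])
```

Ranking each metric before combining puts them on the same scale, so traffic counts in the thousands cannot overwhelm container or park counts in the single digits. Changing the values in `weights` would let one metric count for more than the others.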

**Conclusion**

Based on the number of smart trash containers, the average daily traffic, and the number of parks, we have come to the conclusion that the best neighborhood to own a dog in is Squirrel Hill South.