# Final Project: Team M.E.M

***

### Definition of "Best Neighborhood":
* We defined the best neighborhood as the one with the best leisure activities for kids and families, since that would make it a very fun place to live. We looked through the available data sets and chose three that support this idea: playgrounds, playing fields, and pools. We used those data sets to create our metric.

### Metric:
* The metric that we used to measure the best neighborhood is the number of playgrounds, the number of playing fields, and the largest pool capacity in gallons.

***

### Data:
* We then began to analyze our data to determine the best neighborhood.

In [30]:
#load pandas
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib

#Each set of data is loaded in its section.

***

### The Playgrounds of Pittsburgh:
* I worked on figuring out which neighborhoods had the **most playgrounds**. 

In [32]:
#load data from WPRDC
playgrounds = pd.read_csv("./Files/playgrounds.csv", index_col="id", parse_dates=True)

playgrounds.head()

* Since the table lists one playground per row, I used the "neighborhood" column to count how many times each neighborhood shows up.

In [33]:
p2 = playgrounds['neighborhood'].value_counts()
print(p2)

* I then used this information to make a bar plot to see the data more clearly.

In [34]:
bPlot = p2.plot.bar(legend=False, figsize = (15,5))
plt.title("Playgrounds In Neighborhoods")
plt.xlabel("Neighborhoods")   
plt.ylabel("Number of Playgrounds")

* I made a couple more bar plots, narrowing down the data each time so it would be easier to see which neighborhoods are the "best."

In [35]:
#narrow down to 3 or more
moreThanTwo = p2.loc[p2>2]
moreThanTwo.plot.bar(legend=False, figsize = (10,5))
plt.title("Neighborhoods With Three or More Playgrounds")
plt.xlabel("Neighborhoods")
plt.ylabel("Number of Playgrounds")

In [36]:
#narrow down to 4 or more
moreThanThree = p2.loc[p2>3]
moreThanThree.plot.bar(legend=False, figsize = (10,5))
plt.title("Neighborhoods With Four or More Playgrounds")
plt.xlabel("Neighborhoods")
plt.ylabel("Number of Playgrounds")

#### The Results: 
* From this, I determined that:
    1. **Squirrel Hill South** had the **most** playgrounds, at **8**.
    2. **Beechview** and **South Side Slopes** tied for **second**, both having **5**.
    3. **Highland Park, Allegheny Center, Beltzhoover**, and **Sheraden** are all tied at **third**, with **4** playgrounds.
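
The tied ranking above can also be read off programmatically. The sketch below is one possible way to do it, assuming the counts are held in a Series like `p2`. Here the Series is rebuilt by hand from the counts reported above (the name `top_counts` is made up for this illustration) so that the block runs on its own.

```python
import pandas as pd

# Counts copied from the results above; in the notebook these would come from p2.
top_counts = pd.Series({
    "Squirrel Hill South": 8,
    "Beechview": 5,
    "South Side Slopes": 5,
    "Highland Park": 4,
    "Allegheny Center": 4,
    "Beltzhoover": 4,
    "Sheraden": 4,
})

# A dense rank gives tied neighborhoods the same place and leaves no gaps between places.
places = top_counts.rank(method="dense", ascending=False).astype(int)

# Print each place with the neighborhoods that share it.
for place in sorted(places.unique()):
    tied = places[places == place].index.tolist()
    print(place, tied, "playgrounds:", top_counts[tied[0]])
```

With a dense rank, the two neighborhoods with 5 playgrounds share second place and the four with 4 playgrounds share third, which matches the list above.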

***

### The Playing Fields of Pittsburgh:

First, I initialized the variables playing_fields and playing_fields_names for my dataset.

In [8]:
#load data from WPRDC
playing_fields = pd.read_csv("https://data.wprdc.org/datastore/dump/6af89346-b971-41d5-af09-49cfdb4dfe23", index_col = "neighborhood", parse_dates = True)
playing_fields_names = pd.read_csv("https://data.wprdc.org/datastore/dump/6af89346-b971-41d5-af09-49cfdb4dfe23", index_col = "id", parse_dates = True)

I used the len() function together with a groupby operation and the count() function to calculate how many Pittsburgh neighborhoods have at least one playing field.

In [37]:
#each group is one neighborhood, so len() gives the number of neighborhoods
number_of_neighborhoods = len(playing_fields.groupby("neighborhood").count())
print("Playing fields are located in " + str(number_of_neighborhoods) + " of Pittsburgh's neighborhoods.")

Next, I placed each unique neighborhood name into a sorted list and reused groupby to see the count for each neighborhood, so that I could build a DataFrame from them later.

In [39]:
#collect each neighborhood name once
neighborhood_names = []
for i in playing_fields_names["neighborhood"]:
    if i not in neighborhood_names:
        neighborhood_names.append(i)
neighborhood_names = sorted(neighborhood_names)
print(neighborhood_names)

playing_fields.groupby("neighborhood").count()

I placed all of the neighborhoods into a DataFrame called playing_fields_per_neighborhood. 
After this, I decided to use a bar graph to represent this data because it would make it easier to interpret as opposed to other graphs.

In [40]:
playing_fields_per_neighborhood = pd.DataFrame({"neighborhood": ["Allegheny Center", "Allentown", "Arlington", "Banksville", "Bedford Dwellings", "Beechview", "Beltzhoover", "Brighton Heights", 
                                                                 "Brookline", "Carrick", "Central Lawrenceville", "Central Oakland", "Crafton Heights", "East Hills", "East Liberty", "Elliott", 
                                                                 "Fineview", "Garfield", "Greenfield", "Hazelwood", "Highland Park", "Homewood South", "Larimer", "Lincoln Place", 
                                                                 "Lincoln-Lemington-Belmar", "Lower Lawrenceville", "Manchester", "Marshall-Shadeland", "Morningside", "Mount Washington", 
                                                                 "Oakwood", "Perry North", "Perry South", "Polish Hill", "Regent Square", "Shadyside", "Sheraden", "South Oakland", "South Side Flats",
                                                                 "South Side Slopes", "Spring Garden", "Spring Hill-City View", "Squirrel Hill North", "Squirrel Hill South", "Stanton Heights", 
                                                                 "Terrace Village", "Troy Hill", "Upper Lawrenceville", "West End", "Westwood", "Windgap"], 
                                                "amount of playing fields" : [2, 1, 1, 2, 2, 3, 3, 7, 6, 5, 2, 1, 2, 1, 1, 2, 1, 1, 2, 4, 4, 1, 1, 1, 2, 4, 2, 1, 3, 3, 1, 2, 1, 1, 2, 3, 3, 
                                                                              2, 2, 6, 1, 2, 1, 5, 1, 1, 3, 1, 1, 2, 1]})
playing_field_bargraph = playing_fields_per_neighborhood.plot.bar(x = "neighborhood", legend = False, figsize = (20, 5))
plt.ylabel("Amount of Playing Fields")
plt.title("Playing Fields Per Neighborhood")
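
Typing the counts in by hand works, but it is easy to mistype a number. As an alternative, the same table could probably be built straight from the data. The sketch below assumes the data has a "neighborhood" column like the WPRDC file. The rows in `sample_fields` are made up for illustration and are not real data.

```python
import pandas as pd

# Hypothetical rows standing in for the playing fields file (not real data).
sample_fields = pd.DataFrame({
    "name": ["Field A", "Field B", "Field C", "Field D", "Field E"],
    "neighborhood": ["Brookline", "Carrick", "Brookline", "Carrick", "Brookline"],
})

# value_counts() counts the rows per neighborhood; sort_index() orders them alphabetically.
counts = sample_fields["neighborhood"].value_counts().sort_index()

# Turn the Series into a two-column DataFrame with the same column names used above.
fields_per_neighborhood = pd.DataFrame({
    "neighborhood": counts.index,
    "amount of playing fields": counts.values,
})
print(fields_per_neighborhood)
```

A table built this way can be plotted with the same `plot.bar(x = "neighborhood", ...)` call used above.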

Next, I found the number of playing fields in the neighborhoods of Pittsburgh, as well as the mean number of playing fields per neighborhood.

In [41]:
#total and mean of the per-neighborhood counts
pitts_playing_fields = playing_fields_per_neighborhood["amount of playing fields"].sum()
mean = playing_fields_per_neighborhood["amount of playing fields"].mean()

print("The total number of playing fields in the city of Pittsburgh is " + str(pitts_playing_fields) + ".")
print("The mean number of playing fields per neighborhood in the city of Pittsburgh is " + str(mean) + ".")

Finally, I used a for loop to list the neighborhoods whose number of playing fields is greater than the mean.

In [42]:
for i in range(len(playing_fields_per_neighborhood)):
    if playing_fields_per_neighborhood["amount of playing fields"][i] > mean:
        print(playing_fields_per_neighborhood["neighborhood"][i])

In [43]:
top15_neighborhoods = pd.DataFrame({"top 15 neighborhoods": ["Beechview", "Beltzhoover", "Brighton Heights", "Brookline", "Carrick",
                                                            "Hazelwood", "Highland Park", "Lower Lawrenceville", "Morningside",
                                                            "Mount Washington", "Shadyside", "Sheraden", "South Side Slopes", 
                                                            "Squirrel Hill South", "Troy Hill"], 
                                    "amount of playing fields": [3, 3, 7, 6, 5, 4, 4, 4, 3, 3, 3, 3, 6, 5, 3]})
top15_neighborhoods_bargraph = top15_neighborhoods.plot.bar(x = "top 15 neighborhoods", legend = False, figsize = (20, 5))
plt.ylabel("amount of playing fields")
plt.title("Top 15 Neighborhoods for Playing Fields")
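
The top neighborhoods could also be selected and sorted by count rather than typed out. This is a minimal sketch that uses only a handful of the counts from the table above, so the mean here is the mean of this subset and not of all 51 neighborhoods. The names `subset` and `subset_mean` are just for illustration.

```python
import pandas as pd

# A few rows copied from playing_fields_per_neighborhood above (subset only).
subset = pd.DataFrame({
    "neighborhood": ["Allentown", "Brighton Heights", "Brookline", "Carrick", "Fineview", "Hazelwood"],
    "amount of playing fields": [1, 7, 6, 5, 1, 4],
})

# Keep the rows above the subset's mean, then sort from most to fewest fields.
subset_mean = subset["amount of playing fields"].mean()
above_mean = subset[subset["amount of playing fields"] > subset_mean]
above_mean = above_mean.sort_values("amount of playing fields", ascending=False)
print(above_mean)
```

Plotting the sorted table puts the tallest bars first, which can make the leaders easier to spot than an alphabetical order.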

#### The Results:
The neighborhood that has the greatest number of playing fields is Brighton Heights. It leads Pittsburgh with seven total.
Brookline and South Side Slopes each follow with six playing fields, which is the second most.
Carrick and Squirrel Hill South each have five, which is the third most.
Additionally, ten other neighborhoods had three or four playing fields (which was greater than the mean of about 2.2).

***

### The Pools of Pittsburgh:

In [44]:
# load data from file
pools = pd.read_csv("https://data.wprdc.org/datastore/dump/5cc254fe-2cbd-4912-9f44-2f95f0beea9a", index_col = "neighborhood")
pools.head(30)

This table organizes all the data from the Pittsburgh Pools dataset by neighborhood.

In [45]:
# dropped unnecessary columns from the data leaving only capacity column left
pools2 = pools.drop(['id','name','type','retired', 'water_source', 'image', 'council_district', 'ward', 'tract', 'public_works_division', 'pli_division', 'police_zone', 'fire_zone', 'latitude', 'longitude'], axis=1)
pools2.head(30)

Next, I removed all of the columns that are unnecessary for our analysis. After doing this, I was left with only the capacity and neighborhood columns.

In [46]:
# now need to remove any rows that have a NaN capacity 
pools2 = pools2.dropna()
pools2.head(30)

From the previous table, I noticed that some of the pools have a NaN capacity. Upon further investigation, I realized that those pools are spray parks or children's pools. I do not want to include those pools in our analysis because I want to determine the best pool among actual pools made for adults. So, I dropped the NaN rows and created a new table for better visualization.
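
Another way to get the same table, sketched here as an assumption rather than what the notebook does, is to select the one column that is needed instead of dropping the fifteen that are not, and then drop the rows with a missing capacity. The rows in `sample_pools` are made up for illustration and are not real data.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the pools file (not real data).
sample_pools = pd.DataFrame(
    {
        "name": ["Pool A", "Spray Park B", "Pool C"],
        "type": ["Pool", "Spray Park", "Pool"],
        "capacity": [250000.0, np.nan, 180000.0],
    },
    index=pd.Index(["Neighborhood X", "Neighborhood Y", "Neighborhood Z"], name="neighborhood"),
)

# Keep only the capacity column, then drop rows where capacity is missing.
capacity_only = sample_pools[["capacity"]].dropna(subset=["capacity"])
print(capacity_only)
```

Naming the column in `subset` makes it explicit that only a missing capacity removes a row.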

In [47]:
#plotting the data to get a better visual
poolPlot = pools2.plot.bar(legend=False, figsize = (15,10))
plt.xlabel("Neighborhood")
plt.title("Capacity of pools in Neighborhoods")
plt.ylabel("Pool Capacity (in gallons)")

This is a bar graph of all the applicable Pittsburgh pools and their corresponding capacities (in gallons).

In [48]:
#narrow down the capacity to be above 200000
greaterThan2 = pools2.loc[pools2['capacity'] > 200000]
greaterThan2.plot.bar(legend=False, figsize = (15,10))
plt.xlabel("Neighborhood")
plt.title("Pools With a Capacity Greater Than 200000 Gallons")
plt.ylabel("Pool Capacity (in gallons)")

In order to find the largest pool, I narrowed it down so that only the pools greater than 200000 gallons were graphed.

In [49]:
#plotting data for capacity greater than 300000
greaterThan3 = pools2.loc[pools2['capacity'] > 300000]
greaterThan3.plot.bar(legend=False, figsize = (15,10))
plt.xlabel("Neighborhood")
plt.title("Pools With a Capacity Greater Than 300000 Gallons")
plt.ylabel("Pool Capacity (in gallons)")

The previous graph did not narrow down the pools enough, so I narrowed them even further and graphed only the pools with a capacity greater than 300000 gallons.

In [50]:
greaterThan3.sort_values("capacity", ascending=False).head()

Lastly, I sorted the values from the final graph in descending order to get a better understanding of which neighborhoods have the pools with the largest capacity in gallons.
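
Sorting and then taking the top rows can be combined into one step. The sketch below assumes a table shaped like `greaterThan3`. It is rebuilt by hand from the capacities reported in this notebook (under the made-up name `large_pools`) so that the block runs on its own.

```python
import pandas as pd

# Capacities above 300000 gallons as reported in this notebook, in no particular order.
large_pools = pd.DataFrame(
    {"capacity": [417657, 335000, 560242, 356000, 538000]},
    index=pd.Index(
        ["Brookline", "Bloomfield", "Highland Park", "Mount Washington", "Bedford Dwellings"],
        name="neighborhood",
    ),
)

# nlargest() sorts by capacity in descending order and keeps the top rows.
top3 = large_pools.nlargest(3, "capacity")
print(top3)

# idxmax() returns the index label (the neighborhood) of the single largest pool.
largest = large_pools["capacity"].idxmax()
print(largest)
```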

#### The Results:

The pool with the largest capacity in gallons is located in Highland Park with 560242 gallons. Bedford Dwellings follows for second place at 538000 gallons, and Brookline comes in third at 417657 gallons.

***

### Conclusion: What is the best neighborhood?
* Summary of Data Sets (top neighborhoods from above, and broadened where needed):
    * Playgrounds:
        1. Squirrel Hill South had 8 playgrounds.
        2. Beechview and South Side Slopes had 5 playgrounds.
        3. Highland Park, Allegheny Center, Beltzhoover, and Sheraden had 4 playgrounds.
        4. Three Playgrounds: Carrick, Elliott, Mount Washington, Upper Lawrenceville, Hazelwood, East Liberty, Crawford-Roberts, South Oakland.
        5. Two Playgrounds: Brighton Heights (not all listed).
    * Playing Fields:
        1. The neighborhood that has the greatest number of playing fields is Brighton Heights. It leads Pittsburgh with seven total.
        2. Brookline and South Side Slopes each follow with six playing fields, which is the second most.
        3. Carrick and Squirrel Hill South each have five, which is the third most.
        4. Four Fields: Hazelwood, Highland Park, Lower Lawrenceville.
        5. Three Fields: Beechview, Beltzhoover, Morningside, Mount Washington, Shadyside, Sheraden, Troy Hill.
    * Pools:
        1. The pool with the largest capacity in gallons is located in Highland Park with 560242 gallons.
        2. Bedford Dwellings follows for second place at 538000 gallons.
        3. Brookline comes in third at 417657 gallons.
        4. Mount Washington had 356000 gallons.
        5. Bloomfield had 335000 gallons. 
        6. Sheraden had 278000 gallons. 
        7. Allegheny Center had 271000 gallons. 
        8. Carrick had 224100 gallons.
        9. Polish Hill had 215451 gallons. 
        10. Brighton Heights and South Side Flats had 205800 gallons.
        11. Perry North had 205800 gallons. 

#### Our Best Neighborhood: Highland Park
* Our data sets had many top neighborhoods in common, but no single neighborhood was at the very top of all three, so we had to go down the lists a bit to compare. We made several comparisons before picking:
    * Carrick was third in playing fields, fourth in playgrounds, and eighth in pools.
    * Beechview was second in playgrounds and fifth in playing fields, but had a NaN value for pools. 
    * Brighton Heights was first in playing fields, tenth in pools, and fifth in playgrounds. 
    * Highland Park was first in pools, third in playgrounds, and fourth in playing fields.
    
* These are the comparisons that led us to choose Highland Park as the "Best Neighborhood in Pittsburgh." It has a 560242-gallon pool, 4 playgrounds, and 4 playing fields. While it wasn't the highest in every category, none of the other neighborhoods we compared placed as consistently high across all three.
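
The comparison above was done by hand. One possible way to sketch it in code is to rank the four compared neighborhoods in each category and add up the ranks. This is an illustration under stated assumptions and not the method we used to pick: the ranks are only among these four neighborhoods (not city-wide), all three categories are weighted equally, and a missing pool capacity is treated as 0 gallons. The values come from the summary above.

```python
import numpy as np
import pandas as pd

# Values taken from the summary above for the four neighborhoods we compared.
candidates = pd.DataFrame(
    {
        "playgrounds": [3, 5, 2, 4],
        "playing_fields": [5, 3, 7, 4],
        "pool_gallons": [224100, np.nan, 205800, 560242],
    },
    index=["Carrick", "Beechview", "Brighton Heights", "Highland Park"],
)

# Assumption: a neighborhood with no pool capacity on record is treated as 0 gallons.
filled = candidates.fillna(0)

# Rank each column from best (1) to worst among these four; ties would share a rank.
ranks = filled.rank(method="dense", ascending=False).astype(int)

# Assumption: all three categories count equally, so the ranks are simply added.
ranks["rank_sum"] = ranks.sum(axis=1)
print(ranks.sort_values("rank_sum"))

# The lowest rank sum is the best overall placement.
best = ranks["rank_sum"].idxmin()
print(best)
```

Under these assumptions Highland Park has the lowest rank sum of the four, which agrees with the choice described above. A different weighting or a different treatment of missing pools could change the order.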

#### Erin's Conclusion:
* I was kind of shocked by the results, just because I had never heard the name "Highland Park" before. My family is from Pittsburgh, but I haven't lived there since I was 10, so I'm not very familiar with the different neighborhoods. So, I decided to look it up. [On the Pittsburgh Parks website](https://pittsburghparks.org/explore-your-parks/regional-parks/highland-park/) I found that it has "...a popular bike track, swimming pool, sand volleyball courts, and the Pittsburgh Zoo and PPG Aquarium." I found that neat, since it fits with our metric. I feel that this, along with our data, supports the idea of Highland Park being the best neighborhood in Pittsburgh.

#### Michael's Conclusion:
* I am not very familiar with the neighborhoods of Pittsburgh, but the results of our study were not all that unpredictable. For example, Highland Park, the best neighborhood by our metric, has a total area of 1.163 mi² (about 744 acres). On the other hand, Stanton Heights, which ranked poorly by our metric, covers 470 acres, which is less than the area of Highland Park. So, it makes sense that Highland Park ranked higher, because it has more room for pools, playgrounds, and playing fields. This is further evidence that our conclusion that Highland Park is the best neighborhood in Pittsburgh is reliable.

#### Meryem's Conclusion:
* Although I am from Pittsburgh, I was very shocked to see that Highland Park was rated as the best neighborhood. I have lived here my whole life, but I have never even heard of the neighborhood Highland Park. However, based on the metrics we chose and data we collected, I trust that Highland Park is considered the best neighborhood in Pittsburgh.