# Best Neighborhood in Pittsburgh Final Project
## Madeline Fontana's Project Notebook

In [None]:
import pandas as pd
%matplotlib inline

**This notebook analyzes two City of Pittsburgh data sets, one on parks and one on courts and rinks, to determine the best neighborhood in Pittsburgh. Our group's criterion for the best neighborhood is that it be the best neighborhood for children. This will be determined by multiple factors, including the number of local parks, courts, and rinks and the types of each. My partner will also be analyzing information about playgrounds and playground equipment best suited for children. Neighborhoods will be ranked based on these criteria. The best neighborhood will be one with plenty of access to parks, courts, rinks, and playgrounds, and the best-quality locations for each. The best neighborhood will be great for promoting exercise, healthy habits, and emotional well-being for children. Note that some parks overlap neighborhoods; for this analysis, we have chosen to ignore that overlap.**

## **City of Pittsburgh Parks**

**First, I read in the City of Pittsburgh Parks data set and took a sample of 10 values from the data set.**

In [None]:
#Read in Data
parks_data = pd.read_csv("cityofpghparks.csv")
parks_data.sample(10)

**Next, I performed some data analysis on the data set. I used `.value_counts()` to create a data frame that lists each neighborhood once, along with its corresponding number of parks. Here is a sample of 20 neighborhoods from this data frame.**

In [None]:
#Parks Per Neighborhood

parks_values = parks_data['neighborhood'].value_counts()
parks = pd.DataFrame(parks_values)
parks.sample(20)
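
**As a minimal sketch of what `.value_counts()` is doing here, the cell below uses a few made-up rows instead of the real CSV. Only the column name `neighborhood` comes from this notebook; the park and neighborhood names and the `toy_` variables are placeholders.**

```python
import pandas as pd

# Made-up rows standing in for the real CSV (placeholder names only)
toy_parks = pd.DataFrame({
    "name": ["Park 1", "Park 2", "Park 3", "Park 4", "Park 5", "Park 6"],
    "neighborhood": ["Neighborhood A", "Neighborhood B", "Neighborhood A",
                     "Neighborhood C", "Neighborhood B", "Neighborhood B"],
})

# value_counts() tallies how often each neighborhood appears and
# sorts the result from most rows to fewest
toy_counts = toy_parks["neighborhood"].value_counts()

# reset_index() turns the Series into a two-column table; the columns
# are renamed explicitly because default names vary by pandas version
toy_table = toy_counts.reset_index()
toy_table.columns = ["neighborhood", "parks"]
print(toy_table)
```

**Because the result is sorted, `.head()` on it already gives the neighborhoods with the most parks.**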

**Here are the top neighborhoods and the amount of parks in each neighborhood.**

In [None]:
parks.head(12)

**Next, I performed a statistical analysis of the data frame I created. The average number of parks per neighborhood was 2.86 so about 3 parks per neighborhood. The maximum number of parks per neighborhood was 12 parks.**

In [None]:
#Statistical Analysis of Data
parks.describe()
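
**Here is a small sketch of how the mean and maximum reported by `.describe()` relate to the counts, using made-up numbers rather than the real data. One thing to keep in mind: a neighborhood with no rows in the data set never appears in `.value_counts()`, so the average only covers neighborhoods that have at least one park.**

```python
import pandas as pd

# Made-up park counts for three placeholder neighborhoods
toy_counts = pd.Series(
    {"Neighborhood A": 3, "Neighborhood B": 2, "Neighborhood C": 1},
    name="parks",
)

# describe() returns count, mean, std, min, quartiles, and max
summary = toy_counts.describe()
print("mean:", summary["mean"])
print("max:", summary["max"])

# The mean is the total number of parks divided by the number of
# neighborhoods that appear in the counts
manual_mean = toy_counts.sum() / len(toy_counts)
print("manual mean:", manual_mean)
```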

**Here I generated a sample graph containing 6 random neighborhoods and each neighborhood's number of parks. Run this multiple times to get an idea for how neighborhoods compare.**

In [None]:
#Sample Graph
#parks = pd.Series(quantities, index=neighborhoods)

parks.sample(6).plot.bar(rot=0, figsize=[12,8])

**Here I generated a graph of the top neighborhoods and the amount of parks in each.**

In [None]:
#Top Neighborhoods Graph
top_q = [12,10,10,8,7,7,6,6,6,6,6,5,5]
top_n = ['East Liberty','Central Business District', 'Point Breeze North', 'Beechview', 'South Side Slopes', 'Point Breeze', 'South Side Flats', 'Hazelwood', 'Squirrel Hill South', 'Mount Washington', 'Sheraden', 'South Oakland', 'Troy Hill']
top_neighborhoods = pd.Series(top_q, index=top_n)

top_neighborhoods.plot.bar(rot=0, figsize=[26,8])

**Neighborhoods with Most Parks Ranking**
1. **East Liberty**
2. **Central Business District and Point Breeze North**
3. **Beechview**
4. **South Side Slopes and Point Breeze**
5. **South Side Flats, Hazelwood, Squirrel Hill South, Mount Washington, and Sheraden**
6. **South Oakland and Troy Hill**
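
**The ranking above was written out by hand. As a sketch of how the same tie-aware ranking could be computed, the cell below applies a dense rank to the counts from the top neighborhoods list, with the neighborhood names taken from the ranking. The variable names `counts` and `ranks` are just for illustration.**

```python
import pandas as pd

# Counts and names for the top neighborhoods, as listed in this notebook
top_q = [12, 10, 10, 8, 7, 7, 6, 6, 6, 6, 6, 5, 5]
top_n = ['East Liberty', 'Central Business District', 'Point Breeze North',
         'Beechview', 'South Side Slopes', 'Point Breeze',
         'South Side Flats', 'Hazelwood', 'Squirrel Hill South',
         'Mount Washington', 'Sheraden', 'South Oakland', 'Troy Hill']
counts = pd.Series(top_q, index=top_n)

# method="dense" gives tied neighborhoods the same place and does not
# skip a place after a tie; ascending=False makes the largest count 1st
ranks = counts.rank(method="dense", ascending=False).astype(int)

# Print each place followed by the neighborhoods that share it
for place in sorted(ranks.unique()):
    names = ranks.index[ranks == place]
    print(place, ", ".join(names))
```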

**Here is a bar graph showing the entire data set. Notice that the values 12 and 10 are outliers, since most neighborhoods do not have that many parks. Also notice that many neighborhoods have values clustered between 3 and 5, showing that most neighborhoods have roughly 2 to 5 parks.**

In [None]:
#Entire Data Set Graph
parks.plot.bar(figsize=[15,8])

**Here is a tool I created using user input to allow a user to look up the amount of parks in a certain neighborhood. This could be valuable as a resource when researching whether or not someone should live in a certain neighborhood in the City of Pittsburgh.**

In [None]:
#Lookup a Specific Neighborhood


all_neighborhoods = parks_data.iloc[0:,5]
neighborhoods = []


for neighborhood in all_neighborhoods:
       if neighborhood not in neighborhoods:
            neighborhoods.append(neighborhood)




print("Enter a neighborhood to find its number of parks")
print("Enter the word 'stop' to stop searching")
print()
while True:
    search = input("Enter a neighborhood: ").strip()
    if search.startswith('stop'):
        break
    elif search not in neighborhoods:
        # Report the miss and keep prompting; only 'stop' ends the loop
        print("The neighborhood you entered is not in this data set.")
    else:
        print(parks_values[search], "parks")
    print()
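
**The lookup above needs the neighborhood typed exactly as it appears in the data. As a sketch of a more forgiving lookup, the cell below ignores capitalization and extra spaces. The helper name `park_count` and the counts are made up for illustration.**

```python
import pandas as pd

def park_count(counts, name):
    """Return the park count for `name`, or None if it is not listed."""
    # Build a lowercase lookup so 'beechview' and 'Beechview' both match
    lookup = {key.lower(): int(value) for key, value in counts.items()}
    return lookup.get(name.strip().lower())

# Made-up counts for two placeholder neighborhoods
toy_counts = pd.Series({"Neighborhood A": 4, "Neighborhood B": 2})

print(park_count(toy_counts, "  neighborhood a "))
print(park_count(toy_counts, "Somewhere Else"))
```

**Returning `None` for a missing neighborhood lets the caller decide what message to show instead of raising an error.**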
            


**However, after analyzing this data set, I noticed that some of the parks listed were not traditional parks, so I analyzed this as well.**

In [None]:
#Number of Each Type of Park 

type_values = parks_data['type'].value_counts()
types = pd.DataFrame(type_values)
types.head(10)

**Here is a graph showing how many of each type of park there is in the data set.**

In [None]:
types.plot.bar(rot=0, figsize=[10,8])

**Now let's analyze on a broader scale what may be the best region to live in, and what neighborhoods are in that region.**

In [None]:
#Regions

region_values = parks_data['maintenance_responsibility'].value_counts()
regions = pd.DataFrame(region_values)
top_regions = regions.head(5)
top_regions

**The Parks-Northern region has the most parks in it with 39 parks being in that region. Next, I found which neighborhoods are in the top five regions listed.**

In [None]:
lists_in_list = []

n_in_top_r = []
all_regions = parks_data.iloc[0:,3]
top_regions = ['Parks - Northern','Parks - Western','Parks - Southern','Parks - Northeast','Parks - Schenley']
index = 0
for region in top_regions:
    print("Region:", region)
    print()
    for n in all_neighborhoods:
        if all_regions[index].startswith(region) and n not in n_in_top_r:
            print(n)
            n_in_top_r.append(n)
        index = index + 1
    lists_in_list.append(n_in_top_r)
    n_in_top_r = []
    index = 0
    print()
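
**The nested loops above collect the distinct neighborhoods in each region. A shorter way to get the same kind of grouping is `groupby`, sketched below on made-up rows. The column names and region labels match this notebook, but the neighborhood names are placeholders and not real assignments.**

```python
import pandas as pd

# Made-up rows; these region assignments are not from the real data
toy_parks = pd.DataFrame({
    "neighborhood": ["Neighborhood A", "Neighborhood A", "Neighborhood B",
                     "Neighborhood C", "Neighborhood D"],
    "maintenance_responsibility": ["Parks - Southern", "Parks - Southern",
                                   "Parks - Southern", "Parks - Northern",
                                   "Parks - Northern"],
})

# Group the rows by region, then keep each neighborhood once per region
by_region = (
    toy_parks.groupby("maintenance_responsibility")["neighborhood"].unique()
)

for region, names in by_region.items():
    print("Region:", region)
    for name in names:
        print(name)
    print()
```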
    

**Here I have shown which top neighborhoods are in top regions. Top neighborhoods are neighborhoods ranked for having a lot of parks and top regions are regions with a lot of parks.**

In [None]:
#Top Neighborhoods in Top Regions
top_n_in_top_r_counts = []

r_num = 1
count = 0
for l in lists_in_list:
    print("Top Neighborhoods in Region",r_num)
    print()
    for t in top_n:
        if t in l:
            count = count + 1
            print(t)
          
    top_n_in_top_r_counts.append(count)
    count = 0
    r_num += 1
    print()





**Notice that regions 'Parks - Southern' and 'Parks - Schenley' (regions 3 and 5) have the most top neighborhoods in them, making them the best regions to live in. After analyzing the data based on region, I established a new ranking based on most top neighborhoods in top regions, using parks per neighborhood counts to break ties.**

**New Ranking**
1. **Beechview**
2. **South Side Slopes**
3. **South Side Flats**
4. **Hazelwood**
5. **Squirrel Hill South**
6. **South Oakland**
7. **Central Business District**
8. **Mount Washington**
9. **Sheraden**
10. **Troy Hill**
11. **East Liberty**

**Notice how drastically the ranking changed when we based it on best regions. East Liberty, which was ranked first when only looking at most parks, is now last because it is not in a top region (meaning a region with a lot of parks). Beechview is still ranked high, and both the South Side Flats and Slopes moved up in the ranking slightly.**

### **Summary of City of Pittsburgh Parks Data Set**

**The overall winner for this data set is Beechview, which ranked third and first in the two rankings I formed from different aspects of the data set. Beechview has 8 parks, ranking third by number of parks and first by region. Beechview could be considered a good neighborhood to live in and a good neighborhood for children, based on its access to many parks of different types and its location in a region with many parks.**

## **City of Pittsburgh Courts and Rinks**
**First, I read in the City of Pittsburgh Courts and Rinks data set. Here is a sample of 10 entries in this data set.**

In [None]:
#Read in Data
courts_rinks_data = pd.read_csv("cityofpghcourtsandrinks.csv")
courts_rinks_data.sample(10)

**Next, I found how many courts/rinks were in each neighborhood using `.value_counts()`. Here is a sample of 20 neighborhoods and the amount of courts/rinks in each.**

In [None]:
#Courts and Rinks Per Neighborhood

courts_values = courts_rinks_data['neighborhood'].value_counts()
courts_and_rinks = pd.DataFrame(courts_values)

courts_and_rinks.sample(20)

**Here is a graph of all the neighborhoods and the amount of courts and rinks in each.**

In [None]:
courts_values.plot.bar(figsize=[15,8])

**Here are the top 8 neighborhoods with the most courts/rinks, with some neighborhoods tied on the number of courts/rinks. From this I found the top 6 rankings for most courts/rinks per neighborhood.**

In [None]:
courts_and_rinks.head(8)

**Next, I performed a statistical analysis of the data frame using `.describe()`. Notice that the average number of courts/rinks is 3.9, so about 4 courts/rinks per neighborhood. The maximum number of courts/rinks per neighborhood is 26 courts/rinks.**

In [None]:
courts_and_rinks.describe()

**Next, I put the top neighborhoods and top quantities into lists and then a Series, and plotted this Series using a bar chart showing the number of courts/rinks in each of these top neighborhoods. Squirrel Hill South has the most courts/rinks with 26 courts/rinks.**

In [None]:
#Top Neighborhoods Graph
top_q2 = [26,20,10,9,9,9,8,7]
top_n2 = ['Squirrel Hill South','Highland Park','Hazelwood','Brookline','Allegheny Center','Beltzhoover','Troy Hill','Beechview']
top_neighborhoods = pd.Series(top_q2, index=top_n2)

top_neighborhoods.plot.bar(rot=0, figsize=[15,8])
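
**The lists above were typed in by hand from the `.head(8)` output. As a sketch of an alternative, `nlargest` can pull the top entries straight from the counts. The numbers below are the same ones listed above; the variable names are made up, and `keep="all"` is one possible way to handle ties.**

```python
import pandas as pd

# Courts/rinks counts for the top neighborhoods, as listed in this notebook
court_counts = pd.Series({
    "Squirrel Hill South": 26, "Highland Park": 20, "Hazelwood": 10,
    "Brookline": 9, "Allegheny Center": 9, "Beltzhoover": 9,
    "Troy Hill": 8, "Beechview": 7,
})

# The three neighborhoods with the most courts/rinks
top_three = court_counts.nlargest(3)
print(top_three)

# keep="all" keeps every neighborhood tied with the 4th-largest value,
# so the result can hold more than 4 rows
top_four_with_ties = court_counts.nlargest(4, keep="all")
print(top_four_with_ties)

# In the notebook, top_four_with_ties.plot.bar(rot=0) would draw the chart
```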

**Neighborhoods with the Most Courts/Rinks Ranking**
1. **Squirrel Hill South**
2. **Highland Park**
3. **Hazelwood**
4. **Brookline, Allegheny Center, and Beltzhoover**
5. **Troy Hill**
6. **Beechview**

**After forming this ranking, I wanted to analyze a different aspect of this data set. I found the number of each type of court or rink in the City of Pittsburgh. The most common type is a Basketball (Full) court, with the City of Pittsburgh having 88 Basketball (Full) courts.**

In [None]:
#Number of Each Type of Court/Rink 

court_types = courts_rinks_data['type'].value_counts()
types = pd.DataFrame(court_types)

types

**Here are the top five types of court/rink in the data set.**

In [None]:
types.head(5)

**Here is a graph containing all of the different types of court/rink and how many of each are in the data set.**

In [None]:
#Types of Court/Rink Graph

types.plot.bar(rot=0, figsize=[16,8])

**Here is a graph of the top five types of court or rink in the City of Pittsburgh.**

In [None]:
types.head(5).plot.bar(rot=0, figsize=[10,6])

**Using the information I collected about the most common types of court or rink, I put the neighborhoods into categories based on which type of court or rink a neighborhood has.**

In [None]:
#Neighborhoods With Each Type of Court/Rink

all_neighborhoods = courts_rinks_data.iloc[0:,8]
neighborhoods = []


for neighborhood in all_neighborhoods:
       if neighborhood not in neighborhoods:
            neighborhoods.append(neighborhood)



lists_in_list = []

n_in_top_t = []
all_types = courts_rinks_data.iloc[0:,2]
top_types = ['Basketball (Full)','Tennis','Basketball (Half)','Hockey','Pickleball']
index = 0
for name in top_types:
    print("Court/Rink Type:", name)
    print()
    for n in all_neighborhoods:
        if all_types[index].startswith(name) and n not in n_in_top_t:
            print(n)
            n_in_top_t.append(n)
        index = index + 1
    lists_in_list.append(n_in_top_t)
    n_in_top_t = []
    index = 0
    print()
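
**Another way to see which neighborhoods have which court/rink types is a cross-tabulation, sketched below on made-up rows. The column names `neighborhood` and `type` and the type labels come from this notebook; the neighborhood names and the `variety` variable are placeholders.**

```python
import pandas as pd

# Made-up rows; the neighborhoods are placeholders
toy_courts = pd.DataFrame({
    "neighborhood": ["Neighborhood A", "Neighborhood A", "Neighborhood B",
                     "Neighborhood B", "Neighborhood C"],
    "type": ["Tennis", "Hockey", "Tennis", "Tennis", "Basketball (Full)"],
})

# Rows are neighborhoods, columns are court/rink types, and each cell
# is the number of courts/rinks of that type in that neighborhood
table = pd.crosstab(toy_courts["neighborhood"], toy_courts["type"])
print(table)

# Count how many different types each neighborhood has
variety = (table > 0).sum(axis=1)
print(variety)
```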


**After looking at all the neighborhoods, I then found which top neighborhoods from the earlier ranking were in each of the top court or rink types.**

In [None]:
#Top Neighborhoods in Top Court/Rink Types

top_n_in_top_t_counts = []

t_num = 1
count = 0
for l in lists_in_list:
    print("Top Neighborhoods in Top Court/Rink Type",t_num)
    print()
    for t in top_n2:
        if t in l:
            count = count + 1
            print(t)
          
    top_n_in_top_t_counts.append(count)
    count = 0
    t_num += 1
    print()



**Notice that most top neighborhoods have Basketball (Full) and Tennis courts (court/rink types 1 and 2). The second most common court/rink type is Basketball (Half) courts/rinks with the second most top neighborhoods having that type. Here is a new ranking of neighborhoods based on top court/rink types and most top neighborhoods within them, using amount of courts/rinks in each neighborhood and how many repeats of the neighborhood in top types there are to break ties.**

**New Ranking Based on Court/Rink Types:**
1. **Squirrel Hill South**
2. **Highland Park**
3. **Hazelwood**
4. **Allegheny Center and Brookline**
5. **Beltzhoover**
6. **Troy Hill**
7. **Beechview**

**Notice that the ranking did not change very much based on the criteria, with only Beltzhoover moving down a level. Therefore, we could infer that most neighborhoods have common types of courts and rinks, so either ranking of neighborhoods will result in a list of high quality neighborhoods with access to courts and rinks that are also high quality.**

### **Summary of City of Pittsburgh Courts and Rinks**

**The overall winner for this data set is Squirrel Hill South, ranked first for having the most courts/rinks with 26 courts/rinks. Squirrel Hill South has various court types including Basketball (Full), Tennis, Hockey, and Pickleball. The most common type of court is a Basketball (Full) court, with the City of Pittsburgh having 88 full basketball courts. The average number of courts or rinks per neighborhood is about 4 courts/rinks.**

### My Comparison of the Data Sets

**After analyzing both of these data sets, I found that some neighborhoods appeared in the top rankings of both. These neighborhoods are Beechview, Hazelwood, Squirrel Hill South, and Troy Hill. They could be considered other good neighborhoods for children to live in because they rank well in access to parks and access to courts and rinks. Playing various sports on different courts and rinks and spending time outside in a park is beneficial to children's health and well-being.**
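
**The neighborhoods that appear in both top rankings can be checked with a set intersection. The sketch below uses the neighborhood names from the two rankings in this notebook; the variable names are made up for illustration.**

```python
# Top neighborhoods from the parks ranking in this notebook
top_parks = ['East Liberty', 'Central Business District',
             'Point Breeze North', 'Beechview', 'South Side Slopes',
             'Point Breeze', 'South Side Flats', 'Hazelwood',
             'Squirrel Hill South', 'Mount Washington', 'Sheraden',
             'South Oakland', 'Troy Hill']

# Top neighborhoods from the courts/rinks ranking in this notebook
top_courts = ['Squirrel Hill South', 'Highland Park', 'Hazelwood',
              'Brookline', 'Allegheny Center', 'Beltzhoover',
              'Troy Hill', 'Beechview']

# The & operator keeps only the names present in both sets
shared = sorted(set(top_parks) & set(top_courts))
print(shared)
# prints ['Beechview', 'Hazelwood', 'Squirrel Hill South', 'Troy Hill']
```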

**To demonstrate the overlap of the two data sets, I looked at the 'park' column in the courts and rinks data set to see which parks were located in which neighborhoods.**

In [None]:
courts_new_data = pd.read_csv("cityofpghcourtsandrinks.csv",usecols = ['neighborhood','park'])

courts_new_data.sample(10)

**Here are the parks most present in the courts and rinks data set and a graph representing this. Highland Park shows up 18 times in the data set, meaning that many of the courts and rinks are located in that park.** 

In [None]:
courts_new_data_values = courts_new_data['park'].value_counts()
new_courts = pd.DataFrame(courts_new_data_values)
new_courts.head(6)

In [None]:
new_courts.head(10).plot.bar(rot=0,figsize=[20,8])

**Highland Park is present the most in the courts and rinks data set, located in the Highland Park neighborhood. Schenley Park is the second most present, overlapping the neighborhoods of Oakland, Squirrel Hill, and Greenfield.**

**Here are the parks that are in the Squirrel Hill South neighborhood using the park column in the courts and rinks data set.**

In [None]:
#Parks in Squirrel Hill South

all_parks = courts_rinks_data.iloc[0:,3]
parknames = []
            
i = 0
for n in all_neighborhoods:
    if n.startswith('Squirrel Hill South') and all_parks[i] not in parknames:
        parknames.append(all_parks[i])
    i += 1
    
print('Parks In Squirrel Hill South:')
print()

for park in parknames:
    print(park)
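
**The same kind of filter can be written without a counter variable by using a boolean mask, sketched below on made-up rows. The column names `neighborhood` and `park` come from this notebook; the neighborhood and park names are placeholders.**

```python
import pandas as pd

# Made-up rows; the neighborhoods and parks are placeholders
toy_courts = pd.DataFrame({
    "neighborhood": ["Neighborhood A", "Neighborhood B", "Neighborhood A",
                     "Neighborhood A"],
    "park": ["Park X", "Park Z", "Park Y", "Park X"],
})

# The mask is True for every row in the chosen neighborhood
mask = toy_courts["neighborhood"] == "Neighborhood A"

# .loc keeps the matching rows, and unique() drops repeated park names
parks_in_a = toy_courts.loc[mask, "park"].unique()

for park in parks_in_a:
    print(park)
```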


### Reflection

**Squirrel Hill South, which appears in the top rankings for every category, is ranked first overall across both data sets. Squirrel Hill South has 6 parks and ranked fifth both by number of parks and by region. Squirrel Hill South was also ranked first in both categories I analyzed in the courts and rinks data set, with 26 courts/rinks and 4 different top court/rink types: Basketball (Full), Tennis, Hockey, and Pickleball. Based on my two data sets and the analysis I performed, I have concluded that Squirrel Hill South is the best neighborhood to live in. This neighborhood would be great for children, with access to many parks, courts, and rinks, a location in a region with many parks, and a multitude of different types of courts and rinks. Squirrel Hill South also contains part of Schenley Park, which showed up quite frequently in the courts and rinks data set. Squirrel Hill South also has Davis Park and Frick Park. This park had 16 different courts or rinks which could be used by children, making Squirrel Hill South a great neighborhood overall.**