# Final Analysis

To build our final ranking metric, we combined three qualities that we each found important in a neighborhood - walkability, [x], and [x].

We felt that each dataset was equally important to the overall quality of a neighborhood, so the final ranking is a simple combination of the three: an average. Our final metric ranks neighborhoods by the *average* of their ordinal ranks across the three datasets. For example, if a neighborhood was ranked 1st in education, 3rd in walkability, and 4th in (third metric), then its composite value is (1 + 3 + 4) / 3 ≈ 2.67, the average of the three sub-metric ranks. These composite values are then ordered from lowest to highest to produce our final ranking, found below. Since it's a ranking of the ranks, let's call it RankRank!
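As a minimal sketch of the averaging step described above, the snippet below computes the average rank for a few neighborhoods and orders them from lowest to highest. The neighborhood names and ranks are hypothetical, and the helper `rank_rank` is an illustrative name, not part of our notebook.

```python
from statistics import mean


def rank_rank(sub_ranks):
    """Return (name, average rank) pairs ordered from best (lowest) to worst."""
    averages = {name: mean(ranks) for name, ranks in sub_ranks.items()}
    return sorted(averages.items(), key=lambda item: item[1])


# Hypothetical neighborhoods and sub-metric ranks, for illustration only
sub_ranks = {
    "Neighborhood A": [1, 3, 4],
    "Neighborhood B": [2, 1, 2],
    "Neighborhood C": [3, 2, 1],
}

final_ranking = rank_rank(sub_ranks)
for position, (name, avg) in enumerate(final_ranking, start=1):
    print(position, name, round(avg, 2))
```

Here "Neighborhood A" reproduces the example in the text: ranks of 1, 3, and 4 average to about 2.67, which places it behind the other two hypothetical neighborhoods.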

In [21]:
import pandas as pd
import statistics

# load the walkability and enrollment rankings (row order is used as the rank)
wlkRank = pd.read_csv("Datasets/walkability_named_clean.csv")
enrRank = pd.read_csv("Datasets/enrollment_clean.csv")

# set up dictionary
rankArrays = dict()

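# iterate through walk rank, starting a list of rankings for each neighborhood
# (the neighborhood name is the first column; .iloc avoids deprecated positional row[0])
for index, row in wlkRank.iterrows():
    if row.iloc[0] not in rankArrays:
        rankArrays[row.iloc[0]] = [index+1]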
        
# iterate through enr rank, adding to array of rankings only for neighborhoods that exist in walk rank        
for index, row in enrRank.iterrows():
    if row['neighborhood'] not in rankArrays:
        continue
    else:
        rankArrays[row['neighborhood']].append(index+1)
    
print(rankArrays)

# remove neighborhoods not found in both of the datasets loaded above
for key in list(rankArrays):
    if len(rankArrays[key]) != 2:
        rankArrays.pop(key)
    
rankRank = dict()

# compute the average of each neighborhood's list of rankings
for key in rankArrays:
    if key not in rankRank:
        rankRank[key] = statistics.mean(rankArrays[key])
        
# make a new dataframe from this composite ranking
rankRankDF = pd.DataFrame.from_dict(rankRank,orient='index',columns=['Average Rank'])
rankRankDF = rankRankDF.sort_values(by='Average Rank', ascending=True)
# print top 5, organized from lowest to highest
rankRankDF.head(5)


{'Terrace Village': [1, 23], 'North Shore': [2], 'Allegheny Center': [3, 59], ' North Oakland': [4], 'Larimer': [5, 33], 'Garfield': [6, 13], 'Lawrenceville': [7], 'South Side Flats': [8, 54], 'Bloomfield': [9, 32], 'Shadyside': [10, 41], 'Crawford-Roberts': [11, 38], 'Squirrel Hill North': [12, 25], 'East Liberty': [13, 15], 'Lincoln': [14], 'Friendship': [15, 72], 'Point Breeze': [16, 27], 'Golden Triangle': [17, 85], 'Homewood North': [18, 8], 'Arlington': [19, 42], 'South Oakland': [20, 62], 'Knoxville': [21, 6], 'Highland Park': [22, 22], 'Central Oakland': [23, 86], 'West Oakland': [24, 71], 'Lawrencecville': [25], 'South Shore': [26], 'Stanton Heights': [27, 26], 'Greenfield': [28, 16], 'Upper Hill': [29, 49], 'Squirrel Hill South': [30, 5], 'Morningside': [31, 45], 'Mount Washington': [32, 21], 'Allentown': [33, 29], 'Brighton Heights': [34, 7], 'Beltzhoover': [35, 34], 'Mount Oliver': [36, 63], 'Perry South': [37, 20], 'North Oakland': [38, 87], 'Bedford Dwellings': [39, 40], 

| Neighborhood | Average Rank |
|---|---|
| Garfield | 9.5 |
| Terrace Village | 12.0 |
| Homewood North | 13.0 |
| Knoxville | 13.5 |
| East Liberty | 14.0 |


From our final metric, RankRank, you can see that the best neighborhoods to live in are X, X, and X, because they have the lowest composite rank across all three of our metrics.