Best education metric:

To determine the best neighborhood in Pittsburgh, we decided to create a metric to determine which area has the best education. I decided a good measure of effective schooling is seeing which area has the highest rate of residents who graduated high school.

Originally we wanted to base it off the effectiveness of schools for younger kids: middle school or elementary school. However, we found it hard to find good data to measure this. I found a dataset that listed the level of education attained by every resident of each area in Pittsburgh over 18 who chose to respond to the survey; this led me to focus on highschool graduation rates, because it is generally considered indicative of the effectiveness of an education from childhood to young-adulthood.

This data was from a dataset gained from the 2014 ACS, question 11.

The metric for "best education" measures what percentage of residents over 18 in each area earned a high school diploma or higher. Whichever area has the highest rate of graduating high school determines that area's score for "best education".

For each area, I took the total number of residents with a high diploma or higher education and divided that by the total number of residents surveyed to get their score out of 1. I took this percentage and scaled it to make the total possible points for the area 30.

In [9]:
!pip install pandas

import pandas as panda
import math

#using pandas to help parce through the data set 
data = panda.read_csv('EducationalAttainmentData.csv')

#goal: read through the respective columns to sum and divide them by the column that has the number of residents surveyed.
#specifically, add the columns of those who graduated highschool (or higher education) and divide by the total number of residents surveyed

#list of the names of the columns to read through
columnsHighSchoolOrMore = [ 
    'Estimate; Total: - Regular high school diploma',
    'Estimate; Total: - GED or alternative credential',
    'Estimate; Total: - Some college, less than 1 year',
    'Estimate; Total: - Some college, 1 or more years, no degree',
    "Estimate; Total: - Associate's degree",
    "Estimate; Total: - Bachelor's degree",
    "Estimate; Total: - Master's degree",
    "Estimate; Total: - Professional school degree",
    "Estimate; Total: - Doctorate degree"
]


#iterates through the columns in the data set and adds the column if it has the same title as one of those listed above
columnsOfGrads = [col for col in columnsHighSchoolOrMore if col in data.columns] #creates list of columns that should be added
data['percentageOfHighSchoolGraduates'] = data[columnsOfGrads].sum(axis=1) / data['Estimate; Total:'] 
# creates a new column in the Dataframe called 'Sum_and_Divide' and sets its rows equal to the respective sums of each valid column, 
# then divides by the respective total number of citizens 

#converts new column into a python list
percentList = data['percentageOfHighSchoolGraduates'].tolist() #remember that this is [0]-[90], not [1]-[91]


#creates a list of doubles to store the index to track which area has what percentage to print out the name of the area later
areaAndPercentageList = [(index, value) for index, value in enumerate(percentList)]

#Rank the areas from highest to lowest HS graduation rate
rankedAreaList = sorted(areaAndPercentageList, key=lambda x: x[1], reverse=True)

#print list (for testing)
rankedAreaList #id = int + 1; column A = id + 1;
#nan means no one replied to the survey in that neighborhood

# list of neighborhoods in the dataset
# doesn't include any NaN (areas where no one responded) and then converts the rest to Strings 
listOfNeighborhoods = data['Neighborhood'].dropna().astype(str).tolist()

# Create a new list with the 'n' replaced by the corresponding string from the `strings` list
#final list to represent each area ranked from best to worst, including the area, id in the dataset, and the score
#scaled the scale from 0 to 1 to 0 to 20
#converts doubles and ints to floats
finalList = []
for duplet in rankedAreaList:
    k, inner_duplet = duplet  # unpack the outer integer and the inner duplet (which has two elements)

    # Ensure that k + 1 is within the bounds of the neighborhood_list
    if k < len(strings):
        # Get the neighborhood string at index k (adjusted for 0-indexing)
        neighborhood_string = listOfNeighborhoods[k]  # k corresponds to the index in 0-based indexing
    else:
        neighborhood_string = None  # Or set a default value if k+1 is out of bounds

    # Check if the inner_duplet is a tuple (int_n, double_value) or a single float value
    if isinstance(inner_duplet, tuple):
        int_n, double_value = inner_duplet  # unpack the inner duplet into int_n and double_value
    else:
        # Handle the case where inner_duplet is a single float (or a NaN value)
        # If the value is None or NaN, treat it accordingly
        if inner_duplet is None or (isinstance(inner_duplet, float) and math.isnan(inner_duplet)):
            int_n, double_value = (float('nan'), float('nan'))  # or any default handling for NaN
        else:
            int_n, double_value = float(inner_duplet), float(inner_duplet)  # Treat as float if no NaN

    # Replace the first part of the inner duplet with k+1
    new_inner_duplet = (float(k + 1), float(double_value))

    # Scale the second value (double_value) from the range [0.0, 1.0] to [0.0, 20.0]
    if 0.0 <= double_value <= 1.0:
        scaled_value = double_value * 20.0  # Scale from 0.0-1.0 to 0.0-20.0
    else:
        # If the value is outside the expected range, handle as needed (e.g., set to NaN or keep it unchanged)
        scaled_value = double_value  # Keep original if out of bounds (or apply custom handling)

    # Replace the second value with the scaled value
    new_inner_duplet = (new_inner_duplet[0], scaled_value)

    # Append the modified duplet to the result list
    finalList.append((neighborhood_string, new_inner_duplet))

# Output the modified duplets
print(modified_duplets)

[('North Shore', (58.0, 20.0)), ('South Shore', (72.0, 20.0)), ('Strip District', (81.0, 20.0)), ('West End', (88.0, 20.0)), ('Point Breeze North', (65.0, 19.82973149967256)), ('Point Breeze', (64.0, 19.58727001491795)), ('Squirrel Hill North', (77.0, 19.559519475791774)), ('Squirrel Hill South', (78.0, 19.4679326406306)), ('Shadyside', (69.0, 19.43913320586361)), ('Regent Square', (67.0, 19.43152454780362)), ('Central Business District', (17.0, 19.343474779823858)), ('Chateau', (22.0, nan)), ('Highland Park', (40.0, 19.59349593495935)), ('Mt. Oliver', (55.0, 19.31758530183727)), ('Troy Hill', (85.0, 19.231590181430096)), ('South Side Flats', (73.0, 19.150374125030172)), ('Greenfield', (37.0, 19.089790897908976)), ('Fairywood', (32.0, 19.07654921020656)), ('Esplen', (31.0, 18.97872340425532)), ('Bloomfield', (10.0, 18.958049371497804)), ('East Carnegie', (27.0, 18.925831202046034)), ('Hays', (38.0, 18.844765342960287)), ('Banksville', (6.0, 18.836104513064132)), ('Windgap', (91.0, 18.8