# Census Tract Indicators and Percentile Calculation
## GEOG 4503: GIS Project Management Final Project
***

**Author**: Caleb Cordsen

***

This Jupyter Notebook sets up code that reads census tract data for a number of indicators and loads it into custom census tract objects. It then calculates percentile scores for each census tract relative to the others and exports the results to CSV files. It supports the author's final project in project management, which aims to identify transportation disadvantaged census tracts in Denver County, but with minor modifications this Jupyter Notebook can run the same calculations on any set of census tracts.

---

### Class Structure

To support operations on census tracts, this notebook uses a set of classes to load and store the data. To identify transportation disadvantage, 21 indicators are used, grouped into six categories. A class exists for each category (Transportation, Health, Environmental, Economic, Resilience, and Equity). Each category class contains member fields for the indicators that belong to that category. For example, the health class contains member fields for Population 65 and Older, Uninsured, and Disability percentage. Each class also contains a member field for the percentile rank of each of its indicators. The four multi-indicator classes have a property that calculates the average percentile across their indicators, and every class has a property that reports whether the category is considered disadvantaged (avgPercentile >= 0.5 for most; Resilience uses percClimate >= 0.75 and Equity uses percLing >= 0.5 on their single indicator).

Then there is a census tract class which contains member fields that consist of the six category objects. This class contains two properties, one to calculate total disadvantaged score and one to report if the census tract is overall transportation disadvantaged (total score >= 4).
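
The scoring rule described above can be sketched in isolation. The helper names `category_score` and `is_disadvantaged` and the sample percentiles are hypothetical and are not part of this notebook's classes; the thresholds (0.5 for most categories, 0.75 for resilience, a total score of 4) are the ones used by the classes in this notebook.

```python
def category_score(percentiles, threshold=0.5):
    """Return 1 if the mean percentile meets the threshold, else 0."""
    avg = sum(percentiles) / len(percentiles)
    return 1 if avg >= threshold else 0


def is_disadvantaged(category_scores, cutoff=4):
    """Return 1 if at least `cutoff` categories are flagged, else 0."""
    return 1 if sum(category_scores) >= cutoff else 0


# Hypothetical percentile ranks for one tract (illustrative values only)
scores = [
    category_score([0.60, 0.70, 0.40]),               # transportation
    category_score([0.20, 0.30, 0.10]),               # health
    category_score([0.9, 0.8, 0.7, 0.6, 0.5, 0.9]),   # environmental
    category_score([0.5] * 7),                        # economic (mean is exactly 0.5)
    category_score([0.70], threshold=0.75),           # resilience (stricter threshold)
    category_score([0.55]),                           # equity
]
total = sum(scores)
flag = is_disadvantaged(scores)

print(scores, total, flag)  # [1, 0, 1, 1, 0, 1] 4 1
```

In this made-up case four of the six categories are flagged, so the tract would count as overall transportation disadvantaged.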


In [2]:
# These are the imports necessary for the project
import numpy as np
import csv

In [1]:
class transportationAccessCl:
    
    def __init__(self):
        self.stopDen = None
        self.percStopDen = None
        self.commuteTime = None
        self.percCommute = None
        self.vehcileAccess = None
        self.percVehAccess = None
    
    @property
    def avgPercentile(self):
        return (self.percStopDen + self.percCommute  + self.percVehAccess)/3
    
    @property
    def transportationScore(self):
        if(self.avgPercentile >= 0.5):
            return 1
        else:
            return 0
    
class healthCl:
    
    def __init__(self):
        self.oldPop = None
        self.percOldPop = None
        self.unInsured = None
        self.percUnInsured = None
        self.disabilityPercent = None
        self.percDisability = None
    
    @property
    def avgPercentile(self):
        return (self.percOldPop + self.percUnInsured + self.percDisability)/3
    
    @property
    def healthScore(self):
        if(self.avgPercentile >= 0.5):
            return 1
        else:
            return 0
    
class environmentalCl:
    
    def __init__(self):
        self.homesb41960 = None
        self.percHomes = None
        self.diesal = None
        self.percDiesal = None
        self.cancer = None
        self.percCancer = None
        self.trafficProx = None
        self.percTraffic = None
        self.PM25 = None
        self.percPM25 = None
        self.ozone = None
        self.percOzone = None
    
    @property
    def avgPercentile(self):
        return (self.percHomes + self.percDiesal + self.percCancer + self.percTraffic + self.percPM25 + self.percOzone)/6
    
    @property
    def environmentalScore(self):
        if(self.avgPercentile >= 0.5):
            return 1
        else:
            return 0
        
class economicCl:
    def __init__(self):
        self.HSEduc = None
        self.percHS = None
        self.Renters = None
        self.percRenters = None
        self.Unemployment = None
        self.percUnemploy = None
        self.GINIIndex = None
        self.percGINI = None
        self.lowIncome = None
        self.percLowIncome = None
        self.poverty = None
        self.percPov = None
        self.HousingCost = None
        self.percHousing = None
    
    @property
    def avgPercentile(self):
        return (self.percHS + self.percRenters + self.percUnemploy + self.percGINI + self.percLowIncome + self.percPov + self.percHousing)/7
    
    @property
    def economicScore(self):
        if(self.avgPercentile >= 0.5):
            return 1
        else:
            return 0
        
class resilienceCl:
    
    def __init__(self):
        self.ClimateHazards = None
        self.percClimate = None
    
    @property
    def resilienceScore(self):
        if(self.percClimate >= 0.75):
            return 1
        else:
            return 0
        
class equityCl:
    
    def __init__(self):
        self.LinguisticIsolation = None
        self.percLing = None
    
    @property
    def equityScore(self):
        if(self.percLing >= 0.5):
            return 1
        else:
            return 0

class censusTract:
    
    def __init__(self):
        self.name = None
        self.transportationAccess = transportationAccessCl()
        self.health = healthCl()
        self.environmental = environmentalCl()
        self.economic = economicCl()
        self.resilience = resilienceCl()
        self.equity = equityCl()
    
    def __str__(self):
        return f'Census Tract({self.name})'
    
    @property
    def totalScore(self):
        x = self.transportationAccess.transportationScore + self.health.healthScore + self.environmental.environmentalScore
        y = self.economic.economicScore + self.resilience.resilienceScore + self.equity.equityScore
        return x+y
    
    @property
    def disadvantaged(self):
        if self.totalScore >= 4:
            return 1
        else:
            return 0

### Calculating Percentiles

This script uses one function, **calcPercentiles(tracts)**, to calculate percentiles for each indicator. Its parameter, tracts, should be a list of census tract objects. The function goes through all 21 indicators and calculates the percentile rank of each census tract's indicator value relative to the other census tracts in the inputted list.
To do so it uses the following method:


$$ \text{Current Tract Indicator's Percentile} = \frac{\text{Number of Values Below Current Tract's Indicator Value}}{n} $$

where $n$ is the number of census tracts.

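This is implemented with the numpy package: the values of one indicator for all tracts are placed in an array, and np.where counts how many entries are below the current tract's value. Stop density is the one indicator where a higher value is better, so for it the count is of values above the current tract's value.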
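
As a small worked example of this method, the sketch below applies both comparisons to four made-up indicator values. The array contents and the names `values`, `below`, and `above` are illustrative assumptions, not data from the project.

```python
import numpy as np

# Hypothetical indicator values for four tracts (illustrative only)
values = np.array([10.0, 20.0, 20.0, 40.0])
n = len(values)

# Share of tracts with a strictly lower value (used when higher = worse)
below = np.array([np.where(values < v, 1, 0).sum() / n for v in values])

# Share of tracts with a strictly higher value (used when higher = better,
# as with stop density)
above = np.array([np.where(values > v, 1, 0).sum() / n for v in values])

print(below.tolist())  # [0.0, 0.25, 0.25, 0.75]
print(above.tolist())  # [0.75, 0.25, 0.25, 0.0]
```

Because the comparison is strict, tied tracts share the same percentile, and the largest possible percentile is (n - 1) / n rather than 1.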

In [3]:
def calcPercentiles(tracts):
    '''
    Parameter: tracts - This should be a list of census tract objects.
    
    This function takes in a list of census tract objects that have populated data for their indicators. It then goes through each indicator and calculates and modifies
    the percentile score for each census tract and each indicator in relation to the list of census tracts that was inputted.
    '''
    # First grab the length of the list of census tracts
    size = len(tracts)
    # Create tuples for each of the six categories. The tuple is made up of the category name aka the attribute name of that category for the census tract object.
    # The second part of the tuple is a list of tuples that are the (indicator, percentileIndicator)
    # These tuples will help us iterate the objects later
    transportation = ("transportationAccess",[("stopDen","percStopDen"),("commuteTime","percCommute"),("vehcileAccess","percVehAccess")])
    health = ("health", [("oldPop","percOldPop"),("unInsured","percUnInsured"),("disabilityPercent","percDisability")])
    environmental = ("environmental", [("homesb41960","percHomes"),("diesal","percDiesal"),("cancer","percCancer"),("trafficProx","percTraffic"),("PM25","percPM25"),("ozone","percOzone")])
    economic = ("economic", [("HSEduc","percHS"),("Renters","percRenters"),("Unemployment","percUnemploy"),("GINIIndex","percGINI"),("lowIncome","percLowIncome"),("poverty","percPov"),("HousingCost","percHousing")])
    resilience = ("resilience", [("ClimateHazards","percClimate")])
    equity = ("equity", [("LinguisticIsolation","percLing")])
    # Make a list of the tuples representing the categories
    categories = [ transportation, health, environmental, economic, resilience, equity]
    
    # Iterate the categories
    for i in categories:
        # Set the variables category and indicator lst to the tuple from the categories list
        category,indlst = i
        # Iterate each indicator (which is a tuple containing the indicator and the percentile for that indicator)
        for indicator in indlst:
            # Create a numpy array of all the values for the current indicator of interest for the inputted list of census tracts
            currIndArray = np.array([getattr(getattr(centrct,category),indicator[0]) for centrct in tracts])
            #print(currIndArray)
            # print(str(category)+'.'+str(indicator[1]))
            
            # Zip the numpy array and the inputted census tract. Iterate over this using value and currCenTrct
            for value,currCenTrct in zip(currIndArray,tracts):
                if(value is not None):
                    if(indicator[0] == "stopDen"):
                        # Stop Density is our only "positive" indicator I.E. the higher it is, the better. So we must do a special check for it
                        # since its percentile will be based on if it is a lower value.
                        percentile = np.where(currIndArray>value,1,0).sum()/size
                    else:
                        # Calculate the percentile for the current indicator for the current census tract by doing a np.where
                        # on the np array created earlier that is less than the value of the indicator. This will output an array of 1's and 0's that will
                        # then be summed and divided by size to get the percentile rank
                        percentile = np.where(currIndArray<value,1,0).sum()/size
                    # Set the attribute for the percentile indicator to that percentile
                    setattr(getattr(currCenTrct,category),indicator[1],percentile)

### Loading Data

This script uses one function, **loadData**, to load data from a csv file into a list of census tracts. The csv file must have a header row, the census tract name in the first column, and the indicators in the following columns in the same order as they appear in the function below. Every indicator cell must be numeric, since each one is converted with float. The function reads the csv, loads each row's data into a census tract object, adds it to a list, and returns the list of tracts.
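
The expected file layout can be illustrated with a shortened, in-memory example. The header names, tract names, and values below are hypothetical, and only three of the 21 indicator columns are shown; the real file needs all 21, in the order used by the function below.

```python
import csv
import io

# Hypothetical three-column excerpt of an input file (illustrative only)
sample = io.StringIO(
    "NAMELSAD,stopDen,commuteTime,vehcileAccess\n"
    "Census Tract 1.01,4.2,31.5,12.0\n"
    "Census Tract 1.02,1.1,45.0,3.4\n"
)

reader = csv.reader(sample)
header = next(reader)  # the header row is read and set aside
rows = []
for row in reader:
    name = row[0]                                # first column: tract name
    values = [float(cell) for cell in row[1:]]   # remaining columns: numbers
    rows.append((name, values))

print(rows[0])  # ('Census Tract 1.01', [4.2, 31.5, 12.0])
```

A blank or non-numeric indicator cell would make `float` raise a `ValueError`, so missing values need to be handled before loading.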

In [26]:
def loadData(fileName):
    '''
    Parameter: fileName - This should be a valid file path to a csv file.
    
    This function takes in a csv file that contains the census tract information for all 21 indicators. It then parses the csv and loads the data into a list of census tract objects
    and returns that list.
    '''
    
    indicators = [("transportationAccess","stopDen"),("transportationAccess","commuteTime"),("transportationAccess","vehcileAccess"),("health","oldPop"),("health","unInsured"),("health","disabilityPercent"),("environmental","homesb41960"),
                  ("environmental","diesal"),("environmental","cancer"),("environmental","trafficProx"),("environmental","PM25"),("environmental","ozone"),("economic","HSEduc"),
                  ("economic","Renters"),("economic","Unemployment"),("economic","GINIIndex"),("economic","lowIncome"),("economic","poverty"),("economic","HousingCost"),
                  ("resilience","ClimateHazards"),("equity","LinguisticIsolation")]
    
    tracts = []
    csvData = []
    with open(fileName, 'r', newline='') as file:
        csvreader = csv.reader(file)
        header = next(csvreader)
        for row in csvreader:
            csvData.append(row)
    
    for currTract in csvData:
        x = censusTract()
        setattr(x,"name",currTract[0])
        for column,indicator in zip(currTract[1:],indicators):
            setattr(getattr(x,indicator[0]),indicator[1],float(column))
        tracts.append(x)
    return tracts       

### Exporting the Data

This script uses one function, **exportData**, to export the results into three new CSV files. The first CSV file includes, for every tract, the census tract name, the scores for each of the six categories, the total score, and whether that tract is disadvantaged or not. The other two files include only disadvantaged tracts: one has every indicator value and its percentile score, and the other has the average percentile and score for each category. The function takes three parameters that are the names of the new CSV files and one parameter that is a list of census tract objects.
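
The difference between the first file (all tracts) and the other two (disadvantaged tracts only) can be sketched with an in-memory buffer. The helper `write_rows`, the shortened header, and the sample records are hypothetical and stand in for the census tract objects.

```python
import csv
import io

# Hypothetical (name, disadvantaged flag, total score) records
records = [("Tract A", 1, 5), ("Tract B", 0, 2), ("Tract C", 1, 4)]


def write_rows(records, only_disadvantaged):
    """Write a header plus one row per record and return the CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["NAMELSAD", "Overall Disadvantaged", "Total Disadvantaged Score"])
    for name, flag, total in records:
        if only_disadvantaged and flag != 1:
            continue  # skip tracts that are not overall disadvantaged
        writer.writerow([name, flag, total])
    return buffer.getvalue()


all_rows = write_rows(records, only_disadvantaged=False)
dis_rows = write_rows(records, only_disadvantaged=True)

print(dis_rows.splitlines()[1:])  # ['Tract A,1,5', 'Tract C,1,4']
```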

In [43]:
def exportData(outputFile1,outputFile2,outputFile3,tracts):
    '''
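    Parameters: outputFile1 - Name of the CSV file that lists every tract with its category scores, total score, and overall flag.
    outputFile2 - Name of the CSV file that lists each disadvantaged tract's indicator values and percentile scores.
    outputFile3 - Name of the CSV file that lists each disadvantaged tract's category scores and average percentiles.
    tracts - This should be a list of census tract objects.
    
    This function produces three CSV files with different information based on the input of a list of tracts.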
    '''
    
    
    writeFile1 = open(outputFile1,'w',newline='')
    writer1 = csv.writer(writeFile1)
    header1 = ['NAMELSAD','Overall Disadvantaged (0 = no, 1 = yes)','Total Disadvantaged Score', 'Transportation Disadvantaged (0 = no, 1 = yes)','Health Disadvantaged (0 = no, 1 = yes)', 
               'Environmental Disadvantaged (0 = no, 1 = yes)', 'Economic Disadvantaged (0 = no, 1 = yes)', 'Resilience Disadvantaged (0 = no, 1 = yes)',
               'Equity Disadvantaged (0 = no, 1 = yes)']
    writer1.writerow(header1)
    for ct in tracts:
        row1 = [ct.name,ct.disadvantaged,ct.totalScore,ct.transportationAccess.transportationScore,ct.health.healthScore,ct.environmental.environmentalScore,
                ct.economic.economicScore,ct.resilience.resilienceScore,ct.equity.equityScore]
        writer1.writerow(row1)
    # Close the file so all rows are flushed to disk
    writeFile1.close()
        
    writeFile2 = open(outputFile2,'w',newline='')
    writer2 = csv.writer(writeFile2)
    header2 = ['NAMELSAD','Overall Disadvantaged (0 = no, 1 = yes)','Total Disadvantaged Score', 'Stops Per 1000 People','Stops Percentile Score', '>30 Min Commute Percent',
               'Commute Percentile Score','Percentage of occupied housing units with no vehicles available', 'No Vehicle Percentile Score',
               'Percentage of the population 65 years and older', 'Age Percentile Score',
               'Percentage of the population without health insurance coverage', 'Insurance Percentile Score',
               'Percentage of the population with a disability', 'Disability Percentile Score',
               '% Of Homes Built Before 1960 (Lead Paint)', 'Lead Paint Percentile Score',
               'EJ Index for Diesel Particulate Matter (2017)', 'Diesel Particulate Matter Percentile Score',
               'EJ Index for 2017 Air Toxics Cancer Risk', 'Air Toxic Cancer Percentile Score',
               'EJ Index for Traffic Proximity', 'Traffic Proximity Percentile Score',
               'EJ Index for Particulate Matter 2.5', 'Particulate Matter 2.5 Percentile Score',
               'EJ Index for Ozone', 'Ozone Level Percentile Score',
               'Percentage of population over age 25 without a high school diploma (including GED)', 'High School Education Percentile Score',
               'Percentage of Occupied Housing Units not by Property Owner', 'Renter Percentile Score',
               'Percentage of the labor force unemployed', 'Unemployment Percentile Score',
               'Gini Index of income inequality (income distribution across a population)', 'GINI Index Percentile Score' ,
               '% Low Income', 'Low Income Percentile Score',
               'Percentage of Persons Below 150% Poverty Estimate (Federal Poverty Level)', 'Federal Poverty Percentile Score',
               'Percentage of housing cost burdened occupied housing units with annual income less than $75,000 (30%+ of income spent on housing costs)', 'Housing Burden Percentile Score',
               'Expected Annual Loss - Score - Composite', 'Expected Annual Loss Percentile Score',
               'Percentage of limited English-speaking households', 'Linguistic Isolation Percentile Score'
              ]
    # Create tuples for each of the six categories. The tuple is made up of the category name aka the attribute name of that category for the census tract object.
    # The second part of the tuple is a list of tuples that are the (indicator, percentileIndicator)
    # These tuples will help us iterate the objects later
    transportation = ("transportationAccess",[("stopDen","percStopDen"),("commuteTime","percCommute"),("vehcileAccess","percVehAccess")])
    health = ("health", [("oldPop","percOldPop"),("unInsured","percUnInsured"),("disabilityPercent","percDisability")])
    environmental = ("environmental", [("homesb41960","percHomes"),("diesal","percDiesal"),("cancer","percCancer"),("trafficProx","percTraffic"),("PM25","percPM25"),("ozone","percOzone")])
    economic = ("economic", [("HSEduc","percHS"),("Renters","percRenters"),("Unemployment","percUnemploy"),("GINIIndex","percGINI"),("lowIncome","percLowIncome"),("poverty","percPov"),("HousingCost","percHousing")])
    resilience = ("resilience", [("ClimateHazards","percClimate")])
    equity = ("equity", [("LinguisticIsolation","percLing")])
    # Make a list of the tuples representing the categories
    categories = [ transportation, health, environmental, economic, resilience, equity]
    writer2.writerow(header2)
    for ct in tracts:
        if(ct.disadvantaged == 1):
            row2 = [ct.name,ct.disadvantaged,ct.totalScore]
            for i in categories:
                category,indlst = i
                for indicator in indlst:
                    row2.append(getattr(getattr(ct,category),indicator[0]))
                    row2.append(getattr(getattr(ct,category),indicator[1]))
            writer2.writerow(row2)
    # Close the file so all rows are flushed to disk
    writeFile2.close()
    
    writeFile3 = open(outputFile3,'w',newline='')
    writer3 = csv.writer(writeFile3)
    header3 = ['NAMELSAD','Overall Disadvantaged (0 = no, 1 = yes)','Total Disadvantaged Score', 'Transportation Disadvantaged (0 = no, 1 = yes)', 'Transportation Avg. Percentile Score',
               'Health Disadvantaged (0 = no, 1 = yes)', 'Health Avg. Percentile Score',
               'Environmental Disadvantaged (0 = no, 1 = yes)', 'Environmental Avg. Percentile Score',
               'Economic Disadvantaged (0 = no, 1 = yes)', 'Economic Avg. Percentile Score',
               'Resilience Disadvantaged (0 = no, 1 = yes)', 'Resilience Avg. Percentile Score',
               'Equity Disadvantaged (0 = no, 1 = yes)','Equity Avg. Percentile Score']
    writer3.writerow(header3)
    for ct in tracts:
        if(ct.disadvantaged == 1):
            row3 = [ct.name,ct.disadvantaged,ct.totalScore,ct.transportationAccess.transportationScore,ct.transportationAccess.avgPercentile,
                    ct.health.healthScore,ct.health.avgPercentile, ct.environmental.environmentalScore, ct.environmental.avgPercentile,
                    ct.economic.economicScore, ct.economic.avgPercentile,
                    ct.resilience.resilienceScore,ct.resilience.percClimate,
                    ct.equity.equityScore,ct.equity.percLing]
            writer3.writerow(row3)
    # Close the file so all rows are flushed to disk
    writeFile3.close()

### Running on Denver County Data

The cell below runs the functions above on data for Denver County, in support of the author's final project. To use this script on other data, users only need to change the file paths to point to their own data.

In [47]:
listOfTracts = loadData('DenverCounty21IndicatorsFinalCSV.csv')
calcPercentiles(listOfTracts)
exportData('DenverCountyTractRankingCalculations.csv','DenverCountyDisadvantagedTractsDetailed.csv','DenverCountyDisadvantagedTractsBrief.csv',listOfTracts)

### Some Summary Stats

The cell below tallies totals such as the number of disadvantaged tracts, giving the user a summary of what this script found. Nothing here needs editing, as it is based on the listOfTracts loaded in above.

In [50]:
score0  = 0
score1 = 0
score2 = 0
score3 = 0
score4 = 0
score5 = 0
score6 = 0
scoreDis = 0
scoreNonDis = 0
scoreTransp = 0
scoreHealth = 0
scoreEnv = 0
scoreEcon = 0
scoreRes = 0
scoreEq = 0

for ct in listOfTracts:
    if(ct.totalScore == 0):
        score0 += 1
    elif(ct.totalScore == 1):
        score1 += 1
    elif(ct.totalScore == 2):
        score2 += 1
    elif(ct.totalScore == 3):
        score3 += 1
    elif(ct.totalScore == 4):
        score4 += 1
    elif(ct.totalScore == 5):
        score5 += 1
    elif(ct.totalScore == 6):
        score6 += 1
        
    if(ct.disadvantaged == 1):
        scoreDis += 1
    else:
        scoreNonDis += 1
        
    if(ct.transportationAccess.transportationScore == 1):
        scoreTransp += 1
    if(ct.health.healthScore == 1):
        scoreHealth += 1
    if(ct.environmental.environmentalScore == 1):
        scoreEnv += 1
    if(ct.economic.economicScore == 1):
        scoreEcon += 1
    if(ct.resilience.resilienceScore == 1):
        scoreRes += 1
    if(ct.equity.equityScore == 1):
        scoreEq += 1

print("This data found "+str(scoreDis)+" total overall disadvantaged tracts")
print("This data found "+str(scoreNonDis)+" total not overall disadvantaged tracts")
print("This data found "+str(score0)+" tracts with a total score of 0")
print("This data found "+str(score1)+" tracts with a total score of 1")
print("This data found "+str(score2)+" tracts with a total score of 2")
print("This data found "+str(score3)+" tracts with a total score of 3")
print("This data found "+str(score4)+" tracts with a total score of 4")
print("This data found "+str(score5)+" tracts with a total score of 5")
print("This data found "+str(score6)+" tracts with a total score of 6")
print("This data found "+str(scoreTransp)+" tracts that were transportation disadvantaged")
print("This data found "+str(scoreHealth)+" tracts that were health disadvantaged")
print("This data found "+str(scoreEnv)+" tracts that were environmentally disadvantaged")
print("This data found "+str(scoreEcon)+" tracts that were economically disadvantaged")
print("This data found "+str(scoreRes)+" tracts that were resilience disadvantaged")
print("This data found "+str(scoreEq)+" tracts that were equity disadvantaged")

This data found 70 total overall disadvantaged tracts
This data found 108 total not overall disadvantaged tracts
This data found 15 tracts with a total score of 0
This data found 38 tracts with a total score of 1
This data found 34 tracts with a total score of 2
This data found 21 tracts with a total score of 3
This data found 29 tracts with a total score of 4
This data found 37 tracts with a total score of 5
This data found 4 tracts with a total score of 6
This data found 90 tracts that were transportation disadvantaged
This data found 92 tracts that were health disadvantaged
This data found 91 tracts that were environmentally disadvantaged
This data found 88 tracts that were economically disadvantaged
This data found 44 tracts that were resilience disadvantaged
This data found 89 tracts that were equity disadvantaged
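
The per-score tallies above could also be written more compactly with `collections.Counter`. This is only a sketch under assumed inputs: the list `total_scores` is made up and stands in for the `totalScore` of each tract, so its counts do not match the Denver County output.

```python
from collections import Counter

# Hypothetical total scores for a handful of tracts (illustrative only)
total_scores = [0, 1, 1, 4, 5, 4, 6, 2]

# Count how many tracts have each total score
by_score = Counter(total_scores)

# A tract is overall disadvantaged when its total score is at least 4
disadvantaged = sum(1 for s in total_scores if s >= 4)
not_disadvantaged = len(total_scores) - disadvantaged

for score in range(7):
    # Counter returns 0 for scores that never occur
    print(f"{by_score[score]} tracts with a total score of {score}")
print(f"{disadvantaged} disadvantaged, {not_disadvantaged} not disadvantaged")
```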
