This is a breakdown of what's occurring in this code, which uses the Census crosswalks to calculate changes from 2000 to 2020.

In Jupyter Notebook, each code block runs after you press RUN. The code will be highlighted by a blue box. Then press RUN again. The box will move on, but the code may not be finished. Wait until "DONE" appears under the code block the blue highlighter passed over (excluding the first two, which are just setup). Do not proceed before then. Some comments may print from within the code itself, but they are largely there for clarity. Click on this paragraph to begin, then press RUN to go to the next paragraph or code block.

The blue highlighter will pass over the code. The "In [ ]" label on the side will read In [ * ] while the code runs, and a number will replace the asterisk once it has finished. Press RUN and do not proceed until "DONE" is printed below the code box and/or In [ NUMBER ] appears. This may take a few minutes or be instant. None of this applies to text blocks such as this one, which are not marked by In [Number]: on the left-hand side. To restart at any time, press KERNEL -> RESTART KERNEL AND CLEAR ALL OUTPUT. Then click back on this box and press RUN to begin.

(If you don't care about any of this and just want the final product, click RESTART & RUN ALL and scroll to the bottom. The code will run on its own, and after a few minutes a geojson map file will be in the crosswalk folder.)

Contact me with any questions.

First we start by importing Python libraries. (Click RUN each time to advance. At a code block, wait for DONE before pressing RUN again; the first two code blocks, this next one and the one after the next comment, print no DONE.)



In [1]:
import ast
import copy
import json
import csv
import statistics
import os.path
from os import path


Next we establish two functions: one that sums any set of blocks into Census tracts (ignore the "2020" in the variable names), and a division helper that avoids divide-by-zero errors.

In [2]:
def tract_maker(final2020):
    tract2020 = {}
    seen = []         
    for k in final2020.keys():
        tract = k[0:14]
        if tract not in seen:
            values = []
            for key, value in final2020.items():
                if key[0:14] == tract:
                    if not values:
                        values = value
                    else:
                        values = [float(a) + float(b) for a, b in zip(values, value)]
            if values:
                tract2020[tract] = values
                seen.append(tract)
    return tract2020
    
def zerodiv(a, b):
    if b > 0:
        return round(a / b, 1)
    return 0
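
As a quick check of what the two helpers do, here is a small sketch on made-up block IDs. The IDs and counts are invented, and tract_maker_sketch is a hypothetical single-pass stand-in for tract_maker, not the function above. It relies on the same assumption: the first 14 characters of a block ID identify its tract. Unlike tract_maker, it always converts values to float.

```python
def zerodiv(a, b):
    # Same helper as above: divide, but return 0 when b is zero or negative
    if b > 0:
        return round(a / b, 1)
    return 0

def tract_maker_sketch(blocks):
    """Sum block vectors that share the first 14 characters of their ID."""
    tracts = {}
    for key, values in blocks.items():
        tract = key[0:14]
        if tract not in tracts:
            tracts[tract] = [float(v) for v in values]
        else:
            tracts[tract] = [a + float(b) for a, b in zip(tracts[tract], values)]
    return tracts

# Invented IDs: two blocks in one tract, one block in another
toy_blocks = {
    "G06000104001001000": [10, 2],
    "G06000104001001001": [5, 1],
    "G06000104002001000": [7, 0],
}
toy_tracts = tract_maker_sketch(toy_blocks)
print(toy_tracts)
print(zerodiv(5, 0), zerodiv(1, 3))
```

The first two toy blocks share a tract prefix, so their vectors are added position by position; the third block forms a tract on its own.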

Now we open the 2000 and 2010 Census data at the block level, along with the crosswalk file describing the geographic changes between these two datasets.

In [3]:


'''Program will calculate the differences in data from 2000 to 2010 at the block level.
The data accumulated can be summed up into tracts. The net data from the crosswalks will be saved
into a file so Part 2 of the program can use them and they also can be verified manually in case
of error.
'''
print("Opening files...This is the first code block. Wait until DONE appears to proceed.")
# Census 2000 data Blocks
f00 = 'Census2000/nhgis0097_ds147_2000_block.csv'
# Census 2010 data Blocks
f10 = 'Census2010/nhgis0094_ds172_2010_block.csv'
# 2000 to 2010 Crosswalks
filecw1 = 'crosswalk/nhgis_blk2000_blk2010_gj_06/nhgis_blk2000_blk2010_gj_06.csv'
with open(filecw1, 'r') as a, open(f00, 'r') as c, open(f10, 'r') as d:
    cw1 = list(csv.reader(a, delimiter=',')) # Crosswalk 2000 - 2010
    d00 = list(csv.reader(c, delimiter=',')) # Census 2000
    d10 = list(csv.reader(d, delimiter=',')) # Census 2010
    # The with statement closes all three files on exit, so no explicit close() calls are needed
print("DONE")

Opening files...This is the first code block. Wait until DONE appears to proceed.
DONE


Now the algorithm begins. 
PART 1: Iterating through the crosswalk file, we multiply each 2000 block's data by the weight given for every 2010 block it is paired with; a pairing means the two blocks intersect. The weight tells us what proportion of the 2000 block's data to assign to that 2010 block. Summing the weighted pieces of every 2000 block that touches a given 2010 block produces a version of the 2000 data with the same shape as the 2010 block. Recall that the point of the crosswalk here is to construct a block of 2000 data identical in shape to the 2010 block, so that we can subtract one from the other.

The format within the variable "line_list" will look like:
Row #N: Block ID 2000, Block ID 2010, 2000 White Count * Weight, 2000 Black Count * Weight ... 
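
A minimal sketch of this weighting step on toy data may help. All IDs, counts and weights here are invented. The crosswalk row layout (2000 ID, 2010 ID, weight) follows how the code reads cross_row[0], cross_row[1] and cross_row[2]; the dictionary toy_2000 is a hypothetical stand-in for the Census 2000 table.

```python
# Toy crosswalk rows: [2000 block ID, 2010 block ID, weight as a string, as read from CSV]
toy_cw = [
    ["A2000", "X2010", "0.5"],
    ["A2000", "Y2010", "0.5"],
    ["B2000", "Y2010", "1.0"],
]
# Toy 2000 counts per block (for example: total population, white population)
toy_2000 = {"A2000": [100.0, 40.0], "B2000": [30.0, 10.0]}

weighted_rows = []
for id00, id10, weight in toy_cw:
    counts = toy_2000[id00]
    # Each row keeps both IDs, followed by the 2000 counts scaled by the weight
    weighted_rows.append([id00, id10] + [c * float(weight) for c in counts])

for row in weighted_rows:
    print(row)
```

Block A2000 is split evenly between two 2010 blocks, so it contributes half of its counts to each.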

PART 2: Once that's done, we build a dictionary and iterate through the weighted list from Part 1, assembling for each 2010 block the 2010-shaped block of 2000 data, which until now exists only as scattered list rows. Then, when we need the difference between Census 2010 and Census 2000 for a block, we look up the 2010 block and subtract its recreated 2000 counterpart, a.k.a. a "target block" or "fitted block".

Example: 
Summon: Census Block 4000 2010 
Dictionary: Census Block 2010 returns Census Block 2000 TARGET, consisting of 50% of Census Block 2000 3999 and 50% of Census Block 2000 4001. 
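
The assembly of target blocks can be sketched as follows, with invented IDs and already-weighted counts. This sketch groups the rows in a single pass, whereas the notebook scans the weighted list once per 2010 block; the resulting sums are the same.

```python
# Toy weighted rows: [2000 block ID, 2010 block ID, weighted counts...]
weighted_rows = [
    ["B3999", "B4000", 60.0, 25.0],
    ["B4001", "B4000", 40.0, 15.0],
    ["B4001", "B4002", 40.0, 15.0],
]

targets = {}
for id00, id10, *values in weighted_rows:
    if id10 not in targets:
        # First fragment seen for this 2010 block
        targets[id10] = list(values)
    else:
        # Later fragments are added position by position
        targets[id10] = [a + b for a, b in zip(targets[id10], values)]

print(targets)
```

Here the target for block B4000 is built from one fragment of B3999 and one fragment of B4001.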

PART 3: Once those target blocks have been assembled, we do the actual subtraction described above. In the code, this is the part after "Target Blocks Assembled" is printed.
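
In its simplest form, the subtraction looks like the sketch below. The values are invented, and the sketch assumes the 2010 columns and the target columns line up one to one. In the real code they do not: some 2010 columns are compared against the sum of several 2000 columns, which is why each line is written out by hand.

```python
# Toy 2010 counts and the matching 2000 "target" counts, keyed by 2010 block ID
census_2010 = {"B4000": [120.0, 35.0], "B9999": [10.0, 10.0]}
targets = {"B4000": [100.0, 40.0]}

change = {
    block_id: [now - then for now, then in zip(values, targets[block_id])]
    for block_id, values in census_2010.items()
    if block_id in targets  # blocks without a target are skipped, as in the notebook
}
print(change)
```

A positive number means growth between 2000 and 2010, a negative number means decline.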

NOTE: This part takes a few minutes. Do not proceed beyond the next code block until "START PART 1", "START PART 2", "Target Blocks Assembled - START PART 3", "2000 to 2010 done; writing content back" and "DONE" have all appeared below the code block, in that order.
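
The wait is likely due to the nested loops, which rescan the whole 2000 table for every crosswalk row. One possible speed-up, which is not what the code below does, is to index the 2000 rows by block ID once and then look each crosswalk row up directly. The sketch uses invented data and hypothetical names (d00_toy, cw_toy, index00).

```python
# Toy stand-ins for the Census 2000 table (with header row) and the crosswalk
d00_toy = [
    ["GISJOIN", "POP"],
    ["A2000", "100"],
    ["B2000", "30"],
]
cw_toy = [
    ["A2000", "X2010", "0.5"],
    ["C2000", "Z2010", "1.0"],  # no matching 2000 row, so it is skipped
]

# Build the index once: block ID -> full row
index00 = {row[0]: row for row in d00_toy[1:]}

matched = []
for cross_row in cw_toy:
    row = index00.get(cross_row[0])
    if row is None:
        continue
    weight = float(cross_row[2])
    matched.append([cross_row[0], cross_row[1], float(row[1]) * weight])

print(matched)
```

One difference to keep in mind: if a block ID appeared twice in the table, the dictionary would keep the last row, while the loop with break keeps the first.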



In [4]:
'''
STEP 1: iterate through the crosswalk rows and find the matching 2000 block,
then copy the 2000 block's data (we only care about columns 46 to 79) and multiply it by the weight for that row.
Basically we're reconstructing the crosswalk list but inserting a vector of the weighted 2000 data,
so the final product looks like: [Block 2000, Block 2010, Data0 * Weight, Data1 * Weight . . . ]
'''
print("START PART 1")
wdata00 =[d00[0][46:80]]
wdata00[0].insert(0, "Block ID 2010")
wdata00[0].insert(0, "Block ID 2000")
for cross_row in cw1: 
    if cross_row[0][0:8] == 'G0600010': 
        for row in d00:
            if cross_row[0] == row[0]:
                weight = float(cross_row[2])
                line_list = [cross_row[0], cross_row[1]]
                for i in range(46, 80):
                    line_list.append(float(row[i]) * weight)
                wdata00.append(line_list)
                break
# GOAL: for a SPECIFIC 2010 BLOCK, find the pieces of one or more
# 2000 BLOCKS that together are equivalent to that 2010 BLOCK.
# With the weighted 2000 data calculated, create the equivalent 2000 target by summing
# up the weighted 2000 data for every instance of the selected 2010 block
'''The goal is to recreate the equivalent of 2000 data fitted to a 2010 block.
So we iterate through the 2010 data (we only want the block label) and find the overlapping 2000 data from the weighted crosswalk above.
We sum the weighted data, then put the weighted sum in a dictionary keyed by its 2010 block.
Once we're done we can just subtract the weighted representation in the dictionary from the 2010 blocks.
These weighted representations are called "targets".
'''
print("START PART 2")
target00to10blks = {}
for row in d10:
    if row[7] == "Alameda County":
        # Regular Op
        first = True
        for wrow in wdata00:
            if row[0] == wrow[1]:
                # Append name of the 2010 block
                # then summate the fragments of the 2000 weighted data
                if first:
                    first = False
                    target00to10blks[row[0]] = wrow[2:36]
                else:
                    target00to10blks[row[0]] = [float(a) + float(b) for a, b in zip(target00to10blks[row[0]], wrow[2:36])]
print("Target Blocks Assembled - START PART 3")
final2010= {}
for row in d10:
    if row[0] in target00to10blks:
        target = target00to10blks[row[0]]
        content =[]
        content.append(float(row[58]) - target[0])#Total Pop0
        content.append(float(row[60]) - target[1] )#White1
        content.append(float(row[61]) - target[2])#Black2
        content.append(float(row[62]) - target[3])#Native3
        content.append(float(row[63]) - target[4])#Asian4
        content.append( float(row[64]) - target[5])#Hawaiian5
        content.append(float(row[65]) - target[6])#Other6
        content.append( float(row[66]) - target[7])#Multi7
        content.append(float(row[67]) - (target[8] + target[9] + target[10] + target[11] + target[12] + target[13] + target[14]))#Hispanic
        content.append( float(row[75]) - target[15]) #Homes9
        content.append( float(row[77]) - target[17])#Vacant10
        content.append( float(row[87]) - (target[25] + target[26] + target[27] + target[28] + target[29] + target[30] + target[31]))# Renters
        content.append( float(row[79]) - (target[18] + target[19] + target[20] + target[21] + target[22] + target[23] + target[24]))# Homeowners
        content.append( float(row[80]) - target[18]) #White Homeowners13
        content.append( float(row[81]) - target[19])#Black14
        content.append( float(row[82]) - target[20])#Native15
        content.append( float(row[83]) - target[21])#Asian16
        content.append( float(row[84]) - target[22])#Hawaiian17
        content.append( float(row[85]) - target[23])#Other18
        content.append( float(row[86]) - target[24])#Multi19
        content.append( float(row[98]) - target[32]) #Hispanic Homeowners20
        content.append( float(row[88]) - target[25])#White Renters21
        content.append( float(row[89]) - target[26])#Black22
        content.append( float(row[90]) - target[27])#Native23
        content.append( float(row[91]) - target[28])#Asian24
        content.append( float(row[92]) - target[29])#Hawaiian25
        content.append( float(row[93]) - target[30])#Other26
        content.append( float(row[94]) - target[31])#Multi27
        content.append( float(row[101]) - target[33])#Hispanic Renter28
        final2010[row[0]] = content
print("2000 to 2010 done; writing content back")
print("DONE")

START PART 1
START PART 2
Target Blocks Assembled - START PART 3
2000 to 2010 done; writing content back
DONE


Before we move on to the other half of the program (2010 to 2020 and 2000 to 2020), we save the calculated 2000-to-2010 changes in two CSV files, one in tract form and one in block form. We'll only use the block file; the tract file was for checking accuracy while debugging.


In [5]:
print("Writing Back")
net0010t = tract_maker(final2010) # Tract version. final2010 holds the blocks
with open('crosswalk/change0010tract.csv', mode='w', newline='') as file1:
    csv_writer = csv.writer(file1)
    for key, value in net0010t.items():
        csv_writer.writerow([key, value])
with open('crosswalk/change0010.csv', mode='w', newline='') as file:
    csv_writer = csv.writer(file)
    for key, value in final2010.items():
        csv_writer.writerow([key, value])
print("DONE")

Writing Back
DONE


ONTO THE SECOND PART:

We again open the relevant data files: Census 2010 blocks, Census 2020 blocks and the 2010-to-2020 crosswalk. This time we also open the data we calculated in the first half: for each 2010 Census block, the net difference between the 2010 and 2000 data. We'll store that in an easy-to-access dictionary (key-value pairs) called "final2010".
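
The round trip through the CSV file works because csv.writer stores each list as its string form, and ast.literal_eval turns that string back into a real list. A small sketch with an in-memory buffer instead of a file on disk (the block ID and values are invented):

```python
import ast
import csv
import io

# Write one [key, list] row the same way the first half does
buffer = io.StringIO()
csv.writer(buffer).writerow(["B4000", [20.0, -5.0]])
print(repr(buffer.getvalue()))  # the list is stored as a quoted string

# Read it back and parse the string into a list again
buffer.seek(0)
restored = {row[0]: ast.literal_eval(row[1]) for row in csv.reader(buffer)}
print(restored)
```

ast.literal_eval only accepts Python literals such as numbers, strings and lists, which makes it a safer choice than eval for this job.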


In [6]:
print("Crosswalks from 2000 to 2010 done, onto 2010 to 2020")
print("Opening Data files")
# Census 2010 data Blocks
f10 = 'Census2010/nhgis0094_ds172_2010_block.csv'
# Census 2020 data Blocks
f20 = 'Census2020/nhgis0095_ds258_2020_block.csv'
# Net changes from 2000 to 2010, written by the first half
filecw1 = 'crosswalk/change0010.csv'
# 2010 to 2020 Crosswalks
filecw2 = 'crosswalk/nhgis_blk2010_blk2020_gj_06(1)/nhgis_blk2010_blk2020_gj_06.csv'
with open(filecw1) as a, open(filecw2) as b, open(f10) as d, open(f20) as e:
    f2010 = list(csv.reader(a, delimiter=',')) # Net changes 2000 - 2010 (output of the first half)
    cw2 = list(csv.reader(b, delimiter=',')) # Crosswalk 2010 - 2020
    d10 = list(csv.reader(d, delimiter=',')) # Census 2010
    d20 = list(csv.reader(e, delimiter=',')) # Census 2020
final2010 = {}
# Parse the net changes from PART 1 from their string form back into real lists, keyed by 2010 block ID, in final2010
for row in f2010:
    final2010[row[0]] = ast.literal_eval(row[1])
print("DONE")

Crosswalks from 2000 to 2010 done, onto 2010 to 2020
Opening Data files
DONE


In part 1 of the SECOND HALF, we repeat what we did in part 1 of the FIRST HALF. The key difference is that, in addition to weighting the 2010 data according to the share of it that falls in each 2020 Census block, we also multiply the 2000 -> 2010 NET CHANGES (calculated in the FIRST HALF and stored for each 2010 Census block) by those same 2010 -> 2020 weights.

Again, as in the FIRST HALF's part 2, we build a dictionary in which each 2020 block has its associated weighted 2010 data. We also build a second dictionary, targetnet0010, which stores for each 2020 block the weighted 2000 -> 2010 net changes from the SECOND HALF's part 1.
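
A toy sketch of the double weighting: the same weight is applied to a block's 2010 counts and to its 2000 -> 2010 net changes, and the weighted net changes ride along as a sublist at the end of the row. All IDs and numbers are invented, and the one-element vectors stand in for the real multi-column data.

```python
# Toy crosswalk rows: [2010 block ID, 2020 block ID, weight]
toy_cw = [["P2010", "Q2020", "0.25"], ["P2010", "R2020", "0.75"]]
counts_2010 = {"P2010": [80.0]}  # 2010 counts per block
net_0010 = {"P2010": [20.0]}     # 2000 -> 2010 net change per block

rows = []
for id10, id20, weight in toy_cw:
    w = float(weight)
    weighted_counts = [c * w for c in counts_2010[id10]]
    weighted_net = [n * w for n in net_0010[id10]]
    # IDs first, then weighted counts, then the weighted net changes as one sublist
    rows.append([id10, id20] + weighted_counts + [weighted_net])

for row in rows:
    print(row)
```

Because the weights for P2010 add up to 1, both its counts and its net change are fully distributed across the two 2020 blocks.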

In [7]:

print("Weighting Data - PART 1")
''' Weighting the 2010 data from the 2020 crosswalks
    Iterate through the crosswalk file. Find a 2010 block in the crosswalk row (cross_row)
    then find its data from d10 database which is row. Take the weight and multiply each
    value by the weight. This shows the share of data represented for the 2020 block, placed
    into the "weighted" or wdata10 database
'''
wnet10 = [["Block ID 2010", "Block ID 2020", "List of Net Changes"]]
wdata10 =[d10[0][58:102]]
wdata10[0].insert(0, "Block ID 2020")
wdata10[0].insert(0, "Block ID 2010")

for cross_row in cw2:
    if cross_row[0][0:8] == 'G0600010': 
        for row in d10:
            if cross_row[0] == row[0]:
                weight = float(cross_row[2])
                line_list = [cross_row[0], cross_row[1]]
                for i in range(58, 102):
                    line_list.append(float(row[i]) * weight)
                # Now apply weights to the net changes, if this 2010 block exists in the 2010 net changes sheet.
                # Rows for blocks without net changes are not added to wdata10.
                if row[0] in final2010:
                    netdata10 = copy.deepcopy(final2010[row[0]])
                    for n, d in enumerate(netdata10):
                        netdata10[n] = d * weight
                    line_list.append(netdata10) # Last element in weighted data should be a sublist of the net changes
                    wdata10.append(line_list)
                break
'''The goal is to recreate the equivalent of 2010 data fitted to a 2020 block.
So we iterate through the 2020 data and find the overlapping 2010 data from the weighted crosswalk above.
We sum the weighted data, then put the weighted sum in a dictionary keyed by its 2020 block.
Once we're done we can just subtract the weighted representation in the dictionary from the 2020 blocks.
These weighted representations are called "targets".

We do the same thing with the net changes. Iterate through the 2020 block labels and find all weighted 2010 crosswalk lines
featuring that 2020 block label. Sum the net changes over every such line, then put the final product in a dictionary under that
2020 block
'''
print("Generating 2010 targets for 2020 data - PART 2")
targetnet0010 = {}
target10to20blks = {}
for row in d20:
    if row[9] == "Alameda County":
        first = True
        for wrow in wdata10:
            if wrow[1] == row[0]:
                # Append name of the 2020 block
                # then summate the fragments of the 2010 weighted data
                # and the 2010 net weighted data (APPLY THIS)
                if first:
                    first = False
                    target10to20blks[row[0]] = wrow[2:46] # 2 - 45 are weighted 2010s data and 46 is a sub list of net changes
                    targetnet0010[row[0]] = wrow[46]
                else:
                    target10to20blks[row[0]] = [float(a) + float(b) for a, b in zip(target10to20blks[row[0]], wrow[2:46])]
                    targetnet0010[row[0]] = [float(a) + float(b) for a, b in zip(targetnet0010[row[0]], wrow[46])]
print("2010 Target Blocks Assembled")
print("DONE")

Weighting Data - PART 1
Generating 2010 targets for 2020 data - PART 2
2010 Target Blocks Assembled
DONE


Now in Part 3 we calculate the net changes as in Part 3 of the FIRST HALF. The key difference is that after we've calculated the 2010 -> 2020 net changes for each 2020 Census block, we look up the 00 -> 10 net changes for that same 2020 Census block in the targetnet0010 dictionary and add them to the 10 -> 20 net changes we just calculated, producing the change over 20 years. The 20-year net change data is stored in finalnet2020.

finalnet2020[row[0]] = [float(a) + float(b) for a, b in zip(content, ntarget)]

This is the line that combines the changes for all the groups. Because every subtraction result has a fixed position in each row (0 to 28), I don't need to write each line out the way I do when pulling data from "row", which comes directly from the Census data sheet. The zip() function pairs the 00 -> 10 and 10 -> 20 values position by position, so they can be summed compactly.
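
On toy numbers, the same zip() pattern looks like this. The three-element vectors are invented; the real ones have 29 positions.

```python
change_1020 = [12.0, -3.0, 4.5]  # toy 2010 -> 2020 changes for one block
change_0010 = [20.0, -5.0, 0.0]  # toy 2000 -> 2010 changes fitted to the same block

# zip() pairs the values by position; each pair is then added
change_0020 = [float(a) + float(b) for a, b in zip(change_1020, change_0010)]
print(change_0020)

# zip() stops at the shorter input without raising an error
truncated = list(zip([1, 2, 3], [10, 20]))
print(truncated)
```

Since zip() silently drops unmatched trailing values, the two lists need to have the same length and the same column order for the sum to be meaningful.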

In [8]:
'''
At this point, we need to calculate the net differences between 2010 to 2020 between the 2020 blocks and the target 2010 recreations.
For every calculation, we can then add that to the 2020's target 2010 net change blocks to calculate the changes from 2020 to 2000, a span of 20 years
'''
print("Calculating net changes - PART 3")
final2020= {}
finalnet2020 = {}
for row in d20:
    if row[0] in target10to20blks and row[0] in targetnet0010:
        target = target10to20blks[row[0]]
        content =[]
        content.append(float(row[59]) - target[0])#Total Pop 0 
        content.append(float(row[61]) - target[2])#White 1
        content.append(float(row[62]) - target[3])#Black 2
        content.append(float(row[63]) - target[4])#Native 3
        content.append(float(row[64]) - target[5])#Asian 4
        content.append( float(row[65]) - target[6])#Hawaiian 5
        content.append(float(row[66]) - target[7])#Other 6
        content.append( float(row[67]) - target[8])#Multi 7
        content.append(float(row[68]) - target[9])#Hispanic 8
        content.append( float(row[76]) - target[17]) #Homes 9
        content.append( float(row[78]) - target[19])#Vacant 10
        content.append( float(row[92]) - target[29])# Renters 11
        content.append( float(row[84]) - target[21])# Homeowners 12
        content.append( float(row[85]) - target[22]) #White Homeowners 13
        content.append( float(row[86]) - target[23])#Black 14
        content.append( float(row[87]) - target[24])#Native 15
        content.append( float(row[88]) - target[25])#Asian 16
        content.append( float(row[89]) - target[26])#Hawaiian 17
        content.append( float(row[90]) - target[27])#Other 18
        content.append( float(row[91]) - target[28])#Multi 19
        content.append( float(row[80]) - target[40]) #Hispanic Homeowners 20
        content.append( float(row[93]) - target[30])#White Renters 21
        content.append( float(row[94]) - target[31])#Black 22
        content.append( float(row[95]) - target[32])#Native 23
        content.append( float(row[96]) - target[33])#Asian 24
        content.append( float(row[97]) - target[34])#Hawaiian 25
        content.append( float(row[98]) - target[35])#Other 26
        content.append( float(row[99]) - target[36])#Multi 27
        content.append( float(row[82]) - target[43])#Hispanic Renter 28
        final2020[row[0]] = content
        # Summating the net changes now, same as above
        #Note that its in the same alignment as the above content but of the 00 - 10 version so ZIP() will do
        ntarget =  targetnet0010[row[0]]
        finalnet2020[row[0]] = [float(a) + float(b) for a, b in zip(content, ntarget)]
print("Calculations finished. Making tracts")
print("DONE")

Calculating net changes - PART 3
Calculations finished. Making tracts
DONE


The algorithm is basically over. Now I just make Census tracts out of the Census blocks and write all the data back to files.

change1020tract.csv: Change from 2010 to 2020 in Tract Form (only used for data verification)

change0020tract.csv: Change from 2000 to 2020 in Tract Form. This will be made into a map on Mapbox.

change0020.csv: Change from 2000 to 2020 in Block Form. Used mostly for verification but also fine grain level analysis.

change1020.csv: Change from 2010 to 2020 in Block Form. (Only used for data verification)

The final data should be in the crosswalk folder. Delete these files and run again to see how they're created.



In [9]:
''' At this point sum up all census block values like vectors for each census tract '''
tract0020 = tract_maker(finalnet2020) # this should be the final product 
tract1020 = tract_maker(final2020) #this tests if 2010 to 2020 c
print("Writing Back")
# Final Tract data
with open('crosswalk/change1020tract.csv', mode='w', newline='') as file0:
    csv_writer = csv.writer(file0)
    for key, value in tract1020.items():
        csv_writer.writerow([key, value])
with open('crosswalk/change0020tract.csv', mode='w', newline='') as file1:
    csv_writer = csv.writer(file1)
    for key, value in tract0020.items():
        csv_writer.writerow([key, value])
# Final Block data
with open('crosswalk/change0020.csv', mode='w', newline='') as file2:
    csv_writer = csv.writer(file2)
    for key, value in finalnet2020.items():
        csv_writer.writerow([key, value])
with open('crosswalk/change1020.csv', mode='w', newline='') as file3:
    csv_writer = csv.writer(file3)
    for key, value in final2020.items():
        csv_writer.writerow([key, value])
print("DONE. Program over.")
'''
With the net values from 2010 to 2020 calculated via a target 2010 value onto a current 2020 block
, add these differences to the net changes. Probably should do it immediately after the net changes'''


Writing Back
DONE. Program over.

