This notebook formats the station geography data for consistency with the Census race and ethnicity dataset P9. It then joins the race and ethnicity data to the station geography data in order to disaggregate the population living around stations by race and ethnicity.



# **Pad the GeoIDs for Consistency with FIPS Codes**

In [None]:
# Import necessary library
import pandas as pd

# Load the CSV file (update file_path if your file lives elsewhere).
# Reading the FIPS component columns as strings keeps any existing leading zeros.
file_path = '/content/PR_2010Facility.csv'
fips_cols = ['STATEFP10', 'COUNTYFP10', 'TRACTCE10', 'BLOCKCE']
facilities_data = pd.read_csv(file_path, dtype={col: str for col in fips_cols})

# Zero-pad each component to its census FIPS width:
# state = 2 digits, county = 3, tract = 6, block = 4
facilities_data['STATEFP10'] = facilities_data['STATEFP10'].str.zfill(2)
facilities_data['COUNTYFP10'] = facilities_data['COUNTYFP10'].str.zfill(3)
facilities_data['TRACTCE10'] = facilities_data['TRACTCE10'].str.zfill(6)
facilities_data['BLOCKCE'] = facilities_data['BLOCKCE'].str.zfill(4)

# Create the 15-digit GeoID column by concatenating the padded components
facilities_data['GeoID'] = (
    facilities_data['STATEFP10']
    + facilities_data['COUNTYFP10']
    + facilities_data['TRACTCE10']
    + facilities_data['BLOCKCE']
)

# Save the modified dataframe to a new CSV file (update output_file_path as needed)
output_file_path = '/content/PR_2010Facility1.csv'
facilities_data.to_csv(output_file_path, index=False)

# Show the first few rows to confirm the changes
facilities_data.head()

| Unnamed: 0 | Facility I | STATEFP10 | COUNTYFP10 | TRACTCE10 | BLOCKCE | BLOCKID10 | PARTFLG | HOUSING10 | POP10 | GeoID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3938 | 72 | 21 | 30300 | 3007 | 720000000000000.0 | Block 3007 | G5040 |  | 720210303003007 |
| 1 | 3939 | 72 | 21 | 30300 | 3007 | 720000000000000.0 | Block 3007 | G5040 |  | 720210303003007 |
| 2 | 3938 | 72 | 21 | 30300 | 3006 | 720000000000000.0 | Block 3006 | G5040 |  | 720210303003006 |
| 3 | 3939 | 72 | 21 | 30300 | 3006 | 720000000000000.0 | Block 3006 | G5040 |  | 720210303003006 |
| 4 | 3938 | 72 | 21 | 30300 | 2052 | 720000000000000.0 | Block 2052 | G5040 |  | 720210303002052 |
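
To make the padding rule concrete, here is a small sketch of how a 2010 census block GeoID is assembled from its components: a 2-digit state code, a 3-digit county code, a 6-digit tract code and a 4-digit block code, 15 digits in all. The helper `build_block_geoid` is hypothetical and not part of the original notebook. It also rejects values such as `21.0`, which is what a FIPS column looks like if pandas parsed it as a float.

```python
# Hypothetical helper (not part of the original notebook): builds a 2010 census
# block GeoID from its FIPS components and checks that the result is 15 digits.
FIPS_WIDTHS = {"state": 2, "county": 3, "tract": 6, "block": 4}


def build_block_geoid(state, county, tract, block):
    """Zero-pad each component to its FIPS width and concatenate them."""
    parts = {"state": state, "county": county, "tract": tract, "block": block}
    padded = []
    for name, value in parts.items():
        text = str(value).strip()
        width = FIPS_WIDTHS[name]
        # Reject non-digit input (e.g. "21.0" from a float column) and overlong codes
        if not text.isdigit() or len(text) > width:
            raise ValueError(f"{name} code {value!r} is not a {width}-digit number")
        padded.append(text.zfill(width))
    geoid = "".join(padded)
    assert len(geoid) == sum(FIPS_WIDTHS.values())  # 2 + 3 + 6 + 4 = 15
    return geoid


# Components taken from the first row of the table above
print(build_block_geoid(72, 21, 30300, 3007))  # 720210303003007
```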


In [None]:
from google.colab import files

# Download the padded file written in the previous cell (output_file_path)
files.download(output_file_path)


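Zero padding only survives while the IDs stay strings. When a padded CSV is read back with a plain `pd.read_csv`, pandas infers integer types and drops the leading zeros. This is harmless for Puerto Rico (state FIPS 72) but not for a state such as Alabama (state FIPS 01), and it is presumably why the join below strips leading zeros from both GeoID columns before matching. A minimal sketch using a hypothetical one-row CSV held in memory:

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for a padded output file. The GeoID is
# hypothetical (Alabama, state FIPS 01) because Puerto Rico IDs start with 7.
csv_text = "GeoID,STATEFP10\n010010201001000,01\n"

# Default parsing infers integers, so the leading zeros are dropped
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df.loc[0, "GeoID"], default_df.loc[0, "STATEFP10"])  # 10010201001000 1

# Declaring the columns as strings keeps the padding intact
string_df = pd.read_csv(io.StringIO(csv_text), dtype={"GeoID": str, "STATEFP10": str})
print(string_df.loc[0, "GeoID"], string_df.loc[0, "STATEFP10"])  # 010010201001000 01
```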

## **Join the Padded File with the Demographics File**

In [None]:
import pandas as pd

# Load the datasets, reading GeoID as a string so it is not parsed as a number.
# low_memory=False lets pandas infer each column's type from the whole file,
# which avoids the mixed-type warning it can raise while reading the block file.
facilities_df = pd.read_csv('/content/PR_2010Facility1.csv', dtype={'GeoID': str})
blocks_df = pd.read_csv('/content/PR_2010RE1.csv', dtype={'GeoID': str}, low_memory=False)

# Normalize 'GeoID' in both tables the same way (string with leading zeros removed)
# so the keys match even if one file was saved without zero padding
facilities_df['GeoID'] = facilities_df['GeoID'].str.lstrip('0')
blocks_df['GeoID'] = blocks_df['GeoID'].str.lstrip('0')

# Inner join: keeps only facility blocks that have a matching demographics record
merged_df = pd.merge(facilities_df, blocks_df, on='GeoID', how='inner')

# Export the joined file
merged_df.to_csv('/content/2010_joined_data_PR.csv', index=False)

print("Join operation completed. The merged file is saved as '2010_joined_data_PR.csv'.")


Join operation completed. The merged file is saved as '2010_joined_data_PR.csv'.
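Because the join uses `how='inner'`, facility blocks without a matching demographics record are dropped without any message. One way to check coverage, sketched below under the assumption that both tables share the same `GeoID` key, is to repeat the merge as a left join with `indicator=True` and count the rows marked `left_only`. The two small DataFrames are hypothetical stand-ins for `facilities_df` and `blocks_df`; the GeoID ending in `9999` is invented.

```python
import pandas as pd

# Hypothetical stand-ins for facilities_df and blocks_df: three facility blocks,
# two of which have a demographics record
facilities = pd.DataFrame({
    "Facility I": [3938, 3939, 3938],
    "GeoID": ["720210303003007", "720210303003007", "720210303009999"],
})
blocks = pd.DataFrame({
    "GeoID": ["720210303003007", "720210303003006"],
    "Total": [12, 5],
})

# A left join with indicator=True adds a _merge column recording each row's source
check = pd.merge(facilities, blocks, on="GeoID", how="left", indicator=True)
counts = check["_merge"].value_counts()
print(counts["both"], counts["left_only"])  # 2 1

# Facility blocks that an inner join would silently drop
unmatched = check.loc[check["_merge"] == "left_only", ["Facility I", "GeoID"]]
print(unmatched)
```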


In [None]:
merged_df.head()

| Unnamed: 0 | Facility I | STATEFP10 | COUNTYFP10 | TRACTCE10 | BLOCKCE | BLOCKID10 | PARTFLG | HOUSING10 | POP10 | GeoID | Geographic Area Name | Total | Total!!Hispanic or Latino | Total!!Not Hispanic or Latino!!Population of one race!!White alone | Total!!Not Hispanic or Latino!!Population of one race!!Black or African American alone | Total!!Not Hispanic or Latino!!Population of one race!!American Indian and Alaska Native alone | Total!!Not Hispanic or Latino!!Population of one race!!Asian alone | Total!!Not Hispanic or Latino!!Population of one race!!Native Hawaiian and Other Pacific Islander alone | Total!!Not Hispanic or Latino!!Population of one race!!Some Other Race alone | Total!!Not Hispanic or Latino!!Two or More Races |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3938 | 72 | 21 | 30300 | 3007 | 720000000000000.0 | Block 3007 | G5040 |  | 720210303003007 | Block 3007, Block Group 3, Census Tract 303, B... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 3939 | 72 | 21 | 30300 | 3007 | 720000000000000.0 | Block 3007 | G5040 |  | 720210303003007 | Block 3007, Block Group 3, Census Tract 303, B... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3938 | 72 | 21 | 30300 | 3006 | 720000000000000.0 | Block 3006 | G5040 |  | 720210303003006 | Block 3006, Block Group 3, Census Tract 303, B... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3939 | 72 | 21 | 30300 | 3006 | 720000000000000.0 | Block 3006 | G5040 |  | 720210303003006 | Block 3006, Block Group 3, Census Tract 303, B... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 3938 | 72 | 21 | 30300 | 2052 | 720000000000000.0 | Block 2052 | G5040 |  | 720210303002052 | Block 2052, Block Group 2, Census Tract 303, B... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
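
The merged table has one row per facility-block pair, so disaggregating the population around each station by race and ethnicity still requires summing the block counts per facility. The notebook stops at the join; below is a minimal sketch of that aggregation, assuming `Facility I` identifies the station and that each block should be counted in full. The rows and counts are invented for illustration and use only a subset of the P9 columns.

```python
import pandas as pd

# Hypothetical joined rows: one row per (facility, block) pair.
# Values are invented for illustration only.
merged = pd.DataFrame({
    "Facility I": [3938, 3938, 3939],
    "GeoID": ["720210303003007", "720210303003006", "720210303003007"],
    "Total": [10, 30, 10],
    "Total!!Hispanic or Latino": [9, 27, 9],
    "Total!!Not Hispanic or Latino!!Population of one race!!White alone": [1, 3, 1],
})

# All population count columns start with "Total" in the P9 layout
count_cols = [c for c in merged.columns if c.startswith("Total")]

# Sum block counts per facility (whole blocks; partial blocks are not apportioned)
by_facility = merged.groupby("Facility I")[count_cols].sum()

# Express each group as a share of the facility's total population
shares = by_facility[count_cols[1:]].div(by_facility["Total"], axis=0)
print(by_facility)
print(shares.round(3))
```

A facility whose blocks all have a `Total` of 0, like the rows in the preview above, gets `NaN` shares, since pandas evaluates 0/0 as `NaN`. Blocks that fall only partly inside a station area would need to be apportioned before summing, which this sketch does not attempt.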



In [None]:
from google.colab import files

# Download the joined file saved by the join cell above
files.download('/content/2010_joined_data_PR.csv')

