# **Week 3 Census Data Exploration**
### **Jia Ni**
In this assignment, I analyze 2023 race data for all census tracts in LA County, taken from the [ACS Demographic and Housing Estimates](https://data.census.gov/table/ACSDP5Y2023.DP05?g=050XX00US06037$1400000&moe=false) table. Because of RAM limitations, I split the assignment into four notebooks. In this notebook, I examine the population distribution of different racial groups across LA County census tracts using box plots and histograms.

### **Import the libraries**

In [None]:
import pandas as pd
import geopandas as gpd

### **Read the shapefile into the notebook**

In [None]:
tracts = gpd.read_file('data/tl_2023_06037_faces.zip')
tracts.head()

### **Datatype for each column**

In [None]:
tracts.info(verbose=True, show_counts=True)

### **Create a new column "FIPS" as a unique identifier for each census tract**

In [None]:
tracts['FIPS'] = '06' + '037' + tracts['TRACTCE20']

# Select only the "FIPS" and "geometry" columns from the DataFrame
tracts = tracts[['FIPS','geometry']]
tracts.head()
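
As a quick sanity check on the identifier built above, here is a minimal sketch on a toy table. The tract codes are hypothetical, and the sketch assumes `TRACTCE20` holds 6-character strings; `zfill(6)` is only a guard in case a code was ever read as a number and lost its leading zeros.

```python
import pandas as pd

# Hypothetical tract codes standing in for the TRACTCE20 column
toy = pd.DataFrame({"TRACTCE20": ["101110", "980033", "269100"]})

# State (06) + county (037) + 6-digit tract code = 11-character identifier
toy["FIPS"] = "06" + "037" + toy["TRACTCE20"].astype(str).str.zfill(6)

# Every identifier should be exactly 11 characters long
all_valid = bool((toy["FIPS"].str.len() == 11).all())
print(toy)
print("All FIPS codes have 11 characters:", all_valid)
```

The same length check could be run on `tracts['FIPS']` before dissolving.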

### **Merge geometries that share the same "FIPS"**

In [None]:
tracts_grouped = tracts.dissolve(by="FIPS").reset_index()
tracts_grouped.head()

In [None]:
# Detailed information
tracts_grouped.info(verbose=True, show_counts=True)

In [None]:
# Export
tracts_grouped.to_file("data/tracts_grouped.geojson", driver="GeoJSON")

In [None]:
# Read the race CSV; keep FIPS as a string so the leading zero is preserved
race = pd.read_csv('data/2023_race_cleaned.csv', encoding='utf8', dtype={'FIPS': str})
race.head()
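
The `dtype` argument matters because California FIPS codes start with `0`. A small sketch with an in-memory CSV (the rows are hypothetical, not values from `2023_race_cleaned.csv`) shows what happens with and without it:

```python
import io
import pandas as pd

# Hypothetical two-row CSV; the values are made up for illustration
csv_text = "FIPS,DP05_0037E\n06037101110,1200\n06037101122,950\n"

# Without dtype, pandas infers an integer and the leading zero is lost
as_default = pd.read_csv(io.StringIO(csv_text))

# With dtype={'FIPS': str}, the code is kept verbatim
as_string = pd.read_csv(io.StringIO(csv_text), dtype={"FIPS": str})

print(as_default["FIPS"].iloc[0])
print(as_string["FIPS"].iloc[0])
```

An integer FIPS would never match the string FIPS built from the shapefile, so the later merge would return no rows.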

In [None]:
# Detailed information
race.info(verbose=True, show_counts=True)

### **Extract and rename the columns**

In [None]:
# Extract
columns_to_keep = ['FIPS',
                   'DP05_0037E',
                   'DP05_0038E',
                   'DP05_0039E',
                   'DP05_0047E',
                   'DP05_0055E',
                   'DP05_0060E',
                   'DP05_0061E',
                   'DP05_0037PE',
                   'DP05_0038PE',
                   'DP05_0039PE',
                   'DP05_0047PE',
                   'DP05_0055PE',
                   'DP05_0060PE',
                   'DP05_0061PE']
race = race[columns_to_keep]
race.head()

In [None]:
# Rename
race.columns = ['FIPS',
                'White',
                'Black or African American',
                'American Indian and Alaska Native',
                'Asian',
                'Native Hawaiian and Other Pacific Islander',
                'Some Other Race',
                'Two or More Races',
                'White_Percent',
                'Black or African American_Percent',
                'American Indian and Alaska Native_Percent',
                'Asian_Percent',
                'Native Hawaiian and Other Pacific Islander_Percent',
                'Some Other Race_Percent',
                'Two or More Races_Percent']
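
Assigning `race.columns` positionally works as long as the order matches `columns_to_keep`. A possible alternative, sketched here on a hypothetical toy table, is to rename through an explicit mapping so each code is tied to its label regardless of column order. The two code-to-label pairs are taken from the lists above; the data values are made up.

```python
import pandas as pd

# Hypothetical table with a subset of the DP05 columns
toy = pd.DataFrame({
    "FIPS": ["06037101110", "06037101122"],
    "DP05_0037E": [1200, 950],
    "DP05_0047E": [300, 410],
})

# Code-to-label pairs, following the order of the lists above
labels = {
    "DP05_0037E": "White",
    "DP05_0047E": "Asian",
}

# Columns not listed in the mapping (here FIPS) are left unchanged
renamed = toy.rename(columns=labels)
print(renamed.columns.tolist())
```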

### **Identify and remove the mismatched FIPS**

In [None]:
# Identify tracts that appear in the shapefile but not in the race table
tracts_fips = set(tracts_grouped["FIPS"])
race_fips = set(race["FIPS"])
extra_fips = tracts_fips - race_fips
extra_fips
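
The cell above checks one direction only (tracts missing from the race table). As a sketch, the reverse direction can be checked the same way with plain Python sets. The FIPS codes below are hypothetical.

```python
# Hypothetical FIPS codes, not taken from the real data
tracts_fips_toy = {"06037101110", "06037101122", "06037990100"}
race_fips_toy = {"06037101110", "06037101122", "06037137000"}

# Codes with a geometry but no race record
only_in_tracts = sorted(tracts_fips_toy - race_fips_toy)

# Codes with a race record but no geometry
only_in_race = sorted(race_fips_toy - tracts_fips_toy)

print("Only in tracts:", only_in_tracts)
print("Only in race:", only_in_race)
```

If the second list is empty on the real data, every race record has a matching geometry.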

In [None]:
# Remove
tracts_cleaned = tracts_grouped[~tracts_grouped["FIPS"].isin(extra_fips)]

# Output cleaned data
tracts_cleaned.info(verbose=True, show_counts=True)

### **Merge the two dataframes based on "FIPS"**

In [None]:
tracts_race = tracts_cleaned.merge(race, on="FIPS")
tracts_race.head()
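
One way to verify a merge like this is to use the `indicator` and `validate` arguments of the pandas merge. The sketch below uses plain DataFrames with hypothetical values rather than the real `tracts_cleaned` and `race` tables.

```python
import pandas as pd

# Hypothetical stand-ins for the geometry table and the race table
left = pd.DataFrame({"FIPS": ["06037101110", "06037101122", "06037990100"]})
right = pd.DataFrame({"FIPS": ["06037101110", "06037101122"],
                      "White": [1200, 950]})

# validate raises an error if FIPS is duplicated on either side;
# indicator adds a _merge column that records where each row came from
check = left.merge(right, on="FIPS", how="left",
                   indicator=True, validate="one_to_one")

# Rows that found no partner in the race table
unmatched = check.loc[check["_merge"] == "left_only", "FIPS"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that the earlier cleanup step removed every tract without race data.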

### **Export the GeoDataFrame to a GeoJSON file**

In [None]:
tracts_race.to_file("data/tracts_race.geojson", driver="GeoJSON")

### **Import the plotting library**

In [None]:
import matplotlib.pyplot as plt

### **Create a box plot of the population distribution by race**

In [None]:
race_columns = ["White",
                "Black or African American",
                "American Indian and Alaska Native",
                "Asian",
                "Native Hawaiian and Other Pacific Islander",
                "Some Other Race",
                "Two or More Races"]

In [None]:
plt.figure(figsize=(16, 8))
tracts_race[race_columns].boxplot(patch_artist=True,
                                  boxprops=dict(facecolor="lightyellow", color='#b0ab9b', linewidth=1.5, alpha=0.7),
                                  medianprops=dict(color='#80795b', linewidth=3),
                                  whiskerprops=dict(color='#b0ab9b', linewidth=1.5),
                                  capprops=dict(color='#b0ab9b', linewidth=2),
                                  flierprops=dict(marker="x", markeredgecolor='#b0ab9b', markersize=6))

plt.title("Population Distribution by Race in LA County", fontsize=14, fontweight='bold', pad=15)
plt.ylabel("Population Count", fontsize=12, fontweight='bold', labelpad=10)
plt.xlabel("Race", fontsize=12, fontweight='bold', labelpad=10)
plt.yticks(fontsize=10)
plt.xticks(ticks=range(1, len(race_columns) + 1),
           labels=["White", "Black or\nAfrican American", "American Indian and\nAlaska Native", "Asian", "Native Hawaiian and\nOther Pacific Islander", "Some Other Race", "Two or\nMore Races"],
           fontsize=10)
plt.grid(axis="y", linestyle="--", alpha=0.7)

plt.savefig("population_distribution_by_race.png", dpi=300, bbox_inches="tight")

plt.show()
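
To read the box plot, it helps to know how the "x" markers are chosen. With the default settings, whiskers extend to the furthest data points within 1.5 times the interquartile range (IQR) of the box, and anything beyond is drawn as an outlier. The sketch below reproduces that rule on a hypothetical series of tract counts.

```python
import pandas as pd

# Hypothetical population counts for nine tracts
counts = pd.Series([0, 120, 340, 560, 800, 950, 1100, 1500, 5200], name="Asian")

# Box edges: first and third quartiles
q1 = counts.quantile(0.25)
q3 = counts.quantile(0.75)
iqr = q3 - q1

# Default whisker limits: 1.5 * IQR beyond the box
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Points outside the limits are the ones drawn as flier markers
outliers = counts[(counts < lower) | (counts > upper)]
print(q1, q3, upper)
print(outliers.tolist())
```

Applied to a column of `tracts_race`, the same calculation would list the tracts that show up as outliers in the figure.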

### **Create histograms of the population distribution of each racial group across census tracts**

In [None]:
fig, axes = plt.subplots(2, 4, figsize=(24, 12), gridspec_kw={'hspace': 0.3, 'wspace': 0.3})
axes = axes.flatten()

colors = ['#ceebf2', '#deedc0', '#768fe8', '#ff9eb5', '#fcac77', '#ffeb9c', "lightgray"]

for i, (column, color) in enumerate(zip(race_columns, colors)):
    ax = axes[i]
    ax.hist(tracts_race[column], bins=25, color=color, edgecolor="gray", alpha=0.7)
    ax.set_title(column, fontsize=12, fontweight="bold")
    ax.set_xlabel("Population", fontsize=10, fontweight="bold")
    ax.set_ylabel("Frequency", fontsize=10, fontweight="bold")
    ax.grid(axis="y", linestyle="--", alpha=0.7)

for j in range(len(race_columns), len(axes)):
    fig.delaxes(axes[j])

plt.tight_layout()

plt.savefig("histograms_of_population_distributions_of_different_racial_groups.png", dpi=300, bbox_inches="tight")

plt.show()
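
Each histogram above uses `bins=25`, which splits the range between the smallest and largest value of that column into 25 equal-width bins. Because the ranges differ by group, so do the bin widths. The sketch below shows the same binning with `numpy.histogram` on hypothetical values.

```python
import numpy as np

# Hypothetical tract populations for one group
values = np.array([0, 50, 120, 300, 450, 800, 1200, 2500])

# 25 equal-width bins between the minimum and maximum value
counts, edges = np.histogram(values, bins=25)
bin_width = edges[1] - edges[0]

print("Bin width:", bin_width)
print("Tracts in the first bin:", counts[0])
```

Here each bin spans 100 people. A group with a larger maximum gets proportionally wider bins, which is worth keeping in mind when comparing the panels side by side.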