# Extreme Heat Data Analysis for CCEE Whitepaper

## Python Library and Data Imports

In [18]:
import pandas as pd
from utils import *
heat = pd.read_csv('../datasets/highheat.csv')
school_data = pd.read_csv('../datasets/equity.csv')

## Dataset Merging, Filtering, and Cleaning

Combined the high heat dataset with the California school district dataset, joining on county name, to create a single master dataset.

In [19]:
master = pd.merge(heat, school_data, left_on='County Name', right_on='County')
# drop rows whose 'Number of Schools' entry contains '-'
master = master[~master['Number of Schools'].str.contains('-')]
# strip leading/trailing whitespace from column names
master.columns = master.columns.str.strip()

# first 5 rows of the master dataset (one row per California school district)
master.head(5)

Unnamed: 0,County Name,Climate Biome(s),CDPH Climate Impact Regions,High Heat Threshold Temperature (F),2020-2029 (2025) Average Days,2030-2039 (2035) Average Days,2040-2049 (2045) Average Days,2050-2059 (2055) Average Days,2020-2029 Average Days (Upper Range),2030-2039 Average Days (Upper Range),...,Overall District Environmental Initiatives Score\nUp to 10 Points,District-Wide Environmental Coordinator Staff\n (Whole System Responsibilities)\n1 (Yes) 0 (No),District-Wide Environmental Campus (facilities and grounds) Staff\n1 (Yes) 0 (No),"District-Wide Environmental Curriculum, Community, and Culture Staff\n1 (Yes) 0 (No)",Site-Level Environmental Staff\n1 (Yes) 0 (No),Garden Coordinator (district or site) \n1 (Yes) 0 (No),District and Site-Level Environmental Staff\nLinks and Notes,Staffing Subtotal,Staffing Score,Environmental and Climate Action Score (20 points)
0,Alameda,"Urban, Coastal, Grassland",Bay Area,92.7,11.0,16.0,16.0,21.0,21,26,...,8.5,0,0,0,0.0,0.0,,0.0,0.0,12.5
1,Alameda,"Urban, Coastal, Grassland",Bay Area,92.7,11.0,16.0,16.0,21.0,21,26,...,6.5,1,1,0,0.0,0.0,Environmental Action Committee \n\nSustainabil...,3.0,3.0,12.5
2,Alameda,"Urban, Coastal, Grassland",Bay Area,92.7,11.0,16.0,16.0,21.0,21,26,...,9.5,1,1,0,1.0,1.0,Sustainability Coordinator and Maintenance Man...,5.0,4.0,16.0
3,Alameda,"Urban, Coastal, Grassland",Bay Area,92.7,11.0,16.0,16.0,21.0,21,26,...,7.5,0,0,0,0.0,0.0,,0.0,0.0,10.5
4,Alameda,"Urban, Coastal, Grassland",Bay Area,92.7,11.0,16.0,16.0,21.0,21,26,...,7.5,0,0,0,0.0,0.0,,0.0,0.0,10.5
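
An inner merge silently drops rows whose county names do not match across the two files (for example, because of stray whitespace). The following is a minimal sketch of one way to check for that, using small made-up frames rather than the real CSVs. The helper name `unmatched_keys` and the toy values are assumptions for illustration and are not part of `utils`.

```python
import pandas as pd

def unmatched_keys(left, right, left_on, right_on):
    """Return key values that appear in only one of the two frames (hypothetical helper)."""
    merged = pd.merge(
        left[[left_on]].drop_duplicates(),
        right[[right_on]].drop_duplicates(),
        left_on=left_on,
        right_on=right_on,
        how="outer",
        indicator=True,  # adds a '_merge' column: left_only / right_only / both
    )
    only_left = merged.loc[merged["_merge"] == "left_only", left_on].tolist()
    only_right = merged.loc[merged["_merge"] == "right_only", right_on].tolist()
    return only_left, only_right

# Toy data standing in for highheat.csv and equity.csv (illustrative values only)
heat_demo = pd.DataFrame({"County Name": ["Alameda", "Butte", "Kern"]})
school_demo = pd.DataFrame({"County": ["Alameda", "Butte ", "Butte"]})

left_only, right_only = unmatched_keys(heat_demo, school_demo, "County Name", "County")
print("Only in heat data:", left_only)
print("Only in school data:", right_only)
```

Any names reported here would be dropped by the inner merge, so they are worth inspecting before the filtering step.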


Cleaned the dataset by correcting missing values and type errors and by selecting the relevant columns. At this point the primary dataset contains 936 rows, one for each California school district. Each column represents either an identifier (e.g., county or district name) or climate impact data (e.g., number of days over 87 degrees in 2025).

In [20]:
# clean up
master = master_update(master, '% Students of Color', percent_fixer)
master = master_update(master, '% English Learners', percent_fixer)

# master dataframe after filtering
print(f'There are {master.shape[0]} school districts after filtering.')

There are 936 school districts after filtering.
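
The helpers `master_update` and `percent_fixer` come from the project's `utils` module, which is not shown in this notebook. The sketch below is only a guess at what they might do, assuming `percent_fixer` turns strings such as `'45.2%'` into floats and `master_update` applies a fixer function to one column. The real implementations may differ.

```python
import pandas as pd

def percent_fixer(value):
    """Assumed behavior: convert a string like '45.2%' to 45.2, or NaN if unparseable."""
    try:
        return float(str(value).strip().rstrip('%'))
    except ValueError:
        return float('nan')

def master_update(df, column, fixer):
    """Assumed behavior: return a copy of df with fixer applied element-wise to one column."""
    out = df.copy()
    out[column] = out[column].apply(fixer)
    return out

# Toy column with the kinds of values a percentage field might contain
demo = pd.DataFrame({'% English Learners': ['12.5%', ' 40% ', 'N/A']})
cleaned = master_update(demo, '% English Learners', percent_fixer)
print(cleaned)
```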


## District-Level Extreme Heat Analysis

Counted the districts expected to experience at least 30, 60, 90, and 120 days a year over 87 degrees. The count was repeated for each of the four decade periods (2020-2029 through 2050-2059).

In [21]:
# cols, years, and thresholds
respective_cols = [12, 13, 14, 15]
years = [2025, 2035, 2045, 2055]
thresholds = [30, 60, 90, 120]

# Number of districts expected to experience X amount of days above 87 degrees.
district87 = analyze(master, respective_cols, years, thresholds, "87 degrees", "districts")

district87

846 districts are expected to experience at least 30 days over 87 degrees by 2025.
628 districts are expected to experience at least 60 days over 87 degrees by 2025.
469 districts are expected to experience at least 90 days over 87 degrees by 2025.
254 districts are expected to experience at least 120 days over 87 degrees by 2025.
877 districts are expected to experience at least 30 days over 87 degrees by 2035.
815 districts are expected to experience at least 60 days over 87 degrees by 2035.
547 districts are expected to experience at least 90 days over 87 degrees by 2035.
260 districts are expected to experience at least 120 days over 87 degrees by 2035.
877 districts are expected to experience at least 30 days over 87 degrees by 2045.
823 districts are expected to experience at least 60 days over 87 degrees by 2045.
628 districts are expected to experience at least 90 days over 87 degrees by 2045.
358 districts are expected to experience at least 120 days over 87 degrees by 2045.
8

Unnamed: 0,87 degrees(districts),2025,2035,2045,2055
0,30 days,846,877,877,877
1,60 days,628,815,823,822
2,90 days,469,547,628,592
3,120 days,254,260,358,319
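
The `analyze` helper is defined in `utils` and is not shown here. Judging only from its printed messages and the returned table, it appears to count the rows that meet each threshold in each year column. The sketch below reproduces that behavior on toy data. The name `analyze_sketch`, the use of `>=` for "at least", and the demo values are assumptions.

```python
import pandas as pd

def analyze_sketch(df, col_positions, years, thresholds, label, unit):
    """Count rows meeting each day threshold, per projection year (assumed logic)."""
    table = {f"{label}({unit})": [f"{t} days" for t in thresholds]}
    for pos, year in zip(col_positions, years):
        counts = []
        for t in thresholds:
            # "at least t days" is interpreted as value >= t
            n = int((df.iloc[:, pos] >= t).sum())
            print(f"{n} {unit} are expected to experience at least {t} days over {label} by {year}.")
            counts.append(n)
        table[str(year)] = counts
    return pd.DataFrame(table)

# Three made-up districts with projected days over 87 degrees
demo = pd.DataFrame({
    "District": ["A", "B", "C"],
    "days_2025": [25, 65, 130],
    "days_2035": [35, 95, 140],
})
result = analyze_sketch(demo, [1, 2], [2025, 2035], [30, 60, 90, 120], "87 degrees", "districts")
print(result)
```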


## County-Level Extreme Heat Analysis

Derived county-level data by grouping the master dataset by county and averaging the upper-range heat columns.

In [22]:
columns_to_average = master.columns[8:16]  # positions 8-15 (0-indexed): the eight '(Upper Range)' columns

# Group by 'County Name' and calculate the average for specified columns
master_county = master.groupby('County Name', as_index=False)[columns_to_average].mean()

# first 5 rows of county-level extreme heat data
master_county.head(5)

Unnamed: 0,County Name,2020-2029 Average Days (Upper Range),2030-2039 Average Days (Upper Range),2040-2049 Average Days (Upper Range),2050-2059 Average Days (Upper Range),2020-2029 Average Days Above 87 (Upper Range),2030-2039 Average Days Above 87 (Upper Range),2040-2049 Average Days Above 87 (Upper Range),2050-2059 Average Days Above 87 (Upper Range)
0,Alameda,21.0,26.0,29.0,32.0,44.0,65.0,80.0,69.0
1,Alpine,29.0,41.0,46.0,35.0,9.0,8.0,10.0,12.0
2,Amador,26.0,35.0,50.0,42.0,95.0,102.0,124.0,118.0
3,Butte,27.0,36.0,44.0,43.0,121.0,127.0,151.0,133.0
4,Calaveras,26.0,36.0,49.0,42.0,102.0,110.0,131.0,124.0
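
Positional slices such as `master.columns[8:16]` break silently if the column order of the source file changes. One possible alternative, sketched here on a toy frame, is to select the columns by a name pattern before grouping. The `(Upper Range)` suffix is taken from the column names shown above; the demo frame is a small stand-in, and the Butte value in the non-upper-range column is illustrative only.

```python
import pandas as pd

# Toy stand-in for the master dataset (a few columns, three district rows)
demo = pd.DataFrame({
    "County Name": ["Alameda", "Alameda", "Butte"],
    "2020-2029 (2025) Average Days": [11.0, 11.0, 14.0],
    "2020-2029 Average Days (Upper Range)": [21, 21, 27],
    "2020-2029 Average Days Above 87 (Upper Range)": [44, 44, 121],
})

# Select the upper-range columns by name instead of by position
upper_range_cols = [c for c in demo.columns if c.endswith("(Upper Range)")]

# Average the selected columns within each county
county_demo = demo.groupby("County Name", as_index=False)[upper_range_cols].mean()
print(county_demo)
```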


Counted the counties expected to experience at least 30, 60, 90, and 120 days a year over 87 degrees, again for each of the four decade periods.

In [23]:
# Number of counties expected to experience X amount of days above 87 degrees.
respective_cols = [5, 6, 7, 8]

county87 = analyze(master_county, respective_cols, years, thresholds, "over 87 degrees", "county")

county87

50 county are expected to experience at least 30 days over over 87 degrees by 2025.
37 county are expected to experience at least 60 days over over 87 degrees by 2025.
26 county are expected to experience at least 90 days over over 87 degrees by 2025.
16 county are expected to experience at least 120 days over over 87 degrees by 2025.
52 county are expected to experience at least 30 days over over 87 degrees by 2035.
47 county are expected to experience at least 60 days over over 87 degrees by 2035.
30 county are expected to experience at least 90 days over over 87 degrees by 2035.
17 county are expected to experience at least 120 days over over 87 degrees by 2035.
52 county are expected to experience at least 30 days over over 87 degrees by 2045.
48 county are expected to experience at least 60 days over over 87 degrees by 2045.
36 county are expected to experience at least 90 days over over 87 degrees by 2045.
24 county are expected to experience at least 120 days over over 87 degree

Unnamed: 0,over 87 degrees(county),2025,2035,2045,2055
0,30 days,50,52,52,52
1,60 days,37,47,48,47
2,90 days,26,30,36,33
3,120 days,16,17,24,20


Identified the counties most vulnerable to high heat in two ways: the counties expected to experience the most high heat days, both currently (2020-2029) and in 2050-2059, and the counties with the largest increase in high heat days from 2025 to 2055.

In [24]:
# Notable counties at risk for high heat
most_days_HHT_currently = master_county.sort_values(by=master_county.columns[1], ascending=False).head(10).iloc[:, 0].to_numpy()
most_days_HHT_2050 = master_county.sort_values(by=master_county.columns[4], ascending=False).head(10).iloc[:, 0].to_numpy()
most_days_87_currently = master_county.sort_values(by=master_county.columns[5], ascending=False).head(10).iloc[:, 0].to_numpy()
most_days_87_2050 = master_county.sort_values(by=master_county.columns[8], ascending=False).head(10).iloc[:, 0].to_numpy()

print("Notable counties at risk for high heat:")
print("----------------")
print('Most days above HHT currently:\n', most_days_HHT_currently)
print('\nMost days above HHT by 2050:\n', most_days_HHT_2050)
print('\nMost days over 87 currently:\n', most_days_87_currently)
print('\nMost days over 87 by 2050:\n', most_days_87_2050)

Notable counties at risk for high heat:
----------------
Most days above HHT currently:
 ['Riverside' 'Imperial' 'San Bernardino' 'Los Angeles' 'Kern' 'Inyo'
 'Tulare' 'Mono' 'San Diego' 'Fresno']

Most days above HHT by 2050:
 ['Los Angeles' 'Imperial' 'Tulare' 'Inyo' 'Kern' 'Ventura' 'San Diego'
 'Riverside' 'San Luis Obispo' 'Fresno']

Most days over 87 currently:
 ['Imperial' 'Riverside' 'San Bernardino' 'Kings' 'Sutter' 'Merced' 'Yolo'
 'Sacramento' 'Colusa' 'San Joaquin']

Most days over 87 by 2050:
 ['Imperial' 'Riverside' 'San Bernardino' 'Kings' 'Sutter' 'Sacramento'
 'Merced' 'Yolo' 'Inyo' 'San Joaquin']
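
The ranking step can also be written with `DataFrame.nlargest` and an explicit column name, which avoids relying on column positions. This is a sketch under that assumption, not the notebook's own method. The helper name `top_counties` is hypothetical, and the five demo rows reuse the 2050-2059 values from the county-level table above.

```python
import pandas as pd

def top_counties(df, value_col, n=10, name_col="County Name"):
    """Return the names of the n rows with the largest values in value_col (hypothetical helper)."""
    return df.nlargest(n, value_col)[name_col].to_numpy()

col_2055 = "2050-2059 Average Days Above 87 (Upper Range)"
demo = pd.DataFrame({
    "County Name": ["Alameda", "Alpine", "Amador", "Butte", "Calaveras"],
    col_2055: [69.0, 12.0, 118.0, 133.0, 124.0],
})

top3 = top_counties(demo, col_2055, n=3)
print(top3)
```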


In [25]:
# Counties with the largest expected increase in number of high heat days from 2025 to 2055
# columns 5-8 hold days above 87 degrees; columns 1-4 hold days above the high heat threshold (HHT)
master_county['Difference_87'] = master_county.iloc[:, 8] - master_county.iloc[:, 5]
top_5_diff_87 = master_county.sort_values(by='Difference_87', ascending=False).head(5).iloc[:, 0].to_numpy()
master_county['Difference_HHT'] = master_county.iloc[:, 4] - master_county.iloc[:, 1]
top_5_diff_HHT = master_county.sort_values(by='Difference_HHT', ascending=False).head(5).iloc[:, 0].to_numpy()

print("Counties with the largest expected increase in number of high heat days from 2025 to 2055:")
print("----------------")
print('Using 87 Degree Days:\n', top_5_diff_87)
print('\nUsing High Heat Threshold:\n', top_5_diff_HHT)

Counties with the largest expected increase in number of high heat days from 2025 to 2055:
----------------
Using 87 Degree Days:
 ['Sonoma' 'Monterey' 'Alameda' 'San Luis Obispo' 'Contra Costa']

Using High Heat Threshold:
 ['San Luis Obispo' 'Ventura' 'San Benito' 'Merced' 'San Joaquin']


## School-Level and Student-Level Extreme Heat Analysis

Cleaned the enrollment and school-count columns, then summed them by county and joined the totals to the county-level heat data:

In [26]:
# master data set with information on schools and student enrollment

# remove thousands separators before passing enrollment to int_fixer
master["Student Enrollment"] = int_fixer(master["Student Enrollment"].str.replace(',', '').astype(str))

# filter out rows where 'Number of Schools' contains '-' (copy to avoid assigning into a view)
master = master[~master['Number of Schools'].str.contains('-')].copy()

# fix 'Number of Schools' column
master['Number of Schools'] = int_fixer(master['Number of Schools'].astype(str))

# county-level totals of enrollment and school counts
grouped_master = master.groupby('County Name').agg({
    'Student Enrollment': 'sum',
    'Number of Schools': 'sum'
}).reset_index()

new_master_county = master_county.merge(grouped_master, on='County Name', how='left')

new_master_county.head(5)

Unnamed: 0,County Name,2020-2029 Average Days (Upper Range),2030-2039 Average Days (Upper Range),2040-2049 Average Days (Upper Range),2050-2059 Average Days (Upper Range),2020-2029 Average Days Above 87 (Upper Range),2030-2039 Average Days Above 87 (Upper Range),2040-2049 Average Days Above 87 (Upper Range),2050-2059 Average Days Above 87 (Upper Range),Difference_87,Difference_HHT,Student Enrollment,Number of Schools
0,Alameda,21.0,26.0,29.0,32.0,44.0,65.0,80.0,69.0,25.0,11.0,197983,369
1,Alpine,29.0,41.0,46.0,35.0,9.0,8.0,10.0,12.0,3.0,6.0,68,3
2,Amador,26.0,35.0,50.0,42.0,95.0,102.0,124.0,118.0,23.0,16.0,4107,13
3,Butte,27.0,36.0,44.0,43.0,121.0,127.0,151.0,133.0,12.0,16.0,27937,80
4,Calaveras,26.0,36.0,49.0,42.0,102.0,110.0,131.0,124.0,22.0,16.0,5275,22


Performed similar calculations for the total number of students and schools that would face these same climate impacts, using the county totals of enrollment and school counts. This means calculating the total number of students and school campuses in counties expected to experience at least 30, 60, 90, and 120 days a year over 87 degrees.

In [27]:
# Number of students expected to experience X amount of days above 87 degrees.
respective_cols = [5,6,7,8]

students87 = analyze_students_or_school(new_master_county, respective_cols, years, thresholds, "over 87 degress", "students")

students87

5176641 students are expected to experience at least 30 days over over 87 degress by 2025.
4421294 students are expected to experience at least 60 days over over 87 degress by 2025.
3602244 students are expected to experience at least 90 days over over 87 degress by 2025.
1667566 students are expected to experience at least 120 days over over 87 degress by 2025.
5272287 students are expected to experience at least 30 days over over 87 degress by 2035.
5045530 students are expected to experience at least 60 days over over 87 degress by 2035.
3884999 students are expected to experience at least 90 days over over 87 degress by 2035.
1725133 students are expected to experience at least 120 days over over 87 degress by 2035.
5272287 students are expected to experience at least 30 days over over 87 degress by 2045.
5158987 students are expected to experience at least 60 days over over 87 degress by 2045.
4396158 students are expected to experience at least 90 days over over 87 degress by 204

Unnamed: 0,over 87 degress(students),2025,2035,2045,2055
0,30 days,5176641,5272287,5272287,5272287
1,60 days,4421294,5045530,5158987,5158588
2,90 days,3602244,3884999,4396158,4387583
3,120 days,1667566,1725133,2234655,2156662


In [28]:
# Number of schools expected to experience X amount of days above 87 degrees.
respective_cols = [5,6,7,8]

schools87 = analyze_students_or_school(new_master_county, respective_cols, years, thresholds, "over 87 degress", "schools")

schools87

9468 schools are expected to experience at least 30 days over over 87 degress by 2025.
7874 schools are expected to experience at least 60 days over over 87 degress by 2025.
6481 schools are expected to experience at least 90 days over over 87 degress by 2025.
2838 schools are expected to experience at least 120 days over over 87 degress by 2025.
9666 schools are expected to experience at least 30 days over over 87 degress by 2035.
9231 schools are expected to experience at least 60 days over over 87 degress by 2035.
7007 schools are expected to experience at least 90 days over over 87 degress by 2035.
2938 schools are expected to experience at least 120 days over over 87 degress by 2035.
9666 schools are expected to experience at least 30 days over over 87 degress by 2045.
9381 schools are expected to experience at least 60 days over over 87 degress by 2045.
7822 schools are expected to experience at least 90 days over over 87 degress by 2045.
3949 schools are expected to experience a

Unnamed: 0,over 87 degress(schools),2025,2035,2045,2055
0,30 days,9468,9666,9666,9666
1,60 days,7874,9231,9381,9376
2,90 days,6481,7007,7822,7751
3,120 days,2838,2938,3949,3738
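
The `analyze_students_or_school` helper is also defined in `utils` and not shown. Its output is consistent with summing a county total (enrollment or school count) over the counties that meet each threshold, and the sketch below illustrates that reading. The function name `analyze_totals_sketch`, the unit-to-column mapping, and the `>=` comparison are assumptions. The demo rows reuse the first five counties from the table above.

```python
import pandas as pd

# Assumed mapping from the unit label to the column that holds the totals
TOTAL_COLUMN = {"students": "Student Enrollment", "schools": "Number of Schools"}

def analyze_totals_sketch(df, col_positions, years, thresholds, label, unit):
    """Sum a county total over counties meeting each day threshold (assumed logic)."""
    total_col = TOTAL_COLUMN[unit]
    table = {f"{label}({unit})": [f"{t} days" for t in thresholds]}
    for pos, year in zip(col_positions, years):
        table[str(year)] = [
            int(df.loc[df.iloc[:, pos] >= t, total_col].sum()) for t in thresholds
        ]
    return pd.DataFrame(table)

demo = pd.DataFrame({
    "County Name": ["Alameda", "Alpine", "Amador", "Butte", "Calaveras"],
    "2020-2029 Average Days Above 87 (Upper Range)": [44.0, 9.0, 95.0, 121.0, 102.0],
    "Student Enrollment": [197983, 68, 4107, 27937, 5275],
    "Number of Schools": [369, 3, 13, 80, 22],
})

students_demo = analyze_totals_sketch(demo, [1], [2025], [30, 60, 90, 120], "87 degrees", "students")
print(students_demo)
```

Under this reading, every student in a county is assigned the county-average number of hot days, so the totals describe counties that cross a threshold rather than individual campuses.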


## Table Overview and Percentage Analysis of District and County Level Extreme Heat

The analysis produces four tables, each measuring impact at a different level (county, district, school, and student). Two of the four tables (county and district) were additionally converted to percentages.

In [29]:
district87, county87, schools87, students87

(  87 degrees(districts)  2025  2035  2045  2055
 0               30 days   846   877   877   877
 1               60 days   628   815   823   822
 2               90 days   469   547   628   592
 3              120 days   254   260   358   319,
   over 87 degrees(county)  2025  2035  2045  2055
 0                 30 days    50    52    52    52
 1                 60 days    37    47    48    47
 2                 90 days    26    30    36    33
 3                120 days    16    17    24    20,
   over 87 degress(schools)  2025  2035  2045  2055
 0                  30 days  9468  9666  9666  9666
 1                  60 days  7874  9231  9381  9376
 2                  90 days  6481  7007  7822  7751
 3                 120 days  2838  2938  3949  3738,
   over 87 degress(students)     2025     2035     2045     2055
 0                   30 days  5176641  5272287  5272287  5272287
 1                   60 days  4421294  5045530  5158987  5158588
 2                   90 days  3602244  388

In [30]:
districts_percent_df = district87.copy()
# Convert the counts in each year column (2025, 2035, 2045, 2055) to percentages
percentage_columns = ['2025', '2035', '2045', '2055']
for col in percentage_columns:
    districts_percent_df[col] = districts_percent_df[col].apply(divide_district)

# Display the resulting DataFrame
districts_percent_df

Unnamed: 0,87 degrees(districts),2025,2035,2045,2055
0,30 days,90.4%,93.7%,93.7%,93.7%
1,60 days,67.1%,87.1%,87.9%,87.8%
2,90 days,50.1%,58.4%,67.1%,63.2%
3,120 days,27.1%,27.8%,38.2%,34.1%


In [31]:
county_percent_df = county87.copy()
# Convert the counts in each year column (2025, 2035, 2045, 2055) to percentages
for col in percentage_columns:
    county_percent_df[col] = county_percent_df[col].apply(divide_county)

# Display the resulting DataFrame
county_percent_df

Unnamed: 0,over 87 degrees(county),2025,2035,2045,2055
0,30 days,86.2%,89.7%,89.7%,89.7%
1,60 days,63.8%,81.0%,82.8%,81.0%
2,90 days,44.8%,51.7%,62.1%,56.9%
3,120 days,27.6%,29.3%,41.4%,34.5%
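
The `divide_district` and `divide_county` helpers are not shown. The percentages above are consistent with dividing each count by a fixed total and formatting to one decimal place: 936 districts (the row count reported after filtering) and, by inference from the county figures, California's 58 counties. A minimal sketch under those assumptions, with `as_percent` as a hypothetical stand-in for both helpers:

```python
import pandas as pd

def as_percent(count, total):
    """Format count/total as a percentage string with one decimal place."""
    return f"{count / total:.1%}"

TOTAL_DISTRICTS = 936  # districts remaining after filtering
TOTAL_COUNTIES = 58    # assumed denominator, inferred from the county percentages

# Two rows of the district table, converted the same way as in the cells above
counts = pd.DataFrame({
    "87 degrees(districts)": ["30 days", "120 days"],
    "2025": [846, 254],
    "2055": [877, 319],
})
percent = counts.copy()
for col in ["2025", "2055"]:
    percent[col] = percent[col].apply(lambda n: as_percent(n, TOTAL_DISTRICTS))

print(percent)
print(as_percent(50, TOTAL_COUNTIES))
```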
