---
format: 
  html:
    toc: true
execute:
  echo: true
---

# OLS Analysis of Heat Stress: Socioeconomic and Built Form Relationships

**1. Analyzing Heat Stress Around Bus Stops Using OLS**

This analysis uses Ordinary Least Squares (OLS) regression to examine the relationship between heat stress around bus stops and socioeconomic and built-form factors.


import pickle
import pandas as pd

# File paths
model_file_path = "ols_model.pkl"
vif_file_path = "vif_results.csv"

# Load the OLS model
with open(model_file_path, "rb") as file:
    loaded_model = pickle.load(file)

# Load the VIF data
loaded_vif_data = pd.read_csv(vif_file_path)

# Print results
print(loaded_model.summary())
print("\nLoaded VIF Results:")
print(loaded_vif_data)


                            OLS Regression Results                            
Dep. Variable:                   MEAN   R-squared:                       0.715
Model:                            OLS   Adj. R-squared:                  0.714
Method:                 Least Squares   F-statistic:                     1355.
Date:                Wed, 25 Dec 2024   Prob (F-statistic):               0.00
Time:                        04:15:22   Log-Likelihood:                -1190.0
No. Observations:                8126   AIC:                             2412.
Df Residuals:                    8110   BIC:                             2524.
Df Model:                          15                                         
Covariance Type:            nonrobust                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                   37.6897 

**2. Discussion of the results**

- ***Model Fit***:
The model explains the dependent variable, the mean temperature at bus stops, well. With an R-squared of 0.715 and an adjusted R-squared of 0.714, the included independent variables capture approximately 71.5% of the variation in mean bus stop temperature (71.4% after adjusting for the number of predictors).
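
As a consistency check, the fit statistics can be recomputed from the other values printed in the summary. The sketch below uses the standard formulas for adjusted R-squared, AIC, and BIC; the variable names are illustrative, and the inputs are the rounded figures from the output above, so the results are only accurate to the printed precision.

```python
import math

# Values copied from the regression summary above (rounded as printed).
n_obs = 8126            # No. Observations
k_predictors = 15       # Df Model (excludes the intercept)
r_squared = 0.715
log_likelihood = -1190.0

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k_predictors - 1)

# AIC and BIC count the intercept as an estimated parameter.
n_params = k_predictors + 1
aic = 2 * n_params - 2 * log_likelihood
bic = n_params * math.log(n_obs) - 2 * log_likelihood

print(f"Adjusted R-squared: {adj_r_squared:.3f}")  # 0.714
print(f"AIC: {aic:.0f}, BIC: {bic:.0f}")           # AIC: 2412, BIC: 2524
```

With 8,126 observations and only 15 predictors, the adjustment is tiny, which is why R-squared and adjusted R-squared differ only in the third decimal place.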

- ***Key Predictors***:
The regression model indicates that socioeconomic, environmental, and built-environment factors are significantly associated with the mean temperature at bus stops. Among socioeconomic factors, the proportion of Native Americans shows a strong positive relationship with bus stop temperatures (coefficient 3.4687), while higher proportions of Asian Americans and other racial groups are associated with lower temperatures. Higher poverty rates also correlate with elevated bus stop temperatures, reflecting disparities in environmental conditions.

The Gini index (income inequality) and population density have smaller effects. Higher population density is associated with slightly higher temperatures, likely due to urban heat island effects. Among environmental factors, green space, measured by the Green View Index (GVI), has a strong cooling association, with a significant negative coefficient (-6.8951), indicating that greenery plays a key role in reducing temperatures at bus stops.

Built-environment features such as Floor Area Ratio (FAR) and enclosure are also associated with temperature. Higher FAR (denser development) is associated with slightly lower temperatures, while more enclosed areas show significantly lower temperatures. These findings highlight the importance of urban design interventions, including increasing greenery and improving shading, to address elevated bus stop temperatures in densely populated or disadvantaged areas.
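
To make the coefficient sizes concrete, the sketch below multiplies the two coefficients quoted above by a hypothetical change of 0.10 in the predictor. It assumes both predictors are measured on a 0 to 1 scale (so 0.10 means ten percentage points) and that all other variables are held constant; the dictionary keys are illustrative names, not the model's actual column names.

```python
# Coefficients quoted in the discussion above.
coefficients = {
    "native_american_share": 3.4687,
    "green_view_index": -6.8951,
}

# Hypothetical change in each predictor; assumes a 0-1 scale for both.
delta = 0.10

# Predicted change in the dependent variable, all else equal.
effects = {name: coef * delta for name, coef in coefficients.items()}

for name, change in effects.items():
    print(f"{name}: {change:+.2f}")
# native_american_share: +0.35
# green_view_index: -0.69
```

Under these assumptions, a ten-point increase in GVI corresponds to a predicted drop of roughly 0.69 in the dependent variable's units, about twice the magnitude of the Native American share effect.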

- ***VIF***:
The VIF values indicate that the model does not suffer from severe multicollinearity.
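
For readers unfamiliar with the diagnostic, the following is a minimal sketch of how a variance inflation factor can be computed, using the textbook definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. It runs on synthetic data; how `vif_results.csv` was actually produced is not shown in this document, so the function and data here are assumptions for illustration only.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column of X."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = np.empty(n_cols)
    for j in range(n_cols):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept.
        others = np.column_stack([np.ones(n_rows), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        r2 = 1.0 - residuals.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic predictors: x3 is nearly a copy of x1, x2 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = rng.normal(size=1000)
x3 = x1 + 0.1 * rng.normal(size=1000)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(dict(zip(["x1", "x2", "x3"], vifs.round(1))))
```

In this toy example the collinear pair `x1` and `x3` receives very large VIFs while `x2` stays close to 1. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity.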

**3. Data Sources**

The datasets used in this study comprise high-resolution spatial and meteorological data, all corresponding to 2020. They include bus stop data, a 1-meter land-use map, LiDAR point cloud data, a sidewalk map, a building footprint map, meteorological data, Google Street View images, and socioeconomic data from the American Community Survey 5-year estimates for 2016–2020. Together they provide a comprehensive basis for analyzing heat stress and its relationship with the urban environment in Philadelphia.

Bus stop data for July 2020 was obtained from the General Transit Feed Specification (GTFS) dataset provided by the Southeastern Pennsylvania Transportation Authority (SEPTA), accessed via TransitFeeds (https://transitfeeds.com/p/septa/263). This data includes detailed information on bus stop locations, routes, and schedules, which is essential for identifying specific bus stops and analyzing heat stress variation across the public transit system.
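
A GTFS feed stores stop locations in a `stops.txt` file with fields such as `stop_id`, `stop_name`, `stop_lat`, and `stop_lon`. The sketch below shows one way to read those fields using only the standard library. The sample rows are hypothetical placeholders, not actual SEPTA stops, and `read_stops` is an illustrative helper rather than the code used in this study.

```python
import csv
import io

# Hypothetical rows in the layout of a GTFS stops.txt file.
sample_stops_txt = """stop_id,stop_name,stop_lat,stop_lon
1001,Example St & 1st Av,39.9526,-75.1652
1002,Example St & 2nd Av,39.9540,-75.1630
"""

def read_stops(text):
    """Parse GTFS stops.txt content into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "stop_id": row["stop_id"],
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
        for row in reader
    ]

stops = read_stops(sample_stops_txt)
for stop in stops:
    print(stop["stop_id"], stop["lat"], stop["lon"])
```

For a real feed, the same function could be applied to the text of the `stops.txt` file extracted from the downloaded GTFS archive.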

The high-resolution land-use map, created semi-automatically using high-resolution aerial imagery and LiDAR data with an accuracy of approximately 90%, was sourced from Pennsylvania Spatial Data Access (PASDA) (https://www.pasda.psu.edu). 

Additionally, LiDAR point cloud data in the form of pre-processed x, y, and z files was acquired from the United States Geological Survey (USGS) 3D Elevation Program (https://usgs.entwine.io/). Using the open-source tool PDAL, the LiDAR point clouds were converted into a Digital Elevation Model (DEM) and a Digital Surface Model (DSM). The DSM, in conjunction with the land-use map, was used to derive the building height model and tree canopy height model for the study area.
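
One common way to derive such height models is to subtract the DEM from the DSM and then mask the result by land-use class. The sketch below illustrates this on tiny hand-made arrays. The subtraction step, the class codes, and all array values are assumptions for illustration; the exact procedure used in the study may differ.

```python
import numpy as np

# Hypothetical land-use class codes; the real map's codes may differ.
BUILDING, TREE_CANOPY = 1, 2

# Toy 2 x 3 rasters on the same grid (elevations in meters).
dem = np.array([[10.0, 10.0, 10.5],
                [10.0, 10.2, 10.5]])   # bare-earth elevation
dsm = np.array([[10.0, 22.0, 18.5],
                [10.1, 22.2, 10.5]])   # top-of-surface elevation
land_use = np.array([[0, 1, 2],
                     [0, 1, 0]])

# Height above ground; clip small negative values caused by noise.
height = np.clip(dsm - dem, 0, None)

# Keep heights only where the land-use map marks the relevant class.
building_height = np.where(land_use == BUILDING, height, np.nan)
canopy_height = np.where(land_use == TREE_CANOPY, height, np.nan)

print(building_height)
print(canopy_height)
```

Cells outside the class of interest are set to NaN so that later statistics, such as mean building height near a bus stop, can ignore them with functions like `np.nanmean`.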

The sidewalk map for Philadelphia was sourced from the Delaware Valley Regional Planning Commission (DVRPC) (https://walk.dvrpc.org), allowing for a detailed assessment of pedestrian heat exposure in proximity to bus stops. 

Building footprint data was acquired from the City of Philadelphia (https://opendataphilly.org/datasets/building-footprints) with the building height attribute. 

Hourly meteorological data, including air temperature, humidity, global horizontal radiation, direct radiation, and diffuse radiation, was gathered from the National Renewable Energy Laboratory (NREL) (https://nsrdb.nrel.gov), providing the parameters needed to calculate heat stress using the Universal Thermal Climate Index (UTCI).

Street view images, with their wide coverage and accessibility, have opened up opportunities for many kinds of urban research (Biljecki & Ito, 2021). They can be used to assess street-level urban greenery (Li et al., 2015), quantify street canyons (Gong et al., 2018), and measure human perceptions (Ma et al., 2021; Zhao et al., 2023). To gather this data systematically, we established sampling points at 50-meter intervals along streets. Using the Google Maps API, we retrieved the street view image identifiers (IDs) near each sampling point, along with their geographic coordinates and capture dates. To account for seasonal variation, we kept only images taken during the spring and summer months of 2018 to 2020, resulting in a dataset of 178,826 unique panoramic images. These images were then processed using the DeepLabv3+ model (Chen et al., 2018a), a high-resolution image segmentation model trained on the Cityscapes dataset (Cordts et al., 2016). The segmentation output allowed us to quantify the percentage of each urban element, such as roads, sidewalks, buildings, trees, vehicles, and pedestrians, in each image.
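
Two steps of this workflow can be sketched compactly: placing sampling points at a fixed spacing along a street, and turning a segmentation mask into per-class percentages. The code below is a simplified illustration, not the study's pipeline. It assumes street geometry in a projected coordinate system measured in meters, and the class list and its integer IDs are hypothetical rather than the actual Cityscapes label IDs.

```python
import math
import numpy as np

def sample_points(line, spacing=50.0):
    """Return points every `spacing` meters along a polyline of (x, y) vertices."""
    points = [line[0]]
    next_dist = spacing      # distance along the line of the next sample
    travelled = 0.0          # length of the segments already walked
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        while seg_len > 0 and next_dist <= travelled + seg_len:
            t = (next_dist - travelled) / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_dist += spacing
        travelled += seg_len
    return points

def class_percentages(label_mask, class_names):
    """Return the percentage of pixels assigned to each class ID."""
    counts = np.bincount(label_mask.ravel(), minlength=len(class_names))
    return {name: 100.0 * counts[i] / label_mask.size
            for i, name in enumerate(class_names)}

# A 130 m street with one bend yields samples at 0 m, 50 m, and 100 m.
street = [(0.0, 0.0), (100.0, 0.0), (100.0, 30.0)]
points = sample_points(street)

# Hypothetical class IDs 0-3 and a toy 2 x 4 segmentation mask.
class_names = ["road", "sidewalk", "building", "vegetation"]
mask = np.array([[0, 0, 2, 3],
                 [0, 1, 3, 3]])
shares = class_percentages(mask, class_names)
print(len(points), shares)
```

The vegetation share computed this way is one common definition of a Green View Index; whether the GVI in the regression was computed in exactly this form is an assumption.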
