# Urban Greenspace and Depression Prevalence in Denver

<figure>
    <img
        src="https://www.planetizen.com/files/styles/featured_large/public/images/Central%2070%20Covertop_003.jpg.webp?itok=GTYzjMbd"
        alt="Cover park over I-70 Denver, courtesy of Planetizen and CDOT (Hammon (2023))" 
        height="600px"/>
    <figcaption aria-hidden="true">
        Cover park over I-70 Denver, courtesy of Planetizen and CDOT
        (<span 
            class="citation"
            data-cites="Hanmson_2023">
            Hammon (2023)
         </span>)
    </figcaption>
</figure>

## Overview

This project focuses on Denver, CO. I use health outcome data from the CDC, 
specifically % depression prevalence (retrieved with the Wayback Machine because 
the data was taken down), as well as satellite-based **multispectral** data for the 
City and County of Denver. The goal is to answer whether vegetation-related 
factors can predict % depression prevalence by census tract.

Denver, CO was chosen because I hope to use Denver as the site for the final 
project. Working with Denver now lets me practice, orient myself to the spectral 
data needed, and find out whether it is something I can use, in whole or in part, 
for my final project. Additionally, I chose Philadelphia for the urban greenspace 
project in the previous course and wanted to choose a different city. I am most 
familiar with Denver and can use my existing knowledge of the city to guide any 
research and analyses.

The health outcome of % depression prevalence was chosen because I have done 
previous research on the effect that greenspace has on mental health in urban planning, 
and I was curious whether a predictive model like OLS regression, together with its 
computed error, would show a predictive relationship between the two.

Denver, like many other cities, has uneven access to urban green space that
is rooted in racial discrimination and other injustices. Redlining that took place in 
the 'New Deal Era' has had lasting effects on the distribution of urban greenspace.

### Citations:

* (Source of photo) Hammon, Mary. “Opening of Denver’s New Freeway Cap Park Triggers 
Gentrification Fears.” 2023. Planetizen.com. 2023. https://www.planetizen.com/news/2023/12/126716-opening-denvers-new-freeway-cap-park-triggers-gentrification-fears.

* Reklaitiene, Regina, Regina Grazuleviciene, Audrius Dedele, Dalia Virviciute, Jone 
Vensloviene, Abdonas Tamosiunas, Migle Baceviciene, et al. 2014. “The Relationship of 
Green Space, Depressive Symptoms and Perceived General Health in Urban Population.” 
Scandinavian Journal of Public Health 42 (7): 669–76. https://doi.org/10.1177/1403494814544494.

* Rigolon, Alessandro, and Jeremy Németh. 2018. “What Shapes Uneven Access to Urban 
Amenities? Thick Injustice and the Legacy of Racial Discrimination in Denver’s Parks.” 
Journal of Planning Education and Research 41 (3): 0739456X1878925.
https://doi.org/10.1177/0739456x18789251.

## Data Description





### CDC Places Data - via the 'Internet Archive'

Before 2020, CDC Places was known as 500 Cities: instead of having full 
coverage across the US, including rural areas, it only included 500 cities. 
In that phase, the datasets also only included chronic diseases. Since 2020 
the set of measures has expanded, and more measures are added with each 
yearly release. It appears new datasets are released every year rather than 
previous years' datasets being updated 
(CDC Places About the Data). 

For the methodology that CDC Places uses and how the data is validated, 
see [CDC Places methodology](https://www.cdc.gov/places/methodology/index.html).
*(Since writing this, this website has been taken down).*

Because this data is tied to the Census, which is only conducted every 10 years, 
it is worth checking beforehand which Census year a given CDC Places release 
uses. For example, the 2024 CDC Places data uses the 2020 Census, not the 2010 
Census used by prior years of the CDC Places data. These datasets also differ by 
geography type, or administrative boundary. While this project uses census tracts 
as the boundary, there are other options available, like U.S. counties, census 
designated places, and ZIP Code Tabulation Areas (ZCTAs). The goal of a project 
will likely determine which geography is used. CDC Places data includes all areas 
of the United States with 50 or more adult residents (CDC Places About the Data).

*(Prior to 2/2/2025)* The data could be accessed in multiple formats. Via the API 
it was available as JSON, GeoJSON, or CSV; as a download it was available in those 
formats and many others. The choice is worth exploring and considering 
depending on the project and how the data will be used.
I was able to use the API to get the census tracts associated with the CDC Places data 
without issue; however, when I went to download the health outcome data (asthma, 
depression, stroke, etc.), I wasn't able to access it via the API. Instead, I used the 
*Internet Archive* to access data that was previously available on the CDC website.
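
Once a copy of the tract-level table is in hand, pulling out the Denver depression values might look like the sketch below. This is only an illustration: the column names (`countyfips`, `measureid`, `locationid`, `data_value`) and the `DEPRESSION` measure ID are my assumptions about the PLACES layout and should be checked against the archived file, the helper `denver_depression` is hypothetical, and the three-row table is a made-up stand-in, not real CDC values.

```python
import pandas as pd


def denver_depression(places: pd.DataFrame) -> pd.DataFrame:
    """Keep only Denver County rows for the depression measure.

    Column names and the measure ID are assumptions about the PLACES
    layout; verify them against the archived file before relying on this.
    """
    is_denver = places["countyfips"] == "08031"  # Denver County FIPS code
    is_depression = places["measureid"] == "DEPRESSION"
    subset = places.loc[is_denver & is_depression, ["locationid", "data_value"]]
    return (
        subset.rename(columns={"locationid": "tract", "data_value": "depression"})
        .astype({"depression": float})
        .reset_index(drop=True)
    )


# Made-up stand-in rows (NOT real CDC values), only to show the filtering.
sample = pd.DataFrame(
    {
        "locationid": ["08031000102", "08031000102", "08001007801"],
        "countyfips": ["08031", "08031", "08001"],
        "measureid": ["DEPRESSION", "CASTHMA", "DEPRESSION"],
        "data_value": ["21.4", "9.8", "23.0"],
    }
)
denver = denver_depression(sample)
```

Keeping the FIPS codes as strings preserves the leading zero, which matters when the table is later joined to tract boundaries.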

Prior to the CDC data being taken down, the data was free to access, which contributed 
to Open Data Science efforts. As of 2/2/2025 the data was considered public domain and 
did not require a specific license to use or manipulate.

### CDC Places Citations:
Centers for Disease Control and Prevention. PLACES: Local Data for Better Health. Accessed [2/2/2025]. https://www.cdc.gov/places

* CDC Places About the Data - https://www.cdc.gov/places/about/index.html
* CDC Places access data here - https://data.cdc.gov/browse?category=500+Cities+%26+Places&q=2024&sortBy=relevance&tags=places
(*Note - this link takes you to all the localities and versions available, not 
the specific one used here*).
* CDC Places data accessed for this project - https://data.cdc.gov/resource/cwsq-ngmh.geojson
* CDC Places Methodology - https://www.cdc.gov/places/methodology/index.html
* CDC Places Measure Definitions - https://www.cdc.gov/places/measure-definitions/index.html

* Internet Archive - CDC Datasets (*ALL*) - https://archive.org/details/20250128-cdc-datasets
* Internet Archive About - https://archive.org/about/

### NASA Harmonized Landsat Sentinel - Multispectral Data

HLS (Harmonized Landsat and Sentinel-2) data is, in short, the combined 
measurement from the NASA/USGS Landsat satellites (8 and 9) and the European 
Sentinel-2 satellites (2A and 2B). Combining the two is what makes them 'harmonized', 
and it also means more frequent observations (images) of the land 
(every 2-3 days) than if each were used separately (Jeffrey G. 
Masek, Junchang Ju). The user guide can be found [here](https://lpdaac.usgs.gov/documents/1698/HLS_User_Guide_V2.pdf).
L30 refers to the Landsat-derived HLS product, which is delivered at 
30-meter spatial resolution; you can read more about it [here](https://lpdaac.usgs.gov/products/hlsl30v002/).

These satellites carry sensors that record multiple spectral bands, each covering 
a different range of wavelengths in the electromagnetic spectrum. A band alone 
cannot convey much, but the relationship between two or more bands can; these 
relationships are expressed as normalized spectral indices such as NDBI, NDVI, 
and others. More detail about remote sensing can be found 
[here](https://seos-project.eu/remotesensing/remotesensing-c01-p06.html). 

Layered bands can also produce a True Color Image, which uses the visible bands 
(red, green, and blue) and is what we typically associate with a photo we 
take with a camera or our phone, or a CIR (Color Infrared) Image (a.k.a. False 
Color Image), which uses different bands (Near Infrared - 'NIR', red, and green). 
Healthy vegetation strongly reflects NIR, so using NIR makes healthy vegetation 
appear bright red; it basically shows what the human eye cannot see. 
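
As a small illustration of how two bands combine into a normalized index, NDVI is defined as (NIR - Red) / (NIR + Red). The sketch below applies that formula to tiny made-up reflectance arrays; the function name `ndvi` and the sample values are mine, not part of any HLS tooling.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # Pixels where both bands are zero have no defined NDVI; leave them as NaN.
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


# Made-up surface reflectance values for a 2 x 2 patch of pixels.
nir_band = np.array([[0.50, 0.30], [0.10, 0.00]])
red_band = np.array([[0.10, 0.30], [0.30, 0.00]])
ndvi_values = ndvi(nir_band, red_band)
```

Values close to 1 suggest dense, healthy vegetation, values near 0 suggest bare or built surfaces, and negative values often correspond to water.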

To access the data, [NASA Worldview](https://worldview.earthdata.nasa.gov) was used 
to search for Denver, looking specifically at the HLS L30 dataset.
A free account needs to be created to access this data and use it in your 
codespace. The earthaccess module and its search_data function 
are used to search for the HLS L30 data according to 
the parameters set. Because this data comes from 'professional' 
sources (NASA, USGS, and European agencies) and is gathered through satellites, 
there should be a degree of trust in the data, but it depends on how you are 
using it and what you are using it for. I am interested in the NDVI to tell 
a story of urban green space as it relates to depression prevalence in Denver.
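
A minimal sketch of what such a search could look like is below. The bounding box is only a rough approximation of the City and County of Denver, the dates are placeholders, and the helper names `build_hls_query` and `search_hls` are hypothetical; the search itself needs a NASA Earthdata login and network access, so that call is left commented out.

```python
def build_hls_query(bounding_box, start_date, end_date, max_results=10):
    """Collect the keyword arguments for an earthaccess.search_data call."""
    return {
        "short_name": "HLSL30",  # Landsat-derived HLS product
        "bounding_box": bounding_box,  # (west, south, east, north) in degrees
        "temporal": (start_date, end_date),
        "count": max_results,
    }


def search_hls(query):
    """Log in to NASA Earthdata and run the search (needs network access)."""
    import earthaccess  # imported here so the rest runs without the package

    earthaccess.login()
    return earthaccess.search_data(**query)


# Assumption: a rough box around Denver, not an official boundary.
DENVER_BBOX = (-105.11, 39.61, -104.60, 39.92)
# Assumption: placeholder summer dates, not the dates used in this project.
query = build_hls_query(DENVER_BBOX, "2023-07-01", "2023-07-31")
# results = search_hls(query)
```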


The data can vary depending on the day or days used. For example, certain days have 
heavy or complete cloud cover, and picking one of those days is essentially useless 
for this case study. It is best to get a 'clear' image or images where 
there is little to no cloud cover for the best results. I had to search for
clear days that looked promising to use as the 'temporal' parameter 
when accessing this data (will be seen later in this portfolio post). 
There will also be differences in the image based on time of year, depending 
on what you are looking at. If interested in vegetation for the U.S., it's 
important to know the climate and environment, because you would want a time 
of year when the vegetation is healthiest and would have the greatest % 
reflectance; however, it depends on what you are using this data to accomplish. 
In this case study a generally vibrant time of year for vegetation, like summer 
in Denver, is ideal because the reflectance of vegetation in urban 
green spaces should in theory be high at this time. That makes it 
clearer where these green spaces are and helps in the comparison to 
depression prevalence by census tract.

The data provided on NASA Worldview can be used for many things beyond what this 
case study aims to do, and it is worth exploring on your own.

### NASA HLS Citations:

* NASA Goddard. 2024. “Data in Harmony: NASA’s Harmonized Landsat and Sentinel-2 Project.” 
YouTube. April 22, 2024. https://www.youtube.com/watch?v=63ljR84c85M.

**NASA HLS L30 Product Description**
* https://hls.gsfc.nasa.gov/products-description/l30/

**USGS HLS L30 V002 Description**
* Masek, Jeffrey G., and Junchang Ju. "HLSL30 v002: HLS Operational 
Land Imager Surface Reflectance and TOA Brightness Daily Global 30m."
<https://lpdaac.usgs.gov/products/hlsl30v002/>

**NASA Worldview - Multispectral Data**
* https://worldview.earthdata.nasa.gov

## Methods Description

OLS regression will be used as the model to find out whether there is a 
statistically significant relationship between depression and greenspace,
using the variables of % depression prevalence and 
vegetation-related variables. Models can be used for different purposes 
in earth data science, one of which is prediction. Here, the goal is to see 
whether the model can predict depression prevalence accurately. To 
evaluate the model, one could look at the calculated error or calculate 
R squared, which says what percent of the variation in depression prevalence 
can be explained by the model. However, when choosing a model it is important 
to take into account its assumptions about the data and whether the model is 
appropriate given the data. Some important assumptions of OLS regression are:
* linearity - there is a linear relationship between the variables
* normally distributed error - the data shouldn't have long tails
* independence - avoid collinearity among tightly correlated variables
* stationarity - the parameters of the model should not vary over time
* complete observations - there shouldn't be large amounts of no-data values
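
To make the fit and the R squared evaluation concrete, here is a minimal sketch using NumPy least squares. The data is synthetic, generated only so the example runs; it is not the Denver data and the numbers carry no meaning for this project. The helper names `fit_ols` and `r_squared` are hypothetical.

```python
import numpy as np


def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    X = np.column_stack([np.ones(len(x)), x])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [intercept, slope]


def r_squared(x, y, coef):
    """Share of the variation in y that the fitted line explains."""
    X = np.column_stack([np.ones(len(x)), x])
    residuals = y - X @ coef
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot


# Synthetic stand-in data (NOT real results): a made-up linear trend plus noise.
rng = np.random.default_rng(0)
mean_ndvi = rng.uniform(0.1, 0.8, size=200)
depression = 25.0 - 8.0 * mean_ndvi + rng.normal(0, 1.0, size=200)

coef = fit_ols(mean_ndvi, depression)
r2 = r_squared(mean_ndvi, depression, coef)
```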

The data can be manipulated or adjusted to be a better fit. For example:
* No-data values can be dropped using .dropna() so that observations are complete.
* The log of a variable can be taken using np.log to make the data more normally 
distributed, without long tails.
* The data can be normalized/standardized to account for the different scales 
of the variables.
* Etc.
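
The adjustments listed above can be sketched with pandas and NumPy as follows. The column names and values are made up for illustration and are not the project's variables.

```python
import numpy as np
import pandas as pd

# Made-up tract-level values with gaps and very different scales.
tracts = pd.DataFrame(
    {
        "depression": [18.2, 21.5, np.nan, 25.1, 19.9],
        "edge_density": [0.002, 0.015, 0.008, np.nan, 0.120],
    }
)

# 1. Drop rows with any no-data value so every observation is complete.
complete = tracts.dropna()

# 2. Take the log to pull in long right tails (all values must be positive).
logged = np.log(complete)

# 3. Standardize each column to mean 0 and standard deviation 1.
standardized = (logged - logged.mean()) / logged.std()
```

Standardizing after the log transform puts both variables on a comparable scale, so neither dominates the fit only because of its units.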

For this project independence and stationarity are not of major concern, but 
there are no-data values, the data has tails, and the two variables have 
very different scales, so the fixes above need to be applied for the data to 
fit the model. There is a delicate balance between fitting and overfitting.
One potential issue with choosing this model is that, with the variables given, 
there may not be a linear relationship, so the results may be quite muddy or 
may not show a relationship between the variables. Another potential 
issue is overfitting the model: making so many adjustments to the data 
to 'fit it' that the model fits the noise, showing very low error while 
saying little about a possible relationship between the variables. 
Because overfitting in particular is a worry here, one way to guard 
against it is cross validation.

Cross validation...

Computing model error...
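
As a rough illustration of how cross validation and model error fit together, the sketch below runs a k-fold split: the model is fit on all folds but one, and the error (RMSE here) is computed on the held-out fold. It uses synthetic data and plain NumPy. The function name `kfold_rmse`, the choice of five folds, and the use of RMSE are my assumptions, not necessarily the setup used later in this post.

```python
import numpy as np


def kfold_rmse(x, y, k=5, seed=0):
    """Return the held-out RMSE of a one-variable OLS fit for each of k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # shuffled index groups
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        X_train = np.column_stack([np.ones(len(train_idx)), x[train_idx]])
        X_test = np.column_stack([np.ones(len(test_idx)), x[test_idx]])
        # Fit on the training folds only, then predict the held-out fold.
        coef, *_ = np.linalg.lstsq(X_train, y[train_idx], rcond=None)
        errors = y[test_idx] - X_test @ coef
        scores.append(np.sqrt(np.mean(errors**2)))
    return np.array(scores)


# Synthetic stand-in data (NOT real results).
rng = np.random.default_rng(1)
x = rng.uniform(0.1, 0.8, size=200)
y = 25.0 - 8.0 * x + rng.normal(0, 1.0, size=200)
fold_rmse = kfold_rmse(x, y, k=5)
```

If the held-out error is much larger than the error on the training data, that is a sign the model is fitting noise rather than a real relationship.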

Not having much experience in Earth Data Science and statistics,
I am unfamiliar with other models that could be used here. Another 
one I know of is the Decision Tree model, but I would need to do more research 
to know whether other models would be appropriate here.

## Site Description

Denver, like many other cities, has uneven access to urban greenspace that
is rooted in racial discrimination and other injustices going back to the
City Beautiful movement and New Deal era. The map above shows
the grid Denver is based on with 2 axes: Broadway runs north to south, and Colfax runs west
to east. The boxes or odd shapes within the boundary are urban 
greenspaces or parks, though some, such as Sloan's Lake on the far 
west border of the city, are not clearly outlined as parks. These outlines
show some of the major or flagship parks like Cheesman Park (just south of Colfax
and just to the east of Broadway), Washington Park (slender rectangle south
of Colfax), and City Park (north of Colfax, east of Broadway).

The practice of redlining in this movement and era used zoning ordinances 
and other policy (housing and insurance policy) to exclude low-income people, who were 
predominantly Black or of races and ethnicities other than White and who could 
only live and play in certain areas. Because parks were a large push of 
the City Beautiful movement, which was about increasing the city's prestige, 
the majority of the parks created were in upscale White neighborhoods
that paid higher property taxes. "Zoning became less restrictive as one
moved away from the parks, suggesting that city planners
aimed to maximize proximity between expensive homes and
prestigious parks". This was deliberate action taken by numerous people, including
politicians, city planners, insurers, and people in real estate, to
purposefully disinvest in non-White neighborhoods and areas. While there 
has been work to undo these redlining policies and to create, hopefully someday,
equal access to and use of urban greenspace, to this day those flagship parks
predominantly still serve the city's most affluent groups, who are White.
([What Shapes Uneven Access to Urban Amenities? Thick Injustice 
and the Legacy of Racial Discrimination in Denver’s 
Parks](https://collective.coloradotrust.org/wp-content/uploads/sites/2/2019/07/jper.pdf))


### Citations:

* Cernansky, Rachel. 2019. “Unequal Access to Parks in Denver Has Roots in 
History.” Collective Colorado. July 16, 2019. 
https://collective.coloradotrust.org/stories/unequal-access-to-parks-in-denver-has-roots-in-history/.

* Chen, Victor, Stefan Chavez-Norgaard, and University of Richmond. 2025. “Mapping Inequality: Denver.” 
Mapping Inequality: Redlining in New Deal America. University of Richmond. 2025. 
https://dsl.richmond.edu/panorama/redlining/map/CO/Denver/context#loc=12/39.6994/-104.9581.

* Rigolon, Alessandro, and Jeremy Németh. 2018. “What Shapes Uneven Access to Urban 
Amenities? Thick Injustice and the Legacy of Racial Discrimination in Denver’s Parks.” 
Journal of Planning Education and Research 41 (3): 0739456X1878925.
https://doi.org/10.1177/0739456x18789251.

* Sachs, David. 2018. “This Shape Explains Denver’s Past, Present and Likely Its Future.” 
Denverite. December 21, 2018. https://denverite.com/2018/12/21/denver-socioeconomic-map-shape/.

* Sachs, David. 2021. “How Denver Is Chipping Away at the Inverted L: Housing and Trees Edition.” 
Denverite. April 7, 2021. 
https://denverite.com/2021/04/07/how-denver-is-chipping-away-at-the-inverted-l-housing-and-trees-edition/.


