# How climate and environmental justice tools are used in research studies

Elham Ali (Public Environmental Data Project)  
September 2, 2025

TBD

## Background

When public climate and environmental justice (EJ) evidence disappears, whether removed, restricted, or altered, a decade of downstream knowledge becomes harder to verify, reproduce, teach, or apply, especially for the communities and decisions that need it most.  
This analysis looks at how many studies have used these tools, what topics they cover, and how they apply the tools.

## Questions

Here are the key questions explored in this analysis:

-   How many research papers or studies have cited or relied on five U.S. federal climate and environmental tools that are threatened, not being maintained, or no longer available?  
-   What research topics are associated with each tool, and how do these topics vary by citation frequency?  
-   In what ways are these tools applied in research?

## Data Sources

The climate tools assessed are:

-   CEQ’s **Climate and Economic Justice Screening Tool (CEJST)**  
-   EPA’s **EJScreen**  
-   USDA Forest Service’s **Climate Risk Viewer (CRV)**  
-   FEMA’s **Future Risk Index (FRI)**  
-   CDC’s **Environmental Justice Index (EJI)**

The data for this project come from Google Scholar and were last refreshed on August 29, 2025.

Original raw datasets are saved in the `data/` folder. This script reduces and cleans those datasets to prepare them for analysis.

------------------------------------------------------------------------

## Cleaning

I start by loading the packages needed for file handling, data wrangling, and visualization.

In [None]:
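## Folder structure helpers
library(here)
## Wrangling, cleaning, charts, and graphs (the packages named in the startup messages below)
library(tidyverse); library(janitor); library(highcharter); library(igraph)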

here() starts at /Users/elhamali/Documents/Data Projects/climate-ej-tools-data-story

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use


Attaching package: 'igraph'

The following objects are masked from 'package:lubridate':

    %--%, union

The following objects are masked from 'package:dplyr':

    as_data_frame, groups, union

The following objects are masked from 'package:purrr':

    compose, simplify

The following object is masked from 'package:tidyr':

    crossing

The following object is masked from 'package:tibble':

    as_data_frame

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

### Import raw data

I import all .csv files from the `data/` folder, then save them as .rds files into `output/`. This preserves their structure and speeds up future reads.

In [None]:
# List all CSV files
csv_files <- list.files(here("data"), pattern = "\\.csv$", full.names = TRUE)

# Read into a list of dataframes
datasets <- map(csv_files, read.csv)
names(datasets) <- tools::file_path_sans_ext(basename(csv_files))

# Save each dataset as .rds in output/
walk2(
  datasets,
  names(datasets),
  ~ saveRDS(.x, here("output", paste0(.y, ".rds")))
)
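
As a rough illustration of what "preserves their structure" means, the sketch below writes a small made-up data frame to both CSV and RDS in a temporary directory and reads each back. The toy data and file names are hypothetical, and only base R is used so the example runs on its own. In the pipeline above the data are read from CSV first, so the .rds copies preserve whatever types `read.csv` inferred at that point.

```r
# Toy data frame standing in for one raw export (hypothetical values)
toy <- data.frame(
  title      = c("Study A", "Study B"),
  year       = c(2019L, 2021L),
  query_date = as.Date(c("2025-08-29", "2025-08-29")),
  stringsAsFactors = FALSE
)

# Work in a throwaway directory so nothing in the project is touched
tmp <- tempfile("roundtrip_")
dir.create(tmp)
csv_path <- file.path(tmp, "toy.csv")
rds_path <- file.path(tmp, "toy.rds")

write.csv(toy, csv_path, row.names = FALSE)
saveRDS(toy, rds_path)

from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
from_rds <- readRDS(rds_path)

# CSV stores text only, so the Date column comes back as character
class(from_csv$query_date)

# RDS stores the R object itself, so column classes survive the round trip
class(from_rds$query_date)
```

Once a column has been typed (for example, dates parsed), saving to .rds means later reads do not have to repeat that work.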

### Merge EJScreen datasets

Google Scholar caps search results at 1,000 records ([Harzing 2025](#ref-harzing2025)).

Because EJScreen returned more than this, I split the results into four files (ejscreen_1–ejscreen_4). Here I combine them into one dataframe.

In [None]:
# List EJScreen RDS files
ejscreen_files <- here("output") |>
  list.files(pattern = "^ejscreen_\\d+\\.rds$", full.names = TRUE)

# Read, tag, clean variable names
ejscreen_list <- map(ejscreen_files, ~ {
  readRDS(.x) |>
    mutate(source_file = tools::file_path_sans_ext(basename(.x))) |>
    janitor::clean_names()
})

# Merge into one dataframe
ejscreen_all <- bind_rows(ejscreen_list)

# Quick check on dimensions
dim(ejscreen_all)  

[1] 1242   27
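
The source does not say how the four EJScreen result sets were split, so it is unknown whether they overlap. A minimal sketch of one way to check for repeated records after a merge follows, assuming a record can be identified by its title and year. The toy rows and the `is_duplicate` column are hypothetical.

```r
# Hypothetical stand-in for a few merged EJScreen records
merged <- data.frame(
  title       = c("Air quality near ports", "air quality near ports ", "Flood risk mapping"),
  year        = c(2020L, 2020L, 2022L),
  source_file = c("ejscreen_1", "ejscreen_2", "ejscreen_3"),
  stringsAsFactors = FALSE
)

# Normalize titles lightly so case and stray spaces do not hide duplicates
key <- paste(tolower(trimws(merged$title)), merged$year, sep = "|")

# Flag every record whose key already appeared in an earlier row
merged$is_duplicate <- duplicated(key)

merged[, c("title", "source_file", "is_duplicate")]
sum(merged$is_duplicate)
```

If the count is above zero, the flagged rows are worth reviewing by hand before dropping them, since two different papers can share a title and year.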

### Clean and tag all other tools

I apply the same cleaning process to the other tools (CEJST, CRV, EJI, FRI):

-   Standardize variable names to snake_case
-   Add a source_file column for provenance
-   Guess variable types (integers, doubles, dates, etc.)

In [None]:
# Helper function: read, clean names, add provenance, type-convert
read_clean_tag <- function(path) {
  df <- readRDS(path) |>
    janitor::clean_names()

  source_stem <- tools::file_path_sans_ext(basename(path))  # e.g., "ejscreen_1" or "cejst"

  df |>
    mutate(
      source_file = source_stem,
      tool_name   = sub("_\\d+$", "", source_stem)  # "ejscreen_1" -> "ejscreen"; "cejst" -> "cejst"
    ) |>
    readr::type_convert(col_types = readr::cols(.default = readr::col_guess())) |>
    mutate(across(where(is.character), trimws))
}
# Optional fallback, left commented out: rebuild ejscreen_all if needed
# if (!exists("ejscreen_all")) {
#   ejs_files <- list.files(here("output"), pattern = "^ejscreen_\\d+\\.rds$", full.names = TRUE)
#   ejscreen_all <- ejs_files |> map(read_clean_tag) |> list_rbind()
# }

# Process other tools
other_tools <- c("cejst", "crv", "eji", "fri")
other_paths <- here("output", paste0(other_tools, ".rds"))

other_list <- other_paths |>
  set_names(~ tools::file_path_sans_ext(basename(.x))) |>
  map(read_clean_tag)
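
The helper derives `tool_name` by stripping a trailing `_<digits>` suffix from the file stem. The short base R sketch below shows that one step in isolation on a few example stems. The `stems` vector is made up for illustration, using the file naming described in this section.

```r
# Example file stems, following the naming used in output/
stems <- c("ejscreen_1", "ejscreen_4", "cejst", "crv")

# Remove an underscore followed by digits, but only at the end of the string
tool_codes <- sub("_\\d+$", "", stems)

tool_codes
```

Because the pattern is anchored with `$`, a stem without a numeric suffix passes through unchanged.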

### Final merge and labeling

I now merge all tools into a single dataset. Additional steps:

-   Ensure consistent date formats
-   Map tool codes (ejscreen, cejst, etc.) to their full names
-   Drop empty placeholder columns

In [None]:
# Helper: coerce anything date-like into Date
to_Date <- function(x) {
  if (inherits(x, "Date"))    return(x)
  if (inherits(x, "POSIXt"))  return(as_date(x))
  if (is.numeric(x))          return(as_date(as.POSIXct(x, origin = "1970-01-01", tz = "UTC")))
  if (is.character(x)) {
    suppressWarnings({
      dt <- ymd_hms(x, quiet = TRUE, tz = "UTC")
      idx <- is.na(dt); if (any(idx)) dt[idx] <- ymd(x[idx], quiet = TRUE)
      idx <- is.na(dt); if (any(idx)) dt[idx] <- mdy(x[idx], quiet = TRUE)
      idx <- is.na(dt); if (any(idx)) dt[idx] <- dmy(x[idx], quiet = TRUE)
    })
    return(as_date(dt))
  }
  as_date(as.character(x))
}

# Apply date coercion
if (exists("ejscreen_all") && "query_date" %in% names(ejscreen_all)) {
  ejscreen_all <- ejscreen_all %>% mutate(query_date = to_Date(query_date))
}

if (exists("other_list")) {
  other_list <- other_list %>%
    map(~ if ("query_date" %in% names(.x)) mutate(.x, query_date = to_Date(query_date)) else .x)
} else {
  other_list <- list()
}

# Merge all datasets
all_tools <- bind_rows(ejscreen_all, !!!other_list)

# Map tool codes to full names
all_tools <- all_tools %>%
  mutate(
    # Fall back to tool_name when source_file is missing or empty
    source_file = coalesce(na_if(source_file, ""), tool_name),
    base_tool   = sub("_\\d+$", "", source_file),   # ejscreen_1 -> ejscreen
    tool_name   = dplyr::recode(
      base_tool,
      ejscreen = "EJScreen",
      cejst    = "Climate and Economic Justice Screening Tool (CEJST)",
      crv      = "Climate Risk Viewer",
      fri      = "Future Risk Index (FRI)",
      eji      = "Environmental Justice Index (EJI)",
      .default = tool_name  # keep whatever was there if not matched
    )
  ) %>%
  select(-base_tool)

# Drop empty placeholder columns
all_tools <- all_tools %>%
  select(-any_of(c("issn", "citation_url", "volume", "issue", "start_page", "end_page")))

# Inspect combined dataset
glimpse(all_tools)

Rows: 2,170
Columns: 22
$ cites            <int> 1, 0, 0, 61, 0, 2, 0, 14, 0, 4, 14, 126, 71, 0, 3, 0,…
$ authors          <chr> "A Katner, E LeCompte, C Stallard, I Walsh, K Brisola…
$ title            <chr> "Traffic Related Pollutants and Human Health within t…
$ year             <int> 2019, 2020, 2019, 2018, 2020, 2018, 2018, 2016, 2018,…
$ source           <chr> "", "", "", "Critical Criminology", "", "University o…
$ publisher        <chr> "cnu.org", "digital.lib.washington.edu", "epa.gov", "…
$ article_url      <chr> "https://www.cnu.org/sites/default/files/Claiborne%20…
$ cites_url        <chr> "https://scholar.google.com/scholar?cites=16608281987…
$ gs_rank          <int> 17, 20, 22, 15, 14, 10, 28, 25, 26, 13, 6, 4, 2, 11, …
$ query_date       <date> 2025-08-29, 2025-08-29, 2025-08-29, 2025-08-29, 2025…
$ type             <chr> "PDF", "", "PDF", "", "PDF", "PDF", "CITATION", "PDF"…
$ doi              <chr> "", "", "", "10.1007/s10612-018-9399-6", "", "", "", …
$ ecc           
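
The `to_Date()` helper tries several date layouts in turn and only re-parses the values that are still missing. The sketch below shows the same fallback idea in base R, without lubridate. The function name `parse_mixed_dates` and the three formats are illustrative choices, not the formats in the raw files.

```r
# Try each format in order, filling in only the values that are still NA
parse_mixed_dates <- function(x) {
  formats <- c("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")  # illustrative formats
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

parsed <- parse_mixed_dates(c("2025-08-29", "08/29/2025", "29.08.2025", "not a date"))
parsed
```

The order of the attempts matters: in `to_Date()`, `mdy()` runs before `dmy()`, so an ambiguous string such as "01/02/2025" is read as January 2 rather than February 1.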

## Analysis

I take each question in turn, cleaning and organizing the data as needed before exploring the results. I also export intermediate results as tidy CSV files so they are ready for further visualization and exploration.

### Research question 1

How many research papers or studies have cited or relied on five U.S. federal climate and environmental tools that are threatened, not being maintained, or no longer available?

#### Total number of studies

In [None]:
nrow(all_tools)

[1] 2170

#### Count of studies per tool

In [None]:
df_tool <- all_tools %>%
  count(tool_name, name = "n_studies") %>%
  arrange(desc(n_studies))

hchart(df_tool, "column", hcaes(x = tool_name, y = n_studies)) %>%
  hc_title(text = "Number of studies by tool") %>%
  hc_subtitle(text = "Total count of research papers citing or relying on each tool") %>%
  hc_xAxis(title = list(text = "Tool")) %>%
  hc_yAxis(title = list(text = "Number of studies"), allowDecimals = FALSE) %>%
  hc_tooltip(pointFormat = "<b>{point.y}</b> studies") %>%
  hc_legend(enabled = FALSE) %>%
  hc_exporting(enabled = TRUE)
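
A quick sanity check on a per-tool count is that the counts add up to the total number of records. The base R sketch below does this on a made-up vector of tool labels and adds each tool's share of the total. The `tool` vector and the `share_pct` column are hypothetical.

```r
# Hypothetical tool labels, one per record
tool <- c("EJScreen", "EJScreen", "EJScreen", "CEJST", "CEJST", "EJI")

# Count records per tool, largest first
counts <- sort(table(tool), decreasing = TRUE)

summary_tbl <- data.frame(
  tool_name = names(counts),
  n_studies = as.integer(counts),
  share_pct = round(100 * as.integer(counts) / length(tool), 1),
  stringsAsFactors = FALSE
)

summary_tbl

# Per-tool counts should account for every record exactly once
stopifnot(sum(summary_tbl$n_studies) == length(tool))
```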

#### Count of studies per year per tool

In [None]:
df_year_tool <- all_tools %>%
  filter(!is.na(year)) %>%
  count(tool_name, year, name = "n_studies") %>%
  arrange(tool_name, year)

hchart(
  df_year_tool,
  "line",
  hcaes(x = year, y = n_studies, group = tool_name)
) %>%
  hc_title(text = "Studies over time by tool") %>%
  hc_subtitle(text = "Annual counts of studies citing or relying on each tool") %>%
  hc_xAxis(title = list(text = "Year"), allowDecimals = FALSE) %>%
  hc_yAxis(title = list(text = "Number of studies"), allowDecimals = FALSE) %>%
  hc_tooltip(shared = TRUE, crosshairs = TRUE,
             headerFormat = "<b>Year: {point.key}</b><br/>",
             pointFormat = "{series.name}: <b>{point.y}</b><br/>") %>%
  hc_legend(title = list(text = "Tool")) %>%
  hc_exporting(enabled = TRUE)
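
`count()` returns only the tool and year combinations that occur, so a year with no records for a tool has no row rather than a zero. If explicit zeros are wanted, one option is to count over a fixed set of years. The base R sketch below does this on made-up records. Whether zero-filling is appropriate depends on the tool, for example a tool that did not exist yet in a given year.

```r
# Hypothetical records: tool A has nothing in 2021, tool B only appears in 2021
records <- data.frame(
  tool_name = c("A", "A", "A", "B"),
  year      = c(2020L, 2020L, 2022L, 2021L),
  stringsAsFactors = FALSE
)

# Fix the year levels so every year in the range is counted for every tool
records$year_f <- factor(records$year, levels = min(records$year):max(records$year))

grid <- as.data.frame(
  table(tool_name = records$tool_name, year = records$year_f),
  responseName = "n_studies",
  stringsAsFactors = FALSE
)
grid$year <- as.integer(grid$year)

grid[order(grid$tool_name, grid$year), ]
```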

#### Top 10 most cited studies per tool

In [None]:
df_top_cites <- all_tools %>%
  filter(!is.na(cites)) %>%
  group_by(tool_name) %>%
  slice_max(order_by = cites, n = 10, with_ties = FALSE) %>%
  ungroup() %>%
  select(tool_name, year, title, cites, authors) %>%
  arrange(tool_name, desc(cites))

# Show table (optional)
df_top_cites

# A tibble: 46 × 5
   tool_name            year title                                 cites authors
   <chr>               <int> <chr>                                 <int> <chr>  
 1 Climate Risk Viewer  2024 Transition versus physical climate r…   227 G Bua,…
 2 Climate Risk Viewer    NA USDA Forest Service                      27 I Fore…
 3 Climate Risk Viewer  2024 Physical and transition risk premium…    14 JV Bat…
 4 Climate Risk Viewer  2023 Measuring the climate risk exposure …    13 H Jung…
 5 Climate Risk Viewer  2024 Perceived climate risk and stock pri…    12 H Ben …
 6 Climate Risk Viewer  2024 Physical climate risk factors and an…     6 H Jung…
 7 Climate Risk Viewer  2024 A practical framework for applied fo…     5 AD Bow…
 8 Climate Risk Viewer  2024 Climate risks, corporate bonds, and …     5 V Lalw…
 9 Climate Risk Viewer  2025 Land of opportunity: potential for r…     4 T Mai,…
10 Climate Risk Viewer  2023 Firm-specific climate risk estimated…     3 T Dang…
# ℹ 36 more rows

#### Top 10 highest `cites_per_year` per tool

This shows which studies are “most cited relative to their age.”

In [None]:
df_top_cpy <- all_tools %>%
  filter(!is.na(cites_per_year)) %>%
  group_by(tool_name) %>%
  slice_max(order_by = cites_per_year, n = 10, with_ties = FALSE) %>%
  ungroup()

df_top_cpy %>% 
  select(tool_name, year, title, cites_per_year, cites, authors) %>%
  arrange(tool_name, desc(cites_per_year))

# A tibble: 46 × 6
   tool_name            year title                  cites_per_year cites authors
   <chr>               <int> <chr>                           <dbl> <int> <chr>  
 1 Climate Risk Viewer  2024 Transition versus phy…          227     227 G Bua,…
 2 Climate Risk Viewer  2024 Physical and transiti…           14      14 JV Bat…
 3 Climate Risk Viewer  2024 Perceived climate ris…           12      12 H Ben …
 4 Climate Risk Viewer  2023 Measuring the climate…            6.5    13 H Jung…
 5 Climate Risk Viewer  2024 Physical climate risk…            6       6 H Jung…
 6 Climate Risk Viewer  2024 A practical framework…            5       5 AD Bow…
 7 Climate Risk Viewer  2024 Climate risks, corpor…            5       5 V Lalw…
 8 Climate Risk Viewer  2025 Land of opportunity: …            4       4 T Mai,…
 9 Climate Risk Viewer  2023 Firm-specific climate…            1.5     3 T Dang…
10 Climate Risk Viewer    NA Forest Carbon Assessm…            0       0 A Duga…
# ℹ 36 more rows
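
The `cites_per_year` column arrives with the exported data, and the source does not define it. The rows printed above are consistent with dividing `cites` by the paper's age in years, counted from the 2025 query year with a floor of one year. The sketch below checks that assumed formula against four of the printed rows. The formula is an inference from the output, not a documented definition.

```r
query_year <- 2025L

# Four rows copied from the printed table: year, cites, cites_per_year
printed <- data.frame(
  year           = c(2024L, 2023L, 2025L, 2023L),
  cites          = c(227L, 13L, 4L, 3L),
  cites_per_year = c(227, 6.5, 4, 1.5)
)

# Assumed rule: age in years, never less than one
age <- pmax(1L, query_year - printed$year)
printed$cpy_assumed <- printed$cites / age

printed
all.equal(printed$cpy_assumed, printed$cites_per_year)
```

Under this reading, a paper published in the query year is treated as one year old, which is why the 2025 paper keeps its raw count of 4.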

#### Average `cites_per_year` over time per tool

This shows whether more recent papers citing a tool are getting traction.

In [None]:
df_avg_cpy <- all_tools %>%
  filter(!is.na(year), !is.na(cites_per_year)) %>%
  group_by(tool_name, year) %>%
  summarise(avg_cpy = mean(cites_per_year, na.rm = TRUE), .groups = "drop")

hchart(df_avg_cpy, "line", hcaes(x = year, y = avg_cpy, group = tool_name)) %>%
  hc_title(text = "Average cites per year over time, by tool") %>%
  hc_yAxis(title = list(text = "Average cites per year")) %>%
  hc_exporting(enabled = TRUE)
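
A yearly mean can be pulled upward by a single heavily cited paper. As a rough illustration, the sketch below compares the mean and median of the six 2024 Climate Risk Viewer values printed in the top-10 table. These are only the top of the list, not all 2024 records, so the numbers are illustrative rather than a result.

```r
# cites_per_year for the 2024 Climate Risk Viewer rows shown in the top-10 table
cpy_2024 <- c(227, 14, 12, 6, 5, 5)

avg_cpy    <- mean(cpy_2024)
median_cpy <- median(cpy_2024)

c(mean = avg_cpy, median = median_cpy)
```

The one paper at 227 drives the mean well above the median, so reporting the median next to the mean is one way to show how typical the average is.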

#### Publishers

Which publishers/platforms appear most frequently?

In [None]:
df_publishers <- all_tools %>%
  filter(publisher != "") %>%
  count(publisher, sort = TRUE) %>%
  slice_head(n = 10)

df_publishers

              publisher   n
1   search.proquest.com 205
2              osti.gov 140
3            HeinOnline 126
4              Elsevier 115
5              Springer  87
6       Taylor &Francis  59
7       papers.ssrn.com  57
8        liebertpub.com  54
9  Wiley Online Library  43
10             mdpi.com  42

### Research question 2

What research topics are associated with each tool, and how do these topics vary by citation frequency?

To answer this question, I will work with a collaborator to “code” all of the research studies manually and to categorize them into topics. Before we do that, I’

### Research question 3

In what ways are these tools applied in research?

## References

Harzing, Anne-Wil. 2025. “13.2.4 Google Scholar Results Are Limited to the 1000 Most Cited Papers.” <https://harzing.com/popbook/ch13_2_4.htm>.