Merged
391 changes: 391 additions & 0 deletions planning/active/findings.md

Large diffs are not rendered by default.

626 changes: 626 additions & 0 deletions planning/active/precip_drying_methodology_quotes.md

Large diffs are not rendered by default.

48 changes: 48 additions & 0 deletions planning/active/progress.md
@@ -0,0 +1,48 @@
# Progress — Lit-review precipitation + drying methodology + interpretation backing (#61)

## Session 2026-05-05

- Closed #58 (temperature lit review): PR #60 merged, v0.2.2
released, planning files archived to
`planning/archive/2026-05-issue-58-temperature-lit-review/`
- Filed #61 (precip + drying lit review, Issue 2 of 3-split)
- Created branch `61-precip-drying-lit-review` off main
- Scaffolded PWF baseline mirroring #58 phase structure exactly,
adapted for precip/drying topics + candidate papers
- Carrying forward lessons from #58:
  - BBT 9.x for Zotero 8/9 (compat split)
  - No `Citation Key:` overrides in `extra` (BBT auto-derives)
  - PATCH individual authors after Web API POST when CrossRef returns
    only corporate authorship (Pepin 2015 lesson)
  - OCR image-only scans before Zotero attach (Karl 93 + Richter
    & Kolmes 05 lesson)
  - `noun_verb` naming for new rag scripts
- Phase 1 done: 7 new candidate papers identified (DOI + OA status
confirmed); 5 existing climate-collection items + cross-rag from
snow + temp flagged for reuse. 11-topic coverage matrix in
findings.md
- Phase 2 done: 7 PDFs in cache (4 curl, 3 user-RG; 1 OCR'd from
Marvel LLNL preprint); 7 papers POSTed to NewGraphEnvironment/
climate (parent itemKey + attach itemKey table in findings.md);
3 fresh PDF uploads + 4 md5-dedupes. No `Citation Key:` overrides
(soul#43 + #58 lesson applied). All 7 items have ≥2 creators
- **User action pending: restart Zotero desktop** so BBT generates
citation keys for the 7 new items
- Auto-restarted Zotero via osascript+open (verified, ~30s
sufficient); pattern documented in soul#43. All 7 BBT keys
captured cleanly
- Phase 3 done: scripts/rag_precip_drying_methodology_build.R
cloned from temp build script with 7-paper pdf_specs map; built
data/rag/precip_drying_methodology.duckdb (526 chunks, 7 sources,
~25 s)
- Phase 4 done: scripts/rag_precip_drying_methodology_query.R
written; 24 queries × top-5 chunks = 120 candidates captured to
planning/active/precip_drying_methodology_quotes.md (626 lines).
Distribution healthy across all 7 papers
- Phase 5 done: synthesis section in findings.md covers 8 topics
with selected quotes; 15-row "cite this for that" map with BBT
keys baked in. Trenberth 2014 → BBT key shows 2013 (CrossRef
issued date is 2013-12-17 online; print issue 2014) — leaving
as-is per auto-derived convention
- Next: Phase 6 — /code-check, push branch, open PR (Fixes #61,
SRED tag in body), /planning-archive after merge, release v0.2.3
179 changes: 179 additions & 0 deletions planning/active/task_plan.md
@@ -0,0 +1,179 @@
# Task: Lit-review precipitation + drying methodology + interpretation backing (#61)

## Problem

Vignette interpretation paragraphs in `vignettes/peace-fwcp.Rmd` and
`vignettes/kootenay-lake.Rmd` make defensible-sounding claims about
precipitation departure and "drying" — falling annual precipitation,
rising VPD/evapotranspiration, declining soil moisture, "soils
drying due to both ↓P and ↑ET" (the v0.1.1 finding) — but currently
rest on **zero** peer-reviewed citations. In the FWCP fish-passage
reporting context, these claims need the same cited backing that
#53/#54 gave Snowpack and #58/#60/v0.2.2 gave Temperature.

## Scope

Second of three sequential climate-departure lit reviews covering
the non-snow vignette sections (3-split: temperature [done #58],
**precip+drying [this issue]**, interpretation framing).
Mirrors the #58 / #54 / v0.1.7 pattern verbatim:

- Targeted lit search → ~10 papers
- Add to `NewGraphEnvironment/climate` Zotero collection (key
`8MH9LCC9`), PDFs first per `/lit-search` policy
- Build `data/rag/precip_drying_methodology.duckdb` ragnar store
- Mine for methodology quotes; write `findings.md` with "cite this
for that" map
- Vignette `[@key]` insertion happens on a downstream branch, not here

## Phase 1 — Targeted literature search

- [x] Web search candidate papers for DOI + access status; refined
candidate list to **7 confirmed new papers** (leaner than #58's
10 since precip+drying leans heavily on existing collection
items + cross-rag)
- [x] Identified OA-fetchable: Trenberth 2014, Marvel 2019, Williams
2020, Mekis & Vincent 2011, Grossiord 2020. Paywalled needing
RG: Ficklin & Novick 2017, Min 2011
- [x] Cross-checked existing 19 items in the `climate` collection
(5 reuse-relevant for #61) + temp/snow rag references for
cross-rag candidates (Knowles 2006, Vincent 2018, Yue & Wang
2002 — already-rag'd, no re-add)
- [x] Documented final candidate list + 11-topic coverage matrix
in `findings.md` (Phase 1 search log section)

Starting candidate list (will refine):

| Citation key | Topic / Why |
|---|---|
| `donat_etal2013` HadEX2 | Global temp + precip extremes dataset (J Geophys Res) |
| `min_etal2011` | Anthropogenic contribution to extreme precipitation (Nature) |
| `mekis_vincent2011` | Adjusted daily Canadian precipitation dataset (Atmos-Ocean) |
| `daly_etal2008` PRISM | Physiographic mapping of climate (orographic methods) |
| `mass_etal2002` | Orographic precip processes PNW |
| `williams_etal2020` | Anthropogenic warming → North American megadrought (Science) |
| `ficklin_novick2017` | Rising VPD, US continental-scale drying (J Geophys Res Atmos) |
| `grossiord_etal2020` | Plant + ecosystem responses to rising VPD (New Phytol) |
| `trenberth_etal2014` | Global warming + drought changes (Nat Clim Chg) |
| `marvel_etal2019` | 20th-century hydroclimate changes (Nature) |
| `sheffield_wood2008` | Global drought trends + variability (J Climate) |

## Phase 2 — Add to NewGraphEnvironment/climate Zotero collection

- [x] For each of 7 candidates: CrossRef metadata fetch → POST to
Web API with `"collections": ["8MH9LCC9"]`, tags
`precip-drying-departure-methodology` + `cd-issue-61`. No
`Citation Key:` override in `extra` (per soul#43 + #58 lesson).
All 7 items created with ≥2 individual creators (no Pepin-style
fallback risk)
- [x] PDF acquisition: 4 fetched via curl (Williams emnrd.nm.gov,
Grossiord utah.edu, Mekis & Vincent ec.gc.ca, Min Edinburgh),
3 user-provided via ResearchGate (Ficklin, Trenberth, Marvel);
Marvel OCR'd (LLNL preprint image-only scan)
- [x] Auto-attached all 7 PDFs via 4-step S3 upload (3 fresh, 4
deduped via md5)
- [x] Captured per-paper `parent itemKey + attach itemKey + creator
count` in `findings.md`
- [x] PDFs in `data/rag/precip_drying_methodology_pdfs/`,
gitignored, text-layered, ready for Phase 3 ingestion
- [x] **Auto-restarted Zotero** via osascript+open (~30 s wait); BBT
generated all 7 citation keys cleanly. Pattern documented and
added to soul#43 for the /lit-search + /zotero-api skills.
Final 7 BBT keys mapped in findings.md Phase 2 table
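
The md5 dedupe in the upload step above can be sketched as follows. This is a minimal illustration, not the code used on this branch. It assumes the Zotero Web API upload-authorization request takes the form fields `md5`, `filename`, `filesize`, and `mtime`, and that the server answers `{"exists": 1}` when it already holds a file with that hash. `build_upload_auth()` is a hypothetical helper name.

```r
# Hypothetical helper: assemble the form body for a Zotero upload-
# authorization request. The field names are an assumption drawn from the
# Web API file-upload docs; verify against the current docs before use.
build_upload_auth <- function(path) {
  stopifnot(file.exists(path))
  info <- file.info(path)
  fields <- c(
    md5      = unname(tools::md5sum(path)),
    filename = basename(path),
    filesize = as.character(info$size),
    # mtime expressed in milliseconds since the epoch
    mtime    = format(round(as.numeric(info$mtime) * 1000), scientific = FALSE)
  )
  encoded <- vapply(fields, utils::URLencode, "", reserved = TRUE)
  paste0(names(fields), "=", encoded, collapse = "&")
}

# Demo on a throwaway file so the sketch runs without the real PDFs.
demo_pdf <- file.path(tempdir(), "demo_paper.pdf")
writeBin(charToRaw("hello"), demo_pdf)
auth_body <- build_upload_auth(demo_pdf)
```

If the server reports that the file exists, the remaining upload steps can be skipped for that attachment, which would account for the "4 deduped via md5" count.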

## Phase 3 — Build ragnar DuckDB store

- [x] Cloned `scripts/rag_temp_methodology_build.R` →
`scripts/rag_precip_drying_methodology_build.R`. Adapted
header docstring + 7-paper `pdf_specs` map
- [x] Ran build — produced `data/rag/precip_drying_methodology.duckdb`
with **526 chunks across 7 sources** via Ollama
`nomic-embed-text` (~25 s)
- [x] Verified retrieval distribution: all 7 papers contributing to
top-5 chunks across queries; Grossiord 2020 (long VPD review)
gets the most hits (33), Min 2011 the fewest (6 — single-topic
paper, expected)

## Phase 4 — Mine the store for methodology quotes

- [x] Wrote `scripts/rag_precip_drying_methodology_query.R` mirroring
the temp query script. 8 topics × 3 queries × top-5 chunks =
120 candidate chunks total (note: orographic precip query
dropped from initial 8-topic plan since the precip+drying corpus
doesn't have a dedicated orographic paper — descriptive prose
grounded by `dierauer_etal2020` covers the BC ecoregion contrasts)
- [x] Topics covered: precip trend methodology, anthropogenic precip-
extremes attribution, VPD continental-scale drying, VPD
ecosystem responses, drought attribution (NA megadrought),
drought framework, 20th-century hydroclimate pattern, BC/PNW
summer flow
- [x] Raw retrieval saved to
`planning/active/precip_drying_methodology_quotes.md` (626 lines)
- [x] Synthesized per-topic into `findings.md` (Phase 5)
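
The query arithmetic above (8 topics × 3 queries × top-5 chunks) and the per-source hit tally from Phase 3 can be sketched as below. The topic names come from the list above; the three query "angles" and the stub retriever are hypothetical stand-ins, since the real query strings live in the query script. The sketch also assumes the retrieval result carries an `origin` column, as the `chunks` table queried by the build script does.

```r
topics <- c(
  "precip trend methodology",
  "anthropogenic precip-extremes attribution",
  "VPD continental-scale drying",
  "VPD ecosystem responses",
  "drought attribution (NA megadrought)",
  "drought framework",
  "20th-century hydroclimate pattern",
  "BC/PNW summer flow"
)
angles <- c("methods", "key findings", "limitations")  # hypothetical wording
grid <- expand.grid(angle = angles, topic = topics, stringsAsFactors = FALSE)
grid$query <- paste(grid$topic, grid$angle)

# Run every query through `retrieve_fn` and tag each hit with its query.
# `retrieve_fn` must return a data frame with an `origin` column.
collect_hits <- function(queries, retrieve_fn) {
  hits <- lapply(queries, function(q) {
    res <- retrieve_fn(q)
    data.frame(query = rep(q, nrow(res)), origin = res$origin)
  })
  do.call(rbind, hits)
}

# Stub retriever so the sketch runs without Ollama or a built store.
# Against the real store one might pass instead:
#   function(q) ragnar::ragnar_retrieve(store, q, top_k = 5)
stub_retrieve <- function(q) {
  data.frame(origin = sprintf("paper_%d.pdf", 1:5), text = q)
}

hits <- collect_hits(grid$query, stub_retrieve)
hits_per_source <- sort(table(hits$origin), decreasing = TRUE)
```

Passing the retriever in as a function keeps the tallying logic testable offline. With the real store, `hits_per_source` is the kind of table that would show one paper dominating and another contributing only a few chunks.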

## Phase 5 — Synthesis + citation map

- [x] In `findings.md`: methodology-quotes-by-topic section covering
all 8 topics with selected quotes per paper
- [x] Cross-cutting methodology section: baseline window (same as
snow+temp), trend test (consistent with Vincent 2018, Mekis &
Vincent 2011, raw MK per Yue & Wang 02), ERA5-Land precip +
soil-moisture validation gap (noted; same caveat as #58)
- [x] Deviations section — no new deviations beyond #58
- [x] **"Cite this for that"** map — 15-row claim → citation lookup,
framed as a menu, not an order. BBT-auto-derived keys baked in
- [x] Documented existing items in `climate` collection (5 reuse-
relevant) + cross-rag references from snow + temperature stores
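
For reference, the trend test named in the cross-cutting methodology item can be written out in a few lines. This is a minimal sketch, assuming "raw MK" means the Mann-Kendall test applied to the unmodified series with no prewhitening. `mk_test()` is a hypothetical name, and the variance formula omits the tie correction, so it only suits series without repeated values.

```r
# Mann-Kendall trend test without prewhitening (illustrative only).
# Simplification: no tie correction in the variance of S.
mk_test <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  stopifnot(n >= 3)
  # S = sum over all pairs i < j of sign(x[j] - x[i])
  diffs <- outer(x, x, function(a, b) b - a)  # diffs[i, j] = x[j] - x[i]
  s <- sum(sign(diffs[upper.tri(diffs)]))
  var_s <- n * (n - 1) * (2 * n + 5) / 18
  # Continuity-corrected standard normal score
  z <- if (s > 0) {
    (s - 1) / sqrt(var_s)
  } else if (s < 0) {
    (s + 1) / sqrt(var_s)
  } else {
    0
  }
  list(s = s, var_s = var_s, z = z, p_value = 2 * pnorm(-abs(z)))
}

res_up <- mk_test(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
res_down <- mk_test(c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1))
```

A strictly increasing series gives the maximum possible S for its length and a strictly decreasing one gives the negative of that, which makes a quick sanity check before applying the function to station or reanalysis series.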

## Phase 6 — PR + release

- [ ] `/code-check` clean (lint + tests) before each commit
- [ ] Atomic commits — Phase 1 search log, Phase 2 Zotero adds
summary, Phase 3 build script, Phases 4–5 findings.md
- [ ] PR with `Fixes #61`. SRED tag (`Relates to NewGraphEnvironment/sred-2025-2026#23`)
in PR body, **not** issue
- [ ] After merge: `/planning-archive` → archived findings.md becomes
long-lived methodology reference. Bump v0.2.2 → v0.2.3

## Validation

- [ ] `data/rag/precip_drying_methodology.duckdb` exists;
`ragnar_retrieve()` returns sensible chunks for each topic
- [ ] 8–12 papers in `climate` Zotero collection with PDFs attached,
all tagged `precip-drying-departure-methodology` + `cd-issue-61`
- [ ] BBT-auto-derived citation keys captured for all new items
- [ ] `findings.md` has methodology quotes attributed by citation key
for each of the 8 query topics
- [ ] PWF checkboxes match landed work
- [ ] `devtools::test()` clean; `lintr::lint()` clean on new scripts
- [ ] Atomic-commits audit: `git log --oneline -- planning/ scripts/rag_*.R`
tells the full story

## Out of scope

- **Temperature methodology** — covered by #58/#60/v0.2.2
- **Snowpack methodology** — covered by #53/#54/v0.1.7
- **Interpretation framing** — Issue 3 of the 3-split (forthcoming)
- **BEC zone shifts** — #59 tracker (post-3-split)
- **Vignette `[@key]` insertion** — downstream consumer branch
- **Original methodology research / new metric proposals**

## Notes for execution

- Branch: `61-precip-drying-lit-review`
- Vignette edits forbidden on this branch
- Ollama prerequisite: `ollama serve` + `ollama pull nomic-embed-text`
- BBT version compat: confirm BBT 9.x active for Zotero 8/9 (Z7
needed BBT 8.x; #58 hit a compat block when BBT 8.0.25 was on
Zotero 9 — already resolved)
- Existing `climate` collection contains items relevant on the
precip side that #58 set aside as Issue-2 scope:
`vincent_etal2018ChangesCanadas` (covers Canada precip too),
`islam_etal2019Quantifyingprojected` (Fraser flow regimes),
`dierauer_etal2020Climatechange` (BC ecoregion drought),
`warkentin_etal2022Lowsummer` (BC summer flow + chinook),
`munoz-sabater_etal2021ERA5Landstateoftheart` (ERA5-Land soil
moisture validation)
- Cross-rag from snow methodology store:
`knowles_etal2006SnowfallVersus` (rain-vs-snow phase shift —
feeds into the soil-moisture and precip-fraction story)
104 changes: 104 additions & 0 deletions scripts/rag_precip_drying_methodology_build.R
@@ -0,0 +1,104 @@
#!/usr/bin/env Rscript
#
# rag_precip_drying_methodology_build.R
#
# Build a ragnar DuckDB store from Zotero PDFs for researching
# precipitation + drying departure methodology — the citation
# backbone for vignette interpretation paragraphs that currently
# make claims about falling precipitation, rising VPD/ET, and
# soil-moisture decline with zero peer-reviewed citations.
#
# Mines for:
#   - Anthropogenic precip-extremes attribution (Min 2011)
#   - Canadian / BC adjusted precip dataset methodology
#     (Mekis & Vincent 2011)
#   - VPD continental-scale drying (Ficklin & Novick 2017)
#   - VPD ecosystem responses (Grossiord 2020)
#   - Drought attribution + framework (Trenberth 2014, Williams 2020,
#     Marvel 2019)
#
# Filed against #61 (Issue 2 of the climate-departure 3-split lit reviews).
#
# Prerequisites:
#   - R packages: ragnar, DBI, here, fs
#   - Ollama running with nomic-embed-text model:
#       ollama serve
#       ollama pull nomic-embed-text
#   - PDFs in data/rag/precip_drying_methodology_pdfs/ (gitignored,
#     populated by user RG downloads + curl + OCR)
#
# Usage:
#   Rscript scripts/rag_precip_drying_methodology_build.R
#
# Output:
#   data/rag/precip_drying_methodology.duckdb (gitignored)

library(ragnar)

# --- Configuration ---
pdf_dir <- here::here("data", "rag", "precip_drying_methodology_pdfs")
store_path <- here::here("data", "rag", "precip_drying_methodology.duckdb")

# Local labels match the file basenames; the actual BBT-auto-derived
# Zotero citation keys (which is what lands in vignette [@key] markers
# downstream) are documented in planning/active/findings.md Phase 2
# table.
pdf_specs <- list(
  list(label = "williams_etal2020", attach = "SBSHUENU", note = "NA megadrought attribution (Science)"),
  list(label = "ficklin_novick2017", attach = "XT4HG85Q", note = "VPD US continental-scale drying (JGR)"),
  list(label = "grossiord_etal2020", attach = "SGEP5ZVA", note = "Plant responses to rising VPD (New Phytol)"),
  list(label = "trenberth_etal2014", attach = "Z8PQRGCS", note = "Global warming + drought changes (Nat Clim Chg)"),
  list(label = "min_etal2011", attach = "X9QN8MPI", note = "Anthropogenic precip extremes (Nature)"),
  list(label = "mekis_vincent2011", attach = "89KJ9JEE", note = "Adjusted Canadian precip dataset (Atmos-Ocean)"),
  list(label = "marvel_etal2019", attach = "9XCZKTWD", note = "20th-century hydroclimate human signal (Nature)")
)

# --- Find PDFs ---
pdf_paths <- character()
missing <- character()
for (spec in pdf_specs) {
  path <- file.path(pdf_dir, paste0(spec$label, ".pdf"))
  if (file.exists(path)) {
    pdf_paths <- c(pdf_paths, path)
  } else {
    missing <- c(missing, spec$label)
  }
}

if (length(missing) > 0) {
  message("MISSING PDFs in ", pdf_dir, ":")
  for (m in missing) message("  ", m)
}
message("Found ", length(pdf_paths), " / ", length(pdf_specs), " PDFs")

if (length(pdf_paths) == 0) {
  stop("No PDFs found in ", pdf_dir)
}

# --- Build store ---
fs::dir_create(dirname(store_path))

if (file.exists(store_path)) {
  file.remove(store_path)
  wal <- paste0(store_path, ".wal")
  if (file.exists(wal)) file.remove(wal)
}

store <- ragnar_store_create(
  location = store_path,
  embed = embed_ollama(model = "nomic-embed-text"),
  overwrite = TRUE
)

message("Ingesting ", length(pdf_paths), " PDFs into ", store_path)
ragnar_store_ingest(store, pdf_paths, progress = TRUE)

# --- Verify ---
n_chunks <- DBI::dbGetQuery(store@con, "SELECT COUNT(*) AS n FROM chunks")$n
n_origins <- DBI::dbGetQuery(store@con, "SELECT COUNT(DISTINCT origin) AS n FROM chunks")$n

DBI::dbDisconnect(store@con)

message("\nStore built: ", store_path)
message("Chunks: ", n_chunks)
message("Sources: ", n_origins)