Resolve R1-R8 desert_farm_leverage_points + add BioNumbers subset #26

Merged
MDunitz merged 9 commits into main from desert-farm-csv-fixes-v2 on Apr 28, 2026

@madsCodeBuddy (Collaborator) commented Apr 28, 2026

Resolves R1–R8 data concerns + adds a standalone curated BioNumbers subset.

Supersedes #25 (which fell behind main during the BioNumbers addition; rebuilt fresh from current main per branch-hygiene rule).

Changes to data/datasets/desert_farm_leverage_points.csv

| ID | Row | Change |
|---|---|---|
| R1 | CO2 fixation | Time_min 7.69e-3 → 6.67e-2 (Bar-On natural RuBisCO kcat ceiling, 15/s) |
| R2 | Nutrient transport, Biochemical synthesis, Photoautotroph cell growth | Replace Moore et al. (2013) mis-citation with Milo & Phillips (2015) Cell Biology by the Numbers |
| R3 | Growth → Photoautotroph cell growth | Tighten to single algal cell scope: Time 7.56e3–1e5 s (Time_min anchored to Synechococcus elongatus UTEX 2973, 2.1 h doubling per Yu et al. 2015 Sci Rep 5:8132), Space 1e-18 to 1e-14 m³. Population scales remain in Community Ecology |
| R4 | Molecular Dynamics Models | Space_max 1e-27 → 1e-22 m³ (modern atomistic MD reaches 100-nm boxes) |
| R5 | Community Metabolic Models | Time_min 99 → 1.00E+02 |
| R6 | Three model rows | Fill empty References (Karplus & McCammon 2002 / Zakem et al. 2020 ISME J / Levine et al. 2025 Annu Rev Earth Planet Sci) |
| R7 | Extraction → Fossil-fuel formation | Time reflects formation duration (~1–100 My) rather than deposit age |
| R8 | CO2 fixation | Extend Reference text to justify both time and space bounds; drop Igamberdiev (2015) CA reference (not load-bearing once Time_min reflects intrinsic RuBisCO kcat) |
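The time bounds quoted for R1, R3, R5, and R7 follow from simple unit conversions, so they can be re-derived as a sanity check. The sketch below is illustrative only and is not part of the PR: the helper names are hypothetical, the Time columns are assumed to be in seconds (R3 states seconds; R1 and R5 are assumed to match), and a Julian year of 365.25 days is assumed for the My conversion. Values are formatted in the `X.XXE+XX` style used in the CSV.

```python
# Hypothetical helpers (not from the repository) that re-derive the
# time bounds quoted in the change table.

SECONDS_PER_HOUR = 3600.0
# Assumption: Julian year (365.25 days of 86400 s each).
SECONDS_PER_YEAR = 365.25 * 86400.0


def sci(value: float) -> str:
    """Format a number as d.ddE+dd, the style used in the CSV."""
    return f"{value:.2E}"


def turnover_time_s(kcat_per_s: float) -> float:
    """Time for one catalytic turnover, given kcat in 1/s."""
    return 1.0 / kcat_per_s


def hours_to_s(hours: float) -> float:
    """Convert hours to seconds."""
    return hours * SECONDS_PER_HOUR


def myr_to_s(million_years: float) -> float:
    """Convert millions of years to seconds."""
    return million_years * 1e6 * SECONDS_PER_YEAR


derived = {
    "R1 CO2 fixation Time_min (kcat 15/s)": sci(turnover_time_s(15.0)),
    "R3 Photoautotroph cell growth Time_min (2.1 h)": sci(hours_to_s(2.1)),
    "R5 Community Metabolic Models Time_min (100 s)": sci(100.0),
    "R7 Fossil-fuel formation Time_min (1 My)": sci(myr_to_s(1.0)),
    "R7 Fossil-fuel formation Time_max (100 My)": sci(myr_to_s(100.0)),
}

for label, value in derived.items():
    print(f"{label}: {value} s")
```

Under these assumptions the derived values agree with the table and commit messages: 6.67E-02, 7.56E+03, 1.00E+02, 3.16E+13, and 3.16E+15 s.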

New: data/references/ (standalone resource)

Curated phototroph reference data, not coupled to the main CSV — kept as a resource for future work.

  • bionumbers_subset.csv — 25 entries from the BioNumbers database with stable bion_id identifiers and direct URLs. Phototroph-relevant (cyanobacteria, green algae, diatoms) across cell generation/doubling times, biochemical synthesis rates (transcription/translation elongation), and small-molecule diffusion / transporter kinetics. Generic-organism proxies (e.g., E. coli, Xenopus) used where phototroph-specific entries were unavailable; flagged per-row in Organism.
  • README.md — schema, scope, attribution.

R1: CO2 fixation Time_min 7.69E-03 → 6.67E-02 (Bar-On natural kcat ceiling, 15/s)
R2: Replace Moore et al. 2013 mis-citations on Nutrient transport, Biochemical
    synthesis, and Cell growth with Milo & Phillips (2015) Cell Biology by
    the Numbers
R3: Rename Growth → Cell growth; tighten bounds to single algal cell scope
    (Time 1e3-1e5 s, Space 1e-18 to 1e-14 m³). Population scales remain
    covered by Community Ecology row.
R4: MD Space_max 1e-27 → 1e-22 m³ (modern atomistic-MD reach)
R5: Community Metabolic Models Time_min 99 → 1.00E+02 (formatting consistency)
R6: Fill empty Reference cells:
    - Molecular Dynamics Models: Karplus & McCammon (2002) Nat Struct Biol 9:646
    - Community Metabolic Models: Zakem et al. (2020) ISME J 14:288
    - Biogeochemical Circulation Models: Levine et al. (2025) Annu Rev
      Earth Planet Sci 53:595
R7: Rename Extraction → Fossil-fuel formation; Time bounds 3.16E+13 to
    3.16E+15 s reflect formation duration rather than deposit age
R8: Extend CO2 fixation Reference text to justify both time and space bounds

25 entries justifying time/space bounds for Cell growth (8),
Biochemical synthesis (8), and Nutrient transport (9) rows.
Phototroph-specific where available; generic-organism proxies
(E. coli, Xenopus, generic) used where phototroph data was
unavailable. Each entry includes direct URL to BioNumbers page.

Schema, curation criteria, known gaps (including the Cell growth
Time_min mismatch flagged by phototroph data), update procedure,
and BioNumbers attribution per Milo et al. (2010) Nucleic Acids
Res 38:D750.

The Time bound for cell division varies by ~3 OOM across phyla. The
previous 'Cell growth' label silently scoped to phototrophs (since
Time_min was set based on phototroph cell-cycle data). Rename makes
the implicit organism scope explicit.

Time_min 1.00E+03 → 7.56E+03 s (2.1 h, Synechococcus elongatus UTEX
2973 — currently fastest known photoautotroph). Anchored to Yu et
al. (2015) Sci Rep 5:8132. Reference cell extended to cite both
Milo & Phillips and Yu et al.

Sibling file to bionumbers_subset.csv for entries from primary
literature where BioNumbers either has no value (NaN) or no entry.

Initial entry: Synechococcus elongatus UTEX 2973 doubling time 2.1 h
from Yu et al. (2015) Sci Rep 5:8132 — anchors the Time_min for the
Photoautotroph cell growth row. BioNumbers entry 112484 exists for
this strain but has NaN Value, hence the supplementary citation.

- Add phototroph_growth_supplementary.csv documentation and schema
- Remove the 'Cell growth Time_min mismatch' known gap (now
  resolved by anchoring to UTEX 2973 in supplementary file)
- Update curation criteria to reflect Photoautotroph cell growth
  rename and the BioNumbers-NaN → supplementary workflow
- Restructure 'Known gaps' to reflect current state

Matches the R3 rename in desert_farm_leverage_points.csv. The supplementary
file already used the new name; this aligns the BioNumbers subset so the
Cited_in_row join key works for all 26 entries.

Discoverable cross-link from main CSV to data/references/ for the rows
backed by curated BioNumbers / primary-literature data: Nutrient transport,
Biochemical synthesis, Photoautotroph cell growth. Non-breaking — Reference
column is free-form display text.

Per Madison's clarification: BioNumbers data is meant as a standalone
resource for future work, not as a backing-data layer for the main CSV.

Changes:
- Drop (see data/references/) markers from 3 Reference cells in
  desert_farm_leverage_points.csv (main CSV stands alone)
- Drop Cited_in_row column from bionumbers_subset.csv (no longer a
  join key into the main CSV)
- Delete phototroph_growth_supplementary.csv (its only entry was
  UTEX 2973 / Yu 2015 / 2.1h doubling, already inline-cited in the
  main CSV's Photoautotroph cell growth Reference cell)
- Rewrite README.md to describe bionumbers_subset.csv as a standalone
  phototroph reference set, removing all cross-linking schema docs

@MDunitz merged commit c469d75 into main Apr 28, 2026
4 checks passed