Fetch dataclip body as string, not json, from postgres #3651

taylordowns2000 · 2025-09-30T18:53:42Z

@stuartc, I think I've got something here for #3641. The change, in essence, is to extract these big objects from Postgres as text, not JSON. If we get them from Postgres as JSON, Elixir uses a lot of memory decoding them into maps. To run the benchmark, create a noisy/random 2MB dataclip in your system, then fire up your server and run this from IEx:

c "benchmarking/dataclip_memory_benchmark_iex.exs"
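
For anyone who needs a test payload first, here is a rough sketch of how a noisy body of about 2MB could be generated. Everything in it is illustrative: the record shape, the `random_string` and `record` helpers, and the record count are assumptions, and `JSON.encode!/1` assumes Elixir 1.18 or later (swap in `Jason.encode!/1` otherwise). It only builds the map; how you store it as a dataclip is up to your setup.

```elixir
# Hypothetical helper: a random lowercase string of the given length.
random_string = fn len ->
  for _ <- 1..len, into: "", do: <<Enum.random(?a..?z)>>
end

# Hypothetical record shape: nested maps, a list of numbers, random strings.
record = fn i ->
  %{
    "id" => i,
    "label" => random_string.(32),
    "values" => Enum.map(1..5, fn _ -> :rand.uniform(1_000_000) end),
    "meta" => %{"flag" => rem(i, 2) == 0, "note" => random_string.(16)}
  }
end

# 15_000 records is a guess aimed at roughly 2MB of JSON; adjust as needed.
body = for i <- 1..15_000, into: %{}, do: {"record_#{i}", record.(i)}

# Assumes Elixir >= 1.18 for the built-in JSON module.
json = JSON.encode!(body)
IO.puts("Generated #{Float.round(byte_size(json) / 1_000_000, 2)} MB of JSON")
```

Random content matters here because repetitive data would make the payload unrepresentative of real dataclips.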

tldr - there's a big impact! 🚀

Problem:
PostgreSQL stores JSON as compact JSONB (1.86 MB for this dataclip).
When loaded as an Elixir map, it expands ~38x due to:
- Immutable data structure overhead
- Metadata for every map/list/string
- Deep nesting creates many small allocations
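
A small sketch of how this amplification can be observed on any machine, without the production dataclip. The sample data is made up, `JSON.encode!/1` and `JSON.decode!/1` assume Elixir 1.18 or later, and the ratio it prints will differ from the ~38x above because it depends heavily on the shape of the data.

```elixir
# Hypothetical stand-in for a dataclip body: 1_000 small nested records.
body =
  for i <- 1..1_000, into: %{} do
    {"record_#{i}", %{"id" => i, "name" => "item #{i}", "tags" => ["a", "b", "c"]}}
  end

# Size of the JSON text in bytes (assumes Elixir >= 1.18 for the JSON module).
json = JSON.encode!(body)
json_bytes = byte_size(json)

# Heap size of the decoded term. :erts_debug.flat_size/1 returns a size in
# machine words, so multiply by the word size to get bytes.
word_size = :erlang.system_info(:wordsize)
map_bytes = :erts_debug.flat_size(JSON.decode!(json)) * word_size

IO.puts("JSON text:     #{json_bytes} bytes")
IO.puts("Decoded map:   #{map_bytes} bytes")
IO.puts("Amplification: #{Float.round(map_bytes / json_bytes, 1)}x")
```

Note that `:erts_debug.flat_size/1` counts only on-heap data, so large (reference-counted) binaries are under-reported; the small strings used here live on the heap.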

OLD APPROACH (baseline):
- Query JSONB → Elixir map (~38x memory amplification)
- Jason.encode!(map) → JSON string (creates another copy)
- Peak memory: ~70.5 MB for this dataclip

NEW APPROACH (optimized):
- Query with fragment("?::text", d.body) → JSON string directly
- PostgreSQL does the conversion, no Elixir map
- Peak memory: ~1.86 MB for this dataclip
- Memory reduction: ~97% ⭐
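
As a rough illustration of the new approach (not the code in this PR), the query could look something like the sketch below. The `DataclipBodySketch` module, the `"dataclips"` table name, and the `:binary_id` key type are assumptions for the example; `Mix.install/1` is only there so the snippet runs as a standalone script, and it only builds the query rather than executing it.

```elixir
# Standalone sketch; inside an application Ecto is already a dependency.
Mix.install([{:ecto, "~> 3.10"}])

defmodule DataclipBodySketch do
  import Ecto.Query

  # Builds a query that asks PostgreSQL to cast the JSONB body to text,
  # so the driver hands back a single binary instead of a decoded map.
  # The "dataclips" table name and :binary_id key type are assumptions.
  def body_as_text(dataclip_id) do
    from d in "dataclips",
      where: d.id == type(^dataclip_id, :binary_id),
      select: fragment("?::text", d.body)
  end
end

query = DataclipBodySketch.body_as_text("00000000-0000-0000-0000-000000000000")

# With a configured repo, something like MyApp.Repo.one(query) would return
# the body as a JSON string (or nil); MyApp.Repo is a placeholder name.
IO.inspect(query)
```

The resulting string can be written straight into the HTTP response, which is what avoids both the map allocation and the re-encoding step.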

Impact on Production:
With 2000MB memory limit and 1.86MB dataclips:
- OLD: ~28 concurrent requests before OOM
- NEW: ~1077 concurrent requests before OOM
- Improvement: 38x more capacity! ⭐
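
The arithmetic behind these estimates can be reproduced as below, using the rounded figures quoted above. It is a back-of-the-envelope model that assumes the whole limit is available to dataclip requests. With the rounded 1.86MB size the new capacity comes out at 1075 rather than 1077, presumably because the figure above was computed from the unrounded size.

```elixir
limit_mb = 2000
old_peak_mb = 70.5
new_peak_mb = 1.86

# How many requests fit under the limit at each peak size.
old_capacity = floor(limit_mb / old_peak_mb)
new_capacity = floor(limit_mb / new_peak_mb)

# Relative improvement and memory reduction.
ratio = Float.round(old_peak_mb / new_peak_mb, 1)
reduction = Float.round((1 - new_peak_mb / old_peak_mb) * 100, 1)

IO.puts("OLD capacity: #{old_capacity}")   # 28
IO.puts("NEW capacity: #{new_capacity}")   # 1075
IO.puts("Improvement:  #{ratio}x")         # 37.9x
IO.puts("Reduction:    #{reduction}%")     # 97.4%
```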

Additional Benefits:
- Faster response times (no map deserialization)
- Lower CPU usage (PostgreSQL does the conversion efficiently)
- Fewer garbage collection pauses

My full results with the problem dataclip from production:

Calculating statistics...
Formatting results...

Name                                             ips        average  deviation         median         99th %
BASELINE: Load metadata only (no body)       5701.55       0.175 ms    ±70.23%       0.160 ms        0.37 ms
NEW: Query as text from PostgreSQL             20.51       48.76 ms     ±7.20%       48.18 ms       70.78 ms
OLD: Load as Elixir map + encode                2.63      379.63 ms     ±2.05%      379.62 ms      394.64 ms

Comparison: 
BASELINE: Load metadata only (no body)       5701.55
NEW: Query as text from PostgreSQL             20.51 - 278.02x slower +48.59 ms
OLD: Load as Elixir map + encode                2.63 - 2164.49x slower +379.46 ms

Extended statistics: 

Name                                           minimum        maximum    sample size                     mode
BASELINE: Load metadata only (no body)       0.0998 ms       11.33 ms        17.07 K                 0.149 ms
NEW: Query as text from PostgreSQL            45.92 ms       70.78 ms             62                     None
OLD: Load as Elixir map + encode             371.02 ms      394.64 ms              8                     None

Memory usage statistics:

Name                                           average  deviation         median         99th %
BASELINE: Load metadata only (no body)        36.00 KB     ±0.05%          36 KB          36 KB
NEW: Query as text from PostgreSQL            88.76 KB    ±21.97%       87.95 KB      132.32 KB
OLD: Load as Elixir map + encode          209294.27 KB     ±0.00%   209294.13 KB   209302.30 KB

Comparison: 
BASELINE: Load metadata only (no body)           36 KB
NEW: Query as text from PostgreSQL            88.76 KB - 2.47x memory usage +52.76 KB
OLD: Load as Elixir map + encode          209294.27 KB - 5813.69x memory usage +209258.27 KB

Extended statistics: 

Name                                           minimum        maximum    sample size                     mode
BASELINE: Load metadata only (no body)           36 KB       37.76 KB        10.44 K                    36 KB
NEW: Query as text from PostgreSQL            49.98 KB      132.32 KB             42                     None
OLD: Load as Elixir map + encode          209280.77 KB   209302.30 KB              6                     None

=== Analysis ===
This benchmark demonstrates the memory optimization for serving dataclip bodies.

Problem:
  PostgreSQL stores JSON as compact JSONB (1.86 MB for this dataclip).
  When loaded as an Elixir map, it expands ~38x due to:
    - Immutable data structure overhead
    - Metadata for every map/list/string
    - Deep nesting creates many small allocations

Solutions Compared:

  1. OLD APPROACH (baseline):
     - Query JSONB → Elixir map (~38x memory amplification)
     - Jason.encode!(map) → JSON string (creates another copy)
     - Peak memory: ~70.5 MB for this dataclip

  2. NEW APPROACH (optimized):
     - Query with fragment("?::text", d.body) → JSON string directly
     - PostgreSQL does the conversion, no Elixir map
     - Peak memory: ~1.86 MB for this dataclip
     - Memory reduction: ~97%

Impact on Production:
  With 2000MB memory limit and 1.86MB dataclips:
    - OLD: ~28 concurrent requests before OOM
    - NEW: ~1077 concurrent requests before OOM
    - Improvement: 38x more capacity!

Additional Benefits:
  - Faster response times (no map deserialization)
  - Lower CPU usage (PostgreSQL does the conversion efficiently)
  - Fewer garbage collection pauses

taylordowns2000 · 2025-09-30T19:16:26Z

still need to figure out how to pretty print it

lib/lightning_web/controllers/dataclip_controller.ex

codecov · 2025-10-01T04:21:21Z

Codecov Report

❌ Patch coverage is 25.45455% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.63%. Comparing base (a608fae) to head (8987692).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
lib/lightning_web/live/memory_debug.ex	0.00%	40 Missing ⚠️
...b/lightning_web/controllers/dataclip_controller.ex	87.50%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3651      +/-   ##
==========================================
- Coverage   89.86%   89.63%   -0.23%     
==========================================
  Files         409      410       +1     
  Lines       17022    17075      +53     
==========================================
+ Hits        15296    15306      +10     
- Misses       1726     1769      +43

midigofrank

Nicely done

string, not json from postgres

15e80c5

github-project-automation bot added this to v2 Sep 30, 2025

github-project-automation bot moved this to New Issues in v2 Sep 30, 2025

taylordowns2000 added 3 commits September 30, 2025 20:57

memoru debugging

b4190ac

more

ce91b09

more

5f0f025

taylordowns2000 requested a review from stuartc September 30, 2025 19:16

pretty?

9bece03

taylordowns2000 marked this pull request as ready for review September 30, 2025 20:15

taylordowns2000 commented Oct 1, 2025

lib/lightning_web/controllers/dataclip_controller.ex (outdated)

taylordowns2000 added 2 commits October 1, 2025 06:00

do formatting in browser

b4ec609

run mix format

ece02b1

taylordowns2000 requested a review from midigofrank October 1, 2025 04:12

better benchee

e17ef37

taylordowns2000 added 4 commits October 1, 2025 06:28

cleanup

439e75a

dry

0e698ba

remove old measurements

0778027

cl

6292641

taylordowns2000 changed the title ~~string, not json from postgres~~ Fetch dataclip body as string, not json, from postgres Oct 1, 2025

taylordowns2000 moved this from New Issues to In review in v2 Oct 1, 2025

midigofrank approved these changes Oct 1, 2025

taylordowns2000 added 3 commits October 1, 2025 07:35

add tests, restore nesting of bodies in 'data'

60f66ec

format

d26dfd2

tests

8987692

stuartc approved these changes Oct 1, 2025

stuartc merged commit 5d1bb16 into main Oct 1, 2025
6 of 8 checks passed

stuartc deleted the mem2 branch October 1, 2025 06:07

github-project-automation bot moved this from In review to Done in v2 Oct 1, 2025

theroinaochieng mentioned this pull request Oct 4, 2025

High memory usage #3641

Closed
