@taylordowns2000 taylordowns2000 commented Sep 30, 2025

@stuartc, I think I've got something here for #3641. The change, in essence, is to extract these big objects from Postgres as text, not JSON. If we get them from Postgres as JSON, Elixir uses a lot of memory decoding them into maps. To run the benchmarking test, create a noisy/random 2 MB dataclip in your system, then fire up your server and run this from IEx:

c "benchmarking/dataclip_memory_benchmark_iex.exs"

TL;DR: there's a big impact! 🚀

Problem:
PostgreSQL stores JSON as compact JSONB (1.86 MB for this dataclip).
When loaded as an Elixir map, it expands ~38x due to:
- Immutable data structure overhead
- Metadata for every map/list/string
- Deep nesting creates many small allocations
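
A rough way to see this expansion outside the app is sketched below. It assumes Jason is pulled in via `Mix.install`, and the payload shape is made up for illustration (it is not the production dataclip). The ratio it prints depends heavily on the shape of the data, so don't expect it to reproduce the ~38x figure.

```elixir
Mix.install([{:jason, "~> 1.4"}])

# Synthetic nested payload; the shape is hypothetical and is not the
# production dataclip from this PR.
payload =
  for i <- 1..5_000 do
    %{
      "id" => i,
      "name" => "item-#{i}",
      "tags" => ["a", "b", "c"],
      "nested" => %{"x" => i * 1.5, "ok" => true}
    }
  end

json = Jason.encode!(payload)
decoded = Jason.decode!(json)

word_bytes = :erlang.system_info(:wordsize)
json_bytes = byte_size(json)

# :erts_debug.flat_size/1 returns the term's size in heap words.
# Off-heap binary data is not included, so this undercounts large strings.
term_bytes = :erts_debug.flat_size(decoded) * word_bytes

IO.puts("JSON text:    #{json_bytes} bytes")
IO.puts("Decoded term: #{term_bytes} bytes")
IO.puts("Expansion:    #{Float.round(term_bytes / json_bytes, 1)}x")
```

Every map, list cell, and small string carries its own header words, which is why the decoded term ends up several times larger than the text it came from.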

OLD APPROACH (baseline):
- Query JSONB → Elixir map (~38x memory amplification)
- Jason.encode!(map) → JSON string (creates another copy)
- Peak memory: ~70.5 MB for this dataclip

NEW APPROACH (optimized):
- Query with fragment("?::text", d.body) → JSON string directly
- PostgreSQL does the conversion, no Elixir map
- Peak memory: ~1.86 MB for this dataclip
- Memory reduction: ~97% ⭐
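
A minimal sketch of what such a query could look like is below. Only the `fragment("?::text", d.body)` part comes from this PR; the `"dataclips"` table name, the UUID primary key, and the `DataclipBodySketch` module are assumptions for illustration, and the real change in `dataclip_controller.ex` may be structured differently.

```elixir
Mix.install([{:ecto, "~> 3.11"}])

defmodule DataclipBodySketch do
  import Ecto.Query

  # Builds a query that asks PostgreSQL to cast the JSONB body to text,
  # so the result arrives in Elixir as a single binary and is never
  # decoded into a map.
  # Assumptions: table "dataclips" with a UUID `id` and a JSONB `body`.
  def body_as_text_query(dataclip_id) do
    from d in "dataclips",
      where: d.id == type(^dataclip_id, Ecto.UUID),
      select: fragment("?::text", d.body)
  end
end

# Building the query needs no database connection.
query = DataclipBodySketch.body_as_text_query("00000000-0000-0000-0000-000000000000")
IO.inspect(query)
```

With a configured repo (hypothetical `Repo` here), `Repo.one(query)` would return the body as a binary that can be handed straight to `Plug.Conn.send_resp/3` with a JSON content type.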

Impact on Production:
With a 2000 MB memory limit and 1.86 MB dataclips:
- OLD: ~28 concurrent requests before OOM
- NEW: ~1077 concurrent requests before OOM
- Improvement: 38x more capacity! ⭐
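
For anyone who wants to check the arithmetic, here it is as a small script, using only the figures quoted above. It assumes each request holds exactly one dataclip at peak and ignores the VM's baseline memory. The 1075 it computes differs slightly from the ~1077 quoted, presumably because the dataclip size is rounded to 1.86 MB here.

```elixir
limit_mb = 2000
old_peak_mb = 70.5
new_peak_mb = 1.86

# Whole requests that fit under the limit at peak memory per request.
old_capacity = floor(limit_mb / old_peak_mb)
new_capacity = floor(limit_mb / new_peak_mb)

# Ratio of old to new peak memory, and the same thing as a % reduction.
amplification = Float.round(old_peak_mb / new_peak_mb, 1)
reduction_pct = Float.round((1 - new_peak_mb / old_peak_mb) * 100, 1)

IO.puts("OLD capacity: #{old_capacity} requests")   # 28
IO.puts("NEW capacity: #{new_capacity} requests")   # 1075
IO.puts("Ratio: #{amplification}x")                 # 37.9x
IO.puts("Reduction: #{reduction_pct}%")             # 97.4%
```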

Additional Benefits:

  • Faster response times (no map deserialization)
  • Lower CPU usage (PostgreSQL does the conversion efficiently)
  • Fewer garbage collection pauses

My full results with the problem dataclip from production:

Calculating statistics...
Formatting results...

Name                                             ips        average  deviation         median         99th %
BASELINE: Load metadata only (no body)       5701.55       0.175 ms    ±70.23%       0.160 ms        0.37 ms
NEW: Query as text from PostgreSQL             20.51       48.76 ms     ±7.20%       48.18 ms       70.78 ms
OLD: Load as Elixir map + encode                2.63      379.63 ms     ±2.05%      379.62 ms      394.64 ms

Comparison: 
BASELINE: Load metadata only (no body)       5701.55
NEW: Query as text from PostgreSQL             20.51 - 278.02x slower +48.59 ms
OLD: Load as Elixir map + encode                2.63 - 2164.49x slower +379.46 ms

Extended statistics: 

Name                                           minimum        maximum    sample size                     mode
BASELINE: Load metadata only (no body)       0.0998 ms       11.33 ms        17.07 K                 0.149 ms
NEW: Query as text from PostgreSQL            45.92 ms       70.78 ms             62                     None
OLD: Load as Elixir map + encode             371.02 ms      394.64 ms              8                     None

Memory usage statistics:

Name                                           average  deviation         median         99th %
BASELINE: Load metadata only (no body)        36.00 KB     ±0.05%          36 KB          36 KB
NEW: Query as text from PostgreSQL            88.76 KB    ±21.97%       87.95 KB      132.32 KB
OLD: Load as Elixir map + encode          209294.27 KB     ±0.00%   209294.13 KB   209302.30 KB

Comparison: 
BASELINE: Load metadata only (no body)           36 KB
NEW: Query as text from PostgreSQL            88.76 KB - 2.47x memory usage +52.76 KB
OLD: Load as Elixir map + encode          209294.27 KB - 5813.69x memory usage +209258.27 KB

Extended statistics: 

Name                                           minimum        maximum    sample size                     mode
BASELINE: Load metadata only (no body)           36 KB       37.76 KB        10.44 K                    36 KB
NEW: Query as text from PostgreSQL            49.98 KB      132.32 KB             42                     None
OLD: Load as Elixir map + encode          209280.77 KB   209302.30 KB              6                     None

=== Analysis ===
This benchmark demonstrates the memory optimization for serving dataclip bodies.


@github-project-automation github-project-automation bot moved this to New Issues in v2 Sep 30, 2025
@taylordowns2000
Member Author

Still need to figure out how to pretty-print it.
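
One possible option, sketched below, is `Jason.Formatter.pretty_print/1`, which reformats JSON text directly without decoding it into Elixir terms. This assumes Jason is available (pulled in via `Mix.install` here), and the sample string is made up. Another option might be to let PostgreSQL do it with `jsonb_pretty(body)` in the fragment instead of the `::text` cast. Neither is confirmed as what this PR ends up doing.

```elixir
Mix.install([{:jason, "~> 1.4"}])

# Made-up compact JSON standing in for a dataclip body fetched as text.
compact = ~s({"data":{"id":1,"tags":["a","b"]},"ok":true})

# Reformats the JSON text as-is; no intermediate map is built.
pretty = Jason.Formatter.pretty_print(compact)

IO.puts(pretty)
```

For very large bodies, `Jason.Formatter.pretty_print_to_iodata/1` avoids building one more large binary before the response is sent.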

@taylordowns2000 taylordowns2000 marked this pull request as ready for review September 30, 2025 20:15
codecov bot commented Oct 1, 2025

Codecov Report

❌ Patch coverage is 25.45455% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.63%. Comparing base (a608fae) to head (8987692).
⚠️ Report is 1 commit behind head on main.

Files with missing lines                                 Patch %   Lines
lib/lightning_web/live/memory_debug.ex                   0.00%     40 Missing ⚠️
...b/lightning_web/controllers/dataclip_controller.ex    87.50%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3651      +/-   ##
==========================================
- Coverage   89.86%   89.63%   -0.23%     
==========================================
  Files         409      410       +1     
  Lines       17022    17075      +53     
==========================================
+ Hits        15296    15306      +10     
- Misses       1726     1769      +43     


@taylordowns2000 taylordowns2000 changed the title from "string, not json from postgres" to "Fetch dataclip body as string, not json, from postgres" Oct 1, 2025
@taylordowns2000 taylordowns2000 moved this from New Issues to In review in v2 Oct 1, 2025
Collaborator

@midigofrank midigofrank left a comment


Nicely done

@stuartc stuartc merged commit 5d1bb16 into main Oct 1, 2025
6 of 8 checks passed
@stuartc stuartc deleted the mem2 branch October 1, 2025 06:07
@github-project-automation github-project-automation bot moved this from In review to Done in v2 Oct 1, 2025