Overlay2 Diskspace Buildup in PostHog Hobby Self-Hosted Installation #20101

Closed
AparnaTDevadas opened this issue Feb 2, 2024 · 17 comments
Labels
bug Something isn't working right

Comments

@AparnaTDevadas

The /var/lib/docker/overlay2 directory on my system is growing rapidly, causing a substantial increase in disk usage. Even after running docker system prune -a -f, which should remove all unused resources, the space is not reclaimed and the disk keeps filling up.

root@ip-10-88-81-122:/home/ubuntu# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 20G 11G 8.8G 55% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 1.6G 2.1M 1.6G 1% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/nvme0n1p15 105M 6.1M 99M 6% /boot/efi
tmpfs 784M 4.0K 784M 1% /run/user/1000
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/483f16e624886b92706cb359a43b49382da8bb0e941b92300eaa734021b375a5/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/61c74810d314349e1501316eb0691cba8a2cc492a43c192be20139921e755375/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/f3096acb8c5abe39f20c8d17596fcd27ebee2da268d8f34fb5a5675408930cb0/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/e4ad001cd84511c3300ba117ed1875de451685e06ec17abbbb1a310c3829247c/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/dc4165c6c6fb31315b14f7b12bc25c9751dee19d01b59835a11612cde3fdb2b1/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/3c62cc1fa3b29fb77ebf2e687d016885d1fbf75e6a7ed61244d7208987761604/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/faaf69098c28174384aa9dd3aca9a749098df7c6e2d7f0838f9dcb2f7ab366cb/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/58d4bfb9cc2aa2887ea5932488163e0600bd7d94a4d117921f0ea10433d02a49/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/8a2fc203a1a4b5df82cb9baddc065e64a50b78deb316f26a0605961279ad9e56/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/a29e7064412daa4b4293d910c82220a4ce236f74fa330993cd81a4372c23cf04/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/91530379a8710aa7f2a1b65b7c0e38f8e33ef255c26aece4fd06e9d23f7ef0b3/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/3af1b6f804b3ca7b8956f9ea510febdb44207f29ba16f1150ed2b032d66d6ba2/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/3e82e3ebf82ed31ddf155013793194fb4fc80172dc42925ff3466ae42a87ae09/merged
overlay 20G 11G 8.8G 55% /var/lib/docker/overlay2/67f295aa42422d4d7a271bb252ab59bd8fffad0d877794a56fdede05af250c8e/merged

@AparnaTDevadas added the bug (Something isn't working right) label Feb 2, 2024
@AparnaTDevadas
Author

Any updates on this issue?

@MrSnoozles

Same issue here. /var/lib/docker/overlay2/ is usually several hundred GB in size. We're just evaluating PostHog, so only a few hundred events have been collected.

@olegdater

We are seeing it too, around 4 GB a day. What is it?

Is it safe to clean up this container?

How are you all handling this disk space buildup?

@feedanal
Contributor

Within 3-4 days SpaceHog PostHog ate ALL available server space (0.5 TB). Luckily it's a fixed-cost bare-metal server, not some auto-expanding and auto-billed cloud instance. Moving it to a 2 TB server now in an attempt to figure out what's happening.

@feedanal
Contributor

feedanal commented May 10, 2024

@movy

movy commented May 10, 2024

/var/lib/docker/overlay2/ grows by several hundred gigabytes per day. I believe it's session recording blobs; I can see they're in JSONL format and gzipped, but the original JSONL files are not deleted (otherwise why compress them to begin with?)

On a new instance started 4 hours ago:

➜  session-buffer-files du -h --max-depth=1 /var/lib/docker/overlay2 | sort -rh | head -25

64G	/var/lib/docker/overlay2
20G	/var/lib/docker/overlay2/22ca12243f2f9f8f5cdef850c84dacdc0fb898b44424acefdd8d21d3ee83ee57
19G	/var/lib/docker/overlay2/37a2890cf25afbb5531accea5a40c32ff9abde25139d6f4dc76b4fccbb24b98e
4.6G	/var/lib/docker/overlay2/bd73c58dbd7e91576cc2e9a3a1c21f120c3f766b0704b130a08b863cb3de4cfa
4.6G	/var/lib/docker/overlay2/a5e928b2b7731333f60f257fd10d78020d66081879c9241d7b8b8d77c5c83f99
4.6G	/var/lib/docker/overlay2/644c452dc2651aa3a5070ce2a1c37d01d53c3f068c4607661f6848f82fd9c683
953M	/var/lib/docker/overlay2/40be981a0c4b1e16aaba9c4b14461ef56ac11ccd532e14b970200c5917871cfa
880M	/var/lib/docker/overlay2/5a93d1cd997d1dfad3b2cfa45c08c5d9ebfb2c3b8d83ccbc0c13038d1ddca142
868M	/var/lib/docker/overlay2/58ccd7b0eea3761c78de44a3f2651fff900f69a7dc46123277a5b4bb142b5843
857M	/var/lib/docker/overlay2/581a808e76961ce7fe967041d5c5e51e6e9a6c44b7bab44cce3676b23969e27b
818M	/var/lib/docker/overlay2/92008300249f4c515156b79880df467b59671ce6f8054c106e3222b600f1e79b
728M	/var/lib/docker/overlay2/8b2f680e6edad6f4ff2b7b780c902a169ab60eb5b43bcbc649eb9dfaf1343606
575M	/var/lib/docker/overlay2/c20d7668291ec7c0d833b78fc2f358fb4f44cc6d077a17a57a7830654b3ae70b
533M	/var/lib/docker/overlay2/e6cb565e6ed7d2617854c6676f102e92652f52aa88c4e66cde2c05514bea18ec
530M	/var/lib/docker/overlay2/57fdac189c5b45a9b86ab033ac50254a25aef1836fe4a79f9ea6e35c32f49c3b
427M	/var/lib/docker/overlay2/8a9f06aec0cde099006d741c0164b95d761cca2a56826f3ac47a06df7786a8cf
370M	/var/lib/docker/overlay2/4af2c6e6afb50f94e617ddd1b9f5b812b16a53ff146de8d787afff6ebace6c3d
357M	/var/lib/docker/overlay2/02a3d71c05d64f34645c6d8474ed9817bd249d95e8e3cff7330fae2c65f2aefa
303M	/var/lib/docker/overlay2/bcd9fd02aa542bc1da07822b41db4ac22c80d9dc42aa8678a8a746e66f404506
278M	/var/lib/docker/overlay2/de67a5fc04b3a1c4da43e90e72eaf2fc0d75dbe7d6a84f9fd52ad392b0768894
232M	/var/lib/docker/overlay2/a633121f4492ce8c8e29ad148d346d1b136e8405b3d3a06541beac0d912fd124
229M	/var/lib/docker/overlay2/010d8bcf19400830774c695e49a6dc856bbd90b81aa8b6222526d7ddb2383569
222M	/var/lib/docker/overlay2/1b3471d305ef0844347bafea4aed5db584c80bf1690aaa4a382570fbbdcdece3
192M	/var/lib/docker/overlay2/9a1200ae9f6d5ae9f79c065d080c578fe5e4501dea2a0706f888e144c4d500d6
160M	/var/lib/docker/overlay2/04cae4d2354dfd874fd4d407534318643c257079630c6fc21d9b5e57895b464a
ll /var/lib/docker/overlay2/22ca12243f2f9f8f5cdef850c84dacdc0fb898b44424acefdd8d21d3ee83ee57/diff/code/plugin-server/.tmp/sessions/session-buffer-files

[...]
-rw-r--r-- 1 root root  1.4M May 10 13:30 1.018f62ad-d0e9-7b3d-988e-6a94a2779eb9.ee243b2d-b802-4fbe-9e04-aa49707e0f6f.jsonl
-rw-r--r-- 1 root root   76K May 10 13:25 1.018f62ad-d2ef-750a-afe2-6e70ee99941d.0007eab6-c6d8-4131-be95-a990365b6e7b.gz
-rw-r--r-- 1 root root  796K May 10 13:27 1.018f62ad-d2ef-750a-afe2-6e70ee99941d.0007eab6-c6d8-4131-be95-a990365b6e7b.jsonl
-rw-r--r-- 1 root root  124K May 10 13:26 1.018f62ad-d437-7eba-bcc9-896b9b51231c.c922d076-af96-439b-92cb-8199c2f2a6a4.gz
-rw-r--r-- 1 root root  1.4M May 10 13:26 1.018f62ad-d437-7eba-bcc9-896b9b51231c.c922d076-af96-439b-92cb-8199c2f2a6a4.jsonl
-rw-r--r-- 1 root root   76K May 10 13:25 1.018f62ad-d62b-79aa-90b6-9d1691069c81.d2de25b1-d278-4456-9167-ed11912416a4.gz
-rw-r--r-- 1 root root  774K May 10 13:27 1.018f62ad-d62b-79aa-90b6-9d1691069c81.d2de25b1-d278-4456-9167-ed11912416a4.jsonl
-rw-r--r-- 1 root root   76K May 10 13:26 1.018f62ad-d727-7704-89de-6e33722904cd.5a8315d2-6c19-4155-9d47-8c8dcfef8053.gz
-rw-r--r-- 1 root root  736K May 10 13:26 1.018f62ad-d727-7704-89de-6e33722904cd.5a8315d2-6c19-4155-9d47-8c8dcfef8053.jsonl
-rw-r--r-- 1 root root   91K May 10 13:25 1.018f62ad-e22e-70ce-b1d2-ba76f7fdef10.b8433a31-d5e8-49c8-9dae-005f00f7a6f7.gz
-rw-r--r-- 1 root root  793K May 10 13:26 1.018f62ad-e22e-70ce-b1d2-ba76f7fdef10.b8433a31-d5e8-49c8-9dae-005f00f7a6f7.jsonl
-rw-r--r-- 1 root root   25K May 10 13:25 1.018f62ad-ea44-7736-af73-2536db3d3627.9c488d53-7a3b-4eef-b457-d0ad72acf3c3.gz
-rw-r--r-- 1 root root  201K May 10 13:25 1.018f62ad-ea44-7736-af73-2536db3d3627.9c488d53-7a3b-4eef-b457-d0ad72acf3c3.jsonl
-rw-r--r-- 1 root root   51K May 10 13:25 1.018f62ad-f2ad-7a16-a531-6587a99c000e.8613b939-0930-4ea9-8448-5d48d68c0d2a.gz
-rw-r--r-- 1 root root  681K May 10 13:30 1.018f62ad-f2ad-7a16-a531-6587a99c000e.8613b939-0930-4ea9-8448-5d48d68c0d2a.jsonl
-rw-r--r-- 1 root root   73K May 10 13:27 1.018f62ad-f432-7c2d-9525-4687dda261f2.6ca1db85-002b-4f5b-b6c6-2d8eb1ae579f.gz
-rw-r--r-- 1 root root  712K May 10 13:28 1.018f62ad-f432-7c2d-9525-4687dda261f2.6ca1db85-002b-4f5b-b6c6-2d8eb1ae579f.jsonl
-rw-r--r-- 1 root root   25K May 10 13:25 1.018f62ad-fa35-7cc4-a8f6-cb4645073424.1d9d68aa-3748-4833-9f94-54e8ec7d3502.gz
[...]

If only the compressed files were kept, disk usage would drop by a factor of roughly 10x, which would be manageable.

@movy

movy commented May 10, 2024

Found another culprit:

➜  overlay2 find /var/lib/docker/containers -type f -name "*.log" -print0 | du -shc --files0-from -
3.2M	/var/lib/docker/containers/8bf955312b28af19b1a2682d87ada355e8840a9001c6f099723568462722d0cf/8bf955312b28af19b1a2682d87ada355e8840a9001c6f099723568462722d0cf-json.log
89G	/var/lib/docker/containers/c1db5eeac69cfa4069ba1269f485668a1b7fd2ff0219cf4db78056868bc369d0/c1db5eeac69cfa4069ba1269f485668a1b7fd2ff0219cf4db78056868bc369d0-json.log
0	/var/lib/docker/containers/1666de8530999e78ae73822eb69f1e75807ea496fc6720c36a054b9ed2c59705/1666de8530999e78ae73822eb69f1e75807ea496fc6720c36a054b9ed2c59705-json.log
44K	/var/lib/docker/containers/505cf7699dc0ce332515f383d5e633db9bb03bb835c6cca0349de0e7b44370ef/505cf7699dc0ce332515f383d5e633db9bb03bb835c6cca0349de0e7b44370ef-json.log
4.0K	/var/lib/docker/containers/adfd31bd32af160c405404a6f9a72c5af6e8c17ed76eb1575d52f3e8927a1e5a/adfd31bd32af160c405404a6f9a72c5af6e8c17ed76eb1575d52f3e8927a1e5a-json.log
44K	/var/lib/docker/containers/6858624ea5d4c1fffdaf2b735ce19c43c868c2fb86247b96fe1f34cddf07d556/6858624ea5d4c1fffdaf2b735ce19c43c868c2fb86247b96fe1f34cddf07d556-json.log
12K	/var/lib/docker/containers/daa981bcd2a258ef42846596b7d5358682cd4fd411cba746fd4149cbe24e9e8d/daa981bcd2a258ef42846596b7d5358682cd4fd411cba746fd4149cbe24e9e8d-json.log
200K	/var/lib/docker/containers/10c08786257e53d47164d8ac1452b0028cec62ce684f83f213e208af9e7b67dd/10c08786257e53d47164d8ac1452b0028cec62ce684f83f213e208af9e7b67dd-json.log
56K	/var/lib/docker/containers/4818f2742c121a147b316f6b0dd49776ad932ffd84e67433967007e0e0b5315e/4818f2742c121a147b316f6b0dd49776ad932ffd84e67433967007e0e0b5315e-json.log
4.0K	/var/lib/docker/containers/83ce32cd6082271ac3f8a626d4f1b853a1c5873ab1d6c102f9112a878bd725e4/83ce32cd6082271ac3f8a626d4f1b853a1c5873ab1d6c102f9112a878bd725e4-json.log
4.0K	/var/lib/docker/containers/12e8ff5a64161e9ef1c3c1b9daa74473148e53dba06480fb06c2679ec736eadd/12e8ff5a64161e9ef1c3c1b9daa74473148e53dba06480fb06c2679ec736eadd-json.log
6.8M	/var/lib/docker/containers/556c05bc2702ef63df881232bf9fc215ec31b7c94f69e8e3a3ed2423bd5ffd87/556c05bc2702ef63df881232bf9fc215ec31b7c94f69e8e3a3ed2423bd5ffd87-json.log
4.0K	/var/lib/docker/containers/b27ed0d71ab4f68ba8e439c024c6265e65f6043b9872d9f78a4cd68f8b472eca/b27ed0d71ab4f68ba8e439c024c6265e65f6043b9872d9f78a4cd68f8b472eca-json.log
40K	/var/lib/docker/containers/2edf10fa9347c586b7f32e594bb207852db074371aaed3f26298d8f1a55d0e67/2edf10fa9347c586b7f32e594bb207852db074371aaed3f26298d8f1a55d0e67-json.log
644K	/var/lib/docker/containers/1bc6ad427a6b61940a68c65b15da96e016b6ee733edce0fa6436537b4a9afe88/1bc6ad427a6b61940a68c65b15da96e016b6ee733edce0fa6436537b4a9afe88-json.log
89G	total

That 89 GB giant is just filling up with hundreds of messages per second of this kind:

{"log":"{\"level\":\"warn\",\"time\":1715349912821,\"pid\":144,\"hostname\":\"c1db5eeac69c\",\"logContext\":{\"sessionId\":\"018f61ef-b287-795d-94b4-911b52b78104\",\"partition\":0,\"teamId\":1,\"topic\":\"session_recording_snapshot_item_events\",\"oldestKafkaTimestamp\":null,\"bufferCount\":0,\"referenceTime\":1715349912757,\"referenceTimeHumanReadable\":\"2024-05-10T14:05:12.757+00:00\",\"flushThresholdMs\":600000,\"flushThresholdJitteredMs\":488701.8770446094,\"flushThresholdMemoryMs\":586442.2524535313},\"msg\":\"[MAIN]  [session-manager] buffer has no oldestKafkaTimestamp yet\"}\n","stream":"stdout","time":"2024-05-10T14:05:34.679816487Z"}

wtf??

@feedanal
Contributor

feedanal commented May 10, 2024

Thanks for the suggestion, I've limited log files to 10 MB in docker-compose.yml and now it's sort of under control, but hell, there should be a way to set the log level...

    plugins:
        extends:
            file: docker-compose.base.yml
            service: plugins
        image: posthog/posthog:f1d32e6969f531577b32411e985d007f821643f6
        logging:
            options:
                max-size: 10m
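
If other services turn out to be just as noisy, the same limit can probably be shared with a YAML anchor instead of repeating it per service. A rough sketch only, not tested against the hobby compose files, and the service names besides plugins are just examples:

    x-log-limit: &log-limit
        logging:
            driver: json-file
            options:
                max-size: 10m
                max-file: "3"

    services:
        plugins:
            <<: *log-limit
        web:
            <<: *log-limit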

@pauldambra
Member

All the detail from folks here is amazing... one of the hardest things in understanding issues with self-hosted deployments is that the variability between deployments makes gathering info super difficult, so all this upfront detail is really appreciated.

I can believe that the elastic deployment of PostHog that we run could hide something that you all are experiencing.

[session-manager] buffer has no oldestKafkaTimestamp yet

this is an unexpected condition

we receive a recording event
that might mean we create a new session manager for that session
then we add the event to the session manager
and use the timestamp from that event to set oldestKafkaTimestamp

try {
    this.buffer.oldestKafkaTimestamp = Math.min(
        this.buffer.oldestKafkaTimestamp ?? message.metadata.timestamp,
        message.metadata.timestamp
    )
    this.buffer.newestKafkaTimestamp = Math.max(
        this.buffer.newestKafkaTimestamp ?? message.metadata.timestamp,
        message.metadata.timestamp
    )

i think that logically the presence of this log means either we're trapped in destroying state for a recording that's receiving traffic or your events don't have timestamps 🤯

does session replay work on these deployments?

@pauldambra
Member

pauldambra commented May 10, 2024

for the logging volume

we use the pino log library in the plugin server...

you can set log level using the LOG_LEVEL environment variable

| LOG_LEVEL | minimum log level | `'info'` |

with supported values here

export enum LogLevel {
    None = 'none',
    Debug = 'debug',
    Info = 'info',
    Log = 'log',
    Warn = 'warn',
    Error = 'error',
}

the default level if not overridden in the environment is info
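
for the hobby docker-compose.yml that would look roughly like this, reusing the plugins service from the snippet above (warn is just an example value, pick whichever supported level suits you):

    plugins:
        extends:
            file: docker-compose.base.yml
            service: plugins
        environment:
            LOG_LEVEL: warn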

@pauldambra
Member

and we can do this to turn down the amount of logging anyway #22251

@feedanal
Contributor

feedanal commented May 10, 2024

i think that logically the presence of this log means either we're trapped in destroying state for a recording that's receiving traffic or your events don't have timestamps 🤯

does session replay work on these deployments?

This is interesting, as session replay worked upon initial installation but stopped working later today. Playback won't start, showing a "Buffering" message, and the console shows an error about scenes.session-recordings.root missing from the store. Same as described here:
https://posthog.com/questions/video-not-playing

So I think it's a common issue, at least with self-hosted installations. I will look into it tomorrow and open another issue.

@feedanal
Contributor

you can set log level using the LOG_LEVEL environment variable

| LOG_LEVEL | minimum log level | `'info'` |

with supported values here

export enum LogLevel {
    None = 'none',
    Debug = 'debug',
    Info = 'info',
    Log = 'log',
    Warn = 'warn',
    Error = 'error',
}

the default level if not overridden in the environment is info

Thanks for mentioning this! It's a very important setting; the default should be 'error', I think. Either way, it's worth adding to https://posthog.com/docs/self-host/configure/environment-variables

@pauldambra
Member

will open another issue.

feel free to keep it here if it seems related.

thanks for taking the time 🥇

@feedanal
Contributor

Another problem is Kafka's on-disk log data:

Every 60.0s: du -h --max-depth=1 /var/lib/docker/overlay2 | sort -rh | head -15                                                              t.xfeed.com: Mon May 13 02:15:13 2024

238G    /var/lib/docker/overlay2
207G    /var/lib/docker/overlay2/8f83ff9dce79104e554c7e76c7805cd77af31cd15eb183f1ac6a518dadfaa389
5.0G    /var/lib/docker/overlay2/f80e07831c327082c849d0839efadda5d280e1858b782028594347aeec75b7d7
du -hsc /var/lib/docker/overlay2/8f83ff9dce79104e554c7e76c7805cd77af31cd15eb183f1ac6a518dadfaa389/diff/bitnami/kafka/data/session_recording_snapshot_item_events-0
100G	session_recording_snapshot_item_events-0

(everything doubles in size due to overlay2 diffs)

➜  data ll session_recording_snapshot_item_events-0 | more
total 100G
-rw-r--r-- 1 cook root   83K May 11 11:42 00000000000000000000.index
-rw-r--r-- 1 cook root  1.0G May 11 11:42 00000000000000000000.log
-rw-r--r-- 1 cook root  118K May 11 11:42 00000000000000000000.timeindex
-rw-r--r-- 1 cook root   89K May 11 12:04 00000000000000012626.index
-rw-r--r-- 1 cook root  1.0G May 11 12:04 00000000000000012626.log
-rw-r--r-- 1 cook root    10 May 11 11:42 00000000000000012626.snapshot
-rw-r--r-- 1 cook root  126K May 11 12:04 00000000000000012626.timeindex
-rw-r--r-- 1 cook root   88K May 11 12:26 00000000000000025804.index
-rw-r--r-- 1 cook root  1.0G May 11 12:26 00000000000000025804.log
-rw-r--r-- 1 cook root    10 May 11 12:04 00000000000000025804.snapshot
-rw-r--r-- 1 cook root  126K May 11 12:26 00000000000000025804.timeindex
-rw-r--r-- 1 cook root   80K May 11 12:51 00000000000000039145.index
[...]

Will figure out how to get it under control and create a PR later.
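
Roughly the kind of thing I have in mind, since the bitnami/kafka image maps KAFKA_CFG_* environment variables onto server.properties keys. The service name and the values below are illustrative only, not something I've verified on the hobby stack yet:

    kafka:
        environment:
            KAFKA_CFG_LOG_RETENTION_HOURS: "24"
            KAFKA_CFG_LOG_RETENTION_BYTES: "10737418240"   # ~10 GB per partition
            KAFKA_CFG_LOG_SEGMENT_BYTES: "1073741824"      # 1 GB segments, as in the listing above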

@feedanal
Contributor

I think this can be closed, as setting LOG_LEVEL, limiting the Docker log size, and adjusting Kafka log retention in docker-compose.yml completely fixed this for us: we now accumulate just a couple of gigabytes of data daily, most of which is session recordings in MinIO, which can either be offloaded to S3 or is automatically wiped after 30 days, so space usage is under control now.

@pauldambra
Member

I'll close it... folks are welcome to re-open or open a follow-up issue with more details if the fixes here don't work for you all.
