
fix: automatic restart on changedetection memory leak #1831

Closed
adripo wants to merge 3 commits

Conversation

adripo commented Oct 3, 2023

Based on https://github.com/dgtlmoon/changedetection.io/wiki/Playwright-content-fetcher#playwright-memory-leak

A memory leak occurs when changedetection monitors large html pages with playwright.

I added automatic restart to docker compose when memory leak occurs.

deploy:
  resources:
    limits:
      memory: 1G
dgtlmoon (Owner) commented:

this is super cool, however, even better is to leave this commented out with # so that people can install it

juuuust in case someone really needs it to run with that much RAM, at least they can find an easy solution
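
For example, shipping it commented out in the example compose might look like this (just a sketch; the 1G figure is only the value used in this PR):

# Optional: cap memory so Docker restarts the container if the limit is hit.
# Uncomment if you are affected by the memory growth described in the wiki.
# deploy:
#   resources:
#     limits:
#       memory: 1G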

dgtlmoon (Owner) commented Oct 3, 2023

@adripo please don't make silly demands in other issues, the memory issue is actually due to LXML string handling, and you even put the memory restart feature on the changedetection.io container, not the browserless container

adripo (Author) commented Oct 4, 2023

I apologize for any misunderstanding caused by my previous messages. I've taken your feedback and removed my question from the other issue.

adripo (Author) commented Oct 4, 2023

Going back to this PR: I followed the suggestion from the wiki, which suggests killing the changedetection service and not the browserless service.
This behaviour works fine for me and solves the memory leak from this container.

I've also tested moving the restart policy to the browserless container instead of the changedetection one; this is the RAM usage after one day.

browserless container with restart policy - no restart in 1d
(screenshot)

changedetection container without restart policy - already 4GB ram
(screenshot)

This is the docker compose that I am using:

docker-compose.yml
version: "3"

services:
  app:
    image: ghcr.io/dgtlmoon/changedetection.io:latest
    environment:
      - PUID=1000
      - PGID=1000
      # Alternative Playwright URL, do not use "'s or 's!
      - PLAYWRIGHT_DRIVER_URL=ws://playwright-chrome:3000/?--disable-web-security=true
      # Base URL of your changedetection.io install (Added to the notification alert)
      - BASE_URL=${APP_BASE_URL}
      # Hides the `Referer` header so that monitored websites can't see the changedetection.io hostname.
      - HIDE_REFERER=true
    volumes:
      - data:/datastore
    deploy:
      resources:
        limits:
          memory: 2G
      restart_policy:
        condition: on-failure
    restart: unless-stopped
    depends_on:
      - playwright-chrome

  playwright-chrome:
    image: browserless/chrome:1-chrome-stable
    environment:
      # Set session timeout to 10 min
      - CONNECTION_TIMEOUT=600000
      - DEFAULT_STEALTH=true
      # Ignore HTTPS errors, like for self-signed certs
      - DEFAULT_IGNORE_HTTPS_ERRORS=true
    restart: unless-stopped

volumes:
  data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /path/to/data

Without the restart policy, it has already saturated all 32GB of my RAM multiple times in the past.

dgtlmoon (Owner) commented Oct 4, 2023

I understand. Another critical thing is that the Docker shutdown MUST send SIGTERM, not SIGKILL, or it risks corrupting the database. Does this cover that situation?

adripo (Author) commented Oct 4, 2023

It just limits the container memory and triggers an OOME (out-of-memory exception) when the limit is reached. The kernel handles it and kills the process, so Docker detects the exit and restarts the container. The signal is managed by the kernel, so the behaviour can change from system to system.

More details here: https://docs.docker.com/compose/compose-file/compose-file-v3/#out-of-memory-exceptions-oome

Constantin1489 (Contributor) commented Oct 5, 2023

I think the CD manual (or wiki) that we have misleads us: it talks about the Playwright memory issue, but the script it gives applies the limit to the CD container.

So the wiki should only recommend the OOM killer for the Playwright container.

@adripo I noticed CD doesn't write changes immediately, so when CD received 'podman rmi --force' or similar, the unwritten changes were just gone. I've just read about the OOM killer (https://unix.stackexchange.com/a/305840): it sends SIGKILL, and as far as I know 'trap' cannot catch SIGKILL, so it will cause data loss. (FYI, CD has signal.signal(signal.SIGTERM, sigterm_handler).)

Also, FYI, I don't know whether podman has this problem, but some Docker versions do: hitting the container kernel memory limit can cause OOM on the Docker host.
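
Something along these lines, maybe (just a sketch - the 1G figure is borrowed from this PR and would need tuning for the browser's real usage):

playwright-chrome:
  image: browserless/chrome:1-chrome-stable
  # Put the memory cap on the browser container instead, so an OOM kill
  # (SIGKILL) never lands on changedetection while it has unwritten data
  deploy:
    resources:
      limits:
        memory: 1G
  restart: unless-stopped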

dgtlmoon (Owner) commented Oct 5, 2023

For the sake of argument, here is our SIGTERM handler:

https://github.com/dgtlmoon/changedetection.io/blob/master/changedetectionio/changedetection.py#L27

https://github.com/dgtlmoon/changedetection.io/blob/master/changedetectionio/changedetection.py#L105

if SIGKILL is sent without a prior SIGTERM... who knows what can happen

dgtlmoon (Owner) commented Oct 5, 2023

@adripo please update the title and description of your issue here, the memory leak you are 'fixing' actually has nothing to do with Playwright other than processing huge amounts of HTML

dgtlmoon closed this Oct 5, 2023
Constantin1489 (Contributor) commented Oct 5, 2023

@dgtlmoon FYI, my CD container actually doesn't stop correctly with SIGTERM if the --time is short (mine is the default 10s). (Hmmm?)
--stop: first SIGTERM with a wait, then SIGKILL.

#I use podman instead of docker.
[shelf@localhost ~]$ podman stop xpath_for_infra 
WARN[0010] StopSignal SIGTERM failed to stop container xpath_for_infra in 10 seconds, resorting to SIGKILL 
xpath_for_infra

dgtlmoon reopened this Oct 5, 2023
dgtlmoon (Owner) commented Oct 5, 2023

@dgtlmoon FYI, my CD container actually doesn't stop correctly with SIGTERM if the --time is short (mine is the default 10s). (Hmmm?)
--stop: first SIGTERM with a wait, then SIGKILL.

aaah ok! yes, this needs a test then, can you confirm it's stopping at all under SIGTERM?
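
If it does stop but just slowly, one thing worth trying (an assumption on my part, not verified here) is a longer grace period before the runtime escalates to SIGKILL, e.g. in compose:

app:
  image: ghcr.io/dgtlmoon/changedetection.io:latest
  # Default grace period is 10s; give the SIGTERM handler more time to flush
  # the datastore before the runtime falls back to SIGKILL
  stop_grace_period: 60s

(podman has the equivalent podman stop --time setting mentioned above.)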

Constantin1489 (Contributor) commented Oct 5, 2023

@dgtlmoon I actually noticed that your trap doesn't capture it.

This CD has two changedetection.py files.

#signal.signal(signal.SIGCHLD, sigchld_handler)

Maybe the SIGTERM goes to the wrong .py or something - I didn't read all the code in your program, so I'm not sure how to fix it currently. I will look at it within two weeks; right now I'm too busy to deep-dive.

adripo (Author) commented Oct 6, 2023

@adripo please update the title and description of your issue here, the memory leak you are 'fixing' actually has nothing to do with Playwright other than processing huge amounts of HTML

Okay, then I think there is another issue with changedetection. I am just monitoring 40 Amazon products once every hour. Can this be a problem? Why is the memory usage continuously growing?

All the watched pages are using Playwright. In the past I used basic HTML fetching without noticing this memory issue, but some pages were continuously failing to parse with an error message that the "filter pattern cannot be found"; I had to rerun them multiple times before they succeeded.

adripo changed the title from "fix: automatic restart on memory leak" to "fix: automatic restart on changedetection memory leak" on Oct 6, 2023
adripo (Author) commented Oct 6, 2023

I think the CD manual (or wiki) that we have misleads us: it talks about the Playwright memory issue, but the script it gives applies the limit to the CD container.

So the wiki should only recommend the OOM killer for the Playwright container.

@adripo I noticed CD doesn't write changes immediately, so when CD received 'podman rmi --force' or similar, the unwritten changes were just gone. I've just read about the OOM killer (https://unix.stackexchange.com/a/305840): it sends SIGKILL, and as far as I know 'trap' cannot catch SIGKILL, so it will cause data loss. (FYI, CD has signal.signal(signal.SIGTERM, sigterm_handler).)

Also, FYI, I don't know whether podman has this problem, but some Docker versions do: hitting the container kernel memory limit can cause OOM on the Docker host.

Thank you for the input. I wasn't aware of this issue with Docker, and it hasn't happened to me so far.

I think it's better, then, to find a solution for the growing memory usage and avoid those limitations.

dgtlmoon (Owner) commented Oct 6, 2023

I believe the memory issue is part of an LXML parsing bug, which may actually be related to the way the memory is 'perceived' to be used. I spent 3 weeks of unpaid time digging around and I couldn't figure it out... LXML is used to parse the HTML etc.

all I can say is - dig in the direction of LXML :)

also - you can work around this by just restarting your containers regularly
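
for example, something like this sidecar could do it (purely a sketch - the docker:cli image, the 24h interval and the container name "changedetection" are all assumptions, adjust them for your own stack):

  # Hypothetical helper service that restarts the app container once a day
  scheduled-restart:
    image: docker:cli
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    # "changedetection" is a placeholder - substitute the real container name
    entrypoint: ["/bin/sh", "-c", "while true; do sleep 86400; docker restart changedetection; done"]
    restart: unless-stopped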

closing for now - let's add an automated test for the shutdown behaviour under SIGTERM
