Bugfix: Long-running updates end up with orphan Chromium processes #550

Closed
mAAdhaTTah opened this issue Nov 25, 2020 · 2 comments
Labels
size: easy, status: done (Work is completed and released, or scheduled to be released in the next version), type: bug report, why: functionality (Intended to improve ArchiveBox functionality or features)

Comments

@mAAdhaTTah
Contributor

Describe the bug

I imported my Pocket library, totaling 27k+ links, and have been archiving those links on and off for a week. I went away for a few days and left the process running on my server. When I returned, the box's RAM (16 GB) was completely maxed out and dozens of stray Chromium processes were still running. Because ArchiveBox was running in Docker, I could kill the container to reclaim the RAM instead of killing each process individually.

My theory is that when an archiving step times out, the timeout doesn't properly kill the underlying Chromium process, so it stays open, but I'm not 100% sure.
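If the theory is right, a common cause of this pattern is that only the direct child gets signalled on timeout, while the processes it spawned (Chromium starts several helper processes) keep running. A minimal sketch of the general remedy, assuming a non-interactive bash script and util-linux `setsid`: start the command in a new session so it leads its own process group, then SIGKILL the whole group when the timeout expires. `run_with_group_timeout` is a hypothetical helper; it is not taken from ArchiveBox's code, and the fix referenced later in this thread may work differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of ArchiveBox): run a command in a new session,
# so it leads its own process group, and SIGKILL that whole group if it is
# still running after $1 seconds. Assumes a non-interactive bash script: there
# a background job is not a group leader, so setsid(1) keeps the same PID and
# $! is also the new process group ID.
run_with_group_timeout() {
    local secs=$1
    shift
    setsid "$@" &
    local pid=$!

    # Watchdog: after $secs seconds, signal every process in the group
    # (a negative PID addresses a process group), grandchildren included.
    ( sleep "$secs" && kill -KILL -- "-$pid" ) 2>/dev/null &
    local watchdog=$!

    local status=0
    wait "$pid" || status=$?
    # If the command finished first, cancel the watchdog. Its pending sleep may
    # linger until it expires, but the kill after it will no longer run.
    kill "$watchdog" 2>/dev/null || true
    return "$status"
}

# Demo: `bash -c` (the direct child) starts a background `sleep` (a grandchild).
# Both belong to the new process group, so both are killed after 1 second.
status=0
run_with_group_timeout 1 bash -c 'sleep 30 & sleep 30' || status=$?
echo "exit status: $status"
```

Signalling the group rather than a single PID is the key design choice: a grandchild that was never told about the timeout cannot outlive it.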

Steps to reproduce

  1. Create a large ArchiveBox database.
  2. Set a low timeout for archiving.
  3. Run archivebox update.
  4. Wait a while.
  5. Watch for stray Chromium processes.
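Step 5 can be made measurable by counting Chromium processes over time. A minimal sketch, assuming a Linux host: `count_procs` is a hypothetical helper (not an ArchiveBox command) that scans /proc directly, so it does not depend on `pgrep` or `ps` being installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count running processes whose command name matches a
# glob pattern. Reads /proc/<pid>/comm, so it is Linux-only; the kernel
# truncates names there to 15 characters (e.g. "chromium-browse").
count_procs() {
    local pattern=$1 count=0 comm f
    for f in /proc/[0-9]*/comm; do
        # A process can exit between the glob and the read; skip it if so.
        { read -r comm < "$f"; } 2>/dev/null || continue
        # Unquoted right-hand side: [[ == ]] treats it as a glob pattern.
        if [[ $comm == $pattern ]]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Sanity check: at least one process (this shell) has this shell's own name.
self=$(< "/proc/$$/comm")
n=$(count_procs "$self")
echo "processes named $self: $n"
```

During step 5, the same function could log a count once a minute, e.g. `while sleep 60; do echo "$(date -Iseconds) $(count_procs 'chrom*')"; done`. The `chrom*` pattern is an assumption meant to cover both `chrome` and `chromium` binary names. Because Docker containers share the host kernel, processes inside the ArchiveBox container also appear in the host's /proc, so this can run on the host; a count that keeps climbing while archiving continues would point to orphaned processes.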

Screenshots or log output

I can pull some logs if needed.

Software versions

  • OS: Ubuntu
  • ArchiveBox version: archivebox/archivebox:latest
  • Python version: 3.8 (whatever's in the Dockerfile)
  • Chrome version: Not sure (same as above, whatever's in the Dockerfile)
@pirate added the labels why: functionality (Intended to improve ArchiveBox functionality or features), size: easy, good first ticket, and help wanted on Jan 24, 2021
@berezovskyi

berezovskyi commented Feb 6, 2021

This does not solve the problem, but here is a workaround I have developed on my system. A crontab entry runs this script a few times each night:

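#!/usr/bin/env bash
set -euxo pipefail

LOG=/home/driib/var/log/archivebox.log                  # timestamps of snapshots already processed
LOG_PROGRESS=/home/driib/var/log/archivebox-update.log  # one line per restart
REPEAT=10

touch "$LOG"

# Brace expansion happens before variable expansion, so {1..$REPEAT} would
# not expand to 1..10; an arithmetic for loop uses the variable directly.
for ((n = 1; n <= REPEAT; n++)); do
    # Restarting the container also kills any orphaned Chromium processes in it.
    docker restart archivebox
    sleep 10

    # Resume from the last timestamp written to $LOG (empty on the first run).
    RESUME_ID=$(tail -n 1 "$LOG")
    echo "[$(date -Iseconds)] Restarting from $RESUME_ID" >> "$LOG_PROGRESS"
    RESUME=""
    if [[ -n "$RESUME_ID" ]]; then
        RESUME="--resume $RESUME_ID"
    fi

    # No -t here: cron provides no TTY, and `docker exec -t` fails without one.
    # grep -o prints only the matched timestamp, so no whitespace trimming is needed.
    # On failure, record docker exec's own status (PIPESTATUS[0]) instead of
    # letting set -e end the script; $RESUME is unquoted on purpose so it
    # splits into two arguments.
    status=0
    docker exec -u archivebox archivebox archivebox update $RESUME \
        | grep -oP '\d{10}\.\d{6}' >> "$LOG" || status=${PIPESTATUS[0]}

    # A status above 128 means the update was killed by a signal (e.g. 137 for SIGKILL).
    if (( status > 128 )); then
        break
    fi
done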
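The script's stop condition (an exit status above 128) relies on the shell convention that a process terminated by signal N is reported with status 128 + N. A short standalone demonstration, not part of the workaround: it kills a child with SIGKILL, the signal the kernel's OOM killer sends, and decodes the resulting status.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Start a long-running child, then kill it with SIGKILL (signal 9),
# the same signal the kernel's OOM killer sends.
sleep 30 &
pid=$!
kill -KILL "$pid"

# Collect the child's exit status without tripping set -e.
status=0
wait "$pid" || status=$?

if (( status > 128 )); then
    # 128 + N means "terminated by signal N"; kill -l maps N back to its name.
    echo "killed by SIG$(kill -l $((status - 128))) (status $status)"
else
    echo "exited normally (status $status)"
fi
```

Running it prints `killed by SIGKILL (status 137)`, which is the status an OOM-killed update would produce.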

If you want to update the whole archive, delete the $LOG file.

You should also enable swap limit support (https://docs.docker.com/engine/install/linux-postinstall/#your-kernel-does-not-support-cgroup-swap-limit-capabilities) and set CPU/RAM limits on the container.
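For example, limits can be applied to the already-running container from the workaround with `docker update`. The values here (4 GB of RAM, no extra swap, 2 CPUs) are arbitrary assumptions to adjust to the host; setting `--memory-swap` equal to `--memory` denies the container any swap, and enforcing it needs the swap limit support mentioned above.

```shell
# Assumed values: cap the "archivebox" container at 4 GB RAM (no extra swap) and 2 CPUs.
docker update --memory 4g --memory-swap 4g --cpus 2 archivebox
```

The same flags can also be passed to `docker run` when the container is created.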

@pirate
Member

pirate commented Apr 6, 2021

I think I fixed this in e7c7a8f. Comment back here if you're still seeing issues with orphan child processes after v0.6 is released, and I'll reopen the issue.

@pirate closed this as completed on Apr 6, 2021
@pirate added the status: done label (Work is completed and released, or scheduled to be released in the next version) and removed the good first ticket and help wanted labels on Apr 6, 2021

3 participants