Download a thread and its images from 4chan


4Chan Archiver

Author: Aaron DeVore

Archive images and post information from a 4chan thread.


  • Python 3.1+
  • BeautifulSoup 4.x
  • Jinja2 (optional, for HTML output)
  • Linux (not tested on Windows or Mac)


4chan has some nasty and/or illegal images. All images in a thread will be downloaded! Images from ads are not downloaded. Author(s) shall not be liable for any legal action result from use of this software. If you are unsure if a thread will contain illegal images, use tmpfs. Keep that stuff off your computer.


See LICENSE.txt.


./ [options] <board><thread> <destination>

board: Board (as in b)

thread: Thread ID

destination: Archive directory


-h, --help show this help message and exit
-b BOARD, --board=BOARD
 board name
-o, --overwrite-images
 Overwrite non-empty images
-u, --update update the thread
 Wait time between thread updates
 Wait time between image downloads

-b <board> or --board <board>: Board to use (default: b)

-o or --overwrite-images: Always download images instead of skipping already downloaded images


./ [options] <archive>


-h, --help show this help message and exit
--author AUTHOR
 Exact match author
--author-re AUTHOR_RE
 Regexp match author
--text TEXT Exact match message body
--text-re TEXT_RE
 Regexp match message body
--id ID, -i ID Message id (repeatable)
--ids IDS Comma separated id list
--images Match only posts with images
--no-images Match only posts without images
--green-text Match only posts with green text
 Match only posts without green text
--out OUT, -o OUT
 Output file
--htmldir HTMLDIR
 Write a full HTML directory including images
--format FORMAT
 Output format (html|json|plaintext)

Using --format html only outputs an HTML file. Use --htmldir to get a directory with all files properly in place.

Archive Layout

----images/: Downloaded images
----thread.js: JSON file

JSON format

Format for board-thread/thread.js file:

  "id": <thread ID>,
  "board": <board name>
  "mtime": <modified time using int(time.time())>
  "posts": [
          "image": {
              "original": The original name of the uploaded file
              "name": File name of the image as written to disk
          "utc": Time in seconds from the epoch
          "id": post ID,
          "text": Text of the post (raw HTML),
          "author": Author name (sometimes raw HTML)