4chan scraper CLI tool written in Go that downloads all images, videos and gifs from a 4chan thread (no setup required)!
Version: 1.4 - Thread Search Edition
4scraper is an open-source command-line tool written in Go that quickly finds and downloads all images, videos and gifs in a given thread. No setup or installation required, and no fluff.
- Run 4scraper
- Enter/paste the thread you'd like to download from and press `Enter`
- Wait for the download to finish
- Profit
- 1. How to install
- 2. How to use
- 3. How it works
- 4. Found a bug? Have suggestions?
- 5. Known issues
- 6. Like 4scraper?
Head on over to the releases tab and pick the version you need. There's no setup or installation required; simply run the downloaded `exe` or `bin` file and you're good to go. Or you can download the code and build it yourself.
Optional: Create a config file as outlined in 2.2 Configuration
Click on the thumbnail below to watch the shortest video tutorial of your life (it's a YouTube link).
As of v1.1, 4scraper can be run in silent mode by setting the `--silent` flag and passing a URL as an argument. The available flags are described below; the same information is available via `4scraper.exe --help`. For brevity the examples use `4scraper.exe`, but on Linux it would be `./4scraper.bin`:

    Usage:
      4scraper.exe [options] [URL]

      [URL]
        Full 4chan thread URL to download files from

    OPTIONS:
      -h, --help
          Show this help message and exit
      -v, --version
          Show version number and exit
      -o, --output [DIRECTORY]
          Specify output directory for downloaded files
      -s, --silent
          Run in silent mode (no output), requires URL
      -f, --find [BOARD] [KEYWORDS]
          Search for threads in the specified board that match the given keywords
          [BOARD] is the name of the 4chan board (e.g., 'g')
          [KEYWORDS] are the terms to search for (e.g., 'linux desktop')

- If no `URL` is provided, the `--silent` flag will be ignored and 4scraper will ask you to enter a thread URL as if you executed it without flags.
- If an `output` directory is specified, it will override the `downloads/board/threadid` directory structure, and the `BoardDir` and `ThreadDir` options set in the config will be ignored.
- When using `--find`, the first argument should be the board code, followed by the search terms. All search terms must be found for a thread to be returned.
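
To illustrate the `--find` rule that every search term must be present, here is a minimal sketch of an all-keywords match. The function name `matchesAll`, the sample subjects and the case-insensitive comparison are assumptions made for illustration; how 4scraper actually fetches and compares thread text is not covered here.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAll reports whether every keyword occurs somewhere in text.
// Comparing case-insensitively is an assumption of this sketch.
func matchesAll(text string, keywords []string) bool {
	lower := strings.ToLower(text)
	for _, kw := range keywords {
		if !strings.Contains(lower, strings.ToLower(kw)) {
			return false // one missing term is enough to reject the thread
		}
	}
	return true
}

func main() {
	// Hypothetical thread subjects, for illustration only.
	subjects := []string{
		"Linux desktop thread",
		"Best desktop wallpapers",
		"Linux server help",
	}
	// Split the keyword argument on whitespace, e.g. 'linux desktop'.
	keywords := strings.Fields("linux desktop")

	for _, s := range subjects {
		fmt.Printf("%q match=%v\n", s, matchesAll(s, keywords))
	}
}
```

With this approach, adding more search terms can only narrow the results, never widen them.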
As of v1.2 you can add a configuration file to set up basic settings. This is entirely optional, and the software will run even if no config file exists. If you'd like to create and set up a config file, read on.
- Create a file named `4scraper.config` in the same directory as the 4scraper executable
- Copy and paste the following inside it and save
# 4scraper config file
BoardDir = true
ThreadDir = true
UseOriginalFilename = true
ParallelDownload = true
Extensions = "jpeg,jpg,png,gif,webp,bmp,tiff,svg,mp4,mov,avi,webm,flv"
- Lines starting with `#` are ignored
- All settings should be in the format `[key] = [value]`
- These are the settings you can adjust:
  - `BoardDir`: if `true`, a directory with the board code will be created in the `downloads` directory for organization (e.g. `downloads/g/`)
  - `ThreadDir`: if `true`, a directory with the thread id will be created in the `downloads` or `board` directory for organization (e.g. `downloads/g/4568995/`)
  - `UseOriginalFilename`: if `false`, a new unique filename will be generated (using `UUID`)
  - `ParallelDownload`: if `true`, downloads files concurrently up to a maximum of `20` concurrent threads
  - `Extensions`: any file type that isn't in the list won't be downloaded
- If both `BoardDir` and `ThreadDir` are turned off, all downloaded files will go in the `downloads/` directory
If no config file is created, the defaults are the values shown in the sample config above.
4scraper was developed as an exercise in getting my hands dirty with Go, so there's nothing wild going on behind the scenes. Still, this is how it works, in case anyone's interested.
- Once a thread URL is provided, the `board` and `threadId` are extracted for later, and an indeterminate `ProgressBar` is started and shown on screen
- Next, Colly is used to scrape and retrieve all file URLs and filenames in the thread; files are found by looking for `.filetext > a`
- All files found are stored in a `[]DownloadableFile` slice for later, and the `ProgressBar` is updated to reflect the total number of files to download
- We create directories in this structure: `downloads/<board>/<threadId>` to hold all downloaded files
- For each `DownloadableFile` found, we first check if the filename already exists (and append a random number to the filename if it does), then we download it from 4chan
  - if `ParallelDownload` is `true`, the download code is called in a goroutine with a queue that manages the maximum number of concurrent downloads
  - if
Feel free to use the Issues tab above (or click here) if you've found bugs, have problems running 4scraper, have suggestions for improvements or general tips on how I can make the Go code better.
- None
If you're feeling generous, buy me a beer! - https://www.buymeacoffee.com/criticalsession 🍺❤️