imgscrapy

A simple CLI image scraper written in Python, with support for headless scraping of dynamic websites.

Installation

Build from source

git clone https://github.com/arutselvan/ImgScrapy
cd ImgScrapy
python setup.py install

As a Python package

pip install --user imgscrapy

Requirements

python>=3.6

Usage

usage: imgscrapy [-h] [-d DIRECTORY] [-i] [-n NFIRST] [-t NTHREADS] [-hd] [-to TIMEOUT] target_url

Downloads images from the given URL

positional arguments:
  target_url            URL to scrape images from

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Directory in which images should be downloaded
  -i, --injected        Scrape images from a dynamic website and JS injected images
  -n NFIRST, --nfirst NFIRST
                        Scrape the first n images
  -t NTHREADS, --nthreads NTHREADS
                        Maximum number of threads to use
  -hd, --head           Open chromium for scraping JS injected source/images
  -to TIMEOUT, --timeout TIMEOUT
                        Timeout value for obtaining page source
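
For readers who want to script around the CLI, the option set above can be mirrored with argparse. This is only a sketch built from the help text: the function name build_parser, the integer types for NFIRST, NTHREADS and TIMEOUT, and the example URL are assumptions, not taken from the ImgScrapy source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented imgscrapy options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="imgscrapy",
        description="Downloads images from the given URL",
    )
    parser.add_argument("target_url", help="URL to scrape images from")
    parser.add_argument("-d", "--directory",
                        help="Directory in which images should be downloaded")
    parser.add_argument("-i", "--injected", action="store_true",
                        help="Scrape images from a dynamic website and JS injected images")
    # Assumption: the numeric options are parsed as integers.
    parser.add_argument("-n", "--nfirst", type=int,
                        help="Scrape the first n images")
    parser.add_argument("-t", "--nthreads", type=int,
                        help="Maximum number of threads to use")
    parser.add_argument("-hd", "--head", action="store_true",
                        help="Open chromium for scraping JS injected source/images")
    parser.add_argument("-to", "--timeout", type=int,
                        help="Timeout value for obtaining page source")
    return parser


# Parse the same flags as the "first 5 images from a dynamic website" example.
args = build_parser().parse_args(["https://example.com/gallery", "-i", "--nfirst", "5"])
print(args.injected, args.nfirst, args.directory)  # True 5 None
```

Options that are not passed come back as None (or False for the switches), so a wrapper can tell "not set" apart from an explicit value.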

Examples

Download all images from a static website

imgscrapy <Target URL>

Download the first 5 images from a dynamic website

imgscrapy <Target URL> -i --nfirst 5

Note

ImgScrapy uses pyppeteer, which drives Chromium for headless scraping. The first time you scrape a dynamic website, Chromium is downloaded automatically, which may take some time.

To Do

Write tests
Add support for Base64 images
Add support for embedded/inline svg files
Fix issues with headless browsing of dynamic sites with modal/popup
Fix issue with missing trailing slash in URL resolution
Add option to dump URL of downloaded/failed images
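
The trailing-slash item in the list above may relate to how relative image paths resolve against the page URL. The sketch below shows the standard-library behaviour only; whether this is the actual cause in ImgScrapy is an assumption, and the URLs are placeholders.

```python
from urllib.parse import urljoin

# Placeholder page URLs that differ only in the trailing slash.
page_without_slash = "https://example.com/gallery"
page_with_slash = "https://example.com/gallery/"

# A relative image path as it might appear in an <img src="..."> attribute.
src = "images/cat.png"

# Without the slash, "gallery" is treated as a file name and is replaced.
print(urljoin(page_without_slash, src))  # https://example.com/images/cat.png

# With the slash, "gallery/" is treated as a directory and is kept.
print(urljoin(page_with_slash, src))     # https://example.com/gallery/images/cat.png
```

Until the item is fixed, passing the target URL with its trailing slash when the page is a directory-style path is one way to sidestep this behaviour, if it is indeed the cause.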

License

MIT
