Image Crawler

Multi-threaded tool to download images from the internet.

General Information

This project crawls images from the Google search engine for the objects you specify.
It helps you collect the images needed for model building and reduces manual effort.

Features

You can specify the output directory
Runs in both headless and headed mode
You can specify the number of images to download
You can specify the maximum number of Google suggestions to use
You can specify the maximum workers to use for ThreadPool

Screenshots

Setup

Clone this repo using

git clone https://github.com/Anil-45/ImageCrawler.git

Install the required modules using

pip install -r requirements.txt

Usage

--object Comma-separated strings to search for
--out_dir Output directory (default: ./images)
--headless Run without opening the web driver GUI (omit this flag to run with the GUI)
--max_count Maximum number of images to download (default: DEFAULT_IMG_COUNT)
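
As a rough illustration of how the four options above fit together, here is a minimal argparse sketch. It is not taken from main.py: the function names build_parser and split_objects are hypothetical, and the real script may parse its arguments differently. Only the flag names and defaults come from this README.

```python
import argparse
from typing import List

DEFAULT_IMG_COUNT = 50  # mirrors the value this README documents for constants.py


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Image Crawler (sketch)")
    parser.add_argument("--object", required=True,
                        help="comma-separated strings to search for")
    parser.add_argument("--out_dir", default="./images",
                        help="output directory")
    parser.add_argument("--headless", action="store_true",
                        help="run without opening the web driver GUI")
    parser.add_argument("--max_count", type=int, default=DEFAULT_IMG_COUNT,
                        help="maximum number of images to download")
    return parser


def split_objects(raw: str) -> List[str]:
    """Split a string such as "cat, dog" into search terms, dropping blanks."""
    return [part.strip() for part in raw.split(",") if part.strip()]


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--object", "cat, dog", "--headless", "--max_count", "25"])
print(split_objects(args.object), args.out_dir, args.headless, args.max_count)
```

Because --headless is modelled here as a store_true flag, leaving it out gives the foreground (GUI) behaviour, which matches the two examples in this section.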

Example to run in background:

python main.py --object "cat, dog" --headless --out_dir "./images" --max_count 25

Example to run in foreground:

python main.py --object "cat, dog" --out_dir "./images" --max_count 25

You can configure more parameters in constants.py:

DEFAULT_IMG_COUNT = 50 specifies the default number of images to download.
MAX_WORKERS = 50 specifies the maximum number of workers to use for the ThreadPool.
MAX_SUGGESTIONS = 25 specifies how many of Google's URL suggestions to use. If you are trying to download a large number of images, keep this value high.
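
To show how a worker limit and an image limit of this kind might interact, here is a small sketch built on the standard library's ThreadPoolExecutor. It is an assumption about the general approach, not the project's actual code: fetch_image and download_all are hypothetical names, and fetch_image only derives a file name instead of touching the network.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 50        # value this README documents for constants.py
DEFAULT_IMG_COUNT = 50  # value this README documents for constants.py


def fetch_image(url: str) -> str:
    """Hypothetical stand-in for a real download; returns the file name."""
    return url.rsplit("/", 1)[-1]


def download_all(urls, max_count=DEFAULT_IMG_COUNT, max_workers=MAX_WORKERS):
    """Process at most max_count URLs concurrently and skip any that fail."""
    saved = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_image, url) for url in urls[:max_count]]
        for future in as_completed(futures):
            try:
                saved.append(future.result())
            except Exception:
                # One failed download should not stop the other workers.
                continue
    # Completion order is not deterministic, so sort for a stable result.
    return sorted(saved)


urls = [f"https://example.com/img_{i}.jpg" for i in range(5)]
print(download_all(urls, max_count=3, max_workers=2))
```

In a setup like this, raising the worker count only helps while there are enough candidate URLs to keep the workers busy, which is one plausible reason to keep MAX_SUGGESTIONS high for large downloads.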

You can find the logs in image_crawler.log
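
For readers who want similar file logging in their own scripts, the following is a minimal sketch using Python's standard logging module. It is not the project's logger.py: the function name get_logger, the logger name and the record format are assumptions, and only the file name image_crawler.log comes from this README.

```python
import logging


def get_logger(log_file: str = "image_crawler.log") -> logging.Logger:
    """Return a logger that appends timestamped records to log_file."""
    logger = logging.getLogger("image_crawler")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(log_file, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Calling get_logger().info("downloaded %d images", 25) would then append one timestamped line to image_crawler.log in the current working directory.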

Room for Improvement

Add user interface

Contact

Created by @Anil_Reddy

License

This project is available under the MIT License.

Disclaimer

This tool downloads images in the order Google ranks them. Some of them may be subject to copyright, so please keep this in mind when using them.
