arXiv | Project page

Easy Image Scraping from Google, Bing, Yahoo and Baidu

Automatically scrape images matching your query from the popular search engines

  • Google
  • Bing
  • Baidu
  • Yahoo (currently only low resolution)

using an easy-to-use front end or scripts.

This code is part of a paper (see the citation below); also check the project page if you are interested in creating a dataset for instance segmentation.

Usage

Front End

Start the front end with a single command (adjust /PATH/TO/OUTPUT to your desired output path):

docker run -it --rm --name easy_image_scraping --mount type=bind,source=/PATH/TO/OUTPUT,target=/usr/src/app/output -p 5000:5000 ghcr.io/a-nau/easy-image-scraping:latest

Enter your query and wait for the results to appear in the output folder. The web application also shows a preview of the downloaded images.

Command Line

To work from the command line instead, start an interactive shell in the container with:

docker run -it --rm --name easy_image_scraping --mount type=bind,source=/PATH/TO/OUTPUT,target=/usr/src/app/output -p 5000:5000 ghcr.io/a-nau/easy-image-scraping:latest bash

Search for a keyword

If you just want to search for a single keyword, adjust and run search_by_keyword.py.
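
The script's internals are not reproduced here, but the underlying idea is to drive a headless browser to a search engine's image results page and collect the image URLs. Below is a minimal, self-contained Selenium sketch of that idea (not the repository's actual code; the Bing URL and the CSS selector are assumptions that may break when the page layout changes):

    from urllib.parse import quote_plus

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window

    query = "cardboard box"
    with webdriver.Chrome(options=options) as wd:
        # Load the image results page for the query (Bing as an example)
        wd.get(f"https://www.bing.com/images/search?q={quote_plus(query)}")
        # Collect thumbnail URLs; the selector is an assumption, not a stable API
        thumbs = wd.find_elements(By.CSS_SELECTOR, "img.mimg")
        urls = [t.get_attribute("src") for t in thumbs if t.get_attribute("src")]

    print(f"Found {len(urls)} candidate image URLs for '{query}'")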

Search for a list of keywords

  • Write the list of search terms in the file search_terms_eng.txt.
  • You can then use Google Translate to translate the whole file into other languages. Change the file ending of each translated file to the respective language.
  • Adjust config.py to define the search engines for each language (a hypothetical sketch follows this list).
  • Run search_by_keywords_from_files.
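
The structure of config.py is not reproduced here; as a hypothetical sketch (the variable name and language codes below are assumptions, not the repository's actual identifiers), the per-language mapping could look like this:

    # Hypothetical sketch of a per-language engine mapping in config.py;
    # the variable name and language codes are assumptions.
    SEARCH_ENGINES_PER_LANGUAGE = {
        "eng": ["google", "bing", "yahoo"],  # uses search_terms_eng.txt
        "ger": ["google", "bing"],           # uses search_terms_ger.txt
        "chi": ["baidu"],                    # uses search_terms_chi.txt
    }

Each key matches the file ending of a translated search_terms file, so every keyword list is only sent to the engines that work well for that language.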

Installation (optional)

This is optional; you can also directly use the provided container.

Docker

You can also build the image yourself using

docker build -t easy_image_scraping .

Then run it using

docker run -it --rm --name easy_image_scraping -p 5000:5000 --mount type=bind,source=/PATH/TO/OUTPUT,target=/usr/src/app/output easy_image_scraping
For a local setup, see the next section.

Local installation

  • Set up an environment using
    conda env create -f environment.yml
    or
    pip install -r requirements.txt
  • To use Selenium, you need to download ChromeDriver.
  • Check your Chrome version and download the corresponding ChromeDriver version.
  • Unzip it and add it to your PATH. Alternatively, you can adjust scrape_and_download.py:
    with webdriver.Chrome(
        executable_path="path/to/chrome_driver.exe",  # add this line
        options=set_chrome_options()
    ) as wd:
        ...  # existing scraping logic
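
Note that executable_path was deprecated in Selenium 4 and removed in later 4.x releases. If you run a current Selenium version, pass the driver path via a Service object instead; a minimal sketch (the driver path is a placeholder):

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_argument("--headless")

    # Selenium 4 style: the driver location goes into a Service object
    with webdriver.Chrome(
        service=Service("path/to/chromedriver"),  # placeholder path
        options=options,
    ) as wd:
        wd.get("https://www.google.com")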

Affiliations

FZI Research Center for Information Technology

License and Credits

Unless stated otherwise, this project is licensed under the MIT license.

Citation

If you use this code for scientific research, please consider citing:

@inproceedings{naumannScrapeCutPasteLearn2022,
	title        = {Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics},
	author       = {Naumann, Alexander and Hertlein, Felix and Zhou, Benchun and Dörr, Laura and Furmans, Kai},
	booktitle    = {IEEE Conference on Machine Learning and Applications (ICMLA)},
	year         = {2022}
}

Disclaimer

Please be aware of copyright restrictions that might apply to images you download.