a-nau/easy-image-scraping

Easy Image Scraping from Google, Bing, Yahoo and Baidu

Automatically scrape images with your query from the popular search engines

  • Google
  • Bing
  • Baidu
  • Yahoo (currently only low resolution)

using an easy-to-use Frontend or using scripts.

This code is part of a paper (see Citation). Also check the project page if you are interested in creating a dataset for instance segmentation.

Usage

Front End

Start the front end with a single command (adjust /PATH/TO/OUTPUT to your desired output path):

docker run -it --rm --name easy_image_scraping --mount type=bind,source=/PATH/TO/OUTPUT,target=/usr/src/app/output -p 5000:5000 ghcr.io/a-nau/easy-image-scraping:latest

Enter your query and wait for the results to appear in the output folder. The web application also shows a preview of the downloaded images.
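If you start the container from a script, the command above can be assembled programmatically. The following is a minimal sketch, not part of this repository: the helper name `build_docker_command` and its `host_port` parameter are assumptions made for illustration, while the flags, container name, mount target and image name are taken from the command above.

```python
import shlex
from pathlib import Path

# Image name and mount target as used in the command above.
IMAGE = "ghcr.io/a-nau/easy-image-scraping:latest"
CONTAINER_OUTPUT = "/usr/src/app/output"


def build_docker_command(output_dir, host_port=5000, image=IMAGE):
    """Return the `docker run` argument list for a host output directory.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    # Bind mounts need an absolute path on the host.
    source = Path(output_dir).expanduser().resolve()
    return [
        "docker", "run", "-it", "--rm",
        "--name", "easy_image_scraping",
        "--mount", f"type=bind,source={source},target={CONTAINER_OUTPUT}",
        # The front end is published on container port 5000.
        "-p", f"{host_port}:5000",
        image,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before it is executed.
    print(shlex.join(build_docker_command("./output")))
```

The returned list can be passed to `subprocess.run` as is. Resolving the path first avoids errors from relative bind-mount sources.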

Command Line

Start a command-line session in the container with:

docker run -it --rm --name easy_image_scraping --mount type=bind,source=/PATH/TO/OUTPUT,target=/usr/src/app/output -p 5000:5000 ghcr.io/a-nau/easy-image-scraping:latest bash

Search for a keyword

If you just want to search for a single keyword, adjust and run search_by_keyword.py.

Search for a list of keywords

  • Write the list of search terms in the file search_terms_eng.txt.
  • You can then use Google Translate to translate the whole file into other languages. Change the ending of each translated file name (the eng in search_terms_eng.txt) to the respective language.
  • Adjust config.py to define the search engines for each language.
  • Run search_by_keywords_from_files
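The steps above can be sketched as follows. This is an illustration of the file convention only, not the repository's implementation: the `ENGINES_BY_LANGUAGE` mapping, the `chi` suffix and both function names are hypothetical, and the actual settings in config.py may be structured differently.

```python
from pathlib import Path

PREFIX = "search_terms_"

# Hypothetical mapping from language suffix to search engines.
# The real settings live in config.py and may look different.
ENGINES_BY_LANGUAGE = {
    "eng": ["google", "bing", "yahoo"],
    "chi": ["baidu"],
}


def load_search_terms(folder):
    """Map each language suffix to the non-empty lines of search_terms_<lang>.txt."""
    terms = {}
    for path in sorted(Path(folder).glob(PREFIX + "*.txt")):
        language = path.stem[len(PREFIX):]
        lines = path.read_text(encoding="utf-8").splitlines()
        terms[language] = [line.strip() for line in lines if line.strip()]
    return terms


def plan_queries(terms, engines_by_language=ENGINES_BY_LANGUAGE):
    """Yield (engine, language, term) for every configured combination."""
    for language, language_terms in terms.items():
        # Languages without configured engines are skipped.
        for engine in engines_by_language.get(language, []):
            for term in language_terms:
                yield engine, language, term
```

Keeping one term file per language makes it possible to send each translated query only to the engines where that language is likely to return useful results.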

Installation (optional)

This is optional - you can also directly use our provided container.

Docker

You can also build the image yourself using

docker build -t easy_image_scraping .

Then run it using

docker run -it --rm --name easy_image_scraping -p 5000:5000 --mount type=bind,source=/PATH/TO/OUTPUT,target=/usr/src/app/output easy_image_scraping

For a local setup, see the Local installation section.

Local installation

  • Set up an environment using
    conda env create -f environment.yml
    or
    pip install -r requirements.txt
  • To use Selenium, you need to download ChromeDriver.
  • Check your Chrome version and download the corresponding ChromeDriver version.
  • Unzip it and add it to the PATH. Alternatively, you can adjust scrape_and_download.py:
    with webdriver.Chrome(
        executable_path="path/to/chromedriver.exe",  # add this line (newer Selenium versions expect a Service object instead)
        options=set_chrome_options()
    ) as wd:
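
Before running the scraper locally, it can help to verify the driver setup described above. The sketch below uses only the standard library and is not part of this repository: the function names are hypothetical, the version strings are sample inputs resembling `--version` output, and it assumes that matching major versions is a sufficient compatibility check.

```python
import re
import shutil


def major_version(version_output):
    """Extract the major version from text such as 'ChromeDriver 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """Return True if Chrome and the driver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


def find_driver(name="chromedriver"):
    """Return the driver's full path if it is on the PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    print(f"chromedriver on PATH: {path or 'not found'}")
```

If `find_driver()` returns `None`, the driver directory has not been added to the PATH yet, and the explicit path in scrape_and_download.py is the alternative.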

Affiliations

FZI Logo

License and Credits

Unless stated otherwise, this project is licensed under the MIT license.

Citation

If you use this code for scientific research, please consider citing

@inproceedings{naumannScrapeCutPasteLearn2022,
	title        = {Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics},
	author       = {Naumann, Alexander and Hertlein, Felix and Zhou, Benchun and Dörr, Laura and Furmans, Kai},
	booktitle    = {{{IEEE Conference}} on {{Machine Learning}} and Applications ({{ICMLA}})},
	date         = 2022
}

Disclaimer

Please be aware of copyright restrictions that might apply to images you download.