arXiv | Project page

Easy Image Scraping from Google, Bing, Yahoo and Baidu

Automatically scrape images matching your query from the popular search engines

  • Google
  • Bing
  • Baidu
  • Yahoo (currently only low resolution)

using an easy-to-use front end or scripts.

This code is part of a paper (see the citation below); also check the project page if you are interested in creating a dataset for instance segmentation.

Usage

Front End

Start the front end with a single command (adjust /PATH/TO/OUTPUT to your desired output path):

docker run -it --rm --name easy_image_scraping --mount type=bind,source=/PATH/TO/OUTPUT,target=/usr/src/app/output -p 5000:5000 ghcr.io/a-nau/easy-image-scraping:latest

Enter your query and wait for the results to appear in the output folder. The web application also shows a preview of the downloaded images.

Command Line

To work from the command line instead, start an interactive shell in the container with:

docker run -it --rm --name easy_image_scraping --mount type=bind,source=/PATH/TO/OUTPUT,target=/usr/src/app/output -p 5000:5000 ghcr.io/a-nau/easy-image-scraping:latest bash

Search for a keyword

If you just want to search for a single keyword, adjust and run search_by_keyword.py.
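
The script's internals are not reproduced here, but the underlying idea is to drive a headless browser to a search engine's image results page and collect the image URLs. Below is a minimal, self-contained Selenium sketch of that idea (not the repository's actual code; the Bing URL and the CSS selector are assumptions that may break when the page layout changes):

    from urllib.parse import quote_plus

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window

    query = "cardboard box"
    with webdriver.Chrome(options=options) as wd:
        # Load the image results page for the query (Bing as an example)
        wd.get(f"https://www.bing.com/images/search?q={quote_plus(query)}")
        # Collect thumbnail URLs; the selector is an assumption, not a stable API
        thumbs = wd.find_elements(By.CSS_SELECTOR, "img.mimg")
        urls = [t.get_attribute("src") for t in thumbs if t.get_attribute("src")]

    print(f"Found {len(urls)} candidate image URLs for '{query}'")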

Search for a list of keywords

  • Write the list of search terms in the file search_terms_eng.txt.
  • You can then use Google Translate to translate the whole file into other languages. Change the file ending of each translated file to the respective language.
  • Adjust config.py to define the search engines for each language (a hypothetical sketch follows this list).
  • Run search_by_keywords_from_files.
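
The structure of config.py is not reproduced here; as a hypothetical sketch (the variable name and language codes below are assumptions, not the repository's actual identifiers), the per-language mapping could look like this:

    # Hypothetical sketch of a per-language engine mapping in config.py;
    # the variable name and language codes are assumptions.
    SEARCH_ENGINES_PER_LANGUAGE = {
        "eng": ["google", "bing", "yahoo"],  # uses search_terms_eng.txt
        "ger": ["google", "bing"],           # uses search_terms_ger.txt
        "chi": ["baidu"],                    # uses search_terms_chi.txt
    }

Each key matches the file ending of a translated search_terms file, so every keyword list is only sent to the engines that work well for that language.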

Installation (optional)

This is optional; you can also directly use the provided container.

Docker

You can also build the image yourself using

docker build -t easy_image_scraping .

Then run it using

docker run -it --rm --name easy_image_scraping -p 5000:5000 --mount type=bind,source=/PATH/TO/OUTPUT,target=/usr/src/app/output easy_image_scraping
For a local setup, see the next section.

Local installation

  • Set up an environment using
    conda env create -f environment.yml
    or
    pip install -r requirements.txt
  • To use Selenium, you need to download ChromeDriver.
  • Check your Chrome version and download the corresponding ChromeDriver version.
  • Unzip it and add it to your PATH. Alternatively, you can adjust scrape_and_download.py:
    with webdriver.Chrome(
        executable_path="path/to/chrome_driver.exe",  # add this line
        options=set_chrome_options()
    ) as wd:
        ...  # existing scraping logic
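
Note that executable_path was deprecated in Selenium 4 and removed in later 4.x releases. If you run a current Selenium version, pass the driver path via a Service object instead; a minimal sketch (the driver path is a placeholder):

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_argument("--headless")

    # Selenium 4 style: the driver location goes into a Service object
    with webdriver.Chrome(
        service=Service("path/to/chromedriver"),  # placeholder path
        options=options,
    ) as wd:
        wd.get("https://www.google.com")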

Affiliations

FZI Research Center for Information Technology

License and Credits

Unless stated otherwise, this project is licensed under the MIT license.

Citation

If you use this code for scientific research, please consider citing:

@inproceedings{naumannScrapeCutPasteLearn2022,
	title        = {Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics},
	author       = {Naumann, Alexander and Hertlein, Felix and Zhou, Benchun and Dörr, Laura and Furmans, Kai},
	booktitle    = {IEEE Conference on Machine Learning and Applications (ICMLA)},
	year         = {2022}
}

Disclaimer

Please be aware of copyright restrictions that might apply to images you download.