gia: General Image Aggregator

🤖📷 A powerful image aggregator for data science projects

gia is a CLI tool and library for automating and standardizing the images you download for a data science project.

Installation

First, download the chromedriver binary and point the environment variable CHROME_DRIVER_LOC to it.

Pip

pip install gia

From source

git clone https://github.com/cooperhammond/gia
cd gia
sudo python setup.py install

Usage

CLI Usage

usage: gia [-h] [--depth DEPTH] destination classes queries 

positional arguments:
  destination               ABSOLUTE path for where your images should be
                            downloaded
  classes                   a python list in a string of the classes for the        
                            queries
  queries                   a python list of lists in a string of the queries       
                            corresponding to each class

optional arguments:
  -h, --help                show this help message and exit
  --depth DEPTH, -d DEPTH   the default depth to go through queries for images

The "depth" of a query indicates how far down the Google results page the scraper will scroll. With a depth of 0 there is no scrolling, a depth of 1 means the End key is pressed once, a depth of 2 means two End presses, and so on. Each increment of depth adds roughly 80 images for that query, but the exact number varies with Google's results and your browser's cache. Depth is meant as a general indicator of how heavily to weight a query.

The destination needs to be an absolute path because it is being plugged into chromedriver as the default download folder and chromedriver has no memory of the location it was spawned from.
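If you start from a relative or `~`-prefixed path, it can be converted before it is handed to gia. The following is a minimal sketch using only the standard library; the helper name `to_absolute` is made up for illustration and is not part of gia.

```python
import os


def to_absolute(path):
    """Expand a leading '~' and resolve the path against the current directory."""
    return os.path.abspath(os.path.expanduser(path))


# '~/dev/data' is the example destination used in this README.
destination = to_absolute('~/dev/data')
print(destination)
```

In a shell, `~` is usually expanded before gia sees the argument, so this matters mostly when the path comes from Python code or a config file.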

Queries are plugged directly into the Google search bar, so you can use all of the fancy tricks you normally can do with it.

Example usage:

$ gia ~/dev/data "['jeff bezos', 'bill gates']" "[['jeff bezos', 'jeff bezos face'], ['bill gates', 'bill gates face']]"

Output:

~/dev/data
+-- _jeff bezos
    +-- 00000.jpg
    +-- 00001.jpg
    +-- ...
+-- _bill gates
    +-- 00000.jpg
    +-- 00001.jpg
    +-- ...
+-- jeff bezos.csv
+-- bill gates.csv
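Because classes and queries are passed as Python lists written inside strings, quoting them by hand gets error-prone. One possible way to build the arguments from real Python lists is sketched below. It assumes only the argument order shown in the usage text; `build_command` and the destination path are hypothetical, and how gia parses the strings internally is not shown here.

```python
import ast
import shlex

classes = ['jeff bezos', 'bill gates']
queries = [
    ['jeff bezos', 'jeff bezos face'],
    ['bill gates', 'bill gates face'],
]


def build_command(destination, classes, queries):
    """Return an argv list for gia; repr() writes each list as a Python literal."""
    return ['gia', destination, repr(classes), repr(queries)]


# '/home/me/dev/data' is a placeholder absolute path.
argv = build_command('/home/me/dev/data', classes, queries)

# shlex.join adds the shell quoting needed to paste the command into a terminal.
print(shlex.join(argv))

# Sanity check: the strings parse back into the original lists.
assert ast.literal_eval(argv[2]) == classes
assert ast.literal_eval(argv[3]) == queries
```

The list can also be passed straight to `subprocess.run(argv)`, which avoids shell quoting altogether.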

By default the depth is 0, so there is no scrolling; the --depth parameter sets the default depth for every query. If you want a single query weighted differently from the default, replace the query string with a [query, depth] pair:

[[..., 'pepperoni pizza', ...], ...] => [[..., ['pepperoni pizza', 5], ...], ...]

Example usage:

$ gia ~/dev/data --depth 3 "['pizza']" "[['pineapple pizza', 'pepperoni pizza', 'egg pizza']]"

Output:

~/dev/data
+-- _pizza # will hold many more images than the example above
    +-- 00000.jpg
    +-- 00001.jpg
    +-- ...
+-- pizza.csv
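To make the interaction between the default depth and per-query depths concrete, here is a small sketch of how a mixed query list could be resolved into explicit (query, depth) pairs. The function `normalize_queries` is hypothetical and only illustrates the rule described above; it is not gia's actual code.

```python
def normalize_queries(queries, default_depth=0):
    """Turn every query into a (text, depth) pair, filling in the default depth."""
    normalized = []
    for class_queries in queries:
        pairs = []
        for query in class_queries:
            if isinstance(query, str):
                # A bare string uses the default depth.
                pairs.append((query, default_depth))
            else:
                # A [text, depth] pair carries its own depth.
                text, depth = query
                pairs.append((text, depth))
        normalized.append(pairs)
    return normalized


queries = [['pineapple pizza', ['pepperoni pizza', 5], 'egg pizza']]
result = normalize_queries(queries, default_depth=3)
print(result)
```

Here 'pineapple pizza' and 'egg pizza' fall back to the default depth of 3, while 'pepperoni pizza' keeps its explicit depth of 5.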

Module Usage

Everything that applies for the CLI applies to the library as well.

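import os

from gia import ImageAggregator

# The destination must be an absolute path, and Python does not expand '~' by itself.
destination = os.path.expanduser('~/dev/cool-data-science-project/data')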
classes = ['steve jobs', 'jack black']
queries = [
    ["steve jobs' face", ['"steve jobs" -jack -black', 5]],
    ["jack black's face", ['"jack black" -steve -jobs', 4]],
]
depth = 2

ia = ImageAggregator(destination, classes, queries, default_depth=depth)
ia.aggregate()
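After a run, it can be useful to check how many images each class ended up with. The sketch below assumes the folder layout shown in the CLI examples (one `_<class>` folder of .jpg files per class); `count_images` is a hypothetical helper, not part of gia, and the demo runs against a throwaway directory rather than real downloads.

```python
import tempfile
from pathlib import Path


def count_images(destination):
    """Count the .jpg files in each '_<class>' folder under destination."""
    counts = {}
    for folder in sorted(Path(destination).iterdir()):
        if folder.is_dir() and folder.name.startswith('_'):
            class_name = folder.name[1:]  # drop the leading underscore
            counts[class_name] = sum(
                1 for f in folder.iterdir() if f.suffix.lower() == '.jpg'
            )
    return counts


# Demo on a temporary directory that mimics the layout from the CLI examples.
with tempfile.TemporaryDirectory() as tmp:
    class_dir = Path(tmp) / '_jeff bezos'
    class_dir.mkdir()
    for name in ('00000.jpg', '00001.jpg'):
        (class_dir / name).touch()
    (Path(tmp) / 'jeff bezos.csv').touch()
    demo_counts = count_images(tmp)

print(demo_counts)
```

Comparing these counts across classes is a quick way to spot an unbalanced dataset before adjusting query depths.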
