GathererImageGatherer

This project downloads all the card images from gatherer.wizards.com and saves them in the cardImages/ folder, named by card name and set.

The images can be used to build a database of perceptual hashes. Since each card's artwork has a distinct perceptual hash, the stored hashes can be compared against the hash of a card in a photo to identify it. Once a card is identified, its name can be entered into http://shop.tcgplayer.com/magic so the user can quickly look up the price.
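
For example, a minimal sketch of the comparison using the imagehash and Pillow libraries (the file names and distance threshold are hypothetical):

    import imagehash
    from PIL import Image

    # Perceptual hash of a known card image (hypothetical file name).
    reference = imagehash.phash(Image.open("cardImages/Lightning Bolt - LEA.jpg"))

    # Hash of a photo of an unknown card, cropped to the artwork.
    candidate = imagehash.phash(Image.open("photo.jpg"))

    # Subtracting two ImageHash objects gives their Hamming distance;
    # a small distance means the two images almost certainly show the same art.
    if reference - candidate <= 10:
        print("Probable match")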

Dependencies

To run these programs you will need the Python libraries BeautifulSoup, requests, imagehash, Pillow (PIL), and psycopg2.

    $> pip install -r requirements.txt

or

    $> conda env create -f environment.yml

You will also need to build and install the pg_similarity PostgreSQL extension:

    git clone https://github.com/eulerto/pg_similarity.git
    cd pg_similarity/
    USE_PGXS=1 make
    USE_PGXS=1 make install

Then, in Postgres, enable the extension (pg_similarity provides string-similarity functions, presumably used to compare the stored hash strings in SQL):

    CREATE EXTENSION pg_similarity;

Use

Download Images

    $> python scrapeImages.py

This downloads all the card images from http://gatherer.wizards.com/Pages/Default.aspx and saves them in the cardImages/ folder, named by card name and set.

The resulting folder of images is about 1.21 GB and takes roughly 25 minutes to download.
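
The script's exact approach isn't shown here, but as a rough sketch, individual card images can be fetched from Gatherer's Image.ashx handler by multiverse ID using requests (the ID and output file name below are hypothetical):

    import requests

    # Gatherer serves card images through its Image.ashx handler, keyed by multiverse ID.
    # The ID here is hypothetical; the scraper presumably discovers IDs by crawling search pages.
    url = "http://gatherer.wizards.com/Handlers/Image.ashx"
    params = {"multiverseid": 383, "type": "card"}

    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()

    with open("cardImages/example.jpg", "wb") as f:
        f.write(response.content)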

Setup The Database

Once PostgreSQL is installed, create the database and table the Python scripts need:

    psql
    CREATE DATABASE cardimages;
    \c cardimages
    CREATE TABLE phash(name text, "set" text, hash text);

Note that "set" must be quoted because SET is a reserved word in PostgreSQL.

Build The Database

    $> python buildDatabase.py

This populates the PostgreSQL database with each card's name, set, and a perceptual hash of its artwork, computed from the images downloaded by scrapeImages.py.
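
In outline, the build step presumably does something like the following; the filename convention and the connection settings are assumptions, not taken from the script:

    import os

    import imagehash
    import psycopg2
    from PIL import Image

    # Connection settings are an assumption; adjust for your setup.
    conn = psycopg2.connect(dbname="cardimages")
    cur = conn.cursor()

    for filename in os.listdir("cardImages"):
        # Assumed filename convention: "<name> - <set>.jpg".
        name, card_set = os.path.splitext(filename)[0].split(" - ", 1)
        phash = imagehash.phash(Image.open(os.path.join("cardImages", filename)))
        cur.execute(
            'INSERT INTO phash (name, "set", hash) VALUES (%s, %s, %s)',
            (name, card_set, str(phash)),
        )

    conn.commit()
    conn.close()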

Test A Card

    $> python queryDatabase.py
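
This should identify a card by comparing a test image against the database. As a hedged sketch of the idea, compute the image's hash and find the nearest stored hash; the comparison here runs in Python, though the pg_similarity extension would allow a similar comparison inside Postgres. The test file name and connection settings are assumptions:

    import imagehash
    import psycopg2
    from PIL import Image

    # Hash the query image (file name is hypothetical).
    query_hash = imagehash.phash(Image.open("testCard.jpg"))

    conn = psycopg2.connect(dbname="cardimages")
    cur = conn.cursor()
    cur.execute('SELECT name, "set", hash FROM phash')

    # Pick the stored card whose hash is closest in Hamming distance.
    best = min(
        cur.fetchall(),
        key=lambda row: query_hash - imagehash.hex_to_hash(row[2]),
    )
    print("Best match:", best[0], "from set", best[1])
    conn.close()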

TODOs

  • Reorganize folders
  • Add Docker Compose to develop without installing Postgres locally
  • Add a license (e.g. MIT)
  • Refactor the output paths for downloads
  • Add a way to resume downloads and avoid re-downloading files
  • Add a way to download from other sources (e.g. eBay or Google)
  • Refactor downloading to create a subfolder per card
  • Merge the several ways of building the dataset
  • Test all Python scripts to confirm they work after the path refactor
  • Finish the Makefile
  • Add a way to automatically run the SQL setup script once
  • Update the README with the Makefile and new sections
  • Add a notebook example