<a href="https://colab.research.google.com/github/As-12/Temple-Image-Classification/blob/master/3_Data_Mining_Task_for_Temple_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Mining

This notebook documents the data mining process for the Temple Classification project.
The process uses the Google Image Search API and the Bing Image Search API to obtain training samples.



## Install API Wrapper

Install the Python wrappers for Google Image Search and Bing image search. The Google wrapper requires an API key and a custom search engine ID from the search vendor. See each library's documentation for setup details.

In [0]:
!pip install Google-Images-Search
!pip install bing-image-downloader

Collecting Google-Images-Search
  Downloading https://files.pythonhosted.org/packages/a9/16/108c16faaa4acb145e80b41aa3b483ba4665bc700bce73bd3be9fc622943/Google_Images_Search-1.1.1-py2.py3-none-any.whl
Collecting python-resize-image~=1.1
  Downloading https://files.pythonhosted.org/packages/bc/89/008481c95551992e1a77503eba490b75fd17c0a98e33dd4dc39e0b99e5e8/python_resize_image-1.1.19-py2.py3-none-any.whl
Collecting pyfiglet~=0.8
  Downloading https://files.pythonhosted.org/packages/33/07/fcfdd7a2872f5b348953de35acce1544dab0c1e8368dca54279b1cde5c15/pyfiglet-0.8.post1-py2.py3-none-any.whl (865kB)
     |████████████████████████████████| 870kB 12.8MB/s 
Collecting Pillow~=6.0
  Downloading https://files.pythonhosted.org/packages/8a/fd/bbbc569f98f47813c50a116b539d97b3b17a86ac7a309f83b2022d26caf2/Pillow-6.2.2-cp36-cp36m-manylinux1_x86_64.whl (2.1MB)
     |████████████████████████████████| 2.1MB 53.3MB/s 
Collecting colorama~=0.4
  Downloading https://files.pythonhosted.org/pa

Collecting bing-image-downloader
  Downloading https://files.pythonhosted.org/packages/3e/a4/3bb02e37f672bed92316bcd708c16d28942aa1ba10a1c2938f76606cc36f/bing_image_downloader-1.0.2-py3-none-any.whl
Installing collected packages: bing-image-downloader
Successfully installed bing-image-downloader-1.0.2


## Imports

In [0]:
from google_images_search import GoogleImagesSearch
from bing_image_downloader import downloader
import os

## Mount Google Drive to Store Mined Data

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
# Use an absolute path so the cell works regardless of the current directory
os.chdir("/content/drive/My Drive/Colab Notebooks/")

In [0]:
os.getcwd()

'/content/drive/My Drive/Colab Notebooks'

## Setup Google Image Search API

Ensure that `GOOGLE_API_KEY` and `GOOGLE_SEARCH_ENGINE_ID` are set as environment variables before running this cell.

In [0]:
# Read the API key and custom search engine ID from environment variables
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
GOOGLE_CX_ID = os.getenv('GOOGLE_SEARCH_ENGINE_ID')

google_api = GoogleImagesSearch(GOOGLE_API_KEY, GOOGLE_CX_ID, validate_images=True)
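
Because `os.getenv` returns `None` for an unset variable, a missing key only surfaces later as a failed API call. A minimal sketch of an early check follows; `require_env` is a hypothetical helper, not part of either library, and its `env` parameter exists only to make it easy to try out.

```python
import os


def require_env(names, env=None):
    """Return the values of the named variables, or raise if any is unset or empty.

    `require_env` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return [env[name] for name in names]


# Possible usage in this notebook (commented out so the sketch runs anywhere):
# GOOGLE_API_KEY, GOOGLE_CX_ID = require_env(["GOOGLE_API_KEY", "GOOGLE_SEARCH_ENGINE_ID"])
```

Failing at this point gives a clear message naming the missing variable instead of an authentication error in the middle of a long scrape.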

## Mine the Data from Google Image Search API

Due to API rate limits, this may take several days.
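
To get a feel for where "several days" comes from, the arithmetic can be sketched as below. The figures are assumptions: the scrape function in this notebook walks up to 1000 result pages per search term, and the quota of 100 queries per day is a placeholder, so check your own API console for the real limit.

```python
import math


def estimate_days(pages, queries_per_day):
    """Days needed to issue `pages` requests under a daily query quota."""
    if queries_per_day <= 0:
        raise ValueError("queries_per_day must be positive")
    return math.ceil(pages / queries_per_day)


# Assumed figures: 1000 result pages for each of 2 search terms,
# and a hypothetical quota of 100 queries per day.
print(estimate_days(pages=1000 * 2, queries_per_day=100))  # prints 20
```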



In [0]:
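def scrape_google_image(search_term, folder_name):
    """Download Google image results for search_term into dataset/google/<folder_name>."""
    _search_params = {
        'q': search_term,
        'num': 10,  # number of results requested per search call
        'safe': 'off'
    }
    download_dir = os.path.join(os.getcwd(), "dataset", "google", folder_name)
    google_api.search(search_params=_search_params)
    # Walk up to 1000 result pages, downloading every image on each page
    for _ in range(1000):
        for image in google_api.results():
            image.download(download_dir)
            print("Downloaded:  {}".format(image.url))
        google_api.next_page()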
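
A long-running scrape like this can be interrupted by a rate-limit or network error partway through. One common mitigation is to retry with exponential backoff. The sketch below is generic: `call_with_backoff` is a hypothetical helper, and it catches every `Exception` because this notebook does not show which exception the library raises when a limit is hit.

```python
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt seconds, then retry.

    Hypothetical helper for illustration. Catching Exception is an assumption;
    narrow it once the real rate-limit exception is known.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the last error
            sleep(base_delay * 2 ** attempt)
```

In the scrape function, a call such as `google_api.next_page()` could then be wrapped as `call_with_backoff(google_api.next_page)`.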

#### Scrape Wat Pho

In [0]:
scrape_google_image('wat pho', 'wat-pho')

#### Scrape Wat Phra Kaew

In [0]:
scrape_google_image('wat phra kaew', 'wat-phra-kaew')

## Mine the Data from Bing Image Search API

Due to rate limits, this may take several days.

#### Scrape Wat Pho

In [0]:
downloader.download("wat pho", limit=1000, adult_filter_off=True, force_replace=False)



[!!]Indexing page: 1

[%] Indexed 150 Images on Page 1.


[%] Downloading Image #1 from https://upload.wikimedia.org/wikipedia/commons/thumb/9/9b/The_Four_Chedi_of_Wat_Pho_(II).jpg/1200px-The_Four_Chedi_of_Wat_Pho_(II).jpg
[%] File Downloaded !

[%] Downloading Image #2 from https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/Bangkok_Wat_Pho_reclining_Buddha.jpg/1200px-Bangkok_Wat_Pho_reclining_Buddha.jpg
[%] File Downloaded !

[%] Downloading Image #3 from https://monsterswithflipflops.files.wordpress.com/2015/07/wat-pho-templ1.jpg
[%] File Downloaded !

[%] Downloading Image #4 from https://upload.wikimedia.org/wikipedia/commons/thumb/6/65/Architectural_Detail_-_Wat_Pho_(12).jpg/1920px-Architectural_Detail_-_Wat_Pho_(12).jpg
[%] File Downloaded !

[%] Downloading Image #5 from http://jontotheworld.com/wp-content/uploads/2017/08/10371268_10202152577478349_5312374420290063942_o.jpg
[%] File Downloaded !

[%] Downloading Image #6 from https://www.checkoutsam.com/wp-content/uploa

#### Scrape Wat Phra Kaew

In [0]:
downloader.download("wat phra kaew", limit=1000, adult_filter_off=True, force_replace=False)



[!!]Indexing page: 1

[%] Indexed 150 Images on Page 1.


[%] Downloading Image #1 from https://upload.wikimedia.org/wikipedia/commons/thumb/c/c1/Wat_Phra_Kaew_by_Ninara_TSP_edit_crop.jpg/1200px-Wat_Phra_Kaew_by_Ninara_TSP_edit_crop.jpg
[Error]Invalid image, not saving https://upload.wikimedia.org/wikipedia/commons/thumb/c/c1/Wat_Phra_Kaew_by_Ninara_TSP_edit_crop.jpg/1200px-Wat_Phra_Kaew_by_Ninara_TSP_edit_crop.jpg

[!] Issue getting: https://upload.wikimedia.org/wikipedia/commons/thumb/c/c1/Wat_Phra_Kaew_by_Ninara_TSP_edit_crop.jpg/1200px-Wat_Phra_Kaew_by_Ninara_TSP_edit_crop.jpg
[!] Error:: No active exception to reraise
[%] Downloading Image #1 from https://upload.wikimedia.org/wikipedia/commons/7/7f/Grand_Palace_Bangkok.jpg
[%] File Downloaded !

[%] Downloading Image #2 from https://anilblon.files.wordpress.com/2015/06/wat-phra-kaew.jpg?w=1200
[%] File Downloaded !

[%] Downloading Image #3 from https://www.tripsavvy.com/thmb/PXFB3CQxowtb61KqwmItEpRbyDo=/4804x3201/filters:fill(a
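
Since some downloads are rejected as invalid (see the `[Error]Invalid image` line in the log above), the number of files saved per temple can differ from the requested limit. A minimal sketch for checking how many files each class folder ended up with follows; `count_files_per_label` is a hypothetical helper, and it assumes one subfolder per class under a common root.

```python
import os


def count_files_per_label(root):
    """Map each immediate subdirectory of `root` to the number of files directly inside it.

    Hypothetical helper for illustration; assumes one subfolder per class.
    """
    counts = {}
    for entry in sorted(os.listdir(root)):
        label_dir = os.path.join(root, entry)
        if os.path.isdir(label_dir):
            counts[entry] = sum(
                1 for name in os.listdir(label_dir)
                if os.path.isfile(os.path.join(label_dir, name))
            )
    return counts
```

For the Google images saved by `scrape_google_image`, this could be called as `count_files_per_label(os.path.join("dataset", "google"))`. The folder used for the Bing downloads depends on that library's output settings.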