## Finding Images and Using WordNet

This notebook demonstrates downloading images from Google Image Search and using WordNet to find related search terms.

In [1]:
import compsyn
import os

The ```compsyn.helperfunctions``` module contains helper functions to download images and run the Google Vision filter. The ```compsyn.wordnet_functions``` module uses NLTK's WordNet to find extra search terms.

In [2]:
from compsyn.helperfunctions import settings, search_and_download, run_google_vision, write_img_classifications_to_file

In [3]:
from compsyn.wordnet_functions import get_wordnet_tree_data

### Settings

In [4]:
GOOGLE_APPLICATION_CREDENTIALS = "compsyn3-8cf6580619a9.json"

In [5]:
DRIVER_PATH = "/usr/local/bin/chromedriver"

In [6]:
settings(
    application_cred_name = GOOGLE_APPLICATION_CREDENTIALS, 
    driver_browser = "Chrome",
    driver_executable_path = DRIVER_PATH,
    driver_options = ["--headless"]
)
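Before calling ```settings```, it can help to confirm that the credentials file and the driver binary exist, since a wrong path may otherwise only surface later as a less obvious error. The sketch below is a minimal check using only the standard library; `find_missing_paths` is a hypothetical helper for illustration and is not part of compsyn.

```python
import os


def find_missing_paths(paths):
    """Return the labels of entries whose path is not an existing file."""
    return [label for label, path in paths.items() if not os.path.isfile(path)]


# The two paths configured in the cells above.
missing = find_missing_paths({
    "credentials": "compsyn3-8cf6580619a9.json",
    "chromedriver": "/usr/local/bin/chromedriver",
})
if missing:
    print("Missing before running settings():", ", ".join(missing))
```

An empty result means both files were found; it does not verify that the credentials are valid or that the driver matches the installed browser.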

In [7]:
number_images = 100 
search_terms = ['emotion']
filter_data = True
get_tree_data = True

### Get WordNet data

In [8]:
home = os.getcwd()

In [9]:
n_categories = 10

In [10]:
if get_tree_data: 
    print("Adding Search Terms from Tree")
    tree_search_terms, raw_tree, all_tree_data = get_wordnet_tree_data(search_terms, home)
    search_terms = tree_search_terms[:n_categories]
    print(all_tree_data.head())

Adding Search Terms from Tree
  ref_term                        new_term     role  \
0  emotion                           anger  hyponym   
1  emotion                         anxiety  hyponym   
2  emotion  conditioned_emotional_response  hyponym   
3  emotion                 emotional_state  hyponym   
4  emotion                            fear  hyponym   

                                          synset Branch_fact Num_senses  
0                           Synset('anger.n.01')          19          5  
1                         Synset('anxiety.n.02')           6          2  
2  Synset('conditioned_emotional_response.n.01')           1          1  
3                 Synset('emotional_state.n.01')          16          1  
4                            Synset('fear.n.01')          25          8  


In [11]:
tree_search_terms

['emotion',
 'joy',
 'feeling',
 'anxiety',
 'conditioned emotional response',
 'fear',
 'love',
 'hate',
 'emotional state',
 'anger']
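
Note that the `new_term` column holds WordNet lemma names with underscores (`conditioned_emotional_response`), while the returned search terms use spaces. A minimal sketch of that kind of clean-up follows, assuming the conversion is a plain underscore replacement with duplicates dropped. `lemmas_to_search_terms` is a hypothetical helper, and this is not necessarily how ```get_wordnet_tree_data``` does it.

```python
def lemmas_to_search_terms(lemmas, limit=None):
    """Turn WordNet lemma names into search strings, keeping first-seen order."""
    seen = set()
    terms = []
    for lemma in lemmas:
        # WordNet joins multi-word lemmas with underscores.
        term = lemma.replace("_", " ").strip().lower()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms if limit is None else terms[:limit]


print(lemmas_to_search_terms(
    ["emotion", "anger", "conditioned_emotional_response", "anger"], limit=10
))
# ['emotion', 'anger', 'conditioned emotional response']
```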

In [12]:
search_terms

['emotion',
 'joy',
 'feeling',
 'anxiety',
 'conditioned emotional response',
 'fear',
 'love',
 'hate',
 'emotional state',
 'anger']

Some of these terms are abstract and unlikely to return clear images, so we remove "conditioned emotional response" and "emotional state" from the list.

In [13]:
search_terms.pop(4)

'conditioned emotional response'

In [14]:
search_terms.pop(-2)

'emotional state'
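
Index-based `pop` calls depend on the current order of the list, and every later index shifts once an element is removed. A hedged alternative is to filter by value, which gives the same result regardless of order. The set of terms to drop here is just the two removed above.

```python
search_terms = ['emotion', 'joy', 'feeling', 'anxiety',
                'conditioned emotional response', 'fear', 'love', 'hate',
                'emotional state', 'anger']

# Terms judged unlikely to return clear images.
terms_to_drop = {'conditioned emotional response', 'emotional state'}

# Keep the original order and drop the unwanted terms by value.
search_terms = [term for term in search_terms if term not in terms_to_drop]
print(search_terms)
# ['emotion', 'joy', 'feeling', 'anxiety', 'fear', 'love', 'hate', 'anger']
```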

### Download Images

In [15]:
img_urls_dict = {}
for search_term in search_terms:
    print(search_term)
    urls = search_and_download(
        search_term = search_term, 
        driver_browser = "Chrome",
        driver_executable_path = DRIVER_PATH, 
        driver_options = ["--headless"],
        home = home, 
        number_images = number_images
    )
    img_urls_dict[search_term] = urls

emotion
[2020-05-18 01:13:36,405] (compsyn.fetch_image_urls)  INFO: Found: 200 search results. Extracting links from 0:200
[2020-05-18 01:14:14,527] (compsyn.fetch_image_urls)  INFO: Found: 100 image links, done!
[2020-05-18 01:14:32,211] (compsyn.save_image)  ERROR: Could not save image to disk: cannot identify image file <_io.BytesIO object at 0x1468fd040>
[2020-05-18 01:14:33,083] (compsyn.save_image)  ERROR: Could not save image to disk: cannot identify image file <_io.BytesIO object at 0x14b2c2770>
joy
[2020-05-18 01:14:37,831] (compsyn.fetch_image_urls)  INFO: Found: 200 search results. Extracting links from 0:200
[2020-05-18 01:15:20,334] (compsyn.fetch_image_urls)  INFO: Found: 100 image links, done!
[2020-05-18 01:15:31,247] (compsyn.save_image)  ERROR: Could not save image to disk: cannot identify image file <_io.BytesIO object at 0x14b2ce310>
[2020-05-18 01:15:34,737] (compsyn.save_image)  ERROR: Could not save image to disk: cannot identify image file <_io.BytesIO object at 0x14b2ce310>
[2020-05-18 01:15:35,895] (compsyn.save_image)  ERROR: Could not save image to disk: cannot identify image file <_io.BytesIO object at 0x14b2ce310>
feeling
[2020-05-18 01:15:40,580] (compsyn.fetch_image_urls)  INFO: Found: 200 search results. Extracting links from 0:200
[2020-05-18 01:16:19,258] (compsyn.fetch_image_urls)  INFO: Found: 100 image links, done!
[2020-05-18 01:16:22,517] (compsyn.save_image)  ERROR: Could not save image to disk: cannot identify image file <_io.BytesIO object at 0x14b2d4130>
[2020-05-18 01:16:24,722] (compsyn.save_image)  ERROR: Could not save image to disk: cannot identify image file <_io.BytesIO object at 0x14b2d4130>
[2020-05-18 01:16:26,108] (compsyn.save_image) 
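
The `cannot identify image file` errors above look like the message Pillow raises when the downloaded bytes are not a decodable image, for instance an HTML error page served at an image URL. That reading is an assumption, since the log does not show a traceback. A rough sketch of a pre-check based on file signatures follows, using only the standard library; `looks_like_image` is a hypothetical helper, not a compsyn function.

```python
# Leading "magic" bytes of a few common image formats.
IMAGE_SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}


def looks_like_image(data: bytes):
    """Return the detected format name, or None if no signature matches."""
    for name, signature in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


print(looks_like_image(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # png
print(looks_like_image(b"<html><body>404</body></html>"))    # None
```

A signature match only shows that the payload starts like an image. A truncated or corrupt file can still fail to decode, so this complements rather than replaces the error handling shown in the log.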

### Run Google Vision Filter 

In [None]:
if filter_data: 
    img_classified_dict = run_google_vision(img_urls_dict)
    write_img_classifications_to_file(home, search_terms, img_classified_dict)

[2020-05-18 01:22:41,666] (compsyn.run_google_vision)  INFO: Classifying Imgs. w. Google Vision API...


You should now have up to 100 images for each element of ```search_terms``` saved on your machine; images that triggered a save error in the log above were not written to disk. You can now run the analysis presented in ```compsyn_package_pipeline```.