## Finding Images and using WordNet

This notebook demonstrates loading images from Google Image Search and using WordNet to find related search terms.


The ```compsyn.helperfunctions``` module contains helper functions for downloading images and classifying them with Google Vision, and ```compsyn.wordnet_functions``` uses NLTK's WordNet to find extra search terms.

In [1]:
import os
from compsyn.helperfunctions import (
    search_and_download, 
    run_google_vision, 
    write_img_classifications_to_file
)
from compsyn.wordnet_functions import get_wordnet_tree_data
from compsyn.vectors import WordToColorVector
from compsyn.trial import get_trial

### Settings

In [2]:
from compsyn.config import CompsynConfig
COMPSYN_ROOT_DIR="/Volumes/LACIE/compsyn" # change to a path on your local system where you store compsyn files
config = CompsynConfig(
    experiment_name="wordnet-example-0",
    trial_id="notebook",
    hostname="topside",
    work_dir=f"{COMPSYN_ROOT_DIR}/notebook_work_dir",
    jzazbz_array=f"{COMPSYN_ROOT_DIR}/jzazbz_array.npy",
    google_application_credentials=f"{COMPSYN_ROOT_DIR}/compsyn3-8cf6580619a9.json",
    driver_path="/usr/local/bin/geckodriver",
    driver_browser="Firefox",
)
trial = get_trial()
print("\n", config)
print("\n", trial)

[1616459788] (compsyn.Trial)  INFO: work_dir: /Volumes/LACIE/compsyn/notebook_work_dir
[1616459788] (compsyn.Trial)  INFO: experiment: wordnet-example-0
[1616459788] (compsyn.Trial)  INFO: trial_id: notebook
[1616459788] (compsyn.Trial)  INFO: hostname: topside

 CompsynConfig
	experiment_name                          = wordnet-example-0
	trial_id                                 = notebook
	hostname                                 = topside
	trial_timestamp                          = 2021-03-23
	work_dir                                 = /Volumes/LACIE/compsyn/notebook_work_dir
	jzazbz_array                             = /Volumes/LACIE/compsyn/jzazbz_array.npy
	google_application_credentials           = /Volumes/LACIE/compsyn/compsyn3-8cf6580619a9.json
	driver_browser                           = Firefox
	driver_path                              = /usr/local/bin/geckodriver
	s3_bucket                                = None
	s3_region_name                           = None
	s3_endpoint_url
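Because the settings above point at several local files and directories, it can help to confirm they exist before starting a long download. The following is a minimal sketch, not part of compsyn: `missing_paths` is a hypothetical helper, and the example paths are placeholders rather than the values used in this notebook.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the subset of {name: path} entries whose path does not exist."""
    return {name: path for name, path in paths.items() if not Path(path).exists()}


# Placeholder values for illustration; substitute the paths passed to CompsynConfig.
example_paths = {
    "work_dir": str(Path.cwd()),
    "driver_path": "/nonexistent/compsyn-example/geckodriver",
}

for name, path in missing_paths(example_paths).items():
    print(f"{name} not found: {path}")
```

Running a check like this right after building the config surfaces a wrong `COMPSYN_ROOT_DIR` immediately instead of partway through the image capture.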

In [3]:
number_images = 100 
search_terms = ['emotion']
filter_data = True
get_tree_data = True

### Get WordNet data

In [4]:
n_categories = 10

In [5]:
if get_tree_data: 
    print("Adding Search Terms from Tree")
    tree_search_terms, raw_tree, all_tree_data = get_wordnet_tree_data(
        search_terms=search_terms, 
        home=CompsynConfig().config["work_dir"]
    )
    search_terms = tree_search_terms[:n_categories]
    print(all_tree_data.head())

Adding Search Terms from Tree
  ref_term                        new_term     role  \
0  emotion                           anger  hyponym   
1  emotion                         anxiety  hyponym   
2  emotion  conditioned_emotional_response  hyponym   
3  emotion                 emotional_state  hyponym   
4  emotion                            fear  hyponym   

                                          synset Branch_fact Num_senses  
0                           Synset('anger.n.01')          19          5  
1                         Synset('anxiety.n.02')           6          2  
2  Synset('conditioned_emotional_response.n.01')           1          1  
3                 Synset('emotional_state.n.01')          16          1  
4                            Synset('fear.n.01')          25          8  


In [6]:
tree_search_terms

['emotion',
 'emotional state',
 'conditioned emotional response',
 'anxiety',
 'anger',
 'joy',
 'fear',
 'love',
 'hate',
 'feeling']
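Note that the `new_term` column uses WordNet-style names with underscores (`conditioned_emotional_response`), while the search terms use spaces. A minimal sketch of that kind of normalization is shown below; `lemma_to_search_term` and `unique_search_terms` are hypothetical helpers written for illustration and are not necessarily how `get_wordnet_tree_data` builds its list.

```python
def lemma_to_search_term(lemma_name):
    """Turn a WordNet-style lemma name into a plain search phrase."""
    return lemma_name.replace("_", " ").strip().lower()


def unique_search_terms(lemma_names):
    """Normalize lemma names and drop duplicates, keeping first-seen order."""
    seen = set()
    terms = []
    for name in lemma_names:
        term = lemma_to_search_term(name)
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms


new_terms = [
    "anger",
    "anxiety",
    "conditioned_emotional_response",
    "emotional_state",
    "fear",
    "anger",  # duplicate, dropped by unique_search_terms
]
print(unique_search_terms(new_terms))
```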

In [7]:
search_terms

['emotion',
 'emotional state',
 'conditioned emotional response',
 'anxiety',
 'anger',
 'joy',
 'fear',
 'love',
 'hate',
 'feeling']

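We might want to remove terms from the list that are unlikely to return clear images. Here we drop the two abstract terms that follow 'emotion': 'emotional state' and 'conditioned emotional response'.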

In [8]:
print("drop", f"'{search_terms.pop(1)}'")
print("drop", f"'{search_terms.pop(1)}'")

drop 'emotional state'
drop 'conditioned emotional response'
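Popping by position works here, but it depends on the order in which the terms are returned. As an alternative sketch, terms can be dropped by name; `drop_terms` is a hypothetical helper, not a compsyn function.

```python
def drop_terms(terms, unwanted):
    """Split `terms` into (kept, dropped) according to membership in `unwanted`."""
    unwanted = set(unwanted)
    kept = [term for term in terms if term not in unwanted]
    dropped = [term for term in terms if term in unwanted]
    return kept, dropped


example_terms = [
    "emotion",
    "emotional state",
    "conditioned emotional response",
    "anxiety",
    "anger",
]
kept, dropped = drop_terms(
    example_terms, ["emotional state", "conditioned emotional response"]
)
for term in dropped:
    print("drop", f"'{term}'")
print(kept)
```

Unlike `list.pop`, this leaves the original list untouched and silently ignores names that are not present.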


### Download Images

In [None]:
img_urls_dict = {}

# takes about 12 minutes
for search_term in search_terms:
    w2cv = WordToColorVector(label=search_term, trial=trial)
    w2cv.run_image_capture()
    # This logic makes this image capture resumable
    if w2cv.raw_image_urls is None:
        w2cv.load()
    else:
        w2cv.save()
    img_urls_dict[search_term] = w2cv.raw_image_urls

[1616459837] (compsyn.fetch_image_urls)  INFO: 'emotion': 100 search results. Extracting links from 0:100
[1616459919] (compsyn.search_and_download)  INFO: 94/100 images successfully downloaded for 'emotion'
[1616459919] (compsyn.WordToColorVector.emotion)  INFO: saved 10.6KiB pickle to /Volumes/LACIE/compsyn/notebook_work_dir/wordnet-example-0/vectors/unnamed/emotion/w2cv.pickle
[1616459925] (compsyn.fetch_image_urls)  INFO: 'anxiety': 100 search results. Extracting links from 0:100
[1616460009] (compsyn.search_and_download)  INFO: 94/100 images successfully downloaded for 'anxiety'
[1616460009] (compsyn.WordToColorVector.anxiety)  INFO: saved 11.3KiB pickle to /Volumes/LACIE/compsyn/notebook_work_dir/wordnet-example-0/vectors/unnamed/anxiety/w2cv.pickle
[1616460015] (compsyn.fetch_image_urls)  INFO: 'anger': 100 search results. Extracting links from 0:100
[1616460099] (compsyn.search_and_download)  INFO: 96/100 images successfully downloaded for 'anger'
[1616460099] (compsyn.WordToCo
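The loop above is resumable because results are pickled per search term and reloaded when a capture returns nothing new. The general pattern can be sketched with the standard library alone; `load_or_fetch` and `fake_fetch` are hypothetical names used for illustration, and this is not compsyn's implementation.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_fetch(label, cache_dir, fetch):
    """Return cached URLs for `label` if present; otherwise fetch and cache them."""
    cache_file = Path(cache_dir) / f"{label}.pickle"
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    urls = fetch(label)
    with cache_file.open("wb") as fh:
        pickle.dump(urls, fh)
    return urls


calls = []


def fake_fetch(label):
    """Stand-in for a slow image search; records each call it receives."""
    calls.append(label)
    return [f"https://example.com/{label}/{i}.jpg" for i in range(3)]


with tempfile.TemporaryDirectory() as cache_dir:
    first = load_or_fetch("anger", cache_dir, fake_fetch)
    second = load_or_fetch("anger", cache_dir, fake_fetch)  # served from the cache

print(first == second, calls)
```

The second call reads the pickle instead of calling `fake_fetch` again, which is what lets an interrupted run pick up where it stopped.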

### Run Google Vision Filter 

In [14]:
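print(img_urls_dict.keys())
if filter_data:
    # Report any search term for which no image URLs were captured
    for search_term, img_urls in img_urls_dict.items():
        if img_urls is None:
            print(search_term)
    img_classified_dict = run_google_vision(img_urls_dict)
    # Write results under the configured work_dir (the same value passed as `home` in the WordNet step)
    home = CompsynConfig().config["work_dir"]
    write_img_classifications_to_file(home, search_terms, img_classified_dict)
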
dict_keys(['emotion', 'anger', 'love', 'feeling', 'joy', 'anxiety', 'hate', 'fear'])
emotion
anger
love
feeling
joy
hate
fear
[1616458327] (compsyn.run_google_vision)  INFO: Classifying Imgs. w. Google Vision API...


TypeError: 'NoneType' object is not iterable
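The terms printed above are those whose URL list is `None`, which is a likely cause of the `TypeError`. One possible guard, sketched below, is to separate usable entries from missing ones before classification. `split_by_availability` is a hypothetical helper, and whether `run_google_vision` accepts the filtered dictionary unchanged is an assumption.

```python
def split_by_availability(urls_by_term):
    """Return ({term: urls} with non-empty URL lists, [terms with None or empty lists])."""
    available = {term: urls for term, urls in urls_by_term.items() if urls}
    missing = [term for term, urls in urls_by_term.items() if not urls]
    return available, missing


example_urls = {
    "anxiety": ["https://example.com/anxiety/0.jpg"],
    "anger": None,
    "joy": [],
}
available, missing = split_by_availability(example_urls)
print("usable:", list(available))
print("missing:", missing)
```

Only the `available` dictionary would then be passed on, and the `missing` terms can be re-captured or reloaded first.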

You should now have up to 100 images for each element of ```search_terms``` saved on your machine. You can now run the analysis presented in the ```compsyn_package_pipeline```.