## Benchmark

*This notebook implements a variety of algorithms, and checks to see how they work on a set of test images.*

**Flickr API**

*Import flickr functionality and record credentials*

In [1]:
import flickrapi
import json
import pprint
import pandas as pd
from textblob import TextBlob
import io
from google.cloud import vision
from google.cloud.vision import types
from PIL import Image, ImageDraw
import os

#pp = pprint.PrettyPrinter(indent=4)

### Establish connections to Flickr and Google

Establish Flickr connection

In [2]:
api_key = u'37528c980c419716e0879a417ef8211c'
api_secret = u'41075654a535c203'

# establish connection
flickr = flickrapi.FlickrAPI(api_key, api_secret, format='parsed-json')

Establish Google connection

In [3]:
#os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = \
#"/Users/ctoews/Documents/Insight/Project/googleAPI/MyFirstProject-76680dcd1ad6.json"

def explicit():
    from google.cloud import storage

    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(
        '/Users/ctoews/Documents/Insight/Project/googleAPI/MyFirstProject-76680dcd1ad6.json')

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
    
explicit()

[<Bucket: toews-images>]


### Get data

In [4]:


def assemble_urls(photoset):
    urls = []
    for photo in photoset['photoset']['photo']:
        url = "https://farm" + str(photo['farm']) + ".staticflickr.com/" + photo['server'] + "/" + \
              photo['id'] + "_" + photo['secret'] + ".jpg"
        urls.append(url)    
    return urls
    
# get bad photo ids
badset   = flickr.photosets.getPhotos(user_id='138072685@N02',photoset_id='72157690932631201')
goodset   = flickr.photosets.getPhotos(user_id='138072685@N02',photoset_id='72157690932695551')
bad_urls = assemble_urls(badset)
good_urls = assemble_urls(goodset)

**Google API**

*Import Google functionality*

*Authenticate*

In [11]:
from google.cloud import storage

# Explicitly use service account credentials by specifying the private key
# file.
storage_client = storage.Client.from_service_account_json(
    '/Users/ctoews/Documents/Insight/Project/googleAPI/MyFirstProject-76680dcd1ad6.json')

# Make an authenticated API request
buckets = list(storage_client.list_buckets())
print(buckets)

[<Bucket: toews-images>]


*Pass photo URLs to Google Vision for labelling*

In [12]:
client = vision.ImageAnnotatorClient()
image = types.Image()

In [9]:


bad_labels = []
for url in bad_urls:
    image.source.image_uri = url
    response = client.label_detection(image=image)
    labels = response.label_annotations
    these_labels = ''
    for label in labels:
        these_labels += (label.description + ' ')
    bad_labels.append(these_labels)
    
good_labels = []
for url in good_urls:
    image.source.image_uri = url
    response = client.label_detection(image=image)
    labels = response.label_annotations
    these_labels = ''
    for label in labels:
        these_labels += (label.description + ' ')
    good_labels.append(these_labels)
    
bl = pd.DataFrame(bad_labels,columns=['labels'])
gl = pd.DataFrame(good_labels,columns=['labels'])

In [10]:
good_labels

['people child skin facial expression person day infant sitting smile fun ',
 'gown photograph marriage bride wedding dress dress wedding bridal clothing ceremony standing ',
 'people dog breed fun grass family child play daughter father stock photography ',
 'hair woman green tree girl beauty emotion smile eye hairstyle ',
 'people child skin smile emotion fun girl infant interaction happiness ',
 'photograph black person black and white tree girl emotion woody plant photography lady ',
 "red light heart macro photography close up love computer wallpaper petal valentine's day heart ",
 'text font sky water grass stock photography number computer wallpaper ',
 'facial expression child emotion fun girl grass smile hand shout happiness ',
 'flower rose pink rose family cut flowers garden roses purple flower bouquet floristry flower arranging ']

In [27]:
good_urls

['https://farm5.staticflickr.com/4623/39834715572_1559b597ec.jpg',
 'https://farm5.staticflickr.com/4605/39834715692_e499c7d71f.jpg',
 'https://farm5.staticflickr.com/4630/39834715602_3314a7eaf4.jpg',
 'https://farm5.staticflickr.com/4653/39834716592_efe5420940.jpg',
 'https://farm5.staticflickr.com/4674/39834715812_c9b8157bc5.jpg',
 'https://farm5.staticflickr.com/4708/39834715942_d993de82f6.jpg',
 'https://farm5.staticflickr.com/4673/39834716042_ae01ea0ceb.jpg',
 'https://farm5.staticflickr.com/4699/39834716362_1c539bed39.jpg',
 'https://farm5.staticflickr.com/4611/39834716422_36a95d3667.jpg',
 'https://farm5.staticflickr.com/4723/39834716482_00c2ce1e07.jpg']

In [23]:
good_sentiment=[]
for i in np.arange(10):
    doc = TextBlob(good_labels[i])
    #print(doc.sentiment[0])
    good_sentiment.append(doc.sentiment)
    
bad_sentiment=[]
for i in np.arange(10):
    doc = TextBlob(bad_labels[i])
    #print(doc.sentiment[0])
    bad_sentiment.append(doc.sentiment)

In [25]:
np.mean(bad_sentiment)

0.11075396825396826

In [213]:
all_labels=pd.concat([bl,gl])
for i in all_labels['labels']:
    print(i)

facial expression man shout emotion smile hand mouth human tooth aggression 
black photograph black and white eyewear sunglasses monochrome photography vision care photography sky glasses 
water sky atmosphere phenomenon sea underwater silhouette darkness calm computer wallpaper 
underwater phenomenon computer wallpaper visual effects darkness organism special effects human marine biology cg artwork 
mammal nose drawing art arm watercolor paint organ figure drawing wrinkle artwork 
face person black black and white nose eye cheek monochrome photography beauty eyebrow 
person portrait black and white monochrome photography art gentleman monochrome facial hair self portrait visual arts 
tree woody plant forest woodland wilderness plant phenomenon grass jungle darkness 
nose eye close up forehead human eyelash mouth organism organ snout 
black and white monochrome photography photography tree monochrome stock photography sky landscape 
people child skin facial expression person day infant

**Match to poems**

In [78]:
import pandas as pd
import spacy
import pickle
import poeml_utility as pml

parser = spacy.load('en')

In [59]:
parser = spacy.load('en')
allvecs = pd.read_pickle('allvecs.pkl')
with open('sharespeares_stopwords.pkl','rb') as file:
    shakespeares_stopwords = pickle.load(file)

In [83]:
from collections import Counter, OrderedDict
from nltk.corpus import stopwords
from nltk import SnowballStemmer
import string
# A custom stoplist
STOPLIST = set(stopwords.words('english') + list(shakespeares_stopwords))
# List of symbols we don't care about
SYMBOLS = " ".join(string.punctuation).split(" ") + \
          ["-----", "---", "...", "“", "”", "'s"]

In [84]:
# strip blanks and other terrible things
data = all_labels['labels']
data_clean=[]
for label in data:
    data_clean.append(pml.cleanText(label))

In [85]:
# and convert to lemmas
def tokenizeText(sample):

    # get the tokens using spaCy
    tokens = parser(sample)

    # lemmatize
    lemmas = []
    for tok in tokens:
        lemmas.append(tok.lemma_.lower().strip() 
                      if tok.lemma_ != "-PRON-" else tok.lower_)
    tokens = lemmas

    # stoplist the tokens
    tokens = [tok for tok in tokens if tok not in STOPLIST]

    # stoplist symbols
    tokens = [tok for tok in tokens if tok not in SYMBOLS]

    # remove large strings of whitespace
    while "" in tokens:
        tokens.remove("")
    while " " in tokens:
        tokens.remove(" ")
    while "\n" in tokens:
        tokens.remove("\n")
    while "\n\n" in tokens:
        tokens.remove("\n\n")
    
    return tokens


# tokenize
label_token = []
for label in data_clean:
    label_token.append(tokenizeText(label))

In [90]:
# recombine
input_label = []
for label in label_token:
    input_label.append(' '.join(label))

In [55]:
import sqlalchemy # pandas-mysql interface library
import sqlalchemy.exc # exception handling
import poeml_utility as pml
from sklearn.metrics.pairwise import euclidean_distances, cosine_distances, cosine_similarity
from sklearn import preprocessing
from sklearn.preprocessing import normalize

engine = pml.connect_db()

In [215]:
#parse
parsed_labels = []
for label in all_labels['labels']:
    parsed_labels.append(parser(label))

In [216]:
parsed_labels

[facial expression man shout emotion smile hand mouth human tooth aggression ,
 black photograph black and white eyewear sunglasses monochrome photography vision care photography sky glasses ,
 water sky atmosphere phenomenon sea underwater silhouette darkness calm computer wallpaper ,
 underwater phenomenon computer wallpaper visual effects darkness organism special effects human marine biology cg artwork ,
 mammal nose drawing art arm watercolor paint organ figure drawing wrinkle artwork ,
 face person black black and white nose eye cheek monochrome photography beauty eyebrow ,
 person portrait black and white monochrome photography art gentleman monochrome facial hair self portrait visual arts ,
 tree woody plant forest woodland wilderness plant phenomenon grass jungle darkness ,
 nose eye close up forehead human eyelash mouth organism organ snout ,
 black and white monochrome photography photography tree monochrome stock photography sky landscape ,
 people child skin facial express

In [217]:
# calculate the embeddings for the picture labels

good_pics_vecs = np.zeros((10,384))
for i in np.arange(10):
    good_pics_vecs[i,:] = parser(str(parsed_labels[i])).vector
    
bad_pics_vecs = np.zeros((10,384))
for i in np.arange(10):
    bad_pics_vecs[i,:] = parser(str(parsed_labels[10+i])).vector

In [54]:
query = "select * from sonnet_sentences order by index;"
sonnet_sentences = pd.read_sql(query,engine)
len(sonnet_sentences)

NameError: name 'engine' is not defined

In [253]:
query = "select * from poem_embeddings order by index;"
poem_embeddings = pd.read_sql(query,engine)
poem_embeddings.shape

(637, 385)

In [218]:
# identify test cases
bad_idx = 282
good_idx= 35

# extract relevant embeddings
bad_vec = poem_embeddings.iloc[bad_idx,1:]
good_vec = poem_embeddings.iloc[good_idx,1:]

# check
print("bad: \n",sonnet_sentences.iloc[bad_idx,2])
print("good: \n",sonnet_sentences.iloc[good_idx,2])

bad: 
good: 
  Mark how one string, sweet husband to another, Strikes each in each by mutual ordering; Resembling sire and child and happy mother, Who, all in one, one pleasing note do sing


In [235]:
bb = cosine_distances(bad_vec.values.reshape((1,-1)), bad_pics_vecs).flatten()
bg = cosine_distances(bad_vec.values.reshape((1,-1)), good_pics_vecs).flatten()
gb = cosine_distances(good_vec.values.reshape((1,-1)), bad_pics_vecs).flatten()
gg = cosine_distances(good_vec.values.reshape((1,-1)), good_pics_vecs).flatten()
test_results = pd.DataFrame(data={'bb':bb,'bg':bg,'gb':gb,'gg':gg})

In [236]:
print("bad poem: \n",np.sign(test_results['bb']-test_results['bg']))
print("good poem: \n",np.sign(test_results['gg']-test_results['gb']))

bad poem: 
 0    1.0
1   -1.0
2   -1.0
3   -1.0
4    1.0
5   -1.0
6    1.0
7    1.0
8   -1.0
9   -1.0
dtype: float64
good poem: 
 0    1.0
1    1.0
2   -1.0
3    1.0
4   -1.0
5    1.0
6   -1.0
7   -1.0
8    1.0
9   -1.0
dtype: float64


In [233]:
bb = euclidean_distances(bad_vec.values.reshape((1,-1)), bad_pics_vecs).flatten()
bg = euclidean_distances(bad_vec.values.reshape((1,-1)), good_pics_vecs).flatten()
gb = euclidean_distances(good_vec.values.reshape((1,-1)), bad_pics_vecs).flatten()
gg = euclidean_distances(good_vec.values.reshape((1,-1)), good_pics_vecs).flatten()
test_results = pd.DataFrame(data={'bb':bb,'bg':bg,'gb':gb,'gg':gg})

In [234]:
print("bad poem: \n",np.sign(test_results['bb']-test_results['bg']))
print("good poem: \n",np.sign(test_results['gg']-test_results['gb']))

bad poem: 
 0    1.0
1   -1.0
2   -1.0
3   -1.0
4    1.0
5   -1.0
6    1.0
7    1.0
8   -1.0
9   -1.0
dtype: float64
good poem: 
 0    1.0
1    1.0
2   -1.0
3    1.0
4   -1.0
5    1.0
6   -1.0
7   -1.0
8    1.0
9   -1.0
dtype: float64


In [289]:
bg

array([ 0.21395871,  0.24607571,  0.27247986,  0.27633767,  0.19856011,
        0.23463302,  0.20448573,  0.22228259,  0.26371453,  0.28046177])

In [232]:
bad_pics_vecs = normalize(bad_pics_vecs,axis=1)
good_pics_vecs = normalize(good_pics_vecs,axis=1)

In [259]:
del poem_embeddings['index']

In [263]:
dists=cosine_distances(good_vec.reshape((1,-1)),poem_embeddings)

  """Entry point for launching an IPython kernel.


In [281]:
idx=np.argsort(dists)
dists[0,idx[0][0:5]]

array([ 0.        ,  0.07420846,  0.07768135,  0.08020837,  0.08229216])

In [283]:
dists[0,idx[0][0:5]]

array([ 0.        ,  0.07420846,  0.07768135,  0.08020837,  0.08229216])

In [69]:
parser

NameError: name 'parser' is not defined

In [286]:
x=parser("god")

In [287]:
x.vector

array([ -1.25227785e+00,   1.52718484e+00,   4.65477228e+00,
         6.29601181e-02,   2.26495790e+00,   1.77047956e+00,
        -3.30426097e+00,  -1.19847560e+00,   2.17711973e+00,
         8.89403045e-01,   1.04516804e+00,  -8.57500732e-02,
         1.24552751e+00,  -1.40893722e+00,  -9.38360512e-01,
        -8.59737396e-04,   1.99387980e+00,  -8.09948146e-01,
        -1.24704778e+00,   2.89700925e-01,   7.48762131e-01,
         1.99815631e+00,   1.24085808e+00,  -1.16656578e+00,
        -1.72519588e+00,  -2.48196587e-01,  -1.19411647e+00,
        -3.71114016e-02,  -6.74983263e-01,  -3.04378915e+00,
         4.20348644e-01,   1.25865352e+00,  -1.05744219e+00,
        -5.70542455e-01,   1.27632844e+00,   3.28767121e-01,
        -1.58082557e+00,   6.64026320e-01,  -5.24787545e-01,
        -1.00528240e-01,  -3.21371436e-01,   3.61640573e+00,
         1.50564826e+00,  -1.31605113e+00,  -3.80385542e+00,
        -1.15620434e+00,  -2.17934585e+00,  -1.26014113e+00,
         5.05967522e+00,

### Play with new testset

In [2]:
test_images_url = "https://www.flickr.com/photos/138072685@N02/albums"

In [5]:
test_flickr   = flickr.photosets.getPhotos(user_id='138072685@N02',photoset_id='72157669045554809')


In [6]:
test_urls = assemble_urls(test_flickr)


In [13]:
test_labels = []
for url in test_urls:
    image.source.image_uri = url
    response = client.label_detection(image=image)
    labels = response.label_annotations
    these_labels = ''
    for label in labels:
        these_labels += (label.description + ' ')
    test_labels.append(these_labels)

In [14]:
test_labels

['plant tree leaf rainforest girl jungle branch fun forest garden ',
 'sea sky horizon ocean water calm shore wind wave wave phenomenon ',
 'sea sky horizon ocean cloud calm water shore daytime coastal and oceanic landforms ',
 'meal fun food cuisine lunch temple event friendship girl dinner ',
 'meal food cuisine dish brunch buffet thanksgiving dinner street food breakfast meat ',
 'meal food cuisine brunch lunch dish supper restaurant breakfast table ',
 'dog dog breed snout dog breed group fur child companion dog puppy puppy love ',
 'dog dog breed dog like mammal puppy child snout puppy love carnivoran companion dog dog clothes ',
 'dog dog like mammal dog breed mammal dog breed group german spitz grass samoyed german spitz mittel snout ',
 'sea beach sky water ocean body of water shore horizon vacation fun ',
 'beach body of water vacation fun sea sand tourism leisure shore summer ',
 'photograph sea sky body of water beach vacation fun tourism photography ocean ',
 'flower sunflo

In [57]:
query = "select * from quotes;"
quotes = pd.read_sql(query,engine)
quotes.quoteText

0       Genius is one percent inspiration and ninety-n...
1                 You can observe a lot just by watching.
2            A house divided against itself cannot stand.
3       Difficulties increase the nearer we get to the...
4                  Fate is in your hands and no one elses
5                        Be the chief but never the lord.
6                  Nothing happens unless first we dream.
7                                Well begun is half done.
8       Life is a learning experience, only if you learn.
9                  Self-complacency is fatal to progress.
10       Peace comes from within. Do not seek it without.
11                         What you give is what you get.
12                   We can only learn to love by loving.
13      Life is change. Growth is optional. Choose wis...
14                     You'll see it when you believe it.
15      Today is the tomorrow we worried about yesterday.
16      It's easier to see the mistakes on someone els...
17            

In [60]:
quotevecs = pd.read_pickle('quote_vecs.pkl')

In [64]:
quotes.loc[quotes.quoteText.str.contains('sun'),:]

Unnamed: 0,index,quoteAuthor,quoteText
83,83,,It takes both sunshine and rain to make a rain...
203,203,Helen Keller,Keep yourself to the sunshine and you cannot s...
341,341,John Lennon,"Yeah we all shine on, like the moon, and the s..."
368,368,Buddha,"Three things cannot be long hidden: the sun, t..."
385,385,Morris West,If you spend your whole life waiting for the s...
753,751,Anatole France,It is better to understand a little than to mi...
798,798,Mark Twain,"Happiness is a Swedish sunset, it is there for..."
859,858,Albert Schweitzer,Constant kindness can accomplish much. As the ...
1078,1078,Ralph Waldo Emerson,Most of the shadows of life are caused by stan...
1220,1220,Ralph Emerson,To be great is to be misunderstood.


In [80]:
q1 = quotes.iloc[1362,:]
q2 = quotes.iloc[987,:]
q1v = parser(q1.quoteText).vector
q2v = parser(q2.quoteText).vector

In [72]:
import spacy
parser = spacy.load('en')

In [74]:
image_text = test_labels[-1]
image_text
image_vector = parser(image_text)
image_vector=image_vector.vector

In [75]:
from sklearn.metrics.pairwise import euclidean_distances, cosine_distances, cosine_similarity


In [84]:
cosine_similarity(image_vector.reshape(1,-1),q2v.reshape(1,-1))

array([[0.16519928]], dtype=float32)

In [91]:
np.array(q1v.reshape(1,-1),q2v.reshape(1,-1)).shape

TypeError: data type not understood

In [90]:
q1v.reshape(1,-1).shape

(1, 384)