# Fashion (MACS 40400: Computation and the Identification of Cultural Patterns)
## Part II: Working with APIs

Today, we're going to try to discover whether there are any conventional interpretants for Zegna's Uomo cologne. In class on Tuesday, we saw that the product was intended to produce interpretants related to the concepts of masculinity, mastery, and sophistication. Today, we will evaluate whether these are the sorts of interpretants that are actually produced among circles of interpreters beyond the individual advertisement we watched. To do so, we will analyze the words used in YouTube videos about the cologne and the things that appear in Flickr images related to the Zegna brand more generally.

## Ermenegildo Zegna Image Object Detection
### Flickr and Google Cloud API Demonstration

One way we might begin to understand some of the interpretants produced by the Uomo sign vehicle is to identify the objects in images and videos related to the Zegna brand as a whole. You might remember that the advertisement we watched on Tuesday begins with the phrase "Ermenegildo Zegna Presents...", and the Zegna name seems to be a key part of how this product is labeled. What meaning does this brand name bestow upon the product? We might begin to understand the interpretant(s) of "Ermenegildo Zegna" by identifying the entities that commonly occur in photos related to the name.

We will use the Flickr API to collect images related to the term "Ermenegildo Zegna." We will then use a pre-trained computer vision model, accessed through Google Cloud Vision's API, to detect the objects in those images. The goal is to identify general patterns in how photographers on Flickr compose their photos related to the Zegna brand, covering both professional shots of models and photos from the general public. Note that in this scenario, we are not only using an API to get pre-existing data (Flickr), but also to process data for us (Google Cloud Vision).

It takes a bit too long to run this code as an in-class exercise (and requires getting an API key for the Flickr API, or some other image sharing website, along with a Google Cloud API key), but I'd encourage you to play around with and expand upon this approach on your own and/or work with image/video data in this way in your final projects.

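If you want to use the [Google Cloud Vision API](https://cloud.google.com/vision/), you will need to sign up for a Google Cloud account, enable the Vision API, and set up credentials. The Python client library typically reads a service-account key file from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. You will also need to run the following command on your command line to install the libraries used below:
`pip install google-cloud-vision flickrapi`. For more information on the approach below, see Google's official [documentation](https://cloud.google.com/vision/docs/labels#vision-label-detection-python).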

```python
import pandas as pd
import requests
import flickrapi
from google.cloud import vision
```

First, let's use the Flickr API to find images related to the keywords "Ermenegildo Zegna" and collect a list of their image URLs. For demonstration purposes, we will stop after 500 images, but we could collect more. You will need your own Flickr API key and secret.

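```python
# Replace these placeholders with your own Flickr API key and secret
# (avoid publishing real credentials in a shared notebook).
FLICKR_KEY = 'YOUR_FLICKR_API_KEY'
FLICKR_SECRET = 'YOUR_FLICKR_API_SECRET'
flickr = flickrapi.FlickrAPI(FLICKR_KEY, FLICKR_SECRET, cache=True)

keyword = 'Ermenegildo Zegna'
n_images = 500
urls = []
# walk() pages through the search results lazily; per_page sets the page size.
for i, photo in enumerate(flickr.walk(text=keyword, sort='relevance', per_page=500,
                                      tags=keyword, extras='url_c')):
    if i >= n_images:  # stop after 500 search results
        break
    url = photo.get('url_c')  # 800px image URL; missing for some photos
    if url:
        urls.append(url)
```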
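Each photo record that `flickr.walk` yields also carries `id`, `server`, and `secret` attributes. Flickr's "Photo Source URLs" documentation describes a URL pattern built from these attributes. The sketch below is a hypothetical fallback helper for photos where `url_c` was not returned. `build_flickr_url` is our own name, not part of `flickrapi`. It assumes the pattern `https://live.staticflickr.com/{server}/{id}_{secret}_{suffix}.jpg`. A constructed URL may still fail to load if Flickr never generated that size for the photo.

```python
def build_flickr_url(photo, size_suffix="c"):
    """Build a Flickr image URL from a photo record's attributes.

    Hypothetical helper: assumes Flickr's documented pattern
    https://live.staticflickr.com/{server}/{id}_{secret}_{size}.jpg,
    where suffix "c" is the 800px size that 'url_c' refers to.
    `photo` may be a dict or an ElementTree element (both support .get()).
    """
    server = photo.get("server")
    photo_id = photo.get("id")
    secret = photo.get("secret")
    if not (server and photo_id and secret):
        return None  # not enough information to build a URL
    return f"https://live.staticflickr.com/{server}/{photo_id}_{secret}_{size_suffix}.jpg"


# Usage with a made-up record (not real Flickr data):
example = {"id": "12345", "server": "65535", "secret": "abcdef0123"}
print(build_flickr_url(example))
# https://live.staticflickr.com/65535/12345_abcdef0123_c.jpg
```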
        

Then, we can download each image with the Requests library and use Google Cloud Vision's API to label each image with text describing what is in it.

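```python
# Create the client once and reuse it for every image. It authenticates with
# Application Default Credentials (e.g. the GOOGLE_APPLICATION_CREDENTIALS variable).
client = vision.ImageAnnotatorClient()

def detect_labels(img_bytes):
    """Return the label descriptions Cloud Vision detects in raw image bytes."""
    # google-cloud-vision 2.x uses vision.Image (1.x used vision.types.Image).
    image = vision.Image(content=img_bytes)
    response = client.label_detection(image=image)
    if response.error.message:  # surface API errors instead of returning no labels
        raise RuntimeError(response.error.message)
    return [label.description for label in response.label_annotations]
```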
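If installing the client library is a hurdle, Cloud Vision also exposes a REST endpoint, `images:annotate`, which accepts an API key as a `key` query parameter. The sketch below builds the JSON request body for a `LABEL_DETECTION` feature. It also filters the returned `labelAnnotations` by their confidence `score`. The helper names, the `min_score` threshold, and the assumption that your key has the Vision API enabled are our own illustrative choices, not part of the workflow above.

```python
import base64
import json
import urllib.request

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes, max_results=10):
    """Build the JSON body for a single LABEL_DETECTION request."""
    return {
        "requests": [
            {
                # Raw image bytes travel as base64 text inside the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def parse_labels(response_json, min_score=0.0):
    """Return label descriptions whose confidence score is at least min_score."""
    results = []
    for resp in response_json.get("responses", []):
        for ann in resp.get("labelAnnotations", []):
            if ann.get("score", 0.0) >= min_score:
                results.append(ann["description"])
    return results


def detect_labels_rest(image_bytes, api_key, min_score=0.0):
    """Send one request to the REST endpoint (needs network access and a valid key)."""
    body = json.dumps(build_label_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_labels(json.load(resp), min_score)
```

Raising `min_score` trades recall for precision. Low-confidence labels are dropped before they are counted.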

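```python
labels_flickr = []

for url in urls:
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # skip broken links instead of sending error pages to Vision
        labels_flickr.extend(detect_labels(response.content))
    except Exception as err:
        # Report failures rather than silently swallowing them with a bare except.
        print(f'Skipping {url}: {err}')
```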

Finally, we can see some of the labels most associated with the Ermenegildo Zegna images. Labels such as "gentleman," "urban/city/building," "car/vehicle," and "male" appear fairly prominently. This reinforces the "classic," masculine image of the brand we saw in the advertisement. It also suggests that the photos are broadly associated with something like an "urban gentleman," a general interpretant of these Ermenegildo Zegna images.

```python
# Count how often each label appears and show the 50 most common.
pd.Series(labels_flickr, name='labels').value_counts().head(50)
```
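Counting the flattened `labels_flickr` list shows how often each label occurs overall. It does not show what share of the images a label appears in, which is easier to compare across data sets of different sizes. Suppose the labeling loop kept one list of labels per image (a hypothetical `labels_per_image`, which the cells above do not build). The share could then be computed with `collections.Counter`, as in this standard-library sketch. Wrapping each image's labels in `set()` makes each image count at most once per label.

```python
from collections import Counter


def label_shares(labels_per_image, top_n=50):
    """Return (label, share_of_images) pairs, most common first.

    Each image contributes at most once per label (via set()), so the share
    is the fraction of images in which the label was detected.
    """
    n_images = len(labels_per_image)
    if n_images == 0:
        return []
    counts = Counter(label for labels in labels_per_image for label in set(labels))
    return [(label, count / n_images) for label, count in counts.most_common(top_n)]


# Illustrative input only (not real Cloud Vision output):
toy = [["Gentleman", "Suit"], ["Car", "Gentleman"], ["Building"], ["Gentleman"]]
print(label_shares(toy, top_n=1))  # [('Gentleman', 0.75)]
```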



Note that using other image sources (such as Twitter, Instagram, or a company website) might enhance this brief study with additional data about how photographic interpretants for the brand vary across different communities of interpreters.

If we wanted to detect custom entities that are outside the scope of Google's pre-trained models, we could also train our own machine learning algorithms. For instance, in my own work, I've trained classifiers to identify archaeological sites from vast amounts of satellite image data (using Python's OpenCV and scikit-learn packages). This involves feeding a classification algorithm many images labeled with what is in them, so that it learns to recognize those kinds of objects in new, unlabeled images. This is about as deep as we will get into these approaches in this class, but I'd encourage you to explore them in your final projects if you'd like to work with images.
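To make "feeding a classification algorithm labeled images" concrete, here is a deliberately tiny stand-in written with the standard library only. It is a nearest-centroid classifier over hand-made feature vectors. It is not the OpenCV/scikit-learn pipeline mentioned above. In practice the vectors would be extracted from the images themselves (for example, color histograms), and the features and labels below are invented for illustration.

```python
import math


def train_centroids(vectors, labels):
    """Average the feature vectors of each label into one centroid per label."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to vec (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Invented 2-D "image features" (e.g. share of vegetation pixels, share of
# straight edges) and labels; real features would be extracted from images.
X = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.9), (0.1, 0.8)]
y = ["vegetation", "vegetation", "site", "site"]
model = train_centroids(X, y)
print(predict(model, (0.15, 0.85)))  # site
```

Training here is just averaging. Libraries like scikit-learn offer far more capable classifiers, but the train-then-predict pattern is the same.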

*******

## Importing YouTube Video Captions and Metadata

To identify the interpretants for the Uomo cologne in particular, we will gather data from YouTube videos that review the product. What language do the YouTube influencers use to describe the cologne? How do they generally judge the product? Is it consistent with the argument presented by the Uomo advertisers in the advertisement we watched on Tuesday?

To gather YouTube metadata and captions, we can use the [official YouTube API](https://developers.google.com/youtube/v3/docs/search/list) (for which you can acquire keys through the same portal as Google Cloud Vision). However, I find it significantly easier in Python to use the third-party scraper [youtube_dl](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#readme). The advantage of using `youtube_dl` is that it provides a framework for downloading video data from other streaming sites as well (for instance, Vimeo and TikTok). It also returns metadata as JSON-style Python dictionaries, much as the official YouTube API returns JSON, although the field names differ. One disadvantage, however, is that it does not return comment data from YouTube. You will need to use the official YouTube API if you want to collect comments for videos.
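For comparison, here is a sketch of the request URLs for the official API's `search.list` and `commentThreads.list` endpoints. The second endpoint is the route to the comment data that `youtube_dl` cannot provide. The sketch only builds URLs with the standard library. Fetching them (for example with `requests.get(url).json()`) needs a valid key. `search_url` and `comments_url` are our own helper names.

```python
from urllib.parse import urlencode

YOUTUBE_API = "https://www.googleapis.com/youtube/v3"


def search_url(query, api_key, max_results=50, page_token=None):
    """URL for one page of search.list results, restricted to videos."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": max_results,  # the API allows at most 50 per page
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token  # from the previous response's nextPageToken
    return f"{YOUTUBE_API}/search?{urlencode(params)}"


def comments_url(video_id, api_key, max_results=100):
    """URL for top-level comment threads on one video (commentThreads.list)."""
    params = {"part": "snippet", "videoId": video_id, "maxResults": max_results, "key": api_key}
    return f"{YOUTUBE_API}/commentThreads?{urlencode(params)}"


print(search_url("Zegna Uomo", "YOUR_API_KEY"))
# https://www.googleapis.com/youtube/v3/search?part=snippet&q=Zegna+Uomo&type=video&maxResults=50&key=YOUR_API_KEY
```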

```python
import youtube_dl
import re
```

Once we have installed youtube_dl (`pip install youtube_dl`), we can retrieve the metadata for the first 100 YouTube search results for 'Zegna Uomo', along with links to any automatically generated English captions.

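```python
# writeautomaticsub asks youtube_dl to look up auto-generated captions, so each
# entry's 'automatic_captions' field is filled in; download=False fetches
# metadata only, without downloading the videos themselves.
ydl_opts = {'dump_single_json': True, 'writeautomaticsub': True, 'subtitleslangs': ['en']}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    # "ytsearch100:" returns the first 100 YouTube search results for the query.
    result = ydl.extract_info("ytsearch100:Zegna Uomo", download=False)
```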

```
[youtube:search] query "Zegna Uomo": Downloading page 1
[youtube:search] query "Zegna Uomo": Downloading page 2
[youtube:search] query "Zegna Uomo": Downloading page 3
[youtube:search] query "Zegna Uomo": Downloading page 4
[youtube:search] query "Zegna Uomo": Downloading page 5
[download] Downloading playlist: Zegna Uomo
[youtube:search] playlist Zegna Uomo: Collected 100 video ids (downloading 100 of them)
[download] Downloading video 1 of 100
[youtube] ow3QAF_aB_M: Downloading webpage
[youtube] ow3QAF_aB_M: Downloading video info webpage
[youtube] ow3QAF_aB_M: Looking for automatic captions
[youtube] ow3QAF_aB_M: Downloading MPD manifest
[download] Downloading video 2 of 100
[youtube] Otqi2zZrRJM: Downloading webpage
[youtube] Otqi2zZrRJM: Downloading video info webpage
[youtube] Otqi2zZrRJM: Looking for automatic captions
[download] Downloading video 3 of 100
[youtube] MibC6tRSBpk: Downloading webpage
[youtube] MibC6tRSBpk: Downloading video info webpage
...
```
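Not every search result has English auto-captions, and the cleaning code takes whichever caption format is listed first (`[0]`). A slightly more defensive lookup is sketched here as a hypothetical helper. It assumes each entry's `automatic_captions` maps language codes to lists of `{'ext': ..., 'url': ...}` dictionaries, which is the structure the cleaning code indexes into. It returns `None` when no track exists and can prefer a specific format. Note that a format other than the first may need different cleaning.

```python
def pick_caption_url(entry, lang="en", preferred_ext=None):
    """Return an automatic-caption URL for one youtube_dl entry, or None.

    Assumes entry['automatic_captions'] looks like
    {'en': [{'ext': 'srv1', 'url': ...}, {'ext': 'vtt', 'url': ...}, ...]}.
    If preferred_ext is given but missing, falls back to the first listed track.
    """
    tracks = (entry.get("automatic_captions") or {}).get(lang) or []
    if preferred_ext is not None:
        for track in tracks:
            if track.get("ext") == preferred_ext:
                return track.get("url")
    return tracks[0].get("url") if tracks else None


# Made-up entry for illustration (not real youtube_dl output):
fake_entry = {"automatic_captions": {"en": [{"ext": "srv1", "url": "u1"}, {"ext": "vtt", "url": "u2"}]}}
print(pick_caption_url(fake_entry), pick_caption_url(fake_entry, preferred_ext="vtt"))  # u1 u2
```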



The text within the captions is quite messy. Before we load the data into a DataFrame, we need to clean out URLs, digits, markup tags, and escaped HTML characters:

```python
import html

tag_re = re.compile(r'<[^>]+>')
link_re = re.compile(r'http\S+|\d+')

def remove_tags(text):
    """Replace markup tags such as <text start=...> or <font ...> with spaces."""
    return tag_re.sub(' ', text)

def remove_links(text):
    """Replace URLs and runs of digits with spaces."""
    return link_re.sub(' ', text)


title, description, tags, captions = ([] for _ in range(4))

for entry in result['entries']:
    # Append each field to its own list; build the DataFrame once after the loop.
    title.append(entry['title'])
    # Some videos have no description or tags, so fall back to empty values.
    description.append(remove_links(entry.get('description') or '').replace('\n', ' '))
    tags.append(remove_links(' '.join(entry.get('tags') or [])))
    auto = entry.get('automatic_captions') or {}
    if 'en' in auto:
        raw = requests.get(auto['en'][0]['url'], timeout=30).text
        # Caption text is escaped twice (e.g. '&amp;#39;'), so unescape twice
        # before stripping the now-literal <text> and <font> tags.
        clean = remove_tags(html.unescape(html.unescape(raw)))
        captions.append(' '.join(clean.split()))  # collapse leftover whitespace
    else:
        captions.append(float('nan'))

uomo_yt_df = pd.DataFrame({'Title': title, 'Description': description,
                           'Tags': tags, 'Captions': captions})
uomo_yt_df.to_json('uomo_yt100.json')  # save the DataFrame to JSON
uomo_yt_df.head()
```

| | Title | Description | Tags | Captions |
|---|---|---|---|---|
| 0 | FIRST IMPRESSION! \| UOMO by Ermenegildo Zegna | HELLO EVERYONE! Today at StudioScents I'm pre... | Fragrancepreview UOMO ErmenegildoZegna Firstim... | hi and welcome back to studio since I'm To... |
| 1 | Uomo Ermenegildo Zegna for Men (2013) | a fragrance i TRULY love Frarantica Link Fo... | zegna uomo fragrance review mens all year roun... | hey guys how are y'all doing I've been mea... |
| 2 | Uomo by Ermenegildo Zegna - A Quicky Review | Fragrantica Profile: Zegna Website: | uomo Ermenegildo Zegna (Organization) fresh be... | come to another fragrance quickie and ... |
| 3 | Ermenegildo Zegna Uomo (Review) 🔥🔥Cheapie sexi... | | | hey everybody welcome back good evening to... |
| 4 | (Uomo by Ermenegildo Zegna) Maximilian Must Kn... | / Maximilian's Rating System / = Fake an ... | Maximilian Must Know Maximilian Heusler Ermene... | [Music] good what is up welcome to another... |
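The cleaning above works on the raw caption text with regular expressions. An alternative is to parse the caption XML first and then clean what is left. This sketch assumes the first caption track is YouTube's XML timedtext format, a `<transcript>` of `<text>` elements, which the double-escaped `&amp;#39;` and `&lt;font ...&gt;` debris suggests. The XML parser undoes one level of escaping and `html.unescape` undoes the second. A small regular expression then strips the leftover `<font>` markup. `caption_xml_to_text` is a hypothetical helper, and the sample string is made up.

```python
import html
import re
import xml.etree.ElementTree as ET

FONT_TAG_RE = re.compile(r"</?font[^>]*>")


def caption_xml_to_text(xml_text):
    """Flatten a timedtext-style XML caption document into plain text."""
    data = xml_text.encode("utf-8") if isinstance(xml_text, str) else xml_text
    root = ET.fromstring(data)
    pieces = []
    for node in root.iter("text"):
        # Parsing turned '&amp;#39;' into '&#39;'; html.unescape turns it into "'".
        chunk = html.unescape(node.text or "")
        chunk = FONT_TAG_RE.sub(" ", chunk)  # drop <font color="..."> wrappers
        pieces.append(chunk)
    # Collapse the runs of whitespace left behind by removed markup.
    return re.sub(r"\s+", " ", " ".join(pieces)).strip()


sample = ('<transcript><text start="0.1" dur="2">hi and welcome back, it&amp;#39;s</text>'
          '<text start="2.1" dur="3">&lt;font color=&quot;#E5E5E5&quot;&gt;Zegna&lt;/font&gt; Uomo</text></transcript>')
print(caption_xml_to_text(sample))
# hi and welcome back, it's Zegna Uomo
```

Parsing first means only caption text is cleaned, so the start and duration attributes never leak into the output.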


After cleaning, we have tidy text for the Title, Description, Tags, and Captions of each YouTube video related to the Zegna Uomo cologne. We've also saved our data in JSON format so that we can easily reload it as a Pandas DataFrame (for example, with `pd.read_json`). Now, we're ready for text analysis! Let's move on to Part III.