# Fashion (MACS 40400: Computation and the Identification of Cultural Patterns)
## Part II: Working with APIs

Today, we will try to discover whether there are any conventional interpretants for Zegna's Uomo cologne. In class on Tuesday, we saw that the product was intended to produce interpretants related to the concepts of masculinity, mastery, and sophistication. Today, we evaluate whether these are the interpretants actually produced among interpreters beyond the single advertisement we watched. To do so, we analyze the words used in YouTube videos about the cologne and the objects that appear in Flickr images related to the Zegna brand more generally.

## Ermenegildo Zegna Image Object Detection
### Flickr and Google Cloud API Demonstration

One way we might begin to understand some of the interpretants produced from the Uomo sign vehicle is to identify the objects that are in images and videos related to the Zegna brand as a whole. You might remember that the advertisement we watched on Tuesday begins with the phrase "Ermenegildo Zegna Presents..." and this Zegna name seems to be a key part in the labeling of this product. What meaning does this brand name bestow upon the product? We might begin to understand the interpretant(s) of "Ermenegildo Zegna" by identifying the entities that commonly occur in photos related to the name.

We will use the Flickr API to collect images related to the term "Ermenegildo Zegna" and a pre-trained computer vision model, accessed through the Google Cloud Vision API, to detect the objects in them. The goal is to identify general patterns in how photographers on Flickr compose their photos related to the Zegna brand (professional shots of models as well as photos from the general public). Note that in this scenario we are using APIs not only to retrieve pre-existing data (Flickr) but also to process data for us (Google Cloud Vision).

It takes a bit too long to run this code as an in-class exercise (and requires getting an API key for the Flickr API, or some other image sharing website, along with a Google Cloud API key), but I'd encourage you to play around with and expand upon this approach on your own and/or work with image/video data in this way in your final projects.

If you want to use the [Google Cloud Vision API](https://cloud.google.com/vision/), you will need to sign up for a Google Cloud account, get an API key, and run the following command on your command line before running the code below:
`pip install google-cloud-vision` (the older `google-cloud` meta-package is deprecated). For more information on the approach below, see Google's official [documentation](https://cloud.google.com/vision/docs/labels#vision-label-detection-python).
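
Before calling the Vision API, it can help to confirm that credentials are visible to Python. The sketch below assumes a service-account key file referenced by the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which is one common way Google's client libraries locate credentials; the helper name `credentials_path` is hypothetical and not part of any library.

```python
import os


def credentials_path(environ=os.environ):
    """Return the key-file path from GOOGLE_APPLICATION_CREDENTIALS if that file exists, else None."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path and os.path.isfile(path):
        return path
    return None


# Report whether a key file is configured in the current environment
found = credentials_path()
print("Credentials file:", found if found else "not configured")
```

If this prints "not configured", creating the client in the cells below will probably fail until credentials are set up.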

In [50]:
import pandas as pd
import requests
import flickrapi
from google.cloud import vision
import io

First, let us use the Flickr API to identify images related to the keywords "Ermenegildo Zegna." Here, for demonstration purposes, we will collect 500 images in total, but we could collect more. We will store the image URLs in a list.

In [4]:
flickr = flickrapi.FlickrAPI('##############################', '#######################', cache=True)

keyword = 'Ermenegildo Zegna'
urls=[]
for i, photo in enumerate(flickr.walk(text=keyword, sort='relevance', per_page=500, tags=keyword, extras='url_c')):
    # Stop after 500 photos (i counts from 0)
    if i >= 500:
        break

    # 'url_c' may be missing for some photos, in which case this is None
    url = photo.get('url_c')
    urls.append(url)
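
A URL list gathered this way may contain `None` entries (photos for which no `url_c` value came back) and possibly repeats. The following is a minimal sketch of a cleanup step, not part of the original notebook; the helper name `clean_urls` and the example URLs are hypothetical.

```python
def clean_urls(urls, limit=500):
    """Drop missing entries and duplicates, preserving order, and keep at most `limit` URLs."""
    seen = set()
    cleaned = []
    for url in urls:
        # Skip placeholders for missing URLs and anything already collected
        if url is None or url in seen:
            continue
        seen.add(url)
        cleaned.append(url)
        if len(cleaned) == limit:
            break
    return cleaned


sample = [
    "https://example.com/a.jpg",
    None,
    "https://example.com/a.jpg",
    "https://example.com/b.jpg",
]
print(clean_urls(sample))  # ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```

Running `urls = clean_urls(urls)` before labeling would avoid sending the same image to the Vision API twice.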

Then, we can get each image via the Requests library and use Google Cloud Vision's API to label each image with text of what is in it.

In [19]:
def detect_labels(img_file):
    """Detects labels for an image file input."""
    client = vision.ImageAnnotatorClient()

    image = vision.types.Image(content=img_file)

    response = client.label_detection(image=image)
    labels = response.label_annotations

    return [label.description for label in labels]
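
Each label annotation returned by the Vision API carries a confidence `score` alongside its `description`, so low-confidence labels can be filtered out before counting. The sketch below uses stand-in objects rather than real API responses; the `confident_labels` helper and the 0.8 threshold are illustrative assumptions, not values from the original analysis.

```python
from types import SimpleNamespace


def confident_labels(label_annotations, min_score=0.8):
    """Keep only the descriptions of labels whose score is at least `min_score`."""
    return [label.description for label in label_annotations if label.score >= min_score]


# Stand-ins that mimic the two attributes used here (description, score)
fake_annotations = [
    SimpleNamespace(description="Suit", score=0.97),
    SimpleNamespace(description="Building", score=0.55),
]
print(confident_labels(fake_annotations))  # ['Suit']
```

In `detect_labels`, this would amount to returning `confident_labels(response.label_annotations)` instead of every description.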

In [20]:
labels_flickr = []

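for url in urls:
    # Skip photos for which Flickr returned no 'url_c' value
    if url is None:
        continue
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        detected_labels = detect_labels(response.content)
        labels_flickr.extend(detected_labels)
    except Exception as exc:
        # Skip images that fail to download or label, but report why
        print(f'Skipping {url}: {exc}')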
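
A loop like the one above can also be wrapped in a helper that returns the failures so they can be inspected afterwards. This is only a sketch: `collect_labels` is a hypothetical name, and the download and labeling steps are passed in as functions so the logic can be tried out without network access or API keys.

```python
def collect_labels(urls, fetch, detect):
    """Label every retrievable image and return (labels, failures)."""
    labels, failures = [], []
    for url in urls:
        if url is None:
            continue
        try:
            labels.extend(detect(fetch(url)))
        except Exception as exc:
            # Record the problem instead of silently discarding it
            failures.append((url, repr(exc)))
    return labels, failures


# Stand-in functions so the example runs offline
def fake_fetch(url):
    if "broken" in url:
        raise IOError("download failed")
    return b"image-bytes"


def fake_detect(img_bytes):
    return ["Suit", "Gentleman"]


labels, failures = collect_labels(
    ["https://example.com/ok.jpg", None, "https://example.com/broken.jpg"],
    fake_fetch,
    fake_detect,
)
print(labels)         # ['Suit', 'Gentleman']
print(len(failures))  # 1
```

With the real services, `fetch` could be `lambda u: requests.get(u, timeout=10).content` and `detect` could be the `detect_labels` function defined earlier.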

Finally, we can see some of the labels most associated with the Ermenegildo Zegna images. Labels like "gentleman," "urban/city/building," "car/vehicle," and "male" all appear fairly prominently, reinforcing the "classic," masculine image of the brand that we saw in the advertisement. The photos seem to be generally associated with something of an "urban gentleman," a general interpretant of these Ermenegildo Zegna images.

In [24]:
pd.DataFrame({'labels':labels_flickr, 
              'frame':range(len(labels_flickr))}
            ).groupby('labels').count().sort_values('frame', ascending=False)[:50]

labels,frame
Architecture,75
Building,72
Outerwear,44
Suit,36
Formal wear,34
City,34
Gentleman,33
Facade,33
Photography,32
Clothing,29
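
The same tally can be computed with the standard library instead of a pandas `groupby`. The sketch below uses `collections.Counter` on a small made-up list of labels; the `top_labels` helper is a hypothetical name.

```python
from collections import Counter


def top_labels(labels, n=50):
    """Return the n most frequent labels as (label, count) pairs, most frequent first."""
    return Counter(labels).most_common(n)


sample_labels = ["Suit", "Building", "Suit", "City", "Building", "Suit"]
print(top_labels(sample_labels, n=2))  # [('Suit', 3), ('Building', 2)]
```

Either way, the counts are label occurrences pooled across all images, so they show which labels recur, not which labels occur together in the same photo.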


Note that using other image sources (such as Twitter, Instagram, or a company website) might enhance this brief study with additional data about how photographic interpretants for the brand vary within different communities of interpreters.

If we wanted to detect custom entities that are outside the scope of Google's pre-trained algorithms, we could also train our own machine learning algorithms. For instance, in my own work, I've trained classifiers to identify archaeological sites from vast amounts of satellite image data (using Python's OpenCV and scikit-learn packages). This involves feeding a classification algorithm many images along with labels of what is in them, so that it can learn to assign those labels to new images on its own. This is about as deep as we will get into these approaches in this class, but I'd encourage you to explore them in your final projects if you'd like to work with images.

*******

## Importing YouTube Video Captions and Metadata

To identify the interpretants for the Uomo cologne in particular, we will gather data from YouTube videos that review the product. What language do the YouTube influencers use to describe the cologne? How do they generally judge the product? Is it consistent with the argument presented by the Uomo advertisers in the advertisement we watched on Tuesday?

To gather YouTube metadata and captions, we can use the [official YouTube API](https://developers.google.com/youtube/v3/docs/search/list) (for which you can acquire keys through the same portal as Google Cloud Vision). However, I find it significantly easier in Python to use the third-party scraper [youtube_dl](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#readme). One advantage of `youtube_dl` is that it provides a framework for downloading video data from other streaming sites as well (for instance, Vimeo and TikTok) and returns JSON-formatted data, much as the official YouTube API does. One disadvantage, however, is that it does not return comment data from YouTube; you will need to use the official YouTube API if you want to collect comments for videos.

In [1]:
#Install youtube-dl for downloading youtube videos/metadata (as well as information from other video streaming sites)
!pip install youtube_dl



In [2]:
import youtube_dl
import re

Once we have installed youtube_dl, we can return all metadata related to 'Zegna Uomo' videos as well as any available captions (automatically generated, or otherwise).

In [3]:
ydl_opts = {'dump_single_json': True, 'writeautomaticsub': True, 'subtitleslangs': ['en']}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    result = ydl.extract_info("ytsearch100:Zegna Uomo", download=False)

[youtube:search] query "Zegna Uomo": Downloading page 1
[youtube:search] query "Zegna Uomo": Downloading page 2
[youtube:search] query "Zegna Uomo": Downloading page 3
[youtube:search] query "Zegna Uomo": Downloading page 4
[youtube:search] query "Zegna Uomo": Downloading page 5
[download] Downloading playlist: Zegna Uomo
[youtube:search] playlist Zegna Uomo: Collected 100 video ids (downloading 100 of them)
[download] Downloading video 1 of 100
[youtube] Otqi2zZrRJM: Downloading webpage
[youtube] Otqi2zZrRJM: Downloading video info webpage
[youtube] Otqi2zZrRJM: Looking for automatic captions
[download] Downloading video 2 of 100
[youtube] 4cWpCynRQzw: Downloading webpage
[youtube] 4cWpCynRQzw: Downloading video info webpage
[youtube] 4cWpCynRQzw: Looking for automatic captions
[download] Downloading video 3 of 100
[youtube] axzIdt6rFYM: Downloading webpage
[youtube] axzIdt6rFYM: Downloading video info webpage
[youtube] axzIdt6rFYM: Looking for automatic captions
[... output for videos 4 through 99 omitted ...]
[download] Downloading video 100 of 100
[youtube] qNlo5ZuOdpA: Downloading webpage
[youtube] qNlo5ZuOdpA: Downloading video info webpage
[youtube] qNlo5ZuOdpA: Looking for automatic captions
[download] Finished downloading playlist: Zegna Uomo
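
Not every entry in the result will have English automatic captions, so it can be useful to look up the caption URL defensively. The sketch below assumes the nested layout that the cleaning code in this notebook relies on (`entry['automatic_captions']['en'][0]['url']`); the `caption_url` helper and the sample entries are made up for illustration.

```python
def caption_url(entry, lang="en"):
    """Return the URL of the first automatic-caption track for `lang`, or None if there is none."""
    tracks = (entry.get("automatic_captions") or {}).get(lang) or []
    return tracks[0].get("url") if tracks else None


# Hypothetical entries mimicking the shape of result['entries']
fake_result = {
    "entries": [
        {"title": "Review A", "automatic_captions": {"en": [{"url": "https://example.com/a.vtt"}]}},
        {"title": "Review B", "automatic_captions": {}},
        {"title": "Review C"},
    ]
}
for entry in fake_result["entries"]:
    print(entry["title"], caption_url(entry))
```

Only the first sample entry yields a URL; the other two return `None` instead of raising a `KeyError`.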


The caption text is quite messy, so before we load the data into a DataFrame, we need to strip out URLs, markup tags, special characters, etc.:

In [47]:
# Patterns for markup tags and for URLs or runs of digits
tag_re = re.compile(r'<[^>]+>')
link_re = re.compile(r'http\S+|\d+')

def remove_tags(text):
    """Replace markup tags with a space."""
    return tag_re.sub(' ', text)

def remove_links(text):
    """Replace URLs and runs of digits with a space."""
    return link_re.sub(' ', text)


title, description, tags, captions = ([] for _ in range(4))

for entry in result['entries']:
    # Append each field to its own list; the lists become DataFrame columns after the loop
    title.append(entry['title'])
    # 'description' and 'tags' can be missing or None, so fall back to empty values
    description.append(remove_links(entry.get('description') or '').replace('\n', ' '))
    tags.append(remove_links(' '.join(entry.get('tags') or [])))
    auto = entry.get('automatic_captions') or {}
    if 'en' in auto:
        auto_captions = requests.get(auto['en'][0]['url'])
        # Strip tags, then remove leftover escaped font markup from the caption file
        auto_captions_clean = remove_tags(auto_captions.text).replace('&amp;#39;', "'")         \
                                                             .replace('&lt;font'," ")           \
                                                             .replace('color=&quot;', " ")      \
                                                             .replace('&lt;/font&gt;', " ")     \
                                                             .replace('#CCCCCC&quot;&gt;', " ") \
                                                             .replace('#E5E5E5&quot;&gt;', " ")
        captions.append(auto_captions_clean)
    else:
        # No English automatic captions for this video
        captions.append(float('nan'))
uomo_yt_df = pd.DataFrame({'Title': title, 'Description': description, 'Tags': tags, 'Captions': captions})

# uomo_yt_df.to_json('uomo_yt100.json') # save dataframe to JSON

uomo_yt_df.head()

,Title,Description,Tags,Captions
0,Uomo Ermenegildo Zegna for Men (2013),a fragrance i TRULY love Frarantica Link Fo...,zegna uomo fragrance review mens all year roun...,hey guys how are y'all doing I've been mea...
1,Redolessence Smells & Rates best Zegna fragrance,Facebook Twitter IG Redolessence Zeg...,best Zegna fragrance best Zegna cologne Zegna ...,what's going on guys so today's video I go...
2,Ermenegildo Zegna Uomo (Review) 🔥🔥Cheapie sexi...,,,hey everybody welcome back good ev...
3,Uomo by Ermenegildo Zegna - A Quicky Review,Fragrantica Profile: Zegna Website:,uomo Ermenegildo Zegna (Organization) fresh be...,come to another fragrance quickie and toda...
4,FIRST IMPRESSION! | UOMO by Ermenegildo Zegna,HELLO EVERYONE! Today at StudioScents I'm pre...,Fragrancepreview UOMO ErmenegildoZegna Firstim...,hi and welcome back to studio since I'm To...
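
The chained `.replace()` calls target specific escaped fragments that happen to appear in these caption files. A more general alternative, sketched here under the assumption that the captions are HTML-escaped (in some places twice, as in `&amp;#39;`), is to unescape the text with the standard library's `html.unescape` and then strip the tags. The `clean_caption` helper and the sample string are hypothetical.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_caption(raw):
    """Undo up to two layers of HTML escaping, drop tags, and collapse whitespace."""
    text = html.unescape(html.unescape(raw))
    text = TAG_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


raw = "&lt;font color=&quot;#CCCCCC&quot;&gt;hey guys&lt;/font&gt; I&amp;#39;ve been"
print(clean_caption(raw))  # hey guys I've been
```

Unescaping twice is a deliberate shortcut for this kind of data; on text that legitimately contains escaped entities it could decode more than intended.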


After cleaning, we have clean text for the Title, Description, Tags, and Captions of each YouTube video related to the Zegna Uomo cologne. Uncommenting the `to_json` line above saves the data in JSON format, so that we can easily reload it in its current form as a pandas DataFrame. Now we're ready for text analysis! Let's move on to Part III.