# Scrape YouTube data
This notebook scrapes the videos uploaded by a set of news channels between 2021-11-05 and 2021-11-15, along with each video's details and comment section.

## Environment
Import dependencies

In [None]:
import os
from dotenv import load_dotenv
import json
import pandas as pd
from youtube import channel, video

Get API key from environment variables

In [None]:
load_dotenv()
key = os.getenv('API_KEY')
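
`os.getenv` returns `None` when the variable is unset, so a missing `.env` entry would only surface later as a failed request. A small guard can catch this early. This is a minimal sketch; `require_env` is a hypothetical helper, not part of `dotenv` or the `youtube` module.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f'{name} is not set; add it to .env or export it first')
    return value
```

After `load_dotenv()`, the key could then be read with `key = require_env('API_KEY')`.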

## Import list of news channels

This data set is a handpicked list of news channels that:
1. Are relevant (top 100 subscribed or viewed channels)
1. Post political content in English
1. Have open comment sections

## 1. Build `channels` table

NOTE @ 2021-03-14: This section has already been executed. Since the requests are expensive, it's best to just load the result.

In [None]:
# Load result
df1 = pd.read_csv('../../dat/channels.csv')

## 2. Build `channelVideos` table

NOTE @ 2021-03-15: This section has already been executed. Since the requests are expensive, it's best to just load the result.

### 2.1. Pre-treatment videos
Videos uploaded on or before 2021-11-09

### 2.2. Post-treatment videos
Videos uploaded on or after 2021-11-11 (skip November 10th because the policy was gradually rolled out)
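
The pre/post split from 2.1 and 2.2 can be sketched with pandas as follows. This is an illustration only, not the code that produced `videos.csv`: the column name `publishedAt`, the UTC day boundaries, and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical sample rows; the real table and its column names may differ.
videos = pd.DataFrame({
    'videoId': ['a', 'b', 'c', 'd'],
    'publishedAt': ['2021-11-05T10:00:00Z', '2021-11-09T23:59:00Z',
                    '2021-11-10T12:00:00Z', '2021-11-11T00:00:00Z'],
})

# Truncate each timestamp to its calendar day (UTC) so whole days are compared.
day = pd.to_datetime(videos['publishedAt'], utc=True).dt.normalize()

start = pd.Timestamp('2021-11-05', tz='UTC')
last_pre = pd.Timestamp('2021-11-09', tz='UTC')
first_post = pd.Timestamp('2021-11-11', tz='UTC')
end = pd.Timestamp('2021-11-15', tz='UTC')

# `between` is inclusive on both ends; 2021-11-10 falls in neither group.
pre = videos[day.between(start, last_pre)]
post = videos[day.between(first_post, end)]
```

With the sample rows, videos `a` and `b` land in `pre`, `d` lands in `post`, and `c` (uploaded on November 10th) is dropped.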

API quota ran out on `channelId = UCt-WqkTyKK1_70U4bb4k4lQ`.

Export table

In [None]:
# Load result
df2 = pd.read_csv('../../dat/videos.csv')

## 3. Build `videoDetails` table

Get the details of each video (title, description, duration, definition, etc.). These data will be used as controls.

NOTE @ 2021-03-15: This section has already been executed. Since the requests are expensive, it's best to just load the result.

Export table

In [None]:
# Load result
df3 = pd.read_csv('../../dat/videoDetails.csv')

## 4. Build `videoComments` table

In [None]:
# Resume from crash (comments are disabled on this video): find its index in df2
idx = df2.loc[df2['videoId'].eq('AoLgaqj9Q7s')].index[0]

In [None]:
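for videoId in df2['videoId'].values:
    try:
        comments = video(id=videoId, key=key).get_comments()
    except Exception as e:
        # Stop on a failed request (e.g. comments disabled, quota exceeded) and report the video
        print(videoId, e)
        break
    if len(comments) > 0:
        # One JSON file per video, keyed by videoId
        with open('../../dat/comments/' + videoId + '.json', 'w') as f:
            json.dump({videoId: comments}, f)
    else:
        # Manually check error returned
        break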

Quota Exceeded on `videoId = '_laKJi8Xwh8'`
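
Because the quota can run out mid-run, one way to resume is to skip every video that already has a JSON file in the comments folder, plus any IDs known to fail. A minimal sketch under those assumptions; `pending_ids` is a hypothetical helper, not part of the `youtube` module.

```python
import os

def pending_ids(video_ids, out_dir, skip=()):
    """Return the IDs that still need scraping, in their original order."""
    # A video counts as done if '<videoId>.json' already exists in out_dir.
    done = {os.path.splitext(name)[0]
            for name in os.listdir(out_dir) if name.endswith('.json')}
    skipped = set(skip)
    return [v for v in video_ids if v not in done and v not in skipped]
```

The scraping loop could then iterate over `pending_ids(df2['videoId'], '../../dat/comments/', skip=errors)` instead of the full column, so a rerun spends quota only on videos that have not been saved yet.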

In [None]:
# Video IDs whose comment requests failed (e.g. comments disabled)
errors = ['AoLgaqj9Q7s','vQNl8PpcSHw','bu7wwMIrxak','kujtF7tZ1Zk']
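
Once the per-video files exist, they can be combined into a single `videoComments` table. The sketch below assumes each file has the shape written by the loop above, `{videoId: [comment, ...]}`, and makes no assumption about what a single comment record contains; `load_comments` is a hypothetical helper.

```python
import glob
import json
import os

def load_comments(folder):
    """Collect (videoId, comment) pairs from every '<videoId>.json' file in folder."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, '*.json'))):
        with open(path) as f:
            data = json.load(f)  # assumed shape: {videoId: [comment, ...]}
        for video_id, comments in data.items():
            rows.extend((video_id, comment) for comment in comments)
    return rows
```

The resulting list can be passed to `pd.DataFrame(rows, columns=['videoId', 'comment'])` to get one row per comment.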