<a href="https://colab.research.google.com/github/chrdrn/digital-behavioral-data/blob/main/session_05-showcase_youtube.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Background
Practical application of the [YouTube Data Tool (YTDT)](https://tools.digitalmethods.net/netvizz/youtube/) using the example of Mai Thi Nguyen-Kim ( [<img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/brands/twitter.svg" width="15" height="15">](https://twitter.com/maithi_nk) | [<img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/brands/instagram.svg" width="15" height="15">](https://twitter.com/maithi_nk) ) and her <img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/brands/youtube.svg" width="15" height="15"> Channel [maiLab](https://www.youtube.com/c/maiLab).

# Exercise 1

* Use the [`Channel Search`](https://tools.digitalmethods.net/netvizz/youtube/mod_channels_search.php) site/function of the `YTDT` to find the (correct) `channel ID` for the <img src="https://raw.githubusercontent.com/FortAwesome/Font-Awesome/6.x/svgs/brands/youtube.svg" width="15" height="15"> Channel [maiLab](https://www.youtube.com/c/maiLab). 
* To do so, enter "*maiLab*" in the `Search query` field and download the results as `.csv`. 
* Open the file and extract the correct channel ID.\
  *Hint:* If in doubt, use the [`Channel Info`](https://tools.digitalmethods.net/netvizz/youtube/mod_channel_info.php) function to check whether the selected ID matches the channel description.

In [None]:
# Load packages
library(readr)
library(tidyverse)

# Import data
channel_list <- read_csv("data/channelsearch_channels50_2022_11_17-09_54_22.csv")

# Preview data 
channel_list %>% glimpse()

In [None]:
# Get channel description with R
channel_list %>%
  filter(title == "maiLab") %>%
  select(id, title, description)
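
Because several channels in the search results can carry similar names, it can help to confirm that the title filter returns exactly one row before the ID is reused. The following is a minimal sketch under stated assumptions: `channel_list_demo` and its IDs are made-up stand-ins for the real export, and only the columns `id`, `title`, and `description` used above are assumed.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the YTDT channel search export (IDs are made up)
channel_list_demo <- tibble(
  id          = c("UC_demo_001", "UC_demo_002", "UC_demo_003"),
  title       = c("maiLab", "mailab fan clips", "MAILAB"),
  description = c("Science channel", "Fan uploads", "Unrelated channel")
)

# Exact match on the title, as in the cell above
exact_match <- channel_list_demo %>%
  filter(title == "maiLab")

# Case-insensitive match reveals similarly named channels that could be confused
loose_match <- channel_list_demo %>%
  filter(tolower(title) == "mailab")

# Stop early if the exact match is empty or ambiguous
stopifnot(nrow(exact_match) == 1)

# Extract the ID as a plain character value for later use
channel_id <- exact_match %>% pull(id)
channel_id
```

If the check fails on real data, the `Channel Info` function mentioned in the hint remains the way to decide between the candidates.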

# Exercise 2


* Use the [`Video List`](https://tools.digitalmethods.net/netvizz/youtube/mod_videos_list.php) function of the `YTDT` to get a list of all videos published by the channel `maiLab`.
* To do so, use the extracted `channel ID` and download the results as `.csv`.
* Import and preview the data.



In [None]:
# Import data: video list
video_list <- read_csv("data/videolist_channel186_2022_11_17-10_20_11.csv")

# Preview data 
video_list %>% glimpse()

# Exercise 3

* Perform an exploratory data analysis.

## Video uploads over time

In [None]:
# Load additional packages
library(lubridate)
library(sjPlot)

# Plot the number of video uploads per year
video_list %>% 
  mutate(year  = as.factor(year(publishedAt))) %>% 
  plot_frq(
    year,
    title = "Video uploads on `maiLab` by year")
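
The same information can also be obtained as a plain table, which is easier to reuse in later steps than a plot. This is a minimal sketch assuming only the `publishedAt` column from the video list; `video_list_demo` and its values are invented for illustration.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical stand-in for the YTDT video list (titles and dates are made up)
video_list_demo <- tibble(
  videoTitle  = c("A", "B", "C", "D"),
  publishedAt = ymd_hms(c("2019-03-01 10:00:00", "2019-11-15 16:30:00",
                          "2020-04-02 08:00:00", "2021-01-20 12:00:00"))
)

# Count uploads per calendar year
uploads_per_year <- video_list_demo %>%
  mutate(year = year(publishedAt)) %>%
  count(year, name = "n_videos")

uploads_per_year
```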

## Different location parameters



### Basic descriptive statistics

In [None]:
# Load additional packages
library(sjmisc) 

# Get distribution parameters for selected variables
video_list %>% 
  select(durationSec, viewCount, likeCount, favoriteCount, commentCount) %>% 
  descr()  

### More detailed distribution for each variable

In [None]:
video_list %>% 
  plot_frq(durationSec, viewCount, likeCount, commentCount, type = "density")

## In-depth analysis
Based on the findings of the previous section, let us take a closer look. Most of the variables have a right-skewed distribution: the bulk of the values lies on the left, while isolated outliers sit at the far right edge. 

Therefore, the next goal is to find out which video(s) they are.  
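
One possible way to make "outlier" concrete before looking at the top videos is the common 1.5 × IQR rule. The sketch below is an illustration under assumptions, not the method used in this notebook: `video_list_demo` and its view counts are invented, and the 1.5 multiplier is a convention rather than a value taken from the data.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the video list (values are made up)
video_list_demo <- tibble(
  videoTitle = paste("Video", 1:8),
  viewCount  = c(100, 120, 130, 150, 160, 180, 200, 5000)
)

# Upper fence: third quartile plus 1.5 times the interquartile range
upper_fence <- quantile(video_list_demo$viewCount, 0.75, names = FALSE) +
  1.5 * IQR(video_list_demo$viewCount)

# Keep only the videos whose view count lies above the fence
outliers <- video_list_demo %>%
  filter(viewCount > upper_fence)

outliers
```

For strongly skewed count data, a rule like this tends to flag many observations, so sorting by the variable of interest, as done in the following cells, is often the more readable first step.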

### Top 5 videos with the **highest view count**

In [1]:
# Sort by view count (descending) and keep the five most viewed videos
video_list %>% 
  arrange(desc(viewCount)) %>% 
  select(videoTitle, publishedAt, viewCount:commentCount) %>% 
  head(5)


### Top 5 videos with the **highest comment count**

In [None]:
# Sort by comment count (descending) and keep the five most commented videos
video_list %>% 
  arrange(desc(commentCount)) %>% 
  select(videoTitle, publishedAt, viewCount:commentCount) %>% 
  head(5)
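
As an alternative to sorting and taking the first rows, `dplyr::slice_max()` selects the top rows in a single step and makes the handling of ties explicit. This is a minimal sketch on invented data; `video_list_demo` is a hypothetical stand-in that uses only the `videoTitle` and `commentCount` columns.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the video list (values are made up)
video_list_demo <- tibble(
  videoTitle   = c("A", "B", "C", "D"),
  commentCount = c(10, 50, 30, 40)
)

# Select the two most commented videos; with_ties = FALSE caps the result at n rows
top_commented <- video_list_demo %>%
  slice_max(commentCount, n = 2, with_ties = FALSE)

top_commented
```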