## Mannequin Challenge Dataset Downloader
This notebook downloads videos for the Mannequin Challenge Dataset from YouTube. Written by Myeong-Gyu.Lee.

* Reference: https://blog.naver.com/PostView.nhn?blogId=skyshin0304&logNo=221620513883&proxyReferer=https:%2F%2Fwww.google.com%2F

Run `pip install pytube3` to install the pytube library.

In [3]:
from pytube import YouTube
import os, cv2, shutil, math, datetime, ast
import matplotlib.pyplot as plt
import pandas as pd

%matplotlib inline

In [8]:
video_url = 'https://www.youtube.com/watch?v=WiXhzx2tNIw'
yt = YouTube(video_url)
print("Video title:", yt.title)
print("Video length (seconds):", yt.length)
print("Video rating:", yt.rating)
print("Thumbnail URL:", yt.thumbnail_url)
print("View count:", yt.views)
print("Description:", yt.description)

Video title: YouTube
Video length (seconds): 981
Video rating: 4.5999999
Thumbnail URL: https://i.ytimg.com/vi/WiXhzx2tNIw/maxresdefault.jpg
View count: 4614
Description: Maplewood Elementary School does a school-wide mannequin challenge. Watch each classroom stop and pose like a mannequin showing the work and fun we have at our school.
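`yt.length` is a number of seconds, so a value like 981 is hard to read at a glance. A minimal sketch using the `datetime` module (already imported above) to render it as hours, minutes, and seconds; `format_length` is a hypothetical helper name, not part of pytube:

```python
import datetime


def format_length(seconds: int) -> str:
    """Render a duration in seconds (as returned by yt.length) as H:MM:SS."""
    # str(timedelta) prints H:MM:SS for durations shorter than one day.
    return str(datetime.timedelta(seconds=int(seconds)))


print("Video length:", format_length(981))  # → Video length: 0:16:21
```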


In [9]:
yt_streams = yt.streams
print("Downloadable streams:")
for i, stream in enumerate(yt_streams.all()):
    print(i, " : ", stream)

Downloadable streams:
0  :  <Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2" progressive="True" type="video">
1  :  <Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2" progressive="True" type="video">
2  :  <Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028" progressive="False" type="video">
3  :  <Stream: itag="248" mime_type="video/webm" res="1080p" fps="30fps" vcodec="vp9" progressive="False" type="video">
4  :  <Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f" progressive="False" type="video">
5  :  <Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9" progressive="False" type="video">
6  :  <Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401f" progressive="False" type="video">
7  :  <Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec=
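Each line above is the printed form of a pytube `Stream`. The next cell recovers the attributes from this string with a chain of `replace` and `split` calls, which works for the values shown but would truncate any value that happened to contain a `:` or a space. A sketch of a slightly more robust alternative, assuming the printed form keeps the `key="value"` layout shown above, reads each pair with a regular expression; `parse_stream_repr` is a hypothetical helper:

```python
import re

# Matches each key="value" pair in a printed pytube Stream, e.g.
# <Stream: itag="22" mime_type="video/mp4" res="720p" ...>
ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')


def parse_stream_repr(stream_str: str) -> dict:
    """Hypothetical helper: turn a printed stream into a {attribute: value} dict."""
    # findall returns (key, value) tuples, which dict() accepts directly.
    return dict(ATTR_PATTERN.findall(stream_str))


sample = ('<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" '
          'vcodec="avc1.64001F" acodec="mp4a.40.2" progressive="True" type="video">')
print(parse_stream_repr(sample))
```

Reading attributes such as `stream.itag`, `stream.mime_type`, and `stream.resolution` directly from the `Stream` object would avoid string parsing altogether.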

### Build a DataFrame of the available streams and sort it by `res` to find the highest-resolution video.

In [10]:
stream_df_list = []

for stream in yt_streams.all():
    # Parse the printed form, e.g. <Stream: itag="22" mime_type="video/mp4" res="720p" ...>,
    # into a {attribute: value} dict.
    stream_dict = dict()
    stream_str = str(stream)
    stream_elements = stream_str.replace('Stream: ', '').replace('=', ':').replace('<', '').replace('>', '').replace('"', '').split(' ')
    for element in stream_elements:
        stream_dict[element.split(':')[0]] = element.split(':')[-1]
    stream_df_list.append(pd.DataFrame.from_dict(stream_dict, orient='index').T)

# One row per stream, indexed by the YouTube video ID taken from the URL.
stream_df_global = pd.concat(stream_df_list)
stream_df_global['videoID'] = str(video_url.split('/')[-1].split('=')[-1])
stream_df_global.set_index('videoID', inplace=True)
# Audio-only streams have no "res" attribute; drop them (copy to avoid editing a view).
stream_df_global = stream_df_global[pd.notnull(stream_df_global['res'])].copy()
# "1080p" -> "1080" so the column can be cast to int and sorted numerically.
stream_df_global['res'] = stream_df_global['res'].str.replace(pat=r'[A-Za-z]', repl=r'', regex=True)
stream_df_global = stream_df_global.astype({'itag': int, 'res': int})
stream_df_global = stream_df_global.sort_values(by='res', ascending=False)
stream_df_global

| videoID | itag | mime_type | res | fps | vcodec | acodec | progressive | type | abr |
|---|---|---|---|---|---|---|---|---|---|
| WiXhzx2tNIw | 137 | video/mp4 | 1080 | 30fps | avc1.640028 | | False | video | |
| WiXhzx2tNIw | 248 | video/webm | 1080 | 30fps | vp9 | | False | video | |
| WiXhzx2tNIw | 22 | video/mp4 | 720 | 30fps | avc1.64001F | mp4a.40.2 | True | video | |
| WiXhzx2tNIw | 136 | video/mp4 | 720 | 30fps | avc1.4d401f | | False | video | |
| WiXhzx2tNIw | 247 | video/webm | 720 | 30fps | vp9 | | False | video | |
| WiXhzx2tNIw | 135 | video/mp4 | 480 | 30fps | avc1.4d401f | | False | video | |
| WiXhzx2tNIw | 244 | video/webm | 480 | 30fps | vp9 | | False | video | |
| WiXhzx2tNIw | 18 | video/mp4 | 360 | 30fps | avc1.42001E | mp4a.40.2 | True | video | |
| WiXhzx2tNIw | 134 | video/mp4 | 360 | 30fps | avc1.4d401e | | False | video | |
| WiXhzx2tNIw | 243 | video/webm | 360 | 30fps | vp9 | | False | video | |
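The `videoID` index is taken from the URL by splitting on `/` and `=`, which returns the last query value rather than the video ID when the URL carries extra parameters: for `https://www.youtube.com/watch?v=WiXhzx2tNIw&t=30s` it would yield `30s`. A sketch with the standard library's `urllib.parse`, assuming only the `watch?v=` and `youtu.be/` URL forms need handling; `youtube_video_id` is a hypothetical helper:

```python
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> str:
    """Hypothetical helper: extract the video ID from a YouTube URL.

    Handles only https://www.youtube.com/watch?v=<id>[&...] and https://youtu.be/<id>.
    """
    parsed = urlparse(url)
    if parsed.netloc.endswith("youtu.be"):
        # Short links carry the ID as the path.
        return parsed.path.lstrip("/")
    # Standard links carry the ID in the "v" query parameter.
    return parse_qs(parsed.query)["v"][0]


print(youtube_video_id("https://www.youtube.com/watch?v=WiXhzx2tNIw&t=30s"))  # → WiXhzx2tNIw
```

Since a `YouTube` object is already built here, its `video_id` attribute should give the same value without any parsing.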


In [11]:
print('Selecting the highest-resolution video by "itag":')

# After sorting, the first row holds the highest-resolution stream.
my_stream = yt_streams.get_by_itag(int(stream_df_global.iloc[0]['itag']))
print("Selected stream:", my_stream)

Selecting the highest-resolution video by "itag":
Selected stream: <Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028" progressive="False" type="video">
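The selected stream (itag 137) has `progressive="False"`: it is an adaptive, video-only stream with no audio track. That is fine if only frames are needed, but if audio matters the choice has to be restricted to progressive streams, which top out at 720p in the table above. A sketch of that selection over plain string-valued dicts (such as those obtained by parsing the printed streams); `pick_stream` and its `require_audio` flag are hypothetical:

```python
def pick_stream(streams, require_audio=False):
    """Hypothetical helper: return the highest-resolution stream dict.

    `streams` holds string-valued dicts like those printed by pytube
    (res="1080p", progressive="True"). If require_audio is True, only
    progressive streams, which carry video and audio in one file, qualify.
    """
    # Audio-only streams have no "res" attribute, so skip them.
    candidates = [s for s in streams if s.get("res")]
    if require_audio:
        candidates = [s for s in candidates if s.get("progressive") == "True"]
    # "1080p" -> 1080; on ties, max() keeps the first stream in list order.
    return max(candidates, key=lambda s: int(s["res"].rstrip("p")))


# A subset of the streams listed above.
streams = [
    {"itag": "137", "res": "1080p", "progressive": "False"},
    {"itag": "248", "res": "1080p", "progressive": "False"},
    {"itag": "22", "res": "720p", "progressive": "True"},
    {"itag": "18", "res": "360p", "progressive": "True"},
]
print(pick_stream(streams)["itag"])                      # → 137
print(pick_stream(streams, require_audio=True)["itag"])  # → 22
```

In pytube itself, `yt_streams.filter(progressive=True)` narrows the query the same way before picking by resolution.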


In [None]:
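print("Downloading the selected stream")
# Save under the video ID; output_path is a local Windows directory, change it as needed.
my_stream.download(output_path='D:/MannequinChallenge_Videos', filename=stream_df_global.index[0])

Downloading the selected stream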
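Depending on the pytube version, re-running the download cell may fetch the video again even if it is already on disk. A sketch of a skip-if-present check, assuming pytube appends the stream's file extension (its `subtype`, e.g. `mp4`) to the `filename` passed to `download()`; verify this against the installed pytube version. `expected_output_path` and `needs_download` are hypothetical helpers:

```python
import tempfile
from pathlib import Path


def expected_output_path(output_dir: str, video_id: str, subtype: str = "mp4") -> Path:
    """Hypothetical helper: where the downloaded file is assumed to be written.

    ASSUMPTION: pytube saves the file as <filename>.<subtype> inside output_path.
    """
    return Path(output_dir) / f"{video_id}.{subtype}"


def needs_download(output_dir: str, video_id: str, subtype: str = "mp4") -> bool:
    """True when nothing exists yet at the expected output path."""
    return not expected_output_path(output_dir, video_id, subtype).exists()


# Demo in a throwaway directory instead of D:/MannequinChallenge_Videos.
with tempfile.TemporaryDirectory() as tmp:
    print(needs_download(tmp, "WiXhzx2tNIw"))  # → True
    expected_output_path(tmp, "WiXhzx2tNIw").touch()  # simulate a finished download
    print(needs_download(tmp, "WiXhzx2tNIw"))  # → False
```

In the notebook, the `my_stream.download(...)` call would then be guarded by `needs_download(output_dir, video_id, my_stream.subtype)`.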
