# YouTube Channel Data Analysis
## Web scraping 
- YouTube Channel [Boho Beautiful](https://www.youtube.com/c/bohobeautiful)

#### Using YouTube API [V3](https://developers.google.com/youtube/v3/docs?hl=de)
- [Google API documentation](https://developers.google.com/youtube/v3/docs/playlists/list?hl=de)  
Retrieving a YouTube channel's public information using the YouTube API V3:
- video title
- video statistics (likes, dislikes and views)
- video ID  

#### Installing required libraries

In [1]:
#!pip3 install google-api-python-client google-auth-httplib2 google-auth-oauthlib

#### Importing libraries
All libraries needed, including libraries for YouTube API and authentication.
Following this [video](https://www.youtube.com/watch?v=th5_9woFJmk&ab_channel=CoreySchafer).
- [GitHub repo](https://github.com/googleapis/google-api-python-client/blob/master/docs/start.md) with info & links

In [2]:
import pandas as pd
import numpy as np
import json
import pprint

from googleapiclient.discovery import build

#### Connecting to YouTube API
Importing my personal API key from a private file (which will not be shared on GitHub) and storing it in a variable.

In [3]:
from Credentials import API_KEY

In [4]:
api_key = API_KEY
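
As an alternative to a private module, the key could be read from an environment variable. This is a minimal sketch, not the setup used in this notebook: the variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are hypothetical.

```python
import os

def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key

# Example: api_key = load_api_key()
```

Either way, the key stays out of the notebook and out of version control.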

Following the YouTube API documentation, creating a variable called "youtube" that holds the client object returned by `build`, which is configured with the service name, the API version and my personal API key.

In [5]:
youtube = build("youtube", "v3", developerKey = api_key)

#### Getting channel statistics
The channel is identified by the parameter "id".

In [6]:
request = youtube.channels().list(
        part = "statistics",
        id = "UCWN2FPlvg9r-LnUyepH9IaQ")

In [7]:
response = request.execute()

In [8]:
pprint.pprint(response)

{'etag': 'V_vkQq2ns1LXN1Nu59lj768QlAo',
 'items': [{'etag': 'gz1v5jlI5ZIx53E6i0vW3TRBFd0',
            'id': 'UCWN2FPlvg9r-LnUyepH9IaQ',
            'kind': 'youtube#channel',
            'statistics': {'hiddenSubscriberCount': False,
                           'subscriberCount': '2250000',
                           'videoCount': '421',
                           'viewCount': '303093760'}}],
 'kind': 'youtube#channelListResponse',
 'pageInfo': {'resultsPerPage': 5, 'totalResults': 1}}
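
The counts in "statistics" arrive as strings, so they need to be converted before any arithmetic. A minimal sketch using the values from the output above; the helper name `to_int_stats` is hypothetical.

```python
# Statistics copied from the response shown above
statistics = {
    "hiddenSubscriberCount": False,
    "subscriberCount": "2250000",
    "videoCount": "421",
    "viewCount": "303093760",
}

def to_int_stats(stats):
    """Convert numeric strings to int and leave other values untouched."""
    return {
        key: int(value) if isinstance(value, str) and value.isdigit() else value
        for key, value in stats.items()
    }

channel_stats = to_int_stats(statistics)
views_per_video = channel_stats["viewCount"] / channel_stats["videoCount"]
print(round(views_per_video))
```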


#### Getting channel information
Containing the channel's country, title, description and publication date & time.

In [9]:
request1 = youtube.channels().list(
        part = "snippet",
        id = "UCWN2FPlvg9r-LnUyepH9IaQ",
        maxResults = 50)

In [10]:
response1 = request1.execute()

In [11]:
pprint.pprint(response1)

{'etag': 'bhrjJNhjE9B7Rd3WCDuV_fZkqCs',
 'items': [{'etag': 'Va5qW2PQryl7pFVF-n-iv3X1438',
            'id': 'UCWN2FPlvg9r-LnUyepH9IaQ',
            'kind': 'youtube#channel',
            'snippet': {'country': 'CA',
                        'customUrl': 'bohobeautiful',
                        'description': 'Free Yoga Videos for the digital yogi '
                                       'age. \n'
                                       'Plus Pilates, Fitness, Vegan Food, '
                                       'Guided Meditations, and thoughtful '
                                       'blogs for conscious mindful living '
                                       'too. \n'
                                       'Boho Beautiful is Juliana Spicoluk & '
                                       'Mark Spicoluk\n'
                                       '\n'
                                       'If you are new to the  Boho Beautiful '
                                       'Yoga class library y

#### Getting information about the channel's playlists
Containing the playlist title, ID, description, thumbnails and more information about each playlist of the channel.

In [12]:
request2 = youtube.playlists().list(
        part = "snippet",
        channelId = "UCWN2FPlvg9r-LnUyepH9IaQ",
        maxResults = 50)

In [13]:
response2 = request2.execute()

In [14]:
pprint.pprint(response2)

{'etag': 'Ld0qmV5hCCEvINEsqw5zNLimOgs',
 'items': [{'etag': 'Bd2enUE9lOsDCnFkK55_YBS330s',
            'id': 'PLb09q0R7gAwRdT0XuHUe3STBK0Zib4WZp',
            'kind': 'youtube#playlist',
            'snippet': {'channelId': 'UCWN2FPlvg9r-LnUyepH9IaQ',
                        'channelTitle': 'Boho Beautiful Yoga',
                        'description': 'The complete library of our pilates & '
                                       'fitness classes!',
                        'localized': {'description': 'The complete library of '
                                                     'our pilates & fitness '
                                                     'classes!',
                                      'title': 'Boho Beautiful Pilates & '
                                               'Fitness'},
                        'publishedAt': '2021-02-03T15:39:36Z',
                        'thumbnails': {'default': {'height': 90,
                                                   'url': 'ht

                                       'standard': {'height': 480,
                                                    'url': 'https://i.ytimg.com/vi/_zaSw9tj6to/sddefault.jpg',
                                                    'width': 640}},
                        'title': "'Tools For A New You' Series"}},
           {'etag': 'egERaBQX51l0ciS5o7bJInB373k',
            'id': 'PLb09q0R7gAwQvGoQ8xC3b0K6KcXWnI9dq',
            'kind': 'youtube#playlist',
            'snippet': {'channelId': 'UCWN2FPlvg9r-LnUyepH9IaQ',
                        'channelTitle': 'Boho Beautiful Yoga',
                        'description': 'We left to follow a whisper. A voice '
                                       'inside ourselves that was telling us '
                                       'there was more to all this than we '
                                       'were living.  That there is more to '
                                       'learn & more growth to experience than '
                  

#### Getting information about a playlist
Videos in playlist, video titles, descriptions and IDs.  
Using parameter "maxResults = 50" to show 50 results per page.  
If "totalResults" in "pageInfo" is larger than "resultsPerPage", the response also contains a top-level "nextPageToken" key.  
Use this token to request the following pages and retrieve all items,  
e.g. 'pageInfo': {'resultsPerPage': 50, 'totalResults': 51}.

In [15]:
request3 = youtube.playlistItems().list(
        part = "snippet",
        playlistId = "PLb09q0R7gAwRdT0XuHUe3STBK0Zib4WZp",
        maxResults = 50)

In [16]:
response3 = request3.execute()

In [17]:
pprint.pprint(response3)

{'etag': 'rvo1Gkd1K8tk0EJ_NJ3C1gpThZY',
 'items': [{'etag': '3PkCbhcdpJQKEEdr8kOBE1bUYr4',
            'id': 'UExiMDlxMFI3Z0F3UmRUMFh1SFVlM1NUQkswWmliNFdacC4zMDg5MkQ5MEVDMEM1NTg2',
            'kind': 'youtube#playlistItem',
            'snippet': {'channelId': 'UCWN2FPlvg9r-LnUyepH9IaQ',
                        'channelTitle': 'Boho Beautiful Yoga',
                        'description': 'An incredible 5 minute arm workout '
                                       'that will tone and sculpt like never '
                                       'before. If you struggling to lose '
                                       'extra arm fat or simply looking for a '
                                       'way to gain more lean muscle '
                                       'definition, well this workout is for '
                                       'you! Without having to use any weight '
                                       'resistance or other props, we will use '
                        

                                       'Big thanks to Black Magic Design for '
                                       'the amazing love!\n'
                                       'We use the ATEM mini pro & pocket '
                                       'cinema 6k cameras!\n'
                                       'https://www.blackmagicdesign.com/ca/products/atemmini\n'
                                       'https://www.blackmagicdesign.com/ca/products/blackmagicpocketcinemacamera\n'
                                       'https://www.blackmagicdesign.com/ca/products/blackmagicvideoassist\n'
                                       '\n'
                                       '\n'
                                       'Boho Beautiful is Juliana Spicoluk & '
                                       'Mark Spicoluk\n'
                                       '\n'
                                       '------------------------------------------------------------------------------------------

                                       'Location- Playa Dominical, Costa '
                                       'Rica  \n'
                                       '\n'
                                       'LINKS:\n'
                                       '\n'
                                       'Boho Beautiful Official- Our NEW '
                                       'Streaming Platform & App\n'
                                       'https://www.bohobeautiful.tv\n'
                                       '\n'
                                       'Sign Up For Our Newsletter & Get Two '
                                       'FREE Boho Beautiful Books!\n'
                                       'https://www.bohobeautiful.life/returnhome\n'
                                       '\n'
                                       'Boho Beautiful Life- Our 2nd More '
                                       'Personal Youtube Channel:\n'
                                       'https://www.you

                                       'Ab Challenge:\n'
                                       'https://youtu.be/OEinsl7TipQ\n'
                                       '\n'
                                       'Cardio Yoga Practice:\n'
                                       'https://youtu.be/yxGXmjAkE5M\n'
                                       '\n'
                                       'Get Our Fitness Program Transform:\n'
                                       'https://bohobeautiful.life/transform/\n'
                                       '\n'
                                       'Get Our New Detox Yoga Program:\n'
                                       'https://bohobeautiful.life/product/detoxify\n'
                                       '\n'
                                       'Translate Our Videos Here:\n'
                                       'https://www.youtube.com/timedtext_cs_panel?c=UCWN2FPlvg9r-LnUyepH9IaQ&tab=2\n'
                                       '\n'
    

                                                   'width': 120},
                                       'high': {'height': 360,
                                                'url': 'https://i.ytimg.com/vi/kb3mYslEa9g/hqdefault.jpg',
                                                'width': 480},
                                       'maxres': {'height': 720,
                                                  'url': 'https://i.ytimg.com/vi/kb3mYslEa9g/maxresdefault.jpg',
                                                  'width': 1280},
                                       'medium': {'height': 180,
                                                  'url': 'https://i.ytimg.com/vi/kb3mYslEa9g/mqdefault.jpg',
                                                  'width': 320},
                                       'standard': {'height': 480,
                                                    'url': 'https://i.ytimg.com/vi/kb3mYslEa9g/sddefault.jpg',
                                        

                                       'Channel:\n'
                                       'https://www.youtube.com/channel/UCBbs2c6JCjU_HZPOeY6jWVg/?sub_confirmation=1\n'
                                       '\n'
                                       'Our Premium Full Length Programs:\n'
                                       'Transform, Retreat, Complete, 10 Days, '
                                       '& Detoxify\n'
                                       'https://bohobeautiful.life/our-store\n'
                                       '\n'
                                       'Boho Beautiful RETREAT- Your 7 Day '
                                       'Home Yoga Retreat\n'
                                       'https://bohobeautiful.life/retreat\n'
                                       '\n'
                                       'Boho Beautiful Homepage:\n'
                                       'https://www.bohobeautiful.life\n'
                                       '\n'
  

#### Finding the wanted information in output
Using .keys() to list the top-level keys of the response, then inspecting a single entry of "items" to see where the needed information (video title, video ID, ...) is stored.

In [18]:
response3.keys()

dict_keys(['kind', 'etag', 'nextPageToken', 'items', 'pageInfo'])

In [19]:
response3["items"][3]

{'kind': 'youtube#playlistItem',
 'etag': 'idLWNLp7u0APa8unjRD72irpuKw',
 'id': 'UExiMDlxMFI3Z0F3UmRUMFh1SFVlM1NUQkswWmliNFdacC43NERCMDIzQzFBMERCMEE3',
 'snippet': {'publishedAt': '2021-08-30T15:05:47Z',
  'channelId': 'UCWN2FPlvg9r-LnUyepH9IaQ',
  'title': '20 Min Full Body Pilates Yoga Workout | Total Body Stretch & Tone',
  'description': 'A 20 minute pilates full body yoga workout is the perfect at home total body pilates practice to fire up your entire abs, midsection, glutes, and inner thighs. Working through strengthening toning exercises and deep releasing yoga asanas, this boho beautiful workout will leave you feeling energized, centred, and strong to continue with your day. \n\nRemember to breathe and push yourself to a limit that you feel comfortable. With time your body will get stronger and each exercise will get easier. \n\nSo roll out your mat, grab some water, take a deep breath & lets begin!\n\nPilates Instructor- Juliana Spicoluk\nVideo by- Mark Spicoluk \n\nLocation 

#### Video Title

In [20]:
response3["items"][3]["snippet"]["title"]

'20 Min Full Body Pilates Yoga Workout | Total Body Stretch & Tone'

#### Video ID

In [21]:
response3["items"][3]["snippet"]["resourceId"]["videoId"]

'r2-X10IYC3Q'
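
The manual exploration above can also be done with a small recursive walk that lists every key path. This is only a sketch: the helper `key_paths` is hypothetical, and `item` is a reduced stand-in for one playlist item that contains just the fields used in this notebook.

```python
def key_paths(obj, prefix=""):
    """Recursively collect dotted key paths of a nested dict/list structure."""
    paths = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.append(path)
            paths.extend(key_paths(value, path))
    elif isinstance(obj, list) and obj:
        # Inspect only the first element; list items usually share one layout
        paths.extend(key_paths(obj[0], f"{prefix}[0]"))
    return paths

# Reduced stand-in for one playlist item, using values shown above
item = {
    "kind": "youtube#playlistItem",
    "snippet": {
        "title": "20 Min Full Body Pilates Yoga Workout | Total Body Stretch & Tone",
        "resourceId": {"videoId": "r2-X10IYC3Q"},
    },
}

for path in key_paths(item):
    print(path)
```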

#### Scraping next page of playlist
...using the "pageToken" parameter, set to the "nextPageToken" value of the previous response.

In [25]:
request4 = youtube.playlistItems().list(
        part = "snippet",
        playlistId = "PLb09q0R7gAwRdT0XuHUe3STBK0Zib4WZp",
        maxResults = 50,
        pageToken = 'EAAaBlBUOkNESQ')

In [26]:
response4 = request4.execute()

In [27]:
pprint.pprint(response4)

{'etag': 'R8iQ1fDk8ItrHjtC3YjGcePpV7Q',
 'items': [{'etag': '18mo7dt1xuWTYVTgI4RzIWEE19A',
            'id': 'UExiMDlxMFI3Z0F3UmRUMFh1SFVlM1NUQkswWmliNFdacC4zQzFBN0RGNzNFREFCMjBE',
            'kind': 'youtube#playlistItem',
            'snippet': {'channelId': 'UCWN2FPlvg9r-LnUyepH9IaQ',
                        'channelTitle': 'Boho Beautiful Yoga',
                        'description': 'This class is great to do on its own '
                                       'for a full body workout, as part of '
                                       'Part 1: Cardio Workout, or as part of '
                                       'it own fitness challenge with the '
                                       'description below.\n'
                                       '\n'
                                       'If you like to skip the cardio '
                                       "challenge of week 1, you're more then "
                                       'welcome to begin at week 2 and stil
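
Passing the token by hand works for one extra page. The general pattern is a loop that keeps requesting pages until no "nextPageToken" is returned. The sketch below shows that loop offline: `iter_pages`, `fake_fetch` and `FAKE_PAGES` are hypothetical names, and the fake pages stand in for real API responses.

```python
def iter_pages(fetch_page):
    """Yield every response page, following nextPageToken until it is absent."""
    token = None
    while True:
        page = fetch_page(token)
        yield page
        token = page.get("nextPageToken")
        if token is None:
            break

# Offline stand-in for the API: maps a page token to a response
FAKE_PAGES = {
    None: {"items": [{"id": "a"}, {"id": "b"}], "nextPageToken": "T1"},
    "T1": {"items": [{"id": "c"}]},
}

def fake_fetch(token):
    return FAKE_PAGES[token]

all_items = [item for page in iter_pages(fake_fetch) for item in page["items"]]
print(len(all_items))
```

With the real client, `fetch_page` could be a function that calls `youtube.playlistItems().list(...)` with `pageToken = token` and returns the result of `.execute()`.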

### Creating functions 
...to get all the needed data and store it in lists to be further explored and then turned into dataframes.

#### Function to obtain playlist IDs & names

In [28]:
playlist_ids = []
playlist_titles = []

for item in response2["items"]:
    playlist_ids.append(item["id"])
    playlist_titles.append(item["snippet"]["title"])
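
The loop above could be wrapped in a function so that it can be reused for other channels. A minimal sketch, assuming the response has the shape shown for `response2`; the name `extract_playlists` is hypothetical and the sample uses one playlist from the output above.

```python
def extract_playlists(response):
    """Return (titles, ids) for every playlist in a playlists().list response."""
    titles = [item["snippet"]["title"] for item in response["items"]]
    ids = [item["id"] for item in response["items"]]
    return titles, ids

# Reduced stand-in for response2, using one playlist from the output above
sample_response = {
    "items": [
        {
            "id": "PLb09q0R7gAwRdT0XuHUe3STBK0Zib4WZp",
            "snippet": {"title": "Boho Beautiful Pilates & Fitness"},
        }
    ]
}

titles, ids = extract_playlists(sample_response)
print(titles, ids)
```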

In [29]:
dictionary = {"Title":playlist_titles, "ID":playlist_ids}
playlists = pd.DataFrame(dictionary)

In [30]:
playlists.head()
%store playlists

Stored 'playlists' (DataFrame)


In [31]:
#playlist_titles

In [32]:
#playlist_ids

#### Function to obtain all items of all playlists
A necessary step for getting the video titles and IDs.

In [33]:
playlist_items = []

for playlist_id in playlist_ids:
    request3 = youtube.playlistItems().list(
            part = "snippet",
            playlistId = playlist_id,
            maxResults = 50)

    # Keep requesting pages until list_next() returns None (no further page),
    # so playlists with more than two pages are collected completely
    while request3 is not None:
        response3 = request3.execute()
        playlist_items.append(response3)
        request3 = youtube.playlistItems().list_next(request3, response3)

In [35]:
playlist_items[0]

{'kind': 'youtube#playlistItemListResponse',
 'etag': 'rvo1Gkd1K8tk0EJ_NJ3C1gpThZY',
 'nextPageToken': 'EAAaBlBUOkNESQ',
 'items': [{'kind': 'youtube#playlistItem',
   'etag': '3PkCbhcdpJQKEEdr8kOBE1bUYr4',
   'id': 'UExiMDlxMFI3Z0F3UmRUMFh1SFVlM1NUQkswWmliNFdacC4zMDg5MkQ5MEVDMEM1NTg2',
   'snippet': {'publishedAt': '2021-02-03T16:19:28Z',
    'channelId': 'UCWN2FPlvg9r-LnUyepH9IaQ',
    'title': '5 Minutes Arm Workout For Toning | How To Lose Arm Fat',
    'description': "An incredible 5 minute arm workout that will tone and sculpt like never before. If you struggling to lose extra arm fat or simply looking for a way to gain more lean muscle definition, well this workout is for you! Without having to use any weight resistance or other props, we will use the force of gravity to do this super quick and effective upper body workout that will make you feel the burn in under 5 minutes. If you're struggling with 'saggy' and weak arms, or are in need of something that will strengthen every s

#### Function to obtain video IDs & names

In [36]:
video_ids = []

#response3["items"][0]["snippet"]["resourceId"]["videoId"]

for response in playlist_items:
    for item in response["items"]:
        video_ids.append(item["snippet"]["resourceId"]["videoId"])

In [37]:
#video_ids

In [38]:
video_titles = []

#response3["items"][0]["snippet"]["title"]

for response in playlist_items:
    for item in response["items"]:
        video_titles.append(item["snippet"]["title"])

In [39]:
#video_titles

In [40]:
video_published = []

for response in playlist_items:
    for item in response["items"]:
        video_published.append(item["snippet"]["publishedAt"])

In [41]:
#video_published
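
The three loops above walk the same data, so they could be merged into a single pass that builds one record per video. This is a sketch under assumptions: `flatten_playlist_items` is a hypothetical helper, and `sample_pages` is a reduced stand-in for `playlist_items` built from values shown earlier.

```python
def flatten_playlist_items(pages):
    """Collect title, video ID and publish date of every item in one pass."""
    rows = []
    for page in pages:
        for item in page["items"]:
            snippet = item["snippet"]
            rows.append({
                "Title": snippet["title"],
                "ID": snippet["resourceId"]["videoId"],
                "Published": snippet["publishedAt"],
            })
    return rows

# Reduced stand-in for playlist_items with one page and one item
sample_pages = [{
    "items": [{
        "snippet": {
            "title": "5 Minutes Arm Workout For Toning | How To Lose Arm Fat",
            "publishedAt": "2021-02-03T16:19:28Z",
            "resourceId": {"videoId": "W2Mq_c-dgVY"},
        }
    }]
}]

rows = flatten_playlist_items(sample_pages)
print(rows[0]["ID"])
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame`, which keeps title, ID and date aligned per video.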

### Getting more data

#### Function to obtain ratings and views

In [42]:
video_stats = []
video_ids_stats = []

for i in video_ids:
    request10 = youtube.videos().list(
            part = "statistics",
            id = i)
    response10 = request10.execute()
    try:
        # "items" can be empty (e.g. for a private video); skip those IDs
        video_ids_stats.append(response10["items"][0]["id"])
        video_stats.append(response10["items"][0]["statistics"])
    except IndexError:
        continue
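
The "id" parameter of `videos().list` accepts a comma-separated list of video IDs, so the statistics could be requested in batches instead of one call per video. The sketch below only shows the batching step; the batch size of 50 is an assumption that should be checked against the API documentation, and `chunked` and `example_ids` are hypothetical.

```python
def chunked(values, size=50):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [values[i:i + size] for i in range(0, len(values), size)]

# Hypothetical IDs for illustration only
example_ids = [f"video{i}" for i in range(120)]

for batch in chunked(example_ids):
    id_param = ",".join(batch)  # value for the `id` parameter of videos().list
    print(len(batch))
```

When requesting in batches, unavailable videos are simply missing from "items", so the result should be read by iterating over "items" rather than by index.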

In [43]:
#for i in video_ids_stats:
        #request10 = youtube.videos().list(
        #part = "snippet",
        #id = i)
        #response10 = request10.execute()     

In [44]:
video_ids_stats[0]

'W2Mq_c-dgVY'

#### Dataframe containing video statistics

In [45]:
video_stats = pd.DataFrame(video_stats)

In [46]:
video_stats['Video_ID'] = video_ids_stats

In [47]:
video_stats = video_stats.drop_duplicates(subset = ['Video_ID'])

In [48]:
%store video_stats

Stored 'video_stats' (DataFrame)


#### Video duration

In [49]:
request13 = youtube.videos().list(
        part = "contentDetails",
        id = "W2Mq_c-dgVY")

response13 = request13.execute()
pprint.pprint(response13)

{'etag': 'tDTLh_E99BntNodd2JkPXEjAs9Q',
 'items': [{'contentDetails': {'caption': 'true',
                               'contentRating': {},
                               'definition': 'hd',
                               'dimension': '2d',
                               'duration': 'PT6M19S',
                               'licensedContent': True,
                               'projection': 'rectangular'},
            'etag': 'Q3ldOMp4_qQGbh5t9YaP6Ctt9GM',
            'id': 'W2Mq_c-dgVY',
            'kind': 'youtube#video'}],
 'kind': 'youtube#videoListResponse',
 'pageInfo': {'resultsPerPage': 1, 'totalResults': 1}}


In [50]:
response13["items"][0]["contentDetails"]["duration"]

'PT6M19S'
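
The duration is an ISO 8601 string, which is awkward to sort or average. A minimal sketch for converting it to seconds with a regular expression, assuming durations consist only of days, hours, minutes and seconds; the helper name `duration_to_seconds` is hypothetical.

```python
import re

DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)

def duration_to_seconds(duration):
    """Convert a duration such as 'PT6M19S' to a number of seconds."""
    match = DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unsupported duration: {duration!r}")
    # Components that are absent from the string count as zero
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])

print(duration_to_seconds("PT6M19S"))  # 379
```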

In [51]:
video_lenght = []
v_lenght_id = []

for i in video_ids:
    request20 = youtube.videos().list(
            part = "contentDetails",
            id = i)
    response20 = request20.execute()
    try:
        # "items" can be empty (e.g. for a private video); skip those IDs
        v_lenght_id.append(response20["items"][0]["id"])
        video_lenght.append(response20["items"][0]["contentDetails"]["duration"])
    except IndexError:
        continue

In [52]:
len(video_lenght)

1281

In [53]:
len(v_lenght_id)

1281

#### Dataframe containing video lengths

In [54]:
video_lenghts = pd.DataFrame(video_lenght, columns = ["Duration"])

In [55]:
video_lenghts['Video_ID'] = v_lenght_id

In [56]:
video_lenghts = video_lenghts.drop_duplicates(subset = ['Video_ID'])

In [57]:
#video_lenghts.rename(columns={"0":"Duration"},  inplace = True)
video_lenghts

| | Duration | Video_ID |
|---|---|---|
| 0 | PT6M19S | W2Mq_c-dgVY |
| 1 | PT13M56S | _qNmbXRZtfY |
| 2 | PT12M16S | AhITIKFruHM |
| 3 | PT25M6S | r2-X10IYC3Q |
| 4 | PT14M11S | YEETslOqmZs |
| ... | ... | ... |
| 1109 | PT11M28S | bi1uioesDdo |
| 1110 | PT9M25S | 5d6TriLBQmE |
| 1111 | PT9M58S | HVWkp1Nu6o8 |
| 1127 | PT3M46S | xy2S4ayzMnI |


In [58]:
%store video_lenghts

Stored 'video_lenghts' (DataFrame)


#### Storing video data in dataframe videos

In [59]:
dictionary_videos = {"Title" : video_titles, "ID" : video_ids, "Published" : video_published}
videos = pd.DataFrame(dictionary_videos)
videos

| | Title | ID | Published |
|---|---|---|---|
| 0 | 5 Minutes Arm Workout For Toning \| How To Lose... | W2Mq_c-dgVY | 2021-02-03T16:19:28Z |
| 1 | Best Ab Workout In 10 Min ♥ Tummy & Muffin Top... | _qNmbXRZtfY | 2021-02-03T16:20:19Z |
| 2 | Challenge Your Waistline ♥ Abs & Core Workout ... | AhITIKFruHM | 2021-02-03T16:32:45Z |
| 3 | 20 Min Full Body Pilates Yoga Workout \| Total ... | r2-X10IYC3Q | 2021-08-30T15:05:47Z |
| 4 | Best Leg Toning Workout ♥ 10 Minute Glutes & T... | YEETslOqmZs | 2021-02-03T16:20:01Z |
| ... | ... | ... | ... |
| 1285 | Easy Yoga Workout ♥ Fat Loss & Flexibility \| K... | C-Q7GeQG6iE | 2017-07-05T17:09:08Z |
| 1286 | 30 Min Intermediate Yoga Class \| Expand Your Y... | SrE1B5GzUpM | 2020-04-21T15:05:37Z |
| 1287 | 30 Min Yoga For Tight & Sore Hips, Glutes, Ham... | w2TfswY2_rg | 2021-01-13T09:30:59Z |
| 1288 | Private video | LnIC7dFeGO0 | 2015-09-25T02:11:32Z |


In [60]:
%store videos

Stored 'videos' (DataFrame)
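
The "Published" column still holds plain strings. A minimal sketch for turning one of them into a timezone-aware datetime with the standard library, assuming all timestamps follow the format shown above (no fractional seconds); the helper name `parse_published` is hypothetical.

```python
from datetime import datetime

def parse_published(timestamp):
    """Parse a timestamp such as '2021-02-03T16:19:28Z' into an aware datetime."""
    # %z accepts the trailing 'Z' (UTC) on Python 3.7 and later
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S%z")

published = parse_published("2021-02-03T16:19:28Z")
print(published.year, published.month)
```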
