# Youtube LLM Analytics

## 1.4. Dataset

### Data selection

As this project is particularly focused on data science channels, I found that not many readily available datasets online are suitable for this purpose. The 2 alternative datasets I found are:

- [The top trending Youtube videos on Kaggle](https://www.kaggle.com/rsrishav/youtube-trending-video-dataset): This dataset contains several months of data on daily trending YouTube videos for several countries. There are up to 200 trending videos per day. However, this dataset is not fit for this project because the trending videos are about a wide range of topics that are not necessarily related to data science. 

- Another dataset is obtained from this [Github repo](https://gitlab.com/thebrahminator/Youtube-View-Predictor) of Vishwanath Seshagiri, which is the metadata of 0.5M+ YouTube videos along with their channel data. There is no clear documentation on how this dataset was created, but a quick look at the datasets in the repository suggested that the data was obtained using keyword search of popular keywords such as "football" or "science". There are also some relevant keywords such as "python". However, I decided not to use these datasets because they don't contain data for the channels I am interested in.

I created my own dataset using the [Google Youtube Data API version 3.0](https://developers.google.com/youtube/v3). The exact steps of data creation is presented in section *2. Data Creation* below.

### Data limitations

The dataset is a real-world dataset and suitable for the research. However, the selection of the top 10 Youtube channels to include in the research is purely based on my knowledge of the channels in data science field and might not be accurate. My definition is "popular" is only based on subscriber count but there are other metrics that could be taken into consideration as well (e.g. views, engagement). The top 10 also seems arbitrary given the plethora of channels on Youtube. There might be smaller channels that might also very interesting to look into, which could be the next step of this project.

### Ethics of data source

According to [Youtube API's guide](https://developers.google.com/youtube/v3/getting-started), the usage of Youtube API is free of charge given that your application send requests within a quota limit. "The YouTube Data API uses a quota to ensure that developers use the service as intended and do not create applications that unfairly reduce service quality or limit access for others. " The default quota allocation for each application is 10,000 units per day, and you could request additional quota by completing a form to YouTube API Services if you reach the quota limit.

Since all data requested from Youtube API is public data (which everyone on the Internet can see on Youtube), there is no particular privacy issues as far as I am concerned. In addition, the data is obtained only for research purposes in this case and not for any commercial interests.

In [50]:
import pandas as pd
import numpy as np
from dateutil import parser
import isodate
from datetime import datetime, timedelta
from googleapiclient.errors import HttpError 

# Data visualization libraries
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
sns.set(style="darkgrid", color_codes=True)

# Google API
from googleapiclient.discovery import build

In [3]:
# NLP libraries
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('stopwords')
nltk.download('punkt')
from wordcloud import WordCloud

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\furni\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\furni\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


# 2. Data creation with Youtube API

I first created a project on Google Developers Console, then requested an authorization credential (API key). Afterwards, I enabled Youtube API for my application, so that I can send API requests to Youtube API services. Then, I went on Youtube and checked the channel ID of each of the channels that I would like to include in my research scope (using their URLs). Then I created the functions for getting the channel statistics via the API.

In [194]:
api_key = 'AIzaSyB-4NIQtecQPbRX7TWKphThkb9_Brh2wL4' 
#api_key = 'AIzaSyA4Sd1FkOSah19dL7cg7OuBUj9VBJiE2fE'

# channel_ids = ['UCtYLUTtgS3k1Fg4y5tAhLbw', # Statquest 
#                'UCCezIgC97PvUuR4_gbFUs5g', # Corey Schafer
#                'UCfzlCWGWYyIQ0aLC5w48gBQ', # Sentdex
#                'UCNU_lfiiWBdtULKOw6X0Dig', # Krish Naik
#                'UCzL_0nIe8B4-7ShhVPfJkgw', # DatascienceDoJo
#                'UCLLw7jmFsvfIVaUFsLs8mlQ', # Luke Barousse 
#                'UCiT9RITQ9PW6BhXK0y2jaeg', # Ken Jee
#                'UC7cs8q-gJRlGwj4A8OmCmXg', # Alex the analyst
#                'UC2UXDak6o7rBm23k3Vv5dww', # Tina Huang
#               ]

channel_ids = [
    'UCupvZG-5ko_eiXAupbDfxWw',  # CNN
     'UCXIJgqnII2ZOINSWNOGFThA',  # FOX NEWS
     'UCaXkIU1QidjPwiAYu6GcHjg',  # MSNBC
     #'UCBi2mrWuNuyYy4gbM6fU18Q',  # ABC NEWS
     #'UC8p1vwvWtl6T73JiExfWs1g',  # CBS NEWS
]

youtube = build('youtube', 'v3', developerKey=api_key)

In [196]:
def get_channel_stats(youtube, channel_ids):
    """
    Get channel statistics: title, subscriber count, view count, video count, upload playlist
    Params:
    
    youtube: the build object from googleapiclient.discovery
    channels_ids: list of channel IDs
    
    Returns:
    Dataframe containing the channel statistics for all channels in the provided list: title, subscriber count, view count, video count, upload playlist
    
    """
    all_data = []
    request = youtube.channels().list(
                part='snippet,contentDetails,statistics',
                id=','.join(channel_ids))
    response = request.execute() 
    
    for i in range(len(response['items'])):
        data = dict(channelName = response['items'][i]['snippet']['title'],
                    channel_id=channel_ids[i],
                    subscribers = response['items'][i]['statistics']['subscriberCount'],
                    views = response['items'][i]['statistics']['viewCount'],
                    totalVideos = response['items'][i]['statistics']['videoCount'],
                    playlistId = response['items'][i]['contentDetails']['relatedPlaylists']['uploads'])
        all_data.append(data)
    
    return pd.DataFrame(all_data)

def get_video_ids(youtube, playlist_id):
    """
    Get list of video IDs of all videos in the given playlist for the last month past(30 days) 
    Params:
    
    youtube: the build object from googleapiclient.discovery
    playlist_id: playlist ID of the channel
    
    Returns:
    List of video IDs of all videos in the playlist
    
    """
    one_month_ago = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%dT%H:%M:%SZ')
    
    request = youtube.playlistItems().list(
                part='contentDetails',
                playlistId=playlist_id,
                maxResults=50)
    response = request.execute()
    
    video_ids = []
    
    for i in range(len(response['items'])):
        video_published_at = response['items'][i]['contentDetails']['videoPublishedAt']
        
        # Check if the video was published in the past month
        if video_published_at >= one_month_ago:
            video_ids.append(response['items'][i]['contentDetails']['videoId'])
    

    next_page_token = response.get('nextPageToken')
    more_pages = True
    
    while more_pages:
        if next_page_token is None:
            more_pages = False
        else:
            request = youtube.playlistItems().list(
                        part='contentDetails',
                        playlistId = playlist_id,
                        maxResults = 50,
                        pageToken = next_page_token
                        )
            response = request.execute()
    
            for i in range(len(response['items'])):
                video_published_at = response['items'][i]['contentDetails']['videoPublishedAt']
        
                # Check if the video was published in the past month
                if video_published_at >= one_month_ago:
                    video_ids.append(response['items'][i]['contentDetails']['videoId'])
            
            next_page_token = response.get('nextPageToken')
    return video_ids



def get_video_details(youtube, video_ids, channel_id):
    """
    Get video statistics of all videos with given IDs
    Params:
    
    youtube: the build object from googleapiclient.discovery
    video_ids: list of video IDs
    channel_id: ID of the channel
    
    Returns:
    Dataframe with statistics of videos, i.e.:
        'channel_id', 'channelTitle', 'title', 'description', 'tags', 'publishedAt'
        'viewCount', 'likeCount', 'favoriteCount', 'commentCount'
        'duration', 'definition', 'caption'
    """
        
    all_video_info = []
    
    for i in range(0, len(video_ids), 50):
        request = youtube.videos().list(
            part="snippet,contentDetails,statistics",
            id=','.join(video_ids[i:i+50])
        )
        response = request.execute() 

        for video in response['items']:
            stats_to_keep = {'snippet': ['channelTitle', 'title', 'description', 'tags', 'publishedAt','defaultAudioLanguage'],
                             'statistics': ['viewCount', 'likeCount', 'favouriteCount', 'commentCount'],
                             'contentDetails': ['duration', 'definition', 'caption']
                            }
            video_info = {}
            video_info['channel_id'] = channel_id  # Add channel_id to the DataFrame
            video_info['video_id'] = video['id']

            for k in stats_to_keep.keys():
                for v in stats_to_keep[k]:
                    try:
                        video_info[v] = video[k][v]
                    except:
                        video_info[v] = None

            all_video_info.append(video_info)
            
    return pd.DataFrame(all_video_info)



def get_playlists_info(youtube, channel_ids):


    all_playlist_data = []

    """
    Retreiving Playlist data for all the channels
    
    """
    
    for channel_id in channel_ids:
        request = youtube.playlists().list(
            part="snippet",
            channelId=channel_id,
            maxResults=50  # Adjust the maximum number of playlists to retrieve if needed
        )
        response = request.execute()

        for playlist in response.get("items", []):
            playlist_data = dict(
                playlist_id=playlist["id"],
                title=playlist["snippet"]["title"],
                description=playlist["snippet"]["description"],
                publishedAt=playlist["snippet"]["publishedAt"],
                channelId=playlist["snippet"]["channelId"],
                channelTitle=playlist["snippet"]["channelTitle"],
                defaultLanguage=playlist["snippet"].get("defaultLanguage"),
                thumbnailUrl=playlist["snippet"]["thumbnails"]["default"]["url"]
            )
            all_playlist_data.append(playlist_data)
    return pd.DataFrame(all_playlist_data)


def get_captions(youtube, video_ids):
    caption_list = []

    for video_i in video_ids:
            captions = youtube.captions().list(
            part="snippet",
            videoId=video_i
        ).execute()

        # List to store comments as dictionaries
            

        # Extract comments and append them to the list
            for caption in captions["items"]:
                snippet = caption["snippet"]
                caption_dict = {
        "videoId": snippet["videoId"],
        "lastUpdated": snippet["lastUpdated"],
        "trackKind": snippet["trackKind"],
        "language": snippet["language"],
        "name": snippet["name"],
        "audioTrackType": snippet["audioTrackType"],
        "status": snippet["status"]
    }
                caption_list.append(caption_dict)
    return(pd.DataFrame(caption_list))

def get_comments(youtube, video_ids):
    """
    Get top level comments as text from all videos with given IDs (only the first 50 comments per video due to quote limit of Youtube API)
    Params:
    
    youtube: the build object from googleapiclient.discovery
    video_ids: list of video IDs
    
    Returns:
    Dataframe with video IDs and associated top level comment in text.
    
    """
    all_comments = []
    all_comments_data = []    
    for video_id in video_ids:
        comments_in_video_info = {}
        try:   
            request = youtube.commentThreads().list(
                part="snippet,replies",
                videoId=video_id
            )
            response = request.execute()
            
            comments_in_video= []
            comments_in_video_info = {}
            for comment in response['items'][:50]:
                comment_text = comment['snippet']['topLevelComment']['snippet']['textOriginal']
        
                # Append the comment text to the list
                comments_in_video.append(comment_text)
                comments_data = {'video_id': video_id, 
                                'comments': comment_text,
                                'likeCount': comment['snippet']['topLevelComment']['snippet']['likeCount'],
                                'authorDisplayName': comment['snippet']['topLevelComment']['snippet']['authorDisplayName'],
                                'authorProfileImageUrl': comment['snippet']['topLevelComment']['snippet']['authorProfileImageUrl'],
                                'authorChannelUrl': comment['snippet']['topLevelComment']['snippet']['authorChannelUrl'],
                                'authorChannelId': comment['snippet']['topLevelComment']['snippet']['authorChannelId']['value'],
                                'channelId': comment['snippet']['topLevelComment']['snippet']['channelId'],
                                'canRate': comment['snippet']['topLevelComment']['snippet']['canRate'],
                                'viewerRating': comment['snippet']['topLevelComment']['snippet']['viewerRating'],
                                'publishedAt': comment['snippet']['topLevelComment']['snippet']['publishedAt']
                                
                                                            
                                }
                all_comments_data.append(comments_data)
            comments_in_video_info = {'video_id': video_id, 'comments': comments_in_video}
            


        except: 
            # When error occurs - most likely because comments are disabled on a video
            print('Could not get comments for video ' + video_id)



        all_comments.append(comments_in_video_info)

            
                
        
        # Create a dictionary for each comment and append it to the list
                # comment_info = {'video_id': video_id, 'comment': comment_text}
                # comments_in_video_info.append(comment_info)
        

        
        
    
        
    return pd.DataFrame(all_comments_data) , pd.DataFrame(all_comments)   




In [123]:
all_comments = []
    
for video_id in video_ids[0:5]:
    try:   
        request = youtube.commentThreads().list(
            part="snippet,replies",
            videoId=video_id
        )
        response = request.execute()
    
        comments_in_video = [comment['snippet']['topLevelComment']['snippet']['textOriginal'] for comment in response['items'][0:2]]
        comments_in_video_info = {'video_id': video_id, 'comments': comments_in_video}

        all_comments.append(comments_in_video_info)
        
    except: 
        # When error occurs - most likely because comments are disabled on a video
        print('Could not get comments for video ' + video_id)
        
comments_df = pd.DataFrame(all_comments) 
print(comments_df.head())

      video_id  \
0  rLOsgFgGkZY   
1  xRYL71j9g5M   
2  8Yy6ffODUw8   
3  vfhun9-1cJ4   
4  eiBlgAAOcCg   

                                                                                                                                                                                                                                                                                                                                                                         comments  
0                                                                                                                                                                                                              [The children stuffing from the war do not deserve this! They deserve happiness, safety, and piece!, no reason to lay down when if it hits you you are done bro!!]  
1                                                                                                                                                  

In [186]:
def get_comments1(youtube, video_ids):
    """
    Get top level comments as text from all videos with given IDs (only the first 50 comments per video due to quote limit of Youtube API)
    Params:
    
    youtube: the build object from googleapiclient.discovery
    video_ids: list of video IDs
    
    Returns:
    Dataframe with video IDs and associated top level comment in text.
    
    """
    all_comments = []
    all_comments_data = []    
    for video_id in video_ids[0:10]:
        comments_in_video_info = {}
        try:   
            request = youtube.commentThreads().list(
                part="snippet,replies",
                videoId=video_id
            )
            response = request.execute()
            
            comments_in_video= []
            
            for comment in response['items'][:2]:
                comment_text = comment['snippet']['topLevelComment']['snippet']['textOriginal']
        
                # Append the comment text to the list
                comments_in_video.append(comment_text)
                comments_data = {'video_id': video_id, 
                                'comments': comment_text,
                                'likeCount': comment['snippet']['topLevelComment']['snippet']['likeCount'],
                                'authorDisplayName': comment['snippet']['topLevelComment']['snippet']['authorDisplayName'],
                                'authorProfileImageUrl': comment['snippet']['topLevelComment']['snippet']['authorProfileImageUrl'],
                                'authorChannelUrl': comment['snippet']['topLevelComment']['snippet']['authorChannelUrl'],
                                'authorChannelId': comment['snippet']['topLevelComment']['snippet']['authorChannelId']['value'],
                                'channelId': comment['snippet']['topLevelComment']['snippet']['channelId'],
                                'canRate': comment['snippet']['topLevelComment']['snippet']['canRate'],
                                'viewerRating': comment['snippet']['topLevelComment']['snippet']['viewerRating'],
                                'publishedAt': comment['snippet']['topLevelComment']['snippet']['publishedAt']
                                
                                                            
                                }
                all_comments_data.append(comments_data)
            comments_in_video_info = {'video_id': video_id, 'comments': comments_in_video}
            


        except: 
            # When error occurs - most likely because comments are disabled on a video
            print('Could not get comments for video ' + video_id)



        all_comments.append(comments_in_video_info)

            
                
        
        # Create a dictionary for each comment and append it to the list
                # comment_info = {'video_id': video_id, 'comment': comment_text}
                # comments_in_video_info.append(comment_info)
        

        
        
    
        
    return pd.DataFrame(all_comments_data) , pd.DataFrame(all_comments)


In [188]:
all_com_df , com_df = get_comments1(youtube, video_ids)

Could not get comments for video 3yA6B9qPNQA


In [189]:
print(all_com_df.head())

      video_id  \
0  lPL1oqYLq5U   
1  lPL1oqYLq5U   
2  FOB5SZl3GV0   
3  FOB5SZl3GV0   
4  4jmAHdY2uZI   

                                                                                                                                                                                                                                        comments  \
0                                                                                                                                                                                                                  –ë–∞–π–¥–µ–Ω! –ê –ë–∞–π–¥–µ–Ω! \n–¢—ã –≤—Ä–µ—à—å!   
1                                                                                                                                                                                 I suggest that the president take a test for banned substances   
2  Omar doesn't speak for all progressive democrats, she's just visiting.\nGezz that's funny lady because it was Palestinians by the thousa

In [51]:

for video_i in video_ids[0:10]:
    try:
        comments = youtube.commentThreads().list(
            part="snippet",
            videoId=video_i,
            textFormat="plainText"
        ).execute()

        # List to store comments as dictionaries
        comment_list = []

        # Extract comments and append them to the list
        for comment in comments["items"][0:10]:
            snippet = comment["snippet"]["topLevelComment"]["snippet"]
            comment_dict = {
                "Author": snippet["authorDisplayName"],
                "Comment": snippet["textDisplay"],
                "channelId": snippet["channelId"],
                "likeCount": snippet["likeCount"]
            }
            comment_list.append(comment_dict)
    except HttpError as e:
        error_details = e._get_reason()
        if "commentsDisabled" in error_details:
            print(f"Comments are disabled for video: {video_id}")
        else:
            print(f"An error occurred for video: {video_id}")
            print(error_details)


    # Create a DataFrame from the list of dictionaries
    comments_df = pd.DataFrame(comment_list)

    # Display the DataFrame
    print(comments_df.head(3))

        Author  \
0       Or1000   
1  acslater017   
2   abc2390986   

                                                                                                                                                                                                                       Comment  \
0                                                                                                                                                                                                                     üáÆüá±üáÆüá±üáÆüá±üáÆüá±   
1  This is the power of religion to corrupt morality, entrench entitlements, and inflame tribal identity. From the outside, this is about as good a use of human life as Star Trek people murdering Star Wars people. Fiction.   
2                                                                                                                                                                                               Jerusalem is Israel‚Äôs capital. 

### Get channel statistics

Using the `get_channel_stats` function defined below, now we are going to obtain the channel statistics for the 9 channels in scope.

In [94]:
channel_df = get_channel_stats(youtube, channel_ids)

Now I can print out the data and take a look at the channel statistics overview.

In [95]:
channel_df

Unnamed: 0,channelName,channel_id,subscribers,views,totalVideos,playlistId
0,ABC News,UCupvZG-5ko_eiXAupbDfxWw,15200000,13441238241,82830,UUBi2mrWuNuyYy4gbM6fU18Q
1,CNN,UCXIJgqnII2ZOINSWNOGFThA,15600000,14771424781,161673,UUupvZG-5ko_eiXAupbDfxWw
2,CBS News,UCaXkIU1QidjPwiAYu6GcHjg,5310000,4586933090,114997,UU8p1vwvWtl6T73JiExfWs1g
3,Fox News,UCBi2mrWuNuyYy4gbM6fU18Q,10700000,15338016845,97024,UUXIJgqnII2ZOINSWNOGFThA
4,MSNBC,UC8p1vwvWtl6T73JiExfWs1g,6070000,11135910894,73236,UUaXkIU1QidjPwiAYu6GcHjg


I noticed the count columns in `channel_data` is currently in string format, so I will convert them into numeric so that we can visualize and do numeric operations on them.

In [90]:
# Convert count columns to numeric columns
numeric_cols = ['subscribers', 'views', 'totalVideos']
channel_data[numeric_cols] = channel_data[numeric_cols].apply(pd.to_numeric, errors='coerce')

In [218]:
print(channel_df['channelName'])

0       MSNBC
1         CNN
2    Fox News
Name: channelName, dtype: object


Let's take a look at the number of subscribers per channel to have a view of how popular the channels are when compared with one another.

### Get video statistics for all the channels

In the next step, we will obtain the video statistics for all the channels. In total, we obtained 3,722 videos as seen in below.

## CREATING DATAFRAMES FOR ALL THE TABLES

In [197]:
# Create a dataframe with video statistics and comments from all channels

video_df = pd.DataFrame()
comments_df = pd.DataFrame()

for c in channel_df['channelName'].unique():
    print("Getting video information from channel: " + c)
    playlist_id = channel_df.loc[channel_df['channelName']== c, 'playlistId'].iloc[0]
    channel_id = channel_df.loc[channel_df['channelName']== c, 'channel_id'].iloc[0]  # Get the channel_id
    video_ids = get_video_ids(youtube, playlist_id)
    
    # get video data
    video_data = get_video_details(youtube, video_ids, channel_id)  # Pass channel_id
    # get comment data
    comments_data_df, comments_combined_df = get_comments(youtube, video_ids)

    # append video data together and comment data toghether
    video_df = video_df.append(video_data, ignore_index=True)
    comments_df = comments_df.append(comments_combined_df, ignore_index=True)

playlist_df = get_playlists_info(youtube, channel_ids)

channel_df = get_channel_stats(youtube, channel_ids)

captions_df = get_captions(youtube, video_ids)

Getting video information from channel: ABC News
Could not get comments for video 3yA6B9qPNQA
Could not get comments for video psHHvbD6W5E
Could not get comments for video C92SGYjblxI
Could not get comments for video BqXBePgCBU8
Could not get comments for video P6V0_3Ckpzo
Could not get comments for video muGEyVcDeYk
Could not get comments for video 7v6p8Vil-Z8
Could not get comments for video diV4hF930Eg
Could not get comments for video -KVcdHNOa-M
Could not get comments for video RlGa0JRlUK4
Could not get comments for video mRxnzdQic14
Could not get comments for video KabehDEW-Jw
Could not get comments for video 8sbsyFXG2sc
Could not get comments for video 3ACP0-wavLM
Could not get comments for video Rcv5Lj4uC30
Could not get comments for video ZVHKO5I2hbo
Could not get comments for video 4LYW2wuFxRw
Could not get comments for video aUuYgWKjDPA
Could not get comments for video MCCY4OC2uVM
Could not get comments for video pBsN5tajMcY
Could not get comments for video tmmTd24QAZ4
Could 

  video_df = video_df.append(video_data, ignore_index=True)
  comments_df = comments_df.append(comments_combined_df, ignore_index=True)


Getting video information from channel: CNN


  video_df = video_df.append(video_data, ignore_index=True)
  comments_df = comments_df.append(comments_combined_df, ignore_index=True)


Getting video information from channel: CBS News


  video_df = video_df.append(video_data, ignore_index=True)
  comments_df = comments_df.append(comments_combined_df, ignore_index=True)


Getting video information from channel: Fox News
Could not get comments for video fUY7IL9htKo
Could not get comments for video zbWfFpeRHoA
Could not get comments for video r4i-RdTmy3g
Could not get comments for video XICCOVgNgDA
Could not get comments for video 70-sefxS68c
Could not get comments for video GxeUjO23vgg
Could not get comments for video BNQADZacua8
Could not get comments for video ZJNLcoNvSHg
Could not get comments for video OF-EQlrR5Ro
Could not get comments for video 4hYkgGMpe2g
Could not get comments for video 5KMx6azI6tE
Could not get comments for video 4KxUZHzU-Zs
Could not get comments for video aaBqNG8fkWo
Could not get comments for video 65zCSZK6p1U
Could not get comments for video Z3wTh51-bz8
Could not get comments for video l9r7nSCPKqk
Could not get comments for video NT0ZIOKfEL4
Could not get comments for video NlzqxuWcZMY
Could not get comments for video IjAuTPESJ7o
Could not get comments for video 2qZALSfkAFA
Could not get comments for video vqvtGMeh41g
Could 

  video_df = video_df.append(video_data, ignore_index=True)
  comments_df = comments_df.append(comments_combined_df, ignore_index=True)


Getting video information from channel: MSNBC


  video_df = video_df.append(video_data, ignore_index=True)
  comments_df = comments_df.append(comments_combined_df, ignore_index=True)


HttpError: <HttpError 403 when requesting https://youtube.googleapis.com/youtube/v3/captions?part=snippet&videoId=E40VVs-xQ1s&key=AIzaSyB-4NIQtecQPbRX7TWKphThkb9_Brh2wL4&alt=json returned "The request cannot be completed because you have exceeded your <a href="/youtube/v3/getting-started#quota">quota</a>.". Details: "[{'message': 'The request cannot be completed because you have exceeded your <a href="/youtube/v3/getting-started#quota">quota</a>.', 'domain': 'youtube.quota', 'reason': 'quotaExceeded'}]">

In [206]:
print(playlist_df.head())

                          playlist_id                        title  \
0  PL6XRrncXkMaU55GiCvv416NR2qBD_xbmf     Israel at war with Hamas   
1  PL6XRrncXkMaW9CdmnrS4NVWKavtqRO8hT  CNN Underscored: First Look   
2  PL6XRrncXkMaVXAutoJ8D2RDKAz_XufaFm                   World News   
3  PL6XRrncXkMaXpv1ZA2l3jwnhbajEEROUe          CNN Special Reports   
4  PL6XRrncXkMaW8rqNW6ddCsT6SEosWF4W2    The Lead with Jake Tapper   

                                                                                                                                                         description  \
0                                                                                                                                                                      
1  CNN Underscored gets an insider‚Äôs first look at new products and services being released. Check out all the latest tech, home and lifestyle products coming soon.   
2                                                                        

In [191]:
comments_data_df, comments_combined_df = get_comments(youtube, video_ids)

Could not get comments for video 3yA6B9qPNQA


In [None]:
print(channel_df['channelName'])

In [92]:
video_df.head(2)

Unnamed: 0,channel_id,video_id,channelTitle,title,description,tags,publishedAt,viewCount,likeCount,favouriteCount,...,duration,definition,caption,pushblishDayName,durationSecs,tagsCount,likeRatio,commentRatio,titleLength,title_no_stopwords
0,UCupvZG-5ko_eiXAupbDfxWw,oZsj7QAJ0UA,MSNBC,"‚ÄòD√©j√† vu all over again‚Äô: GOP House speaker race featuring Scalise, Jordan deemed ‚Äònonsense‚Äô","The House is without a speaker as the Israel-Hamas war continues. On critics saying the GOP speaker race is becoming a circus, David Jolly, former Republican congressman no longer affiliated with the GOP, tells Joy Reid, ""The one thing I'll tell you Republicans are good at is publicly punching themselves in the face.""\n\n¬ª Subscribe to MSNBC: http://on.msnbc.com/SubscribeTomsnbc\n\nFollow MSNBC Show Blogs \nMaddowBlog: https://www.msnbc.com/maddowblog\nReidOut Blog: https://www.msnbc.com/reidoutblog\n\nMSNBC delivers breaking news, in-depth analysis of politics headlines, as well as commentary and informed perspectives. Find video clips and segments from The Rachel Maddow Show, Morning Joe, The Beat with Ari Melber, Deadline: White House, The ReidOut, All In, Last Word, 11th Hour, and Alex Wagner who brings her breadth of reporting experience to MSNBC primetime. Watch ‚ÄúAlex Wagner Tonight‚Äù Tuesday through Friday at 9pm Eastern. \n \nConnect with MSNBC Online \nVisit msnbc.com: http...",[Joy Reid],2023-10-12 02:45:04+00:00,9841.0,291.0,,...,PT7M59S,hd,True,Thursday,479.0,1,29.570166,13.108424,92,"[‚ÄòD√©j√†, vu, again‚Äô:, GOP, House, speaker, race, featuring, Scalise,, Jordan, deemed, ‚Äònonsense‚Äô]"
1,UCupvZG-5ko_eiXAupbDfxWw,mCBQHSHHUCA,MSNBC,‚ÄòHamas is saying bring it on‚Äô: Engel on tensions as Israeli military gathers near Gaza border,"The Israeli military presence is building near the Gaza border. ""This will be a highly complex operation, very difficult to carry out for Israel... and potentially extremely lethal for all of the Palestinians‚Ä¶,‚Äù NBC News‚Äô Richard Engel tells Joy Reid live from Ashdod, Israel of a possible invasion, ‚Äúand Hamas is saying we welcome it. Bring it on."" Ben Rhodes, former deputy national security advisor in the Obama administration, also joins The ReidOut with his analysis.\n\n¬ª Subscribe to MSNBC: http://on.msnbc.com/SubscribeTomsnbc\n\nFollow MSNBC Show Blogs \nMaddowBlog: https://www.msnbc.com/maddowblog\nReidOut Blog: https://www.msnbc.com/reidoutblog\n\nMSNBC delivers breaking news, in-depth analysis of politics headlines, as well as commentary and informed perspectives. Find video clips and segments from The Rachel Maddow Show, Morning Joe, The Beat with Ari Melber, Deadline: White House, The ReidOut, All In, Last Word, 11th Hour, and Alex Wagner who brings her breadth of reporting...",[Joy Reid],2023-10-12 02:15:03+00:00,147373.0,1894.0,,...,PT12M5S,hd,True,Thursday,725.0,1,12.851744,9.228285,93,"[‚ÄòHamas, saying, bring, on‚Äô:, Engel, tensions, Israeli, military, gathers, near, Gaza, border]"


Let's take a look at the `comment_df` as well. We only get 3,743 comments in total due to the fact that we limited to 10 first comments on the video to avoid exceeding the Youtube API quota limit.

In [93]:
comments_df.head(2)

Unnamed: 0,Author,Comment,channelId,likeCount
0,Oralia Esquer ü§é,Israel and United States are the most miserables Nations.üò§ Palestina sufer more than others and the world knows...,UCBi2mrWuNuyYy4gbM6fU18Q,0
1,PointGucci01,"That father didn‚Äôt say don‚Äôt be scared, he said it‚Äôs over",UCBi2mrWuNuyYy4gbM6fU18Q,0


In [99]:
playlist_df

Unnamed: 0,playlist_id,title,description,publishedAt,channelId,channelTitle,defaultLanguage,thumbnailUrl
0,PL6XRrncXkMaW9CdmnrS4NVWKavtqRO8hT,CNN Underscored: First Look,CNN Underscored gets an insider‚Äôs first look a...,2023-09-29T15:28:35Z,UCupvZG-5ko_eiXAupbDfxWw,CNN,,https://i.ytimg.com/vi/GaqN9lEGZgg/default.jpg
1,PL6XRrncXkMaVXAutoJ8D2RDKAz_XufaFm,World News,,2023-09-29T11:39:58Z,UCupvZG-5ko_eiXAupbDfxWw,CNN,,https://i.ytimg.com/vi/H35GdCASqbU/default.jpg
2,PL6XRrncXkMaXpv1ZA2l3jwnhbajEEROUe,CNN Special Reports,,2023-09-12T17:38:57Z,UCupvZG-5ko_eiXAupbDfxWw,CNN,,https://i.ytimg.com/vi/8_JSMR-0T4I/default.jpg
3,PL6XRrncXkMaW8rqNW6ddCsT6SEosWF4W2,The Lead with Jake Tapper,,2023-08-31T16:48:18Z,UCupvZG-5ko_eiXAupbDfxWw,CNN,,https://i.ytimg.com/vi/WAItdOB3hmE/default.jpg
4,PL6XRrncXkMaWdcHpyKh3fcAsf6aDOQOnM,Climate Change,,2023-08-21T22:23:28Z,UCupvZG-5ko_eiXAupbDfxWw,CNN,,https://i.ytimg.com/vi/9ubdrCq_ZKo/default.jpg
...,...,...,...,...,...,...,...,...
245,PLEb3ThbkPrFZQqAxYlJh5aXtGg6Y_HBcA,Local Matters | CBS News,,2021-03-11T11:58:54Z,UC8p1vwvWtl6T73JiExfWs1g,CBS News,,https://i.ytimg.com/vi/8LbQpC9wC58/default.jpg
246,PLEb3ThbkPrFYLCBonpC7yPxoIvzbsvSAP,Biden Administration | CBS News,,2021-01-20T16:25:40Z,UC8p1vwvWtl6T73JiExfWs1g,CBS News,,https://i.ytimg.com/vi/aiRGLq7Ju20/default.jpg
247,PLEb3ThbkPrFao31JipOKP0YhNlaTtP3XR,Assault on the Capitol | CBS News,,2021-01-12T14:42:06Z,UC8p1vwvWtl6T73JiExfWs1g,CBS News,,https://i.ytimg.com/vi/KH2LTrEgU3M/default.jpg
248,PLEb3ThbkPrFaeeLrc289Up0KUQJG3P0bS,2020 Reykjav√≠k Global Forum ‚Äì Women Leaders,,2020-11-09T18:18:40Z,UC8p1vwvWtl6T73JiExfWs1g,CBS News,,https://i.ytimg.com/vi/gjXI1QLo4Mw/default.jpg


In [96]:
channel_df

Unnamed: 0,channelName,channel_id,subscribers,views,totalVideos,playlistId
0,ABC News,UCupvZG-5ko_eiXAupbDfxWw,15200000,13441238241,82830,UUBi2mrWuNuyYy4gbM6fU18Q
1,CNN,UCXIJgqnII2ZOINSWNOGFThA,15600000,14771424781,161673,UUupvZG-5ko_eiXAupbDfxWw
2,CBS News,UCaXkIU1QidjPwiAYu6GcHjg,5310000,4586933090,114997,UU8p1vwvWtl6T73JiExfWs1g
3,Fox News,UCBi2mrWuNuyYy4gbM6fU18Q,10700000,15338016845,97024,UUXIJgqnII2ZOINSWNOGFThA
4,MSNBC,UC8p1vwvWtl6T73JiExfWs1g,6070000,11135910894,73236,UUaXkIU1QidjPwiAYu6GcHjg


## Preprocessing & Feature engineering

To be able to make use of the data for analysis, we need to perform a few pre-processing steps. Firstly, I would like reformat some columns, especially the date and time columns such as "pushlishedAt" and "duration". In addition, I also think it is necessary to enrich the data with some new features that might be useful for understanding the videos' characteristics.

### Check for empty values

In [138]:
video_df.isnull().any()

channel_id        False
video_id          False
channelTitle      False
title             False
description       False
tags               True
publishedAt       False
viewCount         False
likeCount         False
favouriteCount     True
commentCount       True
duration          False
definition        False
caption           False
dtype: bool

There's no strange dates in the publish date column, videos were published between 2013 and 2022.

In [139]:
video_df.publishedAt.sort_values().value_counts()

2023-10-03T02:45:00Z    3
2023-10-02T22:45:00Z    3
2023-09-14T18:45:00Z    3
2023-09-29T19:30:04Z    2
2023-09-14T00:15:00Z    2
                       ..
2023-09-17T19:30:01Z    1
2023-09-17T19:15:00Z    1
2023-09-17T19:00:32Z    1
2023-09-17T19:00:13Z    1
2023-10-07T03:37:49Z    1
Name: publishedAt, Length: 4852, dtype: int64

Next, we need to check if the data type of the columns are correct. I have checked the data types and indeed some count columns such as view count and comment count are currently not in correct data type. In this step, we convert these count columns into integer.

In [65]:
print(video_df.columns)

Index(['channel_id', 'video_id', 'channelTitle', 'title', 'description',
       'tags', 'publishedAt', 'viewCount', 'likeCount', 'favouriteCount',
       'commentCount', 'duration', 'definition', 'caption'],
      dtype='object')


In [66]:
cols = ['viewCount', 'likeCount', 'commentCount']
video_df[cols] = video_df[cols].apply(pd.to_numeric, errors='coerce', axis=1)

In [67]:
video_df['favouriteCount']
condition = video_df['favouriteCount'].notna()

# Apply the condition to filter rows
filtered_df = video_df.loc[condition]
print(filtered_df['favouriteCount'])


Series([], Name: favouriteCount, dtype: object)


### Enriching data

I want to enrich the data for further analyses, for example:

- create published date column with another column showing the day in the week the video was published, which will be useful for later analysis.

- convert video duration to seconds instead of the current default string format

- calculate number of tags for each video

- calculate comments and likes per 1000 view ratio

- calculate title character length

In [68]:
# Create publish day (in the week) column
video_df['publishedAt'] =  video_df['publishedAt'].apply(lambda x: parser.parse(x)) 
video_df['pushblishDayName'] = video_df['publishedAt'].apply(lambda x: x.strftime("%A")) 

In [69]:
# convert duration to seconds
video_df['durationSecs'] = video_df['duration'].apply(lambda x: isodate.parse_duration(x))
video_df['durationSecs'] = video_df['durationSecs'].astype('timedelta64[s]')

In [70]:
# Add number of tags
video_df['tagsCount'] = video_df['tags'].apply(lambda x: 0 if x is None else len(x))

In [71]:
# Comments and likes per 1000 view ratio
video_df['likeRatio'] = video_df['likeCount']/ video_df['viewCount'] * 1000
video_df['commentRatio'] = video_df['commentCount']/ video_df['viewCount'] * 1000

In [72]:
# Title character length
video_df['titleLength'] = video_df['title'].apply(lambda x: len(x))

Let's look at the video dataset at this point to see if everything went well. It looks good - now we will proceed to exploratory analysis part.

In [73]:
video_df.head()

Unnamed: 0,channel_id,video_id,channelTitle,title,description,tags,publishedAt,viewCount,likeCount,favouriteCount,commentCount,duration,definition,caption,pushblishDayName,durationSecs,tagsCount,likeRatio,commentRatio,titleLength
0,UCupvZG-5ko_eiXAupbDfxWw,oZsj7QAJ0UA,MSNBC,"‚ÄòD√©j√† vu all over again‚Äô: GOP House speaker race featuring Scalise, Jordan deemed ‚Äònonsense‚Äô","The House is without a speaker as the Israel-Hamas war continues. On critics saying the GOP speaker race is becoming a circus, David Jolly, former Republican congressman no longer affiliated with the GOP, tells Joy Reid, ""The one thing I'll tell you Republicans are good at is publicly punching themselves in the face.""\n\n¬ª Subscribe to MSNBC: http://on.msnbc.com/SubscribeTomsnbc\n\nFollow MSNBC Show Blogs \nMaddowBlog: https://www.msnbc.com/maddowblog\nReidOut Blog: https://www.msnbc.com/reidoutblog\n\nMSNBC delivers breaking news, in-depth analysis of politics headlines, as well as commentary and informed perspectives. Find video clips and segments from The Rachel Maddow Show, Morning Joe, The Beat with Ari Melber, Deadline: White House, The ReidOut, All In, Last Word, 11th Hour, and Alex Wagner who brings her breadth of reporting experience to MSNBC primetime. Watch ‚ÄúAlex Wagner Tonight‚Äù Tuesday through Friday at 9pm Eastern. \n \nConnect with MSNBC Online \nVisit msnbc.com: http...",[Joy Reid],2023-10-12 02:45:04+00:00,9841.0,291.0,,129.0,PT7M59S,hd,True,Thursday,479.0,1,29.570166,13.108424,92
1,UCupvZG-5ko_eiXAupbDfxWw,mCBQHSHHUCA,MSNBC,‚ÄòHamas is saying bring it on‚Äô: Engel on tensions as Israeli military gathers near Gaza border,"The Israeli military presence is building near the Gaza border. ""This will be a highly complex operation, very difficult to carry out for Israel... and potentially extremely lethal for all of the Palestinians‚Ä¶,‚Äù NBC News‚Äô Richard Engel tells Joy Reid live from Ashdod, Israel of a possible invasion, ‚Äúand Hamas is saying we welcome it. Bring it on."" Ben Rhodes, former deputy national security advisor in the Obama administration, also joins The ReidOut with his analysis.\n\n¬ª Subscribe to MSNBC: http://on.msnbc.com/SubscribeTomsnbc\n\nFollow MSNBC Show Blogs \nMaddowBlog: https://www.msnbc.com/maddowblog\nReidOut Blog: https://www.msnbc.com/reidoutblog\n\nMSNBC delivers breaking news, in-depth analysis of politics headlines, as well as commentary and informed perspectives. Find video clips and segments from The Rachel Maddow Show, Morning Joe, The Beat with Ari Melber, Deadline: White House, The ReidOut, All In, Last Word, 11th Hour, and Alex Wagner who brings her breadth of reporting...",[Joy Reid],2023-10-12 02:15:03+00:00,147373.0,1894.0,,1360.0,PT12M5S,hd,True,Thursday,725.0,1,12.851744,9.228285,93
2,UCupvZG-5ko_eiXAupbDfxWw,DTcDUu9d5g4,MSNBC,Inside the saferoom: Harrowing details of an Israeli family‚Äôs escape from Hamas,"Haaretz correspondent Amir Tibon spoke to The Atlantic‚Äôs Yair Rosenberg to tell the story of how the Hamas attack unfolded and how his family managed to walk away with their lives.\n\n¬ª Subscribe to MSNBC: http://on.msnbc.com/SubscribeTomsnbc\n\nFollow MSNBC Show Blogs \nMaddowBlog: https://www.msnbc.com/maddowblog\nReidOut Blog: https://www.msnbc.com/reidoutblog\n\nMSNBC delivers breaking news, in-depth analysis of politics headlines, as well as commentary and informed perspectives. Find video clips and segments from The Rachel Maddow Show, Morning Joe, The Beat with Ari Melber, Deadline: White House, The ReidOut, All In, Last Word, 11th Hour, and Alex Wagner who brings her breadth of reporting experience to MSNBC primetime. Watch ‚ÄúAlex Wagner Tonight‚Äù Tuesday through Friday at 9pm Eastern. \n \nConnect with MSNBC Online \nVisit msnbc.com: http://on.msnbc.com/Readmsnbc\nSubscribe to the MSNBC Daily Newsletter: MSNBC.com/NewslettersYouTube\nFind MSNBC on Facebook: http://on.msnbc.c...",[Chris Hayes],2023-10-12 01:45:00+00:00,5787.0,149.0,,47.0,PT8M30S,hd,True,Thursday,510.0,1,25.747365,8.121652,79
3,UCupvZG-5ko_eiXAupbDfxWw,34DPns20HNY,MSNBC,'Republican dysfunction': Scalise faces GOP holdouts after speaker nomination,"‚ÄúThere‚Äôs no one consistent ask here. There‚Äôs no one thing Scalise needs to do. A lot of these members are beholden to a very conservative base that wants them to essentially do magic: to make a Democratic Senate and a Democratic president accept very conservative bills,"" says Sahil Kapur on the GOP lawmakers saying they won't back Steve Scalise for speaker.¬†\n\n¬ª Subscribe to MSNBC: http://on.msnbc.com/SubscribeTomsnbc\n\nFollow MSNBC Show Blogs \nMaddowBlog: https://www.msnbc.com/maddowblog\nReidOut Blog: https://www.msnbc.com/reidoutblog\n\nMSNBC delivers breaking news, in-depth analysis of politics headlines, as well as commentary and informed perspectives. Find video clips and segments from The Rachel Maddow Show, Morning Joe, The Beat with Ari Melber, Deadline: White House, The ReidOut, All In, Last Word, 11th Hour, and Alex Wagner who brings her breadth of reporting experience to MSNBC primetime. Watch ‚ÄúAlex Wagner Tonight‚Äù Tuesday through Friday at 9pm Eastern. \n \nConnect ...",[Chris Hayes],2023-10-12 01:30:01+00:00,43785.0,1226.0,,353.0,PT10M20S,hd,True,Thursday,620.0,1,28.000457,8.062122,77
4,UCupvZG-5ko_eiXAupbDfxWw,hYVAAY69mNI,MSNBC,'We have to have one single focus and that is going after Hamas' says Rep. Meeks,"Democratic Congressman Gregory Meeks of New York who is the Ranking Member of the Foreign Affairs Committee joins Nicolle Wallace on Deadline White House to discuss the U.S. response to the War in Israel.\n\n¬ª Subscribe to MSNBC: http://on.msnbc.com/SubscribeTomsnbc\n \nFollow MSNBC Show Blogs \nMaddowBlog: https://www.msnbc.com/maddowblog\nReidOut Blog: https://www.msnbc.com/reidoutblog\n\nMSNBC delivers breaking news, in-depth analysis of politics headlines, as well as commentary and informed perspectives. Find video clips and segments from The Rachel Maddow Show, Morning Joe, The Beat with Ari Melber, Deadline: White House, The ReidOut, All In, Last Word, 11th Hour, and Alex Wagner who brings her breadth of reporting experience to MSNBC primetime. Watch ‚ÄúAlex Wagner Tonight‚Äù Tuesday through Friday at 9pm Eastern. \n \nConnect with MSNBC Online \nVisit msnbc.com: http://on.msnbc.com/Readmsnbc\nSubscribe to the MSNBC Daily Newsletter: MSNBC.com/NewslettersYouTube\nFind MSNBC on Fa...",[Nicolle Wallace],2023-10-12 01:15:03+00:00,4681.0,127.0,,121.0,PT7M4S,hd,True,Thursday,424.0,1,27.130955,25.849178,80


In [75]:
print(video_df.columns)

video_df1 = video_df.drop(columns='tags')
print(video_df1.columns)

Index(['channel_id', 'video_id', 'channelTitle', 'title', 'description',
       'tags', 'publishedAt', 'viewCount', 'likeCount', 'favouriteCount',
       'commentCount', 'duration', 'definition', 'caption', 'pushblishDayName',
       'durationSecs', 'tagsCount', 'likeRatio', 'commentRatio', 'titleLength',
       'title_no_stopwords'],
      dtype='object')
Index(['channel_id', 'video_id', 'channelTitle', 'title', 'description',
       'publishedAt', 'viewCount', 'likeCount', 'favouriteCount',
       'commentCount', 'duration', 'definition', 'caption', 'pushblishDayName',
       'durationSecs', 'tagsCount', 'likeRatio', 'commentRatio', 'titleLength',
       'title_no_stopwords'],
      dtype='object')


### References/ Resources used:

[1] Youtube API. Avaiable at https://developers.google.com/youtube/v3

[2] Converting video durations to time function. https://stackoverflow.com/questions/15596753/how-do-i-get-video-durations-with-youtube-api-version-3

[3] P. Covington, J. Adams, E. Sargin. The youtube video recommendation system. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys '16, pages 191-198, New York, NY, USA, 2016. ACM.

In [216]:
print(captions_df.head())

       videoId                  lastUpdated trackKind language name  \
0  rLOsgFgGkZY   2023-10-12T06:08:29.39115Z       asr       en        
1  xRYL71j9g5M  2023-10-12T03:31:18.015066Z       asr       en        
2  8Yy6ffODUw8  2023-10-12T02:55:35.487699Z       asr       en        
3  vfhun9-1cJ4  2023-10-12T02:00:15.414994Z       asr       en        
4  eiBlgAAOcCg  2023-10-12T01:54:13.439551Z       asr       en        

  audioTrackType   status  
0        unknown  serving  
1        unknown  serving  
2        unknown  serving  
3        unknown  serving  
4        unknown  serving  


In [79]:
video_df1.head()
# Assuming 'publishedAt' is the column with time-zone information
video_df1['publishedAt'] = video_df1['publishedAt'].dt.tz_localize(None)


In [217]:
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas
import pandas as pd

# Snowflake connection parameters
snowflake_user = 'FURNITUREWALAABBAS'
snowflake_password = 'Abba$123'
snowflake_account = 'jrnvcvi-sw72415'
snowflake_database = 'YOUTUBE_LLM'
snowflake_schema = 'PUBLIC'
#snowflake_warehouse = 'your_warehouse'

# Create a Snowflake connection
conn = snowflake.connector.connect(
    user=snowflake_user,
    password=snowflake_password,
    account=snowflake_account,
    #warehouse=snowflake_warehouse,
    database=snowflake_database,
    schema=snowflake_schema
)

# Create a cursor object
cur = conn.cursor()

write_pandas(conn, captions_df, 'CAPTIONS', quote_identifiers= False)
# try:
#     cur.execute("SELECT * FROM dbo.tablea")
#     one_row =cur.fetchone()
#     print(one_row)
# finally:
#     cur.close()
# conn.close

#Create a Pandas DataFrame (replace this with your own DataFrame)
# data = {
#     'column1': [1, 2, 3],
#     'column2': ['A', 'B', 'C']
# }
# df = pd.DataFrame(data)

# Define the table name
table_name = 'CHANNELS'

# Create an internal stage (temporary storage for data)
# stage_name = 'STAGING'
# cur.execute(f'CREATE OR REPLACE STAGE {stage_name}')




# Upload the DataFrame to Snowflake stage
# csv_filename = 'video_data1.csv'
# video_df1.to_csv(csv_filename, index=False)
#cur.execute(r"PUT file:///C:\Users\furni\youtube-api-analysis\video_data1.csv @STAGING_TABLES AUTO_COMPRESS=TRUE")

# Copy data from the stage into a Snowflake table

    

# # Copy data from the stage into the Snowflake table
# csv_filepath = r'C:\\Users\\furni\\youtube-api-analysis\\video_data1.csv'
# copy_query = f'''COPY INTO VIDEOS FROM 'file://{csv_filepath}' FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)'''
# cur.execute(copy_query)

# Commit the changes
conn.commit()

# Close the cursor and connection
cur.close()
conn.close()

#print(f'DataFrame has been successfully uploaded as table: {table_name}')


In [None]:
# cur.execute("CREATE OR REPLACE TABLE YOUTUBE_LLM.PUBLIC.VIDEOS ( \
#     channel_id STRING, \
#     video_id STRING,\
#     channelTitle STRING,\
#     title STRING,\
#     description STRING,\
#     publishedAt TIMESTAMP_NTZ,\
#     viewCount FLOAT,\
#     likeCount FLOAT,\
#     favouriteCount FLOAT,\
#     commentCount FLOAT,\
#     duration STRING, \
#     definition STRING,\
#     caption BOOLEAN,\
#     pushblishDayName STRING,\
#     durationSecs FLOAT,\
#     tagsCount INTEGER,\
#     likeRatio FLOAT,\
#     commentRatio FLOAT,\
#     titleLength INTEGER)" ) 

In [175]:
import os
current_directory = os.getcwd() 
print(current_directory)

c:\Users\furni\youtube-api-analysis
