# YouTube Comments Scraper for Darija Analysis

## 1. Setup and Configuration
In this section, we:
- Import required libraries (pandas, googleapiclient, dotenv)
- Load environment variables for API authentication


In [1]:
import os
from dotenv import load_dotenv
from googleapiclient.discovery import build
import pandas as pd
load_dotenv()


True
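
The `True` above only indicates that `load_dotenv()` loaded something from a `.env` file; it does not confirm that the `YTKEY` variable used later is actually present. A minimal sketch of an early check, assuming a hypothetical helper named `require_env` (not part of the original notebook):

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear message.

    `require_env` is a hypothetical helper introduced for illustration.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value


# Possible usage right after load_dotenv():
# API_KEY = require_env("YTKEY")
```

Failing here gives a clearer message than the bare `KeyError` that `os.environ['YTKEY']` would raise inside the scraping function.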

## 2. Data Collection Functions
Here we define:
- A function that fetches a single page (at most 100 threads) of top-level comments for one YouTube video and returns them as a DataFrame


In [2]:
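def get_top_comments(video_link):
    """Return one page of top-level comments for a video as a one-column DataFrame."""
    API_KEY = os.environ['YTKEY']  # raises KeyError if YTKEY is not set
    # Only relevant to OAuth flows; kept from the original, although an API key is used here.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
    # Take the value of the "v" query parameter and drop anything after it (e.g. "&t=525s").
    VIDEO_ID = video_link.split("v=")[1].split("&")[0]

    youtube = build("youtube", "v3", developerKey=API_KEY)

    # A single request returns at most 100 comment threads; no further pages are fetched.
    request = youtube.commentThreads().list(
        part="snippet",
        videoId=VIDEO_ID,
        maxResults=100,
    )
    response = request.execute()

    # NOTE: the [1:] slice skips the first returned thread, so a full page yields 99 comments.
    comments = [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response["items"][1:]
    ]
    print(len(comments), "newest comments fetched")

    return pd.DataFrame(comments, columns=["text"])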
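
The function above reads only the first page of results. If more comments per video are needed, the response's `nextPageToken` can be passed back as `pageToken` on the next request. The following is a rough sketch of that loop, not the notebook's method: `fetch_all_top_level_comments` and its `max_comments` cap are hypothetical names, and the `youtube` argument is assumed to be a client built with `build("youtube", "v3", developerKey=...)`.

```python
def fetch_all_top_level_comments(youtube, video_id, max_comments=500):
    """Collect top-level comment texts across pages, up to max_comments.

    Hypothetical helper for illustration; `youtube` is assumed to be a
    YouTube Data API v3 client object.
    """
    comments = []
    page_token = None
    while len(comments) < max_comments:
        params = {"part": "snippet", "videoId": video_id, "maxResults": 100}
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()

        for item in response.get("items", []):
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            comments.append(snippet["textDisplay"])

        # A missing nextPageToken means the last page has been reached.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return comments[:max_comments]
```

Passing the client in as an argument keeps the loop independent of how the API key is loaded, and the cap avoids issuing an unbounded number of requests for videos with very long comment sections.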



## 3. Video Selection and Data Gathering
In this section, we:
- Define target YouTube video URLs
- Iterate through videos to collect comments


In [3]:
URLS = [
    "https://www.youtube.com/watch?v=FLKGQf2-118",
    "https://www.youtube.com/watch?v=niqcXL155nQ",
    "https://www.youtube.com/watch?v=wFGye7urHrc",
    "https://www.youtube.com/watch?v=I70BKjNYlV8",
    "https://www.youtube.com/watch?v=ZIg8dFt6Nwo",
    "https://www.youtube.com/watch?v=t19PPzs6YLs",
    "https://www.youtube.com/watch?v=B2N_1mgiP_M",
    "https://www.youtube.com/watch?v=zvyapr04A1Y&t=525s",
]
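
The last URL carries an extra `t=525s` parameter, which the string splitting in `get_top_comments` happens to handle. A slightly more defensive way to pull out the video ID is to parse the query string with the standard library. This is only a sketch; `extract_video_id` is a hypothetical helper, and the `youtu.be` short-link branch is an assumption about inputs the notebook does not currently use.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the video ID from a YouTube watch URL or youtu.be short link.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links keep the ID in the path: https://youtu.be/<id>
        video_id = parsed.path.lstrip("/").split("/")[0]
    else:
        # Watch URLs keep the ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    if not video_id:
        raise ValueError(f"No video ID found in URL: {url}")
    return video_id


print(extract_video_id("https://www.youtube.com/watch?v=zvyapr04A1Y&t=525s"))
```

Unlike `split("v=")`, this raises a descriptive `ValueError` instead of an `IndexError` when a URL has no `v` parameter.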

In [4]:
df = pd.DataFrame([], columns=["text"])
for url in URLS:
    df = pd.concat([df, get_top_comments(url)], ignore_index=True)

99 newest comments fetched
99 newest comments fetched
99 newest comments fetched
99 newest comments fetched
99 newest comments fetched
99 newest comments fetched
99 newest comments fetched
99 newest comments fetched


In [5]:
df.shape

(792, 1)

## 4. Data Export and Storage
Here we:
- Save the collected comments to a CSV file, without the index column
- Keep the raw text unchanged so that cleaning can happen in a later preprocessing step

In [6]:
df.to_csv("scraped_comments.csv", index=False)
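
Because the comments mix Arabic script, Latin-script Darija, and emoji, it can be worth confirming that the text survives a write and read cycle unchanged. The sketch below is an illustrative check on made-up sample rows, not on the scraped data; `csv_round_trip_ok` and the temporary file name are hypothetical.

```python
import os
import tempfile

import pandas as pd


def csv_round_trip_ok(frame, path):
    """Write the frame to CSV, read it back, and compare the "text" column.

    Hypothetical helper for illustration only.
    """
    frame.to_csv(path, index=False, encoding="utf-8")
    # dtype=str and keep_default_na=False stop pandas from turning
    # comments such as "123" or "NA" into numbers or missing values.
    restored = pd.read_csv(path, encoding="utf-8", dtype=str, keep_default_na=False)
    return restored["text"].tolist() == frame["text"].tolist()


# Made-up sample rows, not taken from the scraped comments.
sample = pd.DataFrame({"text": ["واش كاين شي جديد؟", "wach kayn chi jdid?", "NA", "🔥🔥"]})
with tempfile.TemporaryDirectory() as tmp:
    round_trip_ok = csv_round_trip_ok(sample, os.path.join(tmp, "sample_comments.csv"))
print(round_trip_ok)
```

Reading the file back with `dtype=str` and `keep_default_na=False` is a design choice for text data: without them, short comments that look like numbers or like `NA` would not come back as the strings that were written.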