# YouTube LLM Project
https://medium.com/@rodolfo.antonio.sep/extracting-youtube-comments-with-python-a-detailed-guide-105363507a93

In [1]:
from googleapiclient.discovery import build
import pandas as pd

API_KEY = 'YOUR_YOUTUBE_API_KEY'  # placeholder: supply your own key and keep it out of version control

# Initialize the connection to the YouTube Data API v3
youtube = build('youtube', 'v3', developerKey = API_KEY)

### Code explanation

* `youtube.commentThreads()`: Prepares a request for YouTube comment threads. Each thread contains the top-level comment together with the replies to it.

* `part`: Selects which parts of the resource are included in the response. This keeps the transferred data small and returns only the information you need.
    * `snippet`: Includes the main metadata, such as the text of the top-level comment, its author, and its publication date and time.
    * `replies`: Includes the replies attached to the top-level comment.
* `textFormat`: Specifies the format in which the comment text is returned.
    * `plainText`: Returns the comment text as plain text (no HTML tags).
    * `html`: Returns the comment text as HTML.
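
To make the shape of these fields concrete, here is a minimal sketch that reads them from a single comment thread. The `sample_item` dict is hand-written, hypothetical data that imitates a thread requested with `part="snippet,replies"`, and `extract_thread` is an illustrative helper, not part of the Google client library.

```python
# Hypothetical, hand-written item imitating one comment thread
# requested with part="snippet,replies" (trimmed to the fields used here).
sample_item = {
    "snippet": {
        "totalReplyCount": 1,
        "topLevelComment": {
            "snippet": {
                "textDisplay": "Great video!",
                "authorDisplayName": "@viewer1",
                "publishedAt": "2025-04-15T22:28:13Z",
            }
        },
    },
    "replies": {
        "comments": [
            {
                "snippet": {
                    "textDisplay": "Agreed.",
                    "authorDisplayName": "@viewer2",
                    "publishedAt": "2025-04-15T23:00:00Z",
                }
            }
        ]
    },
}


def extract_thread(item):
    """Return the top-level comment fields and the reply texts of one thread."""
    top = item.get("snippet", {}).get("topLevelComment", {}).get("snippet", {})
    # The "replies" key is absent when a thread has no replies.
    replies = item.get("replies", {}).get("comments", [])
    return {
        "comment": top.get("textDisplay", "N/A"),
        "user_name": top.get("authorDisplayName", "N/A"),
        "date": top.get("publishedAt", "N/A"),
        "reply_count": item.get("snippet", {}).get("totalReplyCount", 0),
        "reply_texts": [r.get("snippet", {}).get("textDisplay", "") for r in replies],
    }


thread = extract_thread(sample_item)
print(thread["comment"], thread["reply_count"], thread["reply_texts"])
```

The chained `.get(..., {})` calls mean that a thread missing any of these keys yields default values instead of raising a `KeyError`.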

In [2]:
def Scrapping(id_video):
    
    # url: https://www.youtube.com/watch?v=iVSUVmxDcls
    VIDEO_ID = id_video

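    # Build the first request; it is executed inside the loop below,
    # so executing it here as well would fetch the first page twice.
    request = youtube.commentThreads().list(part = "snippet,replies", videoId = VIDEO_ID, textFormat = 'plainText')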

    lst_comment = []
    lst_username = []
    lst_date = []
    lst_reply = []

    while request:
        try:
            response = request.execute()
            
            for item in response['items']:
                # Extract the top-level comment text (falls back to 'N/A' if missing)
                comment_data = item.get('snippet', {}).get('topLevelComment', {}).get('snippet', {})
                comment = comment_data.get('textDisplay', 'N/A')
                lst_comment.append(comment)
                
                # Extract the author's display name
                user_name = comment_data.get('authorDisplayName', 'N/A')
                lst_username.append(user_name)
                
                # Extract the publication date
                date = comment_data.get('publishedAt', 'N/A')
                lst_date.append(date)
                
                # Count the replies to this thread
                reply_count = item['snippet'].get('totalReplyCount', 0)
                lst_reply.append(reply_count)
            
            # Pagination: list_next returns None once there are no more pages
            request = youtube.commentThreads().list_next(request, response)
        
        except Exception as e:
            print(f"Error: {e}")
            break


    df = pd.DataFrame({ "comment": lst_comment,
                        "replies": lst_reply,
                        "user_name": lst_username,
                        "date": lst_date})

    return df
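
The pagination loop in `Scrapping` can also be written as a small generator, which makes it easy to cap the number of pages while experimenting. The sketch below assumes only that the resource exposes `list_next(request, response)` and that requests expose `execute()`, as the Google API client does. `iter_pages`, `FakeRequest`, and `FakeResource` are hypothetical names, and the two fake classes are offline stand-ins so the example runs without network access.

```python
def iter_pages(resource, first_request, max_pages=None):
    """Yield successive response pages until list_next returns None."""
    request, seen = first_request, 0
    while request is not None and (max_pages is None or seen < max_pages):
        response = request.execute()
        yield response
        seen += 1
        # Returns None when the response carries no further page.
        request = resource.list_next(request, response)


# --- Offline stand-ins (hypothetical) used only to demonstrate the loop ---
class FakeRequest:
    def __init__(self, page):
        self.page = page

    def execute(self):
        return {"items": [f"comment-{self.page}"]}


class FakeResource:
    def __init__(self, n_pages):
        self.n_pages = n_pages

    def list_next(self, request, response):
        nxt = request.page + 1
        return FakeRequest(nxt) if nxt < self.n_pages else None


all_pages = list(iter_pages(FakeResource(3), FakeRequest(0)))
capped = list(iter_pages(FakeResource(3), FakeRequest(0), max_pages=2))
print(len(all_pages), len(capped))
```

With the real client, `resource` would be `youtube.commentThreads()` and `first_request` the object returned by its `.list(...)` call.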

In [3]:
from urllib.parse import urlparse, parse_qs

url = "https://www.youtube.com/watch?v=-S2swP2MEgc&t=2451s"
query = urlparse(url).query
video_id = parse_qs(query)['v'][0]

# Fetch the top-level comments for the video ID parsed above
final_data = Scrapping(id_video = video_id)
final_data.head()

Unnamed: 0,comment,replies,user_name,date
0,J ai mis l application et finalement la versio...,0,@celineguillabert4074,2025-04-15T22:28:13Z
1,Mounir le boss! Tellement clair et limpide qu'...,0,@korros51dnr22,2025-04-15T06:46:26Z
2,"Bonne vidéo! Eh bien, je suis tellement heureu...",13,@MaximeRichard-m9b,2025-04-14T15:06:36Z
3,"Bonjour \nDe quel abonnement, il parle ? 40 % ...",0,@L.ame.et.son.reflet,2025-04-14T13:02:16Z
4,"Donc business intéressant,\nPar contre flat ta...",0,@teddytc2671,2025-04-14T09:20:03Z
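
The `parse_qs` lookup in the cell above works for standard `watch?v=...` URLs. As a sketch of how other common link shapes could be handled, the hypothetical helper below also accepts `youtu.be` short links and `/shorts/`, `/embed/`, and `/live/` paths. The list of hosts and path prefixes is an assumption and is not exhaustive.

```python
from urllib.parse import urlparse, parse_qs

# Assumed set of hostnames; extend as needed.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Best-effort extraction of a video ID; returns None if nothing matches."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short link: the ID is the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
            return parts[1]
    return None


print(extract_video_id("https://www.youtube.com/watch?v=-S2swP2MEgc&t=2451s"))
print(extract_video_id("https://youtu.be/iVSUVmxDcls"))
```

Returning `None` instead of raising lets the caller decide how to report an unrecognized link.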


In [12]:
from dotenv import load_dotenv
from openai import OpenAI
import os

import warnings
warnings.filterwarnings("ignore")
from urllib3.exceptions import NotOpenSSLWarning
warnings.filterwarnings("ignore", category=NotOpenSSLWarning)

load_dotenv()
api_key = os.getenv("OPENAI_KEY")
if api_key is None:
    raise ValueError("OPENAI_KEY is not set")
else:
    print('API Key found.')
    openai = OpenAI(api_key = api_key)
    print('API Created.') 

API Key found.
API Created.


In [13]:
all_comments = list(final_data.comment)

messages = [{
    "role": "system",
    "content": (
        "You are an expert language model specialized in analyzing YouTube comments. "
        "Your task is to read all the comments and produce a concise and insightful summary that reflects the general audience reaction. "
        "You must identify the main sentiments (positive or negative), recurring praises or criticisms, frequent themes, and the overall tone of the discussion. "
        "Include any relevant observations, such as common questions, moments appreciated, or points of confusion. "
        "The summary should be written in a natural and professional tone, as if reporting the audience’s feedback to the video creator. "
        "Respond with a short paragraph only. Do not list comments individually or classify them. Focus on delivering an overview."
    )
}]


all_comments_text = "\n".join(all_comments[:100])  # Cap at 100 comments to avoid exceeding the model's context limit

# Append the user message to messages
messages.append({
    "role": "user",
    "content": all_comments_text
})
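
A fixed cap of 100 comments does not bound the prompt size, because comment lengths vary widely. One possible alternative, sketched below, is to join comments until a character budget is reached. `join_within_budget` is a hypothetical helper, the default `max_chars` value is an arbitrary assumption, and character count is only a rough proxy for the model's token count.

```python
def join_within_budget(comments, max_chars=12000):
    """Join comments with newlines, stopping before max_chars is exceeded."""
    kept, used = [], 0
    for text in comments:
        text = str(text).strip()
        if not text:
            continue  # skip empty comments
        cost = len(text) + (1 if kept else 0)  # +1 for the newline separator
        if used + cost > max_chars:
            break
        kept.append(text)
        used += cost
    return "\n".join(kept)


demo = join_within_budget(["aaa", "bbb", "ccc"], max_chars=7)
print(repr(demo))
```

In the notebook, the result of `join_within_budget(all_comments)` could be used in place of `"\n".join(all_comments[:100])`.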

In [None]:
response = openai.chat.completions.create(
    model = "gpt-3.5-turbo", 
    messages = messages,
    temperature = 0.7,
    max_tokens = 300  
)

summary = response.choices[0].message.content
print(summary)

The audience's reactions to the video are mixed. While some viewers praise Mounir for his clear insights and motivational content, others express disappointment in the functionality and pricing of the Finary app. There is a division regarding the perceived value of the app's free version and the effectiveness of its subscription model. Some viewers appreciate the financial advice shared, while others question the practicality of the 50/30/20 rule and express concerns about data security and the app's true benefits. Additionally, there are comments reflecting on political figures, societal issues, and the challenges of financial planning for the average individual. Overall, the audience appreciates the educational content but raises valid concerns about the app's utility and pricing structure.
