
## Fetching YouTube Comments to Analyze Sentiment via OpenAI APIs

>**Objective:**
Use the OpenAI API to analyze the sentiment of comments on YouTube videos

>**Use Case:**  Gaming advertisers often need to gauge user sentiment toward a new IP collab or a newly introduced feature. This project was initially built to scrape all the YouTube comments from videos introducing such features and to understand how the user base reacts to the new features, IPs, storylines, etc.

>**Generalization:**  Any advertiser or organisation could use this to gauge user sentiment on a particular issue, given the right videos to analyse.

>### Install and Import Dependencies

In [3]:
!pip install google-api-python-client
!pip install reportlab

Collecting reportlab
  Downloading reportlab-4.4.5-py3-none-any.whl.metadata (1.7 kB)
Downloading reportlab-4.4.5-py3-none-any.whl (2.0 MB)
Installing collected packages: reportlab
Successfully installed reportlab-4.4.5


In [4]:
import random
from datetime import date
from googleapiclient.discovery import build

> ### Fetch YouTube Comment Threads and Top 5 Replies per Thread

>  ### Requirements to Get Comments
> 1. The video must be public, like the [video used here](https://www.youtube.com/watch?v=TReelsVxWxg)
> 2. A GCP API key for the YouTube Data API, so that YouTube knows which account is requesting the information
> 3. The video ID, e.g., ...youtube.com/watch?v=**TReelsVxWxg**
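
The video ID in requirement 3 can also be pulled out of a full URL in code. The following is a minimal sketch, assuming only the two common URL shapes (`youtube.com/watch?v=...` and `youtu.be/...`); `extract_video_id` is a hypothetical helper and is not part of the original notebook.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None

    # Standard watch links: https://www.youtube.com/watch?v=<id>
    if host == "youtube.com" or host.endswith(".youtube.com"):
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    # Anything else is treated as "not a YouTube URL" in this sketch
    return None


print(extract_video_id("https://www.youtube.com/watch?v=TReelsVxWxg"))  # TReelsVxWxg
```

Other URL forms (embeds, shorts, playlists) are not handled here and would return `None`.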

In [5]:
API_KEY = "xxxx...xxxx"  # placeholder; the real key is kept in the Colab notebook
VIDEO_ID = "TReelsVxWxg"

youtube = build("youtube", "v3", developerKey=API_KEY)

# Fetch comments
def get_comments_with_replies(video_id):
    all_comments = []

    # request top-level comments and replies together
    request = youtube.commentThreads().list(
        part="snippet,replies",
        videoId=video_id,
        maxResults=100,
        textFormat="plainText"
    )

    while request:
        response = request.execute()

        for item in response.get("items", []):
            # ----- Top comments -----
            top = item["snippet"]["topLevelComment"]["snippet"]
            top_author = top["authorDisplayName"]
            top_text = top["textDisplay"]

            all_comments.append({
                "type": "Top comment",
                "author": top_author,
                "comment": top_text
            })

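            # ----- Reply comments -----
            # commentThreads().list embeds only a limited subset of replies per thread
            # (5 by default); comments().list(parentId=...) is needed to page through the rest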
            if "replies" in item:
                for reply in item["replies"]["comments"]:
                    rep = reply["snippet"]
                    rep_author = rep["authorDisplayName"]
                    rep_text = rep["textDisplay"]

                    all_comments.append({
                        "type": "Reply comment",
                        "author": rep_author,
                        "comment": rep_text
                    })


        request = youtube.commentThreads().list_next(request, response)

    return all_comments

# Fetch Title (for Final Report)
def get_video_title(video_id):
    request = youtube.videos().list(
        part="snippet",
        id=video_id
    )
    response = request.execute()

    if not response["items"]:
        return None

    title = response["items"][0]["snippet"]["title"]
    return title
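
Since `commentThreads().list` embeds only a subset of replies, threads with many replies are truncated. One possible way to collect every reply is to page through `comments().list` with `parentId`. The sketch below assumes the same `youtube` client object built above is passed in; `get_all_replies` is a hypothetical helper, not part of the original notebook, and every extra call consumes API quota.

```python
def get_all_replies(youtube, parent_id):
    """Collect every reply to one top-level comment by paging comments().list."""
    replies = []
    request = youtube.comments().list(
        part="snippet",
        parentId=parent_id,
        maxResults=100,
        textFormat="plainText",
    )

    while request:
        response = request.execute()

        for reply in response.get("items", []):
            snippet = reply["snippet"]
            replies.append({
                "type": "Reply comment",
                "author": snippet["authorDisplayName"],
                "comment": snippet["textDisplay"],
            })

        # list_next returns None once there are no further pages
        request = youtube.comments().list_next(request, response)

    return replies
```

The parent ID for a thread would be `item["snippet"]["topLevelComment"]["id"]` inside the loop of `get_comments_with_replies`.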


In [7]:
comments = get_comments_with_replies(VIDEO_ID)

print(f"Video: {get_video_title(VIDEO_ID)} \n")

for item in random.sample(comments, min(5, len(comments))):  # avoid ValueError on videos with fewer than 5 comments
    print(f"[{item['type']}] {item['author']}: {item['comment']}\n")

print("Total comments + replies:", len(comments))

Video: Machine Intelligence - Lecture 17 (Fuzzy Logic, Fuzzy Inference) 

[Top comment] @rudolfibekwe5702: This lecture is so educative. i really appreciate this. Please i need more video on fuzzy logic. I am currently working on a project and i require fuzzy logic to carry out the project.

[Top comment] @haseebchauhan8327: thanks professor....lots of love from pakistan

[Top comment] @shutokugun: Best lecture of fuzzy logic. Truly mind blowing.

[Top comment] @simonb.979: Perfect lecturer. Thank you!

[Top comment] @vincentalmero8660: Thank you for the awesome lecture!

Total comments + replies: 56


>  ### Requirements to Use OpenAI APIs to Analyze Sentiment
> 1. An OpenAI API key, like "sk-xxxxx"
> 2. API access is not free. If your [Usage](https://platform.openai.com/account/usage) shows $0.00 (it usually does unless you have agreed to pay), you cannot use the API without entering your card details

In [8]:
from openai import OpenAI
client = OpenAI(api_key="sk-xxxxxx")

def sentiment_report(comments):
    prompt = f"""
    Create a sentiment analysis report for the following list of comments.


    Then give:
    - Overall sentiment summary
    - Percentage breakdown
    - Key concepts/ideas

    Comments:
    {comments}
    """
    try:
        response = client.chat.completions.create(
            model="gpt-4.1",
            messages=[{"role": "user", "content": prompt}]
        )

        return response.choices[0].message.content, prompt
    except Exception as e:
        # Report the failure and return -1 as the error sentinel
        print(type(e).__name__, "-", e)
        return -1, prompt


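# Keep only the comment text; a new name makes the cell safe to re-run
comment_texts = [item['comment'] for item in comments]


# Sample at most 10 comments (fewer if the video has fewer)
report, prompt = sentiment_report(random.sample(comment_texts, min(10, len(comment_texts))))
if report != -1:
    print(report)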

RateLimitError - Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
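
The prompt in `sentiment_report` interpolates the Python list directly, so the model sees its `repr` (brackets, quotes, escaped newlines). A possible alternative is to number the comments and cap the total length before building the prompt. This is only a sketch: `format_comments_for_prompt` and the `max_chars` budget are assumptions for illustration, not part of the original notebook.

```python
def format_comments_for_prompt(comments, max_chars=4000):
    """Render comments as a numbered list, stopping before max_chars is exceeded."""
    lines = []
    used = 0
    for i, text in enumerate(comments, start=1):
        # Collapse internal whitespace so each comment stays on one line
        line = f"{i}. {' '.join(text.split())}"
        if used + len(line) + 1 > max_chars:
            break
        lines.append(line)
        used += len(line) + 1  # +1 for the joining newline
    return "\n".join(lines)


sample = ["Best lecture of fuzzy logic.", "Thank you!\nVery clear."]
print(format_comments_for_prompt(sample))
# 1. Best lecture of fuzzy logic.
# 2. Thank you! Very clear.
```

The returned string could then replace `{comments}` in the prompt template.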


> ### Since the comments cannot be analyzed programmatically without paying, I asked ChatGPT to create a mock API-style JSON response containing the sentiment report it would otherwise have returned for a sample of 10 comments

In [9]:
response_mock = {
  "status": "success",
  "data": {
    "summary": {
      "overall_sentiment": "strongly_positive",
      "description": "Most comments express gratitude, praise the lecture, and appreciate the clarity of the explanation. A few comments are neutral technical clarifications, and none are negative."
    },
    "sentiment_breakdown": {
      "positive": {
        "count": 8,
        "percentage": 80
      },
      "neutral": {
        "count": 2,
        "percentage": 20
      },
      "negative": {
        "count": 0,
        "percentage": 0
      }
    },
    "key_concepts": [
      {
        "theme": "Teaching Quality",
        "details": [
          "Best lecture",
          "mind blowing",
          "brilliant explanation",
          "very good lecture"
        ]
      },
      {
        "theme": "Gratitude",
        "details": [
          "Thank you very much",
          "Thank you so much",
          "Thanks for sharing"
        ]
      },
      {
        "theme": "Educational Value",
        "details": [
          "Great intro to fuzzy logic",
          "Helped users understand fuzzy logic better"
        ]
      },
      {
        "theme": "Technical Clarification",
        "details": [
          "Notes on inverted pendulum direction",
          "Shared related YouTube link"
        ]
      }
    ],
    "metadata": {
      "total_comments": 10,
      "report_generated": "mock-api-v1"
    }
  }
}
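
Because the numbers in a mock (or model-generated) response are written out rather than computed, it may be worth checking that they agree with each other before building a report from them. The following is a minimal sketch, assuming the same key layout as `response_mock`; `check_breakdown` is a hypothetical helper and the small `example` payload is illustrative only.

```python
def check_breakdown(payload):
    """Return a list of inconsistencies found in the sentiment breakdown."""
    problems = []
    breakdown = payload["data"]["sentiment_breakdown"]
    total = payload["data"]["metadata"]["total_comments"]

    # The per-label counts should add up to the number of comments analyzed
    count_sum = sum(part["count"] for part in breakdown.values())
    if count_sum != total:
        problems.append(f"counts sum to {count_sum}, expected {total}")

    # Each stated percentage should match the one implied by its count
    for label, part in breakdown.items():
        expected = round(100 * part["count"] / total) if total else 0
        if part["percentage"] != expected:
            problems.append(f"{label}: {part['percentage']}% but counts imply {expected}%")

    return problems


example = {
    "data": {
        "sentiment_breakdown": {
            "positive": {"count": 8, "percentage": 80},
            "neutral": {"count": 2, "percentage": 20},
            "negative": {"count": 0, "percentage": 0},
        },
        "metadata": {"total_comments": 10},
    }
}
print(check_breakdown(example))  # []
```

An empty list means the counts and percentages are mutually consistent; it says nothing about whether the sentiment labels themselves are right.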


In [14]:
key_themes = [concept["theme"] for concept in response_mock["data"]["key_concepts"]]

report_string = f"""
<b>Sentiment Report on {date.today()} for Video:</b>
<font color="blue">{get_video_title(VIDEO_ID)}</font>
<br/><br/>

<b>Summary:</b>
<ul>
  <li><b>Overall sentiment:</b> {response_mock["data"]["summary"]["overall_sentiment"]}</li>
  <li><b>Description:</b> {response_mock["data"]["summary"]["description"]}</li>
  <li><b>Percentage breakdown:</b> Positive: {response_mock["data"]["sentiment_breakdown"]["positive"]["percentage"]} %,
      Neutral: {response_mock["data"]["sentiment_breakdown"]["neutral"]["percentage"]} %,
      Negative: {response_mock["data"]["sentiment_breakdown"]["negative"]["percentage"]} %</li>
  <li><b>Key concepts/ideas:</b> {", ".join(key_themes)}</li>
</ul>

<b>Total comments Analyzed:</b> {response_mock["data"]["metadata"]["total_comments"]}
"""

if response_mock["status"] == "success":
   print(report_string)
else:
   print("Error")


<b>Sentiment Report on 2025-11-20 for Video:</b>
<font color="blue">Machine Intelligence - Lecture 17 (Fuzzy Logic, Fuzzy Inference)</font>
<br/><br/>

<b>Summary:</b>
<ul>
  <li><b>Overall sentiment:</b> strongly_positive</li>
  <li><b>Description:</b> Most comments express gratitude, praise the lecture, and appreciate the clarity of the explanation. A few comments are neutral technical clarifications, and none are negative.</li>
  <li><b>Percentage breakdown:</b> Positive: 80 %, 
      Neutral: 20 %, 
      Negative: 0 %</li>
  <li><b>Key concepts/ideas:</b> Teaching Quality, Gratitude, Educational Value, Technical Clarification</li>
</ul>

<b>Total comments Analyzed:</b> 10



>  ### Create a (basic) Sentiment Report PDF from the mock response

In [15]:
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.enums import TA_LEFT

report_name = f"Report_{date.today()}.pdf"


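def pdf_raw(text, filename="Report.pdf"):
    """Render the HTML-like report string into a single-paragraph PDF."""
    doc = SimpleDocTemplate(filename)

    styles = getSampleStyleSheet()
    style = styles["Normal"]
    style.alignment = TA_LEFT

    # Paragraph ignores plain newlines, so turn them into explicit line breaks
    html = text.replace("\n", "<br/>")
    story = [Paragraph(html, style)]
    # Alternative (needs Preformatted imported from reportlab.platypus):
    #story = [Preformatted(text, style, maxLineLength=70)]

    doc.build(story)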

pdf_raw(report_string, report_name)
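
`Paragraph` interprets its input as XML-like markup, so a video title or description containing `&`, `<`, or `>` may be misread as markup rather than text. One way to guard against that is to escape dynamic values before interpolating them into `report_string`. This is a sketch using the standard library only; `safe_markup` and the sample title are hypothetical and not part of the original notebook.

```python
from xml.sax.saxutils import escape


def safe_markup(value):
    """Escape &, < and > so the value is treated as text, not markup."""
    return escape(str(value))


title = "Tom & Jerry <Official>"
print(f'<font color="blue">{safe_markup(title)}</font>')
# <font color="blue">Tom &amp; Jerry &lt;Official&gt;</font>
```

Only the interpolated values should be escaped; the report's own tags such as `<b>` and `<font>` must stay unescaped to keep their formatting.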