# ESG Advertising Analysis: Data Collection

## Overview

This project analyzes 90 marketing campaigns (65 ESG, 25 non-ESG) from 63 publicly
traded companies (2010-2024) to measure:
- Consumer sentiment via social media (YouTube, Twitter)
- Financial market reactions via event study analysis

Each advertisement has its own dataset. For YouTube, we export the comments of each advertisement to a CSV file and then process it.

Data Sources:
- YouTube API: 49 campaigns 
- Twitter API: 22 campaigns 
- Yahoo Finance: Stock price data


## YouTube Data Collection
We use the YouTube Data API v3, through Google's Python API client, to collect the comments on ESG advertising campaign videos.



In [1]:
from googleapiclient.discovery import build
import pandas as pd
import time

# YouTube API key and video ID
API_KEY = 'private'  # replace with your own API key
VIDEO_ID = 'Put the ID of the YouTube video you want to download'

# Create the YouTube API client
youtube = build('youtube', 'v3', developerKey=API_KEY)

# Collect all top-level comments of a video
def get_top_level_comments(video_id):
    top_comments = []
    comment_ids = []

    request = youtube.commentThreads().list(
        part='snippet',
        videoId=video_id,
        maxResults=100,
        textFormat='plainText'
    )
    # list_next() returns None once there are no more pages, which ends the loop
    while request is not None:
        response = request.execute()
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']
            comment_id = item['id']

            top_comments.append({
                'comment_id': comment_id,
                'text': comment.get('textDisplay'),
                'author': comment.get('authorDisplayName'),
                'likes': comment.get('likeCount', 0),
                'published_at': comment.get('publishedAt'),
                'is_reply': False,
                'parent_id': None,
                'parent_author': None
            })
            comment_ids.append((comment_id, comment.get('authorDisplayName')))

        request = youtube.commentThreads().list_next(request, response)

    return top_comments, comment_ids

# Optional: collect the replies to a top-level comment as well
def get_replies_for_comment(comment_id, parent_author):
    replies = []
    request = youtube.comments().list(
        part='snippet',
        parentId=comment_id,
        maxResults=100,
        textFormat='plainText'
    )
    # list_next() returns None once there are no more pages, which ends the loop
    while request is not None:
        response = request.execute()
        for item in response['items']:
            snippet = item['snippet']
            replies.append({
                'comment_id': item['id'],
                'text': snippet.get('textDisplay'),
                'author': snippet.get('authorDisplayName'),
                'likes': snippet.get('likeCount', 0),
                'published_at': snippet.get('publishedAt'),
                'is_reply': True,
                'parent_id': comment_id,
                'parent_author': parent_author
            })

        request = youtube.comments().list_next(request, response)

    return replies

# Run: collect top-level comments first, then their replies
all_comments = []
top_level, top_ids = get_top_level_comments(VIDEO_ID)
all_comments.extend(top_level)

print(f"Top-level comments: {len(top_level)}")

# Fetch the replies for each top-level comment
reply_total = 0
for comment_id, author in top_ids:
    replies = get_replies_for_comment(comment_id, author)
    all_comments.extend(replies)
    reply_total += len(replies)
    time.sleep(0.1)  # short pause between requests

print(f"Replies collected: {reply_total}")
print(f"Total comments: {len(all_comments)}")

# Save all comments to a CSV file
df = pd.DataFrame(all_comments)

# Output location (adjust to your system)
output_path = '/Users/ourname/Desktop/name.csv'  # Mac
# output_path = r'C:\Users\ourname\Desktop\name.csv'  # Windows

df.to_csv(output_path, index=False)

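Note: if the import fails with `ModuleNotFoundError: No module named 'googleapiclient'`, the Google API client is not installed. Install it with `pip install google-api-python-client` and rerun the cell.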
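The `VIDEO_ID` placeholder expects the video identifier rather than the full link. A minimal sketch of pulling that identifier out of a URL, assuming only the common `watch?v=`, `youtu.be/`, `/embed/` and `/shorts/` forms; the helper name `extract_video_id` and the example URLs are hypothetical and not part of the original notebook:

```python
from urllib.parse import urlparse, parse_qs

# Hosts handled by this sketch; other hosts return None.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}

def extract_video_id(url):
    """Return the video ID from a common YouTube URL form, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None

print(extract_video_id("https://www.youtube.com/watch?v=VIDEO_ID_HERE&t=10s"))
print(extract_video_id("https://youtu.be/VIDEO_ID_HERE"))
```

The returned string can then be assigned to `VIDEO_ID` before running the collection cell.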


## Dataset Overview

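**Structure** (one row per comment):
- `comment_id`: Unique comment identifier
- `text`: Comment content
- `author`: Username
- `likes`: Like count (engagement metric)
- `published_at`: Publication timestamp
- `is_reply`: `True` for replies, `False` for top-level comments
- `parent_id`: ID of the parent comment, used for threading (empty for top-level comments)
- `parent_author`: Author of the parent comment (empty for top-level comments)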
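
Because each reply stores its `parent_id`, the flat table can be regrouped into threads. A minimal sketch, assuming the rows are plain dictionaries with the columns listed above (for example from `df.to_dict('records')`); the helper name `group_replies` and the sample rows are illustrative only, not real data:

```python
from collections import defaultdict

def group_replies(rows):
    """Map each top-level comment_id to the list of its reply rows."""
    threads = defaultdict(list)
    for row in rows:
        if row["is_reply"]:
            threads[row["parent_id"]].append(row)
    return dict(threads)

# Illustrative rows only, using a subset of the dataset columns
rows = [
    {"comment_id": "a1", "text": "Great ad", "is_reply": False, "parent_id": None},
    {"comment_id": "a1.r1", "text": "Agreed", "is_reply": True, "parent_id": "a1"},
    {"comment_id": "b2", "text": "Not convinced", "is_reply": False, "parent_id": None},
]

threads = group_replies(rows)
for parent_id, replies in threads.items():
    print(parent_id, len(replies))
```

Top-level comments without replies do not appear as keys, so `threads.get(comment_id, [])` is a safe way to look them up.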


- We repeated this collection for all 90 campaigns.
- We then merged the comments with the campaign metadata (company, ESG theme, date).
- The next step is sentiment analysis.
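
The metadata merge described above can be sketched with pandas. This is an illustration under stated assumptions rather than the project's actual code: the `campaign_id` key, the metadata column names, and all values below are hypothetical placeholders, and the small in-memory frames stand in for the per-campaign CSV files.

```python
import pandas as pd

# Hypothetical metadata table; column names and values are placeholders
metadata = pd.DataFrame({
    "campaign_id": ["c01", "c02"],
    "company": ["Company A", "Company B"],
    "esg_theme": ["Theme 1", "Theme 2"],
    "campaign_date": pd.to_datetime(["2015-06-01", "2020-03-15"]),
})

# Stand-ins for the per-campaign comment files (normally loaded with pd.read_csv)
comments_by_campaign = {
    "c01": pd.DataFrame({"comment_id": ["x1", "x2"], "text": ["first", "second"]}),
    "c02": pd.DataFrame({"comment_id": ["y1"], "text": ["third"]}),
}

# Tag each comment with its campaign, then stack all campaigns into one table
frames = [df.assign(campaign_id=cid) for cid, df in comments_by_campaign.items()]
stacked = pd.concat(frames, ignore_index=True)

# validate="many_to_one" raises if a campaign_id is duplicated in the metadata
merged = stacked.merge(metadata, on="campaign_id", how="left", validate="many_to_one")
print(merged[["comment_id", "campaign_id", "company", "esg_theme"]])
```

A left join keeps every comment even when its campaign is missing from the metadata, so missing values in `company` afterwards point to an unmatched campaign.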

## Twitter Data Collection

For Twitter, Apify was used to extract users' tweets and the comments on the advertisements.

https://apify.com/