Among the available players in the Male 2022 player dataset, identify which player is
most popular in YouTube video comments. (Use a 5-minute YouTube feed to pull videos).
o Collect recent YouTube videos related to the topic by using the search keywords
[fifa world cup, soccer, football, fifa] and analyze their comments for relevance.
o Use a publisher/subscriber model to address this requirement: the publisher
should publish all comments, while the subscriber processes and analyzes them.
o Provide a dump of your YouTube feed comments to justify your answer.
(Depending on the time you are pulling the data, you may get no answer).
o Provide a screenshot of the output you are getting in the terminal.
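
A minimal sketch of what the subscriber side of this requirement could look like. It assumes each message on the topic is a JSON object with a `text` field and that a player is matched by a short name appearing as a whole word in the comment. The names `count_player_mentions`, `run_subscriber`, `idle_timeout_s`, and the consumer group id are hypothetical and used only for illustration.

```python
import json
import re
from collections import Counter


def count_player_mentions(comment_texts, player_names):
    """Count how many comments mention each player (case-insensitive, whole words)."""
    patterns = {
        name: re.compile(r'\b' + re.escape(name) + r'\b', re.IGNORECASE)
        for name in player_names
    }
    counts = Counter()
    for text in comment_texts:
        for name, pattern in patterns.items():
            # A comment counts once per player, however often the name repeats.
            if pattern.search(text):
                counts[name] += 1
    return counts


def run_subscriber(player_names, broker='localhost:9092', topic='youtube_topic',
                   idle_timeout_s=30.0):
    """Consume comments until no message arrives for idle_timeout_s seconds."""
    # Imported lazily so the counting helper works without Kafka installed.
    from confluent_kafka import Consumer

    consumer = Consumer({
        'bootstrap.servers': broker,
        'group.id': 'youtube_comment_analyzer',  # hypothetical group name
        'auto.offset.reset': 'earliest',
    })
    consumer.subscribe([topic])
    counts = Counter()
    try:
        while True:
            msg = consumer.poll(idle_timeout_s)
            if msg is None:
                break  # nothing new within the timeout; assume the feed has ended
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            comment = json.loads(msg.value().decode('utf-8'))
            counts.update(count_player_mentions([comment['text']], player_names))
    finally:
        consumer.close()
    return counts
```

Calling `run_subscriber(names).most_common(1)` with the dataset's player names would then give the most-mentioned player, or an empty list if no comment mentions anyone. Matching on short names is a simplification: nicknames such as "CR7" would need to be added to the list explicitly.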

In [2]:
import socket
from confluent_kafka import Producer
import time
from datetime import datetime, timedelta
import json
from googleapiclient.discovery import build

# API Reference website
# https://developers.google.com/youtube/v3/docs/

BROKER = 'localhost:9092'
TOPIC = 'youtube_topic'

# YOUTUBE_API_KEY must be defined before this point; the key itself is omitted here

youtube = build('youtube', 'v3', developerKey=YOUTUBE_API_KEY)
keywords = ['champions league', 'premier league', 'serie a', 'la liga', 'bundesliga', 'ligue 1', 'mls', 'soccer', 'world cup']

def search_videos(keyword):
    request = youtube.search().list(
        part='snippet',
        q=keyword,
        type='video',
        order='date',
        maxResults=5
    )
    response = request.execute()
    videos = []
    for item in response['items']:
        videos.append(item['id']['videoId'])
    return videos

def get_comments(video_id):
    # fetch a single page of up to 50 top-level comments per video to limit API quota usage
    request = youtube.commentThreads().list(
        part='snippet',
        videoId=video_id,
        textFormat='plainText',
        order='relevance',
        maxResults=50
    )
    response = request.execute()
    comments = []
    for item in response['items']:
        comment = item['snippet']['topLevelComment']['snippet']
        comments.append({
            'author': comment['authorDisplayName'],
            'text': comment['textDisplay'],
            'like_count': comment['likeCount'],
            'published_at': comment['publishedAt'],
            'video_id': video_id
        })
    return comments

def create_kafka_producer(broker):
    conf = {
        'bootstrap.servers': broker,
        'client.id': socket.gethostname()
    }
    return Producer(conf)

video_ids = []
for keyword in keywords:
    vids = search_videos(keyword)
    video_ids.extend(vids)

# ensure no duplicate video ids
video_ids = list(set(video_ids))
producer = create_kafka_producer(BROKER)
seen_comments = set()

# 5 minute timer
start_time = datetime.now()
end_time = start_time + timedelta(minutes=5)
while datetime.now() < end_time:
    for video_id in video_ids:
        if datetime.now() >= end_time:
            break
        # error handling when comments are disabled for the video
        try:
            comments = get_comments(video_id)
        except Exception:
            continue
        if not comments:
            continue
        for comment in comments:
            comment_id = comment['published_at'] + comment['author'] + comment['video_id']
            # ensure no duplicate comments
            if comment_id not in seen_comments:
                seen_comments.add(comment_id)
                key_str = f"{comment['author']}: {comment['text']}"
                # the comment dict already holds exactly the fields to publish
                value_str = json.dumps(comment)
                producer.produce(TOPIC, key=key_str.encode('utf-8'), value=value_str.encode('utf-8'))
                print(f"Sent: {comment['author']} - {comment['text'][:50]}")
        producer.flush()
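    # wait before polling again, but never sleep past the 5-minute window
    remaining = (end_time - datetime.now()).total_seconds()
    if remaining > 0:
        time.sleep(min(30, remaining))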

producer.flush()


Sent: @andrewgleeson8862 - Not as if the england team have hard teams to beat
Sent: @johnjonathan1257 - IS ROY KEANE WEARING HIS SMOKING JACKET TONIGHT ,W
Sent: @josemanuelmartineztortosa7848 - Se gano la señora colegiada, un sobresueldo  en un
Sent: @URKG - As a Portugal and CR7 fan, this one hurts.

Watch 
Sent: @JB2tekky - This was Cristiano's last world cup qualifying gam
Sent: @captainfalconmain6576 - Penaldo not even top 5 players of all time
Sent: @LuziaAlves-o5t - Versão líder 💚💚💚💚💚
Sent: @Abyss-m7x - If Rapha is back,these matchs will happen usually🔴
Sent: @youtubeeee443 - Who loves barca?                              👇
Sent: @siddheshkalyankar9468 - Barca 💙❤️💙❤️
Sent: @esmukhtert2005 - VISCA BARÇA KEEP GOING BEST AND EVERYTHING ALWAYS!
Sent: @ZeusFirulais-u1b - Visca Barça chicos!!!💙❤💙❤💙❤❤💙❤💙
Sent: @Loko_Memetv - Visça Barça ❤💙
Sent: @ابوتركي-ش9ف - Visca Barca❤❤
Sent: @claudialozano5580 - 0:0

0