## Course 1: Foundation of information

**Assignment**: Data extraction and analysis from the social media platform YouTube (30 marks)

**Problem statement**

Video is a fast-growing medium through which people communicate, share knowledge, and showcase skills. YouTube is one of the largest video-hosting platforms, with content from many different professions, arts, and cultures across the world.

Viewers can express their opinion of a video through likes, dislikes, and comments. These features, provided by the YouTube platform, give information about the sentiment towards the video.

This assignment involves extracting data from YouTube programmatically and analyzing it to understand various attributes of a video.

**Steps to be performed**

1. Connect to the YouTube API using a Python client (5 marks)

> 1.a Create a YouTube API key (3 marks)

> 1.b Install the Google API Python client (2 marks)

Refer to the [supporting link](https://developers.google.com/youtube/v3/getting-started) for how to create a YouTube API key.

Reference link: https://developers.google.com/youtube/v3/quickstart/python

In [1]:
# <code block>
# (1.a)Creating a YouTube API key

# Go to the Google Cloud Console website: https://console.cloud.google.com/
# Create a new project if you don't have one yet.
# Select your project from the dropdown menu.
# Open the navigation menu (☰) and click on "APIs & Services" > "Library."
# Search for "YouTube Data API v3" and enable it.
# Go to "APIs & Services" > "Credentials."
# Create an API key.

api_key='AIzaSyA92A18UAjEt4O4g7bAdgMPzZDJmQPENc8'

# (1.b) Installing the Google API Python client
!pip install google-api-python-client
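
A key written directly into a notebook cell is visible to anyone who receives the file. The following is a minimal sketch of one alternative, assuming the key has been stored in an environment variable beforehand. The variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are illustrative names chosen for this sketch; they are not part of the assignment or of the Google client library.

```python
import os


def load_api_key(env_var='YOUTUBE_API_KEY'):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, '').strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With the variable set, `api_key = load_api_key()` could take the place of the literal assignment in the cell.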




2. Search and extract the data

> 2.a Search for videos related to the query string “avatar movie”. (For this part, choose one video from the search results and perform the data collection steps on that specific video.) (3 marks)

> Output expected: ID, and Snippet with the following attributes: Channel ID, Video Description, Channel Title, Video Title

Reference link: https://developers.google.com/youtube/v3/docs/search/list

In [2]:
# < code block >
# (2.a) Search for "avatar movie" and collect data for one chosen video

from pprint import pprint

from googleapiclient.discovery import build

# Build the YouTube Data API v3 client with the API key from step 1.a
youtube = build('youtube', 'v3', developerKey=api_key)

# Request only the fields needed for the expected output
response = youtube.search().list(
    q="avatar movie",
    type='video',
    part="id,snippet",
    maxResults=10,
    fields="items(id(videoId),snippet(channelId,description,channelTitle,title))"
).execute()

items = response.get('items', [])

# Choose one video from the results (here, the third one) and print its data
pprint(items[2])


{'id': {'videoId': 'V_KfanT2fkU'},
 'snippet': {'channelId': 'UC9qOrdYP2YR4UlMKpBol2Lg',
             'channelTitle': 'CiNiMa LoKaM',
             'description': 'Avatar: The Way of Water is a 2022 American epic '
                            'science fiction film directed and produced by '
                            'James Cameron. He co-wrote the ...',
             'title': 'Avatar 2 Way of Water Malayalam Movie Explain | Part -1 '
                      'Cinima Lokam..'}}
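
The nested result above can be awkward to work with when only the expected attributes are needed. The sketch below flattens one search item into a single-level dictionary. `flatten_search_item` and its output key names are hypothetical, and the sample item copies the shape of the printed output with the description shortened.

```python
def flatten_search_item(item):
    """Flatten one search result into a single-level dict."""
    snippet = item.get('snippet', {})
    return {
        'video_id': item.get('id', {}).get('videoId'),
        'channel_id': snippet.get('channelId'),
        'channel_title': snippet.get('channelTitle'),
        'title': snippet.get('title'),
        'description': snippet.get('description'),
    }


# Example item with the same shape as the printed output (description shortened)
sample_item = {
    'id': {'videoId': 'V_KfanT2fkU'},
    'snippet': {
        'channelId': 'UC9qOrdYP2YR4UlMKpBol2Lg',
        'channelTitle': 'CiNiMa LoKaM',
        'description': 'Avatar: The Way of Water is a 2022 American epic ...',
        'title': 'Avatar 2 Way of Water Malayalam Movie Explain | Part -1 Cinima Lokam..',
    },
}

flat = flatten_search_item(sample_item)
print(flat['video_id'], '|', flat['channel_title'])
```

Using `.get()` at each level means a missing attribute yields `None` instead of raising a `KeyError`.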



> 2.b Provide the following statistics for the top 50 videos matching the query string “avatar movie”, sorted by relevance in the US region (7 marks)

> Output expected: video ID, title, number of views, number of likes, and number of comments, exported to a CSV file

Reference link: https://developers.google.com/youtube/v3/docs/videos/list

In [3]:
# < code block >
#(2.b)statistics for query string “avatar movie” of top 50 videos sorted by relevance in the US region

response = youtube.search().list(
    q="avatar movie",
    type='video',
    part="id,snippet",
    # top 50 videos sorted by relevance in the US region for the 2.b
    regionCode='US',
    order='relevance',
    maxResults=50,
    fields="items(id(videoId),snippet(channelId,description,channelTitle,title))"
).execute()
items = response.get('items',[])

video_statistics = []
for item in items:    
    video_id = item['id']['videoId']  
    video_response = youtube.videos().list(
        part='statistics',
        id=video_id,
        fields='items(statistics)'
    ).execute()
    
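    # Fall back to empty statistics when the API returns no item for this ID
    # (an empty 'items' list would otherwise raise an IndexError)
    video_items = video_response.get('items') or [{}]
    video_statistics_response = video_items[0].get('statistics', {})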
   
    video_statistics.append({
        'video ID': video_id,
        'title': item['snippet']['title'],
        'no of views': video_statistics_response.get('viewCount', 0),
        'no of likes': video_statistics_response.get('likeCount', 0),
        'no of comments': video_statistics_response.get('commentCount', 0)})
    
#arranging the output in dataframe and exporting to csv_file
import pandas as pd
df = pd.DataFrame(video_statistics)

csv_filename = 'avatar_movie_statistics.csv'
df.to_csv(csv_filename, index=False)

print(f"Video_statistics_are_exported to CSV file: '{csv_filename}'")


Video_statistics_are_exported to CSV file: 'avatar_movie_statistics.csv'
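
The cell above sends one `videos().list` request per video. The `id` parameter of that endpoint accepts a comma-separated list of video IDs, so the statistics can likely be fetched in far fewer calls. The sketch below assumes a limit of 50 IDs per request; `chunked` and `fetch_statistics` are hypothetical helper names, and `youtube` is the client object built in step 2.a.

```python
def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_statistics(youtube, video_ids, batch_size=50):
    """Return {video_id: statistics_dict}, using one request per batch of IDs."""
    stats_by_id = {}
    for batch in chunked(list(video_ids), batch_size):
        response = youtube.videos().list(
            part='statistics',
            id=','.join(batch),            # several IDs in a single request
            fields='items(id,statistics)'  # keep the ID so rows can be matched
        ).execute()
        for item in response.get('items', []):
            stats_by_id[item['id']] = item.get('statistics', {})
    return stats_by_id
```

Videos missing from the response simply have no entry in the returned dictionary, so callers can use `stats_by_id.get(video_id, {})` when building the rows for the CSV file.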


3. Analyze the exported data obtained in 2.b and carry out the following tasks (15 marks)

> 3.a Sort the data from 2.b by number of comments in descending order, and take the video IDs and titles of the 10 videos with the most comments. (3 marks)



In [4]:
# < code block >
# (3.a) Sort the videos by number of comments in descending order and keep the top 10

# The API returns counts as strings, so convert to int before comparing
sorting_data_by_comments = sorted(video_statistics, key=lambda x: int(x['no of comments']), reverse=True)
top_10_comments = sorting_data_by_comments[:10]
pprint(top_10_comments)

# Arrange the 10 most-commented videos in a DataFrame
top_10_comments_df = pd.DataFrame(top_10_comments)
print(top_10_comments_df)


[{'no of comments': '43345',
  'no of likes': '1043511',
  'no of views': '57506355',
  'title': 'Avatar: The Way of Water | Official Trailer',
  'video ID': 'd9MyW72ELq0'},
 {'no of comments': '29183',
  'no of likes': '683202',
  'no of views': '27572413',
  'title': 'Avatar: The Way of Water | Official Teaser Trailer',
  'video ID': 'a8Gx8wiNbs8'},
 {'no of comments': '8931',
  'no of likes': '80906',
  'no of views': '12679120',
  'title': 'Avatar | Official Trailer (HD) | 20th Century FOX',
  'video ID': '5PSNL1qE6VY'},
 {'no of comments': '3834',
  'no of likes': '335825',
  'no of views': '7349705',
  'title': 'Avatar 3 Will Introduce The Dark Side🔥 Of Na&#39;vi…!?😮',
  'video ID': 'f5Zx8iPek5I'},
 {'no of comments': '3666',
  'no of likes': '190384',
  'no of views': '31895142',
  'title': 'AVATAR Clip - Final Battle (2009) James Cameron',
  'video ID': '0sJeBiUCIt4'},
 {'no of comments': '3610',
  'no of likes': '49668',
  'no of views': '2913197',
  'title': 'AVATAR: THE LAST
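
One of the titles in the output above contains `&#39;`, which is the HTML entity for an apostrophe, so the search results appear to carry HTML-escaped text. A minimal sketch of decoding such titles with the standard library follows, assuming standard entity decoding is all that is needed. `clean_title` is a hypothetical helper name.

```python
import html


def clean_title(raw_title):
    """Decode HTML entities such as &#39; or &amp; in a video title."""
    return html.unescape(raw_title)


# Fragment taken from one of the titles printed above
print(clean_title('Dark Side Of Na&#39;vi'))
```

Applying this to the `title` field before exporting would keep the entities out of the CSV file.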


> 3.b Use a suitable method to retrieve the comments of the top 10 videos from 3.a. To do this, write a program that loops through each video ID from 3.a and passes the part parameter set to "snippet" to retrieve basic details about the comments. Execute this request and print the response using the pprint() method.

- Note: pprint() prints the response from the API in a more human-readable format.
- Reference link: [link](https://developers.google.com/youtube/v3/docs)

> **Output expected**: Use the Python library `pprint` to print the output of the program with the following properties: etag, items, id, kind, and snippet, where the snippet includes the text display field (`textDisplay`), which holds the text of the comment.






In [5]:
# < code block >
# (3.b) Retrieve basic details about the comments on the top 10 videos

# Collect the video IDs of the 10 most-commented videos from 3.a
top_10_video_ids = [item['video ID'] for item in top_10_comments]

comment_responses = []
for video_id in top_10_video_ids:
    # Request the comment threads for this video
    comments_response = youtube.commentThreads().list(
        part='snippet',
        videoId=video_id
    ).execute()
    comment_responses.append(comments_response)

# Print the collected API responses once, after the loop, in a human-readable format
pprint(comment_responses)


[{'etag': 'O1c00Z0vm5AtGh_ED9gG-lOsd4Y',
  'items': [{'etag': 'd9JfHCTJWvVLlT5DVaCr5N_-q60',
             'id': 'UgyVNQYck7i_Lfi6C_J4AaABAg',
             'kind': 'youtube#commentThread',
             'snippet': {'canReply': True,
                         'isPublic': True,
                         'topLevelComment': {'etag': 'Dddn8p-If6e6sW_HF7hAlYKqSMY',
                                             'id': 'UgyVNQYck7i_Lfi6C_J4AaABAg',
                                             'kind': 'youtube#comment',
                                             'snippet': {'authorChannelId': {'value': 'UCP65cUoJG95I2t0E6NGGDzw'},
                                                         'authorChannelUrl': 'http://www.youtube.com/channel/UCP65cUoJG95I2t0E6NGGDzw',
                                                         'authorDisplayName': 'Avinash '
                                                                              'Malla',
                                                         'auth
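
The comment text itself sits several levels deep in each response, under `items`, `snippet`, `topLevelComment`, `snippet`, and `textDisplay`. The sketch below pulls out just those strings. `extract_comment_texts` is a hypothetical helper, and the sample response uses made-up IDs and comment text that only mimic the structure shown above.

```python
def extract_comment_texts(thread_response):
    """Return the textDisplay of every top-level comment in one API response."""
    texts = []
    for thread in thread_response.get('items', []):
        comment = thread.get('snippet', {}).get('topLevelComment', {})
        text = comment.get('snippet', {}).get('textDisplay')
        if text is not None:
            texts.append(text)
    return texts


# Made-up response with the same nesting as a commentThreads().list() result
sample_response = {
    'items': [
        {
            'kind': 'youtube#commentThread',
            'snippet': {
                'topLevelComment': {
                    'kind': 'youtube#comment',
                    'snippet': {'textDisplay': 'Example comment one'},
                },
            },
        },
        {
            'kind': 'youtube#commentThread',
            'snippet': {
                'topLevelComment': {
                    'kind': 'youtube#comment',
                    'snippet': {'textDisplay': 'Example comment two'},
                },
            },
        },
    ],
}

print(extract_comment_texts(sample_response))
```

Calling the helper on each element of `comment_responses` would give one list of comment strings per video.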



> 3.c Write a program to export the output of question 3.b to a JSON file and submit the file as part of the assignment (3 marks)



In [6]:
# < code block >
# (3.c) Export the comment thread responses for the top 10 videos to a JSON file

import json
json_filename = 'top_10_comments_responses.json'
with open(json_filename, 'w') as json_file:
    json.dump(comment_responses, json_file, indent=4)

print(f"top 10 Comment responses exported in JSON file format: '{json_filename}'")

top 10 Comment responses exported in JSON file format: 'top_10_comments_responses.json'
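
Comments often contain emoji and non-English text, which `json.dump` writes as `\uXXXX` escape sequences by default. The following sketch shows one way to keep such characters readable in the exported file, assuming the file should be UTF-8 encoded. `export_json` and `load_json` are hypothetical helper names, and the sample payload is made up.

```python
import json
import os
import tempfile


def export_json(data, path):
    """Write `data` as UTF-8 JSON, keeping non-ASCII characters readable."""
    with open(path, 'w', encoding='utf-8') as json_file:
        json.dump(data, json_file, indent=4, ensure_ascii=False)


def load_json(path):
    """Read a UTF-8 JSON file back into Python objects."""
    with open(path, 'r', encoding='utf-8') as json_file:
        return json.load(json_file)


# Round-trip a small made-up payload through a temporary file
sample = [{'textDisplay': 'Great trailer 🔥'}]
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.json')
    export_json(sample, sample_path)
    restored = load_json(sample_path)
    with open(sample_path, encoding='utf-8') as raw_file:
        raw_text = raw_file.read()

print(restored == sample)
```

Reading the file back, as done here, is also a quick check that the export is valid JSON before submitting it.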


> 3.d Write a function to get the likes vs views ratio of the 10 videos from 3.a with the most comments (3 marks)




In [7]:
# < code block >
# (3.d) Define a function to get the likes vs views ratio of the top 10 videos

def calculate_likes_vs_views_ratio(video_statistics):
    ratio_data = []
    
    for video in video_statistics:
        views = int(video['no of views'])
        likes = int(video['no of likes'])
        
        if views > 0:
            ratio = likes / views
        else:
            ratio = 0.0
        
        ratio_data.append({
            'Video ID': video['video ID'],
            'Likes-vs-Views Ratio': ratio
        })
    
    return ratio_data

top_10_comments = sorted(video_statistics, key=lambda x: int(x['no of comments']), reverse=True)
top_10_comments = top_10_comments[:10]
likes_views_ratio_data = calculate_likes_vs_views_ratio(top_10_comments)
pprint(likes_views_ratio_data)

[{'Likes-vs-Views Ratio': 0.01814601186251502, 'Video ID': 'd9MyW72ELq0'},
 {'Likes-vs-Views Ratio': 0.024778462443602597, 'Video ID': 'a8Gx8wiNbs8'},
 {'Likes-vs-Views Ratio': 0.006381042217440958, 'Video ID': '5PSNL1qE6VY'},
 {'Likes-vs-Views Ratio': 0.045692310099521, 'Video ID': 'f5Zx8iPek5I'},
 {'Likes-vs-Views Ratio': 0.005969059488745966, 'Video ID': '0sJeBiUCIt4'},
 {'Likes-vs-Views Ratio': 0.017049310431117428, 'Video ID': 'QOg9LUIvaig'},
 {'Likes-vs-Views Ratio': 0.06708491824334338, 'Video ID': 'RGx8rYbRVR4'},
 {'Likes-vs-Views Ratio': 0.04260715763552406, 'Video ID': 'kHNCaWjB-98'},
 {'Likes-vs-Views Ratio': 0.007946329915128118, 'Video ID': 'noVddIRzn3o'},
 {'Likes-vs-Views Ratio': 0.055143610930597005, 'Video ID': 'rM8ogjwity8'}]
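
The raw ratios are easier to compare once they are ranked and expressed as likes per 100 views. The sketch below does this for three of the values printed above. `rank_by_ratio` and the `'Likes per 100 views'` key are hypothetical names introduced for illustration.

```python
def rank_by_ratio(ratio_data):
    """Sort entries by likes vs views ratio, highest first, and add a percentage."""
    ranked = sorted(ratio_data, key=lambda row: row['Likes-vs-Views Ratio'], reverse=True)
    return [
        {**row, 'Likes per 100 views': round(row['Likes-vs-Views Ratio'] * 100, 2)}
        for row in ranked
    ]


# Three entries copied from the output above
sample_ratios = [
    {'Video ID': 'd9MyW72ELq0', 'Likes-vs-Views Ratio': 0.01814601186251502},
    {'Video ID': 'RGx8rYbRVR4', 'Likes-vs-Views Ratio': 0.06708491824334338},
    {'Video ID': '5PSNL1qE6VY', 'Likes-vs-Views Ratio': 0.006381042217440958},
]

for row in rank_by_ratio(sample_ratios):
    print(row['Video ID'], row['Likes per 100 views'])
```

Passing `likes_views_ratio_data` instead of the sample list would rank all 10 videos the same way.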
