<img src="youtube_parser.png"></img>

<div align="center">
    <h1>YouTube Video Info Parser</h1>
</div>
<br/>
<div align="center"><i>
    A simple YouTube video information parser that extracts the title, description, comments, likes, etc.
    <br/>The task was performed in 1.5-2 hours.
    <br/>
    <br/>by Artem Drofa
    </i>
</div>

## Task

Parse video info from youtube.com: https://www.youtube.com/watch?v=koPmuEyP3a0

Extract:
* title;
* subtitles;
* video description;
* views amount;
* likes amount;
* dislikes amount;
* all comments;
* comments likes amount.

## Solution

### Comments (to the Solution)

* Comments are returned as a list of lists, where each inner list has the format `[comment, likes_amount]`.
* YouTube Data API v3 returns comments in pages of up to 100 comments; each following page can be requested only after the previous one has been received. For demonstration purposes, the number of processed pages is limited to 3.

If additional time were available, the following improvements would be made:
* request generation would be rewritten in the form outlined below.
* subtitle parsing would be refactored (the current method can parse only manually added subtitles, not auto-generated ones).

<b>Possible method for request generation</b>
``` python

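import requests

main_url = 'https://www.googleapis.com/youtube/v3'

def remove_empty_kwargs(kw_dict):
    """Drop parameters whose value is None so they are not sent."""
    return {k: v for k, v in kw_dict.items() if v is not None}

def get(resource, **kwargs):
    """Send a GET request to the given API resource and return the decoded JSON."""
    kwargs['key'] = api_key  # `api_key` is expected to be defined beforehand
    response = requests.get(
        url=f'{main_url}/{resource}',
        params=remove_empty_kwargs(kwargs)
    )
    return response.json()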
```
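
To make the effect of filtering out `None` values concrete, here is a minimal sketch that builds the request URL without sending anything. `build_url` is a hypothetical helper added for illustration, the key value is a placeholder, and `urllib.parse.urlencode` is used only as an approximation of how the query string would be encoded.

```python
from urllib.parse import urlencode

main_url = 'https://www.googleapis.com/youtube/v3'


def remove_empty_kwargs(kw_dict):
    """Drop parameters whose value is None so they are not sent."""
    return {k: v for k, v in kw_dict.items() if v is not None}


def build_url(resource, **kwargs):
    """Return the URL that a GET request with these parameters would target.

    Hypothetical helper: it only formats the URL and performs no request.
    """
    query = urlencode(remove_empty_kwargs(kwargs))
    return f'{main_url}/{resource}?{query}'


# `pageToken=None` represents the first page, so it is left out of the URL.
url = build_url(
    'commentThreads',
    videoId='koPmuEyP3a0',
    part='snippet',
    maxResults=100,
    pageToken=None,
    key='API_KEY_PLACEHOLDER',
)
print(url)
```

With this design the same call shape works for the first page and for later pages: only the value of `pageToken` changes, and the helper decides whether the parameter is sent.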

### Solution Code

In [1]:
import requests
import json
import html
import xml.etree.ElementTree as ET

`video_id` contains the ID of the selected video.

In [2]:
video_id = 'koPmuEyP3a0'

To access YouTube Data API v3, an API key must be generated. This can be done [here](https://console.cloud.google.com/apis/library) after registering an account in [Google Cloud](https://cloud.google.com/).
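
One way to avoid pasting the key into the notebook is to read it from an environment variable. This is only a sketch: `load_api_key` and the variable name `YOUTUBE_API_KEY` are assumptions made for illustration, not part of the original solution.

```python
import os


def load_api_key(env_var='YOUTUBE_API_KEY'):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical default; any name can be passed in.
    Raises RuntimeError when the variable is missing or empty.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f'Set the {env_var} environment variable to your API key.'
        )
    return key
```

The cell below could then be written as `api_key = load_api_key()`, which keeps the key out of the saved notebook.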

In [3]:
api_key = '...'

In [4]:
class YTVParser(object): # 'YTV' for 'YouTube Video'
    
    def __init__(self, api_key, video_id, comments_limit=3):
        self.api_key = api_key
        self.video_id = video_id
        self.comments_limit = comments_limit
        self.title = YTVParser.GetSnippet(self.video_id, self.api_key, 'title')
        self.subtitles = YTVParser.GetSubtitles(self.video_id)
        self.description = YTVParser.GetSnippet(self.video_id,self.api_key, 'description')
        self.viewCount = YTVParser.GetStatistics(self.video_id, self.api_key, 'viewCount')
        self.likeCount = YTVParser.GetStatistics(self.video_id, self.api_key, 'likeCount')
        self.dislikeCount = YTVParser.GetStatistics(self.video_id, self.api_key, 'dislikeCount')
        self.comments = YTVParser.GetComments(self.video_id, self.api_key, self.comments_limit)
        
        
    @staticmethod
    def GenerateRequest(video_id, api_key, part_parametr):
        """ Generates request to Youtube Data API V3 with stated parameters. """
        
        parameters = {
            'Video_ID': video_id,
            'API_Key': api_key,
            'Part_Parametr': part_parametr
        }
        https = 'https://'
        main = 'www.googleapis.com/youtube/v3/videos?'
        details = 'id={Video_ID}&key={API_Key}&part={Part_Parametr}'.format(
            **parameters
        )
        request = https + main + details
        return request
    
    
    @staticmethod
    def GetDataFromAPI(request):
        """ Returns a dict (json) downloaded from 'request' link. """
        
        response = requests.get(request)
        download = json.loads(response.text)
        return download
    
    
    @staticmethod
    def GetSnippet(video_id, api_key, snippetName):
        """ Returns the snippet (title / description) of the video. """
        
        part_parametr = 'snippet'
        request = YTVParser.GenerateRequest(video_id, api_key, part_parametr)
        download = YTVParser.GetDataFromAPI(request)
        snippet = download['items'][0]['snippet'][snippetName]
        return snippet
    
    
    @staticmethod
    def GetSubtitles(video_id):
        """ Returns string with all subtitles. """
        
        details = {'LANG' : 'en', 'videoId' : video_id}
        url = 'https://video.google.com/timedtext?lang={LANG}&v={videoId}'
        request = url.format(**details)
        
        data = requests.get(request)
        root = ET.fromstring(data.text)
        
        # `child.text` is None for empty elements, so fall back to ''.
        subtitles_list = [html.unescape(child.text or '') for child in root]
        subtitles = ' '.join(subtitles_list)
        return subtitles
    
    
    @staticmethod
    def GetStatistics(video_id, api_key, statisticName):
        """ Returns the statistic of the video. """
        
        part_parametr = 'statistics'
        request = YTVParser.GenerateRequest(video_id, api_key, part_parametr)
        download = YTVParser.GetDataFromAPI(request)
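        # `.get` returns None when the API omits a counter (for example,
        # `dislikeCount` has been private since December 2021).
        statistic = download['items'][0][part_parametr].get(statisticName)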
        return statistic
    
    
    @staticmethod
    def UpdateComments(comments, items):
        """ Updates list of comments with downloaded items. (See `GetComments`
        method for more details).
        """
        
        for item in items:
            snippet = item['snippet']['topLevelComment']['snippet']
            comment = html.unescape(snippet['textDisplay'])
            likes = snippet['likeCount']
            pair = [comment, likes]
            comments.append(pair)
        return comments
    
    
    @staticmethod
    def GetComments(video_id, api_key, comments_limit):
        """ Returns a list of lists where each list consists of
        [comment, #_of_likes].
        """
        
        # Request Link Generation: ===========================================
        parameters = {
            'Video_ID': video_id,
            'API_Key': api_key,
            'Part_Parametr': 'snippet,replies',
            'Max_Results' : 100
        }

        https = 'https://'
        main = 'www.googleapis.com/youtube/v3/commentThreads?'
        details_1 = 'videoId={Video_ID}&key={API_Key}'.format(
            **parameters
        )
        details_2 = '&part={Part_Parametr}&maxResults={Max_Results}'.format(
            **parameters
        )
        request = https + main + details_1 + details_2
        # ====================================================================

        data = YTVParser.GetDataFromAPI(request)
        items = data['items']

        comments = YTVParser.UpdateComments([], items)
        
        # Scraping page by page to collect comments ==========================
        # One page consists of 100 comments ==================================
        if comments_limit is None:
            comments_limit = 10 ** 100
        counter = 1
        while 'nextPageToken' in data.keys() and counter < comments_limit:
            nextPageToken = data['nextPageToken']
            details_3 = '&pageToken={}'.format(nextPageToken)
            request = https + main + details_1 + details_2 + details_3

            data = YTVParser.GetDataFromAPI(request)
            items = data['items']

            comments = YTVParser.UpdateComments(comments, items)
            counter += 1
        # ====================================================================
        return comments
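
The constructor above issues a separate request for each field. Because `part` accepts a comma-separated list (as the `snippet,replies` value in `GetComments` already shows), a single `videos` request with `part=snippet,statistics` could supply all five values. A minimal sketch, assuming the response has already been decoded to a dict; `extract_video_fields` and `sample_response` are hypothetical, and the sample values are made up for illustration.

```python
def extract_video_fields(response):
    """Pick the title, description, and counters from one `videos` response.

    `response` is assumed to be the decoded JSON of a request made with
    part=snippet,statistics. Missing fields are returned as None.
    """
    item = response['items'][0]
    snippet = item.get('snippet', {})
    statistics = item.get('statistics', {})
    return {
        'title': snippet.get('title'),
        'description': snippet.get('description'),
        'viewCount': statistics.get('viewCount'),
        'likeCount': statistics.get('likeCount'),
        # May be absent from the response; None is returned in that case.
        'dislikeCount': statistics.get('dislikeCount'),
    }


# Made-up response with the same nesting the parser reads; not real data.
sample_response = {
    'items': [{
        'snippet': {'title': 'Example title', 'description': 'Example text'},
        'statistics': {'viewCount': '100', 'likeCount': '7'},
    }]
}

fields = extract_video_fields(sample_response)
print(fields['title'], fields['viewCount'], fields['dislikeCount'])
```

Reading every field from one decoded response would reduce the five `videos` requests in `__init__` to one, at the cost of a slightly larger response.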

### Demonstration

In [5]:
video = YTVParser(api_key, video_id)

In [6]:

print('Title:')
print(video.title, '\n')

print('-' * 100)
print('Subtitles:')
print(video.subtitles, '\n')

print('-' * 100)
print('Video Description:')
print(video.description, '\n')

print('-' * 100)
print('Views Amount:')
print(video.viewCount, '\n')

print('-' * 100)
print('Likes Amount:')
print(video.likeCount, '\n')

print('-' * 100)
print('Dislikes Amount:')
print(video.dislikeCount, '\n')

print('-' * 100)
print('Comments (1st 3) and likes amount')
for comment in video.comments[0:3]:
    print(comment[0], '| Likes:', comment[1])
print()    

print('-' * 100)
print('Total collected comments:')
print(len(video.comments))
print('Comments\' pages limit:')
print(video.comments_limit)

Title:
We Believe: The Best Men Can Be | Gillette (Short Film) 

----------------------------------------------------------------------------------------------------
Subtitles:
[OVERLAPPING NEWS AUDIO] Is this the best a man can get? [MUSIC] Is it? We can't hide from it. It's been going on far too long. We can't laugh it off. What I actually think she's trying to
say- Making the same old excuses. Boys will be boys. [TOGETHER]
Boys will be boys. But something finally changed. Allegations regarding sexual assault and sexual harassment- [OVERLAPPING NEWS AUDIO] And there will be no going back. Because we, we believe in the best in men. Men need to hold other men accountable. Smile, sweetie! Come on! To say the right thing. To act the right way. Bro, not cool. Not cool. Some already are. In ways big. and small. Say, "I am strong." I am strong! But some is not enough. That's not how we treat each other, okay? You okay? Because the boys watching today will be the men of tomorrow. 

---------