# Data Collection with the YouTube API

<img src="200.gif" width="550" align="center">

Source: https://giphy.com/explore/youtube

### Motivation:

1. Collect data and metadata for videos on the YouTube platform
2. Analyze YouTube video data through KPIs

<font color ='orange'> Creating a YouTube API access key </font>

Link: https://developers.google.com/youtube/v3/docs?hl=pt-br

Tutorial on collecting data through the API: https://pypi.org/project/youtube-data-api/

In [4]:
# packages
import pandas as pd
import numpy as np
import os 
import time
import re
from datetime import datetime
from youtube_api import YouTubeDataAPI

In [None]:
api_key = "access key"  # replace with your own YouTube Data API key
yt = YouTubeDataAPI(api_key)
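
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key is stored in an environment variable; the variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are hypothetical and are not part of the original notebook or of `youtube-data-api`:

```python
import os

def load_api_key(var_name="YOUTUBE_API_KEY"):
    """Return the API key stored in the environment variable `var_name`.

    Raises RuntimeError when the variable is missing or empty.
    (`var_name` defaults to a hypothetical name chosen for this sketch.)
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key

# Usage (assumes the variable was exported before starting the notebook):
# yt = YouTubeDataAPI(load_api_key())
```

Failing early with a clear message avoids sending requests with an empty key.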

In [None]:
def get_videos(query, start, end):
    '''
    Search YouTube and collect the ids of the matching videos.

    Arguments:
        query{str} -- search terms
        start{datetime} -- only include videos published after this date
        end{datetime} -- only include videos published before this date

    Return:
        [list of str] -- ids of the videos found (at most 50)

    '''
    searches = yt.search(q=query,
                         max_results=50,
                         published_after=start,
                         published_before=end,
                         search_type='video'
                        )

    # keep only the video id of each search result
    videos = [result['video_id'] for result in searches]
    return videos

def export(data, filename):
    '''
    Export a dataset to an Excel file.

    Arguments:
        data{pandas.DataFrame} -- dataset to export
        filename{str} -- path of the Excel file to write

    Return:
        None

    '''
    # create (or truncate) the target file
    with open(filename, 'w'):
        pass

    # reset the file's access and modification times to 1980-01-01 before writing
    timestamp = time.mktime((1980, 1, 1, 0, 0, 0, 0, 0, 0))
    os.utime(filename, (timestamp, timestamp))
    data.to_excel(filename, engine='xlsxwriter', index=False)
    
def get_id(link):
    '''
    Extract the id of a publication from its link.

    Arguments:
        link{str} -- link of the publication (Twitter or YouTube)

    Return:
        [str] -- id of the publication

    '''
    if 'twitter' in link:
        # the id is the last path segment of the link
        pub_id = re.search(r"/(?:.(?!/))+$", link)
        pub_id = pub_id[0].replace('/', '')
    else:
        # the id follows "v=" and may contain letters, digits, "_" and "-"
        pub_id = re.search(r"(?<=v=)[\w-]+", link)
        pub_id = pub_id[0]
    return pub_id
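
To illustrate the id extraction on concrete links, here is a minimal standalone sketch. The function `extract_id` is a hypothetical variant of `get_id` that returns `None` instead of raising when no id is found, and the sample URLs are made up for illustration:

```python
import re

def extract_id(link):
    """Return the publication id found in `link`, or None if there is none."""
    if 'twitter' in link:
        # Twitter: the id is the last path segment of the URL.
        match = re.search(r"/([^/]+)$", link)
    else:
        # YouTube: the id follows "v=" and may contain letters, digits, "_" and "-".
        match = re.search(r"v=([\w-]+)", link)
    return match.group(1) if match else None

# Made-up sample links, for illustration only.
print(extract_id("https://www.youtube.com/watch?v=abc_DEF-123"))  # abc_DEF-123
print(extract_id("https://twitter.com/user/status/1234567890"))   # 1234567890
print(extract_id("https://www.youtube.com/"))                     # None
```

Returning `None` lets the caller decide how to handle links without an id, rather than failing on the first malformed link in a batch.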

In [None]:
# collect videos from the API (replace query, year, month, day, hour and minute with real values)
videos = get_videos([query], start=datetime(year, month, day, hour, minute), end=datetime(year, month, day, hour, minute))
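
The call above expects two `datetime` objects that delimit the publication window. A minimal sketch of building such a window relative to a reference moment; the helper `last_n_days_window` is hypothetical and only mirrors the `datetime` arguments used in the original call:

```python
from datetime import datetime, timedelta

def last_n_days_window(n_days, now=None):
    """Return (start, end) datetimes covering the `n_days` before `now`.

    `now` defaults to the current local time; passing it explicitly
    makes the window reproducible.
    """
    end = now if now is not None else datetime.now()
    start = end - timedelta(days=n_days)
    return start, end

start, end = last_n_days_window(7, now=datetime(2021, 3, 15, 12, 0))
print(start)  # 2021-03-08 12:00:00
print(end)    # 2021-03-15 12:00:00

# Usage with the notebook's function (the query string is a placeholder):
# videos = get_videos("some query", start=start, end=end)
```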

In [None]:
# dataset
metadata_videos = pd.DataFrame(yt.get_video_metadata(videos))

In [None]:
# adjust dataset: fill missing counts with 0 and convert them to integers
count_columns = ['video_comment_count', 'video_view_count',
                 'video_like_count', 'video_dislike_count']
for column in count_columns:
    metadata_videos[column] = metadata_videos[column].fillna(0).astype(int)

KPIs

1 - Comment Rate

$$\frac{\mbox{video comment count}}{\mbox{video view count}}$$

2 - Engagement

$$\frac{3*\mbox{video comment count} + 2*\mbox{video view count} + \mbox{video like count} - \mbox{video dislike count}}{3+2+1+1}$$

3 - Positive Rate

$$\frac{\mbox{video like count}}{\mbox{video view count}}$$

4 - Negative Rate

$$\frac{\mbox{video dislike count}}{\mbox{video view count}}$$

In [None]:
# KPIs
metadata_videos['comment_rate'] = metadata_videos['video_comment_count']/metadata_videos['video_view_count']
metadata_videos['engage'] = (3*metadata_videos['video_comment_count'] + 2*metadata_videos['video_view_count'] + metadata_videos['video_like_count'] - metadata_videos['video_dislike_count'])/(3+2+1+1)

metadata_videos['pos_rate'] = metadata_videos['video_like_count']/metadata_videos['video_view_count']
metadata_videos['neg_rate'] = metadata_videos['video_dislike_count']/metadata_videos['video_view_count']
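
As a worked check of the four KPI formulas, here is a minimal pure-Python sketch for a single video. The function `compute_kpis` and the sample counts are hypothetical. Returning `None` for a video with zero views is an assumption of this sketch; in pandas, the same division yields `inf` or `NaN` instead of raising.

```python
def compute_kpis(comments, views, likes, dislikes):
    """Compute the four KPIs for one video from its raw counts."""
    def rate(count):
        # A video with no views has no meaningful rate, so return None
        # instead of dividing by zero.
        return count / views if views else None

    return {
        "comment_rate": rate(comments),
        "engage": (3 * comments + 2 * views + likes - dislikes) / (3 + 2 + 1 + 1),
        "pos_rate": rate(likes),
        "neg_rate": rate(dislikes),
    }

# Made-up counts, for illustration only.
kpis = compute_kpis(comments=10, views=1000, likes=50, dislikes=5)
print(kpis)
```

With these counts the comment rate is 10 / 1000 = 0.01, the positive rate is 0.05, the negative rate is 0.005, and the engagement score is (30 + 2000 + 50 - 5) / 7 = 2075 / 7, roughly 296.43.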

In [None]:
# save dataset 
metadata_videos.to_excel('.../dataset.xlsx')