<a href="https://colab.research.google.com/github/FerrazThales/recomendador_youtube/blob/main/Recomendador_de_V%C3%ADdeos_do_Youtube.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center"><b>Find me and my projects on social media!</b></h1>
<table>
  <tr>
  <td><a href="https://thalesferraz.medium.com/">
  <img src="https://github.com/FerrazThales/FerrazThales/blob/main/logo_gif.gif?raw=true" width="800" title="Hi, my name is Thales and I'm a Data Scientist!"/>
  </a>
  </td>
  <td><a href="https://github.com/FerrazThales">
  <img hspace=30 vspace=110 src="https://image.flaticon.com/icons/png/512/1051/1051326.png" width="60%" title="Visit my Github and check out more projects!" />
  </a>
  </td>
  <td>
  <a href="">
  <img vspace=110 src="https://download.logo.wine/logo/Medium_(website)/Medium_(website)-Logo.wine.png" width="800" title="Check out this project on Medium!"/>
  </a>
  </td>
  <td><a href="https://www.linkedin.com/in/thalesdefreitasferraz/"><img vspace=150 src="https://image.flaticon.com/icons/png/512/889/889122.png" width="40%" title="Shall we talk about Data Science on LinkedIn?" />
  </a>
  </td>
  </tr>
</table>


This project is part of a **challenge** proposed in the [Data Science Course](https://curso.mariofilho.com/) by **Mario Filho**.

`Mario` holds the most noble title of *Kaggle* **Grandmaster** and is, in my opinion, the **best** Data Scientist in Brazil.

For more information about him, see:

* [LinkedIn](https://br.linkedin.com/in/mariofilho)
* [Youtube](https://www.youtube.com/c/MarioFilhoML) <- *some really great videos there*!
* [Kaggle](https://www.kaggle.com/mariofilho)

# What you will learn in this project:

* How Youtube's **recommendation** systems work.
* An **API** for extracting the main **features** of the videos.
* What **Active Learning** is and why it matters for **supervised** learning.
* Different **classification** algorithms.
* **Deploying** a *Machine Learning* model with **Heroku**.
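**Active Learning** appears in the list above. One common strategy, *uncertainty sampling*, asks a human to label the examples the current model is least sure about. A minimal sketch, assuming a binary classifier has already produced, for each unlabeled video, the probability that the user will like it (the video ids, the probabilities and the helper `select_for_labeling` are hypothetical, and this is not necessarily the strategy used later in this project):

```python
def select_for_labeling(probabilities, k):
    """Return the ids of the k unlabeled items whose predicted probability
    of the positive class is closest to 0.5 (uncertainty sampling)."""
    # Distance from the 0.5 decision boundary: smaller means the model is less certain
    ranked = sorted(probabilities.items(), key=lambda item: abs(item[1] - 0.5))
    return [video_id for video_id, _ in ranked[:k]]


# Hypothetical model outputs: P(user likes the video) for unlabeled videos
unlabeled_probs = {
    "video_a": 0.97,
    "video_b": 0.52,
    "video_c": 0.10,
    "video_d": 0.45,
    "video_e": 0.70,
}

print(select_for_labeling(unlabeled_probs, k=2))
```

After the human labels the selected videos, they are added to the training set, the model is retrained, and the loop repeats; this spends labeling effort where the model gains the most information.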

# Problem Identification

With the **growth** in data and users, [recommendation systems](https://www.analyticsvidhya.com/blog/2021/07/recommendation-system-understanding-the-basic-concepts/) have become increasingly **relevant** in the competitive landscape of the *internet*. Today there are **many** systems, based on **different** techniques, that try to predict users' **preferences** as accurately as possible. These **algorithms** may, for example, rely on the **movies** you watched or the **purchases** you made recently. They may even **assume** that you belong to an **imaginary** (*mathematical*) **community** that loves random **videos** of sea animals.
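The "community" idea above is, in essence, collaborative filtering: users with similar histories are treated as neighbors, and you are recommended what your neighbors watched. A minimal user-based sketch with cosine similarity, on a hypothetical watch history (the users, videos and helper names are made up; this illustrates the general technique, not Youtube's actual system):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target, history, videos):
    """User-based collaborative filtering: return the videos watched by the user
    most similar to `target` that `target` has not watched yet."""
    others = [user for user in history if user != target]
    # The "neighbor" is the user whose watch vector is most similar to the target's
    neighbor = max(others, key=lambda u: cosine_similarity(history[target], history[u]))
    return [
        video
        for video, seen_by_target, seen_by_neighbor in zip(videos, history[target], history[neighbor])
        if seen_by_neighbor and not seen_by_target
    ]


# Hypothetical watch history: 1 = watched, 0 = not watched
videos = ["whales", "dolphins", "sharks", "football"]
history = {
    "ana": [1, 1, 0, 0],
    "bruno": [1, 1, 1, 0],
    "carla": [0, 0, 0, 1],
}

print(recommend("ana", history, videos))
```

Here `ana` and `bruno` share two sea-animal videos, so `bruno` is her closest neighbor and she is recommended the one he watched that she has not.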

We know that to err is **human**. But everyday experience online shows that **algorithms** make mistakes too. Many **recommendations** are made because lots of people fell for [clickbait](https://rockcontent.com/br/blog/clickbait/), and the algorithm **expects** (with a very high estimated **probability**) that you will be **next**.

In general, video **recommendation** systems are based on users' **activity** on the site. The world's most **popular** video site, [Youtube](https://blog.youtube/inside-youtube/on-youtubes-recommendation-system/), has a great deal of **personalized** user data with which to build a good **recommendation** algorithm. However, this volume of data also poses major **challenges** for **Youtube**'s *machine learning* engineers.

According to the linked paper, [more than 24 hours of video are uploaded to Youtube every minute](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.434.9301&rep=rep1&type=pdf). Even though its systems already have some **labels**, such as *likes*, playlists and *watch later*, combining all the necessary *features* and tying them to **rules** that lead to **accurate** recommendations, without knowing the **actual** content of these new videos, is an **arduous** task.
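To get a sense of this scale, the upload figure quoted above can be converted to a daily rate (simple arithmetic, assuming the upload rate stays constant over the day):

$$
24\ \frac{\text{h of video}}{\text{min}} \times 60\ \frac{\text{min}}{\text{h}} \times 24\ \frac{\text{h}}{\text{day}} = 34\,560\ \frac{\text{h of video}}{\text{day}}
$$

That is at least 1,440 days, almost four years, of new footage arriving every single day, far more than any team could ever watch and label by hand.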

Because of this, *Youtube* has been **improving** its recommendation system since **2008**. The image below shows the **key** milestones in the development of this system.

<center><img src="https://storage.googleapis.com/gweb-uniblog-publish-prod/images/BINK_YouTube_Recommendations_V5.max-1000x1000.png" width="50%"></center>

After adjusting the algorithm, **Youtube** uses [statistical tests](https://resultadosdigitais.com.br/blog/o-que-e-teste-ab/) (A/B tests) to evaluate the **performance** of the system in production. But what if you could somehow **intervene**? Help **refine** this algorithm a little further and get a **selection** of videos you actually **enjoy**?
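The source does not say which statistical test Youtube applies. A common textbook choice for comparing click-through rates between the current algorithm (A) and an adjusted one (B) is the two-proportion z-test. A minimal sketch with hypothetical click and view counts (the helper name `two_proportion_z_test` and all numbers are made up for illustration):

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for the difference between two proportions,
    e.g. the click-through rates of variants A and B.
    Returns (z statistic, p-value)."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled proportion under the null hypothesis that both rates are equal
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Standard normal CDF written with math.erf; two-sided p-value
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical numbers: current algorithm (A) vs. adjusted algorithm (B)
z, p = two_proportion_z_test(clicks_a=500, views_a=10_000, clicks_b=580, views_b=10_000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

With these made-up numbers the p-value falls below 0.05, so the higher click-through rate of B would be considered statistically significant at the 5% level.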

- Objective: discuss web scraping, data cleaning and the top N.

[Reference (PDF)](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.434.9301&rep=rep1&type=pdf)

With these ideas in mind, the goal of this project is to recreate a Youtube recommendation algorithm.
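A recommender like the one this project aims to build typically ends in a *top N* step: each candidate video gets a score (for example, a classifier's predicted probability that the user will like it) and the N highest-scoring videos the user has not seen yet are shown. A minimal sketch, assuming such scores already exist (the helper `top_n`, the ids and the scores are hypothetical):

```python
import heapq


def top_n(scores, n, already_seen=()):
    """Return the ids of the n highest-scoring videos, skipping already-seen ones."""
    candidates = [
        (video_id, score) for video_id, score in scores.items() if video_id not in already_seen
    ]
    # heapq.nlargest avoids sorting the whole list when n is small
    best = heapq.nlargest(n, candidates, key=lambda item: item[1])
    return [video_id for video_id, _ in best]


# Hypothetical predicted probabilities that the user will like each video
scores = {"v1": 0.91, "v2": 0.35, "v3": 0.78, "v4": 0.88, "v5": 0.12}

print(top_n(scores, n=3, already_seen={"v1"}))
```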


In [1]:
# importing the required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Data Preparation

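To build our dataset, we need to collect video data from the web (*web scraping*). Some of the main Python libraries for this are:

* BeautifulSoup
* Scrapy
* youtube_dl: before writing your own scraper, check whether someone has already solved the problem for you.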
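To give an idea of what scraping libraries do, here is a minimal sketch that uses Python's standard-library `html.parser` instead of BeautifulSoup, to stay dependency-free. It parses a hypothetical, hard-coded HTML snippet (the class `LinkTitleParser` is also hypothetical). Modern Youtube pages rely heavily on JavaScript, which is one reason a dedicated tool such as `youtube_dl` is more practical for this project.

```python
from html.parser import HTMLParser


class LinkTitleParser(HTMLParser):
    """Collects the page <title> and every link (href, text) in an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._current_href is not None:
            self.links.append((self._current_href, "".join(self._current_text).strip()))
            self._current_href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._current_href is not None:
            self._current_text.append(data)


# Hypothetical, hard-coded HTML standing in for a downloaded page
html = """
<html><head><title>Search results</title></head>
<body>
  <a href="/watch?v=abc123">Podcast clip 1</a>
  <a href="/watch?v=def456">Podcast clip 2</a>
</body></html>
"""

parser = LinkTitleParser()
parser.feed(html)
parser.close()
print(parser.title)
print(parser.links)
```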

In [2]:
# installing the Youtube data extraction library
!pip install youtube_dl -q

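`youtube_dl` is a command-line program, also usable as a Python library, that downloads videos and extracts their metadata from Youtube and many other sites. In the cell below, two parameters are passed to `YoutubeDL`: `ignoreerrors=True`, so that a video that cannot be processed (for example, an age-restricted video or a live event that has not started yet) is skipped instead of interrupting the whole search, and `no_warnings=True`, which suppresses warning messages.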

In [3]:
# importing the youtube_dl library
import youtube_dl

# creating a YoutubeDL instance; besides Youtube, youtube_dl supports many other sites
ydl = youtube_dl.YoutubeDL(params={'ignoreerrors': True,  # skip videos that fail instead of stopping
                                   'no_warnings': True})  # suppress warning messages

# keywords to search for on Youtube
palavras_chave = ['cortes+podcast', 'palmeiras', 'cortes']

In [4]:
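# extracting the JSON metadata of the search results
resultados = []
for palavra_chave in palavras_chave:
    # 'ytsearchdate400:' searches Youtube for up to 400 results, newest first;
    # download=False fetches only the metadata, without downloading the videos
    r = ydl.extract_info(f'ytsearchdate400:{palavra_chave}', download=False)
    # with ignoreerrors=True, extract_info returns None if the search itself fails
    if r is None:
        continue
    for entry in r['entries']:
        # some entries come back empty (None) and are not useful for our analysis
        if entry is not None:
            entry['key_word'] = palavra_chave
    # store everything in a single list (empty entries are filtered out later)
    resultados += r['entries']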

[download] Downloading playlist: cortes+podcast
[youtube:search:date] query "cortes+podcast": Downloading page 1

(Output truncated: the log repeats these lines for every results page and every video of each search. Because of `ignoreerrors=True`, videos that could not be processed were skipped and returned as empty entries. The errors reported include "This live event will begin in 20 hours.", "Signature extraction failed: ... Could not find JS function 'na'", "No video formats found", "This live stream recording is not available." and "Sign in to confirm your age". The error messages themselves recommend making sure the latest version of youtube_dl is installed.)

In [11]:
# filter out null entries


{'abr': 129.484,
 'acodec': 'mp4a.40.2',
 'age_limit': 0,
 'automatic_captions': {'af': [{'ext': 'srv1',
    'url': 'https://www.youtube.com/api/timedtext?v=sBSJSSd30_I&asr_langs=de%2Cen%2Ces%2Cfr%2Cid%2Cit%2Cja%2Cko%2Cnl%2Cpt%2Cru%2Ctr%2Cvi&caps=asr&exp=xftt%2Cxctw&xoaf=4&hl=en&ip=0.0.0.0&ipbits=0&expire=1639554225&sparams=ip%2Cipbits%2Cexpire%2Cv%2Casr_langs%2Ccaps%2Cexp%2Cxoaf&signature=9FE10E68C6F5C1D7A464647E783A3F1CD5FEE26C.761EC9E613051F03B459651FB0E3ACC2B515E5D8&key=yt8&kind=asr&lang=pt&tlang=af&fmt=srv1'},
   {'ext': 'srv2',
    'url': 'https://www.youtube.com/api/timedtext?v=sBSJSSd30_I&asr_langs=de%2Cen%2Ces%2Cfr%2Cid%2Cit%2Cja%2Cko%2Cnl%2Cpt%2Cru%2Ctr%2Cvi&caps=asr&exp=xftt%2Cxctw&xoaf=4&hl=en&ip=0.0.0.0&ipbits=0&expire=1639554225&sparams=ip%2Cipbits%2Cexpire%2Cv%2Casr_langs%2Ccaps%2Cexp%2Cxoaf&signature=9FE10E68C6F5C1D7A464647E783A3F1CD5FEE26C.761EC9E613051F03B459651FB0E3ACC2B515E5D8&key=yt8&kind=asr&lang=pt&tlang=af&fmt=srv2'},
   {'ext': 'srv3',
    'url': 'https://www.y

We extracted the videos **related** to `cortes+podcast` and `palmeiras` and **stored** them in the variable `r`. This variable holds the data in a [JSON](https://pt.wikipedia.org/wiki/JSON)-like structure, which is very similar to a Python **dictionary**.

If we look at the `entries` key, we find everything we need: the **main** characteristics of each extracted video.
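As a minimal sketch of that step, assuming each item of `r['entries']` is a plain dictionary like the one printed above, the snippet below keeps only a handful of fields per video. The field list `MAIN_FIELDS`, the helper `main_features` and the sample entry are hypothetical illustrations, not the notebook's actual code; every field in the list does appear among the `df.dtypes` columns of this notebook.

```python
# Hypothetical subset of fields to keep for each video.
MAIN_FIELDS = ["id", "title", "upload_date", "duration", "view_count", "like_count"]


def main_features(entry, fields=MAIN_FIELDS):
    """Keep only the chosen fields of one video entry (missing ones become None)."""
    return {field: entry.get(field) for field in fields}


# Hypothetical entry shaped like an item of r["entries"] (not real data).
entry = {
    "id": "vid_1",
    "title": "Corte 1",
    "upload_date": "20211201",
    "duration": 600,
    "view_count": 1000,
    "formats": [],  # large nested fields like this one are dropped
}

print(main_features(entry))
```

Using `entry.get` instead of `entry[...]` means a video that lacks a field (here, `like_count`) still produces a row, with `None` in that position.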

<center><img src="https://github.com/FerrazThales/recomendador_youtube/raw/main/base_de_dados/r_entries.png"></center>

Next, we created one more key, `key_word`. It is very important because it records **which** keyword was used in the video **search**.

Finally, we stored everything in a **list** and created a Pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) to **visualize** our results more easily.
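The steps just described (skip the videos that failed, tag each one with its `key_word`, collect everything in a list and build a DataFrame) can be sketched as below. This is a hedged reconstruction, not the author's code: `collect_results` and `results_by_keyword` are hypothetical names, and the sample data is made up. It assumes youtube-dl was run with errors ignored, in which case a video that fails (like the `ERROR: No video formats found` lines in the log) may show up as `None` inside `entries`.

```python
import pandas as pd


def collect_results(results_by_keyword):
    """Flatten several search results into one list of rows, tagging each
    video with the keyword (`key_word`) that found it."""
    rows = []
    for key_word, result in results_by_keyword.items():
        for entry in result.get("entries") or []:
            if entry is None:  # skip videos that failed to download
                continue
            row = dict(entry)  # copy so the original dict is not modified
            row["key_word"] = key_word
            rows.append(row)
    return rows


# Hypothetical sample data shaped like youtube-dl search results.
results_by_keyword = {
    "cortes+podcast": {"entries": [{"id": "vid_1", "title": "Corte 1"}, None]},
    "palmeiras": {"entries": [{"id": "vid_2", "title": "Gol do Palmeiras"}]},
}

resultados = collect_results(results_by_keyword)
df = pd.DataFrame(resultados)
print(df)
```

Copying each entry with `dict(entry)` leaves the original search results untouched, so the tagging step can be re-run safely.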

In [12]:
import pandas as pd

# store the results in a DataFrame
df = pd.DataFrame(resultados)

# display the first 5 rows of the DataFrame
df.head()

AttributeError: ignored

Data types

In [None]:
df.dtypes

id                       object
title                    object
formats                  object
thumbnails               object
description              object
upload_date              object
uploader                 object
uploader_id              object
uploader_url             object
channel_id               object
channel_url              object
duration                  int64
view_count                int64
average_rating          float64
age_limit                 int64
webpage_url              object
categories               object
tags                     object
is_live                  object
like_count                int64
channel                  object
extractor                object
webpage_url_basename     object
extractor_key            object
n_entries                 int64
playlist                 object
playlist_id              object
playlist_title           object
playlist_uploader        object
playlist_uploader_id     object
playlist_index            int64
thumbnai

Check for duplicate data
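A video can be returned by more than one search (for example, by both `cortes+podcast` and `palmeiras`), so the same `id` may appear more than once. A minimal pandas sketch, assuming `id` uniquely identifies a video and using made-up sample rows:

```python
import pandas as pd

# Hypothetical sample: the same video ("vid_1") was returned by two searches.
df = pd.DataFrame(
    {
        "id": ["vid_1", "vid_2", "vid_1"],
        "key_word": ["cortes+podcast", "palmeiras", "palmeiras"],
    }
)

# Count rows whose `id` already appeared earlier in the DataFrame.
n_duplicates = df.duplicated(subset="id").sum()
print(f"Duplicate videos: {n_duplicates}")

# Keep the first occurrence of each video and renumber the index.
df = df.drop_duplicates(subset="id", keep="first").reset_index(drop=True)
print(df)
```

Deduplicating on `id` alone is deliberate: columns such as `formats`, `thumbnails` and `tags` hold lists, and calling `df.duplicated()` over all columns would likely fail with an unhashable-type error.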

Missing data
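Some fields, such as `like_count`, may be missing for some videos. One way to count and handle gaps in pandas, sketched on a hypothetical sample (which columns to drop and which to fill is a project decision not stated here):

```python
import pandas as pd

# Hypothetical sample with gaps (None becomes NaN in numeric columns).
df = pd.DataFrame(
    {
        "id": ["vid_1", "vid_2", "vid_3"],
        "view_count": [1000, None, 250],
        "like_count": [50, 10, None],
    }
)

# Number of missing values per column.
missing = df.isnull().sum()
print(missing)

# Drop rows missing a value that later steps cannot do without.
df_clean = df.dropna(subset=["view_count"]).reset_index(drop=True)
print(df_clean)
```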

## Exporting the dataset to CSV for manual labeling

text text Google Sheets
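A minimal sketch of the export, assuming the labels will be added by hand in a spreadsheet such as Google Sheets. The chosen columns, the empty label column `y` and the file name `videos_to_label.csv` are hypothetical choices for illustration:

```python
import pandas as pd

# Hypothetical sample rows (the placeholder URLs are not real videos).
df = pd.DataFrame(
    {
        "id": ["vid_1", "vid_2"],
        "title": ["Corte 1", "Gol do Palmeiras"],
        "webpage_url": [
            "https://www.youtube.com/watch?v=vid_1",
            "https://www.youtube.com/watch?v=vid_2",
        ],
        "key_word": ["cortes+podcast", "palmeiras"],
    }
)

# Keep only what a person needs to judge each video.
columns = ["id", "title", "webpage_url", "key_word"]
to_label = df[columns].copy()

# Hypothetical label column, left empty and filled in by hand (e.g. 1 or 0).
to_label["y"] = ""

# index=False avoids writing the pandas index as an extra column.
to_label.to_csv("videos_to_label.csv", index=False)
```

After labeling, the sheet can be downloaded back as CSV and read with `pd.read_csv`; the cells left empty come back as `NaN`.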

In [None]:
# load the file with some already-labeled data from GitHub

# split the rows that are labeled from the ones that are not
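A sketch of those two steps under stated assumptions: the labeled file is a CSV whose hypothetical label column `y` is left empty for videos not yet labeled. An in-memory string stands in for the file here; `pd.read_csv` also accepts a URL, so a raw GitHub link could be passed instead (the notebook's actual URL is not shown in this section).

```python
import io

import pandas as pd

# Hypothetical stand-in for the labeled CSV kept on GitHub.
csv_text = """id,title,y
vid_1,Corte 1,1
vid_2,Gol do Palmeiras,0
vid_3,Outro vídeo,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Rows with a label go to training; the rest wait for manual labeling.
labeled = df[df["y"].notnull()].copy()
unlabeled = df[df["y"].isnull()].copy()

# The empty cells made `y` a float column; restore integer labels.
labeled["y"] = labeled["y"].astype(int)
print(labeled)
print(unlabeled)
```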

## Feature Selection

text text
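One possible numeric feature, not necessarily the one chosen in this project, is views per day: a video's `view_count` divided by how long it has been online, which makes old and new videos easier to compare. The sketch assumes `upload_date` is a `YYYYMMDD` string, as youtube-dl returns it, and uses a fixed `reference_date` (ideally the extraction date) so the results are reproducible; `add_views_per_day` is a hypothetical helper.

```python
import pandas as pd


def add_views_per_day(df, reference_date):
    """Return a copy of `df` with a `views_per_day` column.

    Assumes `upload_date` holds YYYYMMDD strings and `view_count` is numeric.
    """
    out = df.copy()
    uploaded = pd.to_datetime(out["upload_date"], format="%Y%m%d")
    days_online = (pd.Timestamp(reference_date) - uploaded).dt.days
    # A video uploaded on the reference date would give 0 days; count it as 1.
    out["views_per_day"] = out["view_count"] / days_online.clip(lower=1)
    return out


# Hypothetical sample rows.
df = pd.DataFrame(
    {
        "upload_date": ["20211201", "20211211"],
        "view_count": [1000, 300],
    }
)
features = add_views_per_day(df, reference_date="2021-12-11")
print(features)
```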

# Modeling

text text

random forest
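A minimal scikit-learn sketch of a Random Forest classifier, plus one step that connects it to the Active Learning idea mentioned at the start of the project: the model scores the unlabeled videos, and the ones it is least sure about (probability closest to 0.5, i.e. uncertainty sampling) become the next candidates for manual labeling. Synthetic data from `make_classification` stands in for the real features, the hyperparameters are illustrative, and ROC AUC is just one reasonable validation metric; none of these choices come from the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labeled video features (NOT the project's data).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Illustrative hyperparameters, not tuned for this project.
model = RandomForestClassifier(n_estimators=200, min_samples_leaf=2, random_state=0)
model.fit(X_train, y_train)

# Probability of the positive class (hypothetically: "worth recommending").
p_val = model.predict_proba(X_val)[:, 1]
auc = roc_auc_score(y_val, p_val)
print(f"Validation ROC AUC: {auc:.3f}")

# Uncertainty sampling: score a pool of unlabeled videos and pick the ones
# whose predicted probability is closest to 0.5 to label by hand next.
rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(50, X.shape[1]))  # synthetic unlabeled pool
p_unlabeled = model.predict_proba(X_unlabeled)[:, 1]
uncertainty = np.abs(p_unlabeled - 0.5)
next_to_label = np.argsort(uncertainty)[:10]
print("Indices to label next:", next_to_label)
```

After those videos are labeled, they move from the unlabeled pool to the training set and the model is refit, repeating the loop.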

# Putting it into *Production*

text text

# Conclusions

* x
* x
* use of NLP
* x