# Experimentation

## Get Video Information

In [13]:
import datetime
import json

import requests
from yt_dlp import YoutubeDL

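def list_most_recent_videos(days: int, channel: str):
    """Return yt-dlp metadata for every video listed at `channel`.

    NOTE: `days` is not used yet, so no date filtering is applied.
    """
    with YoutubeDL() as ydl:
        # download=False fetches metadata only, not the media files.
        return ydl.extract_info(channel, download=False)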
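
The `days` argument is accepted above but never applied, so the call returns every video on the channel. A minimal sketch of the missing filter follows, assuming each entry carries yt-dlp's `upload_date` field as a `YYYYMMDD` string. `filter_recent` and its `today` parameter are hypothetical names introduced here for illustration, not part of the notebook.

```python
import datetime


def filter_recent(entries, days, today=None):
    """Keep entries uploaded within the last `days` days (hypothetical helper)."""
    today = today or datetime.date.today()
    cutoff = today - datetime.timedelta(days=days)
    recent = []
    for entry in entries:
        raw = entry.get("upload_date")  # assumed format: "YYYYMMDD"
        if not raw:
            continue  # skip entries without a known upload date
        uploaded = datetime.datetime.strptime(raw, "%Y%m%d").date()
        if uploaded >= cutoff:
            recent.append(entry)
    return recent
```

It could be applied to the result as `filter_recent(info["entries"], days=7)`. Note that under this sketch `days=0` keeps only videos uploaded today.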

In [5]:
i = list_most_recent_videos(0, "https://www.youtube.com/@adumb_codes/videos")

[youtube:tab] Extracting URL: https://www.youtube.com/@adumb_codes/videos
[youtube:tab] @adumb_codes/videos: Downloading webpage
[download] Downloading playlist: adumb - Videos
[youtube:tab] Playlist adumb - Videos: Downloading 6 items of 6
[download] Downloading item 1 of 6
[youtube] Extracting URL: https://www.youtube.com/watch?v=JheGL6uSF-4
[youtube] JheGL6uSF-4: Downloading webpage
[youtube] JheGL6uSF-4: Downloading ios player API JSON
[youtube] JheGL6uSF-4: Downloading m3u8 information
[download] Downloading item 2 of 6
[youtube] Extracting URL: https://www.youtube.com/watch?v=vLOgzi5y9xE
[youtube] vLOgzi5y9xE: Downloading webpage
[youtube] vLOgzi5y9xE: Downloading ios player API JSON
[youtube] vLOgzi5y9xE: Downloading m3u8 information
[download] Downloading item 3 of 6
[youtube] Extracting URL: https://www.youtube.com/watch?v=mTPqIuEM-ug
[youtube] mTPqIuEM-ug: Downloading webpage
[youtube] mTPqIuEM-ug: Downloading ios player API JSON
[youtube] mTPqIuEM-ug: Downloading m3u8 inform

In [10]:
print(json.dumps(YoutubeDL().sanitize_info(i), indent=4))

{
    "id": "UCwPdVunI5mD-dpuLogOawbw",
    "channel": "adumb",
    "channel_id": "UCwPdVunI5mD-dpuLogOawbw",
    "title": "adumb - Videos",
    "availability": null,
    "channel_follower_count": 78500,
    "description": "I make coding videos sometimes\n",
    "tags": [],
    "thumbnails": [
        {
            "url": "https://yt3.googleusercontent.com/4HP0h_KqDvAFRyYNf-4fD4aJadLfQNWn9oB4N2n_fv8pQl8QT7WuQIMit13XWUefqKYMB8ex=w320-fcrop64=1,32b75a57cd48a5a8-k-c0xffffffff-no-nd-rj",
            "height": 88,
            "width": 320,
            "preference": -10,
            "id": "0",
            "resolution": "320x88"
        },
        {
            "url": "https://yt3.googleusercontent.com/4HP0h_KqDvAFRyYNf-4fD4aJadLfQNWn9oB4N2n_fv8pQl8QT7WuQIMit13XWUefqKYMB8ex=w320-fcrop64=1,00000000ffffffff-k-c0xffffffff-no-nd-rj",
            "height": 180,
            "width": 320,
            "preference": -10,
            "id": "1",
            "resolution": "320x180"
        },
        {
 

In [12]:
info = i  # keep the channel metadata; `i` is reused as the loop index below
for i, entry in enumerate(info['entries']):
    print("Title:", entry['title'])
    print("Description:", entry['description'])
    print("Link to captions (en-orig):", entry['automatic_captions']['en-orig'][0]['url'])

Title: I Made a Graph of Wikipedia... This Is What I Found
Description: Code for all my videos: https://github.com/sponsors/adumb-codes/
Get the graph as a poster: https://adumb.store/
Twitter: https://twitter.com/adumb_codes

A deep dive into the network of Wikipedia and some of the the most interesting, bizarre, and unique articles on the website.

Music:
Beyond the Wall - Sugoi
How About Now? - Andreas Dahlbäck
First Horizon - ELFL
Neroli - Ennio Máno
Tree Tops - Autohacker

Technical details for nerds:
- Data is collected from Wikipedia dumps
- Graph is made with python-igraph
- Distributed Recursive Layout algorithm is used for the graph layout
- Leiden algorithm is used for community detection
- A valid article is any page in Wikipedia's article namespace excluding redirect pages, disambiguation pages, and soft redirects
- A valid link is a link in an articles body. Links that appear in or after the "See Also" section and links that appear as footnotes are not included since thes
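
The loop indexes `['automatic_captions']['en-orig'][0]` directly, which raises `KeyError` or `IndexError` for a video without that caption track and relies on the first track being the wanted format. A hedged sketch of a safer lookup follows, assuming each track dictionary has an `ext` key and that `json3` is the format wanted. `find_caption_url` is a hypothetical helper name.

```python
from typing import Optional


def find_caption_url(entry: dict, lang: str = "en-orig", ext: str = "json3") -> Optional[str]:
    """Return the caption URL for `lang` in format `ext`, or None if absent."""
    # Tolerate a missing or null "automatic_captions" field.
    tracks = (entry.get("automatic_captions") or {}).get(lang) or []
    for track in tracks:
        if track.get("ext") == ext:  # assumption: tracks are labelled by "ext"
            return track.get("url")
    return None
```

Returning `None` instead of raising lets the caller skip videos that have no automatic captions.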

In [14]:
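# `i` is the last loop index here, so this fetches captions for the final entry.
caption_url = info['entries'][i]['automatic_captions']['en-orig'][0]['url']
response = requests.get(caption_url, timeout=10)
response.status_code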

200

In [15]:
response.json()

{'wireMagic': 'pb3',
 'pens': [{}],
 'wsWinStyles': [{}, {'mhModeHint': 2, 'juJustifCode': 0, 'sdScrollDir': 3}],
 'wpWinPositions': [{},
  {'apPoint': 6, 'ahHorPos': 20, 'avVerPos': 100, 'rcRows': 2, 'ccCols': 40}],
 'events': [{'tStartMs': 0,
   'dDurationMs': 483650,
   'id': 1,
   'wpWinPosId': 1,
   'wsWinStyleId': 1},
  {'tStartMs': 0,
   'dDurationMs': 14610,
   'wWinId': 1,
   'segs': [{'utf8': 'a', 'acAsrConf': 198},
    {'utf8': ' while', 'tOffsetMs': 60, 'acAsrConf': 252},
    {'utf8': ' ago', 'tOffsetMs': 420, 'acAsrConf': 252},
    {'utf8': ' I', 'tOffsetMs': 450, 'acAsrConf': 103},
    {'utf8': ' had', 'tOffsetMs': 900, 'acAsrConf': 239},
    {'utf8': ' an', 'tOffsetMs': 1199, 'acAsrConf': 252},
    {'utf8': ' idea', 'tOffsetMs': 1350, 'acAsrConf': 236},
    {'utf8': ' a', 'tOffsetMs': 1439, 'acAsrConf': 95},
    {'utf8': ' Twitter', 'tOffsetMs': 1939, 'acAsrConf': 224},
    {'utf8': ' bot', 'tOffsetMs': 2939, 'acAsrConf': 240}]},
  {'tStartMs': 3139,
   'dDurationMs': 11
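
The payload shown above is a list of `events`. Some carry only window metadata, while others carry `segs` whose `utf8` text and optional `tOffsetMs` describe individual words. A rough sketch of pulling out word-level timestamps follows, assuming `tOffsetMs` is relative to the enclosing event's `tStartMs` and defaults to 0 when absent. `timed_words` is a hypothetical name.

```python
def timed_words(captions: dict) -> list:
    """Return (start_ms, text) pairs for every caption segment."""
    words = []
    for event in captions.get("events", []):
        start = event.get("tStartMs", 0)
        # Events without "segs" (window setup) contribute nothing.
        for seg in event.get("segs", []):
            # Assumption: tOffsetMs is an offset from the event start.
            words.append((start + seg.get("tOffsetMs", 0), seg["utf8"]))
    return words
```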

In [22]:
def get_text(captions: dict) -> str:
    text = []
    for event in captions['events']:
        if 'segs' in event:
            for segment in event['segs']:
                text.append(segment['utf8'])
    return ''.join(text)
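
`get_text` joins the segments verbatim, so any line breaks stored in the captions survive in the result. If a single flowing paragraph is preferred, one option is to collapse all whitespace afterwards. `normalize_transcript` is a hypothetical helper, sketched here rather than taken from the notebook.

```python
def normalize_transcript(text: str) -> str:
    """Collapse line breaks and repeated spaces into single spaces."""
    # str.split() with no arguments splits on any run of whitespace.
    return " ".join(text.split())
```

For example, `normalize_transcript("a Twitter bot\nthat grabs")` returns `'a Twitter bot that grabs'`.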

In [23]:
print(get_text(response.json()))

a while ago I had an idea a Twitter bot
that grabs people's wishes so how do we
actually go about building a Twitter bot
that can grant people's wishes well we
can think of it in a two phase process
first we'll design the Twitter bot and
then we'll actually implement the
software required for it for the first
part of the design process we really
just want to lay out all the
requirements for this Twitter bot so the
first thing we want to do is find tweets
that have wishes in them in this case
we'll just search for tweets that have
the phrase I wish I knew how to in them
as this will simplify our problem a lot
so after we have a tweet that has a wish
in it we just want to extract that wish
from the tweet we don't really need any
of the other unnecessary parts of it
once we've extracted the question from
the tweet we can go ahead and send that
question to YouTube and find a video
that will answer that question since
YouTube just has so many videos for
everything is a good chance we'll fin
