In [1]:
import collections

The `youtube_urls.txt` file has a convenience sample of 210 unique YouTube video URLs, one per line.

In [2]:
with open('youtube_urls.txt') as f:
    urls = [line.strip() for line in f]

In [3]:
urls[0]

'https://www.youtube.com/watch?v=-xIOhTt80ZM'

In [4]:
codes = [url[32:] for url in urls]

The video codes seem to always be 11 characters, even for [the first YouTube video](https://www.youtube.com/watch?v=jNQXAC9IVRw) ([source](https://en.wikipedia.org/wiki/Me_at_the_zoo)).
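
The `url[32:]` slice above only works because every URL in the sample starts with exactly `https://www.youtube.com/watch?v=`. As a rough sketch of a less position-dependent alternative (written for Python 3; `video_code` is a hypothetical helper name, not something from the original notebook), the `v` query parameter can be parsed out instead:

```python
# Python 3 standard library; in Python 2 these functions live in the `urlparse` module.
from urllib.parse import urlparse, parse_qs


def video_code(url):
    """Return the value of the `v` query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get('v')
    return values[0] if values else None


example = 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
code = video_code(example)
print(code, len(code))  # jNQXAC9IVRw 11
```

For URLs in the form used here, this should give the same result as the fixed slice, while also coping with extra query parameters such as `&t=10s`.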

In [5]:
collections.Counter(len(code) for code in codes)

Counter({11: 210})

Here's the full set of characters that appear in video codes.

In [6]:
freqs = collections.Counter(''.join(codes))
chars = ''.join(sorted(freqs.keys()))
chars

'-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz'

In [7]:
len(chars)

64

That makes it seem like they're going for [base 64](https://en.wikipedia.org/wiki/Base64), in its URL-safe form with `-` and `_` as the two non-alphanumeric characters. With 11 characters at 6 bits each, that represents 66 bits of information, for quite a few possibilities (though fewer than the 128 bits in [UUIDs](https://en.wikipedia.org/wiki/Universally_unique_identifier)).
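
As a quick sanity check on that arithmetic, here is a small sketch (Python 3) that compares the observed characters with the URL-safe base64 alphabet and works out the bit counts. The `observed` string is copied from the output above; the constant names are just illustrative.

```python
import math
import string

# URL-safe base64 alphabet: '-' and '_' take the place of '+' and '/'.
URLSAFE_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + '-_'

# Characters seen in the 210 sampled video codes (copied from the cell output above).
observed = '-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz'
print(set(observed) == set(URLSAFE_ALPHABET))  # True

bits_per_char = math.log2(len(URLSAFE_ALPHABET))  # 6.0
code_bits = 11 * bits_per_char                    # 66.0
print(code_bits)
print(128 - code_bits)  # 62.0 fewer bits than a 128-bit UUID
```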

In [8]:
64**11

73786976294838206464L

That's about 74 quintillion, 7.4 x 10^19, which is quite a lot. It [seems like](https://www.quora.com/How-many-videos-are-on-YouTube-2017-1) there are maybe 5 billion YouTube videos. How many possible codes are there for each video that exists?

In [9]:
64**11 / 5e9

14757395258.967642

Gross; I'd have to guess about 15 billion URLs, on average, to find a video? Say I can check ten URLs per second...

In [10]:
(64**11 / 5e9) / 10 / 60 / 60 / 24 / 365

46.79539338840577

Hmm; one video every 47 years or so, about two per century, is not fast enough.
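
The same back-of-the-envelope estimate can be wrapped in a function to see how it scales with guessing speed. This is only a sketch under the assumptions already made above (codes are uniformly random over 64**11 possibilities, roughly 5 billion videos exist); `years_per_hit` and the sample rates are made up for illustration.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def years_per_hit(total_codes, existing_videos, guesses_per_second):
    """Expected years of uniform random guessing per valid video found."""
    expected_guesses = total_codes / existing_videos
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


# Ten guesses per second matches the estimate above; the faster rates are hypothetical.
for rate in (10, 1000, 100000):
    print(rate, years_per_hit(64**11, 5e9, rate))
```

Even at a hypothetical 100,000 guesses per second, the expected wait works out to more than a day and a half per video, so guessing stays impractical under these assumptions.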

Is it really random though? This gut check shows that no character appears less than half as often as expected, or more than twice as often as expected. (The ratios all fall between 0.6 and 1.5.)

In [11]:
sorted([val / (210. * 11 / 64) for val in freqs.values()])

[0.6372294372294373,
 0.6649350649350649,
 0.6649350649350649,
 0.6926406926406926,
 0.6926406926406926,
 0.7203463203463204,
 0.7203463203463204,
 0.7203463203463204,
 0.7480519480519481,
 0.7757575757575758,
 0.7757575757575758,
 0.7757575757575758,
 0.7757575757575758,
 0.8034632034632034,
 0.8034632034632034,
 0.8588744588744589,
 0.8588744588744589,
 0.8865800865800866,
 0.8865800865800866,
 0.8865800865800866,
 0.8865800865800866,
 0.8865800865800866,
 0.8865800865800866,
 0.8865800865800866,
 0.9142857142857143,
 0.9142857142857143,
 0.9142857142857143,
 0.9142857142857143,
 0.941991341991342,
 0.941991341991342,
 0.941991341991342,
 0.941991341991342,
 0.9696969696969697,
 0.9974025974025974,
 0.9974025974025974,
 0.9974025974025974,
 1.025108225108225,
 1.025108225108225,
 1.025108225108225,
 1.0528138528138529,
 1.0528138528138529,
 1.0528138528138529,
 1.0805194805194804,
 1.0805194805194804,
 1.0805194805194804,
 1.0805194805194804,
 1.1082251082251082,
 1.135930735930736,


So it looks at least random-ish. We could do some statistics (chi-squared?) to quantify how random it seems, look at per-character-position counts, and so on, but it doesn't seem worth it.
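
For anyone who does want to go further, here is a minimal sketch (Python 3, standard library only) of what those two checks might look like. The function names are hypothetical, and the demo runs on synthetic random codes rather than the real `codes` list, so it says nothing about the actual sample.

```python
import collections
import random

ALPHABET = '-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz'


def chi_squared_uniform(codes, alphabet=ALPHABET):
    """Pearson chi-squared statistic for character counts against a uniform expectation."""
    counts = collections.Counter(''.join(codes))
    total = sum(counts.values())
    expected = float(total) / len(alphabet)
    return sum((counts[ch] - expected) ** 2 / expected for ch in alphabet)


def position_counts(codes):
    """One Counter per character position (assumes all codes have the same length)."""
    return [collections.Counter(column) for column in zip(*codes)]


# Synthetic stand-in for the real data: 210 uniformly random 11-character codes.
rng = random.Random(0)
fake_codes = [''.join(rng.choice(ALPHABET) for _ in range(11)) for _ in range(210)]

print(chi_squared_uniform(fake_codes))
print(len(position_counts(fake_codes)))  # 11, one Counter per position
```

With 64 categories there are 63 degrees of freedom, so for truly uniform characters the statistic should land somewhere near 63; a value far above that would suggest the characters are not uniformly distributed.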

Looks like it's back to the existing sites that get random videos by some presumably cheating method:

 * [randomyoutube.net](https://randomyoutube.net/) (my favorite)
 * [stumbl.tv](http://stumbl.tv/)
 * [ytroulette.com](https://ytroulette.com/)