### Personality and its Transformations ###

An analysis of Prof. Jordan Peterson's lectures from his personality course at the University of Toronto.

The lectures are available as a YouTube playlist: https://www.youtube.com/watch?v=UGLsnu5RLe8&list=PLYNhvBtnVUK4aUJ6onJbylGBeeqlJUpN1 | More about Prof. Peterson: https://www.jordanbpeterson.com/

---

<div class="tomcolor8">  
<h4 style="background:#135e96; color:white ;font-size:15px;line-height:1em; text-align:left; padding: 20px">
      Get captions from a playlist</h4> 
</div>

In [2]:
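# main imports
import os
import subprocess
import sys
from constants import *

try:
    from pytube import Playlist
    from youtube_transcript_api import YouTubeTranscriptApi
except ModuleNotFoundError:
    # install the missing packages into the current interpreter, then retry the imports
    subprocess.check_call([sys.executable, '-m', 'pip', 'install',
                           'pytube', 'youtube-transcript-api'])
    from pytube import Playlist
    from youtube_transcript_api import YouTubeTranscriptApi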

***pytube*** is a lightweight, dependency-free Python library (and command-line utility) for downloading YouTube videos.    
Documentation available at: https://pytube.io/en/latest/

***YouTube Transcript/Subtitle API*** is a Python API for retrieving the transcript/subtitles of a given YouTube video.    
Documentation available at: https://pypi.org/project/youtube-transcript-api/
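
To make the later processing step concrete: in the versions of the library that expose `get_transcript`, a transcript comes back as a list of dictionaries, one per caption segment, each with `text`, `start`, and `duration` keys. The sketch below does not call YouTube at all. It uses a hand-written sample (the segment contents and timings are made up for illustration) and a hypothetical helper, `join_segments`, to show how such a list collapses into a single string.

```python
# Hand-written sample that mimics the shape of a transcript: a list of dicts.
# The texts and timings are invented for illustration only.
sample_transcript = [
    {"text": "So, welcome to\nPsychology 230.", "start": 0.0, "duration": 2.5},
    {"text": "Nice to see you all here.", "start": 2.5, "duration": 1.8},
]


def join_segments(segments: list) -> str:
    """Join caption segments into one string, replacing line breaks with spaces."""
    return " ".join(seg["text"].replace("\n", " ") for seg in segments)


joined = join_segments(sample_transcript)
print(joined)
```

The timing fields are ignored here because only the spoken text is needed for the analysis.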

In [4]:
def get_playlist(play_list_name: str) -> list:
    """Generate a list of YouTube video URLs from a given playlist

    Args:
        play_list_name (str): URL of a YouTube playlist

    Returns:
        list: list of YouTube video URLs
    """
    # iterating over a pytube Playlist yields the URL of each video
    return list(Playlist(play_list_name))

In [None]:
PLAYLIST

In [6]:
jp_videos = get_playlist(PLAYLIST)
jp_videos

['https://www.youtube.com/watch?v=UGLsnu5RLe8',
 'https://www.youtube.com/watch?v=ajtnhtEg76k',
 'https://www.youtube.com/watch?v=PH67HpFD2Ew',
 'https://www.youtube.com/watch?v=G3fWuMQ5K8I',
 'https://www.youtube.com/watch?v=IO6NvcGKZ20',
 'https://www.youtube.com/watch?v=BSh37_x5RNY',
 'https://www.youtube.com/watch?v=3uJkd54p9dY',
 'https://www.youtube.com/watch?v=WjpV9mja3Wc',
 'https://www.youtube.com/watch?v=539UQF6eT6I',
 'https://www.youtube.com/watch?v=f511uRzsHhQ',
 'https://www.youtube.com/watch?v=RNxlEQSvh_w',
 'https://www.youtube.com/watch?v=q15eTySnWxc',
 'https://www.youtube.com/watch?v=qRFxulvRC7I']

In [7]:
len(jp_videos)

13

In [8]:
def get_captions(urls: list) -> list:
    """Get captions (transcripts) from all videos in a given playlist

    Args:
        urls (list): list of YouTube video URLs

    Returns:
        list: list of captions, one string per video with available captions
    """
    all_captions = []

    for url in urls:
        # the video ID is the part of the URL after 'watch?v='
        short_url = url.split('watch?v=')[1]
        try:
            captions = YouTubeTranscriptApi.get_transcript(short_url,
                                                           languages=['en'])

            # join all caption segments of a video into a single string
            preprocessed_captions = ' '.join(
                entry['text'].replace('\n', ' ') for entry in captions)

            all_captions.append(preprocessed_captions)

        # print the ID of a video whose captions could not be retrieved
        except Exception:
            print(f'Transcripts disabled for video {short_url}')
    return all_captions
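
Splitting on `'watch?v='` works for the plain watch URLs returned by the playlist, but it would keep any trailing query parameters (for example `&list=...`) as part of the ID. As a possible alternative, here is a minimal sketch that parses the query string with the standard library. The helper name `extract_video_id` is hypothetical and not part of the notebook.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str:
    """Return the value of the `v` query parameter of a YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    if "v" not in query:
        raise ValueError(f"No video ID found in URL: {url}")
    return query["v"][0]


plain_url = "https://www.youtube.com/watch?v=UGLsnu5RLe8"
url_with_list = plain_url + "&list=PLYNhvBtnVUK4aUJ6onJbylGBeeqlJUpN1"

print(extract_video_id(plain_url))
print(extract_video_id(url_with_list))
```

Both calls print the same ID, whereas `url_with_list.split('watch?v=')[1]` would also include the `&list=...` suffix.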

In [9]:
all_captions = get_captions(jp_videos)

Transcripts disabled for video q15eTySnWxc


In [10]:
len(all_captions)

12

In [11]:
# preview of the first 1000 characters of the first video's captions
all_captions[0][:1000]

'Well, after all that. So, welcome to Psychology 230. Nice to see you all here. So, what I’m going to do today—how I’m going to start—is I’m going to give you an overview of the content of the course and then I’ll give you an overview of the class requirements right at the end. But I think we might as well jump right into the content to begin with. So, there’s a website—I don’t really like Blackboard so I have my own website. You can go to jordanbpeterson.com and underneath there there’s a menu that lists all the courses and the full syllabus is listed there. So, all the information that you’re going to need about the course can be found there, including most of the readings, although there is also a textbook, which I presume the majority of you have already purchased. So, it also lists the other things you need to know like what days the tests are going to be and what the assignments are and I’ll go over that anyways at the end of the class. So, but to begin with I’m going to tell you

In [12]:
# save captions to a file
with open(os.path.join(OUTPUT_FOLDER, CAPTIONS_FILE_NAME), 'w', encoding='utf-8') as f:
    for line in all_captions:
        f.write(f"{line}\n")
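
Because line breaks inside the captions were replaced with spaces earlier, the file holds exactly one video per line and can be read back line by line. The sketch below illustrates that round trip with a temporary directory and made-up sample strings. `save_captions`, `load_captions`, and the file name `captions_demo.txt` are hypothetical names used only for this illustration.

```python
import os
import tempfile


def save_captions(captions: list, path: str) -> None:
    """Write one caption string per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in captions:
            f.write(f"{line}\n")


def load_captions(path: str) -> list:
    """Read the file back into a list, dropping the trailing newline of each line."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# made-up sample data standing in for the real captions
sample = ["first lecture text", "second lecture text"]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "captions_demo.txt")
    save_captions(sample, demo_path)
    restored = load_captions(demo_path)

print(restored == sample)
```

The round trip only holds as long as no caption string contains a newline of its own, which the preprocessing step is meant to guarantee.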

---