
## Upload YouTube Videos to the Reading Material Folder Manually


In this notebook, we will take the URLs of YouTube videos, convert the videos to mp3s, upload the mp3s to Google Drive, and then trash the uploaded copies and delete the local files.


We will start by ensuring that all the necessary Python libraries are installed.

In [None]:

import sys

print('conda update --all --yes --prefix "{}"'.format(sys.prefix))

In [2]:

import sys

# youtube-dl-2019.8.2
print('{} -m pip install --upgrade youtube_dl'.format(sys.executable))
!{sys.executable} -m pip install --upgrade youtube_dl

C:\Users\dev\anaconda3\python.exe -m pip install --upgrade youtube_dl
Collecting youtube_dl
  Downloading youtube_dl-2020.6.16.1-py2.py3-none-any.whl (1.8 MB)
Installing collected packages: youtube-dl
Successfully installed youtube-dl-2020.6.16.1


In [None]:

import os
import sys

# https://github.com/ytdl-org/youtube-dl/issues/20758
# Just replaced 'token' with 'account_playback_token' in line 1674 of extractor/youtube.py
# All seems to work.
text_editor_path = r'C:\Program Files\Notepad++\notepad++.exe'
#text_editor_path = r'C:\Program Files\Sublime Text 3\sublime_text.exe'
for sub_dir, dirs_list, files_list in os.walk(r'{}\Lib\site-packages\youtube_dl'.format(sys.prefix)):
    for file_name in files_list:
        if 'youtube.py' == file_name:
            file_path = os.path.join(sub_dir, file_name)
            print(file_path)
!"{text_editor_path}" "{file_path}"

In [None]:

!youtube-dl --version

In [None]:

!{sys.executable} -m pip install --upgrade PyDrive
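
Before moving on, it can help to confirm that the packages installed above are actually importable from this kernel. The following is a minimal sketch; the helper name `is_importable` is made up for illustration, and the module names are the ones imported later in this notebook.

```python
import importlib.util

def is_importable(module_name):
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None

# Report on the two third-party packages this notebook relies on
for module_name in ('youtube_dl', 'pydrive'):
    status = 'found' if is_importable(module_name) else 'MISSING'
    print('{}: {}'.format(module_name, status))
```

A `MISSING` line usually means pip installed into a different environment than the one the kernel is running in.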


Next, we will confirm that ffmpeg is installed and reachable from the command line.

In [None]:

# https://github.com/adaptlearning/adapt_authoring/wiki/Installing-FFmpeg
!ffmpeg -version
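
If the shell command above fails, a quick way to tell whether the executable is simply missing from the PATH is to look it up from Python. This is a small sketch using the standard library's `shutil.which`; the helper name `find_tool` is hypothetical.

```python
import shutil

def find_tool(tool_name):
    """Return the full path of an executable on the PATH, or None if absent."""
    return shutil.which(tool_name)

# youtube-dl's audio extraction needs both ffmpeg and ffprobe
for tool_name in ('ffmpeg', 'ffprobe'):
    print('{}: {}'.format(tool_name, find_tool(tool_name) or 'not found on PATH'))
```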


Authenticating with Google Drive requires that we keep our client ID and client secret in a JSON file. Keep your secrets off the internet!

In [None]:

!start %windir%\explorer.exe "C:\Users\dev\Documents\repositories\notebooks\Miscellaneous\json"
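
A client secrets file downloaded from the Google API console normally nests the credentials under a top-level `installed` or `web` key. The sketch below checks a parsed file for the two fields mentioned above without printing their values; the helper name `missing_client_keys` and the placeholder values are assumptions for illustration.

```python
REQUIRED_KEYS = ('client_id', 'client_secret')

def missing_client_keys(config):
    """Return the required keys missing from a parsed client secrets dict."""
    for client_type in ('installed', 'web'):
        if client_type in config:
            return [key for key in REQUIRED_KEYS if key not in config[client_type]]
    # Neither expected top-level key is present
    return list(REQUIRED_KEYS)

# Placeholder data standing in for the real file contents
example = {'installed': {'client_id': 'placeholder-id',
                         'client_secret': 'placeholder-secret'}}
print(missing_client_keys(example))
```

In the notebook, you would pass in the result of `json.load()` on your own client secrets file; an empty list means both fields are present.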


We also make sure that line 286 of PyDrive's files.py file reads <code>self.content.close()</code> before we run the trash cell.

In [None]:

import sys

file_path = r'{}\Lib\site-packages\pydrive\files.py'.format(sys.prefix)
!"{text_editor_path}" "{file_path}"
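
As an alternative to opening the file in an editor, the line can be inspected from Python. This is only a sketch: the helper name `line_at` is hypothetical, and the exact line number may differ between PyDrive versions.

```python
def line_at(text, line_number):
    """Return the 1-indexed line of text, stripped of whitespace, or None."""
    lines = text.splitlines()
    if 1 <= line_number <= len(lines):
        return lines[line_number - 1].strip()
    return None

# Tiny stand-in for the contents of files.py
sample = 'first\n    self.content.close()\nthird'
print(line_at(sample, 2))
```

In the notebook, you would read the text of `file_path` and compare `line_at(text, 286)` with `'self.content.close()'`.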


Before we start the download process, let's go over setting a few variables:

* **The downloads folder:** This is where all the messy work will be done. All mp3 files and any intermediate files will be deleted out of this folder by the end.
* **The URL list:** This is where we place our YouTube video URLs. It doesn't matter whether you remove the <code>time_continue=</code> parameter from the URL; the mp3 will start at zero seconds either way.
* **The target folder ID:** This is the folder in Google Drive where our uploads will end up.
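
If you prefer tidy URLs in the list anyway, the <code>time_continue=</code> parameter can be stripped programmatically. The following is a minimal sketch using the standard library; the helper name `drop_query_param` and the example timestamp are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def drop_query_param(url, param_name='time_continue'):
    """Return url with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != param_name]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(drop_query_param('https://www.youtube.com/watch?time_continue=42&v=Unzc731iCUY'))
```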

In [None]:

from bs4 import BeautifulSoup as bs
import re
import requests

site_url = 'https://www.youtube.com/'
youtube_css = '#items'
site_page = requests.get(url=site_url)
site_html = site_page.content
site_soup = bs(site_html, 'lxml')
site_soup.select(youtube_css)

In [1]:

import youtube_dl

downloads_folder = r'C:\Users\dev\Downloads'
youtube_url_list = ['https://www.youtube.com/watch?v=Unzc731iCUY']

Now we are ready to start the download process by running <code>youtube_dl</code> on our <code>youtube_url_list</code>.

In [2]:

import os

ydl_opts = {
    'format': 'bestaudio/best',
    'nocheckcertificate': False,
    'outtmpl': os.path.join(downloads_folder, youtube_dl.DEFAULT_OUTTMPL),
    'postprocessors': [{'key': 'FFmpegExtractAudio',
                       'preferredcodec': 'mp3',
                       'preferredquality': '192'}],
    'verbose': True,
    }
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    for url in youtube_url_list:
        try:
            # extract_info() downloads and post-processes the video by default (download=True)
            res = ydl.extract_info(url,
                                   force_generic_extractor=ydl.params.get('force_generic_extractor',
                                                                          False))
        except Exception as e:
            print()
            print('####################################################')
            print('{}: {}'.format(url, e))
            print('####################################################')
            print()

    print(ydl._download_retcode)
print('Conversion completed.')

[debug] Encodings: locale cp1252, fs utf-8, out UTF-8, pref cp1252
[debug] youtube-dl version 2020.06.16.1
[debug] Python version 3.7.6 (CPython) - Windows-10-10.0.18362-SP0
[debug] exe versions: avconv v13_dev0-1440-g34c1133, avprobe v13_dev0-1440-g34c1133, ffmpeg git-2020-01-08-5bd0010, ffprobe git-2020-01-08-5bd0010
[debug] Proxy map: {}


[youtube] Unzc731iCUY: Downloading webpage
[debug] Invoking downloader on 'https://r1---sn-bvvbax-cvnl.googlevideo.com/videoplayback?expire=1595262279&ei=53AVX8T1BoiC8wSvi6igDg&ip=24.91.86.222&id=o-AKE-aqCyjSBKLrndLN38UnaH0abl0FZV8ZJc7pYL6cAR&itag=140&source=youtube&requiressl=yes&mh=B4&mm=31%2C29&mn=sn-bvvbax-cvnl%2Csn-ab5szn7y&ms=au%2Crdu&mv=m&mvi=1&pl=21&initcwndbps=1881250&vprv=1&mime=audio%2Fmp4&gir=yes&clen=61860446&dur=3822.306&lmt=1577533606021865&mt=1595240501&fvip=1&keepalive=yes&c=WEB&txp=5531432&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRQIhAIAbxC8FV_k_OQZIbjhar6-VWOSSofdsdYFr4HadIv0nAiAyANFxcGj9xTy0xD73sYKGvF77UG0zbYe8RV3A9NLOSQ%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIhAKzJKXPAQ6VqygS-VnzHRN8cg7cEBwae2OaCSSHmRhYeAiBIs8XIe-kuBSqQQx_CnV2f9jPZgp9C2WWjcfkXAr0_CA%3D%3D&ratebypass=yes'
[download] Destination: C:\Users\dev\Downloads\How To Speak by Patrick Winston-Unzc731iCU

In [3]:

!start %windir%\explorer.exe "{os.path.abspath(downloads_folder)}"

In [4]:

import re

files_list = os.listdir(downloads_folder)
for file_name in files_list:
    if file_name.endswith('.mp3'):
        src_path = os.path.join(downloads_folder, file_name)
        # Strip the trailing 11-character video ID that the default output template appends
        file_name = re.sub(r'-[A-Za-z0-9_-]{11}\.mp3$', '.mp3', file_name)
        dst_path = os.path.join(downloads_folder, file_name)
        os.rename(src_path, dst_path)
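
The rename above relies on youtube-dl's default output template, which ends each file name with a hyphen followed by the 11-character video ID. Here is a small sketch of that substitution in isolation, so it can be tried on sample names before touching real files; the helper name `strip_video_id` is hypothetical.

```python
import re

# A hyphen, then 11 URL-safe characters, then the extension at the end of the name
VIDEO_ID_SUFFIX = re.compile(r'-[A-Za-z0-9_-]{11}\.mp3$')

def strip_video_id(file_name):
    """Remove a trailing '-<video id>' from an mp3 file name, if present."""
    return VIDEO_ID_SUFFIX.sub('.mp3', file_name)

print(strip_video_id('How To Speak by Patrick Winston-Unzc731iCUY.mp3'))
print(strip_video_id('How To Speak by Patrick Winston.mp3'))
```

Both calls print the same clean name. Note that the pattern cannot tell a video ID from an ordinary 11-character word after a hyphen, so a title ending in something like `-improvement` would also be trimmed.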

In [5]:

# ID of the "Reading Material" folder
tgt_folder_id = '1syfUx6jukbW1CWIEy8xoM9veGGr5MBUh'
rm_url = 'https://drive.google.com/drive/u/0/folders/{}'.format(tgt_folder_id)
!"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" {rm_url}


Next, we authenticate with Google by running the built-in local web server that PyDrive conveniently provides.

In [1]:

from pydrive.auth import GoogleAuth

# The URL that gets displayed here is not suitable for public consumption
GoogleAuth.DEFAULT_SETTINGS['client_config_file'] = r'../json/client_secret.json'
gauth = GoogleAuth()
gauth.LocalWebserverAuth()

InvalidConfigError: Invalid client secrets file ('Error opening file', '../json/client_secret.json', 'No such file or directory', 2)


Before you run the next cell, open Google Drive in your browser, bring up the share dialog box of the folder you are uploading to, and copy its folder ID into the <code>tgt_folder_id</code> string. If you right-click on the folder, select **Share...**, and then click **Get shareable link**, you will see something like:

https://drive.google.com/drive/folders/1syfUx6jukbW1CWIEy8xoM9veGGr5MBUx?usp=sharing

It's that **1syfUx6jukbW1CWIEy8xoM9veGGr5MBUx** that you need to copy.
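
Copying the ID by hand is easy to get wrong, so here is a minimal sketch that pulls it out of a pasted link with a regular expression. The helper name `folder_id_from_url` is made up for illustration, and the pattern assumes the ID is the path segment right after `/folders/`.

```python
import re

def folder_id_from_url(url):
    """Return the Drive folder ID found after '/folders/', or None."""
    match = re.search(r'/folders/([A-Za-z0-9_-]+)', url)
    return match.group(1) if match else None

share_url = 'https://drive.google.com/drive/folders/1syfUx6jukbW1CWIEy8xoM9veGGr5MBUx?usp=sharing'
print(folder_id_from_url(share_url))
```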

In [None]:

from pydrive.drive import GoogleDrive

# Create GoogleDrive instance with authenticated GoogleAuth instance
drive = GoogleDrive(gauth)

# ID of the "Reading Material" folder
tgt_folder_id = '1syfUx6jukbW1CWIEy8xoM9veGGr5MBUh'


Next, we will upload all the mp3s in our download folder.

In [None]:

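import os
import re

# Upload all mp3s from the Downloads folder to Google Drive's "Reading Material" folder
gfile_dict = {}
for subdir, dirs, files in os.walk(downloads_folder):
    for src_file in files:
        if src_file.endswith('.mp3'):
            src_path = os.path.join(subdir, src_file)

            # Strip the video ID only if it is still there; the rename cell above
            # may already have removed it, in which case the name is used as-is
            title = re.sub(r'-[A-Za-z0-9_-]{11}\.mp3$', '.mp3', src_file)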
            gfile_dict[src_file] = drive.CreateFile({'title':title, 'mimeType':'audio/mp3',
                                                     'parents': [{'kind': 'drive#fileLink',
                                                                  'id': tgt_folder_id}]})

            # Read the mp3 file and set it as the content of this instance
            gfile_dict[src_file].SetContentFile(src_path)
            
            # Upload the file
            try:
                gfile_dict[src_file].Upload()
                print('Uploaded %s (%s)' % (gfile_dict[src_file]['title'],
                                            gfile_dict[src_file]['mimeType']))
            except Exception as e:
                print('Upload failed for %s (%s): %s' % (gfile_dict[src_file]['title'],
                                                         gfile_dict[src_file]['mimeType'], e))
print('Upload completed.')


Download the mp3s to <a href='http://www.voicedream.com/support/user-manual/#file'>Voice Dream</a> before you run the trash cell (below).

In [None]:

import os

# Trash all mp3s from Google Drive's "Reading Material" folder
for src_file in gfile_dict.keys():

    # Trash mp3 file
    try:
        gfile_dict[src_file].Trash()
        print('Trashed %s (%s)' % (gfile_dict[src_file]['title'],
                                   gfile_dict[src_file]['mimeType']))
    except Exception as e:
        print('Trash failed for %s (%s): %s' % (gfile_dict[src_file]['title'],
                                                gfile_dict[src_file]['mimeType'], e))
    
print('Trashing completed.')


Lastly, we run the code that will delete all the uploaded mp3s from our downloads folder.

In [None]:

import os

# Delete all mp3s in the Downloads folder
for src_file in gfile_dict.keys():
    src_path = os.path.join(downloads_folder, src_file)

    # Delete the file
    try:
        os.remove(src_path)
        print('Deleted %s (%s)' % (src_file,
                                   gfile_dict[src_file]['mimeType']))
    except Exception as e:
        print('Failed to delete %s (%s): %s' % (src_file,
                                                gfile_dict[src_file]['mimeType'], e))
print('Deleting completed.')


If you open the downloads folder with the code below, you can see that the deletion is complete.

In [None]:

!start %windir%\explorer.exe "{os.path.abspath(downloads_folder)}"
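
The same check can be done without leaving the notebook. This is a small sketch that lists any mp3s still sitting directly in a folder; the helper name `remaining_mp3s` is hypothetical, and it does not look inside subfolders.

```python
import os

def remaining_mp3s(folder):
    """Return the sorted names of .mp3 files directly inside folder."""
    return sorted(name for name in os.listdir(folder)
                  if name.lower().endswith('.mp3'))
```

Calling `remaining_mp3s(downloads_folder)` after the delete cell should return an empty list if everything was removed.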

In [None]:

from IPython.display import HTML

rm_url = 'https://drive.google.com/drive/u/0/folders/{}'.format(tgt_folder_id)
message_str = 'open up your Google Drive target folder'
rm_link = '<a href="{}" target="_blank">{}</a>'.format(rm_url, message_str)
message_str = 'If you want to, you can run the code below to {} and'.format(rm_link)
message_str += ' check if everything got deleted.'
HTML(message_str)

In [None]:

!"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" {rm_url}