### A MINI Tutorial on Selenium, a framework that allows us to interact with HTML elements on any website
### Description: 
Not only does Selenium provide web scraping tools, it also gives us the ability to mimic a normal browsing interaction, like clicking on a link, typing in a text area, dragging and dropping, and much more. In this mini tutorial, we'll use Selenium to perform the following tasks:
* Access a membership website and log in by 'typing' our login credentials and 'clicking' the login button
* Navigate to 100+ webpages within the website via automated clicks and collect some data (100+ urls)
* Automate the download of 100+ videos located on different webpages, and organize them in specific folders
* Create a script to download future uploads<br>
Before we get started, you'll need the following:
* The Selenium library (install it with the following command: _pip install selenium_)
* A web driver, which Selenium will use to browse the internet. I'm using ChromeDriver, which you can download [here](https://chromedriver.chromium.org/downloads)
* The vimeo-downloader package, for downloading videos directly from Vimeo (install it with the following command: _pip install vimeo-downloader_)
<p>Okay, we're ready to go! Uncomment the following cell and run it to install _selenium_ and _vimeo-downloader_. Then we'll perform some necessary imports. <br>


In [2]:
# !pip install selenium
# !pip install vimeo-downloader

In [3]:
from selenium import webdriver                  # to interact with a web browser
from selenium.webdriver.common.keys import Keys # gives us access to keys (like space, tab, enter), which our code can use to interact with the website
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from vimeo_downloader import Vimeo
import json

### Helper functions
Before we go any further, I'll define some helper functions to facilitate the task. View the comments above the function headers for more details about each function.
#### _login_to_website()_
* Once we've accessed the website, we'll use this function to click "LOGIN", tell selenium to type in the username and password, and effectively log in to the site.

In [4]:
# save 'status files' to this directory
status_files_directory = "status files/"

# Login to the website
# returns True on a successful login. Otherwise prints the exception that occurred, quits the driver, and returns False.
def login_to_website(USERNAME, PASSWORD, driver):
    try:
        cursor = WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.LINK_TEXT, "LOGIN"))
        )
        cursor.click()

        # Log in
        cursor = WebDriverWait(driver, 3).until(
            EC.presence_of_element_located((By.ID, "user_login"))
        )
        cursor.send_keys(USERNAME)
        cursor = driver.find_element(By.ID, "user_pass")
        cursor.send_keys(PASSWORD)
        cursor.send_keys(Keys.RETURN)
        print("Login Successfull!")
        return True

    except Exception as e:
        print("OOPS!! Couldn't Log you in!", e, sep="\n\t")
        driver.quit()
        return False


#utility: to save JSON objects to specified file
def save_to_file(fileName, dictionary):
    with open(fileName, 'w') as f:
        f.write(json.dumps(dictionary))
    print(f"File saved to '{fileName}'")
    

#utility: to read JSON objects from specified file
def read_from_file(fileName):
    with open(fileName, 'r') as f:
        data = json.load(f)
    return data
        

# For each section (skill level), this function takes the html_articles
#  and extracts the urls to the lessons in that section
def get_lessons_webpages(hashset, html_articles):
    links = []
    for article in html_articles:
        webpage_link = article.find_element(By.TAG_NAME, 'a').get_attribute('href')     # the link to the lessons
        if webpage_link not in hashset:
            hashset.add(webpage_link)
            links.append(webpage_link)
    return links

# gets all the urls to the midi files/lessons on Sean's website
# (the caller is responsible for saving the result to disk)
# returns a dictionary: KEYS: skill level, VALUES: [urls to midi lessons/files for this skill level]
def get_webpages_for__midi_section(driver, levels): 
    all_lessons_pages = set()
    urls_for_section = {}
    print("obtaining MIDI urls...")

    # for all sections (skill level), get all the urls to the lessons
    for level in levels:
        lessons_html_articles = driver.find_elements(By.CLASS_NAME, f"filter-{level.lower().replace(' ', '-')}")
        lessons_urls = get_lessons_webpages(all_lessons_pages, lessons_html_articles)
        urls_for_section[level] = lessons_urls

    print("Done! MIDI lessons urls acquired!")
    return urls_for_section



### More helper functions to download files in the *MIDI FILES* section of the website
View comments for more details on each function.<br> 
Notice the following line of code:<br>
* WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.TAG_NAME, "article")))
<p>This is one way of locating HTML elements on a website. In this case, we locate the HTML element with the tag name "article". When we try to access an element, the page may not have finished loading in the browser yet.<br>- _"WebDriverWait(driver, 3).until(EC.presence_of_element_located..."_ tells Selenium to wait up to 3 seconds for the element to be found.<br>- You can wait for multiple elements using EC.presence_of_all_elements_located (see the *get_roadmap_vimeo_links()* function for how I used it)
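
To make the waiting behaviour more concrete, here is a minimal sketch of the idea behind an explicit wait: keep calling a condition until it returns something truthy or the timeout runs out. This is a simplified stand-in written with the standard library only, not Selenium's actual implementation; `wait_until` and `find_article` are hypothetical names used for illustration.

```python
import time


def wait_until(condition, timeout=3.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    (Hypothetical helper that only mimics the idea of an explicit wait.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# A fake "page" whose <article> element only shows up on the third lookup.
attempts = {"count": 0}


def find_article():
    attempts["count"] += 1
    return "<article>" if attempts["count"] >= 3 else None


element = wait_until(find_article, timeout=3.0, poll_interval=0.01)
print(element, "found after", attempts["count"], "attempts")
```

The real WebDriverWait works along the same lines, except that the condition receives the driver and a TimeoutException is raised when the time runs out.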

In [5]:
def get_video_source(iframe):
    return iframe.get_attribute("src")

# This function will navigate to each page in "urls_for_MIDI_section" and
#  collect all video and midi file urls on the page (the caller saves them to disk)
# urls_for_MIDI_section: All urls to lessons in the MIDI section
def get_video_midi_links(driver, urls_for_MIDI_section): 
    all_links = {}  #? {Level -> [([video_links], [LMV_MIDI_links]), (...), (...)], ...}
    failed_links = []
    
    for level in urls_for_MIDI_section:
        # create folder for each section (skill level)
        folder_name = level
        all_links[folder_name] = []
        print("\tRetrieving Videos and MIDI URLS for section: "+folder_name+"...")
        
        # get all video, LMV, and MIDI files on this skill level
        for page_url in urls_for_MIDI_section[level]:
            try: 
                driver.get(page_url)

                # all links (video, LMV, MIDI) are in the <article> tag
                html_articles = WebDriverWait(driver, 3).until(
                    EC.presence_of_element_located((By.TAG_NAME, "article"))
                )
                # get all video links (each video is in an iframe)
                iframes = html_articles.find_elements(By.TAG_NAME, "iframe")
                video_links = [get_video_source(iframe) for iframe in iframes]

                # get the MIDI and LMV file links if present (every <a> tag in the article)
                LMV_MIDI_tags = html_articles.find_elements(By.TAG_NAME, "a")
                LMV_MIDI_links = [x.get_attribute("href") for x in LMV_MIDI_tags]
                
                all_links[folder_name].append((video_links, LMV_MIDI_links))

            except Exception:  # a bare "except:" would also swallow KeyboardInterrupt
                print("FAILED to get: ", page_url)
                failed_links.append(page_url)

        print("\t"+folder_name+" Done")

    print("ALL DONE!")
    return all_links


### Helper functions for downloading the videos
The vimeo-downloader expects each video url to contain only the video ID, with no query parameters. We'll use these helper functions to clean up the collected urls by removing the query parameters. I'll only download 720p quality to save storage space. 
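
As a side note, the standard library's `urllib.parse` offers another way to drop query parameters, and it leaves a url that has no query string untouched. The sketch below is only an illustration; `strip_query` is a hypothetical helper name, not part of the tutorial's code.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return `url` without its query string or fragment (hypothetical helper)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


raw = "https://player.vimeo.com/video/513998524?portrait=0&title=1&color=fff&byline=1&autopause=0"
print(strip_query(raw))
# A url that is already clean comes back unchanged.
print(strip_query("https://player.vimeo.com/video/621834761"))
```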

In [6]:
# Given the video, lvm and midi urls, this function will clean up the urls 
# and return a dictionary containing the direct urls to the videos
# (e.g https://player.vimeo.com/video/621834761)
def get_vimeo_links(midi_section_urls):
    vimeo_links = {}

    for level in midi_section_urls:
        v_links = []

        for src_urls in midi_section_urls[level]:
            videos = src_urls[0]
            for v in videos:
                # remove the url query parameters, keeping the part that ends with the video ID
                # (split() also leaves a url without any '?' unchanged)
                video_link = v.split('?', 1)[0]
                v_links.append(video_link)
                # total_size += float(Vimeo(video_link).streams[-2].filesize.split()[0]) #max resolution he records in is 1080p, so get the 720p versions to save storage space
        
        vimeo_links[level] = v_links
    return vimeo_links


# given a video url, downloads the video (720p) and saves it to the specified destination folder
# if a download is unsuccessful, the url is appended to the caller's "failed_downloads" list
def download_video(video_url, destination_folder, failed_downloads):
    try:
        video = Vimeo(video_url)#.best_stream
        
        # we'll only download 720p quality to save storage space
        for s in video.streams:
            if s.quality == "720p":
                stream = s
                break        
        
        stream.download(destination_folder, stream.title)
        return True
    except Exception as e:
        print("OOPS!!! Download failed. url appended to the 'failed downloads' list")
        failed_downloads.append(video_url)
        print(e)
        return False


# get the video links in the specified RoadMap section (relies on the global "driver")
def get_roadmap_vimeo_links(level):
    cursor =  WebDriverWait(driver, 3).until(
        EC.presence_of_element_located((By.LINK_TEXT, "ROADMAPS"))
    )
    cursor.click()

    # go to the requested section (skill level)
    cursor =  WebDriverWait(driver, 3).until(
        EC.presence_of_element_located((By.LINK_TEXT, level))
    )
    cursor.click()

    # get all video links (each video is in an iframe)
    html_articles = WebDriverWait(driver, 3).until(
        EC.presence_of_all_elements_located((By.TAG_NAME, "iframe"))
    )
    video_links = [get_video_source(iframe) for iframe in html_articles]
    return video_links
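
Two failure messages in the download log further down trace back to _download_video()_. If a video has no stream labelled "720p", the loop never assigns _stream_, which is what "local variable 'stream' referenced before assignment" indicates. And "[Errno 22] Invalid argument" shows up for titles containing characters such as * or " that Windows does not accept in file names. Below is one possible way to handle both, sketched under assumptions: `FakeStream` is a hypothetical stand-in for the downloader's stream objects (only the `quality` and `title` attributes used above are modelled), and `pick_stream` and `safe_filename` are illustrative helper names. The fallback policy (closest resolution to the preferred one) is my own choice, not something the library prescribes.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a stream object; only the two attributes
# that download_video() reads are modelled here.
FakeStream = namedtuple("FakeStream", ["quality", "title"])


def quality_height(quality):
    """Turn a label such as '720p' into the integer 720 (0 if it does not parse)."""
    match = re.match(r"(\d+)p", quality)
    return int(match.group(1)) if match else 0


def pick_stream(streams, preferred="720p"):
    """Return the preferred stream, else the one closest in resolution, else None."""
    streams = list(streams)
    if not streams:
        return None
    for s in streams:
        if s.quality == preferred:
            return s
    target = quality_height(preferred)
    return min(streams, key=lambda s: abs(quality_height(s.quality) - target))


def safe_filename(title, replacement="_"):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[<>:"/\\|?*]', replacement, title).strip()


streams = [
    FakeStream("240p", "Gain the World *advanced tips and thoughts"),
    FakeStream("540p", "Gain the World *advanced tips and thoughts"),
    FakeStream("1080p", "Gain the World *advanced tips and thoughts"),
]
chosen = pick_stream(streams)
print(chosen.quality, "->", safe_filename(chosen.title))
```

With helpers like these, the download step could pass _safe_filename(stream.title)_ as the file name and report a clear message when _pick_stream()_ returns None.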


# Running the script
We'll use the above helper functions to obtain what we need from the website. Here are the sections we need from the membership website:
* 1- MIDI FILES (videos and MIDI/LMV files) for the following skill levels: _Competent, Intermediate, Advanced_ 
* 2- ROADMAPS for Intermediate and Advanced levels (Vimeo links only)
* 3- LIVE TRAINING. The live training section consists of 3 subsections. We'll get the Vimeo links for the videos in each subsection and save them to a file for future access.
<p>After installing the web driver, we give Selenium the path to it and create a driver, which we'll use to navigate the internet. We then access the website and log in using the username and password stored in the specified text file. 
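
Since _read_from_file()_ parses its input with _json.load_, the login file has to contain a JSON object with the keys "USERNAME" and "PASSWORD" (the keys the script reads). The sketch below writes and reads such a file in a temporary folder; the credential values are placeholders for illustration only.

```python
import json
import os
import tempfile

# Placeholder credentials, for illustration only.
example_login = {"USERNAME": "your-username", "PASSWORD": "your-password"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "loginInfo.txt")

    # Write the file in the JSON layout that json.load() can parse.
    with open(path, "w") as f:
        json.dump(example_login, f, indent=2)

    # Read it back the same way read_from_file() does.
    with open(path, "r") as f:
        login_info = json.load(f)

print(sorted(login_info))
```

Keeping the credentials in a separate file like this also keeps them out of the notebook itself.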


In [7]:
webdriverPATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string, so the backslashes are kept literally
s = Service(webdriverPATH)
driver = webdriver.Chrome(service=s)
# driver = webdriver.Chrome(webdriverPATH)  # deprecated

# read username and password from file
login_info  = read_from_file("loginInfo.txt")

website = "https://seanwilsonpiano.com"
driver.get(website)

#1 Log in to the website
login_to_website(login_info["USERNAME"], login_info["PASSWORD"], driver)

Login Successfull!


True

## First we'll deal with the "MIDI Files" section
We'll go to the _"MIDI FILES"_ section and get the urls to all the lessons in this section. As before, we'll locate the HTML element using *EC.presence_of_element_located()* and navigate to it using the _"click()"_ method. You may search for HTML elements by ID, tag name, link text and more ([view the selenium documentation for more details](https://selenium-python.readthedocs.io/)). Using our utility function (described above), we'll get all the urls to the lessons (_Advanced, Competent, Intermediate and Song Breakdowns_).

In [8]:
# go to the MIDI files page
cursor =  WebDriverWait(driver, 3).until(
    EC.presence_of_element_located((By.LINK_TEXT, "MIDI FILES"))
)
cursor.click()
    
# get the different skill levels
midi_tutorials = driver.find_element(By.ID, "tabs")
levels = midi_tutorials.find_elements(By.TAG_NAME, "li")

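# remove the "VIEW ALL" and "BEGINNER" sections
# (after the first deletion the remaining tabs shift left by one,
#  so index 1 now refers to what was originally the third tab)
del levels[0]
del levels[1]
levels = [x.text for x in levels]
# Get the urls for each lesson
urls_for_MIDI_lessons = get_webpages_for__midi_section(driver, levels)

# save the url links to a file on disk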
save_to_file(status_files_directory+"All midi pages.txt", urls_for_MIDI_lessons)


obtaining MIDI urls...
Done! MIDI lessons urls acquired!
File saved to 'status files/All midi pages.txt'


We saved the file to disk as a JSON object, to be used for future comparison the next time we want to download new content from the website. Here's what the file content looks like for the "Song Breakdowns" section:

In [193]:
midi_section_urls = read_from_file("status files/All midi pages.txt")
count = sum([len(x) for x in midi_section_urls.values()])
print(f"Number of websites to visit: {count}")
midi_section_urls["SONG BREAKDOWNS"]

Number of websites to visit: 105


['https://seanwilsonpiano.com/kevin-bond-voicings/',
 'https://seanwilsonpiano.com/how-to-play-lord-you-are-awesome-west-coast-version/']
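
Because the urls are stored per skill level, a later run can be compared against this saved file to find lessons that were added since. Here is a minimal sketch of that comparison; `find_new_urls` is a hypothetical helper and the example.com urls are made up for illustration.

```python
def find_new_urls(previous, current):
    """Return {level: [urls present in `current` but not in `previous`]}.

    Levels without any new url are left out of the result.
    """
    new = {}
    for level, urls in current.items():
        seen = set(previous.get(level, []))
        added = [u for u in urls if u not in seen]
        if added:
            new[level] = added
    return new


# Made-up data: what was saved last time vs. what the site lists now.
previous = {
    "ADVANCED": ["https://example.com/lesson-a/"],
    "SONG BREAKDOWNS": ["https://example.com/lesson-b/"],
}
current = {
    "ADVANCED": ["https://example.com/lesson-a/"],
    "SONG BREAKDOWNS": ["https://example.com/lesson-b/", "https://example.com/lesson-c/"],
}

new_urls = find_new_urls(previous, current)
print(new_urls)
```

Only the urls returned here would need to be visited and downloaded on the next run.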

As you can see, there are 105 pages to visit and collect video and MIDI urls from. Luckily we won't have to do this manually. We'll use Selenium to access each of these pages and, for each one, collect the urls of any videos, LMV and MIDI files present. To accomplish this, we'll use the function _"get_video_midi_links()"_ (described in the helper section above).

In [194]:
all_links_in_MIDI = get_video_midi_links(driver, midi_section_urls)
save_to_file(status_files_directory+"All midi video links.txt", all_links_in_MIDI)


	Retrieving Videos and MIDI URLS for section: ADVANCED...
	ADVANCED Done
	Retrieving Videos and MIDI URLS for section: COMPETENT...
	COMPETENT Done
	Retrieving Videos and MIDI URLS for section: INTERMEDIATE...
	INTERMEDIATE Done
	Retrieving Videos and MIDI URLS for section: SONG BREAKDOWNS...
FAILED to get:  https://seanwilsonpiano.com/kevin-bond-voicings/
	SONG BREAKDOWNS Done
ALL DONE!
File saved to 'status files/All midi video links.txt'


Great! We now have the video and MIDI urls that we want to download, and we've saved them to a file for ease of access. Before we proceed, let's see what the collected data for one video lesson looks like.

In [195]:
video_lvm_midi_urls = read_from_file("status files/All midi video links.txt")
video_lvm_midi_urls["ADVANCED"][3]

[['https://player.vimeo.com/video/513998524?portrait=0&title=1&color=fff&byline=1&autopause=0',
  'https://player.vimeo.com/video/514016566?portrait=0&title=1&color=fff&byline=1&autopause=0'],
 ['https://gumroad.com/l/NFOgj/seanwilsonpiano123member07',
  'https://app.box.com/s/9n7axyq5txnt58qacskzwjjecytmdq86',
  'https://app.box.com/s/rq8mhpoiwyfrs42p0hdwgknsmey227gb',
  'https://gospelmusicians.com/midiculous-4.html']]

Let's further clean up this data using our _"get_vimeo_links()"_ function. This will separate the video urls from the midi/lmv urls. Also, it will get rid of the query parameters that are linked to the vimeo lessons (_this will facilitate the download process_). Here is a sample of the cleaned-up video urls. 

In [196]:
vimeo_links_midi = get_vimeo_links(video_lvm_midi_urls)
count = sum([len(x) for x in vimeo_links_midi.values()])
print(f"Number of videos to download: {count}")
vimeo_links_midi["ADVANCED"][:5]

Number of videos to download: 176


['https://player.vimeo.com/video/621834761',
 'https://player.vimeo.com/video/589502197',
 'https://player.vimeo.com/video/589520216',
 'https://player.vimeo.com/video/534543521',
 'https://player.vimeo.com/video/534531929']
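
Not every iframe on a lesson page is a Vimeo player: the download log below also contains YouTube embeds and a Facebook comments plugin, which the Vimeo downloader rejects. One option is to group the collected urls by host first and only hand the Vimeo ones to the downloader. This is just a sketch; `group_by_host` is a hypothetical helper, and the sample urls are taken from the outputs in this notebook.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Group urls by their host name, e.g. 'player.vimeo.com'."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).netloc].append(url)
    return dict(groups)


collected = [
    "https://player.vimeo.com/video/621834761",
    "https://www.youtube.com/embed/jFpx-caTqhs",
    "https://www.facebook.com/v3.3/plugins/comments.php",
    "https://player.vimeo.com/video/589502197",
]

groups = group_by_host(collected)
vimeo_only = groups.get("player.vimeo.com", [])
print(vimeo_only)
```

The other groups could be saved to a separate file and handled with a different tool later.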

Now that we have the Vimeo links, we can download the videos from Vimeo. We'll use the vimeo-downloader module that we installed at the beginning of the tutorial.

In [197]:
failed_downloads = []
downloaded_videos = {}
for level in vimeo_links_midi:
    result = []
    for video_url in vimeo_links_midi[level]:
        downloaded = download_video(video_url, level, failed_downloads)
        if downloaded:
            result.append(video_url)
    downloaded_videos[level] = result
save_to_file(status_files_directory+"downloaded midi videos.txt", downloaded_videos)

There is a fountain, George Granville vid.mp4: 26244KB [00:02, 9583.01KB/s] 
Jason White (Site I).mp4: 87427KB [00:14, 5965.00KB/s]                           
Jason White (Site II).mp4: 92757KB [00:13, 6747.27KB/s]                           
Mike Won the Victory Site I.mp4: 111381KB [00:17, 6537.08KB/s]                            
Mike Won the Victory Site II.mp4: 59661KB [00:06, 8598.03KB/s]                           
Overjoyed Site Beginner.mp4: 84697KB [00:09, 8668.50KB/s]                            
Overjoyed Site Intermediate Theory Discussions.mp4: 131128KB [00:12, 10361.97KB/s]                            
Cory Henry Doobie Shed Session.mp4: 169296KB [00:12, 13883.54KB/s]                            
Jo Pryor Band Killin Site.mp4: 61258KB [00:04, 14310.72KB/s]                           
Secret Place Tutorial.mp4: 472633KB [00:34, 13529.09KB/s]                            


OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links
OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links


Chord Idea Travis.mp4: 13046KB [00:01, 11773.64KB/s]                           
2nd line (travis).mp4: 12882KB [00:00, 13813.98KB/s]                           
3rd Line (travis).mp4: 33622KB [00:02, 13622.26KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable 'stream' referenced before assignment
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/jFpx-caTqhs is not supported. Make sure you don't include query parameters in the url


The Anthem.mp4: 11348KB [00:00, 13735.16KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/i6OA1tr-Bno is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable 'stream' referenced before assignment
OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable 'stream' referenced before assignment
OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable 'stream' referenced before assignment
OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable 'stream' referenced before assignment
OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable 'stream' referenced before assignment
OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable 'stream' referenced before assignment
OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable

Oh Give Thanks Suggestion video.mp4: 80541KB [00:05, 14323.45KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/32K1EcvEASk is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/pMEgX1XXs0w is not supported. Make sure you don't include query parameters in the url


There is no way vimeo.mp4: 118778KB [00:10, 11782.67KB/s]


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/4w-86S0OyPI is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/DJqK077XkZg is not supported. Make sure you don't include query parameters in the url


Eddie Brown Talk Music Part 2.mp4: 37690KB [00:02, 13707.53KB/s]                           
Eddie Brown Talk Music Part 2.mp4: 37690KB [00:02, 13201.30KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/8NIvPxBTOs0 is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/pUBWnvspbZk is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/4jFRlZ5D-Ak is not supported. Make sure you don't include query parameters in the url


Ear Training 101 - Intervals.mp4: 35505KB [00:02, 13508.06KB/s]                           
Ear Training 102 - Intervals.mp4: 42948KB [00:03, 13335.43KB/s]                           
Ear Training 103 - Intervals.mp4: 34849KB [00:02, 13595.05KB/s]                           
Ear Training 104.mp4: 88734KB [00:06, 13681.69KB/s]                           
Ear Training 104.mp4: 88734KB [00:06, 13075.76KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/KgPISrDwOzA is not supported. Make sure you don't include query parameters in the url


Cory Henry Practice Section 1.mp4: 11277KB [00:01, 9173.29KB/s]                           
Cory Henry Practice Section 1.mp4: 11277KB [00:00, 13273.39KB/s]                          
Cory Henry Practice Section 2.mp4: 16825KB [00:01, 14294.72KB/s]                           
Cory Henry Practice Section 3.mp4: 22150KB [00:02, 8400.46KB/s]                            
Cory Henry Practice Section 4.mp4: 22647KB [00:01, 13139.83KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/fbQiKMgcU4o is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/c0zs5AlMzN4 is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/NXZGU6onuPg is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/1PDQXBTtnrQ is not supported. Make sure you don't include query parameters in the url


Cory Henry Yesterday.mp4: 221333KB [00:15, 14095.21KB/s]                            


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/noS8r0-5FNU is not supported. Make sure you don't include query parameters in the url


8 note chord progression.mp4: 28052KB [00:01, 14240.66KB/s]                           
Somewhere over the rainbow Quennel.mp4: 2333KB [00:00, 9294.30KB/s]                          


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.facebook.com/v3.3/plugins/comments.php is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/tzQPP_1_AE8 is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/D_cF3F32ey8 is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable 'stream' referenced before assignment
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/EddlK-Nar9Y is not supported. Make sure you don't include query parameters in the url


We are not ashamed (2).mp4: 50272KB [00:03, 14185.35KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/DjoY1qhpvTk is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/JdLJz7vIkeY is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/ZQA0Rz03sXI is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/qpwq-9FzRIw is not supported. Make sure you don't include query parameters in the url


Star Spanglde Banner at 75%.mp4: 9027KB [00:00, 14327.69KB/s]                          
Star Spanglde Banner at 50%.mp4: 13289KB [00:00, 13942.51KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/GWaxkBoJPhE is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/nzMBmc7fwE0 is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
403: If the video is embed only, also provide the url on which it is embedded, Vimeo(url=<vimeo_url>,embedded_on=<url>)


Cory national anthem Part 1(site).mp4: 67813KB [00:04, 13951.65KB/s]                           
Cory national anthem Part II (site).mp4: 55313KB [00:05, 11003.02KB/s]                           
Grateful Site (beginners).mp4: 94921KB [00:08, 11278.38KB/s]                           
Grateful Site (competent intermediate).mp4: 112068KB [00:09, 12039.17KB/s]                            
Stand Tutorial.mp4: 125593KB [00:09, 12892.48KB/s]                            
Cory Stand Tutorial.mp4: 42025KB [00:02, 14385.90KB/s]                           
Oh the Blood Site (beginners).mp4: 58424KB [00:04, 14128.51KB/s]                           
Oh the Blood Site (other levels).mp4: 132391KB [00:10, 12349.99KB/s]                            
Manifest Tutorial, Part 1.mp4: 111855KB [00:08, 13620.13KB/s]                            
Manifest Tutorial, Part 2.mp4: 99113KB [00:07, 13117.18KB/s]                           
Manifest Tutorial, Vamp.mp4: 109146KB [00:08, 13298.44KB/s]                            

OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links
OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links


Kevin Bond Promise Tutorial.mp4: 127309KB [00:09, 13015.09KB/s]                            
Lord You are Awesome MIDI File.mp4: 26590KB [00:02, 13067.60KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
'files'
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/U9uQ3CB2SWY is not supported. Make sure you don't include query parameters in the url


I have decided Full Lesson.mp4: 192794KB [00:14, 13074.14KB/s]                            


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/BbpjRmXaAGA is not supported. Make sure you don't include query parameters in the url


Alright Vamp by Doobie Powell.mp4: 72449KB [00:05, 13573.08KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
[Errno 22] Invalid argument: 'COMPETENT\\Gain the World *advanced tips and thoughts.mp4'


Best in Me.mp4: 172459KB [00:13, 13239.77KB/s]                            
My Testimony.mp4: 113744KB [00:08, 13650.18KB/s]
My Testimony Extras.mp4: 9604KB [00:00, 12641.81KB/s]                          
My Testimony Extras.mp4: 28186KB [00:02, 11991.49KB/s]                           
You are God Alone tutorial.mp4: 381182KB [00:28, 13579.92KB/s]                            


OOPS!!! Download failed. url appended to the 'failed downloads' list
'files'
OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable 'stream' referenced before assignment


You are God Alone tutorial.mp4: 251295KB [00:20, 12517.62KB/s]                            
You are God Alone MIDI Vid.mp4: 12760KB [00:00, 14018.02KB/s]                           
Great it thy faithfulness tutorial.mp4: 84985KB [00:08, 10018.51KB/s]                           
Jayden Baker, Jesus, Jesus (site).mp4: 161328KB [00:12, 12696.15KB/s]                            
The 2-5- half step rule (for site).mp4: 51451KB [00:04, 12629.91KB/s]                           
Milo's moped.mp4: 42799KB [00:03, 13358.44KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
local variable 'stream' referenced before assignment
OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links
OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links


Phil All I do Vimeo.mp4: 5905KB [00:00, 8658.48KB/s]                          


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://gumroad.com/l/HcafX is not supported. Make sure you don't include query parameters in the url


Play Father Jesus Holy Spirit like Phil Feaster.mp4: 58338KB [00:04, 13113.44KB/s]                           
You are my hiding place site.mp4: 90834KB [00:06, 13629.73KB/s]                           
I am Still Here by Mike Bereal.mp4: 155673KB [00:11, 13658.33KB/s]                           
Jonathon Nelson I love you tutorial.mp4: 70456KB [00:05, 13704.99KB/s]                           
Travis Sayles Site.mp4: 148442KB [00:11, 13330.31KB/s]                            
A Praying Spirit.mp4: 11767KB [00:01, 11633.67KB/s]                           
Rodney East Allelujah Site.mp4: 108979KB [00:07, 13780.91KB/s]                            
He Shall Purify Tutorial, Part 1 (of 3).mp4: 56487KB [00:04, 13537.78KB/s]                           
Oh Holy Night Tutorial.mp4: 372380KB [00:36, 10300.18KB/s]                            
When Sunday Comes, Tutorial.mp4: 211139KB [00:15, 13237.09KB/s]                            


OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links


Wendell Lowe Site.mp4: 164346KB [00:12, 13068.93KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links
OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links
OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links
OOPS!!! Download failed. url appended to the 'failed downloads' list
403: If the video is embed only, also provide the url on which it is embedded, Vimeo(url=<vimeo_url>,embedded_on=<url>)
OOPS!!! Download failed. url appended to the 'failed downloads' list
404: Unable to retrieve download links


Javad Day Part 2.mp4: 57697KB [00:07, 7217.93KB/s]                            
Holy Ghost by Kim Burrell.mp4: 122308KB [00:09, 12950.59KB/s]                            
Love Medley Site.mp4: 210050KB [00:15, 13539.50KB/s]                            


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/TduMmjj0Y60 is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/No8NOSmgv-4 is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/mFILJ9vwSaA is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
[Errno 22] Invalid argument: 'INTERMEDIATE\\Play the Transition Chords of "Sinking" Tye Tribbet.mp4'
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.facebook.com/v3.3/plugins/comments.php is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.

Worthy is the Lamb Tutorial.mp4: 185606KB [00:13, 13476.89KB/s]                            


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/AR6CJDHzILQ is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/xrh7_1ichng is not supported. Make sure you don't include query parameters in the url


Ear Training MINI Lesson.mp4: 23335KB [00:01, 11780.73KB/s]                           


OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.facebook.com/v3.3/plugins/comments.php is not supported. Make sure you don't include query parameters in the url
OOPS!!! Download failed. url appended to the 'failed downloads' list
https://www.youtube.com/embed/PBGIutMfmHc is not supported. Make sure you don't include query parameters in the url


Glory be to the Father.mp4: 22996KB [00:01, 12929.55KB/s]                           
And to the Son.mp4: 35617KB [00:02, 13146.47KB/s]                           
And to the Holy Ghost.mp4: 45446KB [00:03, 13881.50KB/s]                           
As it was in the beginning.mp4: 24319KB [00:01, 14022.32KB/s]                           
It is now and ever shall be.mp4: 22146KB [00:01, 13579.28KB/s]                           
World Without end.mp4: 13373KB [00:00, 13840.67KB/s]                           
Amen Amen.mp4: 24784KB [00:01, 13197.35KB/s]                           
Lord you are awesome Tutorial.mp4: 47204KB [00:03, 12801.85KB/s]                           
Lord You are Awesome Tutorial Part 2.mp4: 26755KB [00:02, 13027.31KB/s]                           
Lord you are awesome for beginners.mp4: 42911KB [00:04, 10030.62KB/s]                           

File saved to 'status files/downloaded midi videos.txt'





In [213]:
# handle failed downloads
print(f"Number of videos that failed to download:  {len(failed_downloads)}")
failed_downloads[:10]

Number of videos that failed to download:  69


['https://player.vimeo.com/video/3311353',
 'https://player.vimeo.com/video/32557617',
 'https://player.vimeo.com/video/20573696',
 'https://www.youtube.com/embed/jFpx-caTqhs',
 'https://www.youtube.com/embed/i6OA1tr-Bno',
 'https://player.vimeo.com/video/295064483',
 'https://player.vimeo.com/video/295064452',
 'https://player.vimeo.com/video/295064409',
 'https://player.vimeo.com/video/294966261',
 'https://player.vimeo.com/video/294966290']
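
Before filtering the list, it can help to see which hosts the failures come from. The following is a minimal sketch using only the standard library; `group_by_host` is a hypothetical helper that is not part of this notebook, and the sample URLs are copied from the output above.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_host(urls):
    """Group URLs by their network location (e.g. 'player.vimeo.com')."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlparse(url).netloc].append(url)
    return dict(groups)


# sample taken from the failed-downloads output above
sample = [
    "https://player.vimeo.com/video/3311353",
    "https://www.youtube.com/embed/jFpx-caTqhs",
    "https://player.vimeo.com/video/295064483",
]
for host, links in group_by_host(sample).items():
    print(f"{host}: {len(links)}")
```

Grouping by host rather than searching for a substring avoids accidentally matching a word like "youtube" that appears elsewhere in a URL path.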

In [217]:
# remove youtube links (those videos can be viewed on the youtube channel) and facebook links
failed_downloads_clean = [url for url in failed_downloads if 'youtube' not in url and 'facebook' not in url]
print(f"Number of vimeo links that failed: {len(failed_downloads_clean)}")
save_to_file(status_files_directory+"failed downloads.txt", {"failed vimeo downloads": failed_downloads_clean})

Number of vimeo links that failed: 33
File saved to 'status files/failed downloads.txt'
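
One failure in the download log is `[Errno 22] Invalid argument`, raised for a title that contains double quotes, which Windows does not allow in file names. A possible remedy is to clean the title before using it as a file name. The sketch below is an assumption about how that could be done; `sanitize_filename` is a hypothetical helper and is not part of this notebook or of vimeo-downloader.

```python
import re

# characters that Windows does not allow in file names
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def sanitize_filename(title, replacement=""):
    """Return `title` with characters that are invalid in Windows file names replaced."""
    cleaned = _INVALID_CHARS.sub(replacement, title)
    # Windows also rejects names that end in a space or a dot
    return cleaned.rstrip(" .")


print(sanitize_filename('Play the Transition Chords of "Sinking" Tye Tribbet'))
```

The cleaned title could then be used wherever the video's file name is chosen, so that a retry of the failed link does not hit the same error.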


### ROADMAPS SECTION
For this section, I'll just save the vimeo links. I won't download any videos because I don't need them as much as I needed the MIDI lessons.

In [None]:
roadMaps = {}

I_roadmap_video_urls = get_roadmap_vimeo_links("Intermediate")
roadMaps["Intermediate"] = I_roadmap_video_urls

# uncomment this to download the advanced roadmap videos when they are uploaded
# A_roadmap_video_urls = get_roadmap_vimeo_links("Advanced")
# roadMaps["Advanced"] = A_roadmap_video_urls

save_to_file("status files/ROADMAPs.txt", roadMaps)
roadMaps

### TRAINING SECTION
Just like the _ROADMAPS_ section, we'll only collect the vimeo links for this section. The _LIVE TRAININGS_ section is divided into 3 subsections. For each subsection, we'll get the url of each training page, obtain its vimeo link, and save the results to a file on disk.

In [1]:
def get_training_urls():
    live_training = {}
    cursor =  WebDriverWait(driver, 3).until(
        EC.presence_of_element_located((By.LINK_TEXT, "LIVE TRAININGS"))
    )
    cursor.click()

    # organize by sections
    sections = WebDriverWait(driver, 3).until(
        EC.presence_of_element_located((By.ID, "tabs"))
    )
    sections = sections.find_elements(By.TAG_NAME, "li")
    del sections[0]   # remove the "VIEW ALL" section
    
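    for section in sections:
        # articles carry a class of the form "filter-<section-name>"; the variable is
        # named filter_class so it does not shadow the built-in filter()
        filter_class = f"filter-{section.text.lower().replace(' ', '-')}"
        articles = driver.find_elements(By.CLASS_NAME, filter_class)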
        
        # get and save urls to live trainings for each section (guests, monthly lives, and sean)
        section_urls = [article.find_element(By.TAG_NAME, 'a').get_attribute("href") for article in articles]
        live_training[section.text] = section_urls
            
    return live_training


# takes in the url to the live training and returns the vimeo url 
def get_vimeo_link_from_url(page):
    driver.get(page)
    html_article =  WebDriverWait(driver, 3).until(
        EC.presence_of_element_located((By.CLASS_NAME, "lesson_content"))
    )
    iframe = html_article.find_element(By.TAG_NAME, "iframe")
    vimeo_link = get_video_source(iframe)
    
    return vimeo_link 
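
To make the class-name lookup in `get_training_urls` easier to follow, here is the same string transformation on its own, without a browser. `section_filter_class` is a hypothetical helper, and the labels below are assumed examples based on the subsection names mentioned in the code comment (guests, monthly lives, and sean); the site's actual tab labels may differ.

```python
def section_filter_class(section_label):
    """Build the CSS class used to tag articles that belong to a section tab."""
    return "filter-" + section_label.strip().lower().replace(" ", "-")


# assumed tab labels, for illustration only
for label in ["GUESTS", "MONTHLY LIVES", "SEAN"]:
    print(label, "->", section_filter_class(label))
```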


In [28]:
# collect and save all urls to live trainings
live_training = get_training_urls()
save_to_file(status_files_directory+"live training urls.txt", live_training)

# get the video links for each live training session and save to file
live_training_vimeo_links = {}
for section in live_training:
    links_for_section  = []
    for url in live_training[section]:
        video_src = get_vimeo_link_from_url(url)
        links_for_section.append(video_src)
    live_training_vimeo_links[section] = links_for_section

save_to_file(status_files_directory+"live training vimeo links.txt", live_training_vimeo_links)

File saved to 'status files/live training urls.txt'
File saved to 'status files/live training vimeo links.txt'


In [None]:
# close the webdriver now that we are done
driver.quit()

We've come to the end of the tutorial. We've learned how to load a website, locate HTML elements, and make our code interact with the page using methods like _"click()"_ and _"send_keys()"_. We also learned how to download videos using vimeo-downloader. I hope you can use this to automate your browsing.<br>BONUS: Below, I'll create a Python script that checks the website for new content and downloads only what is new.


<p>The code below is a script which I'll use in the future to automatically download new content from the site. When this script is run, it compares the content on the website against the content in our _"status files"_ saved on disk and downloads only the new content. Now that we've downloaded all these lessons, I can cancel my membership and resubscribe in a few months to download new content.
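
The comparison step can be illustrated without a browser. The sketch below is a minimal example under stated assumptions: `new_entries` is a hypothetical helper, and the `example.com` urls are placeholders rather than real pages from the site. It keeps the original order of the urls and treats a section that is missing from the saved file as entirely new.

```python
def new_entries(current, saved):
    """Return, per key, the items in `current` that are absent from `saved`."""
    result = {}
    for key, items in current.items():
        seen = set(saved.get(key, []))  # a missing key means nothing was saved yet
        result[key] = [item for item in items if item not in seen]
    return result


# placeholder data standing in for the status file and a fresh scrape
saved = {"INTERMEDIATE": ["https://example.com/lesson-1"]}
current = {
    "INTERMEDIATE": ["https://example.com/lesson-1", "https://example.com/lesson-2"],
    "ADVANCED": ["https://example.com/lesson-3"],
}
print(new_entries(current, saved))
```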

In [None]:
# # Some helper functions
# # downloads new videos in the MIDI section, provided new videos were uploaded
# def download_new_MIDI_videos(driver, new_midi_lessons_urls, failed_downloads):
#     new_midi_video_lmv = get_video_midi_links(driver, new_midi_lessons_urls)
#     new_vimeo_links_midi = get_vimeo_links(new_midi_video_lmv)
#     for level in new_vimeo_links_midi:
#         for video_url in new_vimeo_links_midi[level]:
#             download_video(video_url, level, failed_downloads)

# # compares 2 dictionaries (files) and returns only the new elements
# def compare_dict(new_dict, old_dict):
#     result = {}
#     for key in new_dict:
#         # a key missing from the old file means every element under it is new
#         result[key] = list(set(new_dict[key]) - set(old_dict.get(key, [])))
#     return result

# # merges the new urls into the previous status dictionary (in place) and saves it to disk
# def update_file(fileName, prevDict, newContentDict):
#     for k in newContentDict:
#         prevDict.setdefault(k, []).extend(newContentDict[k])
#     save_to_file(status_files_directory+fileName, prevDict)


# webdriverPATH = r"C:\Program Files (x86)\chromedriver.exe"   # raw string so the backslashes are not treated as escapes
# s = Service(webdriverPATH)
# driver = webdriver.Chrome(service=s)

# # functions save status files to this directory
# status_files_directory = "status files/"

# # read username and password from file
# login_info  = read_from_file("loginInfo.txt")

# website = "https://seanwilsonpiano.com"
# driver.get(website)

# #1 Log in to the website
# login_to_website(login_info["USERNAME"], login_info["PASSWORD"], driver)

# #2 get updates from the MIDI Lessons section, and download new files if necessary
# cursor =  WebDriverWait(driver, 3).until(
#     EC.presence_of_element_located((By.LINK_TEXT, "MIDI FILES"))
# )
# cursor.click()
    
# # skill levels of interest
# levels = ["INTERMEDIATE", "ADVANCED"]

# # Get the urls for the "MIDI FILES" section and compare to what we have on disk
# urls_for_MIDI_lessons = get_webpages_for__midi_section(driver, levels)
# previous_urls = read_from_file(status_files_directory+"All midi pages.txt")
# new_midi_lessons_urls = compare_dict(urls_for_MIDI_lessons, previous_urls)
# num_new_MIDI_lessons = sum([len(x) for x in new_midi_lessons_urls.values()])
# if num_new_MIDI_lessons == 0:
#     print("No New MIDI Files To Download")
# else:
#     print(f"{num_new_MIDI_lessons} New MIDI Lessons To Download!!")
#     update_file("All midi pages.txt", previous_urls, new_midi_lessons_urls)   # update status file with new content

# # download updates if any
# if num_new_MIDI_lessons > 0:
#     failed_downloads = []
#     download_new_MIDI_videos(driver, new_midi_lessons_urls, failed_downloads)


# #3 Get new links in the ROADMAP section
# #! not available yet. Uncomment this when the Advanced ROADMAP course is released
# # new_roadmap_video_links = get_roadmap_vimeo_links("Advanced")
# # save_to_file(status_files_directory+"ROADMAP Advanced.txt", {"Advanced": new_roadmap_video_links})
