Merge pull request coursera-dl#34 from rbrito/fixes/code-quality
Fixes/code quality
jplehmann committed Dec 22, 2012
2 parents 3dd088c + 30e7900 commit da47faa
Showing 2 changed files with 96 additions and 37 deletions.
51 changes: 37 additions & 14 deletions README.md
@@ -1,16 +1,26 @@
Coursera Downloader
===================

[Coursera] is creating some fantastic, free educational classes (e.g.,
algorithms, machine learning, natural language processing, SaaS). This
script allows one to batch download lecture resources (e.g., videos, ppt,
etc) for a Coursera class. Given a class name and related cookie file, it
scrapes the course listing page to get the week and class names, and then
downloads the related materials into appropriately named files and
directories.

Why is this helpful? Before I was using *wget*, but I had the following problems:

1. Video names have a number in them, but this does not correspond to the
actual order. Manually renaming them is a pain.
2. Using names from the syllabus page provides more informative names.
3. Using a wget in a for loop picks up extra videos which are not
posted/linked, and these are sometimes duplicates.

*DownloadThemAll* can also work, but this provides better names.

Inspired in part by [youtube-dl] by which I've downloaded many other good
videos such as those from Khan Academy.


Features
@@ -28,13 +38,16 @@ Features
Directions
----------

Requires Python 2.x (where x >= 5) and a free Coursera account enrolled in
the class of interest.

1\. Install any missing dependencies.

* [Beautiful Soup 3] or [Beautiful Soup 4]
Ubuntu/Debian for BS3: `sudo apt-get install python-beautifulsoup`
Ubuntu/Debian for BS4: `sudo apt-get install python-bs4`
Mac OSX: `bs4` may be required instead.
When using `bs4`, be sure to modify the import at the top of the script.
Other: `easy_install BeautifulSoup`
* [Argparse] (Not necessary if Python version >= 2.7)
Ubuntu/Debian: `sudo apt-get install python-argparse`
@@ -63,13 +76,22 @@ username, password (or a `~/.netrc` file) and the class name.
Specify download path: coursera-dl progfun-2012-001 -n --path=C:\Coursera\Classes\
Download multiple classes: coursera-dl progfun-2012-001 -n --add-class=hetero-2012-001 --add-class=thinkagain-2012-001

On \*nix platforms\*, the use of a `~/.netrc` file is a good alternative to
specifying both your username and password every time on the command
line. To use it, simply add a line like the one below to a file named
`.netrc` in your home directory (or the [equivalent], if you are using
Windows) with contents like:

machine coursera-dl login <user> password <pass>

Create the file if it doesn't exist yet. From then on, you can switch from
using `-u` and `-p` to simply call `coursera-dl` with the option `-n`
instead. This is especially convenient, as typing usernames and passwords
directly on the command line can get tiresome (even more if you happened to
choose a "strong" password).

\* if this works on Windows, please add additional instructions for it if
any are needed.
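As a sketch of how the `~/.netrc` entry above could be read programmatically, using Python's standard `netrc` module (the function name and the explicit-path parameter are illustrative, not part of the script):

```python
import netrc

def read_netrc_credentials(path=None):
    # Look up the 'machine coursera-dl' entry; with path=None the
    # module reads the default ~/.netrc.
    auth = netrc.netrc(path).authenticators("coursera-dl")
    if auth is None:
        raise ValueError("no 'machine coursera-dl' entry found")
    login, _account, password = auth
    return login, password
```

This is the same lookup `coursera-dl -n` performs conceptually: machine name first, then the login/password pair from that entry.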

Troubleshooting
---------------
@@ -90,16 +112,17 @@ Troubleshooting

Contact
-------

Post bugs and issues on [github]. Send other comments to John Lehmann:
first last at geemail dotcom or [@jplehmann]

[@jplehmann]: http://www.twitter.com/jplehmann
[1]: https://chrome.google.com/webstore/detail/lopabhfecdfhgogdbojmaicoicjekelh
[2]: https://addons.mozilla.org/en-US/firefox/addon/export-cookies
[youtube-dl]: http://rg3.github.com/youtube-dl
[Coursera]: http://www.coursera.org
[Beautiful Soup 3]: http://www.crummy.com/software/BeautifulSoup/bs3
[Beautiful Soup 4]: http://www.crummy.com/software/BeautifulSoup
[Argparse]: http://pypi.python.org/pypi/argparse
[wget]: http://sourceforge.net/projects/gnuwin32/files/wget/1.11.4-1/wget-1.11.4-1-setup.exe
[easy_install]: http://pypi.python.org/pypi/setuptools
82 changes: 59 additions & 23 deletions coursera-dl
@@ -1,6 +1,9 @@
#!/usr/bin/env python
"""
For downloading lecture resources such as videos for Coursera classes. Given
a class name, username and password, it scrapes the course listing page to
get the section (week) and lecture names, and then downloads the related
materials into appropriately named files and directories.
Examples:
coursera-dl -u <user> -p <passwd> saas
@@ -9,31 +12,42 @@ Examples:
Author:
John Lehmann (first last at geemail dotcom or @jplehmann)
Contributions are welcome, but please try to make them platform independent
and backward compatible.
"""

import argparse
import cookielib
import errno
import netrc
import os
import re
import string
import StringIO
import subprocess
import sys
import tempfile
import urllib
import urllib2

from BeautifulSoup import BeautifulSoup
# for OSX, bs4 is recommended
#from bs4 import BeautifulSoup

def get_syllabus_url(className):
"""
Return the Coursera index/syllabus URL.
"""
return "http://class.coursera.org/%s/lecture/index" % className

def get_auth_url(className):
return "http://class.coursera.org/%s/auth/auth_redirector?type=login&subtype=normal&email=&visiting=&minimal=true" % className

def write_cookie_file(className, username, password):
"""
Automatically generate a cookie file for the coursera site.
"""
try:
(hn,fn) = tempfile.mkstemp()
cj = cookielib.MozillaCookieJar(fn)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), urllib2.HTTPHandler())
@@ -57,9 +71,11 @@ def write_cookie_file(className, username, password):
return fn

def load_cookies_file(cookies_file):
"""
Loads the cookies file. I am pre-pending the file with the special
Netscape header because the cookie loader is being very particular about
this string.
"""
cookies = StringIO.StringIO()
NETSCAPE_HEADER = "# Netscape HTTP Cookie File"
cookies.write(NETSCAPE_HEADER)
@@ -69,7 +85,9 @@ def load_cookies_file(cookies_file):
return cookies
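The same header-prepending trick can be sketched in Python 3 with `http.cookiejar` (an illustrative re-statement, not the script's code; since `cj.load()` wants a filename rather than a file object, the patched contents go through a temporary file):

```python
import http.cookiejar
import os
import tempfile

NETSCAPE_HEADER = "# Netscape HTTP Cookie File\n"

def load_cookies(path):
    # MozillaCookieJar refuses files that do not start with the
    # magic Netscape header, so prepend it if it is missing.
    with open(path) as f:
        data = f.read()
    if not data.startswith("# Netscape"):
        data = NETSCAPE_HEADER + data
    # cj.load() requires a filename, not a file object, so write
    # the patched contents to a temporary file first.
    fd, tmp = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        cj = http.cookiejar.MozillaCookieJar()
        cj.load(tmp)
    finally:
        os.remove(tmp)
    return cj
```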

def get_opener(cookies_file):
"""
Use cookie file to create a url opener.
"""
cj = cookielib.MozillaCookieJar()
cookies = load_cookies_file(cookies_file)
# nasty hack: cj.load() requires a filename not a file, but if
@@ -79,7 +97,9 @@ def get_opener(cookies_file):
return urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

def get_page(url, cookies_file):
"""
Download an HTML page using the cookiejar.
"""
opener = get_opener(cookies_file)
#return opener.open(url).read()
ret = opener.open(url).read()
@@ -97,7 +117,9 @@ def grab_hidden_video_url(href, cookies_file):
return l[0]['src']

def get_syllabus(class_name, cookies_file, local_page=False):
"""
Get the course listing webpage.
"""
if (not (local_page and os.path.exists(local_page))):
url = get_syllabus_url(class_name)
page = get_page(url, cookies_file)
@@ -110,23 +132,29 @@ def get_syllabus(class_name, cookies_file, local_page=False):
return page

def clean_filename(s):
"""
Sanitize a string to be used as a filename.
"""
# strip paren portions which contain trailing time length (...)
s = re.sub("\([^\(]*$", "", s)
s = s.strip().replace(':','-').replace(' ', '_')
valid_chars = "-_.()%s%s" % (string.ascii_letters, string.digits)
return ''.join(c for c in s if c in valid_chars)
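For illustration, here is what the sanitization above does to a typical lecture title (the same logic restated for Python 3; the sample title is invented):

```python
import re
import string

def clean_filename(s):
    # Strip a trailing parenthesized portion, e.g. a "(12:34)" time length.
    s = re.sub(r"\([^\(]*$", "", s)
    s = s.strip().replace(':', '-').replace(' ', '_')
    valid_chars = "-_.()%s%s" % (string.ascii_letters, string.digits)
    return ''.join(c for c in s if c in valid_chars)

print(clean_filename("1.2 Intro: Basics (12:34)"))  # 1.2_Intro-_Basics
```

Note that any character outside the whitelist (slashes, quotes, etc.) is simply dropped rather than replaced.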

def get_anchor_format(a):
"""
Extract the resource file-type format from the anchor.
"""
# (. or format=) then (file_extension) then (? or $)
# e.g. "...format=txt" or "...download.mp4?..."
format = re.search("(?:\.|format=)(\w+)(?:\?.*)?$", a)
return format.group(1) if format else None
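To make the regex concrete, here it is against a couple of illustrative URLs (the sample URLs are invented for the example):

```python
import re

def get_anchor_format(a):
    # (. or format=) then (file_extension) then (? or $),
    # e.g. "...format=txt" or "...download.mp4?...".
    m = re.search(r"(?:\.|format=)(\w+)(?:\?.*)?$", a)
    return m.group(1) if m else None

print(get_anchor_format("lecture/download.mp4?lecture_id=42"))  # mp4
print(get_anchor_format("lecture/subtitles?q=42&format=srt"))   # srt
```

An anchor with neither a file extension nor a `format=` parameter yields `None`, and such links are skipped.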

def parse_syllabus(page, cookies_file):
"""
Parses a Coursera course listing/syllabus page. Each section is a week of
classes.
"""
sections = []
soup = BeautifulSoup(page)
# traverse sections
@@ -186,7 +214,9 @@ def download_lectures(
path='',
verbose_dirs=False
):
"""
Downloads lecture resources described by sections.
"""

def format_section(num, section):
sec = "%02d_%s" % (num, section)
@@ -218,7 +248,9 @@ def download_lectures(
open(lecfn, 'w').close() # touch

def download_file(url, fn, cookies_file, wget_bin):
"""
Downloads file and removes current file if aborted by user.
"""
try:
if wget_bin:
download_file_wget(wget_bin, url, fn, cookies_file)
@@ -230,14 +262,18 @@ def download_file(url, fn, cookies_file, wget_bin):
sys.exit()

def download_file_wget(wget_bin, url, fn, cookies_file):
"""
Downloads a file using wget. Could possibly use python to stream files to
disk, but wget is robust and gives nice visual feedback.
"""
cmd = [wget_bin, url, "-O", fn, "--load-cookies", cookies_file, "--no-check-certificate"]
print "Executing wget:", cmd
retcode = subprocess.call(cmd)

def download_file_nowget(url, fn, cookies_file):
"""
'Native' python downloader -- slower than wget.
"""
print "Downloading %s -> %s" % (url, fn)
urlfile = get_opener(cookies_file).open(url)
chunk_sz = 1048576