Supply a template or txt file with course names for easy lookup #59

ivoflipse · 2013-02-11T15:22:42Z

When I tried to use the otherwise awesome script, I had to go and look up all the names I wanted from the course list. So I made a little txt file with the URL handle and the name of each course, which I could then easily copy into the command line.
Perhaps it would be an idea to maintain a list of all the courses?

Past courses

neuralnets-2012-001 Neural Networks for Machine Learning
sciwrite-2012-001 Writing in the Sciences
progfun-2012-001 Functional Programming Principles in Scala
maththink-2012-001 Introduction to Mathematical Thinking
bigdata-2012-001 Web Intelligence and Big Data
healthpolicy-2012-001 Health Policy and the Affordable Care Act
intrologic Introduction to Logic
compilers Compilers
automata Automata
gametheory Game Theory
crypto Cryptography I

Current courses (possibly incomplete)

algo2-2012-001 Algorithms: Design and Analysis, Part 2
thinkagain-2012-001 Think Again: How to Reason and Argue
hetero-2012-001 Heterogeneous Parallel Programming
compmethods-2012-001 Computational Methods for Data Analysis
precalculus-001 Pre-Calculus
algebra-001 Algebra
proglang-2012-001 Programming Languages
calcsing-2012-001 Calculus in a Single Variable
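
A list like the ones above could be kept in a plain text file and searched by course name. The following is only a minimal sketch: the one-pair-per-line file layout and the function names `parse_course_list` and `find_handles` are assumptions for illustration, not part of coursera-dl.

```python
def parse_course_list(text):
    """Map each course handle to its course name.

    Assumes one course per line: the handle, whitespace, then the name.
    Section headings such as "Past courses" should be removed beforehand.
    """
    courses = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            handle, name = parts
            courses[handle] = name
    return courses


def find_handles(courses, query):
    """Return the handles whose course name contains query (case-insensitive)."""
    query = query.lower()
    return [handle for handle, name in courses.items() if query in name.lower()]


sample = """\
neuralnets-2012-001 Neural Networks for Machine Learning
progfun-2012-001 Functional Programming Principles in Scala
crypto Cryptography I
"""
courses = parse_course_list(sample)
print(find_handles(courses, "scala"))
```

The handle that is printed can then be copied into the command line as before.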

rbrito · 2013-02-11T16:13:14Z

Hi, Ivo.

On Mon, Feb 11, 2013 at 1:22 PM, Ivo Flipse notifications@github.com wrote:

> When I tried to use the otherwise awesome script I had to go and lookup all the names I wanted from the course list.

Well, supposedly, the idea would be to download material from courses
that you already know about (because you are subscribed to them). :)

> So I just made a little txt file with the url handle and the name of the course, which I could then easily copy into the command line.
> Perhaps it would be an idea to maintain a list of all the courses?

I guess that one of the easiest routes would be to grab this
information from some site that aggregates this (e.g.,
classcentral.com), but this is on the borderline of the scope of
coursera-dl, which is meant for downloads, not discovery...

Furthermore, keeping such lists may need some manual intervention and
it is not really clear how they could be used by the script. The
person has to sign up for the courses anyway (and if you try to sign up
for some courses after they are already running or after they have
been concluded, you will be denied access).

One reason for that may be that the course won't be offered on
Coursera anymore (see, for instance, Jennifer Widom's db course
migrating to Class2Go, Umesh Vazirani's qcomp migrating to EdX.org,
the saas courses moving to EdX too, etc.).

And, of course, to have access to the courses, you have to click the
"I accept the honor code" or something like that. I don't intend to
make this particular step automated, for human/awareness reasons.

Please, clarify how you intend to keep the list of courses up-to-date
without the maintainers of the program (John and I) having extra work.
If you are persuasive enough, we may implement your idea. :)

Thanks,

Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org/blog : Projects : https://github.com/rbrito/
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

ivoflipse · 2013-02-11T17:05:45Z

I personally only download courses when all the material is available, because otherwise I would have to come back later and download the rest anyway. But I can understand if others use it to download videos to watch them offline or on the go. The issue with course material no longer being available could (hopefully) be caught with an exception when you get an access denied error.

I guess the only workaround I can imagine would be to parse the course page for logged-in users:
https://www.coursera.org/user/i/<user_uuid>
Then check whether the left/width of the "coursera-course-listing-progress" element has reached 100%.
If so, extract the course URL from the "coursera-course-listing-meta" element and try to run the script.
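
The progress check described above might be sketched roughly as follows, using only the standard library. This is an illustration under assumptions: only the class name "coursera-course-listing-progress" comes from the page; that the progress is stored as a `width: NN%` value in a `style` attribute, the sample markup, and the names `ProgressParser` and `finished_flags` are all hypothetical.

```python
import re
from html.parser import HTMLParser


class ProgressParser(HTMLParser):
    """Collect the width percentage of each progress element on the page."""

    def __init__(self):
        super().__init__()
        self.widths = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "coursera-course-listing-progress" in classes:
            # Assumption: progress is stored as "width: NN%" in the style attribute
            style = attributes.get("style") or ""
            match = re.search(r"width:\s*(\d+(?:\.\d+)?)%", style)
            if match:
                self.widths.append(float(match.group(1)))


def finished_flags(html):
    """Return True for each progress element whose width has reached 100%."""
    parser = ProgressParser()
    parser.feed(html)
    parser.close()
    return [width >= 100 for width in parser.widths]


# Made-up markup for illustration only; the real page may look different
sample = (
    '<div class="coursera-course-listing-progress" style="left: 0%; width: 100%;"></div>'
    '<div class="coursera-course-listing-progress" style="width: 40%;"></div>'
)
print(finished_flags(sample))
```

Extracting the course URL from the "coursera-course-listing-meta" element would follow the same pattern, once the actual markup is known.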

But I can understand if this level of automation is out of scope for the script.

jplehmann · 2013-02-11T18:46:57Z

I've personally been facing a similar issue with the explosion of classes. I have used the following regex:

# extract all the currently open classes I'm enrolled in on a single line, space separated
grepo "class.coursera.org/(.*?)/" courses.html | uniq | paste -s -d" "

where courses.html is the page displayed when you click on "courses" underneath your name in the menu, and "grepo" is a script I wrote which does something like "grep -o" except it outputs only the text matched by the group.
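
Since "grepo" is a personal script, here is a rough Python equivalent of the pipeline above for anyone who does not have it. It is a sketch, not the original tool: the function name `enrolled_handles` and the sample markup are made up for illustration, and unlike `uniq` it drops every repeated handle, not only adjacent ones.

```python
import re


def enrolled_handles(html):
    """Return the course handles found in class.coursera.org links, in page order."""
    handles = re.findall(r"class\.coursera\.org/(.*?)/", html)
    seen = set()
    unique = []
    for handle in handles:
        if handle not in seen:
            seen.add(handle)
            unique.append(handle)
    return unique


# Made-up snippet standing in for the saved courses.html page
sample = (
    '<a href="https://class.coursera.org/automata/auth/auth_redirector?type=login&subtype=normal">Automata</a>'
    '<a href="https://class.coursera.org/automata/auth/auth_redirector?type=login&subtype=normal">Go to class</a>'
    '<a href="https://class.coursera.org/thinkagain-2012-001/auth/auth_redirector?type=login&subtype=normal">Think Again</a>'
)
# Print all handles on a single line, space separated
print(" ".join(enrolled_handles(sample)))
```

For a real page, read the saved file and pass its contents to `enrolled_handles`.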

ivoflipse · 2013-02-12T10:46:43Z

Inspired by your comment, I messed around a little to see if I could extract this information. I couldn't get to my /courses page, so I just manually downloaded it. Automating this would be nice, but it works.

Then I load the page using BeautifulSoup:

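# Assumes BeautifulSoup 4 is installed (pip install beautifulsoup4)
from bs4 import BeautifulSoup

# Load the manually saved copy of the /courses page
with open("Courses.htm") as page:
    soup = BeautifulSoup(page, "html.parser")
# Find the boxes that contain the course information
course_elements = soup.findAll("div",
    {"class": "coursera-course-listing-box coursera-course-listing-box-wide coursera-account-course-listing-box"})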

This gives us a list that contains each of the boxes on the /courses page. From here we can try to extract the relevant information:

# Iterate through each course box
for course in course_elements:
    # The date information is in a span element
    listing_start = course.findAll("span")
    # Some booleans for controlling the behavior of the script
    is_course = True
    ended = False

    # Not every box seems to be a course, so we try to parse it and skip the box on failure
    try:
        date_text = listing_start[2].text
        first_word = date_text.split()[0]
    except IndexError:
        # If we can't get the date, flip this boolean, so we don't bother with further parsing
        is_course = False
    else:
        # There seem to be three different date formats:
        # Courses yet to start
        if first_word == "Starts":
            ending_time = date_text
        # Courses that have already ended
        elif first_word == "Ended":
            ending_time = date_text
            ended = True
        # Courses that have already started, but not yet ended
        else:
            ending_time = "End date: {}".format(date_text)

    # If the current element is a course, print the info
    # If you extend this check with "and ended", it'll only give you info for completed courses
    if is_course:  # and ended:
        course_listing = course.findAll("h3")
        course_name = course_listing[0].text
        # The URL is the second double-quoted value in the h3 markup
        course_url = str(course_listing[0]).split("\"")[3]
        split_course_url = course_url.split("/")
        # www.coursera.org/course/<handle> versus class.coursera.org/<handle>/...
        if split_course_url[3] == "course":
            course_handler = split_course_url[4]
        else:
            course_handler = split_course_url[3]
        # Single-argument print() calls run on both Python 2 and Python 3
        print("Course name: {}".format(course_name))
        print("Course handler: {}".format(course_handler))
        print("Course url: {}".format(course_url))
        print(ending_time)
        print("")

I added some prints, which aren't really needed, but just show you that you can retrieve the information you'd want. You could either use the url that's passed when you press the green button or use the course name, like your script currently uses. It seems that courses that are no longer accessible have a different url (with the auth part), so that's useful info too.

So depending on the status of the course, you'd get something like this:

Course in progress
Course name: Think Again: How to Reason and Argue
Course handler: thinkagain-2012-001
Course url: https://class.coursera.org/thinkagain-2012-001/auth/auth_redirector?type=login&subtype=normal
End date: Nov 26th

Course not yet started
Course name: Know Thyself
Course handler: knowthyself
Course url: https://www.coursera.org/course/knowthyself
Starts in 20 days

Ended course
Course name: Automata
Course handler: automata
Course url: https://class.coursera.org/automata/auth/auth_redirector?type=login&subtype=normal
Ended 8 months ago

Ended and closed course
Course name: Statistics One
Course handler: stats1
Course url: https://www.coursera.org/course/stats1
Ended 4 months ago
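
The two URL shapes in the output above could also be handled with `urllib.parse` instead of counting slashes. This is a sketch under the assumption that only these two shapes occur; the function names `course_handle` and `links_to_class_site` are made up for illustration. In the sample output, in-progress and ended courses link to class.coursera.org, while not-yet-started and closed courses link to www.coursera.org/course.

```python
from urllib.parse import urlparse


def course_handle(url):
    """Extract the course handle from either URL shape shown above."""
    parsed = urlparse(url)
    segments = [segment for segment in parsed.path.split("/") if segment]
    # https://class.coursera.org/<handle>/auth/auth_redirector?...
    if parsed.netloc == "class.coursera.org" and segments:
        return segments[0]
    # https://www.coursera.org/course/<handle>
    if len(segments) >= 2 and segments[0] == "course":
        return segments[1]
    raise ValueError("unrecognized course URL: {}".format(url))


def links_to_class_site(url):
    """Return True if the URL points at class.coursera.org (the 'auth' style link)."""
    return urlparse(url).netloc == "class.coursera.org"


in_progress = "https://class.coursera.org/thinkagain-2012-001/auth/auth_redirector?type=login&subtype=normal"
closed = "https://www.coursera.org/course/stats1"
print(course_handle(in_progress), links_to_class_site(in_progress))
print(course_handle(closed), links_to_class_site(closed))
```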

It would require some fiddling, because you no longer have to pass the names through the command line, so you'd have to insert them somewhere. Or make the script get the names from the parsed file and go through them one by one.

Anyway, this was a fun experiment :-) If only I could get it to retrieve this information from the live page and possibly list the courses available for me, so I could pass the number of the course I wanted the script to download, that would be awesome!
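
The "pass the number of the course" idea could look roughly like this once the handles and names have been parsed. It is only a sketch: `format_menu` and `pick_course` are hypothetical names, and handing the chosen handle to the download script is left out.

```python
def format_menu(courses):
    """Return one numbered line per (handle, name) pair, starting at 1."""
    return [
        "{}. {} ({})".format(number, name, handle)
        for number, (handle, name) in enumerate(courses, start=1)
    ]


def pick_course(courses, choice):
    """Return the handle for the 1-based menu number choice."""
    if not 1 <= choice <= len(courses):
        raise ValueError("choice must be between 1 and {}".format(len(courses)))
    return courses[choice - 1][0]


courses = [
    ("thinkagain-2012-001", "Think Again: How to Reason and Argue"),
    ("automata", "Automata"),
]
for line in format_menu(courses):
    print(line)
print(pick_course(courses, 2))
```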

jonasdt mentioned this issue Jun 27, 2013

Feature request: list available courses on Coursera for download #89

Closed

jplehmann mentioned this issue Jun 30, 2013

Download course description #151

Closed

Glavin001 mentioned this issue Jun 16, 2015

ConnectionError: ('Connection aborted.', ResponseNotReady()) #347

Open
