Support preview pages for past courses #41

Closed
memeplex opened this issue Jan 13, 2013 · 8 comments

@memeplex

Past courses like:

https://class.coursera.org/modelthinking/lecture/preview

offer a preview page which contains all the lectures of the course. As time goes by there will be more and more courses with this condition and it would be great if your script supported them.

@rbrito
Member

rbrito commented Jan 13, 2013

Hi there.

On Jan 13 2013, memeplex wrote:

Past courses like:

https://class.coursera.org/modelthinking/lecture/preview

offer a preview page which contains all the lectures of the course. As
time goes by there will be more and more courses with this condition and
it would be great if your script supported them.

It already works with such courses. Just:

  1. Download the page and put it somewhere coursera-dl can find it. Say
    you put it in the current directory under the name preview.html.

  2. Invoke it with something like this (adjusting the path as needed):

    coursera-dl -u user -p pass -l ./preview.html modelthinking

It should work (I have used it in the past). Perhaps you would like to
send us a patch/pull request that documents the instructions above in
more detail?

Regards,

Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org/blog : Projects : https://github.com/rbrito/
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

@memeplex
Author

Hi Rogério,

It seems as if --skip-download is enabled or something:

[carlos@carlos modelthinking]$ coursera-dl -u XXX -p XXX -w wget -l ./preview modelthinking
Read (53710 bytes) from local file
Introduction-_Why_Model
Why_Model
None https://class.coursera.org/modelthinking/lecture/preview_view/17
Intelligent_Citizens_of_the_World
None https://class.coursera.org/modelthinking/lecture/preview_view/19
Thinking_More_Clearly
None https://class.coursera.org/modelthinking/lecture/preview_view/6
Using_and_Understanding_Data
None https://class.coursera.org/modelthinking/lecture/preview_view/8
Using_Models_to_Decide_Strategize_and_Design
None https://class.coursera.org/modelthinking/lecture/preview_view/18
Segregation_and_Peer_Effects
Sorting_and_Peer_Effects_Introduction
None https://class.coursera.org/modelthinking/lecture/preview_view/15
Schellings_Segregation_Model
None https://class.coursera.org/modelthinking/lecture/preview_view/16
Measuring_Segregation
None https://class.coursera.org/modelthinking/lecture/preview_view/9
[...]

The contents are recognized, but that's all: no download is happening.

Regards

Carlos

@memeplex
Author

The problem is here:

  for a in vtag.findAll('a'):
    href = a['href']
    fmt = get_anchor_format(href)
    print "    ", fmt, href
    if fmt: lecture[fmt] = href  # skipped here: fmt is None for every preview_view link

For example, fmt is None for href=u'https://class.coursera.org/modelthinking/lecture/preview_view/17'. This is the corresponding markup on the preview page:

<div class="course-item-list-header expanded">
  <h3><span class="icon-chevron-down" style="width:18px;display:inline-block;"></span> &nbsp;Introduction: Why Model?</h3>
  <span class="hidden">(expanded, click to collapse)</span>
</div>
<ul class="course-item-list-section-list">
  <li class="unviewed">
    <a data-lecture-id="17"
       data-modal-iframe="https://class.coursera.org/modelthinking/lecture/preview_view?lecture_id=17"
       data-modal=".course-modal-frame"
       href="https://class.coursera.org/modelthinking/lecture/preview_view/17"
       rel="lecture-link"
       class="lecture-link">
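Since get_anchor_format returns None for these preview_view links, one alternative is to read the lecture IDs straight from the anchors. A minimal sketch using only Python's standard html.parser, assuming the page keeps the rel="lecture-link" and data-lecture-id attributes shown in the markup above; LectureLinkParser and lecture_ids are hypothetical names, not part of coursera-dl:

```python
from html.parser import HTMLParser


class LectureLinkParser(HTMLParser):
    """Collect the data-lecture-id of every <a rel="lecture-link"> tag."""

    def __init__(self):
        super().__init__()
        self.lecture_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # attribute names arrive lower-cased
        # Assumption: only lecture anchors carry both of these attributes.
        if attributes.get("rel") == "lecture-link" and "data-lecture-id" in attributes:
            self.lecture_ids.append(attributes["data-lecture-id"])


def lecture_ids(preview_html):
    """Return the lecture IDs in the order they appear on the preview page."""
    parser = LectureLinkParser()
    parser.feed(preview_html)
    parser.close()
    return parser.lecture_ids


# The anchor quoted above, trimmed to the attributes that matter here.
SAMPLE = """<li class="unviewed"><a data-lecture-id="17"
   href="https://class.coursera.org/modelthinking/lecture/preview_view/17"
   rel="lecture-link"
   class="lecture-link">"""

print(lecture_ids(SAMPLE))  # ['17']
```

Matching on rel and data-lecture-id rather than on the href keeps the sketch independent of whether the link uses the /preview_view/17 or the ?lecture_id=17 form.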

@memeplex
Author

This is a PITA. From what I can see, you will need to parse these links

https://class.coursera.org/modelthinking/lecture/preview_view?lecture_id=15

and open each URL to get these video links (one by one):

https://d19vezwu8eufl6.cloudfront.net/modelthinking recoded_videos%2FL2A%20Sorting%20and%20Schelling%20Segregation%20%5Bde197bde%5D%20.webm
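A minimal sketch of those two steps in Python, assuming the preview_view?lecture_id=N pattern above and that each lecture page exposes its video as a src="...mp4" or src="...webm" attribute (the bash script later in this thread relies on the .mp4 form). preview_view_url, video_urls and the sample markup are hypothetical; actually fetching the pages, which needs the logged-in session cookies, is left out:

```python
import re

# Assumption: per-lecture pages follow the preview_view?lecture_id=N pattern
# quoted in this thread.
PREVIEW_VIEW = "https://class.coursera.org/{course}/lecture/preview_view?lecture_id={lecture_id}"

# Assumption: the lecture page lists its video files in src="..." attributes
# ending in .mp4 or .webm.
VIDEO_SRC = re.compile(r'src="([^"]+\.(?:mp4|webm))"')


def preview_view_url(course, lecture_id):
    """Build the URL of the page that embeds a single lecture."""
    return PREVIEW_VIEW.format(course=course, lecture_id=lecture_id)


def video_urls(lecture_html):
    """Return every .mp4/.webm URL found in a src attribute, in page order."""
    return VIDEO_SRC.findall(lecture_html)


print(preview_view_url("modelthinking", 15))
# https://class.coursera.org/modelthinking/lecture/preview_view?lecture_id=15

# Made-up markup, only to exercise the regular expression.
page = ('<video><source src="https://example.invalid/L2A.mp4" type="video/mp4">'
        '<source src="https://example.invalid/L2A.webm" type="video/webm"></video>')
print(video_urls(page))
# ['https://example.invalid/L2A.mp4', 'https://example.invalid/L2A.webm']
```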

I will code a quick bash hack now. If I have time (I don't see that happening soon), I'll send you a patch for your script.

Regards

@memeplex
Author

Hopefully you will be able to extract some inspiration and/or regexps from this hack&slash bash trash script. It was able to download at least gametheory and modelthinking. I wonder how long it will take them to ban me. Are you using any throttling at all?

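#!/bin/bash
# Quick hack: download every lecture (video + English subtitles) of a past
# Coursera course from its preview page. Usage: ./script COURSE (e.g. modelthinking)

# wget wrapper that sends the cookies exported from Firefox.
function _wget {
  wget --load-cookies /tmp/cookies.txt "$@"
}

# Export the coursera cookies from my Firefox profile (adjust the path to
# yours) into the tab-separated file read by wget --load-cookies.
function cookies {
  sqlite3 -separator $'\t' ~/.mozilla/firefox/carlos/cookies.sqlite \
    'select host,"0",path,isSecure,expiry,name,value from moz_cookies' \
    | grep coursera > /tmp/cookies.txt
}

# Print the ID of every lecture linked from the course preview page.
function lectures {
  _wget -O - "https://class.coursera.org/$1/lecture/preview" 2>/dev/null \
    | sed -rn 's/.*href=".*preview_view\/(.*)"/\1/p'
}

# Download one lecture: $1 = course, $2 = lecture ID, $3 = sequence number.
function lecture {
  local preview="https://class.coursera.org/$1/lecture/preview_view?lecture_id=$2"
  local page title
  page=$(_wget -O - "$preview" 2>/dev/null)   # fetch the lecture page only once
  title=$(sed -rn 's/.*lecture_title[^>]*>([^<]*)<\/div>/\1/p' <<< "$page")
  title="${title# }"        # drop a leading space
  title="$3. ${title% }"    # drop a trailing space and prefix the number
  [[ -f "$title.mp4" ]] || {
    _wget -O "$title.mp4" "$(sed -rn 's/.*src="(.*mp4)">/\1/p' <<< "$page")"
  }
  [[ -f "$title.srt" ]] || {
    _wget -O "$title.srt" "https://class.coursera.org/$1/lecture/subtitles?q=${2}_en"
  }
}

function main {
  local n=1 l
  cookies
  for l in $(lectures "$1"); do
    lecture "$1" "$l" "$n"
    ((n++))
  done
}

main "$1"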
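The cookies function is what lets wget reuse the browser's logged-in session: it copies Firefox's moz_cookies table into a cookies.txt file for wget --load-cookies. A sketch of the same conversion with Python's standard sqlite3 module, assuming the moz_cookies columns the script selects. Unlike the script, it writes TRUE/FALSE in the two flag columns, as the Netscape cookie file format uses, and filters on the host column instead of grepping whole lines. cookie_lines and write_cookies_txt are hypothetical helpers, not part of coursera-dl:

```python
import sqlite3


def cookie_lines(con, domain_filter="coursera"):
    """Turn Firefox moz_cookies rows into Netscape cookies.txt lines.

    Fields (tab-separated): domain, include-subdomains flag, path,
    secure flag, expiry (Unix time), name, value.
    """
    rows = con.execute(
        "SELECT host, path, isSecure, expiry, name, value FROM moz_cookies "
        "WHERE host LIKE ?",
        ("%" + domain_filter + "%",),
    )
    lines = ["# Netscape HTTP Cookie File"]
    for host, path, is_secure, expiry, name, value in rows:
        # A host with a leading dot is a domain cookie, valid for subdomains.
        subdomains = "TRUE" if host.startswith(".") else "FALSE"
        secure = "TRUE" if is_secure else "FALSE"
        lines.append("\t".join([host, subdomains, path, secure, str(expiry), name, value]))
    return lines


def write_cookies_txt(db_path, out_path, domain_filter="coursera"):
    """Export matching cookies from a cookies.sqlite file to out_path."""
    con = sqlite3.connect(db_path)
    try:
        lines = cookie_lines(con, domain_filter)
    finally:
        con.close()
    with open(out_path, "w") as out:
        out.write("\n".join(lines) + "\n")
```

Usage would be along the lines of write_cookies_txt("/home/USER/.mozilla/firefox/PROFILE/cookies.sqlite", "/tmp/cookies.txt"), where USER and PROFILE are placeholders: the profile directory name differs per installation.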

@rbrito
Member

rbrito commented Jan 14, 2013

Hi, Carlos.

On Jan 13 2013, memeplex wrote:

[carlos@carlos modelthinking]$ coursera-dl -u XXX -p XXX -w wget -l ./preview modelthinking
Read (53710 bytes) from local file
Introduction-_Why_Model
Why_Model
None https://class.coursera.org/modelthinking/lecture/preview_view/17
Intelligent_Citizens_of_the_World
None https://class.coursera.org/modelthinking/lecture/preview_view/19
Thinking_More_Clearly
None https://class.coursera.org/modelthinking/lecture/preview_view/6
Using_and_Understanding_Data
None https://class.coursera.org/modelthinking/lecture/preview_view/8
Using_Models_to_Decide_Strategize_and_Design
None https://class.coursera.org/modelthinking/lecture/preview_view/18
[...]

OK, I see that their change in layout also affected the preview pages.
I'm heading to bed right now, but I will hopefully look into it next
week.

Thanks,

Rogério.

@vojnovski
Contributor

Can close this as it's a duplicate of #90

@rbrito rbrito closed this as completed Apr 29, 2013
@rbrito
Member

rbrito commented Apr 29, 2013

Hi, Viktor.

On Mon, Apr 29, 2013 at 10:32 AM, Viktor Vojnovski
notifications@github.com wrote:

Can close this as it's a duplicate of #90

Thanks for the reminder.
