RayBB/edx-scrape

edx-scrape

Scrape the HTML from edX

What is this

edx-dl is a wonderful tool made to pull down all the videos/PDFs from an edX course. Unfortunately, it is not currently set up to download any HTML content (see #600).

Until edx-dl is able to download HTML itself, I've made a little hacky script to download it for you.

If you want to download PDFs of all the Q/A, check out edx-archive.

Usage

  1. Copy the code from index.js.
  2. Open your browser and go to the "Progress" page of the course.
  3. Open the console in devtools (instructions).
  4. Paste the code from index.js and press enter.
  5. The script will automatically download the HTML of all pages listed under "Progress" and output a zip.
  6. Unzip the file and then open the "pages" folder in your browser.
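
For readers curious about what step 5 involves, here is a minimal sketch of the general idea. It is not the actual contents of index.js. The function names (`pageFileName`, `uniqueCourseLinks`, `downloadPages`) and the file naming scheme are hypothetical, chosen only for illustration, and the zipping step is left out because it depends on whichever zip library is used.

```javascript
// Hypothetical helper (not from index.js): turn a page URL into a flat,
// filesystem-safe file name inside a "pages" folder.
function pageFileName(url, index) {
  const path = new URL(url).pathname;
  // Collapse every run of non-alphanumeric characters into a single dash,
  // then trim dashes from both ends.
  const slug = path.replace(/[^a-zA-Z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return `pages/${String(index).padStart(3, "0")}-${slug || "index"}.html`;
}

// Hypothetical helper: resolve hrefs against the current page URL, keep only
// same-origin links, and drop duplicates (ignoring "#fragment" parts).
function uniqueCourseLinks(hrefs, baseUrl) {
  const origin = new URL(baseUrl).origin;
  const seen = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, baseUrl);
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    url.hash = "";
    if (url.origin === origin) seen.add(url.href);
  }
  return [...seen];
}

// Fetch each page in turn and collect its HTML under a generated file name.
// fetchFn is injected so the function can be tried outside a browser.
async function downloadPages(urls, fetchFn) {
  const files = {};
  let index = 1;
  for (const url of urls) {
    const response = await fetchFn(url);
    if (!response.ok) continue; // skip pages that fail to load
    files[pageFileName(url, index)] = await response.text();
    index += 1;
  }
  return files;
}
```

In a browser console, the hrefs could be gathered with something like `[...document.querySelectorAll("a")].map((a) => a.getAttribute("href"))`, with `location.href` as the base URL and `(url) => fetch(url)` as `fetchFn`. The resulting object of file names and HTML strings would then be handed to a zip library to produce the download.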

Limits

Currently this script only downloads the raw HTML. It does not grab:

  • Images
  • Results of the "Show Answers" button

The downloaded pages rely on a few JS scripts, so those scripts must already be cached in your browser if you would like to view the pages offline.

Navigating between pages won't work.

Contributing

I'd much rather you contribute to the edx-dl project (see #600). But if you'd like to improve this script, open a ticket and we can chat.
