Scrape the HTML from edX
edx-dl is a wonderful tool for pulling down all the videos and PDFs from an edX course. Unfortunately, it is not currently set up to download any HTML content (see #600).
Until edx-dl can download HTML itself, I've made a little hacky script that does it for you.
If you want PDFs of all the Q/A, check out edx-archive.
- Copy the code from index.js.
- Open your browser and go to the "Progress" page of the course.
- Open the devtools console (see your browser's instructions).
- Paste the code from index.js and press Enter.
- The script will automatically download the HTML of every page listed on the Progress page and output a zip file.
- Unzip the file, then open the "pages" folder in your browser.
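For the curious, here is a minimal sketch of the general idea behind a script like this: gather the page links on the Progress page, fetch each page with your logged-in session, and keep the raw HTML. It is not the actual index.js. The `/courseware/` link filter and the helper names (`collectPageUrls`, `pageFileName`, `downloadPages`) are assumptions for illustration, and the zipping step is left out because it needs a zip library.

```javascript
// Hypothetical sketch of the idea behind index.js; not the actual script.
// Assumption: lesson pages are linked from the Progress page with URLs whose
// path contains "/courseware/". The real script may select links differently.
const COURSEWARE_PATTERN = /\/courseware\//;

// Resolve hrefs against the Progress page URL, keep only courseware links,
// and drop duplicates (links that differ only by "#fragment" count as one page).
function collectPageUrls(hrefs, baseUrl) {
  const unique = new Set();
  for (const href of hrefs) {
    const url = new URL(href, baseUrl);
    url.hash = "";
    if (COURSEWARE_PATTERN.test(url.pathname)) {
      unique.add(url.href);
    }
  }
  return [...unique];
}

// Turn a page URL into a flat, filesystem-safe name,
// e.g. ".../courses/x/courseware/week1/lesson1/" -> "courses_x_courseware_week1_lesson1.html".
function pageFileName(pageUrl) {
  const path = new URL(pageUrl).pathname.replace(/^\/+|\/+$/g, "");
  return path.replace(/[^A-Za-z0-9_-]+/g, "_") + ".html";
}

// Fetch pages one at a time (gentler on the server than firing every request at once)
// and return a Map of file name -> raw HTML. Pages that fail are skipped with a warning.
// fetchImpl defaults to the global fetch; it is a parameter only so the sketch can be tested.
async function downloadPages(urls, fetchImpl = fetch) {
  const pages = new Map();
  for (const url of urls) {
    // "include" sends the edX session cookies, so the logged-in page is returned.
    const response = await fetchImpl(url, { credentials: "include" });
    if (!response.ok) {
      console.warn(`Skipping ${url}: HTTP ${response.status}`);
      continue;
    }
    pages.set(pageFileName(url), await response.text());
  }
  return pages;
}

// Usage in the devtools console on the course's Progress page:
// const hrefs = [...document.querySelectorAll("a[href]")].map((a) => a.getAttribute("href"));
// downloadPages(collectPageUrls(hrefs, location.href)).then((pages) => console.log([...pages.keys()]));
```

Fetching sequentially is slower than `Promise.all`, but it avoids hammering the edX servers with dozens of simultaneous requests.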
Currently this script only downloads the raw HTML; it does not grab:
- Images
- Results of the "Show Answers" button
To view the downloaded pages offline, a few JS scripts they depend on must already be cached in your browser.
Navigating between the downloaded pages won't work.
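If you plan to read the pages offline, it can help to know which images and scripts a saved page still points at, since those are exactly what the script does not download. Below is a minimal sketch, assuming you have a page from the "pages" folder as a string; `listExternalAssets` is a hypothetical helper, not part of index.js.

```javascript
// Hypothetical helper, not part of index.js: list the image and script URLs that a
// saved page still references. A regex is enough for a quick audit, but it is not a
// full HTML parser and cannot see URLs that JavaScript builds at runtime.
function listExternalAssets(html) {
  // Requiring whitespace before "src" avoids matching attributes like data-src.
  const srcOf = (tag) =>
    new RegExp(`<${tag}\\b[^>]*?\\ssrc\\s*=\\s*["']([^"']+)["']`, "gi");
  return {
    images: [...html.matchAll(srcOf("img"))].map((match) => match[1]),
    scripts: [...html.matchAll(srcOf("script"))].map((match) => match[1]),
  };
}
```

In Node, for example, you could pass it the contents of a saved page read with `fs.readFileSync(path, "utf8")`.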
I'd much rather you contribute to the edx-dl project (see #600). But if you'd like to improve this script, open a ticket and we can chat.