Coursera-scheduled-exports

This project contains a program that wraps around the courseraresearchexports Python package. It allows you to request data exports for one or more courses and download them with a single command, which is useful if you want to automate data export downloads. It also makes downloading e.g. clickstream data less tedious, since the program downloads the data as soon as the export job is finished.

Note that only data coordinators can currently use this program, as it requests full exports including partner-level IDs and clickstream data.

Installation

To install this program, clone the repository.

Dependencies

This program depends on:

  • Python 2.7.x
  • pip
  • courseraresearchexports

If you do not have pip installed on your platform, follow the installation instructions here. Alternatively, you can install Anaconda.

To install the dependencies, navigate to the cloned repository in a terminal and run the following:

pip install -r requirements.txt

Please refer to the courseraresearchexports documentation if you encounter any issues installing that package.

Updates

You should occasionally execute git pull origin master to ensure that you're using the latest version of the program.

Usage

The program takes three required arguments and six optional arguments.

Required arguments

  • export_type: either 'clickstream' or 'tables', depending on the data you want to download.
  • course_slugs: either a comma-separated string of course slugs or the path to a .txt file containing course slugs, one per line (see the example after this list).
  • location: the directory where data exports will be stored. The program automatically creates two subfolders: one for the export type (e.g. 'clickstream') and one for the course slug.
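
For example, a courses.txt file listing the two course slugs used later in this README would look like this (the slugs are purely illustrative):

terrorism
human-language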

Optional arguments

  • --clickstream_days: by default, the program downloads the last 7 days of clickstream data. This argument lets you change that to any number of days greater than or equal to 1.
  • --interval: a specific date range over which to download clickstream data; overrides the --clickstream_days argument. Format both dates as YYYY-MM-DD.
  • --save_metadata: save request metadata. If set, the metadata is stored in the 'location' directory.
  • --force_request: if you have already requested data for a course in the past 5 days (for tables) or 1 day (for clickstream), the program will not create a new request but will continue from the previous one. Adding this flag forces a new request anyway. Note that this might not work, because Coursera allows only 1 request per hour.
  • --verbose: print verbose messages to the terminal. Useful if you're running the program manually.
  • --log: store a log file containing detailed information. Mostly useful for debugging (see the combined example after this list).
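
For instance, several of these flags can be combined in a single command (the slug and path below are simply reused from the examples in the next section):

python call.py 'tables' 'human-language' '/users/jasper/tmp' --save_metadata --force_request --log --verbose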

Running the program

Running the program is as simple as executing:

python call.py 'tables' 'human-language' '/users/jasper/tmp' --verbose

You can also query multiple course slugs in one command:

python call.py 'tables' 'terrorism, human-language' '/users/jasper/tmp' --verbose

Or, if you have a .txt file containing course slugs:

python call.py 'tables' '/users/jasper/desktop/courses.txt' '/users/jasper/tmp' --verbose

You can set a specific interval for clickstream data:

python call.py 'clickstream' '/users/jasper/desktop/courses.txt' '/users/jasper/tmp' --interval '2016-09-26' '2016-10-05' --verbose

Or you can override the default (the past 7 days) with any number of days greater than or equal to 1:

python call.py 'clickstream' '/users/jasper/desktop/courses.txt' '/users/jasper/tmp' --clickstream_days 14 --verbose

This is useful if you schedule downloads to run, say, every 2 weeks instead of every 7 days.

Argument description and help

Execute

python call.py -h

to get an overview and a description of the arguments.

Other information

Scheduling downloads

You can use e.g. crontab (Linux) or Automator (macOS) to automate requests every week, month, etc.
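
As a rough sketch, a crontab entry that requests table exports every Monday at 03:00 could look like the line below; the paths are placeholders and assume the cloned repository and a courses.txt file live in your home directory:

0 3 * * 1 python /home/<user>/Coursera-scheduled-exports/call.py 'tables' '/home/<user>/courses.txt' '/home/<user>/exports' --log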

Running the program in a VM or VPS

I run this program on a VPS. If you plan to use it in a Virtual Machine (VM) or on a Virtual Private Server (VPS), you can use the bootstrap file to install all dependencies in one go.

Because Coursera uses OAuth2 and requires you to authenticate through a browser, you first need to authenticate your account on your own computer. Then, look for the hidden .coursera folder in your home directory (e.g. /home/<user>/.coursera/ on Linux) and copy the .pickle file to the same folder on your VM/VPS (e.g. /home/<vps-user-name>/.coursera). You can now request exports on the VM/VPS.
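
For example, copying the credentials from your local machine to the VM/VPS could be done with scp; the host name below is a placeholder:

ssh <vps-user-name>@<vps-host> 'mkdir -p ~/.coursera'
scp /home/<user>/.coursera/*.pickle <vps-user-name>@<vps-host>:/home/<vps-user-name>/.coursera/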

Requests

If you have a request (like adding a new argument), please leave it here.

Issues

Should you encounter any bugs or issues, please report them here.