This project contains a program that wraps the courseraresearchexports Python package. The program lets you request data exports for one or more courses and download them with a single command. This is useful if you want to automate data export downloads. It also makes downloading large exports such as clickstream data less tedious, since the program downloads each export as soon as its job is finished.
Note that only data coordinators can currently use this program, as it requests full exports including partner-level ids and clickstream data.
To install this program, clone the repository.
This program depends on:
- Python 2.7.x
To install dependencies, navigate to the cloned repository in a terminal and run the following:
pip install -r requirements.txt
Please refer to the courseraresearchexports package if you encounter any issues installing the dependencies.
You should occasionally execute git pull origin master to ensure that you're using the latest version of the program.
The program takes three required arguments and six optional arguments.
- export_type: either one of 'clickstream' or 'tables' depending on the data you want to download.
- course_slugs: either a string of course slugs separated by commas or the location of a .txt file containing course slugs. In the file, each course slug should be placed on a new line.
- location: Location where data exports will be stored. The program automatically creates two subfolders; one for the export type (e.g. 'clickstream') and one for the course slug.
- --clickstream_days: When downloading clickstream data, the default is that the program downloads the last 7 days of data. This argument lets you change the number of days to any number higher than or equal to 1.
- --interval: Input a specific date range (a start date and an end date) over which to download clickstream data. Overrides the --clickstream_days argument. Format each date as YYYY-MM-DD.
- --save_metadata: Save request metadata? If true, will be saved in the 'location' directory.
- --force_request: If you are requesting data for a course for which you have already requested data in the past 5 days (for tables) or 1 day (for clickstreams), the program will not create a new request. Instead, it will continue from the previous request. If you add this flag, the program will ignore this and create a new request. Note that this might not work because Coursera allows only 1 request per hour.
- --verbose: Print verbose messages to the terminal? Useful if you're running the program manually.
- --log: Store a log file containing detailed information? Mostly useful for debugging purposes.
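The reuse rule described for --force_request can be sketched as follows. This is a minimal illustration, not the program's actual code: the function name should_create_request, the REUSE_WINDOW table, and the way the time of the previous request is passed in are all assumptions. Only the 5-day and 1-day windows come from the description above, and treating a request that is exactly at the boundary as expired is also an assumption.

```python
from datetime import datetime, timedelta

# Reuse windows taken from the argument description; the names are hypothetical.
REUSE_WINDOW = {
    "tables": timedelta(days=5),
    "clickstream": timedelta(days=1),
}


def should_create_request(export_type, last_request_time, now, force_request=False):
    """Return True if a new export request should be created.

    last_request_time is the datetime of the previous request for the same
    course and export type, or None if there was none.
    """
    if force_request or last_request_time is None:
        return True
    # A previous request that is still inside the window is reused instead.
    return now - last_request_time >= REUSE_WINDOW[export_type]
```

With a previous request made two days ago, this sketch would reuse it for 'tables' but create a new request for 'clickstream', unless force_request is set.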
Running the program
Running the program is as simple as executing:
python call.py 'tables' 'human-language' '/users/jasper/tmp' --verbose
You can also query multiple course slugs in one command:
python call.py 'tables' 'terrorism, human-language' '/users/jasper/tmp' --verbose
Or, if you have a .txt file containing course slugs:
python call.py 'tables' '/users/jasper/desktop/courses.txt' '/users/jasper/tmp' --verbose
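To make the two accepted forms of course_slugs concrete, here is a rough sketch of how either form could be turned into a list of slugs, and how the output folder for each slug could be built. The function names resolve_course_slugs and export_directory are hypothetical, and the nesting order (export type first, then course slug) is an assumption based on the description of the location argument; the program itself may do this differently.

```python
import os


def resolve_course_slugs(course_slugs):
    """Return a list of slugs from a .txt file path or a comma-separated string."""
    if course_slugs.endswith(".txt") and os.path.isfile(course_slugs):
        with open(course_slugs) as handle:
            # One slug per line; blank lines are skipped.
            return [line.strip() for line in handle if line.strip()]
    # Otherwise treat the value as 'slug-a, slug-b, ...'.
    return [slug.strip() for slug in course_slugs.split(",") if slug.strip()]


def export_directory(location, export_type, slug):
    """Build <location>/<export_type>/<slug>; the nesting order is assumed."""
    return os.path.join(location, export_type, slug)
```

Under these assumptions, 'terrorism, human-language' resolves to the two slugs terrorism and human-language, and the tables export for human-language would end up in a folder such as /users/jasper/tmp/tables/human-language.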
You can set a specific interval for clickstream data:
python call.py 'clickstream' '/users/jasper/desktop/courses.txt' '/users/jasper/tmp' --interval '2016-09-26' '2016-10-05' --verbose
Or you can override the default (download the past 7 days) with any number larger than or equal to 1:
python call.py 'clickstream' '/users/jasper/desktop/courses.txt' '/users/jasper/tmp' --clickstream_days 14 --verbose
This is useful if you use a program to schedule downloads, say, every 2 weeks instead of every 7 days.
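The relationship between --clickstream_days and --interval can be sketched like this. It is an illustration under stated assumptions rather than the program's implementation: the function name clickstream_interval is hypothetical, and the choice to end the default window on today's date and start it N days earlier is an assumption, since the text does not say whether the current day is included.

```python
from datetime import date, datetime, timedelta


def clickstream_interval(today, clickstream_days=7, interval=None):
    """Return (start, end) dates for a clickstream request.

    interval, if given, is a pair of 'YYYY-MM-DD' strings and takes
    precedence over clickstream_days.
    """
    if interval is not None:
        start, end = [datetime.strptime(d, "%Y-%m-%d").date() for d in interval]
        if start > end:
            raise ValueError("interval start must not be after its end")
        return start, end
    if clickstream_days < 1:
        raise ValueError("clickstream_days must be >= 1")
    # Assumed convention: the window ends today and spans N days back.
    return today - timedelta(days=clickstream_days), today


if __name__ == "__main__":
    print(clickstream_interval(date(2016, 10, 5), clickstream_days=14))
```

A scheduler that runs every two weeks would pass clickstream_days=14 so that consecutive runs cover adjacent windows.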
Argument description and help
python call.py -h
This prints an overview and a description of the arguments.
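For readers who want to see how such a command-line interface fits together, the following sketch mirrors the documented arguments using Python's standard argparse module. It assumes the interface is built with argparse and that --save_metadata, --force_request, --verbose and --log are simple on/off flags; the help texts and the build_parser name are made up for illustration, and the real call.py may differ.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the arguments documented in this README."""
    parser = argparse.ArgumentParser(
        prog="call.py",
        description="Request and download Coursera research data exports.",
    )
    # Required (positional) arguments.
    parser.add_argument("export_type", choices=["clickstream", "tables"],
                        help="type of data to download")
    parser.add_argument("course_slugs",
                        help="comma-separated slugs or path to a .txt file")
    parser.add_argument("location",
                        help="directory where exports are stored")
    # Optional arguments.
    parser.add_argument("--clickstream_days", type=int, default=7,
                        help="number of days of clickstream data (default: 7)")
    parser.add_argument("--interval", nargs=2, metavar=("START", "END"),
                        help="date range as YYYY-MM-DD YYYY-MM-DD")
    parser.add_argument("--save_metadata", action="store_true",
                        help="save request metadata in the location directory")
    parser.add_argument("--force_request", action="store_true",
                        help="create a new request even if a recent one exists")
    parser.add_argument("--verbose", action="store_true",
                        help="print verbose messages to the terminal")
    parser.add_argument("--log", action="store_true",
                        help="store a log file with detailed information")
    return parser
```

Calling build_parser().parse_args(['-h']) on this sketch prints a usage summary generated from the help strings, which is the kind of overview the -h flag provides.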
Running the program in a VM or VPS
I use this program on a VPS. If you plan to use this program in a Virtual Machine (VM) or Virtual Private Server (VPS), you can use the bootstrap file to install all dependencies in one go.
Because Coursera uses OAuth2 and requires you to authenticate using a browser, you need to do the following: authenticate your account using your own computer. Then, look for the hidden .coursera folder in your home directory (e.g. /home/<user>/.coursera/ for Linux) and copy the .pickle file to the same folder on your VM/VPS (e.g. /home/<vps-user-name>/.coursera). You can now request exports on your VM/VPS.
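Before scheduling unattended runs on the VM/VPS, it can help to confirm that the copied file is actually in place. The check below is a small sketch, not part of the program: the function name cached_token_files is hypothetical, and it only looks for files ending in .pickle inside the .coursera folder without inspecting their contents.

```python
import os


def cached_token_files(home=None):
    """List files ending in .pickle inside the hidden .coursera folder."""
    folder = os.path.join(home or os.path.expanduser("~"), ".coursera")
    if not os.path.isdir(folder):
        return []
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(".pickle")
    )


if __name__ == "__main__":
    found = cached_token_files()
    if found:
        print("Found cached credentials:", found)
    else:
        print("No .pickle file found in ~/.coursera; authenticate locally and copy it over.")
```

An empty result suggests the file was not copied, or was copied to a different user's home directory.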
If you have a request (like adding a new argument), please leave it here.
Should you encounter any bugs or issues, please report them here.