MapReduce jobs to run on a corpus of a million course syllabi.
MapReduce Jobs for CHNM's Million Syllabi Database

Description

This repository contains a series of MapReduce jobs that run on a sample of 100 syllabi from CHNM's million syllabi database. They can also be used on the entire million+ dataset; however, only a subset of the data has been cleaned and reformatted at this time. The MapReduce jobs are written in Python, using MRJob.

Contents

  • /data/ - Includes syllabi_sample.tsv, which contains the first 100 records from the CHNM syllabi database.
  • average_words_per_syllabus.py - Calculate the average number of words per syllabus text.
  • count_syllabi.py - Count the number of syllabi in the dataset. This is the most basic example of MapReduce with MRJob that I could write.
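
The two jobs listed above can be sketched in plain Python to show the map and reduce steps they rely on. This is not the code from the repository: `run_job` is a hypothetical in-memory stand-in for a MapReduce runner, the mapper and reducer names are made up for illustration, and the sample records are invented. It also assumes that each record is one tab-separated line whose last field holds the syllabus text, which may not match the actual layout of syllabi_sample.tsv. The mappers and reducers are generators that yield key/value pairs, the same shape MRJob expects from its mapper and reducer methods.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Hypothetical in-memory runner: map each record, group by key, reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    results = {}
    for key, values in grouped.items():
        for out_key, out_value in reducer(key, values):
            results[out_key] = out_value
    return results


def count_mapper(line):
    # Emit a 1 for every record, all under the same key.
    yield "syllabi", 1


def count_reducer(key, values):
    # Summing the 1s gives the number of records.
    yield key, sum(values)


def words_mapper(line):
    # Assumption: the syllabus text is the last tab-separated field.
    text = line.rstrip("\n").split("\t")[-1]
    yield "words_per_syllabus", len(text.split())


def words_reducer(key, values):
    # Average the per-syllabus word counts.
    counts = list(values)
    yield key, sum(counts) / len(counts)


# Invented sample records, not taken from the CHNM data.
sample = [
    "1\tIntro to History\tWeek one readings and essay",
    "2\tStatistics\tProblem sets due weekly",
    "3\tPoetry\tRead write revise",
]

syllabus_count = run_job(sample, count_mapper, count_reducer)
average_words = run_job(sample, words_mapper, words_reducer)
print(syllabus_count)
print(average_words)
```

Collecting every word count under a single key keeps the sketch short, but it sends all values to one reducer call; on the full dataset a job would more likely emit partial sums and counts so the work can be combined in stages.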

Licence

MIT
