Home

Chris Teplovs edited this page Sep 13, 2017 · 11 revisions

Welcome to the PLA wiki!

Introduction

This project contains code and data associated with the University of Michigan Practical Learning Analytics course. It parallels the content of the course videos.

Organization

The project is built on a body of R code and two data tables, student.course.csv and student.record.csv (described below).

Code

The code is driven by an initial list of questions addressed by the course, and it also includes several more advanced analyses that overlap with these questions. Each .R file contains a main-level function as well as several subroutines, each with header material describing what the subroutine does. Everything is written in R. The code uses the R packages gplots, treemap, and optmatch, which are loaded with library() but must first be installed locally (from the command line or RStudio) if they are not already present.
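If you are not sure whether the three packages are already installed, a small sketch like the following checks for them and installs only the missing ones from CRAN. The helper name missing_packages is hypothetical and is not part of this project.

```r
# Hypothetical helper (not part of this project): report which of the
# given packages are not yet installed in the local R library.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(c("gplots", "treemap", "optmatch"))

# Install only what is missing. The interactive() guard keeps a batch
# (Rscript) run of this snippet from trying to reach CRAN unexpectedly.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```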

Once you have a local copy of these .R files, you can copy and paste the commands below. You'll need to change the paths (e.g. "~/aim-analytics/PLA-MOOC/") to reflect where these files are downloaded and stored on your machine.
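Rather than editing every path by hand, one option is to set the directory once and build each path from it with base R's file.path(). This is a minimal sketch; the names pla_dir and pla_path are hypothetical, and the directory shown is just the example path used on this page.

```r
# Hypothetical names: set pla_dir once to wherever you stored the files.
pla_dir <- "~/aim-analytics/PLA-MOOC"

# Build a full path to any project file inside pla_dir.
pla_path <- function(file) file.path(pla_dir, file)

# Usage (requires the downloaded files to exist under pla_dir):
# sr <- read.csv(pla_path("student.record.csv"))
# sc <- read.csv(pla_path("student.course.csv"))
# source(pla_path("grade.penalty.module.R"))
```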

grade.penalty.module.R: This performs course-by-course grade penalty analysis, surveying grade penalties across courses and between groups of students. It also has options to use regression and matching to try to isolate particular effects. Ex: What is the grade penalty in Physics 135, and does it differ between genders? Do the differences persist after matching or regression? Note that this uses the optmatch package. Run these commands in R (replacing the paths as needed!):

  # Load the two data tables and the module (adjust the paths for your machine)
  sr <- read.csv("~/aim-analytics/PLA-MOOC/student.record.csv")
  sc <- read.csv("~/aim-analytics/PLA-MOOC/student.course.csv")
  source('~/aim-analytics/PLA-MOOC/grade.penalty.module.R')
  # Grade penalty in PHYSICS 135, compared by gender, with regression and matching
  out <- grade.penalty(sr, sc, 'PHYSICS', 135, GROUP='GENDER', REGRESSION=TRUE, MATCHING=TRUE, PDF=FALSE)
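To make the core quantity concrete, here is a minimal sketch of a grade penalty comparison between groups. It is not the module's grade.penalty() and does no regression or matching. It assumes the grade penalty is the mean of (grade received minus GPAO) over students in the course, and that the grouping column (e.g. GENDER, as in the call above) lives in the student-record table; both are assumptions, and the module may compute things differently. The function name simple_grade_penalty is hypothetical.

```r
# Hypothetical sketch, NOT the module's grade.penalty(): mean of
# (GRD_PTS_PER_UNIT - GPAO) in one course, split by a student-level group.
simple_grade_penalty <- function(sr, sc, subject, catalog_nbr, group_col) {
  # Keep only enrollments in the course of interest.
  in_course <- sc[sc$SUBJECT == subject & sc$CATALOG_NBR == catalog_nbr, ]
  # Attach the grouping column from the student-record table via ANONID.
  merged <- merge(in_course, sr[, c("ANONID", group_col)], by = "ANONID")
  # Per-enrollment difference between the course grade and GPAO.
  merged$PENALTY <- merged$GRD_PTS_PER_UNIT - merged$GPAO
  # Average difference within each group (one row per group).
  aggregate(merged["PENALTY"], by = merged[group_col], FUN = mean, na.rm = TRUE)
}
```

With sr and sc loaded as above, a call such as simple_grade_penalty(sr, sc, "PHYSICS", 135, "GENDER") would return one row per gender; a negative PENALTY means students earned lower grades in the course than in their other courses.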

course.persistence.module.R: This performs course-to-course persistence analysis: given the grade a student received in one course, what is the probability that they went on to take another course? Ex: What is the probability that a student who got a B in Physics 140 (Physics I) later took Physics 240 (Physics II)? Run these commands in R (replacing the paths as needed!):

  # Load the two data tables and the module (adjust the paths for your machine)
  sr <- read.csv("~/aim-analytics/PLA-MOOC/student.record.csv")
  sc <- read.csv("~/aim-analytics/PLA-MOOC/student.course.csv")
  source('~/aim-analytics/PLA-MOOC/course.persistence.module.R')
  hh <- course.persistence.setup(sr, sc, 'PHYSICS', 'PHYSICS', 140, 240, TITLE='Physics 140 -- > 240: Gender', PDF=TRUE)
  # Compare students by the department of their first major instead
  hh <- course.persistence.setup(sr, sc, 'PHYSICS', 'PHYSICS', 140, 240, TYPE='MAJOR1_DEPT', GROUP1='Physics Department', GROUP2='Chemistry Department', TITLE='Physics 140 -- > 240: MAJOR', PDF=FALSE)
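The underlying question can be sketched directly from the student-course table: for each grade earned in the first course, what fraction of those students took the second course in a strictly later TERM? This is a simplified illustration, not the module's course.persistence.setup(); the function name simple_persistence is hypothetical, and it ignores group comparisons and plotting.

```r
# Hypothetical sketch, NOT the module's course.persistence.setup():
# P(took course 2 in a later term | grade received in course 1).
simple_persistence <- function(sc, subj1, nbr1, subj2, nbr2) {
  first <- sc[sc$SUBJECT == subj1 & sc$CATALOG_NBR == nbr1,
              c("ANONID", "TERM", "GRD_PTS_PER_UNIT")]
  second <- sc[sc$SUBJECT == subj2 & sc$CATALOG_NBR == nbr2,
               c("ANONID", "TERM")]
  # For each first-course enrollment: did this student take course 2 later?
  first$CONTINUED <- mapply(function(id, term) {
    any(second$ANONID == id & second$TERM > term)
  }, first$ANONID, first$TERM)
  # Fraction continuing, indexed by the grade received in course 1.
  tapply(first$CONTINUED, first$GRD_PTS_PER_UNIT, mean)
}
```

For example, simple_persistence(sc, "PHYSICS", 140, "PHYSICS", 240)["3"] would give the fraction of students with a B (3.0) in Physics 140 who later took Physics 240.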

course.pathways.treemaps.R: This asks two sets of questions. First, for some course of interest, which courses did students take before, during, and after that course, and what kinds of grades did they get? Second, what were the eventual majors of those students? Ex: What courses did students take before, during, and after Physics 140, and what were their eventual majors? This uses the treemap package. Run these commands in R (replacing the paths as needed!):

  # Load the two data tables and the module (adjust the paths for your machine)
  sr <- read.csv("~/aim-analytics/PLA-MOOC/student.record.csv")
  sc <- read.csv("~/aim-analytics/PLA-MOOC/student.course.csv")
  source('~/aim-analytics/PLA-MOOC/course.pathways.treemaps.R')
  course.pathway.treemaps(sr, sc, "PHYSICS", 140, TERM_RANGE=c(100,156), PDF=FALSE)
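The "before, during, and after" classification can be sketched as follows, assuming each student's reference point is the first TERM in which they took the course of interest. This is not the module's course.pathway.treemaps(); the function name classify_pathway is hypothetical, and it only counts enrollments rather than summarizing grades or drawing treemaps.

```r
# Hypothetical sketch, NOT the module's course.pathway.treemaps(): label
# each other course a student took as BEFORE, DURING, or AFTER their first
# term in the focal course, then count enrollments per course and label.
classify_pathway <- function(sc, subject, catalog_nbr) {
  is_focal <- sc$SUBJECT == subject & sc$CATALOG_NBR == catalog_nbr
  focal <- sc[is_focal, ]
  # First term in which each student took the focal course (named by ANONID).
  focal_term <- tapply(focal$TERM, focal$ANONID, min)
  # All other enrollments of students who took the focal course.
  others <- sc[sc$ANONID %in% focal$ANONID & !is_focal, ]
  ref <- as.vector(focal_term[as.character(others$ANONID)])
  others$WHEN <- ifelse(others$TERM < ref, "BEFORE",
                        ifelse(others$TERM == ref, "DURING", "AFTER"))
  others$COURSE <- paste(others$SUBJECT, others$CATALOG_NBR)
  # Rows: courses; columns: BEFORE / DURING / AFTER.
  table(others$COURSE, others$WHEN)
}
```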

course.pathways.barplots.R: This asks the same two sets of questions. First, for some course of interest, which courses did students take before, during, and after that course, and what kinds of grades did they get? Second, what were the eventual majors of those students? This is essentially an alternative visualization of the data rendered by course.pathways.treemaps.R. Ex: What courses did students take before, during, and after Physics 140, and what were their eventual majors? Run these commands in R (replacing the paths as needed!):

  # Load the two data tables and the module (adjust the paths for your machine)
  sr <- read.csv("~/aim-analytics/PLA-MOOC/student.record.csv")
  sc <- read.csv("~/aim-analytics/PLA-MOOC/student.course.csv")
  source('~/aim-analytics/PLA-MOOC/course.pathways.barplots.R')
  course.pathway.barplots(sr, sc, "PHYSICS", 140, TERM_RANGE=c(100,156), PDF=FALSE)
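The second question (eventual majors) reduces to tabulating MAJOR1_DESCR for students who took the course. Below is a minimal sketch, not the module's course.pathway.barplots(); the function name eventual_majors is hypothetical, and it looks only at the first major.

```r
# Hypothetical sketch, NOT the module's course.pathway.barplots(): count
# the first majors (MAJOR1_DESCR) of students who took a given course.
eventual_majors <- function(sr, sc, subject, catalog_nbr) {
  ids <- unique(sc$ANONID[sc$SUBJECT == subject & sc$CATALOG_NBR == catalog_nbr])
  majors <- sr$MAJOR1_DESCR[sr$ANONID %in% ids]
  # useNA = "ifany" keeps students with no recorded degree (NA) visible.
  sort(table(majors, useNA = "ifany"), decreasing = TRUE)
}
```

The result can be passed straight to base graphics, e.g. barplot(eventual_majors(sr, sc, "PHYSICS", 140), las = 2).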

student.term.GPA.R: This reduces a student-course table (formatted like the one we provide) to a one-line-per-student-term record of GPA. Beware when running this on the student-course table we provide: because that table, and therefore its grades, GPAOs, and individuals, are synthetic, the GPAs computed from its grades may not be consistent with the synthetic GPAOs.
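The reduction to one line per student-term can be sketched with base R's aggregate(). This is not the code in student.term.GPA.R; the function name term_gpa is hypothetical, and because the student-course table as documented here has no credit-hours column, the sketch assumes every course carries equal weight (an unweighted mean of GRD_PTS_PER_UNIT).

```r
# Hypothetical sketch, NOT student.term.GPA.R: one row per student-term
# with the UNWEIGHTED mean grade (assumes equal credit per course).
term_gpa <- function(sc) {
  out <- aggregate(GRD_PTS_PER_UNIT ~ ANONID + TERM, data = sc, FUN = mean)
  names(out)[names(out) == "GRD_PTS_PER_UNIT"] <- "TERM_GPA"
  # Order chronologically within each student.
  out[order(out$ANONID, out$TERM), ]
}
```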

Data

These functions all run on synthetic data provided with this project, but may in principle be run on similarly-formatted data from local sources. The data come in two tables: a student-course table (student.course.csv) and student-record table (student.record.csv).

Term Table

This is a lookup table that maps our integer academic TERMs to human-readable terms: FA 2006 = Fall 2006, WN = Winter, SP = Spring, SS = Spring/Summer, Su = Summer, etc.

Student Record Table

This includes one-time information about a student: major, gender, etc. This is one line per student.
ANONID: Anonymous ID of the student, used to merge with columns of the student course table.
ADMIT_TERM: Term of admission. Terms have been renumbered to preserve anonymity, and the same numbering convention is used consistently for all “TERM” fields. These TERMs go back to TERM=53.

HSGPA: High school GPA as recomputed by admissions. Note that this field also contains ‘0’ values, whose meaning is unclear.
LAST_ACT_MATH_SCORE: ACT Math Score.
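LAST_ACT_ENGL_SCORE: ACT English Score.
LAST_ACT_READ_SCORE: ACT Reading Score.
LAST_ACT_SCIRE_SCORE: ACT Science Score.
LAST_ACT_COMP_SCORE: ACT Composite Score.
LAST_SATI_VERB_SCORE: SAT I Verbal Score.
LAST_SATI_MATH_SCORE: SAT I Math Score.
LAST_SATI_TOTAL_SCORE: SAT I Total Score.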
MAJOR1_DESCR: Full name of first undergraduate major degree.
MAJOR2_DESCR: Full name of second undergraduate major degree.
MAJOR3_DESCR: Full name of third undergraduate major degree.
MAJOR1_TERM: The term in which MAJOR1 was received, otherwise NA. Degree data become incomplete before TERM 80. Note that degree information goes back to at least TERM 10.
MAJOR2_TERM: The term in which MAJOR2 was received, otherwise NA.
MAJOR3_TERM: The term in which MAJOR3 was received, otherwise NA.
MAJOR1_DEPT: The department that awarded MAJOR1. This collapses some rare majors and may be preferable for anonymity.
MAJOR2_DEPT: The department that awarded MAJOR2.
MAJOR3_DEPT: The department that awarded MAJOR3.
STDNT_GROUP1: First student group. Students may belong to up to two of 7 available groups, denoted A-G.
STDNT_GROUP2: Second student group (see STDNT_GROUP1).
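Because the student-record table has one row per student and the student-course table may have many, the two are joined on ANONID with a many-to-one merge. A minimal sketch using base R's merge(); the wrapper name merge_student_course is hypothetical.

```r
# Hypothetical helper: attach student-level fields (one row per student)
# to every course enrollment (possibly many rows per student) via ANONID.
merge_student_course <- function(sr, sc) {
  # The student-record table should contain each ANONID exactly once.
  stopifnot(!anyDuplicated(sr$ANONID))
  # all.x = TRUE keeps enrollments even if a student has no record row.
  merge(sc, sr, by = "ANONID", all.x = TRUE)
}
```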

Student Course Table

The courses taken by each student and the grades received are recorded here. There may be multiple lines per student.

ANONID: Anonymous ID of the student, used to merge with columns of the student record table.
SUBJECT: Subject area of course.
CATALOG_NBR: Catalog number of the course.
GRD_PTS_PER_UNIT: Discrete numerical field ranging from 0 to 4, indicating the grade received.
GPAO: Grade point average in all other classes over the student's career, up to and including the term in which the course was taken.
CUM_GPA: Actual cumulative GPA as of the term in which the course was taken.
DIV: The division of the SUBJECT (P = Professional, H = Humanities, SS = Social Sciences, S = Science, E = Engineering, O = Other).
ANON_INSTR_ID: Anonymized instructor ID. I haven’t used this field much yet.
TERM: Term in which the course was taken. This field goes back to TERM=60, which is also the minimum TERM for the ADMIT_TERM field in the student-record table.
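To see how a GPAO-style value relates to the grades in this table (and why the synthetic GPAOs may not match, as noted for student.term.GPA.R), here is a minimal sketch that recomputes it for every enrollment as the mean grade over the same student's other enrollments with TERM up to and including the current TERM. The function name recompute_gpao is hypothetical, and it assumes equal weight per course because no credit-hours column is documented, so it will differ from a credit-weighted GPAO.

```r
# Hypothetical sketch: unweighted GPAO for each row of the student-course
# table, i.e. the mean grade over the student's OTHER enrollments with
# TERM <= this row's TERM. Returns NA when there are no such enrollments.
# Quadratic in the number of rows; fine for illustration, slow at scale.
recompute_gpao <- function(sc) {
  n <- nrow(sc)
  vapply(seq_len(n), function(i) {
    others <- sc$ANONID == sc$ANONID[i] & sc$TERM <= sc$TERM[i] & seq_len(n) != i
    if (!any(others)) return(NA_real_)
    mean(sc$GRD_PTS_PER_UNIT[others])
  }, numeric(1))
}
```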

Courses

Below is a full list of the courses in the student course table. You can also generate the list yourself:

  sc <- read.csv("~/aim-analytics/PLA-MOOC/student.course.csv")
  # Build "SUBJECT CATALOG_NBR" labels, drop duplicates, and sort alphabetically
  cnames <- paste(sc$SUBJECT, sc$CATALOG_NBR, sep=" ")
  cnames <- cnames[!duplicated(cnames)]
  cnames <- cnames[order(cnames)]
  print(cnames)

ACC 271

ACC 272

AMCULT 240

AMCULT 374

ANTHRBIO 161

ANTHRBIO 368

ANTHRCUL 101

ARTDES 300

ASIAN 230

ASTRO 106

BE 300

BIOLOGY 105

BIOLOGY 118

BIOLOGY 171

BIOLOGY 172

BIOLOGY 173

BIOLOGY 225

BIOLOGY 226

BIOLOGY 305

BIOLOGY 310

BIOLOGY 311

BIT 200

BUDDHST 230

CHEM 125

CHEM 126

CHEM 130

CHEM 210

CHEM 211

CHEM 215

CHEM 216

CHEM 230

CICS 101

CLCIV 101

CLCIV 372

CLCIV 385

CMPTRSC 183

CMPTRSC 280

CMPTRSC 370

COMM 101

COMM 102

DANCE 100

ECON 101

ECON 102

ECON 401

ECON 402

EECS 183

EECS 203

EECS 280

EECS 281

EECS 370

ENGLISH 124

ENGLISH 125

ENGLISH 223

ENGLISH 225

ENGLISH 239

ENGLISH 240

ENGLISH 297

ENGLISH 298

ENGLISH 325

ENGR 100

ENGR 101

ENGR 110

FIN 300

FRENCH 232

GEOSCI 100

GEOSCI 103

GEOSCI 106

GEOSCI 107

GEOSCI 114

GTBOOKS 191

HISTORY 201

HISTORY 374

LHC 250

LHC 306

LHC 350

LING 111

LING 211

MATH 105

MATH 115

MATH 116

MATH 215

MATH 216

MATH 425

MCDB 310

MECHENG 211

MECHENG 240

MKT 300

MO 300

NURS 220

OB 300

OM 311

OMS 301

OMS 311

PHIL 230

PHYSICS 125

PHYSICS 126

PHYSICS 127

PHYSICS 128

PHYSICS 135

PHYSICS 136

PHYSICS 140

PHYSICS 141

PHYSICS 235

PHYSICS 236

PHYSICS 240

PHYSICS 241

POLSCI 101

POLSCI 111

POLSCI 160

POLSCI 300

POLSCI 389

POLSCI 489

PSYCH 111

PSYCH 112

PSYCH 230

PSYCH 240

PSYCH 250

PSYCH 260

PSYCH 270

PSYCH 280

PSYCH 290

PSYCH 303

PSYCH 330

PSYCH 340

PSYCH 350

PSYCH 360

PSYCH 370

PSYCH 380

PSYCH 390

RELIGION 230

SMS 301

SOC 100

SOC 101

SOC 102

SPANISH 101

SPANISH 103

SPANISH 231

SPANISH 232

SPANISH 275

SPANISH 276

SPANISH 277

STATS 100

STATS 250

STATS 350

STATS 425

STRATEGY 390

UC 280

WOMENSTD 220

WOMENSTD 240