Software construction CI grading infrastructure

Course assignment structure

The course consists of building a game throughout the whole quarter. Each week there are assignments to build some specific part/feature of the game. The assignments (usually) are broken into two stages:

(due sometime mid-week) students submit test cases for the feature due that week.
(due sometime end-of-week) students submit implementations of the feature.

The tests the students submit in stage 1 should be valid tests, and then all of the students’ valid tests + some from the instructor are collected to test the stage 2 submissions.

In parallel to the students, the course instructors implement an “oracle” solution that is the standard of correctness for tests.

The students’ feedback comes in two forms:

Each week they receive a score corresponding to
1. the proportion of valid tests they submit, and
2. the proportion of all aggregated tests their implementation passes
Students also do code walks both in class and privately for the final assignment

We are using CI to provide students with continuous feedback of the first form. Specifically, we have implemented infrastructure to

Collect tests from all students
Check test validity against the oracle
Run student submissions on all tests to tell them their grade
Do automated grading

Students work in small teams.

Class repository structure and conventions

Summary of repos

Organization
 |- oracle               : course infrastructure (public)
 |- course-admin         : course admin information and programs (private)
 |- game implementation  : oracle implementation, might also live outside of org (private)
 |- teamX                : student development & submission (private)
 ...

Possibly elsewhere (e.g. a personal github account)
 |- grading              : used for automated grading (private)

The grading repo might be outside the organization so as to not use up the Actions CI resources of the organization for grading. This probably isn’t strictly necessary, since we observed that we used around 35k / 50k minutes per month. Then again, the 50k is common to all NWU organizations so whether that amount of usage is acceptable may change over time!

Organization

We have an organization that houses a private repo for every student team, the infrastructure implementation, and optionally the oracle implementation.

Student submissions: the dev repos

Student teams “submit” tests and implementations in their own (private) repo in the organization, using a specific file naming convention. That is, we take the latest commit before the deadline of an assignment as the team’s submission, and find the file name(s) according to the convention.

The convention is that they must have a folder named Deliverables in the top level of their repo, structured like this:

Deliverables            # must have this folder at top level of repo
  |- 0                  # assignment number
  |    |- 0.0           # assignment parts (major.minor)
  |    |- 0.1
  |        |- makefile  # "make" creates an exe called "run" in same directory
  |        |- input0.json
  |        |- output0.json
  |        |- input1.json
  |        |- output1.json
  |        ... # as many tests as necessary
  |        |- ... can contain anything else
  |
  ... # one per major assignment

The makefile must build an executable representing their submission for assignment M.N. It must install any dependencies necessary. That said, it’s important that the makefile doesn’t fail if the dependencies are already installed. IOW it should be fine to run make for two assignments in a row (to enable Grading for every assignment also re-grades all past assignments).

The inputX.json and outputX.json are unit tests: given input inputX.json on stdin, the expected output of the submission is outputX.json.

CI integration

Students must also have the necessary config file for CI integration in their repo in order to get CI feedback.

Infrastructure implementation: the infrastructure repo

This has the implementation of the course infrastructure (not the instructor’s game implementation). It should be a racket collection with two parts:

infrastructure collection
 |- <game>-oracle
 |   |- oracle
 |   |   |- foo.rkt ...                   # implementation (obfuscated, copy from instructor dev repo)
 |   |   |- oracle1.0                     # shell scripts to run the oracle
 |   |   |- oracle1.1
 |   |   |- oracle7.1.rkt                 # racket programs for non-simple oracles
 |   |   ...
 |   |
 |   |- submitted-tests
 |   |   |- 1
 |   |   |  |- 1.1
 |   |   |     |- input5_team55.json      # copied from student dev repos after test deadline
 |   |   |     |- output5_team55.json     # naming convention identifies the source
 |   |   |     ...
 |   |   ...
 |   |
 |   |- validated-tests                   # no outputs, will run oracle to get expected output
 |       |- 1
 |       |  |- 1.1
 |       |     |- input5_instructor.json  # instructor tests are considered validated
 |       |     ...
 |       |     |- input5_team55.json      # only the valid inputs from `submitted-tests` copied here
 |       |     ...
 |       ...
 |    
 |- software-construction-admin
     |- ... # infrastracture implementation

Course admin repo

This has programs, records, and notes for course admin. Most notably, it’s the place that raw team scores and codewalk information is stored - along with programs to work with that information.

Other repos

test snapshots repo

Has snapshots of the student team dev repos for every tests deadline. I have this repo just locally.

Structure: has top level directory

M/M.N

for each assignment M.N, in which is one teamX.tar.gz per team.

submission snapshots repo

Has snapshots of the student team dev repos for every submission deadline. Structure: same as test snapshots repo.

I have this repo just locally.

grading repo

Is used by grading programs for doing grading on the CI. Specifically, validate-tests.rkt, grading.rkt, and debug-team-submission.rkt use this repo. Each of them have an expectation about the CI config files in the repo: see each one’s description for what it needs.

Course workflows

For assignments

For assignment M.N

Instructor implements oracle and commits it

Commit to in the infrastructure repo. Historically the oracle implementation has been obfuscated in this repo using the simple-obfuscation package Robby wrote.

Students create tests before `test-submission-deadline`, push them to their dev repos

And they must follow the naming convention for test inputs and outputs, like described in Student submissions: the dev repos. During this period, the CI will

validate the students tests, and
run a test fest of all instructor tests against any submission

After `test-submission-deadline`, TA collects valid tests

Specifically,

Run snapshot-team-repos.rkt to take snapshots of team submissions as of the deadline and commit them to the test snapshots repo at
Run validate-tests.rkt to collect submitted tests from the snapshots, commit to the infrastructure repo, and then launch a CI job to validate the submitted tests
Once that job is done, run validate-tests.rkt again to retrieve the valid tests, copy them to the validated directory, and commit them.

Students create submissions before `submission-deadline`, push them to their dev repos

And they must use the structure/protocol for submissions described in Student submissions: the dev repos.

After `submission-deadline`, TA grades submissions

Specifically,

Run snapshot-team-repos.rkt to take snapshots of team submissions in the submission snapshots repo.
Run grading.rkt to launch CI jobs to grade every student submission
1. grading.rkt can be used to check the status of the jobs as they go
Once grading is done, run grading.rkt again to collect the grading results
Commit the scores in scores.rkt of the course-admin repo.
Download the canvas scores csv, run scores.rkt on it, and upload the new scores to release the latest grades.

Debugging student questions about a submission

Run debug-team-submission.rkt, which will pull down a fresh copy of the team’s dev repo, check out a particular ref/hash (or alternatively use a snapshot), and using the grading repo + a CI job in it to try things out.

Regrading a particular submission

To regrade the snapshotted commit, run grading.rkt, but supply the team name with --team. To regrade a different commit, doing the whole regrading of previous assignments thing, take a new snapshot and then run grading.rkt using it.

For codewalks

Before any codewalks, be sure to configure the codewalks.rkt with information about codewalking dates etc. Then run it with --regenerate to generate a codewalking schedule.

Before each codewalk (24-36h before)

Run codewalks.rkt with -m to generate piazza messages to be copied to post (privately) in piazza notifying teams.

After each codewalk

Run codewalks.rkt with --commit to record that the codewalk happened.

If the plan needs to change

If it’s because a student will be absent on day X or something like that, best is to add that to the absences definition, then regenerate. Alternatively, you can hand-edit the codewalk plan and double-check it with --check-plan.

The infrastructure

In the oracle repo.

Programs

The core configurable parts of all of these programs are set in config.rkt. There you can configure which repos the programs use, special assignments, testing timeouts, etc.

main.rkt

This program runs the actual test fest checking a submission executable against the oracle for all validated tests. It also does test validation to give students test feedback before the tests deadline.

Meant to run on the CI, and report both # valid tests, and test fest failures.

test-validation/ci-validate-tests.rkt

This program runs the oracle on all submitted tests to validate them and prints the list of test inputs that are valid. Meant to run on the CI.

test-validation/validate-tests.rkt

This program

Collects tests from repo snapshots in the test snapshots repo,
launches a CI job to validate tests for a given assignment (running ci-validate-tests.rkt),
collects the reported valid tests, and
installs them in the validated tests directory.

Meant to run locally by the TA/instructor.

This program expects that:

gpg is installed and in the path
there is a .gh-token.gpg gpg-encrypted file somewhere containing a github token for using the actions API
- The path is hard-coded to be a convenient place for me (with define-runtime-path of course) in github-actions/github-api.rkt
- To get one of these, go to your github account settings > Developer settings > Personal access tokens and generate one.

grading/snapshot-team-repos.rkt

This program takes a snapshot of student submissions after a deadline and commits them to a snapshots repo (either the tests or the submission one). Meant to run locally by the TA/instructor.

One caveat here is that the snapshot will be of the default branch of the repo (unless otherwise specified). So students must push to main / master.

grading/grading.rkt

This program

For each team, unpacks the assignment’s repo snapshot in the submission snapshots repo,
commits and pushes the snapshot (with a “skip CI” tag)
launches a CI job to grade the submission (running ci-test-fest.rkt), and
(later) pulls down the results.
Ultimately (once all CI jobs are done) it reports a summary of every team’s score.

The program does this for every team all at once (or optionally just a few). Meant to run locally by the TA/instructor.

An important thing to note here is that the grading program just uses the contents of the snapshot it is pointed to. It doesn’t know about deadlines or anything. The part of the process that handles the deadlines is the snapshotting. This has the nice property that if you want to have a special deadline for one or a few teams, you can just overwrite those teams’ snapshots and then run this grading program as usual to get everyone’s grades.

grading/debug-team-submission.rkt

This program

Pulls down a fresh copy of a team’s dev repo,
checks out a particular ref/hash or uses a snapshot,
copies it into the grading repo,
commits and pushes it, to run the normal ci-test-fest.rkt like students would

Meant to run locally by the TA/instructor.

Course management tools

In the course-admin repo.

Scheduling and recording codewalks

codewalks.rkt

This program manages the codewalk schedule. It needs to be configured by modifying the definitions at the top of the file. Having done that, it

generates a codewalking schedule for the quarter
keeps track of the upcoming and past codewalks
generates piazza messages notifying teams that they will codewalk in the next class
verify that the current codewalk plan doesn’t violate any constraints

The codewalk schedule and history are stored in the files named at the top of the program. The plan can be edited by hand; then the verification option is useful to make sure the changes don’t accidentally cause problems.

You may want to edit the message generated by codewalk-messages, too.

Run it with -h to see the options.

scores.rkt

This program is mainly a record of scores and related information for each team, but its main module also records each teams’ current scores in a CSV for uploading into canvas.

The regular assignment scores are stored in the scores-by-assignment hash at the top of the program (just paste in the output of software-construction-admin/grading/grading.rkt). Final CI run scores are in final-run-scores. Final codewalk scores are recorded by filling in final-codewalk-score. The teams that participate in the tournament are recorded by filling in tournament-teams. Any late penalty exceptions are recorded in late-penalty-exception?. The function from CI scores to assignment grades is in code-score-function, which uses the current-late-submission-penalty-factor.

For the canvas score recording, the program expects a CSV export of the course grades from canvas, and it then updates that CSV to create a new one with the latest team grades. The new csv can then be uploaded back into canvas.

teams.rkt

Provides a list of all the teams in the class. This file needs to be filled in with the output of running the helper parse-team-info-from-spreadsheet.rkt on the team-registering-google-form export.

What to do when spinning up another iteration of the course

Create a course organization, contact the NU people IT people who can make it a NU organization

This gives us access to the NU CI credits.

Create the oracle repo therein, called `oracle`, and push the repo of last year’s oracle

Clean out the oracle for the last game.

Ensure the repo is public.

Create a google form for students to record partner pairs

The form we used in the past had these fields

Timestamp,My name,My netid,My GitHub id,My partner's name,My partner's netid,My partner's GitHub id,My partner and I have met and discussed our schedule and put aside time to pair program on the homework assignments.

The csv-exported data from this form will be used to populate the team information below.

Create a directory for this iteration of the course

E.g. spring22

Set up all the grading repos

Clone the oracle from last year

To spring22/oracle And install it as a package.

Clone the course admin repo from last year, update it

To spring22/course-admin And install it as a package.

Update the information at the top of codewalks.rkt, specifically

the dates for codewalking
the list of known student absences (probably starts empty, but update it as they come in)
(perhaps) the codewalk notification message

Update the information in teams.rkt using parse-team-info-from-spreadsheet.rkt and the spreadsheet from the student teams form. You’ll probably need to edit parse-team-info-from-spreadsheet.rkt to update the path to the csv.

Clean up scores.rkt if needed, which is the file that records the history of team scores.

Generate the codewalk schedule for the quarter with codewalks.rkt.

(optional, useful for testing everything) Make/copy a dummy team repo, update its workflow yaml if needed

Update racket version
Update organization url

Create the snapshot and grading repos

Create the directories and do git init for each:

spring22/grading
spring22/submission-snapshots
spring22/test-snapshots

Run the scripts to kick off and retrieve test validation, grading information to check if apis have changed. The first attempt for validation/grading may fail because the workflow .yml files need to be pushed before the api will accept references to them.

Create a private repo in your personal github account for grading

Update settings

Review and update `oracle/software-construction-admin/config.rkt`

Organization name
Grading repo info
Assignment sequence and test info
Oracle types
Assignment test deadlines

Update github auth key if necessary

https://github.com/settings/tokens

Put it in the encrypted file referred to by github-actions/github-api.rkt.

Ensure that the oracle still works by pushing to the dummy team repo and checking if the actions CI launches, finishes OK

Update the CI guide as necessary

At minimum, in the workflow yaml:

Racket version
Organization url

Update the auto-generated action config for automated grading if necessary

E.g. to bump the racket version number. At software-construction-admin/github-actions/actions.rkt, function make-workflow-config-contents.

Gotchas and things to keep in mind

Look through and modify config.rkt to control many fine-grained aspects of the system

Which repos to use
Which github organization to use
Lots of stuff around assignments
- Which assignments have tests to validate etc
- What kind of oracle per assignment
- Which assignments tests conflict with past assignments
- Assignment deadlines
- …and so on…

The infrastructure relies on test files containing only valid json to check test validity and novelty

So if you want require tests submitted to be unique and novel, then you need to have a requirement that the test files only contain valid json.

We didn’t require this in Spring 2021 on the first assignment and ended up having to add an exception for test novelty checking because it’s a mess to do when tests are not completely valid json.

Multiple kinds of oracles

Not all assignments fit nicely into the format of running the student submission on a test input and comparing it to the oracle’s answer on the same input. For example a given part of the system might be able to give a variety of different but correct answers. Or the part under test might involve a sequence of interactions.

So we have three kinds of oracles to handle all of these cases:

A basic oracle that we just run on the input and compare the output against the submission’s output
An oracle that gets the input and the submission’s output and returns if it’s OK or not
An oracle that gets the input and an interface to interact with the submission and returns if it’s OK for not

The table mapping assignments to these kinds of oracles is in

Name		Name	Last commit message	Last commit date
Latest commit History 563 Commits
software-construction-admin		software-construction-admin
welcome-to-oracle		welcome-to-oracle
.gitignore		.gitignore
info.rkt		info.rkt
readme.org		readme.org

NorthwesternSoftwareConstructionS22/oracle

Folders and files

Latest commit

History

Repository files navigation

Software construction CI grading infrastructure

Course assignment structure

Class repository structure and conventions

Summary of repos

Organization

Student submissions: the dev repos

CI integration

Infrastructure implementation: the infrastructure repo

Course admin repo

Other repos

test snapshots repo

submission snapshots repo

grading repo

Course workflows

For assignments

Instructor implements oracle and commits it

Students create tests before test-submission-deadline, push them to their dev repos

After test-submission-deadline, TA collects valid tests

Students create submissions before submission-deadline, push them to their dev repos

After submission-deadline, TA grades submissions

Debugging student questions about a submission

Regrading a particular submission

For codewalks

Before each codewalk (24-36h before)

After each codewalk

If the plan needs to change

The infrastructure

Programs

main.rkt

test-validation/ci-validate-tests.rkt

test-validation/validate-tests.rkt

grading/snapshot-team-repos.rkt

grading/grading.rkt

grading/debug-team-submission.rkt

Course management tools

Scheduling and recording codewalks

codewalks.rkt

scores.rkt

teams.rkt

What to do when spinning up another iteration of the course

Create a course organization, contact the NU people IT people who can make it a NU organization

Create the oracle repo therein, called oracle, and push the repo of last year’s oracle

Create a google form for students to record partner pairs

Create a directory for this iteration of the course

Set up all the grading repos

Clone the oracle from last year

Clone the course admin repo from last year, update it

(optional, useful for testing everything) Make/copy a dummy team repo, update its workflow yaml if needed

Create the snapshot and grading repos

Create a private repo in your personal github account for grading

Update settings

Review and update oracle/software-construction-admin/config.rkt

Update github auth key if necessary

Ensure that the oracle still works by pushing to the dummy team repo and checking if the actions CI launches, finishes OK

Update the CI guide as necessary

Update the auto-generated action config for automated grading if necessary

Gotchas and things to keep in mind

Look through and modify config.rkt to control many fine-grained aspects of the system

The infrastructure relies on test files containing only valid json to check test validity and novelty

Multiple kinds of oracles

About

Resources

Stars

Watchers

Forks

Languages

Students create tests before `test-submission-deadline`, push them to their dev repos

After `test-submission-deadline`, TA collects valid tests

Students create submissions before `submission-deadline`, push them to their dev repos

After `submission-deadline`, TA grades submissions

Create the oracle repo therein, called `oracle`, and push the repo of last year’s oracle

Review and update `oracle/software-construction-admin/config.rkt`