R judge for Dodona

Note: this judge makes extensive use of R environments to separate student code from test code. If you are not familiar with environments, it may help to read an introduction to them first.

A basic exercise

The test file for a basic exercise can look like this:

context({
  testcase('The correct method was used', {
    testEqual("test$alternative", function(studentEnv) { studentEnv$test$alternative }, 'two.sided')
    testEqual("test$method", function(studentEnv) { studentEnv$test$method }, ' Two Sample t-test')
  })
  testcase('p value is correct', {
    testEqual("test$p.value", function(studentEnv) { studentEnv$test$p.value }, 0.175)
  })
}, preExec = {
  set.seed(20190322)
})

context({
  testcase('x has the correct length', {
    testEqual("length(x)", function(studentEnv) { length(studentEnv$x) }, 100)
  })
})

Let's unpack what happens here.

Tabs

First of all, something you can't see in the example code above: Dodona groups contexts in tabs. In the R judge, each tab is represented by a file containing test code. The name of the file (without the .R extension) is used to name the tab. A file should contain one or more calls to context. Tabs are ordered lexicographically by their filename. To allow tabs to be put in a logical order, leading digits followed by a dash (-) are stripped from the filename when naming the tab.
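The naming rule can be sketched in base R. This is only an illustration: tab_name is a hypothetical helper, and the exact pattern the judge uses for stripping the prefix is an assumption.

```r
# Hypothetical helper: derive a tab name from a test file name.
tab_name <- function(path) {
  name <- tools::file_path_sans_ext(basename(path))  # drop directory and ".R"
  sub("^[0-9]+-", "", name)                          # drop leading digits plus dash
}

files <- c("02-plots.R", "01-basics.R", "summary.R")

# Byte-wise (locale-independent) ordering of the file names.
sorted <- sort(files, method = "radix")

tabs <- vapply(sorted, tab_name, character(1), USE.NAMES = FALSE)
print(tabs)
```

Numbering the files this way fixes the tab order without the numbers showing up in the tab titles.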

Contexts

A context represents one execution of the student code. It is generally desirable to have as many contexts as possible, since students can filter by incorrect contexts. The context function does a few things:

  1. It creates a clean environment based on the global environment. Students have access to the global environment, but don't have access to the testing code or variables used in the testing code (the testing code is executed in a similar environment that is never bound).
  2. It executes the code passed through the preExec argument in this clean environment. This can be used for setting the seed (as in this example), but also to set variables or define functions that students can then use. NOTE: the preExec argument is not executed in the environment where the tests are run. If you need this, you will need to do this yourself.
  3. It executes the student's code in the clean environment. If the code errors out or generates warnings/messages, these are caught and handled. An error interrupts the execution and sets the runtime error state for the submission. It also adds a message to the context containing the error message generated by R. Warnings and messages do not interrupt execution, but are also added as messages to the context.
  4. It executes the first argument (a code block containing testcases) in the test environment.

Note that the student code is executed once for each call to context. Technically, this allows the student to store intermediate results in the global environment. The use of this is limited, so we don't see this as a problem.
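The steps above can be approximated in plain R. The following is a minimal sketch, not the judge's actual implementation: student_code, pre_exec, notes and the runtime_error class are all stand-ins invented for this illustration.

```r
# Illustrative stand-ins, not the judge's real code.
student_code <- "x <- rnorm(100)"       # hypothetical submission
pre_exec <- quote(set.seed(20190322))   # plays the role of preExec

# 1. Fresh environment whose parent is the global environment.
studentEnv <- new.env(parent = globalenv())

# 2. Run the preExec code in that environment.
eval(pre_exec, envir = studentEnv)

# 3. Run the student code there, collecting warnings and messages
#    without interrupting execution; an error stops the run.
notes <- character(0)
outcome <- tryCatch(
  withCallingHandlers(
    eval(parse(text = student_code), envir = studentEnv),
    warning = function(w) {
      notes <<- c(notes, conditionMessage(w))
      invokeRestart("muffleWarning")
    },
    message = function(m) {
      notes <<- c(notes, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  ),
  error = function(e) structure(conditionMessage(e), class = "runtime_error")
)

# 4. Tests read values out of the student environment.
print(length(studentEnv$x))
```

Because the student environment only has the global environment as its parent, variables created by the test code in its own environment are not reachable from the student code.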

The contextWithRmd function does the same as the context function, but it expects the student code to be in the R Markdown format. The R chunks are evaluated as before and the markdown text is ignored during evaluation.

An extra contextWithImage function also exists. This function takes the same arguments, but adds an image to the output if it was generated by the student while their code is executed. By default, this function will make the output wrong if the expected image wasn't generated. This behaviour can be changed by setting the optional failIfAbsent parameter to FALSE. Extra arguments, for example to set the width and height, will be passed to the underlying png call.

For introductory exercises, students often use R as a calculator and do not store the result of an expression in a variable in their script. For such scripts, the eval function that executes the parsed student script does not store the result as a variable in the test environment. Instead, it simply returns the value to the caller. The judge therefore injects the result of the evaluation into the test environment under the name evaluationResult. A simple test using this could look like this:

context({
  testcase('the correct value is calculated', {
    testEqual("Result", function(studentEnv) { studentEnv$evaluationResult }, 42)
  })
})
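To see where such a value comes from, here is a small base R sketch of the underlying behaviour. The script text is a hypothetical submission, and the explicit assign call only mimics what the judge is described as doing.

```r
studentEnv <- new.env(parent = globalenv())

# Hypothetical submission: the last expression is not assigned to anything.
script <- "a <- 6\na * 7"

# eval() on a parsed script returns the value of the last expression.
result <- eval(parse(text = script), envir = studentEnv)

# Mimic the injection of the result under the name evaluationResult.
assign("evaluationResult", result, envir = studentEnv)
print(studentEnv$evaluationResult)
```

In plain R, a script whose last statement is an assignment also returns a value from eval (the assigned value, invisibly), so the last line of a submission always determines what is returned.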

Testcases

Testcases group a number of related tests. The first argument of the testcase function is a description of that related group. The second argument is a code block (containing tests) which will be executed by the testcase function. There is little functionality in the testcase function. It is mostly used as a wrapper for the Dodona concept.

In addition to the usual testcase function, there is also testcaseAssert. This specialist function can be used for testcases where the related test doesn't map cleanly to the Dodona concept of a test. For example, to check if a variable is present in the student's env, one could use this function like this:

testcaseAssert('x exists', function(studentEnv) { isTRUE(exists("x", studentEnv)) })
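One detail worth knowing when writing such assertions: by default exists also searches the parent environments. The base R sketch below uses a hand-made studentEnv as a stand-in for the one the judge passes in, and shows how inherits = FALSE restricts the lookup to the student's own environment.

```r
# Stand-in for the environment the judge passes to the assertion function.
studentEnv <- new.env(parent = globalenv())
assign("x", 1:10, envir = studentEnv)

# Found directly in the student environment.
own_x <- exists("x", envir = studentEnv, inherits = FALSE)

# "mean" is only found by walking up the parent environments.
inherited_mean <- exists("mean", envir = studentEnv)
own_mean <- exists("mean", envir = studentEnv, inherits = FALSE)

print(c(own_x = own_x, inherited_mean = inherited_mean, own_mean = own_mean))
```

Whether the stricter lookup is wanted depends on the exercise, for example on whether preExec or the global environment already defines a variable with the same name.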

Tests

A test is an actual evaluation of correctness. Multiple test* functions are available and are explained in more detail below. The only constant across tests is their first three arguments:

  1. A description of the test. Preferably, this is something the student can copy-paste into their local R environment (e.g. length(x), test$p.value, etc.).
  2. A function extracting the value to be tested from the student's environment. This function should take one argument (env) and return a value.
  3. The expected value. This expected value is compared to the value extracted by the second argument.

testEqual

The testEqual function uses the base::all.equal function internally to determine whether the two values are equal. Any parameters that can be passed to all.equal can be passed to testEqual (but the first three arguments need to be as described above). In addition, one can pass a comparator argument to testEqual. This comparator should be a function that takes two arguments (generated and expected, in that order) and returns TRUE or FALSE. If this argument is passed, the comparator is used instead of all.equal. Any named arguments passed to testEqual that are not known by testEqual are passed on to all.equal or the comparator, depending on which is used. There is also an extra formatter argument that can be passed. The formatter should be a function that takes a single argument and returns its argument formatted to the test's liking.
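Since all.equal compares numbers up to a relative tolerance (about 1.5e-8 by default), a rounded expected value usually needs a looser tolerance or a custom comparator. The sketch below shows both in base R; the value p and the sameRounded comparator are hypothetical.

```r
p <- 0.17512   # hypothetical p-value produced by a student

# all.equal() returns TRUE or a character description, so wrap it in isTRUE().
strict <- isTRUE(all.equal(p, 0.175))                    # default tolerance
loose  <- isTRUE(all.equal(p, 0.175, tolerance = 1e-3))  # looser relative tolerance

# Hypothetical comparator with the (generated, expected, ...) signature.
sameRounded <- function(generated, expected, digits = 3, ...) {
  isTRUE(all.equal(round(generated, digits), round(expected, digits)))
}

print(c(strict = strict, loose = loose, rounded = sameRounded(p, 0.175)))

# Inside a testcase this could be passed on as, for example:
# testEqual("test$p.value", function(studentEnv) { studentEnv$test$p.value },
#           0.175, tolerance = 1e-3)
```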

If you want to include some extra feedback while the values are compared, you can do this by passing the comparator argument. To add a message to the feedback while the comparison is happening you can use the get_reporter()$add_message() function. See for example the following:

context({
  testcase('the correct value is calculated', {
    testEqual("Result", function(studentEnv) { studentEnv$evaluationResult }, TRUE, comparator = function(generated, expected, ...) {
      if (isTRUE(all.equal(generated, "TRUE"))) {
          get_reporter()$add_message("You should not add quotes around `TRUE`", type="markdown")
      }
      isTRUE(all.equal(generated, expected, ...))
    })
  })
})

testIdentical

The testIdentical function uses the base::identical function internally to determine whether the two values are equal. Any parameters that can be passed to identical can be passed to testIdentical (but the first three arguments need to be as described above). The formatter argument described above can also be passed to this function.
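The difference between the two comparison functions matters in practice. A short base R illustration of cases where identical is stricter than all.equal:

```r
# Integer versus double: same value, different type.
type_identical <- identical(1L, 1)
type_all_equal <- isTRUE(all.equal(1L, 1))

# Floating point rounding: 0.1 + 0.2 is not exactly 0.3.
float_identical <- identical(0.1 + 0.2, 0.3)
float_all_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# Attributes such as names are also compared by identical().
names_identical <- identical(c(a = 1), 1)

print(c(type_identical, type_all_equal,
        float_identical, float_all_equal, names_identical))
```

testIdentical is therefore best suited to values where type and attributes are part of the expected answer, such as character vectors or integers.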

testImage

The testImage function is a special case, since it won't actually add a test to the output. Instead, it only expects one argument: a function taking the environment, that will generate an image when called. By default, this function will make the output wrong if the expected image wasn't generated. This behaviour can be changed by setting the optional failIfAbsent parameter to FALSE. Extra arguments, for example to set the width and height, will be passed to the underlying png call.

testDF

The testDF function can be used to test the equality of dataframes. By default row and column order are ignored. If you do not want this, pass the ignore_col_order and ignore_row_order arguments as FALSE (when applicable). Again, a custom comparator can be passed if necessary. The feedback in Dodona will show the first five rows of the dataframe(s).
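What ignoring row and column order means can be sketched in base R. This is an illustration only: df_equal_unordered is a hypothetical helper and says nothing about how testDF compares dataframes internally.

```r
# Hypothetical helper: compare two dataframes regardless of row/column order.
df_equal_unordered <- function(a, b) {
  if (!setequal(names(a), names(b))) return(FALSE)
  b <- b[, names(a), drop = FALSE]          # align the column order
  sort_rows <- function(d) {
    d <- d[do.call(order, unname(as.list(d))), , drop = FALSE]
    rownames(d) <- NULL                     # forget the original row positions
    d
  }
  isTRUE(all.equal(sort_rows(a), sort_rows(b)))
}

expected  <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
generated <- data.frame(y = c("c", "a", "b"), x = c(3, 1, 2))

print(df_equal_unordered(generated, expected))
print(isTRUE(all.equal(generated, expected)))  # order-sensitive comparison
```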

testGGPlot

The testGGPlot function can be used to test the equality of GGPlots. Aside from the usual description, generated and expected arguments it has some optional arguments:

  • show_expected = TRUE: A logical value indicating whether the solution plot should be shown to the student if the testGGPlot function determines the solution to be incorrect. If set to FALSE the student won't be able to compare their plot with the solution which could make an exercise really hard to solve. Please handle with care.
  • test_data = TRUE: A logical value indicating whether the input data to the ggplot function should be verified. This test will succeed if all columns from the solution have a corresponding column in the given input with the same column name and data.
  • test_geom = TRUE: A logical value indicating whether the geometric layers should be checked. For each layer the parameters and aesthetics are verified. This test method also takes into account the default aesthetics set in the ggplot function itself.
  • test_facet = TRUE: A logical value indicating whether the facet layer should be tested. This test supports facet_grid and facet_wrap layers and can compare them to one another (e.g. a facet_grid with only 1 row/column can be equal to a facet_wrap and vice versa).
  • ignore_facet_type = TRUE: A logical value indicating whether the facet type should be tested when testing the facet layer. Even though facet_grid and facet_wrap return similar graphs, the instructor may want to force facet_grid as it can create clearer graphs when faceting with multiple variables.
  • test_label = FALSE: A logical value indicating whether the label layers should be tested.
  • test_scale = FALSE: A logical value indicating whether the scale of the axis should be tested.

Note: Because we want the testing of ggplots to be as flexible as possible, the test functions treat the given solution plot as the minimal requirements for the student plot. This is a very important aspect to keep in mind when writing exercises. For example, we don't recommend testing plots where parameters meant for the geometric layers are defined in the ggplot function itself. This is because the test function would test for these parameters in every geometric layer in the student plot, even when they are not used.

Note: When testing ggplots we recommend using geom layers instead of stat layers. Both provide the same functionality, but tests written with geom layers will also work for ggplots with stat layers. This is not the case for tests written with stat layers.
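A test file using testGGPlot might look like the sketch below. It only runs inside the judge and needs the ggplot2 package. The variable name p and the use of the mtcars dataset are assumptions made for this example; the aesthetics are set in the geom layer, in line with the first note above.

```r
library(ggplot2)

context({
  testcase('The scatter plot is correct', {
    testGGPlot(
      "p",
      function(studentEnv) { studentEnv$p },  # assumes the student stores the plot in p
      ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)),
      test_label = TRUE                       # also check the labels
    )
  })
})
```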

testFunctionUsed

The testFunctionUsed function tests whether a certain function is used in the student code. It takes one parameter: the name of the function you want to make sure the student used.
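One way such a check can work is by inspecting the parsed code rather than running it. The following base R sketch uses utils::getParseData for this; uses_function is a hypothetical helper and not necessarily how the judge implements testFunctionUsed.

```r
# Hypothetical helper: does the code contain a call to the function fn?
uses_function <- function(code, fn) {
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  # SYMBOL_FUNCTION_CALL marks names that are used as the function in a call.
  fn %in% tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
}

print(uses_function("x <- mean(c(1, 2, 3))", "mean"))  # a call to mean
print(uses_function("mean <- 3", "mean"))              # only a variable named mean
```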

testFunctionUsedInVar

The testFunctionUsedInVar function tests whether a certain function is used in the assignment of a certain variable in the student code. It can also detect indirect assignments: testFunctionUsedInVar("mean", "a") will add a correct test to the feedback if the student code is a <- b <- mean(1). As the example shows, the function takes two parameters. The first parameter is the name of the function you want to test for. The second parameter is the name of the variable whose assignment should use that function.
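The chained assignment case can be illustrated with a small walk over the parsed code. This is a rough base R sketch under simplifying assumptions (only top-level assignments with <- or = are considered); function_used_in_var is a hypothetical helper, not the judge's implementation.

```r
# Hypothetical helper: is fn used in a top-level assignment to var?
function_used_in_var <- function(code, fn, var) {
  is_assignment <- function(e) {
    is.call(e) &&
      (identical(e[[1]], as.name("<-")) || identical(e[[1]], as.name("=")))
  }
  for (expr in as.list(parse(text = code))) {
    targets <- character(0)
    # Follow chains such as a <- b <- mean(1), collecting every target name.
    while (is_assignment(expr)) {
      targets <- c(targets, deparse(expr[[2]]))
      expr <- expr[[3]]
    }
    # expr is now the right-hand side at the end of the chain.
    if (var %in% targets && fn %in% all.names(expr)) return(TRUE)
  }
  FALSE
}

print(function_used_in_var("a <- b <- mean(1)", "mean", "a"))
print(function_used_in_var("a <- 1; b <- mean(1)", "mean", "a"))
```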

testHtest

The testHtest function tests objects of the htest class (e.g. the result of the t.test function). Aside from the usual description, generated and expected arguments it has some optional arguments:

  • test_p_value = TRUE: A logical value indicating whether the p-value should be tested.
  • test_interval = TRUE: A logical value indicating whether the confidence interval should be tested.
  • test_statistic = FALSE: A logical value indicating whether the test statistic should be tested.
  • test_alternative = FALSE: A logical value indicating whether the alternative should be tested.
  • test_confidence_level = FALSE: A logical value indicating whether the confidence level should be tested.
  • test_method = FALSE: A logical value indicating whether the used method should be tested.
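For reference, these are the components of an htest object that the options above most plausibly refer to. The mapping from option to component is an assumption; the example uses the sleep dataset that ships with R.

```r
res <- t.test(extra ~ group, data = sleep)

print(class(res))                        # "htest"
print(res$p.value)                       # p-value
print(res$conf.int)                      # confidence interval
print(attr(res$conf.int, "conf.level"))  # confidence level
print(res$statistic)                     # test statistic
print(res$alternative)                   # alternative hypothesis
print(res$method)                        # name of the method used
```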

testMultipleChoice

The testMultipleChoice function tests multiple choice questions. It can handle multiple right answers if they are passed in a vector. Aside from the usual description, generated and expected arguments it has the following arguments:

  • possible_answers: A vector/list containing all possible options for the multiple choice question. The options can be integer or character values.
  • verify_answer = FALSE: A logical value indicating whether the given answer should be tested. When set to TRUE the judge will tell the student whether their answer is wrong or correct. When set to FALSE the judge will only test if the given answer is valid (contained within possible_answers).
  • give_feedback = TRUE: A logical value indicating whether the judge should show the student where their mistake is.
  • feedback = NULL: A list containing optional extra information about why a certain option is wrong. This should be a named list with the options as names when your options are characters.
  • show_expected = FALSE: A logical value indicating whether the judge should show the correct answer to the student.

⚠️ We do not recommend using this test method because it won't deliver an optimal experience for students or teachers.

If you would like to see multiple choice questions implemented in Dodona you can voice your support in this Dodona issue.