# Using okgrade
This notebook demonstrates how to use [okgrade](http://okgrade.readthedocs.io/en/latest/). okgrade is a Python library that allows instructors to grade students' Jupyter notebooks or Python source files. Imagine you're an instructor and have assigned a notebook to complete for homework. Perhaps there are 10 questions in that notebook and 100 students in the course. Rather than running each student's notebook one by one and visually inspecting the answers, you can let okgrade do that for you. You can find more information about okgrade, including installation instructions, at the link above.

To use okgrade, you only need to do two things:

- Write tests for each question
- Use okgrade's `grade` function to run your tests

### A quick working example

Before we go into detail about these two steps, let's see a quick working example. Imagine this notebook is one of your homework notebooks, and you want to test the students' ability to implement the [Euclidean algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm). You ask them the following question:

#### Question 1
_Write a function called `gcd` that takes two positive integers and returns their greatest common divisor._

Hopefully, your students define a function something like this:

In [2]:
# Solution
def gcd(a, b):
    while b:
        a, b = b, a%b
    return a

We've written tests for this question already, so all you need to do is run the following cell!

In [4]:
from okgrade import grade
grade('tests/q1.py')

Great - all the tests passed! If, however, a student gives a wrong answer, the tests will catch that:

In [59]:
# Wrong answer
def gcd(a, b):
    while b:
        a, b = b, a%b
    return a + 1

In [60]:
grade('tests/q1.py')

### Writing tests
okgrade grades by running tests on students' code to see whether it does what is expected. In the example above, we knew (from pen-and-paper calculations) that the greatest common divisor of 200 and 15 is 5. If students correctly implemented the Euclidean algorithm, then calling their `gcd` function with 200 and 15 as arguments should return 5. If it does, the student's code passes that test. If not, it fails. As the instructor, you have to decide which tests are necessary and sufficient to check students' code. Once you've done that, you have to actually write the tests.
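Before putting expected values into a test file, it can help to double-check the pen-and-paper results. The sketch below is not part of okgrade; it simply compares the three expected values used in this notebook against the standard library's `math.gcd`. The name `expected_results` is made up for this illustration.

```python
import math

# (a, b, expected gcd) triples taken from the tests in this notebook.
expected_results = [(200, 15, 5), (200, 14, 2), (19, 201, 1)]

for a, b, want in expected_results:
    # math.gcd is the standard-library reference implementation.
    assert math.gcd(a, b) == want, f"gcd({a}, {b}) should be {math.gcd(a, b)}"

print("All expected values agree with math.gcd")
```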

We write tests in Python files. These files are separate from the notebook or Python file that's being graded. Tests are normally kept in a folder called `tests`, but you can put them wherever you want. You can also call them whatever you want, but a common convention is "q1.py" for "question 1".

At the moment, tests are written in a particular format for historical reasons (it's a subset of the [OK test format](https://github.com/Cal-CS-61A-Staff/ok-client/wiki/ok_test)). Basically, the only thing the test file should do is assign a Python dictionary to the name `test`. That dictionary must follow a particular structure, so in practice it's usually best to just copy a template test file and change a few things. As an example, here's the entire contents of `tests/q1.py` from above:

```
test = {
    'name': 'Euclidean algorithm',
    'suites': [
        {
            'cases': [
                {
                    'code': r"""
                    >>> gcd(200, 15)
                    5
                    """
                },
                {
                    'code': r"""
                    >>> gcd(200, 14)
                    2
                    """
                },
                {
                    'code': r"""
                    >>> gcd(19, 201)
                    1
                    """
                }
            ]
        }
    ]
}
```

The important parts are the name at the top ('Euclidean algorithm') and the three tests themselves. Everything else is boilerplate. The tests are stored as raw strings following the [doctest](https://docs.python.org/3/library/doctest.html) format, which simulates the standard Python interpreter: you write the code you want to evaluate after the `>>>` and the expected result on the following line. Here's a template test file. Just replace the bits in all caps:


```
test = {
    'name': 'THE NAME OF THE TEST/QUESTION',
    'suites': [
        {
            'cases': [
                {
                    'code': r"""
                    >>> CODE THAT YOU WANT TO TEST
                    EXPECTED RESULT
                    """
                },
                {
                    'code': r"""
                    >>> CODE THAT YOU WANT TO TEST
                    EXPECTED RESULT
                    """
                },
                {
                    'code': r"""
                    >>> CODE THAT YOU WANT TO TEST
                    EXPECTED RESULT
                    """
                }
            ]
        }
    ]
}
```
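To see how the doctest format pairs each `>>>` line with its expected result, you can parse a case string with Python's own `doctest` module. This is only an illustration of the format using the standard library; it makes no claim about how okgrade parses test files internally.

```python
import doctest

# One case string in the same style as the 'code' entries above.
case = r"""
    >>> gcd(200, 15)
    5
    >>> callable(gcd)
    True
    """

# get_examples splits the text into (source, expected output) pairs.
examples = doctest.DocTestParser().get_examples(case)
for example in examples:
    print(repr(example.source), "->", repr(example.want))
```

Each parsed example holds the code to run in `source` and the text it should print in `want`, with the shared indentation stripped.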

There are a few points worth noting. First, we have three tests in this file, but you can have as many as you want. Just copy the innermost dictionary (shown again below) as many times as you need.

```
{
                    'code': r"""
                    >>> CODE THAT YOU WANT TO TEST
                    EXPECTED RESULT
                    """
                }
```

Second, each test can test as many lines as you want. For example, this is a valid test:

```
{
                    'code': r"""
                    >>> gcd(19, 201)
                    1
                    >>> gcd(200, 16)
                    8
                    >>> callable(gcd)
                    True
                    """
                }
```

Third, these tests have access to the global namespace of the student's notebook being graded, so we can test any names that the student defines. This is why we never defined `gcd` ourselves: we're using the `gcd` that the student defined. If we asked a student to assign their answer to a variable called `my_answer`, we can just test that their `my_answer` has the value we expect.
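The idea of running a test against the student's namespace can be sketched with the standard `doctest` module. The helper `run_case` and the two dictionaries below are hypothetical stand-ins for a notebook's globals; okgrade's actual mechanism may differ.

```python
import doctest

def run_case(code, namespace):
    """Run one doctest-style case against a namespace; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(code, namespace, "case", None, 0)
    report = []  # collect failure text instead of printing it
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test, out=report.append)
    return results.failed, results.attempted

case = r"""
    >>> my_answer
    42
    """

# Hypothetical globals of two students' notebooks.
defined = {"my_answer": 42}
missing = {}

print(run_case(case, defined))  # (0, 1)
print(run_case(case, missing))  # (1, 1)
```

The second student never defined `my_answer`, so evaluating it raises a `NameError` and the case counts as failed.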

Fourth, as you saw above in the working example, when a test fails, it prints the simulated interactive session that didn't pass. We can use this to display helpful feedback to students by adding comments above the test that explain why it failed.

```
{
                    'code': r"""
                    >>> # It looks like you forgot to store your answer in a variable called 'my_answer'!
                    >>> # Did you spell it as 'my_answer' exactly?
                    >>> 'my_answer' in vars()
                    True
                    """
                }
```

Fifth, if you've been using [okpy](https://okpy.org/) and are used to writing tests with the full OK test format, here are some important restrictions on the test format used in okgrade. At the moment, all tests must be worth one point each. It's easiest if you don't specify any points yourself and let okgrade assume one point for each test. You also can't use any set-up or tear-down code.

### okgrade's `grade` function

Now that we know how to write tests, let's look at the `grade` function. Imagine that you've assigned a homework on regular expressions. Perhaps you start off with a question like this:

#### Question 2
*Write a regular expression pattern to validate a US phone number that consists of three digits, a dash ('-'), three more digits, a dash, and then four digits. Store the pattern (as a string) in a variable called `phone_number_pattern`.*

Hopefully, your students write something like this:

In [3]:
phone_number_pattern = r'\d{3}-\d{3}-\d{4}'

You can test their answers with the following test, which is in `tests/q2-basic.py`:

```
{
                    'code': r"""
                    >>> import re
                    >>> test_string = 'Call me maybe? 510-123-4567'
                    >>> match = re.search(phone_number_pattern, test_string)
                    >>> bool(match)
                    True
                    """
                }
```


In [15]:
result = grade('tests/q2-basic.py')
result

The return value of `grade` is a `TestResult`. We can ask instances of `TestResult` for the grade from that test:

In [18]:
result.grade

1
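If you grade several questions, you may want to combine the results. The sketch below assumes only what is shown above, namely that `grade` returns an object with a numeric `grade` attribute. The helper `total_grade` and the stand-in `fake_grade` are hypothetical and exist so the example runs without okgrade installed.

```python
from types import SimpleNamespace

def total_grade(test_paths, grade_fn):
    """Sum the `.grade` of the result returned by `grade_fn` for each test file."""
    return sum(grade_fn(path).grade for path in test_paths)

# Stand-in for okgrade's `grade` (hypothetical): full marks for q1 only.
def fake_grade(path):
    return SimpleNamespace(grade=1 if path.endswith("q1.py") else 0)

print(total_grade(["tests/q1.py", "tests/q2-basic.py"], fake_grade))  # 1
```

In a notebook with okgrade available, you would pass the real `grade` function in place of `fake_grade`.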

Our tests in 'tests/q2-basic.py' weren't thorough or helpful to students, so let's try the following tests in 'tests/q2-thorough.py':

```
               {
                    'code': r"""
                    >>> # Make sure you assigned your answer to a variable called 'phone_number_pattern'.
                    >>> 'phone_number_pattern' in vars()
                    True
                    """
                },
                {
                    'code': r"""
                    >>> # Remember that phone_number_pattern should be a string.
                    >>> isinstance(phone_number_pattern, str)
                    True
                    """
                },
                {
                    'code': r"""
                    >>> # We're just looking for phone numbers without parentheses at the moment!
                    >>> import re
                    >>> test_string = "Don't do anything fancy with parentheses: (510)-123-4567"
                    >>> match = re.search(phone_number_pattern, test_string)
                    >>> bool(match)
                    False
                    """
                },
                {
                    'code': r"""
                    >>> # Are you sure your regular expression is looking for all ten digits?
                    >>> import re
                    >>> test_string = "You shouldn't match on 123-4567 but you should on 510-123-4567"
                    >>> matches = re.findall(phone_number_pattern, test_string)
                    >>> len(matches)
                    1
                    """
                }
                
```

In [24]:
result = grade('tests/q2-thorough.py')
result.grade

1
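The expected values in the thorough tests can be checked by hand with the `re` module. This is plain Python, independent of okgrade, using the same pattern and test strings as above; the variable names `parenthesised` and `mixed` are made up for the illustration.

```python
import re

phone_number_pattern = r'\d{3}-\d{3}-\d{4}'

# "(510)-123-4567" has no run of three digits, dash, three digits, dash, four digits.
parenthesised = "Don't do anything fancy with parentheses: (510)-123-4567"
print(bool(re.search(phone_number_pattern, parenthesised)))  # False

# Only the full ten-digit number matches; "123-4567" alone does not.
mixed = "You shouldn't match on 123-4567 but you should on 510-123-4567"
print(re.findall(phone_number_pattern, mixed))  # ['510-123-4567']
```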

The current answer in `phone_number_pattern` passed all these tests, but perhaps another student misunderstood the question and submitted a compiled regular expression object:

In [26]:
import re

phone_number_pattern = re.compile(r'\d{3}-\d{3}-\d{4}')
result = grade('tests/q2-thorough.py')
result.grade

0.0
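One way to see which check a compiled pattern trips is to try it in plain Python. The sketch below only shows the behaviour of the standard `re` module; it does not describe how okgrade arrives at the grade above.

```python
import re

compiled = re.compile(r'\d{3}-\d{3}-\d{4}')

# A compiled pattern is not a string, so the isinstance check fails.
print(isinstance(compiled, str))  # False

# re.search still accepts a compiled pattern, so the matching checks behave as before.
print(bool(re.search(compiled, 'Call me maybe? 510-123-4567')))  # True

# The original pattern string is available as the .pattern attribute.
print(isinstance(compiled.pattern, str))  # True
```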

To be finished...

### Examples
More examples to go here.