# Rubric for coding projects

Each coding project should be written as a notebook (if you're creating the notebook yourself, make sure the kernel is set to Python 3).

Each coding project is graded on a 5-point scale, according to the criteria below.
If you're improving the lecture notes instead, check the other rubric.

## Code must run and work (F/P)

- When given input of the correct form, your code must execute without crashing (no *syntactic* bugs).

- Your code must be reasonably close to solving the problem.
  For example, the spellchecker below meets the criterion of running without crashing and being bug-free, but it is obviously a horrible spellchecker because it flags every word as misspelled:

In [2]:
def is_misspelled(word):
    return True

- This criterion does not add any points, but **failing this criterion automatically means 0 out of 5 points.**

- Each one of the other criteria adds one point.

## Code must be bug-free (1 point)

- Your code should be free of *semantic* bugs.
  A semantic bug does not crash the program, but it makes it do things that weren't intended.
  For instance, if your code misbehaves on some inputs, or if it extracts word bigrams from a list of words when it was supposed to extract character bigrams from the words in that list, that's a semantic bug.
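
  To make the bigram example concrete, here is a minimal sketch. The function names `word_bigrams` and `char_bigrams` are made up for illustration and are not part of any assignment. Both functions run without crashing, so if the task asks for character bigrams and you hand in the first one, Python will not complain; the code simply does the wrong thing.

```python
def word_bigrams(words):
    """Return the bigrams of adjacent words in a list of words."""
    return [tuple(words[i:i+2]) for i in range(len(words) - 1)]


def char_bigrams(words):
    """Return the character bigrams of every word in a list of words."""
    return [word[i:i+2] for word in words for i in range(len(word) - 1)]


words = ["the", "cat"]
print(word_bigrams(words))  # [('the', 'cat')]
print(char_bigrams(words))  # ['th', 'he', 'ca', 'at']
```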

## Code should be modular and scalable (1 point)

- Your solution should be general enough to be easily extended to slightly different problems.
  For instance, if your task requires a trigram model, it should be easy enough to adapt the code for any arbitrary n-gram model.

- Here's an example of code that fails this criterion because it cannot work with n-grams of arbitrary size.

In [3]:
def trigram_freq(trigram, prefix_tree):
    """Get trigram frequency from prefix tree."""
    first = trigram[0]
    second = trigram[1]
    third = trigram[2]
    freq = "_freq"
    return prefix_tree.get(first).get(second).get(third).get(freq)

- The following piece of code for bigram extraction meets the condition.
  Even though it is written for bigrams, it can easily be generalized to arbitrary n-grams:

In [4]:
def extract_bigrams(sentence):
    """Extract bigrams from tokenized sentence."""
    return [sentence[m:m+2] for m in range(len(sentence) - 1)]
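
One possible way to generalize both examples above is sketched here. The names `extract_ngrams`, `ngram_freq`, and the parameter `freq_key` are assumptions introduced for illustration. The sketch also assumes the prefix tree is a nested dictionary that stores counts under the key `"_freq"`, as `trigram_freq` suggests, and it returns 0 for unseen n-grams, which is a design choice rather than a requirement.

```python
def extract_ngrams(sentence, n=2):
    """Extract n-grams from a tokenized sentence.

    Returns an empty list if the sentence has fewer than n tokens.
    """
    return [sentence[m:m+n] for m in range(len(sentence) - n + 1)]


def ngram_freq(ngram, prefix_tree, freq_key="_freq"):
    """Get n-gram frequency from a nested-dict prefix tree (0 if absent)."""
    node = prefix_tree
    # Walk down the tree one token at a time, whatever the n-gram's length.
    for token in ngram:
        node = node.get(token)
        if node is None:
            return 0
    return node.get(freq_key, 0)
```

With these definitions, switching from trigrams to 4-grams only requires changing the value of `n`, not rewriting the functions.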

## Code should follow good coding practices (1 point)

- Use functions.
  Don't just put everything into a single function.

- Functions must have docstrings.
  For simple functions, a single line description suffices.
  For more complex ones, specify arguments, return value, examples, and so on.

- Complicated parts of the code should be clarified with comments.

- Where appropriate, use classes.

- Use meaningful names for variables, functions, and classes.

- Aim for simple, elegant code that is easy to read.
  Below is an example of a particularly crude piece of code, with several alternatives of increasing elegance:

In [7]:
# a crude piece of code
def keyval_swapper(some_dict):
    "Swap keys and values in dictionary."
    tmp = []
    for k in some_dict.keys():
        v = some_dict[k]
        tmp2 = (v, k)
        tmp.append(tmp2)
    new_dict = dict(tmp)
    return new_dict

In [8]:
# simplified for-loop
def keyval_swapper(some_dict):
    "Swap keys and values in dictionary."
    tmp = []
    for k, v in some_dict.items():
        tmp2 = (v, k)
        tmp.append(tmp2)
    new_dict = dict(tmp)
    return new_dict

In [9]:
# redundant variables removed
def keyval_swapper(some_dict):
    "Swap keys and values in dictionary."
    tmp = []
    for k, v in some_dict.items():
        tmp.append( (v, k) )
    return dict(tmp)

In [10]:
# use list comprehension instead of for loop
def keyval_swapper(some_dict):
    "Swap keys and values in dictionary."
    tmp = [(v, k) for k, v in some_dict.items()]
    return dict(tmp)

In [11]:
# construct new_dict via dictionary comprehension
def keyval_swapper(some_dict):
    "Swap keys and values in dictionary."
    return {v: k for k, v in some_dict.items()}
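
The one-line docstring is enough for a function this small. For comparison, here is a sketch of what a fuller docstring with arguments, return value, and an example might look like. The section labels (`Arguments`, `Returns`, `Example`) are just one common layout, not a format this rubric requires.

```python
def keyval_swapper(some_dict):
    """Swap keys and values in a dictionary.

    Arguments:
        some_dict: dictionary whose values are all hashable

    Returns:
        A new dictionary that maps each value of some_dict to its key.
        If several keys share a value, only the last such key is kept.

    Example:
        >>> keyval_swapper({"a": 1, "b": 2})
        {1: 'a', 2: 'b'}
    """
    return {v: k for k, v in some_dict.items()}
```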

## Code should come with test cases (1 point)

- Specify a list of problems that your code should be tested on.
  This can be done as part of each function's docstring by providing examples there.
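
  As a sketch of how docstring examples can double as test cases, the block below uses Python's standard `doctest` module on the bigram function from earlier. The chosen inputs (a normal sentence, a one-word sentence, an empty sentence) are only illustrative; your own tests should cover the cases that matter for your project.

```python
import doctest


def extract_bigrams(sentence):
    """Extract bigrams from tokenized sentence.

    >>> extract_bigrams(["the", "cat", "sat"])
    [['the', 'cat'], ['cat', 'sat']]
    >>> extract_bigrams(["the"])
    []
    >>> extract_bigrams([])
    []
    """
    return [sentence[m:m+2] for m in range(len(sentence) - 1)]


# Run the examples in the docstring; nothing is printed if they all pass.
doctest.run_docstring_examples(extract_bigrams, globals(), name="extract_bigrams")
```

  If one of the examples produced a different result, `doctest` would print the expected and the actual output for that example.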

## Code should have written documentation (1 point)

- Every good piece of software comes with documentation that describes its usage and design.
  This includes:

  - what the software is good for
  - how it should be used
  - important aspects of the implementation (for instance, if it takes very long to run with large inputs)

- For simple projects (as yours probably will be), this can be very short and can be added at the beginning of the notebook.
  An example:

> DJ: A Python script for detecting dad jokes
> ===========================================
>
> This Python script is used to calculate what percentage of tweets contain lame dad jokes.
> For now, it only calculates the number of tweets with hashtag #metoo that contain the phrase "me three" or "methree".
> But additional matches can be passed in as a dictionary.
> You have to have a key for the Twitter API in order to use it.
> Just run the code in the cell, then wait.
> Depending on your internet connection, the first run might take over an hour.
> Results are cached, so future runs should take less time.

## Optional: Give your code a catchy name (just for fun)

- Software relies on good naming to attract attention and convey what it's about.

  For instance, there is a finite-state automaton library for handling phonological and morphological processes in Python. 
  The package is called *pynini*, which is a portmanteau of *Python* and the Indian grammarian *Panini*.
  Panini, as you might know, was the first to systematically describe the phonology and morphology of a human language (Sanskrit, in this case).

  Another cute option is a recursive acronym.
  Examples include *WINE* for "Wine Is Not an Emulator" and *GNU* for "GNU's Not Unix".

  That said, many packages have pretty boring names, like *NLTK* for "Natural Language ToolKit".