# 07.04 - Notebook Format and Naming Convention

## Purpose

This has come up a few times: some people don't understand the 30% part of the grade, or why I grade the way I do.

The video part of this can be found in 01.01 - Course Introduction.
https://uiowa.instructure.com/courses/172943/modules/items/4525367

The video explains much of it; this notebook is intended to explain it in more detail.

## Description of Problem

The grading in this class is 50% As and 50% Bs, but the breakdown for any particular homework/exam is: 70% correctness and 30% format/naming/etc.

Why?

The largest cost in software development is the **maintenance** of code, not the development of it (https://stackoverflow.com/questions/3477706/development-cost-versus-maintenance-cost).  The Data Sciences realm has an even worse problem, because few people teach the importance of naming.  When you write code, its intention needs to be as clear as possible.  What does this help with?

* When you come back to it in 6 months (or longer) you know what was done and why.
* Fewer bugs - If things are consistent, the code is easier to read and less likely to cause issues.
* Easier to explain - If you have to show your code to anyone, it's easier to explain if it's better written.

## Sections contained in notebook:

* Naming Convention Category - Covers names, both of functions and of variables
* Conciseness - Covers making code cleaner in communication
* Creativity - Going "above and beyond"

## References

Some books I highly recommend that talk about this topic, some of which are also included on ICON (under module 7):

* https://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882 - Clean Code: A Handbook of Agile Software Craftsmanship
  * A very good book that explains naming convention, and what makes code "clean" (or at least "cleaner").  A book I rely on heavily (and have for many, many years).
* https://www.amazon.com/Pragmatic-Programmer-journey-mastery-Anniversary/dp/0135957052/ - The Pragmatic Programmer: Your Journey to Mastery
  * I have an older copy of this book, but it's good.  Similar to Clean Code.

# Naming Convention Category

## Consistent Names

What do I mean by consistent names? Let's look at some examples. Pretend this is all in one notebook:

In [1]:
myInt = 0
my_int = 0
MyInt = 0
MYINT = 0

Each of the above uses a different naming style, so taken together they are inconsistently named.  Python has a style guide, PEP 8 (which I don't grade on), that you can follow.  Whichever style you pick, be consistent.
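
As a minimal sketch of what consistency looks like, the snippet below picks one style (camelCase, matching the rest of this notebook) and uses it for every name. The variable names are made up for illustration; snake_case throughout would be just as consistent.

```python
# One convention (camelCase here) used for every name in the notebook.
catCount = 3
catAges = [5, 7, 2]
oldestCatAge = max(catAges)

# The same names in snake_case would be equally fine, as long as the
# whole notebook uses snake_case: cat_count, cat_ages, oldest_cat_age.
print(oldestCatAge)  # prints 7
```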

## Proper Names

Proper names are graded under both 'consistent names' and 'format'.  Names should describe what they're holding.  Sometimes it helps to make up your own story, or to use hints from the homework to wrap a story around the data.  Let's take an example:

In [2]:
myMatrix = [[1, 2, 3], [4, 5, 6]]
catAges = [5, 7, 2, 8, 1]

Why is the first variable, myMatrix, bad? It tells me **nothing** about the purpose of this variable, or why I should care.  Are those ages of children? The weights of loaves of bread?  The second variable, catAges, is more descriptive.  It's clear from the start that this is a list of ages of cats.

The readers of your code shouldn't have to investigate how it's used to know what it's for.  Simple as that.

Some tips though:

* Make it descriptive, use names that matter and indicate why I should care (as a reader of your notebook).
* Make it consistently named with other variables in your notebook.
* Don't make it redundant.  Avoid the use of 'matrix', for example.  You'll notice that in lecture examples I tend to spell out the type (anInt, aMatrix); I do that to point out the type for your benefit.  In real projects, use real names and not types, unless you have to for some reason.
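
To make the tips above concrete, here is a small sketch that computes the same value twice, once with type-based names and once with descriptive ones. The bread-weight data and every name in it are hypothetical, chosen only to illustrate the difference.

```python
# Type-based names say what the value is, not what it means.
aList = [450, 520, 480]
aNumber = sum(aList) / len(aList)

# Descriptive names (hypothetical bread-weight data) say why the value matters.
loafWeightsInGrams = [450, 520, 480]
averageLoafWeightInGrams = sum(loafWeightsInGrams) / len(loafWeightsInGrams)

print(averageLoafWeightInGrams)
```

Both halves compute the same thing, but only the second tells a reader what the numbers represent without tracing how they are used.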

# Conciseness

When I talk about conciseness, there are a few parts.  Do you have duplicated code (e.g. a duplicated while loop with only minor changes)?  Do you have functionality defined that isn't used?  Are you using an improper loop for a particular problem?

Let's go through some examples.

## Loop Choices (conciseness)

In [3]:
catAges = []
for x in range(10):
    catAges.append(x)

The above is not concise.  Largely speaking, I likely wouldn't grade negatively on it, but it's still not concise.  Why?  Because list comprehension exists, and is cleaner (less code):

In [4]:
catAges = [x for x in range(10)]
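
A comprehension pays off most when each element is transformed or filtered. The sketch below is illustrative only; `catAgesInMonths` and `seniorCatAges` are hypothetical names, and the cutoff of 7 is an arbitrary assumption.

```python
# With no transformation, list(range(10)) gives the same result
# as [x for x in range(10)].
catAges = list(range(10))

# Transform each element.
catAgesInMonths = [age * 12 for age in catAges]

# Keep only the elements that pass a condition.
seniorCatAges = [age for age in catAges if age >= 7]

print(seniorCatAges)  # prints [7, 8, 9]
```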

## Duplicated Code
Below, we have some duplicated code doing the same general thing.

In [5]:
counter = 0
myCollection = []  # Notice, bad name
while counter < 10:
    myCollection.append(counter)
    counter += 1

myCollection2 = []  # Notice, also a bad name
counter = 0
while counter < 15:
    myCollection2.append(counter)
    counter += 1

The above can be written better in two ways.  Ideally, this is another case for list comprehension, but if you have to use a while loop, then:

In [6]:
def generateIntegerSequence(length):  # Notice, good name
    returnCollection = []
    counter = 0
    while counter < length:
        returnCollection.append(counter)
        counter += 1
    return returnCollection

myCollection = generateIntegerSequence(10)
myCollection2 = generateIntegerSequence(15)
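
For comparison, the list comprehension route mentioned above might look like the sketch below. The names `firstTenIntegers` and `firstFifteenIntegers` are hypothetical replacements for the placeholder names used in the example.

```python
# Each sequence is built in one line, with no counter to manage.
firstTenIntegers = [number for number in range(10)]
firstFifteenIntegers = [number for number in range(15)]

print(len(firstTenIntegers), len(firstFifteenIntegers))  # prints 10 15
```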

## Functions

Functions should be meaningful, not just replacing what's already in Python.  For example:

In [7]:
import random as rnd
def generateRandomSequence(low, high, num):
    return [rnd.randint(low, high) for _ in range(num)]

The above is not really useful, because it doesn't add anything that base Python doesn't already provide.  It's okay for practice, but too many of these cause readability issues.  However, if the requested structure is more complicated and you're unsure how to build it with a library, a function can help.  For example:

In [8]:
import numpy as np
def generateRandomMatrix(low, high, dimensions):
    """Given a low and high number, and a dimension tuple (x, y) return a y by x matrix of random numbers between the low and high"""
    x = dimensions[0]
    y = dimensions[1]
    returnStructure = []
    while len(returnStructure) < y:
        returnStructure.append([rnd.randint(low, high) for _ in range(x)])
    return np.asarray(returnStructure)

In [9]:
generateRandomMatrix(1, 10, (5, 10))

array([[ 6,  2,  9,  5,  8],
       [ 9,  5,  4,  5,  5],
       [ 4,  4,  3,  5,  4],
       [10,  4,  7,  7,  2],
       [ 5,  7,  4,  1, 10],
       [ 5,  3,  8,  5,  5],
       [10,  3, 10,  2,  3],
       [ 7,  3,  4,  7, 10],
       [ 5,  3,  1,  4,  1],
       [ 1,  4,  7,  6, 10]])

While generateRandomMatrix is overkill for what we're doing (it could be greatly simplified), the point loss is far less than it would be otherwise.  When dealing with more complicated structures, a function is perfectly applicable.
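
One possible simplification, sketched here with NumPy's random generator rather than the loop above, is shown below. The optional `seed` parameter is an addition for reproducibility and is not part of the original function; treat this as one way to do it, not the expected answer.

```python
import numpy as np

def generateRandomMatrix(low, high, dimensions, seed=None):
    """Return a y-by-x array of random integers in [low, high], given dimensions (x, y)."""
    x, y = dimensions
    randomGenerator = np.random.default_rng(seed)
    # endpoint=True makes `high` inclusive, matching random.randint.
    return randomGenerator.integers(low, high, size=(y, x), endpoint=True)

catAgeMatrix = generateRandomMatrix(1, 10, (5, 10))
print(catAgeMatrix.shape)  # prints (10, 5)
```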

## Improper Loop

Loop choices do matter.  I've recommended many times that you "ask yourself why" you chose one thing over another.

Let's go through an example.

In [10]:
catAges = [1, 5, 3, 2, 9, 8]

counter = 0
while counter < 6:
    print(f"Cat Age: {catAges[counter]}")
    counter += 1

Cat Age: 1
Cat Age: 5
Cat Age: 3
Cat Age: 2
Cat Age: 9
Cat Age: 8


There are two things wrong with the above implementation.  First, we hardcoded the 6 for the length.  What happens if catAges has more or fewer elements?  We introduce a bug: with fewer elements the loop raises an IndexError, and with more it silently skips the extras.  Second, this is better served by a for loop, because it eliminates that bug and takes less code.  That would be:

In [11]:
for age in catAges:
    print(f"Cat Age: {age}")

Cat Age: 1
Cat Age: 5
Cat Age: 3
Cat Age: 2
Cat Age: 9
Cat Age: 8
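
Two related variations, sketched under the assumption that you either need each cat's position or are required to use a while loop. The `catNumber` name and the output wording are hypothetical.

```python
catAges = [1, 5, 3, 2, 9, 8]

# enumerate supplies the position, so no counter or hardcoded length is needed.
for catNumber, age in enumerate(catAges, start=1):
    print(f"Cat {catNumber} Age: {age}")

# If a while loop is required, take the length from the list itself.
counter = 0
while counter < len(catAges):
    print(f"Cat Age: {catAges[counter]}")
    counter += 1
```

Either version keeps working when catAges grows or shrinks, because nothing depends on a hardcoded 6.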


# Notebook Format

The format of your notebook matters, too.  Do you include any documentation?  Do you have blocks clearly identified?

Most people have been using my .todo file and just extending it.  Largely speaking, I haven't been strict here either, but I did go over it in lectures.  The format I generally go for is:

1.  The title of the notebook
2.  A description of the notebook (again, why should I care to read this as a receiver of the notebook?)
3.  Dependencies/libraries (that I may need to install)
4.  Imports
5.  Helper functions
6.  Implementation

For the **Implementation** section, it may be useful to break that out (like I do by section).  In production notebooks, you may have a **conclusion** at the end, too.
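
One way to picture that layout is the sketch below, written as a plain Python script with `# %%` cell markers (a convention some editors and tools use to split a script into notebook-style cells). The title, description, and the `averageCatAge` helper are all hypothetical placeholders, not the course template.

```python
# %% [markdown]
# # Cat Age Summary
# Summarizes the ages of a group of cats so a reader can see the average at a glance.

# %% [markdown]
# ## Dependencies
# None beyond the Python standard library.

# %% Imports
import statistics

# %% Helper functions
def averageCatAge(catAges):
    """Return the mean age of the given cats."""
    return statistics.mean(catAges)

# %% Implementation
catAges = [5, 7, 2, 8, 1]
print(f"Average cat age: {averageCatAge(catAges)}")
```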

Along with notebook format come things like:

* Misspellings - Did you turn in the notebook without typos in either the comments or the names?
* Proper capitalization - Does your notebook read professionally, like something you're proud of sharing?

# Creativity

Creativity can earn you bonus points in certain circumstances.  This usually means adding more than what is explicitly required.  Some examples of that are:

1.  Interesting helper functions.  One student wrote a helper for printing out types of variables/values, which is interesting.
2.  A story around the data set.  Another student wrote more about their data set, adding names and a story around what they did.
3.  Finding an interesting implementation that is concise and goes beyond what was directly taught (this is rare).
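
As an illustration of the first item, a type-printing helper might look something like the sketch below. This is not the student's actual code; `describeValue` and its output format are hypothetical.

```python
def describeValue(label, value):
    """Return a one-line description of a value and its type."""
    return f"{label}: {value!r} ({type(value).__name__})"

catAges = [5, 7, 2]
print(describeValue("catAges", catAges))            # prints catAges: [5, 7, 2] (list)
print(describeValue("oldestCatAge", max(catAges)))  # prints oldestCatAge: 7 (int)
```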

The max grade for any one assignment is 100, regardless of how many bonus points are awarded.  Early on in the course, I explained that the purpose of the language is to "play" with it to get a better idea of how to use it.  Some people have taken that opportunity in their homework to show the attempts they went through to get an answer, why they picked one, and so on.  Even if the answer is incorrect, letting me understand what you attempted will earn you far more points.