# Table of Contents

1. General characteristics
2. Functions
    - Functions should have descriptive names
    - Functions should be short
    - Functions should do one thing
3. Classes
    - When to use classes
    - Encapsulation
    - When to use inheritance
    - `super` function
5. Exceptions
6. Documentation / comments
7. Modules and packages
8. Standard libraries (Intro): os, sys, datetime, shutil, glob, re, logging, urllib, subprocess, pickle, ... (move this to language overview)
9. [Unit] Testing
10. PEP8 - coding conventions
11. Version Control Tools (Git)
12. Virtual environments
13. ...
14. Miscellany:
    - comprehensions (list/dict/set)
    - string formatting
    - lambda functions
    - decorators
    - logical operators
    - variable scope
    - mutable vs immutable
    - `*args` and `**kwargs`
    - floating point arithmetic

# PEP 8 - Python Style Guide

A PEP is a Python Enhancement Proposal. PEP 8 (the eighth PEP) describes how to write Python
code in a common style that will be easily readable by other programmers. If this seems
unnecessary, consider that programmers spend much more time reading code than writing it.

You can read PEP 8 here: https://www.python.org/dev/peps/pep-0008/

## pycodestyle

Wouldn't it be nice if you didn't need to remember all of these silly rules
for how to write PEP 8-consistent code? What if there were a tool that would
tell you whether your code matches PEP 8 conventions?

There is such a tool, called [`pycodestyle`](https://pypi.python.org/pypi/pycodestyle).

In [1]:
"""
This is some ugly code that does not conform to PEP 8.

Check me with pycodestyle:
    pycodestyle pep8_example.py
"""
from string import *
import math, os, sys

def f(x):
    """This function has lines that are just too long. The maximum suggested line length is 80 characters."""
    return 4.27321*x**3 - 8.375134*x**2 + 7.451431*x + 2.214154 - math.log(3.42153*x) + (1 + math.exp(-6.231452*x**2))
def g(x,
     y):
    print("Bad splitting of arguments")

# examples of bad spacing
mydict  =  { 'ham' : 2,  'eggs'  : 7  }#this is badly spaced
mylist=[ 1 , 2 , 3 ]

myvar   = 7
myvar2  = myvar*myvar
myvar10 = myvar**10

# badly formatted math
a= myvar+7 *  18-myvar2  /  2

l = 1 # l looks like 1 in some fonts
I = l # also bad
O = 0 # O looks like 0 in some fonts

# bad variable names
kMyUglyVariableName  = 18
The_Meaning_Of_Life  = 42


In [10]:
import subprocess
program = 'pycodestyle'
filename = '../resources/pep8_example.py'
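# pycodestyle exits with a non-zero status when it finds violations,
# so we do not pass check=True and simply print whatever it reports.
result = subprocess.run([program, filename], stdout=subprocess.PIPE)
print(result.stdout.decode("utf-8"))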

../resources/pep8_example.py:8:12: E401 multiple imports on one line
../resources/pep8_example.py:10:1: E302 expected 2 blank lines, found 1
../resources/pep8_example.py:11:80: E501 line too long (109 > 79 characters)
../resources/pep8_example.py:12:80: E501 line too long (118 > 79 characters)
../resources/pep8_example.py:13:1: E302 expected 2 blank lines, found 0
../resources/pep8_example.py:14:6: E128 continuation line under-indented for visual indent
../resources/pep8_example.py:18:1: E305 expected 2 blank lines after class or function definition, found 1
../resources/pep8_example.py:18:7: E221 multiple spaces before operator
../resources/pep8_example.py:18:10: E222 multiple spaces after operator
../resources/pep8_example.py:18:13: E201 whitespace after '{'
../resources/pep8_example.py:18:19: E203 whitespace before ':'
../resources/pep8_example.py:18:33: E203 whitespace before ':'
../resources/pep8_example.py:18:38: E202 whitespace before '}'
../resources/pep8_example.py:18:40: E261
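
For comparison, here is a sketch of how part of the example could be rewritten so that it is intended to follow PEP 8. The names `cubic_with_corrections`, `my_dict`, and `my_list` are made up for this illustration, and the rewrite covers only the imports, the long function, and the badly spaced containers.

```python
"""A rewrite of part of the example that is meant to follow PEP 8."""
import math


def cubic_with_corrections(x):
    """Evaluate the example expression, split over several short lines."""
    polynomial = 4.27321*x**3 - 8.375134*x**2 + 7.451431*x + 2.214154
    correction = 1 + math.exp(-6.231452 * x**2)
    return polynomial - math.log(3.42153*x) + correction


my_dict = {'ham': 2, 'eggs': 7}  # two spaces before an inline comment
my_list = [1, 2, 3]
```

Running `pycodestyle` on your own rewrite is the quickest way to check whether any violations remain.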

# Naming conventions

Use descriptive names for your variables, functions, and classes. In Python,
the following conventions are usually observed:
* Variables, functions, and function arguments are lower-case, with underscores to separate words.
    ```python
    index = 0
    num_columns = 3
    length_m = 7.2   # you can add units to a variable name
    ```
* Constants can be written in all-caps.
    ```python
    CU_SPECIFIC_HEAT_CAPACITY = 376.812   # J/(kg K)
    ```
* Class names are written with the CapWords convention:
    ```python
    class MyClass:
        pass
    ```
    
Programmers coming from other programming languages (especially FORTRAN and C/C++) should avoid using
special encodings (e.g., [Hungarian notation](https://en.wikipedia.org/wiki/Hungarian_notation)) in their
variable names:
```python
# don't do this!
iLoopVar = 0      # i indicates integer
szName = 'Test'   # sz means 'string'
gGlobalVar = 7    # g indicates a global variable
```
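
For contrast, here is a small sketch that applies the conventions above without any type or scope prefixes. All of the names (`SPEED_OF_LIGHT_M_PER_S`, `LightPulse`, `travel_time_s`, `distance_m`) are hypothetical and chosen only for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458   # constant: all caps, units in the name


class LightPulse:                      # class: CapWords
    def __init__(self, travel_time_s):
        # attribute: lower case with underscores
        self.travel_time_s = travel_time_s

    def distance_m(self):              # method: lower case with underscores
        return SPEED_OF_LIGHT_M_PER_S * self.travel_time_s


pulse = LightPulse(travel_time_s=2)
print(pulse.distance_m())
```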

# Comments

Comments are helpful when they clarify code. They should be used *sparingly*. Why?
* If code is so difficult to read that it needs a comment to explain it, it should probably be rewritten.
* Someone may update the code and forget to update a comment, turning the comment into misinformation.
* Comments tend to clutter the code and make it difficult to read.

Consider this example:

In [16]:
# this function does foo to the bar!
def foo(bar):
    bar = not bar   # bar is active low, so we invert the logic
    if bar == True:   # bar can sometimes be true
        print("The bar is True!")   # success!
    else:   # sometimes bar is not true
        print("Argh!")   # I hate it when the bar is not true!    

Only one of these comments is helpful. This code is much easier to read when written properly:

In [19]:
def foo(bar):
    """
    This function does foo to the bar!
    
    Bar is active low, so we invert the logic.
    """
    bar = not bar    # logic inversion
    if bar:
        print("The bar is True!")
    else:
        print("Argh!")

# Doc strings

Doc-strings are a useful way to document what a function (or class) does.

In [3]:
def add_two_numbers(a, b):
    """This function returns the result of a + b."""
    return a + b

In a Jupyter notebook (like this one) or an IPython shell, you can get information
about what a function does and what arguments it takes by reading its doc-string:

In [5]:
add_two_numbers?
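
Outside of Jupyter and IPython, the same text is available through the built-in `help()` function and the `__doc__` attribute. A minimal sketch, repeating the function so that the example runs on its own:

```python
def add_two_numbers(a, b):
    """This function returns the result of a + b."""
    return a + b


help(add_two_numbers)            # prints the signature and the doc-string
print(add_two_numbers.__doc__)   # the raw doc-string as a plain str
```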

Doc-strings can be several lines long:

In [7]:
def analyze_data(data, old_format=False, make_plots=True):
    """
    This function analyzes our super-important data.
    
    If you want to use the old data format, set old_format to True.
    Set make_plots to False if you do not want to plot the data.
    """
    # analysis ...

If you are working on a large project, there may be project-specific conventions
on how to write doc-strings. For example:

In [9]:
def google_style_doc_string(arg1, arg2):
    """Example Google-style doc-string.
    
    Put a brief description of what the function does here.
    In this case, the function does nothing.
    
    Args:
        arg1 (str): Your full name (name + surname)
        arg2 (int): Your favorite number

    Returns:
        bool: The return value. True for success, False otherwise.
    """

def scipy_style_doc_string(x, y):
    """This is a SciPy/NumPy-style doc-string.
    
    All of the functions in SciPy and NumPy use this format for their
    doc-strings.
    
    Parameters
    ----------
    x : float
        Description of parameter `x`.
    y :
        Description of parameter `y` (with type not specified)

    Returns
    -------
    err_code : int
        Non-zero value indicates error code, or zero on success.
    err_msg : str or None
        Human readable error message, or None on success.
    """

# Functions

## Functions should have descriptive names

Functions should have names that describe what they are for.

For example, what does this function do?

In [63]:
def myfunc(mylist):
    import re
    f = re.compile('([0-9]+)_.*')
    return [int(f.findall(mystr)[0]) for mystr in mylist]

myfunc(['000_Image.png', '123_Image.png', '054_Image.png'])

[0, 123, 54]

This function is for extracting the leading integers from a list of file names.
A better alternative could be:
```python
def extract_integer_index(file_list):
```
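
One possible full version under that name is sketched below. It assumes every file name starts with digits followed by an underscore, as in the example above. Unlike the original, it uses `match()`, so the digits must be at the very start of the name, and the `ValueError` for names that do not fit is an added assumption, not part of the original function.

```python
import re

# one or more digits at the start of the name, followed by an underscore
LEADING_INDEX = re.compile(r'([0-9]+)_')


def extract_integer_index(file_list):
    """Return the leading integer of each file name in file_list."""
    indices = []
    for file_name in file_list:
        match = LEADING_INDEX.match(file_name)
        if match is None:
            raise ValueError(f"no leading integer index in {file_name!r}")
        indices.append(int(match.group(1)))
    return indices


print(extract_integer_index(['000_Image.png', '123_Image.png', '054_Image.png']))
```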

## Functions should be short

Here is an example of a function that is a bit too long.
It is not very long because it is an example, but in real physics code it is not uncommon
to find single functions that are hundreds of lines long!

In [61]:
def analyze():
    print("******************************")
    print("    Starting the Analysis!    ")
    print("******************************")

    # create fake data
    x = [4.1, 2.8, 6.7, 3.5, 7.9, 8.0, 2.1, 6.3, 6.6, 4.2, 1.5]
    y = [2.2, 5.3, 6.3, 2.4, 0.1, 0.67, 7.8, 9.1, 7.1, 4.9, 5.1]
    
    # make tuple and sort
    data = list(zip(x, y))
    data.sort()
    
    # calculate statistics
    y_sum = 0
    xy_sum = 0
    xxy_sum = 0
    for xx, yy in data:
        y_sum += xx
        xy_sum += xx*yy
        xxy_sum += xx*xx*yy
    xbar = xy_sum / y_sum
    x2bar = xxy_sum/y_sum
    std_dev = (x2bar - xbar**2)**0.5
    
    # print the results
    print("Mean:   ", xbar)
    print("Std Dev:", std_dev)

    print("Analysis successful!")

analyze()

******************************
    Starting the Analysis!    
******************************
Mean:    4.272253258845437
Std Dev: 2.2108824184193927
Analysis successful!


How can we improve this code? Our `analyze()` function is really doing three things:
1. Creating fake data
2. Calculating some statistics
3. Printing the status and results

Each of these things can be put in a separate function.

In [60]:
def generate_fake_data():
    x = [4.1, 2.8, 6.7, 3.5, 7.9, 8.0, 2.1, 6.3, 6.6, 4.2, 1.5]
    y = [2.2, 5.3, 6.3, 2.4, 0.1, 0.67, 7.8, 9.1, 7.1, 4.9, 5.1]
    data = list(zip(x, y))
    data.sort()
    return data

def calculate_mean_and_stddev(xy_data):
    y_sum = 0
    xy_sum = 0
    xxy_sum = 0
    for xx, yy in xy_data:
        y_sum += xx
        xy_sum += xx*yy
        xxy_sum += xx*xx*yy
    xbar = xy_sum / y_sum
    x2bar = xxy_sum/y_sum
    std_dev = (x2bar - xbar**2)**0.5
    return xbar, std_dev

def analyze():
    data = generate_fake_data()
    mean, std_dev = calculate_mean_and_stddev(data)
    print("Mean:   ", mean)
    print("Std Dev:", std_dev)
    
analyze()

Mean:    4.272253258845437
Std Dev: 2.2108824184193927


We note three important results of this code restructuring:
1. It is much easier to tell at a glance what `analyze()` does.
2. The comments (which we used to organize our code before) are no longer needed.
3. `generate_fake_data()` and `calculate_mean_and_stddev()` can now be reused elsewhere.

## Functions should do one thing

We now know we should break up big functions into smaller ones, but how do we decide
how to break them up, and how small should they be?

A useful principle for guiding the creation of functions is that functions should
do one thing. In the previous section, our large `analyze()` function was doing
several things, so we broke it up into smaller functions.

But wait! You may notice that `calculate_mean_and_stddev()` does two things! Should we
break it up into two functions, `calculate_mean()` and `calculate_stddev()`?
The answer depends on two things:
1. Will you ever want to calculate the mean and standard deviation separately?
2. Will splitting the function into two result in a large amount of duplicated code?
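
As a sketch of what the split might look like, assume the data are `(value, weight)` pairs and that a weighted mean and standard deviation are wanted. The function names and the `(value, weight)` interpretation are assumptions made for illustration; this is not the notebook's own calculation. Here `calculate_stddev()` reuses `calculate_mean()`, so little code is duplicated, at the cost of looping over the data more than once.

```python
def calculate_mean(weighted_data):
    """Weighted mean of (value, weight) pairs."""
    weight_sum = sum(weight for _, weight in weighted_data)
    return sum(value * weight for value, weight in weighted_data) / weight_sum


def calculate_stddev(weighted_data):
    """Weighted standard deviation, built on calculate_mean()."""
    mean = calculate_mean(weighted_data)
    squares = [(value**2, weight) for value, weight in weighted_data]
    mean_of_squares = calculate_mean(squares)
    # guard against a tiny negative value caused by floating-point round-off
    variance = max(mean_of_squares - mean**2, 0.0)
    return variance ** 0.5


data = [(1.0, 1.0), (3.0, 1.0)]
print(calculate_mean(data), calculate_stddev(data))
```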

Another important consequence of the "do one thing" principle is that it can help
you avoid cases where a function does more than what you would expect it to do.

For example, this function claims to just write data to a file; however, it also modifies the data!

In [62]:
def write_data_to_file(data, filename='data.dat'):
    with open(filename, 'w') as f:
        data *= 2
        f.write(str(data))

Imagine a much larger code in which an unexpected factor of two appears in your
results, and you can't figure out where it came from.
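
One way to avoid that surprise is sketched below, under the assumption that the data are a list of numbers written one per line. The scaling lives in its own function that returns a new list, and the writer only writes. `scale_data()` and the one-value-per-line format are hypothetical choices for this example.

```python
import os
import tempfile


def write_data_to_file(data, filename='data.dat'):
    """Write one value per line; the caller's data is left untouched."""
    with open(filename, 'w') as f:
        for value in data:
            f.write(f"{value}\n")


def scale_data(data, factor):
    """Return a new, scaled list instead of modifying the argument."""
    return [factor * value for value in data]


measurements = [1.0, 2.5, 4.0]
path = os.path.join(tempfile.mkdtemp(), 'data.dat')
write_data_to_file(scale_data(measurements, 2), path)
print(measurements)   # the original list is unchanged
```

Any scaling is now visible at the call site, so a stray factor of two is much easier to track down.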

# Classes

## When to use classes

## Encapsulation

## When to use inheritance