# Table of Contents

1. General characteristics
2. Functions
    - Functions should have descriptive names
    - Functions should be short
    - Functions should do one thing
3. Classes
    - When to use classes
    - Encapsulation
    - When to use inheritance
    - `super` function
4. Exceptions
5. Documentation / comments
6. Modules and packages
7. Standard libraries (Intro): os, sys, datetime, shutil, glob, re, logging, urllib, subprocess, pickle, ... (move this to language overview)
8. [Unit] Testing
9. PEP 8: coding conventions
10. Version control tools (Git)
11. Virtual environments
12. ...
13. Miscellany:
    - comprehensions (list/dict/set)
    - string formatting
    - lambda functions
    - decorators
    - logical operators
    - variable scope
    - mutable vs immutable
    - `*args` and `**kwargs`
    - floating point arithmetic

# Functions

## Functions should have descriptive names

Functions should have names that describe what they are for.

For example, what does this function do?

```python
import re

def myfunc(mylist):
    f = re.compile('([0-9]+)_.*')
    return [int(f.findall(mystr)[0]) for mystr in mylist]

myfunc(['000_Image.png', '123_Image.png', '054_Image.png'])
```

```
[0, 123, 54]
```

This function extracts the leading integer from each file name in a list.
A more descriptive name would be:
```python
def extract_integer_index(file_list):
```
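Below is a sketch of one possible body for the renamed function. Like the original regular expression, it assumes each name starts with digits followed by an underscore. The compiled pattern `LEADING_INDEX` is a hypothetical helper name. Unlike `findall`, `match` only matches at the start of the string. A name that does not begin with digits therefore raises an error instead of silently yielding a number taken from the middle of the name.

```python
import re

# Hypothetical helper: one or more digits at the start of a name, then an underscore.
LEADING_INDEX = re.compile(r'([0-9]+)_')

def extract_integer_index(file_list):
    """Return the leading integer of each file name, e.g. '054_Image.png' -> 54."""
    # Pattern.match() only matches at the beginning of the string.
    return [int(LEADING_INDEX.match(name).group(1)) for name in file_list]

print(extract_integer_index(['000_Image.png', '123_Image.png', '054_Image.png']))
# prints [0, 123, 54]
```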

## Functions should be short

Here is an example of a function that is a bit too long.
Because it is only an example, it is not very long, but in real physics code it is not uncommon
to find single functions that are hundreds of lines long!

```python
def analyze():
    print("******************************")
    print("    Starting the Analysis!    ")
    print("******************************")

    # create fake data
    x = [4.1, 2.8, 6.7, 3.5, 7.9, 8.0, 2.1, 6.3, 6.6, 4.2, 1.5]
    y = [2.2, 5.3, 6.3, 2.4, 0.1, 0.67, 7.8, 9.1, 7.1, 4.9, 5.1]

    # pair each x value with its y value and sort by x
    data = list(zip(x, y))
    data.sort()

    # calculate statistics: mean and standard deviation of x, using y as weights
    y_sum = 0
    xy_sum = 0
    xxy_sum = 0
    for xx, yy in data:
        y_sum += yy
        xy_sum += xx*yy
        xxy_sum += xx*xx*yy
    xbar = xy_sum / y_sum
    x2bar = xxy_sum / y_sum
    std_dev = (x2bar - xbar**2)**0.5

    # print the results
    print(f"Mean:    {xbar:.3f}")
    print(f"Std Dev: {std_dev:.3f}")

    print("Analysis successful!")

analyze()
```

```
******************************
    Starting the Analysis!    
******************************
Mean:    4.501
Std Dev: 2.030
Analysis successful!
```

How can we improve this code? Our `analyze` function is really doing three things:
1. Creating fake data
2. Calculating some statistics
3. Printing the status and results

Each of these tasks can be moved into a separate function.

```python
def generate_fake_data():
    x = [4.1, 2.8, 6.7, 3.5, 7.9, 8.0, 2.1, 6.3, 6.6, 4.2, 1.5]
    y = [2.2, 5.3, 6.3, 2.4, 0.1, 0.67, 7.8, 9.1, 7.1, 4.9, 5.1]
    data = list(zip(x, y))
    data.sort()
    return data

def calculate_mean_and_stddev(xy_data):
    y_sum = 0
    xy_sum = 0
    xxy_sum = 0
    for xx, yy in xy_data:
        y_sum += yy
        xy_sum += xx*yy
        xxy_sum += xx*xx*yy
    xbar = xy_sum / y_sum
    x2bar = xxy_sum / y_sum
    std_dev = (x2bar - xbar**2)**0.5
    return xbar, std_dev

def analyze():
    data = generate_fake_data()
    mean, std_dev = calculate_mean_and_stddev(data)
    print(f"Mean:    {mean:.3f}")
    print(f"Std Dev: {std_dev:.3f}")

analyze()
```

```
Mean:    4.501
Std Dev: 2.030
```


We note three important results of this code restructuring:
1. It is much easier to tell at a glance what `analyze()` does.
2. The comments (which we used to organize our code before) are no longer needed.
3. `generate_fake_data()` and `calculate_mean_and_stddev()` can now be reused elsewhere.

## Functions should do one thing

We now know we should break up big functions into smaller ones, but how do we decide
how to break them up, and how small should they be?

A useful principle for guiding the creation of functions is that functions should
do one thing. In the previous section, our large `analyze()` function was doing
several things, so we broke it up into smaller functions.

But wait! You may notice that `calculate_mean_and_stddev()` does two things! Should we
break it up into two functions, `calculate_mean()` and `calculate_stddev()`?
The answer depends on two things:
1. Will you ever want to calculate the mean and standard deviation separately?
2. Will splitting the function into two result in a large amount of duplicated code?
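To show that splitting need not duplicate much code, here is one way the split could look. It is a sketch that assumes the y values act as weights on the x values, which is the reading suggested by the names `xbar` and `x2bar`. `calculate_mean()` and `calculate_stddev()` are the hypothetical names from the question above. `calculate_stddev()` reuses `calculate_mean()` instead of repeating that formula.

```python
def calculate_mean(xy_data):
    """Mean of x, using each y as a weight (assumed interpretation)."""
    weight_sum = sum(y for _, y in xy_data)
    return sum(x * y for x, y in xy_data) / weight_sum

def calculate_stddev(xy_data):
    """Standard deviation of x, using each y as a weight (assumed interpretation)."""
    weight_sum = sum(y for _, y in xy_data)
    x2bar = sum(x * x * y for x, y in xy_data) / weight_sum
    # Reuse calculate_mean() rather than duplicating the mean computation.
    return (x2bar - calculate_mean(xy_data) ** 2) ** 0.5

data = [(1.0, 1.0), (3.0, 1.0)]
print(calculate_mean(data))    # prints 2.0
print(calculate_stddev(data))  # prints 1.0
```

The price is some repeated work. The weight sum is computed in both functions, and the mean is recomputed inside `calculate_stddev()`. For small data sets this rarely matters. If it does, keeping the combined function is a reasonable answer to question 2.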

Another important consequence of the "do one thing" principle is that it helps
you avoid functions that do more than you would expect them to.

For example, this function claims to just write data to a file; however, it also modifies the data!

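```python
def write_data_to_file(data, filename='data.dat'):
    with open(filename, 'w') as f:
        # Hidden side effect: doubles every value in the caller's list
        for i in range(len(data)):
            data[i] *= 2
        f.write('\n'.join(str(value) for value in data))
```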

Imagine a much larger code base in which a stray factor of two appears
and you cannot figure out where it came from.
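One way to remove the surprise is to have the function do only what its name says and leave any scaling to the caller. Below is a minimal sketch. It assumes `data` is a list of numbers written one value per line, a format chosen only for illustration. `measurements` and `scaled` are hypothetical names, and a temporary directory is used so the example does not leave files behind.

```python
import os
import tempfile

def write_data_to_file(data, filename='data.dat'):
    """Write each value in data to filename, one per line, without modifying data."""
    with open(filename, 'w') as f:
        for value in data:
            f.write(f"{value}\n")

measurements = [1.5, 2.0, 3.25]
# Any rescaling is now explicit at the call site and creates a new list.
scaled = [2 * value for value in measurements]

with tempfile.TemporaryDirectory() as tmp:
    write_data_to_file(scaled, os.path.join(tmp, 'data.dat'))

print(measurements)  # prints [1.5, 2.0, 3.25], unchanged
```

Because the factor of two now appears at the call site, anyone reading the calling code can see where it comes from.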

# Classes

## When to use classes

## Encapsulation

## When to use inheritance