# Introduction

"Defensive programming" is a strange name for a topic. Defence against what? Against bugs, of course. Experience has shown that in many projects a large part of development time is spent debugging, so it is highly desirable to follow practices that minimise the number of bugs. In this notebook I will introduce a number of tried-and-true methods for avoiding bugs. They do require extra work, but your future self will thank you.

This notebook is by no means an exhaustive review of all good practices, but it's a good place to start. The idea here is to introduce some important concepts, about which you can read more later, and adapt to your particular application.

# Design

The path to less buggy code starts before you even touch your keyboard. I would go so far as to say that planning is as important as the actual coding, if not more so. We are usually tasked with some complex problem that we hope to solve using a computer. The way to go about it is to divide the problem into several smaller, manageable problems. Unfortunately, there is no magic recipe that tells you the best way to do this. There are, however, a number of widely accepted design principles:

* Resist the temptation to write everything yourself and opt for an existing library whenever possible

* [Keep it Simple, Stupid](https://en.wikipedia.org/wiki/KISS_principle)

* [Don't repeat yourself](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)

* [Separation of Concerns](https://en.wikipedia.org/wiki/Separation_of_concerns)

* [Single Responsibility Principle](https://en.wikipedia.org/wiki/Single_responsibility_principle)

* [Meaningful variable names](https://medium.com/coding-skills/clean-code-101-meaningful-names-and-functions-bf450456d90c)

* [Open/Closed principle](https://en.wikipedia.org/wiki/Open%E2%80%93closed_principle) - Write functions and classes in such a way that they can be extended without editing their source

* Be aware of the viewing space. Avoid line wrapping, and avoid functions whose bodies do not fit on a single screen.

* Choose and adhere to a coding standard
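To make a few of these principles concrete, here is a small sketch (the function names and the comma-separated input format are hypothetical, chosen only for illustration). Parsing, computation and presentation each live in their own function (separation of concerns, single responsibility), and `format_report` reuses `mean` instead of repeating the arithmetic (don't repeat yourself).

```python
def parse_numbers(text):
    """Parsing only: turn comma-separated text into a list of floats."""
    # Blank items (e.g. from a trailing comma) are skipped
    return [float(item) for item in text.split(",") if item.strip()]


def mean(values):
    """Computation only: return the arithmetic mean of a non-empty sequence."""
    if not values:
        raise ValueError("mean() of an empty sequence")
    return sum(values) / len(values)


def format_report(values):
    """Presentation only: build a short human-readable summary."""
    return f"{len(values)} values, mean {mean(values):.2f}"


print(format_report(parse_numbers("1, 2, 3, 4")))
```

Because each function does one thing, each can be tested and replaced on its own; reading the numbers from a file instead of a string, for instance, would only require a different parsing function.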

Another good source of design ideas is [Object Calisthenics](https://williamdurand.fr/2013/06/03/object-calisthenics/). In my opinion, these rules represent an infeasible, unattainable ideal. However, striving toward this ideal will more often than not improve your code.

Finally, you can also draw inspiration for good design from the so-called Zen of Python:

```python
import this
```

```text
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
```


# Documentation

If you are working with other people, or intend for your code to be used by other people, then it goes without saying that you must document your code. However, I argue that you need to document your code even if you are the only one using it. The reason is that your future self will not remember how to use the code. 

As a rule of thumb, when it comes to documentation, more is better. The bare minimum of documentation for a function is:
1. A short description
2. Description of the arguments
3. Description of the output

Python's [docstring](https://www.python.org/dev/peps/pep-0257/) feature also makes the documentation available from the interactive shell (see the example below).

```python
def is_even(num):
    """Checks if a number is even

    Parameters:
    num - An integer

    Returns:
    True if num is even, False otherwise
    """
    return num % 2 == 0

[is_even(4), is_even(3)]
```

```text
[True, False]
```

The documentation can be shown using the `help` function:

```python
help(is_even)
```

```text
Help on function is_even in module __main__:

is_even(num)
    Checks if a number is even

    Parameters:
    num - An integer

    Returns:
    True if num is even, False otherwise
```



I'd like to end with a word of caution. The code is alive, while the documentation is not: nothing forces the documentation to change when the code does. Be sure to update the documentation whenever you change the code.
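One way to keep documentation and code from drifting apart is to put usage examples in the docstring and check them with the standard library's [doctest](https://docs.python.org/3/library/doctest.html) module, so that a stale example shows up as a failure. The sketch below uses a copy of `is_even` with two examples added. In a script one would typically call `doctest.testmod()` or run `python -m doctest -v file.py`; here the finder and runner are driven by hand only so that the snippet behaves the same wherever it is pasted.

```python
import doctest


def is_even(num):
    """Checks if a number is even

    Parameters:
    num - An integer

    Returns:
    True if num is even, False otherwise

    >>> is_even(4)
    True
    >>> is_even(3)
    False
    """
    return num % 2 == 0


# Collect the >>> examples from the docstring and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(is_even, globs={"is_even": is_even}):
    runner.run(test)

# summarize() returns (failed, attempted); run() prints any failures
results = runner.summarize(verbose=False)
print(results)
```

If someone later changes `is_even` without updating the examples, the mismatch is reported instead of silently misleading the next reader.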

# Test Driven Development

Most people, if they test their code at all, write their tests after they've written the code. In this section I will make the case for writing the tests **before** you write the code. This approach is called test-driven development. Besides catching bugs early on, it forces you to think about the interface first, whereas most developers would be preoccupied with the implementation. Below is a demonstration of this approach, using the [unittest](https://docs.python.org/3/library/unittest.html) package.

Suppose we want to write a function that checks whether an integer is divisible by three. We begin with the declaration of a function that does not do anything, and tests that fail:

```python
import unittest

def is_div_three(num):
    
    pass

class TestIsDivThree(unittest.TestCase):

    def test_3(self):
        self.assertEqual(is_div_three(3),True)
    def test_4(self):
        self.assertEqual(is_div_three(4),False)

# argv=[''] keeps unittest from parsing the kernel's command-line arguments;
# exit=False stops it from calling sys.exit() when it finishes
unittest.main(argv=[''], verbosity=2, exit=False)
```

```text
test_3 (__main__.TestIsDivThree) ... FAIL
test_4 (__main__.TestIsDivThree) ... FAIL

FAIL: test_3 (__main__.TestIsDivThree)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-3-e1e85a88bf75>", line 10, in test_3
    self.assertEqual(is_div_three(3),True)
AssertionError: None != True

FAIL: test_4 (__main__.TestIsDivThree)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-3-e1e85a88bf75>", line 12, in test_4
    self.assertEqual(is_div_three(4),False)
AssertionError: None != False

----------------------------------------------------------------------
Ran 2 tests in 0.008s

FAILED (failures=2)

<unittest.main.TestProgram at 0x7ff9695f7e10>
```

The next phase is to implement the function. After this stage, all the tests should ideally pass. If they don't, then either the function or the test needs to be fixed.

```python
import unittest

def is_div_three(num):

    return num % 3 == 0

class TestIsDivThree(unittest.TestCase):

    def test_3(self):
        self.assertEqual(is_div_three(3), True)
    def test_4(self):
        self.assertEqual(is_div_three(4), False)

# argv=[''] keeps unittest from parsing the kernel's command-line arguments;
# exit=False stops it from calling sys.exit() when it finishes
unittest.main(argv=[''], verbosity=2, exit=False)
```

```text
test_3 (__main__.TestIsDivThree) ... ok
test_4 (__main__.TestIsDivThree) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK

<unittest.main.TestProgram at 0x7ff9695575c0>
```

Another good rule of thumb for writing tests is that once a bug has been found and fixed, you should write a test that makes sure the bug does not recur.
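As a sketch of this practice, suppose (hypothetically) that an earlier version of `is_div_three` had mishandled zero and negative numbers. After fixing it, we would add tests that pin down the corrected behaviour. `subTest` lets a single test method check several inputs while reporting each failing input separately.

```python
import unittest


def is_div_three(num):
    """Return True if the integer num is divisible by three."""
    return num % 3 == 0


class TestIsDivThreeRegressions(unittest.TestCase):
    """Regression tests for (hypothetical) bugs that were found and fixed."""

    def test_zero(self):
        # Zero is divisible by every non-zero integer
        self.assertTrue(is_div_three(0))

    def test_negative_numbers(self):
        for num, expected in [(-3, True), (-4, False), (-9, True)]:
            with self.subTest(num=num):
                self.assertEqual(is_div_three(num), expected)


# Run just this test case (works the same in a script or a notebook)
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestIsDivThreeRegressions)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

If the old bug is ever reintroduced, these tests fail and name the offending input.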

# Exceptions

One way to spot errors early on is to check that the data is valid. This can be done using assertions:

```python
def div_by_four(num):

    # Fail loudly if the argument is not an integer
    assert isinstance(num, int)

    return num % 4 == 0

print(div_by_four(4))
print(div_by_four(1))
print(div_by_four('four'))
```

```text
True
False

AssertionError
```
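One caveat: `assert` statements are removed when Python runs with optimisations enabled (`python -O`), so an assertion should not be the only guard on data that comes from outside your code. A common alternative, sketched below, is to check explicitly and raise a specific built-in exception. The choice of `TypeError` and the wording of the message are my own assumptions, not part of the original example.

```python
def div_by_four(num):
    """Return True if the integer num is divisible by four."""
    # Unlike assert, this check is still executed under `python -O`
    if not isinstance(num, int):
        raise TypeError(f"div_by_four() expects an int, got {type(num).__name__}")
    return num % 4 == 0


print(div_by_four(4))
print(div_by_four(1))
try:
    div_by_four('four')
except TypeError as err:
    # The message says what was wrong, not just that something was
    print(err)
```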

When an error (an exception) is not handled, Python stops the program and prints a traceback. However, sometimes we'd like to do something else. For example, we'd like the program to give us more information when there's an error, including information from outside the scope of the function where the error occurred. This can be accomplished with `try`/`except` blocks:

```python
def take_sqrt(num):

    assert num >= 0

    return num**0.5

# Suppose we get the following input from the user
user_input = [1, 2, 3, 4, 5, 6, -7, 8, 9]

# We proceed to calculate the square root of all entries
for index, num in enumerate(user_input):

    try:
        print(take_sqrt(num))
    except AssertionError:
        # The failure is reported here, where the position of the entry is known
        print(f'Something wrong with entry at position {index}. The value there is {num}.')
```

```text
1.0
1.4142135623730951
1.7320508075688772
2.0
2.23606797749979
2.449489742783178
Something wrong with entry at position 6. The value there is -7.
2.8284271247461903
3.0
```
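A variant of the same idea, sketched under the assumption that negative input is the only invalid case: `take_sqrt` raises a `ValueError` with a descriptive message instead of relying on `assert`, and the caller catches that specific exception. The `else` clause runs only when no exception was raised, which keeps the normal path separate from the error handling.

```python
import math


def take_sqrt(num):
    """Return the square root of a non-negative number."""
    if num < 0:
        raise ValueError(f"cannot take the square root of {num}")
    return math.sqrt(num)


user_input = [1, 2, 3, 4, 5, 6, -7, 8, 9]
roots = []
bad_positions = []

for index, num in enumerate(user_input):
    try:
        root = take_sqrt(num)
    except ValueError as err:
        # The message from the raising function is available as err
        bad_positions.append(index)
        print(f"Skipping entry at position {index}: {err}")
    else:
        roots.append(root)

print(roots)
```

Catching `ValueError` rather than a bare `except:` means that unrelated bugs (a typo raising `NameError`, say) still surface instead of being reported as bad input.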


# Lint

After you've written your code, you can have another program go over the source code and try to find errors. This sort of program is called a static code analyser, or linter. We'll use one of the most popular linters for Python, [pylint](https://www.pylint.org/). The following example shows an issue that the Python interpreter doesn't catch, but pylint does:

```shell
cat bad_compare.py
```

```python
def bad_compare(lhs, rhs):

    unnecessary = lhs - rhs

    return rhs == lhs

print(bad_compare(3, 3))
```

```shell
python ./bad_compare.py
```

```text
True
```


```shell
pylint bad_compare.py
```

```text
************* Module bad_compare
bad_compare.py:1:0: C0111: Missing module docstring (missing-docstring)
bad_compare.py:1:0: C0111: Missing function docstring (missing-docstring)
bad_compare.py:3:4: W0612: Unused variable 'unnecessary' (unused-variable)

------------------------------------------------------------------
Your code has been rated at 2.50/10 (previous run: 2.50/10, +0.00)
```


In this example, the code works just fine, but pylint picks up on the fact that there is an unused variable, as well as on the missing docstrings.
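For comparison, here is one way to address all three messages: add a module docstring and a function docstring, and drop the unused variable. This is a sketch; the docstring wording is mine, and the score pylint reports depends on its version and configuration, so I won't predict it.

```python
"""Helpers for comparing values (a cleaned-up version of bad_compare.py)."""


def bad_compare(lhs, rhs):
    """Return True if lhs and rhs are equal."""
    return rhs == lhs


print(bad_compare(3, 3))
```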

# Logging

All the discussion above is about preventing problems. But what do we do when the code doesn't work? The simplest way to diagnose the problem is to insert `print` statements. Unfortunately, not only is this messy, but after you fix the problem you also need to scan the code and remove all the print statements. Luckily, there is a better way: logging. Logging allows you to control the amount of output you get from a function. Typically, you'd like to minimise the output in production, and make it verbose when trying to find a problem. Logging lets you do just that.

```python
import logging

def calc_fibo(n):
    """Return the n-th Fibonacci number, with calc_fibo(1) == calc_fibo(2) == 1."""
    if n < 3:
        return 1

    last_term = 1
    before_last = 1
    # Each iteration computes the next term, starting from the third one
    for _ in range(3, n + 1):
        logging.info('%d', last_term)
        next_term = last_term + before_last
        before_last = last_term
        last_term = next_term
    return last_term
```

Logging off (at the `WARNING` level, `INFO` messages are suppressed):

```python
logging.getLogger().setLevel(logging.WARNING)
calc_fibo(5)
```

```text
5
```

Logging on:

```python
logging.getLogger().setLevel(logging.INFO)
print(calc_fibo(5))
logging.getLogger().setLevel(logging.WARNING)
```

```text
INFO:root:1
INFO:root:2
INFO:root:3

5
```


The `logging` module can do more sophisticated things, such as adding a timestamp to each message or writing the output to a file. See the [documentation](https://docs.python.org/3/library/logging.html).
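As a sketch of both features, the example below sends messages from a named logger to a file, with a timestamp on every line. The logger name `fibo`, the format string and the temporary file location are arbitrary choices for illustration, and `calc_fibo` here is a compact variant that logs through the named logger rather than the root logger.

```python
import logging
import os
import tempfile

# A named logger can be configured independently of the root logger
logger = logging.getLogger("fibo")
logger.setLevel(logging.INFO)
logger.propagate = False  # don't also pass records to the root logger's handlers

# Write records to a file, prefixing each line with a timestamp
log_path = os.path.join(tempfile.mkdtemp(), "fibo.log")
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)


def calc_fibo(n):
    """Return the n-th Fibonacci number, with calc_fibo(1) == calc_fibo(2) == 1."""
    if n < 3:
        return 1
    last_term, before_last = 1, 1
    for _ in range(3, n + 1):
        # %-style arguments are only formatted when a handler formats the record
        logger.info("last_term=%d", last_term)
        last_term, before_last = last_term + before_last, last_term
    return last_term


result = calc_fibo(5)

# Flush and detach the handler before reading the file back
handler.close()
logger.removeHandler(handler)

with open(log_path) as log_file:
    log_lines = log_file.read().splitlines()
print(result)
print("\n".join(log_lines))
```

Using a named logger per module (commonly `logging.getLogger(__name__)`) lets you turn logging up for one part of a program without flooding the output from the rest.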

# Debugging

Debugging is the last resort you turn to when all else fails. A debugger lets you examine the program while it is running, but it is excruciating to use. Python's built-in debugger is pdb; in notebooks, ipdb offers the same commands plus IPython features such as tab completion and syntax highlighting. The most common commands are:

* n - Next line (steps over function calls)
* s - Step into function
* b - Set breakpoint
* c - Continue until the next breakpoint
* p - Print the value of an expression
* q - Quit
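To see some of these commands without an interactive session, the sketch below drives the standard library's `pdb` programmatically: `Pdb.runcall` stops as soon as the function is entered, and the commands are read from a string instead of the keyboard. This is purely for illustration (the helper `scripted_debug` is hypothetical). In everyday use you would call the built-in `breakpoint()` (Python 3.7+) or `pdb.set_trace()` where you want to stop, and type the commands yourself.

```python
import io
import pdb


def calc_fibo(n):
    """Return the n-th Fibonacci number, with calc_fibo(1) == calc_fibo(2) == 1."""
    if n < 3:
        return 1
    last_term, before_last = 1, 1
    for _ in range(3, n + 1):
        last_term, before_last = last_term + before_last, last_term
    return last_term


def scripted_debug(func, *args, commands):
    """Run func(*args) under pdb, reading debugger commands from a string.

    Returns the function's result and everything the debugger printed.
    """
    transcript = io.StringIO()
    debugger = pdb.Pdb(stdin=io.StringIO(commands), stdout=transcript,
                       nosigint=True, readrc=False)
    result = debugger.runcall(func, *args)
    return result, transcript.getvalue()


# Stop on entry, print the argument n, then continue to the end
result, transcript = scripted_debug(calc_fibo, 5, commands="p n\nc\n")
print(transcript)
print(result)
```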

```shell
pip install ipdb --user
```

```text
Collecting ipdb
  Downloading https://files.pythonhosted.org/packages/6d/43/c3c2e866a8803e196d6209595020a4a6db1a3c5d07c01455669497ae23d0/ipdb-0.12.tar.gz
Building wheels for collected packages: ipdb
  Building wheel for ipdb (setup.py) ... done
  Stored in directory: /home/nbuser/.cache/pip/wheels/59/24/91/695211bd228d40fb22dff0ce3f05ba41ab724ab771736233f3
Successfully built ipdb
Installing collected packages: ipdb
Successfully installed ipdb-0.12
```

```python
import ipdb
ipdb.set_trace()

fibo_array = []
for n in range(5):
    fibo_array.append(calc_fibo(n))
```

```text
--Return--
None
> <ipython-input-8-9a7dcf5d2162>(2)<module>()
      1 import ipdb
----> 2 ipdb.set_trace()
      3 

ipdb> b 6
Breakpoint 2 at <ipython-input-8-9a7dcf5d2162>:6
ipdb> c
None
> <ipython-input-8-9a7dcf5d2162>(6)<module>()
      4 fibo_array = []
      5 for n in range(5):
2---> 6     fibo_array.append(calc_fibo(n))

ipdb> p n
0
ipdb> c
None
> <ipython-input-8-9a7dcf5d2162>(6)<module>()
      4 fi
```

# Version Control

Suppose you have a library that you have tested and that works. One day, you decide to refactor it. Maybe you want to optimise it and make it faster, or add more features. The problem is that whenever you change the code, you run the risk of breaking it. The best tool against this is version control. Version control lets you save multiple versions of your code, and restore a previous version in case something breaks. It also lets you [collaborate more effectively](https://www.atlassian.com/git/tutorials/comparing-workflows). The most popular version control system today is [git](https://git-scm.com/doc), and popular repository hosting services include GitHub and Bitbucket. Below are the most commonly used commands:
* clone - Creates a copy of the repository on your local machine
* pull - Fetches changes from the server and merges them into your local copy
* commit - Records your changes in your local repository
* push - Transmits your local commits to the server

In addition, git provides another set of commands that is especially useful for refactoring. The basic problem is that between the current state of the code and its state after refactoring, the code might be broken. This can be a problem, since you might also want to use the code in the meantime. The solution is to make two copies of the code: one that stays operational, and another that is safe to break. This is called branching. After you are done making changes on the second copy and you are satisfied with the result, you can merge the two copies back together. The corresponding commands are aptly called "branch" and "merge". The command "checkout" lets you switch between branches.

# Continuous Integration

Ideally, you'd like to test the code every time you make a change. However, doing this manually would be extremely tedious. Luckily, we can get a computer to do it for us. This practice is called continuous integration. The way it works is that a server watches the repository and runs a series of tests whenever a commit is pushed. One of the more popular continuous integration tools is Travis CI. Setting up continuous integration would take too long for this tutorial, but you can see an example that I've already set up [here](https://travis-ci.org/bolverk/huji-rich). If nothing else, continuous integration can help you when you argue with your colleagues about who broke the code, as seen in the example below.

<img src='travis_demo.jpg'>