<img src="images/Picture0.png" width=200x />

# Notebook 07 - Elements of Software Engineering
In this module you will learn the basics of software engineering and the best practices typically used in industry. A Jupyter notebook is a great tool for interactive scripting, writing tutorials, and prototyping code. Larger projects, however, are typically organized as Python modules shared among team members. To help integrate the work of the many contributors to a project, it is often necessary to adopt the following best practices.

## Instructions
Read the material below and complete the exercises.

Material covered in this notebook:

- Understanding Coding Standards
- The need for documentation
- Source Code Management - Version Control and git
- Working from the command line of a terminal
- Testing your code - the importance of unit testing and regression testing

### Pre-requisites
Notebooks 01 to 05

### Credits
Original version by Martin-D. Lacasse @ INMAS



## Coding Standards and Documentation
As Python is a very flexible language, there are often multiple ways to achieve the same goal. For example, a simple loop can be coded by using an index,

```python
for i in range(len(x)):
```
or using an iterator,
```python
for item in x:
```
or even with a forever loop such as
```python
while 1:
```
with an appropriate `break` condition.
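The three styles can be compared side by side in a small runnable sketch. The list `x` and its contents are made up for illustration. All three loops visit the same elements in the same order:

```python
x = ['a', 'b', 'c']

# 1. Loop over indices, then index into the list
by_index = []
for i in range(len(x)):
    by_index.append(x[i])

# 2. Loop directly over the items (iterator protocol)
by_item = []
for item in x:
    by_item.append(item)

# 3. "Forever" loop with an explicit break condition
by_while = []
i = 0
while True:
    if i >= len(x):
        break  # stop once every element has been visited
    by_while.append(x[i])
    i += 1

print(by_index == by_item == by_while)  # True
```

When both the index and the item are needed, `enumerate(x)` yields `(index, item)` pairs without explicit indexing.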

### Exercise
<div class="alert alert-block alert-info">
- It is recommended to use iterators whenever possible. In a few situations, however, iterators can lead to problems. What are these cases?
</div>

The purpose of this introduction is not to propose a particular coding standard. Instead, it aims to make you aware that coding standards exist, and that you should (1) think about adopting a consistent style in your own Python scripts and (2) ask whether such a standard exists in your future assignments. A good place to start is the de facto __[style guide for Python code](https://peps.python.org/pep-0008/)__, known as PEP 8, co-authored by Python's creator, Guido van Rossum.

### Exercise
<div class="alert alert-block alert-info">
- Take a few minutes to browse PEP 8 and discuss what to do when an expression is too long to fit on a single line. Should the line break come before or after a binary operator?
<br>- What is the recommended line length for Python code? Why?
<br>- Discuss your pet peeves if any.
</div>

Coding standards are intrinsically boring, but they are a necessary tool in team environments. Fortunately, __[automated tools](https://github.com/life4/awesome-python-code-formatters)__ can reformat code to comply with the parts of a standard that govern layout, such as where to put spaces (or not). As an example, consider running `yapf` (Yet Another Python Formatter, from Google) on the following (ugly) code:

```python
x = {  'a':37,'b':42,

'c':927}

y = 'hello ''world'
z = 'hello '+'world'
a = 'hello {}'.format('world')
class foo  (     object  ):
  def f    (self   ):
    return       37*-+2
  def g(self, x,y=42):
      return y
def f  (   a ) :
  return      37+-+a[42-x :  y**3]
```

The formatter turns it into:

```python
x = {'a': 37, 'b': 42, 'c': 927}

y = 'hello ' 'world'
z = 'hello ' + 'world'
a = 'hello {}'.format('world')


class foo(object):
    def f(self):
        return 37 * -+2

    def g(self, x, y=42):
        return y


def f(a):
    return 37 + -+a[42 - x:y**3]
```
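A formatter only changes layout, so the reformatted code should mean exactly the same thing as the original. The sketch below checks this by comparing the abstract syntax trees of both versions. It uses the standard-library `ast` module, not yapf itself, and the helper name `same_ast` is hypothetical:

```python
import ast

# The "ugly" and reformatted snippets from above, stored as strings
ugly = """
x = {  'a':37,'b':42,

'c':927}

y = 'hello ''world'
z = 'hello '+'world'
a = 'hello {}'.format('world')
class foo  (     object  ):
  def f    (self   ):
    return       37*-+2
  def g(self, x,y=42):
      return y
def f  (   a ) :
  return      37+-+a[42-x :  y**3]
"""

pretty = """
x = {'a': 37, 'b': 42, 'c': 927}

y = 'hello ' 'world'
z = 'hello ' + 'world'
a = 'hello {}'.format('world')


class foo(object):
    def f(self):
        return 37 * -+2

    def g(self, x, y=42):
        return y


def f(a):
    return 37 + -+a[42 - x:y**3]
"""

def same_ast(source_a, source_b):
    '''Return True if both sources parse to the same abstract syntax tree.'''
    # ast.dump() omits line and column attributes by default, so layout is ignored
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))

print(same_ast(ugly, pretty))  # True
```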

Another unpopular topic is documentation. Prototype code is usually poorly documented because it is constantly changing. Python offers several ways to document code as it is being written, specifically named (keyword) arguments and docstrings for functions and modules. Make it a habit to use descriptive names for variables and named arguments. The main customer of your code is you: make sure you will still be able to understand your own code after six months of not seeing it.

In the following cell, we define a function with a docstring.

In [None]:
def double(x):
    '''Return twice the number provided in the argument'''
    return 2*x

The text is stored in the function's `__doc__` attribute.

In [None]:
double.__doc__

A docstring can also span multiple lines, as in the following example:

In [None]:
def add_binary(a, b):
    '''
    Return a string representing the sum of two decimal numbers in binary digits.

            Parameters:
                    a (int): A decimal integer
                    b (int): Another decimal integer

            Returns:
                    Binary string of the sum of a and b
    '''
    binary_sum = bin(a+b)[2:]
    return binary_sum

In [None]:
add_binary(1024, 1)

In [None]:
add_binary.__doc__

In [None]:
print(add_binary.__doc__)
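When a docstring spans several indented lines, the standard-library function `inspect.getdoc()` returns it with the common indentation and the surrounding blank lines removed. `help()` displays the docstring together with the function signature. The following self-contained sketch uses a shortened copy of `add_binary` so that the cell runs on its own:

```python
import inspect

def add_binary(a, b):
    '''
    Return a string representing the sum of two decimal numbers in binary digits.

        Parameters:
            a (int): A decimal integer
            b (int): Another decimal integer
    '''
    return bin(a + b)[2:]

# inspect.getdoc() removes the common indentation and leading/trailing blank lines
print(inspect.getdoc(add_binary))

# help() shows the function signature followed by the docstring
help(add_binary)
```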

## Version Control
Jupyter Notebook provides checkpoints that let you recall previous versions of a notebook. In team environments, however, `git` is often the preferred tool for incorporating contributions from the various team members. The basic functionality of `git` is relatively simple to master and might even be useful to you when you start writing your thesis in LaTeX. A good place to start is to read a __[short tutorial](https://git-scm.com/doc)__ on git and install it on your computer. This __[web site](https://git-scm.com/)__ is the main resource for the `git` source code management tool. Several websites (e.g., __[GitHub](https://github.com/)__, __[GitLab](https://about.gitlab.com/)__, ...) can also host your project for free. Companies most often use an internal server for that purpose.

### Optional Exercise
<div class="alert alert-block alert-info">
- Consider installing `git` on your computer. Download the notebooks using `git clone` from a terminal. 
</div>

## Modules, Parameter Files, and Command Line Basics
Software reusability can significantly reduce development costs. In Python, the main mechanism for re-using code is the module. In this section, we will go beyond the Jupyter notebook and implement a few modules from the command line. To start, open a terminal window: the Anaconda PowerShell Prompt on Windows, or a terminal on macOS or Linux. First, check that Python is in your PATH by running:
```
python --version
```
Note that on Windows, a shell started from the Anaconda Navigator will have the PATH variable properly configured. 

Good! Next, change to the directory where you stored the files from the workshop. This is also where we will write our own modules. If you have never used a command line, the following commands will get 90% of the work done:
- `pwd` Print working directory
- `cd where` Change directory to `where`
- `mkdir newdir` Make a new directory called `newdir`
- `ls` List the files contained in the current directory
- `mv old new` Move or rename a file

A Python script is run as:
- `python main.py`
- `python main.py arg1 arg2 ...` if you want to pass arguments to the script
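Inside the script, these arguments arrive as a list of strings in `sys.argv`, whose first element is the script name. Here is a minimal sketch; the helper name `describe_args` is hypothetical:

```python
import sys

def describe_args(argv):
    '''Split an argv-style list into the script name and its arguments.'''
    script, *args = argv
    return script, args

if __name__ == '__main__':
    script, args = describe_args(sys.argv)
    print('script:', script)
    print('arguments:', args)  # always strings; convert them as needed
```

If this file is saved as `main.py` and run as `python main.py arg1 arg2`, it prints `script: main.py` followed by `arguments: ['arg1', 'arg2']`.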

Most scripts require (re-)configurable parameters to run. In a single-file Jupyter notebook, these values often end up hard-coded. To make the code re-usable, it helps to keep these parameters separate from the algorithm itself. Configuration values can generally be passed to a Python script in several ways:
1. Through a parameter file
2. Through command line arguments
3. Through a Graphical User Interface

Here, we will mainly address 1 and 2 and leave the GUI topic for an advanced exercise.

The following simple code reads a parameter file. For each line of the form `name = value`, it creates a variable with that name and assigns the value to it.

In [None]:
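def readParameters(filename):
    '''Read run-time parameters from a text file'''
    with open(filename, 'r') as file:
        for line in file:
            variable, value = [word.strip() for word in line.split('=')]
            # Strings are immutable: keep the result of replace()
            variable = variable.replace(' ', '_')
            pythoncode = variable + '=' + value
            # Execute in the global namespace so the new variable
            # is still available after the function returns
            exec(pythoncode, globals())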

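Notice how this function generates new code as it reads the parameter file. This is possible because Python is a dynamically typed, interpreted language. Keep in mind that `exec()` runs whatever the file contains, so this approach is only appropriate for parameter files you trust.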
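One subtlety: inside a function, `exec()` called without an explicit namespace cannot reliably create local variables that the rest of the function can use. Passing a dictionary as the namespace keeps the generated variables together in one place. A minimal sketch, with the hypothetical helper name `exec_into_namespace`:

```python
def exec_into_namespace(code):
    '''Execute assignment statements and return the variables they create.'''
    namespace = {}
    exec(code, namespace)                # new variables are stored in `namespace`
    namespace.pop('__builtins__', None)  # exec() inserts this key; drop it
    return namespace

params = exec_into_namespace("alpha = 3.5\nlabel = 'run1'")
print(params)  # {'alpha': 3.5, 'label': 'run1'}
```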

### Exercise
<div class="alert alert-block alert-info">
- Read module Code_07/params.py as an example of how this could be implemented.
<br>- Using `try` and `except`, modify the function above to better handle errors.
<br>- Print error messages to standard error (`stderr`).
<br>- (Optional) Modify the code to store values in a dictionary keyed by parameter name (test your code with the file `parameters.txt` provided to you).
</div>


### Putting it all together
We will now run the following script from a command line. For that purpose, we provide you with a file called *main_1.py* located in the *Code_07* directory. This file looks as follows:
```python
#!/usr/bin/env python3
'''
A prototypical main file demonstrating how to read parameters from a file
Martin-D. Lacasse, JHU 2022
'''

import sys
import params

# Print usage and exit
def printHelp(name):
    print("Usage: ", name, "filename.par")
    sys.exit(1)

def run():
    try:
        filename = sys.argv[1]
    except IndexError:
        # No file name was given on the command line
        printHelp(sys.argv[0])

    myDico = params.readParameters(filename)
    params.printParameters(myDico)

#####################################################################
# This is the main program
if __name__ == '__main__':
    run()
else:
    print("Error: Can't import main script as a module.", repr(__name__))
```
Notice how the function definitions are separated from the main part of the program. Notice also that the script prints an error message instead of running when it is imported as a module. This practice will push you to modularize your code and to design an architecture that is easier to understand and maintain.

Using the Files tab of Jupyter, navigate to the *Code_07* directory and open `main_1.py` in a separate tab. You can then edit the file in plain, vi, or emacs mode, depending on your preferred code editor. Also look at the file `params.py`, which is imported here.

We can now run this main file from Jupyter using the *bang* (`!`) prefix, which passes the rest of the line to the shell:


In [None]:
!python Code_07/main_1.py Code_07/parameters.txt

Alternatively, it can be run from a terminal using
```shell
python main_1.py parameters.txt
```
from the Code_07 directory.

<div class="alert alert-block alert-warning">
- (Advanced) On Linux and macOS, the script can also be run directly by name, e.g., `./main_1.py parameters.txt`, once it has been made executable with `chmod +x main_1.py`. The first line of the file (`#!/usr/bin/env python3`) tells the system which interpreter to use. On Windows, depending on the security configuration of your computer, running Python scripts directly by name in a terminal may be disallowed unless changes are made to the registry.
</div>

### Arguments passed on the command line
Another way to pass parameters is through command-line arguments. A classic approach uses the `getopt()` function familiar from C (it is part of POSIX), which Python makes available through the *getopt* module. The following file, `main_2.py`, shows a typical usage of `getopt.getopt()`.

```python
#!/usr/bin/env python3

# A prototypical main file with parameters from command line arguments
# Martin-D. Lacasse, JHU 2022

import sys
import getopt
import params

# Print usage and exit with the given status (0 for -h, non-zero on errors)
def printHelp(name, status=0):
    print("Usage: ", name, "[-h] [-a a_param] [-b b_param] [-c c_param] [-s sourceCode] [-d desiredOutcome]")
    sys.exit(status)

# Parse options from command line
def processCommandLineArgs(argv):
    progName = argv[0]
    argList = argv[1:]
    # Default values
    a, b, c, sourceCode, desiredOutcome = 0, 0, 0, 'none', 'failure'

    # Short options: a trailing ':' means that the option takes a value
    options = "ha:b:c:s:d:"

    # Long options: a trailing '=' means that the option takes a value
    longOptions = ["help", "a=", "b=", "c=", "sourceCode=", "desiredOutcome="]

    try:
        # Parsing arguments (non-option arguments are returned in args)
        opts, args = getopt.getopt(argList, options, longOptions)

        # Checking each argument
        for opt, val in opts:
            if opt in ("-h", "--help"):
                printHelp(progName)
            elif opt in ("-a", "--a"):
                a = int(val)
            elif opt in ("-b", "--b"):
                b = int(val)
            elif opt in ("-c", "--c"):
                c = int(val)
            elif opt in ("-s", "--sourceCode"):
                sourceCode = val
            elif opt in ("-d", "--desiredOutcome"):
                desiredOutcome = val

    except getopt.GetoptError as err:
        # Unknown option or missing value: report it on stderr and exit
        print(str(err), file=sys.stderr)
        printHelp(progName, 2)

    return a, b, c, sourceCode, desiredOutcome

def run():
    a, b, c, sourceCode, desiredOutcome = processCommandLineArgs(sys.argv)
    print('Arguments are:', 'a=', a, 'b=', b, 'c=', c, 'sourceCode=', sourceCode, 'desiredOutcome=', desiredOutcome)


#####################################################################
# This is the main program
if __name__ == "__main__":
    run()
else:
    print("Error: Can't import main script as a module.", repr(__name__))
```
This script reads parameters from the command line and overrides the default values assigned when the variables are first initialized. Notice how the strings provided on the command line need to be converted to `int`.
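The documentation of the `getopt` module points users who prefer less code and better help and error messages to the `argparse` module. The following sketch declares the same options with `argparse`. It is an alternative shown for comparison, not what `main_2.py` uses, and the function name `build_parser` is hypothetical:

```python
import argparse

def build_parser():
    '''Build a parser for the same options as main_2.py.'''
    parser = argparse.ArgumentParser(description='Prototypical main file with command-line parameters')
    # type=int converts the command-line string for us; -h/--help is added automatically
    parser.add_argument('-a', '--a', type=int, default=0, help='a parameter')
    parser.add_argument('-b', '--b', type=int, default=0, help='b parameter')
    parser.add_argument('-c', '--c', type=int, default=0, help='c parameter')
    parser.add_argument('-s', '--sourceCode', default='none', help='source code')
    parser.add_argument('-d', '--desiredOutcome', default='failure', help='desired outcome')
    return parser

# An explicit list is parsed here; a script would call parse_args() with no argument to read sys.argv
args = build_parser().parse_args(['-a', '2', '-b', '3'])
print(vars(args))  # the parameters as a dictionary
```

Because `vars(args)` returns a dictionary, this approach also fits well with the dictionary-based parameters of `main_1.py`.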

### Exercise
<div class="alert alert-block alert-info">
- Run the script through the following cell. 
<br>- Try different values and different parameters, in both their long and short forms.
<br>- Try entering a parameter that does not exist, e.g., `-z`.
</div>


In [None]:
!python Code_07/main_2.py -a 2 -b 3

Script `main_2.py` is a good starting point for reading parameters from the command line. However, it has several deficiencies. The code is not re-usable because all default parameters, including their names, are hard-coded in the function, which is a clear handicap compared to the dictionary approach used in `main_1.py`. Also, the parameters have mixed types, which limits re-usability. Let's fix this.

### Optional Exercise
<div class="alert alert-block alert-info">
- Duplicate `main_2.py` as `main_3.py` and modify the file to use a dictionary to parse and store parameters. Use your favorite editor and the same code-generation trick as in the first params module. Create a new module called `params2.py` that will contain your new function and make it re-usable.
<br>- Incorporating the original function from the first `params.py`, implement reading parameters from a parameter file, from command-line arguments, or from both. Which mode should take precedence? Should the user be warned when a parameter is overridden?
</div>

### Optional Exercise
<div class="alert alert-block alert-info">    
- Include a way to distinguish between optional and required parameters (Hint: use a reserved default value, e.g. `undef`, for required parameters).
<br>- Allow for comments in the parameter file by detecting and ignoring lines with a leading '#' character.
<br>- Implement the capability of recursively including other files using the '@include' keyword. Should you test for circular inclusions through a call depth counter?
</div>

Modularization is an important way to reduce the size of your code. Always remember that a line of code is more a liability than an asset: less is always better. Every cut and paste should raise a red flag in your mind: it signals a missed opportunity to write a function or a module.

## Unit and Regression Testing
Once we develop our own modules, it is important to define tests that verify the conditions of use of our new algorithms. In most team environments, contributors are asked to run all tests before pushing their changes to the common code repository. As an additional incentive, source code management tools (Subversion, git, etc.) have a `blame` command that shows who last modified each line, which makes it easy to trace a broken build back to the change that caused it. Testing a function in isolation is called __unit testing__, while testing how the functions interact with one another is called __integration testing__. Re-running the whole test suite after every change, to make sure that what used to work still works, is called __regression testing__. Let's look at our newly created module for reading parameters as an example.

A robust parameter reader should detect
- duplicate entries
- missing or poorly formatted assignments
- lines starting with a '#' and treat them as comments
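These requirements translate directly into tests. The following sketch shows a hypothetical `parse_parameters()` function, which takes the lines of a file and returns a dictionary, together with one test per requirement. Both the function and the test names are illustrative, not the contents of `params.py`. Values are converted with `ast.literal_eval()`, which is swapped in here instead of `exec()`:

```python
import ast

def parse_parameters(lines):
    '''Hypothetical parameter reader: return a dict mapping name -> value.

    Blank lines and lines starting with '#' are ignored; duplicate entries
    and malformed assignments raise ValueError.
    '''
    parameters = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if '=' not in line:
            raise ValueError(f"line {lineno}: missing '=' in {line!r}")
        name, value = (part.strip() for part in line.split('=', 1))
        if not name.isidentifier():
            raise ValueError(f"line {lineno}: invalid name {name!r}")
        if name in parameters:
            raise ValueError(f"line {lineno}: duplicate entry {name!r}")
        try:
            # literal_eval only accepts literals (numbers, strings, lists, ...)
            parameters[name] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            raise ValueError(f"line {lineno}: cannot parse value {value!r}") from None
    return parameters

def test_comments_are_ignored():
    assert parse_parameters(['# a comment', 'alpha = 1.5']) == {'alpha': 1.5}

def test_duplicate_entry_is_detected():
    try:
        parse_parameters(['alpha = 1', 'alpha = 2'])
    except ValueError:
        return  # expected
    raise AssertionError('duplicate entry was not detected')

def test_missing_assignment_is_detected():
    try:
        parse_parameters(['alpha 1'])
    except ValueError:
        return  # expected
    raise AssertionError("missing '=' was not detected")

test_comments_are_ignored()
test_duplicate_entry_is_detected()
test_missing_assignment_is_detected()
```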

We will now introduce the common approach to writing unit tests in Python. Tests are themselves functions whose names start with the `test_` prefix and which check results with `assert` statements. They typically live in a separate file placed alongside the module they test. In the interest of time, we will use simple numerical examples to illustrate how to proceed. Before we start, however, we need to discuss an important point about comparing floating-point numbers. Let's start with a case where a simple comparison works. Say that we have the following function, which normalizes a 3D vector represented as a tuple. With NumPy, the same operation is a one-liner, `v / numpy.linalg.norm(v)`, so the function below only serves as a short representative example.

In [None]:
def normalize(v):
    from math import sqrt
    norm = sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2])
    if norm == 0.:
        v = (0., 0., 0.)
    else:
        v = (v[0]/norm, v[1]/norm, v[2]/norm)
    return v

Writing this function raises a few questions:
1. Should the vector be normalized in place, or should a new tuple be created? (Libraries typically implement both approaches and use the past participle to distinguish between the two, e.g., `normalize()` vs. `normalized()`.)
2. How should a zero vector be handled to avoid a division by zero?
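The distinction raised in the first question can be sketched as follows. The names `normalized()` and `normalize_in_place()` are hypothetical, chosen so as not to clash with the notebook's `normalize()`. Since tuples are immutable, only a mutable list can be normalized in place:

```python
from math import sqrt

def normalized(v):
    '''Return a new unit-length tuple; the input vector is left unchanged.'''
    norm = sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2])
    if norm == 0.:
        return (0., 0., 0.)
    return (v[0]/norm, v[1]/norm, v[2]/norm)

def normalize_in_place(v):
    '''Scale the list v to unit length in place and return None.'''
    norm = sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2])
    if norm != 0.:
        for i in range(3):
            v[i] /= norm

u = [3., 0., 4.]
print(normalized(u))   # (0.6, 0.0, 0.8); u is unchanged
normalize_in_place(u)
print(u)               # [0.6, 0.0, 0.8]
```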

To answer the second question, we compare a floating-point number with zero. We can get away with an exact comparison here, but most other cases call for something more sophisticated. For that purpose, we need the concept of machine epsilon. Machine epsilon is the smallest floating-point number which, when added to 1, gives a result different from 1; any smaller increment is lost to rounding. Running the following code gives an estimate of machine epsilon on your system.

In [None]:
eps = 1.0
while 1 + eps > 1:
    eps /= 2
eps *= 2

print('Your machine epsilon is: ', eps)
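On virtually all current hardware, Python floats are IEEE 754 double-precision numbers, and the standard library reports their machine epsilon in `sys.float_info.epsilon`. The following self-contained sketch compares that value with the result of the loop above:

```python
import sys

eps = 1.0
while 1.0 + eps > 1.0:
    eps /= 2      # halve until adding eps to 1 no longer changes the result
eps *= 2          # step back to the last value that still made a difference

print(eps)                            # 2.220446049250313e-16 for IEEE 754 doubles
print(eps == sys.float_info.epsilon)  # True
print(eps == 2.0**-52)                # True: doubles store 52 fraction bits
```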

The numerator and the denominator in `normalize()` shrink at similar rates. As a result, the division itself only becomes a problem when the vector is truly zero. Let's test this claim by trying to make our function fail:

In [None]:
normalize([eps, 0, 0])

In [None]:
normalize( (1.e-30, 1.e-21, 1.e-32) )

In [None]:
normalize((0., 0., 0.))

Great! A simple comparison with 0 seems to work. Let's now consider this simple equality (please run):

In [None]:
a = 0.1
b = 0.2
c = 0.3
a + b == c

Surprised? Numbers such as 0.1, 0.2, and 0.3 have no exact binary floating-point representation. Each one is stored with a small relative error of the order of epsilon, and here those errors do not cancel. A better way to test is as follows: 

In [None]:
abs((a + b) - c) < abs(c)*eps

Notice how the tolerance on the right-hand side is scaled by `abs(c)`. Epsilon is defined relative to 1, so the comparison must be made relative to the magnitude of the numbers involved. We are now ready to write our first unit test. In the interest of time, we will write tests that verify existing functions. Say that you just wrote a faster trigonometric library; let's write its first test. 
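The standard library packages this kind of relative comparison as `math.isclose()`. Its `rel_tol` parameter (default `1e-09`) sets the relative tolerance, and its `abs_tol` parameter (default `0.0`) sets the absolute tolerance. Note that a purely relative tolerance never accepts a nonzero value as close to exactly `0.0`, which is why `abs_tol` exists. A short sketch:

```python
import math

a, b, c = 0.1, 0.2, 0.3

print(a + b == c)                               # False: exact comparison
print(math.isclose(a + b, c))                   # True: default rel_tol=1e-09
print(math.isclose(1e-20, 0.0))                 # False: relative tolerance fails near zero
print(math.isclose(1e-20, 0.0, abs_tol=1e-12))  # True with an absolute tolerance
```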

### Exercise
Run and see if you need to fix a few things...

In [None]:
import math
def mycos(x):
    '''Placeholder to define my own cosine function'''
    return math.cos(x)

def test_mycos1():
    '''Test mycos() with respect to inverse function over float range'''
    import random
    i = 0
    while (i < 10000):
        i += 1
        x = 2*random.random() - 1.
        assert(mycos(math.acos(x)) == x)
        
test_mycos1()

Congratulations on your first test! Test functions like this one would typically reside in a module called, say, `test_mytrig` if your library were `mytrig`. Once written, these tests can be automated and run before each commit to make sure that your shiny new features have not broken anything in the existing software framework. Automation can be implemented with various tools, including Python's `unittest` module, but these topics are beyond the scope of this introductory tutorial. These tools also provide richer comparison methods. For example, `assertEqual()` uses type-specific comparisons for lists, tuples, dicts, sets, and strings to produce more informative failure messages, and `assertAlmostEqual()` handles floating-point values.
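For a taste of what this looks like, here is a sketch of `mycos()` tests written with the standard `unittest` module. The class name `TestMyCos`, the fixed random seed, and the tolerance `places=12` are illustrative choices. `assertAlmostEqual(first, second, places)` passes when the difference rounds to zero at the given number of decimal places:

```python
import math
import random
import unittest

def mycos(x):
    '''Placeholder to define my own cosine function'''
    return math.cos(x)

class TestMyCos(unittest.TestCase):
    def test_known_values(self):
        self.assertAlmostEqual(mycos(0.0), 1.0)
        self.assertAlmostEqual(mycos(math.pi), -1.0)

    def test_inverse(self):
        rng = random.Random(12345)  # fixed seed: the test is reproducible
        for _ in range(1000):
            x = 2*rng.random() - 1.
            self.assertAlmostEqual(mycos(math.acos(x)), x, places=12)

# In a notebook, run the tests explicitly rather than calling unittest.main()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMyCos)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print('All tests passed:', result.wasSuccessful())
```

If the tests are saved in a file named `test_mytrig.py`, running `python -m unittest test_mytrig` from a terminal will discover and run them.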

Testing is an important part of software development. It is generally thought of as a way to discover and prevent bugs, but testing can also drive design. The traditional waterfall method consists of designing software to meet predefined capabilities and then implementing it. In contrast, __[test-driven development](https://en.wikipedia.org/wiki/Test-driven_development)__ writes a test first and then the code that makes it pass. It can be a great way to accelerate your project.

Given our time constraints, we now need to jump to our next topic: debugging. At this point, however, you should remember that:
- Coding standards do exist and you should inquire about best practices
- Teams use source code management tools such as `git`, which can also be beneficial to you in your PhD work
- Code is a liability more than an asset. Never cut and paste - use functions
- Modules are a great way to achieve re-usability. A long self-contained notebook is more appropriate for completing an assignment. Multiple modular files are better suited for devising, testing, and managing a scalable project.
- Drop the mouse for a sec! A little knowledge of command line functions can get you a long way. 
- Software development in teams is often governed by established software design approaches and human engineering methods.
- Test your software. These tests can also help you design your code.
- Make a habit of documenting your code: you will be the first to thank yourself.