# Prototyping vs. production

When we develop some functionality or analysis, we might do it in a Jupyter notebook, or in another IDE where we can run pieces of code separately. This is the development stage, also called prototyping. However, once the functionality is finished, we no longer want to fiddle with the environment, change parameters in the code manually, select lines to run, etc. We want clarity, predictability and replicability from our code, because these are the cornerstones of good coding practice (incidentally, they are also the cornerstones of good scientific practice). For that reason we create scripts.

A script is a sequence of lines of code that is run from top to bottom in one go, usually from the console, without opening any development environment.

Scripts are essentially no different from development code, except that, once you have finished developing them, they are intended to be run "as is", without being modified every time you need to run them. They therefore need a different set of controls, as we will see below.

# Running a script

Let's run a very simple script in the console. In the course folder, create a file named `numpy_version_test.py` with the following content:

In [4]:
%%writefile numpy_version_test.py
import numpy as np
print('This environment has numpy version {} installed'.format(np.__version__))

This environment has numpy version 1.14.2 installed


Open the terminal in that folder and run

    python numpy_version_test.py
    
That's all there is to it!

# Arguments

During development we may have some variables whose values we change depending on the result we want. It might be the name of the data file to be loaded, or the cutoff frequencies of a bandpass filter. If we were to create a script which processes the data or does the filtering, we need a way to give it the parameters to use for a particular run.

Python collects anything you write after the name of the script in a special variable: import the module `sys`, and its attribute `argv` will hold the script name followed by all of the arguments, as a list of strings. Run the following cell; with IPython magic you can create files directly from the notebook!

In [14]:
%%writefile arguments_test.py
import sys
print('Submitted arguments are',sys.argv)

Overwriting arguments_test.py


With IPython you can also issue console commands directly from the notebook: just put `!` before the command. But I urge you to try running this in the console first. Use the console for large scripts; it will also teach you to work with remote machines (you usually have only the console to control these).

Run in console:

    python arguments_test.py these will be arguments
    
Try running in the notebook:

In [29]:
!python arguments_test.py these will be arguments

Submitted arguments are ['arguments_test.py', 'these', 'will', 'be', 'arguments']


In this way you can submit the names of files, values for some functions, etc., to the script. Of course, every argument arrives as a string, so you will need to parse them inside the script!
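
As a rough illustration of what "parsing them inside the script" could look like, here is a minimal sketch. The script name `filter_data.py`, the function `parse_arguments` and the three expected arguments are hypothetical and chosen only for this example. The argument list is hard-coded so that the snippet also runs inside a notebook; a real script would pass `sys.argv` instead.

```python
def parse_arguments(argv):
    """Turn the raw strings from the command line into typed values.

    Hypothetical call: python filter_data.py DATA_FILE LOW_HZ HIGH_HZ
    """
    # argv[0] is always the script name, so the real arguments start at index 1
    arguments = argv[1:]
    if len(arguments) != 3:
        raise ValueError('Expected 3 arguments: DATA_FILE LOW_HZ HIGH_HZ')

    data_file = arguments[0]       # file name stays a string
    low_hz = float(arguments[1])   # numbers arrive as strings and
    high_hz = float(arguments[2])  # must be converted explicitly
    return data_file, low_hz, high_hz


# Hard-coded stand-in for sys.argv (hypothetical values)
example_argv = ['filter_data.py', 'recording.csv', '0.5', '40']
data_file, low_hz, high_hz = parse_arguments(example_argv)
print('Filtering', data_file, 'between', low_hz, 'and', high_hz, 'Hz')
```

Even this tiny example needs its own length check and type conversions, and that bookkeeping grows quickly with the number of arguments.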

# A more advanced way of submitting the arguments

Python has a built-in module, `argparse`, which implements a smarter system for parsing arguments from the console. It can, for instance, restrict the data type of certain arguments and create flags with default values. It will also create automatic documentation for the script, based on the argument descriptions. Consider the following example of parsing arguments from a real application (only the argument parsing part is presented).

*From now on I will only run scripts from the notebook with `!python`, but you should try doing it from the console to get used to it.*

In [30]:
%%writefile argparse_test.py
import argparse

parser = argparse.ArgumentParser(
    description='Plots selected event'
)

# first argument is required, is of type `str` and can contain multiple values (nargs='+')
parser.add_argument(
    'event_names',
    type=str,
    nargs='+',
    help='Names of the events',
)

# second argument is optional (prefaced by --), by default is False (default=False), 
# but when invoked saves True (action='store_true')
parser.add_argument(
    '--track',
    default=False,
    action='store_true',
    help='Plot events on map of the track in addition to the time domain',
)

args = parser.parse_args()

print('Was asked to plot the following events:', args.event_names)
print('Plot on the map of the track:', args.track)

Overwriting argparse_test.py


In [31]:
!python argparse_test.py brake 

Was asked to plot the following events: ['brake']
Plot on the map of the track: False


In [32]:
!python argparse_test.py brake gas --track

Was asked to plot the following events: ['brake', 'gas']
Plot on the map of the track: True


It automatically creates very helpful documentation for the use of the script:

In [33]:
!python argparse_test.py --help

usage: argparse_test.py [-h] [--track] event_names [event_names ...]

Plots selected event

positional arguments:
  event_names  Names of the events

optional arguments:
  -h, --help   show this help message and exit
  --track      Plot events on map of the track in addition to the time domain


And if the script is called incorrectly, it will tell you what is wrong:

In [34]:
!python argparse_test.py --track

usage: argparse_test.py [-h] [--track] event_names [event_names ...]
argparse_test.py: error: the following arguments are required: event_names


`argparse` is capable of much more. Check the documentation in case you need to use it: https://docs.python.org/3/library/argparse.html
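
As one small taste of those extra capabilities, the sketch below restricts argument types with `type=float` and limits the allowed values with `choices`. The argument names (`data_file`, `--low`, `--high`, `--method`) and their defaults are hypothetical and only serve the illustration. The argument list is passed to `parse_args` explicitly so that the snippet runs in a notebook; in a script you would call `parser.parse_args()` with no arguments to read the console input.

```python
import argparse

parser = argparse.ArgumentParser(
    description='Band-pass filter a recording (hypothetical example)'
)
parser.add_argument('data_file', type=str, help='File with the recording')
# type=float makes argparse convert the string and reject non-numeric input
parser.add_argument('--low', type=float, default=0.5, help='Lower cutoff in Hz')
parser.add_argument('--high', type=float, default=40.0, help='Upper cutoff in Hz')
# choices restricts the argument to a fixed set of values
parser.add_argument('--method', choices=['butter', 'fir'], default='butter',
                    help='Filter design to use')

# Explicit list instead of the console input, so this runs anywhere
args = parser.parse_args(['recording.csv', '--high', '30'])
print(args.data_file, args.low, args.high, args.method)
```

Passing something like `--high fast` or `--method median` to this parser would stop the script with a usage message, just as in the missing-argument case above.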

I highly suggest investing time in using `argparse` for scripts of any complexity. The only case in which it can be skipped is when your script does just one thing, requires one simple and obvious argument, and will not break if two arguments are given instead of one.

# Running a sequence of scripts
Another reason scripts are useful is that you can create a higher-level system script ("bash script"), which will run a sequence of scripts for you with specified arguments. This is especially useful in machine learning applications for 2 reasons:
- large-scale machine learning may take a lot of time, so you might want to set up a queue for processing different subjects (without bash scripts you would have to edit your lovely script, which is already optimized for running a single subject, and make it temporarily ugly to run many subjects)
- you will often work on remote servers, because your machine is too slow or doesn't have enough graphics memory; it is useful to create a sequence of commands for the remote computer and disconnect, letting it run; you might also make a small routine to email you once it finishes :) This will become more and more common with the spread of cloud computing services from Amazon, Google and Microsoft.

Running a sequence of scripts is as easy as creating a `.sh` file with all the commands you want to run, like so:

In [26]:
%%writefile bash_test.sh
python numpy_version_test.py
echo # prints an empty line
python arguments_test.py blah blah
echo
python argparse_test.py --help
echo
python argparse_test.py gas --track

Overwriting bash_test.sh


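Try running the file by typing in the console (if you get a "Permission denied" error, first make the file executable with `chmod +x bash_test.sh`):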

    ./bash_test.sh
    
You should see the output of all the previous scripts, separated by empty lines.

You can do it from Jupyter as well:

In [28]:
!./bash_test.sh

This environment has numpy version 1.13.3 installed
 
Submitted arguments are ['arguments_test.py', 'blah', 'blah']
 
usage: argparse_test.py [-h] [--track] event_names [event_names ...]

Plots selected event

positional arguments:
  event_names  Names of the events

optional arguments:
  -h, --help   show this help message and exit
  --track      Plot events on map of the track in addition to the time domain
 
Was asked to plot the following events: ['gas']
Plot on the map of the track: True
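
The queue of subjects mentioned earlier can also be sketched in Python itself, using the standard `subprocess` module, which may be handy on systems without a bash shell. This is only a rough sketch under stated assumptions: the subject names are hypothetical, and a one-line stand-in program takes the place of a real processing script, which would be called as something like `[sys.executable, 'process_subject.py', subject]`.

```python
import subprocess
import sys

# Hypothetical subject identifiers; in practice these come from your data set
subjects = ['subject01', 'subject02', 'subject03']

# Stand-in for a real processing script: it just echoes its first argument
one_liner = 'import sys; print("processed", sys.argv[1])'

results = []
for subject in subjects:
    # sys.executable is the Python interpreter running this code;
    # check=True stops the queue as soon as one run fails
    completed = subprocess.run(
        [sys.executable, '-c', one_liner, subject],
        capture_output=True,
        text=True,
        check=True,
    )
    results.append(completed.stdout.strip())

print(results)
```

Each subject is handled by a separate run of the script, so the single-subject script itself stays untouched, which is the same benefit the bash script provides.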
