# Writing longer programs

This notebook covers:

* Modules and namespaces
* `import` statements and `PYTHONPATH`
* Organizing your code by making your own modules and packages
* Command line arguments

## Program Structure

Here is a representative skeleton of a long program, showing almost everything we're going to cover.

```python
# Import statements
import math

# Variables at the module level
DEFAULT_RADIUS = 1.0

# Class definitions
class Circle:
    # Functions (methods) within the class definition
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius**2

# Function definitions
def make_default_circle():
    "Stuff happens in the function"
    return Circle(DEFAULT_RADIUS)

if __name__ == '__main__':
    # Stuff to run from the command line
    print(make_default_circle().area())
```


A `.py` file defines a scope for a **module**. Python has a simple structure of **one module per file**. 

A module is made up of **module-level statements**, which can include any valid Python, including assignments to variables and **function** and **class** definitions. Class and function definitions are explained in a separate notebook.

A module can access the functions, classes and variables defined in any other module using `import` statements. (Different ways of writing the `import` statement change exactly what gets imported and how the names from the other file can be used in the module that does the `import`.)

Any code in a module (i.e. file) outside function and class definitions will be executed when the module is imported. You can define a special block (`if __name__ == '__main__':`) to separate code that should be run only if the module is run directly from the command line, not when that module is imported by another one. In this way each Python file can contain both a self-contained program and a library of functions for other modules to use.
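
As a minimal sketch of that idea (the function names `celsius_to_kelvin` and `main` are made up for illustration), the file below works both as a library and as a script. The `print` at module level runs on every import, while `main()` runs only when the file is executed directly.

```python
def celsius_to_kelvin(temperature_c):
    """Library function: usable from any module that imports this file."""
    return temperature_c + 273.15


def main():
    """Script behaviour: only wanted when the file is run directly."""
    print('0 C is %.2f K' % celsius_to_kelvin(0.0))


# Module-level code like this runs on import as well as on direct execution.
print('This module is running with __name__ = %r' % __name__)

if __name__ == '__main__':
    # True only when the file is run directly, not when it is imported.
    main()
```

If this were saved as a file and imported from another module, `__name__` would be the module's name rather than `'__main__'`, so `main()` would be skipped.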

The rest of the notebook explains these ideas in more detail.

## Modules, packages and namespaces

Modules are individual files containing sets of variables, classes and functions that can be included in other Python scripts with the `import` statement. Packages are collections of related modules. We've already seen several examples, like `os`, `sys` and `math`, which are part of the Python standard library.

Modules define a **namespace** just like functions and classes, so you can use the name of the module followed by a `.` to distinguish functions and variables defined in the scope of the module from variables (possibly with the same name) defined in other scopes. For example:

In [29]:
pi = 3.0

import math
print('In the local namespace, pi=%f'%(pi))
print('In the math module namespace, pi=%f'%(math.pi))
print('pi == math.pi? %s'%(pi==math.pi))

In the local namespace, pi=3.000000
In the math module namespace, pi=3.141593
pi == math.pi? False


**Detail:** 'namespace' and 'scope' are very similar ideas and it's common for non-experts (like me) to [use them interchangeably](https://softwareengineering.stackexchange.com/questions/273302/what-is-the-relationship-between-scope-and-namespaces-in-python). A *namespace* is basically a dictionary in which the keys are strings (the names of things) and the values are the things themselves. The *scope* is a set of rules for deciding which namespace to refer to when you use a particular name at a particular point in the program.
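
The dictionary view of a namespace can be inspected directly. This is a small sketch using the built-in `vars()`; the function `local_pi` is a made-up name for illustration.

```python
import math

# A module's namespace is exposed as a dictionary by vars() (the same
# dictionary is also available as math.__dict__).
math_namespace = vars(math)
print(math_namespace['pi'] == math.pi)

# Scope rules decide which namespace a bare name is looked up in.
pi = 3.0  # lives in this module's (global) namespace


def local_pi():
    pi = 3.14  # lives only in the function's local namespace
    return pi


print(local_pi(), pi, math.pi)
```

The three values printed on the last line come from three different namespaces, even though the name `pi` is the same in each.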

You can choose to import specific items from a module into the local namespace using `from x import y`:

In [6]:
import math
from math import log10
print(log10 == math.log10)
print(log10(2.0))

True
0.3010299956639812


In this example, `log10()` will have a (tiny) speed advantage over `math.log10()` because it doesn't have to go through the step of asking 'what function does the math module call `log10`?' each time. But the main reason to do this is not speed, just not having to type `math.` every time you want to call `log10()` (which could make the code clearer or not, depending on the context).
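
If you are curious about the size of that speed difference, the standard library `timeit` module gives a rough measurement. This is only a sketch: the numbers depend on your machine and Python version, and the repeat count of 200000 is an arbitrary choice.

```python
import timeit

# Time the attribute lookup plus the call ...
qualified = timeit.timeit('math.log10(2.0)', setup='import math', number=200000)
# ... against the call through a name imported directly.
direct = timeit.timeit('log10(2.0)', setup='from math import log10', number=200000)

print('math.log10: %.4f s' % qualified)
print('log10:      %.4f s' % direct)
```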

You can also rename modules and things you import from modules. This is useful if the module has a long name, or you want to remember which module a particular function comes from or avoid conflicts.

In [31]:
import math as m
m.log10(2)

sqrt = lambda x: 'S.Q.R.T.: %s'%(x)

from math import sqrt as math_sqrt

print(sqrt(2))
print(math_sqrt(2))


S.Q.R.T.: 2
1.4142135623730951


For example, it is extremely common to see the `numpy` module imported like this:

In [10]:
import numpy as np

You *can* also import _everything_ from a module into the local namespace. Usually **you should not do this**.

In [11]:
from math import *  # Imports _everything_ from the math module into the local namespace.
sqrt(2)

1.4142135623730951

Making effective use of namespaces is one reason to avoid using `from module_x import *` (unless you have a very good reason). Explicit imports make it much easier to see where functions are coming from, and they avoid accidentally re-assigning (or unknowingly defining) local variables with whatever happens to be in the module you're importing. 

In [12]:
pi = 3.0

from math import *
print('In the local namespace, pi=%f'%(pi))
print('In the math module namespace, pi=%f'%(math.pi))
print('pi == math.pi? %s'%(pi==math.pi))


In the local namespace, pi=3.141593
In the math module namespace, pi=3.141593
pi == math.pi? True


Packages can organize related modules. For example, the `os` package contains the `path` module to do things with the file system:

In [13]:
import os
print(os.path.join('hello','world'))

import os.path as op
print(op.join('hello','world'))

hello/world
hello/world


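`import` statements can go anywhere in Python files, but they're usually grouped together at the top. The scope of names from imported modules depends on where the import statement appears. An imported name behaves like any other variable: in the following example, the loop variable `m` inside the function hides the module-level `m` that refers to the `math` module.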

In [25]:
import math as m

def matrix_loop():
    for n in range(0,5):
        for m in range(0,5):
            print('%2d'%(m), end='')  # end='' keeps each row on one line
        print('')
    print('Inside matrix_loop, m is: %s'%(type(m)))
    return

matrix_loop()
print('In the toplevel scope, m is: %s'%(type(m)))

 0 1 2 3 4
 0 1 2 3 4
 0 1 2 3 4
 0 1 2 3 4
 0 1 2 3 4
Inside matrix_loop, m is: <class 'int'>
In the toplevel scope, m is: <class 'module'>
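
The placement of the `import` itself also matters. In this sketch (the names `circle_area` and `local_math` are made up for illustration), the import happens inside a function, so the imported name is bound only in that function's local scope.

```python
def circle_area(radius):
    # The name local_math exists only while this function is running.
    import math as local_math
    return local_math.pi * radius ** 2


print(circle_area(1.0))

# The module-level namespace never receives the name.
print('local_math' in globals())
```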


##  `globals()` and `locals()`

These are built-in functions that are occasionally useful. They return dictionaries whose keys are the names of all the variables defined in the global (top level) and local scope, respectively.

In [26]:
x = 5 
print(globals()['x'])
print(locals()['x'])

def a_function(x):
    print('Inside a_function: %d and %d'%(globals()['x'],locals()['x']))
    
a_function(42)
print(globals()['x'])
print(locals()['x'])

5
5
Inside a_function: 5 and 42
5
5


Sometimes I've used this as a way to call particular functions chosen at runtime -- for example, depending on some input from the user, or a command line parameter. You could do the same thing with `if` statements, as long as you know the name of each function when you write the program...

In [27]:
def function_a(): print('Function A!')
def function_b(): print('Function B!')
def function_c(): print('Function C!')
    
def pick_a_function(letter):
    function = 'function_%s'%(letter.lower())
    if function in globals():
        globals()[function]() # Note the brackets here
    else:
        print('No such function: %s'%(letter))

for x in 'abcde':
    pick_a_function(x)

Function A!
Function B!
Function C!
No such function: d
No such function: e
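
A common alternative to looking names up in `globals()` is an explicit dictionary that maps each allowed key to a function. This is a sketch of that different technique, not the approach used above; the names `DISPATCH` and `pick_from_table` are made up for illustration.

```python
def function_a():
    return 'Function A!'


def function_b():
    return 'Function B!'


# Hypothetical dispatch table: only functions listed here can be chosen.
DISPATCH = {'a': function_a, 'b': function_b}


def pick_from_table(letter):
    """Call the function registered for this letter, if there is one."""
    function = DISPATCH.get(letter.lower())
    if function is None:
        return 'No such function: %s' % letter
    return function()


for letter in 'abc':
    print(pick_from_table(letter))
```

The trade-off is that every function has to be registered by hand, but in exchange user input can never reach a global name that was not meant to be callable.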


## Organizing your code by making your own modules and packages

Making a module is as easy as making a file with some valid Python in it and giving it the `.py` extension. With Python there is no real difference between 'modules' (libraries of classes and functions, if you like) and 'main' programs (scripts). 

To make a package, first put one or more files with Python code (modules) in a directory with the name you want for the package, and then **write an empty file called `__init__.py` in that directory**. This is another example of `__` indicating some behind-the-scenes magic in Python. The existence of that file signals to Python that the directory contains some importable modules. (Since Python 3.3 a directory without `__init__.py` can still be imported as a 'namespace package', but including the file remains the conventional way to make a regular package.)

The following is a little shell script that creates a package in the directory `mypackage`. The package has one module and the module has one function. Run this first, then the following examples will demonstrate how Python works with this module.

In [18]:
# Write the module using the shell
!mkdir mypackage
!rm mypackage/*.py*
!echo
!echo 'def myfunc():\n    print("This is a function in my module!")' >> mypackage/mymodule.py
!cat mypackage/mymodule.py
!touch mypackage/__init__.py
!echo
!tree ./mypackage
!echo

mkdir: mypackage: File exists

def myfunc():
    print("This is a function in my module!")

./mypackage
├── __init__.py
├── __pycache__
│   ├── __init__.cpython-36.pyc
│   └── mymodule.cpython-36.pyc
└── mymodule.py

1 directory, 4 files



Now some python:

In [19]:
import mypackage.mymodule as my
my.myfunc()

This is a function in my module!


You can read about what the `__init__.py` file does, and more about how to make packages [here](https://docs.python.org/3/tutorial/modules.html#packages).

### PYTHONPATH

Python has a list of directories that it searches (in order) for modules when you `import something`. You can access this list from inside Python through the `sys` module:

In [21]:
import sys
sys.path

['',
 '/Users/andrew/python/miniconda2/envs/astro3/lib/python36.zip',
 '/Users/andrew/python/miniconda2/envs/astro3/lib/python3.6',
 '/Users/andrew/python/miniconda2/envs/astro3/lib/python3.6/lib-dynload',
 '/Users/andrew/python/miniconda2/envs/astro3/lib/python3.6/site-packages',
 '/Users/andrew/python/miniconda2/envs/astro3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg',
 '/Users/andrew/python/miniconda2/envs/astro3/lib/python3.6/site-packages/IPython/extensions',
 '/Users/andrew/.ipython']

It's not always easy to work out where all these entries come from, but that doesn't matter. 

What matters is being able to tell Python where to find packages that you have installed (which are usually grouped in a small number of places) and packages/scripts you have written yourself (which could be anywhere).

Most of the entries in `sys.path` are either defaults (including the current directory) or come from a shell environment variable called `PYTHONPATH`.

The environment variable `PYTHONPATH` works in the same way as the shell's `PATH`, but for Python files. Like `PATH`, it's supposed to be set by the user, so by default it's empty. Entries are separated by '`:`' on Linux and macOS and by '`;`' on Windows. Any directory in `PYTHONPATH` will end up in `sys.path`, ahead of Python's default search paths (but after the first entry, which is the directory of the running script, or the current directory in an interactive session).

This is why I said you need some understanding of how shell environment variables work in order to work effectively with Python.
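
You can inspect the environment variable from inside Python. The sketch below uses `os.pathsep`, which holds the platform's separator for `PATH`-style variables; the helper name `split_search_path` is made up for illustration.

```python
import os
import sys


def split_search_path(raw_value):
    """Split a PATH-style string into its non-empty entries."""
    return [entry for entry in raw_value.split(os.pathsep) if entry]


# os.environ.get returns the default ('') when PYTHONPATH is not set.
pythonpath_entries = split_search_path(os.environ.get('PYTHONPATH', ''))

print('Separator on this platform: %r' % os.pathsep)
print('PYTHONPATH entries: %s' % pythonpath_entries)
print('Number of sys.path entries: %d' % len(sys.path))
```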

To make good use of this, it's a good idea to keep your own modules and packages (i.e. code that you want to include in other projects with `import`) under a directory like `~/projects/code/python`. For example, you might have two packages like `~/projects/code/python/plot_tools` and `~/projects/code/python/astro_routines`. If you set `PYTHONPATH` (for example, in `~/.profile`) with something like
```bash
export PYTHONPATH=~/projects/code/python
```

then you will be able to
```python
import plot_tools
import astro_routines
```

in any Python script (provided those directories are marked as Python packages by including an `__init__.py` file).

Tools like `pip` and `conda` have their own way of getting the places where they install files into `sys.path`.

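The first entry in `sys.path` is the first place Python will look. When you run a script it is the directory containing that script; in an interactive session (as in the listing above) it is the empty string `''`, which stands for the current working directory.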

`sys.path` is a list that is made when your script runs, so if all else fails (or you just want a quick hack) you can modify it directly.

In [None]:
sys.path.append('/my/custom/path')
sys.path
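
To see the effect of editing `sys.path`, here is a self-contained sketch that writes a throwaway package into a temporary directory, adds that directory to `sys.path`, and imports from it. The names `demo_package`, `demo_module` and `demo_func` are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build demo_package/ containing __init__.py and one module.
    package_dir = Path(tmp) / 'demo_package'
    package_dir.mkdir()
    (package_dir / '__init__.py').write_text('')
    (package_dir / 'demo_module.py').write_text(
        'def demo_func():\n'
        '    return "This is a function in my module!"\n'
    )

    # The package is not importable until its parent directory is on sys.path.
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo = importlib.import_module('demo_package.demo_module')
        message = demo.demo_func()
    finally:
        # Undo the change so the temporary directory does not linger on the path.
        sys.path.remove(tmp)

print(message)
```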

Behind the scenes, both IPython and Jupyter add some paths of their own to `sys.path`.

## Command Line Arguments

Often you want to write programs that can be run from the command line and given some arguments by the user, for example the name of an input data file to read in and process, or the name of a file to write output to.

You can use the list in `sys.argv` to get any arguments from the command line as strings. A bit like `sys.path`, `sys.argv` is a list created by Python when it starts to execute a script. 

* `sys.argv[0]` is always the name of the script
* `sys.argv[1]` is the first command line argument (if there are any)
* `sys.argv[2]` is the second command line argument, etc.
    
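Because the arguments are optional, it is worth checking the length of the list before indexing into it. This is a minimal sketch; the function name `get_input_filename` and the default `'input.txt'` are made up, and the lists passed in stand in for a real `sys.argv`.

```python
import sys


def get_input_filename(argv, default='input.txt'):
    """Return the first command line argument, or a default if none was given."""
    if len(argv) > 1:
        return argv[1]
    return default


# In a real script you would call get_input_filename(sys.argv).
print(get_input_filename(['myscript.py', 'data.csv']))
print(get_input_filename(['myscript.py']))
```
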
Anything much more complicated than using `sys.argv[1]` directly (for example, having optional arguments, variable numbers of arguments, arguments that are lists, etc.) is better done with the `argparse` standard library module.

In [29]:
import argparse 

parser = argparse.ArgumentParser()
parser.add_argument('input_file')
parser.add_argument('-o','--option', default=42, type=int)
parser.add_argument('-t','--logical_option', action='store_true')

args = parser.parse_args(['myfile.txt','-o','42'])

print(args)
print(args.input_file)
print(args.option)

Namespace(input_file='myfile.txt', logical_option=False, option=42)
myfile.txt
42


In normal usage, the above would usually appear in the `if __name__ == '__main__'` block, and `parser.parse_args()` would be called with no arguments, in which case it reads the command line arguments from `sys.argv[1:]`. Here is the example from the Functions and Classes notebook:

```python
if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    # nargs='?' makes the positional argument optional, so the default can apply
    parser.add_argument('n_bears', type=int, nargs='?', default=10)
    args = parser.parse_args()
    
    report_random_bear_events(args.n_bears)
``` 

`argparse` has a lot of features for optional arguments, lists of arguments, argument data types, default values etc. etc., so using it effectively requires a careful reading of the standard library documentation for this module.
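
As a small taste of those features, this sketch combines a list-valued positional argument, a typed option with a default, and an option restricted to fixed choices. The argument names (`input_files`, `--n_bins`, `--mode`) are made up for illustration, and an explicit list is passed to `parse_args` so the example runs inside a notebook.

```python
import argparse

parser = argparse.ArgumentParser(description='Illustrative argument parser.')
# nargs='+' collects one or more values into a list.
parser.add_argument('input_files', nargs='+')
# type=int converts the string from the command line; default is used if absent.
parser.add_argument('-n', '--n_bins', type=int, default=10)
# choices restricts the accepted values and gives an error for anything else.
parser.add_argument('--mode', choices=['fast', 'slow'], default='fast')

args = parser.parse_args(['a.txt', 'b.txt', '-n', '20'])

print(args.input_files)
print(args.n_bins)
print(args.mode)
```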

## End of notebook