In [1]:
# Use the new-style print function, don't worry about this
from __future__ import print_function

# Advanced Training

Olivier Iffrig (LMPA) <olivier.iffrig@cea.fr>

## Using libraries

One of the great advantages of Python is the variety of libraries available. Before building your own, check whether what you need already exists.

The standard modules distributed with Python are described here: http://docs.python.org/2.7/library/index.html

In [2]:
# Easter egg: try to uncomment the following line, and execute
#import antigravity

There are two ways of importing modules

 * importing the whole module as a separate namespace (recommended)
 * importing specific parts in the global namespace (use with care)

### Import a whole module

This is the most standard way to use a module.

In [3]:
import math

print("We just imported the 'math' module, its type is {0}.".format(type(math)))

print("pi = {0}".format(math.pi))

We just imported the 'math' module, its type is <type 'module'>.
pi = 3.14159265359


We just created a *module* object named *math* inside the current namespace. It contains several functions and constants (see http://docs.python.org/2.7/library/math.html for the reference). We did **not** define them in the global namespace.

In [4]:
pi

NameError: name 'pi' is not defined

Sometimes you may want to make a module name shorter. Or only keep part of the name of a module living deep in a hierarchy, like *fits* instead of *astropy.io.fits*.

In [5]:
import math as M # This is just an example, it is not really useful here

print("We imported the 'math' module as 'M', are 'math' and 'M' the same?")
print(math is M)

We imported the 'math' module as 'M', are 'math' and 'M' the same?
True
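
As a small illustration of keeping only part of a nested module name, here is a sketch using `xml.etree.ElementTree` from the standard library (chosen only because it needs no extra installation; the XML string is made up for the example).

```python
# Keep only the last part of a nested module name...
from xml.etree import ElementTree

# ...or give the nested module a short alias.
import xml.etree.ElementTree as ET

# Both names refer to the same module object.
print(ElementTree is ET)

# The short name is then used like any other module name.
root = ET.fromstring("<data><item>42</item></data>")
print(root.find("item").text)
```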


As you may see later, some abbreviations are widely used. But remember that people reading your code may not understand your abbreviations, so avoid using too many of them.

### Import some parts of a module

When you want to import a single function from a module, it can be simpler to import it directly into the global namespace.

In [6]:
from math import sqrt

print("The square root of 3 is {0}".format(sqrt(3)))

The square root of 3 is 1.73205080757


Again, do not import too many functions like this. It reduces the readability of your code.

Sometimes you may see this:

In [7]:
from os import *

This imports everything the module *os* defines into the global namespace.

<font color="red">Do NOT do this. You may override some functions and get strange errors.</font>

In [8]:
f = open("dummy.txt", "r") # We wanted to use the built-in 'open'
                           # but it has been overridden by os.open

TypeError: an integer is required

## Defining your own functions

Functions are the basic building block of procedural programming. They are the simplest way to make code reusable.

A function definition looks like this:

```python
    def func(arg1, arg2, ..., argN):
        code # Notice the indentation
```

Let's define a function returning the square of any number.

In [9]:
def square(x):
    """Compute the square of x""" # <-- this is called a docstring
    return x**2

The `return` keyword is a special statement, which causes the function to terminate. The expression following the `return` keyword is the result of the function (if there is no `return` or if `return` has no argument, the result is `None`).

In [10]:
print("The square of 4.25 is {0}".format(square(4.25)))

The square of 4.25 is 18.0625
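
To illustrate the remark about `None`, here is a short sketch with two made-up functions, `greet` (no `return` at all) and `sign` (a bare `return` in one branch).

```python
def greet(name):
    """Print a greeting; there is no return statement."""
    print("Hello, {0}!".format(name))

def sign(x):
    """Return 'positive' or 'negative', and nothing for zero."""
    if x > 0:
        return "positive"   # the function stops here
    if x < 0:
        return "negative"
    return                  # bare return: the result is None

result = greet("world")
print(result is None)
print("{0} {1}".format(sign(3), sign(0)))
```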


A function may have multiple arguments:

In [11]:
def add(x, y):
    """Compute the sum of the two arguments"""
    return x + y

Some arguments may be optional. You can specify their default value. Note that they have to be defined **after** the required arguments.

In [12]:
def foo(x, y=1):
    """Some complicated function"""
    if x > 0:
        return x - y
    else:
        return x + y

print("foo(2) is {0}".format(foo(2)))
print("foo(-1, 3) is {0}".format(foo(-1, 3)))

foo(2) is 1
foo(-1, 3) is 2


If you have several optional arguments, you can specify only some of them by giving their names.

In [13]:
def bar(x, y=1, z=2):
    return x * y + z

print("bar(1, z=3) is", bar(1, z=3))

bar(1, z=3) is 4


### Documenting functions

Most of the functions coming from modules you may import are documented. You can use the ```help``` function to see this documentation.

In [14]:
help(math.radians)

Help on built-in function radians in module math:

radians(...)
    radians(x)
    
    Convert angle x from degrees to radians.



Remember the ```square``` function we defined? It had a docstring.

In [15]:
help(square)

Help on function square in module __main__:

square(x)
    Compute the square of x



In [16]:
help(bar) # This one did not...

Help on function bar in module __main__:

bar(x, y=1, z=2)



Don't forget to document your code, it may help everyone, including you!

Docstrings are the standard way to document your code. They are recognized by many IDEs, by the Python interpreter (the function has a ```__doc__``` attribute containing it), by IPython and by documentation generators such as Sphinx.

Do not forget to document the parameters and return value of your functions, and specify the types. It may also help to list the exceptions your function may raise.

A guide about good documentation practices: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
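
As an illustration, a docstring following the section layout of the guide above could look like the sketch below. The function `convert_temperature` is a made-up example, not part of any library.

```python
def convert_temperature(value, scale="C"):
    """Convert a temperature to kelvin.

    Parameters
    ----------
    value : float
        Temperature to convert.
    scale : str, optional
        Scale of `value`: "C" (Celsius, default) or "F" (Fahrenheit).

    Returns
    -------
    float
        Temperature in kelvin.

    Raises
    ------
    ValueError
        If `scale` is neither "C" nor "F".
    """
    if scale == "C":
        return value + 273.15
    if scale == "F":
        return (value - 32.0) * 5.0 / 9.0 + 273.15
    raise ValueError("unknown scale: {0!r}".format(scale))

print(convert_temperature(0.0))
# The first line of the docstring is the short summary shown by most tools
print(convert_temperature.__doc__.splitlines()[0])
```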

#### Note for IPython users

To access an object's documentation, you can also use

In [17]:
square?

In the IPython console, the documentation will be printed; in a notebook, it will show up at the bottom of your screen.
You can also see the Python source code (if it is available) with `??`.

In [18]:
square??

### More on functions...

#### Catch-all arguments

Python has a special syntax for functions accepting a variable number of arguments.

Arguments passed in a function call fall into two categories:
 * Positional arguments: given without a name, they are matched to parameters by their order
 * Keyword arguments: given as `name=value`, they are matched to parameters by their name

You can tell the Python interpreter to store the remaining positional arguments in a tuple by putting an asterisk in front of its name.

Similarly, keyword arguments can be stored in a dictionary by putting two asterisks in front of its name.

In [19]:
def dummy(*args, **kwargs):
    """Print the arguments"""
    print("Positional arguments:", args)
    print("Keyword arguments:", kwargs)
    print()

dummy()
dummy(foo=3)
dummy(1, 2, 3, bar=4)

Positional arguments: ()
Keyword arguments: {}

Positional arguments: ()
Keyword arguments: {'foo': 3}

Positional arguments: (1, 2, 3)
Keyword arguments: {'bar': 4}



You can use both named (required and optional) and catch-all arguments in the same function definition. Notice the order though.

In [20]:
def mix(x, y=1, *args, **kwargs):
    """Just an example mixing all argument types"""
    print("x = {0}, y = {1}, args = {2}, kwargs = {3}".format(x, y, args, kwargs))

mix(42)
mix("a", "b")
mix("a", "b", "x")
mix(2, y=1, c=4)
mix(1, y=3)
mix(1, d=8)

x = 42, y = 1, args = (), kwargs = {}
x = a, y = b, args = (), kwargs = {}
x = a, y = b, args = ('x',), kwargs = {}
x = 2, y = 1, args = (), kwargs = {'c': 4}
x = 1, y = 3, args = (), kwargs = {}
x = 1, y = 1, args = (), kwargs = {'d': 8}


This star notation can also be used the other way round, to specify arguments from a tuple and/or a dictionary:

In [21]:
d = {"u": 3}
mix(1, y=2, **d)

t = (1, 3, 7)
mix(*t, v=8)

mix(*t, **d)

x = 1, y = 2, args = (), kwargs = {'u': 3}
x = 1, y = 3, args = (7,), kwargs = {'v': 8}
x = 1, y = 3, args = (7,), kwargs = {'u': 3}


#### Anonymous functions

Sometimes, it would be overkill to define a new function, for instance when the code fits on a single line and you use it only once, as an argument to another function.

You can specify small, anonymous functions, called *lambda* functions (like in lambda-calculus), with any number of arguments (including catch-all). The syntax is the following:

```python
lambda arg1, arg2: value
```

In [22]:
lst = range(6)
print(lst)
lst2 = map(lambda x: x**2, lst)
print(lst2)

[0, 1, 2, 3, 4, 5]
[0, 1, 4, 9, 16, 25]
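
Another typical use of a lambda is as the `key` argument of `sorted`. The sketch below uses a made-up list of words and compares the lambda with an equivalent named function.

```python
words = ["pear", "fig", "banana", "kiwi"]

# Sort by length; the lambda is used once and never needs a name.
by_length = sorted(words, key=lambda w: len(w))
print(by_length)

# Equivalent version with a named function, for comparison.
def word_length(w):
    return len(w)

print(sorted(words, key=word_length) == by_length)
```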


#### Arguments passed by reference

In Python, there are two main categories of types, mutable and immutable ones.

Most basic types are immutable: ```int, float, str, tuple, frozenset```.

Container types such as ```list, dict, set``` are mutable, and so are instances of user-defined classes by default.

Objects are passed by reference, which means that every in-place modification **to a mutable object** is visible once you leave the function:

In [23]:
def append_one(lst):
    lst.append(1)
    return lst

lst = [2, 3, "a"]
lst2 = append_one(lst)
print(lst, lst2)

[2, 3, 'a', 1] [2, 3, 'a', 1]


To modify an immutable object, you have to create a new one, breaking the reference. Thus, immutable objects passed down to functions behave as if they were passed by value.

In [24]:
def increment(x):
    x += 1
    return x

x = 2
y = increment(x)
print(x, y)

2 3


This behaviour is also true for default values. Note that default values are evaluated only once:

In [25]:
def append_one(lst=[]):
    lst.append(1)
    return lst

lst = append_one()
print(lst) # The value of the default argument of append_one is now [1]
lst2 = append_one()
print(lst, lst2)

[1]
[1, 1] [1, 1]
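
A common way to avoid this surprise is to use `None` as the default value and to create the list inside the function. The sketch below uses a hypothetical name, `append_one_safe`, to keep it apart from the function above.

```python
def append_one_safe(lst=None):
    """Append 1 to lst, creating a fresh list when none is given."""
    if lst is None:
        lst = []          # a new list is created at each call
    lst.append(1)
    return lst

first = append_one_safe()
second = append_one_safe()
print(first)
print(second)
print(first is second)    # two distinct lists this time
```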


## Defining your own modules

A good practice when writing Python code is to make it reusable: defining functions, and grouping them into modules, which you can import whenever you need.

### Single-file modules

If you wrote a Python script (in a `.py` file), you already did! The simplest way to define a module is to create a `.py` file, where the name before the `.py` is the name of your module. This name must be a valid Python identifier: it cannot contain spaces, dots, hyphens, etc.

In [26]:
%%file mymodule.py
"""My first Python module""" # <-- Remember about docstrings?
# You can (and should) also document modules using docstrings.

from __future__ import print_function

print("Imported, my name is '{}'".format(__name__)) # __name__ contains the name of the module if it is imported,
                                         # or "__main__" if it is executed directly
# Any code you may put here will be executed, try it!

def foo():
    """Hello world"""
    print("Hello, world !")

Writing mymodule.py


A quick remark about `__name__`. Python has several ways to do some introspection. Modules know their name, and also the file name in `__file__`. You can access this information from inside the module as well as from outside.

In some modules, you may see constructs like this:

```python
if __name__ == '__main__':
    do_something()
```

This allows you to do some actions only if the script file is executed directly (because its name is then `'__main__'`), and not when it is imported. You can for example do some testing like that.
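
A minimal sketch of this idiom, assuming the code is saved in a file of its own (the file name `stats_demo.py` and the functions `mean` and `run_self_test` are made up for the example):

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / float(len(values))

def run_self_test():
    """Very small self-test, meant to be run when the file is executed."""
    assert mean([1, 2, 3]) == 2.0
    assert mean([0.5]) == 0.5
    return "all tests passed"

if __name__ == '__main__':
    # Executed with "python stats_demo.py", skipped on "import stats_demo"
    print(run_self_test())
```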

Now that we have created `mymodule.py`, we can use it like any other module.

In [27]:
import mymodule
help(mymodule)
mymodule.foo()

Imported, my name is 'mymodule'
Help on module mymodule:

NAME
    mymodule - My first Python module

FILE
    /Users/kosack/Copy/CEA/PythonWorkshop/workshop_material/04-AdvancedTraining-Olivier/mymodule.py

FUNCTIONS
    foo()
        Hello world

DATA
    print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...


Hello, world !


### Packages

Most of the time, a library consists of more than a single module. In that case, you can create a package containing your modules. A package is just a directory; again, the directory name will be the name of the package.

Modules (.py files or subdirectories) contained in a package can be accessed directly by their name, using a dot as hierarchical separator:

In [28]:
import os.path
help(os.path)

Help on module posixpath:

NAME
    posixpath - Common operations on Posix pathnames.

FILE
    /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.py

MODULE DOCS
    http://docs.python.org/library/posixpath

DESCRIPTION
    Instead of importing this module directly, import os and refer to
    this module as os.path.  The "os.path" name is an alias for this
    module on Posix systems; on other systems (e.g. Mac, Windows),
    os.path provides the same operations in a manner specific to that
    platform, and is an alias to another module (e.g. macpath, ntpath).
    
    Some of this can actually be useful on non-Posix systems too, e.g.
    for manipulation of the pathname component of URLs.

FUNCTIONS
    abspath(path)
        Return an absolute path.
    
    basename(p)
        Returns the final component of a pathname
    
    commonprefix(m)
        Given a list of pathnames, returns the longest common leading component
    
    dirname(p)
        

This, however, only works if there is a file called `__init__.py` (notice the double underscores) in the directory. This file can be empty. Its contents are executed when you import your package.

In [29]:
%%bash
mkdir -p mypackage

In [30]:
%%file mypackage/__init__.py
"""Package sample"""

print("Hello from my first package!")

Writing mypackage/__init__.py


In [31]:
import mypackage

Hello from my first package!


You can then make submodules which contain specific features, and can be imported specifically by using their qualified name *packagename*`.`*modulename*. You may nest several levels of packages. Don't forget the `__init__.py`! A good practice is to re-import the most useful objects into `__init__.py`.

In [32]:
%%file mypackage/foo.py
"""A package submodule"""

from __future__ import print_function

def foo(x):
    """Fooes the argument"""
    print("Fooing", repr(x))

Writing mypackage/foo.py


In [33]:
import mypackage.foo
mypackage.foo.foo(42)

Fooing 42
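
The re-import practice can be sketched as follows. To keep the example self-contained, it builds a throw-away package in a temporary directory; the names `demo_pkg`, `helpers` and `greet` are hypothetical, and `sys.path` is modified only so that the temporary package can be found.

```python
import os
import shutil
import sys
import tempfile

# Build a throw-away package in a temporary directory.
workdir = tempfile.mkdtemp()
pkgdir = os.path.join(workdir, "demo_pkg")
os.mkdir(pkgdir)

with open(os.path.join(pkgdir, "helpers.py"), "w") as f:
    f.write('def greet(name):\n'
            '    """Return a greeting"""\n'
            '    return "Hello, " + name\n')

with open(os.path.join(pkgdir, "__init__.py"), "w") as f:
    # Re-import the most useful object at package level
    f.write('"""Demo package"""\n'
            'from .helpers import greet\n')

sys.path.insert(0, workdir)   # only so that the demo package can be found
try:
    import demo_pkg
    print(demo_pkg.greet("world"))           # short form, thanks to __init__.py
    print(demo_pkg.helpers.greet("world"))   # the full path still works
finally:
    sys.path.remove(workdir)
    shutil.rmtree(workdir)
```

Users of the package can then write `demo_pkg.greet` without knowing in which submodule the function lives.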


### How import finds the module you want

Like your shell has a `PATH` environment variable to look for executables, the Python interpreter has the ```sys.path``` variable to look for modules. This variable can be modified directly (not recommended), or by setting the `PYTHONPATH` environment variable just like you would set your `PATH`.

In [34]:
import sys
print(sys.path)

['', '/Users/kosack/Copy/CEA/PythonWorkshop/workshop_material/04-AdvancedTraining-Olivier', '/Users/kosack/Source/Working/act-analysis', '/Users/kosack/Source/Working', '/Users/kosack/Source/Working/CommonSens', '/Users/kosack/Projects/HESS/OC', '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python27.zip', '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7', '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin', '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac', '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages', '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk', '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-old', '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/readline', '/opt/local/Library/Frameworks/Python.framework/

The simplest way to make a module available is to put it in your working directory. However, you may want to make your packages available for many scripts, regardless of the working directory. To do that, you can either put the package into one of the system directories (not recommended), or add a custom directory to your `PYTHONPATH` and use it as a custom package directory (better).

The best way to do it is to make an installable package with a `setup.py` installation script. See https://docs.python.org/2.7/distutils/index.html and https://python-packaging-user-guide.readthedocs.org/en/latest/index.html for more information.

### How to install a package

#### Anaconda

If you use the Anaconda Python Distribution, you can install many packages with the `conda` utility.
See http://docs.continuum.io/anaconda/pkg-docs.html for a list. The command to use is `conda install <package>`

#### Your OS' package manager

If you have a Python distribution installed with a package manager from your operating system, you should be able to install many libraries using it (this method requires administrator rights).

For instance, on Debian or Ubuntu, you can use APT (or synaptic, aptitude, the Ubuntu Software Library) to install Python packages. Usually, the names start with `python-`, for instance `python-numpy`.

#### The Python Package Index, pip

Most Python packages are available on the Python Package Index (PyPI, http://pypi.python.org). You can install them by using the `pip` command, either as an administrator with `pip install <package>`, or only for you with `pip install --user <package>`. `pip` will download and install the latest version of the package from the Python Package Index. You can also use `pip` to install packages from a tarball or a source repository.

#### Using a `setup.py` file

Some packages are not on the PyPI, but come with a standard `setup.py` installation script. You can use `pip install [--user] <package-dir>` or `python setup.py install [--user]` to install the package.

#### By hand

If there is no installation script (which is really bad for a public package), you should be able to use the package by putting it into a directory available through ```sys.path```.

## Some useful modules

The Python Standard Library comes with a lot of packages, which are (almost) all documented in the Library Reference part of the official documentation: http://docs.python.org/2/library/index.html

Here are some packages which may be very useful for you. The lists are far from exhaustive, you may want to read the module documentation.

### Mathematical functions -- `math`

The ```math``` module defines most of the usual mathematical functions
 * ```sqrt``` (square root), ```pow``` (exponentiation, also available through the \** operator)
 * ```exp```, ```log```, ```log10```
 * ```sin```, ```cos```, ```tan```, ```asin```, ```acos```, ```atan```, ```atan2```
 * ```degrees```, ```radians``` (convert between degrees and radians)
 * ```floor```, ```ceil```
 * ```pi```, ```e```
 * ...

In [35]:
math.sin(0.3 * math.pi)

0.8090169943749473

### Runtime system information -- `sys`

The ```sys``` module contains some information about the running interpreter

 * ```argv``` contains the command-line arguments
 * ```exit``` causes the interpreter to quit when called (with the given exit code, if given)
 * ```path``` contains the locations where the interpreter looks for modules
 * ```modules``` contains all imported modules
 * ```stdin```, ```stdout``` and ```stderr``` are file-like objects representing the standard input, output and error streams (can be overridden)

In [36]:
sys.stdout.write("Hello!\n")

Hello!


### Operating system utilities -- `os`

The ```os``` module contains some useful information and functions to work with the operating system

 * ```getenv``` and ```putenv``` query and modify the environment variables (the ```os.environ``` mapping is usually more convenient)
 * There are many functions to manipulate the file system like ```mkdir```, ```listdir```...
 * You can manipulate paths with the functions in ```os.path```

In [37]:
print(os.getcwd())

/Users/kosack/Copy/CEA/PythonWorkshop/workshop_material/04-AdvancedTraining-Olivier
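
A few of the path functions in action, as a minimal sketch. The path components are made up, and the environment variable name is assumed not to be defined on your system.

```python
import os
import os.path

# Build a path with the separator of the current platform
path = os.path.join("data", "run42", "image.fits")

print(os.path.basename(path))       # last component of the path
print(os.path.dirname(path))        # everything before the last component
print(os.path.splitext(path)[1])    # file extension, including the dot
print(os.path.exists(path))         # does the path exist on disk?

# getenv accepts a default value for undefined variables
print(os.getenv("NO_SUCH_VARIABLE_FOR_DEMO", "fallback"))
```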


### Time manipulation -- `time`, `datetime`

The ```time``` module contains the usual functions for manipulating time

 * ```time``` returns the number of seconds since January 1st 1970, as a floating point number
 * ```strftime``` and ```strptime``` deal with formatted times
 * ```sleep``` causes the program to wait during a given time
 * ```clock``` returns a measure of the CPU time (it may not correspond to the real time on many systems)

In [38]:
import time
print(time.time(), time.clock())
time.sleep(5)
print(time.time(), time.clock()) # The sleep function does not use CPU time

1430820119.08 0.613694
1430820124.09 0.615429


The ```datetime``` module defines high-level objects to manipulate dates easily

 * ```datetime``` is designed to hold a precise date and time
 * ```date``` is a shortcut when you only need the day
 * ```time``` is a shortcut when you only need the time
 * ```timedelta``` holds differences

You can subtract two ```datetime```s to get a ```timedelta```, add and subtract ```timedelta```s, and add or subtract a ```timedelta``` to a ```datetime```.

In [39]:
import datetime
birthdate = datetime.date(1989, 12, 25)
age = datetime.date.today() - birthdate
print("Python is approximately {} days old.".format(age.days))

Python is approximately 9262 days old.
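
The arithmetic rules listed above can be sketched with a fixed, made-up start date, so that the result does not depend on the day you run it:

```python
import datetime

start = datetime.datetime(2015, 5, 5, 13, 0, 0)
duration = datetime.timedelta(days=1, hours=2, minutes=30)

# datetime + timedelta gives a datetime
end = start + duration
print(end.strftime("%Y-%m-%d %H:%M"))

# datetime - datetime gives a timedelta
elapsed = end - start
print(elapsed == duration)
print(elapsed.total_seconds())
```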


### Data serialization -- `pickle`

Sometimes you may want to store some data into a file or a string, without caring about the format.

The ```pickle``` module (and its faster C implementation, ```cPickle```) serves exactly that purpose. You can use it on many built-in types, and even on custom objects. It is not meant as a portable storage format (although it works across machines in practice), but it is handy for a quick backup, for example to allow restarting a long calculation.

In [40]:
import pickle
mydata = {'date': datetime.datetime.now(),
          'ints': range(5)}
x = pickle.dumps(mydata)
print(repr(x)) # There are a lot of line breaks, so we show the condensed version
data = pickle.loads(x)
print(data)

"(dp0\nS'date'\np1\ncdatetime\ndatetime\np2\n(S'\\x07\\xdf\\x05\\x05\\r\\x02\\x04\\x01|\\x96'\np3\ntp4\nRp5\nsS'ints'\np6\n(lp7\nI0\naI1\naI2\naI3\naI4\nas."
{'date': datetime.datetime(2015, 5, 5, 13, 2, 4, 97430), 'ints': [0, 1, 2, 3, 4]}
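
For the backup use case, you would rather write to a file with `pickle.dump` and read it back with `pickle.load`. The sketch below uses a temporary file and made-up data; in practice you would pick your own checkpoint path.

```python
import os
import pickle
import tempfile

state = {"iteration": 1200, "values": [0.5, 1.5, 2.5]}

# A temporary file stands in for a real checkpoint file.
fd, filename = tempfile.mkstemp(suffix=".pkl")
os.close(fd)

try:
    # Binary mode is needed for the binary pickle protocols
    with open(filename, "wb") as f:
        pickle.dump(state, f, pickle.HIGHEST_PROTOCOL)

    with open(filename, "rb") as f:
        restored = pickle.load(f)
finally:
    os.remove(filename)

print(restored == state)
```

Only load pickles that you created yourself or that come from a trusted source: loading a pickle can execute arbitrary code.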


### Handling command-line arguments -- `argparse`

As we saw, ```sys.argv``` contains the command-line arguments. To avoid code duplication, several modules are available to parse them in a standard way. I personally recommend ```argparse```, which provides a simple high-level interface.

In [41]:
import argparse
parser = argparse.ArgumentParser(description="Sample program")
parser.add_argument("n", type=int, help="a number")
parser.add_argument("-v", "--verbose", action="store_true", default=False)

try:
    parser.parse_args(["--help"])
except SystemExit: # The help message forces the program to exit
    pass

usage: __main__.py [-h] [-v] n

Sample program

positional arguments:
  n              a number

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose


In [42]:
parser.parse_args(["42"])

Namespace(n=42, verbose=False)

In [43]:
parser.parse_args(["-v", "36"])

Namespace(n=36, verbose=True)

### Calling external programs -- `subprocess`

Sometimes, you may need to call other external commands. You can use the ```subprocess``` module for this. The most generic constructor is ```Popen```, but often you do not need to use it directly. The module defines convenience functions for the most common use cases.

In [44]:
import subprocess
output = subprocess.check_output(["ls"])  # a list of arguments does not need shell=True
print(repr(output))

'04_Advanced_Training.ipynb\nmymodule.py\nmymodule.pyc\nmypackage\nsolution\n'
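
Two of the convenience functions, `check_output` and `call`, are sketched below. The child process is the Python interpreter itself (`sys.executable`), an arbitrary choice made so that the example does not depend on which commands are installed.

```python
import subprocess
import sys

# The command is given as a list: program first, then its arguments.
command = [sys.executable, "-c", "print('hello from a child process')"]

# check_output returns the standard output of the command and raises
# CalledProcessError if the command exits with a non-zero status.
output = subprocess.check_output(command)
print(repr(output))

# call only returns the exit status of the command.
status = subprocess.call([sys.executable, "-c", "import sys; sys.exit(3)"])
print(status)
```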


### Finding files using wildcards -- `glob`

There is a small yet useful module to find files using wildcards just like in the shell.

In [45]:
import glob
glob.glob("../*/*.ipynb")

['../01-intro-Fabio/01_Intro_fabio.v3.ipynb',
 '../02-BasicTraining-David/Basic training.ipynb',
 '../03-DataStructures-Karl/data-structures.ipynb',
 '../04-AdvancedTraining-Olivier/04_Advanced_Training.ipynb',
 '../05-numerics-Damien/numpy_intro.ipynb',
 '../05-numerics-Damien/numpy_intro_breakout_solution.ipynb',
 '../06-plotting-Fabio/06-Matplotlib_Intro.v3.ipynb',
 '../06-plotting-Fabio/06-Matplotlib_Solution.ipynb',
 '../06-plotting-Fabio/06-Matplotlib_Solution.v3.ipynb',
 '../07-astropy-Karl/Intro to AstroPy.ipynb',
 '../07-astropy-Karl/Intro to AstroPy.v3.ipynb',
 '../08-SciPy-Jean-Marc/Scipy-solution.v3.ipynb',
 '../08-SciPy-Jean-Marc/Scipy.v3.ipynb',
 '../08-SciPy-Jean-Marc-v2/Scipy-solution.v3.ipynb',
 '../08-SciPy-Jean-Marc-v2/Scipy.v3.ipynb',
 '../09-advancedAstro-Alan/Advanced_Astronomy.v3.ipynb',
 '../09-advancedAstro-Alan/Advanced_Astronomy_BreakoutSolution.v3.ipynb',
 '../10-Optimization-Marc/Optimization - hands-on - karl.ipynb',
 '../10-Optimization-Marc/Optimization 

### Regular expressions -- `re`

Regular expressions are well beyond the scope of this tutorial, but since they are a very powerful tool to perform searches through character strings, it is worth giving a short introduction.

The purpose of a regular expression is to describe a *language*. One can then check if a given string is part of the language, or find occurrences of elements of a language in a string.

See also: https://regex101.com/#python for a regular expression debugger and description guide.

#### Quick syntax reference

 * **Basic characters**

  Any character which is **not** in the following set is understood literally:

```
. ^ $ * + ? \ | ( ) [ ] { }
```

  To prevent this behaviour, you can escape any of them with a backslash.
  
 * **Character classes**
 
  To recognize any character (except a newline), you can use a dot.
 
  To recognize any character in a given set, you can put the characters in square brackets (you can use - as a shortcut)
  
  `[0123456789abcdef]`, or in more concise form `[0-9a-f]`, stands for *a hexadecimal digit*
  
  You can also recognize any character which is not in a given list by adding a `^` at the beginning:
  
  `[^a-z]` means *not a lowercase letter*
  
  To include a `-` in a character class, put it at the beginning or at the end.
  To include a `^`, put it anywhere except at the beginning.
  To include a `]`, put it at the beginning.

  There are shortcuts for common character classes:
  
  `\d` for `[0-9]` (decimal digits)
  
  `\s` for `[ \t\n\r\f\v]` (white space)
  
  `\w` for `[a-zA-Z0-9_]` ("word" characters)
  
  Their negations also exist as `\D`, `\S` and `\W`.
  
  A special "character" exists, matching word boundaries: `\b`, and its negation is `\B`.
 
 * **Alternatives**
 
  To recognize alternatives, separate them by `|`:
  
  `foo|bar` describes a language containing the words 'foo' and 'bar'.
  
 * **Multipliers**
 
  To recognize several occurrences of a character, a character class or a group (see below), you can use multipliers:
  
  `X?` means *`X` or nothing*
  
  `X*` means *any number of `X`, including 0*
  
  `X+` means *at least one `X`*
  
  `X{n}` means *n occurrences of `X`*
  
  `X{m,n}` means *between m and n occurrences of `X`*.
  If m is omitted, it is assumed to be zero. If n is omitted, there is no upper limit.
  
  By default, these multipliers are *greedy*: they match as many characters as possible. If this is not the wanted behaviour, you can use the non-greedy variants `??`, `*?` and `+?`, which match as few characters as possible.
  
 * **Grouping**
 
  You can group elements, for instance to repeat them. Groups also have a special meaning because they are numbered, and can be referred to later.
  
  A group is enclosed in parentheses. To reference a group from within the expression, use a backslash followed by the group number (from 1 to 99).
  
  To prevent a group from being numbered, you can use `(?: )` instead of `( )`. These are called *non-capturing parentheses*.
  
  Groups are also useful because they can be accessed from the match objects returned by the searching functions.
 
 * **Beginning and end**
 
  There are two special characters meant to represent the beginning and the end of the string. These are `^` and `$`.
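
The syntax elements above can be tried out on short, made-up strings. The sketch below uses functions from the `re` module and raw strings (`r"..."`), so that backslashes reach the regular expression parser unchanged.

```python
import re

# Character class and multiplier: one or more hexadecimal digits
print(re.findall(r"[0-9a-f]+", "ff 10 zz 7e"))

# Greedy versus non-greedy multipliers
text = "<a><b>"
print(re.findall(r"<.+>", text))    # greedy: one long match
print(re.findall(r"<.+?>", text))   # non-greedy: two short matches

# Group and back-reference: a word repeated twice in a row
print(re.search(r"\b(\w+) \1\b", "this is is a test").group(1))

# Beginning and end: the whole string must be made of digits
print(re.match(r"^\d+$", "12345") is not None)
print(re.match(r"^\d+$", "123a5") is not None)
```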

#### Functions

If you plan to use a regular expression several times, you can compile it to gain a bit of efficiency using ```re.compile```. This also allows you to add flags (see below).

To check whether the beginning of a string matches the expression, use ```re.match(expr, string)``` or ```compiled.match(string)``` if you have a compiled regular expression object. The match is anchored at the start of the string only; end the expression with `$` if the whole string has to belong to the language. This returns `None`, or a match object describing the match (including its groups).

To find the first occurrence of an element of a given language, use ```re.search(expr, string)``` or ```compiled.search(string)```.

To find all non-overlapping occurrences of elements of a given language, use ```re.findall(expr, string)``` or ```compiled.findall(string)```. This returns a list of matched strings (if there are no numbered groups), or a list of tuples (the groups).

To replace elements of a language, use ```re.sub(expr, repl, string)``` or ```compiled.sub(repl, string)```. ```repl``` can be either a string (where references like `\4` are replaced by the corresponding numbered group, and `\g<0>` stands for the whole match), or a function taking a match object and returning a string.
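
Here is a short sketch of these functions on a made-up log line, using a compiled expression with three groups. Note that in `sub`, the replacement comes before the string to process.

```python
import re

date_re = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
log = "start 2015-05-05, end 2015-05-07"

# match only looks at the beginning of the string
print(date_re.match(log))                     # no match: the string starts with "start"
print(date_re.match("2015-05-05").groups())

# search finds the first occurrence anywhere in the string
m = date_re.search(log)
print("{0} (year {1})".format(m.group(0), m.group(1)))

# findall returns a list of tuples because the expression has groups
print(date_re.findall(log))

# sub with a template string: reorder the groups
print(date_re.sub(r"\3/\2/\1", log))

# sub with a function taking the match object: keep only the year
print(date_re.sub(lambda match: match.group(1), log))
```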

#### Flags

You can use flags to modify some behavioural elements. Flags are integer values which can be combined using a bitwise or (`|`).

 * ```re.I``` (```IGNORECASE```) makes the expression case insensitive
 * ```re.M``` (```MULTILINE```) makes `^` and `$` also match beginning and ends of lines inside the string
 * ```re.S``` (```DOTALL```) makes the dot '`.`' recognize newlines
 * ```re.L``` (```LOCALE```) makes `\w`, `\W`, `\s`, `\S`, `\b` and `\B` depend on the locale settings
 * ```re.U``` (```UNICODE```) makes `\w`, `\W`, `\d`, `\D`, `\s`, `\S`, `\b` and `\B` depend on the properties of the Unicode characters
 * ```re.X``` (```VERBOSE```) makes the regular expression parser ignore white space (except in square brackets) and allows comments (starting with `#`)
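
A few of these flags in action, as a minimal sketch on made-up strings:

```python
import re

text = "Alpha\nbeta\nALPHA"

# IGNORECASE and MULTILINE combined with |
print(re.findall(r"^alpha$", text, re.I | re.M))

# Without MULTILINE, ^ and $ only match at the ends of the whole string
print(re.findall(r"^alpha$", text, re.I))

# DOTALL lets the dot match the newline between the two words
print(re.match(r"Alpha.beta", text) is None)
print(re.match(r"Alpha.beta", text, re.S) is None)

# VERBOSE allows white space and comments inside the expression
number_re = re.compile(r"""
    [-+]?        # optional sign
    \d+          # integer part
    (?:\.\d+)?   # optional fractional part
""", re.X)
print(number_re.findall("x = -3.5, y = 12"))
```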

# Exercise: Writing a simple program

Let's write a program to make distance conversions step by step.

$\rightarrow$ Make a dictionary containing length units (km, au, ly, pc, ...) as keys, and their value in meters as values.

Useful module: `scipy.constants`, contains `astronomical_unit`, `c`, `year`, ...

$\rightarrow$ Make a function which takes a value, a unit name, and a second unit name, converting the given value from the first unit to the second. Don't forget to document it.

To test, you can use `%run script.py` in IPython.

Example:
```python
>>> convert_distance(1.0, "km", "m")
1000.0
```

$\rightarrow$ This function may be useful in many cases, not only for our simple program. Let's create a module for it!

*Note: this is a good practice. Whenever you write generic functions, put them in a separate module. It prevents code duplication. Remember one of the Python guidelines: DRY (Don't Repeat Yourself).*

Again, don't forget the documentation and tests.

Now, we can write a program using this function.

$\rightarrow$ Write a program demonstrating some conversions. You can use the `if __name__ == '__main__':` idiom.

That's cool. But it would be even cooler if we could give the value and units as command-line arguments, wouldn't it?

$\rightarrow$ Use `argparse` to retrieve the value, source and destination units

Example:
```bash
$ python convert_distance.py --from ly --to km 1.0
9.45425495549e+12
```

See the `argparse` documentation if you need help: https://docs.python.org/2/library/argparse.html
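
One hint: `from` is a reserved word in Python, so the value of a `--from` option cannot be read as `args.from`. The `dest` parameter of `add_argument` lets you choose another attribute name. The sketch below only shows this point; the attribute names `source` and `target` and the default values are arbitrary choices.

```python
import argparse

parser = argparse.ArgumentParser(description="dest demonstration")
# "args.from" would be a syntax error, so the values are stored
# under other attribute names with dest.
parser.add_argument("--from", dest="source", default="m")
parser.add_argument("--to", dest="target", default="m")
parser.add_argument("value", type=float)

args = parser.parse_args(["--from", "ly", "--to", "km", "1.0"])
print("{0} {1} {2}".format(args.value, args.source, args.target))
```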

$\rightarrow$ Add a `-p` / `--precision` option giving the number of significant digits to show, in scientific notation

Example:
```bash
$ python convert_distance.py --precision 5 --from ly --to km 1.0
9.4543e+12
```

Useful function:
```python
def format_precision(precision, value):
    """Format a value with the given number of digits in scientific notation"""
    p = precision - 1 # the format description takes the number of digits after the point
    format_str = "{{:.{}e}}".format(p) # the format string is something like "{:.3e}"
    return format_str.format(value)
```
