# Table of Contents
* [1. Durham Astro PG Python Tutorials 2017](#1.-Durham-Astro-PG-Python-Tutorials-2017)
	* [1.1 Overview](#1.1-Overview)
	* [1.2 Some results of the survey](#1.2-Some-results-of-the-survey)
	* [1.3 Session 1](#1.3-Session-1)
	* [1.4 Environment setup for these tutorials](#1.4-Environment-setup-for-these-tutorials)
* [2. Introduction](#2.-Introduction)
	* [2.1 What is Python?](#2.1-What-is-Python?)
	* [2.2 Hello World in Python](#2.2-Hello-World-in-Python)
	* [2.3 Interactive Python](#2.3-Interactive-Python)
	* [2.4 Why do astronomers like Python?](#2.4-Why-do-astronomers-like-Python?)
	* [2.5 Improving your Python knowledge is worth the effort](#2.5-Improving-your-Python-knowledge-is-worth-the-effort)
	* [2.6 Python2 or Python3?](#2.6-Python2-or-Python3?)
	* [2.7 Tools for writing (Python) code](#2.7-Tools-for-writing-%28Python%29-code)
	* [2.8 Where to get help on Python](#2.8-Where-to-get-help-on-Python)
	* [2.9 Running Python](#2.9-Running-Python)
* [3. Basic Python Review](#3.-Basic-Python-Review)
	* [3.1 Maths at the interactive prompt](#3.1-Maths-at-the-interactive-prompt)
	* [3.2 Errors](#3.2-Errors)
	* [3.3 Printing stuff](#3.3-Printing-stuff)
	* [3.4 Variables and strings](#3.4-Variables-and-strings)
	* [3.5 `None` and logical operations](#3.5-None-and-logical-operations)
	* [3.6 Conditional expressions](#3.6-Conditional-expressions)
	* [3.7 Objects, classes, functions, methods --  and interactive help](#3.7-Objects,-classes,-functions,-methods-----and-interactive-help)
	* [3.8 Defining functions](#3.8-Defining-functions)
	* [3.9 Classes](#3.9-Classes)
	* [3.10 Collections: lists and tuples](#3.10-Collections:-lists-and-tuples)
		* [3.10.1 Lists](#3.10.1-Lists)
		* [3.10.2 Tuples](#3.10.2-Tuples)
		* [3.10.3 Packing and unpacking tuples](#3.10.3-Packing-and-unpacking-tuples)
		* [3.10.4 Loops](#3.10.4-Loops)
	* [3.11 Dictionaries (`dicts`)](#3.11-Dictionaries-%28dicts%29)
* [4. Program structure](#4.-Program-structure)
	* [4.1 Modules, packages and namespaces](#4.1-Modules,-packages-and-namespaces)
	* [4.2 Scope](#4.2-Scope)
	* [4.3 `globals()` and `locals()`](#4.3-globals%28%29-and-locals%28%29)
	* [4.4 Memory, copies and references to variables](#4.4-Memory,-copies-and-references-to-variables)
* [5. Getting things done with Python](#5.-Getting-things-done-with-Python)
	* [5.1 Exceptions and debugging](#5.1-Exceptions-and-debugging)
	* [5.2 File input and output](#5.2-File-input-and-output)
	* [5.3 Working with the filesystem](#5.3-Working-with-the-filesystem)
	* [5.4 Working with numerical data using numpy](#5.4-Working-with-numerical-data-using-numpy)
		* [5.4.1 Creating arrays](#5.4.1-Creating-arrays)
		* [5.4.2 Slicing](#5.4.2-Slicing)
		* [5.4.3 Arithmetic](#5.4.3-Arithmetic)
		* [5.4.4 Saving numpy arrays to disk](#5.4.4-Saving-numpy-arrays-to-disk)
		* [5.4.5 Random numbers](#5.4.5-Random-numbers)
		* [5.4.6 Histograms](#5.4.6-Histograms)
		* [5.4.7 Statistics, logic and `where`](#5.4.7-Statistics,-logic-and-where)
		* [5.4.8 Reading data from a file with `np.loadtxt`](#5.4.8-Reading-data-from-a-file-with-np.loadtxt)
	* [5.5 Making plots with matplotlib](#5.5-Making-plots-with-matplotlib)
* [6. Homework challenge](#6.-Homework-challenge)


# 1. Durham Astro PG Python Tutorials 2017

## 1.1 Overview

__Andrew Cooper, OCW 223, a.p.cooper@durham.ac.uk__

The aims of the course as a whole are to:

* Provide a minimal, astronomy-oriented but complete introduction to Python for those that haven't used it much before.
* Give some pointers about where to go for more information and examples.
* Provide an excuse to discuss how you’re currently using Python with your colleagues and solve any immediate problems you’re having.

3x3hr sessions, no assessment, but there is a lot of stuff to cover and an optional 'homework' problem each week. 
* Session 1: Python basics, intro to numpy and matplotlib
* Session 2: More advanced python, numpy and matplotlib, astronomy-specific topics
* Session 3: Flexible -- solve problems raised in previous weeks, discuss specialized packages, etc.

Format: 
- work through this notebook at your own pace. 
- when you find something new or confusing, try and figure it out.
- convince yourself you've understood by experimenting in this notebook, running commands in a separate ipython session, or writing commands into a script and running it from the command line (see below).
- if you can't immediately figure something out, ask me or someone else -- that's the point of doing this as a workshop.
- anything you don't get through in 3hrs, please try to work through before next week.
- if you already know everything, at least read the tutorial and tell me how I could improve it!

Between sessions, try and put Python to use on something you care about (research work, problems from other courses, shell scripting, or anything else) or just try the homework problem, and come to the next session with any questions, success stories or interesting discoveries.

Feedback/questions welcome throughout -- the course is new, so please spot mistakes and make suggestions (especially about stuff that is missing, wrong, confusing or not helpful).

Other tutorials:
- http://www.scipy-lectures.org/ (covers just about everything, not just scipy)

This tutorial was developed with one eye on the [Euclid python primer](https://github.com/aboucaud/python-euclid2016).

## 1.2 Some results of the survey

- 7 (maybe 8?) responses. Of these:
- About 50% use Python daily
- About 50% consider themselves more advanced than 'beginner'
- 100% comfortable with MacOS
- **50% comfortable with a UNIX-like terminal**

This isn't necessarily related to Python, but being able to use a terminal/shell (on a linux or mac machine) is important for many jobs in astronomy, like controlling a telescope or working with a supercomputer. If you don't have much experience using a terminal, it's worth spending some time soon making sure that you:

- Understand at least one common shell, like `bash`, `csh` or `tcsh`, and appreciate the differences between them.
- Know commands like `ls`, `cd`, `cat`, `echo`, `mkdir`, `rm` etc.
- Know about command line tools like `less`, `grep`, `tar`, `gzip` etc.
- Know how to use `man` to get help on commands.
- Be able to use at least one common non-graphical text editor like `emacs` or `vim`.
- Understand the basics of configuration files like `.bashrc` and `.profile`
- Understand the role of environment variables, particularly `$PATH`, and how to show/set/modify them.

I couldn't find a really good tutorial for the things above. [Here's a concise but old-fashioned shell tutorial](http://freeengineer.org/learnUNIXin10minutes.html).

----

## 1.3 Session 1

Aims:
- Short introduction/background, look at tools for working with Python
- Make sure everyone has an environment in which they can write and run Python
- Introduce/recap fundamentals of the language

By the end of the session, you should be able to:
- write a basic program using the common elements of standard Python
- understand the structure of a typical Python program
- manipulate files and directories
- generate simple random data, read/write a file and make a plot like this with matplotlib:

<img style="float: left;" src="example_scatter_plot.png">

----

## 1.4 Environment setup for these tutorials

Python is installed on the Mira (ITS linux) machines, but Jupyter isn't. Also, the search paths are not set up to find locally installed Python packages. We need to install one package, called Jupyter.

`pip` is the Python package manager, used to install modules outside the standard library. There will be some explanation of `pip` in the next session.

Find which shell you're using (probably `tcsh` on the Ph216 machines)
```shell
echo $SHELL
```

If using `tcsh` or `csh`:

```csh 
setenv PATH $HOME/.local/bin:$PATH
setenv PYTHONPATH $HOME/.local/lib/python2.7/site-packages
```

If using `bash`:

```bash
export PATH=$HOME/.local/bin:$PATH
export PYTHONPATH=$HOME/.local/lib/python2.7/site-packages
```

Now install the Jupyter package with `pip`.

```bash
pip install --user jupyter
```

The `--user` option installs packages to a subdirectory of your home directory (`~/.local` by default), so does not require root access.
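
To check where `--user` installs will end up, you can ask Python itself. This is a minimal sketch using the standard `site` module; the paths printed depend on your machine and Python version, so no particular output is assumed here.

```python
import site
import sys

# Base directory for per-user installs (~/.local by default on Linux).
user_base = site.getuserbase()
# Directory where `pip install --user` puts importable packages.
user_site = site.getusersitepackages()

print(user_base)
print(user_site)
# True if Python will actually search the user site directory for modules.
print(user_site in sys.path)
```

If the last line prints `False`, packages installed with `--user` may not be importable until the search path is fixed (for example with `PYTHONPATH`, as above).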

# 2. Introduction

## 2.1 What is Python?

Wikipedia (with highlights by me):

> Python is a high-level programming language used for general-purpose programming and originally created by Guido van Rossum in 1991. Python has a design philosophy which **emphasizes code readability**, and a syntax which allows programmers to **express concepts in fewer lines of code** than possible in languages such as C++ or Java. The language provides constructs intended to enable writing **clear programs on both a small and large scale**.

> Python features a **dynamic type system** and **automatic memory management** and supports multiple programming paradigms, including object-oriented, imperative, functional programming, and procedural styles. It has a **large and comprehensive standard library**.

- Friendly: it is high-level, you can usually figure out what the code does by reading it, you don't have to explicitly compile anything, and importing 'packages' of code is simple, so it's easy to build up large programs in a modular way.
- A Swiss army knife, useful for almost everything, even if not optimal for anything. 

----

## 2.2 Hello World in Python

Python is an interpreted language -- you pass commands to the Python interpreter (`python`) rather than using a compiler to make self-contained executable binary files. This makes Python programs very short:

```python
print('Hello World')
```

We can write this in a file and run it with Python (the `!` at the start of these lines is a way to run shell commands from inside this notebook):

In [None]:
# Write code in a .py file.
!echo "print('Hello World')" > hello_world.py
# And run it with python
!python hello_world.py

Compare with C (which you would have to compile):

```C
#include <stdio.h> // Need this for output

int main(void) // Need a 'main' function
{
    printf("Hello World\n"); // Need a ; to end statements.
    return 0; // Tell the shell the program succeeded
} // code in the main function is marked with {}
```

----

## 2.3 Interactive Python

The fact that Python is run through an interpreter makes it natural to work with interactively (i.e. by writing one line at a time and looking at the result), which is what's happening in this 'notebook' tutorial. These cells contain Python code that you can execute by selecting the cell and pressing SHIFT-RETURN.

In [None]:
print('Hello World')

If you double click on the cell you can edit it (try making the cell above print something other than Hello World). The cells are all part of the same 'session' of a Python interpreter running in the background, so you can re-use the results of previous cells. For example, if a function called `my_main()` has been defined in a cell that has already been run, we can call it again (if no such cell has been run in this session, you will get a `NameError`):

In [None]:
my_main()

This interactive interpreter can also run commands in the shell. To do this, start with a `!` (this is a magic function in the interpreter, not part of Python).

In [None]:
!echo This shell is $SHELL
!echo The current working directory is `pwd`

Try opening a terminal and typing `python` to start the basic interactive prompt, and use it to print Hello World. To get out of the interactive session, press `CTRL-D`. The basic prompt is very restrictive, but it always works and is quick to start up. 

IPython is a more advanced prompt that is the most common way of working interactively with Python. It has many useful features, like colourful syntax highlighting, a command history, interactive help, interactive plotting, a way to inspect the state of variables in programs after they cause errors to find out what happened (debugging), and plenty of other stuff which will be covered later.

This tutorial is made up of a bunch of grey 'code cells' that look like this:

These are being run with IPython, but through the Jupyter notebook browser interface -- one of several possible 'fancy' ways to run IPython with extra features. However, the more common way to work with IPython is direct from the terminal. 

Try starting IPython from a terminal now with the `ipython` command (if you're starting it for the first time ever, it might take slightly longer to start up). 

------

## 2.4 Why do astronomers like Python?

- There is a huge library of open-source Python code to do stuff that astronomers and engineers want to do on a daily basis. 
- Comprehensive and fast high-level numerical computing routines from numpy/scipy, unmatched by any other high-level language.
- Working with strings and the operating system is built-in and easy.
- I/O libraries for just about any format and for interaction with the system.
- Python’s plot-making library, matplotlib, is comprehensive and produces publication-ready results.
- Python replaces shell scripting for many tasks (though not entirely) and saves you having to write scripts for many different shells.
- It's easy to sketch pseudo-code that looks like real code.
- Excellent interactive environment, IPython, now 'industry standard'.
- A lot of trendy coding is done in Python or things that work with Python, especially web-related. Astronomers really like to be trendy.

----

## 2.5 Improving your Python knowledge is worth the effort

- You’ll probably need to read and write Python at a reasonable level throughout your career as an astronomer or engineer. You’ll also need an ability to learn new Python things fast and keep your knowledge up to date.
- For better or worse, sophisticated Python skills make you more employable, especially if you go into industry after your PhD (in which case you’ll need to demonstrate these skills). 
- Astronomy careers can (in some cases) be based almost entirely around writing good Python as a data scientist, instrument engineer or software engineer (although the market is getting very crowded now). Some astronomers are major contributors to standard scientific python packages, which looks good on their CV.

----

## 2.6 Python2 or Python3?

Confusingly, there are two versions of Python in common, everyday use: Python2 and Python3. In future, Python3 will be the only version. The differences are meaningful, but not huge. It's pretty easy to 'upgrade' code from Python2 to Python3. 

This tutorial is written for Python2. Astronomers are usually quite slow to change, because of dependencies on old code. Most major packages commonly used by astronomers now fully support Python3, but this has only become the case over the last few years. 

If you don't use a package that absolutely only works with Python2, it's probably a good idea to use python3. If you want to stick with Python2 for now, you can make your code minimally compatible with python3 very easily (see the notes at the end of the tutorial).

Learning Python2 does not teach you any 'bad habits' that you have to 'unlearn' to use Python3. Well, maybe only one (we'll see below).

You can check what version of Python you're using in a terminal:

```bash
python --version
```

In [None]:
# Let's check what this notebook is using
!python --version
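
The same check can be made from inside Python, which is useful when a script needs to know which version is running it. A minimal sketch using the standard `sys` module (the messages printed are just for illustration):

```python
import sys

# sys.version_info behaves like a tuple: (major, minor, micro, ...)
major = sys.version_info[0]
minor = sys.version_info[1]
print('Running Python %d.%d' % (major, minor))

# A simple guard that works the same way in Python 2 and Python 3.
if major < 3:
    print('This interpreter is Python 2.')
else:
    print('This interpreter is Python 3 or later.')
```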

## 2.7 Tools for writing (Python) code

The results of the survey: 7 responses, 5 editors, 9 preferences
```
gedit    xxx
Xcode    xx
pycharm  x
spyder   xx
emacs    x
```

`Xcode`, [`pycharm`](https://www.jetbrains.com/pycharm/) and [`spyder`](https://github.com/spyder-ide/spyder) are IDEs: they provide a more comfortable environment, interactive help/documentation and tools to manage projects, alongside an editor linked to a built in interpreter session. These can be helpful, but they aren't essential for writing Python.

The 'essential' features of a good python editor are:

- syntax highlighting with colours
- consistent indentation with spaces rather than tabs

After that it's up to you! Use what you're comfortable with, and take time to learn how your editor can make writing code quicker and easier. If you feel it's not working after a few weeks, try something else. Now is the time to experiment!

Going back to the list above, if I asked 10 'experienced' astronomers, I would expect the answer to be:
```
emacs   xxxxxx
vim     xxx
```

Why?

- These tools have been around forever and are available on most systems, in most places.
- A lot of work is done in the terminal only (no graphics), often over `ssh` to many different machines you don't own.
- Astronomers tend to work on small projects and [Heath-Robinson](https://en.wiktionary.org/wiki/Heath_Robinson) solutions to problems rather than large, well-structured programs with lots of documentation.
- Learning vim is difficult but rewarding (tutorials: [1](http://vim.wikia.com/wiki/Tutorial) [2](http://www.openvim.com/)).

Among trendy people, you will also hear about [`atom`](https://atom.io/), which is a newer editor along the same lines (although I don't think it runs from the command line).

Here is [some more discussion of Python editors](https://www.fullstackpython.com/development-environments.html).

## 2.8 Where to get help on Python

Python has a built-in help system that can give you some minimum information about a module, function or variable, if you already know its name. For example, anywhere you can execute python, you can use the `help()` function:

In [None]:
help(str.upper)

In IPython (and hence in these notebooks), you can also type a `?` before or after the method name, which does much the same thing, but more 'interactively' (try and see -- also try it in a terminal IPython session).

In [None]:
?str.upper

These are the quickest ways to remind yourself how to use a method you already know about. 

To get more general help on how to solve a particular problem ('how do I do XYZ in python?' or 'why does Python do X?') [StackOverflow](http://stackoverflow.com/questions/tagged/python?sort=votes&pageSize=15) (SO) is often very useful. SO exists specifically to share questions and answers on programming problems. Answers (and questions) can be voted on by other users, the idea being that the 'best' answer will eventually get the most votes, so you don't have to read all the 'less good' answers. Because SO is very popular with serious programmers, this usually works -- however, there are a lot of *wrong* 'accepted' answers on SO, so it's often necessary to read some of the comments and discussion.

Some books can be worth getting out of the library, particularly for beginners in Python -- when I was learning Python I found the examples in the O'Reilly [Python Cookbook](http://shop.oreilly.com/product/0636920027072.do) quite useful, but I probably wouldn't buy a new copy. More than one astronomer has written books about Python -- [this one is aimed at machine learning](http://www.astroml.org/#textbook) and is quite good (there are copies in the library, but one is on my desk). [This one](http://shop.oreilly.com/product/0636920033424.do) (which is not by astronomers) also looks interesting.

Once you have some experience, reading other people's code (e.g. on [github](https://github.com/astropy/astropy)) can be a good way to learn new things.

## 2.9 Running Python

There are multiple ways to run python code. In order of increasing complexity:

1. __Write a script and tell the python executable to run it:__

    _This is the best way to run long-running non-interactive jobs (and the only way if sending jobs to a batch queue)._

    ```bash
    langdale:tutorial2016 andrew$ python my_script.py
    ```
       
2. __Run python interactively with the basic interpreter shell built into python itself:__
    
    _This is not good for much, generally speaking, but it is always available wherever Python is and fast to start up. Exit with CTRL-D or exit()._
    
    ```bash
    langdale:tutorial2016 andrew$ python
    ```

    ```
    Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:43:17)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    Anaconda is brought to you by Continuum Analytics.
    Please check out: http://continuum.io/thanks and https://anaconda.org
    >>> 1+1
    2
    >>>
    ```

3. __Run through the third-party IPython shell (which you have to install):__

   _This is a much more advanced shell and the most productive way of developing and debugging code interactively. It has nice colours, a command history, works like a system shell, has a pretty good interface to the python debugger built in, and plenty of other useful things. There are some instructions for using IPython [here](http://ipython.readthedocs.io/en/stable/interactive/tutorial.html)._
 
    ```bash
    langdale:tutorial2016 andrew$ ipython
    ```

    ```
    Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:43:17)
    Type "copyright", "credits" or "license" for more information.
    IPython 5.1.0 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra details.
    ```

    ```
    In [1]: 1+1
    Out[1]: 2
    In [2]:
    ```

4. __Run ipython through a jupyter notebook:__

    _This is fancy but not terribly practical (IMO) for everyday use. Requires an up-to-date browser to be running._

    ```bash
    langdale:tutorial2016 andrew$ jupyter-notebook
    ```

   This tutorial is a Jupyter notebook. Press SHIFT+ENTER to run each cell, and the results will be printed underneath. This is just a more elaborate version of IPython. In your own time, have a look at the user interface tour and keyboard shortcuts under the Help menu.
   
   You can edit any of the examples to experiment. If things get badly messed up, Kernel->Restart & Clear Output.
   
   I recommend having a normal terminal open somewhere with IPython running in it, to experiment with. You can experiment here in the notebook too, but it's worth getting used to the terminal. The terminal also feels more natural and straightforward to some people.
   
   --------

# 3. Basic Python Review

## 3.1 Maths at the interactive prompt

All this should be familiar if you've ever used Python before. The first point here is that typing stuff at the interactive prompt in IPython usually has a simple and intuitive result, and the second point is that Python has *dynamic typing*, so most of the time you don't have to worry about how numbers are encoded in memory, or how much memory is used to store them.

In [None]:
# Add integers, get an integer
1 + 1

In [None]:
# Add floats, get a float (no need to worry about precision)
1.0 + 1.0

In [None]:
# Add an integer to a float, get a float
1.0 + 2

In [None]:
# All the same ideas apply to multiplication, obviously enough.
5*2.0

In [None]:
# Raising something to a power
3 ** 4

In [None]:
# Very big numbers are OK (but we'll see below that this doesn't hold for serious numerical work).
# Note the L on the end.
(2**22)**22

In [None]:
# You can write numbers like this too:
1.1e12

In [None]:
# These lines starting with # are Python comments, by the way.

Python is very forgiving about variable types. You can check the type of anything explicitly if you want, using the built-in function `type()`, but you rarely need to. 

In [None]:
type(1)

In [None]:
type(1.0)

In [None]:
type((2**22)**6)

In [None]:
# You can also explicitly force variables to be interpreted as a specific type (known as 'casting')
float(1)/float(2)
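
Casting works in other directions too. The sketch below shows a few conversions with the built-in `int()`, `float()` and `round()`; note that `int()` truncates rather than rounds.

```python
# int() truncates towards zero rather than rounding.
print(int(3.9))           # 3
print(int(-3.9))          # -3
# Strings that hold numbers can be converted too.
print(float('2.5') * 2)   # 5.0
print(int('42') + 1)      # 43
# round() is what you want for rounding to the nearest integer.
print(int(round(3.9)))    # 4
```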

Division, as usual, is more subtle. The result of the next expression is different in Python 2 (where dividing an integer by an integer results in integer division) and Python 3 (where it results in floating point division).

In [None]:
8 / 3

In [None]:
8/3.0

I always try to make sure I cast the denominators of fractions to floats explicitly (using `float()` or writing them with a `.0` on the end), but it's easy to forget, leading to bugs. The recommended way to avoid this in python 2 is to start all your code with:

```python
from __future__ import division
```

which makes the python3 behaviour the default in python2. For now, there's no need to worry about this. One more thing for completeness:

In [None]:
# If you want to make it clear you actually want integer division (rounding down)
8 // 3

Not much more to say about arithmetic operators.

In [None]:
# Modulo (remainder) -- it's useful sometimes, really!
42 % 12
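
If you want division that behaves identically in Python 2 and Python 3 without relying on casts, the standard `operator` module has function versions of both kinds of division. This is only a sketch of one option, not the usual way of writing arithmetic:

```python
import operator

# truediv always performs floating point division, in Python 2 and 3.
print(operator.truediv(8, 3))
# floordiv is the function form of //.
print(operator.floordiv(8, 3))    # 2
# Floor division rounds down (towards minus infinity), not towards zero.
print(-8 // 3)                    # -3
# divmod returns the quotient and the remainder together.
print(divmod(42, 12))             # (3, 6)
```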

## 3.2 Errors

A large fraction of your Python life will be spent interpreting the results of errors. Fortunately these usually get printed in a helpful way. Let's confuse Python:

In [None]:
1/0

There's more about this later, when we look at error handling and debugging.

## 3.3 Printing stuff

Above, the answers just appeared by magic underneath the expressions. This is how the interactive prompt works. If you put some of the code above in a file and ran it with Python directly, nothing would be printed. To print something to the standard output (i.e. 'the screen'), use the `print()` function.

In [None]:
print(1+1)

In python2, `print` was interpreted specially and you could leave the brackets off if you wanted, like this:

In [None]:
print 1+1

In python3, `print()` _must_ have the brackets. It's a good habit to treat it like that in python2 as well.

In [None]:
# If you _don't_ want to see the result of the interactive prompt, put a semi-colon at the end of the line:
1 + 1;

In [None]:
# The ; is Python's statement separator, if you're curious. So what happens here should be understandable:
1+1; 2+3; 20+30

Since good Python code is supposed to be easily readable, multiple statements on one line like this are very rarely seen.

## 3.4 Variables and strings

Assigning values to variables gives a reassuring feeling about the order of the Universe.

In [None]:
x = 3

(There is no output at the interactive prompt above because assignment with `=` is a statement rather than an expression, so there is no 'result' to print.)

In [None]:
x

Variable names can contain letters, digits and underscores (but can't start with a digit), as long as you avoid [reserved keywords](https://docs.python.org/2/reference/lexical_analysis.html#keywords) used by Python itself.

In [None]:
# All that dynamic typing works as you might expect with these variables
y = 2.0
type(x*y)

Now is probably a good point to introduce two other basic types, strings and 'booleans' (logical values `True` and `False`).

In [None]:
a = 'carbon dioxide' # a string
b = True # a boolean

In [None]:
type(a)

In [None]:
type(b)

We'll concentrate on strings for a bit. 

(Note: Python strings can be Unicode without _too_ much fuss, see here: [Python2](https://docs.python.org/2/howto/unicode.html)/[Python 3](https://docs.python.org/3.6/howto/unicode.html). However in most astronomy-related cases this is overkill, you should stick to the default ascii encoding. Printing maths and less common characters in plots is handled in a totally different way, as we'll see later.) 

Compared to most other languages, manipulating strings in Python is easy.

In [None]:
print(a + ' sounds like ' + a)
print(a.upper())
print(a.capitalize())

There are several different ways to specify strings:

In [None]:
"Double quotes"

In [None]:
'Single quotes'

The main point is that you can use one inside the other, so:

In [None]:
"This 'works' fine"

In [None]:
'as does "this"'

In [None]:
"""
Three double quotes either side allows long blocks of text with line breaks!

You will see this all the time in Python, because it's the usual way of providing documentation for functions.
"""

The `\n` that appears in the output of the cell above is the (non-printing) new-line code, which you've hopefully seen before somewhere. This illustrates the difference between the interactive prompt and what you would get from `print()`.

In [None]:
print("I put some explicit\nnewlines in here\nmyself")
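
The form shown by the interactive prompt can also be produced explicitly with the built-in `repr()`. A small sketch (the variable name `text` is just for illustration):

```python
text = "line one\nline two"

# repr() gives the form the interactive prompt shows: quotes and escape codes.
print(repr(text))
# print() interprets the newline and shows two lines.
print(text)
# The newline counts as a single character.
print(len(text))    # 17
```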

Other things can be converted to strings:

In [None]:
str(6**7)

Usually we want the values of variables to appear in strings. The most basic and ugly way is to add up lots of small chunks like this:

In [None]:
element1 = 'carbon'
element2 = 'oxygen'
print(a + ' is a compound of ' + element1 + ' and ' + element2)

This is OK in limited cases but it's hard to do anything more complicated with it. Also it can be tough to read. The more common way is to write a string with special placeholders in it that allow for 'string interpolation', like this:

In [None]:
full_string = '%s is a compound of %s and %s'%(a,element1,element2)
print(full_string)

The variables passed to the interpolation operator `%` are inserted into the string in order. `%s` is a placeholder that formats the variable you give it as a **s**tring. It doesn't matter if you do the interpolation when you define the string, or later.

In [None]:
# For example, this is equivalent to the above example:
full_string = '%s is a compound of %s and %s'
print(full_string%(a,element1,element2))

# So is this:
print('%s is a compound of %s and %s'%(a,element1,element2))

String interpolation turns out to be one of the things you'll use to get stuff done at work -- for example, in the context of writing axis labels and legends for plots, working with data tables, and working with paths to files. Compared to many other languages, Python is a joy to use in this respect.

There are other format specifiers, and syntax for each of them to specify things like the number of decimal places. [This link has a clear explanation of the various options](https://pyformat.info/). A mismatch between the type of the variable and the type expected by the specifier works as long as it's possible to cast the former to the latter.

In [None]:
for_example = """This is an integer: %d
This is a float: %2.3f
This is scientific notation: %10.6e
This is a string: %s"""

print(for_example%(2.0,2,2e22,2))
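
A few more specifiers that come up often, as a rough sketch. The file name pattern is made up for illustration.

```python
# %W.Pf : float in a field at least W characters wide, with P decimal places.
print('%8.3f|' % 3.14159)         # '   3.142|'
# %0Wd : integer padded with leading zeros to width W (handy for file names).
print('snapshot_%03d.txt' % 7)    # snapshot_007.txt
# %.Pe : scientific notation with P decimal places.
print('%.2e' % 123456.0)          # 1.23e+05
# A minus sign left-justifies within the field.
print('%-8s|' % 'abc')            # 'abc     |'
# Use %% to get a literal percent sign.
print('%d%% complete' % 50)       # 50% complete
```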

There is another way to format strings (the so-called 'new' style) using the `format` method of strings, like this:

In [None]:
for_example = '{} is a compound of {} and {}'
print(for_example.format(a,element1,element2))

In [None]:
str.format(for_example,a,element1,element2)

In [None]:
for_example = '{compound:s} is a compound of {first_thing:s} and {second_thing:s}'
print(for_example.format(first_thing=element1,compound=a,second_thing=element2))
print(for_example.format(first_thing=element2,compound=a,second_thing=element1))
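
The 'new' style supports widths and precisions too, written after a colon inside the braces. A short sketch with illustrative values:

```python
# The part after the colon is a format specification, much like %-style.
print('{:.3f}'.format(3.14159))          # 3.142
print('{:>8}|'.format('abc'))            # '     abc|'
print('{:<8}|'.format('abc'))            # 'abc     |'
# Numbers in the braces refer to argument positions and can be repeated.
print('{0} and {1} and {0}'.format('carbon', 'oxygen'))
# Names and format specifications can be combined.
print('{name} has atomic number {z:03d}'.format(name='carbon', z=6))
```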

----
*Further reading:*
- the `re` module (regular expressions, or 'regex') is used to search for patterns in strings; it seems complicated at first, but it's worth learning.
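
As a taste of what pattern matching looks like, here is a minimal sketch using the standard library `re` module to pull numbers out of a file name. The file name is hypothetical, invented for this example.

```python
import re

# Hypothetical file name, used only for illustration.
filename = 'galaxy_z0.50_snap042.txt'

# \d+ matches one or more digits; the brackets capture that part.
match = re.search(r'snap(\d+)', filename)
if match:
    snapshot = int(match.group(1))
    print(snapshot)     # 42

# findall returns every non-overlapping match as a list of strings.
numbers = re.findall(r'\d+\.\d+|\d+', filename)
print(numbers)          # ['0.50', '042']
```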

## 3.5 `None` and logical operations

Variables can also have a special value of `None`, which indicates that they are defined, but have no value. This is used all the time and is really important to know about.

In [None]:
c = None
type(c)

Now we can try some logical operations using the comparison operators:

In [None]:
print(1==1)
print(1!=2)
print(1==2)
print(1<=2)
# etc. etc.
print(True != False)
print(False == None)

In [None]:
True and True

In [None]:
True and False

In [None]:
True or False

In [None]:
not True or False

and so on. Beware:

In [None]:
1 == True

In [None]:
0 == False

Yikes! Maybe every positive number is `True`?

In [None]:
2 == True

In [None]:
-1 == True

OK... so if True and False are equal to 1 and 0, can we do arithmetic with them?

In [None]:
True - 1 == False 

In [None]:
False - 1

In Python, `True` and `False` are really some special sort of integers (`bool`), the values of which are 1 and 0 respectively. 99.9% of the time you probably don't need to worry about this. [Further discussion here if interested](http://stackoverflow.com/questions/2764017/is-false-0-and-true-1-in-python-an-implementation-detail-or-is-it-guarante).
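
One place where this is actually handy: adding up booleans counts how many of them are `True`. A small sketch (the list `tests` is just an example; lists are covered properly later in the tutorial):

```python
# bool is a subclass of int, so True and False behave like 1 and 0.
print(isinstance(True, int))     # True
print(True + True)               # 2

# A practical use: summing booleans counts how many tests passed.
tests = [3 > 2, 10 == 11, 5 != 4]
print(sum(tests))                # 2
```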

There is another operator, `is`, which checks if two things *are the same thing* rather than just checking that they have equal values. This doesn't sound obvious and it is less obvious than it sounds.

In [None]:
x = 1
y = 1.0
x is y

In [None]:
0 is False
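
The difference between `==` and `is` is easiest to see with two separate objects that hold equal values. A sketch using lists (introduced later in the tutorial), with variable names chosen for illustration:

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list with equal contents
alias = first        # another name for the very same list

print(first == second)   # True: the values are equal
print(first is second)   # False: they are two different objects
print(first is alias)    # True: both names refer to one object

# The most common use of `is` is testing for None.
result = None
print(result is None)    # True
```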

## 3.6 Conditional expressions

If you've ever written any code before you should expect the following kind of thing to happen in Python, and it does:

In [None]:
A = 8
if A == 8: # test equality
    print('A really is equal to 8.')
    
B = 10
if B > 10*A: # test inequality
    print('B is about %f times bigger than A.'%(B/A))
elif B < A/10.0:
    print('B is significantly smaller than A')
else:
    print('An astronomer might say B is approximately equal to A.')

Some languages have `case` statements. Python doesn't, it relies on `elif`.

## 3.7 Objects, classes, functions, methods --  and interactive help

Python doesn't just have basic 'things' like `int`, `float` and `str`, it also has objects. Everything in python, more or less, is an object of some sort. You don't need to worry about that, because all it means is that things usually work as you expect.

In [None]:
a = 'carbon dioxide' # a string
print(type(a))

`a` is an object of class `str`. We can also ask 'is the variable a string?':

In [None]:
isinstance(a,str)

You won't see this `isinstance` check very often in Python code, because the recommended approach is not to care what class an object actually *is*, only what methods it has (this is called 'duck typing').

More or less, there is no difference between the words 'type' and 'class' in Python, they mean the same thing. Objects of the same class ('instances' of the class) behave in the same way (in Python jargon, they have the same methods) but have different values. We already saw some methods of the `str` class, accessed through the `.` notation, for example:

In [None]:
a.upper()

Methods, for example `str.upper()`, are just functions that are associated with objects. Python is 'object oriented' ('everything in Python is an object'), but you don't need to know that, or what it means, to write Python code. Consequently Python code doesn't have to look object oriented. The most obviously not-object-oriented thing in Python is probably the way to get the length of a string. If you've seen other object-oriented languages you might expect it to be a property or method of the string, like `a.length()` or `a.size()`, but in fact it's a function that takes the string as its argument:

In [None]:
len(a)

There is [a reason for this](https://docs.python.org/2/faq/design.html#why-does-python-use-methods-for-some-functionality-e-g-list-index-but-functions-for-other-e-g-len-list), but it's partly a philosophical one.

Since we know that our variable `a` is an object of class `str`, we can use the function `help()` to learn about what we can do with it.

In [None]:
help(str)

I almost never use `help()`. If I want to know what methods an object has, I usually use 'tab completion' in IPython -- type the name of the variable and a dot then press TAB, like this `a.` + [TAB].

In [None]:
a. # click on the cell to edit it, put the cursor after the dot, and press TAB.

If I want to know how to use one of these methods, I use the magic '?' suffix in IPython to see the so-called 'docstring' of the method. Docstrings are special strings written into the code that defines the method, which provide a minimal amount of information. In Jupyter this opens a pop-up window that you can safely close after you've read it.

In [None]:
a.rstrip?

In [None]:
# OK, let's try this rstrip function:
a.rstrip('ide')

The documentation shows that this returns a **copy** of the string, so the original string should be the same.

In [None]:
a

In [None]:
# The same idea works for functions
len?

Often the docstring isn't enough, in which case Google is the solution.

The list of methods we printed with `help()` above showed many starting with `__`. These are 'hidden' methods -- hidden only in the sense that they don't show up when you use tab completion. Apart from that they are identical to 'normal' methods. These are methods that you are not expected to call yourself (but you can do if you want).

In [None]:
a.__eq__('carbon dioxide') 

Here `__eq__` takes another string and returns `True` if that string is 'equal' to the object. It's the fact that `str` has an `__eq__` method that means you can compare strings with the `==` operator like this:

In [None]:
a == 'carbon dioxide'

This is 'duck typing' in action -- we don't need to care if instances of a particular class are numerical or boolean values to test if two objects of that class are equal; we only need to care that they have a method with the name `__eq__` that defines what 'equality' means for that particular class.
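
To make this concrete, here is a minimal sketch of a user-defined class that supplies its own `__eq__` (class definitions are covered in section 3.9). The class `Magnitude` and its 0.01 tolerance are hypothetical, chosen only for illustration.

```python
class Magnitude(object):
    """Hypothetical class: two magnitudes are 'equal' if within 0.01 mag."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Python calls this method whenever we write `self == other`
        return abs(self.value - other.value) < 0.01

print(Magnitude(18.500) == Magnitude(18.504))
print(Magnitude(18.5) == Magnitude(19.0))
```

The `==` operator knows nothing about magnitudes; it only relies on the object having an `__eq__` method.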

## 3.8 Defining functions

This is a simple definition of a function, using the `def` keyword:

In [None]:
def my_function(first_argument, another_argument):
    """
    my_function adds together the first and second arguments 
    and returns the result.
    
    Arguments
    ---------
        first_argument   : the first argument
        another_argument : this is the other argument
    
    The two arguments can be any two things that python can 
    add together.
    
    Returns: the sum of the two arguments
    """
    print('I got %s and %s'%(first_argument, another_argument))
    some_result = first_argument + another_argument
    
    return some_result

Note the following:

- the body of the function (from the line after the `:`) is indented.
- the first thing in the function is a multi-line string that explains the purpose of the arguments. This is the docstring.
- the variable after `return` is returned by the function.

Good docstrings are concise and about 80 characters wide. Apart from that it's up to you what if anything to write there, but it's a good idea to write something, even in your own code. Docstrings are picked up by Python's interactive help system. 

In [None]:
help(my_function)

In [None]:
my_function(1,2)

In [None]:
my_function(1,True)

In [None]:
my_function('astro','physics')

In [None]:
x = my_function(7,8) # assign the result to a variable

print('So x = %s'%(x)) # Think about why I'm using %s to format x in the printed string here...

This is the simplest possible function:

In [None]:
def simplest_possible_function(): 
    pass

# If we call it, nothing happens.
simplest_possible_function()

`pass` is a keyword that does nothing. It's needed here because otherwise the function definition is incomplete. Functions don't have to `return` anything, but they do have to do something (even if that something is just `pass`).

Functions can have **default** arguments, as follows:

In [None]:
def another_function(alpha=2,beta=None):
    if beta is not None:
        return alpha*beta
    else:
        return str(alpha**2)
    
print(another_function()) # No explicit arguments
print(another_function(3))
print(another_function(3,2))
print(another_function(beta=2,alpha=3))
print(another_function(beta=4))

There is another way to define very simple one-line functions, called `lambda`

In [None]:
add_together = lambda x,y : x+y

add_together(1,2)

There is no deep difference between functions and `lambda`s. As it says [here](https://docs.python.org/2/faq/design.html#why-can-t-lambda-expressions-contain-statements):

> Unlike lambda forms in other languages, where they add functionality, Python lambdas are only a shorthand notation if you’re too lazy to define a function.

I tend to use `lambda` for simple one-line expressions.

In [None]:
# Functions are objects too.
print(my_function.__name__.upper())

# So we can pass functions as arguments.
def uppercase_name_of_function(f):
    print(f.__name__.upper())

uppercase_name_of_function(my_function)
uppercase_name_of_function(add_together)

`lambda` expressions don't have names, and don't really need them -- but we can give them names if we want, since they're no different from other functions.

In [None]:
add_together.__name__ = 'Add Together!'
uppercase_name_of_function(add_together)

When small functions are defined inside other functions, they're usually written using `lambda`. But sometimes we want complicated functions to be defined inside other functions -- **nested** functions. There is nothing complicated about that:

In [None]:
def outer_function(x):
    # define a nested function
    y = 10
    def inner_function(x):
        # The inner function knows about variables in the outer function,
        # unless they're redefined in the body of the inner function
        print(x*x*y)
        return
    
    # call the nested function twice
    inner_function(2*x)
    inner_function(4*x)
    return

outer_function(3)

# The inner function is only defined within the outer function, so we can't call it from the outer `scope`.
inner_function(4)

## 3.9 Classes

Defining your own classes is not all that common in everyday Python programming -- you can write very sophisticated code without ever defining any classes. So my explanation here is a brief example:

In [None]:
class MyClass(object):
    def __init__(self,favourite_colour):
        """
        Create a new MyClass instance.
        
        Arguments
        ---------
            favourite_colour : the colour this instance likes best
        """
        self.favourite_colour = favourite_colour
        
    def my_favourite_colour(self):
        return self.favourite_colour
    
class_alpha      = MyClass('green')
class_number_one = MyClass('orange')

print(class_alpha.my_favourite_colour())
print(class_number_one.my_favourite_colour())

Note:
- Class definitions start with `class`. 
- By convention, multi-word class names are usually written in so-called CamelCase, whereas functions are named_like_this, with underscores.
- The `(object)` after the name of the class means this class 'inherits' from the most basic class, `object`. Don't worry about this for now.
- The class has two methods, nested inside the definition. These are called `__init__` and `my_favourite_colour`.
- `__init__` is a special method that allows new instances of the class (i.e. objects) to be created with some initial state based on arguments passed directly to the class name, like this: `MyClass('green')`.

The first argument of both methods is `self`. This represents the current instance of the class. (`self` is a naming convention rather than a keyword, but it is followed almost universally.) Although it appears when the argument lists are defined for the functions in the class, it is 'invisible' when those methods are called. Each instance of the class has some 'state' that you can access through `self`, which is what's happening in the `my_favourite_colour` method.

For the time being, just treat it as a law that `self` has to appear as the first argument when you define methods in a class.
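
One way to see what 'invisible' means is to call a method the long way round, through the class, passing the instance explicitly. This is only a sketch; the class `Star` and its method `describe` are invented for the example.

```python
class Star(object):
    def __init__(self, name):
        self.name = name

    def describe(self):
        return 'This star is called %s' % (self.name)

sun = Star('the Sun')

# The usual way: Python passes `sun` as `self` for us
print(sun.describe())

# The explicit equivalent: call the function on the class and pass the instance
print(Star.describe(sun))
```

Both calls do the same thing. You would almost never write the second form, but it shows where the `self` argument comes from.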

## 3.10 Collections: lists and tuples

User-defined classes aren't very common. Collections, on the other hand, are extremely important. Collections are objects that group other objects together. There are three main types of collection, `list`, `tuple` and `dict`. Manipulating these three basic collections is fundamental to good programming in Python.

Lists and tuples have a lot in common. Lists are enclosed in [] and tuples in ():

In [None]:
my_list = [100,200,300,400,500,600]

In [None]:
my_tuple = (100,200,300,400,500,600)

We'll look at the differences later. You'll find you work with lists more often than tuples.

### 3.10.1 Lists

Lists are ordered collections. Each element in the list has an index: the first element has index `0`, the second element has index `1` and so on (like C).

In [None]:
my_list[0]

In [None]:
my_list[2]

In [None]:
my_list[3]

In [None]:
len(my_list)

Lists can be 'sliced' to return ranges of elements using the notation `x[start:stop:step]`, where `x` is a list and `start` is the first element you want to return, `stop` is the element **after the last one you want to return** and `step` allows you to return either every element (`step=1`), every 2nd element (`step=2`), every third (`step=3`) etc. Negative values of step mean the same thing, but go in the opposite direction through the list.

All three parts of the slice syntax are optional, so `x[:]` and `x[::]` are equivalent: both return a copy of the whole list, with the same elements as `x`.

The fact that the element corresponding to the index 'stop' **is not included** in the sliced list is one of the most confusing things for Python beginners.

***Understanding the following behaviour is very important*** for using Python to work with scientific data, in particular because the same ideas apply to the multidimensional numerical arrays that we'll look at later in `numpy`.

In [None]:
my_list[0:3] # elements 0, 1 and 2, NOT INCLUDING element 3!

In [None]:
my_list[1:] # This is a 'slice' of the list from the 2nd element to the last

In [None]:
my_list[-1] # Negative indices count backwards through the list

In [None]:
my_list[1:-1] # This is also a 'slice', here from the 2nd to the 2nd-last

In [None]:
my_list[:-1]

In [None]:
my_list[0:-1:2]

In [None]:
my_list[:-1:2]

In [None]:
my_list[::2]

In [None]:
my_list[1::2]

In [None]:
my_list[::-1]

In [None]:
my_list.reverse()
print(my_list)
my_list.reverse()
print(my_list)

In [None]:
empty_list_a = []
empty_list_b = list()

In [None]:
my_list + [400] # Returns a new list
my_list # Original list wasn't changed

In [None]:
my_list.append(4) # Changes the list 'in place'
my_list

Lists are 'mutable' (we can change the values of their elements)

In [None]:
my_list[1] = 100
my_list

Lists have lots of methods and functions to operate on them. We'll see more examples as we go on.

Functions can take lists as arguments and return lists:

In [None]:
def list_processing(x):
    """
    Function that expects a list as an argument.
    """
    new_list = list()
    if len(x) > 0:
        for i,j in zip(x[1::],x[::-1]):
            new_list.append(i)
            new_list.append(j)
    return new_list

a = [1,2,3,4,5]
print(list_processing(a))
print(a)

def add_to_x(x):
    x = x+1
    return x

What happens if you change `new_list = list()` to `new_list = x` in the example above? 

It's a good idea to read [this](http://docs.python-guide.org/en/latest/writing/gotchas/) advice about using lists as default arguments in functions.
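
In short, the trap is that a default value is created once, when the function is defined, and not each time the function is called. A minimal sketch of the problem and the usual workaround (the function names here are made up for illustration):

```python
def append_shared(item, target=[]):
    # The default list is created once, when the function is defined,
    # and is then shared between all calls that don't pass `target`.
    target.append(item)
    return target

def append_fresh(item, target=None):
    # Using None as the default and creating the list inside the function
    # gives each call its own new list.
    if target is None:
        target = []
    target.append(item)
    return target

print(append_shared(1))
print(append_shared(2))  # the 1 from the previous call is still there
print(append_fresh(1))
print(append_fresh(2))
```

This is also the reason the earlier `another_function` example used `beta=None` as its default.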

### 3.10.2 Tuples

Tuples are very similar to lists. The most important difference is that they are **not** mutable. In other words, you can't change the elements after you've made the tuple.

In [None]:
my_tuple[1] = 100

In [None]:
my_tuple + (4,) # Returns a new tuple

Tuples have no 'append' function, because they're immutable.

In [None]:
1, 2, 3, 4 # This makes a tuple

In [None]:
my_function = lambda x: x
['this', 1, None, True, my_function, 'is fine']

The `in` operator checks if an item is in a list or tuple:

In [None]:
'a' in ['a','b','c']

You'll most often see tuples returned by functions that return several 'packed' arguments, as in the next example.

### 3.10.3 Packing and unpacking tuples

In [None]:
def three_powers(x):
    return x**2, x**3, x**4

y = three_powers(4)
print('y is a tuple: %s'%(str(y)))

Why do we have `str(y)` and not just `y` in the arguments to the string format there?

If we want, we can 'unpack' the tuple into individual variables:

In [None]:
alpha, beta, gamma = three_powers(4)
print(alpha)
print(beta)
print(gamma)

In [None]:
# So the tuple y is equivalent to y[0], y[1], y[2]. 
# Now it's clear why we needed to give `str(y)` to the print statement
# when we only had one `%s`. The other way is:
print('y is a tuple, the elements of which are: %s %s %s'%(y))

In [None]:
# So this is also a tuple:
1,2,3

### 3.10.4 Loops

Loops build on lists and tuples, both of which are examples of 'iterables' (classes that implement the special functions required for operating on each element of a sequence).

In [None]:
for x in [1,2,3,4]:
    print(x)

In [None]:
x = 0
i = 0
while i < 10:
    x = x + i
    i = i + 1
print(x)

I almost never use `while` loops. The same idea as the while loop above is more often expressed like this:

In [None]:
x = 0
for i in range(0,10):
    x = x+i
print(x)

`range` is a useful function that (in Python 2) returns a list with length (and step size) determined by its arguments:

In [None]:
range(0,20,2)

The indented `for` loop above is the most general way to write loops in Python, but in simple cases they can be written in a more compact way using 'list comprehensions', which basically means a for-loop *inside* the syntax for creating a list, like this:

In [None]:
[i**2 for i in range(0,10)]

So the example above can be written

In [None]:
x=0 ; print(sum([x+i for i in range(0,10)]))

`sum` is obviously a function that takes a list as an argument and adds up all the entries. List comprehensions can be used to do sophisticated things.
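
As a small taste of what list comprehensions can do, the sketch below filters, transforms and nests loops. The magnitude values are invented for the example.

```python
magnitudes = [18.2, 21.5, 19.9, 23.1, 20.4]

# Keep only some elements by adding an `if` at the end
bright = [m for m in magnitudes if m < 20.5]
print(bright)

# Transform and filter at the same time
labels = ['mag %.1f' % (m) for m in magnitudes if m < 20.5]
print(labels)

# Loops can be nested: all pairs (i, j) with j < i
pairs = [(i, j) for i in range(4) for j in range(i)]
print(pairs)
```

If a comprehension gets too long to read at a glance, an ordinary indented `for` loop is usually the better choice.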

For things like loops, it's a waste to have to store a huge list like this in memory, so more often you'll see `xrange`:

In [None]:
xrange(0,20,2)

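`xrange` returns an object that behaves like a so-called 'generator' (strictly it is a lazy sequence rather than a true generator, but the distinction rarely matters). It has a method `__iter__` that returns an iterator, which yields each term in the sequence on successive calls rather than computing all the terms immediately. When the iterator runs out of items it raises a `StopIteration` exception.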

In [None]:
my_sequence = xrange(0,12,2)
v = my_sequence.__iter__()

print(v.next())
print(v.next())
print(v.next())
print(v.next())
print(v.next())
print(v.next())
print(v.next())

Generators are necessary for iterating over sequences of infinite or unknown length -- we'll see this again when we look at reading from files.

Unless you're writing your own generators, you don't need to worry about how they work -- they work exactly like lists in controlling loops, which is where they appear most often. 

Just remember that `xrange` works like `range` and is better for loop counters because it doesn't store the whole sequence in memory.
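
If you are curious what writing your own generator looks like, here is a minimal sketch using the `yield` keyword. The function `even_numbers` is hypothetical, and the built-in `next()` is used instead of the `.next()` method because it works in both Python 2.6+ and Python 3.

```python
def even_numbers(stop):
    """Yield 0, 2, 4, ... up to (but not including) stop."""
    n = 0
    while n < stop:
        yield n   # hand back one value, then pause here until asked again
        n += 2

evens = even_numbers(6)
print(next(evens))
print(next(evens))
print(next(evens))

# More usually, a generator just controls a loop
for n in even_numbers(10):
    print(n)
```

Nothing is computed until a value is requested, which is why this style also works for sequences of unknown length.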

To get out of a loop part way through, use `break`.

In [None]:
for i in xrange(0,10):
    if i > 5: break
    print(i)

To go straight to the next iteration without breaking the loop, use `continue`.

In [None]:
for i in xrange(0,10):
    print(i)
    if i < 5 or i > 8:
        continue
    print('Keep going!')
    

So far we've been iterating over lists of numbers, but strings are also iterable, because they are collections of characters.

In [None]:
'abcdefg'[2:5]

Very often you'll want to loop over something (call it `x`) and count the steps in the loop at the same time (call the step numbers `i`). You could make a counter variable for `i` and explicitly add `+1` in each iteration of the loop, but Python has a function `enumerate` to make things slightly neater:

In [None]:
for i,x in enumerate('carbon dioxide'):
    print('%3d : %s'%(i,x))

If you want to loop over two (or more) things at the same time, use `zip()`. For example, `enumerate` is equivalent to this:

In [None]:
my_string = 'carbon dioxide'
for i,x in zip(xrange(0,len(my_string)), my_string):
    print('%3d : %s'%(i,x))

What happens when the arguments to `zip` have different lengths?

Occasionally you want strings to be treated like scalar variables while still treating lists as lists. In this case the simplest thing to do is check the type explicitly. For example:

In [None]:
# This is the obvious way to do it, but it doesn't work in all cases:
def add_together_strings(s):
    """
    Arguments:
        s: a list of strings
    
    Returns:
        A single string with 'and' between each element of s
    """
    return '%s'%(' and '.join(s))

print(add_together_strings(['carbon','nitrogen','oxygen']))
print(add_together_strings('carbon')) # Doesn't work as we want!

# Fix it:
def add_together_strings(s):
    """
    Arguments:
        s: a list of strings
    
    Returns:
        A single string with 'and' between each element of s
    """
    if isinstance(s,str):
        return s
    else:
        return '%s'%(' and '.join(s))

print(add_together_strings('carbon'))
print(add_together_strings(['carbon','nitrogen','oxygen']))
            

-----
*Further reading:*

- [StackOverflow question on iteration in Python](http://stackoverflow.com/questions/9884132/what-exactly-are-pythons-iterator-iterable-and-iteration-protocols)
- [generator expressions](https://wiki.python.org/moin/Generators)
- [how generators work and the yield keyword (more advanced stuff)](http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do)
- the [itertools](https://pymotw.com/2/itertools/) standard library module
-----

## 3.11 Dictionaries (`dicts`)

'dict' is short for 'dictionary'. These are *unordered* collections of labels ('keys') with associated data ('values'), which other languages often call 'hashes'. There are two basic ways to create them:

In [None]:
# since dict() is a function, the arguments have to be written like this:
morphologies = dict(ngc4849 = 'spiral', ngc550 = 'elliptical', ngc4994 = 'spiral', ngc2337 = 'irregular', ngc9999 = None)

In [None]:
# The {} syntax is an alternative:
morphologies = {'ngc4849': 'spiral', 'ngc550': 'elliptical', 'ngc4994' : 'spiral', 'ngc2337' : 'irregular', 'ngc9999' : None}

In [None]:
# Long expressions on single lines like this can be hard to read. Here's how I usually write them:
morphologies = {'ngc4849' : 'spiral',
                'ngc550'  : 'elliptical', 
                'ngc4994' : 'spiral', 
                'ngc2337' : 'irregular', 
                'ngc9999' : None}

Notice:

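- in the first (`dict()`) form, the keys have to be strings that are also valid Python names (they are written as keyword arguments). In the `{}` form they don't, but they do have to satisfy some requirements (they must be 'hashable', which in practice usually means immutable), otherwise you will get an error about 'unhashable types'. The keys of dictionaries are usually strings.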

- the values can be any mix of types, functions, collections, classes, whatever you like. Nested `dict`s are very common. 
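
To illustrate the restriction on keys, the sketch below uses a tuple as a key (fine) and then a list (an error). The dictionary name and the approximate coordinates are only for illustration.

```python
# Tuples are immutable, so they can be used as keys
names_by_position = {(10.68, 41.27): 'M31', (23.46, 30.66): 'M33'}
print(names_by_position[(10.68, 41.27)])

# Lists are mutable, so they can't
try:
    broken = {[10.68, 41.27]: 'M31'}
except TypeError as error:
    message = str(error)
    print('TypeError: %s' % (message))
```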

We can look up the value associated with a particular key like this:

In [None]:
morphologies['ngc4849'] # get the value for a given key
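
Looking up a key that is not in the dictionary raises a `KeyError`. A short sketch of the usual ways to deal with that, using a small stand-in dictionary (`galaxy_types` and the key `'ngc1234'` are made up for the example):

```python
galaxy_types = {'ngc4849': 'spiral', 'ngc550': 'elliptical'}

# Test for a key before using it
print('ngc550' in galaxy_types)
print('ngc1234' in galaxy_types)

# get() returns a default (None unless you say otherwise) instead of raising
print(galaxy_types.get('ngc1234'))
print(galaxy_types.get('ngc1234', 'unknown'))

# Indexing with a missing key raises KeyError
try:
    galaxy_types['ngc1234']
except KeyError:
    print('ngc1234 is not in the dictionary')
```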

Both the keys and values can be returned as lists. Compare the order to the order in the expression that creates the dictionary above.

In [None]:
print(morphologies.keys())
print(morphologies.values())

The `iteritems()` method of a `dict` returns a generator that yields key-value pairs, saving you a `zip()`.

In [None]:
for k,v in morphologies.iteritems():
    print("%s is %s"%(k,v))

I use `dict`s all the time. Being able to make a simple structure with named 'properties' using a `dict` is probably one reason why it's not common to see simple classes written to represent 'quick and dirty' bundles of data in Python, despite the 'object oriented' nature of the language. This is particularly true for the common case of returning structured results from functions. For example:

In [None]:
def my_complicated_function(x):
    """
    This function is obviously over-complicated for what it does.
    
    A realistic version might be, for example, reading 
    the header from a data file.
    """
    results                  = dict()
    results['powers']        = dict()
    results['odd_multiples'] = dict()
    
    results['powers']['square'] = x**2
    results['powers']['cube']   = x**3
    
    for m in xrange(1,12,2):
        results['odd_multiples'][m] = x*m

    return results

answer_three = my_complicated_function(3)
answer_four  = my_complicated_function(4)

print(answer_three)
print(answer_four)
print(answer_four['powers']['cube'])

The above is, in a lot of simple cases, just as good as or better than a similar idea implemented with classes:

In [None]:
# This is the most generic class, we only give it a name.
class MyComplicatedFunctionResults(object):
    pass

class Powers(object):
    pass

class OddMultiples(dict):
    pass

def my_complicated_function(x):
    """
    """
    results               = MyComplicatedFunctionResults()
    results.powers        = Powers()
    results.powers.square = x**2
    results.powers.cube   = x**3
    
    results.odd_multiples = OddMultiples()
    for m in xrange(1,12,2):
        results.odd_multiples[m] = x*m

    return results

answer_three = my_complicated_function(3)
answer_four  = my_complicated_function(4)

print(answer_three)
print(answer_four)
print(answer_four.powers.cube)

Note the results of a simple print are less informative (we would need to write a [`__str__`](https://docs.python.org/2/reference/datamodel.html#object.__str__) function for the class) and the most natural way to express what's happening with `results.odd_multiples` is to use a dict anyway (here expressed by a class that inherits from `dict`, which is needlessly complicated).
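
For reference, a `__str__` method might look something like the sketch below. This is a standalone, simplified variant of the `Powers` class (here it takes `x` in its constructor, which the version above does not), and the output format is just one possible choice.

```python
class Powers(object):
    def __init__(self, x):
        self.square = x**2
        self.cube   = x**3

    def __str__(self):
        # Called by print() and str() to get a readable description
        return 'Powers(square=%s, cube=%s)' % (self.square, self.cube)

print(Powers(3))
```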

However, for lots of more complicated cases, creating classes with methods will be much more useful. The point is that `dict` can be used in a lot of cases where other languages would force you to write your own class (or use something like C's `struct`).

It's good to know about some fundamental uses of dicts in Python. To start with, most objects have an internal dictionary called `__dict__`. You're not supposed to access this directly, but you can do if you want, and sometimes it's useful. The existence of easy access to the internal representation of objects like this is the basis (for example) of things like tab-completion in IPython.

In [None]:
class MyClass(object):
    """
    The docstring of the class
    """
    def __init__(self):
        self.alpha = 1
        self.beta  = 2
    def summation(self):
        """
        A docstring for the summation function
        """
        return self.alpha + self.beta
    
x = MyClass()

print("The object's dictionary:") 
print(x.__dict__)
print('')
print("The class's dictionary:")
print(x.__class__.__dict__)
print('')
print("The class's docstring:")
print(x.__class__.__dict__['__doc__'])

*Further reading:*

- the internal dictionary `__dict__`
- `collections.OrderedDict`
- the other items in the `collections` standard library module
- the `struct` module
- named tuples

# 4. Program structure

The above is a summary of the basic elements of a Python program. Now we'll look at the structure of complete programs.

In [None]:
# __future__ imports must come before any other statement in the file
from __future__ import division

# Import modules
import sys
import math

# Import specific functions from modules
from os import getcwd

# Define some 'global' variables
Y = 10.0
Z = 'The Letter Z'

def complicated_function(x,*args,**kwargs):
    """
    This docstring should explain the function, 
    but I'm too lazy to write it.
    """
    Z = 100.0
    r = math.log(x*Y/Z)
    return r

## 4.1 Modules, packages and namespaces

Modules are sets of variables, classes and functions that can be included in other Python scripts with the `import` statement. Packages are collections of related modules. We've already seen several examples, like `os`, `sys` and `math`, which are part of the Python standard library.

Modules define a **namespace**, so you can use the name of the module followed by a `.` to distinguish functions and variables defined in the scope of the module from variables (possibly with the same name) defined in other scopes. For example:

In [None]:
pi = 3.0

import math
print('In the local namespace, pi=%f'%(pi))
print('In the math module namespace, pi=%f'%(math.pi))
print('pi == math.pi? %s'%(pi==math.pi))

You can choose to import specific items from a module into the local namespace using `from x import y`:

In [None]:
from math import log10
log10 == math.log10
log10(2.0)

This has a very small speed advantage when the function is called (Python skips the attribute lookup on the module). It does not make the import itself faster, because the whole module is still loaded.

You can also rename modules and things you import from modules. This is useful if the module has a long name, or you want to remember which module a particular function comes from or avoid conflicts.

In [None]:
import math as m
m.log10(2)

sqrt = lambda x: 'S.Q.R.T.: %s'%(x)

from math import sqrt as math_sqrt

print(sqrt(2))
print(math_sqrt(2))


You *can* also do this, but usually **you should not do this**:

In [None]:
from math import *  # Imports _everything_ from the math module into the local namespace.
sqrt(2)

Making effective use of namespaces is one reason to avoid using `from module_x import *` (unless you have a very good reason). Keeping the module name makes it much easier to see where functions are coming from, and it avoids accidentally re-assigning (or unknowingly defining) local variables with whatever happens to be in the module you're importing.

In [None]:
pi = 3.0

from math import *
print('In the local namespace, pi=%f'%(pi))
print('In the math module namespace, pi=%f'%(math.pi))
print('pi == math.pi? %s'%(pi==math.pi))


Packages can organize related modules. For example, the `os` package has the module `path`:

In [None]:
import os
print(os.path.join('hello','world'))

import os.path as op
print(op.join('hello','world'))

To make a package, you just need to put a `.py` file with some code in it (the module) in a directory with the name you want for the package, and **write an empty file called `__init__.py` in that directory**.

In [None]:
# Write the module using the shell
!mkdir mymodule
!rm mymodule/*.py*
!echo
!echo 'def myfunc():\n    print("This is a function in my module!")' >> mymodule/myroutines.py
!cat mymodule/myroutines.py
!touch mymodule/__init__.py
!echo
!tree ./mymodule
!echo

# Now some python:
import mymodule.myroutines as my
my.myfunc()

You can read about what the `__init__.py` file does, and more about how to make packages [here](https://docs.python.org/3/tutorial/modules.html#packages).
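
The same steps can be done without the shell, which may be handy if commands like `tree` are not available. The sketch below builds a throwaway package in a temporary directory; the names `demo_package`, `demo_routines` and `demo_func` are invented for the example.

```python
import os
import sys
import tempfile

# Make a scratch directory to hold the package (so we don't litter the cwd)
base_dir = tempfile.mkdtemp()
package_dir = os.path.join(base_dir, 'demo_package')
os.mkdir(package_dir)

# An empty __init__.py marks the directory as a package
open(os.path.join(package_dir, '__init__.py'), 'w').close()

# The module itself
with open(os.path.join(package_dir, 'demo_routines.py'), 'w') as f:
    f.write('def demo_func():\n')
    f.write('    return "This is a function in my module!"\n')

# Tell Python where to look, then import as usual
sys.path.insert(0, base_dir)
import demo_package.demo_routines as demo
print(demo.demo_func())
```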

Python has a list of directories that it searches (in order) for modules when you `import something`. This list is stored as `sys.path`:

In [None]:
sys.path

It's not always easy to work out where all these entries come from, but that doesn't matter. What matters is being able to tell python where to find packages that you have installed (which are usually grouped in a small number of places) and packages/scripts you have written yourself (which could be anywhere).

The environment variable `PYTHONPATH` works in the same way as the shell's `PATH`, but for Python files. Like `PATH`, it's supposed to be set by the user, so by default it's empty. Any directory in `PYTHONPATH` (entries are separated by '`:`' on Linux and macOS, '`;`' on Windows) will end up in `sys.path`, with higher priority than all the Python default search paths.

To make good use of this, it's a good idea to keep your own modules (i.e. code that you want to include in other projects with `import`) under a directory like `~/projects/code/python`. For example, you might have two modules like  `~/projects/code/python/plot_tools` and `~/projects/code/python/astro_routines`. If you set `PYTHONPATH` (for example, in `~/.profile`) with something like
```bash
export PYTHONPATH=~/projects/code/python
```

then you will be able to
```python
import plot_tools
import astro_routines
```

in any Python script (provided those directories are marked as Python modules by including an `__init__.py` file).
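
If an import fails unexpectedly, it can help to check what Python actually received from the environment. A small sketch, assuming nothing about how your `PYTHONPATH` is set:

```python
import os
import sys

# os.pathsep is the entry separator for this platform
raw_value = os.environ.get('PYTHONPATH', '')
entries = [entry for entry in raw_value.split(os.pathsep) if entry]

if not entries:
    print('PYTHONPATH is empty or not set')

# Compare absolute paths so that relative entries are matched too
search_path = [os.path.abspath(p) for p in sys.path]
for entry in entries:
    on_path = os.path.abspath(entry) in search_path
    print('%s (on sys.path: %s)' % (entry, on_path))
```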

Tools like `pip` and `conda` have their own way of getting the places where they install files into `sys.path`.

The directory containing the script you run (or the current working directory, in an interactive session) is always the first place Python will look; it is inserted at the start of `sys.path`.

`sys.path` is a list that is made when your script runs, so if all else fails (or you just want a quick hack) you can modify it directly.

In [None]:
sys.path.append('/my/custom/path')
sys.path

## 4.2 Scope

The value associated with a variable name at any specific point in the code depends on where that name was last defined -- in jargon, it depends on the scope. The 'main' (outer) block of each module/script defines a scope for that module, and any functions and classes define a new scope.

In [None]:
y = 1
x = 10 # This 'x' ...
def a_function(z):
    x = 1000  # ...is in a different scope to this x 
    return z + x
print(x)

The most confusing cases occur within class definitions, lambdas and nested functions, otherwise it's straightforward. There is a discussion of the general rules of Python scope in [this StackOverflow post](http://stackoverflow.com/questions/291978/short-description-of-scoping-rules). The reason to know about this is that there are some traps:

In [None]:
# Before you run this, think about what you expect it to print
x = 100
y = 0
for x in range(0,10):
    y = y + x
print(y)
print(x)

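Watch out for this: a `for` loop does not define a new scope, so the loop variable overwrites the `x` defined earlier. The result is the same in Python 3. (What did change in Python 3 is that the loop variable of a list comprehension no longer leaks into the enclosing scope.)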
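
Lambdas, mentioned above as one of the confusing cases, have a well-known trap of their own: a name used inside a lambda is looked up when the lambda is called, not when it is defined. A minimal sketch (the names `multipliers` and `fixed` are made up for the example):

```python
# Each lambda looks up `i` when it is *called*, not when it is defined,
# so all three see the final value of i (which is 2).
multipliers = [lambda x: x * i for i in range(3)]
print([f(10) for f in multipliers])

# Binding the current value as a default argument avoids the surprise
fixed = [lambda x, i=i: x * i for i in range(3)]
print([f(10) for f in fixed])
```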

If a variable is used that isn't defined in the current scope, Python will look for it in a higher-level scope:

In [None]:
x = 10
def a_function(z):
    return z+x
print(a_function(10))
print(x)

In [None]:
x = list()
def a_new_function(z):
    return x.append(z)

a_new_function(10)
print(x)
a_new_function(20)
print(x)
a_new_function(30)
print(x)

You can explicitly force variables to be global like this:

In [None]:
x = 10
def a_function(z):
    global x
    return z+x

print(a_function(100))
print(x)
    

What happens if you change the function argument from `z` to `x` in the `def` statement in the example above?
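
`global` only really matters when a function *assigns* to a name, because reading a name from an outer scope works anyway. The sketch below (the function names are made up for illustration) shows what happens with and without it.

```python
counter = 0

def increment_broken():
    # Assigning to 'counter' makes it local to this function, so the
    # read on the right-hand side fails before the assignment happens
    counter = counter + 1

def increment_global():
    global counter  # now the assignment targets the module-level name
    counter = counter + 1

try:
    increment_broken()
    outcome = 'no error'
except UnboundLocalError:
    outcome = 'UnboundLocalError'

increment_global()
print(outcome, counter)
```

In practice it's usually cleaner to pass values in as arguments and return the results than to modify globals from inside functions.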

`import` statements can go anywhere in python files, but they're usually grouped together at the top. The scope of names of imported modules/functions/variable names depends on where the import statement appears.

In [None]:
import math as m

def matrix_loop():
    for n in range(0,5):
        row = ''
        for m in range(0,5): # this 'm' is local to matrix_loop
            row = row + '%2d '%(m)
        print(row)
    print('Inside matrix_loop, m is: %s'%(type(m)))
    return

matrix_loop()
print('In the toplevel scope, m is: %s'%(type(m)))


## 4.3 `globals()` and `locals()`

These are builtin functions that are occasionally useful. They return dictionaries, the keys of which are the names of all the variables defined in the global (top level) and local scope, respectively.

In [None]:
x = 5 
print(globals()['x'])
print(locals()['x'])

def a_function(x):
    print('Inside a_function: %d and %d'%(globals()['x'],locals()['x']))
    
a_function(42)
print(globals()['x'])
print(locals()['x'])

Sometimes I've used this as a way to call particular functions chosen at runtime -- for example, depending on some input from the user, or a command line parameter. You could do the same thing with `if` statements, as long as you know the name of each function when you write the program...

In [None]:
def function_a(): print('Function A!')
def function_b(): print('Function B!')
def function_c(): print('Function C!')
    
def pick_a_function(letter):
    function = 'function_%s'%(letter.lower())
    if function in globals():
        globals()[function]() # Note the brackets here
    else:
        print('No such function: %s'%(letter))

for x in 'abcde':
    pick_a_function(x)
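
If you'd rather not look things up in `globals()`, a plain dictionary does the same job and makes it explicit which functions can be called by name. This is only a sketch of one alternative; the functions here return their message instead of printing it so that the results are easy to collect.

```python
def function_a():
    return 'Function A!'

def function_b():
    return 'Function B!'

# Only functions listed here can be called by name
dispatch = {'a': function_a, 'b': function_b}

def pick_a_function(letter):
    function = dispatch.get(letter.lower())
    if function is None:
        return 'No such function: %s' % (letter)
    return function()  # note the brackets: this calls the function

results = [pick_a_function(x) for x in 'abc']
print(results)
```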

## 4.4 Memory, copies and references to variables

In languages like C, you have to know where every bit of data actually 'lives' in memory. In those languages there is a very clear difference between the **value** of a variable and a **reference** (or pointer) to the location in memory where that value is stored. That difference is mostly hidden from you in Python. This is only really important when working with big arrays of data (this will be covered next week), but it might help to know some basics in standard Python first.

The `id()` built-in function gives you a number that identifies the object a Python variable refers to. In the standard interpreter (CPython) this happens to be the object's memory address, but you should treat it as an arbitrary label: all that matters is whether two ids are the same.

In [None]:
a = 1
b = 1
print(a,b)
print('ID of a: %d'%id(a))
print('ID of b: %d'%id(b))

In [None]:
b = b + 1
print(a,b)
print('ID of a: %d'%id(a))
print('ID of b: %d'%id(b))

The distinction between references and copies is most relevant to lists and similar objects.

In [None]:
a = [1,2,3,4]
b = a
b[2] = 'Something else!'
print(a)
print(b)

In some cases, including `list`s, it's possible (but in principle a waste of time and memory) to make a copy by creating a new object initialized with the values of the old one:

In [None]:
a = [1,2,3,4]
b = list(a)
b[2] = 'Something else!'
print(a)
print(b)

The most generally safe way to copy values, rather than assign multiple references to the same thing, is using the `copy` function in the `copy` standard library module.

In [None]:
import copy
a = [1,2,3,4]
b = copy.copy(a)
b[2] = 'Something else!'
print(a)
print(b)
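
One caveat: `copy.copy` makes a *shallow* copy, so anything nested inside the list is still shared between the original and the copy. `copy.deepcopy` copies the nested objects as well. A small illustration:

```python
import copy

a = [1, 2, ['nested', 'list']]
shallow = copy.copy(a)
deep = copy.deepcopy(a)

# The outer lists are separate objects in both cases
shallow[0] = 'changed'

# The inner list is shared with the shallow copy but not the deep copy
a[2].append('extra')

print(a)
print(shallow)
print(deep)
```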

In C or Fortran, you also have to specifically create and delete (allocate and deallocate) blocks in memory to store the variables you want. In Python, this is managed invisibly in the background by part of the interpreter called the 'garbage collector' (GC). This keeps track of all the allocated memory and periodically frees any that it's sure is not going to be used again. More-or-less, it checks that there are no references to the variable in scope (i.e. the memory used by the variables defined inside a function is available to be freed once the function has returned).

You almost never need to care about the GC, although there are ways to interact with it. The only one you're likely to see is the `del` statement (often written `del(x)`, although it isn't really a function), which explicitly deletes a reference to the variable (i.e. marks it as out of scope after that point, rather than, say, at the end of the block of code that defines the variable).

This has the effect of helping the GC to clear up the memory associated with the deleted variable **as soon as possible**, rather than at some unspecified time in the future. `del` doesn't actually free any of the memory itself. If there are multiple references to the same bit of memory, `del`'ing only one of them will have no effect.

This is sometimes a useful optimization when you're doing something that takes a huge chunk of memory on a machine that doesn't have much. For example, loading huge images in three different bands and processing each one in sequence in the same block of code. There is no point using `del` except for such cases, because the GC is usually reliable enough at keeping the memory usage of your code down.

In [None]:
x = range(0,20)
print('x is:',x)
del(x)
print('y is:', y)
print('x is:', y,x)

In [None]:
x = range(0,20)
y = x # y and x now refer to the same value
print("""I can be sure y and x refer to the same thing, 
because their Python object ids are the same: 
%d and %d.
"""%(id(y),id(x)))

del(x)
print("y is still defined after x is deleted:")
print(y) # y is still there, so the memory is not going to be freed yet
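
You can watch this happen with the standard library `weakref` module, which lets you refer to an object without keeping it alive. This is a sketch that assumes the standard CPython interpreter, where an object is freed as soon as its last reference disappears; `BigData` is just a made-up placeholder class.

```python
import gc
import weakref

class BigData(object):
    """Stand-in for something that uses a lot of memory."""
    pass

x = BigData()
y = x                     # second reference to the same object
watcher = weakref.ref(x)  # a weak reference does not keep the object alive

del x
alive_after_first_del = watcher() is not None   # y still refers to it

del y
gc.collect()  # not needed in CPython here, but harmless
alive_after_second_del = watcher() is not None  # nothing refers to it now

print(alive_after_first_del, alive_after_second_del)
```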

Note that `del(x)` and `x = None` are not the same thing. Setting a variable to `None` is not the same as deleting it.

In [None]:
x = None
id(x)

----
*Further reading:*
- the `gc` module and the `gc.collect()` function

# 5. Getting things done with Python

## 5.1 Exceptions and debugging

Above we saw errors result in things like this:

In [None]:
print(1/0)

This `ZeroDivisionError` is an example of an `Exception`. Rather than crashing the program, exceptions can be 'trapped' and handled as a special case using a `try...except` block. You probably won't need to do this very often unless you're writing code that will be published and used by others, but it's worth knowing that it exists. 

In [None]:
x = 1
y = 0

try:
    answer = x/y
except ZeroDivisionError as e:
    print('Python complained about %s, but keep going...'%(e.args[0]))
    answer = 0.0
    
print(answer)

This is useful when working with files and connections to databases, which we'll look at next week.
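
Files and connections are things you normally want to tidy up whether or not something went wrong, which is what the optional `else` and `finally` clauses are for. Here's a small sketch (the function `safe_divide` is made up for illustration) that records the order in which the clauses run:

```python
def safe_divide(x, y):
    log = []
    try:
        answer = x / float(y)  # float() so Python 2 and 3 behave the same
    except ZeroDivisionError:
        log.append('except')   # runs only if the division failed
        answer = 0.0
    else:
        log.append('else')     # runs only if no exception was raised
    finally:
        log.append('finally')  # always runs, so clean-up code goes here
    return answer, log

good = safe_divide(1, 2)
bad = safe_divide(1, 0)
print(good)
print(bad)
```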

IPython has a built-in debugger. After an exception is triggered, you can type %debug at the next prompt (don't do anything else in between!) and you will be taken to a command prompt called `ipdb`. This is more limited than the regular ipython prompt, but it has some special, one letter commands that allow you to inspect the state of the whole program frozen at the point of failure. 

If you've never used an interactive debugger before, the learning curve for `ipdb` will be steep, but the investment will be worth it, especially if you write a lot of Python code. Many people figure out why programs crash by writing hundreds of print statements everywhere -- the debugger is usually a much more efficient way. 

If you want to try the debugger now, execute this series of commands in a separate IPython session, then type %debug when the exception occurs.

```python
for i,x in enumerate(range(10,0,-1)):
    y = 1.0/(x-2) # fails with a ZeroDivisionError when x reaches 2
```

- Type `p i` to see the value of `i`
- Type `help pdb` to get help using the debugger.
- Type `q` or `exit` to get out of the debugger.

----
*Further reading:*
- the `finally` statement
- the `assert` statement

## 5.2 File input and output

Writing simple text files in Python is easy:

In [None]:
f = open('myfile.txt','w') # To write to a file; f is the 'handle' of the open file
f.write('Line 1\n') # Note the explicit newline \n
f.write('Line 2\n')
f.write('Line 3\n')
f.close() # close the file

In [None]:
!cat myfile.txt

The '`w`' above is the 'mode' used to access the file. To read a file, use the 'r' mode:

In [None]:
f = open('myfile.txt','r') # To read from a file
for line in f.readlines():
    print(line)
f.close()

Why are there now gaps between each line?

In [None]:
f = open('myfile.txt','r')
for line in f.readlines():
    print(line.strip())
f.close()

The `with` syntax is an alternative way to access files that doesn't need the explicit `close` statement. It has more uses than just accessing files, but that's the most common.

In [None]:
with open('myfile.txt','r') as f:
    for line in f.readlines():
        print(line.strip())

How is `file.read()` different from `file.readline()`?

The `str.split()` function is the simplest way to chop up lines of ascii when reading tables.

In [None]:
data  = list()
with open('myfile.txt','r') as f:
    for line in f.readlines():
        data.append(float(line.split()[1]))

data

If you had a large complicated table, trying to read it this way would get very messy. There is a better way (`numpy.loadtxt`) which we'll look at below.

The above deals with ascii text files. This is fine for small amounts of human-readable data. You can also use the same method to store binary data, but that's too complicated to cover here.

The `pickle` module provides an easier way of 'saving' almost any python object in a binary format that python itself can read back in. Operations using `pickle` still require you to handle the opening and closing of the file.

In [None]:
my_list = [1, 'alpha', 2, 'beta', ['and','a','nested','list']]

import pickle

# Save the list (pickle files are binary, so use the 'wb' and 'rb' modes)
with open('my_pickled_list.pkl','wb') as f:
    pickle.dump(my_list,f)
    
# Load the list
with open('my_pickled_list.pkl','rb') as f:
    a_new_list = pickle.load(f)

a_new_list

In Python 2, `cPickle` is a much faster version of `pickle` for very large objects, but it has more limitations on what it can save (in Python 3 there is no separate `cPickle`; the plain `pickle` module uses the fast implementation automatically). Don't expect that data saved with `pickle` or `cPickle` can be read on another machine or with a different version of python -- this may or may not be the case. This is mainly intended for short-term storage.
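
For simple data (numbers, strings, lists and dicts) that needs to survive longer or be read by other programs, the standard library `json` module is a more portable option. A minimal sketch, writing to a temporary directory so nothing in your working directory is touched (the file name is arbitrary):

```python
import json
import os
import tempfile

my_list = [1, 'alpha', 2, 'beta', ['and', 'a', 'nested', 'list']]
path = os.path.join(tempfile.mkdtemp(), 'my_list.json')

# JSON is plain text, so the ordinary 'w' and 'r' modes are fine
with open(path, 'w') as f:
    json.dump(my_list, f)

with open(path, 'r') as f:
    a_new_list = json.load(f)

print(a_new_list == my_list)
```

The trade-off is that `json` can only store these basic types, whereas `pickle` can handle almost any Python object.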

In the next session we'll look at reading and writing two common file formats for scientific data, FITS and HDF5. We'll also look at how to read 'fortran formatted' binary files.

Numpy arrays can be saved with `numpy.savez` and read back with `numpy.load`. `numpy` also has routines for reading in ascii data tables. We'll look at both below.

----
*Further reading:*
- [configparser](https://pymotw.com/2/ConfigParser/)
- [json](https://docs.python.org/2/library/json.html)
- [yaml](https://en.wikipedia.org/wiki/YAML)

## 5.3 Working with the filesystem

In [None]:
import os
os.getcwd()

In [None]:
os.path.abspath('../../')

In [None]:
os.path.split(os.getcwd())

In [None]:
os.path.join('first_element','second_element','third_element')

In [None]:
if os.path.exists('./new_dir'): # check if directory exists
    os.rmdir('./new_dir') # if so, remove it
    
os.makedirs('./new_dir') # make a new directory

In [None]:
os.makedirs('./new_dir')

In [None]:
try:
    os.makedirs('./new_dir')
except OSError:
    print('Directory exists')

--- 
Further reading:
- next week we'll look at `subprocess`, which is a way to run external programs from inside python.
- we'll also look at `glob`, which is an extremely useful way to find all the files matching a particular pattern.

## 5.4 Working with numerical data using numpy

Python lists can hold any mix of data types and you don't have to specify how many elements they will hold when you create them -- they can keep growing automatically. However, the price of this flexibility is that they are slow and memory-inefficient for numerical work.

The numpy package is fundamental to serious scientific computing with Python. To quote directly from the [numpy website](http://www.numpy.org/), numpy adds:

> - a powerful N-dimensional array object
> - sophisticated (broadcasting) functions
> - tools for integrating C/C++ and Fortran code
> - useful linear algebra, Fourier transform, and random number capabilities

> Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

Numpy arrays allow vectorized operations on homogeneous chunks of memory of fixed size, representing many elements of data of a single uniform type (integers or floats or whatever). Many of the functions in numpy are written directly in C or Fortran and take advantage of industry-standard libraries like [`LAPACK`](https://en.wikipedia.org/wiki/LAPACK).

There's an official [numpy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html) -- it would be pointless to repeat all of that here. We'll look more at how to use `numpy` next week. For now, the following basics should be understandable with some experimentation.

'Vectorized' means that if you're using Python loops to operate on `numpy` arrays, you're *probably* doing it wrong.
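
To make that concrete, here's a small sketch that computes the same thing twice, once with a Python loop and once as a single array expression. The array size and the formula are arbitrary choices for the example.

```python
import numpy as np

x = np.arange(0, 1000, dtype=np.float64)

# Loop version: one Python-level operation per element
y_loop = np.zeros(len(x))
for i in range(len(x)):
    y_loop[i] = x[i]**2 + 1.0

# Vectorized version: the loop happens inside numpy's compiled code
y_vec = x**2 + 1.0

print(np.allclose(y_loop, y_vec))
```

Both give the same answer, but the vectorized version is shorter and, for large arrays, typically much faster (try timing the two with `%timeit`).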

Scipy is a library of more complex routines that build on numpy. We'll look at a few Scipy routines next week.

In [None]:
import numpy as np

### 5.4.1 Creating arrays

Numpy arrays can only hold data of a single type, called the `dtype` of the array.

In [None]:
x = np.zeros(10,dtype=int) # the alias np.int has been removed from recent numpy versions
y = np.zeros(10,dtype=np.float64)
print(x)
print(y)

In [None]:
x = np.empty(10,dtype=np.int32)
print(x)

In [None]:
x = np.ones(10)
y = np.repeat(2,10)
print(x)
print(x.dtype)
print(y)
print(y.dtype)

`arange`, `linspace` and `logspace` create 1-dimensional sequences.

In [None]:
x = np.arange(0,10)
print(x)
y = np.arange(100,200,10)
print(y)

In [None]:
x = np.linspace(100,200,10)
print(x)

In [None]:
y = np.logspace(-1,3,20)
print(y)

String `dtype`s have a fixed length. 

In [None]:
x = np.repeat('hello',10)
print(x)
print('The dtype of this string array is %s'%(x.dtype))
x[3] = 'longstring'
print(x)

In [None]:
x = np.array([1,2,3,4])
print(x)
print(x.dtype)

In [None]:
x = np.array([1,2,3,4],dtype=np.float32)
print(x)
print(x.dtype)

In [None]:
x = np.array([1,2,3.0,4,'hello'])
print(x)
print(x.dtype)

The most generic `dtype` is `object` (older numpy versions also accepted the alias `np.object`); this can represent anything (it stores references to objects rather than the objects themselves) but on the downside is useless for mathematical operations.

In [None]:
a = lambda y: y+1
b = lambda y: y+2

x = np.array([a,b])
print(x)
print(x.dtype)

y = np.array([1,2],dtype=object)
np.log10(y)

The shape of arrays is easy to find:

In [None]:
x = np.zeros((4,3))
print(x)
print(x.shape)

### 5.4.2 Slicing

In [None]:
x = np.ones((5,5))
x[1:3,1:3] = 0
print(x)
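
This is where the distinction between references and copies from section 4.4 comes back: a slice of a numpy array is a *view* onto the same memory, not a copy. A minimal sketch (the variable names are just for illustration):

```python
import numpy as np

x = np.ones((5, 5))
view = x[1:3, 1:3]                # a slice is a view onto the same memory
independent = x[1:3, 1:3].copy()  # .copy() makes a separate array

view[:] = 0         # this also changes x
independent[:] = 7  # this does not

print(x)
print(x.sum())
```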

### 5.4.3 Arithmetic

In [None]:
x = np.ones((4,4)) + 2
print(x)

In [None]:
x = np.ones((4,4))*4
print(x)

In [None]:
10*(-0.4*np.log10(x))

### 5.4.4 Saving numpy arrays to disk

In [None]:
x = np.ones((10,10))
np.savez('my_numpy_array.npz',x)

# This is more complicated than you might expect. Read the documentation for savez!
y = np.load('my_numpy_array.npz')
print(y.keys())
print(y['arr_0'])
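
The name `arr_0` appears because the array was passed to `savez` without a name. If you pass arrays as keyword arguments, those keywords become the names in the file. A sketch, writing to a temporary directory (the file and array names are arbitrary):

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'my_named_arrays.npz')

mass = np.array([100.0, 34.0, 204.0])
radius = np.array([30.0, 19.5, 18.4])

# Keyword arguments become the names stored in the file
np.savez(path, mass=mass, radius=radius)

loaded = np.load(path)
names = sorted(loaded.files)  # .files lists the stored array names
mass_back = loaded['mass']
loaded.close()

print(names)
print(mass_back)
```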

### 5.4.5 Random numbers

In [None]:
np.random.seed(42) # Sets the seed of the random number generator
np.random.random((3,4)) # Generates an array of random floats

There are other routines in `np.random` to draw randomly from non-uniform distributions.
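
For example, `np.random.normal` draws from a Gaussian and `np.random.poisson` from a Poisson distribution. The sketch below also shows that resetting the seed reproduces exactly the same numbers; the parameter values are arbitrary.

```python
import numpy as np

np.random.seed(42)
gaussian = np.random.normal(loc=50.0, scale=20.0, size=1000)
poisson_counts = np.random.poisson(lam=10.0, size=1000)

np.random.seed(42)  # resetting the seed reproduces the same numbers
gaussian_again = np.random.normal(loc=50.0, scale=20.0, size=1000)

print(gaussian.mean(), gaussian.std())
print(poisson_counts.mean())
print(np.array_equal(gaussian, gaussian_again))
```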

### 5.4.6 Histograms

Group values in an array into bins. Extremely useful.

In [None]:
x = np.random.normal(size=100) # What does this do?
counts, bins = np.histogram(x,bins=np.arange(-3,3,0.1))
print(counts)
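
One thing to be aware of is that the second value returned by `np.histogram` holds the bin *edges*, so it has one more element than the counts. If you want to plot the counts you usually need the bin centres instead. A sketch (the names `demo_counts` and `demo_edges` are made up to avoid overwriting the variables above):

```python
import numpy as np

np.random.seed(42)
values = np.random.normal(size=100)
demo_counts, demo_edges = np.histogram(values, bins=np.arange(-3, 3, 0.1))

# One more edge than there are bins
n_counts, n_edges = len(demo_counts), len(demo_edges)

# The centre of each bin is the midpoint of its two edges
centres = 0.5 * (demo_edges[:-1] + demo_edges[1:])

print(n_counts, n_edges, len(centres))
```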

### 5.4.7 Statistics, logic and `where`

In [None]:
counts.max(), counts.min()

In [None]:
np.median(counts), np.mean(counts), np.std(counts)

In [None]:
counts > 3

In [None]:
np.sum(counts >  3) # What's happening here?

What are the indices of the elements of `counts` that are greater than 3? To find these, use `where`. This is probably one of the most important functions in all of `numpy`.

In [None]:
w = np.where(counts > 3)
w

In [None]:
len(w) # This should be the same as np.sum(counts>3), right?

What's going on here? Let's try another way...

In [None]:
w.shape

It seems `np.where` is returning a *tuple* -- perhaps we didn't notice the ',)' on the end when we printed the value of `w` above (well done if you did...).

In [None]:
print(len(w[0]))
print(w[0])

This is a quirk of the `where` function. Remember it!

`w` can be used as an index on `counts` (we don't need to worry that it's a tuple for this purpose)

In [None]:
counts[w] # This still works fine 
counts[w[0]] # This is the same thing, for indexing purposes
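
The reason `where` returns a tuple becomes clearer with a 2D array: you get one array of indices per dimension. The sketch below uses small made-up arrays, and also shows that for simply selecting elements you can index with the boolean array directly, without `where`.

```python
import numpy as np

values = np.array([1, 5, 2, 7, 4])
mask = values > 3                # boolean array, same shape as values

# For a 1D array, where() returns a tuple holding one array of indices
w = np.where(mask)
selected = values[mask]          # boolean indexing gives the same elements

grid = np.array([[1, 5], [7, 2]])
rows, cols = np.where(grid > 3)  # for a 2D array: one index array per axis

print(w[0])
print(selected)
print(rows, cols)
```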

### 5.4.8 Reading data from a file with `np.loadtxt`

In [None]:
# Here's a typical ascii table (complete with poor formatting)
s = """# Header
# Mass Radius Flux
100.0, 30.0, 1e8
34.0, 19.5, 1e4
204.0, 18.4 , 3.4e5
40.0, 7.0, 5.4e6
"""

# Save it to a file
with open('my_table.dat','w') as f:
    f.write(s)
    
# We have to specify how the columns are separated
# (in this case, with a ',')

data = np.loadtxt('my_table.dat',delimiter=',')
print(data)

We can also 'unpack' the columns.

In [None]:
mass, radius, flux = np.loadtxt('my_table.dat',delimiter=',',unpack=True)
print(mass)
print(radius)
print(flux)

Read the documentation for `loadtxt`, in particular paying attention to the options:
- `usecols`
- `skiprows`
- `dtype`

How could you use `loadtxt` to read the following file?

In [None]:
# First execute this cell to write the file
s = """Header
Mass Radius Flux
A, 100.0, 30.0, 1e8
B, 34.0, 19.5, 1e4
C, 204.0, 18.4 , 3.4e5
D, 40.0, 7.0, 5.4e6
"""
with open('my_table_2.dat','w') as f:
    f.write(s)

In [None]:
# Try to read it here...

See also `numpy.genfromtxt`, which offers more advanced options.
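
One of those options is the handling of missing values. As a sketch, here is `numpy.genfromtxt` reading a small made-up table with an empty field, which it fills in with `nan` instead of giving up. The table is read from a string via `io.StringIO` rather than from a file, just to keep the example self-contained.

```python
import io
import numpy as np

# A small made-up table with one missing entry in the second row
text = u"""100.0, 30.0, 1e8
34.0,, 1e4
204.0, 18.4, 3.4e5
"""

table = np.genfromtxt(io.StringIO(text), delimiter=',')
print(table)
print(np.isnan(table[1, 1]))
```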

----
*Further reading*
- next week we'll cover more details of `numpy`, and the `astropy` package which contains even more useful ways to read and write data files.

## 5.5 Making plots with matplotlib

`matplotlib` is a comprehensive package for making plots with Python. 

`matplotlib` can seem complicated and sometimes frustrating, in part because it started life as an 'emulator' for the functionality of MATLAB, in part because it tries to do everything. 

Some people really don't like it and there are a number of new alternatives. This tutorial is based on `matplotlib` only because I don't use those alternatives, and because `matplotlib` is still overwhelmingly more common. 

Here are some other tutorials about `matplotlib`:

[anaconda.org](https://anaconda.org/ijstokes/16-visualization-matplotlib/notebook)

It's also worth looking at the gallery of examples [here](http://matplotlib.org/gallery.html).

Frustratingly, there are some small differences between making plots in a jupyter notebook, making them in an ipython terminal, and making them by running python code from a file. Watch out for these below. More about plots in jupyter notebooks [here](http://nbviewer.jupyter.org/github/ipython/ipython/blob/1.x/examples/notebooks/Part%203%20-%20Plotting%20with%20Matplotlib.ipynb).

In [None]:
import matplotlib

# Because we're working in a Jupyter notebook, we have to do this, which we wouldn't have 
# to do in IPython or when running from a file
%matplotlib inline

Importing the `matplotlib` module on its own is not very useful. There is a submodule called `pyplot` that includes simple wrappers that make most of the standard types of plots. IPython can be started with the option `--pylab` that (among other things) automatically imports the functions from this module into the top level interactive scope (i.e. the same as typing `from matplotlib.pyplot import *` at the prompt), but in non-interactive scripts it's clearer to import it with a shortened name, like this: 

In [None]:
import matplotlib.pyplot as pl # Ignore any warnings about the font cache

Just make a very simple line plot as quickly as possible:

In [None]:
pl.plot([1,2,3,4,5]);

Now let's do the same thing separating the key elements as we would in a more complicated case. I've represented this as several stages so you can see what's happening with each step.

In [None]:
# Figures are the highest-level matplotlib object you need to worry about.
f_a    = pl.figure() # This creates a new figure.
print('Created a figure with number %d'%(f_a.number)) # Each figure has a unique number

data = [1,2,3,4,5]
pl.plot(data) # This draws a set of axes into the current figure

ax = pl.gca() # This gets the current axes from the figure.

Resize the figure:

In [None]:
f_a    = pl.figure() 
pl.plot(data)
f_a.set_size_inches(2.5,2.5) # Yes, inches...

Plot two lines on the same figure.

In [None]:
f_a    = pl.figure(figsize=(2.5,2.5)) 
pl.plot(data)
pl.plot(data[::-1],c='r',linestyle='--') # This draws another line in the same axes.

Add axis labels.

In [None]:
f_a    = pl.figure(figsize=(2.5,2.5)) 
pl.plot(data)
pl.plot(data[::-1],c='r',linestyle='--')

# Note that these are written in LaTeX math mode, using $$! (they don't have to be, but it looks nicer)
# The r prefix makes these 'raw' strings, so Python leaves the backslashes alone
pl.xlabel(r'$r$ $\mathrm{[kpc]}$',fontsize=12)
pl.ylabel(r'$\log_{10}\;M(<r)/M_{\odot}$',fontsize=12);

Take away the tick labels at the corners:

In [None]:
f_a    = pl.figure(figsize=(2.5,2.5)) 
pl.plot(data)
pl.plot(data[::-1],c='r',linestyle='--')
pl.xlabel(r'$r$ $\mathrm{[kpc]}$',fontsize=12)
pl.ylabel(r'$\log_{10}\;M(<r)/M_{\odot}$',fontsize=12)

ax = pl.gca() # gets the current axes
for i in [0,-1]:
    pl.setp(ax.get_xticklabels()[i],visible=False)
    pl.setp(ax.get_yticklabels()[i],visible=False)

Make the tick labels smaller using `fontsize`

In [None]:
f_a    = pl.figure(figsize=(2.5,2.5)) 
pl.plot(data)
pl.plot(data[::-1],c='r',linestyle='--')
pl.xlabel(r'$r$ $\mathrm{[kpc]}$',fontsize=12)
pl.ylabel(r'$\log_{10}\;M(<r)/M_{\odot}$',fontsize=12)
ax = pl.gca()
for i in [0,-1]:
    pl.setp(ax.get_xticklabels()[i],visible=False)
    pl.setp(ax.get_yticklabels()[i],visible=False)
    
pl.setp(ax.get_xticklabels(),fontsize=8); # suppress output
pl.setp(ax.get_yticklabels(),fontsize=8); # suppress output

Use LaTeX to render all the text on the plot, not just the math mode expressions we used for the axis labels:

In [None]:
# This may take a longer time to run because latex is being run in the background!
from matplotlib import rc
rc('text', usetex=True)

f_a    = pl.figure(figsize=(2.5,2.5)) 
pl.plot(data)
pl.plot(data[::-1],c='r',linestyle='--')
pl.xlabel(r'$r$ $\mathrm{[kpc]}$',fontsize=12)
pl.ylabel(r'$\log_{10}\;M(<r)/M_{\odot}$',fontsize=12)
ax = pl.gca()
for i in [0,-1]:
    pl.setp(ax.get_xticklabels()[i],visible=False)
    pl.setp(ax.get_yticklabels()[i],visible=False)
    
pl.setp(ax.get_xticklabels(),fontsize=8); # suppress output
pl.setp(ax.get_yticklabels(),fontsize=8); # suppress output

A more complete example, this time of a scatter plot. You should be able to figure out what's happening yourself using the docstrings for the routines involved, and a bit of googling.


In [None]:
# Here are some random data for two variables. Read the docs to figure out how they're generated.
x = np.random.normal(loc=50,scale=20,size=1000)
y = np.random.normal(loc=x+5,scale=10,size=1000)

In [None]:
f_b = pl.figure(figsize=(2.5,2.5))
ax = pl.gca()
ax.scatter(x,y,s=1,edgecolor='None',c='purple',label='$z=0$')
ax.set_xlim(0,100)
ax.set_ylim(0,100)
ax.axhline(50.0,linestyle='--',c='grey',lw=0.5,zorder=-10) # What do these lines do?
ax.axvline(50.0,linestyle='--',c='grey',lw=0.5,zorder=-10)
for i in [0,-1]:
    pl.setp(ax.get_xticklabels()[i],visible=False)
    pl.setp(ax.get_yticklabels()[i],visible=False)
ax.set_xlabel(r'$\alpha$') # What happens without the r at the start here?
ax.set_ylabel(r'$\delta$')

pl.legend(loc='upper left',fontsize=8, frameon=False,scatterpoints=3,markerscale=2);

To save the figure, use `savefig`:

In [None]:
f_b = pl.figure(figsize=(2.5,2.5))
ax  = pl.subplot(1,1,1)
ax.scatter(x,y,s=1,edgecolor='None',c='purple',label='$z=0$')
ax.set_xlim(0,100)
ax.set_ylim(0,100)
ax.axhline(50.0,linestyle='--',c='grey',lw=0.5,zorder=-10) # What do these lines do?
ax.axvline(50.0,linestyle='--',c='grey',lw=0.5,zorder=-10)
for i in [0,-1]:
    pl.setp(ax.get_xticklabels()[i],visible=False)
    pl.setp(ax.get_yticklabels()[i],visible=False)
ax.set_xlabel(r'$\alpha$') # What happens without the r at the start here?
ax.set_ylabel(r'$\delta$')

pl.legend(loc='upper left',fontsize=8, frameon=False,scatterpoints=3,markerscale=2)

# Save the figure as a png. To save in a different format, e.g. pdf, just change the extension!
pl.savefig('my_scatter_plot.png',bbox_inches='tight',pad_inches=None)
# Close the figure
pl.close(f_b);
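
When you run plotting code from a file rather than in the notebook, a common pattern is to select a non-interactive backend so that no window is opened and the figure only goes to disk. The sketch below is meant for a standalone script (don't run it in this notebook, because it switches the backend); it uses matplotlib's `Agg` backend and writes to a temporary directory, and the plot itself is a cut-down version of the scatter plot above.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window, files only
import matplotlib.pyplot as pl
import numpy as np

np.random.seed(42)
x = np.random.normal(loc=50, scale=20, size=1000)
y = np.random.normal(loc=x + 5, scale=10, size=1000)

# subplots() creates the figure and its axes in one call
fig, ax = pl.subplots(figsize=(2.5, 2.5))
ax.scatter(x, y, s=1, c='purple')
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)

output = os.path.join(tempfile.mkdtemp(), 'my_scatter_plot.png')
fig.savefig(output, bbox_inches='tight')
pl.close(fig)  # release the memory held by the figure

print(os.path.exists(output))
```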

In a normal IPython session you can also make plots interactively (one line at a time, rather than having to write out code for the whole plot each time as in the examples above). For this to work in a natural way, the easiest thing to do is to start IPython like this:
```
> ipython --pylab
```
With this option, all the commands in pyplot will be imported as if you'd written `from matplotlib.pyplot import *`, and the interactive plot windows will work as you might expect. So you can type:
```
In[ ] : plot([1,2,3,4])
```
and a window will pop up with the plot in it.

------
*Further reading:*
- `matplotlib.pyplot.savefig`
- `matplotlib.pyplot.imshow`
- `matplotlib.pyplot.hist`
- `matplotlib.pyplot.text`

# 6. Homework challenge

Here is a data table, which we write to a file.

In [None]:
data_file = """# Name  X       Y       counts     sigma (pixels)
star    40      38      1e3         2
star    50      80      1e4         2
star    13.5    70      2e4         4
star    80      22.3    5e5         3
"""
with open('challenge_data.txt','w') as f:
    f.write(data_file)

!cat challenge_data.txt

Each row in the file describes the image of a 'star' on a CCD with (NX,NY) = 100x100 pixels. These images are formed by counting individual photons, given by the 'counts' column (by the way, there is no optics, physics or real photon statistics in this problem at all). The probability distribution for each source is described by a Gaussian kernel around the point X,Y (in pixel coordinates, with 0,0 at the bottom left) with a dispersion given by the `sigma` column, in units of the CCD pixels.

As well as these star images, the image should have a background of Poisson noise with mean 10 counts/pixel.

Write a program, `challenge.py`, that reads this table from the file, reports the total counts across the whole CCD and maximum count in a single pixel, and produces the following image of the CCD with the four stars on it, in `.png` format:

```bash
> python challenge.py challenge_data.txt
```

```
Total counts:  6.311420e+05
Maximum count: 8.714000e+03
Wrote output to: /Users/andrew/python/tutorial2016/python-durham2017/examples/challenge.png
```

<img style="float: left;" src="challenge.png">

You will probably need the following:

- `numpy.histogram2d`
- `matplotlib.pyplot.imshow`
- A function (from `numpy`) to draw randomly from a 2D Gaussian distribution.
- A similar function to draw from a Poisson distribution.
- A `colorbar` (note the size of the colorbar relative to the plot in the example)

The *colormap* is **viridis**. What is special about this colormap?

The image should look the same if you make it twice.

The orientation of the CCD might prove tricky. Read the documentation of `imshow` carefully.

