# Introduction to Jupyter Notebooks

The [Jupyter Notebook](http://jupyter.org/) is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

IM&T Scientific Computing makes it easy to start, stop, and review your remote sessions using the [SC Launcher](https://wiki.csiro.au/display/ASC/Using+the+SC+Launcher).

Specifically, the SC Launcher provides assistance in [managing Jupyter Notebook sessions](https://wiki.csiro.au/display/ASC/Using+IPython+Notebooks).

**Table of Contents**

 * [Introduction to Jupyter Notebooks](#Introduction-to-Jupyter-Notebooks)

 * [Why Notebooks?](#Why-Notebooks?)

 * [2-Minute Tour of the Notebook](#2-Minute-Tour-of-the-Notebook)

   * [Code Cells](#Code-Cells)

   * [Markdown cells](#Markdown-cells)

   * [The Notebook Format](#The-Notebook-Format)

   * [Navigation](#Navigation)

 * [Introduction to Python](#Introduction-to-Python)

   * [Print](#Print)

   * [Comments](#Comments)

     * [Single Line Comments](#Single-Line-Comments)

     * [Multiple Line Comments](#Multiple-Line-Comments)

   * [Indentation](#Indentation)

   * [Variables and their Type](#Variables-and-their-Type)

     * [Numbers](#Numbers)

     * [Strings](#Strings)

     * [Lists](#Lists)

     * [Packing and Unpacking](#Packing-and-Unpacking)

   * [Operators](#Operators)

     * [Arithmetic Operators](#Arithmetic-Operators)

     * [String Operators](#String-Operators)

     * [List Operators](#List-Operators)

   * [String Formatting](#String-Formatting)

   * [Conditions](#Conditions)

     * [Boolean - True or False](#Boolean---True-or-False)

     * [Boolean operators](#Boolean-operators)

     * [The in operator](#The-in-operator)

     * [The is operator](#The-is-operator)

     * [The not operator](#The-not-operator)

   * [Looping Constructs](#Looping-Constructs)

     * [The for loop](#The-for-loop)

     * [The while loop](#The-while-loop)

     * [break and continue statements](#break-and-continue-statements)

   * [Functions](#Functions)

     * [What are Functions?](#What-are-Functions?)

     * [How do you write a function?](#How-do-you-write-a-function?)

     * [How do you call a function?](#How-do-you-call-a-function?)

     * [Optional Arguments](#Optional-Arguments)

   * [Classes and Objects](#Classes-and-Objects)

     * [Accessing Object Variables](#Accessing-Object-Variables)

     * [Accessing Object Functions](#Accessing-Object-Functions)

     * [Self](#Self)

     * [Constructors](#Constructors)

   * [Dictionaries](#Dictionaries)

     * [Iterating over dictionaries](#Iterating-over-dictionaries)

     * [Removing a dictionary value](#Removing-a-dictionary-value)

   * [Modules and Packages](#Modules-and-Packages)

     * [Exploring built-in modules](#Exploring-built-in-modules)

   * [List Comprehensions](#List-Comprehensions)

   * [Reading and Writing Files](#Reading-and-Writing-Files)

     * [Opening Files](#Opening-Files)

     * [Writing to files](#Writing-to-files)

     * [Closing Files](#Closing-Files)

     * [Reading from files](#Reading-from-files)

     * [With](#With)

   * [Downloading Files](#Downloading-Files)

   * [Working with CSV Files](#Working-with-CSV-Files)

     * [Pandas for CSV](#Pandas-for-CSV)

   * [Debugging in the Notebook](#Debugging-in-the-Notebook)

   * [Python IDEs](#Python-IDEs)

   * [Python 2 or 3?](#Python-2-or-3?)

 * [Final Thoughts](#Final-Thoughts)



# Why Notebooks?

In the 1980s, Donald Knuth proposed [Literate programming](http://en.wikipedia.org/wiki/Literate_programming), the objective of which was to combine human and machine languages in the same document (program) more naturally.

IPython and other notebook formats might be better considered examples of **literate computing**. They combine natural language and visualisations with live computation to provide insight and explanation.

# 2-Minute Tour of the Notebook

You can edit this notebook in-place, so let's start by making a backup!

<span style="color:green">In the menu select:

     File -> Make A Copy
</span>     

It should open a new browser tab in which you can safely experiment on the duplicated notebook.

Notebooks are made up of *cells* such as:

- code cells
- markdown cells
- header cells

## Code Cells

Code cells are generally written in Python, but that need not be the case: there are [many projects](https://github.com/ipython/ipython/wiki/Projects-using-IPython) using the IPython framework with alternative language backends.


<span style="color:green">Run any code cell on a page by first selecting it and then using `shift-enter` or pressing the <button><i class="fa-step-forward fa"></i></button> button in the toolbar above:</span>

In [None]:
a = 10
print(a)

There are two other keyboard shortcuts for running code:

* `Alt-Enter` runs the current cell and inserts a new one below.
* `Ctrl-Enter` runs the current cell and enters command mode.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
plt.plot([5,1,5,1,5]);

Notice that the images produced are displayed inline. Supported representations include:

* HTML
* JSON
* PNG
* JPEG
* SVG
* LaTeX

In [None]:
from IPython.display import Image
Image('http://www.scienceimage.csiro.au/images/embed/300_0_DA3675.jpg')

Code is run in a separate process called the IPython Kernel.  The Kernel can be interrupted or restarted.
While a cell is running (marked with a \*) you can press the <button><i class="fa-stop fa"></i></button>  button in the toolbar above. This is roughly equivalent to pressing *ctrl-c* in a Python session.

Try running the following cell and then hit the <button><i class='fa-stop fa'></i></button> button.

In [None]:
import time
time.sleep(10)

## Markdown cells

You can make text *italic* or **bold**.

You can build nested itemized or enumerated lists:

* One
    - Sublist
        - This
    - Sublist
        - That
        - The other thing
* Two
    - Sublist

You can embed images:
![Bat?](http://www.scienceimage.csiro.au/images/embed/300_0_DA3675.jpg "Inline Image")

Courtesy of [MathJax](https://www.mathjax.org/), you can include mathematical expressions using $\LaTeX$ both inline: 
$e^{i\pi} + 1 = 0$  and displayed:

$$e^x=\sum_{i=0}^\infty \frac{1}{i!}x^i$$

Because Markdown is a superset of HTML you can even add things like HTML tables:

<table>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>

You can learn more about Markdown at:

- http://daringfireball.net/projects/markdown/syntax
- [examples as a notebook](http://nbviewer.ipython.org/github/ipython/ipython/blob/1.x/examples/notebooks/Part%204%20-%20Markdown%20Cells.ipynb)

More about MathJax and the supported $\LaTeX$ command set:
- [MathJax](https://www.mathjax.org/)
- [MathJax TeX and LaTeX Support](http://docs.mathjax.org/en/latest/tex.html#tex-support)

## The Notebook Format

Notebooks are files in JSON (JavaScript Object Notation) format and look something like this:

    {
     "metadata": {
      "name": "",
      "signature": "sha256:922e2efa9f20e67705e5ce015471a18197f12fb5cacec18b4f2ec8fe342396f5"
     },
     "nbformat": 3,
     "nbformat_minor": 0,
     "worksheets": [
      {
       "cells": [
        {
         "cell_type": "heading",
         "level": 1,
         "metadata": {},
         "source": [
          "A Heading"
         ]
        },
        {
         "cell_type": "markdown",
         "metadata": {},
         "source": [
          "This notebook demonstrates how to automatically generate a table of contents from Header cells.\n",
          "\n",
          "When run it generates a Markdown cell at the start of the notebook containing:"
         ]
        }
       ]
      }
     ]
    }

Thankfully, you should never need to edit this directly. In fact the notebook file may contain binary blobs such as output images so it is not intended to be human readable.
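
Although editing the file by hand is rarely needed, reading it from a program is straightforward because it is plain JSON. The following is a minimal sketch only: the `raw` string is a hand-written stand-in that mimics the version 3 layout shown above, not a real notebook, and the variable names are hypothetical.

```python
import json

# A hand-written, minimal stand-in for a version 3 notebook file (an assumption,
# not a real notebook).
raw = """
{
 "nbformat": 3,
 "worksheets": [
  {"cells": [
    {"cell_type": "heading", "level": 1, "source": ["A Heading"]},
    {"cell_type": "markdown", "source": ["Some ", "text"]}
  ]}
 ]
}
"""

notebook = json.loads(raw)

# Collect each cell's type together with its source lines joined into one string
summary = []
for cell in notebook["worksheets"][0]["cells"]:
    summary.append((cell["cell_type"], "".join(cell["source"])))

print(summary)
```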

A notebook can be shared with others, to view, edit, and run.

In the menu bar select:

    File -> Download As -> IPython Notebook (.ipynb)

Alternatively, you can directly access the notebooks within your home directory on IM&T Scientific Computing platforms in:

    ~ident/notebooks/

Simply drag any notebook you receive onto the dashboard / home page of your notebook server to add it to your own collection.

## Navigation
The Notebook UI works in two modes - *command-mode* and *edit-mode*. In command-mode, cells are highlighted in blue, and you can perform commands that work on whole cells or the entire notebook. Edit-mode is active while a cell is being edited, with the cell being highlighted in green. To edit a cell, press *enter* while the cell is selected in command-mode. To exit edit-mode, press *escape*.

Some useful command-mode keyboard shortcuts are:
* `a`: Add a new cell above the current cell
* `b`: Add a new cell below the current cell
* `m`: Change the cell type to Markdown
* `y`: Change the cell type to Code
* `c`: Copy the selected cells into the cell clipboard
* `v`: Paste the cell clipboard into new cells below the current cell selection
* `j`: Select the next cell (the one below)
* `k`: Select the previous cell (the one above)
* `enter`: Edit the selected cell (enter edit-mode)
* `control-enter`: Execute the current cell
* `shift-enter`: Execute the current cell and move the cell selection to the next cell

You can find a complete list of shortcuts in the `Help - Keyboard Shortcuts` menu.

## More on the Jupyter Notebook Environment
For more information on the notebook environment, see the [Jupyter Notebook Examples](../Additional%20Notebooks/Jupyter%20Notebook%20Examples).

***

# Introduction to Python

What follows is an overview of the Python language - syntax, keywords and core features.

Those with no programming background would do well to first search out tutorials on *computational thinking*.

There are very few defined exercises. Instead, readers are encouraged to modify the code cells and observe changes in behaviour.

## Print

The first Python function we will explore is `print`.

Unsurprisingly, it simply displays a line of text. Unlike the basic output functions of many languages such as C/C++, it appends a newline by default.

To print a string, we write:

In [None]:
print("Hello Everyone!!!")

<span style="color:green">Now would be a good time to try inserting your own code cell by pressing the <button><i class="fa-plus fa"></i></button> button and printing your own message.</span>

## Comments

Comments are annotations embedded in the code to summarise and explain the programmer's intent.

In Python there are two styles of comment, the single line and the multiple line.

### Single Line Comments

Anything that follows a # (hash, or pound) symbol is considered a comment, which ends at the first newline encountered.

In [None]:
print("Hello")
# print("Again")
print("World") # trailing comment

### Multiple Line Comments

Multiple line comments use triple single quotes to mark their beginning and end.

In [None]:
'''
print("Hello")
print ("World")
'''

Did you notice that the previous code cell produces an output, even though all the code is enclosed in a multi-line comment? This is because multiple line comments don't really exist. Triple quotes actually create a multi-line string, but if you don't assign that string to a variable, then it has no effect and is effectively "commented out". However, the Notebook environment always tries to display the result of the final line of code in a cell, which is why you see the string output.
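
To see that a triple-quoted block really is an ordinary string, it can be assigned to a variable and inspected. This is a small sketch; the variable name `text` is just an illustration.

```python
# A triple-quoted block is a string expression, not a comment.
text = '''
print("Hello")
print("World")
'''

print(type(text))       # the type is str
lines = text.splitlines()
print(lines)            # the "commented out" code is simply text
```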

Of course when using an IPython Notebook, we might prefer to make good use of Markdown cells to document how our program works!

To disable an entire code cell, just change the cell type to Markdown. Reenable by changing back to a code cell. This is much easier than commenting out the code. Note that executing the Markdown cell will make it look like your code formatting has been destroyed, but this is just an illusion caused by the way Markdown renders whitespace. Revert to a code cell and the code layout will be restored. Try this on one of the code cells now.

<span style="color:green">Try changing a code cell to Markdown and then back to code now.</span>

## Indentation

Python uses the indentation level to determine the start and end of code blocks - unlike many languages which use braces (curly brackets).

For example:

In [None]:
if True:
    print('We have a tautology!') # this line is indented four spaces
    print('inside the block')
    if not False:
        print('false')
        
print("outside the block")

You will typically find a colon (:) at the end of any line that precedes an indented code block.

The Python style guide (PEP 8) recommends four spaces per indentation level (or a multiple thereof when nested). The language itself only requires that the indentation within a code block is consistent, so other widths and tabs are also accepted, although Python 3 rejects ambiguous mixtures of tabs and spaces. Don't rely on this though. Just use a Python-aware editor and stick to 4 spaces for maximum compatibility.
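
The consistency rule can be checked without crashing the notebook by handing small snippets to the built-in `compile()` function. This is only a sketch: the two source strings and the variable names are made up for illustration.

```python
# Two-space indentation is accepted because the block is consistent.
consistent = "if True:\n  print('a')\n  print('b')\n"
compile(consistent, "<example>", "exec")
compiled_ok = True

# Mixing four spaces and two spaces inside one block is rejected.
inconsistent = "if True:\n    print('a')\n  print('b')\n"
error_name = None
try:
    compile(inconsistent, "<example>", "exec")
except IndentationError as err:
    error_name = type(err).__name__

print(compiled_ok, error_name)
```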

## Variables and their Type

Variables are symbols (often descriptive words) that can represent or stand in for different values, and the value a variable holds may change during the execution of a program.

For the computer science minded, Python is a dynamically-typed, strongly-typed, object-oriented, and garbage-collected language. Quite a mouthful, but in day to day use this means:

* **Dynamic typing**: You do not need to declare variables before using them, or declare that x is an integer and foo is a string. Variables need not even remain the same type over the entire execution. Also, you generally don't need to check the type of an object before using it. Your code can just try to use the variables. This is sometimes called *Duck Typing*: if it looks like a duck, and walks like a duck, then it is a duck.
* **Strong typing**: Objects have a strong type that cannot be changed (although conversions to a new object of a different type are possible). Operations only work with specific types, so adding numbers to strings for example will not work.
* **Object-oriented**: Nearly everything in Python is an object, even things that you might not expect. Objects (instances of classes) are a way of bundling together some data and the operations that are defined on that data. So while the expected things like numbers, strings, and lists are objects, so are functions. This allows some powerful programming techniques. Finally, although only touched on today, Python provides very rich mechanisms for defining your own classes of objects.
* **Garbage-collected**: Unlike some other languages such as C or C++, you generally don't need to worry about memory management. When there are no longer any references to an object, the memory will be reclaimed automatically by the Python garbage collector. There *are* occasions where you need to worry about resource management (for example when opening a file), but Python gives you ways to do this.
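
The first two properties can be illustrated in a few lines. This is a minimal sketch with hypothetical names (`value`, `describe_length`): the same variable is rebound to a different type, an operation between incompatible types is refused, and a function works on any object that supports `len()`.

```python
# Dynamic typing: the same name can refer to objects of different types.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__
print(first_type, second_type)

# Strong typing: numbers and strings are not silently mixed.
try:
    1 + "1"
except TypeError as err:
    print("TypeError:", err)

# An explicit conversion creates a new object of the required type.
total = 1 + int("1")
print(total)

# Duck typing: anything with a length is acceptable here.
def describe_length(thing):
    return len(thing)

print(describe_length("spam"), describe_length([1, 2, 3]))
```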

### Numbers

Python supports different types of numbers:

- integers
- floating point numbers
- complex numbers

To define an integer we simply write:

In [None]:
myint = 1

To define a floating point number we must include a decimal point (otherwise it would be an integer!):

In [None]:
myfloat = 1.0

# alternatively..
myfloat = float(1)
myfloat = float(myint)

Complex numbers can be defined by using the built-in `complex` type, or by using `j` to denote the imaginary part:

In [None]:
# the most natural-looking way
mycomplex = 3 + 4j
print(mycomplex)

# alternatively...
mycomplex = complex(3,4)
print(mycomplex.real, ',', mycomplex.imag)

Simple operators can be executed on numbers:

In [None]:
x = 22
y = 7
z = x + y
print(z)

xy = x * y
print(xy)

### Strings

Strings may be defined using single or double quotes:

In [None]:
string1 = 'hello'
string2 = "hello"

One difference is that double quotes allow for apostrophes, which would otherwise mark the end of the string.

In [None]:
string3 = "When you're using single quotes, confusing your with you're is not even wrong, it's fatal"
string3

In [None]:
string4 = "This is a single quote: '. This is a double quote: \""
string4

There are additional variations on defining strings that make it easier to include things such as carriage returns, backslashes and Unicode characters. These are beyond the scope of this tutorial, but are covered in the [Python documentation](http://docs.python.org/tutorial/introduction.html#strings).

Simple operators can be executed on strings:

In [None]:
s1 = "hello"
s2 = "world"
s3 = s1 + " " + s2
print(s3)

Mixing operations on numbers and strings is not supported (recall strong-typing):

In [None]:
print("Do you 'like' or" + 1)

However, you can explicitly convert numbers to their string representation first:

In [None]:
print(str(x) + " / " + str(y) + " exceeds Pi")

**However, if you ever find yourself building complex string concatenations in order to format some output, you are doing it wrong. Python provides much more [elegant ways of formatting strings](https://docs.python.org/3/library/string.html#string-formatting) (covered below).**

### Lists

Probably the most commonly used container in Python, lists are used to group together multiple values. They can contain variables of any type and can dynamically vary the number of elements contained. It is simple to iterate over the items in a list.

Here is an example of how to build a list:

In [None]:
mylist = []

In [None]:
mylist.append(1)
mylist.append(2)
mylist.append(3)
print(mylist)

In [None]:
print(mylist[0]) # prints 1
print(mylist[1]) # prints 2
print(mylist[2]) # prints 3

In [None]:
# prints out 1,2,3
for x in mylist:
    print(x)

Attempting to access a non-existent index generates an error. We call it *raising an exception*:

In [None]:
mylist = [1,2,3]
print(mylist[10])

In [None]:
len(mylist)

We can access multiple elements at one time, known as a *slice*:

In [None]:
mylist[0:2]
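
A slice takes the form `start:stop:step`, where the stop index is excluded and any part may be omitted. The sketch below uses a hypothetical list called `letters` to show a few common variations.

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[1:3])   # ['b', 'c']       from index 1 up to (not including) 3
print(letters[:2])    # ['a', 'b']       omitted start means "from the beginning"
print(letters[-2:])   # ['d', 'e']       negative indices count from the end
print(letters[::2])   # ['a', 'c', 'e']  every second element
```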

Here are some more list methods:

In [None]:
a = ['spam', 'spam', 'eggs', 'spam']

In [None]:
print(a.count('spam'))
print(a.count('eggs'))
print(a.count('bacon'))
print(a.count(5))

In [None]:
a.insert(2, 'sausage')
a.append('eggs')
a

In [None]:
print(a.index('eggs'))

In [None]:
a.remove('spam')
a

In [None]:
a.reverse()
a

In [None]:
a.sort()
a

Notice for these final examples we did not bother calling `print`.

For now, accept that there are subtle differences between the two types of output, and that the latter is only shown when the expression is the final statement executed in a code cell!
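
One of those differences can be seen with a string. As a rough sketch (the variable names are hypothetical): `print` shows the form produced by `str()`, while the notebook displays a final expression using its `repr()` form, which for strings includes the quotes.

```python
s = 'spam'

shown_by_print = str(s)       # what print(s) writes
shown_by_notebook = repr(s)   # what the notebook shows for a bare `s`

print(shown_by_print)
print(shown_by_notebook)
```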

### Packing and Unpacking

Assignments of more than one variable can be performed in a single statement:

In [None]:
a, b = 3, 4
print(a)
print(b)

The longer form would be:

In [None]:
a = 3
b = 4

An interesting use is swapping two values:

In [None]:
a = 'a'
b = 'b'
a, b = b, a
print(a)
print(b)

We can unpack a list into multiple variables:

In [None]:
mylist = [5, 6, 7, 8]
a, b, c, d = mylist
print(mylist)
print(a)
print(b)

This would be equivalent to:

In [None]:
a = mylist[0]
b = mylist[1]
c = mylist[2]
d = mylist[3]

The reverse is not quite as simple:

In [None]:
ab = a, b
print(ab)

Here we have *not* created a list, but rather a tuple!

Unlike lists, tuples are *immutable*  sequences (cannot be modified) which may not be the desired outcome.

Instead we need to use `list()`:

In [None]:
list(ab)
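
The practical consequence of immutability is sketched below with a hypothetical tuple named `pair`: assigning to an element of a tuple raises an exception, whereas the list created from it can be changed freely without affecting the original.

```python
pair = (3, 4)

try:
    pair[0] = 99          # tuples cannot be modified in place
except TypeError as err:
    print("TypeError:", err)

as_list = list(pair)      # a new, mutable list with the same elements
as_list[0] = 99

print(as_list)  # [99, 4]
print(pair)     # (3, 4)
```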

## Operators

### Arithmetic Operators

As in any programming language, the standard arithmetic operators are available:

In [None]:
result = 1 + 2 * 3 / 4.0 - 5.
result

Note that the interpretation of integer division is different in Python 2 and Python 3. In Python 2, `2 / 3` performs integer (floor) division and returns `0`, while in Python 3 the `/` operator always performs true division and returns a floating point result.

In [None]:
nearpi = 22 / 7
nearpi

Operators beyond addition, subtraction, multiplication, and division have less familiar notation.

The modulo (%) operator returns the remainder of one number divided by another:

In [None]:
print(16 % 5)

The power (\*\*) operator raises the first value to the power of the second, i.e. m \*\* n = $m^{n}$

In [None]:
ninesquared = 9 ** 2 
kibibyte = 2 ** 10

Finally, the floor division operator (//) divides the first number by the second and rounds the result down to the nearest integer.

In [None]:
print(9 // 10)  # floor division
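
The three division-related operators are easiest to compare side by side. The sketch below assumes Python 3 semantics; note that floor division rounds towards negative infinity, not towards zero.

```python
print(7 / 2)     # 3.5   true division always gives a float
print(7 // 2)    # 3     floor division rounds down
print(-7 // 2)   # -4    "down" means towards negative infinity
print(7 % 2)     # 1     the remainder

# divmod() returns the floor quotient and the remainder together
quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2
```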

### String Operators

In Python, string concatenation is done with the addition operator:

In [None]:
cafe = "Greasy" + " " + "Spoon"
print(cafe)

Python also supports multiplying strings to form a string with a repeating sequence:

In [None]:
menu = "spam " * 10
print(menu)

### List Operators

Lists can be concatenated using the addition operator (the same operator that performs string concatenation):

In [None]:
evens = [0, 2, 4, 6, 8]
odds = [1, 3, 5, 7, 9]
somenumbers = evens + odds
print(somenumbers)
# print(sorted(somenumbers))

As with strings, lists can be repeated using the multiplication operator, which is syntactic sugar for repeated concatenation:

In [None]:
print(odds * 3)

## String Formatting

In addition to concatenation, Python has several ways to format longer strings. The preferred method for new Python code is to use the `str.format` method.

In [None]:
print('Python is named after {0}, not the snake.'.format('Monty Python'))

You can also refer to substitutions by name:

In [None]:
print(
    'Despite this, images of {image_name} are {frequency} used for Python projects'.format(
        image_name='snakes',
        frequency='often'))

There are a wide range of options for controlling the precision used for formatting numbers. Note also that the positional order of the format arguments doesn't have to match the order in the format string. The index inside the '{}' is used to determine the substitutions.

In [None]:
numerator = 22.0
denominator = 7.0

print('{1} / {2} = {0}'.format(numerator/denominator, numerator, denominator))
print('{1} / {2} = {0:.4}'.format(numerator/denominator, numerator, denominator))
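
A few more of the formatting options are sketched below. The format specifications used (`.2f`, `8.3f`, `,`, `<` and `>`) are part of the standard format mini-language; the variable names are only for illustration.

```python
ratio = 22.0 / 7.0

two_places = '{:.2f}'.format(ratio)               # fixed point, two decimal places
padded = '{:8.3f}'.format(ratio)                  # padded to a total width of 8
grouped = '{:,}'.format(1234567)                  # thousands separator
aligned = '{:<6}|{:>6}|'.format('ab', 'cd')       # left and right alignment

print(two_places)   # 3.14
print(padded)
print(grouped)      # 1,234,567
print(aligned)
```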

Referencing the properties of an object is particularly nice:

In [None]:
c = 3-7j # define a complex number
print('The complex number {0} is formed from the real part {0.real} and the imaginary part {0.imag}.'.format(c))

### Old-style % Formatting
Predating the `str.format()` method outlined above, there is another string formatting method that is similar to other languages' `printf`. It uses the percent (%) operator, applied to a string containing conversion specifiers such as "%s" or "%d".

These describe how each of the values that follow the string should be formatted.

   - 's': String format. Converts the value using `str()`.
   - 'd': Decimal Integer. Outputs the number in base 10.
   - 'o': Octal format. Outputs the number in base 8.
   - 'x': Hex format. Outputs the number in base 16, using lower-case letters for the digits above 9.
   - 'n': Number. This is the same as 'd', except that it uses the current locale setting to insert the appropriate number separator characters. Note that 'n' is only understood by `str.format()`, not by the % operator.

For example:

In [None]:
job = "Lumberjack"
print("He's a %s!" % job)

When two or more specifiers are used, the % is followed by a tuple:

In [None]:
number = 5
string = "infinity"
print("%d is a sufficiently close approximate to %s." % (number, string))

The %s specifier also works on any object that can be converted to a string, such as a `list`:

In [None]:
mylist = [1,2,3]
print("A list: %s" % mylist)
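
The numeric specifiers can be combined in one format string. A short sketch, with `number` chosen arbitrarily:

```python
number = 255

# The same value rendered in base 10, base 8 and base 16
line = "%d in octal is %o and in hex is %x" % (number, number, number)
print(line)      # 255 in octal is 377 and in hex is ff

# A precision can be given for floating point values
rounded = "%.3f" % 3.14159
print(rounded)   # 3.142
```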

## Conditions

Conditional statements are those that are only executed if a boolean expression evaluates to true.

### Boolean - True or False

Python comparison operators return a boolean value, `True` or `False`, and these can be stored in boolean variables.

Equality of two values is tested with the double-equals (==) operator, while inequality is tested with exclamation-mark-equals (!=).

For example:

In [None]:
count = 3
print(count == 4) # "Four shalt thou not count."
print(count > 4)  # "Five is right out."
print(count < 3 )
print(count == 3)

Here is the general form of Python's "if" statement, which uses indented code blocks:

```python
    if <a true statement>:
        <do something>
        ....
    elif <another true statement>: # else if
        <do something else>
        ....
    else:
        <do something else>
        ....
```

For example:

In [None]:
# try modifying the value
value = 0
if value == 0:
    print("value is zero")
else:
    print("value is non-zero")

### Boolean operators

The boolean operators `and` and `or` let us build complex boolean expressions, such as:

In [None]:
if 1 != 0 and 2 == 1:
    print("It is True that 1 != 0 and 2 == 1")
else:
    print("It is False that 1 != 0 and 2 == 1")
    

if True or False:
    print("It is True, that it is True or False.")
else:
    print("The Liar Paradox is a lie.")

### The in operator

The "in" operator tests for membership in a container such as a list:

In [None]:
name = 'John'
if name in ["Graham", "John", "Terry", "Eric", "Michael"]:
    print(name + " is very naughty.")

### The is operator

Unlike the double-equals operator "==", the "is" operator does not match the values of two variables.

Instead, it compares their identity which is essentially their memory addresses.

Two variables with the same value will be equal but *may* not have the same identity.

In [None]:
x = [1,2,3]
y = [1,2,3]
z = y

print(x == y) # True
print(x is y) # False. The lists are equivalent but are unique objects in memory.
print(z is y) # True. Both names refer to the same list object.
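
Identity matters most when an object is modified. In this sketch (the names `a`, `b` and `c` are arbitrary), a change made through one name is visible through every other name that refers to the same object, but not through a copy.

```python
a = [1, 2, 3]
b = a            # b is another name for the same list
c = list(a)      # c is a new list with equal contents

b.append(4)      # modifies the one list shared by a and b

print(a)               # [1, 2, 3, 4]
print(c)               # [1, 2, 3]
print(a is b)          # True
print(id(a) == id(b))  # True, identity is what id() reports
```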

### The not operator

Using `not` before a boolean expression negates it:

In [None]:
print(not True) # False
print((not True) == (False)) # True

## Looping Constructs

There are two types of loops in Python, `for` and `while`.

### The for loop

For loops iterate over a given sequence. Here is an example:

In [None]:
primes = [2,3,5,7]
for prime in primes:
    print(prime)

If you need to iterate over a sequence of numbers, the built-in function range() comes in handy. It generates arithmetic progressions:

In [None]:
for x in range(5):
    print(x)

Range can take a lower bound too.

Jupyter Notebooks stream output to the browser as it arrives, rather than waiting for the loop to complete, so this should produce output progressively:

In [None]:
import time
for x in range(20,25):
    time.sleep(1)
    print(x)

To iterate over the indices of a sequence, you could combine `range()` and `len()`:

In [None]:
phrase = ['I’m', 'afraid', 'my', 'walk', 'has', 'become', 'rather', 'sillier', 'recently']

In [None]:
for i in range(len(phrase)):
    print(i, phrase[i])

Or better yet, use the `enumerate()` function:

In [None]:
for i, val in enumerate(phrase):
    print(i, val)

### The while loop

While loops repeat as long as a boolean condition is true:

In [None]:
count = 5
while count > 0:
    print(count)
    count = count - 1

### break and continue statements

To exit a loop prematurely, the `break` statement is used.

We can also `continue` the loop from the next iteration immediately, skipping the remainder of the current block.

In [None]:
count = 0
while True:
    print(count)
    count += 1
    if count >= 5:
        break  # Terminates the entire loop    

In [None]:
for x in range(10):
    # Check if x is even
    if x % 2 == 0:
        continue  # skips the rest of the current iteration. Loop continues from next iteration
    print(x)

Of course, if you actually need a sequence of odd numbers, there are easier ways to get them. The `range()` function accepts extra parameters for the start, stop, and step:

In [None]:
for x in range(1, 10, 2):
    print(x)
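
Wrapping `range()` in `list()` is a quick way to check which numbers a given combination of start, stop, and step produces. The stop value itself is never included.

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(20, 25)))     # [20, 21, 22, 23, 24]
print(list(range(1, 10, 2)))   # [1, 3, 5, 7, 9]
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]  a negative step counts down
```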

## Functions

### What are Functions?

Functions are a convenient way to divide your code into useful blocks, allowing us to order our code, make it more readable, reuse it and save some time. Also functions are a key way to define interfaces so programmers can share their code.

### How do you write a function?

The keyword `def` marks the start of a function definition. 

It is then followed by the function name and a list of parameter names in brackets.

Finally, the function body occurs as an indented code block.

```python
def function_name(param1, param2): 
    do_something()
    # a comment
    something_else()
```

Functions may also return a result, for example:

In [None]:
def avg(x, y):
    return (x + y) / 2

You can return multiple values in a tuple or other container:

In [None]:
def minmax(x):
    return (min(x), max(x))  # this returns a tuple of two values

However, returning multiple values from a function automatically creates a tuple, so you can omit the outer brackets:

In [None]:
def minmax(x):
    return min(x), max(x)  # this also returns a tuple of two values

### How do you call a function?

Simply write the function name followed by `()`.

If the function takes arguments, place them within the brackets:

In [None]:
avg(1, 3) == 2

In [None]:
foo = [1, 2, 3, 4]
minmax(foo)

In [None]:
# Use the built-in function type() to confirm that the tuple is automatically created when the outer brackets are omitted
r = minmax(foo)
type(r)
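
Because the result is a tuple, it can be unpacked straight into separate variables, as described in the section on packing and unpacking. A small sketch, repeating the `minmax` definition so that it runs on its own (the names `lowest` and `highest` are arbitrary):

```python
def minmax(values):
    return min(values), max(values)

# Unpack the returned tuple into two variables
lowest, highest = minmax([4, 1, 3, 2])
print(lowest, highest)  # 1 4
```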

### Optional Arguments

Optional arguments are those that provide a default value like this:

In [None]:
def spamify(words, replacement="spam"):
    return [replacement for word in words.split(' ')]

spamify('the original phrase', 'eggs')

When the argument is omitted, it uses the default value:

In [None]:
spamify('hello world')
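
Arguments can also be passed by name, which makes the call easier to read when a function has several optional arguments. The sketch below repeats the `spamify` definition so that it runs on its own.

```python
def spamify(words, replacement="spam"):
    return [replacement for word in words.split(' ')]

default_result = spamify('hello world')                     # uses the default
named_result = spamify('hello world', replacement='eggs')   # passed by name

print(default_result)  # ['spam', 'spam']
print(named_result)    # ['eggs', 'eggs']
```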

<span style="color:green">Try defining a function that accepts a name and sings (prints) happy birthday to that person:</span>

In [None]:
# insert code here



<span style="color:green">Now modify it to instead return the string and provide a suitable default name (e.g. 'You').</span>

## Classes and Objects

Objects are used to associate functions with data.

Just as we have types such as numbers and strings, each manipulated with their own set of functions, user defined structures may need custom functions. It may be meaningless to 'perform addition on two objects' - perhaps the objects represent People!

The 'template' for an object is described using a `class` definition which can contain both variables and function definitions.

For example, a minimal class might be:

In [None]:
class MyClass:
    foo = 123

    def function(self):
        print("Hello from MyClass")

To instantiate an object of the above class we write:

In [None]:
obj1 = MyClass()

The `obj1` variable now points to an object of the class "MyClass". Next we'll see how to access its internal data and functions.

### Accessing Object Variables

To access a variable of `obj1` we write:

In [None]:
obj1.foo

We can modify the variable too:

In [None]:
obj1.foo = 456
obj1.foo

This only modifies the specific object (instance) of the class.

You can create multiple objects from a class template, and assigning to a variable on one object does not affect the others.

For example:

In [None]:
obj2 = MyClass()
obj2.foo = 789

Then print out both values:

In [None]:
print(obj1.foo)
print(obj2.foo)
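
One subtlety is worth a sketch: a variable defined in the class body is shared by all objects until an object assigns its own value. The class `Parrot` and its objects below are hypothetical names used only for illustration.

```python
class Parrot:
    status = 'resting'    # defined on the class, shared until overridden

polly = Parrot()
percy = Parrot()

polly.status = 'pining'   # creates a variable on polly only
before = (polly.status, percy.status)
print(before)             # ('pining', 'resting')

Parrot.status = 'stunned' # changes the shared class-level value
after = (polly.status, percy.status)
print(after)              # ('pining', 'stunned')
```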

### Accessing Object Functions

Unsurprisingly, this looks just like accessing an object variable but with the addition of function brackets (and arguments where required):

In [None]:
obj1.function()

You'll notice we did not provide a value for the first argument of ``function()`` or explain why it was necessary.

Part of the answer is that Python passes the object itself as the first argument to the function, in effect re-writing it like this:

```python
MyClass.function(obj1)
```

The rest of the answer revolves around why it needs to be an explicit part of the language but is outside the scope of this tutorial.


However ``self`` deserves more explanation as it is used in other ways...

### Self

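The function in MyClass took a parameter named `self`. It is not a keyword but a strong convention: it names the object the function was called on, and is the way of differentiating between the multiple objects sharing a class template.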

Consider the following:

In [None]:
class AnotherClass:
    foo = 123

    def function(self):
        print("My value is %d" % self.foo)
        

obj3 = AnotherClass()
obj3.function()

obj4 = AnotherClass()
obj4.foo = 456
obj4.function()

``self`` was also required for ``function`` to refer to the specific instance of ``foo`` associated with the object ``function`` was being called upon.

Without it, printing ``foo`` would search for a variable with that name in the local and then the global scope (which may or may not exist).
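
A small sketch of that difference follows. The class `Greeter` and its function names are hypothetical, and the example assumes no global variable called `greeting` happens to exist:

```python
class Greeter:
    greeting = "hello"

    def with_self(self):
        # looks the name up on the object (falling back to the class)
        return self.greeting

    def without_self(self):
        # looks for a local or global variable called `greeting` instead
        return greeting

g = Greeter()
print(g.with_self())

try:
    g.without_self()
except NameError as err:
    print("NameError:", err)
```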

### Constructors

Constructors are methods for creating objects.

Python provides a default constructor, as we saw with ``obj1 = MyClass()``. However, if you wish to initialise an object's variables to specific values at instantiation time, you must define your own constructor.

This is done with the specially named ``__init__`` function:

In [None]:
class Order:
    items = []

    def __init__(self, arg):
        self.items = arg
        
    def check(self):
        print("Your order was: %s?" % self.items)

In [None]:
myorder = Order(['eggs', 'spam'])
myorder.check()

Constructors can take additional arguments and do more than simply copy them into the object's own variables.

<span style="color:green">Try defining a class with a constructor that accepts polar coordinates Rho and Phi but internally stores them as Cartesian coordinates *x* and *y*.</span>

It may help to know:

$x = \rho \cos \phi$

$y = \rho \sin \phi$

The ``cos`` and ``sin`` functions may already be available in your environment; if not, import them with ``from math import cos, sin``, or return to this exercise after learning about modules.

In [None]:
class Coord:
    x = None
    y = None
    
    def __repr__(self):
        # objects of a class can be passed to print
        # when a __repr__ function is defined
        # it should return a string
        return str([self.x, self.y])

    # insert code here

    
    
location = Coord(0.5, 1)
print(location)

**Solution:**

```python
    # needs `from math import cos, sin` earlier in the cell; phi is in radians
    def __init__(self, rho, phi):
        self.x = rho * cos(phi)
        self.y = rho * sin(phi)
```
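
For reference, here is one way the finished class might look once the constructor is placed into the skeleton. It is a sketch rather than the only valid answer; it assumes the angle is given in radians and leaves out the class-level `x = None` and `y = None` placeholders, since the constructor always sets both:

```python
from math import cos, sin, pi

class Coord:
    def __init__(self, rho, phi):
        # convert polar (rho, phi) to Cartesian (x, y); phi is in radians
        self.x = rho * cos(phi)
        self.y = rho * sin(phi)

    def __repr__(self):
        # lets print() show the stored Cartesian values
        return str([self.x, self.y])

# a point at distance 2 along the positive y axis
location = Coord(2, pi / 2)
print(location)
```

Because of floating point rounding, the printed *x* value is very close to zero rather than exactly zero.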

## Dictionaries

A dictionary is a data structure similar to a list, except that it uses keys instead of indexes.

Each value in the dictionary is looked up using a key and new entries can be added by assigning to a non-existent key.

For example, a phonebook can be created with:

In [None]:
phonebook = {}
phonebook["Chapman"] = 55512341
phonebook["Cleese"] = 55512342
phonebook["Gilliam"] = 55512343

Alternatively, the following shorthand can be used to initialise a dictionary:

In [None]:
phonebook = {
    "Chapman" : 55512341,
    "Cleese" : 55512342,
    "Gilliam" : 55512343
}

A key can be a string, number, or any hashable object (which roughly means no lists or other mutable containers).
 
You can even mix key types in a single dict although it tends to be confusing (you often wish to sort on the keys later!):

In [None]:
phrasebook = {
    1 : "Is 1 a page number?",
    "My hovercraft is full of eels" : "A légpárnás hajóm tele van angolnákkal",
    2 : "Does 2 come before or after 'My hovercraft'?",
    (2, '9-12') : "sacred relic"
}
phrasebook
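
To see what "hashable" means in practice, the short sketch below uses a tuple as a key and then tries the same thing with a list. The dictionary name `lookup` is just an illustration:

```python
lookup = {}

# tuples are immutable and hashable, so this works
lookup[(2, '9-12')] = "sacred relic"

# lists are mutable and therefore not hashable
try:
    lookup[[2, '9-12']] = "sacred relic"
except TypeError as err:
    print("TypeError:", err)

print(lookup)
```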

### Iterating over dictionaries

Dictionaries can be iterated over in a similar manner to lists.

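However, dictionaries are not sorted by key: iteration follows the order in which entries were inserted (guaranteed since Python 3.7, and not something to rely on in older versions).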

To iterate over a dictionary's key-value pairs we write:

In [None]:
for name, number in phonebook.items():
    print("Phone number of %s is %d" % (name, number))

Where in a list we expect `enumerate()` to return indexes in ascending numerical order, a dictionary with numerical keys makes no such promise, nor should string keys be expected in alphabetical order. Use `sorted()` on the keys if you need them ordered.
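
If a particular order matters, sort the keys explicitly. The sketch below uses a separate dictionary named `contacts` (a made-up name, filled in non-alphabetical order on purpose) and assumes Python 3.7 or later for the insertion-order behaviour:

```python
contacts = {"Gilliam": 55512343, "Chapman": 55512341, "Cleese": 55512342}

# plain iteration follows insertion order
print(list(contacts))

# sorted() returns the keys in alphabetical order
for name in sorted(contacts):
    print("Phone number of %s is %d" % (name, contacts[name]))
```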

### Removing a dictionary value

To remove the entry for a specified key, use either one of the following notations:

In [None]:
del phonebook["Cleese"]

or, if we want to return the value of the entry being removed:

In [None]:
phonebook.pop("Chapman")

We can see the dictionary has been modified:

In [None]:
phonebook

## Modules and Packages

Modules in Python are simply Python files with the `.py` extension that implement a set of functions. A module is made available inside another module (or a notebook) using the ``import`` statement.

The first time a module is loaded into a running Python script, it is initialized by executing the code in the module once. If another module in your code imports the same module again, it will not be loaded a second time, so variables defined at the top level of the module act as a "singleton": they are initialized only once.
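
The load-once behaviour can be checked directly. In this sketch the alias `math_again` is just an illustrative name; `sys.modules` is the standard dictionary in which Python keeps the modules it has already loaded:

```python
import sys
import math
import math as math_again  # a second import of the same module

# both names refer to one and the same module object
print(math is math_again)

# already loaded modules are cached in sys.modules
print('math' in sys.modules)
```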

The full list of standard library modules can be found in the [module index](https://docs.python.org/3/py-modindex.html) but some immediately useful modules include:

- sys: for interacting with the interpreter and system (environment, file I/O, etc.)
- os: for platform-specific operations (file statistics, directories, paths, etc.)
- math: for mathematical functions and constants

Some third-party packages (installed separately from Python itself) of specific interest to science and engineering include:

- numpy: a numerical library for vectorized arrays
- scipy: a scientific library
- matplotlib: for plotting and graphing
- pandas: for data analysis

For example:

In [None]:
# import the library
import os

# use it
os.listdir()

### Exploring built-in modules

Two functions come in handy when exploring modules in Python: `dir()` and `help()`.

We can see which functions are implemented in each module by using `dir()`:

In [None]:
import csv
dir(csv)

However, the output from `dir()` is not very useful for getting an overview of a module. For that, the `help()` function is often more helpful:

In [None]:
help(csv)

When we find the function in the module we want to use, we can read about it more using the `help()` function again, but this time passing just the specific function of interest:

In [None]:
import os
help(os.listdir)

Within a Jupyter Notebook, you can display help in a separate panel with the `?` command:

In [None]:
# This won't work outside Jupyter Notebooks
?os.listdir

You can supply help strings for functions you define like this: 

In [None]:
def myfunc():
    '''here is some help'''
    print('hello')
    
help(myfunc)

Often we only wish to import a subset of functionality from a module. This can be done using the `from` statement:

In [None]:
from os import listdir

Notice we can call the function without the prefix `os.`:

In [None]:
listdir()

This can even make use of wildcards, for example:

        from os import *
    
However, this is generally best avoided so as not to "pollute the global namespace": the imported names can silently replace names you already rely on.
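
As a rough illustration of the risk, the sketch below lists which built-in names a wildcard import of `os` would replace, without actually performing the import. It relies on `os.__all__`, the list of names a wildcard import brings in, and the variable names `exported` and `shadowed` are made up for the example:

```python
import builtins
import os

# names that `from os import *` would add to the current namespace
exported = set(os.__all__)

# of those, the ones that are also built-in names and would be replaced
shadowed = sorted(name for name in exported if hasattr(builtins, name))
print(shadowed)
```

The list includes `open`, so after a wildcard import a call to `open()` would reach the low-level `os.open()` rather than the familiar built-in.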

<span style="color:green">Try importing the ``math`` module and explore its contents with ``dir()`` and ``help()``:</span>

In [None]:
# insert code here



## List Comprehensions

A List Comprehension is a powerful and concise way to create a new list from an existing list (or any other sequence you can loop over).

Consider the following problem:

    find the length of each word in a phrase if it begins with a capital letter

In [None]:
phrase = "Nobody expects the Spanish Inquisition"
words = phrase.split()

lengths = []
for word in words:
    if word[0].isupper():
        lengths.append(len(word))
        
print(lengths)

A list comprehension is far shorter and reads very much like the description:

In [None]:
phrase = "Nobody expects the Spanish Inquisition"
words = phrase.split()

lengths = [len(word) for word in words if word[0].isupper()]

print(lengths)

A list comprehension has the following form:

- an expression (the new element), followed by
- a `for` clause, followed by
- a number of optional conditions (`if` clauses)

Consider this new problem:

    combine the elements of two lists where they are not equal
    
The solution:

In [None]:
[(x, y) for x in ['Arthur', 'Ken', 'Spiny'] for y in ['Luigi', 'Ken'] if x != y]
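
When a comprehension has more than one `for` clause, the clauses nest from left to right. As a sketch, the comprehension above should be equivalent to the following loops (the list name `pairs` is only for illustration):

```python
pairs = []
for x in ['Arthur', 'Ken', 'Spiny']:      # first `for` clause: outer loop
    for y in ['Luigi', 'Ken']:            # second `for` clause: inner loop
        if x != y:                        # the trailing condition
            pairs.append((x, y))

print(pairs)
```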

List comprehensions can contain complex expressions and nested functions:

In [None]:
from math import pi
[str(round(pi, i)) for i in range(1, 6)]

## Generators
If you know how to use list comprehensions, then you know how to write simple generators. For generators that use comprehension syntax, the code looks very similar:

```python
# This is a list comprehension
[i for i in range(5)]

# This is a generator
(i for i in range(5))
```

Both of those code snippets can be used to create the sequence of numbers [0..4]. But there is an important difference. The list comprehension creates the entire list in memory, all at once, when the expression is first evaluated. On the other hand, the generator uses lazy (or on-demand) evaluation to only generate the next number in the sequence when that value is required. For small sequences the difference is negligible, but for large sequences a generator can use far less memory and may also run faster.
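
A rough way to see the difference is to compare the size of the two objects and then pull values from the generator by hand with `next()`. This is only a sketch: `sys.getsizeof()` reports the size of the container object itself, the exact numbers vary between Python versions, and the names `as_list` and `as_generator` are made up for the example:

```python
import sys

n = 100000
as_list = [i for i in range(n)]
as_generator = (i for i in range(n))

# the list holds a slot for every element; the generator only holds its current state
print(sys.getsizeof(as_list))
print(sys.getsizeof(as_generator))

# values are produced one at a time, on request
print(next(as_generator))
print(next(as_generator))
```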

There is another way to write generators, using functions that contain the `yield` statement, which is more suitable when more complex logic is involved. We don't cover that method here.

In [None]:
my_sequence1 = [i for i in range(10) if i % 2 == 0]
print(my_sequence1)

my_sequence2 = (i for i in range(10) if i % 2 == 0)
print(my_sequence2)

Note that printing the generator only displayed some type information. So how do we access the actual number sequence?

One way is to create a list from the generator:

In [None]:
list(my_sequence2)

However, doing this probably indicates that a list comprehension would have been a better choice in the first place.

The most common way of consuming values from a generator is by iterating in a loop or comprehension:

In [None]:
# We have to create the generator again, since the sequence was exhausted when creating the list earlier
my_sequence2 = (i for i in range(10) if i % 2 == 0)

for i in my_sequence2:
    print(i)

## Reading and Writing Files

### Opening Files

To open a file we use `open()` which returns a file object:

In [None]:
fout = open('workfile', 'w')

The first argument is a string containing the *filename*.

The second argument is another string containing the *mode* which describes the intended use.

The mode can be:

- 'r' : the file will only be read (this is the default)
- 'w' : only writing (an existing file with the same name will be erased)
- 'a' : for appending; any data written to the file is automatically added to the end
- 'r+': both reading and writing


Files are normally opened in text mode meaning reads and writes to the file will use strings. By appending 'b' to the mode, the file will be opened in binary mode.

### Writing to files

`f.write(string)` writes the contents of string to the file, returning the number of characters written.

In [None]:
contents = 'This is the first line of the file.\n' + 'Second line of the file\n'
fout.write(contents)

### Closing Files

In [None]:
fout.close()

### Reading from files

In [None]:
fin = open('workfile', 'r')

`f.readline()` reads a single line from the file; a newline character (`\n`) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous: if `f.readline()` returns an empty string, the end of the file has been reached, while a blank line is represented by `'\n'`, a string containing only a single newline.

In [None]:
fin.readline()

In [None]:
fin.readline()

In [None]:
fin.readline()

We can seek back to the beginning of a file:

In [None]:
fin.seek(0)

For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:

In [None]:
for line in fin:
    print(line)

In [None]:
fin.close()
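
To see the modes listed under Opening Files in action, here is a small sketch that writes, appends to, and re-reads a scratch file. The file name `modes_demo.txt` and the temporary directory are assumptions made for the example, so it does not touch `workfile`:

```python
import os
import tempfile

# hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

f = open(path, 'w')    # 'w' creates the file (or erases an existing one)
f.write('first line\n')
f.close()

f = open(path, 'a')    # 'a' adds to the end instead of erasing
f.write('second line\n')
f.close()

f = open(path, 'r')    # text mode: read() returns a string
text = f.read()
f.close()

f = open(path, 'rb')   # binary mode: read() returns bytes
raw = f.read()
f.close()

print(repr(text))
print(type(raw))
```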

### Keeping Things in Context With `with`

It is good practice to use the `with` keyword when dealing with file objects.

This has the advantage that the file is properly closed after its context finishes, even if an exception is raised on the way.

**A good general pattern for line by line processing of a file is:**
```python
with open('myfile', 'r') as f:
    for line in f:
        # Do some processing on the line ...
        process_the_line(line)
```

For example:

In [None]:
with open('workfile', 'r') as f:
    for line in f:
        print(line)

We can see the file was automatically closed:

In [None]:
f.closed
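
For comparison, here is roughly what `with` saves us from writing by hand. This is a sketch of the equivalent `try`/`finally` pattern, not the exact internal mechanism, and the scratch file name `with_demo.txt` is made up for the example:

```python
import os
import tempfile

# hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

f = open(path, 'w')
try:
    f.write('some text\n')
finally:
    # runs whether or not the write raised an exception
    f.close()

print(f.closed)
```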

## Downloading Files

Everything we need for this example is located in the `urllib.request` module.
However, this is only applicable to Python 3; earlier versions will likely use `urllib2`.

In [None]:
import urllib.request

url='http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'
filename='Sacramentorealestatetransactions.csv'

The easiest way is to use `urlretrieve`, although it is marked as legacy and so may be deprecated one day:

In [None]:
urllib.request.urlretrieve(url, filename)

A slightly more verbose method might be:

In [None]:
import shutil

with urllib.request.urlopen(url) as response:
    with open(filename, 'wb') as fout:
        shutil.copyfileobj(response, fout)

And of course there is no reason you couldn't provide your own one-liner.

<span style="color:green">Fill in the body of `myurlretrieve()`:</span>

In [None]:
def myurlretrieve(url, filename):
    '''Downloads the file at url and saves it to filename.'''
    # insert code here




myurlretrieve(url, filename)

However, if you need to do any serious work with web APIs, I suggest looking into the [requests](http://docs.python-requests.org/en/master/) library.

## Working with CSV Files
The `csv` module from the standard library provides low-level CSV functionality.

In [None]:
import csv

In [None]:
with open(filename, 'r', newline='') as csvfile:
    for row in csv.reader(csvfile):
        print(row)

Working with each row as a list is possible; for example, we could create a table as a list of lists:

```python
table = []
with open(filename, 'r', newline='') as csvfile:
    for row in csv.reader(csvfile):
        table.append(row)
```
However, this code will become messy very quickly, making it error prone, and there are dedicated libraries already in existence.
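
Part of the mess comes from addressing columns by position. Staying within the `csv` module, `csv.DictReader` uses the first line of the file as column names, as sketched below. The sample data here is made up and read from an in-memory string rather than the downloaded file:

```python
import csv
import io

# made-up sample data standing in for a real CSV file
sample = io.StringIO(
    "street,city,price\n"
    "1 Main St,Springfield,100000\n"
    "2 Oak Ave,Shelbyville,150000\n"
)

# each row becomes a dictionary keyed by the column names in the header line
rows = list(csv.DictReader(sample))
for row in rows:
    print(row['city'], int(row['price']))
```

Note that every value is still a string, so numeric columns need converting by hand.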

### Pandas for CSV

Pandas is a data analysis library that provides data frames (similar to those in the R language) along with support for CSV and time-series data.

In [None]:
import pandas as pd

We can grab the data from a local file or even straight from a URL:

In [None]:
df = pd.read_csv(url)

We can extract rows in many ways:

In [None]:
df[5:8]

<span style="color:green">Try calling the ``head()`` and ``tail()`` functions on the dataframe:</span>

In [None]:
# insert code here




Selecting columns is also simple:

In [None]:
df.loc[5:10, ['street','city','zip']]

There are many useful built-in functions:

In [None]:
df.sort_values(by='price', ascending=False)[-3:]

In [None]:
df.describe()

## Debugging in the Notebook

You can run ``%debug`` in a code cell to enter the Python debugger (pdb) after an [exception](https://docs.python.org/3.5/tutorial/errors.html) has occurred. This is sometimes called post-mortem debugging.

In the debugger you can inspect variables, execute statements, and browse the call stack.

For more details see https://docs.python.org/3/library/pdb.html#debugger-commands

In [None]:
raise Exception("I accidentally by zero")

In [None]:
%debug

As you can see, this can be useful, but it's not meaningful to use a filename and/or line number when setting breakpoints in code defined in a Notebook.

**Enter q on the ipdb command line to exit.**

If necessary you might choose to export the notebook as a regular Python file and switch to a different environment for your debugging needs. IDEs such as PyCharm provide interactive GUI debuggers that make the entire process much more productive.

You can also set up the notebook to automatically launch the debugger on any unhandled exception:

In [None]:
%pdb

The next time an exception occurs it will automatically launch the debugger:

In [None]:
1 / 0

Running the magic again turns it off:

In [None]:
%pdb

## Python IDEs

In certain instances a Notebook is *not* the right choice for developing Python code.

You can find a list and discussion of IDEs at:

- https://wiki.python.org/moin/IntegratedDevelopmentEnvironments

Contact the [Scientific Computing Helpdesk](mailto:schelp@csiro.au) for advice on Python IDEs available on the clusters.

## Python 2 or 3?

There are still some useful libraries that have not been updated to Python 3, despite it being released in 2008.

If you depend on such a library there may be no choice of version to use.

However here are some resources that may help steer you towards version 3+:

* [What's new in Python 3?](http://docs.python.org/3.3/whatsnew/3.0.html).
* [Official wiki page about the Python 2/Python 3 question](http://wiki.python.org/moin/Python2orPython3).
* [*"Ten awesome features of Python that you can't use because you refuse to upgrade to Python 3"*, a presentation by Aaron Meurer](http://asmeurer.github.io/python3-presentation/slides.html).
* [Key differences between Python 2 and Python 3](http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html).    

There are tools that can help when converting Python 2 code to Python 3:
    
* [2to3 module](http://docs.python.org/2/library/2to3.html).
* [Python Future](http://python-future.org/) 

# Final Thoughts

You can download tutorial Notebooks from the web and install them using the dashboard interface.

Some excellent resources include:

- [nbviewer](http://nbviewer.ipython.org/ "nbviewer homepage") and associated [examples](http://nbviewer.ipython.org/github/ipython/ipython/blob/master/examples/Index.ipynb "nbviewer examples")
- [A gallery of Scientific Computing Notebooks](https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks#scientific-computing-and-data-analysis-with-the-scipy-stack "A gallery of interesting IPython Notebooks related to scientific computing")
