<img style="float: right;" src="http://www2.le.ac.uk/liscb1.jpg">

# Scientific Python Basics - Refresher

Modified by T.J. Ragan and T. Forey  
Originally by: Software Carpentry (Cindee Madison and Thomas Kluyver, with thanks to Justin Kitzes and Matt Davis)

Python is an easy-to-write and very easy-to-read general-purpose programming language.  Like any language, however, it's only as beautiful as the people who use it make it.  Please remember that your code is not only meant to do something, it's meant to be read.  In fact, most programmers spend most of their time reading code they've already written, so be good to yourself.

## Variables
There are no $ signs to mark variables in Python; names must simply start with a letter or underscore (not a number!). Furthermore, variables are dynamically typed: the same name can refer to values of different types within a script.

In [None]:
a = 2
print(a)
print(type(a))
a = "2b"
print(a, type(a))

In [None]:
# Variable names are case-sensitive
age = 24
Age = 26
print(age, Age)

In [None]:
# What happens when a new variable points to a previous variable?
a = 1
b = a
a = 2
## What is b?

## Operators
All of the basic math operators work like you think they should for numbers. They can also do some useful operations on other things, like strings. There are also boolean operators that compare quantities and give back a bool variable as a result.
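
A few operators tend to surprise newcomers. The sketch below (with values chosen only for illustration) shows floor division `//`, the remainder operator `%`, and what `^` actually does on integers. It assumes Python 3, where `/` always gives a float.

```python
a = 7
b = 2

true_quotient = a / b     # true division: always a float in Python 3
floor_quotient = a // b   # floor division: rounds down to a whole number
remainder = a % b         # what is left over after floor division
xor_result = a ^ b        # bitwise XOR, NOT a to the power of b
power = a ** b            # exponentiation

print(true_quotient, floor_quotient, remainder, xor_result, power)
```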

In [None]:
# Standard math operators work as expected on numbers
a = 2
b = 3
print(a + b)
print(a * b)
print(a ** b)  # a to the power of b (a^b does something completely different!)
print(a / b)   # Careful with dividing integers if you use Python 2

In [None]:
# There are also operators for strings
print('hello' + 'world')
print('hello' * 3)
#print('hello' / 3)  # You can't do this!

In [None]:
# Boolean operators compare two things
a = (1 > 3)
b = (3 == 3)
print(a)
print(b)
print(a or b)
print(a and b)

## Functions

Using a function is very easy: the syntax is simply `function_name(argument)`. We've already been using the functions `print()` and `type()`. Function names are usually lower-case.

In [None]:
# There are thousands of functions that operate on things
print(type(3))
print(len('hello'))
print(round(3.3))

**TIP:** To find out what a function does, you can type its name and then a question mark to get a pop-up help window. Or, to see what arguments it takes, you can type its name, an open parenthesis, and hit tab.
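
The `?` syntax only works in IPython and Jupyter. In any Python session, the built-in `help()` function prints similar documentation, and the text itself is stored in the function's `__doc__` attribute. A minimal sketch:

```python
# help() works in any Python session, not just in a notebook
help(round)

# The documentation text is stored on the function itself
doc_text = round.__doc__
print(doc_text)
```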

In [None]:
round?
#round(
round(3.14159, 2)

__TIP:__ Many useful functions are not in the Python built-in library, but are in external
scientific packages. These need to be imported into your Python notebook (or program) before
they can be used. Probably the most important of these are numpy and matplotlib.
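
The `import` statement comes in a few common forms. The sketch below uses the standard-library `math` module as a stand-in, only so that it runs without any extra packages installed; the same forms apply to numpy.

```python
# Import a whole module and reach its contents through the module name
import math
print(math.sqrt(16))

# Import a module under a shorter alias (the usual pattern for numpy)
import math as m
print(m.pi)

# Import a single name directly into your own namespace
from math import floor
print(floor(3.7))
```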

In [None]:
# Many useful functions are in external packages
# Let's meet numpy
import numpy as np

In [None]:
# To see what's in a package, type the name, a period, then hit tab
#np?
np.

In [None]:
# Some examples of numpy functions and "things"
print(np.sqrt(4))
print(np.pi)  # Not a function, just a variable
print(np.sin(np.pi))

## Methods

Python is an object-orientated language, and so although we won't go into detail here about creating objects, you have already been using objects. Even simple things like ints and strings are objects in Python.

In the simplest terms, you can think of an object as a small bundled "thing" that contains within
itself both data and functions that operate on that data. For example, strings in Python are
objects that contain a bunch of characters in order, and also various functions that operate on those
characters. When bundled in an object, these functions are called "methods".

Instead of the "normal" `function(arguments)` syntax, methods are called using the
syntax `variable.method(arguments)`.
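
To make the difference concrete, here is a small sketch that puts the two call styles side by side. The variable names are made up for illustration.

```python
greeting = 'hello, world'

# Function syntax: the object is passed in as an argument
length = len(greeting)

# Method syntax: the method is looked up on the object itself
shouted = greeting.upper()
n_l = greeting.count('l')

print(length, shouted, n_l)

# String methods return a new string; the original is left unchanged
print(greeting)
```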

In [None]:
# A string is actually an object
a = 'hello, world'
print(type(a))

In [None]:
# Objects have bundled methods
#a.
print(a.capitalize())
print(a.replace('l', 'X'))

### Exercise 1 - Conversion

Throughout this lesson, we will successively build towards a program that will calculate the
variance of some measurements,  in this case `Height in Metres`.  The first thing we want to do is convert from an antiquated measurement system.

To change inches into metres we use the following equation (conversion factor is rounded)

$$metre = \frac{inches}{39}$$

1. Create a variable for the conversion factor, called `inches_in_metre`.
1. Create a variable (`inches`) for your height in inches, as inaccurately as you want.
2. Divide `inches` by `inches_in_metre`, and store the result in a new variable, `my_height_in_metres`.
1. Print the result

__Bonus__

Convert from feet and inches to metres.

## Lists

Python lists are ordered collections of things, each element of which is assigned an index (position). Lists can contain any type of object and are declared using square brackets `[]`.
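
As a quick illustration of "any type of object", the sketch below builds a list with mixed contents and then changes one item in place. The values are arbitrary.

```python
# A list can mix types, and can even contain another list
mixed = [1, 'two', 3.0, [4, 5]]
print(len(mixed))       # number of top-level items
print(type(mixed[3]))   # indexes start at zero, so this is the fourth item

# Lists are mutable: an item can be replaced in place by its index
mixed[1] = 2
print(mixed)
```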

In [None]:
# Lists are created with square bracket syntax
a = ['blueberry', 'strawberry', 'pineapple']
print(a, type(a))

In [None]:
# Lists (and all collections) are also indexed with square brackets
# NOTE: The first index is zero, not one
print(a[0])
print(a[1])

In [None]:
# you can access multiple items from a list by slicing, using a colon between indexes
# NOTE: The end value is not inclusive
print('a =', a)
print('get first two:', a[0:2])

In [None]:
# You can leave off the start or end if desired
print(a[:2])
print(a[2:])
print(a[:])
print(a[:-1])

In [None]:
# Lists are objects, like everything else, and have methods such as append
a.append('banana')
print(a)

a.append([1,2])
print(a)

a.pop()
print(a)

### EXERCISE 2 - Store a bunch of heights (in metres) in a list

1. Ask five people around you for their heights (in metres).
2. Store these in a list called `heights`.
3. Append your own height, calculated above in the variable *my_height_in_metres*, to the list.
4. Get the first height from the list and print it.

__Bonus__

1. Extract the last value in two different ways: first, by using the index for
the last item in the list, and second, presuming that you do not know how long the list is.

__HINT:__ **len()** can be used to find the length of a collection

## Tuples

Tuples are similar to lists, only:

1. You declare tuples using ( ) instead of [ ]
1. Once you make a tuple, you can't change what's in it (referred to as immutable)

You'll see tuples come up throughout the Python language, and over time you'll develop a feel for when
to use them. 

In general, they're often used instead of lists:

1. To group items when the position in the collection is critical, such as coord = (x,y)
1. When you want to prevent accidental modification of the items, e.g. shape = (12, 23)
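
Both uses can be sketched in a few lines. The coordinate values are made up, and the `try`/`except` block is only there so that the example keeps running after the error.

```python
# A coordinate pair stored as a tuple
coord = (3.5, 7.2)

# Tuple unpacking assigns each position to its own variable
x, y = coord
print(x, y)

# Trying to modify a tuple raises a TypeError
try:
    coord[0] = 0.0
except TypeError as err:
    print('Cannot modify a tuple:', err)
```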

In [None]:
xy = (23, 45)
print(xy[0])
xy[0] = "this won't work with a tuple"

## Dictionaries
Dictionaries are the collection to use when you want to store and retrieve things by their names
(or some other kind of key) instead of by their position in the collection. A good example is a set
of model parameters, each of which has a name and a value. Dictionaries are declared using `{}`.  Since Python 3.7, dictionaries preserve the order in which items were inserted, but unlike lists or tuples, their items are retrieved by key rather than by position.

In [None]:
# Make a dictionary of model parameters
convertors = {'inches_in_feet' : 12,
              'inches_in_metre' : 39}

print(convertors)
print(convertors['inches_in_feet'])

In [None]:
## Add a new key:value pair
convertors['metres_in_mile'] = 1609.34
print(convertors)

In [None]:
# Looking up a key that doesn't exist raises a KeyError
print(convertors['blueberry'])

You can directly access the keys in your dictionary by using the `keys()` method. You can also use `items()` to retrieve the key:value pairs as tuples. In Python 3 both return view objects, which can be converted with `list()` if you need an actual list.
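
A short sketch of working with keys and items follows. `list()` is used to get plain lists that can be indexed, and `in` and `get()` are shown as two ways of avoiding a `KeyError`. The dictionary contents are hypothetical.

```python
# Hypothetical model parameters, used only for illustration
params = {'growth_rate': 0.5, 'capacity': 100}

# Wrap the results in list() to get plain lists
key_list = list(params.keys())
item_list = list(params.items())
print(key_list)
print(item_list)

# Check for a key before using it
has_capacity = 'capacity' in params
has_blueberry = 'blueberry' in params
print(has_capacity, has_blueberry)

# get() returns a default value instead of raising an error
fallback = params.get('blueberry', 'not found')
print(fallback)
```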

In [None]:
# print the dictionary keys
print(convertors.keys())

# print a list of key:value pairs
print(convertors.items())

## Numpy arrays (ndarrays)
Even though numpy arrays (often written as ndarrays, for n-dimensional arrays) are not part of the core Python libraries, they are so useful in scientific Python that we'll include them here in the core lesson. Numpy arrays are collections of things, all of which must be the same type, that work similarly to lists (as we've described them so far), with a few differences. The most important are:

1. You can easily perform elementwise operations (and matrix algebra) on arrays
1. Arrays can be n-dimensional
1. Arrays have no in-place equivalent to `append`, although arrays can be concatenated
1. Arrays can be created from existing collections such as lists, or instantiated "from scratch" in a few useful ways.

When getting started with scientific Python, you will probably want to try to use ndarrays whenever you're doing math or dealing with numerical data, saving the other types of collections for those cases when you have a specific reason to use them.
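
The differences between lists and arrays are easiest to see side by side. This is a minimal sketch with made-up values; it assumes numpy is installed.

```python
import numpy as np

# Multiplying a list repeats it; multiplying an array acts on each element
values = [2, 3, 4]
doubled = np.array(values) * 2
print(values * 2)
print(doubled)

# Arrays are joined with np.concatenate, which returns a new array
a = np.array([2, 3, 4])
b = np.array([5, 6, 7])
joined = np.concatenate([a, b])
print(joined)

# A list of lists becomes a two-dimensional array
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)

# Mixing ints and floats gives an array with one common type
mixed = np.array([1, 2.5])
print(mixed.dtype)
```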

In [None]:
# We need to import the numpy library to have access to it 
# We can also create an alias for a library, this is something you will commonly see with numpy
import numpy as np

In [None]:
# Make an array from a list
alist = [2, 3, 4]
blist = [5, 6, 7]
a = np.array(alist)
b = np.array(blist)
print(a, type(a))
print(b, type(b))

In [None]:
# Do element-wise arithmetic on arrays
print(a**2)
print(np.sin(a))
print(a * b)

# Do linear algebra on arrays
print(a.dot(b), np.dot(a, b))

In [None]:
# Boolean operators work on arrays too, and they return boolean arrays
print(a > 2)
print(b == 6)

c = a > 2
print(c)
print(type(c))
print(c.dtype)

In [None]:
# Indexing arrays
print(a[0:2])

c = np.random.rand(3,3)
print(c)
print('\n')
print(c[1:3,0:2])

c[0,:] = a
print('\n')
print(c)

In [None]:
# Arrays can also be indexed with other boolean arrays
print(a)
print(b)
print(a > 2)
print(a[a > 2])
print(b[a > 2])

b[a == 3] = 77
print(b)

In [None]:
# ndarrays have attributes in addition to methods
#c.
print(c.shape)
print(c.prod())

In [None]:
# There are handy ways to make arrays full of ones and zeros
print(np.zeros(5), '\n')
print(np.ones(5), '\n')
print(np.identity(5), '\n')

In [None]:
# You can also easily make arrays of number sequences
print(np.arange(0, 10, 2))

### EXERCISE 3 - Using Arrays for simple analysis

Revisit your list of heights

1. turn it into an array
2. calculate the mean
3. create a mask of all heights greater than a certain value (your choice)
4. find the mean of the masked heights

__BONUS__

1. find the number of heights greater than your threshold
2. `mean()` can take an optional argument called `axis`, which allows you to calculate the mean across different axes, e.g. across rows or across columns. Create an array with two dimensions (not equal sized) and calculate the mean across rows and the mean across columns. Use `shape` to understand how the means are calculated.


In [1]:
# Bonus

## For loops

To iterate through a list, range or any sequence you can use a Python for loop. Unlike many other languages, Python loops aren't enclosed within brackets or 'do; done;' notations. Instead, loops are defined by indentation, which is one of the reasons that Python is such an easy-to-read language.

**NOTE** The Python convention (PEP 8) is to use four (4) spaces for each level of indentation; most editors can be configured to follow this guide.
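
As a small sketch of looping over something other than a list, the example below iterates over a `range` and over the characters of a string. The variable names are made up for illustration.

```python
# range(3) produces the numbers 0, 1, 2
squares = []
for i in range(3):
    squares.append(i ** 2)   # indented: runs once per value of i
print(squares)               # not indented: runs once, after the loop

# Strings are sequences too, so you can loop over their characters
letters = []
for letter in 'abc':
    letters.append(letter.upper())
print(letters)
```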

In [None]:
# A basic for loop - don't forget the white space!
wordlist = ['hi', 'hello', 'bye']
for word in wordlist:
    print(word + '!')
    
print("No indentation so now we're outside the loop")

In [None]:
# Indentation error: Fix it!
for word in wordlist:
    new_word = word.capitalize()
   print(new_word + '!') # Bad indent

In [None]:
# Often we want to loop over the indexes of a collection, not just the items
print(wordlist)

for i, word in enumerate(wordlist):
    print(i, word, wordlist[i])

In [None]:
# While loops are useful when you don't know how many steps you will need,
# and want to stop once a certain condition is met.
step = 0
prod = 1
while prod < 100:
    step = step + 1
    prod = prod * 2
    print(step, prod)
    
print('Reached a product of', prod, 'at step number', step)

### EXERCISE 4 - Variance

We can now calculate the variance of the heights we collected before.

As a reminder, **sample variance** is calculated from the sum of squared differences of each observation from the mean:

$$variance = \frac{\Sigma{(x-mean)^2}}{n-1}$$

where **mean** is the mean of our observations, **x** is each individual observation, and **n** is the number of observations.
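
As a small worked example, take three made-up heights of 1.5, 1.7 and 1.9 metres (so **n** = 3). Their mean is 1.7, which gives:

$$variance = \frac{(1.5-1.7)^2 + (1.7-1.7)^2 + (1.9-1.7)^2}{3-1} = \frac{0.04 + 0 + 0.04}{2} = 0.04$$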

First, we need to calculate the mean:

1. Create a variable `total` for the sum of the heights.
2. Using a `for` loop, add each height to `total`.
3. Find the mean by dividing this by the number of measurements, and store it as `mean`.

__Note__: To get the number of things in a list, use `len(the_list)`.

Now we'll use another loop to calculate the variance:

1. Create a variable `sum_diffsq` for the sum of squared differences.
2. Make a second `for` loop over `heights`.
  - At each step, subtract the mean from the height and call it `diff`. 
  - Square this and call it `diffsq`.
  - Add `diffsq` on to `sum_diffsq`.
3. Divide `sum_diffsq` by `n-1` to get the variance.
4. Display the variance.

__Note__: To square a number in Python, use `**`, e.g. `5**2`.

__Bonus__

1. Test whether `variance` is larger than 0.01, and print out a line that says "variance more than 0.01: "
followed by the answer (either True or False).

In [None]:
# Bonus


## If statements

To query whether a condition is True or False, we can use an if statement. These are very similar in syntax to other languages but do not need to be enclosed in brackets.

In [None]:
# A simple if statement
x = 3
if x > 0:
    print('x is positive')
elif x < 0:
    print('x is negative')
else:
    print('x is zero')

In [None]:
# If statements can rely on boolean variables
x = -1
test = (x > 0)
print(type(test))
print(test)

if test:
    print('Test was true')

## Creating functions

It's very simple to create your own Python functions, making pieces of your code easy to reuse.

In [None]:
# It's very easy to define your own functions
def multiply(x, y):
    return x*y

In [None]:
# Once a function is "run" and saved in memory, it's available just like any other function
print(type(multiply))
print(multiply(4, 3))

In [None]:
# It's useful to include docstrings to describe what your function does
def say_hello(time, people):
    '''
    Function says a greeting. Useful for engendering goodwill
    '''
    return 'Good ' + time + ', ' + people

**Docstrings**: A docstring is a string placed on the first line of a function body that describes what the function does.  You can see it when you ask for help about a function.
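
A minimal sketch of how a docstring can be read back, using a hypothetical function `double` that is not part of this lesson's exercises:

```python
def double(x):
    '''Return twice the value of x.'''
    return 2 * x

# help() displays the docstring along with the function signature
help(double)

# The raw docstring text is stored in the __doc__ attribute
print(double.__doc__)
```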

In [None]:
say_hello('afternoon', 'friends')

In [None]:
# All arguments must be present, or calling the function will raise an error (a TypeError)
say_hello('afternoon')

In [None]:
# Keyword arguments can be used to make some arguments optional by giving them a default value
# All mandatory arguments must come first, in order
def say_hello(time, people='friends'):
    return 'Good ' + time + ', ' + people

In [None]:
say_hello('afternoon')

In [None]:
say_hello('afternoon', 'students')

### EXERCISE 5 - Creating a variance function

Finally, let's turn our variance calculation into a function that we can use over and over again. 
Copy your code from Exercise 4 into the box below, and do the following:

1. Turn your code into a function called `calculate_variance` that takes a list of values and returns their variance.
1. Write a nice docstring describing what your function does.
1. In a subsequent cell, call your function with different sets of numbers to make sure it works.

__Bonus__

1. Refactor your function by pulling out the section that calculates the mean into another function, and calling that inside your `calculate_variance` function.
2. Make sure it works properly when all the data are integers as well.
3. Give a better error message when it's passed an empty list. Use the web to find out how to raise exceptions in Python.

In [None]:
print(calculate_variance([0.6, 0.1, 0.8]))
print(calculate_variance([174.3, 165.2, 208]))
print(calculate_variance([1.1, 1.5, 2.0]))

### EXERCISE 6 - Putting the `calculate_mean` and `calculate_variance` function(s) in a module

We can make our functions more easily reusable by placing them into modules that we can import, just
like we have been doing with `numpy`. It's pretty simple to do this.

1. Copy your function(s) into a new text file, in the same directory as this notebook,
called `stats.py`.
1. In the cell below, type `import stats` to import the module. Type `stats.` and hit tab to see the available
functions in the module. Try calculating the variance of a number of samples of heights (or other random numbers) using your imported module.

In [None]:

samples = [1.8,1.9,2.0,1.7,1.6,2.2]

#calculate your result here
result = 

# compare it to numpy's calculation 
np_result = np.var(samples, ddof=1)
assert np.isclose(result, np_result)  # compare floats with a tolerance rather than ==