# Scientific Python Basics

_Based on the work by: Cindee Madison, Thomas Kluyver, Justin Kitzes, Matt Davis_

To run the code in a cell, press:
- **Ctrl - Enter**  (stays in the current cell) *or*
- **Shift - Enter** (advances to the next cell).

To quickly create a new cell below an existing one, type **Ctrl-m** then **b**.
Other shortcuts for making, deleting, and moving cells are in the menubar at the top of the
screen.

## 1. Things

The most basic elements of any programming language are "things".<br>
The most basic _things_ in Python are: integers, floats, strings, and booleans.

In [1]:
# An integer
2

2

In [2]:
# A floating-point number
2.0

2.0

In [3]:
# A string
'string'

'string'

In [4]:
# A complex number
1 + 2j

(1+2j)

In [5]:
# A boolean
True

True

## Printing things

In [6]:
# We can print things to see values of things
print(2)
print(3.0)
print('hello')
print(False)

2
3.0
hello
False


## Variables

In [7]:
# Things can be stored as variables
a = 2
b = 'hello'
c = True  # This is case sensitive
C = 1+2j

In [8]:
# We can print values of variables
print(a)
print(b)
print(c)
print(C)

2
hello
True
(1+2j)


In [9]:
# We can print types of variables
print(type(a))
print(type(b))
print(type(c))
print(type(C))

<class 'int'>
<class 'str'>
<class 'bool'>
<class 'complex'>


In [10]:
# We can print several things at once
print(a, b, c, C)
print(a, type(a))

2 hello True (1+2j)
2 <class 'int'>


## Quick quiz

In [None]:
# Assume we execute:
a = 1
b = a
a = 2

# Value of "b" is...?

In [12]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.
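
The `sep`, `end`, and `file` keywords listed in the help text can be tried out directly. The following is a minimal sketch; the `io.StringIO` buffer and the variable names `buffer` and `captured` are only illustrative choices, used here to capture what `print` writes so it can be inspected.

```python
import io

# sep controls what goes between values, end controls what follows the last one
print(1, 2, 3, sep=', ')
print('no newline here', end=' ... ')
print('same line')

# file= redirects the output to any file-like object (here an in-memory buffer)
buffer = io.StringIO()
print('a', 'b', sep='-', end='!', file=buffer)
captured = buffer.getvalue()
print(repr(captured))
```

The last line shows that the buffer holds the two values joined by `sep` and terminated by `end` instead of a newline.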



## 2. Commands that operate on things

Just storing data in variables isn't much use to us. Right away, we'd like to start performing
operations and manipulations on data and variables.

There are three very common means of performing an operation on a thing.

### 2.1 Use an operator

All of the basic math operators work like you think they should for numbers. They can also
do some useful operations on other things, like strings. There are also boolean operators that
compare quantities and give back a `bool` variable as a result.

In [13]:
# Standard math operators work as expected on numbers
a = 2
b = 3
c=4   # spaces are not important
C = 5 # capitalization IS important
print(c)
print(C)
print(a + b)
print(a - b)
print(a * b)
print(a ** b)  # a to the power of b (a^b does something completely different!)
print(a / b)   # Careful with dividing integers if you use Python 2
print(a // b)  # Integer division 

4
5
5
-1
6
8
0.6666666666666666
0
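
The comment in the cell above warns that `a^b` is not exponentiation. As a small illustration (variable names `power` and `xor` are made up for this sketch), `^` in Python is the bitwise XOR operator, so it works on the binary digits of the two integers:

```python
a = 2
b = 3

power = a ** b   # exponentiation: 2 * 2 * 2
xor = a ^ b      # bitwise XOR of binary 10 and binary 11

print(power)
print(xor)
print(bin(a), bin(b), bin(xor))
```

Printing the binary forms with `bin()` makes it easier to see why the XOR result differs from the power.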


In [14]:
# There are also operators for strings
print('hello' + 'world')
print('hello' * 3) # multiplying a string by an integer repeats it
# print('hello' / 3) # but can't divide
# print('hello' * 3.5) # does not work either

helloworld
hellohellohello


In [15]:
# Boolean operators compare two things
print(1 > 3)
print(3 == 3)

False
True


In [16]:
# We can assign the result of a comparison to a variable
a = (1 > 3)
b = (3 == 3)
print(a)
print(b)
print(a or b)
print(a and b)
print(a is not b)
print(a is b)

False
True
True
False
True
False
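
The cell above uses `is` and `is not` on booleans. It may help to know that `is` tests whether two names refer to the very same object, while `==` tests whether two values are equal. A short sketch with lists (all names here are illustrative) shows the difference:

```python
first = [1, 2, 3]
second = [1, 2, 3]
alias = first

same_value = (first == second)     # equal contents
same_object = (first is second)    # two separate list objects
alias_is_first = (alias is first)  # two names for one object

print(same_value, same_object, alias_is_first)
```

For comparing values, `==` is usually the operator you want; `is` is mostly used for checks such as `x is None`.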


### 2.2  Functions()

These will be very familiar to anyone who has programmed in any language, and work like you
would expect.

In [18]:
# There are thousands of functions that operate on things
print(type(3))
print(len('hello'))
print(round(3.3))

<class 'int'>
5
3


__TIP:__ To find out what a function does, you can type its name and then a question mark to
get a pop-up help window. Or, to see what arguments it takes, you can type its name, an open
parenthesis, and hit tab.

In [17]:
round?
#round(
round(3.14159,2)

3.14

### 2.3 .methods

Before we get any further into the Python language, we have to say a word about "objects". We
will not be teaching object-oriented programming in this workshop, but you will encounter objects
throughout Python (in fact, even seemingly simple things like ints and strings are actually
objects in Python).

In the simplest terms, you can think of an object as a small bundled "thing" that contains within
itself both data and functions that operate on that data. For example, strings in Python are
objects that contain a set of characters and also various functions that operate on the set of
characters. When bundled in an object, these functions are called "methods".

Instead of the "normal" `function(arguments)` syntax, methods are called using the
syntax `variable.method(arguments)`.

In [19]:
# A string is actually an object
a = 'hello, world'
b = 5
print(a, type(a))
print(b, type(b))

hello, world <class 'str'>
5 <class 'int'>


In [20]:
# Objects have bundled methods
#a.
print(a.capitalize())
print(a.replace('l', 'X'))

Hello, world
heXXo, worXd


In [21]:
# Integers do not have a .capitalize() method
b.capitalize() # fails

AttributeError: 'int' object has no attribute 'capitalize'

In [23]:
print(b.to_bytes)

<built-in method to_bytes of int object at 0x000000006D620270>


In [25]:
help(a.count)

Help on built-in function count:

count(...) method of builtins.str instance
    S.count(sub[, start[, end]]) -> int
    
    Return the number of non-overlapping occurrences of substring sub in
    string S[start:end].  Optional arguments start and end are
    interpreted as in slice notation.
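
To make the help text concrete, here is a small sketch of `count` with and without the optional `start` and `end` arguments. The variable names are illustrative only.

```python
a = 'hello, world'

total = a.count('l')                  # occurrences in the whole string
in_slice = a.count('l', 0, 4)         # only within a[0:4], i.e. 'hell'
non_overlapping = 'aaaa'.count('aa')  # matches are not allowed to overlap

print(total, in_slice, non_overlapping)
```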



## 3. Collections of things

While it is interesting to explore your own height, in science we usually work with larger, slightly more complex datasets. In this example, we are interested in the characteristics and distribution of heights. Python provides a number of objects to handle collections of things.

Probably 99% of your work in scientific Python will use one of four types of collections:
`lists`, `tuples`, `dictionaries`, and `numpy arrays`. We'll look quickly at each of these and what
they can do for you.

### 3.1 Lists

Lists are probably the handiest and most flexible type of container. 

Lists are declared with square brackets []. 

Individual elements of a list can be selected using the syntax `a[ind]`.

In [26]:
# Lists are created with square bracket syntax
a = ['blueberry', 'strawberry', 'pineapple']
print(a, type(a))

['blueberry', 'strawberry', 'pineapple'] <class 'list'>


In [27]:
# Lists (and all collections) are also indexed with square brackets
# NOTE: The first index is zero, not one
print(a[0])
print(a[1])

blueberry
strawberry


In [28]:
## You can also count from the end of the list
print('last item is:', a[-1])
print('second to last item is:', a[-2])

last item is: pineapple
second to last item is: strawberry


In [29]:
# you can access multiple items from a list by slicing, using a colon between indexes
# NOTE: The end value is not inclusive
print('a =', a)
print('get first two:', a[0:2])

a = ['blueberry', 'strawberry', 'pineapple']
get first two: ['blueberry', 'strawberry']


In [30]:
# You can leave off the start or end if desired
print(a[:2])
print(a[2:])
print(a[:])
print(a[:-1])

['blueberry', 'strawberry']
['pineapple']
['blueberry', 'strawberry', 'pineapple']
['blueberry', 'strawberry']


In [32]:
# Lists are objects, like everything else, and have methods such as append
a.append('banana')
print(a)

a.append([1,2])
print(a)

b = a.pop()
print(a)

['blueberry', 'strawberry', 'pineapple', 'banana']
['blueberry', 'strawberry', 'pineapple', 'banana', [1, 2]]
['blueberry', 'strawberry', 'pineapple', 'banana']


In [33]:
a.pop()

'banana'

__TIP:__ A 'gotcha' for some new Python users is that many collections, including lists,
actually store pointers to data, not the data itself. 

Remember when we set `b=a` and then changed `a`?

What happens when we do this in a list?

__HELP:__ look into the `copy` module


In [34]:
a = 1
b = a
a = 2
## What is b?
print('What is b?', b)

a_list = [1, 2, 3]
b_list = a_list
print('original b_list', b_list)
a_list[0] = 42
print('What is b_list after we change a_list ?', b_list)

What is b? 1
original b_list [1, 2, 3]
What is b_list after we change a_list ? [42, 2, 3]
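
If you want an independent list rather than a second name for the same list, the `copy` module mentioned in the hint is one option. The following is a minimal sketch (the names `alias`, `shallow`, and `deep` are illustrative) comparing plain assignment, `copy.copy`, and `copy.deepcopy` on a list that itself contains a list:

```python
import copy

a_list = [1, 2, [10, 20]]
alias = a_list                 # same object, new name
shallow = copy.copy(a_list)    # new outer list, inner list still shared
deep = copy.deepcopy(a_list)   # fully independent copy

a_list[0] = 42
a_list[2][0] = 99

print(alias)    # [42, 2, [99, 20]]
print(shallow)  # [1, 2, [99, 20]]
print(deep)     # [1, 2, [10, 20]]
```

A shallow copy is usually enough for a flat list of numbers or strings; a deep copy matters when the list holds other mutable collections.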


In [37]:
## Exercise
# 1. Create a list of digits
# 2. Append a string
# 3. Pop the string out from the list and store it in a variable
mylist = [2,4,6,8]
mylist.append('string')
print(mylist)
mystring = mylist.pop()
print(mylist)
print(mystring, type(mystring))

[2, 4, 6, 8, 'string']
[2, 4, 6, 8]
string <class 'str'>


### 3.2 Tuples

We won't say a whole lot about tuples except to mention that they basically work just like lists, with
two major exceptions:

1. You declare tuples using () instead of []
1. Once you make a tuple, you can't change what's in it (referred to as immutable)

You'll see tuples come up throughout the Python language, and over time you'll develop a feel for when
to use them. 

In general, they're often used instead of lists:

1. to group items when the position in the collection is critical, such as coord = (x,y)
1. when you want to prevent accidental modification of the items, e.g. shape = (12,23)

In [38]:
xy = (23, 45)
print(xy[0])
xy[0] = "this won't work with a tuple"

23


TypeError: 'tuple' object does not support item assignment

### Anatomy of a traceback error

Errors are `raised` when you ask your code to do something it isn't meant to do. The traceback that comes with an error is meant to be informative but, like many things, it is not always as informative as we would like.

Looking at our error:

```
TypeError                                 Traceback (most recent call last)
<ipython-input-8-c7b77af2676f> in <module>()
      1 xy = (23, 45)
      2 print(xy[0])
----> 3 xy[0] = "this won't work with a tuple"

TypeError: 'tuple' object does not support item assignment
```

1. The command you tried to run raised a **TypeError**. This suggests you are using a variable in a way that its **type** doesn't support.
2. The arrow `---->` points to the line where the error occurred, in this case line 3 of the cell above.
3. Learning how to **read** a traceback is an important skill to develop, and helps you know how to ask questions about what has gone wrong in your code.
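
The two pieces of information on the last line of a traceback, the error type and the message, can also be inspected from code. This is only a sketch and goes slightly beyond what the workshop covers: it uses a `try`/`except` block to catch the error instead of letting it stop the cell, and the names `error_name` and `error_message` are illustrative.

```python
xy = (23, 45)

try:
    xy[0] = 99   # tuples are immutable, so this raises a TypeError
except TypeError as err:
    error_name = type(err).__name__
    error_message = str(err)

print(error_name)
print(error_message)
```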




### 3.3 Dictionaries

Dictionaries are the collection to use when you want to store and retrieve things by their names
(or some other kind of key) instead of by their position in the collection. A good example is a set
of model parameters, each of which has a name and a value. Dictionaries are declared using {}.

In [39]:
# Make a dictionary of conversion factors
convertors = {'inches_in_feet' : 12,
              'inches_in_metre' : 39}

print(convertors)
print(convertors['inches_in_feet'])

{'inches_in_feet': 12, 'inches_in_metre': 39}
12


In [40]:
## Add a new key:value pair
convertors['metres_in_mile'] = 1609.34
print(convertors)

{'inches_in_feet': 12, 'inches_in_metre': 39, 'metres_in_mile': 1609.34}


In [41]:
# Raise a KEY error
print(convertors['blueberry'])

KeyError: 'blueberry'
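
If you are not sure whether a key exists, you can check first with `in`, or use the dictionary's `.get()` method, which returns `None` (or a default you supply) instead of raising a `KeyError`. A minimal sketch, with illustrative variable names:

```python
convertors = {'inches_in_feet': 12, 'inches_in_metre': 39}

has_key = 'blueberry' in convertors        # membership test on the keys
missing = convertors.get('blueberry')      # None when the key is absent
fallback = convertors.get('blueberry', 0)  # or a default of your choice
present = convertors.get('inches_in_feet')

print(has_key, missing, fallback, present)
```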

Some Python packages extend the list of collections. The most popular are:

- NumPy: n-dimensional arrays
- pandas: data frames

## 6. Creating chunks with functions and modules

One way to write a program is to simply string together commands, like the ones described above, in a long
file, and then to run that file to generate your results. This may work, but it can be cognitively difficult
to follow the logic of programs written in this style. Also, it does not allow you to reuse your code
easily - for example, what if we wanted to run our logistic growth model for several different choices of
initial parameters?

The most important ways to "chunk" code into more manageable pieces are to create functions and then
to gather these functions into modules, and eventually packages. Below we will discuss how to create
functions and modules. A third common type of "chunk" in Python is classes, but we will not be covering
object-oriented programming in this workshop.

In [44]:
# It's very easy to write your own functions
def multiply(x, y):
    print(x,y,sep=" * ")
    return x*y

In [45]:
# Once a function is "run" and saved in memory, it's available just like any other function
print(type(multiply))
print(multiply(4, 3))

<class 'function'>
4 * 3
12


In [46]:
# It's useful to include docstrings to describe what your function does
def say_hello(time, people):
    '''
    Function says a greeting. Useful for engendering goodwill
    '''
    return 'Good ' + time + ', ' + people

**Docstrings**: A docstring is a string placed as the first statement inside a function that describes what the function does. You can see it when you ask for help about the function.

In [47]:
say_hello('afternoon', 'friends')

'Good afternoon, friends'

In [48]:
help(say_hello)
# docstrings create automatic documentation for python functions!!

Help on function say_hello in module __main__:

say_hello(time, people)
    Function says a greeting. Useful for engendering goodwill



In [49]:
# All arguments must be present, or the function will raise an error
say_hello('afternoon')

TypeError: say_hello() missing 1 required positional argument: 'people'

In [50]:
# Keyword arguments can be used to make some arguments optional by giving them a default value
# All mandatory arguments must come first, in order
def say_hello(time, people='friends'):
    return 'Good ' + time + ', ' + people

In [51]:
say_hello('afternoon')

'Good afternoon, friends'

In [52]:
say_hello('afternoon', 'students')

'Good afternoon, students'
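
Arguments can also be passed by name when calling a function, in which case their order no longer matters. A short sketch that repeats the `say_hello` definition from above so it runs on its own (the result variable names are illustrative):

```python
def say_hello(time, people='friends'):
    return 'Good ' + time + ', ' + people

by_position = say_hello('morning', 'students')
by_name = say_hello(time='morning', people='students')
reordered = say_hello(people='students', time='morning')
default_used = say_hello(time='evening')

print(by_position)
print(reordered)
print(default_used)
```

Passing arguments by name tends to make calls easier to read when a function takes several parameters.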

### EXERCISE 5 - Creating a variance function

Finally, let's turn our variance calculation into a function that we can use over and over again. 
Copy your code from Exercise 4 into the box below, and do the following:

1. Turn your code into a function called `calculate_variance` that takes a list of values and returns their variance.
1. Write a nice docstring describing what your function does.
1. In a subsequent cell, call your function with different sets of numbers to make sure it works.

__Bonus__

1. Refactor your function by pulling out the section that calculates the mean into another function, and calling that inside your `calculate_variance` function.
2. Make sure it works properly when all the data are integers as well.
3. Give a better error message when it's passed an empty list. Use the web to find out how to raise exceptions in Python.

In [2]:
def calculate_variance(values):
    '''
    Function to calculate variance of a list of numbers.

    Arguments
    ---------
    values : list of floats
        The measurements.

    Returns
    -------
    v : float
        The sample variance of the input values.
    '''
    mean = calculate_mean(values)
    
    sum_diffsq = 0
    for value in values:
        diff = value - mean
        diffsq = diff ** 2
        sum_diffsq = sum_diffsq + diffsq

    return sum_diffsq / (len(values) - 1)

def calculate_mean(values):
    '''Calculate mean of values.'''
    total = 0

    for val in values:
        total = total + val

    return total / len(values)

In [3]:
# Results are rounded to 4 decimal places for display
print(round(calculate_variance([0.6, 0.1, 0.8]), 4))
print(round(calculate_variance([174.3, 165.2, 208]), 4))
print(round(calculate_variance([1.1, 1.5, 2.0]), 4))

0.13
508.39
0.2033
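
For the bonus items, one possible approach is sketched below. It is not the only solution: it raises a `ValueError` with a descriptive message when there are too few values, and cross-checks the result against `statistics.variance` from the standard library, which also computes the sample variance. The error messages and the names `ours`, `reference`, `int_result`, and `message` are illustrative choices.

```python
import math
import statistics

def calculate_mean(values):
    '''Calculate the mean of a non-empty list of numbers.'''
    if len(values) == 0:
        raise ValueError('calculate_mean() needs at least one value')
    return sum(values) / len(values)

def calculate_variance(values):
    '''Calculate the sample variance (dividing by n - 1).'''
    if len(values) < 2:
        raise ValueError('calculate_variance() needs at least two values')
    mean = calculate_mean(values)
    sum_diffsq = sum((value - mean) ** 2 for value in values)
    return sum_diffsq / (len(values) - 1)

data = [174.3, 165.2, 208]
ours = calculate_variance(data)
reference = statistics.variance(data)
print(math.isclose(ours, reference))

# Integer input works too, because / always gives a float in Python 3
int_result = calculate_variance([1, 2, 3])
print(int_result)

# An empty list now produces a readable error instead of a ZeroDivisionError
try:
    calculate_variance([])
except ValueError as err:
    message = str(err)
    print('Caught:', message)
```

Comparing floats with `math.isclose` rather than `==` avoids false alarms from tiny rounding differences.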

### EXERCISE 6 - Putting the `calculate_mean` and `calculate_variance` function(s) in a module

We can make our functions more easily reusable by placing them into modules that we can import, just
like we have been doing with `numpy`. It's pretty simple to do this.

1. Copy your function(s) into a new text file, in the same directory as this notebook,
called `stats.py`.
1. In the cell below, type `import stats` to import the module. Type `stats.` and hit tab to see the available
functions in the module. Try calculating the variance of a number of samples of heights (or other random numbers) using your imported module.

In [4]:
# Assumes stats.py from step 1 exists in the same directory as this notebook
import importlib
import numpy as np
import stats

importlib.reload(stats)  # Forces a re-import of stats if you change stats.py and run this cell again
data = [1.8, 1.9, 2.0, 1.7, 1.6, 2.2]
result = stats.calculate_variance(data)
# Compare it to numpy's calculation; ddof=1 requests the sample variance
np_result = np.var(data, ddof=1)
assert np.isclose(result, np_result)