# Jupyter notebook

The Jupyter Notebook (what you're looking at now) is an interactive computing environment (IDE) that enables users to make notebook documents that include: 
- Live code
- Plots
- Narrative text
- Equations
- Images

To run the notebook locally, follow instructions in the <a href="http://nbviewer.jupyter.org/github/antopolskiy/sciprog/blob/master/000_pre_course_tutorial_annotated.ipynb">pre-course tutorial notebook</a>.

You can find full documentation <a href="http://jupyter-notebook.readthedocs.io/en/latest/index.html">here</a>. If you want to get more comfortable with notebooks quickly, I suggest you go through the <a href="http://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Notebook%20Basics.html">notebook basics</a> section. For now what you need to know is that notebooks are composed of cells, and each cell contains either **Markdown** (like this cell), which is used for text, or **Code**, which is used for snippets of code or even whole scripts. You can execute the content of a cell by pressing **Shift+Enter**. To create a new cell **b**elow the current cell, press the **B** key; to create a new cell **a**bove the current cell, press the **A** key (both shortcuts work in command mode, that is, when you are not typing inside a cell; press **Esc** to get there).

# Spyder

<a href="https://pythonhosted.org/spyder/">Spyder</a> (Scientific PYthon Development EnviRonment) is another IDE shipped with Anaconda. To start it, you can run `spyder` in your terminal. It is more Matlab-like, and more suited for running scripts rather than exploring and prototyping.

There are other IDEs available for Python, but these will fulfill most of your needs.

# Python 2 vs Python 3

A question which almost always comes up in Python courses: what's the deal with two different versions of Python -- Python 2 (currently in version 2.7) and Python 3 (currently in version 3.6)? A very quick note on that. Python is a quickly developing language, with new versions coming out almost every year. Usually they contain small and specific improvements, and new versions are *backward compatible* -- whatever you could run in the previous version, you can also run in the new one. However, at some point (in 2008) it became clear that a stack of changes to the internal logic of the language had to be made for the language to develop further, and that they would quite significantly change how people should use the language (significantly is a strong word: in reality it is significant for software development, not really for data science). At this point compatibility was broken: you cannot (should not!) run a Python 2 script in Python 3 without any changes.

Python 2.7 is the last version of Python 2, and no new versions will come out. In fact, in 2020 support for Python 2 versions will stop completely. Python 3 is the future of the language, so definitely use that if you can. That said, there are several packages which haven't been ported to Python 3, like `psychopy`. It is generally not a problem: you can install both versions of Python easily with Anaconda, and keep them separate. Whenever you need to use a library supported only by Python 2, just switch the kernel. We will learn how to do this later, in the last parts of the course (if you need it now, ask me during office hours). In any case, hopefully very soon all active packages will be ported to Python 3.

If you want to know more, check out here: https://wiki.python.org/moin/Python2orPython3

# Documentation
Python has very good documentation: https://docs.python.org/3/. It also includes a very thorough <a href="https://docs.python.org/3/tutorial/index.html">tutorial</a>. Different Python packages have their own documentation in separate places, but they are only one google search away.

You can get in-built documentation dynamically in Jupyter by running `help(<function-name>)` or just `<function-name>?` (e.g. `help(range)` or `range?`)

##### Quickly work through the pre-course tutorial notebook

# Now let's dive a bit deeper on each topic

## Notebook's cell output

As you noticed, a cell will display the output of the code inside it when you run it. However, it will display only the output of the **last** line. If the last line doesn't have output, the cell won't display anything. Compare the following two cells:

In [None]:
x = 10
x

In [None]:
x = 10
x
y = 9

If you want to actually display something, you need to say it explicitly with the `print` function. Note that in this case the message is not an "output" *per se* (which you can notice by the fact that on the left it doesn't say `Out[]:`), it is just printed. You can `print` as many things as you like, but a cell shows only one output.

In [None]:
x = 10

# this value is just printed
print(x)

y = 5

# this is going to be displayed as an output
y

**Pro-tip**: When you're not sure what a function's output is, just put it in a separate cell and run it. If it has an output, it will show up as the output of the cell.

## Assignment

Variables allow you to store values. Or do they? In fact, a variable is just a pointer (a *reference*) to an object in memory. Here is an example which can be confusing to a novice:

In [None]:
X = [0,5,10,15,20]
Y = X
Y[1] = -999
print(X)
print(Y)

What happens here? We created a list in memory, and variable `X` points to that object. Variable `Y` is assigned the same value as `X`, but the list is not copied, rather `Y` merely points to the same object. We modify the list through `Y` and discover that `X` still points to the same object.

If you want to avoid this, use explicit `.copy()` method on the list:

In [None]:
X = [0,5,10,15,20]
Y = X.copy()
Y[1] = -999
print(X)
print(Y)

This is an example of Python giving you more control over memory and pointers. In Matlab and R default behavior is to copy an object, which can lead to some serious memory replication (using a lot of memory for copies of the same objects). Control is good but you need to be aware of this behavior.
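
A quick way to check whether two names refer to the same object is the `is` operator. The sketch below also illustrates one caveat, assuming you work with nested lists: `.copy()` makes a *shallow* copy, so the inner lists are still shared, while `copy.deepcopy` from the standard library copies them as well. The variable names are only for illustration.

```python
import copy

X = [0, 5, 10, 15, 20]
Y = X            # no copy: both names point to the same list
Z = X.copy()     # shallow copy: a new list with the same elements

print(Y is X)    # True: same object in memory
print(Z is X)    # False: a different object
print(Z == X)    # True: the contents are equal

# With nested lists, a shallow copy still shares the inner lists
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
deep = copy.deepcopy(nested)

shallow[0][0] = -999   # modifies the inner list shared with `nested`
print(nested)          # [[-999, 2], [3, 4]]
print(deep)            # [[1, 2], [3, 4]]
```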

# Tuples: immutable lists

We already learned about `list`: they are containers for different types of stuff. There is another *in-built* container data type, called `tuple`. Tuples are denoted with parentheses `()` instead of brackets `[]`.

In [None]:
# make a tuple
info = ('Sergey', 28, 'Russian', 1989, 1, 9)
info

In [None]:
type(info)

Tuples are very much like lists, except one thing -- they cannot be changed like lists, here is an example:

In [None]:
# let's make a list out of our tuple:
info_list = list(info)
info_list

In [None]:
type(info_list)

In [None]:
# now try to change something in it: it works
info_list[1] = 29
info_list

In [None]:
# let's try to do the same with tuple
info[1] = 29

We get an error if we try to change some value in a tuple. The same if we try to add something to it. 

A good question would be -- why do we need to have exactly the same thing as `list`, but which can do LESS than a `list`? It turns out that for many reasons it is very convenient to have a data type which cannot be changed. We won't go into details here, but if you have something which you don't intend to change, consider making it a `tuple` instead of a `list`. At the very least you won't change it *accidentally*.
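
As a small illustration of what "cannot be changed" means in practice, here is a minimal sketch (the tuple `point` is made up for this example). Item assignment raises a `TypeError`, while "adding" to a tuple with `+` builds a new tuple and leaves the original untouched.

```python
point = (3, 4)

# Item assignment is not allowed for tuples
try:
    point[0] = 10
except TypeError as err:
    print('Cannot modify a tuple:', err)

# "Adding" to a tuple builds a new tuple; the original is untouched
longer = point + (5,)
print(point)   # (3, 4)
print(longer)  # (3, 4, 5)

# Tuples can be unpacked into separate variables
x, y = point
print(x, y)    # 3 4
```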

# Mapping data types

Another *in-built* container data type is `dict` (short for *dictionary*). A `dict` contains **pairs of things**. Every entry in a `dict` is a `key`:`value` pair (in programming this relationship is called *mapping*: each key maps onto a value). Think about it as a real world dictionary -- in an English-Italian dictionary you have a `key` word, e.g. **shirt**, and a `value` associated with it: **camicia**. And you can find a `value` by addressing the `key`. Just like in the real dictionary, you cannot go the other way and find the word **shirt** by looking up **camicia** -- you would need another, Italian-English dictionary for that. Same with `dict`: `keys` and `values` are not symmetric, you can only get them in one direction `key`->`value`.

Syntax for a `dict` is to put `key:value` pairs inside curly brackets `{}`, with different pairs separated by comma:

In [None]:
{'shirt':'camicia'}

In [None]:
info = {'name':'Adina', 'surname':'Drumea', 'lab':'Diamond', 'taken_prog_class':False, 'languages': ['Matlab','C++']}
info

Here is another way of defining a `dict`. The results are equivalent, so choose whichever you like. Note that in this case the `keys` are written as argument names, without quotes, but they become strings in the dict:

In [None]:
info = dict(name='Adina', surname='Drumea', lab='Diamond', taken_prog_class=False, languages=['Matlab','C++'])
info

We can retrieve values from `dict` by specifying `key` like this:

In [None]:
info['surname']

In [None]:
info['taken_prog_class']

**Note**: Both `key` and `value` can be of any type (with the only exception that `keys` cannot be a `list` or some other *modifiable* types; this has to do with the implementation of `dict` in Python). If a `key` repeats, the last value overrides the earlier ones:

In [None]:
{'name':'Adina', 'name':'Marinella'}
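
To make the restriction on keys concrete, here is a short sketch (the file name and keys are invented for illustration). Immutable objects such as strings, numbers and tuples are accepted as keys, while a `list` is rejected with a `TypeError`.

```python
# Immutable objects such as strings, numbers and tuples work as keys
recordings = {('S8', '20160701'): 'session_a.dat', 42: 'a number key'}
print(recordings[('S8', '20160701')])

# A list is mutable, so Python refuses to use it as a key
try:
    bad = {['S8', '20160701']: 'session_a.dat'}
except TypeError as err:
    print('Lists cannot be keys:', err)

# When a key repeats, the last value wins
names = {'name': 'Adina', 'name': 'Marinella'}
print(names)   # {'name': 'Marinella'}
```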

Besides storing and retrieving values from `dict`, you can also iterate through `keys` and `values` easily:

In [None]:
for (key, value) in info.items():
    print('The key was:', key)
    print('The value was:', value)
    print('')
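
Going back to the English-Italian analogy: if you do need the reverse lookup, one possible way is to loop over `.items()` and build the second dictionary yourself. This is only a sketch with made-up entries, and it assumes that all values are unique and immutable (otherwise they cannot serve as keys).

```python
en_to_it = {'shirt': 'camicia', 'dog': 'cane', 'house': 'casa'}

# Build the Italian-English dictionary by swapping each pair
it_to_en = {}
for english, italian in en_to_it.items():
    it_to_en[italian] = english

print(it_to_en['camicia'])   # shirt
```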

`dict` supports a lot of different operations (check full documentation <a href="https://docs.python.org/2/library/stdtypes.html#mapping-types-dict">here</a>). Here are some of them:

In [None]:
# check whether certain key is in the dict
'surname' in info

In [None]:
'age' in info

In [None]:
# return the keys (in Python 3 this is a view object; use list(info.keys()) to get a list)
info.keys()

In [None]:
# return the values (in Python 3 this is a view object; use list(info.values()) to get a list)
info.values()

In [None]:
# add stuff to the dict
info.update({'age':28, 'rooms':324})

In [None]:
# removing stuff from the dict
del info['taken_prog_class']

In [None]:
info

**Note**: Depending on your Python version, the most attentive of you may notice that the order of the `key`:`value` pairs changed when we updated the `dict`. This shows a potential pitfall of using `dict`, which you have to be careful about: **before Python 3.7, `dict` is NOT guaranteed to store the order of inserted pairs**! In those versions, if you iterate through the values in the dict, you cannot trust that it will iterate in the order in which you inserted the pairs. From Python 3.7 on, a regular `dict` keeps the insertion order.

If you ever need a mapping type which explicitly remembers the order, take a look at <a href="https://docs.python.org/2/library/collections.html#collections.OrderedDict">`OrderedDict` from the `collections` module</a>. It operates the same way as `dict`, but will keep the order when you iterate.

In [None]:
from collections import OrderedDict
info_ordered = OrderedDict(name='Adina', surname='Drumea', lab='Diamond', taken_prog_class=False, languages=['Matlab','C++'])
info_ordered

In [None]:
for key, value in info_ordered.items():
    print(key, value)
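
One practical difference between the two types, shown here as a small sketch with made-up entries: two plain `dict` objects with the same pairs compare as equal regardless of order, while two `OrderedDict` objects are equal only if the order matches as well.

```python
from collections import OrderedDict

a = dict(name='Adina', lab='Diamond')
b = dict(lab='Diamond', name='Adina')
print(a == b)    # True: plain dicts ignore order when compared

oa = OrderedDict(name='Adina', lab='Diamond')
ob = OrderedDict(lab='Diamond', name='Adina')
print(oa == ob)  # False: OrderedDicts compare order as well

# Keys of an OrderedDict come back in insertion order
print(list(oa.keys()))   # ['name', 'lab']
```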

# <font color='DarkSeaGreen '>Exercise</font>
In the cell below create a dictionary to hold information about pets. Each key is an animal's name, and each value is the kind of animal. For example, 'ziggy': 'canary'. Put at least 3 key-value pairs in your dictionary. Use a `for` loop to print out a series of statements such as "Willie is a dog."

# <font color='DarkSeaGreen '>Exercise</font>
In the cell below:
- Make a copy of your program from the previous exercise.
- Modify one of the values in your dictionary. You could clarify to name a breed, or you could change an animal from a cat to a dog.
- Now that some values are updated, again use a `for` loop to print out a series of statements such as "Willie is a dog."
- Add a new key-value pair to your dictionary.
- Print values again
- Remove one of the key-value pairs from your dictionary.
- Print values again
- **Bonus**: Use a function to do all of the looping and printing in this problem.

# List comprehensions

In Python there are a number of syntax simplifications which can be used to speed up coding. You will learn those over time, but there is one particularly useful shortcut called *list comprehensions*, which not only speeds up the coding, but also significantly improves code readability. As a consequence it is used ubiquitously.

It has to do with how we write `for` loops. In particular, consider the following (real life) example. Let's say I recorded behavior in a bunch of rats, and for each session I have a name, which contains the year, month and day of the session and the codename of the rat in the following format: YYYYMMDDratcode. Example: `20170114S8`, where `S8` is the name of the rat.

In [None]:
sessions = ['20160701S8', '20160702S9', '20160702S8','20160703S10', '20160703S9', '20160703S8']
sessions

Now I just want to get the dates of the session, so that I can see on which days I recorded at least 1 rat. I could construct the following loop:

>**Syntax tip**: `append(x)` is a method of type `list`, which will add `x` to the end of the `list`.

In [None]:
# create a new empty list, which we will append later
sessions_date = []
# iterate through every session
for s in sessions:
    # append the new list with the first 8 characters from the session name
    sessions_date.append(s[:8])
    
sessions_date

Possible, but a bit too tedious. Especially the part where you have to create an empty `list` and then append values there. There is a better way in Python:

In [None]:
[s[:8] for s in sessions]

This produces exactly the same output, but instead of taking several lines it takes just one. It is also quite easy to read once you get the hang of it. See `for s in sessions`, which is exactly the same as in the long `for` loop above, and it does the same thing: it iterates through the values of `sessions`, and on each iteration `s` takes a value from the list, one after another. For each iteration, the expression `s[:8]` is evaluated, which is the first 8 characters of `s`. These values are automatically collected in a new list. You can assign it to a variable in the same way as any other list:

In [None]:
sessions_date = [s[:8] for s in sessions]
sessions_date

**Side note**: To follow through with the example, if I wanted to get unique days of the recording, I can use a function `unique` from `numpy` module, which will return only the unique values:

In [None]:
from numpy import unique
print(unique(sessions_date))
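
If you prefer not to import `numpy` for this, a possible alternative uses only built-in functions: `set` drops the duplicates and `sorted` turns the result back into an ordered list. The list below repeats the dates from the example so that the cell can run on its own.

```python
sessions_date = ['20160701', '20160702', '20160702',
                 '20160703', '20160703', '20160703']

# set() drops duplicates; sorted() returns them as an ordered list
unique_dates = sorted(set(sessions_date))
print(unique_dates)   # ['20160701', '20160702', '20160703']
```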

*List comprehensions* (or *listcomps* for short) will save you a lot of time and space in your script. You can even do some conditional things inside. Let's say I wanted to return the date ONLY for the rat `S9`. I can use `in` to check presence of a sub-string in a larger string like so:

In [None]:
s = '20160701S9'
'S9' in s

However to go through all sessions and check, I would need a `for` loop with `if` inside (if you want, try to implement it like that as an exercise). Instead we can do the same with listcomp:

In [None]:
[s[:8] for s in sessions if 'S9' in s]

# <font color='DarkSeaGreen '>Exercise</font>
In the cell below translate the following listcomp into a loop:

    numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    numbers_even = [n for n in numbers if n%2==0]
    
>**Tip**: First run it and see what it outputs.

# <font color='DarkSeaGreen '>Exercise</font>
In the cell below translate the following loop into a listcomp:

    values = [1, 2.5, 3, 4.1, 5, 'd', 1, 25, 'hello', 8.7]
    floats = []

    for v in values:
        if type(v) == float:
            floats.append(v)
            
>**Tip**: First, figure out what it does. Second, make a very simple listcomp which puts all objects from list `values` to list `floats`. Then add `if` to your listcomp.

# Functions with default arguments

We saw how to create basic functions. One additional useful trick is to have a function with default arguments. That is, when you call the function, you can specify the argument if you want, but you can also skip it, and it will take on some default value. It is very easy to do in Python: when you define the function's arguments, simply write `<argument-name> = <default-value>` and the argument will take this value if no other value is specified when the function is called.

Let's see an example. By default, the following function raises `x` to the power `p`. However, there is a twist, you can specify additional argument `verbose`, and if it is `True`, the function will print out what it does. (It is a good practice to add these kinds of options to your functions, because they can help you debug the errors in your code).

In [None]:
def power(x, p, verbose=False):
    if verbose:
        print("evaluating power for x = " + str(x) + " using exponent p = " + str(p))
    return x**p

In [None]:
# we can use the function to just raise 5 to power 2 (equivalent to 5**2)
power(5,2)

In [None]:
# but we can also make the function "verbose", so it tells us what it does
power(5,2,True)

Any number of arguments can have default values, even all of them. Try to understand which values each argument takes and what the redefined function outputs in the following cells:

In [None]:
def power(x=5, p=2, verbose=False):
    if verbose:
        print("evaluating power for x = " + str(x) + " using exponent p = " + str(p))
    return x**p

In [None]:
power()

In [None]:
power(6)

In [None]:
power(6,3)

In [None]:
power(6,3,True)

# Positional arguments as keyword arguments

When you're creating the function, you always specify the names of the arguments so that you can use them inside the function. These names have another use: when you call the function, you can pass values by using argument names, for example:

In [None]:
power(x=10, p=3, verbose=False)

In [None]:
power(x=3, p=10, verbose=False)

Why would we do it? First, it leads to clearer function calls, you don't need to remember that first argument was `x`, and second was `p`, etc. However, more importantly, if you can only pass arguments based on position, you cannot keep default value for an argument while specifying a non-default value to the argument after that.

Consider the following example: what if I wanted to call our function `power()` with default `x` and `p`, but specifying `verbose=True`? I cannot do `power(True)`: in that case `True` will become the value of `x`, because it is the first argument. But what I can do is the following:

In [None]:
power(verbose=True)

In that case both `x` and `p` keep their default values. This is extremely useful for functions with many parameters.

Moreover, if I use this type of passing arguments (which is called *keyword arguments* as opposed to *positional arguments*), I don't need to care about the order at all:

In [None]:
power(verbose=True, x=9, p=2)

In [None]:
# same call as before: order doesn't matter
power(p=2, verbose=True, x=9)

You can also combine *keyword* and *positional* arguments:

In [None]:
# in this case p will keep default value 
power(10, verbose=True)

The only thing you cannot do is to pass positional arguments after keyword:

In [None]:
power(verbose=True, 10)
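
Note that this error is a `SyntaxError`: Python rejects the call before running anything. As a sketch of how to see this without stopping the notebook, the cell below passes the call as a string to the built-in `compile` function, which only parses it (the string-based approach is purely for demonstration; `power` is repeated so that the cell can run on its own).

```python
def power(x=5, p=2, verbose=False):
    if verbose:
        print("evaluating power for x = " + str(x) + " using exponent p = " + str(p))
    return x**p

bad_call = "power(verbose=True, 10)"

try:
    # compile() only parses the code; the function is never called
    compile(bad_call, "<example>", "eval")
except SyntaxError as err:
    print("Python refuses to parse this call:", err.msg)

# Putting positional arguments first works as expected
print(power(10, verbose=True))   # 100
```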

# A (very brief) intro to object-oriented programming

**Note**: This is a bit more advanced topic which might be difficult to understand on your own (outside class). If you have difficulties with this section, just skip it. It is not necessary for further parts, it is here only to improve your understanding of what is going inside the language.
___

<img src="https://cdn.meme.am/cache/instances/folder307/46864307.jpg">



In Python, everything is an object, which belongs to a certain class (which is the same thing as type). We already saw many different classes (types) of objects, such as `int`, `float`, `str`, `list`, `tuple`, `dict`, etc. You can use existing classes, but you can also create new ones. In reality, in working with data there is almost no application for this (however if you do simulations, especially parts-based simulations like neuronal simulations, this is extremely useful). Still, it will help you understand what is going on with the language. Let's create a class `Student` and inside it define a variable and a function, like so:

In [None]:
class Student():
    
    message = 'this is a message'
    
    def say_hi(self):
        print('Hi!')

Now we can create an *instance* of that class:

In [None]:
Alex = Student()

It does nothing special, it just exists. We can check the type of `Alex` and verify that it is of our class `Student`:

In [None]:
type(Alex)

(`__main__` refers to the fact that we defined the class inside this particular notebook; if you imported that class from, let's say, module `math`, it would say `math.Student`)

There is one particular thing about our class `Student` though. We defined a function inside it. This function is not actually a function *per se*, which you can verify by trying to run it (we can use any input just for demonstration):

In [None]:
say_hi(some_input)

Python says that the function doesn't exist. That is because it only exists inside the object. We can call it, using *dot-notation*, like so:

In [None]:
Alex.say_hi()

In [None]:
Ehsan = Student()
Ehsan.say_hi()

The reason I am describing creating classes is to point out the difference between *functions* and *methods*. 

Methods are "functions inside an object". They cannot be called separately from an object, but are accessed with *dot-notation*: `<object>.<method>`. Which methods an object has is defined by its `class`, for example, object of the class `dict` will have different methods from object of the class `list`. In the example above `say_hi()` is a method of a class `Student`.

Let's consider an example from above (section about dictionaries):

In [None]:
for key, value in info.items():
    print(key, value)

Note that to iterate through the dictionary, we used the syntax `info.items()`. You might not have recognized it before, but now you should understand that `items()` is nothing else but a *method* of the object `info`. It is there because `info` is an instance of the class `dict`. In particular, `items()` is a method of the class `dict` which returns the contents of the `dict` (the `key:value` pairs saved there) in a form that is easy to iterate through in a loop.

In your work, you will use both functions and methods and we will see many examples of that.

There was also a variable we defined inside the `class Student`, which we called `message`. We can access it in a similar way:

In [None]:
Alex.message

Variables inside an object are called *attributes*.
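
The role of `self` in `say_hi(self)` was not discussed above. As a hedged sketch that goes slightly beyond this section: `self` is the instance on which the method is called, which lets each instance carry its own attributes. The class `NamedStudent` and its attribute `name` are invented for this illustration.

```python
class NamedStudent():

    def __init__(self, name):
        # __init__ runs when an instance is created; self is that instance
        self.name = name

    def say_hi(self):
        # the method reads the attribute of the instance it was called on
        return 'Hi, I am ' + self.name + '!'

alex = NamedStudent('Alex')
ehsan = NamedStudent('Ehsan')

print(alex.say_hi())    # Hi, I am Alex!
print(ehsan.say_hi())   # Hi, I am Ehsan!
print(alex.name)        # Alex
```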

**Pro-tip**: if you write `<object>.` and then press TAB, Jupyter will list all the methods and attributes available for the object. E.g., in the cell below try writing `info.` and press TAB, and it will list all the methods available for this instance of the class `dict`.

In [None]:
# write info. below and press TAB


# Troubles with floats
Example from the survey:

In [None]:
x = 0.1 + 0.2
y = x == 0.3
y

Why does this happen? If you can spare 9 minutes of your time, I suggest you watch a great explanation on the <a href="https://www.youtube.com/watch?v=PZRI1IfStY0">Computerphile</a> YouTube channel. You will understand how floating point numbers are stored and why floating point arithmetic is not precise.

I will attempt to explain it here in a nutshell. The problem is more evident if we just look at what `0.1 + 0.2` gives us:

In [None]:
0.1+0.2

As you can see, there is a marginal error at the end. Why is it there? The answer has to do with how numbers (in particular, the `float` type) are stored in memory: they can only store a certain number of *significant digits*. When a number needs an infinite number of repeating digits to be written exactly, it cannot be stored exactly. It is easier to understand with an analogy between fractions and decimals.

Think about the fraction `1/3`. If you try to write it as a decimal, you'd have to write `0.33333333...` and so on. But what if you were able to only store 5 significant digits after `0.`? You'd have to write `0.33333`, that's the best approximation you can do. Now try to do arithmetic with `1/3`: exactly, `1/3 + 1/3 + 1/3 = 1`. However, if you try to do it in decimal where you can only store 5 significant digits, you'd get `0.99999`.

A similar thing happens in computers when you write any decimals (like `0.1` and `0.2`), because computers cannot store decimal notation in memory, they have to translate it to *binary*. And sometimes with this translation you get a repeating pattern, just like with `0.33333...`. But computers cannot store infinitely many digits, they have to cut the number after some digits. This introduces errors when working with `float`, and it will happen in any language, because it is a fundamental property of how computers store these numbers.
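
You can look at the stored values yourself. The sketch below asks `format` for more digits than Python shows by default, and uses the `float` method `as_integer_ratio()` to show that the stored value is an exact fraction whose denominator is a power of two.

```python
# Ask for more digits than Python shows by default
print(format(0.1, '.20f'))
print(format(0.2, '.20f'))
print(format(0.1 + 0.2, '.20f'))
print(format(0.3, '.20f'))

# The stored value is an exact binary fraction: numerator / power of two
numerator, denominator = (0.1).as_integer_ratio()
print(numerator, denominator)
```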

Based on these difficulties, there are 2 fundamental principles one has to keep in mind when working with `float` type.

**First**: Never test floats for equality. This is exactly the mistake we made in the script from the survey:
    
    x = 0.1 + 0.2
    y = x == 0.3
    
The result will sometimes be `True`, sometimes `False`, depending on exactly which numbers you try, but the point is that it is *inconsistent*, you cannot trust it. It is usually fine to compare which float is larger though. (**Pro-tip**: if you ever do need to compare equality of floats and cannot get around it, there is a function called <a href="https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.isclose.html">`isclose`</a> in the `numpy` package, which will compare two things up to a specified tolerance).
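
For single numbers, the standard library offers a similar function, `math.isclose` (available since Python 3.5). A minimal sketch of how it behaves, using its `abs_tol` parameter for comparisons against zero:

```python
import math

x = 0.1 + 0.2

print(x == 0.3)               # False: exact comparison fails
print(math.isclose(x, 0.3))   # True: equal within a relative tolerance

# The default tolerance is relative, so comparing with 0 needs abs_tol
print(math.isclose(1e-12, 0.0))                 # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))   # True
```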

**Second**: Be very careful when doing summation or difference between floats of vastly different order: the smaller ones can get lost in the noise. Consider the following example, where we add `0.001` one thousand times to a large number $10^{13}$, and expect to get $10^{13}+1$:

In [None]:
a = 10.0**13
for i in range(1000):
    a = a + 0.001
a

As you can see, the error is almost as large as the total amount we added (which was 1).
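
Two possible ways to reduce this error, sketched with the standard library only: add the small numbers together first and only then add the result to the large number, or use `math.fsum`, which keeps track of the rounding error while summing. The variable names are illustrative.

```python
import math

big = 10.0**13
small = [0.001] * 1000

# Naive accumulation: each small step is rounded against the large number
a = big
for s in small:
    a = a + s
print(a - big)   # noticeably larger than the expected 1.0

# Summing the small numbers first keeps them from being lost
b = big + sum(small)
print(b - big)

# math.fsum tracks the rounding error internally
c = math.fsum([big] + small)
print(c - big)
```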

**Pro-tip**: if you ever need to do very precise operations with decimal numbers, check out the <a href="https://docs.python.org/2/library/decimal.html">`decimal`</a> module in Python.
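
A minimal sketch of the `decimal` module in action. One thing to watch for: build `Decimal` objects from strings, because building them from a `float` carries the float's rounding error along.

```python
from decimal import Decimal

# Build Decimals from strings, so the value is never stored as a float
x = Decimal('0.1') + Decimal('0.2')
print(x)                     # 0.3
print(x == Decimal('0.3'))   # True

# Building from a float carries the float's rounding error along
print(Decimal(0.1) + Decimal(0.2) == Decimal('0.3'))   # False
```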

If you want to dig a little deeper on that, check out this really great page: http://floating-point-gui.de/