# Plotting and Programming in Python
## August 26 and 27, 2019
In today's lesson, we will cover the basics of the Python programming language and introduce two of its most famous data science libraries: _pandas_ and _matplotlib_.

## Variables and Assignment

A variable is a name for a value stored in your computer. This value can be a number, text, or a piece of code. When you create a variable, you can refer to it later in your program to retrieve whatever information it stores. In Python, you create (or assign) variables using the `=` symbol; to the left of the `=` is the name of the variable and to the right its value. 

In [134]:
var = 33

In [135]:
another_var = "Joao"

Knowing how to name your variables is _extremely_ important. After all, code is meant to be read. As such, both your future self and your colleagues will appreciate it if you put effort into creating clear and informative variable names.

In Python, naming variables follows two simple rules: (1) variable names cannot start with a digit (e.g. `1variable`) and (2) they can only contain letters, digits, and underscores. Besides these two rules, Python does not care what you call your variables. People do, however. There are several conventions for naming variables in the Python community, namely those enshrined in PEP 8, the so-called "official" Python Style Guide. Its authors _suggest_ that you use lowercase letters for variable names and underscores to separate words, e.g. `first_name` rather than `FirstName`.
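
If you are unsure whether a name follows these rules, Python can check for you. The sketch below uses the string method `isidentifier()`; the names being tested and the variables holding the results are made up for this illustration. Keep in mind that it only checks the two rules above, not whether the name is informative.

```python
# str.isidentifier() returns True when the text is a valid Python name.
ok_name = "first_name".isidentifier()
starts_with_digit = "1variable".isidentifier()
has_hyphen = "first-name".isidentifier()

print(ok_name)            # True
print(starts_with_digit)  # False: names cannot start with a digit
print(has_hyphen)         # False: a hyphen is not a letter, digit, or underscore
```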

Regardless of these suggestions, the most important thing is to be _consistent_ and to follow whatever conventions others working with you already use. Again, code is meant to be read. The easier someone reads it, the faster they'll understand it, use it, and change it. Use simple and meaningful names. Abbreviations are also fine, as long as they are obvious to the reader: e.g. `idx` instead of `index`, `res` for `residue`, and `aa` for `aminoacid`.

Lastly, there are a few words that you cannot use as variable names, because they represent special commands or _statements_ in Python. These include `if`, `for`, and `with`. Trying to assign a variable with one of these names will result in an error:

In [3]:
if = "Will this work?"

SyntaxError: invalid syntax (<ipython-input-3-d3ac3f1b5d05>, line 1)
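
You do not need to memorize the reserved words. As a minimal sketch, the standard library's `keyword` module can tell you whether a name is reserved (importing libraries is covered later in this lesson; the result variables below are hypothetical names chosen for this example):

```python
import keyword

# keyword.iskeyword() returns True for reserved words such as if, for, with.
if_is_reserved = keyword.iskeyword("if")
age_is_reserved = keyword.iskeyword("age")

print(if_is_reserved)   # True
print(age_is_reserved)  # False

# keyword.kwlist holds every reserved word; its length varies by Python version.
print(len(keyword.kwlist))
```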

Let's rename our variables appropriately then:

In [136]:
age = 33
first_name = "Joao"

While the computer always knows what a certain variable represents, we readers don't. In programs with thousands of lines and complex operations, it is easy to lose track of what the current value of a specific variable is. Enter the `print()` function. We will learn more about functions later but for now, consider a function as something that takes values (input) and returns transformed values (output). In Python, the input values passed to a function are called arguments.

The `print()` function exists to output things as text. Its arguments can either be variables or values themselves.

In [26]:
print("Joao")

Joao


In [27]:
print(first_name)

Joao


This last example highlights an important feature of Jupyter notebooks that is worth remembering. When you declare a variable in one cell and evaluate that cell, all other cells in the notebook can access that variable. In other words, the order in which _cells are written_ in the notebook is not important. What matters is the order in which _cells are executed_. Regardless, do yourself and others a favor and keep a nice logical flow in your notebooks!

The `print()` function can also be used to output multiple variables or values, allowing us to write full sentences. In this case, the function adds a space between each output item to separate them. In addition, by default, it also adds a line break, or newline character, after the last item.

In [9]:
print(first_name, 'is', age, 'years old')

Joao is 33 years old
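
Both defaults can be changed. The sketch below assumes the same `first_name` and `age` variables and uses the `sep` and `end` keyword arguments of `print()` to replace the separating space and the trailing newline:

```python
first_name = "Joao"
age = 33

# sep replaces the single space placed between items.
print(first_name, age, sep=", ")

# end replaces the newline added after the last item,
# so the next print() continues on the same line.
print("No newline here", end="")
print(" ...and this follows directly.")
```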


As expected, trying to print a variable that we did not define before will result in an error. The computer does not know what that name points to! This error is quite common, for beginners and veteran programmers alike, and often arises from typos in variable names. Make sure to always double check your code when you get an error - look specifically for misplaced underscores and for case swaps. Variable names in Python are case sensitive:

In [30]:
print(First_Name)

NameError: name 'First_Name' is not defined

Variables can also be assigned to other variables, instead of values. While this might seem confusing - why would you want two variables pointing to the same piece of information? - it comes in handy when dealing with functions and different scopes. These are concepts we will learn later in the lesson.

__Exercise__

Look at the following piece of code and run it in your head. What do you think is the final value of `x`?

(A) `1`

(B) `42`

(C) `"three"`

In [36]:
x = 1
y = "three"

x = y

y = 42

In [37]:
print(x)

three


In many programming languages, creating a variable creates a space in the computer's memory that stores a particular value. In Python, however, variables are just _pointers_ to _objects_ that store values themselves. What this means is that in Python, variables do not store values but simply point to _something_ that stores a value. This is the reason why the code in the previous cell works the way it does.

First, we define `x` and `y` as pointing to `1` and `"three"`, respectively. Then, we assign `y` to `x`, effectively saying "x is now pointing to what y is pointing to". As such, pointing `y` to a new value after this does not impact the value of `x`: `x` keeps pointing to whatever `y` was pointing to before, and `y` points to something new.
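
One way to see this for ourselves is the `is` operator, which checks whether two names point to the very same object. The sketch below repeats the exercise with two extra variables (`same_before` and `same_after` are names made up for this illustration):

```python
x = 1
y = "three"

x = y                   # x now points to the object y points to
same_before = x is y    # True: both names share one object

y = 42                  # y points to a new object; x is untouched
same_after = x is y     # False

print(same_before, same_after)  # True False
print(x, y)                     # three 42
```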

## Data Types

Now that we got the hang of defining variables and printing values, let's go back to our `age` and `first_name` variables. What is the difference between the declarations of these two variables?

The variable `age` was assigned to 33, without quotes, while `first_name` was assigned to a value inside quotes. This syntax tells Python that `age` points to an _integer_ while `first_name` points to a _string_. Integers and strings are two of the several data types built into Python.

Going slightly more in-depth, Python is what is called a _dynamically-typed_ language, in contrast to _statically-typed_ languages like C. Variables themselves do not have types, because they are pointers to objects. As such, at different points in time, the same variable can point to different objects, like we saw in the example above. Objects, however, _do_ have types, which we can check with the `type()` function:

In [34]:
print(type(age))

<class 'int'>


In [35]:
print(type(first_name))

<class 'str'>


In [158]:
print(type(type(first_name)))

<class 'type'>


So, what are the basic data types defined in Python?

In [57]:
an_int = 33  # whole numbers
a_str = "thirty-three"  # characters or text
a_float = 33.0  # real numbers (decimals)
a_complex = 2j + 1  # complex numbers (real + imaginary)
a_bool = True  # True or False
a_nonetype = None  # special object indicating a null value

The different data types have different purposes, properties, and support different operations. In most cases, mixing data types results in an error, e.g. adding an integer to a string.

In [138]:
result = an_int + a_str

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Let's have a look at each data type in more detail.

### NoneType
The NoneType is a special object in Python that represents a "null" value. You will often find it as the default return value of functions. For example, `print()` does not return anything. If we assign its return value to a variable and print it, we get `None` back.

In [114]:
result = print('banana')
print(result)

banana
None


### Boolean

Booleans are simple types with two possible values - `True` or `False` - which are usually used together with conditional statements, which we will cover later in this lesson. For now, it is important to remember that every single Python object evaluates to a boolean value:

* The number 0 (integer or float), any empty sequence (including empty strings), and the `None` value __always__ evaluate to `False`
* Anything else evaluates to `True`

In [113]:
print(bool(0.0))
print(bool(""))
print(bool(123))
print(bool(" "))

False
False
True
True
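
A few more cases are worth trying, since some of them surprise newcomers. This is only a small sketch; the variable names are hypothetical and chosen for illustration:

```python
none_value = bool(None)      # None always evaluates to False
zero_int = bool(0)           # so does the integer 0
negative = bool(-1)          # any non-zero number is True, even negative ones
text_false = bool("False")   # a non-empty string is True, whatever it says

print(none_value, zero_int, negative, text_false)  # False False True True
```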


### Numerical Types

Integers and floats support all basic arithmetic operations: addition, subtraction, multiplication, and division, as well as others. You can combine all these operators in long expressions, allowing you to code more complex equations. Python arithmetic follows standard operation precedence rules: exponentiation before multiplication/division before addition/subtraction. You can use parentheses to force operations to take precedence, just like in regular pen-and-paper math.

In [159]:
i = 2
j = 3

print('sum:', i + j)
print('subtraction:', i - j)
print('multiplication:', i * j)
print('true division:', i / j)  # note the change to a float type
print('exponentiation:', j ** i)
print('integer division:', i // j)
print('modulus:', i % j)

sum: 5
subtraction: -1
multiplication: 6
true division: 0.6666666666666666
exponentiation: 9
integer division: 0
modulus: 2
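
To make the precedence rules concrete, here is a short sketch with arbitrary numbers, showing how parentheses change the result of the same sequence of operators:

```python
# Exponentiation first (4 ** 2 = 16), then multiplication (3 * 16 = 48),
# then addition (2 + 48 = 50).
no_parens = 2 + 3 * 4 ** 2

# Parentheses force the addition and multiplication to happen first:
# (2 + 3) = 5, 5 * 4 = 20, 20 ** 2 = 400.
with_parens = ((2 + 3) * 4) ** 2

# Exponentiation also binds tighter than a leading minus sign.
negated_square = -2 ** 2      # read as -(2 ** 2)
squared_negative = (-2) ** 2

print(no_parens, with_parens)            # 50 400
print(negated_square, squared_negative)  # -4 4
```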


As seen above, some operations on integers return floating point numbers. As a general rule, numerical operations that mix float and integer types return floats. This includes integer division and modulus, which return a float whenever one of the operands is a float (e.g. `7.0 // 2` is `3.0`). Operations between two integers return integers, with two exceptions: true division (`/`), which always returns a float, and exponentiation with a negative exponent (e.g. `2 ** -1` is `0.5`).

In [60]:
result = 23 + 6.5
print(result, type(result))

29.5 <class 'float'>
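
Integer division and modulus follow the types of their operands, which a quick check makes visible. The numbers below are arbitrary and the variable names are made up for this sketch:

```python
int_floor = 7 // 2        # both operands are ints, so the result is an int
float_floor = 7.0 // 2    # one float operand makes the result a float
int_mod = 7 % 2
float_mod = 7.0 % 2
negative_power = 2 ** -1  # ints only, but the result is a float

print(int_floor, type(int_floor))      # 3 <class 'int'>
print(float_floor, type(float_floor))  # 3.0 <class 'float'>
print(int_mod, float_mod)              # 1 1.0
print(negative_power)                  # 0.5
```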


__Exercise__

Complete the following code to calculate the euclidean distance between two points, knowing that the formula is:
$\sqrt{(x1 - x2)^2 + (y1 - y2)^2}$

In [195]:
x1 = 0.0
y1 = 0.0
x2 = 1.0
y2 = 1.0

# euclidean_distance = _____
euclidean_distance = ((x1-x2)**2 + (y1-y2)**2)**0.5
print(euclidean_distance)

1.4142135623730951


### Strings
Strings are an example of a so-called _sequence_ type. Unlike numerical types like ints and floats, which represent one value, strings are actually a collection of multiple characters. Moreover, they are ordered, which means we can access each character individually by providing a positional index.

To access elements of a string, we use square brackets. Inside the square brackets, we write the positional index of the characters we want to extract. **In Python, the first index in any sequence is zero, not one.**

In [93]:
atom_name = 'helium'

In [94]:
first_letter = atom_name[0]
print(first_letter)

h


We can access multiple characters, i.e. extract a substring, using the slice notation. Slices are also objects in Python and contain three elements, _start_, _end_, and _step_, separated by `:` characters. Importantly, slices are __not inclusive__ on the end index. In other words, the character at the position defined by the _end_ element will not be included in the output substring.

In [95]:
substr = atom_name[0:3]
print(substr)

hel
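
Because slices are objects, we can also create one explicitly with the built-in `slice()` function and reuse it. In this sketch, `first_three` is a hypothetical name; note that Python itself calls the end element `stop`:

```python
atom_name = 'helium'

# slice(0, 3) is the object behind the notation [0:3].
first_three = slice(0, 3)
substr = atom_name[first_three]

print(substr)  # hel

# The three elements are available as attributes; an omitted step is None.
print(first_three.start, first_three.stop, first_three.step)  # 0 3 None
```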


If you want to slice from the beginning of the string, or until the end of the string, you can omit the start or end element respectively. Omitting both returns a copy of the original string.

In [97]:
print(atom_name[:3])
print(atom_name[0:])
print(atom_name[:])

hel
helium
helium


The _step_ element of slices allows us to return only every n-th element of the string. By default, the step is 1, but we can change it:

In [108]:
print(atom_name[::2])

hlu


An interesting feature of Python sequences is the support for _negative_ indexes. What do you think the following code returns?

In [109]:
print(atom_name[-1])

m
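
Negative indexes count from the end of the sequence: `-1` is the last element, `-2` the one before it, and so on. A brief sketch of how they relate to the length of the string (the variable names are illustrative):

```python
atom_name = 'helium'

last = atom_name[-1]
# The same character, counted from the start instead.
also_last = atom_name[len(atom_name) - 1]

# Negative indexes work inside slices too: the final three characters.
last_three = atom_name[-3:]

print(last, also_last)  # m m
print(last_three)       # ium
```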


__Exercise__

Can you think of a way to combine negative numbers and slicing to reverse a string (invert the order of all characters)?

In [107]:
reverse = atom_name[::-1]
print(atom_name, "reversed is", reverse)

helium reversed is muileh


Finally, some arithmetic operators work on strings, but they obviously do not do math. Adding two strings joins them, i.e. string concatenation. Multiplying a string by an integer, on the other hand, repeats the string _n_ times. Subtraction, division, and any other of the previously demonstrated operators will give an error if applied to strings.

In [118]:
result = 'badger' + 'badger'
print(result)

badgerbadger


In [117]:
result = 'badger' * 10
print(result)

badgerbadgerbadgerbadgerbadgerbadgerbadgerbadgerbadgerbadger


In [110]:
result = a_str - 3

TypeError: unsupported operand type(s) for -: 'str' and 'int'

You can, however, convert certain strings to ints or floats. The reverse, converting ints or floats to strings, always works.

In [122]:
three = "3"
print(int(three) + 3)
print(float(three) + 0.1)

6
3.1


In [133]:
big_int = 122333444455555666666777777788888888
big_str = str(big_int)
print(big_str[0:5])

12233


### Type Conversions

You can forcefully convert (or cast, or coerce) one type into another, which sometimes comes in handy. There are, however, some caveats to keep in mind.

In [147]:
result = 23 + 6.5
print('result is', result, 'which cast as an int is', int(result))

result is 29.5 which cast as an int is 29
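
The caveat here is that `int()` does not round: it simply drops the fractional part. The sketch below, with arbitrary numbers, contrasts it with the built-in `round()` function:

```python
truncated_positive = int(29.9)    # the .9 is discarded, not rounded up
truncated_negative = int(-29.9)   # truncation goes toward zero
rounded = round(29.9)             # round() picks the nearest integer instead

print(truncated_positive, truncated_negative, rounded)  # 29 -29 30
```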


In [148]:
print(int("3"))

3


In [149]:
print(int("three"))

ValueError: invalid literal for int() with base 10: 'three'

Boolean values also have numerical representations.

In [153]:
print("True is", int(True), "and False is", int(False))

True is 1 and False is 0


This leads to some odd operations that nevertheless come in very handy sometimes:

In [155]:
print(True + True)

2
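
One place where this comes in handy is counting how many conditions hold. The example below is a made-up illustration: the `temperature` value and the thresholds are arbitrary, and comparisons such as `>` are covered in more detail later.

```python
temperature = 25

# Each comparison produces a boolean.
is_above_zero = temperature > 0     # True
is_warm = temperature > 20          # True
is_hot = temperature > 30           # False

# Adding booleans counts the True values (True is 1, False is 0).
conditions_met = is_above_zero + is_warm + is_hot

print(conditions_met)  # 2
```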


__Exercise__

What is the output of the following line of code? What does it actually do?

In [187]:
print("Fractional string to int:", int("3.4"))

ValueError: invalid literal for int() with base 10: '3.4'

How could you write the code above to output the properly converted string?

In [157]:
print("Fractional string to int:", int(float("3.4")))

Fractional string to int: 3


### What about complex numbers?

Complex numbers are, well, more complex. They have real and imaginary parts. To access these, we have to understand a bit more about the internals of Python. As mentioned before, Python variables point to _objects_ and in fact, everything in Python is an object. Values are objects, functions are objects, even types themselves are objects.

So what are objects? Objects are entities that contain not only data (the values) but also associated metadata (attributes) and/or functionality (methods). In Python, an object's attributes and methods can be accessed using the _dot notation_:

In [90]:
a_complex = 2 + 3j
print("Our complex number is", a_complex.real, " + ", a_complex.imag, "j")

Our complex number is 2.0  +  3.0 j
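
The example above reads two attributes. Methods are accessed with the same dot notation, but they are called with parentheses, like functions. A short sketch using two methods that ship with Python, `conjugate()` on complex numbers and `upper()` on strings:

```python
a_complex = 2 + 3j
atom_name = 'helium'

# Attributes hold data: no parentheses needed.
real_part = a_complex.real

# Methods perform an action and return a result: call them with ().
conjugate = a_complex.conjugate()
shouted = atom_name.upper()

print(real_part)   # 2.0
print(conjugate)   # (2-3j)
print(shouted)     # HELIUM
```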


## Built-in Functions

If everything in Python is an object, then everything must have attributes and methods. How can we know what attributes and methods each object has then, without memorizing every single one of them?

Python is known as a programming language that comes with _batteries included_. What this means is that besides the barebones of the language, like statements and data types, a standard Python installation comes with a wealth of so-called _built-in functions_ and _libraries_ that simplify many operations. Let's focus on these built-in functions first.

We have already seen several such functions: `print()`, `type()`, `float`, `int`, etc. These come for free with Python and enable us to output text, check the data type of a certain object, and convert types. Another very useful built-in function is `help()`, which allows us to access the documentation written by the creators of a particular object and learn about it. Let's try it on our complex number.

In [None]:
help(a_complex)

In Jupyter notebooks, you can also access the help documentation by typing the object name followed by a question mark. This does not show the full help text but only a summary. You can also display this text by placing the cursor anywhere in the object name, holding down `shift`, and then pressing `tab`.

In [165]:
a_complex?

Type:        complex
String form: (2+3j)
Docstring:
Create a complex number from a real part and an optional imaginary part.

This is equivalent to (real + imag*1j) where imag defaults to 0.
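
If you only want the names of an object's attributes and methods, rather than their documentation, the built-in `dir()` function returns them as a list of strings. A minimal sketch; the `in` operator used here simply checks whether a name appears in that list, and the exact contents of the list may vary between Python versions:

```python
a_complex = 2 + 3j

# dir() lists the names of all attributes and methods of an object.
names = dir(a_complex)

has_real = 'real' in names
has_conjugate = 'conjugate' in names
has_upper = 'upper' in names   # upper() belongs to strings, not complex numbers

print(has_real, has_conjugate, has_upper)  # True True False
```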


There are many other built-in functions in Python. Googling "Python built-in functions" will show you the official documentation page that lists all of them. Let's go over a few important ones:

- `len` works on _sequence_ type objects, such as strings, and returns the number of elements (the length) in the sequence. In a string, it returns the number of characters.

In [169]:
print(first_name, "has", len(first_name), "letters")

Joao has 4 letters


- `round` works on integer and float types and rounds a number to _n_ decimal places, by default 0. You can change this default value by giving the number of decimal places as the second argument.

In [171]:
pi = 3.14159
print(round(pi))
print(round(pi, 2))

3
3.14
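
One caveat worth knowing: when a value lies exactly halfway between two options, Python 3 rounds to the nearest _even_ choice rather than always rounding up. In addition, some decimals that look like they are halfway are not stored exactly as floats. The values below are arbitrary examples:

```python
half_to_even_down = round(2.5)   # halfway between 2 and 3: the even one is 2
half_to_even_up = round(3.5)     # halfway between 3 and 4: the even one is 4

# 2.675 is stored as a float slightly below 2.675, so it rounds down.
not_quite_half = round(2.675, 2)

print(half_to_even_down, half_to_even_up)  # 2 4
print(not_quite_half)                      # 2.67
```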


- `abs` returns the absolute value of an integer or a float.

In [174]:
print(abs(-3))

3


- `min` and `max` return the minimum and maximum of a sequence of values. They also work on sequence objects, like strings, by returning the min/max element.

In [180]:
print(min(1, 2, 3, 4))
print(max(1, 2, 3, 4))

1
4


In [181]:
print(min('a', 'b', 'c', 'd'))
print(max('a', 'b', 'c', 'd'))

a
d


In [184]:
print(min('a', 'A', 'b', 'B'))
print(max('a', 'A', 'b', 'B'))

A
b


In [186]:
my_str = 'abc'
print(max(my_str))

c


Although both functions support strings and integers, we cannot mix them. Python does not know how to compare them relative to each other, so it throws an error.

In [185]:
print(min('A', 1))

TypeError: '<' not supported between instances of 'int' and 'str'

__Exercise__

Predict what the following calls to `print()` will output. Does the result make any sense? 

In [189]:
rich = 'gold'
poor = 'tin'
print(max(rich, poor))

tin


In [191]:
print(max(len(rich), poor))

TypeError: '>' not supported between instances of 'str' and 'int'

## Coffee Break (15 minutes)

## Libraries

In [None]:
# Write the euclidean distance with functions from the math module.
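
One possible way to approach this, offered only as a sketch: the standard library's `math` module provides `sqrt()` and `hypot()`, and either can replace the `** 0.5` used earlier. The points are the same as in the earlier exercise, and the names `distance_sqrt` and `distance_hypot` are made up for this example.

```python
import math

x1, y1 = 0.0, 0.0
x2, y2 = 1.0, 1.0

# Option 1: the formula as written, with math.sqrt for the square root.
distance_sqrt = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

# Option 2: math.hypot takes the two differences and does the rest.
distance_hypot = math.hypot(x1 - x2, y1 - y2)

print(distance_sqrt)
print(distance_hypot)
```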