# Welcome

Welcome to the pre-course tutorial for the Scientific Programming course. It is intended for people with little or no experience in programming, and no experience in Python. But you can also use it to remind yourself of the syntax and basics of the language if you used it in the past.

We will go through the basics of the language and what we can do with it. Please make sure that you understand what is going on at every point in the tutorial: you will need this to progress in the Scientific Programming course at a normal speed.

You are looking at this document in one of two ways. The first is on https://github.com/antopolskiy/sciprog, which is the repository for the course materials. There, this is a static document: you can't do anything with it, only look at it. If this is the case, you need to switch to the second way: opening it locally on your computer, where you can run and edit the code. To do this, you need a distribution of Python 3.5 or higher on your machine. You can get it by installing the <a href="https://www.continuum.io/downloads">Anaconda distribution of Python</a> (a distribution which includes most of the packages we will be working with during this course). I strongly suggest installing it instead of "vanilla" Python (please install Anaconda for Python 3.6, not Python 2.7; you can install Python 2.7 alongside it later if you need it).

If you already have Python and don't want to install Anaconda, follow the guidelines <a href="https://jupyter.readthedocs.io/en/latest/install.html">here</a>.

After you have installed Anaconda (or just the `notebook` module in your Python distribution), run `jupyter notebook` in your OS console/terminal ("Command Prompt" in Windows). It will start a notebook server and open a web browser with the Jupyter Dashboard (even though notebooks appear in the browser, they are not online -- they are just rendered in the browser). The folder in which the Dashboard opens is the one in which your console starts by default (e.g. in Windows it is most likely `C:\Users\<username>`). Download this document: go to https://github.com/antopolskiy/sciprog, click the green `Clone or Download` button on the right, and choose `Download ZIP`: this will download all course materials as a `ZIP` archive. Unpack the archive and put the contents in a folder that you can navigate to in your Jupyter Dashboard. Open this notebook (`000_pre_course_tutorial_annotated.ipynb`) and follow the further instructions here.

What you're looking at is called a Jupyter Notebook (formerly IPython Notebook). In a nutshell, it is an environment which allows a very dynamic interaction with the programming language (usually with Python, but you can also make it work with other languages, like R). For now, what you need to know is that each cell contains either **Markdown** (like this cell), which is used for text, or **Code** (like the next few cells), which is used for snippets of code or even whole scripts. You can execute the content of a cell by selecting it and pressing *Shift+Enter*. When you do this, the output of the code will appear just below the cell (as you will see very soon). **If you're looking at this notebook on your own, I strongly suggest executing every code cell, not just looking at the code**. You can also change the code however you like and then execute it again to better understand what is happening.

If you have any questions, please ask them on our Slack forum http://sciprog.slack.com (you can sign up with an @sissa.it email; if you don't have one, send an email to the course instructor).

# Using cells

You can do simple computations in cells and get output. Try running the following cells:

In [None]:
2+5

In [None]:
34+154-12

In [None]:
0.5+1.7

In [None]:
# this is a comment, it allows you to add descriptions to your code
# comments are prefaced by "#" symbol

# this will not run and doesn't give output:
# 0.5+1.7

# Variables

Variables are like pointers -- they point to a certain object. You can assign an object to a variable with `=` and then you can retrieve this object using the variable.

Variables can have any name. When you're writing a script, you want your variables to have **descriptive** names. This will make it easier to read and understand your code, which is important not only if you share your code with others, but also when you come back to some code after a period of time.

In [None]:
# example of non descriptive names
x = 24
y = 60
c = x*y
c

In [None]:
# example of descriptive names
hours_in_day = 24
minutes_in_hour = 60
minutes_in_day = hours_in_day*minutes_in_hour
minutes_in_day

Yes, it means writing a bit more, but it will save you time in the long run.

## Order of operations

Just like in math, in programming the order of operations matters. For example, consider the following code:

In [None]:
(1 + 1/2) * (9-1)

Just like in math, the expressions in parentheses are computed first, and multiplication and division take precedence over addition and subtraction.

In [None]:
2 + 2 * 2

In [None]:
(2 + 2) * 2

Assignment is *always* performed last.

In [None]:
# example 
x = 2
y = x + 3
y

In the line `y = x + 3`, the expression to the right of the equals sign (which denotes the assignment operation) is evaluated first: `x + 3`. Then this value is assigned to the variable `y`.
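
Because the right-hand side is evaluated before the assignment happens, the same variable can appear on both sides of `=`. A minimal sketch (the name `counter` is just an illustration):

```python
counter = 10
# the right-hand side (counter + 1) is computed first, using the old value 10;
# only then is the result assigned back to counter
counter = counter + 1
print(counter)  # prints 11
```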

# Functions and operators

Functions perform actions on objects. Sometimes they return something. In Python, functions are **always** called with parentheses `()`, in which you put the input (also called an "argument"). Even if the function doesn't take any arguments, you still write `()` at the end. So when you see a name followed by `()`, you know it is a function call.

A few examples of useful functions:

In [None]:
# this function prints its content; useful for displaying 
# intermediate results and catching errors (debugging)
print('I AM PRINTED!')

In [None]:
# this function returns the absolute value of a number
abs(-346)

In [None]:
# this function returns the length of an object, for example a string
len('This string has 29 characters')

Operators are denoted with mathematical signs (`+`, `-`, `*`, etc.) and usually perform actions we are already familiar with, such as addition, subtraction, multiplication, etc. However, it is important to understand that *operators are functions in disguise*. The signs are merely shortcuts for functions.
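
To see what "functions in disguise" means, here is a small sketch using the standard-library `operator` module, which provides function versions of the operators (the `import` statement is explained in the section on imports):

```python
import operator

# each pair of lines computes the same thing
sum_with_sign = 2 + 5
sum_with_function = operator.add(2, 5)

product_with_sign = 3 * 4
product_with_function = operator.mul(3, 4)

print(sum_with_sign, sum_with_function)          # prints 7 7
print(product_with_sign, product_with_function)  # prints 12 12
```

In everyday code you would of course use the signs; the function form is shown only to make the equivalence visible.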

Some operators used in programming languages are equivalent to mathematical operators. Others are not: for example, raising to a power is written as a double multiplication sign `**`:

In [None]:
2**8

An important group of operators are **comparison operators**, which allow us to compare values. The main ones are less than `<`, greater than `>`, equal `==`, less than or equal `<=`, greater than or equal `>=`, and not equal `!=`. When used on values that can be compared, they return either `True` or `False`.

In [None]:
4 > 2

In [None]:
# note that comparison happens after expressions on both sides are evaluated
9 == 8 + 1

In [None]:
4 != 4

In [None]:
# you can also compare variables
x = 8
y = 0.7
x < y

# Data types

There are several primary data types, which you should learn from the beginning:
- `int` -- integer, whole number, e.g. 1, 5, -9, 0, 25325, etc.
- `float` -- decimals ("float" short for "floating point number"), e.g. 3.6, 8.7, 1.23452, -0.36245, etc.
- `str` -- character string, i.e. a sequence of characters (a string written directly in code is also called a "string literal"). E.g. 'hello world', 'SFISUBDFW&#@^$(', etc.
- `bool` -- "boolean", a binary variable, which has value of either `True` or `False`

Depending on the data type, the same operators (`+`, `-`, `*`, etc.) may have different effects:

In [None]:
# sum of ints
5+11

In [None]:
# sum of floats
0.5+0.2

In [None]:
# "sum" of strings (=concatenation)
'hello ' + 'world'
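
The same holds for `*`. As a small illustration (not one of the original cells), multiplying a string by an integer repeats the string instead of doing arithmetic:

```python
# product of ints
print(3 * 4)        # prints 12

# "product" of a string and an int (= repetition)
print('ab' * 3)     # prints ababab
```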

If you use an inappropriate operator, you will receive an error: `TypeError: unsupported operand type(s)`

In [None]:
# "difference" of strings doesn't make sense
"hello " - "world"

You can look up the type of any object by using the function `type()`, like so:

In [None]:
type(1)

In [None]:
type(True)

In [None]:
x = 'this is a string'
type(x)

**Note 1**: in Python, for denoting strings you can use either single quote `'` or double quote `"` to the same effect. However, it must be the same on both sides. Example:

In [None]:
'this string will work'

In [None]:
"this string will work"

In [None]:
"but this string will not work'

**Note 2**: Strings can contain numbers, but you won't be able to do computations with them, because the language doesn't understand that these are numbers: it treats them as characters. Example:

In [None]:
# sum of numbers
5 + 7

In [None]:
# "sum" of strings (=concatenation)
'5' + '7'

**Note 3**: `bool` is treated as a number if you try to perform numerical operations on it: `True` is `1`, `False` is `0`.

In [None]:
# same as 1 + 1
True + True

In [None]:
# same as 1*5 + 0
True*5 + False

# Type conversions

You can convert from one data type to another. It can be done in many ways (depending on the circumstances), but the simplest is to use the function with the same name as the type you want to convert to, e.g. `str()` if you want to convert to a string, `int()` if you want to convert to an integer, etc. A few examples:

In [None]:
x = '56'
y = '34'
x + y

In [None]:
int(x) + int(y)

In [None]:
# bool value of most objects is True; 0 and empty objects (e.g. the empty string '') are False
bool('hello')

In [None]:
bool(-2345)

In [None]:
bool(0)

In [None]:
# if you try to convert something that cannot be converted, Python will raise an error
int('sdg43')
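
A few more conversions that come up often, shown as a sketch with made-up values. Note that `int()` applied to a float drops the fractional part rather than rounding:

```python
price_text = '3.75'            # a number stored as a string
price = float(price_text)      # convert string -> float
print(price + 1)               # prints 4.75

print(int(3.9))                # prints 3 (the fraction is dropped, not rounded)
print(str(56) + str(34))       # prints 5634 (numbers -> strings, then concatenation)
```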

# Container data types

Single numbers and strings are nice, but frequently we have a lot of them and it would be extremely inefficient to assign each to a separate variable. Hence we have container data types, which allow us to assign a bunch of objects to a single variable.

## `List`

The simplest container is a `list`, which in Python is denoted with square brackets `[]`.

In [None]:
my_list = [1,2,3,4,9,-1,0]
my_list

Lists can contain objects of different types:

In [None]:
list_of_different_things = [345,-0.34,'hello',True,False,23,0.34,12,'ok enough']
list_of_different_things

We can get individual items from a list by using *indexing*, that is, providing the index of the element we want to retrieve. In Python (as opposed to Matlab) indexing starts with 0. The indexing operation is also performed with square brackets `[]` (as opposed to parentheses `()` in Matlab).

In [None]:
# this gives you the 1st element of the list (the leftmost one)
list_of_different_things[0]

In [None]:
# this gives you the 3rd element of the list
list_of_different_things[2]

**Side note**: At first it might seem confusing to start indexing with 0, especially if you're used to starting with 1 (like in Matlab). This was not done just to complicate things: 0-based indexing actually makes sense in many circumstances, and we will see this later. For now, just accept that at first you will be confused that `my_list[2]` gives you the 3rd element, not the 2nd; this will pass and you will appreciate 0-based indexing soon enough.

You can also slice lists using `:` and get whole sections of the list, like so:

In [None]:
list_of_different_things[3:6]
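
Note that the slice `[3:6]` includes the element at index 3 but stops before index 6. A small sketch with a hypothetical list whose values equal their own indices makes this easy to see:

```python
# each value equals its index, so the output shows which indices were taken
numbers = [0, 1, 2, 3, 4, 5, 6, 7]

print(numbers[3:6])       # prints [3, 4, 5]: start is included, stop is excluded
print(len(numbers[3:6]))  # prints 3, i.e. stop - start
```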

If you leave one of the sides empty, Python will assume that you meant the beginning or the end of the list:

In [None]:
# slice from 4th element to the end
list_of_different_things[3:]

In [None]:
# slice up to 6th element
list_of_different_things[:6]

You can also specify the step when slicing, e.g. you can take every second element like this:

In [None]:
# from 2nd to 6th element, give every second element
list_of_different_things[1:6:2]

In [None]:
# from all elements of the list, give me every third element
list_of_different_things[::3]

# Loops

Loops are important when you want to do the same operation on many different objects. As a simple example, let's print something several times. We use a `for` loop to do it:

In [None]:
for i in [1,2,3,4,5]:
    print(i)

Please note several things.

First, when we create a `for` loop, we need to specify a variable which iterates through some values. In the example above, the variable `i` iterates through the list `[1,2,3,4,5]`. On each iteration, `i` takes one of the values from that list, each of them only once. The order is the same as in the list. That is why in the output you see the same values as in the list.

Second, the syntax of the `for` loop. We start with `for`, then specify the iterating variable (`i`), and then specify which values it will iterate through using `in`. Then we must put a colon `:`. Notice that the next line has an **indentation**: it is shifted to the right by 4 spaces (the usual convention; what matters is that all lines inside the loop are indented by the same amount). **Indentation is mandatory in Python!** This is how the language understands what is the content of the loop and what is outside the loop. Consider the following two examples and note the differences in the code and in the output:

In [None]:
x = [1,2,3,4,5]
for i in x:
    print(i)
    print('This is the end')

In [None]:
x = [1,2,3,4,5]
for i in x:
    print(i)
print('This is the end')

Note the difference: in the first case `print('This is the end')` is **inside** the loop and thus is printed on every iteration. In the latter case, it is outside the loop and printed only once when the loop is done.

**For Matlab users**: In Python, as opposed to Matlab, there is no `end` statement at the end of the loop. You just shift the indentation back to "normal" and continue writing code.

In [None]:
# another example of the for loop
names = ['Alessandro','Ehsan','Federica','Adina']
for name in names:
    print(name + ' is not here!')

**Pro-tip**: As we will learn later, explicit loops in languages like Python, R and MATLAB are slow compared to built-in and vectorized operations. When working with large datasets, we will try to avoid loops as much as possible.
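
As a small illustration of replacing a loop with a single operation, the sketch below adds up a list of made-up numbers twice: once with an explicit loop and once with the built-in `sum()` function. Both give the same result; the second form is shorter and leaves the looping to Python itself.

```python
measurements = [4, 8, 15, 16, 23, 42]

# version 1: explicit loop with an accumulator variable
total = 0
for value in measurements:
    total = total + value
print(total)               # prints 108

# version 2: built-in function, no explicit loop
print(sum(measurements))   # prints 108
```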

# `if` - `else`
Sometimes we want to do something in our script when a certain condition is met. Consider the following example. I want to output whether a person won a blackjack hand, depending on the score that she got and the casino got. To do it, I must check whether the score of the player exceeds that of the casino:

In [None]:
player_score = 20
casino_score = 19
if player_score > casino_score:
    print('You won! Congratulations!')
else:
    print('You lost!')

But what if the player's score exceeds 21? Then the player loses as well. To check several conditions one after another, we can use `elif`:

In [None]:
player_score = 20
casino_score = 19
if player_score > 21:
    print('You lost!')
elif casino_score > 21:
    print('You won! Congratulations!')
elif player_score > casino_score:
    print('You won! Congratulations!')
else:
    print('You lost!')

Note that `if-else` always needs a `bool` to decide whether to execute: the block is executed if the value is `True` and skipped if it is `False`. Comparison operators return exactly that, a `bool`, which you can check:

In [None]:
player_score > 21

In [None]:
type(player_score > 21)

**Note**: `if-else` also uses indentation (4 spaces or 1 tab) to understand what belongs to it and what is outside, just like the `for` loop.
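
Loops and conditions are often combined. The sketch below reuses the blackjack rules from above and checks several made-up player scores against one casino score. Note the two levels of indentation: one for the loop and one for the `if` blocks.

```python
casino_score = 19
wins = 0   # counts how many players won

# hypothetical scores of three players
for player_score in [20, 23, 17]:
    if player_score > 21:
        print(player_score, 'lost')
    elif casino_score > 21:
        print(player_score, 'won')
        wins = wins + 1
    elif player_score > casino_score:
        print(player_score, 'won')
        wins = wins + 1
    else:
        print(player_score, 'lost')

# this line is outside the loop, so it runs only once
print(wins)   # prints 1
```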

# Creating functions

We talked a bit about functions like `print` and `type`. These are examples of *in-built* functions: they always come with Python, and you can use them at any time. But you can also create your own functions. Here is the syntax:

    def function_name(input_1, input_2, ...):
        <whatever the function does is here>
        return <some value or variable that you want your function to return>
        
There are 2 main reasons to create your own functions: 

**First** is captured in the *DRY* abbreviation: Don't Repeat Yourself. Basically, you should always avoid copy-pasting the same code in several places in your script (we can discuss why this is important, if you want). Instead, take that code, put it in a function (or several functions) and use this function whenever necessary. 

**Second** is modularity: when a piece of code does one clearly defined thing, it is sometimes worth separating it out into a function even if you only use it once. This will improve the readability of your code, which is especially important for long projects and code that you might use for several months or even years.

**Rule of thumb**: Investment of time in making your code more modular and more readable *always* pays off in the long run.

Let's write a simple function which calculates the discriminant of the quadratic equation $ax^2 + bx + c = 0$. The formula of the discriminant is $D = b^2 - 4ac$. We pass the coefficients $a, b, c$ into the function and we want to get $D$ as the output.

In [None]:
def quadratic_discriminant(a,b,c):
    D = b**2 - 4*a*c
    return D

Just defining the function doesn't do anything; now we have to use it. Let's find the discriminant of the equation $5x^2 + 2x - 10 = 0$ and save it to some variable.

In [None]:
some_variable = quadratic_discriminant(5,2,-10)
some_variable
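
Functions can also call other functions. As a sketch, a hypothetical `quadratic_roots` function (not part of the course material) reuses `quadratic_discriminant` to find the real roots with $x = \frac{-b \pm \sqrt{D}}{2a}$. It assumes $D \geq 0$, and raising to the power `0.5` is used here as a square root.

```python
def quadratic_discriminant(a, b, c):
    D = b**2 - 4*a*c
    return D

def quadratic_roots(a, b, c):
    # assumes the discriminant is non-negative, i.e. real roots exist
    D = quadratic_discriminant(a, b, c)
    root_1 = (-b + D**0.5) / (2*a)
    root_2 = (-b - D**0.5) / (2*a)
    return root_1, root_2

# x^2 - 3x + 2 = 0 has the roots 2 and 1
print(quadratic_roots(1, -3, 2))   # prints (2.0, 1.0)
```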

**Note**: the variable to which you assign the output of the function (in the example above it is `some_variable`) doesn't need to match the name of the output of the function (in the example above it is `D`). The notation you use inside the function is "encapsulated", which means that it doesn't interfere with the notation in the script. As an example, if I try to access the variable `D` (which we used in the `quadratic_discriminant` function) I will not find it, because it is encapsulated inside the function and only exists when I run the function, and then it disappears.

In [None]:
D

**Pro-tip**: Python doesn't have complete encapsulation like some other languages. This means that while definitions from inside the function do not "leak" outside (just like I couldn't find the variable `D` when I asked for it above), definitions from **outside** can be freely used inside the function even if we didn't specify them as inputs to the function. Example:

In [None]:
def test_function(x):
    local_variable = 'this is local'
    print(x)
    print(outside_variable) # note that it is not part of the input!

outside_variable = 'this is outside'
input_variable = 'this is input'

test_function(input_variable)

# this raises a NameError: local_variable exists only inside test_function
print(local_variable)

However, this is only true for functions you define in the same script (or notebook) as the one where you use them, like in our case. If you import a function from another place (we will learn how to do it in the next section), you have complete encapsulation. This is why, when you're building a complex script, it is **good practice** to keep your functions in a separate script and import them as needed.

# Imports and namespaces

Programming languages differ in many respects, and one of them is how you import additional (external) functions into your code. For example, in MATLAB you have all functions in your PATH available to you from the beginning. This can be handy, since you can use all functions whenever you want without any additional actions. However, it also creates problems when you have a lot of packages (toolboxes), which can have functions with conflicting names. In the beginning this is not a problem, and novice programmers are usually not aware of this. But the longer you work in Matlab, the more you'll face this problem, as you explore and use more and more different libraries. This also unnecessarily complicates the function names, because every package wants its names to be unique and not conflict with other arbitrary packages.

Python's approach to this problem is completely different. Anything on Python's module search path is available to you, but you have to `import` it first. The collection of all functions and variables you have access to in your script is called a *namespace*. In Python every script has its own namespace, which you *populate* by importing new functions and creating variables. By the same logic, we can say that MATLAB has just one global namespace, which is automatically populated by what is in the PATH (at least when it comes to functions).

In practice, separate namespaces (Python's approach) turned out to be such a great idea that over time Python developers decided to put almost all functions outside the Python core. Earlier we referred to *in-built* functions, which we can now redefine as the functions available without any imports. Python has only ~65-70 (depending on the version) in-built functions, and some of them we already saw, such as `print()`, `len()`, `str()`, `bool()`, `int()`, `type()`. Just for reference, you can find the other in-built functions in this Python documentation section: https://docs.python.org/3/library/functions.html

Now let's import some other functions. Let's start with the `math` module, which contains the most basic mathematical operations. We will take a look at some of the functions from the module; the full documentation can be found here: https://docs.python.org/3.6/library/math.html.

There are several ways of importing modules. Here is the most basic:

In [None]:
import math

After we have made this import, all the functions of the `math` module are accessible using the `math.<function name>` notation. For example, we can use some trigonometric functions:

In [None]:
math.sin(2.2)

In [None]:
math.cos(-0.8)

Modules can also contain constants: the `math` module contains constants like $\pi$ and $e$. Note that constants, as opposed to functions, don't have `()` at the end:

In [None]:
math.pi

In [None]:
math.e
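
Functions and constants from a module can be combined in ordinary expressions. A small sketch with a made-up radius:

```python
import math

radius = 2
circle_area = math.pi * radius**2   # constant: no parentheses
print(circle_area)

print(math.sqrt(16))                # function: called with parentheses, prints 4.0
```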

Let's explore another module, called `os`, which is made for interacting with your operating system. We will mostly use it for interacting with files.

In [None]:
import os

In [None]:
# listdir function lists content of a directory (when no directory is specified
# lists content of the working directory)
os.listdir()
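
Two more `os` functions that are handy when working with files, shown as a sketch: `os.getcwd()` returns the working directory, and `os.path.join()` builds a path with the separator appropriate for your operating system. The file name `data.csv` is hypothetical and does not need to exist.

```python
import os

working_directory = os.getcwd()
print(working_directory)

# build a path to a (hypothetical) file inside the working directory
file_path = os.path.join(working_directory, 'data.csv')
print(file_path)

# check whether the file actually exists (True or False)
print(os.path.exists(file_path))
```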

The last thing you need to know about imports is that there are a couple of other ways to do it. If you need just one function from a module, it doesn't make sense to import the whole module; you can import just this one function. Below I import the function `uniform` from the module `random` (this function generates a random number within the specified range, drawn from a uniform distribution).

In [None]:
from random import uniform
for i in range(5):
    # print a random number from 0 to 10
    print(uniform(0,10))

Note that after you import a single function like this, you can access it with its name only, without specifying the module, i.e. `uniform`, not `random.uniform`, as would be if you did `import random`.
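
You can also import several names from a module in one statement by separating them with commas. A minimal sketch using `math`:

```python
from math import sqrt, pi

# both names are now available without the "math." prefix
print(sqrt(25))   # prints 5.0
print(pi)
```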

You can also change the name to access the module, which is very useful, because modules can have long names and you don't want to spell it out completely every time.

In [None]:
import math as m

m.sin(0.5)

Some modules are used by people so often (and we will use them too) that they even have "standard" short names:

In [None]:
# module for efficient numeric computation
import numpy as np

# module for working with data tables
import pandas as pd

# module for making plots
import matplotlib.pyplot as plt

Now instead of using full module names (`numpy`, `pandas`, `matplotlib.pyplot`) we can use short names: `np`, `pd` and `plt`. We will learn a lot about what these modules do during the course. 

**Note**: If you try to import a module which you don't have, Python will raise an error such as `ImportError: No module named <module name>` (in Python 3.6 and later it is reported as `ModuleNotFoundError`). It means that you either misspelled the module name, or you need to install the module first. You can usually do this by running `pip install <module name>` in your operating system console.