This Python warmup notebook walks you briefly through the main tools that will be used in the course.

This warmup is compatible with both 

* `Python 2.7.w :: Anaconda 4.1.x` and 
* `Python 3.5.y :: Anaconda 4.2.z`

tested with `w=12`, `x=1`, `y=2`, `z=0`

It assumes you have installed Anaconda properly (otherwise the cell that loads the libraries will generate errors). It is not meant to be complete; it only covers enough of the basics for you to be comfortable on the day. _You will have the opportunity to ask all your questions then too!_

How to use a notebook?

* **To execute a cell**: click inside the cell, press SHIFT+ENTER
* **To delete  a cell**: click inside the cell, press ESC then D then D again
* **To insert  a cell**: click inside the cell, press ESC then B (will insert below the current cell) or ESC then A (will insert above the current cell)

You can also use the help function to check what a specific function does:

In [None]:
help(max)

Lastly, a useful trick is that you can press TAB to autocomplete what you write in the notebook. You can also press TAB after a dot to see everything available in a module: if you write "`<modulename>.`" and then hit TAB, a list will appear. This is very useful when exploring a library.

## Table of contents

* (1) **Python: the very basics**
    1. Basic types
    2. Basic operations
    3. Brief look at lists
    4. Functions
    5. Conditional statements
    6. Inline functions (lambdas)
    7. Loops
    8. List comprehensions
    9. List vs numpy array
    10. Plotting
    
    
* (2) **Pandas primer**
    1. A look at DataFrames
    2. A few basic operations


# 1. Python: the very basics
## 1.0 Loading libraries

In the bootcamp we will make use of a few libraries: `numpy`, `pandas`, `scipy`, and `plotly`. The code below imports them. It also imports two features from `__future__` so that the code behaves the same whether you are using Python 2.7 or Python 3.x.

In [None]:
# in case you're on Python 2.7 this allows compatibility between py2 and py3
from __future__ import print_function, division 

# library for dealing with arrays + some numerical methods
import numpy as np
# library for dealing with data frames
import pandas as pd
# library for miscellaneous scientific functions
import scipy

# Plotly is a library to display things interactively
import plotly.plotly as py
from plotly import tools
from plotly.graph_objs import *
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
init_notebook_mode()

print("If this is printed and no large error block appears, you're good to go.")

## 1.1 Basic Types

Declaring variables in Python is straightforward, and so is declaring arrays using `numpy`. To access values of an array, use square brackets and specify the index or the range of indices you are interested in. Note that, in Python, arrays are 0-indexed (the first element is at `array[0]`). See below for some quick examples.

In [None]:
a = 5    # integer
b = 6.0  # floating point number

# array of floating point numbers
c = np.array([5.0,3.5,-2.3,np.pi,2.0,3.0,-1.0])

# Note the parentheses when calling print, since
# we're using the (Python 2 and 3 compatible) print function
# from the __future__ module
print('a       :',a)
print('c[0]    :',c[0])    # accessing first element of an array
print('c[-1]   :',c[-1])   # accessing the last element
print('len(c)  :',len(c))  # computing the length
print('c.sum() :',c.sum()) # computing the sum

# accessing list of values
print()
print('c[2:4]  :',c[2:4])  # will print the 3rd and 4th values (indices 2 and 3)
print('c[:2]   :',c[:2])   # will print the first 2 values
print('c[4:]   :',c[4:])   # will print values from the 5th onwards
print('c[4:-1] :',c[4:-1]) # will print values from the 5th onwards, excluding the last


# using comparison operators (each result is a boolean)
print()
print('c.sum()==sum(c):',c.sum()==sum(c)) # a boolean: checking for equality
print('b<=7.0         :',b<=7.0)          # another boolean

## 1.2 Basic Operations

Operations on floating point numbers work as you would expect on a standard scientific calculator. For specific mathematical functions, use `np.<function>` (use TAB to see the list of available functions). For example, the exponential:

In [None]:
# basic operations on floats as you'd expect them
print( np.exp((2.0 * 3.0 + 5 - np.pi)))

Those `np` operations can also be applied to all the elements of an array at once:

In [None]:
print(np.exp(np.array([2.0,0.0])))
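
The same idea applies to the ordinary arithmetic operators: on a `numpy` array they act on each element. A minimal sketch (the array `vec` is just an illustrative name, not something used elsewhere in the notebook):

```python
import numpy as np

vec = np.array([1.0, 2.0, 3.0])

print(vec * 2)       # every element is doubled
print(vec + vec)     # element-by-element addition
print(vec ** 2)      # every element is squared
print(np.sqrt(vec))  # square root of every element
```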

You can combine `string`s using the `+` operator:

In [None]:
"first string "+"another string"

Formatting is a convenient way to display the values of variables within a string:

In [None]:
v = np.round(np.pi,3)
print("Pi rounded to 3 decimals: {} and squared: {}".format(v,v**2))
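
Instead of rounding the value first, you can also ask `format` to do the rounding when it builds the string. The sketch below shows a few common format specifications; the variable names are only illustrative:

```python
import numpy as np

# "{:.3f}" keeps 3 digits after the decimal point
fixed = "Pi with 3 decimals: {:.3f}".format(np.pi)
# "{:.2e}" uses scientific notation with 2 digits after the decimal point
scientific = "Pi in scientific notation: {:.2e}".format(np.pi)
# named fields can make longer templates easier to read
named = "{name} is roughly {value:.2f}".format(name="pi", value=np.pi)

print(fixed)
print(scientific)
print(named)
```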

### A very small note on Python 3 vs Python 2: 

There are a few differences between the two versions. The two that matter in the course are handled by `from __future__ import print_function, division`:

* `print` is a function and is called with parentheses, so `print("blah")` instead of `print "blah"`; this is in line with the way most functions are called in Python.
* Division returns a float when needed, so `3/4==0.75` (Python 3) and not `3/4==0` (Python 2). If you do want the floored division, use `3//4`; use `3%4` for the remainder.

All the code in this notebook is compatible with both Python 2 and Python 3:

In [None]:
print(3/4==0.75)
print(3//4==0)
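
As a small illustrative sketch of the floored division and remainder operators mentioned above (the variable names `q` and `r` are only for illustration), note that `//` rounds towards minus infinity, which matters for negative numbers:

```python
q = 7 // 2   # floored division
r = 7 % 2    # remainder
print("7 // 2 = {} and 7 % 2 = {}".format(q, r))

# divmod returns both values at once as a pair
print(divmod(7, 2))

# floored division rounds down, so -7 // 2 is -4 (not -3)
print("-7 // 2 = {} and -7 % 2 = {}".format(-7 // 2, -7 % 2))
```

In every case `(a // b) * b + (a % b)` gives back `a`.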

## 1.3 A very brief look at lists

Lists are collections of values, for example of integers. 
In the course we will favour `np.array` for numerical values, since arrays offer convenient built-in mathematical methods, but we will occasionally use lists too:

In [None]:
mylist = [1,2,5,7,5]
print(mylist[0])   # 0 indexing in python everywhere
print(len(mylist)) # length (number of elements) of the list
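
A few more list operations are worth knowing. The sketch below is only illustrative (the names `otherlist` and `arr` are not used elsewhere in the notebook); it shows appending, slicing, concatenating, and converting a list to a `numpy` array:

```python
import numpy as np

otherlist = [1, 2, 5, 7, 5]
otherlist.append(3)        # add an element at the end of the list
print(otherlist)
print(otherlist[1:3])      # slicing works as it does for arrays
print(otherlist + [0, 0])  # "+" concatenates two lists

arr = np.array(otherlist)  # convert to a numpy array for numerical work
print(arr.sum())
```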

## 1.4 Functions

To define a simple function, use the keyword `def`. 
Do not forget the colon `:` nor the indentation (it matters in Python). 
Note the use of `np.sqrt` to call mathematical functions (here the square root):

In [None]:
def aSimpleNorm(var1, var2):
    # var1**2 is the square of var1
    return np.sqrt(var1**2+var2**2) 
    
# unindenting, we're now out of the scope of the function
print(aSimpleNorm(1.0,2.0))
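
Function arguments can also be given default values, which makes them optional when calling the function. The function `aScaledNorm` below is a hypothetical variant of `aSimpleNorm`, written only to illustrate this:

```python
import numpy as np

def aScaledNorm(var1, var2, scale=1.0):
    # scale is optional: it is 1.0 unless the caller provides another value
    return scale * np.sqrt(var1**2 + var2**2)

print(aScaledNorm(3.0, 4.0))             # uses the default scale
print(aScaledNorm(3.0, 4.0, scale=2.0))  # overrides it with a keyword argument
```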

## 1.5 Conditional statements

You can make conditional statements using `if`-`else`, don't forget the colon `:` or the indentation:

In [None]:
if (aSimpleNorm(1.0,2.0)) < 2.24:
    print("this is true and will be printed")
else:
    print("this is not true (and will never be printed)")

In the case of short tests, it is sometimes convenient to write everything in one line:

In [None]:
"yes" if (aSimpleNorm(1.0,2.0) < 2.24) else "no"
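
When there are more than two cases, `elif` (short for "else if") can be chained between `if` and `else`. A minimal sketch, using a hypothetical helper `describeSign`:

```python
def describeSign(x):
    # the conditions are tested in order; the first one that holds wins
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"

for value in [2.5, -1, 0]:
    print("{} is {}".format(value, describeSign(value)))
```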

## 1.6 Inline functions: lambdas

Although not typically encouraged, it can be very convenient (and it may be used occasionally in the course) to define one-liner functions. 
This can be done using the `lambda` keyword. 

In [None]:
f = lambda x: x*x+x-1

print(f(2))

# this is equivalent to 

def f2(x):
    return x*x+x-1

print(f2(2))

## 1.7 Loops

Loops can be defined using `for` and `while`. For the `for` loop, it is convenient to use the `in` keyword to say that a variable should take all the values in a collection (e.g. a list). As always, note the indentation:

In [None]:
for i in [1,7,2,0]:
    print(i)

We will often loop over an ordered sequence of indices, in which case the `range` function is very useful. Its basic syntax is:

* `range(5)` corresponds to `[0,1,2,3,4]`
* `range(2,5)` corresponds to `[2,3,4]`

In [None]:
for i in range(5): print(i)
    
print()

for i in range(2,5): print(i)
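
Two related tools are sketched below for illustration: `range` accepts an optional third argument (the step), and `enumerate` gives you the index and the value together when looping over a collection. The variable names are hypothetical.

```python
evens = list(range(0, 10, 2))      # the third argument is the step
countdown = list(range(5, 0, -1))  # a negative step counts down
print(evens)
print(countdown)

# enumerate yields (index, value) pairs
for index, name in enumerate(["Alice", "Bob"]):
    print("{}: {}".format(index, name))
```

Wrapping `range(...)` in `list(...)` displays the values in the same way under Python 2 and Python 3.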

The `while` loop (which we will not use much) repeats as long as a condition holds. The condition must eventually become false, otherwise the loop will never stop:

In [None]:
i = 1
while (i<=5):
    print(i**2)
    i+=1 # increment the counter
    
print()
# this can also be done with a for loop:
for i in range(1,6): print(i**2)

## 1.8 List comprehension 

It is sometimes convenient to build a list whose elements are given by some relation to an index. This can be done with a *list comprehension*:

In [None]:
g = lambda x: x * np.pi  # definition of an inline function
h = [g(i) for i in range(0,5)]
print(h)

In the case above, `h` is a list where each element is computed with a function (`g`) over a range. Here is another simple example, this time using strings:

In [None]:
h2 = ["Hello "+s+"!" for s in ["Alice", "Bob", "Charlotte"]]

for message in h2:
    print(message)
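
A list comprehension can also include a condition to keep only some of the elements. The following sketch (with illustrative names) builds the squares of the even numbers below 10, first with a comprehension and then with the equivalent explicit loop:

```python
# keep i only when it is even, and store its square
evenSquares = [i**2 for i in range(10) if i % 2 == 0]
print(evenSquares)

# the same thing written as an explicit loop
evenSquaresLoop = []
for i in range(10):
    if i % 2 == 0:
        evenSquaresLoop.append(i**2)
print(evenSquaresLoop)
```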

## 1.9 Difference between List and Numpy Arrays

Essentially, `numpy` arrays natively support a number of useful mathematical operations that a `list` does not. One example is the dot product of two vectors, i.e. the sum of the products of their corresponding elements: `(1 2 3) dot (3 2 1) = 1*3 + 2*2 + 3*1 = 10`. In the course we will mostly use `numpy` arrays.

In [None]:
list1 = [1,2,3]
list2 = [3,2,1]
arr1  = np.array(list1)
arr2  = np.array(list2)

# using lists, you have to manually code the dot product:
resList = 0
for i in range(0,len(list1)):
    resList += list1[i]*list2[i]

print(resList)
# with the numpy array, it's directly available:
print(np.dot(arr1,arr2))
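
A common source of confusion is that the same operator can mean different things for a list and for an array. The sketch below (variable names are illustrative) contrasts `+` and `*` on both:

```python
import numpy as np

listA = [1, 2, 3]
listB = [3, 2, 1]
arrA = np.array(listA)
arrB = np.array(listB)

print(listA + listB)  # lists: "+" concatenates
print(arrA + arrB)    # arrays: "+" adds element by element

print(listA * 2)      # lists: "*" repeats the list
print(arrA * 2)       # arrays: "*" multiplies every element
```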

## 1.10 Plotting

When plotting in Python, several options are available; two in particular are noteworthy:

* `matplotlib`, and
* `plotly`.

The first one is a bit more austere but can be used to generate graphs with a huge amount of flexibility (and it is reasonably easy to do so). When using `matplotlib`, we encourage you to consider the package `seaborn` (it can be installed with `pip`), which tends to make everything look a lot nicer.

The second one is more interactive and fun to use which is great for working in the notebook and showing/discussing your results collaboratively. It consumes more memory though (and if you have a very big notebook this can quickly become an issue). It also offers a somewhat more constrained framework to work in.

In this workshop we will focus on using the second option.

In [None]:
xx  = np.linspace(-5,5,100)
yy  = np.sin(np.exp(-xx**2))
yy2 = np.cos(np.exp(-xx**2))

line1 = Scatter(x=xx,y=yy,  name="Sin-based")
line2 = Scatter(x=xx,y=yy2, name="Cos-based")
data = [line1,line2]

layout = Layout(
    xaxis = dict(title = "XAxis"),
    yaxis = dict(title = "YAxis"),
    width = 700,
)

fig = dict(data = data, layout = layout)
iplot(fig)

# 2. Pandas Primer
## 2.1 A look at DataFrames

Throughout this workshop you will be using `pandas`, a Python library that you could see as "Excel for Python". `pandas` is built on top of `numpy`. It makes it easy to work with tabular data by providing tools for selecting and merging data, calculating statistics, filling in missing values, etc.

When you load data with `pandas` it is put into a `DataFrame`. These objects have the structure illustrated below. On the left you see a "table" as you may intuitively think about one, and on the right you see a DataFrame where:

* each column of raw values corresponds to a `numpy` array
* each column + its column name + the row indices form a `Series` object in `pandas`
* the collection of all `Series` forms the `DataFrame`.

![Table Anatomy Class](./table_anatomy.png)

Let's now have a look at a data file, `testData.dat`. It should look like this:

![A look at data](./lookAtData.png)

As you can see in the raw data, it has three columns (no matter the meaning) with column names `D`, `X` and `Y`. The columns are separated by tabs. This can be loaded in pandas easily using the `read_csv` function and a quick peek can be obtained by using `head`:

In [None]:
# by default it assumes the separator is a comma (,)
# here it is not (it is a tab or '\t') so we have to indicate it
data = pd.read_csv("testData.dat", sep="\t")
data.head()
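
If you do not have the data file at hand, the same call can be tried on an in-memory string. The sketch below is only an illustration: the values in `raw` are made up (they are not the contents of `testData.dat`), and `demo` is a hypothetical name.

```python
import io
import pandas as pd

# made-up, tab-separated stand-in for a small data file
# (the u prefix keeps this working on Python 2.7 as well)
raw = u"D\tX\tY\n1\t0.5\t2.0\n2\t1.5\t4.0\n3\t2.5\t6.0\n"

demo = pd.read_csv(io.StringIO(raw), sep="\t")

print(demo.shape)          # (number of rows, number of columns)
print(list(demo.columns))  # column names read from the first line
print(demo.describe())     # summary statistics for each numerical column
```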

## 2.2 A few basic operations

In the workshop you will discover far more operations that can be used with `DataFrames`, here are a  few useful ones to give you a taste.

### 2.2.1 Accessing elements

Columns can be accessed directly using their column names. The underlying `numpy` array can then be accessed through the `values` attribute:

In [None]:
colX = data['X']  # the column named X, as a Series
colX.values[:5]   # first 5 entries of the underlying numpy array

### 2.2.2 Removing columns or rows

This can be done with the `drop` method, indicating the column name or row label and the axis (0 for a row, 1 for a column):

In [None]:
data.drop('X',axis=1).head()

In [None]:
data.drop(2,axis=0).head()
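
Note that `drop`, as called above, returns a new `DataFrame` and leaves the original one untouched. The sketch below illustrates this on a small made-up `DataFrame` (the names `demo`, `smaller` and `fewer` are hypothetical):

```python
import pandas as pd

# small made-up DataFrame, only for illustration
demo = pd.DataFrame({"D": [1, 2, 3], "X": [0.5, 1.5, 2.5], "Y": [2.0, 4.0, 6.0]})

smaller = demo.drop("X", axis=1)   # new DataFrame without column X
print(list(smaller.columns))
print(list(demo.columns))          # demo itself still has column X

fewer = demo.drop([0, 2], axis=0)  # a list of labels removes several rows at once
print(list(fewer.index))
```

To keep the result, assign it to a variable as done here.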

### 2.2.3 Concatenating data frames

Assuming their dimensions are compatible, DataFrames can be combined vertically or horizontally. Here we show how to do this using the same DataFrame twice (not very useful, but it shows how the method can be used):



In [None]:
pd.concat([data,data]).head() # by default it is done vertically (one DataFrame after the other)

In [None]:
pd.concat([data,data],axis=1).head()
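
When concatenating vertically, each piece keeps its own row labels, so the labels repeat in the result. If you would rather have a fresh index, `pd.concat` accepts `ignore_index=True`. A minimal sketch on a made-up `DataFrame` (all names are illustrative):

```python
import pandas as pd

# small made-up DataFrame, only for illustration
demo = pd.DataFrame({"D": [1, 2], "X": [0.5, 1.5]})

stacked = pd.concat([demo, demo])
print(list(stacked.index))     # the original row labels are repeated

renumbered = pd.concat([demo, demo], ignore_index=True)
print(list(renumbered.index))  # rows are renumbered from 0
```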

**You are now ready to go. Don't worry if you still have questions, the workshop will still teach you a lot and allow you to ask many questions!**