# Session 3.1: Computation
One major strength of Python is its module system: additional data types, functions, etc. become available simply by importing existing modules. Most of these modules are written by other Python users, and because the Python community is so large, an enormous number of modules is available. We have already seen some of them:

* os (Miscellaneous operating system interfaces)
* sys (Access system-specific parameters and functions)
* math (Mathematical functions (sin() etc.))

In this part we will focus on three other very popular modules for scientific use:

* numpy (Fundamental package for scientific computing with Python) http://www.numpy.org/
* scipy (Scientific Computing Tools for Python) https://www.scipy.org/
* pandas (Python Data Analysis Library) https://pandas.pydata.org/

## Numpy

Numpy provides many functions to create arrays:

In [None]:
import numpy as np    # np is the conventional alias for numpy

a = np.arange(18)     # 18 evenly spaced integers: 0, 1, ..., 17
b = np.zeros((2,2))   # Create an array of all zeros
c = np.ones((1,2))    # Create an array of all ones
d = np.full((2,2), 7)  # Create a constant array
e = np.eye(2)         # Create a 2x2 identity matrix
f = np.random.random((2,2))  # Create an array filled with random values

<div class="alert alert-block alert-info">
Print some of those arrays to see how they differ.
</div>

- **ndim**: number of dimensions, rank of array
- **size**: total number of elements
- **shape**: number of elements in each dimension

In [None]:
print(a)
print("dims:", a.ndim)
print("size:", a.size)
print("shape:", a.shape)

In [None]:
print(f)
print("dims:", f.ndim)
print("size:", f.size)
print("shape:", f.shape)

### Indexing

Indexing arrays works much like indexing lists.

In [None]:
print(a)

print(a[0])   # first element
print(a[-1])  # last element

print(a[:4])  # the first four elements
print(a[-4:])  # the last four elements

print(a[1:4]) # slice of elements from index 1 to (not including) index 4
print(a[::2]) # every second element

print(a[::])   # all elements
print(a[::-1]) # all elements in reversed order

As you have seen, arrays can have more than one dimension. A dimension is also called an "axis" in numpy. You can change the dimensionality of an array by calling `.reshape()`.

In [None]:
a = np.arange(18)     # Return 18 evenly spaced values
print("Array a:\n", a)
print("Size of a:", a.size)
print("Shape of a:", a.shape)
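
As a minimal sketch of what `.reshape()` does, the 18 values can be rearranged into a 2D array. The names `m` and `m2` are chosen only for this illustration:

```python
import numpy as np

a = np.arange(18)            # 1D array with the values 0..17
m = a.reshape((3, 6))        # the same 18 values arranged as 3 rows x 6 columns
print("Shape before:", a.shape)
print("Shape after: ", m.shape)
print("Dimensions:  ", m.ndim)

# -1 lets numpy work out the missing dimension from the array size
m2 = a.reshape((2, -1))      # 2 rows, so numpy picks 9 columns
print("Shape of m2: ", m2.shape)
```

The total number of elements has to stay the same, so reshaping 18 values to, for example, `(4, 4)` raises an error.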

Indexing for multi-dimensional arrays is slightly different: you give one index (or slice) per dimension, separated by commas.

In [None]:
a2 = np.arange(16).reshape((4,4)) # 4x4 matrix
print(a2)

In [None]:
print(a2[0,0])    # first row, first element
print(a2[-1,-1])  # last row, last element
print("--")
print(a2[0])      # first row
print(a2[0,:])    # first row as well
print("--")
print(a2[:,0])    # first column
print(a2[:,-1])   # last column
print("--")
print(a2[:2,:2])   # upper left 2x2 part of matrix
print(a2[-2:,-2:]) # lower right 2x2 part of matrix
print("--")
print(a2.diagonal())

<div class="alert alert-block alert-info">
Try to get the diagonal of the upper left 3x3 part of the original 4x4 matrix. Use `.diagonal()` for that. Use a cell below.
</div>

### Basic operations

Arrays make calculations on large data sets very intuitive, because arithmetic operations are applied to every element at once. Let's assume you want to calculate $f(x)$ given that

$f(x) = \dfrac{a \cdot b \cdot t}{x}$

In [None]:
a = 1.0
b = 300
t = 3.14
x = np.arange(1.0, 10.0, 0.1) # arange numbers from 1 to 10 in steps of 0.1 (is 10 included?)
x

<div class="alert alert-block alert-info">
Calculate $f(x)$ according to the formula.
</div>

Numpy also has some useful mathematical functions built in.

In [None]:
# small teaser for plotting (next session)
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.show()

### Unary and Comparison Operations

Many operations, like calculating the sum of an array, are built-in methods of the numpy array data type.

In [None]:
a = np.arange(10.0)
print("Sum is", a.sum())
print("Minimum is", a.min())
print("Maximum is", a.max())
print("Mean is", a.mean())

If an array is used together with comparison operators, you can determine and select certain values. 

__Masking__: Masking (boolean array indexing) lets you pick out arbitrary elements of an array. This type of indexing is frequently used to select the elements of an array that satisfy some condition.

Let's assume we have some data in an array that we want to analyze. Later we will learn how to load such data from files; for now, we fill the array with random values.

In [None]:
# create a 20x5 array with random values between 0 and 1
data = np.random.random(100)
data = data.reshape((20,5)) 

# You could also do it in one line
# data = np.random.random(100).reshape((20,5))

print(data)
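
Masking can be sketched on a small, made-up array. The array `values` and the thresholds are assumptions chosen for illustration only:

```python
import numpy as np

values = np.array([0.1, 0.7, 0.4, 0.9, 0.5])   # small made-up data set

mask = values > 0.5        # boolean array, True where the condition holds
print(mask)
print(values[mask])        # only the elements larger than 0.5
print(mask.sum())          # number of elements that satisfy the condition

# conditions are combined with & (and) and | (or); the parentheses are required
selected = values[(values > 0.3) & (values < 0.8)]
print(selected)
```

The same pattern works on the 2D `data` array; the result of masking is then a 1D array of the selected elements.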

For 2D arrays you can decide whether to perform a calculation on the whole data set, row-wise, or column-wise. This is controlled by the `axis` argument.
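
A short sketch of how the `axis` argument behaves, using `sum` on a tiny example array (the array `m` is an assumption for illustration):

```python
import numpy as np

m = np.arange(6).reshape((2, 3))   # [[0 1 2], [3 4 5]]

total = m.sum()               # sum over the whole array
per_column = m.sum(axis=0)    # one value per column (collapses the rows)
per_row = m.sum(axis=1)       # one value per row (collapses the columns)

print(total)
print(per_column)
print(per_row)
```

Other methods such as `min`, `max` and `mean` accept the same `axis` argument.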

<div class="alert alert-block alert-info">
Get the range of values (min-max) for each column of this dummy data set.
</div>

### Save and Load

If we want to save and load some arrays, we can use the functions provided by `numpy`.

You can find details here:
* [numpy docs: savetxt](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.savetxt.html)
* [numpy docs: loadtxt](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.loadtxt.html)


In [None]:
np.savetxt("my_first_data.txt", data)

In [None]:
loaded_data = np.loadtxt("my_first_data.txt")
loaded_data

In [None]:
np.save("my_first_data.npy", data)          # binary .npy format (not compressed)
loaded_data = np.load("my_first_data.npy")
loaded_data

In [None]:
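# genfromtxt works like loadtxt but can also handle missing values and named columns
loaded_data = np.genfromtxt("my_first_data.txt")
# for annotated (labelled) tables, pandas is usually the better choice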

### Numpy functions
Numpy already provides many functions that you can use for standard tasks. Here we will only use `histogram` as a basic example. See the numpy documentation for a full list:

- [Numpy functions by category](https://docs.scipy.org/doc/numpy-1.13.0/reference/)
- [Numpy mathematical functions](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.math.html)

In [None]:
np.histogram(data)

In [None]:
count, bins = np.histogram(data, bins=20)
print(count.shape)
print(bins.shape)

In [None]:
bin_centers = (bins[:-1] + bins[1:]) / 2.0
print(bin_centers)
print(bin_centers.shape)

In [None]:
plt.bar(bin_centers, count, width=0.01)
plt.show()

## Scipy

In [None]:
# it is good practice to import everything at the top
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit # this is another way to import a python function
# we can now use curve_fit() without putting any prefix


Scipy offers a vast number of functions for data analysis and scientific use. We will have a look at a very common application: **Fit a function to your data**

In [None]:
data = np.loadtxt("experimental_data.txt")  # load data from a file
print(data)

We can see that our data looks like an exponential function.

Let's define a general exponential function $f(x)=a \cdot \exp(-b \cdot x) + c$.

In [None]:
# This is a function declaration. 
def func(x, a, b, c):
    """Function we want to use for fitting"""
    return a * np.exp(-b * x) + c
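
A minimal sketch of the fitting step, assuming synthetic data in place of the file contents. The names `x_demo`, `y_demo`, `popt_demo`, the "true" parameters and the noise level are made up for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    """Exponential model a * exp(-b * x) + c."""
    return a * np.exp(-b * x) + c

# synthetic data: known parameters plus a little Gaussian noise (assumption)
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 4, 50)
y_demo = func(x_demo, 2.5, 1.3, 0.5) + 0.05 * rng.normal(size=x_demo.size)

# curve_fit returns the best-fit parameters and their covariance matrix
popt_demo, pcov_demo = curve_fit(func, x_demo, y_demo)
perr_demo = np.sqrt(np.diag(pcov_demo))   # one-sigma uncertainty of each parameter

print("fitted a, b, c:", popt_demo)
print("uncertainties: ", perr_demo)
```

With real measurements you would pass your own x and y arrays instead of the synthetic ones.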

In [None]:
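# let's have a look at the fit
# xdata and ydata are the x and y values taken from `data`; popt holds the
# fitted parameters, e.g. from: popt, pcov = curve_fit(func, xdata, ydata)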
%matplotlib inline
plt.scatter(xdata, ydata, label='data')

plt.plot(xdata,
         func(xdata, *popt ), 
         'r-', 
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

plt.legend()

From theory we know that our function's parameters are subject to some constraints. We can pass them to `curve_fit` using the argument `bounds=( list(lowerbounds), list(upperbounds) )`.
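
A sketch of a bounded fit, again on synthetic data. The bounds, the "true" parameters and all `*_demo` names are assumptions chosen for illustration, not values from the experiment:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    """Exponential model a * exp(-b * x) + c."""
    return a * np.exp(-b * x) + c

# synthetic data with known parameters (assumption for this sketch)
rng = np.random.default_rng(1)
x_demo = np.linspace(0, 4, 50)
y_demo = func(x_demo, 2.5, 1.3, 0.5) + 0.05 * rng.normal(size=x_demo.size)

# one lower and one upper limit per parameter, in the order a, b, c;
# np.inf means "no limit" on that side
lower_demo = [0.0, 0.0, 0.0]
upper_demo = [np.inf, 2.0, 1.0]

popt_bounded, pcov_bounded = curve_fit(func, x_demo, y_demo,
                                       bounds=(lower_demo, upper_demo))
print("fitted a, b, c:", popt_bounded)
```

The fitted parameters always stay inside the given limits, so bounds that exclude the true values will produce a poor fit.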

## Pandas
The pandas module helps you to arrange and treat your data like tables. You can also import and export your data in various formats (for example CSV, which is also compatible with Excel).

In [None]:
import pandas as pd

### Basics

Pandas offers some advantages. Nicer output of your data is one of them. Selection can be easier as well, but works differently than in numpy.

You can perform calculations with pandas as you would with numpy. Be careful with really big data sets, though, since pandas may slow down, particularly if you try to display the data.
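
The points above can be sketched with a small table. The table `table_demo` and its column names are made up for illustration:

```python
import pandas as pd

table_demo = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                           "y": [10.0, 20.0, 30.0, 40.0]})

print(table_demo["x"])        # select a column by its name
print(table_demo.loc[0])      # select a row by its index label
print(table_demo.iloc[-1])    # select a row by its position (here: the last one)

large_y = table_demo[table_demo["y"] > 15]   # boolean selection, as with numpy masks
print(large_y)

# calculations work element-wise, just like with numpy arrays
table_demo["ratio"] = table_demo["y"] / table_demo["x"]
column_means = table_demo.mean()             # mean of every column
print(column_means)
```

Note that `table_demo[0]` would look for a column named `0`, not the first row; this is the main difference to numpy indexing.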

### Mixed tables

Pandas allows you to arrange your data like in the following example. Imagine you have a data set with two species with several members. You could then easily group your data.

In [None]:
table2 = pd.DataFrame({'spec' : ['human', 'animal', 'human', 'animal', 'human', 'animal', 'human', 'animal'],
                       'num'  : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'A' : np.random.randn(8),
                       'B' : np.random.randn(8)})
table2
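
Grouping can be sketched with `groupby` on a separate toy table. The table `animals` and its columns are assumptions for illustration:

```python
import pandas as pd

animals = pd.DataFrame({"kind": ["cat", "dog", "cat", "dog"],
                        "weight": [4.0, 20.0, 5.0, 30.0]})

# split the rows by the value in "kind", then average "weight" within each group
mean_weight = animals.groupby("kind")["weight"].mean()
print(mean_weight)
```

The result is indexed by the group labels, so `mean_weight["cat"]` gives the mean for that group.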

<div class="alert alert-block alert-info">
Calculate the mean values for A and B for the two species.
</div>

### Save and Load

In [None]:
table2.to_csv("table2.csv")  # save to file, check the directory where this notebook is stored
loaded_table = pd.read_csv("table2.csv", index_col=0)
loaded_table

In [None]:
table2.to_csv("table2.csv", index=False)  # save to file, check the directory where this notebook is stored
loaded_table = pd.read_csv("table2.csv")
loaded_table

In [None]:
print(table2.to_latex())   # get a latex table

<div class="alert alert-block alert-info">
For LaTeX users: Write the output of `to_latex()` to a file (which you can import into your TeX document).
</div>

### Bonus: View

Pandas can also be used to improve the way you view your data. It makes it very easy to create a custom style sheet for your tables. Here we define a simple function which takes a single value. The value is colored based on its type and sign:
* String - green
* Number smaller than 0 - red
* Number equal to or bigger than 0 - black

In [None]:
def color_negative_red(val):
    """
    Takes a scalar and returns a string with the css property
    `'color: red'` for negative numbers, `'color: black'` for
    other numbers, and `'color: green'` for everything else.
    """
    if isinstance(val, (int, float)):
        color = 'red' if val < 0 else 'black'
    else:
        color = 'green'
    return 'color: %s' % color

In [None]:
# pandas 2.1 and later provide `.map` for this and deprecate `.applymap`
table2.style.applymap(color_negative_red)  # we apply the function to each cell of our table

We can also apply a function to each column of our data set. Then our function is able to do some analysis for us, for example highlight the maximum value of each column.

In [None]:
def highlight_max(s):
    '''
    highlight the maximum in a Column yellow.
    '''
    if s.dtype == float or s.dtype == int:
        is_max = s == s.max()
        return ['background-color: yellow' if v else '' for v in is_max]
    else:
        return ['' for v in s]

In [None]:
table2.style.apply(highlight_max)

<div class="alert alert-block alert-info">
What do you think? Is it possible to apply both functions to our table to highlight negative and max values at the same time?
</div>

Finally, we can save our data, for example as a CSV file. We could then load this file again with Python for later analysis or with MS Excel for use in a document. It is also possible to print a pandas table as LaTeX code.

<div class="alert alert-block alert-info">
Edit the function below to highlight min values inside a column in blue. Apply both functions, highlight_max and highlight_min, to the table which is prepared for you. 
</div>

In [None]:
def highlight_max(s):             # <- change name
    '''
    highlight the maximum in a Column yellow.
    '''
    if s.dtype == float or s.dtype == int:
        is_max = s == s.max()    #  <- change to min
        return ['background-color: yellow' if v else '' for v in is_max]  # <- change to min
    else:
        return ['' for v in s]

In [None]:
# you can use this dummy data
data = np.random.random(100).reshape((20,5))
table = pd.DataFrame(data, columns=["obs1","obs2","obs3","obs4","obs5"])