<h1> Python for Scientific Computation </h1>

This tutorial focuses on the fundamental Python libraries for scientific computing, mainly numpy. Other libraries such as scipy, pandas and seaborn are also briefly mentioned.

This tutorial borrows from the Python introduction section in CS231n by Stanford University.
http://cs231n.github.io/python-numpy-tutorial/#numpy

In [116]:
from IPython.display import Image
Image(url= "https://www.python-kurs.eu/images/matlab_python_vergleich.png")

<h2> Numpy </h2>

Array-like structures are possible in plain Python, for example by nesting lists inside a list, but for most mathematical operations they are insufficient, cumbersome to use and often slow.
If you're familiar with Matlab, you can check out Numpy's official tutorial "Numpy for Matlab Users": https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html



<h3> Arrays and Introduction</h3>

Arrays are the matrices of Numpy/Python. All elements must have the same data type, the elements are arranged in a regular grid (unlike nested lists, whose rows may differ in length) and each axis has its own index.

Let's start by importing numpy and creating a matrix (a 2-dimensional array). We can use the shape attribute to see the matrix dimensions.

In [8]:
import numpy as np  # Imports numpy and shortens the name numpy to np

b = np.array([[1,2,3],[4,5,6]]) 
print(b)
print("\n", b.shape)

[[1 2 3]
 [4 5 6]]

 (2, 3)
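
The "same data type" rule can be checked directly. The following is a minimal sketch; the variable names are made up for illustration, and the attributes <code>dtype</code>, <code>ndim</code> and <code>size</code> are not used elsewhere in this tutorial.

```python
import numpy as np

# Mixing ints and a float: numpy upcasts every element to one common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.ndim)     # number of axes: 2
print(grid.shape)    # (2, 3) -> 2 rows, 3 columns
print(grid.size)     # total number of elements: 6
```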


Now we create a 1d array. (Note that Matlab doesn't have 1d arrays.)

In [53]:
a = np.array([1, 2, 3])
print(a)
print("\n", a.shape)

[1 2 3]

 (3,)


Numpy provides many predefined functions to create common types of matrices.
For most of these functions we pass the shape as a tuple, e.g. (2,3) for 2 rows and 3 columns.

In [29]:
a = np.zeros((2,3))   # Create an array of all zeros
print(a)              

b = np.ones((1,2))    # Create an array of all ones
print(b)              


c = np.full((2,2), 7)  # Create a constant array
print(c)  

d = np.eye(2)         # Create a 2x2 identity matrix
print(d)   

e = np.random.random((2,2))  # Create an array filled with random values
print(e)               

[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1.]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[[0.59365175 0.40380171]
 [0.99734904 0.19917118]]


Let's multiply two matrices! In numpy all the usual arithmetic operators (+, -, *, /) work element-wise. This behavior is consistent for all arrays.

For true matrix multiplication we need to use the .dot() method or the @ operator as a shortcut.

In [13]:
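b = np.array([[1,2,3],[4,5,6]])  # Re-create b, it was overwritten in the previous cell
c = np.array([[2,4,6],[0,0,0]]) 
print(b*c)  # Element-wise product
print("\n")
print(b@c)  # Matrix product: raises a ValueError because the shapes (2,3) and (2,3) don't fit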

[[ 2  8 18]
 [ 0  0  0]]




ValueError: shapes (2,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)

Numpy is telling us that the shapes are not compatible for matrix multiplication: the number of columns of the first matrix (3) must equal the number of rows of the second (2). To fix this we transpose matrix b.

In [20]:
print(b.T @ c)
print("\n")
print(np.dot(b.T, c))

[[ 2  4  6]
 [ 4  8 12]
 [ 6 12 18]]


[[ 2  4  6]
 [ 4  8 12]
 [ 6 12 18]]
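
The shape rule behind this error can be sketched as follows: an (m, n) matrix times an (n, p) matrix gives an (m, p) matrix, and anything else fails. The arrays below are made up for illustration, and the exact wording of the error message may differ between numpy versions.

```python
import numpy as np

left = np.arange(6).reshape(2, 3)     # shape (2, 3)
right = np.arange(12).reshape(3, 4)   # shape (3, 4)

# Inner dimensions match (3 == 3), so the product has shape (2, 4)
product = left @ right
print(product.shape)   # (2, 4)

# Inner dimensions 3 and 2 differ, so numpy raises a ValueError
try:
    left @ left
except ValueError as err:
    print("ValueError:", err)
```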


Using .dot() on 1d arrays gives us the scalar product.

In [55]:
a = np.array([1, 2, 3])  # Re-create the 1d array, a was overwritten above
np.dot(a, a)

14

Sometimes machine learning libraries need 2d arrays as inputs. However, very often your data has the form of a mathematical vector. In this case you reshape your 1d array into a 2d column or row vector. (We need 2d arrays to store information about vector orientation.)
For this special case we can specify one dimension as 1 and the other dimension as -1, which means "let numpy figure out how many elements need to be there". As you will see, -1 normally has another meaning in Python syntax.

In [122]:
# Compare the two different arrays. What is the difference?
a1 = np.array([1,2,3])
a2 = np.array([1,2,3]).reshape(1,3)
print(a1)  # 1d array
print(a2)  # 2d array that acts as a row vector

print("\n")

print(a1.T)  # Transposing a 1d array has no effect
print(a2.T)  # Transposing a 2d row vector gives a column vector

[1 2 3]
[[1 2 3]]


[1 2 3]
[[1]
 [2]
 [3]]
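
The -1 placeholder described above can be sketched like this. The variable names are made up for illustration; numpy infers the missing dimension from the number of elements.

```python
import numpy as np

v = np.array([1, 2, 3])    # 1d array, shape (3,)

row = v.reshape(1, -1)     # 1 row, numpy infers 3 columns -> shape (1, 3)
col = v.reshape(-1, 1)     # 1 column, numpy infers 3 rows  -> shape (3, 1)
print(row.shape, col.shape)

# With an explicit orientation the two possible products are different things
print(row @ col)           # [[14]], the scalar product as a 1x1 matrix
print((col @ row).shape)   # (3, 3), the outer product
```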


<h3> Array math </h3>
Now that you know that .dot() is used for matrix multiplication, here is a list of the most common element-wise operations in numpy. 

In [59]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


In [60]:
print(x - y)
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]


In [61]:
print(x * y)
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]


In [62]:
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [63]:
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


In [64]:
x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements
print(np.sum(x, axis=0))  # Compute sum of each column
print(np.sum(x, axis=1))  # Compute sum of each row

10
[4 6]
[3 7]
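
The axis argument is easier to read on a non-square array, because the result shapes then differ. A small sketch with made-up names follows; <code>keepdims</code> is an optional argument of <code>np.sum</code> that is not used elsewhere in this tutorial.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

col_sums = np.sum(m, axis=0)   # axis 0 (the rows) is collapsed -> shape (3,)
row_sums = np.sum(m, axis=1)   # axis 1 (the columns) is collapsed -> shape (2,)
print(col_sums)   # [5 7 9]
print(row_sums)   # [ 6 15]

# keepdims=True keeps the collapsed axis with length 1, so the result stays 2d
row_sums_2d = np.sum(m, axis=1, keepdims=True)
print(row_sums_2d.shape)   # (2, 1)
```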


<h3> Indexing and Slicing </h3>
IMHO this is the most important skill to know in numpy and one of the easiest to get wrong. 
Also note that indexing and slicing in numpy work similarly to indexing and slicing on Python lists.

**Indexing** is a convenient way of accessing one or multiple elements via an integer index.

**Slicing** a list or array gives us another list or array. Think of it as a big chunk of cheese.

In [76]:
# Python slicing notation
a = np.array([0,1,2,3,4])
lst = [0,1,2,3,4]  # Named lst so that the built-in list is not shadowed

start = 1
stop = 4

print(a)
print(a[start:stop])  # prints from start to stop-1: [1 2 3]
print(a[start:])  # prints from start until the last array element is reached
print(a[:stop])  # prints from the first array element to stop-1
print("\n")  # If you haven't figured it out by now: this prints empty lines

# Same syntax works for the list 
print(lst)
print(lst[start:stop])
print(lst[start:])
print(lst[:stop])
print("\n")

[0 1 2 3 4]
[1 2 3]
[1 2 3 4]
[0 1 2 3]


[0, 1, 2, 3, 4]
[1, 2, 3]
[1, 2, 3, 4]
[0, 1, 2, 3]
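
One difference behind the identical syntax is easy to get wrong: slicing a numpy array gives a view onto the same memory, while slicing a list gives a copy. A minimal sketch with made-up names:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4])
values = [0, 1, 2, 3, 4]

arr_slice = arr[1:4]       # view: shares memory with arr
list_slice = values[1:4]   # copy: independent of values

arr_slice[0] = 99
list_slice[0] = 99

print(arr)      # [ 0 99  2  3  4] -> the original array changed
print(values)   # [0, 1, 2, 3, 4]  -> the original list did not

# Use .copy() when an independent array is needed
independent = arr[1:4].copy()
independent[0] = -1
print(arr[1])   # still 99
```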




To get the last item in a list/array, Python/numpy provides us with a very pythonic notation: negative indices count from the end.

In [83]:
print(a[-1])  # Last element of the array
print(a[-2])  # Second-to-last element of the array

print(a[-2:]) # Get the last two elements in an array
print(a[:-2]) # Everything except the last two elements in the array

4
3
[3 4]
[0 1 2]


Now we take this notation and use it for 2d arrays (or nd if you like). Multiple dimensions are separated by commas.

In [100]:
b = np.array([[1,2,0],[4,5,1]])

print(b, "\n")

print(b[:,:])  # Prints all rows and all columns
print("\n")

print(b[0,0])  # Prints first element
print("\n")

print(b[:, -1])  # SELECT LAST COLUMN. EASY WAY TO SELECT TARGETS! THIRD MOST IMPORTANT LINE OF CODE!

[[1 2 0]
 [4 5 1]] 

[[1 2 0]
 [4 5 1]]


1


[0 1]
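
Note that selecting a column with an integer index drops that axis, while selecting it with a slice keeps it. A small sketch (the array name is made up for illustration):

```python
import numpy as np

table = np.array([[1, 2, 0], [4, 5, 1]])

last_1d = table[:, -1]    # integer index: the column axis is dropped -> shape (2,)
last_2d = table[:, -1:]   # slice: the column axis is kept -> shape (2, 1)

print(last_1d.shape, last_2d.shape)   # (2,) (2, 1)
```

Which of the two you want depends on whether the code that consumes the targets expects a 1d or a 2d array.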


Why does Python start at 0? Why does it use half-open bounds for slicing?

Many reasons! But most of the time calculations with slices get easier this way and stay consistent.

Nonetheless other programming languages like MATLAB and Julia index at 1 and have reasonable arguments for doing so.
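
Two of the conveniences of 0-based, half-open slicing can be checked in a few lines. This is only an illustration with made-up values, not a complete argument:

```python
import numpy as np

seq = np.arange(10)
start, stop, k = 2, 7, 4

# The length of a slice is simply stop - start
print(len(seq[start:stop]) == stop - start)   # True

# Splitting at k and joining again gives back the original, with no +1/-1 corrections
rejoined = np.concatenate([seq[:k], seq[k:]])
print(np.array_equal(rejoined, seq))          # True

# seq[:n] contains exactly the first n elements
print(seq[:3])   # [0 1 2]
```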

In [119]:
# We assume that the features are stored in the first columns and all following columns denote the targets.
# We assume that number of features + number of targets = number of columns in data
data = np.array([[1,2,0],[4,5,1]])

number_features = 2
number_targets = 1

# FIRST AND SECOND MOST IMPORTANT LINES OF CODE FOR THIS SECTION!
x = data[:, 0:number_features]  # Note that the first : means "all". The second : means "from -> to". 
y = data[:, -number_targets:]

print(x)
print("\n")
print(y)

[[1 2]
 [4 5]]


[[0]
 [1]]


<h2> Scipy </h2>

Numpy provides a high-performance multidimensional array and basic tools to compute with and manipulate these arrays. SciPy builds on this, and provides a large number of functions that operate on numpy arrays and are useful for different types of scientific and engineering applications.

The best way to get familiar with SciPy is to browse the documentation. We will highlight some parts of SciPy that you might find useful for this class.

HOWEVER: in machine learning applications, Scipy is mostly used for preprocessing or feature engineering. Since these methods are highly specific to each particular field, they won't be covered here.

The functions <code>scipy.io.loadmat</code>  and <code>scipy.io.savemat</code>  allow you to read and write MATLAB files. You can read about them in the [documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html).
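
A round trip through a .mat file could look like the following sketch. The file name and the variable name "matrix" are made up for illustration, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

matrix = np.array([[1.0, 2.0], [3.0, 4.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.mat")   # hypothetical file name
    savemat(path, {"matrix": matrix})             # dict keys become MATLAB variable names
    contents = loadmat(path)                      # dict: variable name -> numpy array

print(contents["matrix"])
```

The dictionary returned by <code>loadmat</code> also contains a few metadata entries whose names start with two underscores.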





<h2> Pandas </h2>

Python Data Analysis Library

pandas is an open source, BSD-licensed **library providing high-performance, easy-to-use data structures and data analysis tools** for the Python programming language. Source: https://pandas.pydata.org/

We won't cover pandas in this tutorial, but basically pandas is very useful when dealing with large amounts of tabular data, especially when the columns have names or different data types. Storing such data in plain numpy arrays quickly becomes cumbersome. Pandas also features additional tools to modify and view data.

The minimum you want to do with pandas is to use <code>read_csv</code> to read data from a csv file and store it in a pandas dataframe. After that you call <code>.values</code> (or <code>.to_numpy()</code> in newer pandas versions) to convert your dataframe into a numpy array. This is usually faster than reading a csv file with numpy directly. However, this is of course optional. (Like everything shown in this tutorial.)
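
As a minimal sketch of this workflow: the csv content below is made up and read from a string instead of a file, so the example is self-contained. With a real file you would pass its path to <code>read_csv</code>. The sketch uses <code>.to_numpy()</code>, which newer pandas versions offer for the same conversion as <code>.values</code>.

```python
import io

import pandas as pd

# Hypothetical csv content: two feature columns and one target column
csv_text = "f1,f2,target\n1,2,0\n4,5,1\n"

frame = pd.read_csv(io.StringIO(csv_text))   # first line is used as column names
data = frame.to_numpy()                      # convert the dataframe into a numpy array

print(data.shape)    # (2, 3)
print(data[:, -1])   # last column, i.e. the targets
```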

<h2> Plotting </h2>

There is a multitude of plotting libraries available in Python. You can also use third-party plotting software, such as Tableau or Excel. For this class we give a very short overview of two common Python-based plotting libraries. 

<h3> Matplotlib</h3>

Matplotlib is a Python **2D plotting library which produces publication quality figures** in a variety of hardcopy formats and interactive environments across platforms. 

Matplotlib tries to make easy things easy and hard things possible. **You can generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, etc.**, with just a few lines of code. For examples, see the sample plots and thumbnail gallery.

For simple plotting the pyplot module provides a MATLAB-like interface, particularly when combined with IPython. For the power user, you have full control of line styles, font properties, axes properties, etc, via an object oriented interface or via a set of functions familiar to MATLAB users.

Source: https://matplotlib.org/
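
The two interfaces mentioned above can be compared in a short sketch. The plotted data is made up for illustration. In a notebook the figures are shown automatically at the end of the cell; in a script you would call <code>plt.show()</code> at the end.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# pyplot interface (MATLAB-like): functions act on the "current" figure
plt.figure()
plt.plot(x, np.sin(x))
plt.title("sin(x)")

# Object-oriented interface: explicit figure and axes objects
fig, ax = plt.subplots()
ax.plot(x, np.cos(x))
ax.set_title("cos(x)")
ax.set_xlabel("x")
```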

__________________________________________________________________________
<h3> Seaborn </h3>

Seaborn: statistical data visualization

Seaborn is a Python data **visualization library based on matplotlib. It provides a high-level interface** for drawing attractive and informative statistical graphics.

Source: https://seaborn.pydata.org/


<h3> A short (over)-simplification: </h3>

Matplotlib is the root of most plotting done in Python. Matplotlib is inspired by MATLAB plotting. 

However, most users still consider even the high-level interface of matplotlib too complicated. This is especially true for beginners. Hence, additional libraries such as Seaborn abstract further away by basically providing one-liners to create complex charts and graphics. 