# EPA1333 - Computer Engineering for Scientific Computing
## Week 5 - Oct 3, 2017

**Python Data Science Handbook**

*Jake VanderPlas*


In [None]:
from IPython.display import Image
Image('https://covers.oreillystatic.com/images/0636920034919/lrg.jpg')

## NumPy

Fundamental package for scientific computing. It provides efficient and fast
n-dimensional arrays (ndarray).

  * Fast and space-efficient multidimensional array (close to hardware)
  * Standard mathematical functions on entire array (scientific computing)
  * Basis for other libraries, such as pandas.
  * Array-oriented computing
  
  
Example:
  * Store results of experiments/simulations at each time-step in an array
  * Store results of multiple such experiments/simulations in a 2-d array
  * Store pixels of a 3-d image
  * ...


### Documentation
  * http://www.numpy.org
  * http://docs.scipy.org
  * Google :-)


In [None]:
%matplotlib inline
import numpy as np

In [None]:
# Comparing efficiency of python lists and numpy arrays
# First python lists, time the creation of a list of 1000 squared entries.
%timeit [ i**2 for i in range(1000)]

In [None]:
# Now the NumPy way... (the syntax is explained later)
# Typically much faster, often around two orders of magnitude (100x);
# the exact factor depends on the machine.
%timeit np.arange(1000)**2
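
`%timeit` is an IPython magic, so it only works inside a notebook or an IPython shell. As a rough sketch, the same comparison could be run as a plain Python script with the standard `timeit` module. The function names `squares_list` and `squares_numpy` are made up for this illustration, and the measured times will differ per machine.

```python
import timeit

import numpy as np


def squares_list(n=1000):
    """Build a list of squares with a list comprehension."""
    return [i**2 for i in range(n)]


def squares_numpy(n=1000):
    """Build the same squares with one vectorized NumPy expression."""
    return np.arange(n)**2


# Time 200 calls of each function; absolute numbers depend on the machine.
t_list = timeit.timeit(squares_list, number=200)
t_numpy = timeit.timeit(squares_numpy, number=200)

print("list : %.6f s for 200 runs" % t_list)
print("numpy: %.6f s for 200 runs" % t_numpy)
```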

Numpy arrays have similar syntax to lists, but there are some differences.

```python
import numpy as np

n = np.random.random( (3,4) )   # 2d array of 3x4, with random numbers

n[2]        # row 2 (the third row, indexing starts at 0)
n[2, 3]     # element at row 2, column 3
n[1:3]      # slice, rows 1 and 2, all columns
n[2, 1:3]   # slice, row 2, columns 1 and 2
n[:, 1]     # slice, column 1 (all rows)
```

### Differences between ndarray and lists

* Lists can contain *multiple types* in one list; an ndarray holds elements of *one type only*.
 * No per-element type checking is necessary -> more efficient
* Accessing: 
 * list[x][y] vs. array[x,y]
 * list[:1][:1] does not select a sub-block (the second slice is applied to the outer list again), whereas array[:1,:1] slices rows and columns together.
* Universal functions work efficiently on whole arrays: np.sin( array )
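
A small sketch of the differences listed above. The variable names and values are chosen only for this illustration.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]
arr = np.array(nested)

# List: the second [:1] slices the *outer* list again, not the columns.
list_result = nested[:1][:1]       # [[1, 2, 3]]

# Array: one index expression slices rows and columns together.
array_result = arr[:1, :1]         # array([[1]])

# An ndarray has a single dtype; mixed input is converted to a common type.
mixed = np.array([1, 2.5, 3])      # all elements are stored as floats

print(list_result)
print(array_result)
print(mixed.dtype)
```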

### Creating arrays

  * np.array( sequence )
  * np.zeros()
  * np.ones()
  * np.eye()
  * np.arange( start, stop, step)
  * np.linspace( start, stop, number) - create number evenly spaced samples from start to stop (stop included)
  * np.random.rand( x, y ) or np.random.random( size=(x,y) ) - random values in [0, 1)

In [None]:
# Creating an array
l = [1 ,2 , 4, 5]

data = np.array( l )
data

In [None]:
# Creating a 2d-array
l = [ [1,2,3,4], [5,6,7,8] ]
data = np.array(l)
data

In [None]:
# The number of dimensions of an array
data.ndim

In [None]:
# The shape of the array
data.shape

In [None]:
# The type of the elements of the array
data.dtype

In [None]:
# Other ways to create an array
data = np.zeros( 6 )
data

In [None]:
data = np.zeros ( ( 4, 6) )
data

In [None]:
# similar with 1's
data = np.ones( ( 3,4) )
data

In [None]:
# Identity matrix
data = np.eye( 4 )
data

In [None]:
# The arange, with begin, end, step
data = np.arange(10)
print(data)

data = np.arange(10, 30, 3)
print(data)

In [None]:
# The linspace will return evenly spaced numbers over an interval
data = np.linspace( 10, 90, 15)
data

In [None]:
# Create a matrix with random values

data = np.random.randint( 10, 20, size=(3,4) )
data

In [None]:
# Reshaping the dimensions of an array
data = np.arange( 20 ).reshape( (4, 5) )
data

## Basic indexing, slicing

Accessing arrays is similar to accessing lists.

In [None]:
# Create a 3x4 matrix to demonstrate indexing and slicing
data = np.arange(12).reshape( (3, 4) )
data


In [None]:
# Select a row, indexing starts at 0 as usual
data[1]

In [None]:
# The list way to select an item
data[1][3]

In [None]:
# With arrays we can write this instead: array[x,y]
data[ 1,3 ]

In [None]:
# slicing, selecting subsections of array
data[0:2]

In [None]:
data[0:2, 1:3]

In [None]:
data[1, 1:3] 

In [None]:
# Selecting a column
data[:, 2]

In [None]:
# Or as 2 slices: the values are the same, but the result is a 2-d column (shape (3, 1)) instead of a 1-d array!
data[:, 2:3]
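
The difference between the two column selections shows up in the shape of the result. A minimal sketch, using the same 3x4 matrix as above (the names `col_1d` and `col_2d` are only illustrative):

```python
import numpy as np

data = np.arange(12).reshape((3, 4))

col_1d = data[:, 2]      # an integer index drops that axis: shape (3,)
col_2d = data[:, 2:3]    # a slice keeps the axis:           shape (3, 1)

print(col_1d.shape, col_1d)
print(col_2d.shape)
print(col_2d)
```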

In [None]:
# Let's use a bigger matrix
data = np.arange(30).reshape((5,6))
data


In [None]:
# Slices with begin, end, step
data[1:4:2]


In [None]:
data[1:4:2, ::2]


### Changing values in arrays

Like lists, you can change values in arrays with indexes and slices.

In [None]:
data

In [None]:
data[0,1] = 100
data

In [None]:
data[0:2, -2:] = 200
data

In [None]:
# Change a whole row
data[-1] = 99
data

In [None]:
# Change a whole column
data[:,2] = [-1,-2,-3,-4,-5]
data

### Exercise 1: Creating arrays

 1. Create an array filled with a repeated pattern (e.g. 5 times) of [1,2,3].
 
    *Hint:* Look at the method np.tile().
    
    Try multiple dimensions too.
    
 2. Create an 8x8 matrix where the center is 0's and the four borders are 1's. 
 3. Create a checkered 8x8 matrix where the elements alternate 0 and 1. 
    *Hint:* Use slices and assignment.

In [None]:
# solve ex. 1 here
# 1
# 1.1



In [None]:
# 1.2



In [None]:
# 1.3

### Boolean indexing

You can select elements in an array based on booleans.

``` python
a = np.array([ 1, 2, 3, 4] )
b = np.array([ True, False, False, True ])
a[b]   # -> array([1, 4])
```

In [None]:
a = np.arange(1,5)
print(a)
b = np.array([ True, False, False, True ])
c = a[b]
print(c.size)
print(c)

In [None]:
# Comparing values in an array
# First create a random array to demonstrate this
data = np.random.randint( 5, size=(3,4) )
data

In [None]:
# Which values are equal to ...
data == 2

In [None]:
# Use this boolean array as index into the array
data[ data == 2 ]

In [None]:
# Another example, find all values >= 3
data[ data >= 3 ]

In [None]:
# Selecting entire rows ...
a = np.array( [False, True, False] )
data[a]

### Example: student grades

Consider the following 2 arrays:
  * 1-d array with student names
  * 2-d array with multiple grades on rows. Each row contains the grades
  of one student.

In [None]:
# Generate some names and grades.
student = np.array(['Anne', 'Bob', 'Mary', 'John'])
grades = 6 * np.random.random(size=(4, 8)) + 4
grades

In [None]:
# Show the grades of Bob
bob_grades =  grades[ student == 'Bob' ]
bob_grades

In [None]:
# Note, the answer is still a 2d array (1,8) 
bob_grades.shape

In [None]:
# Enumerate over the entries in the array
for (nr, g) in enumerate( bob_grades[0] ):
    print( "Grade %d is %.1f" % (nr+1, g) )

In [None]:
# Show the grades of Anne and Mary
# Use parentheses around each comparison!
# or = |
# and = &
grades [ (student == 'Anne') | (student == 'Mary') ]

In [None]:
# Show all the grades that did not pass
grades[grades < 5.8]

In [None]:
# Show the names who have received a 9.0 or higher at least once

# First, which values are >= 9
grades >= 9.0

In [None]:
# Now for each row, which ones have a True at least once...
np.any( grades >= 9.0, axis=1 )

In [None]:
# Now use this as a boolean index into the student names.
student[ np.any( grades >= 9.0, axis=1) ]

In [None]:
# Show the min, max, and average grade
print("Min", grades.min())
print("Max", grades.max())
print("Mean", grades.mean())

In [None]:
# Bob's average
grades[ student == 'Bob' ].mean()

In [None]:
# Averages per student
# mean() takes an optional axis argument: the axis that is collapsed (0 = rows, 1 = columns).
# This means: axis=0 - calculate mean per column, axis=1 - calculate mean per row.
# No argument: average of all values.
# max and min can take the same argument.
grades.mean( axis=1 )


In [None]:
for (name, grade) in zip( student, grades.mean( axis=1 )):
    print('Average of %s is %.1f' % (name, grade) )

In [None]:
# Average grade per test (mean per column),
# so we can see which test was the most difficult.
grades.mean( axis = 0 )

In [None]:
# Quick peek: Let's plot the averages
import matplotlib.pyplot as plt
plt.bar( np.arange( grades.shape[1] ), grades.mean( axis = 0 ) )

In [None]:
# Or plot the averages of each student
plt.bar( np.arange( grades.shape[0]), grades.mean( axis = 1), align='center', color='red' )

# Nice labels on the x-axis
plt.xticks( np.arange( len(student) ), student)

# Label the axis
plt.xlabel("Student")
plt.ylabel("Grade")

# Put the actual grade on top of the bars, for clarity
for (i, grade) in enumerate( grades.mean(axis=1)):
    plt.text( i - 0.13, grade + 0.25, "%.2f" % (grade) )


### Mathematical and Statistical functions

Some useful functions:
  * min(), max()
  * mean()
  * std(), var() - standard deviation and variance
  
  * argmin(), argmax() - index of the minimum/maximum element
  
  * sum() - sum of all elements in array or along an axis.
  * cumsum() - cumulative sum of elements, starting at 0
  * cumprod() - cumulative product of elements, starting at 1
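
A short sketch of some of the functions listed above on a small, fixed array. The values are chosen only so the results are easy to check by hand; `np.unravel_index` is used here to turn the flat index from `argmax()` back into a (row, column) pair.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = a.sum()                 # sum of all elements: 21
col_sums = a.sum(axis=0)        # one sum per column: [5 7 9]
spread = a.std()                # standard deviation of all elements
variance = a.var()              # variance, equal to the std squared

flat_idx = a.argmax()           # index into the flattened array: 5
row, col = np.unravel_index(flat_idx, a.shape)   # back to (row, column)

running_product = a.cumprod()   # [  1   2   6  24 120 720]

print(total, col_sums, spread, variance)
print(flat_idx, (row, col))
print(running_product)
```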
  

In [None]:
a = np.arange(8).reshape( (2,4) )
a

In [None]:
# Note the use of the axis argument 
print("Max =", a.max())
print("Max along axis 0 =", a.max(axis=0))
print("Max along axis 1 =", a.max(axis=1))

In [None]:
a = np.random.randint( 10, size=(2,4) )
a

In [None]:
# Find the index of the smallest value
print("index of smallest value =", a.argmin())
print("index of smallest value along axis 0 =", a.argmin(axis=0))
print("index of smallest value along axis 1 =", a.argmin(axis=1))

In [None]:
# Cumsum demonstration
print("Cumsum =", a.cumsum())
print("Cumsum along axis 0 =\n", a.cumsum(axis=0))
print("Cumsum along axis 1 =\n", a.cumsum(axis=1))

### Universal functions: Fast element-wise array functions

A universal function (*ufunc*) performs an element-wise operation on ndarrays.

  * sqrt()
  * square()
  * exp
  * log/log10/log2
  * sin/cos/tan/arcsin/arccos/arctan/...
  * abs  - absolute values
  * ceil/floor - rounding up/down to the nearest integer
  
There are also binary universal functions
  * add / subtract/ multiply / divide - element-wise for two arrays
  * maximum / minimum - element-wise maximum / minimum of two arrays
  * greater / greater_equal / less / less_equal / equal / not_equal - 
      element-wise comparison, yields an ndarray of booleans
  * ...
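
A minimal sketch of a few unary and binary ufuncs from the lists above, on hand-picked values so the rounding direction of `ceil` and `floor` is visible for negative numbers too.

```python
import numpy as np

x = np.array([-1.5, -0.2, 0.2, 1.5])
y = np.array([1.0, -1.0, 1.0, -1.0])

up = np.ceil(x)            # round up towards +infinity
down = np.floor(x)         # round down towards -infinity
magnitude = np.abs(x)      # absolute values

total = np.add(x, y)       # same as x + y
smaller = np.less(x, y)    # same as x < y, gives an array of booleans

print(up)
print(down)
print(magnitude)
print(total)
print(smaller)
```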

In [None]:
# Example
a = np.arange(10)
print(a)

np.sqrt(a)

In [None]:
np.exp(a)

In [None]:
# Create 2 random arrays, normal distribution
a = np.random.randn(4)
b = np.random.randn(4)

print(a)
print(b)

In [None]:
# Pick the maximum of both arrays
np.maximum(a, b)

In [None]:
# Compare two arrays (element-wise)
np.greater(a,b)

## Methods on boolean arrays and conditional logic

To check for the existence of boolean values in an array we can use:

  * array.any( axis )   - check if any value is True (non-zero) in array
  * array.all( axis )   - check if all values are True (non-zero) in array
  
  * np.any( array )     - equivalent functions
  * np.all( array )    
  
The argument axis can be used to determine if the check should be along an axis.
  
With np.where( cond, array_x, array_y ) we can select values from array_x or array_y
based on a condition.
 

In [None]:
# Example
a = np.array( [True, False, False ] )
print("a.any():", a.any())
print("a.all():", a.all())

In [None]:
# Let's see a 2d matrix:
a = np.array( [ [True, False, False] , [ True, True, False]])
a


In [None]:
# Check along row-axis, which means create a value for each column
print("a.any(axis=0)", a.any(axis=0))

In [None]:
# Check along column-axis, which means create a value for each row
print("a.any(axis=1)", a.any(axis=1))

Example of np.where, see next example
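
A minimal sketch of `np.where( cond, array_x, array_y )`. The arrays and the 5.8 pass mark (taken from the grades example earlier) are only illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([10, 20, 30, 40, 50])
cond = np.array([True, False, True, False, True])

# Take the element from x where cond is True, otherwise from y.
picked = np.where(cond, x, y)

# The condition is often a comparison, and scalars may replace the arrays.
grades = np.array([4.5, 6.0, 7.5, 5.0])
passed = np.where(grades >= 5.8, "pass", "fail")

print(picked)
print(passed)
```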

### Operations on arrays

Simple operations such as + - * / are defined. They all work on each element separately.


In [None]:
a = np.arange(1,10).reshape( (3,3) )
b = np.arange(11,20).reshape( (3,3) )
print(a)
print(b)

In [None]:
# Same as np.add(a, b)
a + b

In [None]:
# Same as np.subtract(b, a)
b - a

In [None]:
# Same as np.multiply(a, b)
a * b

In [None]:
# Same as np.divide(b, a)
b / a 

In [None]:
# Operators with scalars work similarly
a + 100

In [None]:
a * 2

In [None]:
1 / a

### Sorting and unique

There are 2 ways to sort an array:
  * array.sort() - sorts the array in-place
  * np.sort( array ) - returns a new sorted array
  
  * np.unique( array ) - returns a new sorted array with unique elements only.
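
The difference between sorting in place and getting a sorted copy can be sketched as follows. The values are arbitrary, and `original` is only kept to show what the array looked like before sorting.

```python
import numpy as np

a = np.array([3, 1, 2, 3, 1])
original = a.copy()

b = np.sort(a)      # new sorted array; a itself is unchanged
u = np.unique(a)    # sorted array with each value only once

print(a)            # still in the original order
a.sort()            # sorts a in place and returns None
print(a)            # now sorted
print(b)
print(u)
```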
  

In [None]:
a = np.random.randint( 10, size=(3,4) )
a

In [None]:
# With multi-dimensional arrays, think about the axis along which to sort.
# axis = None : flatten the array
# if axis not specified, sorted along the last axis.
np.sort( a, axis=None )

In [None]:
# Sort along row-axis, so sort the values in each "column"
np.sort( a, axis=0)

In [None]:
# This is how to sort per row.
np.sort( a, axis=1 )

In [None]:
# Find all unique elements in array.
np.unique( a )

### Exercise 2: Student and Grades

For the following questions, use the following student name array:

```
students = np.array(['Anne', 'Bob', 'Mary', 'John', 'Julia', 'Mike', 'Susan', 'Zach'])
```

  1. Generate an 8x20 matrix with (random) grades, representing the grades of each student for 20 assignments (choose reasonable random values).
  2. Find the students with the highest and with the lowest grade average.
  3. Print a list of students and their average grades, sorted in descending order of average grades.
  
     *Hint:* Use np.argsort(). To reverse an array you can use array[::-1].
  4. Find the average grade for each student, but ignore the failed grades (< 5.8).
  
     *Hint:* Use np.ma.masked_array() to create a grade matrix in which the unwanted values are masked out. You can then calculate the mean() of the matrix, which ignores the masked values.

In [None]:
# 2.1

In [None]:
# 2.2

In [None]:
# 2.3

In [None]:
# 2.4