# Python Day 2 

## Goals for Today

There are three main goals for today's lecture:

1. Getting comfortable with making, manipulating, and visualizing NumPy arrays. 
2. Building the habit of searching for and reading [code documentation](https://numpy.org/doc/stable/reference/).
3. Breaking down complex programming challenges step-by-step using pseudo-code.

## Section 1: Importing Libraries
If vanilla Python seems rather lackluster, that's because it is. Fortunately, the scientific Python stack adds a broad and powerful set of packages to fill in the gaps. Once installed, packages are easily imported for use.

In [1]:
import numpy as np
print(np.__version__)

1.18.1


Functions from a package are accessed like attributes of an object (e.g., `np.mean`). Many libraries also have submodules, or clusters of related functions.

In [None]:
np.linalg
np.random
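
As a small illustration of calling functions from these submodules, the sketch below uses `np.linalg.norm` and `np.random.uniform`. The input vector and the seed value are arbitrary choices made for this example.

```python
import numpy as np

# Vector length (Euclidean norm) from the linear algebra submodule.
v = np.array([3.0, 4.0])
length = np.linalg.norm(v)
print(length)

# Random draws from the random submodule (the seed value is arbitrary).
np.random.seed(0)
draws = np.random.uniform(size=3)
print(draws.shape)
```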

## Section 2: NumPy Array Basics

NumPy arrays have many built-in attributes that make them very convenient to use.

In [2]:
np.random.seed(47404)

## Generate arbitrary array.
x = np.random.randint(0,9,(3,3,2))

In IPython environments, if you cannot remember which attributes and methods are available, you can make use of **tab-complete**.

In [None]:
## Try out tab-complete.
x

The following attributes help you keep track of the most important pieces of metadata.

In [3]:
print(x.shape)    # shape: dimensions of array
print(x.ndim)     # ndim:  number of dimensions of array
print(x.size)     # size:  number of elements in array
print(x.dtype)    # dtype: data type of elements

(3, 3, 2)
3
18
int64


Changing the dtype of an array is easy!

In [4]:
x.astype(int);         # change to int
x.astype(str);         # change to string
x.astype(float);       # change to float (default)
x.astype(np.float16);  # change to float16
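
One detail worth knowing: `astype` returns a new array and leaves the original untouched, so the result has to be assigned to a name to be kept. A minimal sketch, with variable names (`x_int`, `x_float`) chosen only for this example:

```python
import numpy as np

x_int = np.arange(3)              # integer array: [0 1 2]
x_float = x_int.astype(float)     # new float array; x_int is unchanged

print(x_int.dtype.kind)           # 'i' means integer
print(x_float.dtype)              # float64
```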

Many useful functions are built-in to NumPy arrays.

In [5]:
print('Min:', x.min())      # Get min of array.
print('Max:', x.max())      # Get max of array.
print('Sum:', x.sum())      # Get sum of array.
print('Mean:',x.mean())     # Get mean of array.

Min: 0
Max: 8
Sum: 64
Mean: 3.5555555555555554


### Mini-exercise

a) Look up `np.linspace`. How does it differ from `np.arange`? 

b) Using `np.linspace`, make an evenly-spaced array, 21 elements long, spanning from -1 to 1. Confirm its length = 21.

c) Compute the standard deviation of the array. (Hint: this is a built-in method.)

d) Convert the elements to type `int`. What happens?

## Section 3: Axis Operations

### Standard operations

In [6]:
## Generate arbitrary matrix.
X = np.arange(16).reshape(4,4)
print(X)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [7]:
print( X.sum() )

120


In [8]:
## Sum across rows (axis 0 is collapsed: one total per column).
print( X.sum(axis=0) )

[24 28 32 36]


In [9]:
## Sum across columns (axis 1 is collapsed: one total per row).
print( X.sum(axis=1) )

[ 6 22 38 54]
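
With a square matrix it is hard to tell which axis was collapsed. A non-square array makes it obvious from the shape of the result. The array `A` below is a made-up example, not part of the lecture data.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns
print(A)

col_sums = A.sum(axis=0)   # collapses the rows: one value per column
row_sums = A.sum(axis=1)   # collapses the columns: one value per row

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
```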


### Detour: defining functions

In [10]:
def minmax_scale(arr, feature_range=(0,1)):
    """Scale array to given range.
    
    Parameters
    ----------
    arr : 1d array
        The data to scale.
    feature_range : tuple (min, max), default=(0, 1)
        Desired range of transformed data.
        
    Returns
    -------
    arr_scale : 1d array
        Transformed array.
    """
    
    ## Coerce input to a NumPy array.
    arr = np.array(arr)
    
    ## Scale between [0,1].
    arr_scale = (arr - arr.min()) / (arr.max() - arr.min())
    
    ## Scale between [min,max].
    a, b = feature_range
    arr_scale = arr_scale * (b - a) + a
    
    return arr_scale

Make sure it works.

In [11]:
X_scale = minmax_scale(X)
print(X_scale.round(2))

[[0.   0.07 0.13 0.2 ]
 [0.27 0.33 0.4  0.47]
 [0.53 0.6  0.67 0.73]
 [0.8  0.87 0.93 1.  ]]


### Apply along axis 

Use `np.apply_along_axis` to apply our function to each 1d slice along a given axis (i.e., to each column or to each row).

In [12]:
## Apply across columns.
X_scale = np.apply_along_axis(minmax_scale, 0, X)
print(X_scale)

[[0.         0.         0.         0.        ]
 [0.33333333 0.33333333 0.33333333 0.33333333]
 [0.66666667 0.66666667 0.66666667 0.66666667]
 [1.         1.         1.         1.        ]]


In [13]:
## Apply across rows.
X_scale = np.apply_along_axis(minmax_scale, 1, X)
print(X_scale)

[[0.         0.33333333 0.66666667 1.        ]
 [0.         0.33333333 0.66666667 1.        ]
 [0.         0.33333333 0.66666667 1.        ]
 [0.         0.33333333 0.66666667 1.        ]]
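
For simple arithmetic like min-max scaling, the same row-wise result can also be obtained without `np.apply_along_axis`, by combining axis operations with broadcasting. This is a sketch of one alternative, not a replacement for the function above. The names `lo` and `hi` are introduced only for this example.

```python
import numpy as np

X = np.arange(16).reshape(4, 4)

# Row-wise min and max, kept as (4, 1) arrays so they broadcast against X.
lo = X.min(axis=1, keepdims=True)
hi = X.max(axis=1, keepdims=True)

# Scale each row to [0, 1].
X_scale = (X - lo) / (hi - lo)
print(X_scale.round(2))
```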


### Mini-exercise
How would you make sure you applied `minmax_scale` to the right axis?

## Section 4: Manipulating Arrays 

### Shaping Arrays

Importantly, all NumPy arrays and matrices have a **reshape** method that returns the same elements arranged in different dimensions.

In [14]:
## Make new array.
x = np.arange(24)

## Reshape array.
X = x.reshape(24,1);      # col vector
X = x.reshape(1,24);      # row vector
X = x.reshape(6,4);       # 2d array
X = x.reshape(4,3,2);     # 3d array

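Smart reshaping: passing `-1` for one dimension lets NumPy infer its size from the number of elements and the other dimensions.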

In [15]:
X = x.reshape(3,-1,2)
X.shape

(3, 4, 2)

In [16]:
X.T.shape

(2, 4, 3)

In [17]:
X.swapaxes(0,1).shape

(4, 3, 2)

In [18]:
np.rollaxis(X,1).shape

(4, 3, 2)
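
The NumPy documentation recommends `np.moveaxis` over `np.rollaxis` for new code, and `np.transpose` accepts an explicit axis order. A short sketch of both, using the same (3, 4, 2) shape as above:

```python
import numpy as np

X = np.arange(24).reshape(3, 4, 2)

# Move axis 1 to the front; the other axes keep their relative order.
moved = np.moveaxis(X, 1, 0)
print(moved.shape)

# Give the full axis order explicitly: new axis 0 is old axis 2, and so on.
reordered = np.transpose(X, (2, 0, 1))
print(reordered.shape)
```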

### Joining Arrays

Assuming they have similar shapes & dtypes, NumPy arrays are easily joined. In general, the syntax is to pass a list of arrays to a joining function. 

In [19]:
## Initialize array.
x = np.arange(10).reshape(5,2)
print(x)

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


The simplest (silliest) merging approach is to use `np.array` to stack two arrays along a new first axis.

In [20]:
## Join arrays.
X = np.array([x,x])
print(X)
print('shape:', X.shape)

[[[0 1]
  [2 3]
  [4 5]
  [6 7]
  [8 9]]

 [[0 1]
  [2 3]
  [4 5]
  [6 7]
  [8 9]]]
shape: (2, 5, 2)


There are other joining functions, differing in how (along which axis) they merge two arrays. Some examples are below:

In [21]:
## Join along first axis.
X = np.concatenate([x,x])
print(X)
print('shape:', X.shape)

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]
 [0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
shape: (10, 2)


In [22]:
## Join along rows (same as vstack).
X = np.row_stack([x,x])
print(X)
print('shape:', X.shape)

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]
 [0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
shape: (10, 2)


In [23]:
## Join along columns (same as hstack for 2d arrays).
X = np.column_stack([x,x])
print(X)
print('shape:', X.shape)

[[0 1 0 1]
 [2 3 2 3]
 [4 5 4 5]
 [6 7 6 7]
 [8 9 8 9]]
shape: (5, 4)


In [24]:
## Join along specified axis.
X = np.concatenate([x,x], axis=0)
print(X)
print('shape:', X.shape)

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]
 [0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
shape: (10, 2)


In [25]:
## Join along new axis.
X = np.stack([x,x], axis=0)
print(X)
print('shape:', X.shape)

[[[0 1]
  [2 3]
  [4 5]
  [6 7]
  [8 9]]

 [[0 1]
  [2 3]
  [4 5]
  [6 7]
  [8 9]]]
shape: (2, 5, 2)
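
The joining functions agree for 2d inputs but can differ for 1d inputs. The sketch below uses a small made-up 1d array `a` to show that `np.hstack` lengthens it, while `np.column_stack` treats each 1d array as a column.

```python
import numpy as np

a = np.arange(3)                 # 1d array: [0 1 2]

h = np.hstack([a, a])            # joined end to end, still 1d
c = np.column_stack([a, a])      # each 1d array becomes a column

print(h, h.shape)
print(c, c.shape)
```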


### Mini-exercise

a) Beginning below with `x`, a random [4,1] array, use the various **joining** functions above until you have `X`, a new matrix of shape [12,3]. The order of the elements of `X` doesn't matter.

In [26]:
x = np.random.uniform(size=(4,1))

b) `Y` is a new random variable of shape [1,12,7,3]. Reshape `Y` until you can subtract `X` from it. In other words, reshape this new variable until you can execute `Y - X`.

In [27]:
Y = np.random.normal(size=(1,12,7,3))

## Section 5: Indexing, Masking, and Assignments

NumPy supports a great many ways of indexing.

In [28]:
np.random.seed(47404)

## Construct an arbitrary matrix.
X = np.random.randint(0, 10, (10,10))

In [28]:
## Access particular rows.
X[:1]

In [28]:
## Access particular columns.
X[:,:5]

In [28]:
## Access particular rows/columns.
X[:5,5:]

In [28]:
## Access using lists of indexes.
X[[1,3,5],[5,1,2]]

array([6, 1, 1])
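
Note that indexing with two lists pairs the indices up element by element: it returns one value per (row, column) pair rather than a sub-matrix. To select every combination of the listed rows and columns, `np.ix_` can be used. The sketch below uses a deterministic matrix (not the random `X` above) so the values are easy to check by eye.

```python
import numpy as np

M = np.arange(100).reshape(10, 10)   # M[i, j] == 10 * i + j

# Paired indexing: elements (1,5), (3,1), and (5,2).
pairs = M[[1, 3, 5], [5, 1, 2]]
print(pairs)

# All combinations of rows 1, 3, 5 with columns 5, 1, 2.
block = M[np.ix_([1, 3, 5], [5, 1, 2])]
print(block)
```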

Far more useful is indexing with boolean arrays.

In [29]:
## Return all elements of matrix that meet criterion.
X[X > 5]

In [29]:
## Return all rows that begin with particular integer.
X[X[:,0] == 1]

In [29]:
## Return all columns whose sum is greater than 40.
X[:,X.sum(axis=0) > 40]

array([[0, 2, 0, 3, 4, 5],
       [9, 8, 3, 8, 7, 0],
       [3, 6, 7, 3, 3, 3],
       [1, 7, 0, 4, 6, 6],
       [5, 5, 9, 1, 9, 1],
       [5, 1, 8, 1, 7, 9],
       [4, 1, 5, 0, 5, 8],
       [9, 0, 9, 7, 9, 4],
       [9, 6, 8, 8, 2, 5],
       [2, 9, 6, 6, 0, 7]])

For higher-dimensional arrays, we can use the ellipsis (`...`) as shorthand for all of the remaining axes.

In [None]:
X = np.random.randint(0,9,(5,5,5,5))
X[0,...,-1]

We can update NumPy arrays in place.

In [None]:
np.random.seed(47404)

## Construct an arbitrary matrix.
mat = np.random.randint(0, 10, (10,10))

## Update first element.
mat[0,0] = 99

## Update full row.
mat[3,:] = 99

## Update multiple columns.
mat[:,-2:] = 0

This also allows for convenient masking.

In [None]:
mat[mat==5] = 99

For a complete list of convenient routines, see the [NumPy indexing documentation](https://docs.scipy.org/doc/numpy/user/basics.indexing.html#).

`np.where` returns the indices of the elements that meet a condition.

In [30]:
np.random.seed(47404)

## Construct an arbitrary matrix.
X = np.random.randint(0, 10, (10,10))

np.where(X > 5)

(array([0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5,
        5, 5, 5, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9]),
 array([0, 8, 1, 2, 4, 5, 6, 7, 2, 3, 8, 2, 4, 7, 9, 0, 3, 4, 7, 8, 3, 5,
        7, 8, 9, 9, 0, 1, 3, 6, 7, 1, 2, 3, 6, 2, 3, 5, 6, 9]))
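
The output is a tuple with one index array per axis (rows first, then columns), and it can be passed straight back in as an index. `np.where` also has a three-argument form that chooses between two values element by element. A minimal sketch on a tiny made-up matrix `M`:

```python
import numpy as np

M = np.array([[1, 7],
              [8, 2]])

# One index array per axis.
rows, cols = np.where(M > 5)
print(rows, cols)

# Use the indices to recover the matching elements.
print(M[rows, cols])

# Three-argument form: 1 where the condition holds, 0 elsewhere.
flags = np.where(M > 5, 1, 0)
print(flags)
```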

Multiple conditions can be combined with logical operations.

In [34]:
X[np.logical_and(X > 5, X % 2)]

array([9, 9, 9, 7, 7, 7, 7, 9, 9, 7, 7, 7, 9, 9, 9, 7, 9, 9, 9, 7])

In [35]:
(X > 5) & (X % 5)

array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
       [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 1, 0, 0, 0]])
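
A caution about the cell above: `&` is a *bitwise* operator, so mixing a boolean array with an integer array gives integer results that depend on the bits of each value, whereas `np.logical_and` treats any nonzero value as true. The sketch below shows the difference on a small made-up array. Comparing explicitly (`!= 0`) before using `&` avoids the surprise.

```python
import numpy as np

a = np.array([6, 7, 8, 9])       # a % 5 gives [1, 2, 3, 4]

# Bitwise AND of booleans (as 0/1) with integers: only odd remainders survive.
bitwise = (a > 5) & (a % 5)
print(bitwise)

# Logical AND: any nonzero remainder counts as true.
logical = np.logical_and(a > 5, a % 5)
print(logical)

# Safer pattern: make both sides boolean before combining.
explicit = (a > 5) & (a % 5 != 0)
print(explicit)
```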

In [33]:
X[np.logical_or(X < 2, X > 8)]

array([9, 0, 0, 9, 9, 1, 0, 1, 1, 0, 0, 9, 1, 9, 1, 0, 1, 1, 9, 1, 0, 0,
       9, 0, 9, 1, 0, 9, 0, 9, 9, 0, 0, 1])

In [37]:
(X < 2) | (X > 8)

array([[ True,  True, False,  True, False, False, False, False, False,
        False],
       [False,  True, False, False,  True, False, False, False,  True,
         True],
       [False, False, False, False,  True, False, False, False, False,
        False],
       [False,  True, False,  True, False,  True, False, False, False,
        False],
       [False, False, False,  True, False, False,  True,  True, False,
         True],
       [ True, False,  True, False, False, False,  True, False, False,
         True],
       [False, False,  True, False, False, False,  True, False,  True,
        False],
       [False,  True,  True,  True,  True,  True, False,  True, False,
        False],
       [ True,  True, False, False, False, False, False, False, False,
        False],
       [False, False,  True, False,  True, False, False,  True,  True,
        False]])

### Mini-exercise

Masking reaction times (RTs): find the fast and slow RTs, then compute the mean of the remaining data.

In [69]:
np.random.seed(47404)

## Generate random data.
Y = np.random.normal([0.6,0.8,1.0], 0.2, size=(100,3)).flatten()
Y[np.random.choice(np.arange(Y.size), 15, replace=False)] = np.random.uniform(0.0,0.1,15)  # Simulate fast RTs.
Y[np.random.choice(np.arange(Y.size), 15, replace=False)] = np.random.uniform(2.0,2.4,15)  # Simulate slow RTs.
Y = Y.reshape(100,3)

a) How many fast RTs (< 0.2s) are there per column?

array([8, 7, 3])

b) How many slow RTs (> 2.0s) are there per column?

c) Set the outlier (fast and slow) RTs to -1.

d) Compute the mean of each column, excluding the outlier RTs.

## Section 6: Noteworthy Functions

### Mathematical functions

NumPy includes a variety of mathematical functions. All of these can be applied across an entire matrix or along a single axis.

In [None]:
np.sum;       # Sum of an array or matrix.
np.cumsum;    # Cumulative sum over an array.
np.prod;      # Product of the elements of an array.
np.divide;    # Element-wise division of two arrays.
np.diff;      # Difference between consecutive elements of an array.
np.exp;       # Exponential transform.
np.log;       # Natural logarithm.
np.log10;     # Base-10 logarithm.

Many arithmetic operations have a built-in `outer` method.

In [None]:
np.add.outer(np.arange(4), np.arange(10))
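
The `outer` method applies the operation to every pair of elements, one from each array. As a small sketch, `np.multiply.outer` builds a multiplication table. The same result can be written with broadcasting, which is shown for comparison. The input ranges are arbitrary.

```python
import numpy as np

a = np.arange(1, 4)    # [1 2 3]
b = np.arange(1, 4)    # [1 2 3]

# Every pairwise product a[i] * b[j].
table = np.multiply.outer(a, b)
print(table)

# Equivalent broadcasting form: column vector times row vector.
same = a[:, None] * b[None, :]
print(np.array_equal(table, same))
```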

### Rounding Functions

In [None]:
mat = np.linspace(0,1,5)
print('Original: %s' %mat)
print('np.round: %s' %np.round(mat, 1) )
print('np.floor: %s' %np.floor(mat) ) 
print('np.ceil:  %s' %np.ceil(mat) )

### Summary Functions

NumPy includes many functions to summarize an array. With the exception of `np.corrcoef`, all of these can be applied across an entire matrix or along a single axis.

In [None]:
np.min;           # Return the smallest element.
np.max;           # Return the largest element.
np.argmin;        # Return the index of the smallest element.
np.argmax;        # Return the index of the largest element.
np.mean;          # Compute the mean of an array.
np.median;        # Compute the median of an array.
np.std;           # Compute the standard deviation of an array.
np.var;           # Compute the variance (sd^2) of an array.
np.percentile;    # Compute the xth percentile of an array.
np.corrcoef;      # Compute the row-/col-wise correlation of a matrix.

### Set Functions
NumPy includes functions for identifying unique elements within or between arrays.

In [None]:
## Define two arrays for example.
arr1 = np.array([41, 16, 34, 0, 2, 20, 19, 14, 22, 15, 18, 9, 35, 41])
arr2 = np.array([42, 22, 40, 7, 33, 0, 12, 19, 44, 10, 31, 11, 11, 49])

In [None]:
## Sort elements (ascending order).
np.sort(arr1)

In [None]:
## Return unique elements.
np.unique(arr1)

In [None]:
## Return unique elements, count number of appearances.
np.unique(arr1, return_counts=True)

In [None]:
## Test whether each element of array-1 is also in array-2 (returns booleans).
np.in1d(arr1, arr2)

In [None]:
## Return all unique elements of arrays 1 & 2.
np.union1d(arr1, arr2)

In [None]:
## Return all elements belonging to both arrays 1 & 2.
np.intersect1d(arr1, arr2)

### Brief Note on NaNs

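NumPy represents missing or undefined values with `np.nan` ("not a number"), a special floating-point value. `np.nan` dominates all other numeric types: any arithmetic involving it returns NaN.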

In [None]:
print(7. * np.nan)              # NaN dominates numeric types.
print(np.arange(5) * np.nan)    # NaN dominates numeric arrays.

NaNs may appear wherever there is missing data, or when an operation returns an invalid number.

In [None]:
## Example array.
arr = np.arange(15,dtype=float).reshape(3,5)
arr[1,-1] = np.nan
print(arr)

NaNs can be challenging because they corrupt most standard routines.

In [None]:
print(arr.max(axis=1))     # NaNs corrupt max.
print(arr.mean(axis=1))    # NaNs corrupt mean.

NumPy offers a suite of NaN-robust functions. These are slower, but can be useful in analysis.

In [None]:
print(np.nanmax(arr, axis=1))     # NaN robust max.
print(np.nanmean(arr, axis=1))    # NaN robust mean.
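
NaNs can also be located and masked out by hand. Because NaN is not equal to anything (including itself), `==` cannot be used to find it, and `np.isnan` is the usual tool. A minimal sketch on a small made-up array `vals`:

```python
import numpy as np

vals = np.array([1.0, np.nan, 3.0])

# Equality comparisons with NaN are always False.
print(np.nan == np.nan)

# np.isnan returns a boolean mask marking the NaN entries.
mask = np.isnan(vals)
print(mask)

# Invert the mask to keep only the valid entries.
clean_mean = vals[~mask].mean()
print(clean_mean)
```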

## Section 7: Visualization w/ Matplotlib

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline 

## NOTE: The second line is a bit of notebook magic! 
## It's a jupyter-notebook shortcut that makes all
## plots be displayed at the bottom of a cell.

### Basic Example
Lineplots are more intuitive than barplots, requiring at minimum only the x- and y-datapoints. Many tweaks and embellishments can also be added.

In [None]:
## Initialize canvas.
fig, ax = plt.subplots(1,1,figsize=(12,4))

## Define sigmoid function.
def inv_logit(arr):
    return 1. / (1 + np.exp(-arr))

## Simulate data.
x = np.linspace(-5,5,101)

## Plot lines.
for b in [0.5,1.0,2.0]:
    ax.plot(x, inv_logit(x * b), lw=2.5, label=r'$y = logit^{-1}( \ %0.1f \cdot x \ )$' %b)

## Add details.
ax.set(xlim=(x.min(), x.max()), xlabel='X', ylim=(0, 1), ylabel='Y', title='Example Lineplot')
ax.legend(loc=2, frameon=False, fontsize=14)

plt.tight_layout()

### Mini-exercise

Divide into three groups. Each group will have 10 minutes to learn about and make an example plot for one of the following graph types:
- barplot
- scatterplot
- histogram
- heatmap
- something Sam hasn't thought of

After 10 minutes, each group will give a mini-demonstration for the other students.