# Section 1: Introduction to Jupyter-Notebooks
## Cell Types
In Jupyter-Notebooks there are two main types of cells: 
* **Code:** where Python code (or code in other languages) is written and executed
* **Markdown:** where documentation and organizational details can be written. Interprets the Markdown language. 

This cell is a Markdown cell. Click on this cell to expose the underlying Markdown code. 

In addition, in the toolbar directly above, cells can be toggled between *code* and *Markdown*. Additional buttons in the toolbar above allow for cutting, copying, pasting, executing, and stopping cells. Keyboard shortcuts also allow for running these commands without needing to click on the toolbar.

## Toolbar
The toolbar at the top of a Jupyter-Notebook has many different tools, which are briefly reviewed below:
* **File:** The usual open/save commands. Also includes a **Download as** function, which allows for exporting Notebooks to pure python, HTML, and PDF.
* **Edit:** Functions for cutting, copying, and pasting cells. Also includes functions for splitting and merging cells.
* **Insert:** Functions for inserting new cells.
* **Cell:** Functions for executing a single or multiple cells.
* **Kernel:** Functions for starting, restarting, and shutting down the kernel that executes the Notebook's code.
* **Navigate:** If Notebooks are documented with headers, the Navigate tab allows users to quickly jump to different points in the code.

## Printing and Getting Help
Use the **print** command to print variables.

In [1]:
print('Hello, World!')

Hello, World!


In Jupyter Notebooks, the value of the last expression in a cell is displayed automatically, so evaluating a variable on its own acts much like a print statement.

In [2]:
'Hello, World!'

'Hello, World!'

Getting help is easy in Notebooks: append a '?' to the end of a function/variable.

In [3]:
print?

Jupyter-Notebooks also allow for tab-completion. This can be tremendously helpful in seeing what functions are available. Move the cursor to the function below and hit the tab key.

In [None]:
str.

# Section 2: Introduction to Python

## Basic Data Types in Python

### Integers, Floats, and Mathematic Operations
Integers are numbers without a decimal point.

In [5]:
print(1, type(1))

1 <class 'int'>


Floats are numbers with a decimal point.

In [6]:
print(1.3, type(1.3))

1.3 <class 'float'>


Adding a decimal point will convert an integer to a float.

In [7]:
print(1., type(1.0))

1.0 <class 'float'>


These are the basic operators in python:

In [8]:
print(4 + 2)   # Addition
print(4 - 2)   # Subtraction
print(4 * 2)   # Multiplication
print(4 / 2)   # Division
print(4 % 2)   # Remainder (modulo)
print(4 ** 2)  # Exponent

6
2
8
2.0
0
16


Floats dominate integers in operations:

In [9]:
print(4 + 2.)
print(4 / 2.)
print(4 ** 2.)
print(4 // 2)  # Floor division of two integers returns an integer.

6.0
2.0
16.0
2


In [10]:
5 // 2

2
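
As a brief sketch of how floor division interacts with floats and negative numbers (the variable names here are illustrative only, and `divmod` is a built-in not otherwise covered in this section):

```python
# Floor division rounds toward negative infinity, and the result is a
# float whenever either operand is a float.
int_result = 7 // 2        # both operands are ints
float_result = 7 // 2.0    # one operand is a float
negative_result = -7 // 2  # rounds down, not toward zero

print(int_result, type(int_result))
print(float_result, type(float_result))
print(negative_result)

# divmod returns the quotient and the remainder together.
quotient, remainder = divmod(7, 2)
print(quotient, remainder)
```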

Python also supports scientific notation and complex numbers:

In [11]:
print(1e3)
print(2+3j)

1000.0
(2+3j)


In python, variable assignment is denoted with the equal sign (=). 

In [12]:
x = 1.

Numeric types support in-place (augmented) assignment operators for convenience.

In [13]:
## In-place addition
x += 1.
print(x)

## In-place multiplication.
x *= 2.
print(x)

## In-place exponentiation.
x **= 2.
print(x)

2.0
4.0
16.0


### Text and Strings 

Strings are demarcated by single or double quotation marks.

In [14]:
string = 'a run-of-the-mill string'
string

'a run-of-the-mill string'

Paragraphs (e.g. docstrings) can be written with triple quotes:

In [15]:
paragraph = '''You can use the triple quotes to write paragraphs
of text. Note that any line-break is maintained. Triple quotes are
used to define function docstrings.'''

paragraph

'You can use the triple quotes to write paragraphs\nof text. Note that any line-break is maintained. Triple quotes are\nused to define function docstrings.'

One very useful feature is string substitution, where text or numbers can be inserted into a string. String substitution is denoted by the percent operator. Different substitution types exist.

In [16]:
# %s: Insert as string (no modification).
print('Hi, my name is %s!' %'Sam')   

Hi, my name is Sam!


In [17]:
# %0._f: Insert a number rounded to the _th digit.
print('Pi to the 2nd digit is %0.2f.' %3.14159) 

Pi to the 2nd digit is 3.14.


In [18]:
# %0._d: Insert an integer zero-padded to _ digits.
print('Prepend two zeros: %0.3d' %1)

Prepend two zeros: 001
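
The percent operator is one of several formatting styles in Python. As a hedged sketch, the same substitutions can also be written with `str.format` or with f-strings (the latter assume Python 3.6 or newer); the variable names are illustrative only.

```python
name = 'Sam'
pi = 3.14159

# str.format: replacement fields are marked with curly braces.
greeting = 'Hi, my name is {}!'.format(name)

# f-strings: expressions are embedded directly in the string.
rounded = f'Pi to the 2nd digit is {pi:.2f}.'
padded = f'Zero-padded to three digits: {1:03d}'

print(greeting)
print(rounded)
print(padded)
```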


The string object has many associated functions that can be used to modify the string.

In [19]:
print( 'Original:      %s' %string)
print( 'Capitalize:    %s' %string.capitalize() )
print( 'Uppercase:     %s' %string.upper() )
print( 'Count "i"s:    %s' %string.count('i') ) 
print( 'Replace (i,o): %s' %string.replace('i','o'))

Original:      a run-of-the-mill string
Capitalize:    A run-of-the-mill string
Uppercase:     A RUN-OF-THE-MILL STRING
Count "i"s:    2
Replace (i,o): a run-of-the-moll strong


Strings are easily combined, either with the 
addition operator or with the join method.

In [20]:
print('This is the first half.' + ' ' + 'This is the second half.')

print(' '.join(['This is the first half.', 'This is the second half.']))

This is the first half. This is the second half.
This is the first half. This is the second half.


### Booleans
The Boolean objects in Python are the **True** and **False** objects.

In [21]:
print(True, False) 

True False


These are the comparison operators in python:

In [22]:
print('4 > 2:  %s' %(4 > 2))    # Greater than
print('4 < 2:  %s' %(4 < 2))    # Less than
print('4 == 2: %s' %(4 == 2))   # Equal to
print('4 >= 2: %s' %(4 >= 2))   # Greater than or equal to
print('4 <= 2: %s' %(4 <= 2))   # Less than or equal to
print('4 != 2: %s' %(4 != 2))   # Not equal to

4 > 2:  True
4 < 2:  False
4 == 2: False
4 >= 2: True
4 <= 2: False
4 != 2: True


In Python, True and False are equivalent to the integers 1 and 0, respectively.

In [23]:
print(True == 1)   # True is equivalent to 1.
print(False == 0)  # False is equivalent to 0.

True
True


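Python also provides the **is** and **is not** operators, which test whether two names refer to the same object (identity) rather than whether their values are equal.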

In [24]:
print(4 is 2)
print(4 is not 2)

False
True
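
To make the difference between identity and equality concrete, here is a minimal sketch using two lists with the same contents (the names `a`, `b`, and `c` are illustrative only):

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate object
c = a           # a second name for the same object

equal_values = (a == b)    # compares contents
same_object = (a is b)     # compares identity
alias_is_same = (a is c)

print(equal_values, same_object, alias_is_same)

# `is` is most commonly used to test against None.
result = None
print(result is None)
```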


## Containers
### Lists
Lists are the most basic container and are denoted by brackets. Lists can store any pythonic type, and elements of a list do not need to be of the same type.

In [25]:
example_list = [1, 1., 1e3, 2+3j, True]
print(example_list)

[1, 1.0, 1000.0, (2+3j), True]


Brackets are used again to index into lists. **NOTE:** Python is a 0-indexed language. The first element of a list 
is the 0th position of the list! 

In [26]:
print('The first element of the list is %s.' %example_list[0])
print('The third element of the list is %s.' %example_list[2])
print('The last element of the list is %s.' %example_list[-1])

The first element of the list is 1.
The third element of the list is 1000.0.
The last element of the list is True.


Multiple elements from a list can be retrieved through slicing, which uses the colon operator.

In [27]:
print(example_list[1:3])    # Second-through-third elements (up-to-not-include)
print(example_list[1:])     # Second element onwards.
print(example_list[:2])     # Up to (but not including) the third element.

[1.0, 1000.0]
[1.0, 1000.0, (2+3j), True]
[1, 1.0]


Slicing also allows for the following operations:

In [28]:
print('list       = %s' %example_list)

## Second-to-last element onwards
print('list[-2:]  = %s' %example_list[-2:])  

## Up to (but not including) the second-to-last element
print('list[:-2]  = %s' %example_list[:-2])    

## Every other element 
print('list[::2]  = %s' %example_list[::2])

## Reverse elements.
print('list[::-1] = %s' %example_list[::-1]) 

list       = [1, 1.0, 1000.0, (2+3j), True]
list[-2:]  = [(2+3j), True]
list[:-2]  = [1, 1.0, 1000.0]
list[::2]  = [1, 1000.0, True]
list[::-1] = [True, (2+3j), 1000.0, 1.0, 1]


Slicing operators can be combined. Here we extract every other element, starting from the second through the second-to-last.

In [29]:
print(example_list[1:-1:2])

[1.0, (2+3j)]


Indexing and slicing can also be used to update elements in the list.

In [30]:
example_list[-1] = False
example_list

[1, 1.0, 1000.0, (2+3j), False]
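
The cell above updates a single element by index. As a small sketch of the slicing case, several elements can be replaced in one assignment (the list `numbers` is a made-up example):

```python
numbers = [0, 1, 2, 3, 4, 5]

# Replace the second and third elements in one step.
numbers[1:3] = [10, 20]
print(numbers)

# Assign to every other element. For slices with a step, the number of
# new values must match the number of selected positions.
numbers[::2] = ['a', 'b', 'c']
print(numbers)
```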

We can add new elements to the list using **append**. Note that this occurs **in-place.**

In [31]:
example_list.append( 111 )
example_list

[1, 1.0, 1000.0, (2+3j), False, 111]

**Insert** allows for adding new elements to specified positions.

In [32]:
## (Index, Value)
example_list.insert(0, 222)
example_list

[222, 1, 1.0, 1000.0, (2+3j), False, 111]

Elements can be removed from a list using the **pop** or **remove** function. **Pop** deletes an element by its index, **remove** deletes an element by its value.

In [33]:
print('Original: %s' %example_list)

## Pop the third element
example_list.pop(2)

## Remove the value 111.
example_list.remove(111)

print('Now: %s' %example_list)

Original: [222, 1, 1.0, 1000.0, (2+3j), False, 111]
Now: [222, 1, 1000.0, (2+3j), False]


The contents of a list can also be tested with the **in** operator.

In [34]:
print( 222 in example_list )
print( 999 in example_list )

True
False


Strings can be indexed and sliced much like lists of characters (although, unlike lists, strings are immutable).

In [35]:
print('string       = %s' %string)

## 5th character onwards (index 4).
print('string[4:]   = %s' %string[4:])   

## Reversed string.
print('string[::-1] = %s' %string[::-1]) 

## Every other character.
print('string[::2]  = %s' %string[::2])

string       = a run-of-the-mill string
string[4:]   = n-of-the-mill string
string[::-1] = gnirts llim-eht-fo-nur a
string[::2]  = arno-h-ilsrn
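
One difference from lists is worth a short illustration: a string cannot be modified in place. The following sketch (variable names are illustrative) shows the failed assignment and the usual workaround of building a new string.

```python
text = 'a run-of-the-mill string'

# Indexing works as it does for lists.
first_char = text[0]

# Item assignment fails because strings are immutable.
try:
    text[0] = 'A'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string instead.
new_text = 'A' + text[1:]
print(first_char, assignment_failed, new_text)
```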


### Tuples
Tuples are denoted by parentheses. Tuples are like lists except that they are **immutable.** Tuples cannot be modified once they are created.

In [36]:
example_list = [1, 2, 3, 4]
example_tuple = (1, 2, 3, 4)

In [37]:
## Change the second element of the list.
example_list[1] = 9
print(example_list)

[1, 9, 3, 4]


In [38]:
## Change the second element of the tuple.
example_tuple[1] = 9
print(example_tuple)

TypeError: 'tuple' object does not support item assignment

### Dictionaries
Dictionaries are simple lookup tables. They are denoted by curly brackets.

In [39]:
example_dict = {'a':1, 'b':2, 'c':3}
print(example_dict)
print(example_dict['c'])

{'a': 1, 'b': 2, 'c': 3}
3


Dictionaries can also be generated using the **dict()** command. Notice the slightly different syntax.

In [40]:
example_dict = dict(a=1, b=2, c=3)
print(example_dict)
print(example_dict['c'])

{'a': 1, 'b': 2, 'c': 3}
3


Dictionaries are comprised of "keys" and "values".

In [41]:
print(example_dict.keys())
print(example_dict.values())

dict_keys(['a', 'b', 'c'])
dict_values([1, 2, 3])


Once initialized, new key/value pairs can be stored in a dictionary.

In [42]:
example_dict['d'] = 4
print(example_dict)

{'a': 1, 'b': 2, 'c': 3, 'd': 4}
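
A few other common dictionary operations, sketched under the assumption of a small example dictionary; `items` and `get` are standard dictionary methods that are not otherwise covered in this section.

```python
example_dict = {'a': 1, 'b': 2, 'c': 3}

# Iterate over key/value pairs together.
for key, value in example_dict.items():
    print(key, value)

# Membership tests check the keys, not the values.
has_a = 'a' in example_dict

# get returns a fallback value instead of raising KeyError
# when the key is missing.
missing = example_dict.get('z', 0)
print(has_a, missing)
```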


## Control Flow in Python

### For and While Loops
For loops have a very simple syntax. 

In [43]:
for x in [0, 1, 2, 3, 4]:
    print(x)

0
1
2
3
4


Elements of a list can be directly iterated over in python.

In [44]:
for x in ['how', 'now', 'brown', 'cow']:
    print(x)

how
now
brown
cow


For loops can be paired with the **enumerate** command for indexing.

In [45]:
for i, x in enumerate([2,4,1,2]):
    print(i,x)

0 2
1 4
2 1
3 2


The **zip** command can be used to iterate over multiple lists at once.

In [46]:
list1 = [0, 2, 4, 6, 8]
list2 = ['zero', 'two', 'four', 'six', 'eight']

for a,b in zip(list1,list2):
    print(a,b)

0 zero
2 two
4 four
6 six
8 eight


**While** loops are similarly simple. While loops are initialized with a boolean statement. While True, the while loop will continue executing. Once False, the while loop terminates.

In [47]:
i = 0
while i < 5:
    print(i)
    i += 1  

0
1
2
3
4


### Conditional logic with if, elif, else

In python, the three conditional statements are if, elif, and else.
Here we will construct a simple for-loop testing parity.

In [48]:
example_list = [4, 7, 9.4]

for x in example_list:
    
    if x % 2 == 0: 
        print('%s is even.' %x)
        
    elif x % 2 == 1:
        print('%s is odd.' %x)
        
    else: 
        print('%s is not an integer.' %x)

4 is even.
7 is odd.
9.4 is not an integer.


### Continue and Break statements
Conditional logic statements can be paired with the "continue" and "break" 
statements for additional control flow in For and While loops. The "continue"
statement skips the rest of the current iteration of a For/While loop, whereas the
"break" statement terminates the For/While loop.

Below is an example of the **continue** statement. The for loop skips the odd numbers.

In [49]:
example_list = [4, 7, 9.4]

for x in example_list:
    
    if x % 2 == 0: 
        print('%s is even.' %x)
        
    elif x % 2 == 1:
        continue
        print('%s is odd.' %x)
        
    else: 
        print('%s is not an integer.' %x)

4 is even.
9.4 is not an integer.


An example of the **break** statement. The for loop terminates at the first odd number.

In [50]:
example_list = [4, 7, 9.4]

for x in example_list:
    
    if x % 2 == 0: 
        print('%s is even.' %x)
        
    elif x % 2 == 1:
        break
        print('%s is odd.' %x)
        
    else: 
        print('%s is not an integer.' %x)

4 is even.


### List comprehensions
Python also allows for embedding For loops within lists as a nifty way of constructing/modifying lists. List comprehensions are very powerful (though sometimes memory intensive) and can be constructed with a few different syntaxes.

In [51]:
example_list = [0,1,2,3,4]

[x for x in example_list]

[0, 1, 2, 3, 4]

Inclusive/exclusive list comprehension: here we exclude variables from the list if they do not meet a certain criterion.

In [52]:
[x for x in example_list if x > 2]

[3, 4]

Conditional list comprehension: here we transform variables from the list based on whether they meet a certain criterion.

In [53]:
['odd' if x % 2 == 1 else 'even' for x in example_list]

['even', 'odd', 'even', 'odd', 'even']

Note that else statements can be chained in list comprehensions.

In [54]:
example_list = [4, 7, 9.4]

['odd' if x % 2 == 1 else 'even' if x % 2 == 0 else 'non-integer' for x in example_list]

['even', 'odd', 'non-integer']
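
Because a list comprehension builds the entire list in memory, a generator expression (the same syntax with parentheses instead of brackets) may be preferable when the values are only needed once. A minimal sketch, with illustrative names:

```python
# List comprehension: the whole list exists in memory.
squares_list = [x ** 2 for x in range(5)]

# Generator expression: values are produced one at a time.
squares_gen = (x ** 2 for x in range(5))

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)

print(squares_list)
print(total_from_list, total_from_gen)
```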

### Error handling with try/except logic.
Python allows for graceful error handling with the "try" and "except" statements. Code nested under a "try" statement will be evaluated. If an error arises inside the try block, the code in the matching except block will be evaluated instead. This is useful for handling exceptions and preventing scripts from crashing. Use it with caution, though, if you cannot anticipate every error case!

Here we will test try/except logic with a division-by-zero error. Note that I am specifying the error class, i.e. ZeroDivisionError. Python has a number of built-in error types, and it is better to specify the exact type of error you expect to ensure that only those types of errors are passed to the except block. Multiple excepts are permissible in try/except workflows.

In [55]:
## An example divide by zero error.
example_list = [2, 10, 0]

for x in example_list:
    print(20 / x)

10.0
2.0


ZeroDivisionError: division by zero

In [56]:
## An example try/except handling divide by zero errors.
example_list = [2, 10, 0]

for x in example_list:
    try:
        print(20 / x)
    except ZeroDivisionError: 
        print('You cannot divide 0, silly!')

10.0
2.0
You cannot divide 0, silly!
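
Multiple except clauses can be attached to a single try block. The following is a minimal sketch using a hypothetical list that mixes a zero and a string, so that two different error types arise:

```python
example_list = [2, 0, 'ten']
results = []

for x in example_list:
    try:
        results.append(20 / x)
    except ZeroDivisionError:
        # Raised when x is 0.
        results.append('division by zero')
    except TypeError:
        # Raised when x is not a number.
        results.append('not a number')

print(results)
```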


## Basic Commands in Python

One of the most important commands is the **range** command. In Python 3, range produces a lazy sequence of integers (a range object rather than a list) and is similar to **seq** in R.

In [57]:
print('range(5)     = %s' %range(5))     # Specify stop position.
print('range(1,5)   = %s' %range(1,5))   # Specify start & stop position.
print('range(0,5,2) = %s' %range(0,5,2)) # Specify start, stop, and step.

for x in range(0,5,2):
    print(x)

range(5)     = range(0, 5)
range(1,5)   = range(1, 5)
range(0,5,2) = range(0, 5, 2)
0
2
4
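
Since printing a range shows only its start, stop, and step, the following sketch illustrates what can be done with a range object directly (the name `r` is illustrative):

```python
r = range(0, 10, 2)

# A range supports len, indexing, and membership tests
# without building a list.
length = len(r)
third = r[2]
contains_four = 4 in r

# Convert to a list to see every element.
as_list = list(r)
print(length, third, contains_four, as_list)
```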


Each python type has its own function for the purpose of converting variables to different types. 

In [58]:
X = range(5)

print( list(X) )    # Range as list.
print( tuple(X) )   # Range as tuple.
print( str([1, 12, 4]) )     # List as string (converts to its literal text representation).

[0, 1, 2, 3, 4]
(0, 1, 2, 3, 4)
[1, 12, 4]


The **type** and **isinstance** commands can be used to check the type of variables.

In [59]:
X = [1, 4, 5, 2]

print( type(X) )
print( type(X) == str )
print( isinstance(X, list) )

<class 'list'>
False
True


There are commands to transform lists. These return new objects and leave the original list unchanged.

In [60]:
example_list = [4, 1, 5, 2, 7, 1, 2, 3]

print(sorted(example_list))    # Sort list.
print(set(example_list))       # Get unique elements of list. Returns set.

[1, 1, 2, 2, 3, 4, 5, 7]
{1, 2, 3, 4, 5, 7}


In [61]:
## There are also commands to summarize lists.
print(len(example_list))    # Count elements in list.
print(sum(example_list))    # Sum across list.
print(min(example_list))    # Min across list.
print(max(example_list))    # Max across list.

8
25
1
7


In [62]:
## The any/all commands are very useful for testing if any 
## conditionals are met in a list.
example_list = [True if x > 2 else False for x in range(5)]
print(example_list)
print(any(example_list))
print(all(example_list))

[False, False, False, True, True]
True
False


## Defining Functions
Define new functions with the **def** and **return** statements. Any arguments specified in the **def** statement without a default value are required arguments for the newly defined function.

In [63]:
def average(X):
    '''Compute arithmetic mean for list, X.'''
    return sum(X) / len(X)

average?

In [64]:
## Define an example list.
example_list = [2, 2, 2, 3, 3, 2, 1, 4]

## Compute average.
print( 'Average: %0.3f' %average(example_list) )

Average: 2.375


Arguments of user-defined functions can also be assigned default values.

In [65]:
def average(X, weights=None):
    '''Compute arithmetic mean for list, X. Weighted average will be computed if weights are given.'''
    if weights is None: weights = [1] * len(X)
    return sum( [x*w for x,w in zip(X,weights)] ) / sum(weights)

## Define some weights.
weights = [1, 1, 1, 0.5, 0.5, 2, 4, 0.25]

## Compute average / weighted average.
print( 'Average:          %0.3f' %average(example_list) )
print( 'Weighted average: %0.3f' %average(example_list, weights) )

Average:          2.375
Weighted average: 1.756


In [66]:
## The lambda operator can also be used to define short functions.
average = lambda X: sum(X) / len(X)

print( 'Average: %0.3f' %average(example_list) )

Average: 2.375


# Section 3: Introduction to NumPy

## Importing Modules
If vanilla python seems rather lackluster, that's because it is. Fortunately, the scientific stack adds a broad and powerful array of python packages that fill in the gaps. Once installed, packages in python are easily loaded for use.

In [67]:
import numpy
print(numpy.__version__)

1.13.0


Commands from packages are like attributes of objects. For convenience, we will import packages using shorthand.

In [68]:
import numpy as np
print(np.__version__)

1.13.0


## NumPy Arrays
### Why arrays improve on lists
Arrays are the most basic type of the NumPy package. One-dimensional NumPy arrays are vectors of length N, similar to Python lists. In contrast to lists, however, arrays have many more attributes and methods and can be modified in substantially more ways. Several examples are provided below demonstrating the improvement of arrays over lists.

In [69]:
## Define example list.
example_list = list(range(5))

print(example_list)
print(example_list * 3)            # scalar * list
print(example_list * example_list) # list * list

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]


TypeError: can't multiply sequence by non-int of type 'list'

In contrast, NumPy arrays can be modified in this way. We use the **arange** command to initialize an array of sequential integers.

In [70]:
arr = np.arange(5)

print(arr, type(arr))
print(arr * 3)
print(arr * arr)

[0 1 2 3 4] <class 'numpy.ndarray'>
[ 0  3  6  9 12]
[ 0  1  4  9 16]


Every array has a data type (its **dtype**). This can be looked up and changed.

In [71]:
print(arr, arr.dtype)   # Print current datatype.
arr = arr.astype(float) # Convert to float.
print(arr, arr.dtype)   # Print new datatype.

[0 1 2 3 4] int64
[ 0.  1.  2.  3.  4.] float64
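
Converting in the other direction deserves some care. As a sketch with made-up values, casting floats to integers with `astype` truncates toward zero rather than rounding:

```python
import numpy as np

values = np.array([1.7, 2.2, -0.9])

# Converting floats to integers truncates toward zero; it does not round.
truncated = values.astype(int)

# Round first if rounding is what is wanted.
rounded = values.round().astype(int)

print(truncated, rounded)
```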


NumPy arrays store metadata about their contents. These can be helpful, especially the **shape** attribute.

In [72]:
print('Array shape:', arr.shape) # Print shape of array.
print('Array size:', arr.nbytes) # Print bytes of array.

Array shape: (5,)
Array size: 40


Arrays also have a number of other built-in methods 
not available for lists.

In [73]:
print('Round:', arr.round()) # Round array.
print('Min:', arr.min())     # Get min of array.
print('Max:', arr.max())     # Get max of array.
print('Sum:', arr.sum())     # Get sum of array.
print('Mean:',arr.mean())    # Get mean of array.

Round: [ 0.  1.  2.  3.  4.]
Min: 0.0
Max: 4.0
Sum: 10.0
Mean: 2.0


### Generating NumPy Arrays
There are many ways of generating NumPy arrays. The simplest way is to convert a Python list to a NumPy array using the **array** command.

In [74]:
## Making an array from a list using the array command.
example_list = [4, 7, 9.4]
arr = np.array(example_list)

print(example_list, type(example_list))
print(arr, type(arr)) 

[4, 7, 9.4] <class 'list'>
[ 4.   7.   9.4] <class 'numpy.ndarray'>


NumPy has recreated many of the standard R/Matlab commands for 
generating arrays.

In [75]:
print('np.arange(5)        = %s' %np.arange(5))         # Array of 5 sequential integers.
print('np.zeros(5)         = %s' %np.zeros(5))          # Array of 5 zeros.
print('np.ones(5)          = %s' %np.ones(5))           # Array of 5 ones.
print('np.linspace(0,10,5) = %s' %np.linspace(0,10,5))  # Length-5 evenly-spaced array from 0 to 10.

np.arange(5)        = [0 1 2 3 4]
np.zeros(5)         = [ 0.  0.  0.  0.  0.]
np.ones(5)          = [ 1.  1.  1.  1.  1.]
np.linspace(0,10,5) = [  0.    2.5   5.    7.5  10. ]


## NumPy Matrices
### Why matrices improve on lists
Python can technically represent a matrix as a list of lists, though doing so is inefficient and awkward to index. As with arrays, NumPy matrices dramatically improve upon the numerical capabilities of core Python.

In [76]:
nested_lists = [[1,2,3],
                [4,5,6],
                [7,8,9]]

print(nested_lists)
print(nested_lists[1][2])   # To extract the 2nd row, 3rd column, two brackets are necessary.

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
6


NumPy matrices make this much easier!

In [77]:
mat = np.array(nested_lists)

print(mat, type(mat))
print(mat[1,2])

[[1 2 3]
 [4 5 6]
 [7 8 9]] <class 'numpy.ndarray'>
6


Indexing of NumPy matrices (and arrays for that matter) obeys all of the slicing conventions of lists. Commas are used to demarcate which axis a slice operation is targeting.

In [78]:
print('mat[1,2]  = %s' %mat[1,2])    # Second row, third column.
print('mat[0,:]  = %s' %mat[0,:])    # All of the first row.
print('mat[:,-1] = %s' %mat[:,-1])   # All of the final column.

mat[1,2]  = 6
mat[0,:]  = [1 2 3]
mat[:,-1] = [3 6 9]


NumPy matrices have all the same attributes of NumPy arrays, but now functions can be applied to specific rows or columns in addition to the entire matrix.

In [79]:
## Sum across matrix.
print(mat)
print( mat.sum() )          

[[1 2 3]
 [4 5 6]
 [7 8 9]]
45


In [80]:
## Sum down each column (collapses the row axis).
print( mat.sum(axis=0) )

[12 15 18]


In [81]:
## Sum along each row (collapses the column axis).
print( mat.sum(axis=1) )

[ 6 15 24]
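
The `axis` argument names the axis that is collapsed by the operation. A short sketch that rebuilds the same 3x3 matrix and also shows the `keepdims` option (a standard argument of `sum` that this section does not otherwise cover):

```python
import numpy as np

mat = np.arange(1, 10).reshape(3, 3)

# axis=0 collapses the rows, giving one value per column.
col_sums = mat.sum(axis=0)

# axis=1 collapses the columns, giving one value per row.
row_sums = mat.sum(axis=1)

# keepdims=True retains the collapsed axis with length 1.
row_sums_2d = mat.sum(axis=1, keepdims=True)

print(col_sums, row_sums, row_sums_2d.shape)
```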


Importantly, all NumPy arrays and matrices have a **reshape** method allowing for transforming matrices into different dimensions.

In [82]:
print('Original shape', mat.shape)

# Reshape to column vector
mat = mat.reshape(9,1)
print('Column vector', mat.shape)

# Reshape to row vector
mat = mat.reshape(1,9)
print('Row vector', mat.shape)

Original shape (3, 3)
Column vector (9, 1)
Row vector (1, 9)


The **order** flag of reshape controls how the elements are arranged in the new shape: row-major ('C', the default) or column-major ('F').

In [83]:
print('Original:', mat)

Original: [[1 2 3 4 5 6 7 8 9]]


In [84]:
## Reshape (row-major, C-style order)
print(mat.reshape(3,3,order='C'))

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [85]:
## Reshape (column-major, Fortran-style order)
print(mat.reshape(3,3,order='F')) 

[[1 4 7]
 [2 5 8]
 [3 6 9]]


The dimensions of matrices can also be quickly changed with **flatten** and **squeeze**. 

In [86]:
## Reshape to new dimensions.
mat = mat.reshape(3,3,1)
print('Original:', mat.shape)

## Flatten matrix.
print('Flatten:', mat.flatten().shape )

## Squeeze matrix.
print('Squeeze:', mat.squeeze().shape )

Original: (3, 3, 1)
Flatten: (9,)
Squeeze: (3, 3)


### Generating NumPy Matrices
Just as with arrays, there are a number of ways of generating NumPy matrices. The simplest is to use the **array** command on a list of lists. 

In [87]:
nested_lists = [[0, 1, 1],[2, 3, 5], [8, 13, 21]]
mat = np.array(nested_lists)

print(nested_lists)
print(mat)

[[0, 1, 1], [2, 3, 5], [8, 13, 21]]
[[ 0  1  1]
 [ 2  3  5]
 [ 8 13 21]]


The same commands previously introduced to generate NumPy arrays can also be used to generate matrices. Simply specify extra dimensions.

In [88]:
np.zeros( [3,3] )               # 3x3 matrix of zeros.
np.ones( [3,3] )                # 3x3 matrix of ones.
np.arange(9).reshape(3,3)       # 3x3 matrix of sequential integers.
np.linspace(0,8,9).reshape(3,3) # 3x3 matrix evenly-spaced array from 0 to 8. 
np.identity(3)                  # 3x3 identity matrix.

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

Matrices can also be formed by joining NumPy arrays. There are several methods for doing this, including: **r_**, **c_**, **hstack**, **vstack**, and **concatenate**. We demonstrate each below. 

In [89]:
## np.r_ = join two arrays on their first axis.
arr = np.arange(5)
print('Original', arr)

## Join on first axis.
np.r_[arr,arr]

Original [0 1 2 3 4]


array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

In [90]:
## np.c_ = join two arrays on their second axis.
## Creates a second axis if one does not exist.

np.c_[arr,arr]

array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])

In [91]:
## np.hstack = join arrays along their columns.

print(np.hstack([arr,arr]))
print(np.hstack([arr.reshape(5,1), arr.reshape(5,1)]))

[0 1 2 3 4 0 1 2 3 4]
[[0 0]
 [1 1]
 [2 2]
 [3 3]
 [4 4]]


In [92]:
## np.vstack = join arrays along their rows.
np.vstack([arr,arr])
print(np.vstack([arr.reshape(5,1), arr.reshape(5,1)]))

[[0]
 [1]
 [2]
 [3]
 [4]
 [0]
 [1]
 [2]
 [3]
 [4]]


In [93]:
## np.concatenate = join arrays along specified axis.
## Default is first axis.

print(np.concatenate([arr, arr], axis=0))
print(np.concatenate([arr.reshape(5,1), arr.reshape(5,1)], axis=1))

[0 1 2 3 4 0 1 2 3 4]
[[0 0]
 [1 1]
 [2 2]
 [3 3]
 [4 4]]


## Core NumPy Functions
NumPy also introduces a number of useful functions designed to operate efficiently over NumPy arrays. The following is a non-exhaustive overview of some important NumPy functions.

### Rounding Functions

In [94]:
mat = np.linspace(0,1,5)
print('Original: %s' %mat)
print('np.round: %s' %np.round(mat, 1) )
print('np.floor: %s' %np.floor(mat) ) 
print('np.ceil:  %s' %np.ceil(mat) )

Original: [ 0.    0.25  0.5   0.75  1.  ]
np.round: [ 0.   0.2  0.5  0.8  1. ]
np.floor: [ 0.  0.  0.  0.  1.]
np.ceil:  [ 0.  1.  1.  1.  1.]


### Mathematical functions

NumPy includes a variety of mathematical functions. All of these can be applied across an entire matrix or across arrays.

In [95]:
np.sum;       # Sum of an array or matrix.
np.cumsum;    # Cumulative sum over an array.
np.prod;      # Product of the elements of an array.
np.divide;    # Element-wise division of two arrays.
np.diff;      # Pairwise difference of elements of an array.
np.exp;       # Exponential transform.
np.log;       # Natural logarithm.
np.log10;     # Base-10 logarithm.
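
The cell above only lists the functions. As a hedged illustration on a small made-up array, several of them behave as follows:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

total = np.sum(arr)          # 1 + 2 + 3 + 4
running = np.cumsum(arr)     # running total
product = np.prod(arr)       # 1 * 2 * 3 * 4
steps = np.diff(arr)         # differences between consecutive elements
ratio = np.divide(arr, 2)    # element-wise division (returns floats)

print(total, running, product, steps, ratio)

# exp and log are inverses of one another.
roundtrip = np.log(np.exp(arr))
print(np.allclose(roundtrip, arr))
```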

### Summary Functions

NumPy includes many functions to summarize an array. With the exception of **corrcoef**, all of these can be
applied across an entire matrix or across arrays.

In [96]:
np.min;           # Return the smallest element.
np.max;           # Return the largest element.
np.argmin;        # Return the index of the smallest element.
np.argmax;        # Return the index of the largest element.
np.mean;          # Compute the mean of an array.
np.median;        # Compute the median of an array.
np.std;           # Compute the standard deviation of an array.
np.var;           # Compute the variance (sd^2) of an array.
np.percentile;    # Compute the xth percentile of an array.
np.corrcoef;      # Compute the row-/col-wise correlation of a matrix.

In [97]:
## To give a few examples.
mat = np.vstack([ np.arange(5), np.arange(5)[::-1] ])
print('Original:\n%s' %mat)

Original:
[[0 1 2 3 4]
 [4 3 2 1 0]]


In [98]:
## Compute percentile.
print( '70%% (all):  %s' %np.percentile(mat, 70) )

## Compute percentile within each row.
print('70%% (rows): %s' %np.percentile(mat, 70, axis=1) )

## Compute percentile within each column.
print('70%% (cols): %s' %np.percentile(mat, 70, axis=0) )

70% (all):  3.0
70% (rows): [ 2.8  2.8]
70% (cols): [ 2.8  2.4  2.   2.4  2.8]


In [99]:
## Compute correlation.
print('Correlation:\n', np.corrcoef(mat))

Correlation:
 [[ 1. -1.]
 [-1.  1.]]


### Set Functions
NumPy includes functions for identifying unique elements within or between arrays.

In [100]:
## Define two arrays for example.
arr1 = np.array([41, 16, 34, 0, 2, 20, 19, 14, 22, 15, 18, 9, 35, 41])
arr2 = np.array([42, 22, 40, 7, 33, 0, 12, 19, 44, 10, 31, 11, 11, 49])

In [101]:
## Sort elements (ascending order).
np.sort(arr1)

array([ 0,  2,  9, 14, 15, 16, 18, 19, 20, 22, 34, 35, 41, 41])

In [102]:
## Return unique elements.
np.unique(arr1)

array([ 0,  2,  9, 14, 15, 16, 18, 19, 20, 22, 34, 35, 41])

In [103]:
## Return unique elements, count number of appearances.
np.unique(arr1, return_counts=True)

(array([ 0,  2,  9, 14, 15, 16, 18, 19, 20, 22, 34, 35, 41]),
 array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]))

In [104]:
## Test whether each element of array-1 appears in array-2.
np.in1d(arr1, arr2)

array([False, False, False,  True, False, False,  True, False,  True,
       False, False, False, False, False], dtype=bool)

In [105]:
## Return all unique elements of arrays 1 & 2.
np.union1d(arr1, arr2)

array([ 0,  2,  7,  9, 10, 11, 12, 14, 15, 16, 18, 19, 20, 22, 31, 33, 34,
       35, 40, 41, 42, 44, 49])

In [106]:
## Return all elements belonging to both arrays 1 & 2.
np.intersect1d(arr1, arr2)

array([ 0, 19, 22])
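
Two related functions round out the set operations. This sketch uses shortened, made-up arrays; `np.isin` and `np.setdiff1d` are standard NumPy functions not shown in the cells above.

```python
import numpy as np

arr1 = np.array([41, 16, 34, 0, 2, 20, 19])
arr2 = np.array([42, 22, 40, 0, 19])

# Boolean mask: which elements of arr1 also appear in arr2?
mask = np.isin(arr1, arr2)

# Use the mask to pull out the shared elements (in arr1's order).
shared = arr1[mask]

# Elements of arr1 that do not appear in arr2 (sorted, unique).
only_in_arr1 = np.setdiff1d(arr1, arr2)

print(mask, shared, only_in_arr1)
```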

### Replacing List Comprehensions

NumPy includes a number of very helpful functions that can replace list comprehensions (np.where) and for loops (np.apply_along_axis, np.apply_over_axes). These are often more concise, and in the case of np.where usually faster, than writing out a full For loop. We will illustrate these functions with a simple example of standard-scoring (z-scoring) a matrix.

In [107]:
## Define the standard score (z-score) function.
def zscore(arr): 
    return (arr - arr.mean()) / arr.std()

## Define a simple matrix.
mat = np.arange(12).reshape(2,6)
print(mat)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]


Use **apply_along_axis** to apply our function across each row.

In [108]:
zmat = np.apply_along_axis(zscore, axis=1, arr=mat)
print(zmat.round(2))

[[-1.46 -0.88 -0.29  0.29  0.88  1.46]
 [-1.46 -0.88 -0.29  0.29  0.88  1.46]]
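
For simple arithmetic like the z-score, the same result can also be obtained without applying a function row by row. The sketch below is one possible alternative that relies on broadcasting and the `keepdims` argument; it is an illustration rather than a replacement for the approach above.

```python
import numpy as np

mat = np.arange(12).reshape(2, 6)

def zscore(arr):
    return (arr - arr.mean()) / arr.std()

# Row-by-row application, as in the cell above.
looped = np.apply_along_axis(zscore, axis=1, arr=mat)

# Broadcasting alternative: keepdims=True keeps the row means and
# standard deviations as (2, 1) columns so they align with each row.
row_mean = mat.mean(axis=1, keepdims=True)
row_std = mat.std(axis=1, keepdims=True)
vectorized = (mat - row_mean) / row_std

print(np.allclose(looped, vectorized))
```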


Use the **where** command to set all negative numbers to 0, else 1. With three arguments, **where** behaves like the **ifelse** command in R.

In [109]:
amat = np.where(zmat < 0, 0, 1)
print(amat)

[[0 0 0 1 1 1]
 [0 0 0 1 1 1]]


If no replacement values are specified, **where** returns the indices of the array where the conditional is met, much like the **which** command in R.

In [110]:
print( np.where(zmat < 0 ) )

(array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 1, 2]))
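
The index arrays returned by **where** can be used directly for indexing, and a boolean mask achieves the same selection. A minimal sketch on a made-up matrix:

```python
import numpy as np

mat = np.array([[-2, -1, 0],
                [ 1,  2, -3]])

# Indices (row positions, column positions) where the condition holds.
rows, cols = np.where(mat < 0)

# The index arrays and a boolean mask select the same elements.
by_index = mat[rows, cols]
by_mask = mat[mat < 0]

# A mask can also be used for assignment.
clipped = mat.copy()
clipped[clipped < 0] = 0

print(by_index, by_mask)
print(clipped)
```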


### Linear Algebra Functions

NumPy includes an entire submodule dedicated to efficient linear algebra functions (though it should be noted that SciPy has reimplemented them for maximal efficiency). See np.linalg for a full list of commands.

In [111]:
## Define a simple matrix.
mat = np.arange(16).reshape(4,4)
print(mat)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [112]:
## Transpose the matrix
print(mat.T)           

[[ 0  4  8 12]
 [ 1  5  9 13]
 [ 2  6 10 14]
 [ 3  7 11 15]]


In [113]:
## Return diagonal of matrix
print(np.diag(mat))

[ 0  5 10 15]


In [114]:
## Return upper triangular matrix
print(np.triu(mat))    

[[ 0  1  2  3]
 [ 0  5  6  7]
 [ 0  0 10 11]
 [ 0  0  0 15]]


In [115]:
## Matrix-multiply the matrix by itself with np.dot.
print(np.dot(mat, mat))    

[[ 56  62  68  74]
 [152 174 196 218]
 [248 286 324 362]
 [344 398 452 506]]


In [116]:
## Can also use:
print(mat.dot(mat))

[[ 56  62  68  74]
 [152 174 196 218]
 [248 286 324 362]
 [344 398 452 506]]


In [117]:
## Linear algebra operations include:
np.linalg.norm;        # Vector or matrix norm
np.linalg.inv;         # Inverse of a square matrix
np.linalg.det;         # Determinant of a square matrix
np.linalg.eig;         # Eigenvalues and vectors of a square matrix
np.linalg.cholesky;    # Cholesky decomposition of a matrix
np.linalg.svd;         # Singular value decomposition of a matrix
np.linalg.lstsq;       # Solve linear least-squares problem
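
As a brief illustration of a few of these functions, the sketch below uses a small invertible matrix chosen for the example (the 4x4 matrix defined earlier in this section is singular, so it has no inverse). `np.linalg.solve` is a standard function not included in the list above.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([3., 5.])

det = np.linalg.det(A)     # 2*3 - 1*1
A_inv = np.linalg.inv(A)

# Solve A x = b; np.linalg.solve avoids forming the inverse explicitly.
x = np.linalg.solve(A, b)

print(round(det, 6))
print(np.allclose(A.dot(A_inv), np.identity(2)))
print(np.allclose(A.dot(x), b))
```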

### Generating Random Data
NumPy also includes many functions for generating random data. 

In [118]:
## Set the RNG seed!
np.random.seed(47404)

In [119]:
## Generate ten random integers between 0-9.
print( np.random.randint(0,10,10) )

[9 0 2 0 2 4 3 4 6 5]


In [120]:
## Generate five random samples of a normal distribution with mu=0,sd=1.
print( np.random.normal(0,1,5) )

[-1.46523567  0.72885891 -0.73496833 -0.38356834 -0.29662156]


In [121]:
## Generate 10 random coin flips.
print( np.random.binomial(1,0.5,10))

[1 1 0 0 1 0 0 1 1 0]


In [122]:
## Choose five numbers from 0-9 without replacement.
print( np.random.choice(np.arange(10), 5, replace=False) )

[8 2 9 7 6]
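
The purpose of setting the seed can be sketched as follows: re-seeding with the same value reproduces the same draws. The seed value 0 is arbitrary and chosen only for this example.

```python
import numpy as np

# Re-seeding with the same value reproduces the same draws.
np.random.seed(0)
first = np.random.randint(0, 10, 5)

np.random.seed(0)
second = np.random.randint(0, 10, 5)

print(np.array_equal(first, second))

# Sampling without replacement never repeats a value.
sample = np.random.choice(np.arange(10), 5, replace=False)
print(len(np.unique(sample)) == 5)
```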
