## numpy Documentation 

* https://docs.scipy.org/doc/numpy/ 

I'll refer to the main numpy documentation throughout this tutorial to point out specific sections that are relevant.

### Running Jupyter Notebook Cells

Jupyter Notebook is a great way to code iteratively. You can run a cell by clicking the "Run" button above, or with the shortcut "Shift + Enter" ("Shift + Return" on a Mac).

In [1]:
# <-- Means comment. I'll be using a lot of comments in this notebook.
# In this first cell we are going to import the libraries to use for
# creating our matrix. 

# numpy is imported as 'np' as a convention. 

import numpy as np 

In [2]:
# There are a couple of data structures that we will rely on. As you may
# know, the basic data structure is the numpy array (vector).

In [3]:
# The basic building block of numpy is the array. Here we are going to use
# a list (denoted by square brackets) to create an array.

# Creating list mylist
mylist = [1,2,3,4]

# Transforming mylist into an array:
myarray = np.array(mylist)

# Reviewing the output of myarray:
print(myarray)

# It still looks like a list, but it is really a numpy array. Let's
# take a closer look.

[1 2 3 4]


In [4]:
# Reviewing the data type of myarray:
print(type(myarray))

# Notice that the array's type is ndarray. This is short for
# n-dimensional array. We can quickly find its shape (number of
# rows and columns) by using the array's shape attribute.

<class 'numpy.ndarray'>


### Documentation for Shape

* https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html

In [5]:
# The output (4,) means we have a one-dimensional array with four
# entries. We will use this later when verifying the shape of our
# randomly generated matrix.
myarray.shape

(4,)
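Once an array has more than one dimension, `shape` lists one entry per dimension, in (rows, columns) order for a matrix. A small sketch, where `vector` and `matrix` are just illustrative names:

```python
import numpy as np

# A 1-D array: shape has a single entry, the number of elements.
vector = np.array([1, 2, 3, 4])

# A 2-D array built from a list of lists: shape is (rows, columns).
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])

print(vector.shape, vector.ndim)  # (4,) 1
print(matrix.shape, matrix.ndim)  # (2, 4) 2

# shape is a tuple, so it can be unpacked directly.
rows, cols = matrix.shape
print(rows, cols)  # 2 4
```

Unpacking `shape` into `rows, cols` is a common idiom when a later step needs the dimensions separately.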

# Generating a Random Number

Relevant Doc: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.random.html

numpy provides a random module that generates random real or integer numbers for us. Let's take a look:

In [6]:
# How to generate a single random real number. By default it returns
# a float in the half-open interval [0.0, 1.0): 0.0 is possible, 1.0 is not.

np.random.random()

0.1451369609652211
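Each call above returns a different number. If you want repeatable results, for example while checking a notebook, one option is to seed the generator with `np.random.seed`; the sketch below also draws several values at once with the `size` argument. The seed value 42 is an arbitrary choice.

```python
import numpy as np

# Seeding makes the sequence repeatable; 42 is arbitrary.
np.random.seed(42)
first = np.random.random(size=5)

np.random.seed(42)
second = np.random.random(size=5)

print(first)
# The same seed reproduces the same draws.
print(np.array_equal(first, second))  # True

# Every value lies in [0.0, 1.0).
print(((first >= 0.0) & (first < 1.0)).all())  # True
```

Newer NumPy versions (1.17 and later) also offer `np.random.default_rng`, whose `Generator.random` and `Generator.integers` methods play the same roles as `np.random.random` and `np.random.randint`.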

In [7]:
# How to generate a single random integer. Here, the input determines
# the range of values that can be returned.

# If we pass a single number, numpy randomly selects an integer from 0
# up to one less than that number. For example, if we pass 3 to the
# randint method, numpy will generate a number between 0 and 2. In other
# words, the upper bound is excluded: everything up to but not
# including the input can be generated.

np.random.randint(3)

0
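To see the excluded upper bound in practice, the sketch below draws many values at once with the `size` argument and inspects which ones appear. It also shows the two-argument form `randint(low, high)`, where `low` is included and `high` is not; the sample sizes are arbitrary.

```python
import numpy as np

# One argument: randint(3) draws from 0, 1, 2 (3 itself is excluded).
draws = np.random.randint(3, size=10_000)
print(np.unique(draws))  # the distinct values drawn, a subset of [0 1 2]

# Two arguments: randint(low, high) draws from low up to high - 1.
dice = np.random.randint(1, 7, size=10_000)
print(dice.min() >= 1, dice.max() <= 6)  # True True
```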

# Generating a Random Matrix

Relevant doc: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html

This page explains the parameters of the randint method.

In [8]:
# Creating a matrix of shape 1000 x 1000 with randomly generated 
# integers from 1 to 9 (included).

data = np.random.randint(low=1, high=10, size=(1000,1000))

# This randint method is nice because it does the work of creating
# the randomly generated matrix with lower and upper thresholds for
# the element values. Plus, it defines the size of the matrix for you.
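To confirm what `low`, `high`, and `size` do without printing a million numbers, here is the same call on a 3 x 4 matrix, which is small enough to display in full. The name `small` is illustrative.

```python
import numpy as np

# Same parameters as above, but small enough to print in full.
small = np.random.randint(low=1, high=10, size=(3, 4))
print(small)

# size=(3, 4) gives 3 rows and 4 columns.
print(small.shape)  # (3, 4)

# Every element is in 1..9 because high=10 is excluded.
print(small.min() >= 1 and small.max() <= 9)  # True
```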


In [9]:
# Preview of the data. It is too big to display in full, so numpy shows
# a preview: the first few rows and columns, then it skips ahead and
# shows the last few rows and columns of the matrix.
data

array([[8, 4, 3, ..., 1, 5, 8],
       [8, 6, 2, ..., 4, 5, 9],
       [4, 7, 3, ..., 1, 1, 3],
       ...,
       [8, 8, 5, ..., 2, 3, 3],
       [2, 4, 8, ..., 9, 8, 6],
       [6, 7, 9, ..., 2, 6, 1]])

In [10]:
# The shape attribute returns a tuple of (rows, columns) for
# the data array.

data.shape

(1000, 1000)

In [11]:
# Checking the maximum value in the matrix. We would expect 9 since
# we defined our high parameter as 10. Recall that it is up to but not
# including the high number. 

data.max()

9

In [12]:
# Verifying that the lowest integer in our matrix matches the 
# parameters above. 

data.min()

1
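`max()` and `min()` with no arguments scan the whole matrix. Passing `axis` reduces along one dimension instead; the sketch below uses a hand-written 2 x 3 matrix (the name `m` is illustrative) so the results can be checked by eye.

```python
import numpy as np

m = np.array([[3, 7, 1],
              [9, 2, 5]])

print(m.max())        # 9, over every element
print(m.max(axis=0))  # [9 7 5], largest value in each column
print(m.min(axis=1))  # [1 2], smallest value in each row
```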

In [13]:
# This shows the first two rows of the data matrix.
data[:2]

array([[8, 4, 3, ..., 1, 5, 8],
       [8, 6, 2, ..., 4, 5, 9]])
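`data[:2]` slices along the first axis, so it returns whole rows. A sketch of a few related indexing forms on a small, predictable matrix built with `np.arange` and `reshape` (the name `grid` is illustrative):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# grid is
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(grid[:2])      # first two rows
print(grid[:, :2])   # first two columns of every row
print(grid[1, 3])    # single element at row 1, column 3 -> 7
print(grid[-1])      # last row -> [ 8  9 10 11]
```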

# Finding Inverse of a Matrix

Here, we will use numpy's linear algebra module, numpy.linalg.

Relevant doc: https://docs.scipy.org/doc/numpy/reference/routines.linalg.html

In [14]:
# Let's create a matrix called A using the randint method.

A = np.random.randint(low=1, high=10, size=(1000,1000))

In [15]:
# Computes the inverse of matrix A.
inv_A = np.linalg.inv(A)
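`np.linalg.inv` only works for square, non-singular matrices. A random integer matrix like `A` is very unlikely to be singular, but that is not guaranteed, and in that case NumPy raises `np.linalg.LinAlgError`. The sketch below triggers that error on a deliberately singular matrix, then shows `np.linalg.solve`, which is generally preferred when the goal is to solve `M x = b` because it avoids forming the inverse explicitly. The names `singular`, `M`, `b`, and `x` are illustrative.

```python
import numpy as np

# The second row is twice the first, so this matrix is singular.
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])
raised = False
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    raised = True
    print("Cannot invert:", err)

# Solving M x = b directly, without forming the inverse of M.
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(M, b)
print(x)                      # approximately [2. 3.]
print(np.allclose(M @ x, b))  # True
```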

In [16]:
# Let's preview the inverse of A.
inv_A

array([[-4.31429149e-02, -4.46806811e-03,  4.20605526e-02, ...,
         2.13410664e-03, -2.32041024e-02, -5.46730636e-02],
       [ 4.25537401e-02,  2.05757835e-02, -2.93632223e-02, ...,
         4.55843361e-03,  3.17459960e-02,  6.91201380e-02],
       [-4.29975831e-02, -4.32581909e-03,  4.35803857e-02, ...,
         1.43039067e-03, -2.04716194e-02, -6.49787279e-02],
       ...,
       [ 1.18553912e-01,  1.49706633e-02, -1.19088466e-01, ...,
        -1.08290056e-02,  7.56539636e-02,  1.75939293e-01],
       [ 2.60731670e-02,  7.31013404e-03, -4.50436442e-02, ...,
         5.57041169e-03,  3.22120928e-02,  5.50177204e-02],
       [-7.23417657e-02, -3.51255654e-02,  6.62278060e-02, ...,
        -9.12383401e-05, -5.29266162e-02, -1.10745716e-01]])

In [17]:
# Okay, but how do we know that this actually computed the inverse
# of matrix A? We know that the matrix product of a matrix and its
# inverse is the identity matrix.

np.dot(inv_A, A)

array([[ 1.00000000e+00,  3.92741395e-13,  1.73666637e-13, ...,
        -8.42173553e-14,  8.31279490e-14, -4.39648318e-14],
       [ 6.66716682e-13,  1.00000000e+00, -5.51475532e-13, ...,
        -2.31537012e-13, -1.26149091e-13, -4.32320846e-13],
       [-4.70387618e-13, -8.04911693e-14,  1.00000000e+00, ...,
        -4.11795598e-13, -3.81639165e-13,  4.52637927e-13],
       ...,
       [ 6.57807142e-14,  2.18047802e-13,  1.53876911e-13, ...,
         1.00000000e+00, -8.78186412e-14, -3.88578059e-15],
       [ 2.10040318e-14,  3.77475828e-14,  2.26416108e-14, ...,
         6.71337985e-14,  1.00000000e+00, -9.86710713e-15],
       [-4.53248550e-14, -1.14130927e-13, -7.85205234e-14, ...,
        -9.17321774e-14,  6.77791157e-14,  1.00000000e+00]])

In [18]:
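# Looks a bit odd, but that is because the off-diagonal numbers are
# very small (on the order of 1e-13) rather than exactly zero. Let's use
# the round method and np.abs to clean this up.

identity_matrix = np.dot(inv_A, A)

# Round to the nearest integer, then take the absolute value so that
# any -0. entries display as 0.
identity_matrix = np.abs(identity_matrix.round())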

In [19]:
identity_matrix

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])
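Rounding works here, but a more direct check compares the product with an exact identity matrix from `np.eye` using `np.allclose`, which tolerates tiny floating-point errors. This sketch uses a small random matrix so it runs quickly; the size 50, the seed, and the names `M` and `inv_M` are arbitrary assumptions.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for repeatability
n = 50
M = np.random.randint(low=1, high=10, size=(n, n))
inv_M = np.linalg.inv(M)

# @ is matrix multiplication, equivalent to np.dot for 2-D arrays.
product = inv_M @ M

# Compare against the exact identity within floating-point tolerance.
print(np.allclose(product, np.eye(n)))

# The largest deviation from the identity shows the size of the rounding error.
print(np.abs(product - np.eye(n)).max())
```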

In [20]:
# Ok, what about a matrix of size 10,000 x 10,000?


new_matrix = np.random.randint(low=1, high=10, size=(10000,10000))
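A 10,000 x 10,000 matrix has 10^8 elements, so it needs 10^8 times the size of one element in memory. The sketch below reads `itemsize` and `nbytes` from a small array and extrapolates, assuming the same default integer dtype that `randint` uses on your platform (typically 8 bytes on 64-bit Linux and macOS, and 4 bytes on Windows with NumPy versions before 2.0).

```python
import numpy as np

small = np.random.randint(low=1, high=10, size=(100, 100))

# Bytes per element depend on the platform's default integer dtype.
print(small.dtype, small.itemsize)

# nbytes is the element count times the bytes per element.
print(small.nbytes == small.size * small.itemsize)  # True

# Extrapolate to larger square matrices without allocating them.
for n in (10_000, 100_000):
    total_bytes = n * n * small.itemsize
    print(f"{n:,} x {n:,}: about {total_bytes / 1024**3:.1f} GiB")
```

At 4 bytes per element the 100,000 x 100,000 case comes to roughly 37 GiB, which is why the next section moves the data to disk.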


# Handling Huge Arrays

* Use h5py.
* It stores data on the hard drive instead of in memory.
* Definitely use compression, otherwise you will fill up your hard drive.

In [1]:
import h5py
import numpy as np

In [3]:
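# Open (creating or overwriting) an HDF5 file for writing.
hdf5_store = h5py.File("./data.hdf5", "w")
dim = 100000

# Note: np.random.randint builds the full dim x dim array in memory
# before it is written (10**10 elements, about 40 GB at 4 bytes each).
results = hdf5_store.create_dataset(
    name='data',
    shape=(dim, dim),
    data=np.random.randint(low=1, high=10, size=(dim, dim)),
    compression='gzip'
)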
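The cell above builds the entire 100,000 x 100,000 array in RAM before handing it to h5py, which needs roughly 40 GB at 4 bytes per element. One alternative, sketched below, creates an empty compressed dataset and fills it a block of rows at a time, so only one block is in memory at once. The temporary file `chunked.hdf5`, the smaller `demo_dim`, and the block size are illustrative assumptions; scale them up for the real case.

```python
import os
import tempfile

import h5py
import numpy as np

demo_dim = 2_000       # illustrative; the notebook uses 100,000
rows_per_block = 500   # rows generated and written per step

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "chunked.hdf5")

with h5py.File(path, "w") as store:
    # An empty gzip-compressed dataset; h5py picks a chunk shape itself
    # because compression requires chunked storage.
    dset = store.create_dataset(
        name="data",
        shape=(demo_dim, demo_dim),
        dtype="int32",
        compression="gzip",
    )
    # Fill it block by block so only rows_per_block rows are in memory.
    for start in range(0, demo_dim, rows_per_block):
        stop = min(start + rows_per_block, demo_dim)
        dset[start:stop] = np.random.randint(
            low=1, high=10, size=(stop - start, demo_dim)
        )

with h5py.File(path, "r") as store:
    print(store["data"].shape, store["data"].dtype)  # (2000, 2000) int32
    first_rows = store["data"][:2]
```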

In [4]:
results.ndim

2

In [5]:
results.shape

(100000, 100000)

In [6]:
results.size

10000000000

In [7]:
results.dtype

dtype('int32')

In [8]:
hdf5_store.close()

In [2]:
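# Reopen the file read-only. Slicing the dataset reads only the
# first 10 rows from disk into a numpy array.
with h5py.File('./data.hdf5', 'r') as f:
    data_set = f['data']
    data = data_set[:10]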

In [3]:
data

array([[1, 3, 2, ..., 3, 1, 7],
       [5, 3, 2, ..., 2, 7, 7],
       [7, 2, 9, ..., 6, 8, 2],
       ...,
       [2, 5, 2, ..., 2, 4, 6],
       [2, 8, 1, ..., 2, 7, 7],
       [2, 1, 2, ..., 5, 2, 3]])
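Slicing an h5py dataset reads only the requested part from disk and returns an ordinary NumPy array, so the slice stays usable after the file is closed, while the dataset object itself cannot be read once the file is closed. This sketch writes a small file first so it can run on its own; the temporary file `slice_demo.hdf5` and the 10 x 10 size are assumptions.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "slice_demo.hdf5")

# Write a small, predictable dataset: values 0..99 in a 10 x 10 grid.
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=np.arange(100).reshape(10, 10), compression="gzip")

with h5py.File(path, "r") as f:
    dset = f["data"]
    top_left = dset[:3, :3]   # first three rows, first three columns
    column = dset[:, 5]       # every row of column 5
    print(type(top_left))     # <class 'numpy.ndarray'>

# The sliced arrays are plain NumPy arrays and remain usable here.
print(top_left)
print(column)  # [ 5 15 25 35 45 55 65 75 85 95]
```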