# CSE5ML Lab 2: Introduction to Python - Part 2

## Introduction

Lab 1 introduced Python and its working environment. From this week, we move on to two popular libraries, NumPy and pandas, that make Python a powerful tool for scientific computing.

This lab focuses on NumPy, a widely used library that provides a high-performance multidimensional array object and tools for working with these arrays. We will cover arrays, array indexing, datatypes, array math, and broadcasting.

Note: Before reading this tutorial you should know a bit of Python. If you would like to refresh your memory, take a look at Lab 1.

To work through the examples in this tutorial, you must have the numpy and pandas packages installed in your environment. Both packages are pre-installed in the default Anaconda environment, but if you want to use a newly created environment, you will need to install them there before continuing with this lab.

## Numpy

To use Numpy, we first need to import the `numpy` package:

In [1]:
import numpy as np


### Arrays

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is called the rank.

For example, the coordinates of a point in 3D space, [1, 2, 3], form an array of rank 1 (it is 1-dimensional), because it has one axis. That axis has a length of 3. In the example below (variable b), the array has rank 2 (it is 2-dimensional). The first dimension (axis) has a length of 2, and the second dimension has a length of 3.

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [2]:
a = np.array([1, 2, 3])   # Create a rank 1 array

In [3]:
# Create a rank 2 array:
# [[ 1  2  3]
#  [ 4  5  6]]
b = np.array([[1,2,3],[4,5,6]])
print(b.shape)                     # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"               

(2, 3)
1 2 4


The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array without initializing it, so its initial content is arbitrary and depends on the state of the memory. By default, the dtype of the created array is float64.

In [4]:
a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[ 0.  0.]
                      #          [ 0.  0.]]"

[[0. 0.]
 [0. 0.]]


In [5]:
b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[ 1.  1.]]"

[[1. 1.]]


In [6]:
np.ones( (2,3,4), dtype=np.int16 )                # dtype can also be specified

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)

In [7]:
c = np.full((2,2), 7)  # Create a constant array
print(c)               # Prints "[[7 7]
                       #          [7 7]]"

[[7 7]
 [7 7]]


In [8]:
d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[ 1.  0.]
                      #          [ 0.  1.]]"

[[1. 0.]
 [0. 1.]]


In [9]:
e = np.random.random((2,2))  # Create an array filled with random values
print(e)                     # Might print "[[ 0.91940167  0.08143941]
                             #               [ 0.68744134  0.87236687]]"

[[0.38746542 0.39078863]
 [0.80325207 0.66139971]]
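
The cells above demonstrate `zeros` and `ones` but not `empty`. The following is a minimal sketch of how `empty` and the related `zeros_like` / `ones_like` helpers might be used; the names `template` and `buf` are illustrative only.

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])

# empty() only allocates memory; the values are whatever was there before,
# so fill the array before reading from it.
buf = np.empty((2, 3))
buf[:] = 0.5                     # overwrite every element
print(buf)

# The *_like helpers copy the shape and dtype of an existing array.
z = np.zeros_like(template)
o = np.ones_like(template)
print(z.shape, z.dtype == template.dtype)   # (2, 3) True
print(o)
```

Because `empty` skips initialization, it is mainly useful when every element will be overwritten anyway.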


To create sequences of numbers, NumPy provides a function analogous to range that returns arrays instead of lists. For example:

In [10]:
np.arange(10, 30, 5)

array([10, 15, 20, 25])

In [11]:
np.arange(0, 2, 0.3)                 # it accepts float arguments

array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])
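
With floating-point steps, the number of elements returned by `arange` can be hard to predict because of rounding. When you know how many points you want, `np.linspace` is usually the safer choice. A small sketch (the name `pts` is illustrative):

```python
import numpy as np

# Nine evenly spaced points from 0 to 2, both endpoints included.
pts = np.linspace(0, 2, 9)
print(pts)         # the spacing between neighbours is 0.25
print(len(pts))    # 9
```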

### Array Attributes

NumPy’s array class is called ndarray. The most important attributes of an ndarray object are:

ndarray.ndim: the number of axes (dimensions) of the array. 

ndarray.shape: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.

ndarray.size: the total number of elements of the array. 

ndarray.dtype: an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.

Here are some examples of using 'ndarray.ndim', 'ndarray.shape', 'ndarray.size', and 'ndarray.dtype'.

In [12]:
import numpy as np
a = np.arange(15).reshape(3, 5)
print (a)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


In [13]:
a.shape # should return (3,5)

(3, 5)

In [14]:
a.size  # should return 15

15

In [15]:
a.ndim  # should return 2

2

In [16]:
a.dtype # 'int64' or 'int32', depending on the platform

dtype('int32')

Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:

In [17]:
import numpy as np

x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints "int64" or "int32", depending on the platform

int32


In [18]:
x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

float64


In [19]:
x = np.array([1.0, 2.0], dtype=np.int64)   # Force a particular datatype
print(x.dtype)                         # Prints "int64"

int64
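
An existing array can also be converted to another datatype with the `astype` method, which returns a new array. A short sketch (the values are chosen only for illustration):

```python
import numpy as np

x = np.array([1.7, 2.2, -0.9])
xi = x.astype(np.int64)    # the fractional part is dropped (truncation toward zero)
print(xi, xi.dtype)        # [1 2 0] int64
print(x.dtype)             # float64; x itself is unchanged
```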


You can read all about numpy datatypes in the [documentation](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html). Also, you can read about other methods of array creation [in the tutorial](https://docs.scipy.org/doc/numpy/user/basics.creation.html#arrays-creation).

### Array indexing

Numpy offers several ways to index into arrays.

Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you specify a slice for each dimension of the array (dimensions you leave out are taken in full):

In [20]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]

# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"

2
77
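
If you need a subarray that can be modified without touching the original, take an explicit copy of the slice. The sketch below uses `np.shares_memory` to make the difference visible; the names `view` and `independent` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                  # shares memory with a
independent = a[:2, 1:3].copy()    # owns its own data

independent[0, 0] = 77
print(a[0, 1])                     # 2: the original is untouched

print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False
```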


You can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array. Note that this is quite different from the way that MATLAB handles array slicing:

In [21]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)  # Prints "[ 2  6 10] (3,)"
print(col_r2, col_r2.shape)  # Prints "[[ 2]
                             #          [ 6]
                             #          [10]] (3, 1)"

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[ 2  6 10] (3,)
[[ 2]
 [ 6]
 [10]] (3, 1)


Integer array indexing: When you index into numpy arrays using slicing, the resulting array view will always be a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array. Here is an example:

In [22]:
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

# An example of integer array indexing.
# The returned array will have shape (3,):
print(a[[0, 1, 2], [0, 1, 0]])  # Prints "[1 4 5]"

# The above example of integer array indexing is equivalent to this:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # Prints "[1 4 5]"

# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]])  # Prints "[2 2]"

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))  # Prints "[2 2]"

[1 4 5]
[1 4 5]
[2 2]
[2 2]


One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:

In [23]:
import numpy as np

# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

print(a)  # prints "array([[ 1,  2,  3],
          #                [ 4,  5,  6],
          #                [ 7,  8,  9],
          #                [10, 11, 12]])"

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [24]:
# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  # Prints "[ 1  6  7 11]"

[ 1  6  7 11]


In [25]:
# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10

print(a)  # prints "array([[11,  2,  3],
          #                [ 4,  5, 16],
          #                [17,  8,  9],
          #                [10, 21, 12]])"

[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]


Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [26]:
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)   # Find the elements of a that are bigger than 2;
                     # this returns a numpy array of Booleans of the same
                     # shape as a, where each slot of bool_idx tells
                     # whether that element of a is > 2.

print(bool_idx)      # Prints "[[False False]
                     #          [ True  True]
                     #          [ True  True]]"

[[False False]
 [ True  True]
 [ True  True]]


We can now use boolean array indexing to construct a rank 1 array consisting of the elements of a for which bool_idx is True:

In [27]:
print(a[bool_idx])  # Prints "[3 4 5 6]"

# We can do all of the above in a single concise statement:
print(a[a > 2])     # Prints "[3 4 5 6]"

[3 4 5 6]
[3 4 5 6]
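
Boolean masks can also be used for assignment and can be combined with `&` and `|`. The following sketch shows both, plus `np.where` for choosing between two values elementwise; the name `clipped` is illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

clipped = a.copy()
clipped[clipped > 4] = 4           # assign through a boolean mask
print(clipped)                     # the last row becomes [4 4]

# Combine conditions with & and |, each wrapped in parentheses
evens = a[(a > 1) & (a % 2 == 0)]
print(evens)                       # [2 4 6]

# np.where keeps a where the condition holds and uses 0 elsewhere
masked = np.where(a > 2, a, 0)
print(masked)                      # the first row becomes [0 0]
```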


One more example to help you understand indexing. Try the following commands and inspect the results:

In [28]:
a = np.arange(10)**3
print(a)

[  0   1   8  27  64 125 216 343 512 729]


In [29]:
a[2]

8

In [30]:
a[2:5]

array([ 8, 27, 64], dtype=int32)

In [31]:
a[:6:2] = -1000    # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
print(a)

[-1000     1 -1000    27 -1000   125   216   343   512   729]


In [32]:
a[ : :-1]   # reversed a

array([  729,   512,   343,   216,   125, -1000,    27, -1000,     1,
       -1000], dtype=int32)

For brevity, we have left out a lot of details about numpy array indexing; if you want to know more, you should [read the documentation](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html).

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas. For example:

In [33]:
b = np.arange(15).reshape(3, 5)
b

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [34]:
b[2,3]                         # select the element in the third row and fourth column of b

13

In [35]:
b[0:5, 1]                       # each row in the second column of b

array([ 1,  6, 11])

In [36]:
b[ : ,1]                        # equivalent to the previous example

array([ 1,  6, 11])

In [37]:
b[1:3, : ]                      # each column in the second and third row of b

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
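
As with Python lists, negative indices count from the end of an axis, and any axes you do not index are taken in full. A brief sketch using the same array `b`:

```python
import numpy as np

b = np.arange(15).reshape(3, 5)

print(b[-1])        # last row: [10 11 12 13 14]
print(b[-1, -1])    # last element: 14
print(b[:, -2:])    # last two columns, shape (3, 2)
print(b[1])         # [5 6 7 8 9]; same as b[1, :]
```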

### Array math

Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:

In [38]:
import numpy as np

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


In [39]:
# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]


In [40]:
# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]


In [41]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [42]:
# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


Note that unlike MATLAB, `*` is elementwise multiplication, not matrix multiplication. We instead use the dot function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. dot is available both as a function in the numpy module and as an instance method of array objects:

In [43]:
import numpy as np

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

219
219


In [44]:
# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

[29 67]
[29 67]


In [45]:
# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))

[[19 22]
 [43 50]]
[[19 22]
 [43 50]]


In [46]:
# One more example; try to spot the difference between the results
A = np.array([[1,1],[0,1]])
B = np.array([[2,0],[3,4]])
print(A*B)           # elementwise product
print(A.dot(B))      # matrix product
print(np.dot(A, B))  # the same matrix product, written as a function call

[[2 0]
 [0 4]]
[[5 4]
 [3 4]]
[[5 4]
 [3 4]]
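
In Python 3.5 and later, the `@` operator offers a third spelling of the matrix product. The sketch below reuses the matrices `A` and `B` from the cell above with a vector `v`:

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])
v = np.array([9, 10])

print(A @ B)    # same result as A.dot(B)
print(A @ v)    # matrix / vector product: [19 10]
print(v @ v)    # inner product: 181
```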


Numpy provides many useful functions for performing computations on arrays; one of the most useful is `sum`:

In [47]:
import numpy as np

x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]
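
Other reductions such as `mean`, `max`, and `argmax` accept the same `axis` argument. A short sketch on the same array `x`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

print(np.mean(x))            # 2.5
print(np.mean(x, axis=0))    # column means: [2. 3.]
print(np.max(x, axis=1))     # row maxima: [2 4]
print(np.argmax(x))          # 3 (index into the flattened array)

# keepdims=True keeps the reduced axis with length 1
print(np.sum(x, axis=1, keepdims=True).shape)   # (2, 1)
```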


You can find the full list of mathematical functions provided by numpy [in the documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).

Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object:

In [48]:
import numpy as np

x = np.array([[1,2], [3,4]])
print(x)    # Prints "[[1 2]
            #          [3 4]]"
print(x.T)  # Prints "[[1 3]
            #          [2 4]]"

[[1 2]
 [3 4]]
[[1 3]
 [2 4]]


In [49]:
# Note that taking the transpose of a rank 1 array does nothing:
v = np.array([1,2,3])
print(v)    # Prints "[1 2 3]"
print(v.T)  # Prints "[1 2 3]"

[1 2 3]
[1 2 3]
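
To obtain an explicit column or row vector from a rank 1 array, you can add an axis with `reshape` or `np.newaxis`. A minimal sketch (the names `col`, `col2`, and `row` are illustrative):

```python
import numpy as np

v = np.array([1, 2, 3])

col = v.reshape(-1, 1)      # column vector, shape (3, 1)
col2 = v[:, np.newaxis]     # same result using np.newaxis
row = v[np.newaxis, :]      # row vector, shape (1, 3)

print(col.shape, col2.shape, row.shape)   # (3, 1) (3, 1) (1, 3)
print(col.T.shape)                        # (1, 3): now the transpose has an effect
```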


NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal functions” (ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.

In [50]:
B = np.arange(3)
np.exp(B)

array([1.        , 2.71828183, 7.3890561 ])

In [51]:
np.sqrt(B)

array([0.        , 1.        , 1.41421356])

In [52]:
C = np.array([2., -1., 4.])
np.add(B, C)

array([2., 0., 6.])

Numpy provides many more functions for manipulating arrays; you can see the full list [in the documentation](https://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html).

### Reshape an Array

The shape of an array can be changed with various commands. Note that the following three commands all return a modified array, but do not change the original array:

In [53]:
a = np.floor(10*np.random.random((3,4)))  # np.random.random() returns random floats in [0.0, 1.0)
a

array([[1., 4., 0., 4.],
       [8., 5., 6., 9.],
       [1., 4., 4., 7.]])

In [54]:
a.ravel()  # returns the array, flattened

array([1., 4., 0., 4., 8., 5., 6., 9., 1., 4., 4., 7.])

In [55]:
a.reshape(6,2)  # returns the array with a modified shape

array([[1., 4.],
       [0., 4.],
       [8., 5.],
       [6., 9.],
       [1., 4.],
       [4., 7.]])

In [56]:
a.T  # returns the array, transposed

array([[1., 8., 1.],
       [4., 5., 4.],
       [0., 6., 4.],
       [4., 9., 7.]])

In [57]:
print(a.T.shape)
print(a.shape)

(4, 3)
(3, 4)
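
Two details of `reshape` are worth knowing: one dimension may be given as `-1` and is then inferred, and the result is a view of the same data whenever NumPy can arrange it, whereas `flatten` always returns a copy. A small sketch, assuming a contiguous array created with `arange`:

```python
import numpy as np

a = np.arange(12)

m = a.reshape(3, -1)     # -1 is inferred as 4
print(m.shape)           # (3, 4)

m[0, 0] = 99
print(a[0])              # 99: m is a view, so a sees the change

f = m.flatten()          # flatten always copies
f[1] = -1
print(a[1])              # 1: a is unaffected by changes to f
```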


For more details about the reshape function, please refer to the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html).

### Stacking Arrays

Several arrays can be stacked together along different axes. For example:

In [58]:
a = np.floor(10*np.random.random((2,2)))
a

array([[9., 5.],
       [3., 1.]])

In [59]:
b = np.floor(10*np.random.random((2,2)))
b

array([[4., 0.],
       [5., 3.]])

In [60]:
np.vstack((a,b))

array([[9., 5.],
       [3., 1.],
       [4., 0.],
       [5., 3.]])

In [61]:
np.hstack((a,b))

array([[9., 5., 4., 0.],
       [3., 1., 5., 3.]])
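
For 2-dimensional arrays, `vstack` and `hstack` give the same results as `np.concatenate` along axis 0 and axis 1. The sketch below checks this using fixed values in place of the random arrays above, and also shows `np.stack`, which adds a new axis instead:

```python
import numpy as np

a = np.array([[9., 5.], [3., 1.]])
b = np.array([[4., 0.], [5., 3.]])

v = np.concatenate((a, b), axis=0)    # same as np.vstack((a, b))
h = np.concatenate((a, b), axis=1)    # same as np.hstack((a, b))
print(np.array_equal(v, np.vstack((a, b))))   # True
print(np.array_equal(h, np.hstack((a, b))))   # True

s = np.stack((a, b))                  # joins along a new leading axis
print(s.shape)                        # (2, 2, 2)
```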

### Splitting an Array into Smaller Ones

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur. For example:

In [62]:
a = np.floor(10*np.random.random((2,12)))
a

array([[3., 6., 1., 3., 7., 0., 5., 7., 4., 1., 9., 0.],
       [6., 2., 8., 4., 5., 5., 4., 1., 0., 2., 6., 4.]])

In [63]:
np.hsplit(a,3)   # Split a into 3

[array([[3., 6., 1., 3.],
        [6., 2., 8., 4.]]),
 array([[7., 0., 5., 7.],
        [5., 5., 4., 1.]]),
 array([[4., 1., 9., 0.],
        [0., 2., 6., 4.]])]

In [64]:
np.hsplit(a,(3,4))   # Split a after the third and the fourth column

[array([[3., 6., 1.],
        [6., 2., 8.]]),
 array([[3.],
        [4.]]),
 array([[7., 0., 5., 7., 4., 1., 9., 0.],
        [5., 5., 4., 1., 0., 2., 6., 4.]])]
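
Splitting along the vertical axis works the same way with `vsplit`. When the array cannot be divided evenly, `np.array_split` can be used, since it allows pieces of unequal size. A brief sketch (the names `top`, `bottom`, and `parts` are illustrative):

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

top, bottom = np.vsplit(a, 2)          # two arrays of shape (2, 6)
print(top.shape, bottom.shape)

parts = np.array_split(np.arange(10), 3)
print([len(p) for p in parts])         # [4, 3, 3]
```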

### Reading and Writing Files in Python

In this section, we will learn some basic operations for reading and writing files. Moreover, as a data scientist, building an accurate machine learning model is not the end of the project. We will also show you how to save and load your machine learning model in Python, which allows you to save your model to a file and load it later to make predictions.

#### Read txt file

In [65]:
f = open("files/Python.txt", "r")  # open the file "files/Python.txt" for reading
# read and print the entire file
print(f.read())
# remember to close the file
f.close()

Life is short,
Use Python!


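If we call the **readline()** method twice, we get the first two lines, because each call continues reading from where the previous one stopped. The blank line in the output appears because each line keeps its trailing newline and print adds another one.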

In [66]:
f = open("files/Python.txt", "r") #opens file with name of "Python.txt"
# read the 1st line
print(f.readline())
# read the next line
print(f.readline())
f.close()

Life is short,

Use Python!


In [67]:
#opens file with name of "Python.txt"
f = open("files/Python.txt", "r") 
myList = []
for line in f:
    myList.append(line)
f.close()
    
print(myList)
print(myList[0])
print(myList[1])

['Life is short,\n', 'Use Python!']
Life is short,

Use Python!
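
The cells above close the file manually with `f.close()`. A common alternative is the `with` statement, which closes the file automatically even if an error occurs. The sketch below writes its own scratch file in a temporary directory (a hypothetical stand-in for `files/Python.txt`) so that it runs on its own:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical scratch file standing in for files/Python.txt
    path = os.path.join(tmp, "Python.txt")
    with open(path, "w") as f:
        f.write("Life is short,\nUse Python!")

    # The file is closed automatically when the with block ends
    with open(path, "r") as f:
        lines = [line.rstrip("\n") for line in f]   # drop trailing newlines

print(lines)   # ['Life is short,', 'Use Python!']
```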


#### Write txt file

In [68]:
# Write file with name of "test.txt"
f = open("files/test.txt","w")  
f.write("I love Python.\n")
f.write("I will be a Python master.\n")
f.write("I need to keep learning!")
f.close()

# read and see the test.txt file
f = open("files/test.txt","r") 
print(f.read())
f.close()

I love Python.
I will be a Python master.
I need to keep learning!


#### Read csv file

In [69]:
import csv
csvFile = open("files/test.csv", "r") 
reader = csv.reader(csvFile, delimiter=',')
# load the data in a dictionary 
result = {}
for item in reader:
    # ignore the first line
    if reader.line_num == 1:
        continue    
    result[item[0]] = item[1]
csvFile.close()

print(result)    

{'Ali': '25', 'Bob': '24', 'Chirs': '29'}


#### Write csv file

In [70]:
import csv
fileHeader = ["name", "age"]

d1 = ["Chris", "27"]
d2 = ["Ming", "26"]

# newline="" avoids extra blank lines between rows on Windows
csvFile = open("files/write.csv", "w", newline="")
writer = csv.writer(csvFile)

# write the head and data
writer.writerow(fileHeader)
writer.writerow(d1)
writer.writerow(d2)

# Here is another command 
# writer.writerows([fileHeader, d1, d2])

csvFile.close()

# go to see the "write.csv" file.

You can find more information in the [documentation](https://docs.python.org/3.6/library/csv.html).
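
When a csv file has a header row, `csv.DictWriter` and `csv.DictReader` let you work with column names instead of positions. The following sketch writes and reads back a small file in a temporary directory; the file name `people.csv` and the variable names are illustrative.

```python
import csv
import os
import tempfile

rows = [{"name": "Chris", "age": "27"}, {"name": "Ming", "age": "26"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv")   # hypothetical file name

    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "age"])
        writer.writeheader()                 # writes "name,age"
        writer.writerows(rows)

    with open(path, "r", newline="") as f:
        loaded = list(csv.DictReader(f))     # the header row is consumed automatically

# csv values are always read as strings, so convert the ages explicitly
ages = {row["name"]: int(row["age"]) for row in loaded}
print(ages)   # {'Chris': 27, 'Ming': 26}
```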

#### Using Pandas to Read CSV file

In [71]:
import pandas as pd
import numpy as np
data = pd.read_csv("files/test.csv")
# data is a pandas DataFrame
print(data) 

# extract the age data
Age = np.array(data.Age, dtype = 'double')
print(Age)

# reshape this age vector into a column of shape (3, 1)
Age = np.reshape(Age, [3,1])
print(Age)

    Name  Age
0    Ali   25
1    Bob   24
2  Chirs   29
[25. 24. 29.]
[[25.]
 [24.]
 [29.]]
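
A dataframe column can also be selected by name and filtered with a boolean condition, much like a numpy array. The sketch below builds the same table from an in-memory string (an assumption made so that it runs without `files/test.csv`):

```python
import io
import pandas as pd

# Same content as the table printed above, held in memory
csv_text = "Name,Age\nAli,25\nBob,24\nChirs,29\n"
data = pd.read_csv(io.StringIO(csv_text))

# Select a column by name and convert it to a numpy column vector
ages = data["Age"].to_numpy(dtype=float).reshape(-1, 1)
print(ages.shape)                  # (3, 1)

# Keep only the rows that satisfy a condition
older = data[data["Age"] > 24]
print(older["Name"].tolist())      # ['Ali', 'Chirs']
```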


#### Creating a pandas dataframe from data

In [72]:
import pandas as pd

df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
print(df)

   Yes   No
0   50  131
1   21    2


#### Writing a pandas dataframe to a csv file

In [73]:
df.to_csv('files/out.csv', index=False)

You can find more Pandas operations in the [documentation](https://pandas.pydata.org/) and the [cheatsheet](https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf).

### Save and Load file by Pickle

The pickle package is used for serializing and de-serializing a Python object structure. Most Python objects can be pickled so that they can be saved on disk and loaded back later to continue the work.
You can read more about it in the [documentation](https://docs.python.org/3.6/library/pickle.html?highlight=pickle#module-pickle).

In [74]:
import numpy as np
import pickle
X = np.eye(5)
print(X)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


In [75]:
# Save the matrix X
with open('files/X.pickle', 'wb') as f:
    pickle.dump(X, f)
# Change the value of the original X    
X =  X + 4
print(X)

[[5. 4. 4. 4. 4.]
 [4. 5. 4. 4. 4.]
 [4. 4. 5. 4. 4.]
 [4. 4. 4. 5. 4.]
 [4. 4. 4. 4. 5.]]


In [76]:
# load the matrix 
with open('files/X.pickle', 'rb') as f:
    X = pickle.load(f)
print(X)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
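
For plain numpy arrays, `np.save` and `np.load` are an alternative to pickle that stores the array in NumPy's own `.npy` format. The sketch below uses a temporary directory so that it does not depend on the `files` folder; the file name `X.npy` is illustrative.

```python
import os
import tempfile
import numpy as np

X = np.eye(5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X.npy")   # hypothetical file name
    np.save(path, X)                    # write the array to disk
    X_loaded = np.load(path)            # read it back

print(np.array_equal(X, X_loaded))      # True
```

Note that unpickling can execute arbitrary code, so `pickle.load` should only be used on files from a source you trust.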


### Some Useful Resources

This brief overview has touched on many of the important things that you need to know about numpy, but is far from complete. Check out the [numpy reference](http://docs.scipy.org/doc/numpy/reference/) to find out much more about numpy.

For a summary of basic NumPy operations, you can download the [Cheat Sheet for NumPy Basics](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf).

Large parts of this lab note originate from the following two resources:
[Quickstart tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)  and
[Python Numpy Tutorial](http://cs231n.github.io/python-numpy-tutorial/). Hereby, we acknowledge the contributors and developers for their efforts in providing these useful online resources.

### Supplementary Content for NumPy Arrays: Broadcasting

Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape. It is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:

In [77]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

# Now y is the following
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


This works; however when the matrix `x` is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix `x` is equivalent to forming a matrix `vv` by stacking multiple copies of `v` vertically, then performing elementwise summation of `x` and `vv`. We could implement this approach like this:

In [78]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1))   # Stack 4 copies of v on top of each other
print(vv)                 # Prints "[[1 0 1]
                          #          [1 0 1]
                          #          [1 0 1]
                          #          [1 0 1]]"
y = x + vv  # Add x and vv elementwise
print(y)  # Prints "[[ 2  2  4]
          #          [ 5  5  7]
          #          [ 8  8 10]
          #          [11 11 13]]"

[[1 0 1]
 [1 0 1]
 [1 0 1]
 [1 0 1]]
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:

In [79]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)  # Prints "[[ 2  2  4]
          #          [ 5  5  7]
          #          [ 8  8 10]
          #          [11 11 13]]"

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


The line `y = x + v` works even though `x` has shape `(4, 3)` and `v` has shape `(3,)` because of broadcasting: the line behaves as if `v` actually had shape `(4, 3)`, with each row a copy of `v`, and the sum is performed elementwise.

Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
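
The rules above can be sketched as a small function that computes the broadcast shape of two shape tuples. `broadcast_shape` is a hypothetical helper written for illustration, not part of NumPy; its result is compared against the shape NumPy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: left-pad the shorter shape with 1s
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        # Rules 2 and 3: sizes must match, or one of them must be 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
        # Rules 4 and 5: a size-1 dimension stretches to match the other
        result.append(db if da == 1 else da)
    return tuple(result)

print(broadcast_shape((4, 3), (3,)))    # (4, 3)
print(broadcast_shape((3, 1), (2,)))    # (3, 2)

# NumPy reports the same shape for real arrays
print(np.broadcast(np.empty((3, 1)), np.empty((2,))).shape)   # (3, 2)
```

Calling `broadcast_shape((4, 3), (4,))` raises a `ValueError`, because after padding the trailing dimensions are 3 and 4 and neither is 1.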

If this explanation does not make sense, try reading the explanation from the [documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) or this [explanation](http://wiki.scipy.org/EricsBroadcastingDoc).

Functions that support broadcasting are known as universal functions. You can find the list of all universal functions in the [documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).

Here are some applications of broadcasting:

In [80]:
import numpy as np

# Compute outer product of vectors
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
print(np.reshape(v, (3, 1)) * w)

[[ 4  5]
 [ 8 10]
 [12 15]]


In [81]:
# Add a vector to each row of a matrix
x = np.array([[1,2,3], [4,5,6]])
# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:
# [[2 4 6]
#  [5 7 9]]
print(x + v)

[[2 4 6]
 [5 7 9]]


In [82]:
# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:
# [[ 5  6  7]
#  [ 9 10 11]]
print((x.T + w).T)

[[ 5  6  7]
 [ 9 10 11]]


In [83]:
# Another solution is to reshape w to be a column vector of shape (2, 1);
# we can then broadcast it directly against x to produce the same
# output.
print(x + np.reshape(w, (2, 1)))

[[ 5  6  7]
 [ 9 10 11]]


In [84]:
# Multiply a matrix by a constant:
# x has shape (2, 3). Numpy treats scalars as arrays of shape ();
# these can be broadcast together to shape (2, 3), producing the
# following array:
# [[ 2  4  6]
#  [ 8 10 12]]
print(x * 2)

[[ 2  4  6]
 [ 8 10 12]]


Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.