## What are the key features of numpy?

* Multi-dimensional arrays
* Built-in array operations
* Simple but powerful array interactions (broadcasting)
* Integration with other languages (Fortran, C, C++)

## Why numpy for data science?

* **Speed** (much faster than Python lists for numerical operations)
* **Functionality**

Many packages, such as pandas, are built on NumPy.


In [125]:
import numpy as np

an_array = np.array([4,55,6])   # Create a rank 1 array

print(type(an_array))             # The type of an ndarray is: "<class 'numpy.ndarray'>"


<class 'numpy.ndarray'>


In [126]:
# check the shape of the array we just created; it should have just one dimension
print(an_array.shape)

(3,)


In [127]:
an_array[0] = 9    # ndarrays are mutable, here we change an element

print(an_array)

[ 9 55  6]


## How to create a Rank 2 numpy array:

A rank 2 **ndarray** is one with two dimensions. Notice the format below: [[row], [row]].
2-dimensional arrays are great for representing matrices, which are often useful in data science.


In [128]:
another = np.array([[11,12,14], [21,22,23]])

print(another) 

print("The shape is 2 rows and 3 columns: ", another.shape)

print("Accessing elements [0,0], [0,1], and [1,0] of the ndarray: ", another[0,0], ",", another[0,1], ",", another[1,0])

[[11 12 14]
 [21 22 23]]
The shape is 2 rows and 3 columns:  (2, 3)
Accessing elements [0,0], [0,1], and [1,0] of the ndarray:  11 , 12 , 21
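Beyond single elements, a rank 2 array also lets you pull out whole rows and columns. A minimal sketch, reusing the same values as `another` under the new (illustrative) name `m`:

```python
import numpy as np

# A 2 x 3 matrix: two rows, three columns
m = np.array([[11, 12, 14], [21, 22, 23]])

print(m.ndim)     # number of dimensions (the "rank"): 2
print(m.shape)    # (rows, columns): (2, 3)
print(m[1])       # the whole second row: [21 22 23]
print(m[:, 0])    # the whole first column: [11 21]
print(m[1, 2])    # a single element, row 1 and column 2: 23
```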


## There are many ways to create numpy arrays:

Here we create arrays of different sizes and shapes, pre-filled with different values. NumPy has a number of built-in functions that help us quickly and easily create multidimensional arrays.

In [129]:
import numpy as np

# Create a 2*2 array of zeros
ex1 = np.zeros((2,2))
print(ex1)

[[ 0.  0.]
 [ 0.  0.]]


In [130]:
# Create a 3x3 array filled with 9.0
ex2 = np.full((3,3), 9.0)
print(ex2)

[[ 9.  9.  9.]
 [ 9.  9.  9.]
 [ 9.  9.  9.]]


In [131]:
# IDENTITY matrix: create a 4x4 matrix with 1s on the diagonal and 0s elsewhere

ex3 = np.eye(4,4)
print(ex3)

[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]


In [132]:
# Create an array of ones

ex4 = np.ones((3,9))
print(ex4)

[[ 1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.]]


In [133]:
# notice that the above ndarray (ex4) is rank 2: it is a 3x9 array
print(ex4.shape)

# which means we need to use two indexes to access an element
print()
print(ex4[0,5])

(3, 9)

1.0
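The creation functions above produce `float64` arrays by default, which is why the outputs print as `0.`, `1.` and `9.`. Most of them also accept a `dtype` argument, and the `*_like` variants copy the shape and dtype of an existing array. A small sketch (variable names are illustrative):

```python
import numpy as np

z = np.zeros((2, 3))                     # float64 zeros by default
zi = np.zeros((2, 3), dtype=np.int64)    # request integer zeros instead
f = np.full((2, 2), 7)                   # integer fill value, so an integer dtype is inferred
ones = np.ones_like(zi)                  # same shape and dtype as zi, filled with 1

print(z.dtype, zi.dtype, f.dtype, ones.shape)
```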


In [134]:
# create an array of random floats between 0 and 1
ex5 = np.random.random((2,2))
print(ex5)

[[ 0.90725145  0.24889629]
 [ 0.64695576  0.31958327]]


In [135]:
# One random integer

np.random.randint(1,100)

69

In [136]:
# 10 random integers

np.random.randint(1, 100, 10)

array([95, 21, 81, 96, 32, 40, 37, 92, 40, 33])
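The random outputs above change every time the notebook runs. If repeatable results are needed, NumPy's `Generator` API (NumPy 1.17 and later) can be seeded. A sketch, where the seed value 42 is an arbitrary choice:

```python
import numpy as np

# A seeded Generator produces the same "random" numbers on every run
rng = np.random.default_rng(seed=42)
floats = rng.random((2, 2))            # floats in [0, 1), like np.random.random
ints = rng.integers(1, 100, size=10)   # integers in [1, 100), like np.random.randint

# Re-creating the generator with the same seed repeats the sequence
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(floats, rng_again.random((2, 2))))   # True
```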

In [137]:
# 5 evenly spaced values from 1 to 10 (both ends included)
np.linspace(1, 10, 5)

array([  1.  ,   3.25,   5.5 ,   7.75,  10.  ])

In [138]:
randarray = np.random.randint(1, 10, 10)
randarray

array([1, 7, 9, 4, 7, 2, 3, 5, 6, 4])

In [139]:
# Find max in the randarray
randarray.max()

9

In [140]:
# Find min in the randarray
randarray.min()

1

In [141]:
#  Find the index of the max element
randarray.argmax()

2

In [142]:
# Find the index of the min element

randarray.argmin()

0
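On a 2-D array, `max`/`argmax` work on the flattened array unless an `axis` is given, and ties resolve to the first occurrence. A sketch on a small hand-written matrix:

```python
import numpy as np

m = np.array([[3, 9, 2],
              [7, 1, 9]])

print(m.max())           # 9, the largest value overall
print(m.argmax())        # 1: index into the flattened array; the first 9 wins the tie
print(m.max(axis=0))     # column-wise maxima: [7 9 9]
print(m.argmax(axis=1))  # position of the max in each row: [1 2]

# Convert the flat index back to (row, column)
row, col = np.unravel_index(m.argmax(), m.shape)
print(row, col)          # 0 1
```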

In [143]:
# create a 1-D vector

one_d_vector = np.arange(9)

one_d_vector

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [144]:
# reshape it into a 3x3 matrix
one_d_vector.reshape(3,3)

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
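`reshape` can infer one dimension if you pass `-1`, but the total number of elements must stay the same. A short sketch:

```python
import numpy as np

v = np.arange(12)

a = v.reshape(3, 4)    # 3 rows x 4 columns
b = v.reshape(2, -1)   # -1 lets NumPy infer the missing size: (2, 6)
flat = a.reshape(-1)   # back to one dimension

# 12 elements cannot be arranged as 5 x 3, so this raises ValueError
try:
    v.reshape(5, 3)
except ValueError as err:
    print("cannot reshape:", err)
```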

In [145]:
one_d_vector.dtype

dtype('int32')

In [146]:
type(one_d_vector)

numpy.ndarray

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Array Indexing:

</p>
Slice indexing:
Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays.



In [147]:
import numpy as np

arr = np.arange(0, 11)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [148]:
arr[8]

8

In [149]:
arr[0:5]

array([0, 1, 2, 3, 4])

In [150]:
arr[:6]

array([0, 1, 2, 3, 4, 5])

In [151]:
arr[5:]

array([ 5,  6,  7,  8,  9, 10])

In [152]:
arr[-1]

10

In [153]:
arr[:-1]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
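As with lists, an ndarray slice also accepts an optional step, written `start:stop:step`. A quick sketch on the same `arr`:

```python
import numpy as np

arr = np.arange(0, 11)

print(arr[2:9:3])   # start:stop:step -> [2 5 8]
print(arr[::2])     # every other element -> [ 0  2  4  6  8 10]
print(arr[::-1])    # a negative step reverses the array
```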

In [154]:
# copy array

copy_arr = arr.copy()

copy_arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [155]:

# Rank 2 array of shape (3, 4)
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [156]:
an_array[0]

array([11, 12, 13, 14])

In [157]:
an_array[0,2]

13

In [158]:
an_array[:2]

array([[11, 12, 13, 14],
       [21, 22, 23, 24]])

In [159]:
an_array[:1]

array([[11, 12, 13, 14]])

In [160]:
an_array[:2,1:]

array([[12, 13, 14],
       [22, 23, 24]])

<p style="font-family: Arial; font-size:1.3em;color:#2462C0; font-style:bold"><br>
Use array slicing to get a subarray consisting of the first 2 rows and columns 1 and 2 (the second and third columns).
</p>

In [161]:
a_slice = an_array[:2, 1:3]
print(a_slice)

[[12 13]
 [22 23]]


<p style="font-family: Arial; font-size:1.3em;color:#2462C0; font-style:bold"><br>
When you modify a slice, you actually modify the underlying array.
</p>

In [162]:
print("Before:", an_array[0, 1])     # inspect the element at 0, 1
a_slice[0, 0] = 1000     # a_slice[0, 0] is the same piece of data as an_array[0, 1]

print("After:", an_array[0,1])

Before: 12
After: 1000


In [163]:
print(an_array)

[[  11 1000   13   14]
 [  21   22   23   24]
 [  31   32   33   34]]


In [164]:
# we can create a copy of a portion of a matrix

b_slice = np.array(an_array[:2, 1:3])

print(b_slice)

[[1000   13]
 [  22   23]]
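Instead of inferring view-versus-copy from a changed value, you can ask NumPy directly with `np.shares_memory`. A sketch on a smaller, illustrative array named `base`:

```python
import numpy as np

base = np.array([[11, 12, 13, 14],
                 [21, 22, 23, 24]])

view = base[:, 1:3]                 # basic slicing returns a view
independent = base[:, 1:3].copy()   # .copy() allocates separate data

print(np.shares_memory(base, view))         # True
print(np.shares_memory(base, independent))  # False

view[0, 0] = -1          # also changes base[0, 1]
independent[1, 1] = -2   # base is unaffected
print(base)
```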


In [165]:
arr = np.arange(1, 11)

arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [166]:
bool_arr = arr > 5

bool_arr


array([False, False, False, False, False,  True,  True,  True,  True,  True], dtype=bool)

In [167]:
arr[bool_arr]

array([ 6,  7,  8,  9, 10])

In [168]:
arr[arr>5]

array([ 6,  7,  8,  9, 10])

In [169]:
arr[arr < 3]

array([1, 2])
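Conditions can be combined with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` and `<`; Python's `and`/`or` do not work element-wise on arrays. A sketch:

```python
import numpy as np

arr = np.arange(1, 11)

between = arr[(arr > 3) & (arr < 8)]   # [4 5 6 7]
outside = arr[(arr < 3) | (arr > 8)]   # [ 1  2  9 10]
odd = arr[~(arr % 2 == 0)]             # [1 3 5 7 9]
print(between, outside, odd)
```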

In [170]:
# Exercise

arr_2d = np.arange(50).reshape(5, 10)

arr_2d

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])

In [171]:
arr_2d[1:3, 3:5]

array([[13, 14],
       [23, 24]])

## Numpy Operations

* **Array with Array**
* **Array with Scalars**
* **Universal Array Functions**
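The cells below mostly combine an array with a scalar. For two arrays of the same shape, the arithmetic operators also work element by element, while `dot` (or the `@` operator) computes a dot product. A small sketch with illustrative arrays `x` and `y`:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

print(x + y)      # element-wise sum: [11 22 33]
print(x * y)      # element-wise product: [10 40 90]
print(x.dot(y))   # dot product: 1*10 + 2*20 + 3*30 = 140
print(x @ y)      # @ gives the same dot product for 1-D arrays
```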

In [172]:
import numpy as np

In [173]:
a = np.arange(0,11)

a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [174]:
a + 100

array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110])

In [175]:
a * 100

array([   0,  100,  200,  300,  400,  500,  600,  700,  800,  900, 1000])

In [176]:
b = 2
# element-wise multiplication
print(a * b)

[ 0  2  4  6  8 10 12 14 16 18 20]


In [177]:
# square

a ** 2

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100])

In [178]:
# square root
np.sqrt(a)

array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ,
        3.16227766])

In [179]:
# exponential: e raised to the power of each element
np.exp(a)

array([  1.00000000e+00,   2.71828183e+00,   7.38905610e+00,
         2.00855369e+01,   5.45981500e+01,   1.48413159e+02,
         4.03428793e+02,   1.09663316e+03,   2.98095799e+03,
         8.10308393e+03,   2.20264658e+04])

In [180]:
# b is a scalar here, so dot() reduces to element-wise multiplication
a.dot(b)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [181]:
a + b

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [182]:
a - b

array([-2, -1,  0,  1,  2,  3,  4,  5,  6,  7,  8])

In [183]:
a / b

array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ])

In [185]:
np.sin(a)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849,
       -0.54402111])

In [187]:
import matplotlib.pyplot as plt

plt.plot(np.sin(a))

[<matplotlib.lines.Line2D at 0x901d952f60>]

In [188]:
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [189]:
a * 5

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Boolean Indexing:
</p>

<p style="font-family: Arial; font-size:1.75em;color:#2482A0; font-style:bold"><br>

Array Indexing for changing elements:

</p>

In [190]:
# create a 3x2 array
an_array = np.array([[11,12],[21,22],[31,32]])
print(an_array)

[[11 12]
 [21 22]
 [31 32]]


In [None]:
# create a filter which will be boolean values for whether each element meets this condition
a_filter = an_array > 15
print(a_filter)

In [None]:
# we can now select just those elements which meet that criterion
print(an_array[a_filter])

In [None]:
an_array[an_array > 15]

In [None]:
# Add 100 to all the even values.

an_array[an_array % 2 == 0] += 100
print(an_array)
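The update above modifies `an_array` in place. If you want a new array and need the original untouched, `np.where(condition, value_if_true, value_if_false)` is one alternative. A sketch:

```python
import numpy as np

an_array = np.array([[11, 12], [21, 22], [31, 32]])

# Build a new array: even values get +100, odd values are kept as-is
result = np.where(an_array % 2 == 0, an_array + 100, an_array)
print(result)     # [[ 11 112] [ 21 122] [ 31 132]]
print(an_array)   # unchanged
```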

<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Datatypes and Array Operations
</p>

<p style="font-family: Arial; font-size:1.75em;color:#2482A0; font-style:bold"><br>

Datatypes:

</p>

In [None]:
ex1 = np.array([11, 12])  # NumPy infers the data type
print(ex1.dtype)

In [None]:
ex2 = np.array([11.0, 12.0])   # NumPy infers the data type
print(ex2.dtype)

In [None]:
ex3 = np.array([11, 21], dtype=np.int64)   # You can also specify the data type explicitly
print(ex3.dtype)

In [None]:
# You can use this to force floats into integers (the fractional part is truncated)
ex4 = np.array([11.1, 12.2], dtype=np.int64)
print(ex4.dtype)

print()

print(ex4)

In [None]:
# Similarly you can use this to force integers into floats.
ex5 = np.array([11, 21], dtype=np.float64)
print(ex5.dtype)

print()

print(ex5)
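Casting floats to integers truncates toward zero, which matches `floor` only for non-negative values. `astype` applies the same conversion to an existing array and returns a new one. A sketch with negative inputs to show the difference:

```python
import numpy as np

floats = np.array([-1.7, -0.5, 2.7])

truncated = floats.astype(np.int64)           # drops the fractional part: [-1  0  2]
floored = np.floor(floats).astype(np.int64)   # rounds down first: [-2 -1  2]
print(truncated, floored)
```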

<p style="font-family: Arial; font-size:1.75em;color:#2482A0; font-style:bold"><br>
Statistical Methods, Sorting, and Set Operations:

</p>

In [None]:
# set up a random 2 x 5 matrix
arr = 10 * np.random.randn(2,5)
print(arr)

In [None]:
# compute the mean for all elements
print(arr.mean())

In [None]:
# compute the mean by row
print(arr.mean(axis = 1))

In [None]:
# compute the mean by column
print(arr.mean(axis = 0))

In [None]:
# sum all the elements
print(arr.sum())

In [None]:
# compute the median of each row
print(np.median(arr, axis=1))

In [None]:
# compute the median of each column
print(np.median(arr, axis=0))
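Because `arr` is random, it can be hard to check what `axis` does. A sketch on a fixed matrix: `axis=1` collapses the columns (one result per row) and `axis=0` collapses the rows (one result per column).

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(m.mean())              # 3.5, the mean of all six elements
print(m.mean(axis=1))        # one mean per row: [2. 5.]
print(m.mean(axis=0))        # one mean per column: [2.5 3.5 4.5]
print(m.sum(axis=0))         # column sums: [5. 7. 9.]
print(np.median(m, axis=1))  # row medians: [2. 5.]
```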

<p style="font-family: Arial; font-size:1.75em;color:#2482A0; font-style:bold"><br>
Sorting:

</p>

In [None]:
# create a 10 element array of randoms
unsorted = np.random.randn(10)

print(unsorted)

In [None]:
# create a copy and sort it in place
# (named sorted_arr to avoid shadowing Python's built-in sorted())
sorted_arr = np.copy(unsorted)
sorted_arr.sort()

print(sorted_arr)
print()
print(unsorted)
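`np.sort` does the copy-and-sort in one step, and `np.argsort` returns the indices that would sort the array, which is useful for reordering related data. A sketch on a small fixed array:

```python
import numpy as np

values = np.array([3.2, -1.0, 0.5, 2.0])

in_order = np.sort(values)          # a sorted copy; values is unchanged
order = np.argsort(values)          # indices that would sort values: [1 2 3 0]
descending = np.sort(values)[::-1]  # reverse for descending order

print(in_order, order, descending)
```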


<p style="font-family: Arial; font-size:1.75em;color:#2482A0; font-style:bold"><br>
Finding Unique elements:

</p>

In [None]:
array = np.array([1,2,3,2,1,4,5,3,1,3,2])

# print the unique numbers (like Python's set(), but returned as a sorted array)
print(np.unique(array))
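`np.unique` can also report how often each value occurs via `return_counts=True`:

```python
import numpy as np

array = np.array([1, 2, 3, 2, 1, 4, 5, 3, 1, 3, 2])

values, counts = np.unique(array, return_counts=True)
print(values)   # [1 2 3 4 5]
print(counts)   # [3 3 3 1 1]
```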

In [None]:
s1 = np.array(['desk', 'chair', 'bulb'])
s2 = np.array(['lamp', 'bulb', 'chair'])
print(s1, s2)

In [None]:
print(np.intersect1d(s1, s2))


In [None]:
print(np.union1d(s1, s2))

In [None]:
print(np.setdiff1d(s1,s2))       # elements in s1 that are not in s2

In [None]:
print(np.isin(s1, s2))    # which elements of s1 are also in s2 (np.in1d is deprecated in NumPy 2.0)

In [None]:
# set up a random 2 x 5 matrix
arr = 10 * np.random.randn(2,5)
print(arr)

In [None]:
np.transpose(arr)

<p style="font-family: Arial; font-size:1.75em;color:#2482A0; font-style:bold"><br>
Broadcasting rules:

</p>

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when

1) they are equal, or

2) one of them is 1.
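The two rules can be checked by hand. Below, `broadcast_shape` is a hypothetical helper written only to illustrate the rules (it is not part of NumPy); a missing leading dimension is treated as 1, as NumPy does. NumPy's own `np.broadcast_shapes` (NumPy 1.20 and later) is used for comparison.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Illustrative helper: apply the broadcasting rules, return the result shape or None."""
    result = []
    # Walk both shapes from the trailing dimension forward;
    # a missing leading dimension behaves like a 1
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        a = shape_a[-i] if i <= len(shape_a) else 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a == b or a == 1 or b == 1:
            result.append(b if a == 1 else a)
        else:
            return None   # incompatible dimensions
    return tuple(reversed(result))

print(broadcast_shape((4, 3), (3,)))     # (4, 3): like start + add_rows below
print(broadcast_shape((4, 3), (4, 1)))   # (4, 3): like start + add_cols below
print(broadcast_shape((4, 3), (4,)))     # None: 3 vs 4 in the last dimension

print(np.broadcast_shapes((4, 3), (4, 1)))   # NumPy agrees: (4, 3)
```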

In [None]:
import numpy as np

In [None]:
start = np.zeros((4,3))
print(start)

In [None]:
# create a rank 1 ndarray with 3 values
add_rows = np.array([1, 0, 2])
print(add_rows)

In [None]:
y = start + add_rows    # add to each row of 'start' using broadcasting
y

In [None]:
# create an ndarray which is 4x1 to broadcast across columns
add_cols = np.array([[0,1,2,3]])
add_cols = add_cols.T             # transpose

add_cols

In [None]:
# add to each column of 'start' using broadcasting
y = start + add_cols
print(y)

In [None]:
# this will just broadcast in both directions

add_scalar = np.array([1])
print(start + add_scalar)

<p style="font-family: Arial; font-size:2.75em;color:#2482A0; font-style:bold"><br>
Speedtest: ndarrays vs lists

</p>

Set up the parameters for the speed test. We'll time how long it takes to sum the elements of an ndarray vs. a list.


In [None]:
from numpy import arange
from timeit import Timer

size = 1000000 
timeits = 1000


In [None]:
# create the ndarray with values 0,1,2,...., size-1

nd_array = arange(size)
print(type(nd_array))
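A minimal, self-contained sketch of the full comparison, assuming the list holds the same values as the ndarray. It uses a smaller repeat count than `timeits` to keep it quick, and an explicit `int64` dtype so the sum cannot overflow on platforms where the default integer is 32-bit. Timings depend on the machine, so no numbers are shown here.

```python
from timeit import Timer

import numpy as np

size = 1_000_000
repeats = 10   # fewer runs than timeits = 1000, to keep the sketch fast

nd_array = np.arange(size, dtype=np.int64)
a_list = list(range(size))

# Time the same reduction on both containers
nd_time = Timer(lambda: nd_array.sum()).timeit(number=repeats)
list_time = Timer(lambda: sum(a_list)).timeit(number=repeats)

print(f"ndarray sum: {nd_time / repeats * 1e3:.3f} ms per call")
print(f"list sum:    {list_time / repeats * 1e3:.3f} ms per call")
```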