# NumPy Indexing and Selection

In this lecture we will discuss how to select elements or groups of elements from an array.

### NumPy is more memory efficient than a list

http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
<img src="array_vs_list.png">

In [5]:
import numpy as np
import sys
import time

In [9]:
l = range(1000)
print(sys.getsizeof(5)*len(l))
print(sys.getsizeof(1)*len(l))

28000
28000


In [11]:
array = np.arange(1000)
print(array.size*array.itemsize)

8000
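The list figure above multiplies the size of one small integer by the list length. A slightly fuller sketch, assuming 64-bit CPython and an explicitly chosen `np.int64` dtype (exact byte counts vary by platform and Python version), also counts the list's own pointer storage:

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)  # dtype fixed explicitly (assumption)

# A list stores pointers; every int object lives separately on the heap.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores raw values contiguously: size * itemsize == nbytes.
array_bytes = np_array.nbytes

print(list_bytes, array_bytes)
```

The `nbytes` attribute is a shortcut for `array.size * array.itemsize`.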


### NumPy is fast

In [47]:
SIZE = 10000000

In [48]:
l1 = range(SIZE)
l2 = range(SIZE)

In [49]:
a1 = np.arange(SIZE)
a2 = np.arange(SIZE)

In [50]:
start = time.time()
result=[(x+y) for x,y in zip(l1,l2)]
print("python list took ", (time.time() - start) *1000)

python list took  1271.744966506958


In [51]:
start = time.time()
result= a1 + a2
print("numpy took ", (time.time() - start) *1000)

numpy took  267.4691677093506
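A single `time.time()` measurement can be noisy. As a rough alternative, the same comparison can be sketched with the standard-library `timeit` module. The smaller `size` and the repeat count of 10 are arbitrary choices made here so the example runs quickly, and the timings will differ from machine to machine:

```python
import timeit
import numpy as np

size = 100_000  # smaller than SIZE above so the sketch runs quickly (assumption)
l1 = list(range(size))
l2 = list(range(size))
a1 = np.arange(size)
a2 = np.arange(size)

# Time 10 repetitions of each approach.
list_time = timeit.timeit(lambda: [x + y for x, y in zip(l1, l2)], number=10)
numpy_time = timeit.timeit(lambda: a1 + a2, number=10)

print(f"python list: {list_time * 1000:.1f} ms for 10 runs")
print(f"numpy:       {numpy_time * 1000:.1f} ms for 10 runs")
```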


In [52]:
a = np.array([5,6,9])

In [53]:
a[0]

5

In [54]:
a[1]

6

In [55]:
a= np.array([[1,2],[3,4],[5,6]])

In [56]:
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [57]:
a.ndim

2

In [58]:
a=np.array([5,6,9])

In [59]:
a

array([5, 6, 9])

In [60]:
a.ndim

1

In [61]:
a.itemsize

8

In [62]:
a.dtype

dtype('int64')

In [63]:
a= np.array([[1,2],[3,4],[5,6]],dtype=np.int32)

In [65]:
a.itemsize

4

In [64]:
a.dtype

dtype('int32')

In [66]:
a= np.array([[1,2],[3,4],[5,6]],dtype=np.float64)

In [67]:
a.itemsize

8

In [68]:
a.size

6

In [69]:
a

array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])

In [70]:
a.shape

(3, 2)

In [71]:
a= np.array([[1,2],[3,4],[5,6]],dtype=complex)

In [72]:
a

array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j],
       [ 5.+0.j,  6.+0.j]])

In [73]:
np.zeros((3,4))

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

In [74]:
np.ones((3,4))

array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])

In [75]:
np.arange(1,5)

array([1, 2, 3, 4])

In [76]:
np.arange(1,5,2)

array([1, 3])

In [77]:
np.linspace(1,5,10)

array([ 1.        ,  1.44444444,  1.88888889,  2.33333333,  2.77777778,
        3.22222222,  3.66666667,  4.11111111,  4.55555556,  5.        ])

In [78]:
np.linspace(1,5,5)

array([ 1.,  2.,  3.,  4.,  5.])

In [79]:
a= np.array([[1,2],[3,4],[5,6]],dtype=np.float64)

In [80]:
a

array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])

In [81]:
a.shape

(3, 2)

In [82]:
a.reshape(2,3)

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [83]:
a.ravel()

array([ 1.,  2.,  3.,  4.,  5.,  6.])

In [84]:
a

array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
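As the last cell shows, `reshape` and `ravel` return new array objects and leave the shape of `a` unchanged. A minimal sketch of the same idea, with `-1` used to let NumPy infer one dimension (the variable names `b`, `c`, and `flat` are only for illustration):

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float64)

b = a.reshape(2, 3)    # new array object with shape (2, 3)
c = a.reshape(-1, 3)   # -1 asks NumPy to infer that dimension from the size
flat = a.ravel()       # 1-D version of the same data

print(a.shape, b.shape, c.shape, flat.shape)
```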

### zip function

In [35]:
ll1 =[1,2,3]
ll2 =['a','b','c']

In [36]:
ll3 = zip(ll1,ll2)

In [37]:
for i in ll3:
    print(i,type(i))
    

(1, 'a') <class 'tuple'>
(2, 'b') <class 'tuple'>
(3, 'c') <class 'tuple'>


In [38]:
list(zip(ll1,ll2))

[(1, 'a'), (2, 'b'), (3, 'c')]

In [85]:
a.min()

1.0

In [86]:
a.max()

6.0

In [87]:
a.sum()

21.0

##### axis=0 sums down the rows (one result per column)

In [88]:
a.sum(axis=0)

array([  9.,  12.])

In [89]:
a

array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])

##### axis=1 sums across the columns (one result per row)

In [91]:
a.sum(axis=1)

array([  3.,   7.,  11.])
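One way to remember the `axis` argument is that it names the axis that gets collapsed. The sketch below repeats the two sums on the same `(3, 2)` array and prints the resulting shapes; the names `col_totals` and `row_totals` are illustrative only:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float64)  # shape (3, 2)

col_totals = a.sum(axis=0)  # collapses axis 0 (rows): one total per column
row_totals = a.sum(axis=1)  # collapses axis 1 (columns): one total per row

print(col_totals.shape, col_totals)
print(row_totals.shape, row_totals)
```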

In [93]:
np.sqrt(a)

array([[ 1.        ,  1.41421356],
       [ 1.73205081,  2.        ],
       [ 2.23606798,  2.44948974]])

In [94]:
np.std(a)

1.707825127659933

In [97]:
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])

In [98]:
a

array([[1, 2],
       [3, 4]])

In [99]:
b

array([[5, 6],
       [7, 8]])

In [100]:
a+b

array([[ 6,  8],
       [10, 12]])

In [101]:
a*b

array([[ 5, 12],
       [21, 32]])

In [102]:
a.dot(b)

array([[19, 22],
       [43, 50]])
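The last two cells contrast element-wise multiplication (`*`) with the matrix product (`dot`). For 2-D arrays the `@` operator gives the same result as `dot`, as this short sketch suggests:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b      # multiplies matching positions
matrix_product = a @ b   # matrix multiplication, same as a.dot(b) for 2-D arrays

print(elementwise)
print(matrix_product)
```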

### NumPy
- indexing, slicing
- iterating through arrays
- stacking together two arrays
- indexing with boolean arrays


# BreakOff

In [1]:
import numpy as np

In [2]:
#Creating sample array
arr = np.arange(0,11)

In [3]:
#Show
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

## Bracket Indexing and Selection
The simplest way to pick one or several elements of an array looks very similar to indexing a Python list:

In [5]:
#Get a value at an index
arr[8]

8

In [6]:
#Get values in a range
arr[1:5]

array([1, 2, 3, 4])

In [7]:
#Get values in a range
arr[0:5]

array([0, 1, 2, 3, 4])

## Broadcasting

NumPy arrays differ from normal Python lists in their ability to broadcast: a single value assigned to a slice is applied to every element of that slice.

In [8]:
#Setting a value with index range (Broadcasting)
arr[0:5]=100

#Show
arr

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9,  10])
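To make the contrast with lists concrete, here is a small sketch. The list assignment is expected to raise a `TypeError` because list slices only accept iterables. The second half illustrates that broadcasting also applies between arrays of compatible shapes; the names `column`, `row`, and `grid` are made up for this example:

```python
import numpy as np

arr = np.arange(0, 11)
arr[0:5] = 100            # the scalar is broadcast across the slice

py_list = list(range(11))
list_error = None
try:
    py_list[0:5] = 100    # lists require an iterable on the right-hand side
except TypeError as exc:
    list_error = exc

# Broadcasting between arrays: shape (3, 1) combined with shape (3,)
column = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
grid = column + row       # result has shape (3, 3)

print(arr)
print(grid)
```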

In [9]:
# Reset the array; we'll see why in a moment
arr = np.arange(0,11)

#Show
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [10]:
#Important notes on Slices
slice_of_arr = arr[0:6]

#Show slice
slice_of_arr

array([0, 1, 2, 3, 4, 5])

In [11]:
#Change Slice
slice_of_arr[:]=99

#Show Slice again
slice_of_arr

array([99, 99, 99, 99, 99, 99])

Now note the changes also occur in our original array!

In [12]:
arr

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

The data is not copied; the slice is a view of the original array. This avoids unnecessary memory use when working with large arrays.

In [13]:
#To get a copy, you need to be explicit
arr_copy = arr.copy()

arr_copy

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])
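If it is unclear whether two arrays share data, `np.shares_memory` can check. A minimal sketch of the view-versus-copy behaviour described above (the names `view` and `copy` are illustrative):

```python
import numpy as np

arr = np.arange(0, 11)
view = arr[0:6]          # slice: a view onto arr's data
copy = arr[0:6].copy()   # explicit, independent copy

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False

copy[:] = -1             # leaves arr untouched
view[:] = 99             # writes through to arr
print(arr)
```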

## Indexing a 2D array (matrices)

The general format is **arr_2d[row][col]** or **arr_2d[row,col]**. I recommend usually using the comma notation for clarity.

In [14]:
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))

#Show
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [15]:
#Indexing row
arr_2d[1]


array([20, 25, 30])

In [16]:
# Format is arr_2d[row][col] or arr_2d[row,col]

# Getting individual element value
arr_2d[1][0]

20

In [17]:
# Getting individual element value
arr_2d[1,0]

20

In [18]:
# 2D array slicing

#Shape (2,2) from top right corner
arr_2d[:2,1:]

array([[10, 15],
       [25, 30]])
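The two notations agree for single elements, but not once slices are involved. Chained brackets apply the second index to the result of the first, so both slices act on rows. This sketch, using the same `arr_2d` values, shows the difference:

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

comma = arr_2d[:2, 1:]     # rows 0-1 and columns 1-2
chained = arr_2d[:2][1:]   # rows 0-1 first, then rows 1 onward of that result

print(comma)
print(chained)
```

This is one practical reason to prefer the comma notation.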

In [19]:
#Shape bottom row
arr_2d[2]

array([35, 40, 45])

In [20]:
#Shape bottom row
arr_2d[2,:]

array([35, 40, 45])

### Fancy Indexing

Fancy indexing allows you to select entire rows or columns in any order. To show this, let's quickly build a NumPy array:

In [21]:
#Set up matrix
arr2d = np.zeros((10,10))

In [22]:
#Number of rows
arr_length = arr2d.shape[0]

In [23]:
#Set up array

for i in range(arr_length):
    arr2d[i] = i
    
arr2d

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.],
       [ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.]])

Fancy indexing allows the following:

In [24]:
arr2d[[2,4,6,8]]

array([[ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.]])

In [25]:
#Allows in any order
arr2d[[6,4,2,7]]

array([[ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.]])
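The cells above pick rows. The same idea works for columns, and unlike a slice, a fancy-indexed result is a copy rather than a view. The sketch below uses a small hypothetical `grid` array instead of `arr2d` so that the columns are distinguishable:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]

rows = grid[[2, 0]]          # rows 2 and 0, in that order
cols = grid[:, [3, 0]]       # columns 3 and 0, in that order
pairs = grid[[0, 2], [1, 3]] # elements (0, 1) and (2, 3)

rows[:] = -1                 # modifies the copy only; grid is unchanged
print(np.shares_memory(grid, rows))  # False
print(cols)
print(pairs)
```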

## More Indexing Help
Indexing a 2D matrix can be a bit confusing at first, especially when you start to add in step size. Try a Google image search for "NumPy indexing" to find useful diagrams, like this one:

<img src= 'http://memory.osu.edu/classes/python/_images/numpy_indexing.png' width=500/>
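Since step size is mentioned here but not shown elsewhere in the notebook, a brief sketch may help. It uses the `start:stop:step` slice form on the `arr_2d` values from earlier; the result names are illustrative only:

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

every_other_row = arr_2d[::2]     # rows 0 and 2
every_other_col = arr_2d[:, ::2]  # columns 0 and 2
reversed_rows = arr_2d[::-1]      # rows in reverse order
corners = arr_2d[::2, ::2]        # the four corner elements

print(every_other_row)
print(every_other_col)
print(reversed_rows)
print(corners)
```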

## Selection

Let's briefly go over how to use brackets for selection based on comparison operators.

In [28]:
arr = np.arange(1,11)
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [30]:
arr > 4

array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [31]:
bool_arr = arr>4

In [32]:
bool_arr

array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [33]:
arr[bool_arr]

array([ 5,  6,  7,  8,  9, 10])

In [34]:
arr[arr>2]

array([ 3,  4,  5,  6,  7,  8,  9, 10])

In [37]:
x = 2
arr[arr>x]

array([ 3,  4,  5,  6,  7,  8,  9, 10])
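Conditions can also be combined. With NumPy arrays this is done with the element-wise operators `&`, `|`, and `~` (with parentheses around each comparison) rather than Python's `and`, `or`, and `not`. A boolean mask can also be used on the left-hand side of an assignment. A short sketch, with illustrative variable names:

```python
import numpy as np

arr = np.arange(1, 11)

between = arr[(arr > 2) & (arr < 8)]    # both conditions hold
extremes = arr[(arr < 3) | (arr > 8)]   # either condition holds
odd = arr[~(arr % 2 == 0)]              # negated condition

capped = arr.copy()
capped[capped > 5] = 5                  # assign through a boolean mask

print(between, extremes, odd, capped)
```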

# Great Job!
