# Getting Started with ndarray

**ndarrays** are time- and space-efficient multidimensional arrays at the core of numpy. As with the data structures in week 2, let's get started by creating an ndarray using the numpy package.



## How to create Rank 1 numpy arrays:

In [4]:
import numpy as np

an_array = np.array([3, 33, 333])  # Create a rank 1 array
print(type(an_array))           # The type of an ndarray is "<class 'numpy.ndarray'>"

<class 'numpy.ndarray'>


In [5]:
# check the shape of the array we just created: it has one dimension, and that dimension holds three elements
print(an_array.shape)

(3,)


In [6]:
# because this is a rank 1 array, we need only one index to access each element
print(an_array[0], an_array[1], an_array[2])

3 33 333


In [7]:
an_array[0] = 888  #ndarrays are mutable, here we are changing the contents of the first element
print(an_array)

[888  33 333]


## How to create a Rank 2 numpy array:

A rank 2 **ndarray** is one with two dimensions. Notice the format below of [[row],[row]]. Two-dimensional arrays are great for representing matrices, which are often useful in data science.

In [8]:
another = np.array([[11,12,13],[21,22,23]]) # create a rank 2 array
print(another) # print the array
print("The shape is 2 rows, 3 columns: ", another.shape) #rows x columns
print("Accessing elements [0,0], [0,1], and [1,0] of the ndarray: ", another[0,0],another[0,1], another[1,0])

[[11 12 13]
 [21 22 23]]
The shape is 2 rows, 3 columns:  (2, 3)
Accessing elements [0,0], [0,1], and [1,0] of the ndarray:  11 12 21
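
As a small sketch of how "rank" shows up in code, the attributes `ndim`, `shape` and `size` describe the same 2 x 3 array. The array is rebuilt here only so the snippet runs on its own.

```python
import numpy as np

# Same 2 x 3 array as in the cell above, rebuilt so this snippet is self-contained
another = np.array([[11, 12, 13], [21, 22, 23]])

print(another.ndim)    # number of dimensions (the "rank"): 2
print(another.shape)   # (rows, columns): (2, 3)
print(another.size)    # total number of elements: 6
print(another[1, 2])   # row 1, column 2: 23
```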


## There are many ways to create numpy arrays:

Here we create a number of arrays with different shapes and different pre-filled values. numpy has a number of built-in functions which help us quickly and easily create multidimensional arrays.

In [9]:
import numpy as np

# create a 2x2 array of zeros
ex1 = np.zeros((2,2))
print(ex1)

[[ 0.  0.]
 [ 0.  0.]]


In [10]:
# create a 2x2 array filled with 9.0
ex2 = np.full((2,2), 9.0)
print(ex2)

[[ 9.  9.]
 [ 9.  9.]]


In [11]:
# a 2x2 matrix with the diagonal 1s and the others 0
ex3 = np.eye(2,2)
print(ex3)

[[ 1.  0.]
 [ 0.  1.]]


In [12]:
# create an array of ones
ex4 = np.ones((1,2))
print(ex4)

[[ 1.  1.]]


In [13]:
# notice that the above ndarray (ex4) is actually rank 2, it is a 1x2 array
print(ex4.shape)

# which means we need to use two indexes to access an element
print()
print(ex4[0,1])

(1, 2)

1.0


In [14]:
# create an array of random floats between 0 and 1
ex5 = np.random.random((2,2))
print(ex5)

[[ 0.71860024  0.59379467]
 [ 0.29039999  0.23506811]]
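
The creation functions above all return float64 arrays unless told otherwise. The sketch below checks that, passes an explicit `dtype`, and uses a seeded generator so the random values repeat from run to run. The names `int_zeros`, `rng` and `sample` and the seed value 0 are illustrative choices, not part of the original notebook.

```python
import numpy as np

zeros = np.zeros((2, 2))
nines = np.full((2, 2), 9.0)
identity = np.eye(2)
print(zeros.dtype, nines.dtype, identity.dtype)  # float64 float64 float64

# An explicit dtype overrides the float64 default
int_zeros = np.zeros((2, 2), dtype=np.int64)
print(int_zeros.dtype)  # int64

# A seeded generator gives the same "random" floats in [0, 1) every run
rng = np.random.default_rng(seed=0)
sample = rng.random((2, 2))
print(sample.shape)  # (2, 2)
```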


# Array Indexing


## Slice Indexing:

Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays.

In [15]:
import numpy as np

# Rank 2 array of shape (3, 4)
an_array = np.array([[11,12,13,14],[21,22,23,24],[31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


Use array slicing to get a subarray consisting of the first 2 rows and columns 1 and 2 (a 2 X 2 block)

In [16]:
a_slice = np.array(an_array[:2, 1:3])
print(a_slice)

[[12 13]
 [22 23]]


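A plain slice such as an_array[:2, 1:3] is a view, so modifying it modifies the underlying array. Here a_slice was built with np.array(...), which copies the data, so changing a_slice leaves an_array untouched.

In [17]:
print("Before: ", an_array[0, 1])  #inspect the element at 0, 1
a_slice[0, 0] = 1000    # a_slice is a copy, so an_array[0, 1] is not affected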
print("After:", an_array[0, 1])

Before:  12
After: 12
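
To make the difference concrete, here is a minimal sketch that builds both a view and a copy of the same region and modifies each in turn. The names `view` and `copy` are illustrative only.

```python
import numpy as np

an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

view = an_array[:2, 1:3]             # plain slice: shares memory with an_array
copy = np.array(an_array[:2, 1:3])   # np.array(...) makes an independent copy

copy[0, 0] = 1000
print(an_array[0, 1])   # 12, the copy does not touch the original

view[0, 0] = 1000
print(an_array[0, 1])   # 1000, the view writes through to the original

print(np.shares_memory(an_array, view), np.shares_memory(an_array, copy))  # True False
```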


## Use both integer indexing & slice indexing

We can use combinations of integer indexing and slice indexing to create different shaped matrices

In [18]:
# Create a Rank 2 array of shape (3, 4)
an_array = np.array([[11,12,13,14],[21,22,23,24],[31,32,33,34]])
print(an_array, an_array.shape)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]] (3, 4)


In [19]:
# Using both integer indexing and slicing generates an array of lower rank
row_rank1 = an_array[1, :] # Rank 1 view

print(row_rank1, row_rank1.shape) # notice only a single []

[21 22 23 24] (4,)


In [20]:
# Slicing alone: generates an array of the same rank as an_array
row_rank2 = an_array[1:2, :] # Rank 2 view

print(row_rank2, row_rank2.shape) # Notice the [[ ]]

[[21 22 23 24]] (1, 4)


In [21]:
# We can do the same thing for columns of an array:

print()
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]
print(col_rank1, col_rank1.shape) # Rank 1
print()
print(col_rank2, col_rank2.shape) # Rank 2


[12 22 32] (3,)

[[12]
 [22]
 [32]] (3, 1)


In [22]:
c = np.ones((9,9,9))*np.arange(1,10) # Rank 3 array
print(c)

[[[ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]]

 [[ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]]

 [[ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  9.]
  [ 1.  2.  3.  4.  5.  6.  7.  8.  

## Array indexing for changing elements:

Sometimes it's useful to use an array of indexes to access or change elements.

In [32]:
# Create a new array
an_array = np.array([[11,12,13],[21,22,23],[31,32,33],[41,42,43]])

print('Original array: ')
print(an_array)

Original array: 
[[11 12 13]
 [21 22 23]
 [31 32 33]
 [41 42 43]]


In [33]:
# Create an array of indices
col_indices = np.array([0, 1, 2, 0])
print('\nCol indices picked : ', col_indices)

row_indices = np.arange(4)
print('\nRows indices picked: ', row_indices)


Col indices picked :  [0 1 2 0]

Rows indices picked:  [0 1 2 3]


In [34]:
#  Examine the pairings of row_indices and col_indices. Each (row, col) pair addresses one element of an_array.
for row,col in zip(row_indices,col_indices):
    print(row, ", ",col)

0 ,  0
1 ,  1
2 ,  2
3 ,  0


In [35]:
# Select one element from each row
print('Values in the array at those indices: ',an_array[row_indices, col_indices])

Values in the array at those indices:  [11 22 33 41]


In [36]:
# Change one element from each row using the indices selected
an_array[row_indices, col_indices] += 100000

print('\nChanged Array:')
print(an_array)


Changed Array:
[[100011     12     13]
 [    21 100022     23]
 [    31     32 100033]
 [100041     42     43]]
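
One way to read `an_array[row_indices, col_indices]` is as a vectorised version of the `zip` loop above. The sketch below checks that the two agree on a fresh copy of the array and shows that the selection itself is a copy rather than a view. The names `picked` and `looped` are illustrative.

```python
import numpy as np

an_array = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33], [41, 42, 43]])
row_indices = np.arange(4)
col_indices = np.array([0, 1, 2, 0])

# Integer-array indexing pairs the indices up element by element
picked = an_array[row_indices, col_indices]

# The same selection written as an explicit loop over (row, col) pairs
looped = np.array([an_array[r, c] for r, c in zip(row_indices, col_indices)])

print(picked)                          # [11 22 33 41]
print(np.array_equal(picked, looped))  # True

# Unlike a slice, the result is a copy: changing it leaves an_array alone
picked[0] = -1
print(an_array[0, 0])                  # 11
```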


# Boolean Indexing

## Boolean indexing for selecting and changing elements:

In [37]:
# Create a 3x2 array
an_array = np.array([[11,12],[21,22],[31,32]])
print(an_array)

[[11 12]
 [21 22]
 [31 32]]


In [38]:
# create a boolean filter recording whether each element meets the constraint
filter = (an_array > 15)
filter

array([[False, False],
       [ True,  True],
       [ True,  True]], dtype=bool)

Notice that the filter is an ndarray of the same shape as an_array. It holds True wherever the corresponding element of an_array is greater than 15 and False wherever it is not.

In [39]:
# We can now select just those elements which meet that criteria
print(an_array[filter])

[21 22 31 32]


In [41]:
# for short, we could have indexed directly with the condition, without the extra filter variable
an_array[an_array > 15]

array([21, 22, 31, 32])

What is particularly useful is that we can actually change elements in the array by applying a similar logical filter. Let's add 100 to all even values.

In [42]:
an_array[an_array % 2 == 0]  +=100
print(an_array)

[[ 11 112]
 [ 21 122]
 [ 31 132]]
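
Conditions can also be combined. The sketch below is an illustration on a fresh copy of the 3x2 array: it uses `&` (and), `|` (or) and `~` (not), and each condition needs its own parentheses because these operators bind more tightly than comparisons. The names `mask`, `either`, `both` and `edges` are made up for this example.

```python
import numpy as np

an_array = np.array([[11, 12], [21, 22], [31, 32]])

# Greater than 15 AND even
mask = (an_array > 15) & (an_array % 2 == 0)
both = an_array[mask]
print(both)      # [22 32]

# Less than 12 OR greater than 31
either = (an_array < 12) | (an_array > 31)
edges = an_array[either]
print(edges)     # [11 32]

# ~ inverts a mask: zero out everything that did NOT match
an_array[~mask] = 0
print(an_array.tolist())   # [[0, 0], [0, 22], [0, 32]]
```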


# Datatypes and Array Operations

## Datatypes:

In [47]:
ex1 = np.array([11, 12]) # NumPy infers the data type (the default integer size is platform dependent)
print(ex1.dtype)

int32


In [48]:
ex2 = np.array([11.0, 12.0]) # NumPy infers the data type
print(ex2.dtype)

float64


In [49]:
ex3 = np.array([11, 21], dtype=np.int64) # You can tell numpy the data type of the array
print(ex3.dtype)

int64


In [50]:
# you can use this to force floats into integers (the fractional part is truncated)
ex4 = np.array([11.1,12.7], dtype=np.int64)
print(ex4.dtype)
print()
print(ex4)

int64

[11 12]
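
Truncation and floor agree for positive numbers but differ for negative ones. The sketch below shows the difference on illustrative values that are not from the notebook, with `np.rint` added for round-to-nearest.

```python
import numpy as np

values = np.array([11.1, 12.7, -11.1, -12.7])

# Converting to an integer dtype drops the fractional part (truncates toward zero)
truncated = values.astype(np.int64)
print(truncated.tolist())   # [11, 12, -11, -12]

# np.floor always rounds down, which differs for negative values
floored = np.floor(values).astype(np.int64)
print(floored.tolist())     # [11, 12, -12, -13]

# np.rint rounds to the nearest integer
rounded = np.rint(values).astype(np.int64)
print(rounded.tolist())     # [11, 13, -11, -13]
```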


In [51]:
# you can use this to force integers into floats if you anticipate
# the values may change to floats later
ex5 = np.array([11, 21], dtype=np.float64)
print(ex5.dtype)
print()
print(ex5)


float64

[ 11.  21.]


## Arithmetic Array Operations:

In [52]:
x = np.array([[111,112],[121,122]],dtype=np.int64)  # np.int was removed in NumPy 1.24
y = np.array([[211.1,212.1],[221.1,222.1]],dtype=np.float64)

print(x)
print()
print(y)

[[111 112]
 [121 122]]

[[ 211.1  212.1]
 [ 221.1  222.1]]


In [53]:
# add
print(x + y)     # The plus sign works
print()
print(np.add(x, y)) # so does the numpy function "add"

[[ 322.1  324.1]
 [ 342.1  344.1]]

[[ 322.1  324.1]
 [ 342.1  344.1]]


In [54]:
# subtract
print(x - y)
print()
print(np.subtract(x,y))

[[-100.1 -100.1]
 [-100.1 -100.1]]

[[-100.1 -100.1]
 [-100.1 -100.1]]


In [55]:
# multiply
print(x * y)
print()
print(np.multiply(x, y))

[[ 23432.1  23755.2]
 [ 26753.1  27096.2]]

[[ 23432.1  23755.2]
 [ 26753.1  27096.2]]


In [56]:
# divide
print(x / y)
print()
print(np.divide(x, y))

[[ 0.52581715  0.52805281]
 [ 0.54726368  0.54930212]]

[[ 0.52581715  0.52805281]
 [ 0.54726368  0.54930212]]


In [57]:
# square root
print(np.sqrt(x))

[[ 10.53565375  10.58300524]
 [ 11.          11.04536102]]


In [58]:
# exponent (e ^ x)
print(np.exp(x))

[[  1.60948707e+48   4.37503945e+48]
 [  3.54513118e+52   9.63666567e+52]]
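
All of the operators above work element by element. In particular, `*` is not matrix multiplication. The sketch below contrasts it with the matrix product operator `@`, using small illustrative matrices that are not from the notebook so the numbers are easy to check by hand.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

elementwise = x * y   # multiplies matching positions
matrix = x @ y        # rows of x times columns of y

print(elementwise.tolist())   # [[5, 12], [21, 32]]
print(matrix.tolist())        # [[19, 22], [43, 50]]

# For 2-D arrays, np.dot gives the same result as @
print(np.array_equal(matrix, np.dot(x, y)))   # True
```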


# Statistical Methods, Sorting and Set Operations

## Basic Statistical Operations:

In [59]:
# set up a random 2 x 5 matrix
arr = 10 * np.random.randn(2,5)
print(arr)

[[ 2.27838949 -7.34522749  3.39671968 -3.42044853  6.24802903]
 [ 5.7251834  -1.21909519  8.32016991  2.49984037 -2.68885758]]


In [60]:
# compute the mean for all the elements
print(arr.mean())

1.37947030879


In [61]:
# compute the means by row
print(arr.mean(axis = 1))

[ 0.23149244  2.52744818]


In [62]:
# compute the means by column
print(arr.mean(axis = 0))

[ 4.00178645 -4.28216134  5.85844479 -0.46030408  1.77958572]


In [63]:
# sum all the elements
print(arr.sum())

13.7947030879


In [67]:
# compute the median of each row
print(np.median(arr, axis = 1))

[ 2.27838949  2.49984037]
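
The `axis` argument names the dimension that gets collapsed: `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row). A small deterministic sketch, using made-up values rather than the random matrix above:

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])   # shape (2, 3)

overall = arr.mean()            # all six elements
per_row = arr.mean(axis=1)      # collapses the 3 columns, leaving one value per row
per_col = arr.mean(axis=0)      # collapses the 2 rows, leaving one value per column

print(overall)            # 3.5
print(per_row.tolist())   # [2.0, 5.0]
print(per_col.tolist())   # [2.5, 3.5, 4.5]
print(per_row.shape, per_col.shape)   # (2,) (3,)
```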


## Sorting:

In [68]:
# Create a 10 element array of randoms
unsorted = np.random.randn(10)

print(unsorted)

[ 2.09355953 -1.2568568   0.36813358  1.18664144 -0.71973361  1.21494461
 -0.23622491  0.05693187 -0.05184291 -0.02928642]


In [69]:
# create a copy and sort it (named sorted_copy to avoid shadowing the built-in sorted)
sorted_copy = np.array(unsorted)
sorted_copy.sort()

print(sorted_copy)
print()
print(unsorted)

[-1.2568568  -0.71973361 -0.23622491 -0.05184291 -0.02928642  0.05693187
  0.36813358  1.18664144  1.21494461  2.09355953]

[ 2.09355953 -1.2568568   0.36813358  1.18664144 -0.71973361  1.21494461
 -0.23622491  0.05693187 -0.05184291 -0.02928642]


In [70]:
# inplace sorting
unsorted.sort()

print(unsorted)

[-1.2568568  -0.71973361 -0.23622491 -0.05184291 -0.02928642  0.05693187
  0.36813358  1.18664144  1.21494461  2.09355953]
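
As a compact sketch of the options: `np.sort` returns a sorted copy, `np.argsort` returns the indices that would sort the array, and the `.sort()` method sorts in place. The input values and the names `sorted_copy`, `order` and `descending` are illustrative.

```python
import numpy as np

unsorted = np.array([3.0, -1.0, 2.0, 0.5])

sorted_copy = np.sort(unsorted)        # new array, original untouched
order = np.argsort(unsorted)           # indices that would sort the array
descending = np.sort(unsorted)[::-1]   # reverse the sorted copy for descending order

print(sorted_copy.tolist())   # [-1.0, 0.5, 2.0, 3.0]
print(order.tolist())         # [1, 3, 2, 0]
print(descending.tolist())    # [3.0, 2.0, 0.5, -1.0]
print(unsorted.tolist())      # [3.0, -1.0, 2.0, 0.5], still in the original order

unsorted.sort()               # in-place: now the original itself is sorted
print(unsorted.tolist())      # [-1.0, 0.5, 2.0, 3.0]
```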


## Finding unique elements:

In [72]:
array = np.array([1,2,1,4,2,1,4,2])

print(np.unique(array))

[1 2 4]
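
If the number of occurrences matters as well, `np.unique` accepts `return_counts=True`. A short sketch on the same values (the names `values` and `counts` are illustrative):

```python
import numpy as np

array = np.array([1, 2, 1, 4, 2, 1, 4, 2])

# values holds the sorted unique elements, counts how often each one appears
values, counts = np.unique(array, return_counts=True)

print(values.tolist())   # [1, 2, 4]
print(counts.tolist())   # [3, 3, 2]
```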


## Set operations with np.array data type:

In [73]:
s1 = np.array(['desk','chair','bulb'])
s2 = np.array(['lamp','bulb', 'chair'])
print(s1, s2)

['desk' 'chair' 'bulb'] ['lamp' 'bulb' 'chair']


In [74]:
print( np.intersect1d(s1, s2) )

['bulb' 'chair']


In [75]:
print( np.union1d(s1, s2) )

['bulb' 'chair' 'desk' 'lamp']


In [76]:
print( np.setdiff1d(s1, s2)) # elements in s1 that are not in s2

['desk']


In [77]:
print( np.isin(s1, s2) ) # which elements of s1 are also in s2 (np.isin is the recommended replacement for np.in1d)

[False  True  True]
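
Note that the set functions return sorted, de-duplicated results, while the membership test keeps the shape and order of its first argument. A sketch with integers, where the arrays `a` and `b` are made up for illustration:

```python
import numpy as np

a = np.array([3, 1, 2, 3, 1])
b = np.array([2, 3, 5])

common = np.intersect1d(a, b)   # sorted values present in both
combined = np.union1d(a, b)     # sorted values present in either
only_a = np.setdiff1d(a, b)     # sorted values in a but not in b
membership = np.isin(a, b)      # one boolean per element of a, in a's order

print(common.tolist())       # [2, 3]
print(combined.tolist())     # [1, 2, 3, 5]
print(only_a.tolist())       # [1]
print(membership.tolist())   # [True, False, True, True, False]
```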


# Broadcasting:

Introduction to broadcasting.
For more details please see:
https://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html

In [1]:
import numpy as np

start = np.zeros((4,3))
print(start)

[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]


In [2]:
# create a rank one ndarray with 3 values
add_rows = np.array([1, 0, 2])
print(add_rows)

[1 0 2]


In [3]:
y = start + add_rows # add add_rows to each row of 'start' using broadcasting
print(y)

[[ 1.  0.  2.]
 [ 1.  0.  2.]
 [ 1.  0.  2.]
 [ 1.  0.  2.]]


In [5]:
# create an ndarray which is 4 x 1 to broadcast across columns
add_cols = np.array([[0,1,2,3]])
add_cols = add_cols.T

print(add_cols)

[[0]
 [1]
 [2]
 [3]]


In [6]:
# add to each column of 'start' using broadcasting
y = start + add_cols
print(y)

[[ 0.  0.  0.]
 [ 1.  1.  1.]
 [ 2.  2.  2.]
 [ 3.  3.  3.]]


In [9]:
# this will just broadcast in both dimensions
add_scalar = np.array([1])
print(start+add_scalar)

[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
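
Broadcasting compares shapes from the last dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The sketch below checks the shapes used above and shows a case that fails. It assumes NumPy 1.20 or newer for `np.broadcast_shapes`, and the names `failed` and `cols` are illustrative.

```python
import numpy as np

start = np.zeros((4, 3))

# (4, 3) with (3,): the trailing 3 matches, so the row is repeated 4 times
print(np.broadcast_shapes((4, 3), (3,)))     # (4, 3)

# (4, 3) with (4, 1): the 1 is stretched to 3, so the column is repeated 3 times
print(np.broadcast_shapes((4, 3), (4, 1)))   # (4, 3)

# (4, 3) with (4,): trailing dimensions are 3 and 4, which do not match
failed = False
try:
    start + np.array([0, 1, 2, 3])
except ValueError as err:
    failed = True
    print("Broadcast failed:", err)

# Adding a length-1 axis turns the (4,) array into a (4, 1) column, which works
cols = np.array([0, 1, 2, 3])[:, np.newaxis]
print((start + cols).shape)                  # (4, 3)
```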


In [8]:
# create our 3 x 4 matrix
arrA = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(arrA)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


# Speedtest: ndarrays vs lists

First set up parameters for the speed test. We'll be testing the time to sum the elements in an ndarray versus a list.

In [1]:
from numpy import arange
from timeit import Timer

size = 1000000
timeits = 1000

In [2]:
# create the ndarray with values 0,1,2,...,size-1
nd_array = arange(size)
print( type(nd_array)  )


<class 'numpy.ndarray'>


In [3]:
# Timer expects the operation as a string parameter,
# here we pass nd_array.sum()
timer_numpy = Timer("nd_array.sum()", "from __main__ import nd_array")

print("Time taken by numpy ndarray: %f seconds" %
     (timer_numpy.timeit(timeits)/timeits))

Time taken by numpy ndarray: 0.000689 seconds


In [4]:
# create the list with values 0,1,2,...,size-1
a_list = list(range(size))
print( type(a_list) )

<class 'list'>


In [5]:
# timer expects the operation as a parameter, here we pass sum(a_list)
timer_list = Timer("sum(a_list)", "from __main__ import a_list")

print("Time taken by list: %f seconds" %
     (timer_list.timeit(timeits)/timeits))

Time taken by list: 0.066778 seconds
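
The same comparison can be sketched as a single self-contained script. It passes the data to `timeit` through the `globals` argument instead of importing from `__main__`, and it uses a smaller `size` and `repeats` than the cells above so it finishes quickly. Both values are arbitrary choices, and the timings will vary by machine.

```python
import timeit

import numpy as np

size = 100_000    # smaller than the notebook's 1000000, to keep the run short
repeats = 20

# int64 avoids overflow in the sum on platforms where the default integer is 32-bit
nd_array = np.arange(size, dtype=np.int64)
a_list = list(range(size))

# Average time per call for each approach
numpy_time = timeit.timeit("nd_array.sum()",
                           globals={"nd_array": nd_array},
                           number=repeats) / repeats
list_time = timeit.timeit("sum(a_list)",
                          globals={"a_list": a_list},
                          number=repeats) / repeats

print("numpy ndarray: %f seconds per sum" % numpy_time)
print("python list:   %f seconds per sum" % list_time)

# Both approaches compute the same total
print(int(nd_array.sum()) == sum(a_list))   # True
```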
