# Getting started with ndarray

**ndarrays** are time- and space-efficient multidimensional arrays at the core of numpy. As we did with the data structures in week 2, let's get started by creating ndarrays using the numpy package.

## How to create Rank 1 numpy arrays:

In [1]:
import numpy as np

an_array = np.array([3, 33, 333])  # Create a Rank 1 array

print(type(an_array))

# The type of an array is: "<class 'numpy.ndarray'>"

<class 'numpy.ndarray'>


In [2]:
# check the shape of the array we just created; it should have one dimension

print(an_array.shape)

(3,)


In [5]:
# because this is a rank 1 array, we need only one index to access each element

print(an_array[0], an_array[1], an_array[2])

3 33 333


In [8]:
an_array[0] = 888

print(an_array)

[888  33 333]


# How to create a Rank 2 numpy array

A rank 2 **ndarray** is one with two dimensions. Notice the format below of [ [row], [row] ]. Two-dimensional arrays are great for representing matrices, which are often useful in data science.

In [11]:
another = np.array([[11, 12, 13], [21, 22, 23]])      # Create a rank 2 array

print(another)   # print the array

print("The shape is 2 rows, 3 columns:   ", another.shape)   # rows * columns

print("Accessing elements [0, 0], [0, 1], [1, 0] of the array: ", another[0, 0], another[0, 1], another[1, 0])


[[11 12 13]
 [21 22 23]]
The shape is 2 rows, 3 columns:    (2, 3)
Accessing elements [0, 0], [0, 1], [1, 0] of the array:  11 12 21


# There are many ways to create numpy arrays

Here we create a number of arrays with different shapes and different pre-filled values. numpy has a number of built-in functions which help us quickly and easily create multidimensional arrays.

In [12]:
import numpy as np

# create a 2*2 array of zeros

ex1 = np.zeros((2, 2))

print(ex1)


[[0. 0.]
 [0. 0.]]


In [13]:
# create a 2*2 array filled with 9.0

ex2 = np.full((2 , 2), 9.0)

print(ex2)

[[9. 9.]
 [9. 9.]]


In [14]:
# create a 2*2 identity matrix (1s on the diagonal, 0s elsewhere)

ex3 = np.eye(2, 2)

print(ex3)

[[1. 0.]
 [0. 1.]]


In [17]:
# create an array of ones

ex4 = np.ones((1, 2))

print(ex4)

[[1. 1.]]


In [19]:
# notice that the above ndarray ex4 is actually rank 2: it is a 1*2 array (1 row, 2 columns)

print(ex4.shape)

# which means we need to use two indexes to access an element

print(ex4[0, 1])

(1, 2)
1.0


In [21]:
# create an array of random floats between 0 and 1

ex5 = np.random.random((2, 2))

print(ex5)

[[0.30209035 0.59783729]
 [0.24629943 0.91258098]]
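
The helpers above all take a shape and fill it with a value. As a small sketch of a few other creation routines that are often used alongside them, the cell below builds arrays from a range, from evenly spaced floats, and from a seeded random generator. The variable names and the seed value 42 are our own choices for illustration, not part of the original notebook.

```python
import numpy as np

# np.arange: evenly spaced integers, similar to Python's range()
seq = np.arange(6)

# reshape: view the same 6 values as 2 rows * 3 columns
grid = seq.reshape(2, 3)

# np.linspace: 5 evenly spaced floats from 0.0 to 1.0 (endpoint included)
steps = np.linspace(0.0, 1.0, 5)

# A seeded generator gives the same "random" floats in [0, 1) on every run.
# The seed 42 is an arbitrary choice for this sketch.
rng = np.random.default_rng(seed=42)
sample = rng.random((2, 2))

print(grid)
print(steps)
print(sample.shape)
```

Seeding the generator is handy in a tutorial setting because the printed values stay the same each time the cell is re-run.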


In [22]:
print(np.array([[11,12,13],[21,22,23]]))

[[11 12 13]
 [21 22 23]]


In [30]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

arr[ : 2, ]

print(arr[ : 2, ])

[[1 2 3]
 [4 5 6]]


# Array Indexing

## Slice indexing:

Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays.

In [32]:
# Rank 2 array of shape (3, 4)

an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31,32, 33, 34]])

print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


Use array slicing to get a subarray consisting of the first 2 rows and columns 1 and 2 (the second and third columns).

In [8]:
a_slice = an_array[ : 2, 1: 3]

print(a_slice)

[[12 13]
 [22 23]]


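If we want an independent copy of the slice rather than a view, we can wrap it in np.array(). We store the copy under a separate name so that a_slice still refers to the view.

In [6]:
a_copy = np.array(an_array[ : 2, 1: 3])

print(a_copy)

[[12 13]
 [22 23]]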


A slice is a view into the original data, so when we modify a slice, we actually modify the underlying array.

In [9]:
print("Before: ", an_array[0, 1])  # inspect the element at 0, 1

Before:  12


In [10]:
a_slice[0, 0] = 1000   # a_slice[0, 0] is the same piece of data as an_array[0, 1]

print(a_slice)

print("After: ", an_array[0, 1])

[[1000   13]
 [  22   23]]
After:  1000
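
To make the difference between a view and a copy easier to check, here is a minimal sketch that uses np.shares_memory to ask numpy whether two arrays overlap in memory. The names base, view and copy are hypothetical and chosen only for this example.

```python
import numpy as np

base = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

view = base[:2, 1:3]          # basic slicing returns a view on base
copy = base[:2, 1:3].copy()   # .copy() allocates independent storage

shares_view = np.shares_memory(base, view)
shares_copy = np.shares_memory(base, copy)
print(shares_view)   # True
print(shares_copy)   # False

copy[0, 0] = -1      # only the copy changes
view[0, 0] = 1000    # writes through to base[0, 1]

print(base[0, 1])    # 1000
```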


# Use both integer indexing and slice indexing

We can use combinations of integer indexing and slice indexing to create arrays of different shapes.

In [11]:
# Create a Rank 2 array of shape (3, 4)

an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [13]:
# Using both integer indexing and slicing generates an array of lower rank

row_rank1 = an_array[1, : ]  # Rank 1 view

print(row_rank1, row_rank1.shape)  # notice only a single []

print(row_rank1.ndim)

[21 22 23 24] (4,)
1


In [14]:
# Slicing alone: generates an array of the same rank as an_array

row_rank2 = an_array[1: 2, : ]   # Rank 2 view

print(row_rank2, row_rank2.shape)  # notice the [[]]

print(row_rank2.ndim)

[[21 22 23 24]] (1, 4)
2


In [18]:
# We can do the same thing for columns of an array

col_rank1 = an_array[ : , 1]

col_rank2 = an_array[ : , 1: 2]

print(col_rank1, col_rank1.shape)  # Rank 1

print(col_rank1.ndim)

print(col_rank2, col_rank2.shape) # Rank 2

print(col_rank2.ndim)

[12 22 32] (3,)
1
[[12]
 [22]
 [32]] (3, 1)
2
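
Going the other way is also common: once an integer index has dropped a dimension, we may want it back. The sketch below shows two ways to turn a rank 1 column into a rank 2 column, using np.newaxis and reshape. The variable names are our own and only illustrative.

```python
import numpy as np

grid = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

col = grid[:, 1]                      # integer index drops the axis: shape (3,)
col_as_column = col[:, np.newaxis]    # add a length-1 axis back: shape (3, 1)
col_reshaped = col.reshape(-1, 1)     # same result; -1 lets numpy infer the length

print(col.shape, col_as_column.shape, col_reshaped.shape)
```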


# Array indexing for changing elements

Sometimes it's useful to use an array of indexes to access or change elements.

In [19]:
# Create a new array

an_array = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33], [41, 42, 43]])

print("Original Array: ")

print(an_array)

Original Array: 
[[11 12 13]
 [21 22 23]
 [31 32 33]
 [41 42 43]]


In [23]:
# Create an array of indices

col_indicies = np.array([0, 1, 2, 0])

print("\nCol indices picked: ", col_indicies)

row_indicies = np.arange(4)

print("\nRow indices picked: ", row_indicies)


Col indices picked:  [0 1 2 0]

Row indices picked:  [0 1 2 3]


In [24]:
# Examine the pairings of row_indicies and col_indicies

for row, col in zip(row_indicies, col_indicies):
    print(row, ",", col)

0 , 0
1 , 1
2 , 2
3 , 0


In [25]:
# Select one element from each row

print("Values in the array at those indices: ", an_array[row_indicies, col_indicies])

Values in the array at those indices:  [11 22 33 41]


In [26]:
# Change one element from each row using the indices selected

an_array[row_indicies, col_indicies] += 1000      # add 1000 to the element picked in each row
 
print("\nChanged Array: ")
print(an_array)


Changed Array: 
[[1011   12   13]
 [  21 1022   23]
 [  31   32 1033]
 [1041   42   43]]
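
One detail worth checking for ourselves: unlike a slice, selecting with index arrays returns a copy, while assigning through index arrays still changes the original. The cell below is a small sketch of that behaviour; table, rows, cols and picked are hypothetical names used only here.

```python
import numpy as np

table = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33], [41, 42, 43]])
rows = np.arange(4)
cols = np.array([0, 1, 2, 0])

picked = table[rows, cols]   # integer-array indexing returns a copy
picked[0] = -1               # so this leaves table unchanged
print(table[0, 0])           # 11

table[rows, cols] = 0        # assignment through the indices does modify table
print(table)
```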


# Boolean Indexing

## Boolean indexing for selecting and changing elements

In [61]:
# Create a 3 * 2 array

an_array = np.array([[11, 12], [21, 22], [31, 32]])

print(an_array)

[[11 12]
 [21 22]
 [31 32]]


In [62]:
# Create a filter of boolean values saying whether each element meets a condition
# (note that the name "filter" shadows Python's built-in filter() in this notebook)

filter = (an_array > 15)  # True for every element greater than 15, False for every other element (15 or less)

print(filter.dtype)

filter


bool


array([[False, False],
       [ True,  True],
       [ True,  True]])

### Notice that the filter is an ndarray of the same shape as an_array. It holds True wherever the corresponding element of an_array is greater than 15, and False everywhere else.

In [63]:
# we can now select just those elements which meet that criterion

print(an_array[filter])

[21 22 31 32]


In [64]:
# For short, we could have just used the approach below

an_array[an_array > 15]  # All the values in an_array that are greater than 15. This creates the boolean filter and applies it in one step


array([21, 22, 31, 32])

In [65]:
# We can have even more complex logic here

an_array[(an_array > 20) & (an_array < 30)]  # We want to get all the values between 20 and 30

array([21, 22])

In [66]:
an_array[(an_array % 2 == 0)]

array([12, 22, 32])

### What is particularly useful is that we can change elements in the array by applying a similar logical filter. Let's add 100 to all even values.

In [67]:
an_array[(an_array % 2 == 0)] += 100

print(an_array)

[[ 11 112]
 [ 21 122]
 [ 31 132]]
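
The same idea extends to other logical operators. As a short sketch, the cell below combines conditions with | (or) and ~ (not), and uses np.where to build a changed array without touching the original. The array values and the names used are our own, chosen for illustration.

```python
import numpy as np

values = np.array([[11, 12], [21, 22], [31, 32]])

# | is element-wise OR and ~ is element-wise NOT; each condition needs parentheses
small_or_large = values[(values < 15) | (values > 30)]
not_even = values[~(values % 2 == 0)]

# np.where(condition, a, b) picks from a where the condition is True, else from b
capped = np.where(values > 20, 20, values)

print(small_or_large)   # [11 12 31 32]
print(not_even)         # [11 21 31]
print(capped)
```

Python's `and`, `or` and `not` do not work element-wise on arrays, which is why the symbols `&`, `|` and `~` are used instead.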


# Datatypes and Array Operations

### Datatypes

In [2]:
import numpy as np

ex1 = np.array([11, 12])  # numpy infers the data type (the default integer width depends on the platform)

print(ex1.dtype)

int32


In [3]:
ex2 = np.array([11.0, 12.0])  # numpy infers the data type

print(ex2.dtype)

float64


In [4]:
ex3 = np.array([11, 21], dtype = np.int64)  # We can also tell numpy the data type

print(ex3.dtype)

int64


In [10]:
# We can use this to force floats into integers (the fractional part is dropped, i.e. truncation toward zero)

ex4 = np.array([11.1, 12.7], dtype = np.int64)

print(ex4.dtype)
print()
print(ex4)

int64

[11 12]


In [9]:
# We can use this to force integers into floats if we anticipate the values may change to floats later

ex5 = np.array([11, 21], dtype = np.float64)

print(ex5.dtype)
print()
print(ex5)

float64

[11. 21.]
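
Converting floats to integers drops the fractional part, which is not the same as rounding down once negative numbers are involved. The sketch below compares astype with np.floor and np.rint on one positive and one negative value; the sample values are our own.

```python
import numpy as np

mixed = np.array([11.7, -11.7])

truncated = mixed.astype(np.int64)   # drops the fractional part (toward zero)
floored = np.floor(mixed)            # rounds down; the result stays float64
rounded = np.rint(mixed)             # rounds to the nearest integer; still float64

print(truncated)   # [ 11 -11]
print(floored)     # [ 11. -12.]
print(rounded)     # [ 12. -12.]
```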


# Arithmetic Array Operations

In [13]:
x = np.array([[111, 112], [121, 122]], dtype = np.int64)

y = np.array([[211.1, 212.1], [221.1, 222.1]], dtype = np.float64)

print(x)

print()

print(y)

[[111 112]
 [121 122]]

[[211.1 212.1]
 [221.1 222.1]]


In [14]:
# add

print(x + y)

print()

print(np.add(x, y))

[[322.1 324.1]
 [342.1 344.1]]

[[322.1 324.1]
 [342.1 344.1]]


In [15]:
# subtract

print(x - y)

print()

print(np.subtract(x, y))

[[-100.1 -100.1]
 [-100.1 -100.1]]

[[-100.1 -100.1]
 [-100.1 -100.1]]


In [16]:
# multiply 

print(x * y)

print()

print(np.multiply(x, y))

[[23432.1 23755.2]
 [26753.1 27096.2]]

[[23432.1 23755.2]
 [26753.1 27096.2]]


In [17]:
# divide

print(x / y)

print()

print(np.divide(x, y))

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]


In [20]:
# square root

print(np.sqrt(x))

print()

print(np.sqrt(y))

[[10.53565375 10.58300524]
 [11.         11.04536102]]

[[14.52928078 14.56365339]
 [14.86943173 14.90301983]]


In [26]:
# exponent (e ** x)

print(np.exp(x))

[[1.60948707e+48 4.37503945e+48]
 [3.54513118e+52 9.63666567e+52]]
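
All of the operators above work element by element. In particular, * is not matrix multiplication. As a minimal sketch with two small arrays of our own, the cell below contrasts the element-wise product with the matrix product written with @ or np.dot.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b        # multiplies matching positions
matrix_product = a @ b     # rows of a times columns of b
same_product = np.dot(a, b)   # equivalent to a @ b for rank 2 arrays

print(elementwise)      # [[ 5 12] [21 32]]
print(matrix_product)   # [[19 22] [43 50]]
```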


# Statistical Methods, Sorting, and Set Operations

In [30]:
# set up a random 2 * 5 matrix

arr = 10 * np.random.randn(2, 5)

print(arr)

[[  4.69186257 -13.6243874   -3.94844112  -3.15324258  -0.88305603]
 [ -6.72271469 -12.18968922 -13.18584077   5.80105329  -2.84868108]]


In [31]:
# compute the mean for all elements

print(arr.mean())

-4.606313702272989


In [32]:
# compute the means by row

print(arr.mean(axis = 1))

[-3.38345291 -5.82917449]


In [33]:
# compute the means by column

print(arr.mean(axis = 0))

[ -1.01542606 -12.90703831  -8.56714094   1.32390536  -1.86586855]


In [34]:
# sum all the elements

print(arr.sum())

-46.06313702272989


In [36]:
# compute the medians by row

print(np.median(arr, axis = 1))

[-3.15324258 -6.72271469]
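
Because the matrix above is random, the effect of the axis argument can be hard to verify by eye. Here is a small sketch with fixed values of our own, where the results can be checked by hand: axis = 0 collapses the rows (one result per column) and axis = 1 collapses the columns (one result per row).

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

col_sums = m.sum(axis=0)   # one value per column
row_sums = m.sum(axis=1)   # one value per row
overall_mean = m.mean()    # no axis: all elements together
row_max = m.max(axis=1)

print(col_sums)       # [5 7 9]
print(row_sums)       # [ 6 15]
print(overall_mean)   # 3.5
print(row_max)        # [3 6]
```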


# Sorting

In [37]:
# Create an array of 10 random values

unsorted = np.random.randn(10)

print(unsorted)

[ 0.21804786  0.63052725  0.26969099  0.46615905  0.92095434  1.01610562
 -0.54994331  0.95202477 -0.05781189 -0.54027888]


In [40]:
# Create copy and sort

sorted = np.array(unsorted)

sorted.sort()

print(sorted)

print()

print(unsorted)

[-0.54994331 -0.54027888 -0.05781189  0.21804786  0.26969099  0.46615905
  0.63052725  0.92095434  0.95202477  1.01610562]

[ 0.21804786  0.63052725  0.26969099  0.46615905  0.92095434  1.01610562
 -0.54994331  0.95202477 -0.05781189 -0.54027888]


In [41]:
# in-place sorting

unsorted.sort()

print(unsorted)

[-0.54994331 -0.54027888 -0.05781189  0.21804786  0.26969099  0.46615905
  0.63052725  0.92095434  0.95202477  1.01610562]
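
As an alternative to copying first and then sorting in place, np.sort returns a sorted copy directly, and np.argsort returns the indices that would sort the array. The sketch below uses a short array of our own so the results are easy to check.

```python
import numpy as np

scores = np.array([0.7, 0.1, 0.4])

ordered = np.sort(scores)       # sorted copy; scores itself is untouched
order = np.argsort(scores)      # indices that would sort scores
descending = scores[order][::-1]

print(ordered)      # [0.1 0.4 0.7]
print(order)        # [1 2 0]
print(scores)       # [0.7 0.1 0.4]
print(descending)   # [0.7 0.4 0.1]
```

argsort is useful when a second array (for example, labels) has to be reordered in the same way as the values.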


# Finding Unique elements

In [42]:
array = np.array([1, 2, 1, 4, 2, 1, 4, 2])

print(np.unique(array))

[1 2 4]
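
np.unique can also report how often each value occurs. A minimal sketch, reusing the same values under the hypothetical name data:

```python
import numpy as np

data = np.array([1, 2, 1, 4, 2, 1, 4, 2])

# return_counts=True also returns the number of occurrences of each unique value
values, counts = np.unique(data, return_counts=True)

print(values)   # [1 2 4]
print(counts)   # [3 3 2]
```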


# Set Operations with np.array data type

In [43]:
s1 = np.array(['desk', 'chair', 'bulb'])

s2 = np.array(['lamp', 'bulb', 'chair'])

print(s1, s2)

['desk' 'chair' 'bulb'] ['lamp' 'bulb' 'chair']


In [44]:
print(np.intersect1d(s1, s2))    # intersect1d gives us the sorted elements which are common to both arrays.

['bulb' 'chair']


In [45]:
print(np.union1d(s1, s2))   # union1d gives us all of the unique elements across both arrays, sorted

['bulb' 'chair' 'desk' 'lamp']


In [46]:
print(np.setdiff1d(s1, s2))   # elements in s1 that are not in s2

['desk']


In [47]:
print(np.isin(s1, s2))    # array of Booleans: for each element of s1, whether it is also in s2
                          # (np.isin is the current replacement for the older np.in1d)

[False  True  True]


# Broadcasting

### For more details, please see:

http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

In [2]:
import numpy as np

start = np.zeros((4, 3))

print(start)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [3]:
# Create a rank 1 ndarray with 3 values

add_rows = np.array([1, 0, 2])

print(add_rows)

[1 0 2]


In [4]:
y = start + add_rows  # add the array 'add_rows' to each row of 'start' array using broadcasting

print(y)

[[1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]]


In [5]:
# Create an ndarray which is  4 * 1  to broadcast across columns

add_cols = np.array([[0, 1, 2, 3]])

add_cols = add_cols.T

print(add_cols)

[[0]
 [1]
 [2]
 [3]]


In [6]:
# add the array 'add_cols' to each column of 'start' array using broadcasting

y = start + add_cols

print(y)

[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]]


In [7]:
# a single-element array will just broadcast in both dimensions

add_scaler = np.array([1])

print(start + add_scaler)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [8]:
a = np.array([[0,0],[0,0]])

print(a)

[[0 0]
 [0 0]]


In [9]:
b1 = np.array([1,1])

print(b1)

[1 1]


In [11]:
b2 = 1

print(b2)

1


In [12]:
x = a + b1

print(x)

[[1 1]
 [1 1]]


In [13]:
y = a + b2 

print(y)

[[1 1]
 [1 1]]
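
The examples above all happen to have compatible shapes. As a sketch of the rule behind them: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The cell below checks a few shape pairs with np.broadcast_shapes (available in numpy 1.20 and later, which we assume here) and shows what happens when the shapes do not match.

```python
import numpy as np

# Resulting shape when two shapes are broadcast together
rows_case = np.broadcast_shapes((4, 3), (3,))      # like start + add_rows
cols_case = np.broadcast_shapes((4, 3), (4, 1))    # like start + add_cols
both_case = np.broadcast_shapes((4, 1), (1, 3))    # both sides are stretched
print(rows_case, cols_case, both_case)

# Shapes (4, 3) and (4,) do not match: the last dimensions are 3 and 4
failed = False
try:
    np.zeros((4, 3)) + np.array([1, 2, 3, 4])
except ValueError as err:
    failed = True
    print("Cannot broadcast:", err)
```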


# Speedtest: ndarrays vs lists

### First set up parameters for the speed test. We'll be testing the time to sum elements in an ndarray versus a list

In [45]:
from numpy import arange
from timeit import Timer

size = 1000000

timeit = 1000

In [46]:
# Create the ndarray with values 0, 1, 2...,size -1

nd_array = arange(size)

print(nd_array)
print()
print(type(nd_array))

[     0      1      2 ... 999997 999998 999999]

<class 'numpy.ndarray'>


In [48]:
# Timer expects the operation as a parameter (a string of code to run);
# here we pass nd_array.sum()

timer_numpy = Timer("nd_array.sum()", "from __main__ import nd_array")


print("Time taken by numpy ndarray: %f  seconds" %
     (timer_numpy.timeit(timeit)/timeit))


Time taken by numpy ndarray: 0.000400  seconds


In [49]:
# another way to measure time: wall-clock time to create (and print) the ndarray, not to sum it

import time

start = time.time()


nd_array = arange(size)

print(nd_array)


print("\nTime taken by numpy ndarray %s seconds." % (time.time() - start))

[     0      1      2 ... 999997 999998 999999]

Time taken by numpy ndarray 0.024472951889038086 seconds.


In [54]:
# Create the list with values 0, 1, 2, 3.....,size-1

a_list = list(range(size))


print(type(a_list))


<class 'list'>


In [55]:
# Timer expects the operation as a parameter; here we pass sum(a_list)

timer_list = Timer("sum(a_list)", "from __main__ import a_list")

print("Time taken by list: %f seconds" %
      (timer_list.timeit(timeit)/timeit))

Time taken by list: 0.020777 seconds


In [56]:
import time

start = time.time()

a_list = list(range(size))

print("\nTime taken by list %s seconds." % (time.time() - start))


Time taken by list 0.03092217445373535 seconds.
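
For a side-by-side comparison in a single cell, one possible sketch is shown below. It times the same operation (summing) on both containers with time.perf_counter. The helper time_call, the smaller size and the repeat count are our own assumptions to keep the cell quick; actual timings will vary from machine to machine.

```python
import time
import numpy as np

size = 100_000                            # smaller than above so the cell runs quickly
nd_array = np.arange(size, dtype=np.int64)   # int64 so the sum cannot overflow
a_list = list(range(size))

def time_call(func, repeats=20):
    """Average wall-clock seconds per call of func (hypothetical helper)."""
    begin = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - begin) / repeats

numpy_seconds = time_call(nd_array.sum)
list_seconds = time_call(lambda: sum(a_list))

print("ndarray sum: %f seconds" % numpy_seconds)
print("list sum:    %f seconds" % list_seconds)
print("Same result:", int(nd_array.sum()) == sum(a_list))
```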
