Name: Numpy arrays | Author: Alf Köhn | Version 03.11.2018 | License: CC BY-SA 4.0
***
# NumPy arrays

This document is intended to provide a short overview of NumPy arrays and what can be done with them. The motivation for writing it was to have something for myself where I can look certain things up instead of always having to ask Dr. Google. Hence, the examples shown simply reflect personal preference and what I consider important. If you are looking for a detailed and thorough introduction with some background information, there are tons of places on the internet waiting for you. There even exist some old-school books. A few recommendations are:

* Jake VanderPlas, *Python Data Science Handbook: Essential Tools for Working with Data* (2016, O'Reilly Media)
* https://www.tutorialspoint.com/numpy/numpy_array_creation_routines.htm
* The [official SciPy docs](https://docs.scipy.org/doc/numpy/reference/) are very useful, in particular the part about [array creation](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html) and [array manipulation](https://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html)

Back to my brief overview. It is organized as follows:

1. [Create arrays](#1.-Create-arrays)
2. [Array attributes](#2.-Array-attributes)
3. [Array indexing (access elements)](#3.-Array-indexing)
4. [Array slicing (access subarrays)](#4.-Array-slicing)
5. [Reshape arrays](#5.-Reshape-arrays)
6. [Concatenate arrays](#6.-Concatenate-arrays)
7. [Split arrays](#7.-Split-arrays)
8. [Array methods](#8.-Array-methods)
9. [Search arrays](#9.-Search-arrays)

A few things to note:
1. In a NumPy array, all elements have to be of the same datatype (in contrast to Python lists)
2. NumPy arrays are more memory efficient than Python lists, i.e. they use less memory for the same numerical data
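
Both notes can be checked quickly. The following is only a rough sketch: the memory estimate for the list simply adds up `sys.getsizeof` for the list object and for each of its integers, which is my own simplification, and the exact byte counts depend on the Python version and platform.

```python
import sys
import numpy as np

# mixing integers and a float: all elements are upcast to one common dtype
mixed = np.array( [1, 2.5, 3] )
print( "dtype of mixed array: {0}".format( mixed.dtype ) )

# rough memory comparison for 1000 integers
n       = 1000
py_list = list( range(n) )
np_arr  = np.arange( n, dtype=np.int64 )

# list: size of the list object itself plus the size of every int object
list_bytes = sys.getsizeof( py_list ) + sum( sys.getsizeof(x) for x in py_list )

print( "list : {0} bytes".format( list_bytes ) )
print( "array: {0} bytes".format( np_arr.nbytes ) )
```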

In [1]:
# the convention is to import numpy as np
# (but feel free to name it as you like)
import numpy as np

## 1. Create arrays

### 1.1 Homogeneous array

In [2]:
# creating arrays with zeros
np.zeros( shape=4, dtype=int )

array([0, 0, 0, 0])

In [3]:
np.zeros( shape=(2,5) )

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [4]:
# creating arrays with ones
np.ones( 10 )

array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

### 1.2 Arrays from existing data

In [5]:
# creating a numpy array from an existing Python list
np.array( [1, 4, 42] )

array([ 1,  4, 42])

In [6]:
# creating a 2D numpy array from existing data
np.array( [[1, 2], 
           [3, 4]] )

array([[1, 2],
       [3, 4]])

In [7]:
# convert into numpy array (input can for example be tuples)
np.asarray( (1,3,5), dtype=np.float64 )

array([ 1.,  3.,  5.])

In [8]:
# create an array referenced (linked) to an original array and a copy
arr1 = np.zeros( 3 )
arr2 = arr1
arr3 = np.copy( arr1 )

# changing arr1 now changes arr2 as well, but not arr3
arr1[0] = 1
print( arr1, arr2, arr3)

[ 1.  0.  0.] [ 1.  0.  0.] [ 0.  0.  0.]


### 1.3 Empty arrays

In [9]:
# creating an empty array of given shape and dtype without initializing entries
# it should be somewhat faster than `np.ones` or `np.zeros`
# (but the entries hold arbitrary values, so all of them have to be set before use)
np.empty( shape=7, dtype=float )

array([  6.91561691e-310,   6.91561691e-310,   6.91561429e-310,
         5.31021756e-317,   2.71475871e-316,   1.08771139e-313,
         3.16202013e-322])
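
A typical pattern, sketched here with variable names of my own choosing, is to allocate with `np.empty` and then fill every entry right away. If a constant start value is all that is needed, `np.full` is probably the safer choice.

```python
import numpy as np

# allocate without initializing, then set every entry
squares    = np.empty( shape=5, dtype=float )
squares[:] = np.arange( 5 )**2
print( squares )

# array of given shape where every entry has the same value
sevens = np.full( shape=(2,3), fill_value=7 )
print( sevens )
```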

### 1.4 Numerical ranges

In [10]:
# evenly spaced values within a given interval
# Values are generated within the half-open interval [start, stop) 
# note: for non-integer values, linspace is recommended instead
np.arange( start=0, stop=10, step=2)

array([0, 2, 4, 6, 8])

In [11]:
# Return evenly spaced numbers over a specified interval.
# Returns 'num' evenly spaced samples, calculated over the interval [start, stop].
# The endpoint of the interval can optionally be excluded.
np.linspace( start=0, stop=10, num=10)

array([  0.        ,   1.11111111,   2.22222222,   3.33333333,
         4.44444444,   5.55555556,   6.66666667,   7.77777778,
         8.88888889,  10.        ])
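
The optional exclusion of the endpoint mentioned in the comment above is controlled by the `endpoint` keyword. A small sketch, which also uses `retstep` to get the spacing back:

```python
import numpy as np

# endpoint=False excludes 'stop', giving the same values as np.arange(0, 10, 2)
half_open = np.linspace( start=0, stop=10, num=5, endpoint=False )
print( half_open )

# retstep=True additionally returns the spacing between the samples
samples, step = np.linspace( start=0, stop=1, num=5, retstep=True )
print( "samples = {0}, step = {1}".format( samples, step ) )
```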

In [12]:
# Return numbers spaced evenly on a log scale.

# In linear space, the sequence starts at base^start and ends with base^stop.
np.logspace( start=.0, stop=1, num=10, base=10 )

array([  1.        ,   1.29154967,   1.66810054,   2.15443469,
         2.7825594 ,   3.59381366,   4.64158883,   5.9948425 ,
         7.74263683,  10.        ])

### 1.5 Random numbers

In [13]:
# using a seed ensures reproducibility
# i.e. to always get the same random number sequence
np.random.seed(42)

# random integers
# 1D array with random numbers in interval [0, 10)
np.random.randint(10, size=5)

array([6, 3, 7, 4, 6])

In [14]:
# 2D array with random numbers in interval [10,20)
np.random.randint(low=10, high=20, size=(2,5) )

array([[19, 12, 16, 17, 14],
       [13, 17, 17, 12, 15]])

In [15]:
# 3D array with random floats uniformly distributed in interval [0,1)
np.random.random( size=(2,5,3) )

array([[[  5.64115790e-02,   7.21998772e-01,   9.38552709e-01],
        [  7.78765841e-04,   9.92211559e-01,   6.17481510e-01],
        [  6.11653160e-01,   7.06630522e-03,   2.30624250e-02],
        [  5.24774660e-01,   3.99860972e-01,   4.66656632e-02],
        [  9.73755519e-01,   2.32771340e-01,   9.06064345e-02]],

       [[  6.18386009e-01,   3.82461991e-01,   9.83230886e-01],
        [  4.66762893e-01,   8.59940407e-01,   6.80307539e-01],
        [  4.50499252e-01,   1.32649612e-02,   9.42201756e-01],
        [  5.63288218e-01,   3.85416503e-01,   1.59662522e-02],
        [  2.30893826e-01,   2.41025466e-01,   6.83263519e-01]]])
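
Newer NumPy versions (1.17 and later, as far as I know) also offer a generator object as an alternative to the `np.random.*` functions used above. The sketch below assumes such a version. The numbers it produces differ from the ones shown above, even for the same seed.

```python
import numpy as np

# a generator object with its own seed (no global state involved)
rng = np.random.default_rng( seed=42 )

# 1D array with random integers in interval [0, 10)
ints = rng.integers( low=0, high=10, size=5 )

# 2D array with random floats uniformly distributed in interval [0,1)
floats = rng.random( size=(2,3) )

print( ints )
print( floats )
```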

## 2. Array attributes

NumPy arrays have the following attributes:
* ndim: number of dimensions
* shape: size of each dimension
* size: total number of elements
* dtype: data type of the elements
* itemsize: size of each array element in bytes
* nbytes: total size of array in bytes (equal to size*itemsize)


In [16]:
# create 3D array with random integers
arr_3D = np.random.randint(10, size=(2,5,3) )

print( "arr_3D ndim    : {0}".format(arr_3D.ndim) )
print( "arr_3D shape   : {0}".format(arr_3D.shape) )
print( "arr_3D size    : {0}".format(arr_3D.size) )
print( "arr_3D dtype   : {0}".format(arr_3D.dtype) )
print( "arr_3D itemsize: {0}".format(arr_3D.itemsize) )
print( "arr_3D nbytes  : {0}".format(arr_3D.nbytes) )

arr_3D ndim    : 3
arr_3D shape   : (2, 5, 3)
arr_3D size    : 30
arr_3D dtype   : int64
arr_3D itemsize: 8
arr_3D nbytes  : 240
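
Note that the default integer dtype (`int64` in the output above) can depend on the platform and the NumPy version. To make the relation `nbytes = size*itemsize` independent of that, the dtype can be set explicitly, as in this small sketch:

```python
import numpy as np

# same shape as arr_3D above, but with explicitly chosen dtypes
arr_i32 = np.zeros( (2,5,3), dtype=np.int32 )
arr_i64 = arr_i32.astype( np.int64 )    # astype returns a converted copy

for arr in (arr_i32, arr_i64):
    print( "dtype: {0}, itemsize: {1}, nbytes: {2}".format(
           arr.dtype, arr.itemsize, arr.nbytes ) )
```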


## 3. Array indexing

Indexing is very similar to other programming languages like Java or C: it is zero-based, i.e. the first element has index 0. To access elements, use the bracket operator `[]`, where multiple dimensions are separated by a comma. 

Note that you can use negative numbers as indices in Python to retrieve values offset from the end of the array.

In [17]:
# 1D array with random numbers in interval [0, 10)
arr_1D = np.random.randint(10, size=5)

# accessing single elements of 1D array
print( "arr_1D                       : {0}".format( arr_1D     ))
print( "1st element of arr_1D        : {0}".format( arr_1D[0]  ))
print( "last element of arr_1D       : {0}".format( arr_1D[-1] ))
print( "second-last element of arr_1D: {0}".format( arr_1D[-2] ))

arr_1D                       : [8 6 8 7 0]
1st element of arr_1D        : 8
last element of arr_1D       : 0
second-last element of arr_1D: 7


In [18]:
# 2D array with random numbers in interval [10,20)
arr_2D = np.random.randint(low=10, high=20, size=(2,5) )

# accessing single elements of 2D array
print( "arr_2D                       :")
print( arr_2D )
print( "    1st col, 2nd row         : {0}".format( arr_2D[1,0] ))
print( "    3rd col, 2nd row         : {0}".format( arr_2D[1,2] ))

arr_2D                       :
[[17 17 12 10 17]
 [12 12 10 14 19]]
    1st col, 2nd row         : 12
    3rd col, 2nd row         : 10


## 4. Array slicing

Slicing a NumPy array means accessing a subarray of the original array. This can be done via the colon operator `:` like `[from:to]`, starting at `from` and stopping one item before `to`.

In general, slicing works as 
* `array[ start:stop:step ]`

with default values of 
* `start=0`,
* `stop=size of dimension`, 
* `step=1`.

In [19]:
print( "arr_1D                 : {0}".format( arr_1D      )) # equivalent to arr_1D[:]
print( "    first 3 elements   : {0}".format( arr_1D[:3]  ))
print( "    every other element: {0}".format( arr_1D[::2] ))

arr_1D                 : [8 6 8 7 0]
    first 3 elements   : [8 6 8]
    every other element: [8 8 0]


In [20]:
print( "arr_2D" )
print( arr_2D )
print( "    third  column: {0}".format( arr_2D[:,2] ))
print( "    second row   : {0}".format( arr_2D[1,:] ))
print( "    all columns except last: ")
print( arr_2D[:,:-1] )
# attention: for row-access, arr_2D[1,:] is equivalent to arr_2D[1]

arr_2D
[[17 17 12 10 17]
 [12 12 10 14 19]]
    third  column: [12 10]
    second row   : [12 12 10 14 19]
    all columns except last: 
[[17 17 12 10]
 [12 12 10 14]]


Note: with a negative step value, the default values of `start` and `stop` are swapped, which provides an easy way to reverse an array.

In [21]:
print( "arr_1D         : {0}".format( arr_1D       ))
print( "arr_1D reversed: {0}".format( arr_1D[::-1] ))

arr_1D         : [8 6 8 7 0]
arr_1D reversed: [0 7 8 6 8]
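
A negative step can also be combined with explicit `start` and `stop` values. In that case `start` has to be the larger index, and `stop` is still excluded. A small sketch using the same values as `arr_1D` above:

```python
import numpy as np

arr_1D = np.array( [8, 6, 8, 7, 0] )

# from index 3 down to (but excluding) index 0
part = arr_1D[3:0:-1]
# every other element, starting from the end
every_other = arr_1D[::-2]

print( "arr_1D[3:0:-1]: {0}".format( part        ))
print( "arr_1D[::-2]  : {0}".format( every_other ))
```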


Array slices are *not* copies, but rather *views* of the original array. That means they are still linked to the original array. Hence, changing the slice will also change the original array. 

If you want a copy instead, use the `copy()` method (see above at [Create arrays from existing data](#1.2-Arrays-from-existing-data) )

In [22]:
# 2D array with random numbers in interval [1,10)
arr_2D = np.random.randint(low=1, high=10, size=(2,5) )

print("arr_2D:")
print( arr_2D )

arr_2D_slice = arr_2D[ 0:, 2:4]

print("arr_2D slice:")
print( arr_2D_slice )

arr_2D:
[[7 9 7 9 8]
 [2 1 7 7 8]]
arr_2D slice:
[[7 9]
 [7 7]]


In [23]:
# changing element in slice also changes element in original array
arr_2D_slice[0,0] = 42
print("arr_2D slice:")
print( arr_2D_slice )

print("arr_2D:")
print( arr_2D )

arr_2D slice:
[[42  9]
 [ 7  7]]
arr_2D:
[[ 7  9 42  9  8]
 [ 2  1  7  7  8]]


In [24]:
# making a copy (not linked to the original) of the array
arr_2D_copy = arr_2D[ 0:, 2:4].copy()
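
If it is unclear whether something is a view or a copy, `np.shares_memory` can be used to check. The sketch below uses a small array of my own instead of the random one from above:

```python
import numpy as np

arr_2D = np.arange( 10 ).reshape( (2,5) )
view   = arr_2D[ 0:, 2:4 ]
copy   = arr_2D[ 0:, 2:4 ].copy()

print( "view shares memory with arr_2D: {0}".format( np.shares_memory(arr_2D, view) ))
print( "copy shares memory with arr_2D: {0}".format( np.shares_memory(arr_2D, copy) ))

# writing into the view changes the original, the copy is not affected
view[0,0] = 42
print( arr_2D )
print( copy )
```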

## 5. Reshape arrays

### 5.1 The `reshape` method

With `reshape` it is possible to change the shape of an array without changing its elements.

In [25]:
# converting a 1D array into a 2D array
# using the reshape function to create a column vector
arr = np.array( [1,2,3] )
print( "original array with shape = {0}: ".format( arr.shape ) )
print( arr )
print( "" )

# the new shape is passed positionally (the `newshape` keyword is deprecated in recent NumPy versions)
arr = np.reshape( arr, (3,1) )
print( "reshaped array with shape = {0}: ".format( arr.shape ))
print( arr )

original array with shape = (3,): 
[1 2 3]

reshaped array with shape = (3, 1): 
[[1]
 [2]
 [3]]


In [26]:
# put the numbers 1-6 in a 2x3 array
arr_1 = np.arange( 1, 7 ).reshape( (2,3) )
print( arr_1 )

[[1 2 3]
 [4 5 6]]
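
One dimension in the new shape can be given as `-1`, in which case NumPy infers it from the total number of elements. A short sketch (the variable names are just for illustration):

```python
import numpy as np

arr = np.arange( 1, 7 )

arr_2x3 = arr.reshape( (2,-1) )     # number of columns is inferred
arr_3x2 = arr.reshape( (-1,2) )     # number of rows is inferred
flat    = arr_2x3.reshape( -1 )     # back to a 1D array

print( arr_2x3.shape, arr_3x2.shape, flat.shape )

# a shape which does not match the number of elements raises an exception
try:
    arr.reshape( (4,2) )
except ValueError as err:
    print( "reshape failed: {0}".format( err ) )
```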


### 5.2 The `newaxis` expression

Another way of manipulating the dimensions of an existing array is the `newaxis` expression. More precisely, it will increase the number of dimensions of the existing array by one (a 1D array will become a 2D array, a 2D array will become a 3D array, and so on).

In [27]:
# inserting an axis along the first dimension to create a row vector
arr = np.arange( 3 )
arr2 = arr[ np.newaxis, : ]
print( "arr.shape  = {0},   arr  = {1}".format( arr.shape, arr ) )
print( "arr2.shape = {0}, arr2 = {1}".format( arr2.shape, arr2 ) )

arr.shape  = (3,),   arr  = [0 1 2]
arr2.shape = (1, 3), arr2 = [[0 1 2]]


In [28]:
# inserting a new axis along the second dimension to make a column vector
arr = np.arange( 3 )
arr2 = arr[ :, np.newaxis ]
print( "arr.shape  = {0},   arr  = {1}".format( arr.shape, arr ) )
print( "arr2.shape = {0}, arr2 =".format( arr2.shape ) )
print( arr2 )

arr.shape  = (3,),   arr  = [0 1 2]
arr2.shape = (3, 1), arr2 =
[[0]
 [1]
 [2]]
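
As far as I know, `np.newaxis` is simply an alias for `None`, and `np.expand_dims` does the same job as a function call. The sketch below also shows one thing row and column vectors are handy for: adding them yields a 2D table through broadcasting.

```python
import numpy as np

arr = np.arange( 3 )

print( "np.newaxis is None: {0}".format( np.newaxis is None ) )

row = arr[ None, : ]                    # same as arr[ np.newaxis, : ]
col = np.expand_dims( arr, axis=1 )     # same as arr[ :, np.newaxis ]

# shapes (3,1) and (1,3) are broadcast to (3,3), with table[i,j] = arr[i] + arr[j]
table = col + row
print( table )
```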


## 6. Concatenate arrays

Concatenating (i.e. joining) two or more arrays in NumPy can be done with 
* `np.concatenate`
* `np.vstack`: vertical stack
* `np.hstack`: horizontal stack

In [29]:
# concatenate 1D array
arr_1 = np.zeros( 5 )
arr_2 = np.ones( 5 )
arr_3 = np.array( [30,40,50,60,70] )
print( "arr_1: {0}".format(arr_1)) 
print( "arr_2: {0}".format(arr_2)) 
print( "arr_3: {0}".format(arr_3)) 
print( "" )

print( "concatenated array: ")
print( np.concatenate( [arr_1,arr_2,arr_3] ) )

arr_1: [ 0.  0.  0.  0.  0.]
arr_2: [ 1.  1.  1.  1.  1.]
arr_3: [30 40 50 60 70]

concatenated array: 
[  0.   0.   0.   0.   0.   1.   1.   1.   1.   1.  30.  40.  50.  60.  70.]


In [30]:
# concatenate 2D arrays
arr_1 = np.array( [[1 ,2 ,3 ],
                   [10,20,30]])
# along the first axis
np.concatenate( [arr_1,arr_1] )

array([[ 1,  2,  3],
       [10, 20, 30],
       [ 1,  2,  3],
       [10, 20, 30]])

In [31]:
# along the second axis
print( np.concatenate( [arr_1,arr_1], axis=1 ) )

[[ 1  2  3  1  2  3]
 [10 20 30 10 20 30]]


In [32]:
# stack array vertically
# 1D array with only zeros of length 3
arr_1 = np.zeros( 3 )
# 2D array with only ones of size (2,3)
arr_2 = np.ones( (2,3) )

print( arr_1 )
print( arr_2 )
print( "" )
print( np.vstack( [arr_1,arr_2] ) )

[ 0.  0.  0.]
[[ 1.  1.  1.]
 [ 1.  1.  1.]]

[[ 0.  0.  0.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]


In [33]:
# stack arrays horizontally
arr_1 = np.zeros( (2,3) )
arr_2 = np.array( [[44],
                   [44]] )

print( arr_1 )
print( arr_2 )
print( "" )
print( np.hstack( [arr_1,arr_2] ))

[[ 0.  0.  0.]
 [ 0.  0.  0.]]
[[44]
 [44]]

[[  0.   0.   0.  44.]
 [  0.   0.   0.  44.]]
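
To see how the three functions relate, here is a sketch which rebuilds the `vstack` example from above with `concatenate`. The 1D array first has to be turned into a row vector. It also shows that joining arrays of different dtypes results in a common (upcast) dtype.

```python
import numpy as np

arr_1 = np.zeros( 3 )
arr_2 = np.ones( (2,3) )

stacked = np.vstack( [arr_1, arr_2] )
# same result with concatenate, after promoting arr_1 to shape (1,3)
manual  = np.concatenate( [arr_1[np.newaxis, :], arr_2], axis=0 )
print( np.array_equal( stacked, manual ) )

# integer and float arrays are joined into a float array
mixed = np.concatenate( [np.array([1, 2]), np.array([0.5])] )
print( mixed, mixed.dtype )
```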


## 7. Split arrays

Similar to `concatenate`, three ways exist to split an array:

* `np.split` 
* `np.hsplit`: split horizontally
* `np.vsplit`: split vertically

In [34]:
# create 1D array of random integers in interval [low,high)
arr_1 = np.random.randint( low=1, high=20, size=9 )
print( "arr_1: {0}".format(arr_1) )
print( "" )

# split into N subarrays of equal length (if not possible, exception is raised)
print( "split into 3 subarrays: {0}".format( np.split( arr_1, 3 ) ) )

arr_1: [ 5  3 12  8  3  1  3  5 15]

split into 3 subarrays: [array([ 5,  3, 12]), array([8, 3, 1]), array([ 3,  5, 15])]


In [35]:
# split at explicitly defined split points
np.split( arr_1, [2,3] )

[array([5, 3]), array([12]), array([ 8,  3,  1,  3,  5, 15])]

In [36]:
# create 2D array of random integers
arr_1 = np.random.randint( low=10, high=100, size=(3,4) )
arr_1

array([[87, 12, 10, 14],
       [99, 23, 36, 18],
       [88, 24, 99, 51]])

In [37]:
# split vertically (either via setting the number of subarrays or defining split points)
np.vsplit( arr_1, 3)

[array([[87, 12, 10, 14]]),
 array([[99, 23, 36, 18]]),
 array([[88, 24, 99, 51]])]

In [38]:
# split horizontally
np.hsplit( arr_1, 2 )

[array([[87, 12],
        [99, 23],
        [88, 24]]), array([[10, 14],
        [36, 18],
        [99, 51]])]
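
Two additions which I find useful, sketched with small arrays of my own: `np.array_split` also accepts a number of subarrays which does not divide the array evenly, and `np.split` takes an `axis` argument, so it can do the job of `hsplit` and `vsplit` as well.

```python
import numpy as np

arr = np.arange( 10 )

# 10 elements cannot be split into 3 subarrays of equal length
try:
    np.split( arr, 3 )
except ValueError as err:
    print( "np.split failed: {0}".format( err ) )

# array_split allows subarrays of unequal length
parts   = np.array_split( arr, 3 )
lengths = [ len(p) for p in parts ]
print( "lengths of subarrays: {0}".format( lengths ) )

# splitting along the second axis, same as np.hsplit( arr_2D, 2 )
arr_2D      = np.arange( 12 ).reshape( (3,4) )
left, right = np.split( arr_2D, 2, axis=1 )
print( left )
print( right )
```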

## 8. Array methods

In [39]:
# create an array with random samples drawn from a normal (Gaussian) distribution
arr = np.random.normal( loc=.0, scale=1., size=50 )

# various often used numerical methods
print( "mean: {0}".format( arr.mean() ))
print( "max : {0}".format( arr.max()  ))
print( "min : {0}".format( arr.min()  ))
print( "std : {0}".format( arr.std()  ))
print( "sum : {0}".format( arr.sum()  ))

mean: -0.11977540932033406
max : 1.9451156144867858
min : -2.0915996696872643
std : 0.8942366442982189
sum : -5.988770466016703


In [40]:
# methods to return position of maximum or minimum
pos_max = arr.argmax()
pos_min = arr.argmin()

print( "max(arr) = {0}, at position = {1}".format( arr[pos_max], pos_max ) )
print( "min(arr) = {0}, at position = {1}".format( arr[pos_min], pos_min ) )

max(arr) = 1.9451156144867858, at position = 30
min(arr) = -2.0915996696872643, at position = 36
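
For multidimensional arrays, most of these methods accept an `axis` argument. Note also that `argmax` and `argmin` return a position in the flattened array, which can be converted back into a multidimensional index with `np.unravel_index`. A small sketch with an array of my own:

```python
import numpy as np

arr = np.arange( 6 ).reshape( (2,3) )   # [[0 1 2], [3 4 5]]

col_sums = arr.sum( axis=0 )            # sum along the first axis (one value per column)
row_sums = arr.sum( axis=1 )            # sum along the second axis (one value per row)

flat_pos = arr.argmax()                 # position in the flattened array
pos      = np.unravel_index( flat_pos, arr.shape )

print( "column sums: {0}".format( col_sums ) )
print( "row sums   : {0}".format( row_sums ) )
print( "max at flat position {0}, i.e. at index {1}".format( flat_pos, pos ) )
```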


## 9. Search arrays

To locate an element or its nearest value in an array, the numpy function `where` or the method `argmin` can be used.

Locate a value (actually its nearest value) using the `argmin` method.

In [41]:
# create 1D array 
arr = np.linspace( start=1, stop=10, num=10)

# locate nearest value
val    = 5.
val_id = ( np.abs(arr - val) ).argmin()

print( "arr = {0}".format( arr ) )
print( "val = {0}, result: arr[{1}] = {2}".format( 
       val, val_id, arr[val_id] ) )

arr = [  1.   2.   3.   4.   5.   6.   7.   8.   9.  10.]
val = 5.0, result: arr[4] = 5.0


Locate a value (actually its nearest value) in a multidimensional array using NumPy's `where` function.

In [42]:
# create 2D array of random floats
arr = np.random.random( size = (2,4) )

# locate nearest value
val     = .5
arr_tmp = np.abs( arr - val )
val_id  = np.where( arr_tmp == arr_tmp.min() )

# print info
print( "arr = " )
print( arr )
print( "val = {0}, result: arr[{1}] = {2}".format(
       val, val_id, arr[val_id] ) )

arr = 
[[ 0.53093458  0.44778316  0.55289309  0.59269672]
 [ 0.08085333  0.36965446  0.24215994  0.80313976]]
val = 0.5, result: arr[(array([0]), array([0]))] = [ 0.53093458]
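
An alternative which I sometimes use instead of `where` is to combine `argmin` with `np.unravel_index`. This returns a plain index tuple, but only for the first occurrence if the minimum appears more than once. The sketch uses fixed values instead of random ones to keep it reproducible.

```python
import numpy as np

arr = np.array( [[0.53, 0.45, 0.55, 0.59],
                 [0.08, 0.37, 0.24, 0.80]] )

# locate nearest value
val     = .5
flat_id = np.abs( arr - val ).argmin()              # position in flattened array
val_id  = np.unravel_index( flat_id, arr.shape )    # convert to (row, col)

print( "val = {0}, result: arr[{1}] = {2}".format(
       val, val_id, arr[val_id] ) )
```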


It can sometimes be useful to find all values larger or smaller than a certain quantity. This can be done with a simple call of the `where` function.

In [43]:
# create 1D array
arr = np.linspace( start=1, stop=10, num=10 )

# locate all values larger than a certain value
val  = 5.
find = np.where( arr > val )

# print info
print( "arr = {0}".format( arr ) )
print( "val = {0}, positions = {1}, arr_values = {2}".format( 
       val, find, arr[find] ) )

arr = [  1.   2.   3.   4.   5.   6.   7.   8.   9.  10.]
val = 5.0, positions = (array([5, 6, 7, 8, 9]),), arr_values = [  6.   7.   8.   9.  10.]
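
If only the values are needed and not their positions, the boolean array resulting from the comparison can be used directly as an index. The sketch below also shows how to combine two conditions and the three-argument form of `where`, which picks values from one of two sources depending on the condition.

```python
import numpy as np

arr = np.linspace( start=1, stop=10, num=10 )

# boolean mask used directly as index
mask   = arr > 5.
larger = arr[ mask ]

# combine conditions element-wise with & (the parentheses are required)
between = arr[ (arr > 3.) & (arr < 7.) ]

# keep values where the condition holds, otherwise use 0
clipped = np.where( arr > 5., arr, 0. )

print( "larger  = {0}".format( larger  ) )
print( "between = {0}".format( between ) )
print( "clipped = {0}".format( clipped ) )
```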
