Monday, July 30, 2018
8:28 AM

<h2>NumPy</h2>
<p>
In programming, an <i>array</i> is a "collection of the same type data items that can be selected by indices computed at run-time."  In Python, arrays are not a fundamental data type like <i>lists</i> or <i>tuples</i>. While lists and tuples are often used for simple array manipulations (e.g. using an index to select an item at a particular position, my_list[3]), they lack support for standard array functionality like array multiplication or transposition. For manipulations of the latter sort, as well as for more advanced functionality, other modules or libraries are typically used.
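
As a minimal sketch of that difference (the variable names here are illustrative and do not appear elsewhere in this notebook), multiplying a list repeats it, whereas multiplying a NumPy array scales each element:

```python
import numpy as np

# Illustrative data; these names are hypothetical
values_list = [1, 2, 3]
values_arr = np.array(values_list)

# Multiplying a list repeats it; multiplying an array scales each element
list_result = values_list * 2
arr_result = values_arr * 2

# Transposition is built into arrays; plain lists have no equivalent attribute
grid = np.array([[1, 2, 3], [4, 5, 6]])
grid_t = grid.T

print(list_result)     # [1, 2, 3, 1, 2, 3]
print(arr_result)      # [2 4 6]
print(grid_t.shape)    # (3, 2)
```
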
<p>
One of the most widely used of these "array modules" is the <a href= "http://numpy.org">NumPy</a> library.  It is part of a larger "ecosystem of open-source software for mathematics, science, and engineering" called <a href= "http://scipy.org">SciPy</a> (pronounced “Sigh Pie”). Besides NumPy, this ecosystem includes the SciPy library, Matplotlib, IPython (now part of Jupyter), SymPy, and pandas.
<p>
For those interested in data science (which is the underlying focus of this notebook), NumPy has at least 3 key elements:
<ul>
    <li> a powerful N-dimensional array object</li>
    <li> a large collection of methods and functions for creating, managing and manipulating arrays</li>
    <li> an extensive collection of mathematical and statistical operations on arrays</li>
</ul>
<p>
What follows is a series of notes pertaining to the NumPy array object and associated functional capabilities. These notes are a compilation from a variety of documents and tutorials available at various web sites including the official documentation, reference, and tutorials for NumPy.
    
For "official details" on the objects, methods and functions in this note, as well as related topics, see
<ul>
    <li>https://docs.scipy.org/doc/numpy/</li>
    <li>https://docs.scipy.org/doc/numpy/contents.html</li>
</ul>

<h3><a class="anchor" id="toc">Table of Contents</a></h3>

1. [Creating Arrays](#Creating Arrays)<br>
   [Manual Creation](#Manual)<br>
   [Using Reshape](#Using Reshape)<br>
   [Using Lists or Tuples](#Using Lists or Tuples)<br>
   [Using Special Array Generators](#Using Special)<br>
   [Array Attributes](#Array Attributes)<br>
2. [Selecting and Setting Elements](#Selecting and Setting)<br>
   [Selecting from 1 Dimensional](#Selecting 1d)<br>
   [Selecting from 2 Dimensional Array](#Selecting 2d)<br>
   [Selecting from 3 Dimensional Array](#Selecting 3d)<br>
   [Selecting by Values](#Selecting by Values)<br>
   [Setting Values for a Selection](#Setting Values)<br>
3. [Manipulating and Converting Arrays](#Manip and Convert)<br>
   [Manipulation: Methods for Restructuring an Existing Array](#Manip)<br>
   [Manipulation: Methods for Adding or Removing Elements or SubArrays](#Add or Remove)<br>
4. [Key Mathematical Operations](#Math and Stats)<br>
   [Math and Stats on Individual Array](#Ops on Individual)<br>
   [Rounding](#Rounding)<br>
   [Math and Stats on Pairs of Arrays](#Ops on Pairs)<br>
   [Math and Stats on Specified Axis of an Individual Array](#Ops on Axis)<br>


In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

<a class="anchor" id="Creating Arrays"></a>
<h3>Creating a NumPy Array</h3>
* [Return to TOC](#toc)

"NumPy’s main object is the <i>homogeneous</i> multidimensional array. It is a table of <i>elements</i> (usually numbers), all of the same type, indexed by a tuple of positive integers. In NumPy dimensions are called <i>axes</i>."

Some of the more frequent <i>data types</i> include:
<ul>
<li><i>Integers</i>: int\_, int8 (byte), int16, int32, int64. The default is either int32 or int64.</li>
<li><i>Unsigned Integers</i>: uint8, uint16, uint32, uint64</li>
<li><i>Float:</i> float\_, float16, float32, float64. The default is float64.</li>
<li><i>Complex</i>: complex\_, complex64, complex128</li>
<li><i>String</i>: &lt;Un (fixed-width Unicode strings, e.g. &lt;U4 holds at most 4 characters per string)</li>
<li><i>Boolean</i>: bool_</li>
</ul>
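
The sketch below shows how NumPy picks a data type when none is given and how one can be requested explicitly. The array names are illustrative only, and the default integer width depends on the platform and NumPy version.

```python
import numpy as np

ints = np.array([1, 2, 3])                    # default integer type (int32 or int64)
floats = np.array([1.0, 2.0, 3.0])            # default float type is float64
small = np.array([1, 2, 3], dtype=np.int16)   # dtype requested explicitly
words = np.array(['East', 'West'])            # Unicode strings, width 4
mixed = np.array([1, 2.5])                    # the integer is upcast to float64

for arr in (ints, floats, small, words, mixed):
    print(arr.dtype)
```

Because an array is homogeneous, mixing integers and floats in one call produces a float array rather than an error.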

In order to use NumPy you first have to <b>import</b> it. Once imported, it can be used to create an array by: 

<ul>
    <li>using the numpy.array method or function and entering each of the elements manually</li>
    <li>transforming an existing list or tuple into an array</li>
    <li>employing a number of special generator methods or functions</li>
</ul>

In [2]:
import numpy as np 

<a class="anchor" id="Manual"></a>
<h4>Manual Creation: Using the <i>array</i> method</h4>
* [Return to TOC](#toc)

The most straightforward way to create an array is to use the "array" method and type in the contents manually. The syntax is: np.array([<cell contents>]). Here, the cell contents vary depending on the structure of the array being created. The following examples illustrate the creation of 1, 2, and 3 dimensional numeric arrays as well as a string array.

In [3]:
# Manual Creation

print('Create 1d array: my_arr = np.array([1,2,3,4,5,6]) - with 6 elements')
my_arr = np.array([1,2,3,4,5,6])
my_arr

print('Create 2d array: my_2d_arr = np.array([(1, 2, 3), (4, 5, 6)]) - with 2 rows and 3 cols')
my_2d_arr = np.array([(1, 2, 3), (4, 5, 6)])
my_2d_arr

print('Create 3d array: my_3d_arr = np.array([((1, 2, 3),(4, 5, 6)),((7, 8, 9),(10, 11, 12))]) - with (2*2*3) elements')
my_3d_arr = np.array([((1, 2, 3),(4, 5, 6)),((7, 8, 9),(10, 11, 12))])
my_3d_arr

print("Create string array: my_string_array = np.array(['abc','def','ghi']) - with 3 elements")
my_string_array = np.array(['abc','def','ghi'])
my_string_array

Create 1d array: my_arr = np.array([1,2,3,4,5,6]) - with 6 elements


array([1, 2, 3, 4, 5, 6])

Create 2d array: my_2d_arr = np.array([(1, 2, 3), (4, 5, 6)]) - with 2 rows and 3 cols


array([[1, 2, 3],
       [4, 5, 6]])

Create 3d array: my_3d_arr = np.array([((1, 2, 3),(4, 5, 6)),((7, 8, 9),(10, 11, 12))]) - with (2*2*3) elements


array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Create string array: my_string_array = np.array(['abc','def','ghi']) - with 3 elements


array(['abc', 'def', 'ghi'], dtype='<U3')

<a class="anchor" id="Using Lists or Tuples"></a>
<h4>Using Lists or Tuples</h4>
* [Return to TOC](#toc)

In [4]:
print('Converting a list - my_list = [1,2,3,4,5,6] ')
my_list = [1,2,3,4,5,6]

print('into an array - my_list_arr = np.array(my_list)')
my_list_arr = np.array(my_list)
my_list_arr

print('Converting a tuple - my_tuple = (1,2,3,4,5,6)')
my_tuple = (1,2,3,4,5,6)

print('into an array - my_tuple_arr = np.array(my_tuple)')
my_tuple_arr = np.array(my_tuple)
my_tuple_arr

Converting a list - my_list = [1,2,3,4,5,6] 
into an array - my_list_arr = np.array(my_list)


array([1, 2, 3, 4, 5, 6])

Converting a tuple - my_tuple = (1,2,3,4,5,6)
into an array - my_tuple_arr = np.array(my_tuple)


array([1, 2, 3, 4, 5, 6])

<a class="anchor" id="Using Special"></a>
<h4>Using Special Array Generators:</h4>

Including arange, zeros, ones, empty, full, and random numbers.
* [Return to TOC](#toc)
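
A short sketch of two details that are easy to miss with these generators: `arange` excludes its stop value, and the constant-value generators accept a shape tuple. `np.linspace` is not covered elsewhere in these notes; it appears here only as a contrast, since it takes a number of points and includes the stop value by default. All names are illustrative.

```python
import numpy as np

# arange excludes the stop value
half_steps = np.arange(0, 2, 0.5)

# linspace takes the number of points and includes the stop value by default
five_points = np.linspace(0, 2, 5)

# zeros, ones and full accept a shape tuple for multi-dimensional arrays
zero_grid = np.zeros((2, 3))
seven_grid = np.full((2, 3), 7)

print(half_steps)    # [0.  0.5 1.  1.5]
print(five_points)   # [0.  0.5 1.  1.5 2. ]
print(zero_grid.shape, seven_grid.shape)
```
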

In [5]:
# Create a range of values - np.arange(*start,stop,*step) - * implies it is optional

print('Create array of integers:  my_range_arr1 = np.arange(1, 7, 1) - start at 1, end at 6 = 7-1, steps of 1')
my_range_arr1 = np.arange(1, 7, 1)
my_range_arr1

print('Create array of integers: my_range_arr2 = np.arange(6) - start at 0, end at 5 = 6-1')
my_range_arr2 = np.arange(6)
my_range_arr2

print('Create array of floats: my_range_arr3 = np.arange(0,7,.5) - start at 0.0, end at 6.5 = 7.0-0.5, steps of .5')
my_range_arr3 = np.arange(0,7,.5)
my_range_arr3

Create array of integers:  my_range_arr1 = np.arange(1, 7, 1) - start at 1, end at 6 = 7-1, steps of 1


array([1, 2, 3, 4, 5, 6])

Create array of integers: my_range_arr2 = np.arange(6) - start at 0, end at 5 = 6-1


array([0, 1, 2, 3, 4, 5])

Create array of floats: my_range_arr3 = np.arange(0,7,.5) - start at 0.0, end at 6.5 = 7.0-0.5, steps of .5


array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5])

In [6]:
# Methods to initialize the elements of an array to a constant value

print('Create array zeros:  my_zero_arr = np.zeros((6)) - with 6 floats')
my_zero_arr = np.zeros((6))
my_zero_arr

print('Create array of zeros: my_zero_arr2 = np.zeros((6), dtype=np.int16 ) - with 6 "int16"s')
my_zero_arr2 = np.zeros((6), dtype=np.int16 )
my_zero_arr2

print(' Create array of ones: my_ones_arr = np.ones((6)) - with 6 floats')
my_ones_arr = np.ones((6))
my_ones_arr

print('Create empty array:  my_empty_arr = np.empty((6)) - with space for 6 elements, the values are meaningless')
my_empty_arr = np.empty((6))
my_empty_arr

print('Create array filled with specified value: my_full_arr = np.full(6,10) - 6 elements with integer value of 10')
my_full_arr = np.full(6,10)
my_full_arr

Create array zeros:  my_zero_arr = np.zeros((6)) - with 6 floats


array([0., 0., 0., 0., 0., 0.])

Create array of zeros: my_zero_arr2 = np.zeros((6), dtype=np.int16 ) - with 6 "int16"s


array([0, 0, 0, 0, 0, 0], dtype=int16)

 Create array of ones: my_ones_arr = np.ones((6)) - with 6 floats


array([1., 1., 1., 1., 1., 1.])

Create empty array:  my_empty_arr = np.empty((6)) - with space for 6 elements, the values are meaningless


array([1., 1., 1., 1., 1., 1.])

Create array filled with specified value: my_full_arr = np.full(6,10) - 6 elements with integer value of 10


array([10, 10, 10, 10, 10, 10])

In [7]:
# Create an array filled with random numbers from uniform distribution

print(' Create 1d array of random floats: my_rand_arr = np.random.random_sample(6) - 6 between 0.0 and 1.0')
my_rand_arr = np.random.random_sample(6)
my_rand_arr

print('Create 1d array of random floats: my_rand_arr2 = (6-1) * np.random.random_sample(6) + 1 - 6 in interval 1 to 6')
my_rand_arr2 = (6-1) * np.random.random_sample(6) + 1 # 6 random floats from interval (b - a) * random_sample() + a
my_rand_arr2

print('Create 1d array random integers: my_randints_arr = np.random.randint(1, 6, size=6) - 6 between 1 and 5')
my_randints_arr = np.random.randint(1, 6, size=6) # - array of 6 random integers between 1 and 5 (the upper bound 6 is excluded) from uniform distribution
my_randints_arr

print('Create 2d array of random integers: my_2d_rand_arr = np.random.randint(9, size=(3, 3)) - 9 between 0 and 8')
my_2d_rand_arr = np.random.randint(9, size=(3, 3))
my_2d_rand_arr

 Create 1d array of random floats: my_rand_arr = np.random.random_sample(6) - 6 between 0.0 and 1.0


array([0.08274712, 0.98445778, 0.77342933, 0.6393075 , 0.52651517,
       0.42488631])

Create 1d array of random floats: my_rand_arr2 = (6-1) * np.random.random_sample(6) + 1 - 6 in interval 1 to 6


array([3.46826914, 3.69781857, 3.84392569, 3.70871927, 4.43214228,
       2.57490267])

Create 1d array random integers: my_randints_arr = np.random.randint(1, 6, size=6) - 6 between 1 and 5


array([3, 2, 3, 3, 4, 5])

Create 2d array of random integers: my_2d_rand_arr = np.random.randint(9, size=(3, 3)) - 9 between 0 and 8


array([[0, 1, 7],
       [8, 4, 5],
       [8, 1, 1]])

<a class="anchor" id="Array Attributes"></a>
<h4>Array Attributes</h4>
* [Return to TOC](#toc)

Array attributes "reflect information that is intrinsic to the array itself." Among the key attributes are:

<ul>
<li>ndim: Number of array dimensions.</li>
<li>shape: Tuple of array dimensions.</li>
<li>size: Total number of elements across all dimensions in the array.</li>
<li>itemsize: Length of one array element in bytes.</li>
<li>nbytes: Total bytes consumed by the elements of the array.</li>
</ul>
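
These attributes are related: `size` is the product of the entries in `shape`, and `nbytes` equals `size * itemsize`. A small sketch, using an illustrative `int16` array so that the item size is the same on every platform:

```python
import numpy as np

grid = np.zeros((2, 3), dtype=np.int16)   # int16 elements take 2 bytes each

print(grid.ndim)      # 2
print(grid.shape)     # (2, 3)
print(grid.size)      # 6
print(grid.itemsize)  # 2
print(grid.nbytes)    # 12
```
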

In [8]:
# Display attributes for 1d, 2d, and 3d arrays (created in earlier steps)

print('Attributes for the arrays: my_arr, my_2d_arr, my_3d_arr','\n')

print('Number of dimensions: my_arr.ndim, my_2d_arr.ndim, my_3d_arr.ndim')
my_arr.ndim, my_2d_arr.ndim, my_3d_arr.ndim

print('Number of elements in each dimension: my_arr.shape, my_2d_arr.shape, my_3d_arr.shape ')
my_arr.shape, my_2d_arr.shape, my_3d_arr.shape 

print('Total number of elements across all dimensions: my_arr.size, my_2d_arr.size, my_3d_arr.size')
my_arr.size, my_2d_arr.size, my_3d_arr.size

print('Length of one array element in bytes: my_arr.itemsize, my_2d_arr.itemsize, my_3d_arr.itemsize')
my_arr.itemsize, my_2d_arr.itemsize, my_3d_arr.itemsize

print('Total bytes in array: my_arr.nbytes, my_2d_arr.nbytes, my_3d_arr.nbytes')
my_arr.nbytes, my_2d_arr.nbytes, my_3d_arr.nbytes


Attributes for the arrays: my_arr, my_2d_arr, my_3d_arr 

Number of dimensions: my_arr.ndim, my_2d_arr.ndim, my_3d_arr.ndim


(1, 2, 3)

Number of elements in each dimension: my_arr.shape, my_2d_arr.shape, my_3d_arr.shape 


((6,), (2, 3), (2, 2, 3))

Total number of elements across all dimensions: my_arr.size, my_2d_arr.size, my_3d_arr.size


(6, 6, 12)

Length of one array element in bytes: my_arr.itemsize, my_2d_arr.itemsize, my_3d_arr.itemsize


(4, 4, 4)

Total bytes in array: my_arr.nbytes, my_2d_arr.nbytes, my_3d_arr.nbytes


(24, 24, 48)

<a class="anchor" id="Selecting and Setting"></a>
<h3>Selecting and Setting the Elements of an Array</h3>
* [Return to TOC](#toc)

As with lists and tuples, indexing can be used to select the elements of an array.  For one dimensional arrays, selection works exactly as it does for a list or tuple: you specify the index or indices of the desired elements. It is important to remember that indexing starts with 0 rather than 1. For arrays with 2 or more dimensions, an index (or slice) is specified for each dimension.

<a class="anchor" id="Selecting 1d"></a>
<h4>Selection for a one dimensional array</h4>
* [Return to TOC](#toc)

In [9]:
# first creating an array

print('Create array: my_arr2 = np.array([10, 20, 30, 40, 50, 60])')
my_arr2 = np.array([10, 20, 30, 40, 50, 60])
my_arr2

print('Select first element: my_arr2[0] - 0th position')
my_arr2[0]

print('Select 6th element: my_arr2[5] - index 5')
my_arr2[5]

print('Select range of elements: my_arr2[1:5:2] - starting at 1, ending before 5, in steps of 2')
my_arr2[1:5:2]

Create array: my_arr2 = np.array([10, 20, 30, 40, 50, 60])


array([10, 20, 30, 40, 50, 60])

Select first element: my_arr2[0] - 0th position


10

Select 6th element: my_arr2[5] - index 5


60

Select range of elements: my_arr2[1:5:2] - starting at 1, ending before 5, in steps of 2


array([20, 40])

<a class="anchor" id="Selecting 2d"></a>
<h4>Selection for a two dimensional array</h4>
* [Return to TOC](#toc)

In [10]:
print('Selections for 2d array: my_2d_arr2')
my_2d_arr2 = np.array([[1,2,3],[4,5,6],[7,8,9]])
my_2d_arr2

print('Selection: my_2d_arr2[0][2] - select element in row 0 & col 2')
my_2d_arr2[0][2]

print('Selection: my_2d_arr2[0,2] - select element in row 0 & col 2 - same as above')
my_2d_arr2[0,2]

print('Selection: my_2d_arr2[:2] - select rows up to but not including row 2')
my_2d_arr2[:2]

print('Selection: my_2d_arr2[:2,:1] - select rows up to but not including 2 and col 0')
my_2d_arr2[:2,:1]

print('Selection: my_2d_arr2[:2, 1:] - select rows up to but not including 2 and cols from 1 on')
my_2d_arr2[:2, 1:]

Selections for 2d array: my_2d_arr2


array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Selection: my_2d_arr2[0][2] - select element in row 0 & col 2


3

Selection: my_2d_arr2[0,2] - select element in row 0 & col 2 - same as above


3

Selection: my_2d_arr2[:2] - select rows up to but not including row 2


array([[1, 2, 3],
       [4, 5, 6]])

Selection: my_2d_arr2[:2,:1] - select rows up to but not including 2 and col 0


array([[1],
       [4]])

Selection: my_2d_arr2[:2, 1:] - select rows up to but not including 2 and cols from 1 on


array([[2, 3],
       [5, 6]])

<a class="anchor" id="Selecting 3d"></a>
<h4>Selection for a three dimensional array</h4>
* [Return to TOC](#toc)

In [11]:
print('Selections for 3d (2,2,3) array: my_3d_arr')
my_3d_arr

# Selecting elements in 3d array
print('Selection:  my_3d_arr[0][1][2] - element whose indices are 0, 1, & 2 for the 1st, 2nd and 3rd dims')
my_3d_arr[0][1][2] # selecting 0th element in 1st dim, 1st element in 2nd dim, and 2nd element in 3rd dim

print('Selection: my_3d_arr[0, 1, 2] - element whose indices are 0, 1, & 2 for the 1st, 2nd and 3rd dims')
my_3d_arr[0, 1, 2]

print('Selection: my_3d_arr[:1,0:,:2] - elements whose indices are <1 in 1st dim, 0 and above in 2nd dim, <2 in 3rd dim')
my_3d_arr[:1,0:,:2]

Selections for 3d (2,2,3) array: my_3d_arr


array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Selection:  my_3d_arr[0][1][2] - element whose indices are 0, 1, & 2 for the 1st, 2nd and 3rd dims


6

Selection: my_3d_arr[0, 1, 2] - element whose indices are 0, 1, & 2 for the 1st, 2nd and 3rd dims


6

Selection: my_3d_arr[:1,0:,:2] - elements whose indices are <1 in 1st dim, 0 and above in 2nd dim, <2 in 3rd dim


array([[[1, 2],
        [4, 5]]])

<a class="anchor" id="Selecting by Values"></a>
<h4>Selecting Elements based on their Values</h4>
* [Return to TOC](#toc)

In these cases the selection of an element is generally made on the basis of a boolean or logical comparison (e.g. x == value, x != value, x > value, etc.).  While there is a finite number of logical operators, the "x" in the comparisons can be virtually anything. 
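
Comparisons can also be combined. The sketch below (with an illustrative array named `scores`) uses the element-wise operators `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `<` or `>`.

```python
import numpy as np

scores = np.array([1, 2, 3, 4, 5, 6])

# Each comparison yields a boolean array; & | ~ combine them element by element
in_range = scores[(scores > 2) & (scores < 6)]
extremes = scores[(scores < 2) | (scores > 5)]
not_three = scores[~(scores == 3)]

print(in_range)    # [3 4 5]
print(extremes)    # [1 6]
print(not_three)   # [1 2 4 5 6]
```
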

In [12]:
# Selections based on numerical value

print('From 1d array:')
my_arr
print('Select values less than 4: my_arr[my_arr < 4]')
my_arr[my_arr < 4]

print("From 2d array:")
my_2d_arr
print('Select values greater than or equal to 3: my_2d_arr[my_2d_arr >= 3]')
my_2d_arr[my_2d_arr >= 3]

print("From 3d array:")
my_3d_arr
print('Select values != 6: my_3d_arr[my_3d_arr != 6] - i.e. not equal to 6')
my_3d_arr[my_3d_arr != 6]

From 1d array:


array([1, 2, 3, 4, 5, 6])

Select values less than 4: my_arr[my_arr < 4]


array([1, 2, 3])

From 2d array:


array([[1, 2, 3],
       [4, 5, 6]])

Select values greater than or equal to 3: my_2d_arr[my_2d_arr >= 3]


array([3, 4, 5, 6])

From 3d array:


array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Select values != 6: my_3d_arr[my_3d_arr != 6] - i.e. not equal to 6


array([ 1,  2,  3,  4,  5,  7,  8,  9, 10, 11, 12])

In [13]:
# Selections based on the values in another array

print('Starting numerical 1d my_arr')
my_arr

print('Starting 1d my_locs_arr:')
my_locs_arr = np.array(['East','East','West','West','East','West'])
my_locs_arr

print('Numerical values in my_arr whose indices are same as those of "East" in my_locs_arr')
my_arr[my_locs_arr == 'East']

# Underlying the above comparisons and selections is actually a boolean array

print('Boolean array: my_bool_arr - True if East and False otherwise')
my_bool_arr = (my_locs_arr == 'East')
my_bool_arr

print('Numerical values in my_arr whose indices are the same as those of "True" in my_bool_arr')
my_arr[my_bool_arr]

Starting numerical 1d my_arr


array([1, 2, 3, 4, 5, 6])

Starting 1d my_locs_arr:


array(['East', 'East', 'West', 'West', 'East', 'West'], dtype='<U4')

Numerical values in my_arr whose indices are same as those of "East" in my_locs_arr


array([1, 2, 5])

Boolean array: my_bool_arr - True if East and False otherwise


array([ True,  True, False, False,  True, False])

Numerical values in my_arr whose indices are the same as those of "True" in my_bool_arr


array([1, 2, 5])

<a class="anchor" id="Setting Values"></a>
<h4>Setting the values of Selected Cells</h4>
* [Return to TOC](#toc)

The process is straightforward: the desired selection goes on the left-hand side of the assignment and the desired value on the right-hand side.  In many cases, it's a good idea to make a <i>deep copy</i> of the original and work on the copy. To make a deep copy, use the <i>copy</i> method rather than simply assigning the original to a new name (i.e. this: my_copy_arr = my_arr.copy(); not this: my_copy_arr = my_arr).  The reason is that changes made to a deep copy don't affect the original.  On the other hand, if you simply assign the original to a new name, that name is just a new label for the original object, and any change made through one name shows up under the other. \[Note:  In a notebook this can affect arrays that were defined earlier in the notebook\].
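
The following sketch contrasts three cases using illustrative names: a plain assignment (a second label for the same object), a slice (a new array object that still shares the original's data), and `copy()` (independent data). The slice case is not discussed above, but it behaves like the assignment case when values are set.

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5, 6])

alias = original               # second name for the same object
view = original[:3]            # slice: shares data with the original
independent = original.copy()  # separate data

alias[0] = 100        # changes original
view[1] = 200         # changes original
independent[2] = 300  # does not change original

print(original)                                  # [100 200   3   4   5   6]
print(alias is original)                         # True
print(np.shares_memory(view, original))          # True
print(np.shares_memory(independent, original))   # False
```
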

In [14]:
# Make a copy so original array is not overwritten

print('Copy original 1d my_arr: my_copy_arr = my_arr.copy()')
my_copy_arr = my_arr.copy()
my_copy_arr

print('Logical comparison: my_copy_arr is my_arr - "Is" it the same as original array (True), or a copy (False)?')
my_copy_arr is my_arr

# Setting the value(s)

print('Setting selection ":2" to 0 - values whose indices are below 2 set to 0 ')
my_copy_arr[:2] = 0
my_copy_arr

print('Setting selection: my_copy_arr[4:] = 1 - values whose indices are 4 and above are set to 1')
my_copy_arr[4:] = 1
my_copy_arr

print('Setting selection:  my_copy_arr[my_copy_arr > 1] = 2 - all elements whose values are > 1 set to 2')
my_copy_arr[my_copy_arr > 1] = 2
my_copy_arr

Copy original 1d my_arr: my_copy_arr = my_arr.copy()


array([1, 2, 3, 4, 5, 6])

Logical comparison: my_copy_arr is my_arr - "Is" it the same as original array (True), or a copy (False)?


False

Setting selection ":2" to 0 - values whose indices are below 2 set to 0 


array([0, 0, 3, 4, 5, 6])

Setting selection: my_copy_arr[4:] = 1 - values whose indices are 4 and above are set to 1


array([0, 0, 3, 4, 1, 1])

Setting selection:  my_copy_arr[my_copy_arr > 1] = 2 - all elements whose values are > 1 set to 2


array([0, 0, 2, 2, 1, 1])

<a class="anchor" id="Manip and Convert"></a>
<h3>Manipulating and Converting Arrays</h3>
* [Return to TOC](#toc)

NumPy provides a number of functions and methods for manipulating arrays, for converting the elements of an array from one type to another, and for writing the results to a file.

https://docs.scipy.org/doc/numpy/user/quickstart.html#shape-manipulation

<a class="anchor" id="Manip"></a>
<h4>Manipulation: Methods for Restructuring Existing Arrays</h4>
* [Return to TOC](#toc)

<ul>
    <li>reshape: same data with a new shape</li>
    <li>split: divide into sub-arrays along specified axis (also hsplit, vsplit and dsplit)</li>
    <li>transpose: transpose axes of an array - basically flips rows to cols</li>
    <li>swapaxes: interchange axis1 with axis2 - for 2 dimensional this works like transpose</li>
    <li>sort: sort an array along an axis - the function np.sort returns a sorted copy, while the array method sort works in-place; either can produce surprising results!</li>
</ul>
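
Sorting a 2d array orders values within each row or each column independently, which is one source of surprises. The sketch below, on an illustrative array, also shows that `np.sort` returns a new array while the `sort` method changes the array it is called on.

```python
import numpy as np

grid = np.array([[6, 1, 5], [3, 4, 2]])

sorted_rows = np.sort(grid, axis=1)   # each row sorted; grid itself is unchanged
sorted_cols = np.sort(grid, axis=0)   # each column sorted; rows are no longer kept together

in_place = grid.copy()
in_place.sort(axis=1)                 # the method sorts in place and returns None

print(sorted_rows)   # [[1 5 6] [2 3 4]]
print(sorted_cols)   # [[3 1 2] [6 4 5]]
print(grid)          # still [[6 1 5] [3 4 2]]
```
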


In [15]:
# Manipulating Existing Structure

print('Manipulating 1d array: my_arr2')
my_arr2 = np.array([1,2,2,4,6,5])
my_arr2

print('Reshape: my_2d_arr2 = np.reshape(my_arr2,(2,3)) - reshaping 1d into 2d array with 2 rows and 3 cols')
my_2d_arr2 = np.reshape(my_arr2, (2,3))
my_2d_arr2

print('Transpose: np.transpose(my_2d_arr2) - transpose (2,3) 2d array into (3,2) 2d array')
np.transpose(my_2d_arr2)

print('Swapaxes: np.swapaxes(my_2d_arr2,0,1) - interchange axis 0 with axis 1 - in a 2d array same as transpose')
np.swapaxes(my_2d_arr2,0,1)

print('Sort: np.sort(my_2d_arr2,1) - returns a copy sorted along axis 1 (within each row)')
np.sort(my_2d_arr2,1)

Manipulating 1d array: my_arr2


array([1, 2, 2, 4, 6, 5])

Reshape: my_2d_arr2 = np.reshape(my_arr2,(2,3)) - reshaping 1d into 2d array with 2 rows and 3 cols


array([[1, 2, 2],
       [4, 6, 5]])

Transpose: np.transpose(my_2d_arr2) - transpose (2,3) 2d array into (3,2) 2d array


array([[1, 4],
       [2, 6],
       [2, 5]])

Swapaxes: np.swapaxes(my_2d_arr2,0,1) - interchange axis 0 with axis 1 - in a 2d array same as transpose


array([[1, 4],
       [2, 6],
       [2, 5]])

Sort: np.sort(my_2d_arr2,1) - returns a copy sorted along axis 1 (within each row)


array([[1, 2, 2],
       [4, 5, 6]])

<a class="anchor" id="Add or Remove"></a>
<h4>Manipulation: Methods for Adding or Removing Elements or Sub-Arrays</h4>
* [Return to TOC](#toc)

<ul>
    <li>insert: insert values along the given axis before the given indices</li>
    <li>append: append values to the end of an array</li>
    <li>delete: delete elements from specified position for specified axis</li>
    <li>split: split an array into N equal sub-arrays along specified axis</li>
    <li>unique: find unique values and sort into 1d array</li>
</ul>
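
One point worth keeping in mind is that `insert`, `append` and `delete` return new arrays and leave the input unchanged. A minimal sketch on an illustrative array:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

with_row = np.insert(grid, 1, 0, axis=0)         # row of zeros placed before index 1
with_col = np.append(grid, [[7], [8]], axis=1)   # one extra value per row
without_col = np.delete(grid, 0, axis=1)         # column 0 removed
flattened = np.append(grid, [7, 8])              # without axis, the result is flattened

print(with_row)      # [[1 2 3] [0 0 0] [4 5 6]]
print(with_col)      # [[1 2 3 7] [4 5 6 8]]
print(without_col)   # [[2 3] [5 6]]
print(flattened)     # [1 2 3 4 5 6 7 8]
print(grid.shape)    # (2, 3), the original is untouched
```
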

In [16]:
#Adding or Removing Elements or Sub-Array

print('All operations performed on 2d array - my_2d_arr2')
my_2d_arr2

print('Insert: np.insert(my_2d_arr2,1,5,axis=0) - on axis 0 insert a row of 5s before row 1')
np.insert(my_2d_arr2,1,5,axis=0)

print('Append: np.append(my_2d_arr2,[[3],[9]],axis=1) - on axis 1 append specified values to the end of an array')
np.append(my_2d_arr2,[[3],[9]],axis=1)

print('Delete: np.delete(my_2d_arr2,1,axis=0) - on axis 0 delete elements in index/row 1')
np.delete(my_2d_arr2,1,axis=0)

print('Split: np.split(my_2d_arr2,3,axis=1) - split axis 1 into 3 equal sub-arrays')
np.split(my_2d_arr2,3,axis=1)

print('Unique: np.unique(my_2d_arr2) - finds the unique values in an array and sorts them into 1d array')
np.unique(my_2d_arr2)

All operations performed on 2d array - my_2d_arr2


array([[1, 2, 2],
       [4, 6, 5]])

Insert: np.insert(my_2d_arr2,1,5,axis=0) - on axis 0 insert a row of 5s before row 1


array([[1, 2, 2],
       [5, 5, 5],
       [4, 6, 5]])

Append: np.append(my_2d_arr2,[[3],[9]],axis=1) - on axis 1 append specified values to the end of an array


array([[1, 2, 2, 3],
       [4, 6, 5, 9]])

Delete: np.delete(my_2d_arr2,1,axis=0) - on axis 0 delete elements in index/row 1


array([[1, 2, 2]])

Split: np.split(my_2d_arr2,3,axis=1) - split axis 1 into 3 equal sub-arrays


[array([[1],
        [4]]), array([[2],
        [6]]), array([[2],
        [5]])]

Unique: np.unique(my_2d_arr2) - finds the unique values in an array and sorts them into 1d array


array([1, 2, 4, 5, 6])

<a class="anchor" id="Math and Stats"></a>
<h3>Key Mathematical and Statistical Operations on Arrays</h3>
* [Return to TOC](#toc)

There is a long laundry list of mathematical functions or methods, and a handful of statistical functions, that can be applied to NumPy arrays. These are detailed in:

<ul>
<li>Mathematical: https://docs.scipy.org/doc/numpy-1.12.0/reference/routines.math.html</li>
<li>Statistical: https://docs.scipy.org/doc/numpy-1.12.0/reference/routines.statistics.html</li>
</ul>

While the official documentation covers a large number of individual mathematical and statistical categories, I'm only concerned with a select number, which have been combined into three general headings. These are math and stat operations on:

<ul>
<li>Individual Arrays</li>
<li>Specified Axis of an Individual Array</li>
<li>Pairs of Arrays</li>
</ul>

For all of these operations, the general assumptions are:

<ol>
    <li>There are no missing values</li>
    <li>For operations on pairs of arrays, the shapes are the same</li>
</ol>

There are formal methods for handling situations where these assumptions are violated.  However, as they say, these methods are beyond the scope of this discussion.  So, in these notes, we are going to assume that the assumptions hold. For those who are interested in handling cases where they don't, see

<ul>
    <li>Missing values -- https://docs.scipy.org/doc/numpy-1.10.0/neps/missing-data.html#definition-of-missing-data</li>
    <li>Different shapes -- https://docs.scipy.org/doc/numpy-1.14.0/reference/ufuncs.html#ufuncs</li>
</ul>
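
For orientation only, here is a rough sketch of what happens when each assumption is violated. It uses illustrative arrays and represents a missing value as `np.nan`, which is one common convention rather than the only one. `np.nanmean` skips `nan` entries, and arrays of different but compatible shapes are combined by broadcasting.

```python
import numpy as np

# Missing value represented as nan: an ordinary mean propagates it
readings = np.array([1.0, np.nan, 3.0])
plain_mean = np.mean(readings)      # nan
safe_mean = np.nanmean(readings)    # ignores the nan entry

# Different shapes: the 1d array is applied to every row of the 2d array
grid = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
offsets = np.array([10, 20, 30])          # shape (3,)
shifted = grid + offsets

print(plain_mean, safe_mean)   # nan 2.0
print(shifted)                 # [[11 22 33] [14 25 36]]
```
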

<a class="anchor" id="Ops on Individual"></a>
<h4>Math and Stat Operations on Individual Arrays</h4>
* [Return to TOC](#toc)

The following are some of the primary arithmetic functions involving calculations on the elements of a standalone array:

<ul>
<li>reciprocal</li>
<li>sqrt</li>
<li>cbrt (cube root)</li>
<li>square</li>
<li>absolute (value)</li>
<li>positive (numerical positive, +x: returns a copy with the values unchanged)</li>
<li>negative (numerical negative, -x: flips the sign of each element)</li>
<li>exp (e\*\*x where e = 2.71828)</li>
<li>exp2 (2\*\*x)</li>
<li>log (base e)</li>
<li>log2 (base 2)</li>
<li>log10 (base 10)</li>
</ul>
    
Each of these takes the form -- <i>np.o_name(a1)</i> -- where the arithmetic function designated by the <i>o_name</i> is applied to each element in the array (a1).  A few examples indicate how they all work.
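
A brief sketch of the sign-related functions on an illustrative array, since their names can mislead: `np.negative` flips the sign of every element, `np.positive` leaves values as they are, and `np.absolute` is the one that removes the sign.

```python
import numpy as np

values = np.array([4.0, -9.0, 16.0])

negated = np.negative(values)      # sign of every element flipped
unchanged = np.positive(values)    # copy with the same values
magnitudes = np.absolute(values)   # signs removed
roots = np.sqrt(magnitudes)        # square root of each magnitude

print(negated)      # [ -4.   9. -16.]
print(unchanged)    # [ 4. -9. 16.]
print(magnitudes)   # [ 4.  9. 16.]
print(roots)        # [2. 3. 4.]
```
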

In [17]:
# Element-wise arithmetic operations on a single array

print('Manipulations performed on: my_arr4 = np.array([[2.0, -4.0],[-16.0, 8.0]])')
my_arr4 = np.array([[2.0, -4.0],[-16.0, 8.0]])
my_arr4

print('reciprocal: np.reciprocal(my_arr4)')
np.reciprocal(my_arr4)

print('square: np.square(my_arr4)')
np.square(my_arr4)
      
print('absolute value: np.absolute(my_arr4)')
np.absolute(my_arr4)

print('exponential (e raised to the power of each cell value): my_exp = np.exp(my_arr4)')
my_exp = np.exp(my_arr4)
np.round(my_exp,1)

print('log (base e): my_log = np.log(my_arr4) - log of a negative number is undefined, so NumPy warns and returns "nan"')
my_log = np.log(my_arr4)
np.round(my_log,1)

Manipulations performed on: my_arr4 = np.array([[2.0, -4.0],[-16.0, 8.0]])


array([[  2.,  -4.],
       [-16.,   8.]])

reciprocal: np.reciprocal(my_arr4)


array([[ 0.5   , -0.25  ],
       [-0.0625,  0.125 ]])

square: np.square(my_arr4)


array([[  4.,  16.],
       [256.,  64.]])

absolute value: np.absolute(my_arr4)


array([[ 2.,  4.],
       [16.,  8.]])

exponential (e raised to the power of each cell value): my_exp = np.exp(my_arr4)


array([[   7.4,    0. ],
       [   0. , 2981. ]])

log (base e): my_log = np.log(my_arr4) - log of a negative number is undefined, so NumPy warns and returns "nan"




array([[0.7, nan],
       [nan, 2.1]])

<a class="anchor" id="Rounding"></a>
<h5>Rounding Operations</h5>
* [Return to TOC](#toc)

Technically, the rounding functions fit among the arithmetic operations on a single array. However, they are a bit of a special case, so I've put them in their own sub-section.

"Rounding a numerical value means replacing it by another value that is approximately equal but has a shorter, simpler, or more explicit representation." For example, a researcher has calculated the 2,000,000,000,000,000th digit of the mathematical constant pi (3.14159265358979...), but we rarely use more than a few of those digits in practice. So, we round to something shorter like 3.14.  If we want something a little longer, say 4 digits, do we round to 3.141 or 3.142, given that the 5th digit is a 5?  There are no "hard and fast" rules, so the answer depends on the convention we decide to use, i.e. round up, down, towards zero, away from zero, etc.  That's why mathematical modules and libraries, like NumPy, provide a number of rounding functions to serve different purposes.  Included in the rounding functions provided by NumPy are:

<ul>
    <li>around or round: to the given number of decimals</li>
    <li>rint: to the nearest integer</li>
    <li>fix: to nearest integer towards zero</li>
    <li>floor: to the floor of the input (largest integer i, such that i <= x)</li>
    <li>ceil: to the ceiling of the input (smallest integer i, such that i >= x)</li>
    <li>trunc: to the truncated value of the input (nearest integer i which is closer to zero than x is)</li>
</ul>
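
The differences between these functions show up most clearly on negative numbers and on exact halves. In the sketch below (illustrative values), note that `np.round` sends exact halves to the nearest even value, so 0.5 becomes 0 and 2.5 becomes 2.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, -0.5, -1.5])

nearest_even = np.round(halves)   # halves go to the nearest even value
floors = np.floor(halves)         # always down, towards negative infinity
ceilings = np.ceil(halves)        # always up, towards positive infinity
truncated = np.trunc(halves)      # always towards zero

print(nearest_even)   # [ 0.  2.  2. -0. -2.]
print(floors)         # [ 0.  1.  2. -1. -2.]
print(ceilings)       # [ 1.  2.  3. -0. -1.]
print(truncated)      # [ 0.  1.  2. -0. -1.]
```
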

In [18]:
print('Rounding operations on 1d array: my_a3 = np.array([1.653, -2.891, 3.527, 4.264, -5.637])')
my_a3 = np.array([1.653, -2.891, 3.527, 4.264, -5.637])
my_a3

print('round: np.round(my_a3) -round to 0 decimals')
np.round(my_a3)

print('round: np.round(my_a3,2) - round to 2 decimals')
np.round(my_a3,2)

print('rint: np.rint(my_a3) -round to nearest integer')
np.rint(my_a3)

print('fix: np.fix(my_a3) -round to integer towards 0')
np.fix(my_a3)

print('floor: np.floor(my_a3) -round to floor of input')
np.floor(my_a3)

print('ceil: np.ceil(my_a3) - round to ceil of input')
np.ceil(my_a3)

print('trunc: np.trunc(my_a3) -truncated value')
np.trunc(my_a3)

Rounding operations on 1d array: my_a3 = np.array([1.653, -2.891, 3.527, 4.264, -5.637])


array([ 1.653, -2.891,  3.527,  4.264, -5.637])

round: np.round(my_a3) -round to 0 decimals


array([ 2., -3.,  4.,  4., -6.])

round: np.round(my_a3,2) - round to 2 decimals


array([ 1.65, -2.89,  3.53,  4.26, -5.64])

rint: np.rint(my_a3) -round to nearest integer


array([ 2., -3.,  4.,  4., -6.])

fix: np.fix(my_a3) -round to integer towards 0


array([ 1., -2.,  3.,  4., -5.])

floor: np.floor(my_a3) -round to floor of input


array([ 1., -3.,  3.,  4., -6.])

ceil: np.ceil(my_a3) - round to ceil of input


array([ 2., -2.,  4.,  5., -5.])

trunc: np.trunc(my_a3) -truncated value


array([ 1., -2.,  3.,  4., -5.])

<a class="anchor" id="Ops on Pairs"></a>
<h4>Math and Stat Operations on Pairs of Arrays</h4>
*[Return to TOC](#toc)

The following are some of the primary arithmetic functions involving calculations on the elements of pairs of arrays:
<ul>
<li>add</li>
<li>subtract</li>
<li>multiply</li>
<li>dot (dot product of two arrays)</li>
<li>power</li>
<li>divide</li>
<li>mod (remainder of division)</li>
<li>maximum (maximum of the elements in each pair)</li>
<li>minimum (minimum of the elements in each pair)</li>
</ul>
    
Each of these takes the form -- <i>np.o_name(a_name1,a_name2)</i> -- where the arithmetic function designated by <i>o_name</i> is applied to the pair-wise elements of the two arrays, which are typically of the same shape.  The exception is <i>dot</i>, which for 2-d arrays computes the matrix product rather than an element-by-element result.  Again, a few examples indicate how they work.


In [19]:
print('Arithmetic operations on pairs of arrays')

print('1st array: my_a1 = np.array([[1.,4.],[6.,9.]])')
my_a1 = np.array([[1.,4.],[6.,9.]]) # 2 rows by 2 cols
my_a1

print('2nd array: my_a2 = np.array([[2.,2.],[3.,3.]])')
my_a2 = np.array([[2.,2.],[3.,3.]]) # 2 rows by 2 cols
my_a2

print('add: np.add(my_a1,my_a2) - a1 + a2 for corresponding elements in each pair')
np.add(my_a1,my_a2)

print('subtract: np.subtract(my_a1,my_a2) - a1 - a2 for corresponding elements in each pair')
np.subtract(my_a1,my_a2)

print('multiply: np.multiply(my_a1,my_a2) - a1 * a2 for corresponding elements in each pair')
np.multiply(my_a1,my_a2) 

print('dot product: np.dot(my_a1,my_a2) - technically part of the linear algebra operations')
np.dot(my_a1,my_a2)

print('power: np.power(my_a1,my_a2) - a1**a2 for corresponding elements in each pair')
np.power(my_a1,my_a2)

print('divide: np.divide(my_a1,my_a2) - a1/a2 for corresponding elements in each pair')
np.divide(my_a1,my_a2)

print('mod: np.mod(my_a1,my_a2) - remainder of a1/a2 for corresponding elements in each pair')
np.mod(my_a1,my_a2)

print('maximum: np.maximum(my_a1,my_a2) - larger of the two elements in each pair')
np.maximum(my_a1,my_a2)

print('minimum: np.minimum(my_a1,my_a2) - smaller of the two elements in each pair')
np.minimum(my_a1,my_a2)

Arithmetic operations on pairs of arrays
1st array: my_a1 = np.array([[1.,4.],[6.,9.]])


array([[1., 4.],
       [6., 9.]])

2nd array: my_a2 = np.array([[2.,2.],[3.,3.]])


array([[2., 2.],
       [3., 3.]])

add: np.add(my_a1,my_a2) - a1 + a2 for corresponding elements in each pair


array([[ 3.,  6.],
       [ 9., 12.]])

subtract: np.subtract(my_a1,my_a2) - a1 - a2 for corresponding elements in each pair


array([[-1.,  2.],
       [ 3.,  6.]])

multiply: np.multiply(my_a1,my_a2) - a1 * a2 for corresponding elements in each pair


array([[ 2.,  8.],
       [18., 27.]])

dot product: np.dot(my_a1,my_a2) - technically part of the linear algebra operations


array([[14., 14.],
       [39., 39.]])

power: np.power(my_a1,my_a2) - a1**a2 for corresponding elements in each pair


array([[  1.,  16.],
       [216., 729.]])

divide: np.divide(my_a1,my_a2) - a1/a2 for corresponding elements in each pair


array([[0.5, 2. ],
       [2. , 3. ]])

mod: np.mod(my_a1,my_a2) - remainder of a1/a2 for corresponding elements in each pair


array([[1., 0.],
       [0., 0.]])

maximum: np.maximum(my_a1,my_a2) - larger of the two elements in each pair


array([[2., 4.],
       [6., 9.]])

minimum: np.minimum(my_a1,my_a2) - smaller of the two elements in each pair


array([[1., 2.],
       [3., 3.]])
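As a complement to the function calls above, here is a short sketch of two related points: the arithmetic operators give the same results as the named functions, and the two arrays do not have to share a shape as long as NumPy can broadcast one against the other. The names `a`, `b`, and `row` are illustrative assumptions.

```python
import numpy as np

a = np.array([[1., 4.], [6., 9.]])
b = np.array([[2., 2.], [3., 3.]])

# The arithmetic operators are shorthand for the element-wise functions
same_add = np.array_equal(a + b, np.add(a, b))
same_mul = np.array_equal(a * b, np.multiply(a, b))

# The @ operator is matrix multiplication, which matches np.dot for 2-d arrays
same_dot = np.array_equal(a @ b, np.dot(a, b))

# Broadcasting: a 1-d array of length 2 is applied to every row of `a`
row = np.array([10., 100.])
scaled = np.multiply(a, row)

print(same_add, same_mul, same_dot)
print(scaled)
```

Here `a * b` multiplies matching elements, while `a @ b` combines rows of `a` with columns of `b`, which is why the multiply and dot results shown earlier differ.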

<a class="anchor" id="Ops on Axis"></a>
<h4>Math and Stat Operations on a Specified Axis of an Individual Array</h4>
*[Return to TOC](#toc)

Most of NumPy's statistical functions fall into this category and are relatively simple in nature. Included are:

<i>Sums, Averages, and Variances:</i><br>
<ul>
    <li>sum: sum of array elements over a given axis</li>
    <li>cumsum: cumulative sum of array elements over a given axis</li>
    <li>prod: product of array elements over a given axis</li>
    <li>cumprod: cumulative product of array elements over a given axis</li>
    <li>median: median of array elements over a given axis</li>
    <li>mean: average of array elements over a given axis</li>
    <li>average: weighted average of array elements over a given axis</li>
    <li>std: standard deviation of array elements over a given axis</li>
    <li>var: variance of array elements over a given axis</li>
    <li>amin: minimum of array elements over a given axis</li>
    <li>amax: maximum of array elements over a given axis</li>
    <li>percentile: pth percentile of array elements over a given axis</li>
</ul>

Comparatively speaking, the list isn't very extensive. For more extensive coverage, see the statistical and other data science capabilities provided in:

<ul>
<li>the SciPy library, which is built on NumPy and belongs to the same ecosystem (see: https://docs.scipy.org/doc/scipy/reference/stats.html).</li>
<li>the Pandas library, which is also part of the SciPy ecosystem (see: http://pandas.pydata.org/).</li>
</ul>
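Two of the functions listed earlier have options that are easy to overlook, sketched below with made-up data (`scores` and `data` are illustrative assumptions). `np.average` only differs from `np.mean` when a `weights` argument is supplied, and `np.std` (like `np.var`) divides by N by default, so `ddof=1` is needed for the sample version that divides by N - 1.

```python
import numpy as np

scores = np.array([80., 90., 100.])

# Without weights, np.average gives the same result as np.mean
plain = np.average(scores)

# With weights, the last score counts twice: (80 + 90 + 2*100) / 4
weighted = np.average(scores, weights=[1, 1, 2])

data = np.array([2., 4., 4., 4., 5., 5., 7., 9.])

# Default ddof=0 divides by N (population standard deviation)
pop_std = np.std(data)

# ddof=1 divides by N - 1 (sample standard deviation)
sample_std = np.std(data, ddof=1)

print(plain, weighted)
print(pop_std, sample_std)
```

The sample standard deviation is always slightly larger than the population value for the same data, and the gap shrinks as the number of elements grows.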

In [20]:
print('statistical operations on 2d array: my_stat_arr = ([[1.4, 4.3, 4.9, 2.2, 8.6],[2.5, 4.1, 3.4, 3.6, 2.8]])')
my_stat_arr = ([[1.4, 4.3, 4.9, 2.2, 8.6],[2.5, 4.1, 3.4, 3.6, 2.8]]) # a plain nested list; the NumPy functions below convert it to an array
my_stat_arr

print('sum on axis 0: np.sum(my_stat_arr,axis=0) - sums by columns')
np.sum(my_stat_arr,axis=0)

print('sum on axis 1: np.sum(my_stat_arr,axis=1) - sums by rows')
np.sum(my_stat_arr,axis=1)

print('sum for entire array: np.sum(my_stat_arr) - no axis specified')
np.sum(my_stat_arr)

print('cumulative sum: np.cumsum(my_stat_arr,axis=1) - by rows')
np.cumsum(my_stat_arr,axis=1)

print('prod: np.prod(my_stat_arr,axis=1) - by rows')
my_prod = np.prod(my_stat_arr,axis=1)
np.round(my_prod,1)

print('cumprod: np.cumprod(my_stat_arr,axis=1) - by rows')
my_cumprod = np.cumprod(my_stat_arr,axis=1)
np.round(my_cumprod,1)

print('median: np.median(my_stat_arr,axis=1) - by rows')
np.median(my_stat_arr,axis=1)

print('mean: np.mean(my_stat_arr,axis=1) - by rows')
np.mean(my_stat_arr,axis=1)

print('average: np.average(my_stat_arr,axis=1) - by rows')
np.average(my_stat_arr,axis=1)

print('std: np.std(my_stat_arr,axis=1) - by rows')
np.std(my_stat_arr,axis=1)

print('var: np.var(my_stat_arr,axis=1) - by rows')
np.var(my_stat_arr,axis=1)

print('minimum: np.amin(my_stat_arr,axis=1) - by rows')
np.amin(my_stat_arr,axis=1)

print('maximum: np.amax(my_stat_arr,axis=1) - by rows')
np.amax(my_stat_arr,axis=1)

print('50th percentile: np.percentile(my_stat_arr,50,axis=1) - by rows')
np.percentile(my_stat_arr,50,axis=1)

print('25th percentile: np.percentile(my_stat_arr,25,axis=1) - 1st quartile by rows')
np.percentile(my_stat_arr,25,axis=1)

statistical operations on 2d array: my_stat_arr = ([[1.4, 4.3, 4.9, 2.2, 8.6],[2.5, 4.1, 3.4, 3.6, 2.8]])


[[1.4, 4.3, 4.9, 2.2, 8.6], [2.5, 4.1, 3.4, 3.6, 2.8]]

sum on axis 0: np.sum(my_stat_arr,axis=0) - sums by columns


array([ 3.9,  8.4,  8.3,  5.8, 11.4])

sum on axis 1: np.sum(my_stat_arr,axis=1) - sums by rows


array([21.4, 16.4])

sum for entire array: np.sum(my_stat_arr) - no axis specified


37.8

cumulative sum: np.cumsum(my_stat_arr,axis=1) - by rows


array([[ 1.4,  5.7, 10.6, 12.8, 21.4],
       [ 2.5,  6.6, 10. , 13.6, 16.4]])

prod: np.prod(my_stat_arr,axis=1) - by rows


array([558.1, 351.3])

cumprod: np.cumprod(my_stat_arr,axis=1) - by rows


array([[  1.4,   6. ,  29.5,  64.9, 558.1],
       [  2.5,  10.2,  34.8, 125.5, 351.3]])

median: np.median(my_stat_arr,axis=1) - by rows


array([4.3, 3.4])

mean: np.mean(my_stat_arr,axis=1) - by rows


array([4.28, 3.28])

average: np.average(my_stat_arr,axis=1) - by rows


array([4.28, 3.28])

std: np.std(my_stat_arr,axis=1) - by rows


array([2.51666446, 0.5706137 ])

var: np.var(my_stat_arr,axis=1) - by rows


array([6.3336, 0.3256])

minimum: np.amin(my_stat_arr,axis=1) - by rows


array([1.4, 2.5])

maximum: np.amax(my_stat_arr,axis=1) - by rows


array([8.6, 4.1])

50th percentile: np.percentile(my_stat_arr,50,axis=1) - by rows


array([4.3, 3.4])

25th percentile: np.percentile(my_stat_arr,25,axis=1) - 1st quartile by rows


array([2.2, 2.8])
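Several percentiles can also be requested in a single call. The sketch below reuses the same data, wrapped in `np.array` for clarity, and assumes the default linear interpolation; the name `quartiles` is illustrative.

```python
import numpy as np

my_stat_arr = np.array([[1.4, 4.3, 4.9, 2.2, 8.6],
                        [2.5, 4.1, 3.4, 3.6, 2.8]])

# Passing a list of percentiles returns one row of results per percentile
quartiles = np.percentile(my_stat_arr, [25, 50, 75], axis=1)

# Shape is (3, 2): three percentiles by two input rows
print(quartiles.shape)
print(quartiles)
```

The middle row of the result (the 50th percentile) should match the output of `np.median(my_stat_arr, axis=1)`.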