<h1>NumPy Basics: Arrays and Vectorized Computation</h1>

<h3>Advantages of NumPy</h3>

<ul>
    <li>ndarray, an efficient multidimensional array that provides fast array-oriented arithmetic operations and flexible broadcasting capabilities.</li>
    <li>Mathematical functions for fast operations on entire arrays of data without having to write loops.</li>
    <li>Tools for reading/writing array data to disk and working with memory-mapped files.</li>
    <li>Linear algebra, random number generation, and Fourier transform capabilities.</li>
    <li>A C API for connecting NumPy with libraries written in C, C++ or FORTRAN. This allows straightforward passing of data to external libraries written in a low-level language and also for external libraries to return data to Python as NumPy arrays.</li>
</ul>
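
<p>The broadcasting mentioned in the first item can be illustrated with a minimal sketch. The array names and values here are made up for illustration and do not come from the notebook.</p>

```python
import numpy as np

# Illustrative data: a 2x3 array and a length-3 vector.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
row_offsets = np.array([10.0, 20.0, 30.0])

# The length-3 vector is broadcast across both rows of the 2x3 array.
shifted = matrix + row_offsets

# A scalar is broadcast to every element.
doubled = matrix * 2

print(shifted)
print(doubled)
```

<p>No explicit loop is written in either case; the shapes (2, 3) and (3,) are compatible because their trailing dimensions match.</p>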

<h3>Performance Difference Between a NumPy Array and an Equivalent Python List</h3>

In [2]:
import numpy as np

In [2]:
my_arr = np.arange(10000000)  # 10,000,000 elements; note that my_list below has only 1,000,000

In [3]:
my_list = list(range(1000000))

In [4]:
%time for _ in range(10): my_arr2 = my_arr * 2

Wall time: 529 ms


In [5]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

Wall time: 1.74 s
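
<p>The array created above has 10,000,000 elements while the list has 1,000,000, so the two timings do not measure the same workload. The following is a minimal sketch of a like-for-like comparison, assuming a shared length <code>n</code> (a name introduced here) and using the standard-library <code>timeit</code> module instead of the <code>%time</code> magic. Absolute timings depend on the machine, so none are quoted.</p>

```python
import timeit
import numpy as np

n = 1_000_000  # assumed common length so both containers hold the same data
my_arr = np.arange(n)
my_list = list(range(n))

# timeit.timeit returns the total seconds taken by `number` calls.
numpy_seconds = timeit.timeit(lambda: my_arr * 2, number=10)
list_seconds = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy array: {numpy_seconds:.3f} s, Python list: {list_seconds:.3f} s")
```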


<h3>NumPy ndarray: A Multidimensional Array Object </h3>

In [6]:
data = np.random.rand(2,3)

In [7]:
data

array([[0.43080837, 0.12346743, 0.2143593 ],
       [0.2236712 , 0.13532948, 0.4072278 ]])

<p>Performing mathematical operations on the data above</p>

In [8]:
data * 10

array([[4.30808368, 1.23467435, 2.14359297],
       [2.236712  , 1.35329483, 4.07227798]])

In [9]:
data + data

array([[0.86161674, 0.24693487, 0.42871859],
       [0.4473424 , 0.27065897, 0.8144556 ]])

<p> Shape and type of data</p>


In [10]:
data.shape

(2, 3)

In [11]:
data.dtype

dtype('float64')

<h3>Creating ndarrays</h3>

![alt text](Images/np_array_creation.png "Title")

In [12]:
data1 = [6,7.5,8,0,1]

In [13]:
arr1 = np.array(data1)

In [14]:
arr1

array([6. , 7.5, 8. , 0. , 1. ])

<p>Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array.</p>

In [15]:
data2 = [[1,2,3,4],[5,6,7,8]]

In [16]:
arr2 = np.array(data2)

In [17]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [18]:
arr2.ndim

2

In [19]:
arr2.shape

(2, 4)

In [20]:
arr1.dtype

dtype('float64')

In [21]:
arr2.dtype

dtype('int32')

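<p>Zeros, ones, and empty NumPy arrays. Note that np.empty returns uninitialized memory, so its contents are arbitrary and are not guaranteed to be zeros.</p>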

In [22]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [23]:
np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [24]:
np.empty((2,3,2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

In [3]:
np.random.randint(-5,5,(2,3,4))

array([[[ 2, -1,  4,  0],
        [-2, -2, -3,  4],
        [-2,  3, -1,  4]],

       [[-2, -3,  1,  0],
        [-4, -3,  0, -2],
        [ 2,  2, -1, -3]]])

<p>Creating NumPy arrays with diagonal elements set to 1 using eye and identity</p>

In [25]:
np.eye(3,4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])

In [26]:
np.identity(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

<p>Producing an array of the given shape and dtype with all the values set to the indicated <b>'fill value'</b></p>

In [27]:
np.full((2,3),5)

array([[5, 5, 5],
       [5, 5, 5]])

In [28]:
np.full(shape=(3,4), fill_value=2)

array([[2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])

In [29]:
data2 = [[1,2,3],[4,5,6],[7,8,9]]

In [30]:
np.full_like(data2, 7)

array([[7, 7, 7],
       [7, 7, 7],
       [7, 7, 7]])
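
<p>The cells above show zeros, empty, and full, but no example for ones or for the "like" variants of zeros and ones. A minimal sketch follows; the variable names are illustrative and not part of the notebook.</p>

```python
import numpy as np

ones_1d = np.ones(4)                       # float64 by default
ones_2d = np.ones((2, 3), dtype=np.int64)  # dtype can be set explicitly

template = np.array([[1, 2, 3], [4, 5, 6]])

# The *_like functions take their shape and dtype from an existing array.
zeros_like_template = np.zeros_like(template)
ones_like_template = np.ones_like(template)

print(ones_1d)
print(zeros_like_template)
print(ones_like_template)
```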

<h3>Data Types for ndarrays</h3>

In [31]:
arr1 = np.array([1,2,3], dtype = np.float64)

In [32]:
arr2 = np.array([1,2,3], dtype = np.int32)

In [33]:
arr1.dtype

dtype('float64')

In [34]:
arr2.dtype

dtype('int32')

![alt text](Images/np_dtypes.png "Title")


<p>Explicitly converting or casting an array from one dtype to another using ndarray's <b>"astype"</b> method </p>

In [35]:
arr = np.array([1,2,3,4,5])

In [36]:
arr.dtype

dtype('int32')

In [37]:
float_arr = arr.astype(np.float64)

In [38]:
float_arr.dtype

dtype('float64')

In [39]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

In [40]:
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [41]:
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10])
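
<p>As the output above shows, casting floats to integers with astype drops the fractional part rather than rounding. A small sketch contrasting the two behaviours, assuming rounding to the nearest integer is what you want instead:</p>

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

# astype discards the fractional part (truncation toward zero).
truncated = arr.astype(np.int32)

# np.rint rounds to the nearest integer first; exact halves go to the
# nearest even integer, so 0.5 becomes 0.
rounded = np.rint(arr).astype(np.int32)

print(truncated)
print(rounded)
```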

<p>Converting an array of strings representing numbers to numeric form using <b>"astype"</b></p>

In [42]:
# np.bytes_ is the same type that np.string_ used to alias; np.string_ was removed in NumPy 2.0
numeric_strings = np.array(['1.25','-9.6','42'], dtype=np.bytes_)

In [43]:
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])

<p>Using another array's dtype attribute</p>

In [44]:
int_array = np.arange(10)

In [45]:
calibers = np.array([.22,.270,.357,.380,.44,.50], dtype = np.float64)

In [46]:
int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

<h3>Arithmetic with NumPy Arrays</h3>

In [47]:
arr = np.array([[1.,2.,3.], [4.,5.,6.]])

In [48]:
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [49]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [50]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [51]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [52]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

<p><b>Note: </b> Comparisons between arrays of the same size yield boolean arrays</p>

In [53]:
arr2 = np.array([[0.,4.,1.], [7.,2.,12.]])

In [54]:
arr2

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [55]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

<h3>Basic Indexing and Slicing</h3>

In [56]:
arr = np.arange(10)

In [57]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [58]:
arr[5]

5

In [59]:
arr[5:8]

array([5, 6, 7])

<p><b>Note: An important first distinction from Python's built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modification to the view will be reflected in the source array. </b></p>

In [60]:
arr[5:8] = 12

In [61]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [62]:
arr_slice = arr[5:8]

In [63]:
arr_slice

array([12, 12, 12])

In [64]:
arr_slice[1] = 12345

In [65]:
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

In [66]:
arr_slice[:] = 64

In [67]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

<p><b>Note: </b> If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the array using <b>'copy'</b></p>

In [68]:
copy_of_array_slice = arr_slice[:].copy()

In [69]:
copy_of_array_slice

array([64, 64, 64])

In [70]:
copy_of_array_slice[:] = 46

In [71]:
copy_of_array_slice

array([46, 46, 46])

In [72]:
arr_slice

array([64, 64, 64])

In [73]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])
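
<p>One way to check whether an array is a view or a copy is np.shares_memory. The sketch below repeats the slice-versus-copy experiment from the cells above with fresh, illustrative variable names.</p>

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]            # basic slice: a view onto arr's buffer
copied = arr[5:8].copy()   # explicit copy: independent buffer

print(np.shares_memory(arr, view))
print(np.shares_memory(arr, copied))

view[:] = -1     # writes through to arr
copied[:] = 99   # leaves arr untouched

print(arr)
```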

<p>Indexing and slicing in multidimensional arrays</p>

In [74]:
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [75]:
arr2d[2]

array([7, 8, 9])

In [76]:
arr2d[0][2]

3

<p>Individual elements can be accessed recursively. We can also pass a comma-separated list of indices to select individual elements.</p>

In [77]:
arr2d[0,2]

3

<p style='text-align: center'>Illustration of indexing on a 2D-array.</p>

![alt text](Images/np_indexing.png)

In [78]:
arr3d  = np.array([[[1,2,3],[4,5,6]], [[7,8,9],[10,11,12]]])

In [79]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [80]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

In [81]:
old_values = arr3d[0].copy()

In [82]:
arr3d[0] = 42

In [83]:
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

<p>Both scalar values and arrays can be assigned to arr3d[0]</p>

In [84]:
arr3d[0] = old_values

In [85]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [86]:
arr3d[1,0]

array([7, 8, 9])

In [87]:
x = arr3d[1]

In [88]:
x

array([[ 7,  8,  9],
       [10, 11, 12]])

In [89]:
x[0]

array([7, 8, 9])

<p>Indexing with slices</p>

In [90]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

In [91]:
arr[1:6]

array([ 1,  2,  3,  4, 64])

In [92]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

<p style="text-align:center">Two Dimensional Array Slicing</p>

![alt text](Images/np_2d_arrayslicing.png)

In [93]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [97]:
arr2d[:2,:2]

array([[1, 2],
       [4, 5]])

In [98]:
arr2d[:2,1:]

array([[2, 3],
       [5, 6]])

In [99]:
arr2d[1,:2]

array([4, 5])

In [100]:
arr2d[:2,2]

array([3, 6])

In [101]:
arr2d[:,:1]

array([[1],
       [4],
       [7]])

In [102]:
arr2d[:2,1:] = 0

In [103]:
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])
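
<p>The slicing results above differ in dimensionality depending on whether an axis is indexed with an integer or with a slice. A short sketch that makes the difference explicit (variable names are illustrative):</p>

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# An integer index removes that axis from the result.
row_as_1d = arr2d[1, :2]

# A slice of length one keeps the axis.
row_as_2d = arr2d[1:2, :2]

print(row_as_1d, row_as_1d.shape)
print(row_as_2d, row_as_2d.shape)
```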

<h3>Boolean Indexing</h3>

In [104]:
names = np.array(['Bob' , 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [105]:
data = np.random.randn(7,4)

In [106]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [107]:
data

array([[-0.14580664, -0.89627459,  1.60450647, -0.26944168],
       [-0.36486466, -0.64508106,  1.52416205,  1.45414305],
       [ 1.3864915 , -0.22531995,  0.36400716,  2.29545117],
       [-0.92669511, -1.88791689, -1.93134938, -1.12188264],
       [-0.7047887 , -1.72021439, -0.3733914 , -0.94547555],
       [-0.64798007,  0.88315323, -0.23725769, -1.29174352],
       [-0.13097228,  0.58232278,  1.91940462,  0.15527751]])

In [108]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

<p>The boolean array must be of the same length as the array axis it is indexing</p>

In [109]:
data[names=='Bob']

array([[-0.14580664, -0.89627459,  1.60450647, -0.26944168],
       [-0.92669511, -1.88791689, -1.93134938, -1.12188264]])

In [111]:
data[names == 'Bob', 2:]

array([[ 1.60450647, -0.26944168],
       [-1.93134938, -1.12188264]])

In [112]:
data[names=='Bob', 3]

array([-0.26944168, -1.12188264])

In [114]:
names!='Bob'

array([False,  True,  True, False,  True,  True,  True])

In [115]:
data[~(names=='Bob')]

array([[-0.36486466, -0.64508106,  1.52416205,  1.45414305],
       [ 1.3864915 , -0.22531995,  0.36400716,  2.29545117],
       [-0.7047887 , -1.72021439, -0.3733914 , -0.94547555],
       [-0.64798007,  0.88315323, -0.23725769, -1.29174352],
       [-0.13097228,  0.58232278,  1.91940462,  0.15527751]])

<p>To select two of the three names, combine multiple boolean conditions with the boolean arithmetic operators & (and) and | (or): </p>



In [117]:
mask = (names=='Bob') | (names=='Will')

In [118]:
mask

array([ True, False,  True,  True,  True, False, False])

In [119]:
data[mask]

array([[-0.14580664, -0.89627459,  1.60450647, -0.26944168],
       [ 1.3864915 , -0.22531995,  0.36400716,  2.29545117],
       [-0.92669511, -1.88791689, -1.93134938, -1.12188264],
       [-0.7047887 , -1.72021439, -0.3733914 , -0.94547555]])
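
<p>The Python keywords <code>and</code> and <code>or</code> do not work element-wise on boolean arrays. The sketch below shows the working form with | and ~, and captures the error raised by the keyword form. The name <code>keyword_error</code> is introduced only for this illustration.</p>

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# Parentheses are required because | and & bind more tightly than ==.
bob_or_will = (names == 'Bob') | (names == 'Will')
neither = ~bob_or_will

# The keyword form asks for the truth value of a whole array, which is ambiguous.
try:
    (names == 'Bob') or (names == 'Will')
    keyword_error = None
except ValueError as exc:
    keyword_error = exc

print(bob_or_will)
print(neither)
print(type(keyword_error).__name__)
```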

<p><b><i>Note: </i>Selecting data from an array by boolean indexing always creates a copy of the data, even if the returned array is unchanged. Assigning through a boolean mask, on the other hand, modifies the original array in place.</b></p>

In [120]:
data[data < 0] = 0

In [121]:
data

array([[0.        , 0.        , 1.60450647, 0.        ],
       [0.        , 0.        , 1.52416205, 1.45414305],
       [1.3864915 , 0.        , 0.36400716, 2.29545117],
       [0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.88315323, 0.        , 0.        ],
       [0.        , 0.58232278, 1.91940462, 0.15527751]])

In [122]:
data[names!='Joe'] = 7

In [123]:
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.        , 1.52416205, 1.45414305],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.88315323, 0.        , 0.        ],
       [0.        , 0.58232278, 1.91940462, 0.15527751]])
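
<p>A minimal sketch of the difference between selecting with a boolean mask (which produces a copy) and assigning through one (which changes the original). The small array here is made up so that the result is easy to verify by hand.</p>

```python
import numpy as np

data = np.array([[1.0, -2.0], [-3.0, 4.0], [5.0, -6.0]])
keep = np.array([True, False, True])

selected = data[keep]   # boolean selection returns a new array
selected[:] = 0.0       # changes only the copy, not data

data[data < 0] = 0.0    # assignment through a mask changes data itself

print(selected)
print(data)
```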

<h3>Fancy Indexing</h3>

In [124]:
arr = np.empty((8,4))

In [125]:
for i in range(8):
    arr[i] = i

In [126]:
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [127]:
arr.shape

(8, 4)

In [128]:
arr.ndim

2

<p>To select a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order. </p>

In [129]:
arr[[4,3,0,6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

<p>Using negative indices selects rows from the end</p>

In [130]:
arr[[-3,-5,-7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

<p>We can build a one-dimensional array using arange and then transform it to another shape using <b>'reshape'</b>. <i><b>Note:</b></i> The total number of elements must equal the product of the dimensions in the new shape.</p>

In [131]:
arr = np.arange(32).reshape((8,4))

In [132]:
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

<p>Passing multiple index arrays does something slightly different; it selects a one-dimensional array of elements corresponding to each tuple of indices</p>

In [133]:
arr[[1,5,7,2],[0,3,1,2]]

array([ 4, 23, 29, 10])

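<p><b><i>Note: </i></b> Regardless of how many dimensions the array has, fancy indexing with one index array per axis (as above) returns a one-dimensional result, one element per tuple of indices. To select the rectangular region formed by a subset of rows and columns instead, index in two steps:</p>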

In [134]:
arr[[1,5,7,2]][:,[0,3,1,2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
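
<p>The two-step selection above can also be written with np.ix_, which builds index arrays that broadcast into a rectangular selection. This is offered as an alternative sketch, not as the approach used in the notebook; the names <code>rows</code> and <code>cols</code> are illustrative.</p>

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

two_step = arr[rows][:, cols]        # rows first, then reorder columns
with_ix = arr[np.ix_(rows, cols)]    # same rectangular block in one step
pairs = arr[rows, cols]              # element-wise pairs: (1,0), (5,3), (7,1), (2,2)

print(with_ix)
print(pairs)
```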

<p><b><i>Note: </i></b> Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array</p>

<h3>Transposing Arrays and Swapping Axes</h3>

In [135]:
arr = np.arange(15).reshape((3,5))

In [136]:
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [137]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [138]:
arr = np.random.randn(6,3)

In [139]:
arr

array([[-1.17980915, -0.59419206, -0.86960623],
       [ 1.92296014,  0.66279532,  0.23585959],
       [ 0.93044717,  1.90603992, -1.99634153],
       [ 0.67268465,  0.19186999, -0.41069517],
       [ 1.89473953, -2.00989235,  2.53322197],
       [-0.09328204, -1.02259697,  0.44713689]])

In [140]:
np.dot(arr.T, arr)

array([[10.00670133,  0.16526711,  4.10384529],
       [ 0.16526711,  9.54753594, -8.75961125],
       [ 4.10384529, -8.75961125, 11.5830397 ]])

<p>For higher-dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes (for extra mind bending)</p>

In [141]:
arr = np.arange(16).reshape((2,2,4))

In [142]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [144]:
arr.transpose((1,0,2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

In [145]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [151]:
arr.swapaxes(1,2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])

<p>swapaxes similarly returns a view on the data without making a copy.</p>
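
<p>A short sketch that checks the view behaviour and the index mapping of an axis permutation. The assertions in the comments follow directly from the definitions; the variable names are illustrative.</p>

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped = arr.swapaxes(1, 2)          # shape (2, 4, 2)
permuted = arr.transpose((1, 0, 2))   # shape (2, 2, 4), first two axes exchanged

# Element [i, j, k] of arr appears at [j, i, k] in permuted.
print(permuted[1, 0, 2] == arr[0, 1, 2])

# Both results are views, so writing to one changes the original array.
print(np.shares_memory(arr, swapped))
swapped[0, 0, 0] = 100
print(arr[0, 0, 0])
```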

<h3>Universal Functions: Fast Element-Wise Array Functions</h3>

In [153]:
arr = np.arange(10)

In [154]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

<p>Unary ufuncs, such as <b>sqrt</b> and <b>exp</b>:</p>

In [155]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [156]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

<p><b>Binary Ufuncs</b> such as: <b>add</b> & <b>maximum</b></p>

In [157]:
x = np.random.randn(8)

In [158]:
y = np.random.randn(8)

In [159]:
x 

array([-0.72407242,  0.2569305 , -0.25753284,  1.44887192, -1.59510737,
        0.34576061,  1.06604304, -0.18545324])

In [160]:
y

array([-0.60594785,  0.78343565, -0.22023443,  1.45007591,  0.85047379,
        2.7653445 ,  0.01950479,  1.04596145])

In [161]:
np.maximum(x,y)

array([-0.60594785,  0.78343565, -0.22023443,  1.45007591,  0.85047379,
        2.7653445 ,  1.06604304,  1.04596145])

<p>While not common, a ufunc can return multiple arrays. 'modf' is one example, a vectorized version of the built-in Python divmod; it returns the fractional and integral parts of a floating-point array: </p>

In [162]:
arr = np.random.randn(7) * 5

In [163]:
arr

array([ 6.79797628, -5.80316437, -1.06265228,  4.98985737, -2.78805951,
        7.0060351 ,  5.01768681])

In [164]:
remainder, whole_part = np.modf(arr)

In [165]:
remainder

array([ 0.79797628, -0.80316437, -0.06265228,  0.98985737, -0.78805951,
        0.0060351 ,  0.01768681])

In [166]:
whole_part

array([ 6., -5., -1.,  4., -2.,  7.,  5.])
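
<p>One detail worth noting: both parts returned by modf carry the sign of the input, whereas Python's divmod uses floor division. A small sketch with values chosen to be exactly representable in floating point:</p>

```python
import numpy as np

values = np.array([6.75, -5.5, -1.25, 4.0])

# modf returns (fractional part, integral part); both keep the input's sign.
fractional, integral = np.modf(values)

# Python's divmod floors the quotient, so a negative input behaves differently.
divmod_result = divmod(-5.5, 1)

print(fractional)
print(integral)
print(divmod_result)
```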

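<p>Ufuncs accept an optional 'out' argument that lets them write the result into an existing array, so they can operate in place. In the cells below, the negative entries of arr have no real square root, so np.sqrt returns nan for them and emits a RuntimeWarning. </p>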

In [167]:
arr

array([ 6.79797628, -5.80316437, -1.06265228,  4.98985737, -2.78805951,
        7.0060351 ,  5.01768681])

In [169]:
np.sqrt(arr)

array([2.6072929 ,        nan,        nan, 2.23379887,        nan,
       2.64689159, 2.24001938])

In [170]:
np.sqrt(arr,arr)

array([2.6072929 ,        nan,        nan, 2.23379887,        nan,
       2.64689159, 2.24001938])

In [171]:
arr

array([2.6072929 ,        nan,        nan, 2.23379887,        nan,
       2.64689159, 2.24001938])
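
<p>The second positional argument in np.sqrt(arr, arr) is the same thing as the keyword <code>out</code>. The sketch below spells the keyword out and, purely as an illustration, takes absolute values first so that no nan appears. That extra step is an assumption about what one might want, not something the notebook does.</p>

```python
import numpy as np

arr = np.array([6.25, -4.0, 9.0, -1.0])

# Replace each element by its absolute value, writing into arr itself.
np.abs(arr, out=arr)

# sqrt also writes into arr and returns that same array object.
result = np.sqrt(arr, out=arr)

print(arr)
print(result is arr)
```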

<p style="text-align: center">List of available Unary Universal Functions</p>

![alt text](Images/np_unary_ufuncs.png)

<p style="text-align: center">List of available Binary Universal Functions</p>

![alt Text](Images/np_binary_func1.png)

![alt Text](Images/np_binary_func2.png)

<h3>Array-Oriented Programming with Arrays</h3>

<p>Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as <b>Vectorization</b></p>

In [173]:
points = np.arange(-5,5,0.01)

In [174]:
xs, ys = np.meshgrid(points, points)

In [175]:
ys

array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])

In [176]:
xs

array([[-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       ...,
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99]])

In [177]:
z = np.sqrt(xs**2 + ys**2)

In [178]:
z

array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
        7.06400028],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       ...,
       [7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603,
        7.04279774],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568]])
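
<p>To make the vectorization point concrete, the sketch below computes the same quantity with an explicit double loop and with a single array expression, on a much smaller grid than the one above so the loop stays cheap. The grid size is an assumption chosen for illustration.</p>

```python
import math
import numpy as np

points = np.arange(-1, 1, 0.5)          # four points: -1.0, -0.5, 0.0, 0.5
xs, ys = np.meshgrid(points, points)

# Vectorized: one expression over whole arrays.
z_vectorized = np.sqrt(xs ** 2 + ys ** 2)

# Equivalent explicit loops over every grid cell.
z_loop = np.empty_like(xs)
for i in range(xs.shape[0]):
    for j in range(xs.shape[1]):
        z_loop[i, j] = math.sqrt(xs[i, j] ** 2 + ys[i, j] ** 2)

print(np.allclose(z_vectorized, z_loop))
```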

<h3>Expressing Conditional Logic as Array Operations</h3>

<p>The numpy.where function is a vectorized version of the ternary expression x if condition else y.</p>

In [180]:
xarr = np.array([1.1,1.2,1.3,1.4,1.5])

In [181]:
yarr = np.array([2.1,2.2,2.3,2.4,2.5])

In [182]:
cond = np.array([True, False , True, True, False])

<p>Suppose we wanted to take a value from xarr whenever the corresponding value in cond is True, and otherwise take the value from yarr. A list comprehension doing this might look like: </p>

In [185]:
result= [(x if c else y) for x, y , c in zip(xarr, yarr, cond)]

In [186]:
result

[1.1, 2.2, 1.3, 1.4, 2.5]

<p>With numpy.where we can write this logic in a very concise manner </p>

In [187]:
result = np.where(cond, xarr, yarr)

In [188]:
result

array([1.1, 2.2, 1.3, 1.4, 2.5])

<p>A typical use of <b>"where"</b> in data analysis is to produce a new array of values based on another array. </p>

In [189]:
arr = np.random.randn(4,4)

In [190]:
arr

array([[-0.48065643, -2.81817224,  0.54333817, -0.13055175],
       [ 1.15502978,  1.51233339,  0.32503482, -1.16809876],
       [-0.10695435,  0.39521007, -0.86649927, -1.01064733],
       [ 0.17105542, -0.58660999,  0.09998155, -0.37066836]])

<p>Suppose we want to replace all positive values with 2 and all negative values with -2</p>

In [191]:
arr > 0

array([[False, False,  True, False],
       [ True,  True,  True, False],
       [False,  True, False, False],
       [ True, False,  True, False]])

In [192]:
np.where(arr>0,2, -2)

array([[-2, -2,  2, -2],
       [ 2,  2,  2, -2],
       [-2,  2, -2, -2],
       [ 2, -2,  2, -2]])

<p>Combining scalars and arrays when using np.where</p>

In [194]:
np.where(arr > 0, 2, arr)

array([[-0.48065643, -2.81817224,  2.        , -0.13055175],
       [ 2.        ,  2.        ,  2.        , -1.16809876],
       [-0.10695435,  2.        , -0.86649927, -1.01064733],
       [ 2.        , -0.58660999,  2.        , -0.37066836]])
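
<p>Note that np.where(arr > 0, 2, -2) maps exact zeros to -2, since zero is not greater than zero. If a third outcome is needed, calls to where can be nested, or np.select can be used. A minimal sketch with made-up data that includes zeros:</p>

```python
import numpy as np

arr = np.array([[-1.5, 0.0, 2.5],
                [3.0, -0.5, 0.0]])

# Nested where: positive -> 2, negative -> -2, zero -> 0.
signs_nested = np.where(arr > 0, 2, np.where(arr < 0, -2, 0))

# np.select expresses the same thing as a list of conditions and choices.
signs_select = np.select([arr > 0, arr < 0], [2, -2], default=0)

print(signs_nested)
```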

<h3>Mathematical and Statistical Methods</h3>

In [202]:
arr = np.random.randint(0,9,(5,4))

In [203]:
arr

array([[4, 5, 3, 6],
       [0, 2, 7, 2],
       [2, 0, 1, 7],
       [3, 8, 0, 1],
       [1, 5, 8, 4]])

In [204]:
arr.mean()

3.45

In [205]:
np.mean(arr)

3.45

In [206]:
arr.sum()

69

In [207]:
arr.mean(axis = 1)

array([4.5 , 2.75, 2.5 , 3.  , 4.5 ])

In [208]:
arr.sum(axis = 0)

array([10, 20, 19, 20])

<p> Here, arr.mean(axis=1) means "compute mean across the columns" whereas arr.sum(axis=0) means "compute sum down the rows."</p>
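
<p>Reducing along an axis removes that axis from the result. The sketch below reuses the 5x4 array printed above and also shows <code>keepdims=True</code>, which keeps the reduced axis with length one so the result broadcasts back against the original. The row-centering step is an illustrative use, not part of the notebook.</p>

```python
import numpy as np

arr = np.array([[4, 5, 3, 6],
                [0, 2, 7, 2],
                [2, 0, 1, 7],
                [3, 8, 0, 1],
                [1, 5, 8, 4]])

row_means = arr.mean(axis=1)   # one value per row, shape (5,)
col_sums = arr.sum(axis=0)     # one value per column, shape (4,)

# keepdims retains the reduced axis, giving shape (5, 1).
row_means_2d = arr.mean(axis=1, keepdims=True)
centered = arr - row_means_2d  # each row now has mean zero

print(row_means.shape, col_sums.shape, row_means_2d.shape)
```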

<p>Other methods like 'cumsum' and 'cumprod' do not aggregate, instead producing an array of the intermediate results</p>

In [209]:
arr = np.array([0,1,2,3,4,5,6,7])

In [210]:
arr.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28], dtype=int32)

In [211]:
arr = np.array([[0,1,2],[3,4,5],[6,7,8]])

<p>In multidimensional arrays, accumulation functions like cumsum return an array of the same size, but with the partial aggregates computed along the indicated axis according to each lower dimensional slice:</p>

In [212]:
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [213]:
arr.cumsum(axis=0)

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]], dtype=int32)

In [214]:
arr.cumprod(axis=1)

array([[  0,   0,   0],
       [  3,  12,  60],
       [  6,  42, 336]], dtype=int32)

<p style="text-align: center">Basic array statistical methods</p>


![alt Text](Images/np_stats_methods.png)

<h3>Methods for Boolean Arrays</h3>

<p>The 'sum' is often used as a means of counting True values in a boolean array.</p>

In [216]:
arr = np.random.randn(100)

In [217]:
(arr > 0).sum()

46

<p>There are two additional methods, 'any' and 'all', useful especially for boolean arrays. any tests whether one or more values in an array are True, while all checks if every value is True</p>

In [218]:
bools = np.array([False, False, True, False])

In [219]:
bools.any()

True

In [220]:
bools.all()

False
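
<p>These methods also accept an axis argument, and taking the mean of a boolean array gives the fraction of True values. A small sketch with a made-up 3x3 array:</p>

```python
import numpy as np

flags = np.array([[True, False, False],
                  [True, True, True],
                  [False, False, False]])

true_count = flags.sum()          # number of True values
true_fraction = flags.mean()      # fraction of True values

any_per_row = flags.any(axis=1)   # is at least one value True in each row?
all_per_row = flags.all(axis=1)   # are all values True in each row?

print(true_count, true_fraction)
print(any_per_row)
print(all_per_row)
```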

<h3>Sorting</h3>

In [243]:
arr = np.random.randint(0,9,6)

In [244]:
arr

array([7, 1, 0, 6, 4, 8])

In [245]:
arr.sort()

In [246]:
arr

array([0, 1, 4, 6, 7, 8])

<p>Sorting each one-dimensional section of values in a multidimensional array in place along an axis by passing the axis number to sort: </p>

In [247]:
arr = np.random.randn(5,3)

In [248]:
arr

array([[ 0.15552443, -1.83551277, -0.38912536],
       [ 0.63773061,  2.94583503, -0.28652125],
       [-0.52824312, -0.15161566,  0.54227612],
       [-0.4198695 ,  0.31127691,  1.314549  ],
       [ 2.34376229, -0.22362016, -1.41845158]])

In [249]:
arr.sort(1)

In [250]:
arr

array([[-1.83551277, -0.38912536,  0.15552443],
       [-0.28652125,  0.63773061,  2.94583503],
       [-0.52824312, -0.15161566,  0.54227612],
       [-0.4198695 ,  0.31127691,  1.314549  ],
       [-1.41845158, -0.22362016,  2.34376229]])
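
<p>The sort method used above works in place. The top-level function np.sort returns a sorted copy instead, and np.argsort returns the indices that would sort the array. A minimal sketch using the six integers shown earlier in this section:</p>

```python
import numpy as np

values = np.array([7, 1, 0, 6, 4, 8])

sorted_copy = np.sort(values)   # new array; values is left unchanged
order = np.argsort(values)      # indices that put values in ascending order

print(values)
print(sorted_copy)
print(order)
```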

<h3>Unique and Other Set Logic</h3>

In [251]:
names = np.array(['Bob', 'Joe','Will','Bob', 'Will', 'Joe', 'Joe'])

In [252]:
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [253]:
ints = np.array([3,3,3,2,2,1,1,4,4])

In [257]:
np.unique(ints)

array([1, 2, 3, 4])

<p>The function np.in1d tests the membership of the values in one array in another, returning a boolean array. Newer NumPy releases recommend np.isin for the same purpose. </p>

In [258]:
values = np.array([6,0,0,3,2,5,6])

In [259]:
np.in1d(values, [2,3,6])

array([ True, False, False,  True,  True, False,  True])
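
<p>A short sketch of the same membership test written with np.isin, together with a few other set routines. The name <code>allowed</code> is introduced here for illustration.</p>

```python
import numpy as np

values = np.array([6, 0, 0, 3, 2, 5, 6])
allowed = np.array([2, 3, 6])

membership = np.isin(values, allowed)            # element-wise membership test
common = np.intersect1d(values, allowed)         # sorted values present in both
combined = np.union1d(values, allowed)           # sorted union of both
only_in_values = np.setdiff1d(values, allowed)   # in values but not in allowed

print(membership)
print(common, combined, only_in_values)
```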

<p style='text-align: center'> Array Set Operations</p>

![alt Text](Images/np_set_operations.png)

<h3>File Input and Output with Arrays</h3>

In [260]:
arr = np.arange(10)

In [261]:
np.save('some_array', arr)

In [263]:
np.load('some_array.npy')

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

<p>We can save multiple arrays in an uncompressed archive using np.savez and passing the arrays as keyword arguments.</p>

In [264]:
np.savez('array_archive.npz', a = arr, b= arr)

In [266]:
arch = np.load('array_archive.npz')

In [267]:
arch

<numpy.lib.npyio.NpzFile at 0x17ebd2c2c48>

In [270]:
list(arch.keys())

['a', 'b']

In [271]:
arch['b']

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

<p>If the data compresses well, we can use numpy.savez_compressed instead</p>

In [272]:
np.savez_compressed('array_compressed.npz', a = arr, b= arr)
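
<p>A round-trip sketch that saves and reloads arrays inside a temporary directory, so it leaves no files behind. The use of tempfile and the second array <code>arr * 2</code> are choices made for this illustration.</p>

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Single array: .npy file.
    single_path = os.path.join(tmp_dir, 'some_array.npy')
    np.save(single_path, arr)
    loaded = np.load(single_path)

    # Several arrays: compressed .npz archive, one entry per keyword.
    archive_path = os.path.join(tmp_dir, 'array_archive.npz')
    np.savez_compressed(archive_path, a=arr, b=arr * 2)

    # Arrays in an archive are read lazily, so load them before closing it.
    with np.load(archive_path) as archive:
        keys = sorted(archive.files)
        b_loaded = archive['b']

print(loaded)
print(keys)
print(b_loaded)
```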

<h3>Linear Algebra </h3>

In [273]:
x = np.array([[1., 2., 3.],[4.,5.,6.]])

In [274]:
y = np.array([[6., 23.], [-1,7],[8,9]])

In [275]:
x

array([[1., 2., 3.],
       [4., 5., 6.]])

In [276]:
y

array([[ 6., 23.],
       [-1.,  7.],
       [ 8.,  9.]])

In [277]:
x.dot(y)

array([[ 28.,  64.],
       [ 67., 181.]])

<p>x.dot(y) is equivalent to np.dot(x,y)</p>

In [278]:
np.dot(x,y)

array([[ 28.,  64.],
       [ 67., 181.]])

<p>A matrix product between a two-dimensional array and a suitably sized one-dimensional array results in a one-dimensional array</p>

In [279]:
np.dot(x, np.ones(3))

array([ 6., 15.])
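
<p>The infix operator <code>@</code> performs the same matrix multiplication as dot for these one- and two-dimensional cases. A minimal sketch reusing the x and y defined above:</p>

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1., 7.], [8., 9.]])

product = x @ y                   # same result as x.dot(y) here
row_totals = x @ np.ones(3)       # 2-D array times 1-D array gives a 1-D array

print(product)
print(row_totals)
```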

<p>numpy.linalg has a standard set of matrix decompositions and things like inverse and determinant.</p>

In [280]:
from numpy.linalg import inv, qr

In [281]:
X = np.random.randn(5,5)

In [282]:
mat = X.T.dot(X)

In [283]:
inv(mat)

array([[ 0.72178013, -1.8664021 , -1.73746541, -1.81136638,  0.64430092],
       [-1.8664021 ,  9.97372816,  9.44518551,  9.38967743, -2.88426276],
       [-1.73746541,  9.44518551,  9.2441282 ,  9.04006693, -2.83514337],
       [-1.81136638,  9.38967743,  9.04006693,  9.22842685, -2.82853107],
       [ 0.64430092, -2.88426276, -2.83514337, -2.82853107,  1.11297788]])

In [284]:
mat.dot(inv(mat))

array([[ 1.00000000e+00,  2.46840933e-16, -3.05077466e-17,
         3.17666575e-16, -1.64722365e-16],
       [ 5.02382436e-16,  1.00000000e+00, -3.96464833e-15,
        -1.80347238e-15,  7.90665015e-16],
       [-6.24666811e-16,  3.92807605e-15,  1.00000000e+00,
         1.70985851e-15, -5.65200208e-16],
       [-5.19813931e-16,  3.33928224e-15,  1.20874420e-15,
         1.00000000e+00, -2.41604192e-16],
       [-2.80002547e-17, -1.27658301e-15, -1.93488004e-15,
        -1.26359431e-15,  1.00000000e+00]])

In [285]:
q, r = qr(mat)

In [286]:
r

array([[-3.77464366, -2.40417359,  3.47005625, -0.60326541,  3.59860262],
       [ 0.        , -4.71899293,  3.42839412,  1.86685463,  1.24907183],
       [ 0.        ,  0.        , -3.44570784,  3.3579111 , -0.30252813],
       [ 0.        ,  0.        ,  0.        , -1.23317335, -3.94111141],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.196073  ]])

<p>The expression X.T.dot(X) computes the dot product of X with its transpose X.T.</p>
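
<p>The results of inv and qr can be checked numerically. The sketch below uses a small fixed matrix instead of random data so that it is reproducible; the matrix values and the right-hand side are made up for illustration. It also shows numpy.linalg.solve, which solves a linear system without forming the inverse explicitly.</p>

```python
import numpy as np
from numpy.linalg import inv, qr, solve

# Fixed, well-conditioned example matrix (illustrative values).
X = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
mat = X.T.dot(X)

q, r = qr(mat)

inverse_ok = np.allclose(mat.dot(inv(mat)), np.eye(3))   # mat times its inverse is the identity
qr_ok = np.allclose(q.dot(r), mat)                       # q times r reconstructs mat
r_is_upper = np.allclose(r, np.triu(r))                  # r is upper triangular

# Solve mat @ solution = rhs directly.
rhs = np.ones(3)
solution = solve(mat, rhs)
solve_ok = np.allclose(mat.dot(solution), rhs)

print(inverse_ok, qr_ok, r_is_upper, solve_ok)
```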

<p style="text-align: center">Commonly used numpy.linalg functions</p>

![alt Text](Images/np_linalg1.png)

![alt Text](Images/np_linagl2.png)

<h3>Pseudorandom Number Generation</h3>

In [289]:
samples = np.random.normal(size=(4,4))

In [291]:
samples

array([[-0.88258796, -0.37443675, -0.44506959,  0.33885328],
       [ 0.78605152,  1.44260843, -0.51252105, -0.1707295 ],
       [ 0.6924641 , -0.14254136,  0.14876018,  0.50057194],
       [-1.60558308, -0.4467475 ,  0.06052549,  0.80113621]])

<p>The data generation functions in numpy.random use a global random seed. To avoid global state, you can use numpy.random.RandomState to create a random number generator isolated from others. </p>

In [293]:
rng = np.random.RandomState(1234)

In [294]:
rng.randn(10)

array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])
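
<p>NumPy also provides numpy.random.default_rng, which returns a Generator object that plays the same role as an isolated RandomState. A minimal sketch, assuming a NumPy version that includes default_rng; the seed and shapes are arbitrary choices for illustration, and the sampled values are not shown because they depend on the generator.</p>

```python
import numpy as np

# Two generators created with the same seed produce the same stream.
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)

samples_a = rng_a.standard_normal(10)
samples_b = rng_b.standard_normal(10)

# Integers drawn from 0 up to, but not including, 9.
random_ints = rng_a.integers(0, 9, size=(5, 4))

print(np.array_equal(samples_a, samples_b))
print(random_ints.shape)
```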

<p style='text-align: center'>Partial List of numpy.random functions </p>

![alt Text](Images/np_random.png)