## Numpy

Numpy is one of the most important foundational packages for numerical computing in Python. Its core is implemented in C, and it provides interfaces for integrating with C/C++ code.

Numpy is designed to work efficiently with large arrays of data. It stores array data internally in a single contiguous block of memory, independent of other built-in Python objects, and this layout uses less memory than equivalent built-in Python sequences.

In addition, Numpy excels at performing complex computations on entire arrays without the need for Python loops.

Refer to the following link for the documentation of the Numpy library: https://numpy.org/doc/1.21/reference/index.html

In [1]:
# Import numpy library from Python
import numpy as np

### Creating N-dimensional arrays

In [2]:
# Creating 1-dimensional array
arr1 = np.array([2,3,4,5,6])
arr1

array([2, 3, 4, 5, 6])

In [3]:
print("Shape of array:", arr1.shape)
print("Number of array dimension:", arr1.ndim)
print("Data type of array:", arr1.dtype)

Shape of array: (5,)
Number of array dimension: 1
Data type of array: int32


In [4]:
# Creating 2-dimensional array
arr2 = np.array([[5.4,3.4,2.1,4.1,2.7],[1.4,2.6,1.8,1.5,2.4]])
arr2

array([[5.4, 3.4, 2.1, 4.1, 2.7],
       [1.4, 2.6, 1.8, 1.5, 2.4]])

In [5]:
print("Shape of array:", arr2.shape)
print("Number of array dimension:", arr2.ndim)
print("Data type of array:", arr2.dtype)

Shape of array: (2, 5)
Number of array dimension: 2
Data type of array: float64


In [6]:
# Creating 3-dimensional array
arr3 = np.array([[[245,123,267],[123,45,356],[234,265,765]], [[234,11,321],[562,23,432],[61,74,56]]])
arr3

array([[[245, 123, 267],
        [123,  45, 356],
        [234, 265, 765]],

       [[234,  11, 321],
        [562,  23, 432],
        [ 61,  74,  56]]])

In [7]:
print("Shape of array:", arr3.shape)
print("Number of array dimension:", arr3.ndim)
print("Data type of array:", arr3.dtype)

Shape of array: (2, 3, 3)
Number of array dimension: 3
Data type of array: int32


There are also other built-in functions available in numpy for creating special types of arrays.

In [8]:
# Create an array of evenly spaced values from 2 up to (but excluding) 100 in steps of 5 (an array-returning alternative to Python's built-in range)
arr4 = np.arange(2,100,5)
arr4

array([ 2,  7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82,
       87, 92, 97])

In [9]:
# Create an array filled with 1s, with the shape given as a tuple
arr5 = np.ones((2,3))
arr5

array([[1., 1., 1.],
       [1., 1., 1.]])

In [10]:
# Create an array filled with 0s, with the shape given as a tuple
arr6 = np.zeros((2,3,4))
arr6

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

In [11]:
# Create a new array of the given shape without initializing its values (contents are arbitrary leftover memory; the zeros below are not guaranteed)
arr7 = np.empty((5,5))
arr7

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [12]:
# Create a new array of the given shape filled with a specific value
arr8 = np.full((5,5), 2)
arr8

array([[2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2]])

In [13]:
# Create a new NxN array (identity matrix) with 1s on diagonal and 0s elsewhere
arr9 = np.identity(4)
arr9

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
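As the outputs above show, np.ones, np.zeros, np.empty and np.identity return float64 arrays by default, while np.full takes its dtype from the fill value. The following sketch shows the `dtype` argument these functions accept, plus one of the `*_like` variants that copy the shape and dtype of an existing array. The array values are illustrative only.

```python
import numpy as np

# The dtype argument overrides the default float64 of ones/zeros/identity
ones_int = np.ones((2, 3), dtype=int)
zeros_bool = np.zeros(4, dtype=bool)      # all False
eye_int = np.identity(3, dtype=np.int8)

# np.full infers the dtype from the fill value unless dtype is given
full_float = np.full((2, 2), 2, dtype=float)

# The *_like variants copy the shape (and dtype) of an existing array
template = np.array([[5.4, 3.4], [1.4, 2.6]])
sevens = np.full_like(template, 7)        # shape (2, 2), dtype float64

print(ones_int, zeros_bool, eye_int, full_float, sevens, sep="\n")
```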

### Converting data types of arrays

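Arrays can be converted to a different data type (explicit casting) with the astype() method. Common target types include:

1. int, uint (signed and unsigned integers)
2. float
3. complex
4. bool
5. object
6. string_ (fixed-length bytes; use bytes_ in NumPy 2.x, where the string_ alias was removed)
7. unicode_ (fixed-length Unicode strings; use str_ in NumPy 2.x, where the unicode_ alias was removed)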

In [14]:
# Converting array data type from integer to object
arr1.astype('object')

array([2, 3, 4, 5, 6], dtype=object)

In [15]:
# Converting array data type from integer to float
arr3.astype('float')

array([[[245., 123., 267.],
        [123.,  45., 356.],
        [234., 265., 765.]],

       [[234.,  11., 321.],
        [562.,  23., 432.],
        [ 61.,  74.,  56.]]])
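Casting in the other direction deserves a word of caution. A minimal sketch, using made-up values: converting floats to integers truncates toward zero rather than rounding, and strings that look like numbers can be parsed with astype. In every case astype returns a new array and leaves the original untouched.

```python
import numpy as np

# Float -> int casting truncates toward zero (it does not round)
floats = np.array([1.7, -1.7, 2.5])
as_int = floats.astype(int)
print(as_int)  # [ 1 -1  2]

# Strings that look like numbers can be parsed with astype
strings = np.array(["1.5", "2", "-3.25"])
as_float = strings.astype(float)
print(as_float)

# astype returns a new array; the original keeps its dtype
print(floats.dtype, as_int.dtype)
```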

### Arithmetic operations on arrays

Arithmetic operations on arrays are performed element-wise (in vectorized form).

Note that arrays do not necessarily need to have the same shape for arithmetic operations to work, because Numpy uses the concept of "broadcasting".

In [16]:
# Addition of two arrays with different shapes
arr1 + arr2

array([[7.4, 6.4, 6.1, 9.1, 8.7],
       [3.4, 5.6, 5.8, 6.5, 8.4]])

In [17]:
# Dividing every element of array by scalar value of 2
arr3 / 2

array([[[122.5,  61.5, 133.5],
        [ 61.5,  22.5, 178. ],
        [117. , 132.5, 382.5]],

       [[117. ,   5.5, 160.5],
        [281. ,  11.5, 216. ],
        [ 30.5,  37. ,  28. ]]])

In [18]:
# Checking for multiple conditions of an array element-wise
(arr4 > 20) & (arr4 < 50)

array([False, False, False, False,  True,  True,  True,  True,  True,
        True, False, False, False, False, False, False, False, False,
       False, False])

Note the following rule of broadcasting:

<b>Two arrays with different shapes are compatible for broadcasting if for each trailing dimension (i.e., starting from the end) the axis lengths match or if either of the lengths is 1. </b>

<b>Broadcasting is then performed over the missing axes or the axes of length 1.</b>

<div>
<br>
<img src="attachment:image.png" width="400" align="left"/>
<img src="attachment:image-2.png" width="400" align="right"/>
</div>
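The broadcasting rule can be checked by comparing shapes from the end. In this sketch (illustrative arrays), shapes (3, 1) and (4,) are compatible: the trailing lengths are 1 and 4, and the missing leading axis of the second array is filled in, giving (3, 4). Shapes (2, 3) and (2,) are not compatible, because the trailing lengths 3 and 2 differ and neither is 1.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4)

# Trailing lengths 1 vs 4 (one is 1: OK); next axis 3 vs missing (OK)
grid = col + row
print(grid.shape)  # (3, 4)
print(grid)

# Shapes (2, 3) and (2,) are NOT compatible: trailing lengths 3 and 2 differ
failed = False
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as err:
    failed = True
    print("Broadcasting failed:", err)

# Giving the second array shape (2, 1) makes its trailing length 1
fixed = np.ones((2, 3)) + np.ones(2).reshape(2, 1)
print(fixed.shape)  # (2, 3)
```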

### Array Indexing and Slicing

Array slices are views of the original array, so values assigned to a slice are also reflected in the original array.

Boolean indexing can also be done on arrays with the help of the following operators:

1. & (and)
2. | (or)
3. ~ (not)

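Note that Python's "and", "or" and "not" keywords do not work for boolean indexing of arrays: they try to evaluate a whole array as a single True/False value and raise a ValueError. Also wrap each condition in parentheses, because &, | and ~ bind more tightly than comparison operators such as > and <.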
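A small sketch of why the keywords fail while the operators work, using an illustrative array. The `and` keyword calls bool() on a multi-element array, which raises ValueError; & combines the two boolean arrays element by element.

```python
import numpy as np

values = np.array([2, 25, 40, 90])

# Python's `and` asks for one truth value of the whole array -> ValueError
try:
    values[(values > 20) and (values < 50)]
    keyword_worked = True
except ValueError:
    keyword_worked = False

# The element-wise operator & works; the parentheses are required
mask = (values > 20) & (values < 50)
print(values[mask])  # [25 40]

# Without parentheses, `values > 20 & values < 50` is parsed as the chained
# comparison `values > (20 & values) < 50`, which also raises ValueError
```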

In [19]:
# Slicing 1-dimensional arrays in various forms
arr1, arr1[2:], arr1[:3], arr1[1:4], arr1[:-1]

(array([2, 3, 4, 5, 6]),
 array([4, 5, 6]),
 array([2, 3, 4]),
 array([3, 4, 5]),
 array([2, 3, 4, 5]))

In [20]:
# Slicing 2-dimensional arrays in various forms
arr2, arr2[1,2:], arr2[:,:3], arr2[0:1,2:4]

(array([[5.4, 3.4, 2.1, 4.1, 2.7],
        [1.4, 2.6, 1.8, 1.5, 2.4]]),
 array([1.8, 1.5, 2.4]),
 array([[5.4, 3.4, 2.1],
        [1.4, 2.6, 1.8]]),
 array([[2.1, 4.1]]))

In [21]:
# Overwriting array slice values will also affect the original array (2.5 is truncated to 2 because arr1 has an integer dtype)
arr_slice = arr1[2:]
arr_slice[:] = 2.5
arr_slice, arr1

(array([2, 2, 2]), array([2, 3, 2, 2, 2]))

In [22]:
# Overwriting array copy values will not affect the original array
arr1 = np.array([2,3,4,5,6])
arr_slice = arr1[2:].copy()
arr_slice[:] = 2.5
arr_slice, arr1

(array([2, 2, 2]), array([2, 3, 4, 5, 6]))
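Whether an array is a view or a copy can be checked directly. This sketch uses np.shares_memory and the .base attribute on an illustrative array. It also shows that boolean indexing, unlike slicing, always returns a copy, so writing to its result does not touch the source array.

```python
import numpy as np

base = np.array([2, 3, 4, 5, 6])
view = base[2:]            # slice -> view
copied = base[2:].copy()   # explicit copy

print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False
print(view.base is base)     # True: a view keeps a reference to its source array
print(copied.base is None)   # True: a copy owns its data

# Boolean indexing returns a copy, not a view
picked = base[base > 3]
picked[:] = 0
print(base)  # unchanged: [2 3 4 5 6]
```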

In [23]:
# Boolean indexing with & operator
arr4[(arr4 > 20) & (arr4 < 50)]

array([22, 27, 32, 37, 42, 47])

In [24]:
# Boolean indexing with &, ~ operator
arr4[~((arr4 > 20) & (arr4 < 50))]

array([ 2,  7, 12, 17, 52, 57, 62, 67, 72, 77, 82, 87, 92, 97])

In [25]:
# Boolean indexing with | operator
arr4[(arr4 > 80) | (arr4 < 20)]

array([ 2,  7, 12, 17, 82, 87, 92, 97])

### Array shape transformation

The shape of an array can be transformed either by transposing it (swapping its axes) or by reshaping it (specifying new dimensions that hold the same total number of elements).

In [26]:
# Transposing arrays (swap axis dimensions)
arr2.T

array([[5.4, 1.4],
       [3.4, 2.6],
       [2.1, 1.8],
       [4.1, 1.5],
       [2.7, 2.4]])

In [27]:
# Reshaping arrays (specify new array dimension)
arr2.reshape((10,1))

array([[5.4],
       [3.4],
       [2.1],
       [4.1],
       [2.7],
       [1.4],
       [2.6],
       [1.8],
       [1.5],
       [2.4]])
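Two details of reshape are worth a short sketch (illustrative data): a -1 in the new shape tells Numpy to infer that dimension, and the new shape must hold exactly as many elements as the original, otherwise a ValueError is raised. Elements are read in row-major order by default.

```python
import numpy as np

data = np.arange(10)   # shape (10,)

# -1 lets reshape infer that dimension from the total number of elements
print(data.reshape(2, -1).shape)   # (2, 5)
print(data.reshape(-1, 1).shape)   # (10, 1)

# The new shape must hold exactly the same number of elements (10 != 12)
try:
    data.reshape(3, 4)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)

# Transposing a 2-D array swaps its axes: shape (2, 5) becomes (5, 2)
grid = data.reshape(2, 5)
print(grid.T.shape)  # (5, 2)
print(grid.T[0])     # [0 5]
```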

### Numpy Universal Functions

Numpy has many universal functions (ufuncs) that perform element-wise operations on data in ndarrays.

Numpy's universal functions include, but are not limited to, the following:

In [28]:
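# Plain Python lists: ufuncs convert them to arrays automatically (note that this rebinds arr1 and arr2)
arr1 = [2, -3, -5, 6]
arr2 = [-4, 2, -3, 2]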

In [29]:
print("First array:",arr1)
print("Second array:",arr2)
print("Absolute value:",np.abs(arr1))
print("Square root:",np.sqrt(np.abs(arr1)))
print("Square:",np.square(arr1))
print("Exponential:",np.exp(arr2))
print("Natural Logarithm:",np.log(np.abs(arr1)))
print("Sign of values:",np.sign(arr1)) # (1 for positive, -1 for negative and 0 for zero)
print("Ceiling value:",np.ceil(np.log(np.abs(arr1)))) # Smallest integer value that is greater than or equal to each value
print("Floor value:",np.floor(np.log(np.abs(arr1)))) # Largest integer value that is less than or equal to each value
print("Remainder and whole value:",np.modf([2.45,3.65,-2.56,1.23])) # Fractional and integral parts of every element as two separate arrays

First array: [2, -3, -5, 6]
Second array: [-4, 2, -3, 2]
Absolute value: [2 3 5 6]
Square root: [1.41421356 1.73205081 2.23606798 2.44948974]
Square: [ 4  9 25 36]
Exponential: [0.01831564 7.3890561  0.04978707 7.3890561 ]
Natural Logarithm: [0.69314718 1.09861229 1.60943791 1.79175947]
Sign of values: [ 1 -1 -1  1]
Ceiling value: [1. 2. 2. 2.]
Floor value: [0. 1. 1. 1.]
Remainder and whole value: (array([ 0.45,  0.65, -0.56,  0.23]), array([ 2.,  3., -2.,  1.]))


In [30]:
print("First array:",arr1)
print("Second array:",arr2)
print("Adding two arrays:",np.add(arr1, arr2))
print("Subtracting two arrays:",np.subtract(arr1, arr2))
print("Multiply two arrays:",np.multiply(arr1, arr2))
print("Divide two arrays:",np.divide(arr1, arr2))
print("Divide two arrays (Floor division):",np.floor_divide(arr1, arr2))
print("Power of two arrays:",np.power(arr1,np.abs(arr2)))
print("Maximum value between two arrays:",np.maximum(arr1,arr2))
print("Minimum value between two arrays:",np.minimum(arr1,arr2))
print("Remainder from dividing two arrays:",np.mod(arr1, arr2))

First array: [2, -3, -5, 6]
Second array: [-4, 2, -3, 2]
Adding two arrays: [-2 -1 -8  8]
Subtracting two arrays: [ 6 -5 -2  4]
Multiply two arrays: [-8 -6 15 12]
Divide two arrays: [-0.5        -1.5         1.66666667  3.        ]
Divide two arrays (Floor division): [-1 -2  1  3]
Power of two arrays: [  16    9 -125   36]
Maximum value between two arrays: [ 2  2 -3  6]
Minimum value between two arrays: [-4 -3 -5  2]
Remainder from dividing two arrays: [-2  1 -2  0]


In [31]:
# Element-wise comparison functions (each returns a boolean array that can be used for boolean indexing)
print("First array:",[2,5,-3,0])
print("Second array:",[3,1,-3,2])
print("Greater comparison:",np.greater([2,5,-3,0],[3,1,-3,2]))
print("Greater or equal comparison:",np.greater_equal([2,5,-3,0],[3,1,-3,2]))
print("Less comparison:",np.less([2,5,-3,0],[3,1,-3,2]))
print("Less or equal comparison:",np.less_equal([2,5,-3,0],[3,1,-3,2]))
print("Equal comparison:",np.equal([2,5,-3,0],[3,1,-3,2]))
print("Not equal comparison:",np.not_equal([2,5,-3,0],[3,1,-3,2]))

First array: [2, 5, -3, 0]
Second array: [3, 1, -3, 2]
Greater comparison: [False  True False False]
Greater or equal comparison: [False  True  True False]
Less comparison: [ True False False  True]
Less or equal comparison: [ True False  True  True]
Equal comparison: [False False  True False]
Not equal comparison: [ True  True False  True]


### Other array-oriented numpy functions

In [32]:
# Producing new array of values based on values of other arrays
np.where(arr3>200, "More", arr3)

array([[['More', '123', 'More'],
        ['123', '45', 'More'],
        ['More', 'More', 'More']],

       [['More', '11', 'More'],
        ['More', '23', 'More'],
        ['61', '74', '56']]], dtype='<U11')

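Note that the second argument of np.where() supplies the values where the condition is True, and the third argument supplies the values where it is False. Because "More" is a string, the whole result becomes a string array (dtype '<U11'), so the numbers are converted to strings too.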
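When both branches are numeric, np.where keeps a numeric dtype. In this sketch (illustrative values), one call caps large values, another shows that a branch can itself be an array, and the single-argument form returns the indices where the condition holds.

```python
import numpy as np

readings = np.array([[245, 123, 267],
                     [11, 562, 23]])

# Both branches numeric -> the result stays an integer array
capped = np.where(readings > 200, 200, readings)
print(capped)

# A branch may be an array: values above 200 are halved (integer division)
halved = np.where(readings > 200, readings // 2, readings)
print(halved)

# With only the condition, np.where returns the indices where it is True
rows, cols = np.where(readings > 200)
print(rows, cols)  # [0 0 1] [0 2 1]
```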

In [33]:
# Statistical numpy functions
print("Original array:", arr4)
print("Sum of array values:",np.sum(arr4))
print("Mean of array values:",np.mean(arr4))
print("Median of array values:",np.median(arr4))
print("Standard Deviation of array values:",np.std(arr4))
print("Variance of array values:",np.var(arr4))
print("Minimum of array values:",np.min(arr4))
print("Maximum of array values:",np.max(arr4))
print("Index of Minimum of array values:",np.argmin(arr4))
print("Index of Maximum of array values:",np.argmax(arr4))

Original array: [ 2  7 12 17 22 27 32 37 42 47 52 57 62 67 72 77 82 87 92 97]
Sum of array values: 990
Mean of array values: 49.5
Median of array values: 49.5
Standard Deviation of array values: 28.83140648667699
Variance of array values: 831.25
Minimum of array values: 2
Maximum of array values: 97
Index of Minimum of array values: 0
Index of Maximum of array values: 19


In [34]:
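print("Cumulative sum of array values:",np.cumsum(arr4))
# Note: the cumulative product overflows the integer dtype after a few steps and wraps around silently, hence the negative values below
print("Cumulative product of array values:",np.cumprod(arr4))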

Cumulative sum of array values: [  2   9  21  38  60  87 119 156 198 245 297 354 416 483 555 632 714 801
 893 990]
Cumulative product of array values: [          2          14         168        2856       62832     1696464
    54286848  2008613376 -1537584128   747990016   240775168   839282688
   495919104 -1133158400    16973824  1306984448  -201457664  -346947584
 -1854406656   511180800]
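The negative values in the cumulative product come from integer overflow: Numpy integer arrays wrap around silently. Switching to int64 only postpones the problem here, since the partial products soon exceed about 9.2e18. A sketch of one workaround, accumulating in float64 via the `dtype` argument of np.cumprod (which trades exactness for range):

```python
import numpy as np

arr4 = np.arange(2, 100, 5)

# Integer arithmetic wraps around silently; int64 only delays the overflow
int_prod = np.cumprod(arr4.astype(np.int64))

# Accumulating in float64 avoids wrap-around (values become approximate)
float_prod = np.cumprod(arr4, dtype=np.float64)

# Index of the first partial product that no longer fits in int64
limit = np.iinfo(np.int64).max
first_overflow = int(np.argmax(float_prod > limit))
print("First int64 overflow at index:", first_overflow)
print("Final product (float64):", float_prod[-1])
```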


Note that by default these statistical functions compute over the flattened array, unless an axis argument is specified.
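A quick sketch of the axis argument on a small illustrative 2-D array: axis=0 collapses the rows (one result per column), and axis=1 collapses the columns (one result per row).

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

total = np.sum(table)               # 21: computed over the flattened array
col_sums = np.sum(table, axis=0)    # [5 7 9]: one value per column
row_sums = np.sum(table, axis=1)    # [ 6 15]: one value per row
col_means = table.mean(axis=0)      # [2.5 3.5 4.5]

print(total, col_sums, row_sums, col_means)
```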

In [35]:
# Boolean array methods
boolean = np.array([True, False, False, True])
print("Any value in array returns True:",boolean.any())
print("All values in array returns True:",boolean.all())

Any value in array returns True: True
All values in array returns True: False


In [36]:
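# Sorting in place (arr2 is a plain Python list here, so this is list.sort(), which returns None; ndarray.sort() also sorts in place but has no reverse argument)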
arr2.sort(reverse=True)
arr2

[2, 2, -3, -4]

In [37]:
# Sorting array function (returns a copy in ascending order)
np.sort(arr1)

array([-5, -3,  2,  6])
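Since ndarray.sort() and np.sort() only sort in ascending order, a common approach for descending order is to reverse the sorted result. This sketch (illustrative values) also shows the in-place method and np.argsort, which returns the indices that would sort an array.

```python
import numpy as np

values = np.array([-4, 2, -3, 2])

# np.sort returns a sorted copy; [::-1] reverses it into descending order
descending = np.sort(values)[::-1]
print(descending)   # [ 2  2 -3 -4]

# ndarray.sort() sorts in place (ascending) and returns None
values.sort()
print(values)       # [-4 -3  2  2]

# argsort returns the indices that would sort the array
order = np.argsort(np.array([30, 10, 20]))
print(order)        # [1 2 0]
```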

In [38]:
# Set numpy functions
print("First array:",arr1)
print("Second array:",arr2)
print("Array with unique values:",np.unique(arr2))
print("Array intersection:",np.intersect1d(arr1,arr2))
print("Array union:",np.union1d(arr1,arr2))
print("Set difference (in first array only):",np.setdiff1d(arr1,arr2))
print("Symmetric difference (excluding intersection):",np.setxor1d(arr1,arr2))
print("Boolean array of elements from first array contained in second array:",np.isin(arr1,arr2)) # np.isin supersedes np.in1d, which is deprecated in NumPy 2.x

First array: [2, -3, -5, 6]
Second array: [2, 2, -3, -4]
Array with unique values: [-4 -3  2]
Array intersection: [-3  2]
Array union: [-5 -4 -3  2  6]
Set difference (in first array only): [-5  6]
Symmetric difference (excluding intersection): [-5 -4  6]
Boolean array of elements from first array contained in second array: [ True  True False False]


In [39]:
# Saving single array object
np.save("first_array",arr1)

In [40]:
# Loading single array object
np.load("first_array.npy")

array([ 2, -3, -5,  6])

In [41]:
# Saving multiple array objects in uncompressed format
np.savez("array_objects_uncompressed",first=arr1,second=arr2,third=arr3)

# Saving multiple array objects in compressed format
np.savez_compressed("array_objects_compressed",first=arr1,second=arr2,third=arr3)

In [42]:
# Loading multiple array objects (individual arrays are accessed by name, like dictionary keys)
load = np.load("array_objects_compressed.npz")
load['first'], load['second'], load['third']

(array([ 2, -3, -5,  6]),
 array([ 2,  2, -3, -4]),
 array([[[245, 123, 267],
         [123,  45, 356],
         [234, 265, 765]],
 
        [[234,  11, 321],
         [562,  23, 432],
         [ 61,  74,  56]]]))
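The object returned by np.load for an .npz file is a lazy, dictionary-like NpzFile that keeps the file open. A sketch of closing it with a with-block; the file name demo_arrays.npz and the temporary directory are hypothetical choices for this example. Arrays passed to np.savez positionally get default names arr_0, arr_1, and so on, while keyword arrays keep their names.

```python
import os
import tempfile
import numpy as np

# Hypothetical file location, in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo_arrays.npz")

# Positional arrays are stored as arr_0, arr_1, ...; keyword arrays keep their names
np.savez(path, np.arange(3), labels=np.array([1, 0, 1]))

# The with-block closes the underlying file when done
with np.load(path) as data:
    stored_names = list(data.files)   # names stored in the archive
    first = data["arr_0"]             # loaded into memory, usable after closing
    labels = data["labels"]

print(stored_names, first, labels)
```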

Random samples from various distributions can also be generated with functions from np.random, including (but not limited to) the following:

In [43]:
# Generate random values uniformly distributed in [0, 1) with the specified dimensions
np.random.rand(3,3)

array([[0.8634232 , 0.76298977, 0.63075693],
       [0.62930598, 0.98314141, 0.57933684],
       [0.36020306, 0.03197587, 0.12424035]])

In [44]:
# Generate random values from the standard normal distribution with the specified dimensions
np.random.randn(3,3)

array([[ 1.09500296, -0.11629458, -1.4753114 ],
       [-1.90152733, -1.26260037,  0.49278828],
       [-0.16974993, -1.15908179, -1.30911038]])

In [45]:
# Draw values at random (with replacement) from a given list, with the specified shape
np.random.choice([2,4,6,8,10],(3,3))

array([[ 8,  4,  4],
       [ 6,  2,  4],
       [ 8, 10, 10]])

In [46]:
# Generate random integers from 2 (inclusive) to 10 (exclusive) with the specified shape
np.random.randint(2,10,(2,3))

array([[8, 9, 4],
       [2, 7, 8]])
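The functions above use Numpy's legacy global random state, so their outputs change on every run. Newer Numpy versions also offer a Generator object created with np.random.default_rng; the sketch below maps each call above to a Generator method and shows that a fixed seed (42 here, an arbitrary choice) makes the results reproducible. The specific random values are not shown since they depend on the seed.

```python
import numpy as np

# A seeded Generator makes the "random" results reproducible
rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 3))                       # like np.random.rand(3, 3)
normal = rng.standard_normal((3, 3))               # like np.random.randn(3, 3)
picks = rng.choice([2, 4, 6, 8, 10], size=(3, 3))  # like np.random.choice
ints = rng.integers(2, 10, size=(2, 3))            # like np.random.randint(2, 10, (2, 3)); 10 excluded

# Re-creating the generator with the same seed repeats the same sequence
again = np.random.default_rng(seed=42).random((3, 3))
print(np.array_equal(uniform, again))  # True
```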