## Introduction to Numpy

### NumPy stands for Numerical Python.

NumPy was released as an open source project in 2005 with the goal of bringing scientific computing to Python. It was based on two earlier packages, Numeric and Numarray. NumPy leverages BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage) to speed up its linear algebra routines.

Numpy is a Python library focused on numbers; it excels in numerical analysis, linear algebra and simulation. Below are some of the uses of numpy.

![Numpy](https://drive.google.com/uc?export=view&id=1L-oiDmd6HTB6FlyVPFRKsedDb3LOiPyA)


However, when it comes to data analysis and manipulation across a wide range of data sources, that is where pandas steps in.

### Characteristics of Numpy
- Numpy arrays have a fixed size at creation, unlike python lists that can grow dynamically. Changing the size of an ndarray will create a new array and delete the original.


- The elements in a numpy array are required to be of a homogeneous datatype, hence they will be the same size in memory. The exception is an array of Python objects (dtype `object`), which allows elements of different sizes.


- It supports an object-oriented approach


- Numpy is fast due to vectorization and broadcasting capabilities.

**Vectorization** describes the absence of explicit looping and indexing in your code. Vectorized code is more concise and easier to read because there are fewer lines of code and generally fewer bugs. The code also more closely resembles standard mathematical notation.

**Broadcasting** describes the implicit element-by-element behaviour of operations. Generally speaking, numpy operations (arithmetic, logical, bit-wise and so on) broadcast, provided the shapes of the operands are compatible.
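The two ideas above can be sketched in a few lines. This is a minimal illustration; the array names and values (`a`, `b`, `table`) are made up for the example and are not part of the notebook.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Explicit loop: index each pair of elements by hand
looped = np.empty_like(a)
for i in range(len(a)):
    looped[i] = a[i] * b[i]

# Vectorized: the same element-wise product with no explicit loop
vectorized = a * b

# Broadcasting: operands with different shapes are stretched to match
scaled = a * 2                              # scalar with shape (3,) -> (3,)
table = np.array([[0.0], [100.0]]) + a      # (2, 1) with (3,) -> (2, 3)

print(vectorized)
print(scaled)
print(table)
```

The loop and the vectorized expression compute the same values; the vectorized form simply leaves the looping to NumPy's compiled code.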

In [1]:
#installing numpy

#!pip install numpy

#### Why `!pip install numpy` Works in Jupyter Notebooks

When you write `!pip install numpy`:

- The ! tells Jupyter to execute the command in the shell instead of trying to interpret it as Python code.
- This is why `!pip install numpy` works as expected and installs the numpy package.

Without the !, Jupyter would treat pip install numpy as invalid Python code, since pip install is not a Python command but a shell command.

In [2]:
lst = [1,2,3,4,5,6,7,8,9]
print(type(lst))
print('List: ', lst)

<class 'list'>
List:  [1, 2, 3, 4, 5, 6, 7, 8, 9]


### Creating a numpy array

In [3]:
import numpy as np

In [4]:
my_array = np.array(lst)
print(type(my_array))
print('Numpy array: ', my_array)

<class 'numpy.ndarray'>
Numpy array:  [1 2 3 4 5 6 7 8 9]


In [13]:
# Creating an array from sub-classes

arr = np.array(np.asmatrix('1 2; 3 4'))
arr

array([[1, 2],
       [3, 4]])

## Creating a simple numpy array using np.arange

It creates arrays of evenly spaced values within a specific range.

In [5]:
arr = np.arange(30)

print(type(arr))
print(arr)

<class 'numpy.ndarray'>
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29]


### Comparing lists and numpy in terms of execution time

In [6]:
%%time

#always put "%%time" at the very beginning of the cell, before any comments

#here we multiply each element of a list by itself, one element at a time

#define the list
lst = list(range(1000000))

#loop over every index and square the element at that index

for i in range(1000000):
    lst[i] * lst[i]

#Note that "i" serves as the index here: we look up the element
#at index i and multiply it by itself (the result is discarded)

CPU times: total: 250 ms
Wall time: 819 ms


In [7]:
%%time

arr = np.arange(1000000)

arr = arr*arr

CPU times: total: 0 ns
Wall time: 6.49 ms


Note the difference in computation time: about 819 ms for the list loop versus about 6.5 ms for the numpy array on this run. Wall time is the elapsed real time, that is, the actual time taken from the start of a process to its completion as measured by a real-world clock.
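`%%time` only works inside Jupyter/IPython. Outside a notebook, a rough equivalent can be sketched with the standard library's `time.perf_counter`. The variable names here are illustrative, and the timings will differ from machine to machine.

```python
import time
import numpy as np

n = 1_000_000

# Time the pure-Python version: square every element of a list
start = time.perf_counter()
squares_list = [v * v for v in range(n)]
list_seconds = time.perf_counter() - start

# Time the NumPy version: one vectorized multiplication
start = time.perf_counter()
arr = np.arange(n, dtype=np.int64)
squares_arr = arr * arr
array_seconds = time.perf_counter() - start

print(f'list:  {list_seconds:.4f} s')
print(f'array: {array_seconds:.4f} s')
```

For more careful measurements, the `timeit` module repeats the statement several times and reduces noise from a single run.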

### Numpy Array and its Attributes

In [8]:
import numpy as np

arr = np.array([[1,2,3,4], [5,6,7,8]])

print(f'Array: {arr}')

print()

# print shape of the array
print('Shape: ', arr.shape)

# print the datatype
print('Datatype: ', arr.dtype)

# print item size in byte of each element
print('Item size: ', arr.itemsize)

# print the dimensionality of the numpy array
print('Dimensionality: ', arr.ndim)

Array: [[1 2 3 4]
 [5 6 7 8]]

Shape:  (2, 4)
Datatype:  int64
Item size:  8
Dimensionality:  2


In [9]:
arr

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

## Functions for creating numpy array

### 1. np.arange()

Takes 1 to 3 arguments.

We've seen the one argument case above.

Notice also that for np.arange(10) it returns 10 integers, starting from 0 to 9.

In [10]:
#two argument case
import numpy as np

arr = np.arange(1, 11) # second parameter defines the stop point. It is exclusive as well

print('Array: ', arr)
print('Shape: ', arr.shape)

Array:  [ 1  2  3  4  5  6  7  8  9 10]
Shape:  (10,)


In [11]:
#three argument case

import numpy as np
arr = np.arange(1,22,2) #third parameter defines the step

print('Array: ', arr)
print('Shape:',arr.shape)

Array:  [ 1  3  5  7  9 11 13 15 17 19 21]
Shape: (11,)


### 2. np.ones()

In [279]:
# creating n dimensional array of ones values
arr = np.ones((5,3))

print('Array: \n', arr)
print('Shape: ', arr.shape, flush=True)
print('Data type: ', arr.dtype)
print('Item size: ', arr.itemsize)
print('We have created n dimensional array of ones values', "Yes that's true!", sep='|')

Array: 
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
Shape:  (5, 3)
Data type:  float64
Item size:  8
We have created n dimensional array of ones values|Yes that's true!


### 3. np.zeros()

In [278]:
# creating n dimensional array of zero values
arr = np.zeros((3,3))

print('Array: \n', arr)
print('Shape: ', arr.shape)
print('Data type: ', arr.dtype)
print('Item size: ', arr.itemsize)

Array: 
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
Shape:  (3, 3)
Data type:  float64
Item size:  8


### 4. np.eye()

In [26]:
# creating identity matrix
arr = np.eye(3,3)

print('Array: ', arr)
print('Shape: ', arr.shape)
print('Data type: ', arr.dtype)
print('Item size: ', arr.itemsize)

Array:  [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Shape:  (3, 3)
Data type:  float64
Item size:  8


In [27]:
arr = np.eye(3,5)

print('Array: ', arr)
print('Shape: ', arr.shape)
print('Data type: ', arr.dtype)
print('Item size: ', arr.itemsize)

Array:  [[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]]
Shape:  (3, 5)
Data type:  float64
Item size:  8


### Data Types in Numpy

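Each dtype has a one-character kind code, available as `arr.dtype.kind`:

- i = signed integer
- b = boolean
- S = byte string
- f = float
- m = timedelta
- M = datetime
- O = object
- u = unsigned integer
- c = complex float
- U = unicode string (also what `dtype='str'` produces)
- V = fixed chunk of memory of other type (void)

These kind codes are not always the same as the one-character strings accepted by the `dtype=` argument. In particular, `dtype='b'` means a signed byte (int8) and `dtype='c'` means a one-character byte string (S1), as the outputs below show. Use `dtype=bool` (or `'?'`) for booleans and `dtype=complex` (or `'D'`) for complex floats.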

In [28]:
# Data type 'str'
arr = np.array([1,3,5,7,9], dtype='str')

print('Array: ', arr)
print('Data type: ', arr.dtype)

Array:  ['1' '3' '5' '7' '9']
Data type:  <U1


In [29]:
# Data type 'int'
arr = np.array([1,3,5,7,9], dtype='i')

print('Array: ', arr)
print('Data type: ', arr.dtype)

Array:  [1 3 5 7 9]
Data type:  int32


In [30]:
# Data type 'object'
arr = np.array([1,3,5,7,9], dtype='O')

print('Array: ', arr)
print('Data type: ', arr.dtype)

Array:  [1 3 5 7 9]
Data type:  object


In [31]:
# Data type 'timedelta'
arr = np.array([1,3,5,7,9], dtype='m')

print('Array: ', arr)
print('Data type: ', arr.dtype)

Array:  [1 3 5 7 9]
Data type:  timedelta64


In [33]:
# Data type 'float'
arr = np.array([1,3,5,7,9], dtype='f')

print('Array: ', arr)
print('Data type: ', arr.dtype)

Array:  [1. 3. 5. 7. 9.]
Data type:  float32


In [36]:
# Type code 'c' is a 1-character byte string (S1), not complex float; use dtype=complex for complex
arr = np.array([1,3,5,7,9], dtype='c')

print('Array: ', arr)
print('Data type: ', arr.dtype)

Array:  [b'1' b'3' b'5' b'7' b'9']
Data type:  |S1


In [37]:
# Data type 'Unicode string'
arr = np.array([1,3,5,7,9], dtype='U')

print('Array: ', arr)
print('Data type: ', arr.dtype)

Array:  ['1' '3' '5' '7' '9']
Data type:  <U1


In [38]:
# Type code 'b' is a signed byte (int8), not boolean; use dtype=bool for boolean
arr = np.array([1,3,5,7,9], dtype='b')

print('Array: ', arr)
print('Data type: ', arr.dtype)

Array:  [1 3 5 7 9]
Data type:  int8
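The two outputs above show that `'c'` and `'b'` do not give complex or boolean arrays when passed to `dtype=`. A short sketch of how those two types can be requested instead; the input list `values` is made up for the example.

```python
import numpy as np

values = [0, 1, 3, 0, 9]

bool_arr = np.array(values, dtype=bool)          # same as dtype='?'
complex_arr = np.array(values, dtype=complex)    # same as dtype='D'

print(bool_arr, bool_arr.dtype)
print(complex_arr, complex_arr.dtype)

# The one-character string passed to dtype= and the resulting kind code differ
for code in ['b', '?', 'c', 'D']:
    dt = np.dtype(code)
    print(repr(code), '->', dt, 'kind:', dt.kind)
```

Zero becomes `False` and every non-zero value becomes `True` in the boolean array.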


## Numpy Random numbers

1. np.random.rand = generates an array of random numbers that are **uniformly distributed** over the interval [0, 1)


2. np.random.randn = generates an array of random numbers that are **normally distributed** with mean = 0 and stdev = 1


3. np.random.randint = generates an array of random integers that are **uniformly distributed** between a lower bound (inclusive, 0 by default) and an upper bound (exclusive)


4. np.random.uniform = generates an array of random floats that are **uniformly distributed** between a lower bound (inclusive) and an upper bound (exclusive)
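The bounds of the four functions above can be checked empirically. This is a small sketch: the seed is an assumption added only so reruns give the same numbers, and the sample size of 1000 is arbitrary.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed, only for reproducible reruns

u = np.random.rand(1000)                       # uniform on [0, 1)
z = np.random.randn(1000)                      # standard normal
k = np.random.randint(0, 5, size=1000)         # integers 0..4 (5 is excluded)
f = np.random.uniform(10, 50, size=1000)       # floats in [10, 50)

print('rand    min/max:', u.min(), u.max())
print('randn   mean/std:', z.mean(), z.std())
print('randint min/max:', k.min(), k.max())
print('uniform min/max:', f.min(), f.max())
```

The sample mean and standard deviation of `randn` will be close to 0 and 1 but not exactly equal, since they are estimates from a finite sample.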

In [41]:
%%time

# prints 5 random numbers generated from a uniform distribution (between 0 and 1)
arr = np.random.rand(5)

print('Array: ', arr)

Array:  [0.63844608 0.63014599 0.45708624 0.7101011  0.11611919]
CPU times: total: 0 ns
Wall time: 55.3 ms


In [42]:
%%time

# prints random array numbers with 5 rows, 3 columns, generated from a uniform distribution (between 0 and 1)
arr = np.random.rand(5,3)

print('Array: ', arr)

Array:  [[0.06961958 0.95656652 0.47302907]
 [0.55781346 0.67178947 0.20095381]
 [0.46863269 0.63416614 0.93686677]
 [0.62481294 0.84544864 0.26956353]
 [0.98763037 0.15735044 0.95136808]]
CPU times: total: 0 ns
Wall time: 999 µs


In [43]:
%%time

# randomly generates an array from a normal distribution
arr = np.random.randn(5)

print('Array: ', arr)

Array:  [-1.05578252 -1.03269187  0.29833901  0.06590327 -1.19803406]
CPU times: total: 15.6 ms
Wall time: 41.6 ms


In [44]:
# Two-by-four array of samples from the normal distribution with mean 3 and standard deviation 2.5:

arr = 3 + 2.5 * np.random.randn(2, 4)

print('Array: ', arr)

Array:  [[ 5.00892721  4.81495441 -1.25742362  2.11488244]
 [ 3.6647073  -1.66644095  1.198783    0.1050486 ]]


In [45]:
# randomly generates a 3 by 5 random array of numbers normally distributed
arr = np.random.randn(3,5)

print('Array: ', arr)

Array:  [[-0.65173926 -1.00272801  0.31318817 -1.81647601 -0.91675043]
 [-0.19339138  0.44733544  0.87475533  0.37247977 -0.13689342]
 [ 0.24062196  0.02249276 -0.41301169 -1.32458408 -0.71288266]]


In [52]:
# generates one random number between 0 to 9
value = np.random.randint(10)

print(value)

7


In [58]:
# generates a 1D array of 10 random integers between 0 and 4

np.random.randint(5, size=10)

array([3, 2, 2, 3, 1, 3, 3, 0, 2, 4], dtype=int32)

In [59]:
# Generates a 2 x 4 array of ints between 0 and 4, inclusive:

np.random.randint(5, size=(2,4))

array([[2, 0, 4, 2],
       [4, 4, 4, 3]], dtype=int32)

In [65]:
np.random.randint(2, 8)

6

In [71]:
# Generates a 1 x 3 array with 3 different upper bounds
np.random.randint(2, [5,8,10])

array([3, 7, 7], dtype=int32)

In [78]:
# Generates a 1 by 3 array with 3 different lower bounds
np.random.randint([2,5,7], 10)

array([6, 7, 8], dtype=int32)

In [79]:
# Generate a 2 by 4 array using broadcasting with dtype of uint8
np.random.randint([1, 3, 5, 7], [[10], [20]], dtype=np.uint8)

array([[ 7,  9,  5,  7],
       [ 2,  5, 18,  7]], dtype=uint8)

In [88]:
# Generate a 2 by 4 array using broadcasting with dtype of uint8
np.random.randint([1, 3, 5, 7], [[10], [20]], dtype=np.uint8)

array([[ 1,  9,  6,  9],
       [ 1,  3, 12,  7]], dtype=uint8)

In [81]:
# randomly generating a 5 by 10 array containing values in the range of 10 to 50
np.random.randint(10, 51, size=(5,10))

array([[50, 33, 30, 49, 12, 47, 47, 38, 20, 32],
       [44, 33, 11, 39, 38, 34, 19, 44, 17, 26],
       [35, 12, 35, 23, 24, 30, 25, 23, 46, 47],
       [31, 14, 43, 17, 48, 14, 11, 47, 25, 50],
       [13, 46, 22, 12, 50, 46, 43, 13, 46, 10]], dtype=int32)

In [84]:
# randomly generate one decimal number between 0 and 10
# (np.random.uniform(10) would set low=10 and keep high at its default of 1.0)
value = np.random.uniform(0, 10)

print(value)

8.430605568000571


In [89]:
# generate random uniform decimals from 10 to 50 exclusive with 2 by 3 dimension
np.random.uniform(10, 50, size=(2,3))

array([[38.73050124, 22.26716554, 37.49265805],
       [18.19149713, 49.56110742, 23.73392674]])

## Numpy array - indexing, slicing and updating

Data in numpy arrays is stored sequentially, so it can be accessed with indexing and slicing operations. Numpy offers more indexing facilities than regular python sequences: besides integers and slices, arrays can be indexed by arrays of integers and arrays of booleans.

In addition, numpy arrays are mutable, that is, data stored in numpy arrays can be changed or updated.

### Data Accessing Using Indexing

In [91]:
# randomly generating 1 dimensional array
arr = np.random.randint(100, size=(5,))

print('Array: ', arr)

Array:  [97 22 14 20 51]


In [92]:
# Accessing values at index 2 and 4
print(arr[2])
print(arr[4])

14
51


In [93]:
#randomly generating 2 dimensional array

arr = np.random.randint(100, size=(4,6))
print('Array: \n', arr)

Array: 
 [[32 80 13 49 19 29]
 [57 76 69 24 85 69]
 [ 4 66 76 78 80 94]
 [95 20 64 70 37 44]]


In [94]:
# accessing the row at index 2 (the third row)
# remember that a 2D array behaves like a list of rows

print(arr[2])

[ 4 66 76 78 80 94]


In [95]:
#to retrieve the second value (index 1) of the third row (index 2)

print(arr[2,1])

#OR

print(arr[2][1])
#arr[2,1] indexes in one step; arr[2][1] first extracts row 2, then indexes into it

66
66


In [96]:
#to retrieve the 4th and 2nd values from the first row

#the first list specifies the rows to select from
#the second list specifies the columns to select from

print(arr[[0,0], [3,1]])

[49 80]


In [97]:
#to retrieve the third value from the first row
#and the 4th value from the 4th row

print(arr[[0,3], [2,3]])

[13 70]


### Data Accessing Using Slicing

In [98]:
#randomly generating 1D array

import numpy as np

arr = np.random.randint(100, size=(10,))
print('Array: ', arr)

Array:  [95 26 25 22 82 83 86 73 65 50]


In [99]:
# accessing the data using slicing

print(arr[1:4])
print(arr[0:-4])

[26 25 22]
[95 26 25 22 82 83]


In [100]:
# retrieving data from the beginning to the end, at 2 steps interval
print(arr[::2])

[95 25 82 86 65]


In [101]:
# creating numpy array
arr = np.array([[1,2,3,4], [5,6,7,8], [9,0,1,2], [3,4,5,6]])
print('Array: \n', arr)
print('Shape: ', arr.shape)

Array: 
 [[1 2 3 4]
 [5 6 7 8]
 [9 0 1 2]
 [3 4 5 6]]
Shape:  (4, 4)


In [102]:
# slicing rows and columns at once: rows 0-1, columns 2-3
print(arr[0:2, 2:4])

[[3 4]
 [7 8]]


### Indexing with Boolean Arrays

In [103]:
# randomly generating one dimensional array
arr = np.random.randint(100, size=(10,))

print('Array: ', arr)

Array:  [77 61 80 95 66 52 78 65 36 79]


In [106]:
index = [True, False, False, True, False, True, True, True, False, True]

print(arr[index])

[77 95 52 78 65 79]
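In practice the boolean index is rarely typed by hand; it usually comes from a comparison on the array itself. A minimal sketch using the same values as the array printed above (the thresholds 70 and 60 are arbitrary choices for illustration):

```python
import numpy as np

arr = np.array([77, 61, 80, 95, 66, 52, 78, 65, 36, 79])

mask = arr > 70            # boolean array with the same shape as arr
print(mask)
print(arr[mask])           # only the elements where mask is True

# Conditions combine with & (and), | (or), ~ (not); the parentheses are required
even_and_large = arr[(arr % 2 == 0) & (arr > 60)]
print(even_and_large)
```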


### Updating Data in Numpy Array

In [110]:
# randomly generate 2 dimensional array
arr = np.random.randint(100, size=(5,2))

print('Original Array: \n', arr)

Original Array: 
 [[73 32]
 [18 78]
 [63 74]
 [54 44]
 [47 44]]


In [112]:
# Updated Array
arr[1,1] = 20

print('Updated Array: \n', arr)

Updated Array: 
 [[73 32]
 [18 20]
 [63 74]
 [54 44]
 [47 44]]


### Numpy Flatten and Ravel

In [113]:
#randomly generating a 5 by 10 array

arr = np.random.randint(10, 40, size = (5,10))

print('Array: \n', arr)
print('Shape: ', arr.shape)

Array: 
 [[23 37 21 15 18 39 26 27 28 15]
 [23 15 32 17 19 31 15 11 17 39]
 [24 14 31 37 11 30 28 19 15 35]
 [26 31 19 12 32 14 10 22 38 37]
 [27 16 13 37 30 28 21 14 20 22]]
Shape:  (5, 10)


In [114]:
# let's flatten the array using flatten() method
flatten_arr = arr.flatten()

print('Flattened Array: ', flatten_arr)
print('Shape: ', flatten_arr.shape)

Flattened Array:  [23 37 21 15 18 39 26 27 28 15 23 15 32 17 19 31 15 11 17 39 24 14 31 37
 11 30 28 19 15 35 26 31 19 12 32 14 10 22 38 37 27 16 13 37 30 28 21 14
 20 22]
Shape:  (50,)


In [124]:
flatten_arr = arr.flatten('F')

print('Flatten in Column-major: \n', flatten_arr)

Flatten in Column-major: 
 [59 56 81 58 54 52 52 89 65 69 65 63 74 69 53 53 71 84 62 71 68 87 82 62
 82 81 60 70 79 80 52 65 60 85 84 89 88 61 79 86 68 66 66 63 66 67 74 82
 79 72]


In [115]:
# let's ravel the array using ravel() method
ravel_arr = arr.ravel()

print('Raveled Array: ', ravel_arr)
print('Shape: ', ravel_arr.shape)

Raveled Array:  [23 37 21 15 18 39 26 27 28 15 23 15 32 17 19 31 15 11 17 39 24 14 31 37
 11 30 28 19 15 35 26 31 19 12 32 14 10 22 38 37 27 16 13 37 30 28 21 14
 20 22]
Shape:  (50,)


Both methods produce the same flattened result. So what's the difference? Let's see...

In [116]:
# updating value in the array
print('Original array: \n', arr)

arr[1,1] = 78

print('Updated array: \n', arr)

Original array: 
 [[23 37 21 15 18 39 26 27 28 15]
 [23 15 32 17 19 31 15 11 17 39]
 [24 14 31 37 11 30 28 19 15 35]
 [26 31 19 12 32 14 10 22 38 37]
 [27 16 13 37 30 28 21 14 20 22]]
Updated array: 
 [[23 37 21 15 18 39 26 27 28 15]
 [23 78 32 17 19 31 15 11 17 39]
 [24 14 31 37 11 30 28 19 15 35]
 [26 31 19 12 32 14 10 22 38 37]
 [27 16 13 37 30 28 21 14 20 22]]


In [117]:
flatten_arr

array([23, 37, 21, 15, 18, 39, 26, 27, 28, 15, 23, 15, 32, 17, 19, 31, 15,
       11, 17, 39, 24, 14, 31, 37, 11, 30, 28, 19, 15, 35, 26, 31, 19, 12,
       32, 14, 10, 22, 38, 37, 27, 16, 13, 37, 30, 28, 21, 14, 20, 22],
      dtype=int32)

The difference between the two is as follows:

- Flatten always creates a copy of the original array with a flattened layout. Any modifications made to the flattened array won't affect the original, and vice versa. Because it creates a new copy, it consumes more memory.

- Ravel returns a view of the original array whenever possible, and a copy only when a view cannot be made. It is therefore more memory efficient. When a view is returned, any changes made to the raveled array will also modify the original array, since they point to the same data.
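The copy versus view behaviour can be checked directly with `np.shares_memory`. This is a small sketch on a made-up 2 by 3 array rather than the random array used above.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

flat_copy = arr.flatten()   # always a copy
flat_view = arr.ravel()     # a view here, because arr is contiguous in memory

arr[0, 0] = 99              # modify the original

print(flat_copy[0])         # unchanged: the copy has its own data
print(flat_view[0])         # changed: the view shares data with arr

print(np.shares_memory(arr, flat_copy))
print(np.shares_memory(arr, flat_view))

# ravel falls back to a copy when no view is possible, e.g. on a transpose
transposed = arr.T
print(np.shares_memory(transposed, transposed.ravel()))
```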

### Numpy Reshape

In [277]:
#randomly generating a 5 by 10 array

arr = np.random.randint(50, 90, size=(5,10))

print('Array: \n', arr)
print('Shape: ', arr.shape)

Array: 
 [[84 54 85 61 64 78 78 55 77 64]
 [85 64 69 80 65 72 87 63 69 81]
 [53 66 82 76 54 55 51 69 86 56]
 [71 68 51 71 57 65 73 72 68 88]
 [74 80 67 53 77 70 75 69 82 85]]
Shape:  (5, 10)


In [135]:
np.ravel(arr)

array([59, 52, 65, 53, 68, 81, 52, 89, 68, 67, 56, 52, 63, 71, 87, 60, 65,
       88, 66, 74, 81, 89, 74, 84, 82, 70, 60, 61, 66, 82, 58, 65, 69, 62,
       62, 79, 85, 79, 63, 79, 54, 69, 53, 71, 82, 80, 84, 86, 66, 72],
      dtype=int32)

In [119]:
# reshaping the array
arr_reshaped = arr.reshape(10,5)
print(arr_reshaped)

[[59 52 65 53 68]
 [81 52 89 68 67]
 [56 52 63 71 87]
 [60 65 88 66 74]
 [81 89 74 84 82]
 [70 60 61 66 82]
 [58 65 69 62 62]
 [79 85 79 63 79]
 [54 69 53 71 82]
 [80 84 86 66 72]]


In [122]:
arr_reshaped = arr.reshape(2,25)
print(arr_reshaped)

[[59 52 65 53 68 81 52 89 68 67 56 52 63 71 87 60 65 88 66 74 81 89 74 84
  82]
 [70 60 61 66 82 58 65 69 62 62 79 85 79 63 79 54 69 53 71 82 80 84 86 66
  72]]


In [125]:
arr_reshaped = arr.reshape(3,6)
print(arr_reshaped)

ValueError: cannot reshape array of size 50 into shape (3,6)

We have a ValueError. **Why did this Error Occur?**

The reshape function requires that the total number of elements in the new shape matches the number of elements in the original array. Here’s a breakdown:

- The array you are trying to reshape has 50 elements.
- The target shape (3, 6) would require 3 × 6 = 18 elements.
- Since 50 and 18 do not match, the reshape operation cannot be completed, resulting in this ValueError.
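The element-count rule can be sketched as follows. Passing `-1` for one dimension asks numpy to infer it from the array size. The array here is a stand-in built with `np.arange(50)`, not the random array from the cells above.

```python
import numpy as np

arr = np.arange(50)

print(arr.reshape(5, 10).shape)    # 5 * 10 == 50, so this works
print(arr.reshape(-1, 25).shape)   # -1 is inferred as 50 / 25 == 2

# Check the element count before reshaping to avoid the ValueError
rows, cols = 3, 6
if arr.size == rows * cols:
    reshaped = arr.reshape(rows, cols)
else:
    print(f'cannot reshape {arr.size} elements into ({rows}, {cols}); '
          f'that shape needs {rows * cols} elements')
```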

### Iterating Over Numpy Arrays

In [126]:
# generating random one dimensional array
arr1 = np.random.randint(10, 49, size=(10,))

print('Array: \n', arr1)
print('Shape: ', arr1.shape)

Array: 
 [22 45 33 17 36 35 45 20 13 48]
Shape:  (10,)


In [129]:
arr1

array([22, 45, 33, 17, 36, 35, 45, 20, 13, 48], dtype=int32)

In [130]:
# looping over items in the array
for i in arr1:
    print(i, sep=',')

22
45
33
17
36
35
45
20
13
48


In [132]:
# redoing the above
for i in arr1:
    print(i, end=' ')

22 45 33 17 36 35 45 20 13 48 

### Iterating Over a 2 Dimensional Array

In [133]:
#randomly generating a 5 by 10 array

arr2 = np.random.randint(50, 90, size = (5,10))

print('Array: \n', arr2)
print('Shape: ', arr2.shape)

Array: 
 [[64 60 72 85 78 63 79 69 75 56]
 [88 74 52 53 55 68 76 51 73 55]
 [87 61 66 74 61 80 64 80 80 67]
 [57 71 68 84 61 77 71 89 70 50]
 [72 56 73 65 85 70 74 62 50 69]]
Shape:  (5, 10)


In [136]:
# looping over rows in the 2D array
for row in arr2:
    print(row)

[64 60 72 85 78 63 79 69 75 56]
[88 74 52 53 55 68 76 51 73 55]
[87 61 66 74 61 80 64 80 80 67]
[57 71 68 84 61 77 71 89 70 50]
[72 56 73 65 85 70 74 62 50 69]


In [137]:
arr2

array([[64, 60, 72, 85, 78, 63, 79, 69, 75, 56],
       [88, 74, 52, 53, 55, 68, 76, 51, 73, 55],
       [87, 61, 66, 74, 61, 80, 64, 80, 80, 67],
       [57, 71, 68, 84, 61, 77, 71, 89, 70, 50],
       [72, 56, 73, 65, 85, 70, 74, 62, 50, 69]], dtype=int32)

In [138]:
# looping through each item in the arr2 array
for item in arr2.ravel():
    print(item, end=' ')

64 60 72 85 78 63 79 69 75 56 88 74 52 53 55 68 76 51 73 55 87 61 66 74 61 80 64 80 80 67 57 71 68 84 61 77 71 89 70 50 72 56 73 65 85 70 74 62 50 69 

### Iterating using np.nditer()

In [139]:
# using nditer() for the one dimensional array
for item in np.nditer(arr1):
    print(item, end=' ')

22 45 33 17 36 35 45 20 13 48 

In [140]:
# using nditer() for the 2D array
for item in np.nditer(arr2):
    print(item, end=' ')

64 60 72 85 78 63 79 69 75 56 88 74 52 53 55 68 76 51 73 55 87 61 66 74 61 80 64 80 80 67 57 71 68 84 61 77 71 89 70 50 72 56 73 65 85 70 74 62 50 69 

In [141]:
# performing some calculations with nditer()
print('Original Array: ', arr1)

for item in np.nditer(arr1):
    if item > 20:
        item[...] = (item*0)
    print('Updated Array: ', arr1)

Original Array:  [22 45 33 17 36 35 45 20 13 48]


ValueError: assignment destination is read-only

We get this error because, by default, `np.nditer` treats the array it iterates over as read-only. Each `item` still refers to the original data, but writing to it is not allowed, hence the error. Passing `op_flags=['readwrite']` makes the elements writable.

In [147]:
# trying again
print('Original Array: ', arr1)

for item in np.nditer(arr1, op_flags = ['readwrite']):
    if item > 20:
        item[...] = (item * 0)
print('Updated Array: ', arr1)

Original Array:  [ 0  0  0 17  0  0  0 20 13  0]
Updated Array:  [ 0  0  0 17  0  0  0 20 13  0]


### **Exercise 1**

Write a program to generate a random array of shape 5 by 4 containing positive integers. Perform an update by replacing all odd numbers with -1 (using a loop).

In [151]:
# generating random integer numbers with 5 by 4 dimension
arr = np.random.randint(10, 51, size=(5,4))
print('Original Array: \n', arr)

for item in np.nditer(arr, op_flags=['readwrite']):
    if item%2 != 0:
        item[...] = -1
print('Replaced odd numbers: \n', arr)

Original Array: 
 [[50 32 23 34]
 [11 10 16 37]
 [24 35 44 33]
 [44 41 34 46]
 [31 15 26 44]]
Replaced odd numbers: 
 [[50 32 -1 34]
 [-1 10 16 -1]
 [24 -1 44 -1]
 [44 -1 34 46]
 [-1 -1 26 44]]


### **Exercise 2**

Given an array [1, -10, 2, 3, 0, 6], print the array in this order [0, 6, -10, 2, 1, 3]

In [159]:
arr1 = np.array([1, -10, 2, 3, 0, 6])

arr2 = np.concatenate((arr1[4:6], arr1[1:3], np.array([arr1[0]]), np.array([arr1[3]])))

arr2

array([  0,   6, -10,   2,   1,   3])
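The same reordering can also be sketched with integer array indexing, which avoids building the pieces separately. The list `order` is a name introduced here for illustration; it holds the source index of each output position.

```python
import numpy as np

arr1 = np.array([1, -10, 2, 3, 0, 6])

# Position k of the result takes the element at index order[k] of arr1
order = [4, 5, 1, 2, 0, 3]
reordered = arr1[order]

print(reordered)   # [  0   6 -10   2   1   3]
```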

### Python Operators on Numpy Array

In [161]:
# create a numpy array
x = np.array([[1,2,5,7,6], [4,2,7,5,9]])

print('Array: \n', x)

Array: 
 [[1 2 5 7 6]
 [4 2 7 5 9]]


In [162]:
print(x+5)

[[ 6  7 10 12 11]
 [ 9  7 12 10 14]]


In [163]:
print(x%2)

[[1 0 1 1 0]
 [0 0 1 1 1]]


In [166]:
print(x>=3)

[[False False  True  True  True]
 [ True False  True  True  True]]


In [167]:
print(x//2)

[[0 1 2 3 3]
 [2 1 3 2 4]]


### Exercise

Write a program to generate a random array of shape 5 by 4 containing positive integers. Perform an update by replacing all odd numbers with -1 (without using a loop).

In [197]:
# creating a random integer array of 5 by 4 dimensions
arr = np.random.randint(100, size=(5,4))
arr

array([[53, 88, 94, 97],
       [13, 66, 77, 98],
       [ 5, 40, 36, 51],
       [54,  9, 76, 73],
       [61, 37, 35, 65]], dtype=int32)

In [198]:
index = (arr%2 != 0)
index

array([[ True, False, False,  True],
       [ True, False,  True, False],
       [ True, False, False,  True],
       [False,  True, False,  True],
       [ True,  True,  True,  True]])

In [199]:
#making a copy of the array
arr_copy = arr.copy()

arr_copy[index] = -1
arr_copy

array([[-1, 88, 94, -1],
       [-1, 66, -1, 98],
       [-1, 40, 36, -1],
       [54, -1, 76, -1],
       [-1, -1, -1, -1]], dtype=int32)

### Exercise

Write a program to filter the values from the array based on the conditions below:

- Either the value is divisible by 5,
- or the value is an odd number and a multiple of 7.

In [200]:
# make a copy of the array
arr_copy2 = arr.copy()
arr_copy2

array([[53, 88, 94, 97],
       [13, 66, 77, 98],
       [ 5, 40, 36, 51],
       [54,  9, 76, 73],
       [61, 37, 35, 65]], dtype=int32)

In [205]:
# keep values that satisfy either condition; set the rest to 0
for item in np.nditer(arr_copy2, op_flags=['readwrite']):
    if item%5 == 0 or (item%2 != 0 and item%7 == 0):
        item[...] = item
    else:
        item[...] = 0
print(arr_copy2)

[[ 0  0  0  0]
 [ 0  0 77  0]
 [ 5 40  0  0]
 [ 0  0  0  0]
 [ 0  0 35 65]]
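The same filter can be sketched without a loop by building a boolean mask. This assumes the second condition is read as "odd and a multiple of 7"; the array is a copy of the values shown above, and the names `divisible_by_5`, `odd_multiple_of_7` and `keep` are introduced here for illustration.

```python
import numpy as np

arr = np.array([[53, 88, 94, 97],
                [13, 66, 77, 98],
                [ 5, 40, 36, 51],
                [54,  9, 76, 73],
                [61, 37, 35, 65]])

divisible_by_5 = arr % 5 == 0
odd_multiple_of_7 = (arr % 2 != 0) & (arr % 7 == 0)   # assumed reading of the condition
keep = divisible_by_5 | odd_multiple_of_7

print(arr[keep])                 # matching values as a 1D array
print(np.where(keep, arr, 0))    # same shape, non-matching values set to 0
```

`arr[keep]` returns only the matching values, while `np.where(keep, arr, 0)` preserves the 5 by 4 layout.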


## Numpy Maths

- `np.sqrt()`
- `np.exp()`
- `np.sin()`
- `np.cos()`
- etcetera

### Element wise operations

- `np.add()`
- `np.subtract()`
- `np.multiply()`
- `np.divide()`

### Matrix multiplication
For 2D arrays, each of the options below produces the same result

- `np.matmul()`
- `np.dot()`
- `@`

### Others
- `np.diag()`
- `T` - for transpose

In [207]:
arr = np.array([[1,2,3], [4,5,6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [209]:
print('Square Root: \n', np.sqrt(arr))
print('Exponential: \n', np.exp(arr))
print('Sine: \n', np.sin(arr))
print('Cosine: \n', np.cos(arr))

Square Root: 
 [[1.         1.41421356 1.73205081]
 [2.         2.23606798 2.44948974]]
Exponential: 
 [[  2.71828183   7.3890561   20.08553692]
 [ 54.59815003 148.4131591  403.42879349]]
Sine: 
 [[ 0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155 ]]
Cosine: 
 [[ 0.54030231 -0.41614684 -0.9899925 ]
 [-0.65364362  0.28366219  0.96017029]]


In [210]:
## Element wise operations
x = np.random.randint(10,21, size = (2,2))

y = np.random.randint(10,21, size = (2,2))

print('First array: \n', x)
print('Second array: \n', y)

First array: 
 [[13 11]
 [15 14]]
Second array: 
 [[12 16]
 [15 10]]


In [211]:
# The prefix E stands for element-wise
print('EAddition: \n', np.add(x,y))
print('ESubtraction: \n', np.subtract(x,y))
print('EMultiply: \n', np.multiply(x,y))
print('EDivide: \n', np.divide(x,y))

EAddition: 
 [[25 27]
 [30 24]]
ESubtraction: 
 [[ 1 -5]
 [ 0  4]]
EMultiply: 
 [[156 176]
 [225 140]]
EDivide: 
 [[1.08333333 0.6875    ]
 [1.         1.4       ]]


In [212]:
## Matrix multiplication --> Let MM represent Matrix Multiplication

print('MM (way-1): \n', np.matmul(x,y), '\n')
print('MM (way-2): \n', np.dot(x,y), '\n')
print('MM (way-3): \n', x @ y, '\n')

MM (way-1): 
 [[321 318]
 [390 380]] 

MM (way-2): 
 [[321 318]
 [390 380]] 

MM (way-3): 
 [[321 318]
 [390 380]] 



In [213]:
#to retrieve the diagonal elements

x = np.random.randint(10,25, size = (2,3))

print('Original array: \n', x, '\n')

print('Diagonal values: \n', np.diag(x))

Original array: 
 [[11 12 16]
 [13 10 11]] 

Diagonal values: 
 [11 10]

In [214]:
#to transpose an array

print('Transposed matrix: \n', x.T)

Transposed matrix: 
 [[11 13]
 [12 10]
 [16 11]]


## Numpy statistics
You will often need to specify the axis along which an operation is applied.

`axis = 0` ==> Column wise operation (one result per column)

`axis = 1` ==> Row wise operation (one result per row)

- `np.sum()`
- `np.min() and np.max()`
- `np.median(), np.mean()`
- `np.var()`
- `np.std()`
- `np.corrcoef()` - calculates the Pearson product-moment correlation coefficient between two sets of data. It indicates the strength and direction of the linear relationship between two variables.

In [220]:
# create an array of 3 x 3 dimension using arange().reshape() function
arr = np.arange(9).reshape(3,3)

arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [231]:
print('Min: ', np.min(arr))
print('Sum: ', np.sum(arr))
print('Var: ', round(np.var(arr), 2))
print('Std: ', round(np.std(arr), 2))

print('Row-sum: ', np.sum(arr, axis=1))
print('Column-sum: ', np.sum(arr, axis=0))
print('Row-median: ', np.median(arr, axis=1))
print('Column-median: ', np.median(arr, axis=0))

Min:  0
Sum:  36
Var:  6.67
Std:  2.58
Row-sum:  [ 3 12 21]
Column-sum:  [ 9 12 15]
Row-median:  [1. 4. 7.]
Column-median:  [3. 4. 5.]


In [232]:
heights = np.random.randint(150, 200, size=(6,))

weights = np.random.randint(40, 61, size=(6,))

np.corrcoef(heights, weights, rowvar=False)

array([[ 1.        , -0.03535923],
       [-0.03535923,  1.        ]])

## More Numpy Functions

### Linspace
Generates an array of evenly spaced numbers over a specified interval. The end value is included unless `endpoint=False` is passed.

`Syntax => np.linspace(start, stop, num)` where `num` is the number of elements


In [233]:
print(np.linspace(10, 25, 8))

[10.         12.14285714 14.28571429 16.42857143 18.57142857 20.71428571
 22.85714286 25.        ]


In [253]:
print(np.linspace(10, 25, 8, endpoint=False))

[10.    11.875 13.75  15.625 17.5   19.375 21.25  23.125]


### Sorting

Sorting can be done row wise or column wise. By default, `np.sort` sorts along the last axis, which is row wise for a 2D array.

`Syntax => np.sort(array, axis = )`

In [234]:
arr = np.random.randint(10, 50, size=(5,10))
arr

array([[47, 40, 16, 39, 31, 27, 38, 41, 45, 11],
       [32, 21, 24, 40, 36, 20, 27, 46, 11, 46],
       [31, 46, 41, 43, 14, 18, 17, 28, 16, 14],
       [26, 21, 28, 32, 37, 43, 30, 36, 32, 15],
       [44, 42, 43, 29, 37, 29, 31, 18, 40, 27]], dtype=int32)

In [235]:
print(np.sort(arr))

[[11 16 27 31 38 39 40 41 45 47]
 [11 20 21 24 27 32 36 40 46 46]
 [14 14 16 17 18 28 31 41 43 46]
 [15 21 26 28 30 32 32 36 37 43]
 [18 27 29 29 31 37 40 42 43 44]]


In [236]:
# Column-wise sorting by setting axis = 0
print(np.sort(arr, axis=0))

[[26 21 16 29 14 18 17 18 11 11]
 [31 21 24 32 31 20 27 28 16 14]
 [32 40 28 39 36 27 30 36 32 15]
 [44 42 41 40 37 29 31 41 40 27]
 [47 46 43 43 37 43 38 46 45 46]]


In [237]:
# Row-wise sorting by setting axis = 1
print(np.sort(arr, axis=1))

[[11 16 27 31 38 39 40 41 45 47]
 [11 20 21 24 27 32 36 40 46 46]
 [14 14 16 17 18 28 31 41 43 46]
 [15 21 26 28 30 32 32 36 37 43]
 [18 27 29 29 31 37 40 42 43 44]]


### Stacking

Horizontal stacking = adds arrays side by side. Here, the number of rows in both arrays has to be the same.

`Syntax ==> np.hstack([array1, array2])`


Vertical stacking = adds arrays on top of each other, that is, vertically. Here, the number of columns in both arrays has to be the same.

`Syntax ==> np.vstack([array1, array2])`
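The shape requirements can be sketched with arrays of ones: `hstack` needs matching row counts and `vstack` needs matching column counts. The arrays `a`, `b` and `c` are made up for the example.

```python
import numpy as np

a = np.ones((2, 5))
b = np.ones((2, 3))   # same number of rows as a, different columns
c = np.ones((4, 5))   # same number of columns as a, different rows

print(np.hstack([a, b]).shape)   # (2, 8): columns add up
print(np.vstack([a, c]).shape)   # (6, 5): rows add up

# Stacking vertically with mismatched column counts raises a ValueError
try:
    np.vstack([a, b])
except ValueError as err:
    print('vstack failed:', err)
```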

In [238]:
arr1 = np.arange(5, 15).reshape(2,5)
arr1

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [240]:
arr2 = np.arange(25, 35).reshape(2,5)
arr2

array([[25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34]])

In [244]:
# Vertical stacking
v_stack = np.vstack([arr1, arr2])
v_stack

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34]])

In [245]:
# horizontal stacking
h_stack = np.hstack([arr1, arr2])
h_stack

array([[ 5,  6,  7,  8,  9, 25, 26, 27, 28, 29],
       [10, 11, 12, 13, 14, 30, 31, 32, 33, 34]])

### Concatenate

- Horizontal concatenation = horizontal (column wise) stacking
`Syntax ==> np.concatenate([array1, array2], axis = 1)`

- Vertical concatenation = vertical (row wise) stacking
`Syntax ==> np.concatenate([array1, array2], axis = 0)`

In [246]:
# horizontal concatenation
h_concat = np.concatenate([arr1, arr2], axis=1)
h_concat

array([[ 5,  6,  7,  8,  9, 25, 26, 27, 28, 29],
       [10, 11, 12, 13, 14, 30, 31, 32, 33, 34]])

In [247]:
# vertical concatenation
v_concat = np.concatenate([arr1, arr2], axis=0)
v_concat

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34]])

### Append

- Horizontal append = horizontal (row) stacking
`Syntax ==> np.append(array1, array2, axis = 1)`


- Vertical append = vertical (row-wise) stacking
`Syntax ==> np.append(array1, array2, axis = 0)`

In [255]:
# horizontal append
h_append = np.append(arr1, arr2, axis=1)
h_append

array([[ 5,  6,  7,  8,  9, 25, 26, 27, 28, 29],
       [10, 11, 12, 13, 14, 30, 31, 32, 33, 34]])

In [249]:
# vertical append
v_append = np.append(arr1, arr2, axis=0)
v_append

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34]])
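As a quick check, the sketch below rebuilds `arr1` and `arr2` and confirms that stacking, concatenating and appending give identical results along the same axis. It also shows one thing to watch for: when `axis` is left out, `np.append` flattens both arrays first.

```python
import numpy as np

arr1 = np.arange(5, 15).reshape(2, 5)
arr2 = np.arange(25, 35).reshape(2, 5)

# Side by side: hstack, concatenate(axis=1) and append(axis=1) agree
h = np.hstack([arr1, arr2])
same_h = (np.array_equal(h, np.concatenate([arr1, arr2], axis=1))
          and np.array_equal(h, np.append(arr1, arr2, axis=1)))

# On top of each other: vstack, concatenate(axis=0) and append(axis=0) agree
v = np.vstack([arr1, arr2])
same_v = (np.array_equal(v, np.concatenate([arr1, arr2], axis=0))
          and np.array_equal(v, np.append(arr1, arr2, axis=0)))

print(same_h, same_v)            # True True

# Without axis, np.append flattens both inputs into a 1-D array
flat = np.append(arr1, arr2)
print(flat.shape)                # (20,)
```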

## Where - np.where()

Processes array elements conditionally.

`np.where(condition, x,y)`

This reads: Where True, yield x, otherwise y.

In [250]:
arr = np.arange(50,100).reshape(5,10)
arr

array([[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

In [251]:
# using where() function
where = np.where(arr < 65, 0, 1)
where

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

In [252]:
#another example
#where arr is > 70 return arr/10, else return the arr value

where = np.where(arr>70, arr/10, arr)
where

array([[50. , 51. , 52. , 53. , 54. , 55. , 56. , 57. , 58. , 59. ],
       [60. , 61. , 62. , 63. , 64. , 65. , 66. , 67. , 68. , 69. ],
       [70. ,  7.1,  7.2,  7.3,  7.4,  7.5,  7.6,  7.7,  7.8,  7.9],
       [ 8. ,  8.1,  8.2,  8.3,  8.4,  8.5,  8.6,  8.7,  8.8,  8.9],
       [ 9. ,  9.1,  9.2,  9.3,  9.4,  9.5,  9.6,  9.7,  9.8,  9.9]])
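Two details of `np.where()` are worth a short sketch. First, when only the condition is passed, it returns the indices where the condition is True instead of a new array of values. Second, the result above is all floats because `arr/10` is a float array, so the integers taken from `arr` are converted to match.

```python
import numpy as np

arr = np.arange(50, 100).reshape(5, 10)

# Condition only: returns one index array per dimension
rows, cols = np.where(arr > 95)
print(rows)             # [4 4 4 4]
print(cols)             # [6 7 8 9]
print(arr[rows, cols])  # [96 97 98 99]

# Mixing a float branch with an integer branch gives a float result
mixed = np.where(arr > 70, arr / 10, arr)
print(mixed.dtype)      # float64
```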

## argsort

Used to sort an array indirectly.

- Instead of modifying the original array, it returns an array of indices that would sort the original array (in ascending order by default)

- This output contains the positions/indices of the elements in the original array that would result in the sorted order

- These indices can then be used to index the original array and get the actual sorted array

In [256]:
# argsort()
arr = np.array([10, -5, 8, -3])

print('Indices: ', arr.argsort())

print('Sorted array: ', arr[arr.argsort()])

Indices:  [1 3 2 0]
Sorted array:  [-5 -3  8 10]


A bit confusing, right?

Here's how it works:

We know that indices always start at 0, so the elements of the array have the following indices:

`[10, -5, 8, -3]` → `[0, 1, 2, 3]`

This is the natural index of each number in the array. `argsort()` returns these natural indices rearranged into the order that sorts the values in ascending order: the smallest value (-5) sits at index 1, the next (-3) at index 3, then 8 at index 2, and finally 10 at index 0.

| My array | Natural index | argsort() |
| --- | --- | --- |
| `[10, -5, 8, -3]` | `[0, 1, 2, 3]` | `[1, 3, 2, 0]` |
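Because `argsort()` gives back indices and not values, the same order can be reused in other ways. The sketch below reverses it for a descending sort and applies it to a second array. The `labels` array is a made-up example, not part of the original notebook.

```python
import numpy as np

arr = np.array([10, -5, 8, -3])
order = arr.argsort()            # [1 3 2 0]

# Reverse the index order to sort in descending order
desc = arr[order[::-1]]
print(desc)                      # [10  8 -3 -5]

# Reorder a second array so it follows the sorted values of arr
labels = np.array(['a', 'b', 'c', 'd'])   # hypothetical labels, one per value
print(labels[order])             # ['b' 'd' 'c' 'a']
```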



## Numpy broadcasting

This is a mechanism that allows us to perform arithmetic operations on arrays with different shapes under certain conditions. It automates the process of making arrays compatible for element-wise operations. How does it work?

- Numpy needs arrays to be compatible shape-wise to allow for element-wise operations. Broadcasting comes into play when the shapes are not exactly the same.

- It allows a smaller array to be stretched to match the shape of the bigger one

In [258]:
# broadcasting in numpy
arr1 = np.array([[1,2,3], [4,5,6]])

arr2 = np.array([1,2,3])

print(arr1+arr2)

[[2 4 6]
 [5 7 9]]


In [259]:
arr1 = np.array([[1,2,3], [4,5,6]])

arr2 = np.array([1])

print(arr1+arr2)

[[2 3 4]
 [5 6 7]]


In [260]:
arr1 = np.array([[1,2,3], [4,5,6]])

arr2 = np.array([[1],[2]])

print(arr1+arr2)

[[2 3 4]
 [6 7 8]]


In [261]:
arr1 = np.array([[1],[2],[3], [4],[5],[6]])

arr2 = np.array([1,2,3])

print('Array 1: \n', arr1, '\n')

print('Array 2: \n', arr2, '\n')

print('Result: \n', arr1+arr2)

Array 1: 
 [[1]
 [2]
 [3]
 [4]
 [5]
 [6]] 

Array 2: 
 [1 2 3] 

Result: 
 [[2 3 4]
 [3 4 5]
 [4 5 6]
 [5 6 7]
 [6 7 8]
 [7 8 9]]
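The "certain conditions" can be summed up as follows: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below uses placeholder arrays of ones to show the resulting shape for compatible pairs, and what happens when the shapes cannot be matched.

```python
import numpy as np

a = np.ones((2, 3))
b = np.ones((3,))     # trailing dimension 3 matches 3
c = np.ones((2, 1))   # the 1 is stretched to 3
d = np.ones((2,))     # trailing dimension 2 does not match 3

print(np.broadcast(a, b).shape)   # (2, 3)
print(np.broadcast(a, c).shape)   # (2, 3)

# Incompatible shapes raise a ValueError
try:
    a + d
    compatible = True
except ValueError as err:
    compatible = False
    print("Cannot broadcast:", err)

# Adding a new axis turns d into shape (2, 1), which is compatible
fixed = a + d[:, np.newaxis]
print(fixed.shape)                # (2, 3)
```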


## Numpy masking

This is the process of creating a boolean array (also called a mask) based on certain conditions applied to another array. This boolean array acts as a filter, indicating which elements of the original array satisfy the specified condition. You can also pass the condition to np.where() to get the indices of the elements that satisfy it.

In [266]:
# masking in numpy
import numpy as np

arr = np.random.randint(10, 31, size=(10,))

# creating a mask for elements that are even
mask = (arr%2 == 0)

# using the mask to filter elements from the original array
filtered_arr = arr[mask]

print('Original array: ', arr)
print('Mask: ', mask)

print('Filtered array: ', filtered_arr)

Original array:  [17 25 28 11 14 20 25 27 15 12]
Mask:  [False False  True False  True  True False False False  True]
Filtered array:  [28 14 20 12]
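Masks can also be combined and used for assignment. The sketch below reuses the values printed above as a fixed array, so the result is reproducible. It combines two conditions with `&` (each condition needs its own parentheses) and then uses a mask to overwrite values in a copy.

```python
import numpy as np

arr = np.array([17, 25, 28, 11, 14, 20, 25, 27, 15, 12])

# Combine conditions with & (and) or | (or)
mask = (arr % 2 == 0) & (arr > 13)
print(arr[mask])        # [28 14 20]

# A mask can also select the elements to overwrite
capped = arr.copy()
capped[capped > 20] = 20
print(capped)           # [17 20 20 11 14 20 20 20 15 12]
```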


## Reading CSV file into a Numpy Array

In [267]:
import numpy as np

In [274]:
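# encoding='utf-8-sig' strips the byte order mark (BOM) at the start of the file
csv_file = np.loadtxt('weather_data.csv', dtype='str', delimiter=',', encoding='utf-8-sig')
print(csv_file)

[['day' 'temperature' 'windspeed' 'event']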
 ['1/1/2017' '32' '6' 'Rain']
 ['1/2/2017' '35' '7' 'Sunny']
 ['1/3/2017' '28' '2' 'Snow']
 ['1/4/2017' '24' '7' 'Snow']
 ['1/5/2017' '32' '4' 'Rain']
 ['1/6/2017' '31' '2' 'Sunny']]
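Loading everything as strings keeps the header and the text columns, but the numbers cannot be used in calculations. One option, sketched below, is to skip the header row and load only the numeric columns. To keep the example self-contained it reads the first rows shown above from an in-memory string (`io.StringIO`) instead of the actual `weather_data.csv` file.

```python
import io
import numpy as np

# In-memory stand-in for the CSV file, copied from the output above
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
"""

# skiprows=1 drops the header; usecols picks temperature and windspeed
numeric = np.loadtxt(io.StringIO(csv_text), delimiter=',',
                     skiprows=1, usecols=(1, 2), dtype=int)

print(numeric)
print('Mean temperature:', numeric[:, 0].mean())
print('Max windspeed:', numeric[:, 1].max())
```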


# Task -  Dice Rolling Simulation

Create a simulation of a dice rolling game using NumPy. The game will involve rolling two dice and calculating the sum of their values. Players will guess whether the next roll will result in a higher, lower, or equal sum compared to the previous roll. The simulation will track the player's score based on their guesses.

In [275]:
#number of rounds in the game
num_of_rounds = 4

#range of possible values for the die roll
min_value = 1
max_value = 6

#initializing player's score
score = 0

#a list to record the dice sums
sums_list = []

while num_of_rounds >0 :
    rolling = str(input('\nType "roll": '))


    if rolling == 'roll':
        rolls_array = np.random.randint(min_value, max_value+1, size = (2,))
        dice_sums = np.sum(rolls_array)
        print ("Dice sum:", dice_sums)
        sums_list.append(dice_sums)

        if len(sums_list) > 1:
            if (guess == 'higher' and (sums_list[-1] > sums_list[-2])) or \
               (guess == 'lower' and (sums_list[-1] < sums_list[-2])) or \
               (guess == 'equal' and (sums_list[-1] == sums_list[-2])):
                print('Correct guess!\n')
                score += 1
            else:
                print('Incorrect guess!\n')

    else:
        print('Please type "roll".')



    print('\nWill the next sum be higher, lower, or equal?')

    guess = str(input()).strip().lower()

    if guess not in ["higher", "lower", "equal"]:
        print("Invalid input! Please enter 'higher', 'lower', or 'equal'.")

    num_of_rounds -= 1

else:
    print('\nRounds exhausted! Come back later\n')
    print('View of dice rolls: ', sums_list)
    print('Your total score:', score)


Type "roll":  roll


Dice sum: 9

Will the next sum be higher, lower, or equal?


 higher

Type "roll":  roll


Dice sum: 2
Incorrect guess!


Will the next sum be higher, lower, or equal?


 lower

Type "roll":  roll


Dice sum: 9
Incorrect guess!


Will the next sum be higher, lower, or equal?


 equal

Type "roll":  roll


Dice sum: 5
Incorrect guess!


Will the next sum be higher, lower, or equal?


 higher



Rounds exhausted! Come back later

View of dice rolls:  [np.int64(9), np.int64(2), np.int64(9), np.int64(5)]
Your total score: 0


**Note:**

Whenever you enter "roll", the simulation generates two random numbers between 1 and 6 and prints their sum. Each guess is checked against the next roll, so the guess entered after the final roll is never scored. Your score is the number of correct guesses you make.
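The same game logic can also be written without `input()`, which makes it easier to rerun and test. The sketch below is one possible vectorized version, not the notebook's implementation. The function name `play`, its `guesses` list and the `seed` parameter are assumptions made for illustration. It rolls all the dice at once and compares each guess with the change between consecutive sums.

```python
import numpy as np

def play(guesses, seed=0):
    """Roll two dice len(guesses) + 1 times and score each guess
    against the roll that follows it."""
    rng = np.random.default_rng(seed)

    # One row per roll, two dice per row; the upper bound 7 is exclusive
    rolls = rng.integers(1, 7, size=(len(guesses) + 1, 2))
    sums = rolls.sum(axis=1)

    # Change between consecutive sums: +1 higher, -1 lower, 0 equal
    changes = np.sign(np.diff(sums))

    # Encode the guesses the same way so they can be compared directly
    codes = {'higher': 1, 'lower': -1, 'equal': 0}
    expected = np.array([codes[g] for g in guesses])

    score = int(np.sum(changes == expected))
    return sums.tolist(), score

sums, score = play(['higher', 'lower', 'equal'], seed=42)
print('View of dice rolls:', sums)
print('Your total score:', score)
```

Calling `sums.tolist()` converts the values to plain Python integers, so the list prints without the `np.int64(...)` wrappers.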